2	Internet Engineering Task Force                       Sumitha Bhandarkar
3	INTERNET DRAFT                                     A. L. Narasimha Reddy
4	draft-ietf-tcpm-tcp-dcr-04.txt                      Texas A&M University
5	Expires : November 2005                                      Mark Allman
6	                                                                    ICIR
7	                                                           Ethan Blanton
8	                                                       Purdue University
9	                                                                May 2005

11	        Improving the Robustness of TCP to Non-Congestion Events

13	Status of this Memo

15	   By submitting this Internet-Draft, each author represents that any
16	   applicable patent or other IPR claims of which he or she is aware
17	   have been or will be disclosed, and any of which he or she becomes
18	   aware will be disclosed, in accordance with Section 6 of BCP 79.

20	   Internet-Drafts are working documents of the Internet Engineering
21	   Task Force (IETF), its areas, and its working groups.  Note that
22	   other groups may also distribute working documents as Internet-
23	   Drafts.

25	   Internet-Drafts are draft documents valid for a maximum of six months
26	   and may be updated, replaced, or obsoleted by other documents at any
27	   time.  It is inappropriate to use Internet-Drafts as reference
28	   material or to cite them other than as "work in progress."

30	   The list of current Internet-Drafts can be accessed at
31	   http://www.ietf.org/ietf/1id-abstracts.txt.

33	   The list of Internet-Draft Shadow Directories can be accessed at
34	   http://www.ietf.org/shadow.html.

36	Copyright Notice

38	   Copyright (C) The Internet Society (2005).

40	Abstract:

42	   This document specifies Non-Congestion Robustness (NCR) for TCP.  In
43	   the absence of explicit congestion notification from the network,
44	   TCP's loss recovery algorithms treat the receipt of three duplicate
45	   acknowledgments as an implicit indication of congestion in the
46	   network.  This is not always correct, notably in the case when
47	   network paths reorder segments (for whatever reason), resulting in
48	   degraded performance.  TCP-NCR is designed to mitigate this degraded
49	   performance by increasing the number of duplicate acknowledgments
50	   required to trigger loss recovery, based on the current state of the
51	   connection, in an effort to disambiguate true segment loss from
52	   segment reordering.  In addition, we specify an option, Aggressive
53	   Limited Transmit, where the TCP sender does not reduce its sending
54	   rate until a segment is actually retransmitted; this would delay the
55	   reduction of the sending rate by roughly one round-trip time compared
56	   to current TCP implementations.  This document specifies the changes
57	   to TCP, as well as the costs and benefits of these modifications.

59	Terminology

61	       The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL
62	       NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and
63	       "OPTIONAL" in this document are to be interpreted as described
64	       in [RFC2119].

66	       Readers should be familiar with the TCP terminology given in
67	       [RFC2581] and [RFC3517].

69	1. Introduction

71	   One strength of TCP [RFC793] lies in its ability to adjust its
72	   sending rate according to the perceived congestion in the network
73	   [Jac88,RFC2581].  In the absence of explicit notification of
74	   congestion from the network, TCP uses segment loss as an indication
75	   of congestion (i.e., assuming queue overflow).  TCP receivers send
76	   cumulative acknowledgments (ACKs) indicating the next sequence number
77	   expected from the sender for arriving segments [RFC793].  When
78	   segments arrive out-of-order, duplicate ACKs are generated.  As
79	   specified in [RFC2581], a TCP sender uses the arrival of three
80	   duplicate ACKs as an indication of segment loss.  The TCP sender
81	   retransmits the lost segment and reduces the load imposed on the
82	   network, assuming the segment loss was caused by resource contention
83	   within the network path.  The TCP sender does not assume loss on the
84	   first or second duplicate ACK, but waits for three duplicate ACKs to
85	   account for mild reordering.  However, the use of this constant
86	   threshold of duplicate ACKs has several problems that can be
87	   mitigated with a dynamic threshold.

89	   The following is an example of TCP's behavior:

91	     + TCP A is the data sender and TCP B is the data receiver.

93	     + TCP A sends 10 segments each consisting of a single data byte
94	       (i.e., transmits bytes 1-10 in segments 1-10).

96	     + Assume segment 3 is dropped in the network.

98	     + TCP B cumulatively acknowledges segments 1 and 2, making the
99	       cumulative ACK transmitted to the sender 3 (the next expected
100	       sequence number).  (Note: TCP B may generate one or two ACKs,
101	       depending on whether delayed ACKs [RFC1122,RFC2581] are
102	       employed.)

104	     + The arrival of segments 4-10 at TCP B will each trigger the
105	       transmission of a cumulative ACK for sequence number 3.  (Note:
106	       [RFC2581] recommends that delayed ACKs not be used when the ACK
107	       is triggered by an out-of-order segment.)

109	     + When TCP A receives the third duplicate ACK (or fourth ACK
110	       overall) for sequence number 3, TCP A will retransmit
111	       segment 3 and reduce the sending rate by roughly half (see
112	       [RFC2581] for specifics on the congestion control state
113	       adjustments).

115	   Alternatively, suppose segment 3 was not dropped by the network, but
116	   rather delayed such that segment 3 arrives after segment 10.  The
117	   above scenario will play out in precisely the same manner insomuch as
118	   a retransmission of segment 3 will be triggered.  In other words, TCP
119	   is not capable of disambiguating this reordering event from a segment
120	   loss.
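
   The two scenarios above can be replayed with a small, non-normative
   Python sketch.  It assumes the receiver sends one cumulative ACK per
   arriving segment (no delayed ACKs) and that no ACKs are lost.  The
   function names are illustrative only and are not part of any TCP
   implementation.

```python
def cumulative_acks(arrivals):
    """Return the cumulative ACK (next expected segment) sent per arrival."""
    received = set()
    next_expected = 1
    acks = []
    for seg in arrivals:
        received.add(seg)
        # Advance past every segment that has now arrived in sequence.
        while next_expected in received:
            next_expected += 1
        acks.append(next_expected)
    return acks


def fast_retransmit_point(acks, dupthresh=3):
    """Index of the ACK that is the dupthresh-th duplicate, or None."""
    dups = 0
    for i in range(1, len(acks)):
        dups = dups + 1 if acks[i] == acks[i - 1] else 0
        if dups == dupthresh:
            return i
    return None


# Segment 3 dropped versus segment 3 delayed until after segment 10.
loss = cumulative_acks([1, 2, 4, 5, 6, 7, 8, 9, 10])
reorder = cumulative_acks([1, 2, 4, 5, 6, 7, 8, 9, 10, 3])

print(fast_retransmit_point(loss), fast_retransmit_point(reorder))
```

   Under these assumptions the sender sees an identical ACK stream up to
   the arrival of the third duplicate ACK, so the fixed threshold fires
   at the same point in both cases.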

122	   The following is the specific motivation behind making TCP robust to
123	   reordered segments:

125	     * A number of Internet measurement studies have shown that packet
126	       reordering is not a rare phenomenon [Pax97,BPS99,JIDKT03,GPL04].
127	       Further, the reordering can be well beyond that required for
128	       fast retransmit to be falsely triggered.

130	     * [BA02,ZKFP03] show the negative performance implications that
131	       packet reordering has on current TCP.

133	     * The requirement imposed by TCP for almost in-order packet
134	       delivery places a constraint on the design of future technology.
135	       Novel routing algorithms, network components, link-layer
136	       retransmission mechanisms and applications could all be looked
137	       at with a fresh perspective if TCP were to be more robust to
138	       segment reordering.  For instance, high-speed packet switches
139	       could be allowed to resequence packets if TCP were more robust;
140	       at present, work has been proposed in the literature explicitly
141	       to ensure that packet ordering is maintained in such switches
142	       [KM02].  Also, link-layer mechanisms that attempt to recover
143	       from packet corruption by retransmitting could be allowed to
144	       reorder packets and, hence, increase the chances of local loss
145	       repair rather than relying on TCP to repair the loss (and
146	       needlessly reduce its sending rate).  Additional examples
147	       include multi-path routing, high-delay satellite links and some
148	       of the schemes proposed for differentiated services
149	       architecture.  By making TCP more robust to non-congestion
150	       events, TCP-NCR may open the design space of the future Internet
151	       components.

153	   In this document we specify a set of TCP sender modifications to
154	   provide Non-Congestion Robustness (NCR) to TCP.  In particular, these
155	   changes are built on top of TCP with selective acknowledgments
156	   (SACKs) [RFC2018] and the SACK-based loss recovery scheme given in
157	   [RFC3517], since SACK is widely deployed at this point ([MAF05]
158	   indicates that 68% of web servers and 88% of web clients utilize SACK
159	   as of spring 2004).

161	   Finally, we note that the TCP-NCR algorithm provided in this document
162	   could be easily adapted to SCTP [RFC2960] since SCTP uses congestion
163	   control algorithms similar to TCP's (and, hence, has the same
164	   reordering robustness issues).

166	   The remainder of this document is organized as follows.  Section 2
167	   provides a high-level description of the TCP-NCR mechanisms.  In
168	   Section 3, we specify the TCP-NCR algorithm.  Section 4 provides a
169	   brief overview of the benefits of TCP-NCR, while Section 5 discusses
170	   the drawbacks of TCP-NCR.  Section 6 discusses related work.  Section
171	   7 discusses security concerns.

173	2. NCR Description

175	   As discussed above, in the face of packet reordering three duplicate
176	   ACKs may not be enough to disambiguate loss from reordering. In this
177	   section we provide a non-normative sketch of TCP-NCR.  The detailed
178	   algorithms for implementing Non-Congestion Robustness for TCP are
179	   presented in the next section.

181	   The general idea behind TCP-NCR is to increase the threshold used to
182	   trigger a fast retransmission from the current fixed value of three
183	   duplicate ACKs [RFC2581] to approximately a congestion window of data
184	   having left the network (but, not less than the currently
185	   standardized value of three duplicate ACKs).  Since cwnd represents
186	   the amount of data a TCP flow can transmit in one round-trip time
187	   (RTT), waiting to receive notice that cwnd bytes have left the
188	   network before deciding whether the root cause is loss or reordering
189	   imposes a delay of roughly one RTT.  The appropriate choice for a new
190	   value of the threshold is essentially a tradeoff between making the
191	   best decision regarding the cause of the duplicate ACKs and
192	   responsiveness.  The choice to trigger a retransmission only after a
193	   cwnd's worth of data is known to have left the network represents
194	   roughly the largest amount of time a TCP can wait before the (often
195	   costly) retransmission timeout may be triggered.  Therefore, the
196	   algorithm described in this document attempts to make the best root
197	   cause decision possible.

199	   Simply increasing the threshold before retransmitting a segment can
200	   make TCP brittle to packet loss or ACK loss since such loss reduces
201	   the number of duplicate ACKs that will arrive at the sender from the
202	   receiver.  For instance, if the cwnd is 10 segments and one segment
203	   is lost, a duplicate ACK threshold of 10 will never be met because
204	   duplicate ACKs corresponding to at most 9 segments will arrive at the
205	   sender.  To offset the issue of loss, we extend TCP's Limited
206	   Transmit [RFC3042] scheme to allow for the sending of new data during
207	   the period when the TCP sender is disambiguating loss and reordering.
208	   This new data serves to increase the likelihood of enough duplicate
209	   ACKs arriving at the sender to trigger loss recovery if it is
210	   appropriate.
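
   The starvation problem and its remedy can be illustrated with a
   minimal, non-normative Python sketch.  It assumes a single lost
   segment, no ACK loss, one duplicate ACK per arriving segment, and
   (when new data is sent) one new segment per duplicate ACK.  The
   function name is hypothetical.

```python
def reaches_threshold(cwnd, threshold, send_new_data):
    """True if enough duplicate ACKs arrive to meet `threshold` when one
    segment out of a window of `cwnd` segments is lost."""
    in_flight = cwnd - 1  # segments that will each elicit a duplicate ACK
    dupacks = 0
    while in_flight > 0 and dupacks < threshold:
        in_flight -= 1    # one segment arrives; receiver sends a duplicate ACK
        dupacks += 1
        if send_new_data:
            in_flight += 1  # one previously unsent segment per duplicate ACK
    return dupacks >= threshold


print(reaches_threshold(10, 10, send_new_data=False))
print(reaches_threshold(10, 10, send_new_data=True))
```

   Without new data only 9 duplicate ACKs can arrive, so a threshold of
   10 is never met; sending new data keeps the ACK clock running until
   the threshold is reached.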

212	   At this point we note that TCP tightly couples reliability and
213	   congestion control -- when a segment is declared lost, a
214	   retransmission is triggered and a change to sending rate is also made
215	   on the assumption that the drop is due to resource contention
216	   [RFC2581].  Therefore, by simply changing the retransmission trigger
217	   the congestion control response is also changed.  However, we lack
218	   experience on the Internet as to whether delaying the point that a
219	   rate reduction takes place is appropriate for wide-scale deployment.
220	   Therefore, the extended Limited Transmit mechanism proposed in this
221	   document offers two variants for experimentation.

223	   The first Extended Limited Transmit variant, Careful Limited
224	   Transmit, calls for the transmission of a previously unsent segment
225	   for every two segments that are known to have left the network.  This
226	   has the effect of halving the sending rate since normal TCP operation
227	   calls for the sending of one segment for every segment that has left
228	   the network.  Further, the halving starts immediately and is not
229	   delayed until a retransmission is triggered.  In the case of packet
230	   reordering (i.e., not segment loss) the congestion control state is
231	   restored to its previous state when reordering is determined.

233	   The second variant, Aggressive Limited Transmit, calls for
234	   transmitting a previously unsent data segment for every segment known
235	   to have left the network.  With this variant, while waiting to
236	   disambiguate the loss from a reordering event, ACK-clocked
237	   transmission continues at roughly the same rate as before the event
238	   started.  Retransmission and the sending rate reduction happen per
239	   [RFC2581,RFC3517], albeit with the delayed threshold described above.
240	   While this approach delays legitimate rate reductions (possibly
241	   slightly and temporarily aggravating overall congestion on the
242	   network) the scheme has the advantage of not reducing the
243	   transmission rate in the face of segment reordering.

245	   Which of the two Extended Limited Transmit variants is best for use
246	   on the Internet is an open question.

248	3. Algorithm

250	   The TCP-NCR modifications make two fundamental changes to the way
251	   [RFC3517] currently operates, as follows.

253	   First, the trigger for retransmitting a segment is changed from three
254	   duplicate ACKs [RFC2581,RFC3517] to indications that a congestion
255	   window's worth of data has left the network.  Second, TCP-NCR
256	   decouples initial congestion control decisions from retransmission
257	   decisions, in some cases delaying congestion control changes relative
258	   to TCP's current behavior defined in [RFC2581]. The algorithm
259	   provides two alternatives for extending Limited Transmit.  The two
260	   variants of extended Limited Transmit are:

262	       Careful Limited Transmit:

264	        This variant calls for reducing the sending rate at
265	        approximately the same time [RFC2581] implementations reduce
266	        the congestion window, while at the same time withholding a
267	        retransmission (and the final congestion determination) for
268	        approximately one RTT.

270	       Aggressive Limited Transmit:

272	        This variant calls for maintaining the sending rate in the
273	        face of duplicate ACKs until TCP concludes a segment is lost
274	        and needs to be retransmitted (which TCP-NCR delays by one
275	        RTT when compared with current loss recovery schemes).

277	   A TCP-NCR implementation MUST use either Careful Limited Transmit or
278	   Aggressive Limited Transmit.

280	   A constant MUST be set depending on which variant of extended Limited
281	   Transmit is used, as follows:

283	       Careful Limited Transmit:

285	        LT_F = 2/3

287	       Aggressive Limited Transmit:

289	        LT_F = 1/2

291	   This constant reflects the fraction of outstanding data that must be
292	   SACKed before a retransmission is triggered.  Since Aggressive
293	   Limited Transmit sends a new segment for every segment known to have
294	   left the network, a total of roughly cwnd segments will be sent
295	   during Aggressive Limited Transmit and therefore ideally a total of
296	   2*cwnd segments will be outstanding.  The duplicate ACK threshold is
297	   then set to LT_F = 1/2 of 2*cwnd (or about 1 RTT worth of data).  The
298	   factor is different for Careful Limited Transmit because the sender
299	   only transmits one new segment for every two segments that are SACKed
300	   and therefore will ideally have a total of 1.5*cwnd segments
301	   outstanding when the retransmission is to be triggered.  Hence, the
302	   required threshold is LT_F=2/3 of 1.5*cwnd to delay the
303	   retransmission by roughly 1 RTT.
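
   The arithmetic behind the two LT_F values can be checked with a
   short, non-normative Python sketch.  It assumes the ideal case in
   which a full cwnd of segments is SACKed and new data is always
   available; the table and function names are illustrative.

```python
from fractions import Fraction

VARIANTS = {
    # name: (new segments sent per SACKed segment, LT_F)
    "careful": (Fraction(1, 2), Fraction(2, 3)),
    "aggressive": (Fraction(1, 1), Fraction(1, 2)),
}


def ideal_threshold(cwnd_segments, variant):
    """Duplicate ACK threshold, in segments, after roughly one RTT."""
    send_ratio, lt_f = VARIANTS[variant]
    # Original window plus the new segments sent while it drains.
    outstanding = cwnd_segments + send_ratio * cwnd_segments
    return lt_f * outstanding


for name in VARIANTS:
    print(name, ideal_threshold(20, name))
```

   In both cases the threshold works out to cwnd segments, i.e., about
   one RTT worth of data.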

305	   There are situations whereby the sender cannot transmit new data
306	   during Extended Limited Transmit (e.g., lack of data from the
307	   application, receiver's advertised window limit).  These situations
308	   can lead to the problems discussed in Section 2 when a TCP
309	   does not employ Extended Limited Transmit and is starved for ACKs.
310	   Therefore, TCP-NCR adapts the duplicate ACK threshold on each SACK
311	   arrival to be as robust as possible given the actual amount of data
312	   that has been transmitted, or roughly LT_F times the number of
313	   outstanding segments.

315	   The TCP-NCR modifications specified in this document lend themselves
316	   to incremental deployment. Only the TCP implementation on the sender
317	   side requires modification.  The changes themselves are modest.
318	   However, as will be discussed below, availability of additional
319	   buffer space at the receiver will help maximize the benefits of using
320	   TCP-NCR but is not strictly necessary.

322	   The following algorithms depend on the notions provided by [RFC3517]
323	   and we assume the reader is familiar with the terminology given in
324	   [RFC3517].  The TCP-NCR algorithm can be adapted to alternate SACK-
325	   based loss recovery schemes.  [BR04,BSRV04] outline non-SACK-based
326	   algorithms; however, we do not specify those algorithms in this
327	   document and do not recommend them due to both the complexity and
328	   security implications of having only a gross understanding of the
329	   number of outstanding segments in the network.

331	   A TCP connection using the Nagle algorithm [RFC896,RFC1122] MAY
332	   employ the TCP-NCR algorithm.  If a TCP implementation does implement
333	   TCP-NCR the implementation MUST follow the various specifications
334	   provided in sections 3.1 - 3.4.  If the Nagle algorithm is not being
335	   used there is no way to accurately calculate the number of
336	   outstanding segments in the network (and, therefore, no good way to
337	   derive an appropriate duplicate ACK threshold) without adding state
338	   to the TCP sender.  A TCP connection that does not employ the Nagle
339	   algorithm SHOULD NOT use TCP-NCR.  We envision that NCR could be
340	   adapted to an implementation that carefully tracks the sequence
341	   numbers transmitted in each segment.  However, we leave this as
342	   future work.

344	   3.1.  Initialization

346	   When entering a period of loss / reordering detection and Extended
347	   Limited Transmit a TCP-NCR MUST initialize several state variables.
348	   A TCP MUST enter Extended Limited Transmit upon receiving the first
349	   ACK with a SACK block after the reception of an ACK that (a) did not
350	   contain SACK information and (b) did increase the connection's
351	   cumulative ACK point.  The initializations are:

353	   (I.1) Save the current FlightSize.

355	         FlightSizePrev = FlightSize

357	   (I.2) Set a variable for tracking the number of segments for which
358	         an ACK does not trigger a transmission during Careful Limited
359	         Transmit.

361	         Skipped = 0

363	         (Note: Skipped is not used during Aggressive Limited
364	         Transmit.)

366	   (I.3) Set DupThresh (from [RFC3517]) based on the size of the
367	         current FlightSize.

369	         DupThresh = max (LT_F * (FlightSize / SMSS),3)

371	         Note: We keep the lower bound of DupThresh = 3 from
372	         [RFC2581,RFC3517].
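
   Steps (I.1) through (I.3) can be sketched in Python as follows.  This
   is a non-normative illustration: the NcrState class and the function
   name are hypothetical, FlightSize and SMSS are in bytes, and
   DupThresh is kept as an exact fraction because this document does not
   say how a fractional value is rounded.

```python
from dataclasses import dataclass
from fractions import Fraction


@dataclass
class NcrState:
    flight_size_prev: int  # bytes, saved in (I.1)
    skipped: int           # bytes, reset in (I.2)
    dup_thresh: Fraction   # segments, set in (I.3)


def enter_extended_limited_transmit(flight_size, smss, lt_f):
    """Initialize TCP-NCR state on entering Extended Limited Transmit."""
    dup_thresh = max(lt_f * Fraction(flight_size, smss), Fraction(3))
    return NcrState(flight_size_prev=flight_size,
                    skipped=0,
                    dup_thresh=dup_thresh)


# Aggressive Limited Transmit (LT_F = 1/2) with 20 segments outstanding.
state = enter_extended_limited_transmit(20 * 1460, 1460, Fraction(1, 2))
print(state)
```

   With only four segments outstanding, the same call would yield the
   lower bound of 3 rather than 2.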

374	   In addition to the above steps, the incoming ACK MUST be processed
375	   with the E series of steps in section 3.3.

377	   3.2.  Terminating Extended Limited Transmit and Preventing Bursts

379	   Extended Limited Transmit MUST be terminated at the start of loss
380	   recovery as outlined in section 3.4.

382	   The arrival of an ACK that advances the cumulative ACK point while in
383	   Extended Limited Transmit, but before loss recovery is triggered,
384	   signals that a series of duplicate ACKs were caused by reordering and
385	   not congestion.  Therefore, the receipt of an ACK that extends the
386	   cumulative ACK point MUST terminate Extended Limited Transmit.  As
387	   described below (in (T.4)), an ACK that extends the cumulative ACK
388	   point and *also* contains SACK information will also trigger the
389	   beginning of a new Extended Limited Transmit phase.

391	   Upon the termination of Extended Limited Transmit, and especially
392	   when using the Careful variant, TCP-NCR may be in a situation where
393	   the entire cwnd is not being utilized and therefore TCP-NCR will be
394	   prone to transmitting a burst of segments into the network.
395	   Therefore, upon exiting Extended Limited Transmit the following steps
396	   MUST be taken.

398	   When a TCP-NCR in the Extended Limited Transmit phase receives an ACK
399	   that updates the cumulative ACK point (regardless of whether the ACK
400	   contains SACK information), the following steps MUST be taken:

402	   (T.1) cwnd = min (FlightSize + SMSS,FlightSizePrev)

404	         This step ensures that cwnd is not grossly larger than the
405	         amount of data outstanding --- a situation that would cause a
406	         line rate burst.

408	   (T.2) ssthresh = FlightSizePrev

410	         This step provides TCP-NCR with a sense of "history".  If step
411	         (T.1) reduces cwnd below FlightSizePrev this step ensures that
412	         TCP-NCR will slow start back to the operating point in effect
413	         before Extended Limited Transmit.

415	   (T.3) Transmit previously unsent data as allowed by cwnd,
416	         FlightSize, application data availability and the receiver's
417	         advertised window.

419	   (T.4) When the ACK extends the cumulative ACK point and also
420	         contains SACK information, the initializations in steps (I.2)
421	         and (I.3) from section 3.1 MUST be taken (but, not step (I.1))
422	         to re-start Extended Limited Transmit.  In addition, the
423	         series of steps in section 3.3 (the "E" steps) MUST be taken.
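
   The window adjustments in steps (T.1) and (T.2) can be sketched in
   Python as follows.  This is a non-normative illustration with a
   hypothetical function name; all quantities are in bytes, and the
   transmission in step (T.3) and the re-initialization in step (T.4)
   are not modeled.

```python
def exit_extended_limited_transmit(flight_size, flight_size_prev, smss):
    """Return (cwnd, ssthresh) after Extended Limited Transmit ends."""
    # (T.1) Keep cwnd close to the data actually outstanding.
    cwnd = min(flight_size + smss, flight_size_prev)
    # (T.2) Remember the earlier operating point.
    ssthresh = flight_size_prev
    return cwnd, ssthresh


# Five segments outstanding now, ten outstanding before the episode.
print(exit_extended_limited_transmit(5000, 10000, 1000))
```

   Here cwnd falls below ssthresh, so the sender would slow start back
   toward its earlier operating point instead of bursting.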

425	   3.3. Extended Limited Transmit

427	   On each ACK containing SACK information that arrives after TCP-NCR
428	   has entered the Extended Limited Transmit phase (as outlined in
429	   section 3.1) and before Extended Limited Transmit terminates, the
430	   sender MUST use the following procedure.

432	   (E.1) Use the SetPipe () procedure from [RFC3517] to set the "pipe"
433	         variable (which represents the number of bytes still considered
434	         "in the network").

436	   (E.2) If the comparison in equation (1) below holds and there are
437	         SMSS bytes of previously unsent data available for
438	         transmission then transmit one segment of SMSS bytes.

440	           (pipe + Skipped) <= (FlightSizePrev - SMSS)              (1)

442	         If the comparison in equation (1) does not hold or no new data
443	         can be transmitted (due to lack of data from the application
444	         or the advertised window limit), skip to step (E.6).

446	   (E.3) Increment pipe by SMSS bytes.

448	   (E.4) If using Careful Limited Transmit, increment Skipped by SMSS
449	         bytes to ensure that the next SMSS bytes of SACKed data
450	         processed do not trigger a Limited Transmit transmission (since
451	         the goal of Careful Limited Transmit is to send upon the
452	         reception of every second duplicate ACK).

454	   (E.5) Return to step (E.2) to ensure that as many bytes as
455	         appropriate are transmitted.  This provides robustness to ACK
456	         loss that can be (largely) compensated for using SACK
457	         information.

459	   (E.6) Reset DupThresh via:

461	           DupThresh = max (LT_F * (FlightSize / SMSS),3)

463	         where FlightSize is the total number of bytes that have not
464	         been cumulatively acknowledged (which is different from
465	         "pipe").
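
   Steps (E.2) through (E.6) can be sketched in Python as follows.  This
   is a non-normative illustration under stated assumptions: "pipe" has
   already been computed by SetPipe () in step (E.1), the receiver's
   advertised window is not modeled, all sizes are in bytes, and the
   function and parameter names are hypothetical.

```python
from fractions import Fraction


def extended_limited_transmit(pipe, skipped, flight_size, flight_size_prev,
                              smss, lt_f, unsent_bytes, careful):
    """Process one SACK-carrying ACK.

    Returns (segments_sent, pipe, skipped, flight_size, dup_thresh).
    """
    sent = 0
    # (E.2) Send while equation (1) holds and a full segment is available.
    while pipe + skipped <= flight_size_prev - smss and unsent_bytes >= smss:
        sent += 1
        unsent_bytes -= smss
        flight_size += smss   # newly sent data is now outstanding
        pipe += smss          # (E.3)
        if careful:
            skipped += smss   # (E.4)
        # (E.5) Loop back to the test in (E.2).
    # (E.6) Recompute the threshold from the current FlightSize.
    dup_thresh = max(lt_f * flight_size / smss, 3)
    return sent, pipe, skipped, flight_size, dup_thresh


# Aggressive variant: one segment of a ten-segment window has been SACKed.
print(extended_limited_transmit(9000, 0, 10000, 10000, 1000,
                                Fraction(1, 2), 5000, careful=False))
```

   With careful=True and LT_F = 2/3, the Skipped counter makes the same
   function send a new segment on roughly every second SACKed segment.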

467	   3.4 Entering Loss Recovery

469	       When a segment is deemed lost via the algorithms in [RFC3517],
470	       Extended Limited Transmit MUST be terminated, leaving the
471	       algorithms in [RFC3517] to govern TCP's behavior.  One slight
472	       change to [RFC3517] MUST be made, however.  In section 5, step
473	       (2) of [RFC3517] MUST be changed to:

475	           (2) ssthresh = cwnd = (FlightSizePrev / 2)

477	       This ensures that the congestion control modifications are made
478	       with respect to the amount of data in the network before
479	       FlightSize was increased by Extended Limited Transmit.
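
       The effect of the modified step (2) can be illustrated with a
       non-normative Python sketch.  It assumes Aggressive Limited
       Transmit has roughly doubled FlightSize by the time loss is
       declared, uses an illustrative SMSS of 1460 bytes, and uses
       integer division, which this document does not specify.  As we
       read [RFC3517], the unmodified step halves the current
       FlightSize.

```python
SMSS = 1460  # bytes; illustrative value


def loss_recovery_window(flight_size_prev):
    """Modified step (2): ssthresh = cwnd = FlightSizePrev / 2 (bytes)."""
    return flight_size_prev // 2


flight_size_prev = 10 * SMSS  # outstanding when Extended Limited Transmit began
flight_size_now = 20 * SMSS   # roughly doubled by Aggressive Limited Transmit

unmodified = flight_size_now // 2  # FlightSize / 2
modified = loss_recovery_window(flight_size_prev)

print(unmodified // SMSS, modified // SMSS)
```

       Halving the inflated FlightSize would leave the window at its
       original ten segments (no reduction at all), whereas halving
       FlightSizePrev yields five segments.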

481	4. Advantages

483	   The major advantages of TCP-NCR are two-fold.  As discussed in
484	   section 1, TCP-NCR will open up the design space for network
485	   applications and components that are currently constrained by TCP's
486	   lack of robustness to packet reordering.  The second advantage is in
487	   terms of an increase in TCP performance.

489	   [BR04] presents ns-2 [NS-2] simulations of a precursor to the TCP-
490	   NCR algorithm specified in this document, called TCP-DCR (Delayed
491	   Congestion Response). The paper shows that TCP-DCR aids performance
492	   in comparison to unmodified TCP in the presence of packet reordering.
493	   In addition, the extended version of [BR04] presents results based on
494	   emulations involving Linux (kernel 2.4.24).  These results show that
495	   the performance of TCP-DCR is similar to Linux's native
496	   implementation that seeks to "undo" wrong decisions based on DSACK
497	   [RFC2883] feedback (similar to the schemes outlined in [ZKFP03]),
498	   when packets are reordered by less than one RTT. The advantage of
499	   using TCP-DCR over the DSACK-based scheme is that the DSACK-based
500	   scheme tries to estimate the exact amount of reordering in the
501	   network using fairly complex algorithms, whereas TCP-DCR achieves
502	   similar results with less complicated modifications.

504	   In addition, [BR04,BSRV04] illustrate the ability of TCP-DCR to allow
505	   for the improvement of other parts of the system.  For example, these
506	   papers show that increasing TCP's robustness to packet reordering
507	   allows for a novel wireless ARQ mechanism to be added at the link-
508	   layer.  The added robustness of the link-layer to channel errors, in
509	   turn, increases TCP performance by not requiring TCP to retransmit
510	   packets that were dropped due to corruption (and, hence, also
511	   prevents TCP from needlessly reducing the sending rate when
512	   retransmitting these segments).

514	5. Disadvantages

516	   While we note that all of the changes outlined above are implemented
517	   in the sender, the receiver also potentially has a part to play.  In
518	   particular, TCP-NCR increases the receiver's buffering requirement by
519	   up to an extra cwnd -- in the case of the TCP sender using Aggressive
520	   Limited Transmit and actual loss occurring in the network.
521	   Therefore, to maximize the benefits from TCP-NCR receivers should
522	   advertise a large window to absorb the extra out-of-order traffic. In
523	   the case that the additional buffer requirements are not met, the use
524	   of the above algorithm takes into account the reduced advertised
525	   window, resulting in slightly reduced robustness to reordering.

527	   In addition, using TCP-NCR could delay the delivery of data to the
528	   application by up to one RTT because the fast retransmission point is
529	   delayed by roughly one RTT in TCP-NCR.  Applications that are
530	   sensitive to such delays should turn off the TCP-NCR option.  For
531	   instance, a socket option could be introduced to allow applications
532	   to control whether NCR would be used for a particular connection.

534	   Finally, the use of TCP-NCR makes the recovery from congestion events
535	   sluggish in comparison to the standard reaction in [RFC2581].  [BR04,
536	   BSRV04] show (via simulation) that the delay in congestion response
537	   has minimal impact on the connection itself and the traffic sharing a
538	   bottleneck.  [BBFS01] also indicates (again, via simulation) that
539	   "slowly responsive" congestion control may be safe for deployment in
540	   the Internet.  These studies suggest that schemes that slightly delay
541	   congestion control decisions may be reasonable; however, further
542	   experimentation on the Internet is required to verify these results.

544	6. Related Work

546	   Over the past few years, several solutions have been proposed to
547	   improve the performance of TCP in the face of segment reordering.
548	   These schemes generally fall into one of two categories (with some
549	   overlap): mechanisms that try to prevent spurious retransmits from
550	   happening and mechanisms that try to detect spurious retransmits and
551	   "undo" the needless congestion control state changes that have been
552	   taken.

554	   [BA02,ZKFP03] attempt to prevent segment reordering from triggering
555	   spurious retransmits by using various algorithms to approximate the
556	   duplicate ACK threshold required to disambiguate loss and reordering
557	   over a given network path at a given time.  TCP-NCR similarly tries
558	   to prevent spurious retransmits.  However, TCP-NCR takes a simplified
559	   approach compared to those in [BA02,ZKFP03] in that TCP-NCR simply
560	   delays retransmission by a fixed amount (in comparison to standard
561	   TCP), while the other schemes use relatively complex algorithms in an
562	   attempt to derive a more precise value for DupThresh that depends on
563	   the network conditions.  While TCP-NCR offers simplicity, the other
564	   schemes may offer more precision such that applications would not be
565	   forced to wait as long for their retransmissions.  Future work could
566	   be undertaken to achieve robustness without needless delay.

568	   On the other hand, several schemes have been developed to detect and
569	   mitigate needless retransmissions after the fact.
570	   [RFC3522,RFC3708,BA02,LG04,SK04] present algorithms to detect
571	   spurious retransmits and mitigate the changes these events made to
572	   the congestion control state.  TCP-NCR could be used in conjunction
573	   with these algorithms, with TCP-NCR attempting to prevent spurious
574	   retransmits and some other scheme kicking in if the prevention
575	   failed.  In addition, we note that TCP-NCR concentrates on
576	   preventing spurious fast retransmits, whereas some of the above
577	   algorithms also attempt to detect and mitigate spurious
578	   timeout-based retransmits.

580	7. Security Considerations

582	   We do not believe there are security implications involved with TCP-
583	   NCR over and above those for general TCP congestion control
584	   [RFC2581].  In particular, the Extended Limited Transmit algorithms
585	   specified in this document have been specifically designed not to be
586	   susceptible to the sorts of ACK splitting attacks that TCP's general
587	   congestion control is vulnerable to (as discussed in [RFC3465]).
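
   To make the ACK splitting concern concrete, the Python sketch below
   contrasts window growth that counts ACKs with growth that counts
   newly acknowledged bytes, in the spirit of [RFC3465].  It illustrates
   the general attack during slow start only; it is not the Extended
   Limited Transmit algorithm, and the segment size and function names
   are illustrative assumptions.

```python
SMSS = 1460  # sender maximum segment size in bytes (illustrative value)


def grow_per_ack(cwnd: int, acked_bytes: list) -> int:
    """Naive growth: add one SMSS for every ACK, however little it covers."""
    for _ in acked_bytes:
        cwnd += SMSS
    return cwnd


def grow_per_byte(cwnd: int, acked_bytes: list) -> int:
    """Byte-counting growth: add only the newly covered bytes, at most SMSS per ACK."""
    for newly_acked in acked_bytes:
        cwnd += min(newly_acked, SMSS)
    return cwnd


honest = [SMSS]          # one ACK covering a full segment
split = [SMSS // 4] * 4  # the same segment acknowledged in four pieces
```

   With ACK counting, the split ACKs grow the window four times as fast
   as the honest ACK; with byte counting, both receivers produce the
   same growth.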

589	8. Acknowledgements

591	   Ted Faber, Sally Floyd, Nauzad Sadry, Pasi Sarolahti and Nitin
592	   Vaidya, as well as feedback from the TCPM working group, have
593	   contributed significantly to this document.  Our thanks to all!

595	9. Normative References

597	   [RFC793] J. Postel, "Transmission Control Protocol", RFC 793,
598	   September 1981.

600	   [RFC2018] M. Mathis, J. Mahdavi, S. Floyd and A. Romanow, "TCP
601	   Selective Acknowledgment Options", RFC 2018, October 1996.

603	   [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate
604	   Requirement Levels", BCP 14, RFC 2119, March 1997.

606	   [RFC2581] M. Allman, V. Paxson, and  W. Stevens, "TCP Congestion
607	   Control", RFC 2581, April 1999.

609	   [RFC3042] M. Allman, H. Balakrishnan and S. Floyd, "Enhancing TCP's
610	   Loss Recovery Using Limited Transmit", RFC 3042, January 2001.

612	   [RFC3517] E. Blanton, M. Allman, K. Fall and L. Wang, "A Conservative
613	   Selective Acknowledgment (SACK)-based Loss Recovery Algorithm for
614	   TCP", RFC 3517, April 2003.

616	10. Informative References

618	   [BA02] E. Blanton and M. Allman, "On Making TCP More Robust to Packet
619	   Reordering," ACM Computer Communication Review, January 2002.

621	   [BBFS01] D. Bansal, H. Balakrishnan, S. Floyd and S. Shenker,
622	   "Dynamic Behavior of Slowly Responsive Congestion Control
623	   Algorithms", Proceedings of ACM SIGCOMM, Sep. 2001.

625	   [BPS99] J. Bennett, C. Partridge, and N. Shectman, "Packet reordering
626	   is not pathological network behavior," IEEE/ACM Transactions on
627	   Networking, December 1999.

629	   [BR04] Sumitha Bhandarkar and A. L. Narasimha Reddy, "TCP-DCR: Making
630	   TCP Robust to Non-Congestion Events", In the Proceedings of
631	   Networking 2004 conference, May 2004. Extended version available as
632	   tech report TAMU-ECE-2003-04.

634	   [BSRV04] Sumitha Bhandarkar, Nauzad Sadry, A. L. Narasimha Reddy and
635	   Nitin Vaidya, "TCP-DCR: A Novel Protocol for Tolerating Wireless
636	   Channel Errors", To appear in IEEE Transactions on Mobile Computing.

638	   [GPL04] Ladan Gharai, Colin Perkins and Tom Lehman, "Packet
639	   Reordering, High Speed Networks and Transport Protocol Performance",
640	   ICCCN 2004, October 2004.

642	   [Jac88] V. Jacobson, "Congestion Avoidance and Control", Computer
643	   Communication Review, vol. 18, no. 4, pp. 314-329, Aug. 1988.
644	   ftp://ftp.ee.lbl.gov/papers/congavoid.ps.Z.

646	   [JIDKT03] S. Jaiswal, G. Iannaccone, C. Diot, J. Kurose, and D.
647	   Towsley, "Measurement and Classification of Out-of-Sequence Packets
648	   in a Tier-1 IP Backbone," Proceedings of IEEE INFOCOM, 2003.

650	   [KM02] I. Keslassy and N. McKeown, "Maintaining packet order in
651	   two-stage switches," Proceedings of the IEEE Infocom, June 2002.

653	   [LG04] R. Ludwig, A. Gurtov, "The Eifel Response Algorithm for TCP",
654	   Internet-Draft draft-ietf-tsvwg-tcp-eifel-response-06.txt (work in
655	   progress).  September 2004.

657	   [MAF05] A. Medina, M. Allman and S. Floyd, "Measuring the Evolution
658	   of Transport Protocols in the Internet", ACM Computer Communication
659	   Review, 35(2), April 2005.

661	   [NS-2] ns-2 Network Simulator. http://www.isi.edu/nsnam/

663	   [Pax97] V. Paxson, "End-to-End Internet Packet Dynamics," Proceedings
664	   of ACM SIGCOMM, September 1997.

666	   [RFC896] J. Nagle, "Congestion Control in IP/TCP Internetworks", RFC
667	   896, January 1984.

669	   [RFC1122] R. Braden, "Requirements for Internet Hosts - Communication
670	   Layers", RFC 1122, October 1989.

672	   [RFC2883] Sally Floyd, Jamshid Mahdavi, Matt Mathis and Matt
673	   Podolsky, "An Extension to the Selective Acknowledgement (SACK)
674	   Option for TCP," RFC 2883, July 2000.

676	   [RFC2960] R. Stewart, Q. Xie, K. Morneault, C. Sharp, H.
678	   Schwarzbauer, T. Taylor, I. Rytina, M. Kalla, L. Zhang and V. Paxson,
679	   "Stream Control Transmission Protocol", RFC 2960, October 2000.

681	   [RFC2988] V. Paxson and M. Allman, "Computing TCP's Retransmission
682	   Timer", RFC 2988, November 2000.

684	   [RFC3465] M. Allman, "TCP Congestion Control with Appropriate Byte
685	   Counting (ABC)", RFC 3465, February 2003.

687	   [RFC3522] R. Ludwig and M. Meyer, "The Eifel Detection Algorithm for
688	   TCP," RFC 3522, April 2003.

690	   [RFC3708] E. Blanton and M. Allman, "Using TCP Duplicate Selective
691	   Acknowledgement (DSACKs) and Stream Control Transmission Protocol
692	   (SCTP) Duplicate Transmission Sequence Numbers (TSNs) to Detect
693	   Spurious Retransmissions", RFC 3708, February 2004.

695	   [SK04] P. Sarolahti, M. Kojo, "Forward RTO-Recovery (F-RTO): An
696	   Algorithm for Detecting Spurious Retransmission Timeouts with TCP and
697	   SCTP", Internet-Draft draft-ietf-tcpm-frto-02.txt (work in progress).
698	   November 2004.

700	   [ZKFP03] M. Zhang, B. Karp, S. Floyd, L. Peterson, "RR-TCP: A
701	   Reordering-Robust TCP with DSACK", in Proceedings of the Eleventh
702	   IEEE International Conference on Networking Protocols (ICNP 2003),
703	   Atlanta, GA, November, 2003.

705	11. Authors' Addresses

707	   Sumitha Bhandarkar
708	   Dept. of Elec. Engg.
709	   214 ZACH
710	   College Station, TX 77843-3128
711	   Phone: (512) 468-8078
712	   Email: sumitha@tamu.edu
713	   URL  : http://students.cs.tamu.edu/sumitha/

715	   A. L. Narasimha Reddy
716	   Professor
717	   Dept. of Elec. Engg.
718	   315C WERC
719	   College Station, TX 77843-3128
720	   Phone : (979) 845-7598
721	   Email : reddy@ee.tamu.edu
722	   URL   : http://ee.tamu.edu/~reddy/

724	   Mark Allman
725	   ICSI Center for Internet Research
726	   1947 Center Street, Suite 600
727	   Berkeley, CA 94704-1198
728	   Phone: (216) 243-7361
729	   Email: mallman@icir.org
730	   URL: http://www.icir.org/mallman/

732	   Ethan Blanton
733	   Purdue University Computer Sciences
734	   250 North University Street
735	   West Lafayette, IN  47907
736	   Email: eblanton@cs.purdue.edu

738	Intellectual Property Statement

740	   The IETF takes no position regarding the validity or scope of any
741	   Intellectual Property Rights or other rights that might be claimed to
742	   pertain to the implementation or use of the technology described in
743	   this document or the extent to which any license under such rights
744	   might or might not be available; nor does it represent that it has
745	   made any independent effort to identify any such rights.  Information
746	   on the procedures with respect to rights in RFC documents can be
747	   found in BCP 78 and BCP 79.

749	   Copies of IPR disclosures made to the IETF Secretariat and any
750	   assurances of licenses to be made available, or the result of an
751	   attempt made to obtain a general license or permission for the use of
752	   such proprietary rights by implementers or users of this
753	   specification can be obtained from the IETF on-line IPR repository at
754	   http://www.ietf.org/ipr.

756	   The IETF invites any interested party to bring to its attention any
757	   copyrights, patents or patent applications, or other proprietary
758	   rights that may cover technology that may be required to implement
759	   this standard.  Please address the information to the IETF at
760	   ietf-ipr@ietf.org.

762	Disclaimer of Validity

764	   This document and the information contained herein are provided on an
765	   "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS
766	   OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY AND THE INTERNET
767	   ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS OR IMPLIED,
768	   INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE
769	   INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED
770	   WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.

772	Copyright Statement
773	   Copyright (C) The Internet Society (2005).  This document is subject
774	   to the rights, licenses and restrictions contained in BCP 78, and
775	   except as set forth therein, the authors retain all their rights.

777	Acknowledgment

779	   Funding for the RFC Editor function is currently provided by the
780	   Internet Society.