idnits 2.17.1 

draft-ietf-tcpm-tcp-dcr-07.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

  ** It looks like you're using RFC 3978 boilerplate.  You should update this
     to the boilerplate described in the IETF Trust License Policy document
     (see https://trustee.ietf.org/license-info), which is required now.

  -- Found old boilerplate from RFC 3978, Section 5.1 on line 18.

  -- Found old boilerplate from RFC 3978, Section 5.5 on line 819.

  -- Found old boilerplate from RFC 3979, Section 5, paragraph 1 on line 796.

  -- Found old boilerplate from RFC 3979, Section 5, paragraph 2 on line 803.

  -- Found old boilerplate from RFC 3979, Section 5, paragraph 3 on line 809.

  ** This document has an original RFC 3978 Section 5.4 Copyright Line,
     instead of the newer IETF Trust Copyright according to RFC 4748.

  ** This document has an original RFC 3978 Section 5.5 Disclaimer, instead
     of the newer disclaimer which includes the IETF Trust according to RFC
     4748.


  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

  == No 'Intended status' indicated for this document; assuming Proposed
     Standard


  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

     No issues found here.

  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == The copyright year in the RFC 3978 Section 5.4 Copyright Line does not
     match the current year

  -- The document seems to lack a disclaimer for pre-RFC5378 work, but may
     have content which was first submitted before 10 November 2008.  If you
     have contacted all the original authors and they are all willing to grant
     the BCP78 rights to the IETF Trust, then this is fine, and you can ignore
     this comment.  If not, you may need to add the pre-RFC5378 disclaimer. 
     (See the Legal Provisions document at
     https://trustee.ietf.org/license-info for more information.)

  -- The document date (January 2006) is 6676 days in the past.  Is this
     intentional?


  Checking references for intended status: Proposed Standard
  ----------------------------------------------------------------------------

     (See RFCs 3967 and 4897 for information about using normative references
     to lower-maturity documents in RFCs)

  == Unused Reference: 'RFC2988' is defined on line 728, but no explicit
     reference was found in the text

  ** Obsolete normative reference: RFC  793 (Obsoleted by RFC 9293)

  ** Obsolete normative reference: RFC 2581 (Obsoleted by RFC 5681)

  ** Obsolete normative reference: RFC 3517 (Obsoleted by RFC 6675)

  -- Obsolete informational reference (is this intentional?): RFC  896
     (Obsoleted by RFC 7805)

  -- Obsolete informational reference (is this intentional?): RFC 2960
     (Obsoleted by RFC 4960)

  -- Obsolete informational reference (is this intentional?): RFC 2988
     (Obsoleted by RFC 6298)


     Summary: 6 errors (**), 0 flaws (~~), 3 warnings (==), 10 comments (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.

--------------------------------------------------------------------------------


2	Internet Engineering Task Force                       Sumitha Bhandarkar
3	INTERNET DRAFT                                     A. L. Narasimha Reddy
4	draft-ietf-tcpm-tcp-dcr-07.txt                      Texas A&M University
5	Expires: July 2006                                           Mark Allman
6	                                                               ICIR/ICSI
7	                                                           Ethan Blanton
8	                                                       Purdue University
9	                                                            January 2006

11	        Improving the Robustness of TCP to Non-Congestion Events

13	Status of this Memo

15	   By submitting this Internet-Draft, each author represents that any
16	   applicable patent or other IPR claims of which he or she is aware
17	   have been or will be disclosed, and any of which he or she becomes
18	   aware will be disclosed, in accordance with Section 6 of BCP 79.

20	   Internet-Drafts are working documents of the Internet Engineering
21	   Task Force (IETF), its areas, and its working groups.  Note that
22	   other groups may also distribute working documents as Internet-
23	   Drafts.

25	   Internet-Drafts are draft documents valid for a maximum of six months
26	   and may be updated, replaced, or obsoleted by other documents at any
27	   time.  It is inappropriate to use Internet-Drafts as reference
28	   material or to cite them other than as "work in progress."

30	   The list of current Internet-Drafts can be accessed at
31	   http://www.ietf.org/ietf/1id-abstracts.txt.

33	   The list of Internet-Draft Shadow Directories can be accessed at
34	   http://www.ietf.org/shadow.html.

36	Copyright Notice

38	   Copyright (C) The Internet Society (2006).

40	Abstract

42	   This document specifies Non-Congestion Robustness (NCR) for TCP.  In
43	   the absence of explicit congestion notification from the network, TCP
44	   uses loss as an indication of congestion.  One way TCP detects loss
45	   is through the arrival of three duplicate acknowledgments.
46	   However, this heuristic is not always correct, notably in the case
47	   when network paths reorder segments (for whatever reason), resulting
48	   in degraded performance.  TCP-NCR is designed to mitigate this
49	   degraded performance by increasing the number of duplicate
50	   acknowledgments required to trigger loss recovery, based on the
51	   current state of the connection, in an effort to better disambiguate
52	   true segment loss from segment reordering.  This document specifies
53	   the changes to TCP, as well as the costs and benefits of these
54	   modifications.

56	Table of Contents

58	       1.  Introduction
59	       2.  NCR Description
60	       3.  Algorithm
61	         3.1  Initialization
62	         3.2  Terminating Extended Limited Transmit and
63	              Preventing Bursts
64	         3.3  Extended Limited Transmit
65	         3.4  Entering Loss Recovery
66	       4.  Advantages
67	       5.  Disadvantages
68	       6.  Related Work
69	       7.  Security Considerations
70	       8.  Acknowledgements
71	       9.  IANA Considerations
72	       10. Normative References
73	       11. Informative References
74	       12. Author's Addresses

76	Terminology

78	       The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL
79	       NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and
80	       "OPTIONAL" in this document are to be interpreted as described
81	       in [RFC2119].

83	       Readers should be familiar with the TCP terminology (e.g.,
84	       FlightSize and pipe) given in [RFC2581] and [RFC3517].

86	1. Introduction

88	   One strength of TCP [RFC793] lies in its ability to adjust its
89	   sending rate according to the perceived congestion in the network
90	   [Jac88,RFC2581].  In the absence of explicit notification of
91	   congestion from the network, TCP uses segment loss as an indication
92	   of congestion (i.e., assuming queue overflow).  TCP receivers send
93	   cumulative acknowledgments (ACKs) indicating the next sequence number
94	   expected from the sender for arriving segments [RFC793].  When
95	   segments arrive out-of-order, duplicate ACKs are generated.  As
96	   specified in [RFC2581], a TCP sender uses the arrival of three
97	   duplicate ACKs as an indication of segment loss.  The TCP sender
98	   retransmits the lost segment and reduces the load imposed on the
99	   network, assuming the segment loss was caused by resource contention
100	   within the network path.  The TCP sender does not assume loss on the
101	   first or second duplicate ACK, but waits for three duplicate ACKs to
102	   account for minor packet reordering.  However, the use of this
103	   constant threshold of duplicate ACKs has several problems that can be
104	   mitigated with a dynamic threshold.

106	   The following is an example of TCP's behavior:

108	     + TCP A is the data sender and TCP B is the data receiver.

110	     + TCP A sends 10 segments each consisting of a single data byte
111	       (i.e., transmits bytes 1-10 in segments 1-10).

113	     + Assume segment 3 is dropped in the network.

115	     + TCP B cumulatively acknowledges segments 1 and 2, making the
116	       cumulative ACK transmitted to the sender 3 (the next expected
117	       sequence number).  (Note: TCP B may generate one or two ACKs,
118	       depending on whether delayed ACKs [RFC1122,RFC2581] are
119	       employed.)

121	     + The arrival of segments 4-10 at TCP B will each trigger the
122	       transmission of a cumulative ACK for sequence number 3.  (Note:
123	       [RFC2581] recommends that delayed ACKs not be used when the ACK
124	       is triggered by an out-of-order segment.)

126	     + When TCP A receives the third duplicate ACK (or fourth ACK
127	       overall) for sequence number 3, TCP A will retransmit
128	       segment 3 and reduce the sending rate by roughly half (see
129	       [RFC2581] for specifics on the congestion control state
130	       adjustments).

132	   Alternatively, suppose segment 3 was not dropped by the network, but
133	   rather delayed such that segment 3 arrives at TCP B after segment 10.
134	   The above scenario will play out in precisely the same manner, in
135	   that a retransmission of segment 3 will be triggered.  In
136	   other words, TCP is not capable of disambiguating this reordering
137	   event from a segment loss, resulting in an unnecessary retransmission
138	   and rate reduction.
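   The two scenarios above can be replayed with a short Python sketch.
   It is illustrative only and assumes that TCP B acknowledges every
   segment (no delayed ACKs) and that segment numbers stand in for
   sequence numbers, as in the example.  The function names are
   hypothetical.  With the standard threshold of three duplicate ACKs,
   the sender retransmits segment 3 in both cases.  A larger,
   hypothetical fixed threshold of 8 would let the reordered segment
   arrive first.

```python
from typing import Iterable, List, Optional


def receiver_acks(arrival_order: Iterable[int]) -> List[int]:
    """Cumulative ACK (next expected segment) sent for each arrival."""
    received = set()
    next_expected = 1
    acks = []
    for seg in arrival_order:
        received.add(seg)
        while next_expected in received:
            next_expected += 1
        acks.append(next_expected)  # one ACK per segment: no delayed ACKs
    return acks


def fast_retransmit(acks: Iterable[int], dupthresh: int = 3) -> Optional[int]:
    """Return the segment a sender would fast-retransmit, or None."""
    last_ack = None
    dups = 0
    for ack in acks:
        if ack == last_ack:
            dups += 1
            if dups == dupthresh:
                return ack  # the next expected segment is presumed lost
        else:
            last_ack, dups = ack, 0
    return None


lost = [1, 2] + list(range(4, 11))             # segment 3 dropped
reordered = [1, 2] + list(range(4, 11)) + [3]  # segment 3 arrives after 10

print(fast_retransmit(receiver_acks(lost)))                    # 3
print(fast_retransmit(receiver_acks(reordered)))               # 3 (spurious)
print(fast_retransmit(receiver_acks(reordered), dupthresh=8))  # None
```

   A fixed threshold of 8 would also never fire in the loss case,
   because only seven duplicate ACKs arrive.  This is the brittleness
   discussed in Section 2.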

140	   The following is the specific motivation behind making TCP robust to
141	   reordered segments:

143	     * A number of Internet measurement studies have shown that packet
144	       reordering is not a rare phenomenon [Pax97,BPS99,JIDKT03,GPL04].
145	       Further, the reordering can be well beyond that required for
146	       fast retransmit to be falsely triggered.

148	     * [BA02,ZKFP03] show the negative performance implications that
149	       packet reordering has on current TCP.

151	     * The requirement imposed by TCP for nearly in-order packet
152	       delivery places a constraint on the design of future technology.
153	       Novel routing algorithms, network components, link-layer
154	       retransmission mechanisms and applications could all be looked
155	       at with a fresh perspective if TCP were to be more robust to
156	       segment reordering.  For instance, high-speed packet switches
157	       could be permitted to reorder packets if TCP were more robust.
158	       There has been work proposed in the literature explicitly to
159	       ensure that packet ordering is maintained in such switches
160	       (e.g., [KM02]).  Also, link-layer mechanisms that attempt to
161	       recover from packet corruption by retransmitting could be
162	       allowed to reorder packets and, hence, increase the chances of
163	       local loss repair rather than relying on TCP to repair the loss
164	       (and needlessly reduce its sending rate).  Additional examples
165	       include multi-path routing, high-delay satellite links and some
166	       of the schemes proposed for a differentiated services
167	       architecture.  By making TCP more robust to non-congestion
168	       events, TCP-NCR may open up the design space for future Internet
169	       components.

171	   In this document we specify a set of TCP sender modifications to
172	   provide Non-Congestion Robustness (NCR) to TCP.  In particular, these
173	   changes are built on top of TCP with selective acknowledgments
174	   (SACKs) [RFC2018] and the SACK-based loss recovery scheme given in
175	   [RFC3517], since SACK is widely deployed at this point ([MAF05]
176	   indicates that 68% of web servers and 88% of web clients utilize SACK
177	   as of spring 2004).

179	   We note that the TCP-NCR algorithm provided in this document could be
180	   easily adapted to SCTP [RFC2960] since SCTP uses congestion control
181	   algorithms similar to TCP's (and, hence, has the same reordering
182	   robustness issues).

184	   As we note in several places in the remainder of this document, we
185	   consider TCP-NCR to be experimental in that more experience with the
186	   techniques is required before TCP-NCR should be used on a large scale
187	   on the Internet.  We encourage implementation and experimentation
188	   with TCP-NCR in the hopes of gaining an understanding of its
189	   suitability for wide-scale deployment.

191	   The remainder of this document is organized as follows.  Section 2
192	   provides a high-level description of the TCP-NCR mechanisms.  In
193	   Section 3, we specify the TCP-NCR algorithm.  Section 4 provides a
194	   brief overview of the benefits of TCP-NCR, while Section 5 discusses
195	   the drawbacks of TCP-NCR.  Section 6 discusses related work.  Section
196	   7 discusses security concerns.

198	2. NCR Description

200	   As discussed above, in the face of packet reordering, three duplicate
201	   ACKs may not be enough to disambiguate loss from reordering. In this
202	   section we provide a non-normative sketch of TCP-NCR.  The detailed
203	   algorithms for implementing Non-Congestion Robustness for TCP are
204	   presented in the next section.

206	   The general idea behind TCP-NCR is to increase the threshold used to
207	   trigger a fast retransmission from the current fixed value of three
208	   duplicate ACKs [RFC2581] to approximately a congestion window of data
209	   having left the network (but not less than the currently
210	   standardized value of three duplicate ACKs).  Since cwnd represents
211	   the amount of data a TCP flow can transmit in one round-trip time
212	   (RTT), waiting to receive notice that cwnd bytes have left the
213	   network before deciding whether the root cause is loss or reordering
214	   imposes a delay of roughly one RTT on both the retransmission and the
215	   congestion control response.  The appropriate choice for a new value
216	   of the threshold is essentially a tradeoff between making the best
217	   decision regarding the cause of the duplicate ACKs and
218	   responsiveness.  The choice to trigger a retransmission only after a
219	   cwnd's worth of data is known to have left the network represents
220	   roughly the largest amount of time a TCP can wait before the (often
221	   costly) retransmission timeout may be triggered.  Therefore, the
222	   algorithm described in this document attempts to make the best
223	   decision possible at the expense of timeliness.

225	   Simply increasing the threshold before retransmitting a segment can
226	   make TCP brittle to packet loss or ACK loss since such loss reduces
227	   the number of duplicate ACKs that will arrive at the sender from the
228	   receiver.  For instance, if the cwnd is 10 segments and one segment
229	   is lost, a duplicate ACK threshold of 10 will never be met because
230	   duplicate ACKs corresponding to at most 9 segments will arrive at the
231	   sender.  To offset the issue of loss, we extend TCP's Limited
232	   Transmit [RFC3042] scheme to allow for the sending of new data during
233	   the period when the TCP sender is disambiguating loss and reordering.
234	   This new data serves to increase the likelihood of enough duplicate
235	   ACKs arriving at the sender to trigger loss recovery if it is
236	   appropriate.
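   The effect of ACK starvation can be checked with a small Python
   model.  The model and the function name are hypothetical.  It assumes
   no ACK loss, no further data loss and unlimited new data.  It also
   assumes that every segment arriving above the hole elicits exactly
   one duplicate ACK.  Sending one new segment per duplicate ACK roughly
   corresponds to the Aggressive variant described below.  This is not
   a full implementation of Extended Limited Transmit.

```python
from collections import deque


def threshold_reached(cwnd: int, threshold: int, send_new_on_dupack: bool) -> bool:
    """Does the sender see `threshold` duplicate ACKs after the first
    segment of a cwnd-sized window is lost? (Simplified model.)"""
    arriving = deque(range(1, cwnd))  # segment 0 was lost
    next_new = cwnd                   # next previously unsent segment
    dupacks = 0
    while arriving:
        arriving.popleft()            # segment reaches receiver -> dup ACK
        dupacks += 1
        if dupacks >= threshold:
            return True
        if send_new_on_dupack:        # new data yields one more dup ACK later
            arriving.append(next_new)
            next_new += 1
    return False


print(threshold_reached(10, 10, send_new_on_dupack=False))  # False: only 9 dup ACKs
print(threshold_reached(10, 10, send_new_on_dupack=True))   # True
```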

238	   At this point we note that TCP tightly couples reliability and
239	   congestion control -- when a segment is declared lost, a
240	   retransmission is triggered and a change to the sending rate is also
241	   made on the assumption that the drop is due to resource contention
243	   [RFC2581].  Therefore, by simply changing the retransmission trigger
244	   the congestion control response is also changed.  However, we lack
245	   experience on the Internet as to whether delaying the point at which a
246	   rate reduction takes place is appropriate for wide-scale deployment.
247	   Therefore, the Extended Limited Transmit mechanism proposed in this
248	   document offers two variants for experimentation.

250	   The first Extended Limited Transmit variant, Careful Limited
251	   Transmit, calls for the transmission of one previously unsent
252	   segment, in response to duplicate acknowledgements, for every two
253	   segments that are known to have left the network.  This has the
254	   effect of halving the sending rate since normal TCP operation calls
255	   for the sending of one segment for every segment that has left the
256	   network.  Further, the halving starts immediately and is not delayed
257	   until a retransmission is triggered.  In the case of packet
258	   reordering (i.e., not segment loss), the congestion control state is
259	   restored once the reordering is detected.

261	   The second variant, Aggressive Limited Transmit, calls for
262	   transmitting one previously unsent data segment, in response to
263	   duplicate acknowledgements, for every segment known to have left the
264	   network.  With this variant, while waiting to disambiguate the loss
265	   from a reordering event, ACK-clocked transmission continues at
266	   roughly the same rate as before the event started.  Retransmission
267	   and the sending rate reduction happen per [RFC2581,RFC3517], albeit
268	   with the delayed threshold described above.  While this approach
269	   delays legitimate rate reductions (possibly slightly and temporarily
270	   aggravating overall congestion on the network) the scheme has the
271	   advantage of not reducing the transmission rate in the face of
272	   segment reordering.

274	   It is an open question which of the two Extended Limited Transmit
275	   variants is best for use on the Internet.

277	3. Algorithm

279	   The TCP-NCR modifications make two fundamental changes to the way
280	   [RFC3517] currently operates, as follows.

282	   First, the trigger for retransmitting a segment is changed from three
283	   duplicate ACKs [RFC2581,RFC3517] to indications that a congestion
284	   window's worth of data has left the network.  Second, TCP-NCR
285	   decouples initial congestion control decisions from retransmission
286	   decisions, in some cases delaying congestion control changes relative
287	   to TCP's current behavior defined in [RFC2581].  The algorithm
288	   provides two alternatives for extending Limited Transmit.  The two
289	   variants of Extended Limited Transmit are:

291	       Careful Limited Transmit:

293	        This variant calls for reducing the sending rate at
294	        approximately the same time [RFC2581] implementations reduce
295	        the congestion window, while at the same time withholding a
296	        retransmission (and the final congestion determination) for
297	        approximately one RTT.

299	       Aggressive Limited Transmit:

301	        This variant calls for maintaining the sending rate in the
302	        face of duplicate ACKs until TCP concludes a segment is lost
303	        and needs to be retransmitted (which TCP-NCR delays by one
304	        RTT when compared with current loss recovery schemes).

306	   A TCP-NCR implementation MUST use either Careful Limited Transmit or
307	   Aggressive Limited Transmit.

309	   A constant, LT_F, MUST be set depending on which variant of Extended
310	   Limited Transmit is used, as follows:

312	       Careful Limited Transmit:

314	        LT_F = 2/3

316	       Aggressive Limited Transmit:

318	        LT_F = 1/2

320	   This constant reflects the fraction of outstanding data (including
321	   data sent during Extended Limited Transmit) that must be SACKed
322	   before a retransmission is triggered.  Since Aggressive Limited
323	   Transmit sends a new segment for every segment known to have left the
324	   network, a total of roughly cwnd segments will be sent during
325	   Aggressive Limited Transmit and therefore ideally a total of roughly
326	   2*cwnd segments will be outstanding when a retransmission is
327	   triggered.  The duplicate ACK threshold is then set to LT_F = 1/2 of
328	   2*cwnd (or about 1 RTT worth of data).  The factor is different for
329	   Careful Limited Transmit because the sender only transmits one new
330	   segment for every two segments that are SACKed and therefore will
331	   ideally have a total of 1.5*cwnd segments outstanding when the
332	   retransmission is to be triggered.  Hence, the required threshold is
333	   LT_F=2/3 of 1.5*cwnd to delay the retransmission by roughly 1 RTT.
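   The arithmetic behind the two LT_F values can be checked exactly with
   Python's fractions module.  This sketch only restates the idealized
   amounts outstanding from the paragraph above: 2*cwnd for Aggressive
   and 1.5*cwnd for Careful.  The function name is hypothetical.

```python
from fractions import Fraction

LT_F = {"careful": Fraction(2, 3), "aggressive": Fraction(1, 2)}


def ideal_outstanding(cwnd: int, variant: str) -> Fraction:
    """Segments ideally outstanding when a retransmission is triggered.

    Aggressive sends ~1 new segment per segment leaving the network
    (~cwnd extra); Careful sends ~1 per 2 (~cwnd/2 extra)."""
    extra = Fraction(cwnd) if variant == "aggressive" else Fraction(cwnd, 2)
    return cwnd + extra


for variant in ("careful", "aggressive"):
    out = ideal_outstanding(10, variant)
    # In both cases the threshold works out to the original cwnd (10).
    print(variant, out, LT_F[variant] * out)
```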

335	   There are situations whereby the sender cannot transmit new data
336	   during Extended Limited Transmit (e.g., lack of data from the
337	   application, receiver's advertised window limit).  These situations
338	   can lead to the problems discussed in Section 2 when a TCP
339	   does not employ Extended Limited Transmit and is starved for ACKs.
340	   Therefore, TCP-NCR adapts the duplicate ACK threshold on each SACK
341	   arrival to be as robust as possible given the actual amount of data
342	   that has been transmitted, or roughly LT_F times the number of
343	   outstanding segments.

345	   The TCP-NCR modifications specified in this document lend themselves
346	   to incremental deployment. Only the TCP implementation on the sender
347	   side requires modification (assuming both hosts support SACK).  The
348	   changes themselves are modest.  However, as will be discussed below,
349	   availability of additional buffer space at the receiver will help
350	   maximize the benefits of using TCP-NCR but is not strictly
351	   necessary.

353	   The following algorithms depend on the notions provided by [RFC3517]
354	   and we assume the reader is familiar with the terminology given in
355	   [RFC3517].  The TCP-NCR algorithm can be adapted to alternative
356	   SACK-based loss recovery schemes.  [BR04,BSRV04] outline
357	   non-SACK-based algorithms; however, we do not specify those
358	   algorithms in this document and do not recommend them due to both
359	   the complexity and security implications of having only a gross
360	   understanding of the number of outstanding segments in the network.

362	   A TCP connection using the Nagle algorithm [RFC896,RFC1122] MAY
363	   employ the TCP-NCR algorithm.  If a TCP implementation does implement
364	   TCP-NCR, the implementation MUST follow the various specifications
365	   provided in Sections 3.1-3.4.  If the Nagle algorithm is not being
366	   used, there is no way to accurately calculate the number of
367	   outstanding segments in the network (and, therefore, no good way to
368	   derive an appropriate duplicate ACK threshold) without adding state
369	   to the TCP sender.  A TCP connection that does not employ the Nagle
370	   algorithm SHOULD NOT use TCP-NCR.  We envision that NCR could be
371	   adapted to an implementation that carefully tracks the sequence
372	   numbers transmitted in each segment.  However, we leave this as
373	   future work.
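   The problem with using byte counts when Nagle is disabled can be
   illustrated numerically.  The sketch assumes SMSS = 1460 bytes and
   uses made-up segment sizes.  It converts FlightSize to segments the
   same way (I.3) and (E.6) do, i.e., FlightSize / SMSS, and uses the
   Aggressive LT_F = 1/2.

```python
SMSS = 1460  # assumed sender maximum segment size, in bytes


def estimated_segments(segment_sizes):
    """Segment count a sender tracking only FlightSize (bytes) can infer."""
    return sum(segment_sizes) / SMSS


# With Nagle, at most one sub-SMSS segment is outstanding at a time.
with_nagle = [SMSS] * 9 + [200]
# Without Nagle, an application may emit many small segments.
without_nagle = [100] * 20

for name, sizes in (("with Nagle", with_nagle), ("without Nagle", without_nagle)):
    est = estimated_segments(sizes)
    dupthresh = max(0.5 * est, 3)  # (I.3) with LT_F = 1/2
    print(f"{name}: actual={len(sizes)} estimated={est:.2f} DupThresh={dupthresh:.2f}")
```

   With Nagle, the estimate is off by less than one segment.  Without
   Nagle, 20 outstanding segments look like fewer than two, so the
   derived DupThresh collapses to the floor of 3.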

375	3.1.  Initialization

377	   When entering a period of loss/reordering detection and Extended
378	   Limited Transmit, a TCP-NCR MUST initialize several state variables.
379	   A TCP MUST enter Extended Limited Transmit upon receiving the first
380	   ACK with a SACK block after the reception of an ACK that (a) did not
381	   contain SACK information and (b) did increase the connection's
382	   cumulative ACK point.  The initializations are:

384	   (I.1) The TCP MUST save the current FlightSize.

386	         FlightSizePrev = FlightSize

388	   (I.2) The TCP MUST set a variable for tracking the number of
389	         segments for which an ACK does not trigger a transmission
390	         during Careful Limited Transmit.

392	         Skipped = 0

394	         (Note: Skipped is not used during Aggressive Limited
395	         Transmit.)

397	   (I.3) The TCP MUST set DupThresh (from [RFC3517]) based on the
398	         current FlightSize.

400	         DupThresh = max (LT_F * (FlightSize / SMSS),3)

402	         Note: We keep the lower bound of DupThresh = 3 from
403	         [RFC2581,RFC3517].

405	   In addition to the above steps, the incoming ACK MUST be processed
406	   with the E series of steps in section 3.3.
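   A minimal Python sketch of the entry condition and steps (I.1)-(I.3)
   follows.  SMSS = 1460 bytes, the NcrState record and the
   EntryDetector helper are assumptions of this sketch, not part of the
   specification.  The E-step processing of the same ACK is not shown.

```python
from dataclasses import dataclass

SMSS = 1460  # assumed SMSS in bytes


@dataclass
class NcrState:
    flight_size_prev: int  # (I.1) bytes outstanding on entry
    skipped: int           # (I.2) only used by Careful Limited Transmit
    dup_thresh: float      # (I.3) in segments


def lt_f(careful: bool) -> float:
    return 2 / 3 if careful else 1 / 2


def ncr_initialize(flight_size: int, careful: bool) -> NcrState:
    return NcrState(
        flight_size_prev=flight_size,
        skipped=0,
        dup_thresh=max(lt_f(careful) * (flight_size / SMSS), 3),
    )


class EntryDetector:
    """Hypothetical helper: signal entry into Extended Limited Transmit
    on the first SACK-bearing ACK after an ACK that advanced the
    cumulative ACK point without carrying SACK information."""

    def __init__(self) -> None:
        self.armed = False

    def on_ack(self, advances_cum_ack: bool, has_sack: bool) -> bool:
        if has_sack and self.armed:
            self.armed = False
            return True
        if advances_cum_ack and not has_sack:
            self.armed = True
        return False


det = EntryDetector()
print(det.on_ack(advances_cum_ack=True, has_sack=False))  # False
print(det.on_ack(advances_cum_ack=False, has_sack=True))  # True: enter
print(ncr_initialize(10 * SMSS, careful=False))
```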

408	3.2.  Terminating Extended Limited Transmit and Preventing Bursts

410	   Extended Limited Transmit MUST be terminated at the start of loss
411	   recovery as outlined in section 3.4.

413	   The arrival of an ACK that advances the cumulative ACK point while in
414	   Extended Limited Transmit, but before loss recovery is triggered,
415	   signals that a series of duplicate ACKs was caused by reordering and
416	   not congestion.  Therefore, the receipt of an ACK that extends the
417	   cumulative ACK point MUST terminate Extended Limited Transmit.  As
418	   described below (in (T.4)), an ACK that extends the cumulative ACK
419	   point and *also* contains SACK information will also trigger the
420	   beginning of a new Extended Limited Transmit phase.

422	   Upon the termination of Extended Limited Transmit, and especially
423	   when using the Careful variant, TCP-NCR may be in a situation where
424	   the entire cwnd is not being utilized and therefore TCP-NCR will be
425	   prone to transmitting a burst of segments into the network.
426	   Therefore, to mitigate this bursting, when a TCP-NCR in the Extended
427	   Limited Transmit phase receives an ACK that updates the cumulative
428	   ACK point (regardless of whether the ACK contains SACK information),
429	   the following steps MUST be taken:

431	   (T.1) A TCP MUST reset cwnd to:

433	         cwnd = min (FlightSize + SMSS,FlightSizePrev)

435	         This step ensures that cwnd is not grossly larger than the
436	         amount of data outstanding --- a situation that would cause a
437	         line rate burst.

439	   (T.2) A TCP MUST set ssthresh to:

441	         ssthresh = FlightSizePrev

443	         This step provides TCP-NCR with a sense of "history".  If step
444	         (T.1) reduces cwnd below FlightSizePrev this step ensures that
445	         TCP-NCR will slow start back to the operating point in effect
446	         before Extended Limited Transmit.

448	   (T.3) A TCP is now permitted to transmit previously unsent data as
449	         allowed by cwnd, FlightSize, application data availability and
450	         the receiver's advertised window.

452	   (T.4) When an incoming ACK extends the cumulative ACK point and also
453	         contains SACK information, the initializations in steps (I.2)
454	         and (I.3) from section 3.1 MUST be taken (but step (I.1) MUST
455	         NOT be executed) to re-start Extended Limited Transmit.  In
456	         addition, the series of steps in section 3.3 (the "E" steps)
457	         MUST be taken.
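   Steps (T.1)-(T.4) can be sketched in Python as follows.  This
   illustration assumes SMSS = 1460 bytes and takes FlightSize after the
   advancing ACK has been processed.  The function name and the
   returned dictionary are hypothetical.  The E steps that (T.4) also
   requires are not shown.

```python
SMSS = 1460  # assumed SMSS in bytes


def on_cumulative_ack(flight_size: int, flight_size_prev: int,
                      has_sack: bool, careful: bool) -> dict:
    """Apply (T.1)-(T.4) to an ACK that advances the cumulative ACK
    point during Extended Limited Transmit."""
    cwnd = min(flight_size + SMSS, flight_size_prev)  # (T.1) avoid a burst
    ssthresh = flight_size_prev                        # (T.2) keep "history"
    result = {"cwnd": cwnd, "ssthresh": ssthresh, "restart": None}
    # (T.3) the caller may now send new data allowed by cwnd,
    # FlightSize, application data and the advertised window.
    if has_sack:  # (T.4) re-run (I.2) and (I.3) but NOT (I.1)
        lt_f = 2 / 3 if careful else 1 / 2
        result["restart"] = {
            "skipped": 0,
            "dup_thresh": max(lt_f * (flight_size / SMSS), 3),
        }
    return result


print(on_cumulative_ack(6 * SMSS, 10 * SMSS, has_sack=False, careful=True))
```

   Take FlightSizePrev = 10 segments with 6 segments still outstanding.
   cwnd becomes 7 segments and ssthresh 10 segments, so the sender slow
   starts back toward its earlier operating point instead of bursting.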

459	3.3.  Extended Limited Transmit

461	   On each ACK containing SACK information that arrives after TCP-NCR
462	   has entered the Extended Limited Transmit phase (as outlined in
463	   section 3.1) and before Extended Limited Transmit terminates, the
464	   sender MUST use the following procedure.

466	   (E.1) The SetPipe () procedure from [RFC3517] MUST be used to set
467	         the "pipe" variable (which represents the number of bytes
468	         still considered "in the network").  Note: the current value
469	         of DupThresh MUST be used by SetPipe () to produce an accurate
470	         assessment of the amount of data still considered in the
471	         network.

473	   (E.2) If the comparison in equation (1) below holds and there are
474	         SMSS bytes of previously unsent data available for
475	         transmission, then the sender MUST transmit one segment of SMSS
476	         bytes.

478	           (pipe + Skipped) <= (FlightSizePrev - SMSS)              (1)

480	         If the comparison in equation (1) does not hold or no new data
481	         can be transmitted (due to lack of data from the application
482	         or the advertised window limit), skip to step (E.6).

484	   (E.3) Pipe MUST be incremented by SMSS bytes.

486	   (E.4) If using Careful Limited Transmit, Skipped MUST be incremented
487	         by SMSS bytes to ensure that the next SMSS bytes of SACKed data
488	         processed do not trigger a Limited Transmit transmission
489	         (since the goal of Careful Limited Transmit is to send upon
490	         the reception of every second duplicate ACK).

492	   (E.5) A TCP MUST return to step (E.2) to ensure that as many bytes
493	         as appropriate are transmitted.  This provides robustness to
494	         ACK loss that can be (largely) compensated for using SACK
495	         information.

497	   (E.6) DupThresh MUST be reset via:

499	           DupThresh = max (LT_F * (FlightSize / SMSS),3)

501	         where FlightSize is the total number of bytes that have not
502	         been cumulatively acknowledged (which is different from
503	         "pipe").
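   The following Python sketch runs steps (E.2)-(E.6) for a single hole
   followed by ten SACKs, starting from FlightSizePrev = 10 segments.
   It is a simplification, not a reference implementation.  It assumes
   SMSS = 1460 bytes and that new data is always available.  The
   SetPipe() result of (E.1) is approximated as FlightSize minus the
   SACKed bytes, with nothing declared lost.  Loss detection (section
   3.4) is not modeled.

```python
from dataclasses import dataclass

SMSS = 1460  # assumed SMSS in bytes


@dataclass
class NcrState:
    careful: bool
    flight_size_prev: int  # FlightSizePrev from (I.1)
    flight_size: int       # bytes not yet cumulatively acknowledged
    skipped: int = 0
    dup_thresh: float = 3.0


def extended_limited_transmit(st: NcrState, pipe: int, sendable: int) -> int:
    """Steps (E.2)-(E.6) for one SACK-bearing ACK; returns segments sent.

    `pipe` stands in for SetPipe() from (E.1); `sendable` is the new
    data allowed by the application and advertised window (assumed)."""
    sent = 0
    while pipe + st.skipped <= st.flight_size_prev - SMSS and sendable >= SMSS:
        sendable -= SMSS            # (E.2) transmit one SMSS-sized segment
        st.flight_size += SMSS
        sent += 1
        pipe += SMSS                # (E.3)
        if st.careful:
            st.skipped += SMSS      # (E.4)
        # (E.5) loop back to the (E.2) comparison
    lt_f = 2 / 3 if st.careful else 1 / 2
    st.dup_thresh = max(lt_f * (st.flight_size / SMSS), 3)  # (E.6)
    return sent


def run(careful: bool, sacks: int = 10):
    """Each SACK covers one more segment above a single hole."""
    st = NcrState(careful, flight_size_prev=10 * SMSS, flight_size=10 * SMSS)
    total = 0
    for k in range(1, sacks + 1):
        pipe = st.flight_size - k * SMSS  # simplified SetPipe()
        total += extended_limited_transmit(st, pipe, sendable=10**9)
    return total, st.dup_thresh


print(run(careful=True))   # 5 new segments, DupThresh about 10
print(run(careful=False))  # 10 new segments, DupThresh 10.0
```

   Careful Limited Transmit sends a new segment on every second SACK,
   while Aggressive Limited Transmit sends one per SACK.  In both cases
   the final DupThresh is about 10 segments, roughly the original cwnd.
   This matches the LT_F derivation earlier in section 3.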

505	3.4.  Entering Loss Recovery

507	   When a segment is deemed lost via the algorithms in [RFC3517],
508	   Extended Limited Transmit MUST be terminated, leaving the
509	   algorithms in [RFC3517] to govern TCP's behavior.  One slight
510	   change to [RFC3517] MUST be made, however.  In section 5, step
511	   (2) of [RFC3517] MUST be changed to:

513	       (2) ssthresh = cwnd = (FlightSizePrev / 2)

515	   This ensures that the congestion control modifications are made
516	   with respect to the amount of data in the network before
517	   FlightSize was increased by Extended Limited Transmit.

519	   Note: Once the algorithm in [RFC3517] takes over from Extended
520	   Limited Transmit, the DupThresh value MUST be held constant until
521	   the loss recovery phase is terminated.
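   The effect of the modified step (2) can be seen with hypothetical
   numbers.  Suppose a sender enters Extended Limited Transmit with 10
   segments outstanding and sends 5 more under Careful Limited Transmit
   before a loss is declared.  The sketch assumes SMSS = 1460 bytes and
   uses integer division on byte counts.  Both are choices of this
   illustration.

```python
SMSS = 1460  # assumed SMSS in bytes


def rfc3517_step2(flight_size: int) -> int:
    """Unmodified step (2): ssthresh = cwnd = FlightSize / 2."""
    return flight_size // 2


def ncr_step2(flight_size_prev: int) -> int:
    """TCP-NCR step (2): ssthresh = cwnd = FlightSizePrev / 2."""
    return flight_size_prev // 2


flight_size_prev = 10 * SMSS  # outstanding when Extended Limited Transmit began
flight_size = 15 * SMSS       # after 5 Careful Limited Transmit segments

print(rfc3517_step2(flight_size))   # 10950 bytes (7.5 segments)
print(ncr_step2(flight_size_prev))  # 7300 bytes (5 segments)
```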

523	4. Advantages

525	   The major advantages of TCP-NCR are two-fold.  As discussed in
526	   section 1, TCP-NCR will open up the design space for network
527	   applications and components that are currently constrained by TCP's
528	   lack of robustness to packet reordering.  The second advantage is in
529	   terms of an increase in TCP performance.

531	   [BR04] presents ns-2 [NS-2] simulations of a precursor to the
532	   TCP-NCR algorithm specified in this document, called TCP-DCR (Delayed
533	   Congestion Response). The paper shows that TCP-DCR aids performance
534	   in comparison to unmodified TCP in the presence of packet reordering.
535	   In addition, the extended version of [BR04] presents results based on
536	   emulations involving Linux (kernel 2.4.24).  These results show that
537	   the performance of TCP-DCR is similar to Linux's native
538	   implementation that seeks to "undo" wrong decisions based on DSACK
539	   [RFC2883] feedback (similar to the schemes outlined in [ZKFP03]),
540	   when packets are reordered by less than one RTT. The advantage of
541	   using TCP-DCR over the DSACK-based scheme is that the DSACK-based
542	   scheme tries to estimate the exact amount of reordering in the
543	   network using fairly complex algorithms, whereas TCP-DCR achieves
544	   similar results with less complicated modifications.

546	   In addition, [BR04,BSRV04] illustrate the ability of TCP-DCR to allow
547	   for the improvement of other parts of the system.  For example, these
548	   papers show that increasing TCP's robustness to packet reordering
549	   allows for a novel wireless ARQ mechanism to be added at the link-
550	   layer.  The added robustness of the link-layer to channel errors, in
551	   turn, increases TCP performance by not requiring TCP to retransmit
552	   packets that were dropped due to corruption (and, hence, also
553	   prevents TCP from needlessly reducing the sending rate when
554	   retransmitting these segments).

556	5. Disadvantages

558	   Although all of the changes outlined above are implemented in the
559	   sender, the receiver may also have a part to play.  In particular,
560	   TCP-NCR increases the receiver's buffering requirement by up to an
561	   extra cwnd when the TCP sender uses Aggressive Limited Transmit and
562	   actual loss occurs in the network.  Therefore, to maximize the
563	   benefits of TCP-NCR, receivers should advertise a large window to
564	   absorb the extra out-of-order traffic.  If the additional buffer
565	   requirements are not met, the algorithm above takes the reduced
566	   advertised window into account, with a corresponding loss in
567	   robustness to packet reordering.
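   As a rough illustration of this buffering cost, the Python sketch
   below assumes that, without TCP-NCR, the receiver must be able to hold
   about one cwnd of out-of-order data, so the worst case described above
   (Aggressive Limited Transmit plus actual loss) is about two cwnds.  It
   also assumes the sender never has more than the advertised window
   (rwnd) outstanding.  Both assumptions and the function names are
   illustrative only.

```python
def worst_case_receive_buffer(cwnd: int) -> int:
    """Bytes of receive buffer needed in the worst case, assuming a baseline
    of one cwnd plus up to one extra cwnd due to TCP-NCR."""
    return 2 * cwnd


def extra_absorbable(cwnd: int, rwnd: int) -> int:
    """Bytes of the extra (TCP-NCR) cwnd that an advertised window of rwnd
    bytes leaves room for, under the same one-cwnd baseline assumption."""
    return max(0, min(cwnd, rwnd - cwnd))


if __name__ == "__main__":
    cwnd = 64 * 1024
    print(worst_case_receive_buffer(cwnd))    # 131072
    print(extra_absorbable(cwnd, 96 * 1024))  # 32768: half of the extra fits
    print(extra_absorbable(cwnd, 64 * 1024))  # 0: no room for the extra data
```

   When rwnd is smaller than two cwnds, less of the extra data fits,
   which corresponds to the reduced robustness to reordering noted above.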

570	   In addition, using TCP-NCR could delay the delivery of data to the
571	   application by up to one RTT because the fast retransmission point is
572	   delayed by roughly one RTT in TCP-NCR.  Applications that are
573	   sensitive to such delays should disable TCP-NCR.  For
574	   instance, a socket option could be introduced to allow applications
575	   to control whether TCP-NCR is used for a particular connection.
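   The "roughly one RTT" figure can be estimated with simple arithmetic.
   The Python sketch below assumes one ACK per segment (no delayed ACKs,
   no ACK loss), with ACKs evenly spaced across an RTT, so the k-th
   duplicate ACK arrives about k * RTT / cwnd after the ACK for the lost
   segment was due.  For TCP-NCR it simply takes a DupThresh close to
   cwnd (in segments); this value is an illustrative assumption, not the
   threshold computation specified earlier in this document.

```python
def time_to_dup_thresh(dup_thresh: int, cwnd_segments: int, rtt: float) -> float:
    """Seconds from when the lost segment's ACK was due until the
    dup_thresh-th duplicate ACK arrives, assuming evenly spaced ACKs,
    one per segment, over one RTT."""
    ack_spacing = rtt / cwnd_segments
    return dup_thresh * ack_spacing


if __name__ == "__main__":
    rtt = 0.1            # seconds
    cwnd_segments = 20
    standard = time_to_dup_thresh(3, cwnd_segments, rtt)          # DupThresh = 3
    ncr = time_to_dup_thresh(cwnd_segments, cwnd_segments, rtt)   # assumed ~cwnd
    print(f"standard: {standard:.3f} s")   # standard: 0.015 s
    print(f"TCP-NCR:  {ncr:.3f} s")        # TCP-NCR:  0.100 s
    print(f"extra delay: {ncr - standard:.3f} s")
```

   Because ACK spacing scales with RTT / cwnd, a DupThresh proportional
   to cwnd yields a delay that is a roughly constant fraction of an RTT,
   regardless of window size.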

577	   Finally, the use of TCP-NCR makes the recovery from congestion events
578	   sluggish in comparison to the standard reaction in [RFC2581].  [BR04,
579	   BSRV04] show (via simulation) that the delay in congestion response
580	   has minimal impact on the connection itself and the traffic sharing a
581	   bottleneck.  [BBFS01] also indicates (again, via simulation) that
582	   "slowly responsive" congestion control may be safe for deployment in
583	   the Internet.  These studies suggest that schemes that slightly delay
584	   congestion control decisions may be reasonable; however, further
585	   experimentation on the Internet is required to verify these results.

587	6. Related Work

589	   Over the past few years, several solutions have been proposed to
590	   improve the performance of TCP in the face of segment reordering.
591	   These schemes generally fall into one of two categories (with some
592	   overlap): mechanisms that try to prevent spurious retransmits from
593	   happening and mechanisms that try to detect spurious retransmits and
594	   "undo" the needless congestion control state changes that have been
595	   made.

597	   [BA02,ZKFP03] attempt to prevent segment reordering from triggering
598	   spurious retransmits by using various algorithms to approximate the
599	   duplicate ACK threshold required to disambiguate loss and reordering
600	   over a given network path at a given time.  TCP-NCR similarly tries
601	   to prevent spurious retransmits.  However, TCP-NCR takes a simplified
602	   approach compared to those in [BA02,ZKFP03] in that TCP-NCR simply
603	   delays retransmission by an amount based on the current cwnd (in
604	   comparison to standard TCP), while the other schemes use relatively
605	   complex algorithms in an attempt to derive a more precise value for
606	   DupThresh that depends on the current patterns of packet reordering.
607	   While TCP-NCR offers simplicity, the other schemes may offer more
608	   precision such that applications would not be forced to wait as long
609	   for their retransmissions.  Future work could be undertaken to
610	   achieve robustness without needless delay.

612	   On the other hand, several schemes have been developed to detect and
613	   mitigate needless retransmissions after the fact.
614	   [RFC3522,RFC3708,BA02,RFC4015,SK04] present algorithms to detect
615	   spurious retransmits and mitigate the changes these events made to
616	   the congestion control state.  TCP-NCR could be used in conjunction
617	   with these algorithms, with TCP-NCR attempting to prevent spurious
618	   retransmits and some other scheme kicking in if the prevention
619	   failed.  In addition, we note that TCP-NCR concentrates on
620	   preventing spurious fast retransmits, whereas some of the above
621	   algorithms also attempt to detect and mitigate spurious timeout-based
622	   retransmits.

624	7. Security Considerations

626	   We do not believe that TCP-NCR introduces any security implications
627	   beyond those of general TCP congestion control [RFC2581].  In
629	   particular, the Extended Limited Transmit algorithms specified in
630	   this document have been specifically designed not to be susceptible
631	   to the sorts of ACK splitting attacks to which general TCP
632	   congestion control is vulnerable (as discussed in [RFC3465]).

634	8. Acknowledgements

636	   Feedback from Lars Eggert, Ted Faber, Wesley Eddy, Gorry Fairhurst,
637	   Sally Floyd, Sara Landstrom, Nauzad Sadry, Pasi Sarolahti, Joe Touch,
638	   Nitin Vaidya, and the TCPM working group has contributed
639	   significantly to this document.  Our thanks to all!

641	9. IANA Considerations

643	   This document requires no IANA assignments.  The RFC Editor can
644	   safely remove this section.

646	10. Normative References

648	   [RFC793] J. Postel, "Transmission Control Protocol", RFC 793,
649	   September 1981.

651	   [RFC2018] M. Mathis, J. Mahdavi, S. Floyd and A. Romanow, "TCP
652	   Selective Acknowledgment Options", RFC 2018, October 1996.

654	   [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate
655	   Requirement Levels", BCP 14, RFC 2119, March 1997.

657	   [RFC2581] M. Allman, V. Paxson, and W. Stevens, "TCP Congestion
658	   Control", RFC 2581, April 1999.

660	   [RFC3042] M. Allman, H. Balakrishnan and S. Floyd, "Enhancing TCP's
661	   Loss Recovery Using Limited Transmit", RFC 3042, January 2001.

663	   [RFC3517] E. Blanton, M. Allman, K. Fall and L. Wang, "A Conservative
664	   Selective Acknowledgment (SACK)-based Loss Recovery Algorithm for
665	   TCP", RFC 3517, April 2003.

667	11. Informative References

669	   [BA02] E. Blanton and M. Allman, "On Making TCP More Robust to Packet
670	   Reordering", ACM Computer Communication Review, January 2002.

672	   [BBFS01] D. Bansal, H. Balakrishnan, S. Floyd and S. Shenker,
673	   "Dynamic Behavior of Slowly Responsive Congestion Control
674	   Algorithms", Proceedings of ACM SIGCOMM, Sep. 2001.

676	   [BPS99] J. Bennett, C. Partridge, and N. Shectman, "Packet reordering
677	   is not pathological network behavior", IEEE/ACM Transactions on
678	   Networking, December 1999.

680	   [BR04] Sumitha Bhandarkar and A. L. Narasimha Reddy, "TCP-DCR: Making
681	   TCP Robust to Non-Congestion Events", In the Proceedings of
682	   Networking 2004 conference, May 2004. Extended version available as
683	   tech report TAMU-ECE-2003-04.

685	   [BSRV04] Sumitha Bhandarkar, Nauzad Sadry, A. L. Narasimha Reddy and
686	   Nitin Vaidya, "TCP-DCR: A Novel Protocol for Tolerating Wireless
687	   Channel Errors", to appear in IEEE Transactions on Mobile Computing.

689	   [GPL04] Ladan Gharai, Colin Perkins and Tom Lehman, "Packet
690	   Reordering, High Speed Networks and Transport Protocol Performance",
691	   ICCCN 2004, October 2004.

693	   [Jac88] V. Jacobson, "Congestion Avoidance and Control", Computer
694	   Communication Review, vol. 18, no. 4, pp. 314-329, Aug. 1988.
695	   ftp://ftp.ee.lbl.gov/papers/congavoid.ps.Z.

697	   [JIDKT03] S. Jaiswal, G. Iannaccone, C. Diot, J. Kurose, and D.
698	   Towsley, "Measurement and Classification of Out-of-Sequence Packets
699	   in a Tier-1 IP Backbone", Proceedings of IEEE INFOCOM, 2003.

701	   [KM02] I. Keslassy and N. McKeown, "Maintaining Packet Order in
702	   Two-Stage Switches", Proceedings of IEEE INFOCOM, June 2002.

704	   [MAF05] A. Medina, M. Allman, and S. Floyd, "Measuring the Evolution
705	   of Transport Protocols in the Internet", ACM Computer Communication
706	   Review, 35(2), April 2005.

708	   [NS-2] ns-2 Network Simulator. http://www.isi.edu/nsnam/

710	   [Pax97] V. Paxson, "End-to-End Internet Packet Dynamics", Proceedings
711	   of ACM SIGCOMM, September 1997.

713	   [RFC896] J. Nagle, "Congestion Control in IP/TCP Internetworks", RFC
714	   896, January 1984.

716	   [RFC1122] R. Braden, "Requirements for Internet Hosts - Communication
717	   Layers", RFC 1122, October 1989.

719	   [RFC2883] Sally Floyd, Jamshid Mahdavi, Matt Mathis and Matt
720	   Podolsky, "An Extension to the Selective Acknowledgement (SACK)
721	   Option for TCP", RFC 2883, July 2000.

723	   [RFC2960] R. Stewart, Q. Xie, K. Morneault, C. Sharp, H.
724	   Schwarzbauer, T. Taylor, I. Rytina, M. Kalla, L. Zhang, and V. Paxson,
726	   "Stream Control Transmission Protocol", RFC 2960, October 2000.

728	   [RFC2988] V. Paxson and M. Allman, "Computing TCP's Retransmission
729	   Timer", RFC 2988, November 2000.

731	   [RFC3465] M. Allman, "TCP Congestion Control with Appropriate Byte
732	   Counting (ABC)", RFC 3465, February 2003.

734	   [RFC3522] R. Ludwig and M. Meyer, "The Eifel Detection Algorithm for
735	   TCP", RFC 3522, April 2003.

737	   [RFC3708] E. Blanton and M. Allman, "Using TCP Duplicate Selective
738	   Acknowledgement (DSACKs) and Stream Control Transmission Protocol
739	   (SCTP) Duplicate Transmission Sequence Numbers (TSNs) to Detect
740	   Spurious Retransmissions", RFC 3708, February 2004.

742	   [RFC4015] R. Ludwig, A. Gurtov, "The Eifel Response Algorithm for
743	   TCP", RFC 4015, February 2005.

745	   [SK04] P. Sarolahti, M. Kojo, "Forward RTO-Recovery (F-RTO): An
746	   Algorithm for Detecting Spurious Retransmission Timeouts with TCP and
747	   SCTP", Internet-Draft draft-ietf-tcpm-frto-02.txt (work in progress),
748	   November 2004.

750	   [ZKFP03] M. Zhang, B. Karp, S. Floyd, L. Peterson, "RR-TCP: A
751	   Reordering-Robust TCP with DSACK", Proceedings of the Eleventh
752	   IEEE International Conference on Network Protocols (ICNP 2003),
753	   Atlanta, GA, November 2003.

755	12. Authors' Addresses

757	   Sumitha Bhandarkar
758	   Dept. of Elec. Engg.
759	   214 ZACH
760	   College Station, TX 77843-3128
761	   Phone: (512) 468-8078
762	   Email: sumitha@tamu.edu
763	   URL  : http://students.cs.tamu.edu/sumitha/

765	   A. L. Narasimha Reddy
766	   Professor
767	   Dept. of Elec. Engg.
768	   315C WERC
769	   College Station, TX 77843-3128
770	   Phone : (979) 845-7598
771	   Email : reddy@ee.tamu.edu
772	   URL   : http://ee.tamu.edu/~reddy/

773	   Mark Allman
774	   ICSI Center for Internet Research
775	   1947 Center Street, Suite 600
776	   Berkeley, CA 94704-1198
777	   Phone: (216) 243-7361
778	   Email: mallman@icir.org
779	   URL: http://www.icir.org/mallman/

781	   Ethan Blanton
782	   Purdue University Computer Science
783	   250 North University Street
784	   West Lafayette, IN  47907
785	   Email: eblanton@cs.purdue.edu

787	Intellectual Property Statement

789	   The IETF takes no position regarding the validity or scope of any
790	   Intellectual Property Rights or other rights that might be claimed to
791	   pertain to the implementation or use of the technology described in
792	   this document or the extent to which any license under such rights
793	   might or might not be available; nor does it represent that it has
794	   made any independent effort to identify any such rights.  Information
795	   on the procedures with respect to rights in RFC documents can be
796	   found in BCP 78 and BCP 79.

798	   Copies of IPR disclosures made to the IETF Secretariat and any
799	   assurances of licenses to be made available, or the result of an
800	   attempt made to obtain a general license or permission for the use of
801	   such proprietary rights by implementers or users of this
802	   specification can be obtained from the IETF on-line IPR repository at
803	   http://www.ietf.org/ipr.

805	   The IETF invites any interested party to bring to its attention any
806	   copyrights, patents or patent applications, or other proprietary
807	   rights that may cover technology that may be required to implement
808	   this standard.  Please address the information to the IETF at
809	   ietf-ipr@ietf.org.

811	Disclaimer of Validity

813	   This document and the information contained herein are provided on an
814	   "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS
815	   OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY AND THE INTERNET
816	   ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS OR IMPLIED,
817	   INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE
818	   INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED
819	   WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.

821	Copyright Statement

823	   Copyright (C) The Internet Society (2006).  This document is subject
824	   to the rights, licenses and restrictions contained in BCP 78, and
825	   except as set forth therein, the authors retain all their rights.

827	Acknowledgment

829	   Funding for the RFC Editor function is currently provided by the
830	   Internet Society.