idnits 2.17.1 

draft-ietf-tcpm-early-rexmt-04.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

  ** You're using the IETF Trust Provisions' Section 6.b License Notice from
     12 Sep 2009 rather than the newer Notice from 28 Dec 2009.  (See
     https://trustee.ietf.org/license-info/)


  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

  ** There are 5 instances of too long lines in the document, the longest one
     being 2 characters in excess of 72.

  ** The abstract seems to contain references ([RFC5681], [RFC2119]), which
     it shouldn't.  Please replace those with straight textual mentions of the
     documents in question.


  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == The copyright year in the IETF Trust and authors Copyright Line does not
     match the current year

  -- The document seems to lack a disclaimer for pre-RFC5378 work, but may
     have content which was first submitted before 10 November 2008.  If you
     have contacted all the original authors and they are all willing to grant
     the BCP78 rights to the IETF Trust, then this is fine, and you can ignore
     this comment.  If not, you may need to add the pre-RFC5378 disclaimer. 
     (See the Legal Provisions document at
     https://trustee.ietf.org/license-info for more information.)

  -- The document date (January 2010) is 5213 days in the past.  Is this
     intentional?


  Checking references for intended status: Experimental
  ----------------------------------------------------------------------------

  == Unused Reference: 'AA02' is defined on line 482, but no explicit
     reference was found in the text

  == Unused Reference: 'LK98' is defined on line 520, but no explicit
     reference was found in the text

  == Unused Reference: 'Mor97' is defined on line 524, but no explicit
     reference was found in the text

  == Unused Reference: 'RFC3150' is defined on line 541, but no explicit
     reference was found in the text

  ** Obsolete normative reference: RFC  793 (Obsoleted by RFC 9293)

  ** Obsolete normative reference: RFC 2988 (Obsoleted by RFC 6298)

  ** Obsolete normative reference: RFC 4960 (Obsoleted by RFC 9260)

  -- Obsolete informational reference (is this intentional?): RFC 3517
     (Obsoleted by RFC 6675)

  -- Obsolete informational reference (is this intentional?): RFC 3782
     (Obsoleted by RFC 6582)


     Summary: 6 errors (**), 0 flaws (~~), 5 warnings (==), 4 comments (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.

--------------------------------------------------------------------------------

1	Internet Engineering Task Force                              Mark Allman
2	INTERNET DRAFT                                                      ICSI
3	File: draft-ietf-tcpm-early-rexmt-04.txt          Konstantin Avrachenkov
4	Intended Status: Experimental                                      INRIA
5	                                                            Urtzi Ayesta
6	                                           BCAM-IKERBASQUE and LAAS-CNRS
7	                                                            Josh Blanton
8	                                                         Ohio University
9	                                                              Per Hurtig
10	                                                     Karlstad University
11	                                                            January 2010
12	                                                      Expires: July 2010

14	                   Early Retransmit for TCP and SCTP

16	Status of this Memo

18	    This Internet-Draft is submitted to IETF in full conformance with
19	    the provisions of BCP 78 and BCP 79.

21	    Internet-Drafts are working documents of the Internet Engineering
22	    Task Force (IETF), its areas, and its working groups.  Note that
23	    other groups may also distribute working documents as Internet-
24	    Drafts.

26	    Internet-Drafts are draft documents valid for a maximum of six
27	    months and may be updated, replaced, or obsoleted by other documents
28	    at any time.  It is inappropriate to use Internet-Drafts as
29	    reference material or to cite them other than as "work in progress."

31	    The list of current Internet-Drafts can be accessed at
32	    http://www.ietf.org/ietf/1id-abstracts.txt.

34	    The list of Internet-Draft Shadow Directories can be accessed at
35	    http://www.ietf.org/shadow.html.

37	    This Internet-Draft will expire on July 27, 2010.

39	Copyright Notice

41	    Copyright (c) 2009 IETF Trust and the persons identified as the
42	    document authors.  All rights reserved.

44	    This document is subject to BCP 78 and the IETF Trust's Legal
45	    Provisions Relating to IETF Documents
46	    (http://trustee.ietf.org/license-info) in effect on the date of
47	    publication of this document.  Please review these documents
48	    carefully, as they describe your rights and restrictions with
49	    respect to this document.  Code Components extracted from this
50	    document must include Simplified BSD License text as described in
51	    Section 4.e of the Trust Legal Provisions and are provided without
52	    warranty as described in the BSD License.

54	Abstract
55	    This document proposes a new mechanism for TCP and SCTP that can be
56	    used to recover lost segments when a connection's congestion window
57	    is small.  The "Early Retransmit" mechanism allows the transport to
58	    reduce, in certain special circumstances, the number of duplicate
59	    acknowledgments required to trigger a fast retransmission.  This
60	    allows the transport to use fast retransmit to recover segment
61	    losses that would otherwise require a lengthy retransmission
62	    timeout.

64	Terminology

66	    The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
67	    "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
68	    document are to be interpreted as described in RFC 2119 [RFC2119].

70	    The reader is expected to be familiar with the definitions given in
71	    [RFC5681].

73	1   Introduction

75	    Many researchers have studied problems with TCP's loss recovery
76	    [RFC793,RFC5681] when the congestion window is small and have
77	    outlined possible mechanisms to mitigate these problems
78	    [Mor97,BPS+98,Bal98,LK98,RFC3150,AA02].  SCTP's [RFC4960] loss
79	    recovery and congestion control mechanisms are based on TCP and
80	    therefore the same problems impact the performance of SCTP
81	    connections.  When the transport detects a missing segment, the
82	    connection enters a loss recovery phase.  There are several variants
83	    of the loss recovery phase depending on the TCP implementation.  TCP
84	    can use slow start based recovery or Fast Recovery [RFC5681],
85	    NewReno [RFC3782], and loss recovery based on selective
86	    acknowledgments (SACKs) [RFC2018,FF96,RFC3517].  SCTP's loss
87	    recovery is not as varied due to the built-in selective
88	    acknowledgments.

90	    All the above variants have two methods for invoking loss recovery.
91	    First, if an acknowledgment (ACK) for a given segment is not
92	    received in a certain amount of time a retransmission timer fires
93	    and the segment is resent [RFC2988,RFC4960].  Second, the "Fast
94	    Retransmit" algorithm resends a segment when three duplicate ACKs
95	    arrive at the sender [Jac88,RFC5681].  Duplicate ACKs are triggered by
96	    out-of-order arrivals at the receiver.  However, because duplicate
97	    ACKs from the receiver are triggered by both segment loss and
98	    segment reordering in the network path, the sender waits for three
99	    duplicate ACKs in an attempt to disambiguate segment loss from
100	    segment reordering.  When the congestion window is small it may not
101	    be possible to generate the required number of duplicate ACKs to
102	    trigger Fast Retransmit when a loss does happen.

104	    Small congestion windows can occur in a number of situations, such
105	    as:

107	    (1) The connection is constrained by end-to-end congestion control
108	        when the connection's share of the path is small, the path has a
109	        small bandwidth-delay product or the transport is ascertaining
110	        the available bandwidth in the first few round-trip times of
111	        slow start.

113	    (2) The connection is "application limited" and has only a limited
114	        amount of data to send.  This can happen any time the
115	        application does not produce enough data to fill the congestion
116	        window.  In particular, every connection becomes application
117	        limited as it ends.

119	    (3) The connection is limited by the receiver's advertised window.

121	    The transport's retransmission timeout (RTO) is based on measured
122	    round-trip times (RTT) between the sender and receiver, as specified
123	    in [RFC2988] (for TCP) and [RFC4960] (for SCTP).  To prevent
124	    spurious retransmissions of segments that are only delayed and not
125	    lost, the minimum RTO is conservatively chosen to be 1 second.
126	    Therefore, it behooves TCP senders to detect and recover from as
127	    many losses as possible without incurring a lengthy timeout during
128	    which the connection remains idle.  However, if not enough duplicate
129	    ACKs arrive from the receiver, the Fast Retransmit algorithm is
130	    never triggered---this situation occurs when the congestion window
131	    is small, if a large number of segments in a window are lost or at
132	    the end of a transfer as data drains from the network.  For
133	    instance, consider a congestion window of three segments worth of
134	    data.  If one segment is dropped by the network, then at most two
135	    duplicate ACKs will arrive at the sender.  Since three duplicate
136	    ACKs are required to trigger Fast Retransmit, a timeout will be
137	    required to resend the dropped segment.  Note, delayed ACKs
138	    [RFC5681] may further reduce the number of duplicate ACKs a receiver
139	    sends.  However, we assume that receivers send immediate ACKs when
140	    there is a gap in the received sequence space per [RFC5681].
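
    The arithmetic in the example above can be sketched in a few lines
    of Python.  This is an illustration only, not part of the
    mechanism.  The function names are hypothetical, and the sketch
    assumes that the receiver ACKs every out-of-order arrival
    immediately, that no ACKs are lost, and that no new data is sent.

```python
DUPACK_THRESHOLD = 3  # duplicate ACKs required by standard Fast Retransmit


def dupacks_after_single_loss(flight: int, lost_index: int) -> int:
    """Count duplicate ACKs when one segment of a flight is dropped.

    flight is the number of segments sent; lost_index (0-based) names the
    dropped segment.  Assumes immediate ACKs for out-of-order arrivals,
    no ACK loss, and no new data sent in response to duplicate ACKs.
    """
    if not 0 <= lost_index < flight:
        raise ValueError("lost_index must identify a segment in the flight")
    # Every segment that arrives after the hole triggers one duplicate ACK.
    return flight - lost_index - 1


def fast_retransmit_fires(flight: int, lost_index: int) -> bool:
    """True if the loss yields enough duplicate ACKs for Fast Retransmit."""
    return dupacks_after_single_loss(flight, lost_index) >= DUPACK_THRESHOLD
```

    With a flight of three segments and the first one dropped, the
    sketch reports two duplicate ACKs, so Fast Retransmit does not fire
    and the sender must wait for the RTO.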

142	    [BPS+98] shows that roughly 56% of retransmissions sent by a busy
143	    web server are sent after the RTO timer expires, while only 44% are
144	    handled by Fast Retransmit.  In addition, only 4% of the RTO
145	    timer-based retransmissions could have been avoided with SACK, which
146	    has to continue to disambiguate reordering from genuine loss.
147	    Furthermore, [All00] shows that for one particular web server the
148	    median number of bytes carried by a connection is less than four
149	    segments, indicating that more than half of the connections will be
150	    forced to rely on the RTO timer to recover from any losses that
151	    occur.  Thus, loss recovery that does not rely on the conservative
152	    RTO is likely to be beneficial for short TCP transfers.

154	    The Limited Transmit mechanism introduced in [RFC3042] and currently
155	    codified in [RFC5681] allows a TCP sender to transmit previously
156	    unsent data upon the reception of each of the two duplicate ACKs
157	    that precede a Fast Retransmit.  SCTP [RFC4960] uses SACK
158	    information to calculate the number of outstanding segments in the
159	    network.  Hence, when the first two duplicate ACKs arrive at the
160	    sender they will indicate that data has left the network and allow
161	    the sender to transmit new data (if available) similar to TCP's
162	    Limited Transmit algorithm.  In the remainder of this document we
163	    use "Limited Transmit" to include both TCP and SCTP mechanisms for
164	    sending in response to the first two duplicate ACKs.  By sending
165	    these two new segments the sender is attempting to induce additional
166	    duplicate ACKs (if appropriate) so that Fast Retransmit will be
167	    triggered before the retransmission timeout expires.  The
168	    sender-side "Early Retransmit" mechanism outlined in this document
169	    covers the case when previously unsent data is not available for
170	    transmission (case (2) above) or cannot be transmitted due to an
171	    advertised window limitation (case (3) above).

173	    Note: This document is being published as an experimental RFC as
174	    part of the process for the TCPM WG and the IETF to assess whether
175	    the proposed change is useful and safe in heterogeneous
176	    environments, including which variants of the mechanism are the most
177	    effective.  In the future, this specification may be updated and put
178	    on the standards track if its safety and efficacy can be
179	    demonstrated.

181	2   Early Retransmit Algorithm

183	    The Early Retransmit algorithm calls for lowering the threshold for
184	    triggering Fast Retransmit when the amount of outstanding data is
185	    small and when no previously unsent data can be transmitted (such
186	    that Limited Transmit could be used).  Duplicate ACKs are triggered
187	    by each arriving out-of-order segment.  Therefore, Fast Retransmit
188	    will not be invoked when there are fewer than four outstanding
189	    segments (assuming only one segment loss in the window).  However,
190	    TCP and SCTP are not required to track the number of outstanding
191	    segments, but rather the number of outstanding bytes or messages.
192	    (Note, SCTP's message boundaries do not necessarily correspond to
193	    segment boundaries.)  Therefore, applying the intuitive notion of a
194	    transport with fewer than four segments outstanding is more
195	    complicated than it first appears.  In section 2.1 we describe a
196	    "byte-based" variant of Early Retransmit that attempts to roughly
197	    map the number of outstanding bytes to a number of outstanding
198	    segments that is then used when deciding whether to trigger Early
199	    Retransmit.  In section 2.2 we describe a "segment-based" variant
200	    that represents a more precise algorithm for triggering Early
201	    Retransmit.  The precision comes at the cost of requiring additional
202	    state to be kept by the TCP sender.  In both cases we describe
203	    SACK-based and non-SACK-based versions of the scheme (of course, the
204	    non-SACK version will not apply to SCTP).  This document explicitly
205	    does not prefer one variant over the other, but leaves the choice to
206	    the implementer.

208	2.1 Byte-based Early Retransmit

210	    A TCP or SCTP sender MAY use byte-based Early Retransmit.

212	    Upon the arrival of an ACK, a sender employing byte-based Early
213	    Retransmit MUST use the following two conditions to determine when
214	    an Early Retransmit is sent:

216	    (2.a) The amount of outstanding data (ownd)---data sent but not yet
217	          acknowledged---is less than 4*SMSS bytes.

219	          Note that in the byte-based variant of Early Retransmit
220	          'ownd' is equivalent to 'FlightSize' defined in [RFC5681].  We
221	          use different notation because 'ownd' is not consistent with
222	          FlightSize throughout this document.

224	          Also note that in SCTP, messages will have to be converted to
225	          bytes to make this variant of Early Retransmit work.

227	    (2.b) There is either no unsent data ready for transmission at the
228	          sender or the advertised receive window does not permit new
229	          segments to be transmitted.

231	    When the above two conditions hold and a TCP connection does not
232	    support SACK the duplicate ACK threshold used to trigger a
233	    retransmission MUST be reduced to:

235	                  ER_thresh = ceiling (ownd/SMSS) - 1                 (1)

237	    duplicate ACKs, where ownd is in terms of bytes.  We call the use
238	    of this reduced ACK threshold "Early Retransmission".

240	    When conditions (2.a) and (2.b) hold and a TCP connection does
241	    support SACK or SCTP is in use, Early Retransmit MUST be used only
242	    when "ownd - SMSS" bytes have been SACKed.

244	    If either (or both) condition (2.a) or (2.b) does not hold, the
245	    transport MUST NOT use Early Retransmit, but rather prefer the
246	    standard mechanisms, including Fast Retransmit and Limited Transmit.

248	    As noted above, the drawback of this byte-based variant is precision
249	    [HB08].  We illustrate this with two examples:

251	      + Consider a non-SACK TCP sender that uses an SMSS of 1460 bytes
252	        and transmits three segments each with 400 bytes of payload.
253	        This is a case where Early Retransmit could aid loss recovery if
254	        one segment is lost.  However, in this case ER_thresh will
255	        become zero, per equation (1), because the number of outstanding
256	        bytes is a poor estimate of the number of outstanding segments.
257	        A similar problem occurs for senders that employ SACK as the
258	        expression "ownd - SMSS" will become negative.

260	      + Next, consider a non-SACK TCP sender that uses an SMSS of 1460
261	        bytes and transmits 10 segments each with 400 bytes of payload.
262	        In this case ER_thresh will be two, per equation (1).  Thus,
263	        even though there are enough segments outstanding to trigger
264	        Fast Retransmit with the standard duplicate ACK threshold Early
265	        Retransmit will be triggered.  This could cause or exacerbate
266	        performance problems caused by segment reordering in the network.
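
    The byte-based rules and the two examples above can be checked with
    a short Python sketch.  It is a minimal illustration under stated
    assumptions, not a reference implementation.  The function and
    parameter names are hypothetical, and "new_data_sendable" stands in
    for the negation of condition (2.b).

```python
STANDARD_DUPTHRESH = 3  # duplicate ACKs required by standard Fast Retransmit


def byte_based_dupthresh(ownd: int, smss: int, new_data_sendable: bool) -> int:
    """Duplicate ACK threshold for a non-SACK sender, byte-based variant.

    ownd and smss are in bytes.  new_data_sendable is True when unsent
    data exists and the advertised window permits sending it.
    """
    small_window = ownd < 4 * smss           # condition (2.a)
    nothing_to_send = not new_data_sendable  # condition (2.b)
    if small_window and nothing_to_send:
        segments = -(-ownd // smss)          # integer ceiling(ownd/SMSS)
        return segments - 1                  # equation (1)
    return STANDARD_DUPTHRESH


def sack_bytes_required(ownd: int, smss: int) -> int:
    """Value of the expression "ownd - SMSS" used by the SACK variant."""
    return ownd - smss
```

    For an SMSS of 1460 bytes, three outstanding 400-byte segments give
    a threshold of zero and a SACK requirement of -260 bytes, while ten
    such segments give a threshold of two.  Both results match the
    imprecision described in the examples.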

268	2.2 Segment-based Early Retransmit
269	    A TCP or SCTP sender MAY use segment-based Early Retransmit.

271	    Upon the arrival of an ACK, a sender employing segment-based Early
272	    Retransmit MUST use the following two conditions to determine when
273	    an Early Retransmit is sent:

275	    (3.a) The number of outstanding segments (oseg)---segments sent but
276	          not yet acknowledged---is less than four.

278	    (3.b) There is either no unsent data ready for transmission at the
279	          sender or the advertised receive window does not permit new
280	          segments to be transmitted.

282	    When the above two conditions hold and a TCP connection does not
283	    support SACK the duplicate ACK threshold used to trigger a
284	    retransmission MUST be reduced to:

286	                    ER_thresh = oseg - 1                              (2)

288	    duplicate ACKs, where oseg represents the number of outstanding
289	    segments.  (We discuss tracking the number of outstanding segments
290	    below.)  We call the use of this reduced ACK threshold "Early
291	    Retransmission".

293	    When conditions (3.a) and (3.b) hold and a TCP connection does
294	    support SACK or SCTP is in use, Early Retransmit MUST be used only
295	    when "oseg - 1" segments have been SACKed.  A segment is considered
296	    to be SACKed when all its data bytes (TCP) or data chunks (SCTP)
297	    have been indicated as arrived by the receiver.

299	    If either (or both) condition (3.a) or (3.b) does not hold, the
300	    transport MUST NOT use Early Retransmit, but rather prefer the
301	    standard mechanisms, including Fast Retransmit and Limited Transmit.
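
    As a rough sketch, the segment-based rules could be written in
    Python as follows.  The names are hypothetical.  The text does not
    discuss the case of a single outstanding segment, so the SACK
    predicate below leaves that case to the RTO; that choice is an
    assumption of this sketch and not a requirement of this document.

```python
STANDARD_DUPTHRESH = 3  # duplicate ACKs required by standard Fast Retransmit


def segment_based_dupthresh(oseg: int, new_data_sendable: bool) -> int:
    """Duplicate ACK threshold for a non-SACK sender, segment-based variant."""
    if oseg < 4 and not new_data_sendable:  # conditions (3.a) and (3.b)
        # Equation (2).  Note that this yields 0 when oseg is 1; the
        # sketch simply reports the value.
        return oseg - 1
    return STANDARD_DUPTHRESH


def sack_early_retransmit_allowed(
    oseg: int, sacked_segments: int, new_data_sendable: bool
) -> bool:
    """True when a SACK sender may use Early Retransmit.

    Assumption of this sketch: with fewer than two outstanding segments
    nothing can be SACKed beyond the hole, so the RTO is relied upon.
    """
    if new_data_sendable or not 2 <= oseg < 4:
        return False
    return sacked_segments >= oseg - 1
```

    Applied to the two examples in section 2.1, three outstanding
    400-byte segments give a threshold of two, and ten outstanding
    segments leave the standard threshold of three in place.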

303	    This version of Early Retransmit solves the precision issues
304	    discussed in the previous section.  As noted previously, the cost is
305	    that the implementation will have to track segment boundaries to
306	    form an understanding as to how many actual segments have been
307	    transmitted, but not acknowledged.  This can be done by the sender
308	    tracking the boundaries of the three segments on the right side of
309	    the current window (which involves tracking four sequence numbers in
310	    TCP).  This could be done by keeping a circular list of the segment
311	    boundaries, for instance.  Cumulative ACKs that do not fall within
312	    this region indicate that at least four segments are outstanding and
313	    therefore Early Retransmit MUST NOT be used.  When the outstanding
314	    window becomes small enough that Early Retransmit can be invoked, a
315	    full understanding of the number of outstanding segments will be
316	    available from the four sequence numbers retained.  (Note: the
317	    implicit sequence number consumed by the TCP FIN can also be
318	    included in the tracking of segment boundaries.)
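
    One possible way to keep the four sequence numbers is a small
    bounded queue, sketched below in Python.  The class and method
    names are hypothetical.  The sketch ignores sequence number
    wraparound and retransmissions, and assumes that on_send() is
    called only for newly transmitted data.

```python
from collections import deque
from typing import Optional


class SegmentBoundaryTracker:
    """Track the boundaries of the three most recently sent segments."""

    def __init__(self, initial_seq: int) -> None:
        # Four sequence numbers bound three segments; older edges fall
        # off the left end automatically once the deque is full.
        self._bounds = deque([initial_seq], maxlen=4)
        self.snd_una = initial_seq

    def on_send(self, length: int) -> None:
        """Record the right edge of a newly transmitted segment."""
        self._bounds.append(self._bounds[-1] + length)

    def on_ack(self, ack: int) -> None:
        """Advance the cumulative ACK point."""
        self.snd_una = max(self.snd_una, ack)

    def outstanding_segments(self) -> Optional[int]:
        """Return oseg if it is below four, else None (four or more)."""
        if self.snd_una < self._bounds[0]:
            # The cumulative ACK lies left of the tracked region.
            return None
        # Count tracked segments whose right edge is not yet ACKed.
        return sum(1 for edge in self._bounds if edge > self.snd_una)
```

    A sender could call on_send(1) for a FIN so that its implicit
    sequence number is tracked like any other segment boundary.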

320	3   Discussion

322	    In this section we discuss a number of issues surrounding the Early
323	    Retransmit algorithm.

325	3.1 SACK vs. non-SACK

327	    The SACK variant of the Early Retransmit algorithm is preferred to
328	    the non-SACK variant in TCP due to its robustness in the face of ACK
329	    loss (since SACKs are sent redundantly) and due to interactions with
330	    the delayed ACK timer (SCTP does not have a non-SACK mode and
331	    therefore naturally supports SACK-based Early Retransmit).  Consider
332	    a flight of three segments, S1...S3, with S2 being dropped by the
333	    network.  When S1 arrives it is in-order and so the receiver may or
334	    may not delay the ACK, leading to two scenarios:

336	    (A) The ACK for S1 is delayed: In this case the arrival of S3 will
337	        trigger an ACK to be transmitted covering segment S1 (which was
338	        previously unacknowledged).  In this case Early Retransmit
339	        without SACK will not prevent an RTO because no duplicate ACKs
340	        will arrive.  However, with SACK the ACK for S1 will also
341	        include SACK information indicating that S3 has arrived at the
342	        receiver.  The sender can then invoke Early Retransmit on this
343	        ACK because only one segment remains outstanding.

345	    (B) The ACK for S1 is not delayed: In this case the arrival of S1
346	        triggers an ACK of previously unacknowledged data.  The arrival
347	        of S3 triggers a duplicate ACK (because it is out-of-order).
348	        Both ACKs will cover the same segment (S1).  Therefore,
349	        regardless of whether SACK is used Early Retransmit can be
350	        performed by the sender (assuming no ACK loss).
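
    Scenarios (A) and (B) can be replayed with a small Python
    simulation.  This is a sketch under several assumptions: segments
    are numbered rather than byte-addressed, no ACKs are lost, no new
    data can be sent (condition (3.b) holds), and the segment-based
    rules of section 2.2 are applied only when at least two segments
    are outstanding.  All names are hypothetical.

```python
from typing import List, Optional, Tuple

# (highest in-order segment covered by the ACK, SACKed segment or None)
Ack = Tuple[int, Optional[int]]


def receiver_acks(delay_first_ack: bool) -> List[Ack]:
    """ACKs generated when S1 and S3 arrive and S2 is dropped."""
    acks: List[Ack] = []
    if not delay_first_ack:
        acks.append((1, None))  # scenario (B): S1 is ACKed on arrival
    # S3 is out of order, so it is ACKed at once; the ACK covers S1 and
    # carries SACK information for S3.
    acks.append((1, 3))
    return acks


def early_retransmit_fires(acks: List[Ack], use_sack: bool) -> bool:
    """Apply the segment-based rules to a flight of three segments."""
    flight = 3
    snd_una = 0   # highest segment cumulatively ACKed so far
    dupacks = 0
    sacked = set()
    for cum_ack, sack in acks:
        if cum_ack > snd_una:
            snd_una, dupacks = cum_ack, 0  # ACK of new data
        else:
            dupacks += 1                   # duplicate ACK
        if use_sack and sack is not None:
            sacked.add(sack)
        oseg = flight - snd_una
        if not 2 <= oseg < 4:              # condition (3.a), sketch guard
            continue
        evidence = len(sacked) if use_sack else dupacks
        if evidence >= oseg - 1:
            return True
    return False
```

    In this sketch only the combination of a delayed ACK and no SACK
    fails to trigger Early Retransmit, which matches the discussion of
    scenarios (A) and (B).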

352	3.2 Segment Reordering

354	    Early Retransmit is less robust in the face of reordered segments
355	    than when using the standard Fast Retransmit threshold.  Research
356	    shows that a general reduction in the number of duplicate ACKs
357	    required to trigger Fast Retransmit to two (rather than three) leads
358	    to a reduction in the ratio of good to bad retransmits by a factor
359	    of three [Pax97].  However, this analysis did not include the
360	    additional conditioning on the event that the ownd was smaller than
361	    4 segments and that no new data was available for transmission.

363	    A number of studies have shown that network reordering is not a rare
364	    event across some network paths.  Various measurement studies have
365	    shown that reordering along most paths is negligible, but along
366	    certain paths can be quite prevalent [Pax97,BPS99,BS02,Pir05].
367	    Evaluating Early Retransmit in the face of real segment reordering is
368	    part of the experiment we hope to instigate with this document.

370	3.3 Worst Case

372	    Next, we note two "worst case" scenarios for Early Retransmit:

374	    (1) Persistent reordering of segments coupled with an application
375	        that does not constantly send data can result in large numbers
376	        of needless retransmissions when using Early Retransmit.  For
377	        instance, consider an application that sends data two segments
378	        at a time, followed by an idle period when no data is queued for
379	        delivery.  If the network consistently reorders the two
380	        segments, the sender will needlessly retransmit one out of every
381	        two unique segments transmitted when using the above algorithm
382	        (meaning that one-third of all segments sent are needless
383	        retransmissions).  However, this would only be a problem for
384	        long-lived connections from applications that transmit in
385	        spurts.

387	    (2) Similar to the above, consider the case of 2 segment transfers
388	        that always experience reordering.  Just as in (1) above, one
389	        out of every two unique data segments will be retransmitted
390	        needlessly, therefore one-third of the traffic will be spurious.
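
    The one-third figure in both scenarios follows from simple
    counting, which the Python sketch below makes explicit.  The
    function name is hypothetical.  The sketch assumes that every
    two-segment burst is reordered, that each reordered pair yields
    exactly one duplicate ACK, and that no ACKs are lost.

```python
from fractions import Fraction


def spurious_fraction(bursts: int) -> Fraction:
    """Fraction of all transmissions that are needless retransmissions.

    Models an application that sends two segments at a time over a path
    that always swaps the pair.
    """
    if bursts < 1:
        raise ValueError("at least one burst is required")
    unique = spurious = 0
    for _ in range(bursts):
        oseg = 2        # two segments outstanding, nothing else to send
        unique += oseg
        dupacks = 1     # the later segment arrives first
        if oseg < 4 and dupacks >= oseg - 1:
            spurious += 1  # Early Retransmit resends a segment in flight
    return Fraction(spurious, unique + spurious)
```

    For any number of bursts the result is 1/3: each burst carries two
    unique segments and one needless retransmission.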

392	    Currently this document offers no suggestion on how to mitigate the
393	    above problems.  However, the worst cases are likely pathological
394	    and part of the experiments that this document hopes to trigger
395	    would involve better understanding of whether such theoretical worst
396	    case scenarios are prevalent in the network and in general to
397	    explore the tradeoff between spurious fast retransmits and the delay
398	    imposed by the RTO.  Appendix A does offer a survey of possible
399	    mitigations that call for curtailing the use of Early Retransmit
400	    when it is making poor retransmission decisions.

402	4   Related Work

404	    There are a number of similar proposals in the literature that
405	    attempt to mitigate the same problem Early Retransmit addresses.

407	    Deployment of Explicit Congestion Notification (ECN) [Flo94,RFC3168]
408	    may benefit connections with small congestion window sizes
409	    [RFC2884].  ECN provides a method for indicating congestion to the
410	    end-host without dropping segments.  While some segment drops may
411	    still occur, ECN may allow a transport to perform better with small
412	    congestion window sizes because the sender will be required to
413	    detect less segment loss [RFC2884].

415	    [Bal98] outlines another solution to the problem of having no new
416	    segments to transmit into the network when the first two duplicate
417	    ACKs arrive.  In response to these duplicate ACKs, a TCP sender
418	    transmits zero-byte segments to induce additional duplicate ACKs.
419	    This method preserves the robustness of the standard Fast Retransmit
420	    algorithm at the cost of injecting segments into the network that do
421	    not deliver any data, and therefore are potentially wasting network
422	    resources (at a time when there is a reasonable chance that the
423	    resources are scarce).

425	    [RFC4653] also defines an orthogonal method for altering the
426	    duplicate ACK threshold.  The mechanisms proposed in this document
427	    decrease the duplicate ACK threshold when a small amount of data is
428	    outstanding.  Meanwhile, the mechanisms in [RFC4653] increase the
429	    duplicate ACK threshold (over the standard of 3) when the congestion
430	    window is large in an effort to increase robustness to segment
431	    reordering.

433	5   Security Considerations

435	    The security considerations found in [RFC5681] apply to this
436	    document.  No additional security problems have been identified with
437	    Early Retransmit at this time.

439	6   IANA Considerations

441	    None

443	Acknowledgments

445	    We thank Sally Floyd for her feedback in discussions about Early
446	    Retransmit.  The notion of Early Retransmit was originally sketched
447	    in an Internet-Draft co-authored by Sally Floyd and Hari
448	    Balakrishnan.  Armando Caro, Joe Touch, Alexander Zimmermann, and
449	    many members of the TSVWG and TCPM working groups provided good
450	    discussions that helped shape this document.  Our thanks to all!

452	Normative References

454	    [RFC793] Jon Postel.  Transmission Control Protocol.  Std 7, RFC
455	        793.  September 1981.

457	    [RFC2018] Matt Mathis, Jamshid Mahdavi, Sally Floyd, Allyn Romanow.
458	        TCP Selective Acknowledgement Options.  RFC 2018, October 1996.

460	    [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate
461	        Requirement Levels", BCP 14, RFC 2119, March 1997.

463	    [RFC2883] Sally Floyd, Jamshid Mahdavi, Matt Mathis, Matt Podolsky.
464	        An Extension to the Selective Acknowledgement (SACK) Option for
465	        TCP.  RFC 2883, July 2000.

467	    [RFC2988] Vern Paxson, Mark Allman. Computing TCP's Retransmission
468	        Timer.  RFC 2988, April 2000.

470	    [RFC3042] Mark Allman, Hari Balakrishnan, Sally Floyd.  Enhancing
471	        TCP's Loss Recovery Using Limited Transmit.  RFC 3042, January
472	        2001.

474	    [RFC4960] R. Stewart.  Stream Control Transmission Protocol.  RFC
475	        4960, September 2007.

477	    [RFC5681] Mark Allman, Vern Paxson, Ethan Blanton.  TCP Congestion
478	        Control.  RFC 5681, May 2009.

480	Informative References

482	    [AA02] Urtzi Ayesta, Konstantin Avrachenkov, "The Effect of the
483	        Initial Window Size and Limited Transmit Algorithm on the
484	        Transient Behavior of TCP Transfers", In Proc. of the 15th ITC
485	        Internet Specialist Seminar, Wurzburg, July 2002.

487	    [All00] Mark Allman.  A Web Server's View of the Transport Layer.
488	        ACM Computer Communications Review, October 2000.

490	    [Bal98] Hari Balakrishnan.  Challenges to Reliable Data Transport
491	        over Heterogeneous Wireless Networks.  Ph.D. Thesis, University
492	        of California at Berkeley, August 1998.

494	    [BPS+98] Hari Balakrishnan, Venkata Padmanabhan, Srinivasan Seshan,
495	        Mark Stemm, and Randy Katz.  TCP Behavior of a Busy Web Server:
496	        Analysis and Improvements.  Proc. IEEE INFOCOM Conf., San
497	        Francisco, CA, March 1998.

499	    [BPS99] Jon Bennett, Craig Partridge, Nicholas Shectman.  Packet
500	        Reordering is Not Pathological Network Behavior.  IEEE/ACM
501	        Transactions on Networking, December 1999.

503	    [BS02] John Bellardo, Stefan Savage.  Measuring Packet Reordering,
504	        ACM/USENIX Internet Measurement Workshop, November 2002.

506	    [FF96] Kevin Fall, Sally Floyd.  Simulation-based Comparisons of
507	        Tahoe, Reno, and SACK TCP.  ACM Computer Communication Review,
508	        July 1996.

510	    [Flo94] Sally Floyd.  TCP and Explicit Congestion Notification.  ACM
511	        Computer Communication Review, October 1994.

513	    [HB08] Per Hurtig, Anna Brunstrom. Enhancing SCTP Loss Recovery: An
514	        Experimental Evaluation of Early Retransmit.  Elsevier Computer
515	        Communications, Vol. 31(16), October 2008, pp. 3778-3788.

517	    [Jac88] Van Jacobson.  Congestion Avoidance and Control.  ACM
518	        SIGCOMM 1988.

520	    [LK98] Dong Lin, H.T. Kung.  TCP Fast Recovery Strategies: Analysis
521	        and Improvements.  Proceedings of InfoCom, San Francisco, CA,
522	        March 1998.

524	    [Mor97] Robert Morris.  TCP Behavior with Many Flows.  Proceedings
525	        of the Fifth IEEE International Conference on Network Protocols.
526	        October 1997.

528	    [Pax97] Vern Paxson.  End-to-End Internet Packet Dynamics.  ACM
529	        SIGCOMM, September 1997.

531	    [Pir05] N. M. Piratla, "A Theoretical Foundation, Metrics and
532	        Modeling of Packet Reordering and Methodology of Delay Modeling
533	        using Inter-packet Gaps," Ph.D. Dissertation, Department of
534	        Electrical and Computer Engineering, Colorado State University,
535	        Fort Collins, CO, Fall 2005.

537	    [RFC2884] Jamal Hadi Salim and Uvaiz Ahmed. Performance Evaluation
538	        of Explicit Congestion Notification (ECN) in IP Networks.  RFC
539	        2884, July 2000.

541	    [RFC3150] Spencer Dawkins, Gabriel Montenegro, Markku Kojo, Vincent
542	        Magret.  End-to-end Performance Implications of Slow Links.  RFC
543	        3150, July 2001.

545	    [RFC3168] K. K. Ramakrishnan, Sally Floyd, David Black.  The
546	        Addition of Explicit Congestion Notification (ECN) to IP.  RFC
547	        3168, September 2001.

549	    [RFC3517] Ethan Blanton, Mark Allman, Kevin Fall, Lili Wang.  A
550	        Conservative Selective Acknowledgment (SACK)-based Loss Recovery
551	        Algorithm for TCP.  RFC 3517, April 2003.

553	    [RFC3522] Reiner Ludwig, Michael Meyer.  The Eifel Detection
554	        Algorithm for TCP.  RFC 3522, April 2003.

556	    [RFC3782] Sally Floyd, Tom Henderson, Andrei Gurtov.  The NewReno
557	        Modification to TCP's Fast Recovery Algorithm.  RFC 3782, April
558	        2004.

560	    [RFC4653] Sumitha Bhandarkar, A. L. Narasimha Reddy, Mark Allman,
561	        Ethan Blanton.  Improving the Robustness of TCP to
562	        Non-Congestion Events.  RFC 4653, August 2006.

564	Authors' Addresses:

566	    Mark Allman
567	    International Computer Science Institute
568	    1947 Center Street, Suite 600
569	    Berkeley, CA 94704-1198
570	    Phone: 440-235-1792
571	    mallman@icir.org
572	    http://www.icir.org/mallman/

574	    Konstantin Avrachenkov
575	    INRIA
576	    2004 route des Lucioles, B.P.93
577	    06902, Sophia Antipolis
578	    France
579	    Phone: 00 33 492 38 7751
580	    k.avrachenkov@sophia.inria.fr
581	    http://www.inria.fr/mistral/personnel/K.Avrachenkov/moi.html

583	    Urtzi Ayesta
584	    LAAS-CNRS
585	    7 Avenue Colonel Roche
586	    31077 Toulouse
587	    France
588	    urtzi@laas.fr
589	    http://www.laas.fr/~urtzi

591	    Josh Blanton
592	    Ohio University
593	    301 Stocker Center
594	    Athens, OH  45701
595	    jblanton@irg.cs.ohiou.edu

597	    Per Hurtig
598	    Karlstad University
599	    Department of Computer Science
600	    Universitetsgatan 2 651 88
601	    Karlstad Sweden
602	    per.hurtig@kau.se

604	Appendix A: Research Issues in Adjusting the Duplicate ACK Threshold

606	    Decreasing the number of duplicate ACKs required to trigger Fast
607	    Retransmit, as suggested in section 2, has the drawback of making
608	    Fast Retransmit less robust in the face of minor network reordering.
609	    Two egregious examples of problems caused by reordering are given in
610	    section 3.  This appendix outlines several schemes that have been
611	    suggested to mitigate the problems caused by Early Retransmit in the
612	    face of segment reordering.  These methods need further research
613	    before they are suggested for general use.  Moreover, current
614	    consensus is that the cases in which Early Retransmit needlessly
615	    retransmits a large amount of data are pathological, so these
616	    mitigations are not generally required.

618	    MITIGATION A.1: Allow a connection to use Early Retransmit as long
619	        as the algorithm is not injecting "too much" spurious data into
620	        the network.  For instance, using the information provided by
621	        TCP's DSACK option [RFC2883] or SCTP's Duplicate-TSN
622	        notification, a sender can determine when segments sent via
623	        Early Retransmit are needless.  Likewise, using Eifel [RFC3522]
624	        the sender can detect spurious Early Retransmits.  Once spurious
625	        Early Retransmits are detected, the sender can either eliminate
626	        the use of Early Retransmit or limit the use of the algorithm to
627	        ensure that an acceptably small fraction of the connection's
628	        transmissions are spurious.  For example, a connection could
629	        stop using Early Retransmit after the first spurious retransmit
630	        is detected.
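
        As a non-normative illustration, the policy in MITIGATION A.1
        might be sketched as follows.  The class and method names are
        hypothetical and are not part of any TCP or SCTP
        implementation; the sketch assumes the stack reports each
        spurious Early Retransmit (learned via DSACK, Duplicate-TSN,
        or Eifel) by calling on_spurious_early_retransmit().

```python
class EarlyRetransmitGate:
    """Hypothetical per-connection gate for Early Retransmit."""

    def __init__(self, max_spurious_fraction=0.0):
        # Assumption: 0.0 models "stop using Early Retransmit after the
        # first spurious retransmit is detected".
        self.max_spurious_fraction = max_spurious_fraction
        self.segments_sent = 0
        self.spurious_early_rexmits = 0

    def on_segment_sent(self):
        # Count every transmission made by the connection.
        self.segments_sent += 1

    def on_spurious_early_retransmit(self):
        # Called when DSACK, Duplicate-TSN, or Eifel shows that an
        # Early Retransmit was needless.
        self.spurious_early_rexmits += 1

    def early_retransmit_allowed(self):
        if self.spurious_early_rexmits == 0:
            return True
        if self.segments_sent == 0:
            return False
        fraction = self.spurious_early_rexmits / self.segments_sent
        return fraction <= self.max_spurious_fraction
```

        With the default threshold the gate closes permanently after
        one spurious Early Retransmit; a non-zero threshold instead
        bounds the spurious fraction of the connection's transmissions.
        What threshold counts as "acceptably small" is left open here.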

632	    MITIGATION A.2: If a sender cannot reliably determine whether an
633	        Early Retransmitted segment is spurious, the sender could simply
634	        limit Early Retransmits either to some fixed number per
635	        connection (e.g., Early Retransmit is allowed only once per
636	        connection) or to some small percentage of the total traffic
637	        being transmitted.
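
        As a non-normative illustration, the two limits described in
        MITIGATION A.2 might be sketched as a simple budget.  All names
        below are hypothetical, and the default values are assumptions
        chosen only to mirror the "once per connection" example.

```python
class EarlyRetransmitBudget:
    """Hypothetical budget limiting Early Retransmits on a connection."""

    def __init__(self, max_count=1, max_fraction=None):
        # Assumption: max_count=1 models "only once per connection".
        # If max_fraction is set, it is used instead of max_count.
        self.max_count = max_count
        self.max_fraction = max_fraction
        self.segments_sent = 0
        self.early_rexmits = 0

    def on_segment_sent(self):
        # Count the total traffic transmitted by the connection.
        self.segments_sent += 1

    def try_early_retransmit(self):
        """Return True and charge the budget if one more is permitted."""
        if self.max_fraction is not None:
            limit = self.max_fraction * self.segments_sent
            allowed = self.early_rexmits + 1 <= limit
        else:
            allowed = self.early_rexmits < self.max_count
        if allowed:
            self.early_rexmits += 1
        return allowed
```

        The sender would call try_early_retransmit() when the Early
        Retransmit criteria are met and fall back to the standard
        duplicate ACK threshold or the retransmission timer when it
        returns False.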

639	    MITIGATION A.3: Allow a connection to trigger Early Retransmit using
640	        the criteria given in section 2, in addition to a "small"
641	        timeout [Pax97].  For instance, a sender may have to wait for 2
642	        duplicate ACKs and then T msec before Early Retransmit is
643	        invoked.  The added time gives reordered acknowledgments time to
644	        arrive at the sender and avoid a needless retransmit.  Designing
645	        a method for choosing an appropriate timeout is part of the
646	        research this scheme would require.
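
        As a non-normative illustration, the delayed trigger in
        MITIGATION A.3 might be sketched as follows.  The names are
        hypothetical, times are passed in explicitly as milliseconds,
        and the value of T (delay_ms) is an assumption, since choosing
        it is the open research question.

```python
class DelayedEarlyRetransmit:
    """Hypothetical Early Retransmit trigger with an added delay T."""

    def __init__(self, dupack_threshold, delay_ms):
        # dupack_threshold: reduced threshold from the Early Retransmit
        # criteria (e.g., 2).  delay_ms: the extra wait T.
        self.dupack_threshold = dupack_threshold
        self.delay_ms = delay_ms
        self.dupacks = 0
        self.deadline_ms = None

    def on_duplicate_ack(self, now_ms):
        self.dupacks += 1
        # Arm the timer once, when the threshold is first reached.
        if self.dupacks >= self.dupack_threshold and self.deadline_ms is None:
            self.deadline_ms = now_ms + self.delay_ms

    def on_new_ack(self):
        # The cumulative ACK advanced, so the hole was filled (for
        # example by a reordered segment).  Cancel the pending trigger.
        self.dupacks = 0
        self.deadline_ms = None

    def should_retransmit(self, now_ms):
        # True only if the timer was armed and T has elapsed.
        return self.deadline_ms is not None and now_ms >= self.deadline_ms
```

        If an acknowledgment covering the missing segment arrives
        within T msec, on_new_ack() cancels the trigger and the
        needless retransmit is avoided.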