idnits 2.17.1 

draft-ietf-tsvwg-tcp-frto-01.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

  ** Cannot find the required boilerplate sections (Copyright, IPR, etc.) in
     this document.

     Expected boilerplate is as follows today (2024-04-19) according to
     https://trustee.ietf.org/license-info :

     IETF Trust Legal Provisions of 28-dec-2009, Section 6.a:
        This Internet-Draft is submitted in full conformance with the provisions
        of BCP 78 and BCP 79.

     IETF Trust Legal Provisions of 28-dec-2009, Section 6.b(i), paragraph 2:
        Copyright (c) 2024 IETF Trust and the persons identified as the document
        authors.  All rights reserved.

     IETF Trust Legal Provisions of 28-dec-2009, Section 6.b(i), paragraph 3:
        This document is subject to BCP 78 and the IETF Trust's Legal Provisions
        Relating to IETF Documents
        (https://trustee.ietf.org/license-info) in effect on the date of
        publication of this document.  Please review these documents
        carefully, as they describe your rights and restrictions with
        respect to this document.  Code Components extracted from this
        document must include Simplified BSD License text as described in
        Section 4.e of the Trust Legal Provisions and are provided
        without warranty as described in the Simplified BSD License.


  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

  ** The document is more than 15 pages and seems to lack a Table of Contents.

  == No 'Intended status' indicated for this document; assuming Proposed
     Standard


  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

  ** The document seems to lack an IANA Considerations section.  (See Section
     2.2 of https://www.ietf.org/id-info/checklist for how to handle the case
     when there are no actions for IANA.)

  ** The abstract seems to contain references ([RFC2119]), which it
     shouldn't.  Please replace those with straight textual mentions of the
     documents in question.

  ** The document seems to lack both a reference to RFC 2119 and the
     recommended RFC 2119 boilerplate, even if it appears to use RFC 2119
     keywords. 

     RFC 2119 keyword, line 51: '...   The keywords MUST, MUST NOT, REQUIR...'
     RFC 2119 keyword, line 52: '...   SHOULD NOT, RECOMMENDED, MAY, and O...'
     RFC 2119 keyword, line 171: '...   A TCP sender MAY implement the basi...'
     RFC 2119 keyword, line 172: '...thm, the following steps MUST be taken...'
     RFC 2119 keyword, line 175: '..., the TCP sender SHOULD retransmit the...'
     (22 more instances...)


  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  -- The document seems to lack a disclaimer for pre-RFC5378 work, but may
     have content which was first submitted before 10 November 2008.  If you
     have contacted all the original authors and they are all willing to grant
     the BCP78 rights to the IETF Trust, then this is fine, and you can ignore
     this comment.  If not, you may need to add the pre-RFC5378 disclaimer. 
     (See the Legal Provisions document at
     https://trustee.ietf.org/license-info for more information.)

  -- Couldn't find a document date in the document -- date freshness check
     skipped.


  Checking references for intended status: Proposed Standard
  ----------------------------------------------------------------------------

     (See RFCs 3967 and 4897 for information about using normative references
     to lower-maturity documents in RFCs)

  == Missing Reference: 'RFC2026' is mentioned on line 16, but not defined

  == Missing Reference: 'RFC2119' is mentioned on line 54, but not defined

  == Missing Reference: 'RTO' is mentioned on line 779, but not defined

  == Missing Reference: 'SACK 8' is mentioned on line 787, but not defined

  == Unused Reference: 'GL03' is defined on line 562, but no explicit
     reference was found in the text

  == Unused Reference: 'Sar03' is defined on line 581, but no explicit
     reference was found in the text

  ** Obsolete normative reference: RFC 2581 (ref. 'APS99') (Obsoleted by RFC
     5681)

  ** Obsolete normative reference: RFC 3517 (ref. 'BAFW03') (Obsoleted by RFC
     6675)

  ** Obsolete normative reference: RFC 2988 (ref. 'PA00') (Obsoleted by RFC
     6298)

  ** Obsolete normative reference: RFC  793 (ref. 'Pos81') (Obsoleted by RFC
     9293)

  ** Obsolete normative reference: RFC 2960 (ref. 'Ste00') (Obsoleted by RFC
     4960)

  -- Obsolete informational reference (is this intentional?): RFC 1323 (ref.
     'BBJ92') (Obsoleted by RFC 7323)

  -- Obsolete informational reference (is this intentional?): RFC 2582 (ref.
     'FH99') (Obsoleted by RFC 3782)


     Summary: 10 errors (**), 0 flaws (~~), 7 warnings (==), 4 comments (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.

--------------------------------------------------------------------------------


2	Internet Engineering Task Force                             P. Sarolahti
3	INTERNET DRAFT                                     Nokia Research Center
4	File: draft-ietf-tsvwg-tcp-frto-01.txt                           M. Kojo
5	                                                  University of Helsinki
6	                                                          February, 2004
7	                                                   Expires: August, 2004

9	                   F-RTO: An Algorithm for Detecting
10	           Spurious Retransmission Timeouts with TCP and SCTP

12	Status of this Memo

14	   This document is an Internet-Draft and is in full conformance with
15	   all provisions of Section 10 of [RFC2026].

17	   Internet-Drafts are working documents of the Internet Engineering
18	   Task Force (IETF), its areas, and its working groups.  Note that
19	   other groups may also distribute working documents as Internet-
20	   Drafts.

22	   Internet-Drafts are draft documents valid for a maximum of six months
23	   and may be updated, replaced, or obsoleted by other documents at any
24	   time.  It is inappropriate to use Internet-Drafts as reference
25	   material or to cite them other than as "work in progress."

27	   The list of current Internet-Drafts can be accessed at
28	   http://www.ietf.org/ietf/1id-abstracts.txt

30	   The list of Internet-Draft Shadow Directories can be accessed at
31	   http://www.ietf.org/shadow.html.

33	Abstract

35	   Spurious retransmission timeouts (RTOs) cause suboptimal TCP
36	   performance, because they often result in unnecessary retransmission
37	   of the last window of data. This document describes the "Forward RTO
38	   Recovery" (F-RTO) algorithm for detecting spurious TCP RTOs. F-RTO is
39	   a TCP sender-only algorithm that does not require any TCP options to
40	   operate. After retransmitting the first unacknowledged segment
41	   triggered by an RTO, the F-RTO algorithm at a TCP sender monitors the
42	   incoming acknowledgements to determine whether the timeout was
43	   spurious and to decide whether to send new segments or retransmit
44	   unacknowledged segments. The algorithm effectively helps to avoid
45	   additional unnecessary retransmissions and thereby improves TCP
46	   performance in case of a spurious timeout. The F-RTO algorithm can
47	   also be applied with the SCTP protocol.

49	Terminology

51	   The keywords MUST, MUST NOT, REQUIRED, SHALL, SHALL NOT, SHOULD,
52	   SHOULD NOT, RECOMMENDED, MAY, and OPTIONAL, when they appear in this
53	   document, are to be interpreted as described in [RFC2119].

55	1.  Introduction

57	   The TCP protocol [Pos81] has two methods for triggering
58	   retransmissions.  Primarily, the TCP sender relies on incoming
59	   duplicate ACKs, which indicate that the receiver is missing some of
60	   the data. After a required number of successive duplicate ACKs have
61	   arrived at the sender, it retransmits the first unacknowledged
62	   segment [APS99]. Secondarily, the TCP sender maintains a
63	   retransmission timer which triggers retransmission of segments, if
64	   they have not been acknowledged within the retransmission timer
65	   expiration period. When the retransmission timer expires, the TCP
66	   sender enters RTO recovery, where the congestion window is set
67	   to one segment and unacknowledged segments are retransmitted using
68	   the slow-start algorithm. The retransmission timer is adjusted
69	   dynamically based on the measured round-trip times [PA00].

71	   It has been pointed out that the retransmission timer can expire
72	   spuriously and trigger unnecessary retransmissions when no segments
73	   have been lost [LK00, GL02, LM03]. After a spurious retransmission
74	   timeout the late acknowledgements of original segments arrive at the
75	   sender, usually triggering unnecessary retransmissions of a whole
76	   window of segments during the RTO recovery.  Furthermore, after a
77	   spurious retransmission timeout a conventional TCP sender increases
78	   the congestion window on each late acknowledgement in slow start,
79	   injecting a large number of data segments to the network within one
80	   round-trip time.

82	   There are a number of potential reasons for spurious retransmission
83	   timeouts. First, some mobile networking technologies involve sudden
84	   delay peaks on transmission because of actions taken during a hand-
85	   off. Second, arrival of competing traffic, possibly with higher
86	   priority, on a low-bandwidth link or some other change in available
87	   bandwidth involves a sudden increase of round-trip time which may
88	   trigger a spurious retransmission timeout. A persistently reliable
89	   link layer can also cause a sudden delay when several data frames are
90	   lost for some reason. This document does not distinguish the
91	   different causes of such a delay, but discusses the spurious
92	   retransmission timeouts caused by a delay in general.

94	   This document describes an alternative RTO recovery algorithm called
95	   "Forward RTO-Recovery" (F-RTO) to be used for detecting spurious RTOs
96	   and thus avoiding unnecessary retransmissions following the RTO. When
97	   the RTO is not spurious, the F-RTO algorithm reverts back to the
98	   conventional RTO recovery algorithm and should have similar behavior
99	   and performance. F-RTO does not require any TCP options in its
100	   operation, and it can be implemented by modifying only the TCP
101	   sender. This is different from alternative algorithms (Eifel [LK00],
102	   [LM03] and DSACK-based algorithms [BA02]) that have been suggested
103	   for detecting unnecessary retransmissions. The Eifel algorithm uses
104	   TCP timestamps [BBJ92] for detecting a spurious timeout upon arrival
105	   of the first acknowledgement after the retransmission. The DSACK-
106	   based algorithms require that the TCP Selective Acknowledgment Option
107	   [MMFR96] with DSACK extension [FMMP00] is in use. With DSACK, the TCP
108	   receiver can report if it has received a duplicate segment, making it
109	   possible for the sender to detect afterwards whether it has
110	   retransmitted segments unnecessarily. In addition, the F-RTO
111	   algorithm only attempts to detect and avoid unnecessary
112	   retransmissions after an RTO. Eifel and DSACK can also be used in
113	   detecting unnecessary retransmissions in other events, for example
114	   due to packet reordering.

116	   When an RTO occurs, the F-RTO sender retransmits the first
117	   unacknowledged segment as usual. Deviating from the normal operation
118	   after a timeout, it then tries to transmit new, previously unsent
119	   data upon the first acknowledgement that arrives after the timeout,
120	   provided that the acknowledgement advances the window. If the second
121	   acknowledgement that arrives after the timeout also advances the
122	   window, i.e., acknowledges data that was not retransmitted, the F-RTO
123	   sender declares the RTO spurious and exits the RTO recovery. However,
124	   if either of the next two acknowledgements is a duplicate ACK, there
125	   is not sufficient evidence of a spurious RTO; therefore the F-RTO
126	   sender retransmits the unacknowledged segments in slow start
127	   similarly to the traditional algorithm. With a SACK-enhanced version
128	   of the F-RTO algorithm, spurious RTOs may be detected even if
129	   duplicate ACKs arrive after an RTO.

131	   The F-RTO algorithm can also be applied with the SCTP protocol
132	   [Ste00], because SCTP has acknowledgement and packet retransmission
133	   concepts similar to those of TCP. When an SCTP retransmission timeout
134	   occurs, the SCTP sender is required to retransmit the outstanding
135	   data similarly to TCP, thus being prone to unnecessary
136	   retransmissions and congestion control adjustments, if delay spikes
137	   occur in the network. The SACK-enhanced version of F-RTO should be
138	   directly applicable to SCTP, which has selective acknowledgements as
139	   a built-in feature. For simplicity, this document mostly refers to
140	   TCP, but the algorithms and other discussion should be applicable
141	   also to SCTP.

143	   This document is organized as follows. Section 2 describes the basic
144	   F-RTO algorithm. Section 3 outlines an optional enhancement that
145	   leverages the TCP SACK option. Section 4 discusses possible actions
146	   after detecting a spurious RTO, Section 5 discusses SCTP
147	   considerations, and Section 6 discusses security considerations.

149	2.  F-RTO Algorithm

151	   An RTO is spurious if there are segments outstanding in the network
152	   that would have prevented the RTO, had their acknowledgements arrived
153	   earlier at the sender. F-RTO affects the TCP sender behavior only
154	   after a retransmission timeout; otherwise the TCP behavior remains
155	   unmodified.  When the RTO expires, the F-RTO algorithm monitors
156	   incoming acknowledgements and declares the RTO spurious if the TCP
157	   sender gets an acknowledgement for a segment that was not
158	   retransmitted due to the RTO. The actions taken in response to a
159	   spurious RTO are not specified in this document, but we discuss the
160	   different alternatives for congestion control in Section 4.

162	   Following the practice used with the Eifel Detection algorithm
163	   [LM03], we use the "SpuriousRecovery" variable to indicate whether
164	   the retransmission is declared spurious by the sender. This variable
165	   can be used as an input for a related response algorithm. With F-RTO,
166	   the outcome of SpuriousRecovery can either be SPUR_TO, indicating a
167	   spurious retransmission timeout; or FALSE, when the RTO is not
168	   declared spurious, and the TCP sender should follow the conventional
169	   RTO recovery algorithm.

171	   A TCP sender MAY implement the basic F-RTO algorithm, and if it
172	   chooses to apply the algorithm, the following steps MUST be taken
173	   after the retransmission timer expires.

175	   1) When RTO expires, the TCP sender SHOULD retransmit the first
176	      unacknowledged segment and set SpuriousRecovery to FALSE. Store
177	      the highest sequence number transmitted so far in variable
178	      "send_high".

180	   2) When the first acknowledgement after the RTO arrives at the
181	      sender, the sender chooses the following actions depending on
182	      whether the ACK advances the window or whether it is a duplicate
183	      ACK.

185	      a) If the acknowledgement is a duplicate ACK OR it is
186	         acknowledging a sequence number equal to (or above) the value
187	         of send_high OR it does not acknowledge all of the data that
188	         was retransmitted in step 1, the TCP sender MUST revert to the
189	         conventional RTO recovery and continue by retransmitting
190	         unacknowledged data in slow start. The TCP sender MUST NOT
191	         enter step 3 of this algorithm, and the SpuriousRecovery
192	         variable remains as FALSE.

194	      b) If the acknowledgement advances the window AND it is below the
195	         value of send_high, the TCP sender SHOULD transmit up to two
196	         new (previously unsent) segments and enter step 3 of this
197	         algorithm. If the TCP sender does not have enough unsent data,
198	         it SHOULD send only one segment. In addition, the TCP sender
199	         MAY override the Nagle algorithm and send immediately an
200	         undersized segment if needed. If the TCP sender does not have
201	         any new data to send, the TCP sender SHOULD transmit a segment
202	         from the retransmission queue. If the TCP sender retransmits the
203	         first unacknowledged segment, it MUST NOT enter step 3 of this
204	         algorithm but continue with the conventional RTO recovery
205	         algorithm. In this case acknowledgement of the next segment
206	         would not unambiguously indicate that the original transmission
207	         arrived at the receiver.

209	   3) When the second acknowledgement after the RTO arrives at the
210	      sender, either declare the RTO spurious, or start retransmitting
211	      the unacknowledged segments.

213	      a) If the acknowledgement is a duplicate ACK, the TCP sender MUST
214	         set congestion window to no more than 3 * MSS, and continue
215	         with the slow start algorithm retransmitting unacknowledged
216	         segments. The sender leaves SpuriousRecovery set to FALSE.

218	      b) If the acknowledgement advances the window, i.e. it
219	         acknowledges data that was not retransmitted after the RTO, the
220	         TCP sender SHOULD declare the RTO spurious, set
221	         SpuriousRecovery to SPUR_TO and set the value of send_high
222	         variable to SND.UNA.
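
The three steps above can be sketched as a small state machine. This is
only an illustrative Python sketch, not part of the specification: the
class and method names are hypothetical, sequence numbers are plain
integers without wraparound, and when no new data is available in step
(2b) the sketch simply falls back to conventional recovery, which is
only one of the possible choices.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Action(Enum):
    RETRANSMIT_FIRST = "retransmit first unacknowledged segment"
    SEND_NEW = "send up to two previously unsent segments"
    CONVENTIONAL = "revert to conventional RTO recovery"
    SPURIOUS = "declare the RTO spurious"
    IGNORE = "F-RTO not in progress"


@dataclass
class BasicFrto:
    """Hypothetical helper tracking the basic F-RTO steps 1 to 3."""
    mss: int
    snd_una: int = 0                  # SND.UNA as tracked by this sketch
    send_high: int = 0                # highest sequence number sent before the RTO
    rexmit_end: int = 0               # end of the segment retransmitted in step 1
    step: int = 0                     # 0 = idle, 2 or 3 = waiting for that step's ACK
    spurious_recovery: str = "FALSE"
    cwnd_limit: Optional[int] = None  # upper bound imposed on cwnd, if any

    def on_rto(self, snd_una: int, highest_sent: int, first_seg_len: int) -> Action:
        """Step 1: the retransmission timer expired."""
        self.snd_una = snd_una
        self.send_high = highest_sent
        self.rexmit_end = snd_una + first_seg_len
        self.spurious_recovery = "FALSE"
        self.cwnd_limit = None
        self.step = 2
        return Action.RETRANSMIT_FIRST

    def on_ack(self, ack: int, new_data_available: bool = True) -> Action:
        """Steps 2 and 3: process a cumulative ACK received after the RTO."""
        if self.step == 0:
            return Action.IGNORE
        is_dup = ack <= self.snd_una          # ACK does not advance the window

        if self.step == 2:
            # Step 2a: duplicate ACK, ACK at or above send_high, or ACK that
            # does not cover all of the data retransmitted in step 1.
            # Assumption: with no new data to send we also fall back here.
            if (is_dup or ack >= self.send_high or ack < self.rexmit_end
                    or not new_data_available):
                self.step = 0
                return Action.CONVENTIONAL
            # Step 2b: window advanced and ACK is below send_high.
            self.snd_una = ack
            self.step = 3
            return Action.SEND_NEW

        # Step 3: second acknowledgement after the RTO.
        self.step = 0
        if is_dup:
            # Step 3a: cwnd must be no more than 3 * MSS.
            self.cwnd_limit = 3 * self.mss
            return Action.CONVENTIONAL
        # Step 3b: data that was not retransmitted has been acknowledged.
        self.snd_una = ack
        self.spurious_recovery = "SPUR_TO"
        self.send_high = self.snd_una
        return Action.SPURIOUS
```

For example, after on_rto(snd_una=1000, highest_sent=6000,
first_seg_len=1000), two window-advancing ACKs below send_high lead to
SPUR_TO, whereas a duplicate ACK as the second acknowledgement leaves
SpuriousRecovery at FALSE and limits the congestion window to 3 * MSS.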

224	   The F-RTO sender takes cautious actions when it receives duplicate
225	   acknowledgements after an RTO. Since duplicate ACKs may indicate that
226	   segments have been lost, reliably detecting a spurious RTO is
227	   difficult without additional information. Therefore the safest
228	   alternative is to follow the conventional TCP recovery in those
229	   cases.

231	   If the first acknowledgement after RTO covers the send_high point at
232	   algorithm step (2a), there is not enough evidence that a non-
233	   retransmitted segment has arrived at the receiver after the RTO.

235	   This is a common case when a fast retransmission is lost and it has
236	   been retransmitted again after an RTO, while the rest of the
237	   unacknowledged segments have successfully been delivered to the TCP
238	   receiver before the RTO. Therefore the RTO cannot be declared
239	   spurious in this case.

241	   If the first acknowledgement after RTO does not acknowledge all of
242	   the data that was retransmitted in step 1, the TCP sender reverts to
243	   the conventional RTO recovery. Otherwise, a malicious receiver
244	   acknowledging partial segments could cause the sender to declare the
245	   RTO spurious in a case where data was lost.

247	   The TCP sender is allowed to send two new segments in algorithm
248	   branch (2b), because the conventional TCP sender would transmit two
249	   segments when the first new ACK arrives after the RTO. If sending new
250	   data is not possible in algorithm branch (2b), or the receiver window
251	   limits the transmission, the TCP sender has to send something in
252	   order to prevent the TCP transfer from stalling. If no segments were
253	   sent, the pipe between sender and receiver may run out of segments,
254	   and no further acknowledgements arrive. If transmitting previously
255	   unsent data is not possible, the following options are available for
256	   the sender.

258	   - Continue with the conventional RTO recovery algorithm and do not
259	     try to detect the spurious RTO. The disadvantage is that the sender
260	     may do unnecessary retransmissions due to possible spurious RTO. On
261	     the other hand, we believe that the benefits of detecting spurious
262	     RTO in application-limited or receiver-limited situations are
263	     not very remarkable.

265	   - Use additional information if available, e.g. TCP timestamps with
266	     the Eifel Detection algorithm, for detecting a spurious RTO.
267	     However, Eifel detection may yield different results from F-RTO
268	     when ACK losses and an RTO occur within the same round-trip time
269	     [SKR03].

271	   - Retransmit data from the tail of the retransmission queue and
272	     continue with step 3 of the F-RTO algorithm. It is possible that
273	     the retransmission is unnecessarily made, hence this option is not
274	     encouraged, except for hosts that are known to operate in an
275	     environment that is highly likely to have spurious RTOs. On the
276	     other hand, with this method it is possible to avoid several
277	     unnecessary retransmissions due to spurious RTO by doing only one
278	     retransmission that may be unnecessary.

280	   - Send a zero-sized segment below SND.UNA similar to TCP Keep-Alive
281	     probe and continue with step 3 of the F-RTO algorithm. Since the
282	     receiver replies with a duplicate ACK, the sender is able to detect
283	     from the incoming acknowledgement whether the RTO was spurious.
284	     While this method does not send data unnecessarily, it delays the
285	     recovery by one round-trip time in cases where the RTO was not
286	     spurious, and therefore is not encouraged.

288	   - In receiver-limited cases, send one octet of new data regardless of
289	     the advertised window limit, and continue with step 3 of the F-RTO
290	     algorithm. It is possible that the receiver has free buffer space
291	     to receive the data by the time the segment has propagated through
292	     the network, in which case no harm is done. If the receiver is not
293	     capable of receiving the segment, it rejects the segment and sends
294	     a duplicate ACK.

296	   If the RTO is declared spurious, the TCP sender sets the value of
297	   send_high variable to SND.UNA in order to disable the NewReno
298	   "bugfix" [FH99]. The send_high variable was proposed for avoiding
299	   unnecessary multiple fast retransmits when RTO expires during fast
300	   recovery with NewReno TCP. As the sender has retransmitted only the
301	   segment that triggered the RTO, the problem addressed by the
302	   bugfix cannot occur. Therefore, if there are three duplicate ACKs
303	   arriving at the sender after the RTO, they are likely to indicate a
304	   packet loss, hence fast retransmit should be used to allow efficient
305	   recovery. If there are not enough duplicate ACKs arriving at the
306	   sender after a packet loss, the retransmission timer expires another
307	   time and the sender enters step 1 of this algorithm.

309	   When the RTO is declared spurious, the TCP sender cannot detect
310	   whether the unnecessary RTO retransmission was lost. In principle the
311	   loss of the RTO retransmission should be taken as a congestion
312	   signal, and thus there is a small possibility that the F-RTO sender
313	   violates the congestion control rules, if it chooses to fully revert
314	   congestion control parameters after detecting a spurious RTO. The
315	   Eifel detection algorithm has a similar property, while the DSACK
316	   option can be used to detect whether the retransmitted segment was
317	   successfully delivered to the receiver.

319	   The F-RTO algorithm has a side-effect on the TCP round-trip time
320	   measurement. Because the TCP sender can avoid most of the unnecessary
321	   retransmissions after detecting a spurious RTO, the sender is able to
322	   take round-trip time samples on the delayed segments. If the regular
323	   RTO recovery was used without TCP timestamps, this would not be
324	   possible due to retransmission ambiguity. As a result, the RTO
325	   estimator is likely to be more accurate and have larger values with
326	   F-RTO than with the regular TCP after a spurious RTO that was
327	   triggered due to delayed segments. We believe this is an advantage in
328	   the networks that are prone to delay spikes.
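
The effect on the RTO estimator can be illustrated with the RTO
computation of RFC 2988 [PA00]. The Python sketch below is illustrative
only: the class name, the clock granularity of 0.1 s, and the sample
values are assumptions, and RTO back-off is not modeled. One estimator
is fed the round-trip time sample of a delayed segment (as F-RTO makes
possible), while the other discards it because of retransmission
ambiguity.

```python
class RtoEstimator:
    """RTO computation following RFC 2988 (alpha = 1/8, beta = 1/4, K = 4)."""
    ALPHA, BETA, K = 1 / 8, 1 / 4, 4

    def __init__(self, granularity=0.1, min_rto=1.0):
        self.g = granularity      # clock granularity G in seconds (assumed value)
        self.min_rto = min_rto    # RFC 2988 lower bound of one second
        self.srtt = None
        self.rttvar = None

    def sample(self, r):
        """Feed one round-trip time measurement r (seconds); return the RTO."""
        if self.srtt is None:
            # First measurement.
            self.srtt = r
            self.rttvar = r / 2
        else:
            # RTTVAR is updated before SRTT, using the old SRTT value.
            self.rttvar = (1 - self.BETA) * self.rttvar + self.BETA * abs(self.srtt - r)
            self.srtt = (1 - self.ALPHA) * self.srtt + self.ALPHA * r
        return self.rto

    @property
    def rto(self):
        return max(self.min_rto, self.srtt + max(self.g, self.K * self.rttvar))


# Both senders see ten samples of 200 ms before a delay spike.
regular, frto = RtoEstimator(), RtoEstimator()
for est in (regular, frto):
    for _ in range(10):
        est.sample(0.2)

# Hypothetical 3 s delay spike: the F-RTO sender did not retransmit the
# delayed segment, so its sample is unambiguous and can be used. The
# regular sender retransmitted the segment and must discard the sample.
frto.sample(3.0)

print(f"regular RTO: {regular.rto:.2f} s, F-RTO RTO: {frto.rto:.2f} s")
```

With these assumed values the estimator that takes the delayed sample
ends up with a clearly larger RTO than the one that discards it, which
matches the qualitative claim above.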

330	   It is possible that the F-RTO algorithm does not always avoid
331	   unnecessary retransmissions after a spurious RTO. If packet
332	   reordering or packet duplication occurs on the segment that triggered
333	   the spurious RTO, the F-RTO algorithm may not detect the spurious RTO
334	   due to incoming duplicate ACKs. Additionally, if a spurious RTO
335	   occurs during fast recovery, the F-RTO algorithm often cannot detect
336	   the spurious RTO.  However, we consider these cases relatively rare,
337	   and note that in cases where F-RTO fails to detect the spurious RTO,
338	   it performs similarly to the regular RTO recovery.

340	3.  A SACK-enhanced version of the F-RTO algorithm

342	   This section describes an alternative version of the F-RTO algorithm
343	   that makes use of the TCP Selective Acknowledgement Option [MMFR96].
344	   By using the SACK option, the TCP sender can detect spurious RTOs in
345	   most cases where packet reordering or packet duplication is present.
346	   The difference from the basic F-RTO algorithm is that the sender may
347	   declare the RTO spurious even when duplicate ACKs follow the RTO, if
348	   the SACK blocks acknowledge new data that was not transmitted after
349	   the RTO. The algorithm principle presented in this section is also
350	   applicable to the SCTP protocol.

352	   Given that the TCP Selective Acknowledgement Option [MMFR96] is
353	   enabled for a TCP connection, TCP sender MAY implement the SACK-
354	   enhanced F-RTO algorithm. If the sender applies the SACK-enhanced F-
355	   RTO algorithm, it MUST follow the steps below.  This algorithm SHOULD
356	   NOT be applied, if the TCP sender is already in loss recovery when
357	   RTO occurs.  However, it should be possible to apply the principle of
358	   F-RTO within certain limitations also when RTO occurs during existing
359	   loss recovery. While this is a topic of further research, Appendix B
360	   briefly discusses the related issues.

362	   1) When RTO expires, the TCP sender SHOULD retransmit the first
363	      unacknowledged segment and set SpuriousRecovery to FALSE. Variable
364	      "send_high" is set to indicate the highest segment transmitted so
365	      far.

367	   2) Wait until the acknowledgement for the segment retransmitted due
368	      to RTO arrives at the sender. If duplicate ACKs arrive, store the
369	      incoming SACK information but stay in step 2. If RTO expires,
370	      restart the algorithm.

372	      a) if the cumulative ACK acknowledges all segments up to
373	         send_high, the TCP sender SHOULD revert to the conventional RTO
374	         recovery and it MUST set congestion window to no more than 2 *
375	         MSS. The sender does not enter step 3 of this algorithm.

377	      b) otherwise, the TCP sender SHOULD transmit up to two new
378	         (previously unsent) segments, within the limitations of the
379	         congestion window. If the TCP sender is not able to transmit
380	         any previously unsent data due to receiver window limitation or
381	         because it does not have any new data to send, it MAY follow
382	         one of the options presented in Section 2. However, if the TCP
383	         sender chooses to retransmit a data segment here, SACK of that
384	         segment MUST NOT be used for declaring a spurious RTO in step
385	         (3b).

387	   3) When the next acknowledgement arrives at the sender:

389	      a) if the ACK acknowledges data above send_high, either in SACK
390	         blocks or as a cumulative ACK, the sender MUST set congestion
391	         window to no more than 3 * MSS and proceed with conventional
392	         recovery, retransmitting unacknowledged segments. The sender
393	         SHOULD take this branch also when the acknowledgement is a
394	         duplicate ACK and it does not contain any new SACK blocks for
395	         previously unacknowledged data below send_high.

397	      b) if the ACK does not acknowledge data above send_high AND it
398	         acknowledges some previously unacknowledged data below
399	         send_high, the TCP sender SHOULD declare the RTO spurious and
400	         set SpuriousRecovery to SPUR_TO.

402	   If there are unacknowledged holes between the received SACK blocks,
403	   those segments SHOULD be retransmitted similarly to the conventional
404	   SACK recovery algorithm [BAFW03].  If the algorithm exits with
405	   SpuriousRecovery set to SPUR_TO, send_high SHOULD be set to SND.UNA,
406	   thus allowing fast recovery on incoming duplicate acknowledgements.
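
The decision made in step 3 of the SACK-enhanced algorithm can be
sketched as follows. This Python sketch is illustrative only and rests
on several assumptions: data is modeled as integer segment numbers
rather than byte ranges, send_high is taken to be the first segment
number that had not yet been sent when the RTO expired (so "above
send_high" means a number at or beyond it), and the function and
parameter names are hypothetical. Step 2 is not modeled.

```python
def classify_step3_ack(cum_ack, sacked, send_high, snd_una,
                       known_sacked=frozenset(), retransmitted=frozenset()):
    """Classify the ACK processed in step 3 of SACK-enhanced F-RTO.

    cum_ack       -- next segment number the receiver expects
    sacked        -- segment numbers covered by the SACK blocks of this ACK
    send_high     -- first segment number not yet sent when the RTO expired
    snd_una       -- oldest unacknowledged segment before this ACK
    known_sacked  -- segments already reported in earlier SACK blocks
    retransmitted -- segments retransmitted in step 2b; their
                     acknowledgement must not count as evidence
    Returns "spurious" (step 3b) or "conventional" (step 3a).
    """
    acked_now = set(range(snd_una, cum_ack)) | set(sacked)
    newly_acked = acked_now - set(known_sacked) - set(retransmitted)

    # Step 3a: the ACK covers data above send_high, either cumulatively
    # or in SACK blocks. The sender also limits cwnd to 3 * MSS here.
    if any(seg >= send_high for seg in acked_now):
        return "conventional"

    # Step 3b: previously unacknowledged data below send_high is covered.
    if any(seg < send_high for seg in newly_acked):
        return "spurious"

    # Duplicate ACK without new SACK information: take branch 3a as well.
    return "conventional"
```

For instance, with send_high = 10 and snd_una = 3, a duplicate ACK whose
SACK block newly reports segment 5 yields "spurious", while a SACK block
for segment 10 (data sent in step 2b) yields "conventional".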

408	4.  Taking Actions after Detecting Spurious RTO

410	   Upon retransmission timeout, a conventional TCP sender assumes that
411	   outstanding segments are lost and starts retransmitting the
412	   unacknowledged segments. When the RTO is detected to be spurious, the
413	   TCP sender should not continue retransmitting based on the RTO. For
414	   example, if the sender was in congestion avoidance phase transmitting
415	   new previously unsent segments, it should continue transmitting
416	   previously unsent segments after detecting spurious RTO. In addition,
417	   it is suggested that the RTO estimation is reinitialized and the RTO
418	   timer is adjusted to a more conservative value in order to avoid
419	   subsequent spurious RTOs [LG03].

421	   Different approaches have been discussed for adjusting the congestion
422	   control state after a spurious RTO in various research papers [SKR03,
423	   GL03, Sar03] and Internet-Drafts [SL03, LG03]. The different response
424	   suggestions vary in whether the spurious retransmission timeout
425	   should be taken as a congestion signal, thus causing the congestion
426	   window or slow start threshold to be reduced at the sender, or
427	   whether the congestion control state should be fully reverted to the
428	   state valid prior to the retransmission timeout.

430	   This document does not recommend a particular response alternative,
431	   but considers the response to a spurious RTO a subject of further
432	   research.

434	5.  SCTP Considerations

436	   The basic F-RTO or the SACK-enhanced F-RTO algorithm can be applied
437	   with the SCTP protocol. However, SCTP contains features that are not
438	   present with TCP that need to be discussed when applying the F-RTO
439	   algorithm.

441	   An SCTP association can be multi-homed. The current retransmission
442	   policy states that retransmissions should go to alternative
443	   addresses. If the retransmission was due to a spurious RTO caused by
444	   a delay spike, it is possible that the acknowledgement for the
445	   retransmission arrives at the sender before the acknowledgements of
446	   the original transmissions arrive. If this happens, a possible loss
447	   of the original transmission of the data chunk that was retransmitted
448	   due to the spurious RTO may remain undetected when applying the F-RTO
449	   algorithm.  Because the RTO was caused by the delay, and it was
450	   spurious in that respect, a suitable response is to continue by
451	   sending new data. However, if the original transmission was lost,
452	   fully reverting the congestion control parameters is too aggressive.
453	   Therefore, conservative congestion control actions are recommended
454	   if the SCTP association is multi-homed and retransmissions go to an
455	   alternative address. The information in duplicate TSNs can then be
456	   used for reverting congestion control, if desired [BA02].

458	   Note that the forward transmissions made in F-RTO algorithm step (2b)
459	   should be destined to the primary address, since they are not
460	   retransmissions.

462	   When making a retransmission, an SCTP sender can bundle a number of
463	   unacknowledged data chunks and include them in the same packet. This
464	   needs to be considered when implementing F-RTO for SCTP. The basic
465	   principle of F-RTO still holds: in order to declare the RTO spurious,
466	   the sender must get an acknowledgement for a data chunk that was not
467	   retransmitted after the RTO. In other words, acknowledgements of data
468	   chunks that were bundled in the RTO retransmission must not be used
469	   for declaring the RTO spurious.
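
	   The bundling rule above can be sketched in Python as a simple set
	   test. The function and parameter names are hypothetical, and the
	   sketch assumes the sender already knows which TSNs an incoming
	   acknowledgement newly covers.

```python
def rto_is_spurious(newly_acked_tsns, rto_retransmitted_tsns):
    """True if some newly acknowledged chunk was not resent after the RTO.

    newly_acked_tsns       -- TSNs newly covered by the incoming
                              acknowledgement
    rto_retransmitted_tsns -- TSNs bundled into the RTO retransmission
    """
    # An acknowledgement of a retransmitted chunk is ambiguous (it may
    # answer either copy), so only other chunks count as evidence.
    return bool(set(newly_acked_tsns) - set(rto_retransmitted_tsns))
```

	   For example, if chunks 6 and 7 were bundled in the RTO
	   retransmission, an acknowledgement covering only 6 and 7 proves
	   nothing, while one that also covers chunk 8 does.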

471	6.  Security Considerations

473	   The main security threat regarding F-RTO is the possibility of a
474	   receiver misleading the sender to set too large a congestion window
475	   after an RTO.  There are two possible ways a malicious receiver could
476	   trigger a wrong output from the F-RTO algorithm. First, the receiver
477	   can acknowledge data that it has not received. Second, it can delay
478	   acknowledgement of a segment it has received earlier, and acknowledge
479	   the segment after the TCP sender has been misled into entering
480	   algorithm step 3.

482	   If the receiver acknowledges a segment it has not really received,
483	   the sender can be led to declare the RTO spurious in F-RTO algorithm
484	   step 3. However, since this causes the sender to have incorrect
485	   state, it cannot retransmit the segment that has never reached the
486	   receiver. Therefore, this attack is unlikely to be useful for the
487	   receiver to maliciously gain a larger congestion window.

489	   A common case of an RTO is that a fast retransmission of a segment is
490	   lost. If all other segments have been received, the RTO
491	   retransmission causes the whole window to be acknowledged at once.
492	   This case is recognized in F-RTO algorithm branch (2a). However, if
493	   the receiver only acknowledges one segment after receiving the RTO
494	   retransmission, and then the rest of the segments, it could cause the
495	   RTO to be declared spurious when it is not. Therefore, it is
496	   suggested that when an RTO expires during the fast recovery phase,
497	   the sender does not fully revert the congestion window even if the
498	   RTO was declared spurious, but reduces the congestion window to 1.
499	   The sender can still avoid unnecessary retransmissions as usual. If
500	   a TCP sender implements a burst avoidance algorithm that limits the
501	   sending rate to be no higher than in slow start, this precaution is
502	   not needed, and the sender may apply F-RTO normally.
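
	   A minimal Python sketch of this precaution follows. The function
	   and parameter names are hypothetical, the window is counted in
	   segments, and "reverted_cwnd" stands for whatever window the
	   sender's normal response to a spurious RTO would restore.

```python
def cwnd_after_spurious_rto(reverted_cwnd, rto_during_fast_recovery,
                            burst_avoidance=False):
    """Pick the congestion window once an RTO has been declared spurious.

    reverted_cwnd            -- window the normal response would restore
    rto_during_fast_recovery -- True if the RTO expired in fast recovery
    burst_avoidance          -- True if sending is already limited to
                                no more than the slow start rate
    """
    if rto_during_fast_recovery and not burst_avoidance:
        # The receiver may have staged its ACKs to make a genuine RTO
        # look spurious, so do not reward it with a large window.
        return 1
    return reverted_cwnd
```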

504	   If more than one segment is missing at the time when an RTO occurs,
505	   the receiver does not benefit from misleading the sender into
506	   declaring a spurious RTO, because the sender would then have to go
507	   through another recovery period to retransmit the missing segments,
508	   usually after an RTO.

510	Acknowledgements

512	   We are grateful to Reiner Ludwig, Andrei Gurtov, Josh Blanton, Mark
513	   Allman, Sally Floyd, Yogesh Swami, Mika Liljeberg, Ivan Arias
514	   Rodriguez, Sourabh Ladha, and Martin Duke for the discussion and
515	   feedback contributed to this text.

517	Normative References

519	   [APS99]   M. Allman, V. Paxson, and W. Stevens. TCP Congestion Con-
520	             trol. RFC 2581, April 1999.

522	   [BAFW03]  E. Blanton, M. Allman, K. Fall, and L. Wang. A Conservative
523	             Selective Acknowledgment (SACK)-based Loss Recovery
524	             Algorithm for TCP. RFC 3517, April 2003.

526	   [MMFR96]  M. Mathis, J. Mahdavi, S. Floyd, and A. Romanow. TCP Selec-
527	             tive Acknowledgement Options. RFC 2018, October 1996.

529	   [PA00]    V. Paxson and M. Allman. Computing TCP's Retransmission
530	             Timer. RFC 2988, November 2000.

532	   [Pos81]   J. Postel. Transmission Control Protocol. RFC 793, Septem-
533	             ber 1981.

535	   [Ste00]   R. Stewart, et al. Stream Control Transmission Protocol.
536	             RFC 2960, October 2000.

538	Informative References

540	   [ABF01]   M. Allman, H. Balakrishnan, and S. Floyd. Enhancing TCP's
541	             Loss Recovery Using Limited Transmit. RFC 3042, January
542	             2001.

544	   [BA02]    E. Blanton and M. Allman. On Making TCP more Robust to
545	             Packet Reordering. ACM SIGCOMM Computer Communication
546	             Review, 32(1), January 2002.

548	   [BBJ92]   D. Borman, R. Braden, and V. Jacobson. TCP Extensions for
549	             High Performance. RFC 1323, May 1992.

551	   [FH99]    S. Floyd and T. Henderson. The NewReno Modification to
552	             TCP's Fast Recovery Algorithm. RFC 2582, April 1999.

554	   [FMMP00]  S. Floyd, J. Mahdavi, M. Mathis, and M. Podolsky. An Exten-
555	             sion to the Selective Acknowledgement (SACK) Option to TCP.
556	             RFC 2883, July 2000.

558	   [GL02]    A. Gurtov and R. Ludwig. Evaluating the Eifel Algorithm for
559	             TCP in a GPRS Network. In Proc. of European Wireless,
560	             Florence, Italy, February 2002.

562	   [GL03]    A. Gurtov and R. Ludwig. Responding to Spurious Timeouts in
563	             TCP. In Proceedings of IEEE INFOCOM 03, March 2003.

565	   [LG03]    R. Ludwig and A. Gurtov. The Eifel Response Algorithm for
566	             TCP. Internet draft "draft-ietf-tsvwg-tcp-eifel-
567	             response-04.txt".  October 2003. Work in progress.

569	   [LK00]    R. Ludwig and R.H. Katz. The Eifel Algorithm: Making TCP
570	             Robust Against Spurious Retransmissions. ACM SIGCOMM Com-
571	             puter Communication Review, 30(1), January 2000.

573	   [LM03]    R. Ludwig and M. Meyer. The Eifel Detection Algorithm for
574	             TCP. RFC 3522, April 2003.

576	   [SKR03]   P. Sarolahti, M. Kojo, and K. Raatikainen. F-RTO: An
577	             Enhanced Recovery Algorithm for TCP Retransmission Time-
578	             outs. ACM SIGCOMM Computer Communication Review, 33(2),
579	             April 2003.

581	   [Sar03]   P. Sarolahti. Congestion Control on Spurious TCP Retrans-
582	             mission Timeouts. In Proceedings of IEEE Globecom 2003.
583	             December 2003.

585	   [SL03]    Y. Swami and K. Le. DCLOR: De-correlated Loss Recovery
586	             using SACK option for spurious timeouts. Internet draft
587	             "draft-swami-tsvwg-tcp-dclor-02.txt". September 2003. Work
588	             in progress.

590	Appendix A: Scenarios

592	   This section discusses different scenarios where RTOs occur and how
593	   the basic F-RTO algorithm performs in those scenarios. The
594	   interesting scenarios are a sudden delay triggering RTO, loss of a
595	   retransmitted packet during fast recovery, link outage causing the
596	   loss of several packets, and packet reordering. A performance
597	   evaluation with a more thorough analysis on a real implementation of
598	   F-RTO is given in [SKR03].

600	A.1.  Sudden delay

602	   The main motivation for F-RTO is to improve TCP performance when a
603	   delay spike triggers a spurious retransmission timeout. The example
604	   below illustrates the segments and acknowledgements transmitted by
605	   the TCP end hosts when a spurious RTO occurs but no packets are lost.
606	   For simplicity, delayed acknowledgements are not used. In this
607	   example, the sender halves the congestion window and slow start
608	   threshold after detecting a spurious RTO.

610	         ...
611	          (cwnd = 6,
612	          ssthresh < 6,
613	          FlightSize = 5)
614	         1.  SEND 10 ---------------------------->
615	         2.          <---------------------------- ACK 6
616	         3.  SEND 11 ---------------------------->
617	         4.                       |
618	                               [delay]
619	                                  |
620	             [RTO]
621	         5.  SEND 6  ---------------------------->
622	                     <earlier xmitted SEG 6>  --->
623	         6.          <---------------------------- ACK 7
624	             [F-RTO step (2b)]
625	         7.  SEND 12 ---------------------------->
626	         8.  SEND 13 ---------------------------->
627	                     <earlier xmitted SEG 7>  --->
628	         9.          <---------------------------- ACK 8
629	             [F-RTO step (3b)]
630	             [SpuriousRecovery <- SPUR_TO]
631	             [cwnd <- 3, ssthresh <- 3]
632	         10.         <---------------------------- ACK 9
633	         11.         <---------------------------- ACK 10
634	         12.         <---------------------------- ACK 11
635	         13. SEND 14 ---------------------------->
636	         ...

638	   When a sudden delay long enough to trigger an RTO occurs at step 4,
639	   the TCP sender retransmits the first unacknowledged segment (step 5).
640	   The next ACK covers the RTO retransmission, because the originally
641	   transmitted segment 6 arrives at the receiver, so the TCP sender
642	   continues by sending two new data segments (steps 7, 8). Because the
643	   second acknowledgement arriving after the RTO acknowledges data that
644	   was not retransmitted due to the RTO (step 9), the TCP sender
645	   declares the RTO spurious and continues by sending new data. Because
646	   the TCP sender reduces cwnd when it detects the spurious RTO, it has
647	   to wait for some outstanding segments to leave the network before it
648	   can continue transmitting again at step 13.
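
	   The decision made in the trace above can be sketched in Python as
	   follows. This is a simplified illustration, not the normative
	   algorithm: the function name and return strings are hypothetical,
	   sequence numbers are whole segments as in the trace, and the
	   branch conditions are inferred from the examples in this appendix
	   and from Section 6.

```python
def classify_rto(first_ack, second_ack, snd_una, send_high):
    """Classify an RTO from the first two cumulative ACKs that follow it.

    "ACK n" means that all segments below n have arrived, as in the
    trace.
    snd_una   -- first unacknowledged segment when the RTO expired
    send_high -- highest segment sent before the RTO expired
    """
    # Step 2: the first ACK must cover the retransmission of snd_una
    # without acknowledging the whole window at once.
    if first_ack <= snd_una or first_ack > send_high:
        return "conventional recovery (2a)"
    # Branch (2b): the sender transmits two new segments here and
    # lets the second ACK decide (step 3).
    if second_ack > first_ack:
        # Segment number first_ack was sent before the RTO and was
        # never retransmitted, so its acknowledgement is unambiguous.
        return "spurious (3b)"
    return "conventional recovery (3a)"
```

	   With the values from the trace (ACK 7 followed by ACK 8, first
	   unacknowledged segment 6, highest segment sent 11), the call
	   classify_rto(7, 8, 6, 11) takes the (3b) path.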

650	A.2.  Loss of a retransmission

652	   If a retransmitted segment is lost, the only way to retransmit it
653	   again is to wait for the RTO to trigger the retransmission. Once the
654	   segment is successfully received, the receiver usually acknowledges
655	   several segments at once, because other segments in the same window
656	   have been successfully delivered before the retransmission arrives at
657	   the receiver. The example below shows a scenario where the
658	   retransmission of segment 6 is lost, as well as a later segment
659	   (segment 9) in the same window. The limited transmit [ABF01] or
660	   SACK TCP [MMFR96] enhancements are not in use in this example.

662	         ...
663	          (cwnd = 6,
664	          ssthresh < 6,
665	          FlightSize = 5)
666	             <segment 6 lost>
667	             <segment 9 lost>
668	         1.  SEND 10 ---------------------------->
669	         2.          <---------------------------- ACK 6
670	         3.  SEND 11 ---------------------------->
671	         4.          <---------------------------- ACK 6
672	         5.          <---------------------------- ACK 6
673	         6.          <---------------------------- ACK 6
674	         7.  SEND 6  --------------X
675	             <segment 6 lost>
676	             [ssthresh <- 3, cwnd <- ssthresh + 3 = 6]
677	         8.          <---------------------------- ACK 6
678	                                  |
679	                                  |
680	             [RTO]
681	             [ssthresh <- 2]
682	         9.  SEND 6  ---------------------------->
683	         10.         <---------------------------- ACK 9
684	             [F-RTO step (2b)]
685	         11. SEND 12 ---------------------------->
686	         12. SEND 13 ---------------------------->
687	         13.         <---------------------------- ACK 9
688	             [F-RTO step (3a)]
689	             [SpuriousRecovery <- FALSE]
690	             [cwnd <- 3]
691	         14. SEND 9  ---------------------------->
692	         15. SEND 10 ---------------------------->
693	         16. SEND 11 ---------------------------->
694	         17.         <---------------------------- ACK 11
695	         ...

697	   In the example above, segment 6 is lost and the sender retransmits it
698	   after three duplicate ACKs in step 7. However, the retransmission is
699	   also lost, and the sender has to wait for the RTO to expire before
700	   retransmitting it again. Because the first ACK following the RTO
701	   acknowledges the RTO retransmission (step 10), the sender transmits
702	   two new segments. The second ACK in step 13 does not acknowledge any
703	   previously unacknowledged data. Therefore, the F-RTO sender enters
704	   slow start and sets cwnd to 3 * MSS. The congestion window can be set
705	   to three segments because two round-trips have elapsed after the RTO.
706	   After this, the receiver acknowledges all segments transmitted prior
707	   to entering recovery, and the sender can continue transmitting new
708	   data in congestion avoidance.

710	A.3.  Link outage

712	   The example below illustrates the F-RTO behavior when 4 consecutive
713	   packets are lost in the network causing the TCP sender to fall back
714	   to RTO recovery. Limited transmit and SACK are not used in this
715	   example.

717	         ...
718	          (cwnd = 6,
719	          ssthresh < 6,
720	          FlightSize = 5)
721	             <segments 6-9 lost>
722	         1.  SEND 10 ---------------------------->
723	         2.          <---------------------------- ACK 6
724	         3.  SEND 11 ---------------------------->
725	         4.          <---------------------------- ACK 6
726	                                  |
727	                                  |
728	             [RTO]
729	             [ssthresh <- 3]
730	         5.  SEND 6  ---------------------------->
731	         6.          <---------------------------- ACK 7
732	             [F-RTO step (2b)]
733	         7.  SEND 12 ---------------------------->
734	         8.  SEND 13 ---------------------------->
735	         9.          <---------------------------- ACK 7
736	             [F-RTO step (3a)]
737	             [SpuriousRecovery <- FALSE]
738	             [cwnd <- 3]
739	         10. SEND 7  ---------------------------->
740	         11. SEND 8  ---------------------------->
741	         12. SEND 9  ---------------------------->
742	         13.         <---------------------------- ACK 14

744	   Again, the F-RTO sender transmits two new segments (steps 7 and 8)
745	   after the RTO retransmission is acknowledged. Because the next ACK
746	   does not acknowledge any data that was not retransmitted after the
747	   RTO (step 9), the F-RTO sender proceeds with conventional recovery
748	   and slow start retransmissions.
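
	   As a small illustration of the first round of slow start
	   retransmissions in the two traces above, the Python sketch below
	   lists the segments that fit in the window after step (3a). The
	   function and parameter names are hypothetical, and the window is
	   counted in whole segments.

```python
def slow_start_retransmissions(snd_una, highest_sent, cwnd=3):
    """Segments resent in the first round after F-RTO step (3a).

    snd_una      -- first segment the receiver has not acknowledged
    highest_sent -- highest segment transmitted so far
    cwnd         -- window in segments (3 in the traces above)
    """
    # Retransmit from the left edge of the window, limited by cwnd
    # and by what has actually been sent.
    last = min(snd_una + cwnd - 1, highest_sent)
    return list(range(snd_una, last + 1))
```

	   For the link outage trace this gives segments 7, 8, and 9; for the
	   trace in A.2, where the cumulative ACK stands at 9, it gives
	   segments 9, 10, and 11.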

750	A.4.  Packet reordering

752	   Since F-RTO modifies the TCP sender behavior only after a
753	   retransmission timeout and is intended to avoid unnecessary
754	   retransmissions only after a spurious RTO, we limit the discussion
755	   of the effects of packet reordering on F-RTO behavior to the cases
756	   where packet reordering occurs immediately after the RTO. When the
757	   TCP receiver gets an out-of-order segment, it generates a duplicate
758	   ACK. If the TCP sender implements the basic F-RTO algorithm, this may
759	   prevent the sender from detecting a spurious RTO.

761	   However, if the TCP sender applies the SACK-enhanced F-RTO, it is
762	   possible to detect a spurious RTO also when packet reordering occurs.
763	   We illustrate the behavior of SACK-enhanced F-RTO below when segment
764	   8 arrives before segments 6 and 7, and segments starting from segment
765	   6 are delayed in the network. In this example the TCP sender reduces
766	   the congestion window and slow start threshold in response to the
767	   spurious RTO.

769	         ...
770	          (cwnd = 6,
771	          ssthresh < 6,
772	          FlightSize = 5)
773	         1.  SEND 10 ---------------------------->
774	         2.          <---------------------------- ACK 6
775	         3.  SEND 11 ---------------------------->
776	         4.                       |
777	                               [delay]
778	                                  |
779	             [RTO]
780	         5.  SEND 6  ---------------------------->
781	                     <earlier xmitted SEG 8>  --->
782	         6.          <---------------------------- ACK 6
783	                                                   [SACK 8]
784	             [SACK F-RTO stays in step 2]
785	         7.          <earlier xmitted SEG 6>  --->
786	         8.          <---------------------------- ACK 7
787	                                                   [SACK 8]
788	             [SACK F-RTO step (2b)]
789	         9.  SEND 12 ---------------------------->
790	         10. SEND 13 ---------------------------->
791	         11.         <earlier xmitted SEG 7>  --->
792	         12.         <---------------------------- ACK 9
793	             [SACK F-RTO step (3b)]
794	             [SpuriousRecovery <- SPUR_TO]
795	             [ssthresh <- 3, cwnd <- 3]
796	         13.         <---------------------------- ACK 10
797	         14.         <---------------------------- ACK 11
798	         15. SEND 14 ---------------------------->
799	         ...

801	   After the RTO expires and the sender retransmits segment 6 (step 5),
802	   the receiver gets segment 8 and generates a duplicate ACK with SACK
803	   for segment 8. In response to the acknowledgement, the TCP sender
804	   does not send anything but stays in F-RTO step 2. Because the next
805	   acknowledgement advances the cumulative ACK point (step 8), the
806	   sender can transmit two new segments according to SACK-enhanced
807	   F-RTO. The next acknowledgement covers new data between 7 and 11
808	   that was not acknowledged earlier (segment 7), so the F-RTO sender
809	   declares the RTO spurious.
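
	   The handling of the reordered trace above can be sketched in
	   Python as follows. This is an illustration under stated
	   assumptions rather than the normative SACK-enhanced algorithm: the
	   function name and return strings are hypothetical, sequence
	   numbers are whole segments, and each ACK is modeled as a
	   cumulative ACK number plus a set of SACKed segments.

```python
def sack_frto(acks, snd_una, send_high):
    """Replay the ACKs that follow an RTO through a SACK-aware check.

    acks      -- list of (cumulative_ack, sacked_segments) in arrival
                 order
    snd_una   -- first unacknowledged segment when the RTO expired
    send_high -- highest segment sent before the RTO expired
    """
    delivered = set()   # segments reported so far, cumulatively or by SACK
    start = snd_una     # left edge of the window when the RTO expired
    step = 2
    for cum_ack, sacked in acks:
        reported = set(range(start, cum_ack)) | set(sacked)
        newly = reported - delivered
        delivered |= reported
        if step == 2:
            if cum_ack <= snd_una:
                continue            # duplicate ACK: stay in step 2
            if cum_ack > send_high:
                return "conventional recovery"
            snd_una = cum_ack       # (2b): two new segments are sent
            step = 3
        else:
            # Step 3: look for newly acknowledged data between
            # SND.UNA and send_high. Only the segment at the old left
            # edge was retransmitted, and it lies below SND.UNA now.
            if any(snd_una <= seg <= send_high for seg in newly):
                return "spurious (3b)"
            return "conventional recovery"
    return "undecided"
```

	   Feeding in the three acknowledgements of the trace, that is
	   (6, {8}), (7, {8}), and (9, set()), with snd_una 6 and send_high
	   11, follows the path shown above: the duplicate ACK is skipped,
	   ACK 7 moves the sender to step 3, and ACK 9 newly covers segment 7.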

811	Appendix B: Applying SACK-enhanced F-RTO when RTO occurs during loss
812	recovery

814	   We believe that a slightly modified SACK-enhanced F-RTO algorithm can
815	   be used to detect spurious RTOs also when an RTO occurs while an
816	   earlier loss recovery is underway. However, there are issues that
817	   need to be considered if F-RTO is applied in this case.

819	   The original SACK-based F-RTO requires in algorithm step 3 that an
820	   ACK acknowledges previously unacknowledged non-retransmitted data
821	   between SND.UNA and send_high. If the RTO takes place during an
822	   earlier (SACK-based) loss recovery, the F-RTO sender must only use
823	   acknowledgements for non-retransmitted segments transmitted before
824	   the SACK-based loss recovery started. This means that in order to
825	   declare the RTO spurious, the TCP sender must receive an
826	   acknowledgement for a non-retransmitted segment between SND.UNA and
827	   RecoveryPoint in algorithm step 3. RecoveryPoint is defined in the
828	   conservative SACK-recovery algorithm [BAFW03], and it is set to
829	   indicate the highest segment transmitted so far when SACK-based loss
830	   recovery begins. In other words, if the TCP sender receives an
831	   acknowledgement for a segment that was transmitted more than one RTO
832	   ago, it can declare the RTO spurious. Defining an efficient algorithm
833	   for checking these conditions remains a future work item.
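
	   A naive Python sketch of the step 3 condition described above is
	   given below. It is not the efficient algorithm left as future
	   work: the function and parameter names are hypothetical, sequence
	   numbers are whole segments, and the sender is assumed to keep an
	   explicit set of every segment it has retransmitted.

```python
def spurious_during_loss_recovery(newly_acked, retransmitted,
                                  snd_una, recovery_point):
    """Check the condition above for one incoming ACK.

    newly_acked    -- segments this ACK reports for the first time
    retransmitted  -- segments resent by loss recovery or by the RTO
    snd_una        -- first unacknowledged segment before this ACK
    recovery_point -- highest segment sent when loss recovery began
    """
    # Only never-retransmitted segments that were already outstanding
    # when SACK-based loss recovery started count as evidence.
    return any(
        snd_una <= seg <= recovery_point and seg not in retransmitted
        for seg in newly_acked
    )
```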

835	   When a spurious RTO is detected according to the rules given above,
836	   the response algorithm may need to consider this case separately,
837	   for example in terms of what segments to retransmit after the RTO
838	   and whether it is safe to revert the congestion control parameters
839	   in this case. This is considered a topic of future
840	   research.

842	Authors' Addresses

844	   Pasi Sarolahti
845	   Nokia Research Center
846	   P.O. Box 407
847	   FIN-00045 NOKIA GROUP
848	   Finland
849	   Phone: +358 50 4876607
850	   EMail: pasi.sarolahti@nokia.com
851	   http://www.cs.helsinki.fi/u/sarolaht/

853	   Markku Kojo
854	   University of Helsinki
855	   Department of Computer Science
856	   P.O. Box 26
857	   FIN-00014 UNIVERSITY OF HELSINKI
858	   Finland

860	   Phone: +358 9 1914 4179
861	   EMail: markku.kojo@cs.helsinki.fi