idnits 2.17.1 

draft-ietf-tcpm-rfc4138bis-02.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

  ** It looks like you're using RFC 3978 boilerplate.  You should update this
     to the boilerplate described in the IETF Trust License Policy document
     (see https://trustee.ietf.org/license-info), which is required now.

  -- Found old boilerplate from RFC 3978, Section 5.1 on line 19.

  -- Found old boilerplate from RFC 3978, Section 5.5, updated by RFC 4748 on
     line 865.

  -- Found old boilerplate from RFC 3979, Section 5, paragraph 1 on line 876.

  -- Found old boilerplate from RFC 3979, Section 5, paragraph 2 on line 883.

  -- Found old boilerplate from RFC 3979, Section 5, paragraph 3 on line 889.


  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

  == No 'Intended status' indicated for this document; assuming Proposed
     Standard


  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

  ** The document seems to lack an IANA Considerations section.  (See Section
     2.2 of https://www.ietf.org/id-info/checklist for how to handle the case
     when there are no actions for IANA.)


  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == The copyright year in the IETF Trust Copyright Line does not match the
     current year

  -- The document seems to lack a disclaimer for pre-RFC5378 work, but may
     have content which was first submitted before 10 November 2008.  If you
     have contacted all the original authors and they are all willing to grant
     the BCP78 rights to the IETF Trust, then this is fine, and you can ignore
     this comment.  If not, you may need to add the pre-RFC5378 disclaimer. 
     (See the Legal Provisions document at
     https://trustee.ietf.org/license-info for more information.)

  -- The document date (14 July 2008) is 5764 days in the past.  Is this
     intentional?


  Checking references for intended status: Proposed Standard
  ----------------------------------------------------------------------------

     (See RFCs 3967 and 4897 for information about using normative references
     to lower-maturity documents in RFCs)

  ** Obsolete normative reference: RFC 2581 (ref. 'APS99') (Obsoleted by RFC
     5681)

  -- Duplicate reference: RFC2581, mentioned in 'APB07', was also mentioned
     in 'APS99'.

  ** Obsolete normative reference: RFC 2581 (ref. 'APB07') (Obsoleted by RFC
     5681)

  ** Obsolete normative reference: RFC 3517 (ref. 'BAFW03') (Obsoleted by RFC
     6675)

  ** Obsolete normative reference: RFC 3782 (ref. 'FHG04') (Obsoleted by RFC
     6582)

  ** Obsolete normative reference: RFC 2988 (ref. 'PA00') (Obsoleted by RFC
     6298)

  ** Obsolete normative reference: RFC  793 (ref. 'Pos81') (Obsoleted by RFC
     9293)

  -- Obsolete informational reference (is this intentional?): RFC 1323 (ref.
     'BBJ92') (Obsoleted by RFC 7323)

  -- Obsolete informational reference (is this intentional?): RFC  896 (ref.
     'Nag84') (Obsoleted by RFC 7805)

  -- Duplicate reference: RFC4138, mentioned in 'SK05', was also mentioned in
     'KYHS07'.

  -- Obsolete informational reference (is this intentional?): RFC 2960 (ref.
     'Ste00') (Obsoleted by RFC 4960)


     Summary: 8 errors (**), 0 flaws (~~), 2 warnings (==), 12 comments (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.

--------------------------------------------------------------------------------

1	Internet Engineering Task Force                             P. Sarolahti
2	INTERNET-DRAFT                                     Nokia Research Center
3	draft-ietf-tcpm-rfc4138bis-02.txt                                M. Kojo
4	Expires: January 2009                             University of Helsinki
5	                                                             K. Yamamoto
6	                                                                 M. Hata
7	                                                              NTT Docomo

9	                                                            14 July 2008

11	        Forward RTO-Recovery (F-RTO): An Algorithm for Detecting
12	               Spurious Retransmission Timeouts with TCP

14	Status of this Memo

16	    By submitting this Internet-Draft, each author represents that any
17	    applicable patent or other IPR claims of which he or she is aware
18	    have been or will be disclosed, and any of which he or she becomes
19	    aware will be disclosed, in accordance with Section 6 of BCP 79.

21	    Internet-Drafts are working documents of the Internet Engineering
22	    Task Force (IETF), its areas, and its working groups.  Note that
23	    other groups may also distribute working documents as Internet-
24	    Drafts.

26	    Internet-Drafts are draft documents valid for a maximum of six
27	    months and may be updated, replaced, or obsoleted by other documents
28	    at any time.  It is inappropriate to use Internet-Drafts as
29	    reference material or to cite them other than as "work in progress."

31	    The list of current Internet-Drafts can be accessed at
32	    http://www.ietf.org/ietf/1id-abstracts.txt.

34	    The list of Internet-Draft Shadow Directories can be accessed at
35	    http://www.ietf.org/shadow.html.

37	    This Internet-Draft will expire in January 2009.

39	Abstract

41	    Spurious retransmission timeouts cause suboptimal TCP performance
42	    because they often result in unnecessary retransmission of the last
43	    window of data.  This document describes the F-RTO detection
44	    algorithm for detecting spurious TCP retransmission timeouts.  F-RTO
45	    is a TCP sender-only algorithm that does not require any TCP options
46	    to operate.  After retransmitting the first unacknowledged segment
47	    triggered by a timeout, the F-RTO algorithm of the TCP sender
48	    monitors the incoming acknowledgments to determine whether the
49	    timeout was spurious.  It then decides whether to send new segments
50	    or retransmit unacknowledged segments.  The algorithm effectively
51	    helps to avoid additional unnecessary retransmissions and thereby
52	    improves TCP performance in the case of a spurious timeout.

54	                             Table of Contents

56	    1. Introduction
57	       1.1. Conventions and Terminology
58	    2. Basic F-RTO Algorithm
59	       2.1. The Algorithm
60	       2.2. Discussion
61	    3. SACK-Enhanced Version of the F-RTO Algorithm
62	    4. Taking Actions after Detecting Spurious RTO
63	    5. Evaluation of RFC 4138 and Differences to this
64	       Document
65	    6. Security Considerations
66	    7. Acknowledgements
67	    Appendix
68	    A. Discussion of Window-Limited Cases
69	    B. List of Changes
70	    References
71	    Normative References
72	    Informative References
73	    AUTHORS' ADDRESSES
74	    Full Copyright Statement
75	    Intellectual Property

77	1.  Introduction

79	    The Transmission Control Protocol (TCP) [Pos81] has two methods for
80	    triggering retransmissions.  First, the TCP sender relies on
81	    incoming duplicate ACKs, which indicate that the receiver is missing
82	    some of the data.  After a required number of successive duplicate
83	    ACKs have arrived at the sender, it retransmits the first
84	    unacknowledged segment [APS99] and continues with a loss recovery
85	    algorithm such as NewReno [FHG04] or SACK-based loss recovery
86	    [BAFW03].  Second, the TCP sender maintains a retransmission timer
87	    which triggers retransmission of segments, if they have not been
88	    acknowledged before the retransmission timeout (RTO) expires.  When
89	    the retransmission timeout occurs, the TCP sender enters the RTO
90	    recovery where the congestion window is initialized to one segment
91	    and unacknowledged segments are retransmitted using the slow-start
92	    algorithm.  The retransmission timer is adjusted dynamically, based
93	    on the measured round-trip times [PA00].

95	    It has been pointed out that the retransmission timer can expire
96	    spuriously and cause unnecessary retransmissions when no segments
97	    have been lost [LK00, GL02, LM03].  After a spurious retransmission
98	    timeout, the late acknowledgments of the original segments arrive at
99	    the sender, usually triggering unnecessary retransmissions of a
100	    whole window of segments during the RTO recovery.  Furthermore,
101	    after a spurious retransmission timeout, a conventional TCP sender
102	    increases the congestion window on each late acknowledgment in slow
103	    start.  This injects a large number of data segments into the
104	    network within one round-trip time, thus violating the packet
105	    conservation principle [Jac88].
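
The effect described above can be illustrated with a small Python sketch. It is a deliberately simplified model, not part of the specification: sequence numbers count whole segments, no segment is lost, one late ACK arrives per original segment, and ssthresh as well as the ACKs triggered by the retransmissions themselves are ignored. The function name is hypothetical.

```python
def conventional_rto_recovery(outstanding):
    """Return the segments a conventional sender retransmits needlessly.

    Segments 0..outstanding-1 were sent before the spurious timeout and
    all of them reach the receiver; only their ACKs are late.
    """
    snd_una = 0
    cwnd = 1                  # congestion window after the RTO, in segments
    snd_nxt = 1               # segment 0 is retransmitted when the RTO expires
    retransmitted = [0]
    for ack in range(1, outstanding + 1):   # late ACKs of the originals
        snd_una = ack
        cwnd += 1                           # slow start: one segment per ACK
        snd_nxt = max(snd_nxt, snd_una)
        while snd_nxt - snd_una < cwnd:
            if snd_nxt < outstanding:       # already sent before the timeout
                retransmitted.append(snd_nxt)
            snd_nxt += 1                    # otherwise this is new data
    return retransmitted
```

Under these assumptions, calling conventional_rto_recovery(8) lists all eight segments of the window: each late ACK releases two segments, so the retransmissions stay ahead of the cumulative acknowledgment point until the whole window has been sent again.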

107	    There are a number of potential reasons for spurious retransmission
108	    timeouts.  First, some mobile networking technologies involve sudden
109	    delay spikes on transmission because of actions taken during a hand-
110	    off.  Second, a hand-off may take place from a low latency path to a
111	    high latency path, suddenly increasing the round-trip time beyond
112	    the current RTO value.  Third, on a low-bandwidth link the arrival
113	    of competing traffic (possibly with higher priority), or some other
114	    change in available bandwidth, can cause a sudden increase of the
115	    round-trip time.  This may trigger a spurious retransmission
116	    timeout.  A persistently reliable link layer can also cause a sudden
117	    delay when a data frame and several retransmissions of it are lost
118	    for some reason.  This document does not distinguish between the
119	    different causes of such a delay spike.  Rather, it discusses the
120	    spurious retransmission timeouts caused by a delay spike in general.

122	    This document describes the F-RTO detection algorithm.  It is based
123	    on the detection mechanism of the "Forward RTO-Recovery" (F-RTO)
124	    algorithm [SKR03] that is used for detecting spurious retransmission
125	    timeouts and thus avoids unnecessary retransmissions following the
126	    retransmission timeout.  When the timeout is not spurious, the F-RTO
127	    algorithm reverts to the conventional RTO recovery algorithm,
128	    and therefore has similar behavior and performance.  In contrast to
129	    alternative algorithms proposed for detecting unnecessary
130	    retransmissions (Eifel [LK00], [LM03] and DSACK-based algorithms
131	    [BA04]), F-RTO does not require any TCP options for its operation,
132	    and it can be implemented by modifying only the TCP sender.  The
133	    Eifel algorithm uses TCP timestamps [BBJ92] for detecting a spurious
134	    timeout upon arrival of the first acknowledgment after the
135	    retransmission.  The DSACK-based algorithms require that the TCP
136	    Selective Acknowledgment Option [MMFR96], with the DSACK extension
137	    [FMMP00], is in use.  With DSACK, the TCP receiver can report if it
138	    has received a duplicate segment, enabling the sender to detect
139	    afterwards whether it has retransmitted segments unnecessarily.  The
140	    F-RTO algorithm only attempts to detect and avoid unnecessary
141	    retransmissions after an RTO.  Eifel and DSACK can also be used for
142	    detecting unnecessary retransmissions caused by other events, such
143	    as packet reordering.

145	    When an RTO expires, the F-RTO sender retransmits the first
146	    unacknowledged segment as usual [APS99].  Deviating from the normal
147	    operation after a timeout, it then tries to transmit new, previously
148	    unsent data for the first acknowledgment that arrives after the
149	    timeout, given that the acknowledgment advances the window.  If the
150	    second acknowledgment that arrives after the timeout advances the
151	    window (i.e., acknowledges data that was not retransmitted), the F-
152	    RTO sender declares the timeout spurious and exits the RTO recovery.
153	    However, if either of these two acknowledgments is a duplicate ACK,
154	    there will not be sufficient evidence of a spurious timeout.
155	    Therefore, the F-RTO sender retransmits the unacknowledged segments
156	    in slow start similarly to the traditional algorithm.

158	    With a SACK-enhanced version of the F-RTO algorithm, spurious
159	    timeouts may be detected even if duplicate ACKs arrive after an RTO
160	    retransmission.  Even though this document only specifies the F-RTO
161	    algorithm for TCP, the algorithm can also be applied to the Stream
162	    Control Transmission Protocol (SCTP) [Ste00] that has acknowledgment
163	    and packet retransmission concepts similar to TCP. Considerations on
164	    applying F-RTO for SCTP are discussed in RFC 4138 [SK05].

166	    This document is organized as follows.  Section 2 describes the
167	    basic F-RTO algorithm, and the SACK-enhanced F-RTO algorithm is
168	    given in Section 3.  Section 4 discusses the possible actions to be
169	    taken after detecting a spurious RTO, Section 5 evaluates RFC 4138
170	    and lists the changes made since, and Section 6 covers security.

172	1.1.  Conventions and Terminology

174	    The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
175	    "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
176	    document are to be interpreted as described in BCP 14, RFC 2119
177	    [RFC2119] and indicate requirement levels for protocols.

179	2.  Basic F-RTO Algorithm

181	    A timeout is considered spurious if it would have been avoided had
182	    the sender waited longer for an acknowledgment to arrive [LM03].  F-
183	    RTO affects the TCP sender behavior only after a retransmission
184	    timeout.  Otherwise, the TCP behavior remains the same.  When the
185	    RTO expires, the F-RTO algorithm monitors incoming acknowledgments
186	    and if the TCP sender gets an acknowledgment for a segment that was
187	    not retransmitted due to timeout, the F-RTO algorithm declares a
188	    timeout spurious.  The actions taken in response to a spurious
189	    timeout are not specified in this document, but we discuss some
190	    alternatives in Section 4.  This section introduces the algorithm
191	    and then discusses the different steps of the algorithm in more
192	    detail.

194	    Following the practice used with the Eifel Detection algorithm
196	    [LM03], we use the "SpuriousRecovery" variable to indicate whether
197	    the retransmission is declared spurious by the sender. This variable
198	    can be used as an input for a corresponding response algorithm. With
199	    F-RTO, the value of SpuriousRecovery can be either SPUR_TO
200	    (indicating a spurious retransmission timeout) or FALSE (indicating
201	    that the timeout is not declared spurious and that the TCP sender
202	    should follow the conventional RTO recovery algorithm). In addition,
203	    we use the "recover" variable specified in the NewReno algorithm
204	    [FHG04].

206	2.1.  The Algorithm

208	    A TCP sender implementing the basic F-RTO algorithm MUST take the
209	    following steps after the retransmission timer expires.  If the
210	    retransmission timer expires again during the execution of the F-RTO
211	    algorithm, the TCP sender MUST re-start the algorithm processing
212	    from step 1.  If the sender implements some loss recovery algorithm
213	    other than Reno or NewReno [FHG04], the F-RTO algorithm SHOULD NOT
214	    be entered when earlier fast recovery is underway.

216	    The F-RTO algorithm takes different actions based on whether an
217	    incoming acknowledgement advances the cumulative acknowledgement
218	    point for a received in-order segment, or whether it is a duplicate
219	    acknowledgement to indicate an out-of-order segment. Duplicate
220	    acknowledgement is defined in [APB07]. The F-RTO algorithm does not
221	    specify actions for receiving a segment that does not acknowledge
222	    new data but is not a duplicate acknowledgement. The TCP sender
223	    SHOULD ignore such segments and wait for a segment that either
224	    acknowledges new data or is a duplicate acknowledgment.
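
The three kinds of incoming segments distinguished above can be sketched in Python as follows. The duplicate ACK test follows the five conditions of the definition published in RFC 5681, which is assumed here to match [APB07]. The type and function names are hypothetical, sequence number wraparound is ignored, and the Acknowledgement field is not validated against SND.NXT.

```python
from dataclasses import dataclass
from enum import Enum


class AckKind(Enum):
    NEW_DATA = "advances the cumulative acknowledgement point"
    DUPLICATE = "duplicate acknowledgement"
    OTHER = "neither; ignored by F-RTO"


@dataclass
class Segment:
    ack: int               # Acknowledgement field
    window: int            # advertised window
    payload_len: int = 0   # number of data bytes carried
    syn: bool = False
    fin: bool = False


def classify_ack(seg, snd_una, snd_nxt, last_window):
    """Classify an incoming segment for F-RTO processing."""
    if seg.ack > snd_una:
        return AckKind.NEW_DATA
    is_duplicate = (
        snd_nxt > snd_una                # sender has outstanding data
        and seg.payload_len == 0         # segment carries no data
        and not seg.syn and not seg.fin  # SYN and FIN bits are off
        and seg.ack == snd_una           # same ACK number as the latest ACK
        and seg.window == last_window    # advertised window is unchanged
    )
    return AckKind.DUPLICATE if is_duplicate else AckKind.OTHER
```

A segment that repeats the current acknowledgement number but changes the advertised window, or that carries data, falls into the OTHER class, which the sender ignores while the F-RTO algorithm is running.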

226	    1) When RTO expires, retransmit the first unacknowledged segment and
227	       set SpuriousRecovery to FALSE. If the TCP sender is already in
228	       RTO recovery AND "recover" is larger than SND.UNA (the oldest
229	       unacknowledged sequence number [Pos81]), do not enter step 2 of
230	       this algorithm. Instead, store the highest sequence number
231	       transmitted so far in variable "recover" and continue with slow
232	       start retransmissions following the conventional RTO recovery
233	       algorithm.

235	    2) When the first acknowledgment after the RTO retransmission
236	       arrives at the TCP sender, store the highest sequence number
237	       transmitted so far in variable "recover". The TCP sender chooses
238	       one of the following actions, depending on whether the ACK
239	       advances the window or whether it is a duplicate ACK.

241	       a) If the acknowledgment is a duplicate ACK OR the
242	          Acknowledgement field covers "recover" but not more than
243	          "recover" OR the acknowledgment does not acknowledge all of
244	          the data that was retransmitted in step 1, revert to the
245	          conventional RTO recovery and continue by retransmitting
246	          unacknowledged data in slow start.  Do not enter step 3 of
247	          this algorithm.  The SpuriousRecovery variable remains as
248	          FALSE.

250	       b) Else, if the acknowledgment advances the window AND the
251	          Acknowledgement field does not cover "recover", transmit up to
252	          two new (previously unsent) segments and enter step 3 of this
253	          algorithm. If the TCP sender does not have enough unsent data,
254	          it can send only one segment. In addition, the TCP sender MAY
255	          override the Nagle algorithm [Nag84] and immediately send a
256	          segment if needed. Note that sending two segments in this step
257	          is allowed by TCP congestion control requirements [APS99]: An
258	          F-RTO TCP sender simply chooses different segments to
259	          transmit.

261	          If the TCP sender does not have any new data to send, or the
262	          advertised window prohibits new transmissions, the recommended
263	          action is to skip step 3 of this algorithm and continue with
264	          slow start retransmissions, following the conventional RTO
265	          recovery algorithm.  However, alternative ways of handling the
266	          window-limited cases that could result in better performance
267	          are discussed in Appendix A.

269	    3) When the second acknowledgment after the RTO retransmission
270	       arrives at the TCP sender, the TCP sender either declares the
271	       timeout spurious, or starts retransmitting the unacknowledged
272	       segments.

274	       a) If the acknowledgment is a duplicate ACK, set the congestion
275	          window to no more than 3 * MSS, and continue with the slow
276	          start algorithm retransmitting unacknowledged segments.  The
277	          congestion window can be set to 3 * MSS, because two round-
278	          trip times have elapsed since the RTO, and a conventional TCP
279	          sender would have increased cwnd to 3 * MSS in the same time.
280	          Leave SpuriousRecovery set to FALSE.

282	       b) If the acknowledgment advances the window (i.e., if it
283	          acknowledges data that was not retransmitted after the
284	          timeout), declare the timeout spurious, set SpuriousRecovery
285	          to SPUR_TO, and set the value of the "recover" variable to
286	          SND.UNA (the oldest unacknowledged sequence number [Pos81]).
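
Steps 1 to 3 above can be sketched as a small state machine in Python. This is an illustrative sketch under stated assumptions, not a reference implementation: the class, method, and return value names are hypothetical, sequence numbers are byte offsets without wraparound, "recover" holds the sequence number following the highest byte transmitted so far (snd_max), and the caller classifies each ACK, performs the actual transmissions, and tells the sketch whether an earlier RTO recovery is underway.

```python
class BasicFrto:
    """Sketch of the basic F-RTO steps; byte sequence numbers, no wraparound."""

    def __init__(self, mss):
        self.mss = mss
        self.step = 0                     # 0 = inactive, 2 or 3 = algorithm step
        self.recover = 0                  # the "recover" variable
        self.spurious_recovery = "FALSE"
        self.rexmit_end = 0               # end of the data retransmitted in step 1
        self.cwnd_cap = None              # upper bound for cwnd, in bytes

    def _conventional(self, cwnd_cap=None):
        self.step = 0
        self.cwnd_cap = cwnd_cap
        return "conventional"

    def on_rto(self, snd_una, snd_max, in_rto_recovery=False):
        """Step 1; the caller retransmits the first unacknowledged segment."""
        self.spurious_recovery = "FALSE"
        self.rexmit_end = min(snd_una + self.mss, snd_max)
        if in_rto_recovery and self.recover > snd_una:
            self.recover = snd_max
            return self._conventional()
        self.step = 2
        self.cwnd_cap = None
        return "wait"

    def on_ack(self, ack, is_dup, snd_una, snd_max, can_send_new=True):
        """Steps 2 and 3; snd_una is the value before this ACK is applied."""
        if not is_dup and ack <= snd_una:
            return "wait"                 # neither duplicate nor new data: ignore
        if self.step == 2:
            self.recover = snd_max
            # Branch 2a: duplicate ACK, ACK covering "recover", or partial ACK.
            if is_dup or ack >= self.recover or ack < self.rexmit_end:
                return self._conventional()
            # Window-limited or no new data: recommended fallback.
            if not can_send_new:
                return self._conventional()
            self.step = 3                 # branch 2b
            return "send_new"             # caller sends up to two new segments
        if self.step == 3:
            if is_dup:                    # branch 3a
                return self._conventional(cwnd_cap=3 * self.mss)
            self.spurious_recovery = "SPUR_TO"   # branch 3b
            self.recover = ack            # SND.UNA after this ACK
            self.step = 0
            return "spurious"
        return "wait"                     # F-RTO not active
```

For example, with an MSS of 1000 bytes and 10000 bytes outstanding, on_rto(0, 10000) followed by on_ack(1000, False, 0, 10000) returns "send_new", and a subsequent on_ack(2000, False, 1000, 12000) returns "spurious". If the second ACK is a duplicate instead, the sketch returns "conventional" and caps the congestion window at 3 * MSS.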

288	2.2.  Discussion

290	    The F-RTO sender takes cautious actions when it receives duplicate
291	    acknowledgments after a retransmission timeout.  Because duplicate
292	    ACKs may indicate that segments have been lost, reliably detecting a
293	    spurious timeout is difficult due to the lack of additional
294	    information.  Therefore, it is prudent to follow the conventional
295	    TCP recovery in those cases.

297	    The condition in step 1 prevents the execution of the F-RTO
298	    algorithm in case a previous RTO recovery is underway when the
299	    retransmission timer expires, except in case the retransmission
300	    timer expires multiple times for the same segment. If RTO expires
301	    during an earlier RTO-based loss recovery, acknowledgements for
302	    retransmitted segments may falsely lead the TCP sender to declare
303	    the timeout spurious.

305	    If the first acknowledgment after the RTO retransmission covers the
306	    "recover" point at algorithm step (2a), there is not enough evidence
307	    that a non-retransmitted segment has arrived at the receiver after
308	    the timeout.  This is a common case when a fast retransmission is
309	    lost and has been retransmitted again after an RTO, while the rest
310	    of the unacknowledged segments were successfully delivered to the
311	    TCP receiver before the retransmission timeout.  Therefore, the
312	    timeout cannot be declared spurious in this case.

314	    If the first acknowledgment after the RTO retransmission does not
315	    acknowledge all of the data that was retransmitted in step 1, the
316	    TCP sender reverts to the conventional RTO recovery.  Otherwise, a
317	    malicious receiver acknowledging partial segments could cause the
318	    sender to declare the timeout spurious in a case where data was
319	    lost.

321	    The TCP sender is allowed to send two new segments in algorithm
322	    branch (2b) because the conventional TCP sender would transmit two
323	    segments when the first new ACK arrives after the RTO
324	    retransmission.  If sending new data is not possible in algorithm
325	    branch (2b), or if the receiver window limits the transmission, the
326	    TCP sender has to send something in order to prevent the TCP
327	    transfer from stalling.  If no segments were sent, the pipe between
328	    sender and receiver might run out of segments, and no further
329	    acknowledgments would arrive.  Therefore, in the window-limited
330	    case, the recommendation is to revert to the conventional RTO
331	    recovery with slow start retransmissions.  Appendix A discusses some
332	    alternative solutions for window-limited situations.

334	    If the retransmission timeout is declared spurious, the TCP sender
335	    sets the value of the "recover" variable to SND.UNA in order to
336	    allow fast retransmit [FHG04].  The "recover" variable was proposed
337	    for avoiding unnecessary, multiple fast retransmits when RTO expires
338	    during fast recovery with NewReno TCP.  Because the F-RTO sender
339	    retransmits only the segment that triggered the timeout, the problem
340	    of unnecessary multiple fast retransmits [FHG04] cannot occur.
341	    Therefore, if three duplicate ACKs arrive at the sender after the
342	    timeout, they probably indicate a packet loss, and thus fast
343	    retransmit should be used to allow efficient recovery.  If there are
344	    not enough duplicate ACKs arriving at the sender after a packet
345	    loss, the retransmission timer expires again and the sender enters
346	    step 1 of this algorithm.

348	    When the timeout is declared spurious, the TCP sender cannot detect
349	    whether the unnecessary RTO retransmission was lost.  In principle,
350	    the loss of the RTO retransmission should be taken as a congestion
351	    signal.  Thus, there is a small possibility that the F-RTO sender
352	    will violate the congestion control rules, if it chooses to fully
353	    revert congestion control parameters after detecting a spurious
354	    timeout.  The Eifel detection algorithm has a similar property,
355	    while the DSACK option can be used to detect whether the
356	    retransmitted segment was successfully delivered to the receiver.

358	    The F-RTO algorithm has a side-effect on the TCP round-trip time
359	    measurement.  Because the TCP sender can avoid most of the
360	    unnecessary retransmissions after detecting a spurious timeout, the
361	    sender is able to take round-trip time samples on the delayed
362	    segments.  If the regular RTO recovery was used without TCP
363	    timestamps, this would not be possible due to the retransmission
364	    ambiguity.  As a result, the RTO is likely to have more accurate and
365	    larger values with F-RTO than with the regular TCP after a spurious
366	    timeout that was triggered due to delayed segments.  We believe this
367	    is an advantage in networks that are prone to delay spikes.
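
The effect on the retransmission timer can be illustrated with a Python sketch of the RTO computation of [PA00] (SRTT, RTTVAR, alpha = 1/8, beta = 1/4, K = 4, and a minimum RTO of one second). The class name, the clock granularity of 1 ms, and the RTT values (a steady 200 ms followed by one 2.5 s delay spike) are assumptions chosen for illustration only.

```python
class RtoEstimator:
    """RTO computation following the formulas of RFC 2988 [PA00]."""

    ALPHA, BETA, K = 1 / 8, 1 / 4, 4

    def __init__(self, granularity=0.001, min_rto=1.0):
        self.g = granularity      # clock granularity G, in seconds (assumed)
        self.min_rto = min_rto
        self.srtt = None
        self.rttvar = None
        self.rto = 3.0            # initial RTO before any measurement

    def sample(self, r):
        """Feed one round-trip time measurement r (seconds); return the RTO."""
        if self.srtt is None:
            self.srtt, self.rttvar = r, r / 2
        else:
            # RTTVAR is updated first, using the previous SRTT.
            self.rttvar = ((1 - self.BETA) * self.rttvar
                           + self.BETA * abs(self.srtt - r))
            self.srtt = (1 - self.ALPHA) * self.srtt + self.ALPHA * r
        self.rto = max(self.min_rto,
                       self.srtt + max(self.g, self.K * self.rttvar))
        return self.rto


regular, frto = RtoEstimator(), RtoEstimator()
for estimator in (regular, frto):
    for _ in range(10):
        estimator.sample(0.2)     # steady state before the delay spike
# Regular RTO recovery retransmits the delayed segments, so their RTT
# cannot be sampled; the F-RTO sender can sample the delayed segment.
frto.sample(2.5)
```

In this scenario the regular estimator stays at the one-second minimum, whereas the estimator that received the delayed sample ends up with an RTO above the 2.5 s spike, which makes a second spurious timeout on a similar delay less likely.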

369	    There are some situations where the F-RTO algorithm may not avoid
370	    unnecessary retransmissions after a spurious timeout.  If packet
371	    reordering or packet duplication occurs on the segment that
372	    triggered the spurious timeout, the F-RTO algorithm may not detect
373	    the spurious timeout due to incoming duplicate ACKs.  Additionally,
374	    if a spurious timeout occurs during fast recovery, the F-RTO
375	    algorithm often cannot detect the spurious timeout because the
376	    segments that were transmitted before the fast recovery trigger
377	    duplicate ACKs.  However, we consider these cases rare, and note
378	    that in cases where F-RTO fails to detect the spurious timeout, it
379	    retransmits the unacknowledged segments in slow start, and thus
380	    performs similarly to the regular RTO recovery.

382	3.  SACK-Enhanced Version of the F-RTO Algorithm

384	    This section describes an alternative version of the F-RTO algorithm
385	    that uses the TCP Selective Acknowledgment Option [MMFR96].  By
386	    using the SACK option, the TCP sender detects spurious timeouts in
387	    most of the cases when packet reordering or packet duplication is
388	    present.  If the SACK blocks acknowledge new data that was not
389	    transmitted after the RTO retransmission, the sender may declare the
390	    timeout spurious, even when duplicate ACKs follow the RTO.

392	    Given that the TCP Selective Acknowledgment Option [MMFR96] is
393	    enabled for a TCP connection, a TCP sender MAY implement the SACK-
394	    enhanced F-RTO algorithm.  If the sender applies the SACK-enhanced
395	    F-RTO algorithm, it MUST follow the steps below.  This algorithm
396	    SHOULD NOT be applied if the TCP sender is already in loss recovery
397	    when retransmission timeout occurs.

399	    The steps of the SACK-enhanced version of the F-RTO algorithm are as
400	    follows.  If the retransmission timer expires again during the
401	    execution of the SACK-enhanced F-RTO algorithm, the TCP sender MUST
402	    re-start the algorithm processing from step 1.

404	    1) When the RTO expires, retransmit the first unacknowledged segment
405	       and set SpuriousRecovery to FALSE. Following the recommendation
406	       in the SACK specification [MMFR96], reset the SACK scoreboard. If
407	       "RecoveryPoint" is larger than SND.UNA, do not enter step 2 of
408	       this algorithm.  Instead, set variable "RecoveryPoint" to
409	       indicate the highest sequence number transmitted so far and
410	       continue with slow start retransmissions following the
411	       conventional RTO recovery algorithm.

413	    2) Wait until the acknowledgment of the data retransmitted due to
414	       the timeout arrives at the sender.  If duplicate ACKs arrive
415	       before the cumulative acknowledgment for retransmitted data,
416	       adjust the scoreboard according to the incoming SACK information.
417	       Stay in step 2 and wait for the next new acknowledgment. If RTO
418	       expires again, go to step 1 of the algorithm. When a new
419	       acknowledgment arrives, set variable "RecoveryPoint" to indicate
420	       the highest sequence number transmitted so far.

422	       a) If the Cumulative Acknowledgement field covers "RecoveryPoint"
423	          but not more than "RecoveryPoint", revert to the conventional
424	          RTO recovery and set the congestion window to no more than 2 *
425	          MSS, like a regular TCP would do. Do not enter step 3 of this
426	          algorithm.

428	       b) Else, if the Cumulative Acknowledgement field does not cover
429	          "RecoveryPoint" but is larger than SND.UNA, transmit up to two
430	          new (previously unsent) segments and proceed to step 3.  If
431	          the TCP sender is not able to transmit any previously unsent
432	          data -- either due to receiver window limitation or because it
433	          does not have any new data to send -- the recommended action
434	          is to refrain from entering step 3 of this algorithm.  Rather,
435	          continue with slow start retransmissions following the
436	          conventional RTO recovery algorithm.

438	          It is also possible to apply some of the alternatives for
439	          handling window-limited cases discussed in Appendix A.

441	    3) The next acknowledgment arrives at the sender.  Either a
442	       duplicate ACK or a new cumulative ACK (advancing the window)
443	       applies in this step. Other types of ACKs are ignored without any
444	       action.

446	       a) If the Cumulative Acknowledgement field or a SACK block covers
447	          more than "RecoveryPoint", set the congestion window to no
448	          more than 3 * MSS and proceed with the conventional RTO
449	          recovery, retransmitting unacknowledged segments.  Take this
450	          branch also when the acknowledgment is a duplicate ACK and it
451	          does not acknowledge any new, previously unacknowledged data
452	          below "RecoveryPoint" in the SACK blocks.  Leave
453	          SpuriousRecovery set to FALSE.

455	       b) If the Cumulative Acknowledgement field or a SACK block in the
456	          ACK does not cover more than "RecoveryPoint" AND it
457	          acknowledges data that was not acknowledged earlier (either
458	          with cumulative acknowledgment or using SACK blocks), declare
459	          the timeout spurious and set SpuriousRecovery to SPUR_TO.  The
460	          retransmission timeout can be declared spurious, because the
461	          segment acknowledged with this ACK was transmitted before the
462	          timeout.

464	    If there are unacknowledged holes between the received SACK blocks,
465	    those segments are retransmitted similarly to the conventional SACK
466	    recovery algorithm [BAFW03].  If the algorithm exits with
467	    SpuriousRecovery set to SPUR_TO, "RecoveryPoint" is set to SND.UNA,
468	    thus allowing fast recovery on incoming duplicate acknowledgments.
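
The decision made in step 3 can be sketched in a few lines of Python.
This is only an illustration under stated assumptions, not part of the
algorithm specification: sequence numbers are plain integers without
wraparound, "RecoveryPoint" is treated as an exclusive upper bound of
the data sent before the timeout, the caller has already decided
whether the ACK qualifies as a duplicate ACK, and all names
(frto_sack_step3, has_new_bytes, known_sacked) are hypothetical.

```python
from typing import List, Tuple

Range = Tuple[int, int]  # half-open byte range [left, right)

CONVENTIONAL = "conventional_rto_recovery"  # branch 3a
SPURIOUS = "SPUR_TO"                        # branch 3b
IGNORE = "ignore"                           # neither duplicate nor advancing


def has_new_bytes(block: Range, known: List[Range], low: int, high: int) -> bool:
    """True if block, clipped to [low, high), holds a byte outside `known`."""
    pos, end = max(block[0], low), min(block[1], high)
    for k_left, k_right in sorted(known):
        if pos >= end:
            break
        if k_left > pos:
            return True  # the byte at `pos` was never SACKed before
        pos = max(pos, k_right)
    return pos < end


def frto_sack_step3(cum_ack: int, sack_blocks: List[Range],
                    is_duplicate_ack: bool, snd_una: int,
                    recovery_point: int, known_sacked: List[Range]) -> str:
    """Classify the ACK that arrives in step 3 (assumed semantics, see text)."""
    advances = cum_ack > snd_una
    if not advances and not is_duplicate_ack:
        return IGNORE  # other types of ACKs cause no action
    # 3a: the ACK covers data sent after the timeout (above RecoveryPoint).
    if cum_ack > recovery_point or any(r > recovery_point for _, r in sack_blocks):
        return CONVENTIONAL
    # 3b: new data below RecoveryPoint is acknowledged, cumulatively or by SACK.
    new_sack = any(
        has_new_bytes(b, known_sacked, max(cum_ack, snd_una), recovery_point)
        for b in sack_blocks)
    if advances or new_sack:
        return SPURIOUS
    # 3a, second case: duplicate ACK carrying no new information.
    return CONVENTIONAL
```

In the CONVENTIONAL case the caller would additionally limit the
congestion window to no more than 3 * MSS, as step 3a requires.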

470	    The SACK-enhanced algorithm works on the same principle as the basic
471	    algorithm, but utilizes the additional information from the SACK
472	    option. When a genuine retransmission timeout occurs during a steady
473	    state of a connection, it can be assumed that there are no segments
474	    left in the pipe. Otherwise, the acknowledgments triggered by these
475	    segments would have triggered the SACK loss recovery or transmission
476	    of new segments. Therefore, if the F-RTO sender receives
477	    acknowledgements for segments transmitted before the retransmission
478	    timeout in response to the two new segments sent in algorithm
479	    step 2, the normal operation of TCP has merely been delayed, and the
480	    retransmission timeout is considered spurious. Note that this
481	    reasoning works only when the TCP sender is not in loss recovery at
482	    the time the retransmission timeout occurs. The condition in step 1
483	    checking that "RecoveryPoint" is larger than SND.UNA prevents the
484	    execution of the F-RTO algorithm in case a previous loss recovery,
485	    either RTO recovery or SACK loss recovery, is underway when the
486	    retransmission timer expires. It does, however, allow the execution
487	    of the F-RTO algorithm if the retransmission timer expires multiple
488	    times for the same segment.

490	4.  Taking Actions after Detecting Spurious RTO

492	    Upon a retransmission timeout, a conventional TCP sender assumes
493	    that outstanding segments are lost and starts retransmitting the
494	    unacknowledged segments.  When the retransmission timeout is
495	    detected to be spurious, the TCP sender should not continue
496	    retransmitting based on the timeout.  For example, if the sender was
497	    in congestion avoidance phase transmitting new, previously unsent
498	    segments, it should continue transmitting previously unsent segments
499	    in congestion avoidance.

501	    There are currently two alternatives specified for a spurious
502	    timeout response algorithm: the Eifel Response Algorithm [LG04] and
503	    an algorithm for adapting the retransmission timeout after a
504	    spurious RTO [BBA06]. If no specific response algorithm is
505	    implemented, the TCP sender SHOULD respond to a spurious timeout
506	    conservatively, applying the TCP congestion control specification
507	    [APS99]. Different response algorithms for spurious retransmission
508	    timeouts have been analyzed in some research papers [GL03, Sar03]
509	    and IETF documents [SL03].

511	5.  Evaluation of RFC 4138 and Differences to this Document

513	    F-RTO was first specified in Experimental RFC 4138, which has been
514	    implemented in a number of operating systems since it was published.
515	    The experience gained has been documented in a separate document
516	    [KYHS07] and can be summarized as follows.

518	    If the TCP sender employs F-RTO, it is able to detect spurious RTOs
519	    and avoid the unnecessary retransmission of the whole window of
520	    data. Because F-RTO avoids the unnecessary retransmissions after a
521	    spurious RTO, it is able to adhere to the packet conservation
522	    principle, unlike a regular TCP that enters the slow-start recovery
523	    unnecessarily and inappropriately restarts the ACK clock while there
524	    are segments outstanding in the network. When a spurious RTO has
525	    been detected, a sender can select an appropriate congestion control
526	    response instead of setting the congestion window to one segment.
527	    Because F-RTO avoids unnecessary retransmissions, it is able to take
528	    the RTT of the delayed segments into account when calculating the
529	    RTO estimate, which may help in avoiding further spurious
530	    retransmission timeouts.

532	    Experimental results with the basic F-RTO have been reported in an
533	    emulated network using a Linux implementation [SKR03]. Different
534	    congestion control responses, along with the SACK-enhanced version
535	    of F-RTO, were also tested in a similar environment [Sar03]. There
536	    are publications analyzing F-RTO performance over commercial W-CDMA
537	    networks and in an emulated HSDPA network [Yam05, Hok05]. Microsoft
538	    also reported positive experiences with their implementation of
539	    F-RTO at the IETF-68 meeting.

541	    It is known that some spurious RTOs may remain undetected by F-RTO
542	    if duplicate acknowledgements arrive at the sender immediately after
543	    the spurious RTO, for example due to packet reordering or packet
544	    loss. There are rare corner cases where F-RTO could "hide" a packet
545	    loss and therefore lead to inappropriate behavior with a
546	    non-conservative congestion control response. First, if massive
547	    packet reordering occurred so that the acknowledgement of the RTO
548	    retransmission arrived at the sender before the acknowledgments of
549	    the original transmissions, the sender might not detect the loss of
550	    the segment that triggered the RTO. Second, a malicious receiver
551	    could lead F-RTO to make a wrong conclusion after an RTO by
552	    acknowledging segments it has not received. Such a receiver would,
553	    however, risk breaking the consistency of the TCP state between the
554	    sender and receiver, causing the connection to become unusable,
555	    which cannot be of any benefit to the receiver. Therefore, we
556	    believe it is not likely that receivers would start employing such
557	    tricks on a significant scale. Finally, loss of the unnecessary RTO
558	    retransmission cannot be detected without using some explicit
559	    acknowledgement scheme such as DSACK. This is common to the other
560	    mechanisms for detecting spurious RTO, as well as to regular TCP
561	    that does not use DSACK. We note that if the congestion control
562	    response to spurious RTO is conservative enough, the above corner
563	    cases do not cause problems due to increased congestion.

565	6.  Security Considerations

567	    The main security threat regarding F-RTO is the possibility that a
568	    receiver could mislead the sender into setting too large a
569	    congestion window after an RTO.  There are two possible ways a
570	    malicious receiver could trigger a wrong output from the F-RTO
571	    algorithm.  First, the receiver can acknowledge data that it has not
572	    received.  Second, it can delay acknowledgment of a segment it has
573	    received earlier, and acknowledge the segment after the TCP sender
574	    has been deluded into entering algorithm step 3.

576	    If the receiver acknowledges a segment it has not really received,
577	    the sender can be led to declare spurious timeout in the F-RTO
578	    algorithm, step 3.  However, because the sender will have an
579	    incorrect state, it cannot retransmit the segment that has never
580	    reached the receiver.  Therefore, this attack is unlikely to be
581	    useful for the receiver to maliciously gain a larger congestion
582	    window.

584	    A common case for a retransmission timeout is that a fast
585	    retransmission of a segment is lost.  If all other segments have
586	    been received, the RTO retransmission causes the whole window to be
587	    acknowledged at once.  This case is recognized in F-RTO algorithm
588	    branch (2a).  However, if the receiver only acknowledges one segment
589	    after receiving the RTO retransmission, and then the rest of the
590	    segments, it could cause the timeout to be declared spurious when it
591	    is not.  Therefore, it is suggested that, when an RTO expires during
592	    the fast recovery phase, the sender would not fully revert the
593	    congestion window even if the timeout was declared spurious.
594	    Instead, the sender would reduce the congestion window to 1.
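
The suggestion above can be sketched as a small Python function. It is
a minimal illustration only, and it rests on two assumptions that the
text does not state: the value "1" is read as one full-sized segment
(one MSS), and restoring the pre-timeout congestion window stands in
for whatever response algorithm the sender otherwise applies. The
names cwnd_after_spurious_rto and rto_during_fast_recovery are
hypothetical.

```python
def cwnd_after_spurious_rto(cwnd_before_rto: int, mss: int,
                            rto_during_fast_recovery: bool) -> int:
    """Pick the congestion window (in bytes) once a timeout is declared spurious."""
    if rto_during_fast_recovery:
        # A receiver could have staged its ACKs to fake a spurious timeout,
        # so do not revert fully; fall back to one segment (assumed unit).
        return 1 * mss
    # Placeholder for the sender's chosen response algorithm (assumption):
    # here, a full revert to the window in use before the timeout.
    return cwnd_before_rto
```

For example, with a pre-timeout window of ten 1000-octet segments, a
timeout that expired during fast recovery leaves the sender with a
window of 1000 octets even if the timeout is declared spurious.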

596	    If there is more than one segment missing at the time of a
597	    retransmission timeout, the receiver does not benefit from
598	    misleading the sender to declare a spurious timeout because the
599	    sender would have to go through another recovery period to
600	    retransmit the missing segments, usually after an RTO has elapsed.

602	7.  Acknowledgements

604	    The authors would like to thank Alfred Hoenes, Ilpo Jarvinen and
605	    Murari Sridharan for their comments on this document.

607	    We are also thankful to Reiner Ludwig, Andrei Gurtov, Josh Blanton,
608	    Mark Allman, Sally Floyd, Yogesh Swami, Mika Liljeberg, Ivan Arias
609	    Rodriguez, Sourabh Ladha, Martin Duke, Motoharu Miyake, Ted Faber,
610	    Samu Kontinen, and Kostas Pentikousis who gave valuable feedback
611	    during the preparation of RFC 4138, the precursor of this document.

613	Appendix

615	A.  Discussion of Window-Limited Cases

617	    When the advertised window limits the transmission of two new
618	    previously unsent segments, or there are no new data to send, it is
619	    recommended in F-RTO algorithm step (2b) that the TCP sender
620	    continue with the conventional RTO recovery algorithm.  The
621	    disadvantage is that the sender may continue unnecessary
622	    retransmissions due to a possible spurious timeout.  This section
623	    briefly discusses the options that can potentially improve
624	    performance when transmitting previously unsent data is not
625	    possible.

627	    - The TCP sender could reserve an unused space of a size of one or
628	      two segments in the advertised window to ensure the use of
629	      algorithms such as F-RTO or Limited Transmit [ABF01] in receiver
630	      window-limited situations.  On the other hand, while doing this,
631	      the TCP sender should ensure that the window of outstanding
632	      segments is large enough for proper utilization of the available
633	      pipe.

635	    - Use additional information if available, e.g., TCP timestamps with
636	      the Eifel Detection algorithm, for detecting a spurious timeout.
637	      However, Eifel detection may yield different results from F-RTO
638	      when ACK losses and an RTO occur within the same round-trip time
639	      [SKR03].

641	    - Retransmit data from the tail of the retransmission queue and
642	      continue with step 3 of the F-RTO algorithm.  It is possible that
643	      the retransmission will be made unnecessarily. Furthermore, the
644	      operation of the SACK-based F-RTO algorithm would need to consider
645	      this case separately, to not use the retransmitted segment to
646	      indicate spurious timeout. Given these considerations, this option
647	      is not recommended.

649	    - Send a zero-sized segment below SND.UNA, similar to a TCP Keep-
650	      Alive probe, and continue with step 3 of the F-RTO algorithm.
651	      Because the receiver replies with a duplicate ACK, the sender is
652	      able to detect whether the timeout was spurious from the incoming
653	      acknowledgment. This method does not send data unnecessarily, but
654	      it delays the recovery by one round-trip time in cases where the
655	      timeout was not spurious.  Therefore, this method is not
656	      encouraged.

658	    - In receiver-limited cases, send one octet of new data, regardless
659	      of the advertised window limit, and continue with step 3 of the F-
660	      RTO algorithm.  It is possible that the receiver will have free
661	      buffer space to receive the data by the time the segment has
662	      propagated through the network, in which case no harm is done.  If
663	      the receiver is not capable of receiving the segment, it rejects
664	      the segment and sends a duplicate ACK.
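
The window-limited condition discussed in this appendix can be made
concrete with a short Python sketch. It is illustrative only and makes
several assumptions: sequence numbers are plain integers without
wraparound, the usable window is SND.UNA plus the advertised window
minus SND.NXT, at most two new segments are sent, and a partial
segment counts as a segment. The function name and parameters are
hypothetical.

```python
def new_segments_for_step2(snd_una: int, snd_nxt: int, rcv_wnd: int,
                           unsent_bytes: int, mss: int,
                           max_segments: int = 2) -> int:
    """Return how many new, previously unsent segments can be sent (0..max)."""
    # Octets the advertised window still permits beyond SND.NXT.
    window_room = max(0, snd_una + rcv_wnd - snd_nxt)
    # New data is limited both by the window and by what the application queued.
    sendable = min(window_room, unsent_bytes)
    # Ceiling division: a short final segment still counts as one segment.
    return min(max_segments, -(-sendable // mss))
```

A result of zero corresponds to the window-limited case: the sender
either continues with the conventional RTO recovery or applies one of
the alternatives listed above.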

666	B.  List of Changes

668	    Changes between different document versions are summarized below,
669	    apart from minor editing and language improvements.

671	    Changes from draft-ietf-tcpm-rfc4138bis-01:

673	    * Modified the basic F-RTO algorithm and SACK-enhanced F-RTO
674	    algorithm to prevent the TCP sender from applying F-RTO algorithm if
675	    retransmission timer expires when an earlier RTO recovery is
676	    underway, except when RTO expires multiple times for the same
677	    segment.

679	    Changes from draft-ietf-tcpm-rfc4138bis-00:

681	    * Added back the original SACK-algorithm from RFC 4138 after the
682	    common feedback to have the SACK-algorithm in the document.
683	    Clarified the algorithm a bit, and added one paragraph of
684	    description of the basic idea of the algorithm.

686	    * Clarified behavior on multiple timeouts.

688	    * Added a paragraph on acknowledgements that do not acknowledge new
689	    data but are not duplicate acknowledgements

691	    Changes from RFC 4138:

693	    * Removed description of the SACK-enhanced algorithm

695	    * Removed SCTP considerations

697	    * Removed earlier Appendix sections, except Appendix C from RFC
698	    4138, which is now Appendix A

700	    * Clarified text about the possible response algorithms
701	    * Added section that summarizes the evaluation of RFC 4138

703	References

705	Normative References

707	    [APS99]   Allman, M., Paxson, V., and W. Stevens, "TCP Congestion
708	              Control",  RFC 2581, April 1999.

710	    [APB07]   Allman, M., Paxson, V., and E. Blanton, "TCP Congestion
711	              Control", Internet-Draft
712	              "draft-ietf-tcpm-rfc2581bis-03.txt",
713	              September 2007.

715	    [BAFW03]  Blanton, E., Allman, M., Fall, K., and L. Wang, "A
716	              Conservative Selective Acknowledgment (SACK)-based Loss
717	              Recovery Algorithm for TCP", RFC 3517, April 2003.

719	    [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate
720	              Requirement Levels", BCP 14, RFC 2119, March 1997.

722	    [FHG04]   Floyd, S., Henderson, T., and A. Gurtov, "The NewReno
723	              Modification to TCP's Fast Recovery Algorithm", RFC 3782,
724	              April 2004.

726	    [MMFR96]  Mathis, M., Mahdavi, J., Floyd, S., and A. Romanow, "TCP
727	              Selective Acknowledgement Options", RFC 2018,
728	              October 1996.

730	    [PA00]    Paxson, V. and M. Allman, "Computing TCP's Retransmission
731	              Timer", RFC 2988, November 2000.

733	    [Pos81]   Postel, J., "Transmission Control Protocol", STD 7, RFC
734	              793, September 1981.

736	Informative References

738	    [ABF01]   Allman, M., Balakrishnan, H., and S. Floyd, "Enhancing
739	              TCP's Loss Recovery Using Limited Transmit", RFC 3042,
740	              January 2001.

742	    [BA04]    Blanton, E. and M. Allman, "Using TCP Duplicate Selective
743	              Acknowledgement (DSACKs) and Stream Control Transmission
744	              Protocol (SCTP) Duplicate Transmission Sequence Numbers
745	              (TSNs) to Detect Spurious Retransmissions", RFC 3708,
746	              February 2004.

748	    [BBA06]   J. Blanton, E. Blanton, and M. Allman. Using Spurious
749	              Retransmissions to Adapt the Retransmission Timeout,
750	              Internet-Draft "draft-allman-rto-backoff-04.txt", December
751	              2006. Work in progress.

753	    [BBJ92]   Jacobson, V., Braden, R., and D. Borman, "TCP Extensions
754	              for High Performance", RFC 1323, May 1992.

756	    [FMMP00]  Floyd, S., Mahdavi, J., Mathis, M., and M. Podolsky, "An
757	              Extension to the Selective Acknowledgement (SACK) Option
758	              for TCP", RFC 2883, July 2000.

760	    [GL02]    A. Gurtov and R. Ludwig.  Evaluating the Eifel Algorithm
761	              for TCP in a GPRS Network.  In Proc. of European Wireless,
762	              Florence, Italy, February 2002.

764	    [GL03]    A. Gurtov and R. Ludwig.  Responding to Spurious Timeouts
765	              in TCP.  In Proceedings of IEEE INFOCOM 03, San Francisco,
766	              CA, USA, March 2003.

768	    [Jac88]   V. Jacobson. Congestion Avoidance and Control.  In
769	              Proceedings of ACM SIGCOMM 88.

771	    [Hok05]   A. Hokamura, et al. "Performance Evaluation of F-RTO and
772	              Eifel Response Algorithms over W-CDMA packet network".
773	              Wireless Personal Multimedia Communications (WPMC'05),
774	              Sept. 2005.

776	    [KYHS07]  M. Kojo, K. Yamamoto, M. Hata, and P. Sarolahti.
777	              Evaluation of RFC 4138. Internet-draft
778	              "draft-kojo-tcpm-frto-eval-00.txt", June 2007. Work
779	              in progress.

781	    [LG04]    Ludwig, R. and A. Gurtov, "The Eifel Response Algorithm
782	              for TCP", RFC 4015, February 2005.

784	    [LK00]    R. Ludwig and R.H. Katz.  The Eifel Algorithm: Making TCP
785	              Robust Against Spurious Retransmissions.  ACM SIGCOMM
786	              Computer Communication Review, 30(1), January 2000.

788	    [LM03]    Ludwig, R. and M. Meyer, "The Eifel Detection Algorithm
789	              for TCP", RFC 3522, April 2003.

791	    [Nag84]   Nagle, J., "Congestion Control in IP/TCP Internetworks",
792	              RFC 896, January 1984.

794	    [SK05]    P. Sarolahti and M. Kojo, "Forward RTO-Recovery (F-RTO):
795	              An Algorithm for Detecting Spurious Retransmission
796	              Timeouts with TCP and the Stream Control Transmission
797	              Protocol (SCTP)", RFC 4138, August 2005.

799	    [SKR03]   P. Sarolahti, M. Kojo, and K. Raatikainen. F-RTO: An
800	              Enhanced Recovery Algorithm for TCP Retransmission
801	              Timeouts.  ACM SIGCOMM Computer Communication Review,
802	              33(2), April 2003.

804	    [Sar03]   P. Sarolahti.  Congestion Control on Spurious TCP
805	              Retransmission Timeouts.  In Proceedings of IEEE Globecom
806	              2003, San Francisco, CA, USA. December 2003.

808	    [SL03]    Y. Swami and K. Le, "DCLOR: De-correlated Loss Recovery
809	              using SACK Option for Spurious Timeouts", Expired
810	              Internet-Draft, September 2003.

812	    [Ste00]   R. Stewart, et al. Stream Control Transmission Protocol,
813	              RFC 2960, October 2000.

815	    [Yam05]   K. Yamamoto, et al. "Effects of F-RTO and Eifel Response
816	              Algorithms for W-CDMA and HSDPA networks". Wireless
817	              Personal Multimedia Communications (WPMC'05),
818	              Sept. 2005.

820	AUTHORS' ADDRESSES

822	    Pasi Sarolahti
823	    Nokia Research Center
824	    P.O. Box 407
825	    FI-00045 NOKIA GROUP
826	    Finland
827	    Phone: +358 50 4876607
828	    Email: pasi.sarolahti@nokia.com

830	    Markku Kojo
831	    University of Helsinki
832	    P.O. Box 68
833	    FI-00014 UNIVERSITY OF HELSINKI
834	    Finland
835	    Email: kojo@cs.helsinki.fi

837	    Kazunori Yamamoto
838	    NTT Docomo, Inc.
840	    3-5 Hikarinooka, Yokosuka, Kanagawa, 239-8536, Japan
841	    Phone: +81-46-840-3812
842	    Email: yamamotokaz@nttdocomo.co.jp

844	    Max Hata
845	    NTT Docomo, Inc.
846	    3-5 Hikarinooka, Yokosuka, Kanagawa, 239-8536, Japan
847	    Phone: +81-46-840-3812
848	    Email: hatama@s1.nttdocomo.co.jp

850	Full Copyright Statement

852	    Copyright (C) The IETF Trust (2007).

854	    This document is subject to the rights, licenses and restrictions
855	    contained in BCP 78, and except as set forth therein, the authors
856	    retain all their rights.

858	    This document and the information contained herein are provided on
859	    an "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE
860	    REPRESENTS OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY, THE
861	    IETF TRUST AND THE INTERNET ENGINEERING TASK FORCE DISCLAIM ALL
862	    WARRANTIES, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY
863	    WARRANTY THAT THE USE OF THE INFORMATION HEREIN WILL NOT INFRINGE
864	    ANY RIGHTS OR ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS
865	    FOR A PARTICULAR PURPOSE.

867	Intellectual Property

869	    The IETF takes no position regarding the validity or scope of any
870	    Intellectual Property Rights or other rights that might be claimed
871	    to pertain to the implementation or use of the technology described
872	    in this document or the extent to which any license under such
873	    rights might or might not be available; nor does it represent that
874	    it has made any independent effort to identify any such rights.
875	    Information on the procedures with respect to rights in RFC
876	    documents can be found in BCP 78 and BCP 79.

878	    Copies of IPR disclosures made to the IETF Secretariat and any
879	    assurances of licenses to be made available, or the result of an
880	    attempt made to obtain a general license or permission for the use
881	    of such proprietary rights by implementers or users of this
882	    specification can be obtained from the IETF on-line IPR repository
883	    at http://www.ietf.org/ipr.

885	    The IETF invites any interested party to bring to its attention any
886	    copyrights, patents or patent applications, or other proprietary
887	    rights that may cover technology that may be required to implement
888	    this standard.  Please address the information to the IETF at ietf-
889	    ipr@ietf.org.