idnits 2.17.1 

draft-ietf-tcpm-rfc4138bis-01.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

  ** It looks like you're using RFC 3978 boilerplate.  You should update this
     to the boilerplate described in the IETF Trust License Policy document
     (see https://trustee.ietf.org/license-info), which is required now.

  -- Found old boilerplate from RFC 3978, Section 5.1 on line 19.

  -- Found old boilerplate from RFC 3978, Section 5.5, updated by RFC 4748 on
     line 830.

  -- Found old boilerplate from RFC 3979, Section 5, paragraph 1 on line 841.

  -- Found old boilerplate from RFC 3979, Section 5, paragraph 2 on line 848.

  -- Found old boilerplate from RFC 3979, Section 5, paragraph 3 on line 854.


  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

  == No 'Intended status' indicated for this document; assuming Proposed
     Standard


  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

  ** The document seems to lack an IANA Considerations section.  (See Section
     2.2 of https://www.ietf.org/id-info/checklist for how to handle the case
     when there are no actions for IANA.)


  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == The copyright year in the IETF Trust Copyright Line does not match the
     current year

  -- The document seems to lack a disclaimer for pre-RFC5378 work, but may
     have content which was first submitted before 10 November 2008.  If you
     have contacted all the original authors and they are all willing to grant
     the BCP78 rights to the IETF Trust, then this is fine, and you can ignore
     this comment.  If not, you may need to add the pre-RFC5378 disclaimer. 
     (See the Legal Provisions document at
     https://trustee.ietf.org/license-info for more information.)

  -- The document date (18 November 2007) is 6002 days in the past.  Is this
     intentional?


  Checking references for intended status: Proposed Standard
  ----------------------------------------------------------------------------

     (See RFCs 3967 and 4897 for information about using normative references
     to lower-maturity documents in RFCs)

  ** Obsolete normative reference: RFC 2581 (ref. 'APS99') (Obsoleted by RFC
     5681)

  -- Duplicate reference: RFC2581, mentioned in 'APB07', was also mentioned
     in 'APS99'.

  ** Obsolete normative reference: RFC 2581 (ref. 'APB07') (Obsoleted by RFC
     5681)

  ** Obsolete normative reference: RFC 3517 (ref. 'BAFW03') (Obsoleted by RFC
     6675)

  ** Obsolete normative reference: RFC 3782 (ref. 'FHG04') (Obsoleted by RFC
     6582)

  ** Obsolete normative reference: RFC 2988 (ref. 'PA00') (Obsoleted by RFC
     6298)

  ** Obsolete normative reference: RFC  793 (ref. 'Pos81') (Obsoleted by RFC
     9293)

  -- Obsolete informational reference (is this intentional?): RFC 1323 (ref.
     'BBJ92') (Obsoleted by RFC 7323)

  -- Obsolete informational reference (is this intentional?): RFC  896 (ref.
     'Nag84') (Obsoleted by RFC 7805)

  -- Duplicate reference: RFC4138, mentioned in 'SK05', was also mentioned in
     'KYHS07'.

  -- Obsolete informational reference (is this intentional?): RFC 2960 (ref.
     'Ste00') (Obsoleted by RFC 4960)


     Summary: 8 errors (**), 0 flaws (~~), 2 warnings (==), 12 comments (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.

--------------------------------------------------------------------------------

1	Internet Engineering Task Force                             P. Sarolahti
2	INTERNET-DRAFT                                     Nokia Research Center
3	draft-ietf-tcpm-rfc4138bis-01.txt                                M. Kojo
4	Expires: May 2008                                 University of Helsinki
5	                                                             K. Yamamoto
6	                                                                 M. Hata
7	                                                              NTT Docomo

9	                                                        18 November 2007

11	        Forward RTO-Recovery (F-RTO): An Algorithm for Detecting
12	               Spurious Retransmission Timeouts with TCP

14	Status of this Memo

16	    By submitting this Internet-Draft, each author represents that any
17	    applicable patent or other IPR claims of which he or she is aware
18	    have been or will be disclosed, and any of which he or she becomes
19	    aware will be disclosed, in accordance with Section 6 of BCP 79.

21	    Internet-Drafts are working documents of the Internet Engineering
22	    Task Force (IETF), its areas, and its working groups.  Note that
23	    other groups may also distribute working documents as Internet-
24	    Drafts.

26	    Internet-Drafts are draft documents valid for a maximum of six
27	    months and may be updated, replaced, or obsoleted by other documents
28	    at any time.  It is inappropriate to use Internet-Drafts as
29	    reference material or to cite them other than as "work in progress."

31	    The list of current Internet-Drafts can be accessed at
32	    http://www.ietf.org/ietf/1id-abstracts.txt.

34	    The list of Internet-Draft Shadow Directories can be accessed at
35	    http://www.ietf.org/shadow.html.

37	    This Internet-Draft will expire on May 2008.

39	Abstract

41	    Spurious retransmission timeouts cause suboptimal TCP performance
42	    because they often result in unnecessary retransmission of the last
43	    window of data.  This document describes the F-RTO detection
44	    algorithm for detecting spurious TCP retransmission timeouts.  F-RTO
45	    is a TCP sender-only algorithm that does not require any TCP options
46	    to operate.  After retransmitting the first unacknowledged segment
47	    triggered by a timeout, the F-RTO algorithm of the TCP sender
48	    monitors the incoming acknowledgments to determine whether the
49	    timeout was spurious.  It then decides whether to send new segments
50	    or retransmit unacknowledged segments.  The algorithm effectively
51	    helps to avoid additional unnecessary retransmissions and thereby
52	    improves TCP performance in the case of a spurious timeout.

54	                             Table of Contents

56	    1. Introduction
57	       1.1. Conventions and Terminology
58	    2. Basic F-RTO Algorithm
59	       2.1. The Algorithm
60	       2.2. Discussion
61	    3. SACK-Enhanced Version of the F-RTO Algorithm
62	    4. Taking Actions after Detecting Spurious RTO
63	    5. Evaluation of RFC 4138 and Differences to this
64	       Document
65	    6. Security Considerations
66	    7. Acknowledgements
67	    Appendix
68	    A. Discussion of Window-Limited Cases
69	    B. List of Changes
70	    References
71	    Normative References
72	    Informative References
73	    AUTHORS' ADDRESSES
74	    Full Copyright Statement
75	    Intellectual Property

77	1.  Introduction

79	    The Transmission Control Protocol (TCP) [Pos81] has two methods for
80	    triggering retransmissions.  First, the TCP sender relies on
81	    incoming duplicate ACKs, which indicate that the receiver is missing
82	    some of the data.  After a required number of successive duplicate
83	    ACKs have arrived at the sender, it retransmits the first
84	    unacknowledged segment [APS99] and continues with a loss recovery
85	    algorithm such as NewReno [FHG04] or SACK-based loss recovery
86	    [BAFW03].  Second, the TCP sender maintains a retransmission timer
87	    which triggers retransmission of segments, if they have not been
88	    acknowledged before the retransmission timeout (RTO) expires.  When
89	    the retransmission timeout occurs, the TCP sender enters the RTO
90	    recovery where the congestion window is initialized to one segment
91	    and unacknowledged segments are retransmitted using the slow-start
92	    algorithm.  The retransmission timer is adjusted dynamically, based
93	    on the measured round-trip times [PA00].

95	    It has been pointed out that the retransmission timer can expire
96	    spuriously and cause unnecessary retransmissions when no segments
97	    have been lost [LK00, GL02, LM03].  After a spurious retransmission
98	    timeout, the late acknowledgments of the original segments arrive at
99	    the sender, usually triggering unnecessary retransmissions of a
100	    whole window of segments during the RTO recovery.  Furthermore,
101	    after a spurious retransmission timeout, a conventional TCP sender
102	    increases the congestion window on each late acknowledgment in slow
103	    start.  This injects a large number of data segments into the
104	    network within one round-trip time, thus violating the packet
105	    conservation principle [Jac88].

107	    There are a number of potential reasons for spurious retransmission
108	    timeouts.  First, some mobile networking technologies involve sudden
109	    delay spikes on transmission because of actions taken during a hand-
110	    off.  Second, a hand-off may take place from a low latency path to a
111	    high latency path, suddenly increasing the round-trip time beyond
112	    the current RTO value.  Third, on a low-bandwidth link the arrival
113	    of competing traffic (possibly with higher priority), or some other
114	    change in available bandwidth, can cause a sudden increase of the
115	    round-trip time.  This may trigger a spurious retransmission
116	    timeout.  A persistently reliable link layer can also cause a sudden
117	    delay when a data frame and several retransmissions of it are lost
118	    for some reason.  This document does not distinguish between the
119	    different causes of such a delay spike.  Rather, it discusses the
120	    spurious retransmission timeouts caused by a delay spike in general.

122	    This document describes the F-RTO detection algorithm.  It is based
123	    on the detection mechanism of the "Forward RTO-Recovery" (F-RTO)
124	    algorithm [SKR03] that is used for detecting spurious retransmission
125	    timeouts and thus avoids unnecessary retransmissions following the
126	    retransmission timeout.  When the timeout is not spurious, the F-RTO
127	    algorithm reverts to the conventional RTO recovery algorithm,
128	    and therefore has similar behavior and performance.  In contrast to
129	    alternative algorithms proposed for detecting unnecessary
130	    retransmissions (Eifel [LK00], [LM03] and DSACK-based algorithms
131	    [BA04]), F-RTO does not require any TCP options for its operation,
132	    and it can be implemented by modifying only the TCP sender.  The
133	    Eifel algorithm uses TCP timestamps [BBJ92] for detecting a spurious
134	    timeout upon arrival of the first acknowledgment after the
135	    retransmission.  The DSACK-based algorithms require that the TCP
136	    Selective Acknowledgment Option [MMFR96], with the DSACK extension
137	    [FMMP00], is in use.  With DSACK, the TCP receiver can report if it
138	    has received a duplicate segment, enabling the sender to detect
139	    afterwards whether it has retransmitted segments unnecessarily.  The
140	    F-RTO algorithm only attempts to detect and avoid unnecessary
141	    retransmissions after an RTO.  Eifel and DSACK can also be used for
142	    detecting unnecessary retransmissions caused by other events, such
143	    as packet reordering.

145	    When an RTO expires, the F-RTO sender retransmits the first
146	    unacknowledged segment as usual [APS99].  Deviating from the normal
147	    operation after a timeout, it then tries to transmit new, previously
148	    unsent data for the first acknowledgment that arrives after the
149	    timeout, given that the acknowledgment advances the window.  If the
150	    second acknowledgment that arrives after the timeout advances the
151	    window (i.e., acknowledges data that was not retransmitted), the F-
152	    RTO sender declares the timeout spurious and exits the RTO recovery.
153	    However, if either of these two acknowledgments is a duplicate ACK,
154	    there will not be sufficient evidence of a spurious timeout.
155	    Therefore, the F-RTO sender retransmits the unacknowledged segments
156	    in slow start similarly to the traditional algorithm.

158	    With a SACK-enhanced version of the F-RTO algorithm, spurious
159	    timeouts may be detected even if duplicate ACKs arrive after an RTO
160	    retransmission.  Even though this document specifies the F-RTO
161	    algorithm only for TCP, the algorithm can also be applied to the
162	    Stream Control Transmission Protocol (SCTP) [Ste00], which has
163	    acknowledgment and packet retransmission concepts similar to TCP.
164	    Considerations for SCTP are discussed in RFC 4138 [SK05].

166	    This document is organized as follows.  Section 2 describes the
167	    basic F-RTO algorithm, and the SACK-enhanced F-RTO algorithm is
168	    given in Section 3.  Section 4 discusses the possible actions to be
169	    taken after detecting a spurious RTO, Section 5 evaluates RFC 4138,
170	    and Section 6 discusses the security considerations.

172	1.1.  Conventions and Terminology

174	    The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
175	    "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
176	    document are to be interpreted as described in BCP 14, RFC 2119
177	    [RFC2119] and indicate requirement levels for protocols.

179	2.  Basic F-RTO Algorithm

181	    A timeout is considered spurious if it would have been avoided had
182	    the sender waited longer for an acknowledgment to arrive [LM03].  F-
183	    RTO affects the TCP sender behavior only after a retransmission
184	    timeout.  Otherwise, the TCP behavior remains the same.  When the
185	    RTO expires, the F-RTO algorithm monitors incoming acknowledgments
186	    and if the TCP sender gets an acknowledgment for a segment that was
187	    not retransmitted due to timeout, the F-RTO algorithm declares a
188	    timeout spurious.  The actions taken in response to a spurious
189	    timeout are not specified in this document, but we discuss some
190	    alternatives in Section 4.  This section introduces the algorithm
191	    and then discusses the different steps of the algorithm in more
192	    detail.

194	    Following the practice used with the Eifel Detection algorithm
196	    [LM03], we use the "SpuriousRecovery" variable to indicate whether
197	    the retransmission is declared spurious by the sender.  This
198	    variable can be used as an input for a corresponding response
199	    algorithm.  With F-RTO, the value of SpuriousRecovery can be either
200	    SPUR_TO (indicating a spurious retransmission timeout) or FALSE
201	    (indicating that the timeout is not declared spurious and that the
202	    TCP sender should follow the conventional RTO recovery algorithm).

204	2.1.  The Algorithm

206	    A TCP sender implementing the basic F-RTO algorithm MUST take the
207	    following steps after the retransmission timer expires.  If the
208	    retransmission timer expires again during the execution of the F-RTO
209	    algorithm, the TCP sender MUST re-start the algorithm processing
210	    from step 1.  If the sender implements some loss recovery algorithm
211	    other than Reno or NewReno [FHG04], the F-RTO algorithm SHOULD NOT
212	    be entered when earlier fast recovery is underway.

214	    The F-RTO algorithm takes different actions based on whether an
215	    incoming acknowledgement advances the cumulative acknowledgement
216	    point for a received in-order segment, or whether it is a duplicate
217	    acknowledgement to indicate an out-of-order segment. Duplicate
218	    acknowledgement is defined in [APB07]. The F-RTO algorithm does not
219	    specify actions for receiving a segment that does not acknowledge
220	    new data but is not a duplicate acknowledgement. The TCP sender
221	    SHOULD ignore such segments and wait for a segment that either
222	    acknowledges new data or is a duplicate acknowledgment.
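
    The three-way distinction above can be sketched in Python.  This is
    an illustration only, not part of the algorithm: the Segment type,
    the function name, and its parameters are hypothetical.  The
    duplicate ACK test follows the definition in the TCP congestion
    control specification as we understand it (outstanding data, no
    payload, SYN and FIN off, unchanged acknowledgment number, unchanged
    advertised window), and sequence number wraparound is ignored.

```python
from dataclasses import dataclass
from enum import Enum


class AckKind(Enum):
    NEW_DATA = "new_data"    # advances the cumulative acknowledgment point
    DUPLICATE = "duplicate"  # duplicate ACK, hinting at an out-of-order segment
    OTHER = "other"          # neither of the above; the sender SHOULD ignore it


@dataclass
class Segment:
    """Hypothetical view of the header fields needed for classification."""
    ack: int          # Acknowledgment field
    payload_len: int  # bytes of data carried by the segment
    syn: bool
    fin: bool
    window: int       # advertised window


def classify_ack(seg, snd_una, snd_nxt, last_window):
    """Classify an incoming segment for F-RTO (wraparound ignored)."""
    if snd_una < seg.ack <= snd_nxt:
        return AckKind.NEW_DATA
    is_duplicate = (
        snd_nxt > snd_una            # the sender has outstanding data
        and seg.payload_len == 0     # the segment carries no data
        and not seg.syn
        and not seg.fin
        and seg.ack == snd_una       # same as the highest ACK seen so far
        and seg.window == last_window
    )
    return AckKind.DUPLICATE if is_duplicate else AckKind.OTHER
```

    A pure window update, for example, repeats the acknowledgment number
    but changes the advertised window, so it falls into the OTHER class
    and would be ignored by the steps below.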

224	    1) When RTO expires, retransmit the first unacknowledged segment and
225	       set SpuriousRecovery to FALSE.  Also, store the highest sequence
226	       number transmitted so far in variable "recover".

228	    2) When the first acknowledgment after the RTO retransmission
229	       arrives at the TCP sender, the TCP sender chooses one of the
230	       following actions, depending on whether the ACK advances the
231	       window or whether it is a duplicate ACK.

233	       a) If the acknowledgment is a duplicate ACK OR the
234	          Acknowledgement field covers "recover" but not more than
235	          "recover" OR the acknowledgment does not acknowledge all of
236	          the data that was retransmitted in step 1, revert to the
237	          conventional RTO recovery and continue by retransmitting
238	          unacknowledged data in slow start.  Do not enter step 3 of
239	          this algorithm.  The SpuriousRecovery variable remains as
240	          FALSE.

242	       b) Else, if the acknowledgment advances the window AND the
243	          Acknowledgement field does not cover "recover", transmit up to
244	          two new (previously unsent) segments and enter step 3 of this
245	          algorithm. If the TCP sender does not have enough unsent data,
246	          it can send only one segment. In addition, the TCP sender MAY
247	          override the Nagle algorithm [Nag84] and immediately send a
248	          segment if needed. Note that sending two segments in this step
249	          is allowed by TCP congestion control requirements [APS99]: An
250	          F-RTO TCP sender simply chooses different segments to
251	          transmit.

253	          If the TCP sender does not have any new data to send, or the
254	          advertised window prohibits new transmissions, the recommended
255	          action is to skip step 3 of this algorithm and continue with
256	          slow start retransmissions, following the conventional RTO
257	          recovery algorithm.  However, alternative ways of handling the
258	          window-limited cases that could result in better performance
259	          are discussed in Appendix A.

261	    3) When the second acknowledgment after the RTO retransmission
262	       arrives at the TCP sender, the TCP sender either declares the
263	       timeout spurious, or starts retransmitting the unacknowledged
264	       segments.

266	       a) If the acknowledgment is a duplicate ACK, set the congestion
267	          window to no more than 3 * MSS, and continue with the slow
268	          start algorithm retransmitting unacknowledged segments.  The
269	          congestion window can be set to 3 * MSS, because two round-
270	          trip times have elapsed since the RTO, and a conventional TCP
271	          sender would have increased cwnd to 3 * MSS during the same time.
272	          Leave SpuriousRecovery set to FALSE.

274	       b) If the acknowledgment advances the window (i.e., if it
275	          acknowledges data that was not retransmitted after the
276	          timeout), declare the timeout spurious, set SpuriousRecovery
277	          to SPUR_TO, and set the value of the "recover" variable to
278	          SND.UNA (the oldest unacknowledged sequence number [Pos81]).
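
    Steps 1 to 3 can be sketched as a small state machine.  The sketch
    below is a toy model under several assumptions, not a reference
    implementation: all class, method, and attribute names are
    hypothetical; "recover" is stored as the Acknowledgment value that
    would cover everything sent before the timeout; the caller has
    already classified each ACK as duplicate or window-advancing and
    dropped other segments; the advertised window is assumed not to
    limit new transmissions; and sequence number wraparound is ignored.

```python
from enum import Enum


class Step(Enum):
    IDLE = 0          # F-RTO is not running
    FIRST_ACK = 2     # waiting for the first ACK after the RTO (step 2)
    SECOND_ACK = 3    # waiting for the second ACK (step 3)
    CONVENTIONAL = 4  # reverted to conventional RTO recovery


class FrtoSender:
    """Toy model of the basic F-RTO steps; all names are hypothetical."""

    def __init__(self, snd_una, snd_nxt, unsent, mss=1000):
        self.snd_una, self.snd_nxt = snd_una, snd_nxt
        self.unsent = unsent            # bytes of new data not yet sent
        self.mss = mss
        self.step = Step.IDLE
        self.spurious_recovery = False  # FALSE or "SPUR_TO"
        self.recover = None
        self.rexmit_end = None          # end of the data retransmitted in step 1
        self.cwnd_limit = None          # upper bound set by step 3a, if any
        self.sent = []                  # log of (kind, sequence number)

    def on_rto(self):
        # Step 1: retransmit the first unacknowledged segment.
        self.sent.append(("retransmit", self.snd_una))
        self.rexmit_end = min(self.snd_una + self.mss, self.snd_nxt)
        self.spurious_recovery = False
        self.recover = self.snd_nxt
        self.cwnd_limit = None
        self.step = Step.FIRST_ACK

    def on_ack(self, ack, duplicate=False):
        if self.step is Step.FIRST_ACK:
            self._step2(ack, duplicate)
        elif self.step is Step.SECOND_ACK:
            self._step3(ack, duplicate)

    def _step2(self, ack, duplicate):
        if not duplicate:
            self.snd_una = max(self.snd_una, ack)
        partial = ack < self.rexmit_end  # retransmitted data not fully acked
        # Branch 2a, plus the recommended fallback when no new data exists.
        if duplicate or partial or ack >= self.recover or self.unsent == 0:
            self.step = Step.CONVENTIONAL
            return
        # Branch 2b: send up to two new, previously unsent segments.
        for _ in range(2):
            size = min(self.mss, self.unsent)
            if size == 0:
                break
            self.sent.append(("new", self.snd_nxt))
            self.snd_nxt += size
            self.unsent -= size
        self.step = Step.SECOND_ACK

    def _step3(self, ack, duplicate):
        if duplicate:
            # Branch 3a: cwnd is set to no more than 3 * MSS.
            self.cwnd_limit = 3 * self.mss
            self.step = Step.CONVENTIONAL
        else:
            # Branch 3b: the timeout is declared spurious.
            self.snd_una = max(self.snd_una, ack)
            self.spurious_recovery = "SPUR_TO"
            self.recover = self.snd_una
            self.step = Step.IDLE
```

    In this model, calling on_rto() followed by two window-advancing
    ACKs ends with spurious_recovery set to "SPUR_TO", whereas a
    duplicate ACK at either step ends in the CONVENTIONAL state with
    spurious_recovery left as FALSE.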

280	2.2.  Discussion

282	    The F-RTO sender takes cautious actions when it receives duplicate
283	    acknowledgments after a retransmission timeout.  Because duplicate
284	    ACKs may indicate that segments have been lost, reliably detecting a
285	    spurious timeout is difficult due to the lack of additional
286	    information.  Therefore, it is prudent to follow the conventional
287	    TCP recovery in those cases.

289	    If the first acknowledgment after the RTO retransmission covers the
290	    "recover" point at algorithm step (2a), there is not enough evidence
291	    that a non-retransmitted segment has arrived at the receiver after
292	    the timeout.  This is a common case when a fast retransmission is
293	    lost and has been retransmitted again after an RTO, while the rest
294	    of the unacknowledged segments were successfully delivered to the
295	    TCP receiver before the retransmission timeout.  Therefore, the
296	    timeout cannot be declared spurious in this case.

298	    If the first acknowledgment after the RTO retransmission does not
299	    acknowledge all of the data that was retransmitted in step 1, the
300	    TCP sender reverts to the conventional RTO recovery.  Otherwise, a
301	    malicious receiver acknowledging partial segments could cause the
302	    sender to declare the timeout spurious in a case where data was
303	    lost.

305	    The TCP sender is allowed to send two new segments in algorithm
306	    branch (2b) because the conventional TCP sender would transmit two
307	    segments when the first new ACK arrives after the RTO
308	    retransmission.  If sending new data is not possible in algorithm
309	    branch (2b), or if the receiver window limits the transmission, the
310	    TCP sender has to send something in order to prevent the TCP
311	    transfer from stalling.  If no segments were sent, the pipe between
312	    sender and receiver might run out of segments, and no further
313	    acknowledgments would arrive.  Therefore, in the window-limited
314	    case, the recommendation is to revert to the conventional RTO
315	    recovery with slow start retransmissions.  Appendix A discusses some
316	    alternative solutions for window-limited situations.

318	    If the retransmission timeout is declared spurious, the TCP sender
319	    sets the value of the "recover" variable to SND.UNA in order to
320	    allow fast retransmit [FHG04].  The "recover" variable was proposed
321	    for avoiding unnecessary, multiple fast retransmits when RTO expires
322	    during fast recovery with NewReno TCP.  Because the F-RTO sender
323	    retransmits only the segment that triggered the timeout, the problem
324	    of unnecessary multiple fast retransmits [FHG04] cannot occur.
325	    Therefore, if three duplicate ACKs arrive at the sender after the
326	    timeout, they probably indicate a packet loss, and thus fast
327	    retransmit should be used to allow efficient recovery.  If there are
328	    not enough duplicate ACKs arriving at the sender after a packet
329	    loss, the retransmission timer expires again and the sender enters
330	    step 1 of this algorithm.

332	    When the timeout is declared spurious, the TCP sender cannot detect
333	    whether the unnecessary RTO retransmission was lost.  In principle,
334	    the loss of the RTO retransmission should be taken as a congestion
335	    signal.  Thus, there is a small possibility that the F-RTO sender
336	    will violate the congestion control rules, if it chooses to fully
337	    revert congestion control parameters after detecting a spurious
338	    timeout.  The Eifel detection algorithm has a similar property,
339	    while the DSACK option can be used to detect whether the
340	    retransmitted segment was successfully delivered to the receiver.

342	    The F-RTO algorithm has a side-effect on the TCP round-trip time
343	    measurement.  Because the TCP sender can avoid most of the
344	    unnecessary retransmissions after detecting a spurious timeout, the
345	    sender is able to take round-trip time samples on the delayed
346	    segments.  If the regular RTO recovery was used without TCP
347	    timestamps, this would not be possible due to the retransmission
348	    ambiguity.  As a result, the RTO is likely to have more accurate and
349	    larger values with F-RTO than with the regular TCP after a spurious
350	    timeout that was triggered due to delayed segments.  We believe this
351	    is an advantage in networks that are prone to delay spikes.
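
    The effect on the RTO can be illustrated with the standard
    retransmission timer computation [PA00].  The sketch below assumes
    the usual constants (alpha = 1/8, beta = 1/4, K = 4, a one-second
    minimum RTO) and a hypothetical clock granularity of 0.1 seconds;
    the function name and the sample values are made up for
    illustration.

```python
def update_rto(srtt, rttvar, sample, granularity=0.1,
               alpha=1 / 8, beta=1 / 4, min_rto=1.0):
    """Feed one round-trip time sample (seconds) into the RTO estimator."""
    if srtt is None:
        # First measurement.
        srtt = sample
        rttvar = sample / 2
    else:
        # RTTVAR is updated first, using the old SRTT.
        rttvar = (1 - beta) * rttvar + beta * abs(srtt - sample)
        srtt = (1 - alpha) * srtt + alpha * sample
    rto = max(min_rto, srtt + max(granularity, 4 * rttvar))
    return srtt, rttvar, rto


# Ten samples of 0.2 s on a steady path, then one segment delayed to 3.0 s.
srtt = rttvar = None
for sample in [0.2] * 10:
    srtt, rttvar, rto_steady = update_rto(srtt, rttvar, sample)

# With F-RTO the delayed segment is not retransmitted, so it yields a
# valid sample; after a conventional RTO recovery without timestamps the
# sample would be discarded because of the retransmission ambiguity.
_, _, rto_after_spike = update_rto(srtt, rttvar, 3.0)
```

    In this example the RTO computed after the delayed sample exceeds
    the 3.0 second delay itself, so a second delay spike of the same
    size would no longer trigger the timer.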

353	    There are some situations where the F-RTO algorithm may not avoid
354	    unnecessary retransmissions after a spurious timeout.  If packet
355	    reordering or packet duplication occurs on the segment that
356	    triggered the spurious timeout, the F-RTO algorithm may not detect
357	    the spurious timeout due to incoming duplicate ACKs.  Additionally,
358	    if a spurious timeout occurs during fast recovery, the F-RTO
359	    algorithm often cannot detect the spurious timeout because the
360	    segments that were transmitted before the fast recovery trigger
361	    duplicate ACKs.  However, we consider these cases rare, and note
362	    that in cases where F-RTO fails to detect the spurious timeout, it
363	    retransmits the unacknowledged segments in slow start, and thus
364	    performs similarly to the regular RTO recovery.

366	3.  SACK-Enhanced Version of the F-RTO Algorithm

368	    This section describes an alternative version of the F-RTO algorithm
369	    that uses the TCP Selective Acknowledgment Option [MMFR96].  By
370	    using the SACK option, the TCP sender detects spurious timeouts in
371	    most of the cases when packet reordering or packet duplication is
372	    present.  If the SACK blocks acknowledge new data that was not
373	    transmitted after the RTO retransmission, the sender may declare the
374	    timeout spurious, even when duplicate ACKs follow the RTO.

376	    Given that the TCP Selective Acknowledgment Option [MMFR96] is
377	    enabled for a TCP connection, a TCP sender MAY implement the SACK-
378	    enhanced F-RTO algorithm.  If the sender applies the SACK-enhanced
379	    F-RTO algorithm, it MUST follow the steps below.  This algorithm
380	    SHOULD NOT be applied if the TCP sender is already in SACK loss
381	    recovery when retransmission timeout occurs.

383	    The steps of the SACK-enhanced version of the F-RTO algorithm are as
384	    follows.  If the retransmission timer expires again during the
385	    execution of the SACK-enhanced F-RTO algorithm, the TCP sender MUST
386	    re-start the algorithm processing from step 1.

388	    1) When the RTO expires, retransmit the first unacknowledged segment
389	       and set SpuriousRecovery to FALSE. Set variable "RecoveryPoint"
390	       to indicate the highest segment transmitted so far. Following the
391	       recommendation in SACK specification [MMFR96], reset the SACK
392	       scoreboard.

394	    2) Wait until the acknowledgment of the data retransmitted due to
395	       the timeout arrives at the sender.  If duplicate ACKs arrive
396	       before the cumulative acknowledgment for retransmitted data,
397	       adjust the scoreboard according to the incoming SACK information.
398	       Stay in step 2 and wait for the next new acknowledgment.  If RTO
399	       expires again, go to step 1 of the algorithm.

401	       a) if the Cumulative Acknowledgement field covers "RecoveryPoint"
402	          but not more than "RecoveryPoint", revert to the conventional
403	          RTO recovery and set the congestion window to no more than 2 *
404	          MSS, like a regular TCP would do. Do not enter step 3 of this
405	          algorithm.

407	       b) else, if the Cumulative Acknowledgement field does not cover
408	          "RecoveryPoint" but is larger than SND.UNA, transmit up to two
409	          new (previously unsent) segments and proceed to step 3.  If
410	          the TCP sender is not able to transmit any previously unsent
411	          data -- either due to receiver window limitation or because it
412	          does not have any new data to send -- the recommended action
413	          is to refrain from entering step 3 of this algorithm.  Rather,
414	          continue with slow start retransmissions following the
415	          conventional RTO recovery algorithm.

417	          It is also possible to apply some of the alternatives for
418	          handling window-limited cases discussed in Appendix A.

420	    3) The next acknowledgment arrives at the sender.  Either a
421	       duplicate ACK or a new cumulative ACK (advancing the window)
422	       applies in this step. Other types of ACKs are ignored without any
423	       action.

425	       a) if the Cumulative Acknowledgement field or a SACK block covers
426	          more than "RecoveryPoint", set the congestion window to no
427	          more than 3 * MSS and proceed with the conventional RTO
428	          recovery, retransmitting unacknowledged segments.  Take this
429	          branch also when the acknowledgment is a duplicate ACK and it
430	          does not acknowledge any new, previously unacknowledged data
431	          below "RecoveryPoint" in the SACK blocks.  Leave
432	          SpuriousRecovery set to FALSE.

434	       b) if the Cumulative Acknowledgement field or a SACK block in the
435	          ACK does not cover more than "RecoveryPoint" AND it
436	          acknowledges data that was not acknowledged earlier (either
437	          with cumulative acknowledgment or using SACK blocks), declare
438	          the timeout spurious and set SpuriousRecovery to SPUR_TO.  The
439	          retransmission timeout can be declared spurious, because the
440	          segment acknowledged with this ACK was transmitted before the
441	          timeout.

443	    If there are unacknowledged holes between the received SACK blocks,
444	    those segments are retransmitted similarly to the conventional SACK
445	    recovery algorithm [BAFW03].  If the algorithm exits with
446	    SpuriousRecovery set to SPUR_TO, "RecoveryPoint" is set to SND.UNA,
447	    thus allowing fast recovery on incoming duplicate acknowledgments.
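
    The decision taken in step 3 can be sketched as follows.  This is a
    simplified illustration under stated assumptions: the function and
    type names are hypothetical; SACK blocks and scoreboard entries are
    (left, right) pairs in which "right" is the sequence number
    following the last byte of the block; "RecoveryPoint" is stored as
    the sequence number following the last byte sent before the
    timeout; DSACK blocks and sequence number wraparound are not
    handled; and the caller has already discarded ACKs that are neither
    duplicate ACKs nor new cumulative ACKs.

```python
from typing import NamedTuple


class Step3Result(NamedTuple):
    spurious_recovery: object  # False or "SPUR_TO"
    recovery_point: int
    cwnd_limit: object         # upper bound on cwnd in bytes, or None


def has_new_bytes(block, scoreboard):
    """True if the block covers bytes not yet recorded in the scoreboard."""
    left, right = block
    for known_left, known_right in sorted(scoreboard):
        if known_left > left:
            break                  # a gap starts at `left`
        left = max(left, known_right)
        if left >= right:
            return False           # the block is fully covered
    return left < right


def sack_frto_step3(cum_ack, sack_blocks, snd_una, recovery_point,
                    scoreboard, mss=1000):
    beyond = cum_ack > recovery_point or any(
        right > recovery_point for _, right in sack_blocks)
    newly_acked = cum_ack > snd_una or any(
        has_new_bytes(block, scoreboard) for block in sack_blocks)
    if beyond or not newly_acked:
        # Branch 3a: conventional RTO recovery, cwnd no more than 3 * MSS.
        return Step3Result(False, recovery_point, 3 * mss)
    # Branch 3b: data sent before the timeout was newly acknowledged.
    return Step3Result("SPUR_TO", max(cum_ack, snd_una), None)
```

    For example, a duplicate ACK whose SACK block reports previously
    unreported data below "RecoveryPoint" takes branch 3b in this
    sketch, while the same duplicate ACK repeating a block already on
    the scoreboard takes branch 3a.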

449	    The SACK-enhanced algorithm works on the same principle as the basic
450	    algorithm, but utilizes the additional information from the SACK
451	    option. When a genuine retransmission timeout occurs during a steady
452	    state of a connection, it can be assumed that there are no segments
453	    left in the pipe. Otherwise, the acknowledgments triggered by these
454	    segments would have triggered the SACK loss recovery or transmission
455	    of new segments. Therefore, if the F-RTO sender receives
456	    acknowledgements for segments transmitted before the retransmission
457	    timeout in response to the two new segments sent in algorithm
458	    step 2, the normal operation of TCP has merely been delayed, and the
459	    retransmission timeout is considered spurious. Note that this
460	    reasoning works only when the TCP sender is not in SACK loss
461	    recovery at the time the retransmission timeout occurs.

463	4.  Taking Actions after Detecting Spurious RTO

465	    Upon a retransmission timeout, a conventional TCP sender assumes
466	    that outstanding segments are lost and starts retransmitting the
467	    unacknowledged segments.  When the retransmission timeout is
468	    detected to be spurious, the TCP sender should not continue
469	    retransmitting based on the timeout.  For example, if the sender was
470	    in congestion avoidance phase transmitting new, previously unsent
471	    segments, it should continue transmitting previously unsent segments
472	    in congestion avoidance.

474	    There are currently two alternatives specified for a spurious
475	    timeout response algorithm, the Eifel Response Algorithm [LG04], and
476	    an algorithm for adapting the retransmission timeout after a
477	    spurious RTO [BBA06]. If no specific response algorithm is
478	    implemented, the TCP sender SHOULD respond to a spurious timeout
479	    conservatively, applying the TCP congestion control specification
480	    [APS99]. Different response algorithms for spurious retransmission
481	    timeouts have been analyzed in some research papers [GL03, Sar03]
482	    and IETF documents [SL03].
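As a rough illustration of the conservative fallback, the sketch below assumes that "conservatively" means keeping the window reduction that [APS99] prescribes after any retransmission timeout, even though the timeout was found to be spurious. The function name and the byte-based units are assumptions made for this example only.

```python
def conservative_rto_response(flight_size, smss):
    """Window settings after an RTO per RFC 2581, in bytes.

    Assumption: the sender keeps these values after a spurious timeout
    when no specific response algorithm is implemented.
    """
    ssthresh = max(flight_size // 2, 2 * smss)  # RFC 2581, equation (3)
    cwnd = smss                                 # loss window: one full-sized segment
    return ssthresh, cwnd
```

With ten 1460-byte segments in flight, this yields an ssthresh of 7300 bytes and a congestion window of one segment, from which the sender continues with previously unsent data rather than with retransmissions.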

484	5.  Evaluation of RFC 4138 and Differences to this Document

486	    F-RTO was first specified in the Experimental RFC 4138, which has
487	    been implemented in a number of operating systems since it was
488	    published. The experience gained has been documented in a separate
489	    document [KYHS07] and can be summarized as follows.

491	    If the TCP sender employs F-RTO, it is able to detect spurious RTOs
492	    and avoid the unnecessary retransmission of the whole window of
493	    data. Because F-RTO avoids the unnecessary retransmissions after a
494	    spurious RTO, it is able to adhere to the packet conservation
495	    principle, unlike a regular TCP that enters the slow-start recovery
496	    unnecessarily and inappropriately restarts the ACK clock while there
497	    are segments outstanding in the network. When a spurious RTO has
498	    been detected, a sender can select an appropriate congestion control
499	    response instead of setting the congestion window to one segment.
500	    Because F-RTO avoids unnecessary retransmissions, it is able to take
501	    the RTT of the delayed segments into account when calculating the
502	    RTO estimate, which may help in avoiding further spurious
503	    retransmission timeouts.
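The effect of a delayed RTT sample on the RTO estimate can be sketched with the update rules of [PA00]. The constants alpha = 1/8, beta = 1/4, K = 4 and the one-second minimum are taken from RFC 2988; the function name and the 100 ms clock granularity are assumptions made for this example.

```python
ALPHA, BETA, K = 1 / 8, 1 / 4, 4  # constants from RFC 2988


def update_rto(srtt, rttvar, sample, granularity=0.1, min_rto=1.0):
    """Fold one RTT sample (seconds) into SRTT and RTTVAR and return the
    new (srtt, rttvar, rto). The granularity value is an assumed clock
    granularity G, not something this document specifies."""
    rttvar = (1 - BETA) * rttvar + BETA * abs(srtt - sample)
    srtt = (1 - ALPHA) * srtt + ALPHA * sample
    rto = max(min_rto, srtt + max(granularity, K * rttvar))
    return srtt, rttvar, rto
```

Starting from SRTT = 0.2 s and RTTVAR = 0.05 s, a single 3 s sample from a delayed but not retransmitted segment raises the RTO from the 1 s minimum to about 3.5 s. A sender that had retransmitted the segment would have had to discard that sample.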

505	    Experimental results with the basic F-RTO have been reported in an
506	    emulated network using a Linux implementation [SKR03]. Different
507	    congestion control responses, along with the SACK-enhanced version
508	    of F-RTO, were also tested in a similar environment [Sar03]. There
509	    are publications analyzing F-RTO performance over commercial W-CDMA
510	    networks and in an emulated HSDPA network [Yam05, Hok05].  Microsoft
511	    also reported positive experiences with their implementation of
512	    F-RTO at the IETF-68 meeting.

514	    It is known that some spurious RTOs may remain undetected by F-RTO
515	    if duplicate acknowledgements arrive at the sender immediately after
516	    the spurious RTO, for example due to packet reordering or packet
517	    loss. There are rare corner cases where F-RTO could "hide" a packet
518	    loss and therefore lead to inappropriate behavior with non-
519	    conservative congestion control response: first, if a massive packet
520	    reordering occurred so that the acknowledgement of RTO
521	    retransmission arrived at the sender before the acknowledgments of
522	    original transmissions, the sender might not detect the loss of the
523	    segment that triggered the RTO. Second, a malicious receiver could
524	    lead F-RTO to make a wrong conclusion after an RTO by acknowledging
525	    segments it has not received. Such a receiver would, however, risk
526	    breaking the consistency of the TCP state between the sender and
527	    receiver, causing the connection to become unusable, which cannot be
528	    of any benefit to the receiver. Therefore we believe it is not
529	    likely that receivers would start employing such tricks on a
530	    significant scale. Finally, loss of the unnecessary RTO
531	    retransmission cannot be detected without using some explicit
532	    acknowledgement scheme such as DSACK. This is common to the other
533	    mechanisms for detecting spurious RTO, as well as to regular TCP
534	    that does not use DSACK. We note that if the congestion control
535	    response to spurious RTO is conservative enough, the above corner
536	    cases do not cause problems due to increased congestion.

538	6.  Security Considerations

540	    The main security threat regarding F-RTO is the possibility that a
541	    receiver could mislead the sender into setting too large a
542	    congestion window after an RTO.  There are two possible ways a
543	    malicious receiver could trigger a wrong output from the F-RTO
544	    algorithm.  First, the receiver can acknowledge data that it has not
545	    received.  Second, it can delay acknowledgment of a segment it has
546	    received earlier, and acknowledge the segment after the TCP sender
547	    has been deluded into entering algorithm step 3.

549	    If the receiver acknowledges a segment it has not really received,
550	    the sender can be led to declare spurious timeout in the F-RTO
551	    algorithm, step 3.  However, because the sender will have an
552	    incorrect state, it cannot retransmit the segment that has never
553	    reached the receiver.  Therefore, this attack is unlikely to be
554	    useful for the receiver to maliciously gain a larger congestion
555	    window.

557	    A common case for a retransmission timeout is that a fast
558	    retransmission of a segment is lost.  If all other segments have
559	    been received, the RTO retransmission causes the whole window to be
560	    acknowledged at once.  This case is recognized in F-RTO algorithm
561	    branch (2a).  However, if the receiver only acknowledges one segment
562	    after receiving the RTO retransmission, and then the rest of the
563	    segments, it could cause the timeout to be declared spurious when it
564	    is not.  Therefore, it is suggested that, when an RTO expires during
565	    the fast recovery phase, the sender would not fully revert the
566	    congestion window even if the timeout was declared spurious.
567	    Instead, the sender would reduce the congestion window to 1.
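The suggested safeguard can be sketched as below. The function name is hypothetical, "1" is read here as one segment of SMSS bytes, and the full revert in the other branch stands in for whatever response algorithm the sender actually implements.

```python
def cwnd_after_spurious_rto(cwnd_before_rto, rto_during_fast_recovery, smss):
    """Congestion window (bytes) to use once a timeout is declared spurious.

    Assumption: outside fast recovery, a response algorithm may restore the
    pre-timeout window; this is illustrative, not mandated by the text.
    """
    if rto_during_fast_recovery:
        return smss            # do not trust the receiver: keep one segment
    return cwnd_before_rto     # hypothetical full revert
```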

569	    If there is more than one segment missing at the time of a
570	    retransmission timeout, the receiver does not benefit from
571	    misleading the sender to declare a spurious timeout because the
572	    sender would have to go through another recovery period to
573	    retransmit the missing segments, usually after an RTO has elapsed.

575	7.  Acknowledgements

577	    The authors would like to thank Alfred Hoenes and Ilpo Jarvinen for
578	    the comments on this document.

580	    We are also thankful to Reiner Ludwig, Andrei Gurtov, Josh Blanton,
581	    Mark Allman, Sally Floyd, Yogesh Swami, Mika Liljeberg, Ivan Arias
582	    Rodriguez, Sourabh Ladha, Martin Duke, Motoharu Miyake, Ted Faber,
583	    Samu Kontinen, and Kostas Pentikousis who gave valuable feedback
584	    during the preparation of RFC 4138, the precursor of this document.

586	Appendix

588	A.  Discussion of Window-Limited Cases

590	    When the advertised window limits the transmission of two new
591	    previously unsent segments, or there are no new data to send, it is
592	    recommended in F-RTO algorithm step (2b) that the TCP sender
593	    continue with the conventional RTO recovery algorithm.  The
594	    disadvantage is that the sender may continue unnecessary
595	    retransmissions due to possible spurious timeout.  This section
596	    briefly discusses the options that can potentially improve
597	    performance when transmitting previously unsent data is not
598	    possible.

600	    - The TCP sender could reserve an unused space of a size of one or
601	      two segments in the advertised window to ensure the use of
602	      algorithms such as F-RTO or Limited Transmit [ABF01] in receiver
603	      window-limited situations.  On the other hand, while doing this,
604	      the TCP sender should ensure that the window of outstanding
605	      segments is large enough for proper utilization of the available
606	      pipe.

608	    - Use additional information if available, e.g., TCP timestamps with
609	      the Eifel Detection algorithm, for detecting a spurious timeout.
610	      However, Eifel detection may yield different results from F-RTO
611	      when ACK losses and an RTO occur within the same round-trip time
613	      [SKR03].

615	    - Retransmit data from the tail of the retransmission queue and
616	      continue with step 3 of the F-RTO algorithm.  It is possible that
617	      the retransmission will be made unnecessarily. Furthermore, the
618	      operation of the SACK-based F-RTO algorithm would need to consider
619	      this case separately, to not use the retransmitted segment to
620	      indicate spurious timeout. Given these considerations, this option
621	      is not recommended.

623	    - Send a zero-sized segment below SND.UNA, similar to TCP Keep-Alive
624	      probe, and continue with step 3 of the F-RTO algorithm.  Because
625	      the receiver replies with a duplicate ACK, the sender is able to
626	      detect whether the timeout was spurious from the incoming
627	      acknowledgment.  This method does not send data unnecessarily, but
628	      it delays the recovery by one round-trip time in cases where the
629	      timeout was not spurious.  Therefore, this method is not
630	      encouraged.

632	    - In receiver-limited cases, send one octet of new data, regardless
633	      of the advertised window limit, and continue with step 3 of the
634	      F-RTO algorithm.  It is possible that the receiver will have free
635	      buffer space to receive the data by the time the segment has
636	      propagated through the network, in which case no harm is done.  If
637	      the receiver is not capable of receiving the segment, it rejects
638	      the segment and sends a duplicate ACK.
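The window-limited condition that triggers the fallback in step (2b) can be sketched as follows. This is one possible reading, with hypothetical names: the sender is treated as limited when it has no unsent data, or when the advertised window leaves less room than the up-to-two new segments it would otherwise send. Sequence numbers are plain integers without wraparound.

```python
def window_limited(snd_una, snd_nxt, rcv_wnd, unsent_bytes, smss):
    """Return True if the sender cannot transmit the new data that F-RTO
    step 2 relies on, and so would fall back to conventional RTO recovery."""
    if unsent_bytes == 0:
        return True                            # no new data to send
    room = snd_una + rcv_wnd - snd_nxt         # unused advertised window
    wanted = min(unsent_bytes, 2 * smss)       # up to two new segments
    return room < wanted
```

Reserving one or two segments of the advertised window, as in the first option above, amounts to keeping the unused window room at or above the wanted amount whenever a timeout may occur.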

640	B.  List of Changes

642	    Changes between different document versions are summarized below,
643	    apart from minor editing and language improvements.

645	    Changes from draft-ietf-tcpm-rfc4138bis-00:

647	    * Added back the original SACK-algorithm from RFC 4138 after the
648	    common feedback to have the SACK-algorithm in the document.
649	    Clarified the algorithm a bit, and added one paragraph of
650	    description of the basic idea of the algorithm.

652	    * Clarified behavior on multiple timeouts.

654	    * Added a paragraph on acknowledgements that do not acknowledge new
655	    data but are not duplicate acknowledgements

657	    Changes from RFC 4138:

659	    * Removed description of the SACK-enhanced algorithm

661	    * Removed SCTP considerations

663	    * Removed earlier Appendix sections, except Appendix C from RFC
664	    4138, which is now Appendix A

666	    * Clarified text about the possible response algorithms

668	    * Added section that summarizes the evaluation of RFC 4138

670	References

672	Normative References

674	    [APS99]   Allman, M., Paxson, V., and W. Stevens, "TCP Congestion
675	              Control",  RFC 2581, April 1999.

677	    [APB07]   Allman, M., Paxson, V., and E. Blanton, "TCP Congestion
678	              Control", Internet-Draft
679	              "draft-ietf-tcpm-rfc2581bis-03.txt",
680	              September 2007.

682	    [BAFW03]  Blanton, E., Allman, M., Fall, K., and L. Wang, "A
683	              Conservative Selective Acknowledgment (SACK)-based Loss
684	              Recovery Algorithm for TCP", RFC 3517, April 2003.

686	    [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate
687	              Requirement Levels", BCP 14, RFC 2119, March 1997.

689	    [FHG04]   Floyd, S., Henderson, T., and A. Gurtov, "The NewReno
690	              Modification to TCP's Fast Recovery Algorithm", RFC 3782,
691	              April 2004.

693	    [MMFR96]  Mathis, M., Mahdavi, J., Floyd, S., and A. Romanow, "TCP
694	              Selective Acknowledgement Options", RFC 2018,
695	              October 1996.

697	    [PA00]    Paxson, V. and M. Allman, "Computing TCP's Retransmission
698	              Timer", RFC 2988, November 2000.

700	    [Pos81]   Postel, J., "Transmission Control Protocol", STD 7, RFC
701	              793, September 1981.

703	Informative References

705	    [ABF01]   Allman, M., Balakrishnan, H., and S. Floyd, "Enhancing
706	              TCP's Loss Recovery Using Limited Transmit", RFC 3042,
707	              January 2001.

709	    [BA04]    Blanton, E. and M. Allman, "Using TCP Duplicate Selective
710	              Acknowledgement (DSACKs) and Stream Control Transmission
711	              Protocol (SCTP) Duplicate Transmission Sequence Numbers
712	              (TSNs) to Detect Spurious Retransmissions", RFC 3708,
713	              February 2004.

715	    [BBA06]   J. Blanton, E. Blanton, and M. Allman. Using Spurious
716	              Retransmissions to Adapt the Retransmission Timeout,
717	              Internet-Draft "draft-allman-rto-backoff-04.txt", December
718	              2006. Work in progress.

720	    [BBJ92]   Jacobson, V., Braden, R., and D. Borman, "TCP Extensions
721	              for High Performance", RFC 1323, May 1992.

723	    [FMMP00]  Floyd, S., Mahdavi, J., Mathis, M., and M. Podolsky, "An
724	              Extension to the Selective Acknowledgement (SACK) Option
725	              for TCP", RFC 2883, July 2000.

727	    [GL02]    A. Gurtov and R. Ludwig.  Evaluating the Eifel Algorithm
728	              for TCP in a GPRS Network.  In Proc. of European Wireless,
729	              Florence, Italy, February 2002.

731	    [GL03]    A. Gurtov and R. Ludwig, Responding to Spurious Timeouts
732	              in TCP.  In Proceedings of IEEE INFOCOM 03, San Francisco,
733	              CA, USA, March 2003.

735	    [Jac88]   V. Jacobson. Congestion Avoidance and Control.  In
736	              Proceedings of ACM SIGCOMM 88.

738	    [Hok05]   A. Hokamura, et al. "Performance Evaluation of F-RTO and
739	              Eifel Response Algorithms over W-CDMA packet network".
740	              Wireless Personal Multimedia Communications (WPMC'05),
741	              Sept. 2005.

743	    [KYHS07]  M. Kojo, K. Yamamoto, M. Hata, and P. Sarolahti.
744	              Evaluation of RFC 4138.
745	              Internet-draft "draft-kojo-tcpm-frto-eval-00.txt",
746	              June 2007. Work in progress.

748	    [LG04]    Ludwig, R. and A. Gurtov, "The Eifel Response Algorithm
749	              for TCP", RFC 4015, February 2005.

751	    [LK00]    R. Ludwig and R.H. Katz.  The Eifel Algorithm: Making TCP
752	              Robust Against Spurious Retransmissions.  ACM SIGCOMM
753	              Computer Communication Review, 30(1), January 2000.

755	    [LM03]    Ludwig, R. and M. Meyer, "The Eifel Detection Algorithm
756	              for TCP", RFC 3522, April 2003.

758	    [Nag84]   Nagle, J., "Congestion Control in IP/TCP Internetworks",
759	              RFC 896, January 1984.

761	    [SK05]    P. Sarolahti and M. Kojo, "Forward RTO-Recovery (F-RTO):
762	              An Algorithm for Detecting Spurious Retransmission
763	              Timeouts with TCP and the Stream Control Transmission
764	              Protocol (SCTP)", RFC 4138, August 2005.

766	    [SKR03]   P. Sarolahti, M. Kojo, and K. Raatikainen. F-RTO: An
767	              Enhanced Recovery Algorithm for TCP Retransmission
768	              Timeouts.  ACM SIGCOMM Computer Communication Review,
769	              33(2), April 2003.

771	    [Sar03]   P. Sarolahti.  Congestion Control on Spurious TCP
772	              Retransmission Timeouts.  In Proceedings of IEEE Globecom
773	              2003, San Francisco, CA, USA. December 2003.

775	    [SL03]    Y. Swami and K. Le, "DCLOR: De-correlated Loss Recovery
776	              using SACK Option for Spurious Timeouts", Expired
777	              Internet-Draft, September 2003.

779	    [Ste00]   R. Stewart, et al. Stream Control Transmission Protocol,
780	              RFC 2960, October 2000.

782	    [Yam05]   K. Yamamoto, et al. "Effects of F-RTO and Eifel Response
783	              Algorithms for W-CDMA and HSDPA networks". Wireless
784	              Personal Multimedia Communications (WPMC'05),
785	              Sept. 2005.

787	AUTHORS' ADDRESSES

789	    Pasi Sarolahti
790	    Nokia Research Center
791	    P.O. Box 407
792	    FI-00045 NOKIA GROUP
793	    Finland
794	    Phone: +358 50 4876607
795	    Email: pasi.sarolahti@nokia.com
796	    Markku Kojo
797	    University of Helsinki
798	    P.O. Box 68
799	    FI-00014 UNIVERSITY OF HELSINKI
800	    Finland
801	    Email: kojo@cs.helsinki.fi

803	    Kazunori Yamamoto
804	    NTT Docomo, Inc.
805	    3-5 Hikarinooka, Yokosuka, Kanagawa, 239-8536, Japan
806	    Phone: +81-46-840-3812
807	    Email: yamamotokaz@nttdocomo.co.jp

809	    Max Hata
810	    NTT Docomo, Inc.
811	    3-5 Hikarinooka, Yokosuka, Kanagawa, 239-8536, Japan
812	    Phone: +81-46-840-3812
813	    Email: hatama@s1.nttdocomo.co.jp

815	Full Copyright Statement

817	    Copyright (C) The IETF Trust (2007).

819	    This document is subject to the rights, licenses and restrictions
820	    contained in BCP 78, and except as set forth therein, the authors
821	    retain all their rights.

823	    This document and the information contained herein are provided on
824	    an "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE
825	    REPRESENTS OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY, THE
826	    IETF TRUST AND THE INTERNET ENGINEERING TASK FORCE DISCLAIM ALL
827	    WARRANTIES, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY
828	    WARRANTY THAT THE USE OF THE INFORMATION HEREIN WILL NOT INFRINGE
829	    ANY RIGHTS OR ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS
830	    FOR A PARTICULAR PURPOSE.

832	Intellectual Property

834	    The IETF takes no position regarding the validity or scope of any
835	    Intellectual Property Rights or other rights that might be claimed
836	    to pertain to the implementation or use of the technology described
837	    in this document or the extent to which any license under such
838	    rights might or might not be available; nor does it represent that
839	    it has made any independent effort to identify any such rights.
840	    Information on the procedures with respect to rights in RFC
841	    documents can be found in BCP 78 and BCP 79.

843	    Copies of IPR disclosures made to the IETF Secretariat and any
844	    assurances of licenses to be made available, or the result of an
845	    attempt made to obtain a general license or permission for the use
846	    of such proprietary rights by implementers or users of this
847	    specification can be obtained from the IETF on-line IPR repository
848	    at http://www.ietf.org/ipr.

850	    The IETF invites any interested party to bring to its attention any
851	    copyrights, patents or patent applications, or other proprietary
852	    rights that may cover technology that may be required to implement
853	    this standard.  Please address the information to the IETF at ietf-
854	    ipr@ietf.org.