1	Internet Engineering Task Force                             P. Sarolahti
2	INTERNET-DRAFT                                     Nokia Research Center
3	draft-ietf-tcpm-rfc4138bis-03.txt                                M. Kojo
4	Intended status: Proposed Standard                University of Helsinki
5	Expires: March 2009                                          K. Yamamoto
6	                                                                 M. Hata
7	                                                              NTT Docomo

9	                                                        9 September 2008

11	        Forward RTO-Recovery (F-RTO): An Algorithm for Detecting
12	               Spurious Retransmission Timeouts with TCP

14	Status of this Memo

16	    By submitting this Internet-Draft, each author represents that any
17	    applicable patent or other IPR claims of which he or she is aware
18	    have been or will be disclosed, and any of which he or she becomes
19	    aware will be disclosed, in accordance with Section 6 of BCP 79.

21	    Internet-Drafts are working documents of the Internet Engineering
22	    Task Force (IETF), its areas, and its working groups.  Note that
23	    other groups may also distribute working documents as Internet-
24	    Drafts.

26	    Internet-Drafts are draft documents valid for a maximum of six
27	    months and may be updated, replaced, or obsoleted by other documents
28	    at any time.  It is inappropriate to use Internet-Drafts as
29	    reference material or to cite them other than as "work in progress."

31	    The list of current Internet-Drafts can be accessed at
32	    http://www.ietf.org/ietf/1id-abstracts.txt.

34	    The list of Internet-Draft Shadow Directories can be accessed at
35	    http://www.ietf.org/shadow.html.

37	    This Internet-Draft will expire in March 2009.

39	Abstract

41	    Spurious retransmission timeouts cause suboptimal TCP performance
42	    because they often result in unnecessary retransmission of the last
43	    window of data.  This document describes the F-RTO detection
44	    algorithm for detecting spurious TCP retransmission timeouts.  F-RTO
45	    is a TCP sender-only algorithm that does not require any TCP options
46	    to operate.  After retransmitting the first unacknowledged segment
47	    triggered by a timeout, the F-RTO algorithm of the TCP sender
48	    monitors the incoming acknowledgments to determine whether the
49	    timeout was spurious.  It then decides whether to send new segments
50	    or retransmit unacknowledged segments.  The algorithm effectively
51	    helps to avoid additional unnecessary retransmissions and thereby
52	    improves TCP performance in the case of a spurious timeout.

54	                             Table of Contents

56	    1. Introduction
57	       1.1. Conventions and Terminology
58	    2. Basic F-RTO Algorithm
59	       2.1. The Algorithm
60	       2.2. Discussion
61	    3. SACK-Enhanced Version of the F-RTO Algorithm
62	    4. Taking Actions after Detecting Spurious RTO
63	    5. Evaluation of RFC 4138 and Differences to this Document
65	    6. Security Considerations
66	    7. IANA Considerations
67	    8. Acknowledgements
68	    Appendix
69	    A. Discussion of Window-Limited Cases
70	    B. List of Changes
71	    References
72	    Normative References
73	    Informative References
74	    AUTHORS' ADDRESSES
75	    Full Copyright Statement
76	    Intellectual Property

78	1.  Introduction

80	    The Transmission Control Protocol (TCP) [Pos81] has two methods for
81	    triggering retransmissions.  First, the TCP sender relies on
82	    incoming duplicate ACKs, which indicate that the receiver is missing
83	    some of the data.  After a required number of successive duplicate
84	    ACKs have arrived at the sender, it retransmits the first
85	    unacknowledged segment [APS99] and continues with a loss recovery
86	    algorithm such as NewReno [FHG04] or SACK-based loss recovery
87	    [BAFW03].  Second, the TCP sender maintains a retransmission timer
88	    which triggers retransmission of segments, if they have not been
89	    acknowledged before the retransmission timeout (RTO) expires.  When
90	    the retransmission timeout occurs, the TCP sender enters the RTO
91	    recovery where the congestion window is initialized to one segment
92	    and unacknowledged segments are retransmitted using the slow-start
93	    algorithm.  The retransmission timer is adjusted dynamically, based
94	    on the measured round-trip times [PA00].

96	    It has been pointed out that the retransmission timer can expire
97	    spuriously and cause unnecessary retransmissions when no segments
98	    have been lost [LK00, GL02, LM03].  After a spurious retransmission
99	    timeout, the late acknowledgments of the original segments arrive at
100	    the sender, usually triggering unnecessary retransmissions of a
101	    whole window of segments during the RTO recovery.  Furthermore,
102	    after a spurious retransmission timeout, a conventional TCP sender
103	    increases the congestion window on each late acknowledgment in slow
104	    start.  This injects a large number of data segments into the
105	    network within one round-trip time, thus violating the packet
106	    conservation principle [Jac88].

108	    There are a number of potential reasons for spurious retransmission
109	    timeouts.  First, some mobile networking technologies involve sudden
110	    delay spikes on transmission because of actions taken during a hand-
111	    off.  Second, a hand-off may take place from a low latency path to a
112	    high latency path, suddenly increasing the round-trip time beyond
113	    the current RTO value.  Third, on a low-bandwidth link the arrival
114	    of competing traffic (possibly with higher priority), or some other
115	    change in available bandwidth, can cause a sudden increase of the
116	    round-trip time.  This may trigger a spurious retransmission
117	    timeout.  A persistently reliable link layer can also cause a sudden
118	    delay when a data frame and several retransmissions of it are lost
119	    for some reason.  This document does not distinguish between the
120	    different causes of such a delay spike.  Rather, it discusses the
121	    spurious retransmission timeouts caused by a delay spike in general.

123	    This document describes the F-RTO detection algorithm.  It is based
124	    on the detection mechanism of the "Forward RTO-Recovery" (F-RTO)
125	    algorithm [SKR03] that is used for detecting spurious retransmission
126	    timeouts and thus avoids unnecessary retransmissions following the
127	    retransmission timeout.  When the timeout is not spurious, the F-RTO
128	    algorithm reverts back to the conventional RTO recovery algorithm,
129	    and therefore has similar behavior and performance.  In contrast to
130	    alternative algorithms proposed for detecting unnecessary
131	    retransmissions (Eifel [LK00], [LM03] and DSACK-based algorithms
132	    [BA04]), F-RTO does not require any TCP options for its operation,
133	    and it can be implemented by modifying only the TCP sender.  The
134	    Eifel algorithm uses TCP timestamps [BBJ92] for detecting a spurious
135	    timeout upon arrival of the first acknowledgment after the
136	    retransmission.  The DSACK-based algorithms require that the TCP
137	    Selective Acknowledgment Option [MMFR96], with the DSACK extension
138	    [FMMP00], is in use.  With DSACK, the TCP receiver can report if it
139	    has received a duplicate segment, enabling the sender to detect
140	    afterwards whether it has retransmitted segments unnecessarily.  The
141	    F-RTO algorithm only attempts to detect and avoid unnecessary
142	    retransmissions after an RTO.  Eifel and DSACK can also be used for
143	    detecting unnecessary retransmissions caused by other events, such
144	    as packet reordering.

146	    When an RTO expires, the F-RTO sender retransmits the first
147	    unacknowledged segment as usual [APS99].  Deviating from the normal
148	    operation after a timeout, it then tries to transmit new, previously
149	    unsent data for the first acknowledgment that arrives after the
150	    timeout, given that the acknowledgment advances the window.  If the
151	    second acknowledgment that arrives after the timeout advances the
152	    window (i.e., acknowledges data that was not retransmitted), the F-
153	    RTO sender declares the timeout spurious and exits the RTO recovery.
154	    However, if either of these two acknowledgments is a duplicate ACK,
155	    there will not be sufficient evidence of a spurious timeout.
156	    Therefore, the F-RTO sender retransmits the unacknowledged segments
157	    in slow start similarly to the traditional algorithm.

159	    With a SACK-enhanced version of the F-RTO algorithm, spurious
160	    timeouts may be detected even if duplicate ACKs arrive after an RTO
161	    retransmission.  Even though this document only specifies the F-RTO
162	    algorithm for TCP, the algorithm can also be applied to the Stream
163	    Control Transmission Protocol (SCTP) [Ste07], which has
164	    acknowledgment and packet retransmission concepts similar to TCP.
165	    Considerations on applying F-RTO to SCTP are discussed in RFC 4138
166	    [SK05].

167	    This document is organized as follows.  Section 2 describes the
168	    basic F-RTO algorithm, and the SACK-enhanced F-RTO algorithm is
169	    given in Section 3.  Section 4 discusses the possible actions to be
170	    taken after detecting a spurious RTO, Section 5 evaluates RFC 4138
171	    and describes how this document differs from it, and Section 6
172	    discusses the security considerations.

173	1.1.  Conventions and Terminology

175	    The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
176	    "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
177	    document are to be interpreted as described in BCP 14, RFC 2119
178	    [RFC2119] and indicate requirement levels for protocols.

180	2.  Basic F-RTO Algorithm

182	    A timeout is considered spurious if it would have been avoided had
183	    the sender waited longer for an acknowledgment to arrive [LM03].  F-
184	    RTO affects the TCP sender behavior only after a retransmission
185	    timeout.  Otherwise, the TCP behavior remains the same.  After the
186	    RTO expires, the F-RTO algorithm monitors incoming acknowledgments.
187	    If the TCP sender gets an acknowledgment for a segment that was not
188	    retransmitted due to the timeout, the F-RTO algorithm declares the
189	    timeout spurious.  The actions taken in response to a spurious
190	    timeout are not specified in this document, but we discuss some
191	    alternatives in Section 4.  This section introduces the algorithm
192	    and then discusses the different steps of the algorithm in more
193	    detail.

195	    Following the practice used with the Eifel Detection algorithm
196	    [LM03], we use the "SpuriousRecovery" variable to indicate whether
197	    the retransmission is declared spurious by the sender. This variable
198	    can be used as an input for a corresponding response algorithm. With
199	    F-RTO, the value of SpuriousRecovery can be either SPUR_TO
200	    (indicating a spurious retransmission timeout) or FALSE (indicating
201	    that the timeout is not declared spurious, in which case the TCP
202	    sender should follow the conventional RTO recovery algorithm).  In
203	    addition, we use the "recover" variable specified in the NewReno
204	    algorithm [FHG04].

206	2.1.  The Algorithm

208	    A TCP sender implementing the basic F-RTO algorithm MUST take the
209	    following steps after the retransmission timer expires.  If the
210	    retransmission timer expires again during the execution of the F-RTO
211	    algorithm, the TCP sender MUST re-start the algorithm processing
212	    from step 1.  If the sender implements some loss recovery algorithm
213	    other than Reno or NewReno [FHG04], the F-RTO algorithm SHOULD NOT
214	    be entered when earlier fast recovery is underway.

216	    The F-RTO algorithm takes different actions based on whether an
217	    incoming acknowledgement advances the cumulative acknowledgement
218	    point for a received in-order segment, or whether it is a duplicate
219	    acknowledgement to indicate an out-of-order segment. Duplicate
220	    acknowledgement is defined in [APB08]. The F-RTO algorithm does not
221	    specify actions for receiving a segment that does not acknowledge
222	    new data but is not a duplicate acknowledgement. The TCP sender
223	    SHOULD ignore such segments and wait for a segment that either
224	    acknowledges new data or is a duplicate acknowledgment.
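
The three kinds of incoming segments distinguished above can be sketched as follows.  This is only an illustration, not part of the algorithm: the names Ack, AckKind, and classify_ack are hypothetical, the duplicate ACK test assumes the five-part definition given in [APB08], and sequence number wraparound is ignored for brevity.

```python
from dataclasses import dataclass
from enum import Enum


class AckKind(Enum):
    NEW_DATA = "advances the cumulative acknowledgment point"
    DUPLICATE = "duplicate ACK as defined in [APB08]"
    OTHER = "neither; ignored by F-RTO"


@dataclass
class Ack:
    ack: int          # Acknowledgment field of the incoming segment
    data_len: int     # number of payload bytes carried by the segment
    syn: bool
    fin: bool
    window: int       # advertised window in the segment


def classify_ack(seg: Ack, snd_una: int, snd_nxt: int,
                 last_window: int) -> AckKind:
    """Classify an incoming segment for F-RTO (wraparound ignored)."""
    if snd_una < seg.ack <= snd_nxt:
        return AckKind.NEW_DATA
    is_duplicate = (
        snd_nxt > snd_una                # (a) sender has outstanding data
        and seg.data_len == 0            # (b) segment carries no data
        and not (seg.syn or seg.fin)     # (c) SYN and FIN are both off
        and seg.ack == snd_una           # (d) ACK point does not advance
        and seg.window == last_window    # (e) advertised window unchanged
    )
    return AckKind.DUPLICATE if is_duplicate else AckKind.OTHER
```

A pure window update, for example, has an unchanged Acknowledgment field but a different advertised window, so it falls into the OTHER class and would be ignored by the F-RTO steps below.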

226	    1) When RTO expires, retransmit the first unacknowledged segment and
227	       set SpuriousRecovery to FALSE. If the TCP sender is already in
228	       RTO recovery AND "recover" is larger than or equal to SND.UNA
229	       (the oldest unacknowledged sequence number [Pos81]), do not enter
230	       step 2 of this algorithm. Instead, store the highest sequence
231	       number transmitted so far in variable "recover" and continue with
232	       slow start retransmissions following the conventional RTO
233	       recovery algorithm.

235	    2) When the first acknowledgment after the RTO retransmission
236	       arrives at the TCP sender, store the highest sequence number
237	       transmitted so far in variable "recover". The TCP sender chooses
238	       one of the following actions, depending on whether the ACK
239	       advances the window or whether it is a duplicate ACK.

241	       a) If the acknowledgment is a duplicate ACK OR the
242	          Acknowledgement field covers "recover" but not more than
243	          "recover" OR the acknowledgment does not acknowledge all of
244	          the data that was retransmitted in step 1, revert to the
245	          conventional RTO recovery and continue by retransmitting
246	          unacknowledged data in slow start.  Do not enter step 3 of
247	          this algorithm.  The SpuriousRecovery variable remains as
248	          FALSE.

250	       b) Else, if the acknowledgment advances the window AND the
251	          Acknowledgement field does not cover "recover", transmit up to
252	          two new (previously unsent) segments and enter step 3 of this
253	          algorithm. If the TCP sender does not have enough unsent data,
254	          it can send only one segment. In addition, the TCP sender MAY
255	          override the Nagle algorithm [Nag84] and immediately send a
256	          segment if needed. Note that sending two segments in this step
257	          is allowed by TCP congestion control requirements [APS99]: An
258	          F-RTO TCP sender simply chooses different segments to
259	          transmit.

261	          If the TCP sender does not have any new data to send, or the
262	          advertised window prohibits new transmissions, the recommended
263	          action is to skip step 3 of this algorithm and continue with
264	          slow start retransmissions, following the conventional RTO
265	          recovery algorithm.  However, alternative ways of handling the
266	          window-limited cases that could result in better performance
267	          are discussed in Appendix A.

269	    3) When the second acknowledgment after the RTO retransmission
270	       arrives at the TCP sender, the TCP sender either declares the
271	       timeout spurious, or starts retransmitting the unacknowledged
272	       segments.

274	       a) If the acknowledgment is a duplicate ACK, set the congestion
275	          window to no more than 3 * MSS, and continue with the slow
276	          start algorithm retransmitting unacknowledged segments.  The
277	          congestion window can be set to 3 * MSS, because two round-
278	          trip times have elapsed since the RTO, and a conventional TCP
279	          sender would have increased cwnd to 3 * MSS during the same
280	          time.  Leave SpuriousRecovery set to FALSE.

282	       b) If the acknowledgment advances the window (i.e., if it
283	          acknowledges data that was not retransmitted after the
284	          timeout), declare the timeout spurious, set SpuriousRecovery
285	          to SPUR_TO, and set the value of the "recover" variable to
286	          SND.UNA (the oldest unacknowledged sequence number [Pos81]).
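
Steps 1-3 above can be read as a small state machine.  The following is a minimal sketch of that reading, not a reference implementation.  The class BasicFrto, its attribute names, and the unsent_segments parameter are hypothetical.  Sequence numbers are modeled as plain byte offsets without wraparound, actual segment transmission is omitted, and "already in RTO recovery" is approximated by the sender being in the CONVENTIONAL state.

```python
from enum import Enum


class Step(Enum):
    IDLE = "no RTO recovery underway"
    STEP2 = "waiting for first ACK after the RTO retransmission"
    STEP3 = "waiting for second ACK after the RTO retransmission"
    CONVENTIONAL = "conventional RTO recovery (slow start)"


class BasicFrto:
    """Hypothetical model of steps 1-3; sequence numbers are byte offsets."""

    def __init__(self, snd_una: int, snd_nxt: int, mss: int):
        self.snd_una = snd_una     # oldest unacknowledged sequence number
        self.snd_nxt = snd_nxt     # next previously unsent sequence number
        self.mss = mss
        self.recover = snd_una
        self.rexmit_end = snd_una  # end of the data retransmitted in step 1
        self.step = Step.IDLE
        self.spurious_recovery = "FALSE"
        self.cwnd_limit = None     # upper bound on cwnd, when one is imposed

    def on_rto(self) -> None:
        # Step 1: the first unacknowledged segment is retransmitted.
        self.spurious_recovery = "FALSE"
        self.rexmit_end = min(self.snd_una + self.mss, self.snd_nxt)
        if self.step is Step.CONVENTIONAL and self.recover >= self.snd_una:
            self.recover = self.snd_nxt - 1  # highest sequence number sent
            return                           # stay in conventional recovery
        self.step = Step.STEP2

    def on_ack(self, ack: int, duplicate: bool = False,
               unsent_segments: int = 2) -> None:
        advances = not duplicate and ack > self.snd_una
        if not duplicate and not advances:
            return                           # neither kind of ACK: ignore
        if advances:
            self.snd_una = ack
        if self.step is Step.STEP2:
            self.recover = self.snd_nxt - 1
            if duplicate or ack == self.recover + 1 or ack < self.rexmit_end:
                self.step = Step.CONVENTIONAL          # branch 2a
            elif unsent_segments == 0:
                self.step = Step.CONVENTIONAL          # window-limited case
            else:
                # Branch 2b: up to two new segments are sent.
                self.snd_nxt += min(2, unsent_segments) * self.mss
                self.step = Step.STEP3
        elif self.step is Step.STEP3:
            if duplicate:
                self.cwnd_limit = 3 * self.mss         # branch 3a
                self.step = Step.CONVENTIONAL
            else:
                self.spurious_recovery = "SPUR_TO"     # branch 3b
                self.recover = self.snd_una
                self.step = Step.IDLE
```

With five 1000-byte segments outstanding (SND.UNA = 1000, next unsent byte at 6000), calling on_rto(), then on_ack(2000), then on_ack(3000) in this model ends with SpuriousRecovery set to SPUR_TO.  If the first acknowledgment is instead on_ack(6000), it covers "recover" and the model takes branch 2a.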

288	2.2.  Discussion

290	    The F-RTO sender takes cautious actions when it receives duplicate
291	    acknowledgments after a retransmission timeout.  Because duplicate
292	    ACKs may indicate that segments have been lost, reliably detecting a
293	    spurious timeout is difficult due to the lack of additional
294	    information.  Therefore, it is prudent to follow the conventional
295	    TCP recovery in those cases.

297	    The condition in step 1 prevents the execution of the F-RTO
298	    algorithm if a previous RTO recovery is underway when the
299	    retransmission timer expires, except when the retransmission
300	    timer expires multiple times for the same segment. If RTO expires
301	    during an earlier RTO-based loss recovery, acknowledgements for
302	    retransmitted segments may falsely lead the TCP sender to declare
303	    the timeout spurious.

305	    If the first acknowledgment after the RTO retransmission covers the
306	    "recover" point at algorithm step (2a), there is not enough evidence
307	    that a non-retransmitted segment has arrived at the receiver after
308	    the timeout.  This is a common case when a fast retransmission is
309	    lost and has been retransmitted again after an RTO, while the rest
310	    of the unacknowledged segments were successfully delivered to the
311	    TCP receiver before the retransmission timeout.  Therefore, the
312	    timeout cannot be declared spurious in this case.

314	    If the first acknowledgment after the RTO retransmission does not
315	    acknowledge all of the data that was retransmitted in step 1, the
316	    TCP sender reverts to the conventional RTO recovery.  Otherwise, a
317	    malicious receiver acknowledging partial segments could cause the
318	    sender to declare the timeout spurious in a case where data was
319	    lost.

321	    The TCP sender is allowed to send two new segments in algorithm
322	    branch (2b) because the conventional TCP sender would transmit two
323	    segments when the first new ACK arrives after the RTO
324	    retransmission.  If sending new data is not possible in algorithm
325	    branch (2b), or if the receiver window limits the transmission, the
326	    TCP sender has to send something in order to prevent the TCP
327	    transfer from stalling.  If no segments were sent, the pipe between
328	    sender and receiver might run out of segments, and no further
329	    acknowledgments would arrive.  Therefore, in the window-limited
330	    case, the recommendation is to revert to the conventional RTO
331	    recovery with slow start retransmissions.  Appendix A discusses some
332	    alternative solutions for window-limited situations.

334	    If the retransmission timeout is declared spurious, the TCP sender
335	    sets the value of the "recover" variable to SND.UNA in order to
336	    allow fast retransmit [FHG04].  The "recover" variable was proposed
337	    for avoiding unnecessary, multiple fast retransmits when RTO expires
338	    during fast recovery with NewReno TCP.  Because the F-RTO sender
339	    retransmits only the segment that triggered the timeout, the problem
340	    of unnecessary multiple fast retransmits [FHG04] cannot occur.
341	    Therefore, if three duplicate ACKs arrive at the sender after the
342	    timeout, they probably indicate a packet loss, and thus fast
343	    retransmit should be used to allow efficient recovery.  If there are
344	    not enough duplicate ACKs arriving at the sender after a packet
345	    loss, the retransmission timer expires again and the sender enters
346	    step 1 of this algorithm.

348	    When the timeout is declared spurious, the TCP sender cannot detect
349	    whether the unnecessary RTO retransmission was lost.  In principle,
350	    the loss of the RTO retransmission should be taken as a congestion
351	    signal.  Thus, there is a small possibility that the F-RTO sender
352	    will violate the congestion control rules, if it chooses to fully
353	    revert congestion control parameters after detecting a spurious
354	    timeout.  The Eifel detection algorithm has a similar property,
355	    while the DSACK option can be used to detect whether the
356	    retransmitted segment was successfully delivered to the receiver.

358	    The F-RTO algorithm has a side-effect on the TCP round-trip time
359	    measurement.  Because the TCP sender can avoid most of the
360	    unnecessary retransmissions after detecting a spurious timeout, the
361	    sender is able to take round-trip time samples on the delayed
362	    segments.  If the regular RTO recovery was used without TCP
363	    timestamps, this would not be possible due to the retransmission
364	    ambiguity.  As a result, the RTO is likely to have more accurate and
365	    larger values with F-RTO than with the regular TCP after a spurious
366	    timeout that was triggered due to delayed segments.  We believe this
367	    is an advantage in networks that are prone to delay spikes.
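
The effect on the RTO can be illustrated with the standard estimator of [PA00].  The sketch below is an illustration under stated assumptions: the class RtoEstimator and the helper rto_after are hypothetical names, the clock granularity term of [PA00] is omitted, and the sample values (ten 200 ms round-trip times followed by one 3-second delayed segment) are invented for the example.

```python
class RtoEstimator:
    """RTO computation along the lines of [PA00], clock granularity omitted."""

    ALPHA = 1 / 8   # gain for the smoothed round-trip time (SRTT)
    BETA = 1 / 4    # gain for the round-trip time variation (RTTVAR)
    K = 4

    def __init__(self, min_rto: float = 1.0):
        self.srtt = None
        self.rttvar = None
        self.min_rto = min_rto   # RTO is rounded up to at least this value

    def sample(self, rtt: float) -> float:
        """Feed one round-trip time sample (seconds); return the new RTO."""
        if self.srtt is None:
            # First measurement.
            self.srtt = rtt
            self.rttvar = rtt / 2
        else:
            # RTTVAR is updated before SRTT, using the old SRTT.
            self.rttvar = ((1 - self.BETA) * self.rttvar
                           + self.BETA * abs(self.srtt - rtt))
            self.srtt = (1 - self.ALPHA) * self.srtt + self.ALPHA * rtt
        return self.rto

    @property
    def rto(self) -> float:
        return max(self.min_rto, self.srtt + self.K * self.rttvar)


def rto_after(samples):
    """Return the RTO after feeding all samples to a fresh estimator."""
    estimator = RtoEstimator()
    for rtt in samples:
        estimator.sample(rtt)
    return estimator.rto


steady = [0.2] * 10
# Regular RTO recovery without timestamps: the delayed segment was
# retransmitted, so it yields no sample (retransmission ambiguity).
rto_without_spike = rto_after(steady)
# F-RTO: the delayed segment was not retransmitted and can be sampled.
rto_with_spike = rto_after(steady + [3.0])
```

In this example rto_without_spike stays at the 1-second minimum, whereas rto_with_spike exceeds the 3-second delay, which makes a second spurious timeout on a similar delay spike less likely.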

369	    There are some situations where the F-RTO algorithm may not avoid
370	    unnecessary retransmissions after a spurious timeout.  If packet
371	    reordering or packet duplication occurs on the segment that
372	    triggered the spurious timeout, the F-RTO algorithm may not detect
373	    the spurious timeout due to incoming duplicate ACKs.  Additionally,
374	    if a spurious timeout occurs during fast recovery, the F-RTO
375	    algorithm often cannot detect the spurious timeout because the
376	    segments that were transmitted before the fast recovery trigger
377	    duplicate ACKs.  However, we consider these cases rare, and note
378	    that in cases where F-RTO fails to detect the spurious timeout, it
379	    retransmits the unacknowledged segments in slow start, and thus
380	    performs similarly to the regular RTO recovery.

382	3.  SACK-Enhanced Version of the F-RTO Algorithm

384	    This section describes an alternative version of the F-RTO algorithm
385	    that uses the TCP Selective Acknowledgment Option [MMFR96].  By
386	    using the SACK option, the TCP sender detects spurious timeouts in
387	    most of the cases when packet reordering or packet duplication is
388	    present.  If the SACK blocks acknowledge new data that was not
389	    transmitted after the RTO retransmission, the sender may declare the
390	    timeout spurious, even when duplicate ACKs follow the RTO.

392	    Given that the TCP Selective Acknowledgment Option [MMFR96] is
393	    enabled for a TCP connection, a TCP sender MAY implement the SACK-
394	    enhanced F-RTO algorithm.  If the sender applies the SACK-enhanced
395	    F-RTO algorithm, it MUST follow the steps below.  This algorithm
396	    SHOULD NOT be applied if the TCP sender is already in loss recovery
397	    when retransmission timeout occurs.

399	    The steps of the SACK-enhanced version of the F-RTO algorithm are as
400	    follows.  If the retransmission timer expires again during the
401	    execution of the SACK-enhanced F-RTO algorithm, the TCP sender MUST
402	    re-start the algorithm processing from step 1.

404	    1) When the RTO expires, retransmit the first unacknowledged segment
405	       and set SpuriousRecovery to FALSE. Following the recommendation
406	       in SACK specification [MMFR96], reset the SACK scoreboard.  If
407	       "RecoveryPoint" is larger than or equal to SND.UNA, do not enter
408	       step 2 of this algorithm. Instead, set variable "RecoveryPoint"
409	       to indicate the highest sequence number transmitted so far and
410	       continue with slow start retransmissions following the
411	       conventional RTO recovery algorithm.

413	    2) Wait until the acknowledgment of the data retransmitted due to
414	       the timeout arrives at the sender.  If duplicate ACKs arrive
415	       before the cumulative acknowledgment for retransmitted data,
416	       adjust the scoreboard according to the incoming SACK information.
417	       Stay in step 2 and wait for the next new acknowledgment. If RTO
418	       expires again, go to step 1 of the algorithm. When a new
419	       acknowledgment arrives, set variable "RecoveryPoint" to indicate
420	       the highest sequence number transmitted so far.

422	       a) If the Cumulative Acknowledgement field covers "RecoveryPoint"
423	          but not more than "RecoveryPoint", revert to the conventional
424	          RTO recovery and set the congestion window to no more than 2 *
425	          MSS, like a regular TCP would do. Do not enter step 3 of this
426	          algorithm.

428	       b) Else, if the Cumulative Acknowledgement field does not cover
429	          "RecoveryPoint" but is larger than SND.UNA, transmit up to two
430	          new (previously unsent) segments and proceed to step 3.  If
431	          the TCP sender is not able to transmit any previously unsent
432	          data -- either due to receiver window limitation or because it
433	          does not have any new data to send -- the recommended action
434	          is to refrain from entering step 3 of this algorithm.  Rather,
435	          continue with slow start retransmissions following the
436	          conventional RTO recovery algorithm.

438	          It is also possible to apply some of the alternatives for
439	          handling window-limited cases discussed in Appendix A.

441	    3) The next acknowledgment arrives at the sender.  Either a
442	       duplicate ACK or a new cumulative ACK (advancing the window)
443	       applies in this step. Other types of ACKs are ignored without any
444	       action.

446	       a) If the Cumulative Acknowledgement field or a SACK block covers
447	          more than "RecoveryPoint", set the congestion window to no
448	          more than 3 * MSS and proceed with the conventional RTO
449	          recovery, retransmitting unacknowledged segments.  Take this
450	          branch also when the acknowledgment is a duplicate ACK and it
451	          does not acknowledge any new, previously unacknowledged data
452	          below "RecoveryPoint" in the SACK blocks.  Leave
453	          SpuriousRecovery set to FALSE.

455	       b) If the Cumulative Acknowledgement field or a SACK block in the
456	          ACK does not cover more than "RecoveryPoint" AND it
457	          acknowledges data that was not acknowledged earlier (either
458	          with cumulative acknowledgment or using SACK blocks), declare
459	          the timeout spurious and set SpuriousRecovery to SPUR_TO.  The
460	          retransmission timeout can be declared spurious, because the
461	          segment acknowledged with this ACK was transmitted before the
462	          timeout.

464	    If there are unacknowledged holes between the received SACK blocks,
465	    those segments are retransmitted similarly to the conventional SACK
466	    recovery algorithm [BAFW03].  If the algorithm exits with
467	    SpuriousRecovery set to SPUR_TO, "RecoveryPoint" is set to SND.UNA,
468	    thus allowing fast recovery on incoming duplicate acknowledgments.
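
As an illustration only, the decision made in step 3 could be sketched
as follows.  The sketch assumes sequence numbers counted in whole
segments with no wraparound, an ACK field that names the next expected
segment, and SACK blocks given as inclusive (first, last) pairs.  The
function name, its parameters, and the string return values are
hypothetical and are not part of this specification.

```python
def classify_step3_ack(snd_una, recovery_point, sacked, ack, sack_blocks=()):
    """Classify the ACK that arrives in step 3 of SACK-enhanced F-RTO.

    snd_una        -- next segment the sender expects to be acknowledged
    recovery_point -- highest segment transmitted before the timeout
    sacked         -- set of segments already reported in SACK blocks
    ack            -- cumulative ACK field (next segment expected)
    sack_blocks    -- iterable of inclusive (first, last) segment pairs

    Returns "conventional" (branch 3a), "spurious" (branch 3b), or
    "ignore" for ACKs that are neither duplicate nor advancing.
    """
    if ack < snd_una:
        # Old ACK: neither a duplicate ACK nor a new cumulative ACK.
        return "ignore"

    # Highest segment this ACK acknowledges, cumulatively or via SACK.
    highest_acked = max([ack - 1] + [last for _, last in sack_blocks])
    if highest_acked > recovery_point:
        # Branch 3a: data sent after the timeout is acknowledged.
        return "conventional"

    # Segments reported in SACK blocks that were not acknowledged before.
    newly_sacked = {
        seg
        for first, last in sack_blocks
        for seg in range(first, last + 1)
        if seg >= ack and seg not in sacked
    }
    if ack > snd_una or newly_sacked:
        # Branch 3b: new data below RecoveryPoint is acknowledged.
        return "spurious"

    # Duplicate ACK carrying no new information: branch 3a.
    return "conventional"
```

In this sketch a return value of "spurious" corresponds to setting
SpuriousRecovery to SPUR_TO, and "conventional" corresponds to leaving
it set to FALSE.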

470	    The SACK-enhanced algorithm works on the same principle as the basic
471	    algorithm, but utilizes the additional information from the SACK
472	    option. When a genuine retransmission timeout occurs during a steady
473	    state of a connection, it can be assumed that there are no segments
474	    left in the pipe. Otherwise, the acknowledgments triggered by these
475	    segments would have triggered the SACK loss recovery or transmission
476	    of new segments. Therefore, if the F-RTO sender receives
477	    acknowledgements for segments transmitted before the retransmission
478	    timeout in response to the two new segments sent at the algorithm
479	    step 2, the normal operation of TCP has just been delayed, and the
480	    retransmission timeout is considered spurious. Note that this
481	    reasoning works only when the TCP sender is not in loss recovery at
482	    the time the retransmission timeout occurs. The condition in step 1
483	    checking that "RecoveryPoint" is larger than SND.UNA prevents the
484	    execution of the F-RTO algorithm in case a previous loss recovery,
485	    either RTO recovery or SACK loss recovery, is underway when the
486	    retransmission timer expires. It does, however, allow the execution
487	    of the F-RTO algorithm if the retransmission timer expires multiple
488	    times for the same segment.

490	4.  Taking Actions after Detecting Spurious RTO

492	    Upon a retransmission timeout, a conventional TCP sender assumes
493	    that outstanding segments are lost and starts retransmitting the
494	    unacknowledged segments.  When the retransmission timeout is
495	    detected to be spurious, the TCP sender should not continue
496	    retransmitting based on the timeout.  For example, if the sender was
497	    in congestion avoidance phase transmitting new, previously unsent
498	    segments, it should continue transmitting previously unsent segments
499	    in congestion avoidance.

501	    There are currently two alternatives specified for a spurious
502	    timeout response algorithm: the Eifel Response Algorithm [LG04] and
503	    an algorithm for adapting the retransmission timeout after a
504	    spurious RTO [BBA06]. If no specific response algorithm is
505	    implemented, the TCP sender SHOULD respond to a spurious timeout
506	    conservatively, applying the TCP congestion control specification
507	    [APS99]. Different response algorithms for spurious retransmission
508	    timeouts have been analyzed in some research papers [GL03, Sar03]
509	    and IETF documents [SL03].

511	5.  Evaluation of RFC 4138 and Differences to this Document

513	    F-RTO was first specified in Experimental RFC 4138, which has been
514	    implemented in a number of operating systems since it was published.
515	    The experience gained has been documented in a separate document
516	    [KYHS07], and can be summarized as follows.

518	    If the TCP sender employs F-RTO, it is able to detect spurious RTOs
519	    and avoid the unnecessary retransmission of the whole window of
520	    data. Because F-RTO avoids the unnecessary retransmissions after a
521	    spurious RTO, it is able to adhere to the packet conservation
522	    principle, unlike a regular TCP that enters the slow-start recovery
523	    unnecessarily and inappropriately restarts the ACK clock while there
524	    are segments outstanding in the network. When a spurious RTO has
525	    been detected, a sender can select an appropriate congestion control
526	    response instead of setting the congestion window to one segment.
527	    Because F-RTO avoids unnecessary retransmissions, it is able to take
528	    the RTT of the delayed segments into account when calculating the
529	    RTO estimate, which may help in avoiding further spurious
530	    retransmission timeouts.
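
To illustrate the last point, the following sketch applies the RTO
computation of [PA00] (SRTT and RTTVAR with gains 1/8 and 1/4, RTO =
SRTT + max(G, 4 * RTTVAR), rounded up to one second) to an RTT sample
taken from a delayed segment.  The function name, the use of
floating-point seconds, and the default parameter values are
assumptions made for illustration.

```python
ALPHA = 1 / 8   # gain for the smoothed RTT
BETA = 1 / 4    # gain for the RTT variance
K = 4           # variance multiplier


def update_rto(srtt, rttvar, sample, granularity=0.0, min_rto=1.0):
    """Fold one RTT sample (seconds) into the estimator.

    Pass srtt=None and rttvar=None for the first measurement.
    Returns the tuple (srtt, rttvar, rto).
    """
    if srtt is None:
        # First RTT measurement on the connection.
        srtt = sample
        rttvar = sample / 2
    else:
        # RTTVAR is updated first, using the previous SRTT.
        rttvar = (1 - BETA) * rttvar + BETA * abs(srtt - sample)
        srtt = (1 - ALPHA) * srtt + ALPHA * sample
    rto = max(min_rto, srtt + max(granularity, K * rttvar))
    return srtt, rttvar, rto


# A 200 ms path followed by one segment delayed to 3 seconds.
srtt, rttvar, rto = update_rto(None, None, 0.2)
srtt, rttvar, rto = update_rto(srtt, rttvar, 3.0)
```

Because the delayed segment contributes a valid sample, the RTO grows
well beyond its previous value, which makes a second spurious timeout
during the same delay episode less likely.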

532	    Experimental results with the basic F-RTO have been reported in an
533	    emulated network using a Linux implementation [SKR03]. Different
534	    congestion control responses, along with the SACK-enhanced version
535	    of F-RTO, were also tested in a similar environment [Sar03]. There
536	    are publications analyzing F-RTO performance over commercial W-CDMA
537	    networks and in an emulated HSDPA network [Yam05, Hok05].  In
538	    addition, Microsoft reported positive experiences with their
539	    implementation of F-RTO at the IETF-68 meeting.

541	    It is known that some spurious RTOs may remain undetected by F-RTO
542	    if duplicate acknowledgements arrive at the sender immediately after
543	    the spurious RTO, for example due to packet reordering or packet
544	    loss. There are rare corner cases where F-RTO could "hide" a packet
545	    loss and therefore lead to inappropriate behavior with a non-
546	    conservative congestion control response. First, if massive packet
547	    reordering occurred so that the acknowledgement of the RTO
548	    retransmission arrived at the sender before the acknowledgments of
549	    the original transmissions, the sender might not detect the loss of
550	    the segment that triggered the RTO. Second, a malicious receiver
551	    could lead F-RTO to a wrong conclusion after an RTO by acknowledging
552	    segments it has not received. Such a receiver would, however, risk
553	    breaking the consistency of the TCP state between the sender and
554	    receiver, causing the connection to become unusable, which cannot be
555	    of any benefit to the receiver. Therefore, we believe it is not
556	    likely that receivers would start employing such tricks on a
557	    significant scale. Finally, loss of the unnecessary RTO
558	    retransmission cannot be detected without using some explicit
559	    acknowledgement scheme such as DSACK. This is common to the other
560	    mechanisms for detecting spurious RTO, as well as to regular TCP
561	    that does not use DSACK. We note that if the congestion control
562	    response to spurious RTO is conservative enough, the above corner
563	    cases do not cause problems due to increased congestion.

565	6.  Security Considerations

567	    The main security threat regarding F-RTO is the possibility that a
568	    receiver could mislead the sender into setting too large a
569	    congestion window after an RTO.  There are two possible ways a
570	    malicious receiver could trigger a wrong output from the F-RTO
571	    algorithm.  First, the receiver can acknowledge data that it has not
572	    received.  Second, it can delay acknowledgment of a segment it has
573	    received earlier, and acknowledge the segment after the TCP sender
574	    has been deluded into entering algorithm step 3.

576	    If the receiver acknowledges a segment it has not really received,
577	    the sender can be led to declare spurious timeout in the F-RTO
578	    algorithm, step 3.  However, because the sender will have an
579	    incorrect state, it cannot retransmit the segment that has never
580	    reached the receiver.  Therefore, this attack is unlikely to be
581	    useful for the receiver to maliciously gain a larger congestion
582	    window.

584	    A common case for a retransmission timeout is that a fast
585	    retransmission of a segment is lost.  If all other segments have
586	    been received, the RTO retransmission causes the whole window to be
587	    acknowledged at once.  This case is recognized in F-RTO algorithm
588	    branch (2a).  However, if the receiver only acknowledges one segment
589	    after receiving the RTO retransmission, and then the rest of the
590	    segments, it could cause the timeout to be declared spurious when it
591	    is not.  Therefore, it is suggested that, when an RTO expires during
592	    the fast recovery phase, the sender would not fully revert the
593	    congestion window even if the timeout was declared spurious.
594	    Instead, the sender would reduce the congestion window to 1.
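
The suggestion above can be sketched as a small helper.  This is an
illustration under stated assumptions: "1" is read as one segment
(one MSS, in bytes), and fully reverting to the congestion window
that was in effect before the timeout stands in for whichever response
algorithm the sender implements.  The function and parameter names are
hypothetical.

```python
def cwnd_after_spurious_rto(prior_cwnd, mss, rto_during_fast_recovery):
    """Pick the congestion window once a timeout is declared spurious.

    prior_cwnd               -- congestion window before the RTO (bytes)
    mss                      -- maximum segment size (bytes)
    rto_during_fast_recovery -- True if the RTO expired in fast recovery
    """
    if rto_during_fast_recovery:
        # The spurious-timeout signal may have been induced by the
        # receiver, so do not restore the earlier window.
        return mss
    # Assumed response for all other cases: revert to the prior window.
    return prior_cwnd
```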

596	    If there is more than one segment missing at the time of a
597	    retransmission timeout, the receiver does not benefit from
598	    misleading the sender to declare a spurious timeout because the
599	    sender would have to go through another recovery period to
600	    retransmit the missing segments, usually after an RTO has elapsed.

602	7.  IANA Considerations

604	    This document has no actions for IANA.

606	8.  Acknowledgements

608	    The authors would like to thank Alfred Hoenes, Ilpo Jarvinen and
609	    Murari Sridharan for their comments on this document.

611	    We are also thankful to Reiner Ludwig, Andrei Gurtov, Josh Blanton,
612	    Mark Allman, Sally Floyd, Yogesh Swami, Mika Liljeberg, Ivan Arias
613	    Rodriguez, Sourabh Ladha, Martin Duke, Motoharu Miyake, Ted Faber,
614	    Samu Kontinen, and Kostas Pentikousis who gave valuable feedback
615	    during the preparation of RFC 4138, the precursor of this document.

617	Appendix

619	A.  Discussion of Window-Limited Cases

621	    When the advertised window limits the transmission of two new
622	    previously unsent segments, or there are no new data to send, it is
623	    recommended in F-RTO algorithm step (2b) that the TCP sender
624	    continue with the conventional RTO recovery algorithm.  The
625	    disadvantage is that the sender may continue unnecessary
626	    retransmissions due to possible spurious timeout.  This section
627	    briefly discusses the options that can potentially improve
628	    performance when transmitting previously unsent data is not
629	    possible.

631	    - The TCP sender could reserve an unused space of a size of one or
632	      two segments in the advertised window to ensure the use of
633	      algorithms such as F-RTO or Limited Transmit [ABF01] in receiver
634	      window-limited situations.  On the other hand, while doing this,
635	      the TCP sender should ensure that the window of outstanding
636	      segments is large enough for proper utilization of the available
637	      pipe.

639	    - Use additional information if available, e.g., TCP timestamps with
640	      the Eifel Detection algorithm, for detecting a spurious timeout.
641	      However, Eifel detection may yield different results from F-RTO
642	      when ACK losses and an RTO occur within the same round-trip time
643	      [SKR03].

645	    - Retransmit data from the tail of the retransmission queue and
646	      continue with step 3 of the F-RTO algorithm.  It is possible that
647	      the retransmission will be made unnecessarily. Furthermore, the
648	      operation of the SACK-based F-RTO algorithm would need to consider
649	      this case separately, to not use the retransmitted segment to
650	      indicate spurious timeout. Given these considerations, this option
651	      is not recommended.

653	    - Send a zero-sized segment below SND.UNA, similar to a TCP Keep-
654	      Alive probe, and continue with step 3 of the F-RTO algorithm.
655	      Because the receiver replies with a duplicate ACK, the sender is
656	      able to detect whether the timeout was spurious from the incoming
657	      acknowledgment. This method does not send data unnecessarily, but
658	      it delays the recovery by one round-trip time in cases where the
659	      timeout was not spurious.  Therefore, this method is not
660	      encouraged.

662	    - In receiver-limited cases, send one octet of new data, regardless
663	      of the advertised window limit, and continue with step 3 of the F-
664	      RTO algorithm.  It is possible that the receiver will have free
665	      buffer space to receive the data by the time the segment has
666	      propagated through the network, in which case no harm is done.  If
667	      the receiver is not capable of receiving the segment, it rejects
668	      the segment and sends a duplicate ACK.
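
The window-limited condition discussed at the start of this appendix
could be checked along the following lines.  This is a minimal sketch
that ignores sequence number wraparound.  The function and parameter
names are hypothetical, and whether the sender requires room for two
full segments or for any new data before continuing to step 3 is left
to the caller.

```python
def new_data_allowed(snd_una, snd_nxt, rcv_wnd, unsent_bytes, mss):
    """Return how many bytes of previously unsent data may be sent now.

    The result is capped at two segments, the most that F-RTO sends in
    step 2.  A result of 0 means the sender is limited by the
    advertised window or has no new data to send.
    """
    # Space left in the advertised window beyond what is outstanding.
    usable_window = max(0, snd_una + rcv_wnd - snd_nxt)
    return min(unsent_bytes, usable_window, 2 * mss)
```

A sender following the recommendation of step (2b) would continue with
the conventional RTO recovery when this function returns 0.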

670	B.  List of Changes

672	    Changes from RFC 4138 are summarized below, apart from minor editing
673	    and language improvements.

675	    * Modified the basic F-RTO algorithm and SACK-enhanced F-RTO
676	      algorithm to prevent the TCP sender from applying the F-RTO
677	      algorithm if the retransmission timer expires when an earlier RTO
678	      recovery is underway, except when the RTO expires multiple times
679	      for the same segment.

681	    * Clarified behavior on multiple timeouts.

683	    * Added a paragraph on acknowledgements that do not acknowledge new
684	      data but are not duplicate acknowledgements.

686	    * Clarified the SACK-enhanced algorithm, and added one paragraph
687	      describing the basic idea of the algorithm.

689	    * Removed SCTP considerations.

691	    * Removed earlier Appendix sections, except Appendix C from RFC
692	      4138, which is now Appendix A.

694	    * Clarified text about the possible response algorithms.

696	    * Added a section that summarizes the evaluation of RFC 4138.

698	References

700	Normative References

702	    [APS99]   Allman, M., Paxson, V., and W. Stevens, "TCP Congestion
703	              Control",  RFC 2581, April 1999.

705	    [APB08]   Allman, M., Paxson, V., and E. Blanton, "TCP Congestion
706	              Control", Internet-Draft
707	              "draft-ietf-tcpm-rfc2581bis-04.txt",
708	              April 2008.

710	    [BAFW03]  Blanton, E., Allman, M., Fall, K., and L. Wang, "A
711	              Conservative Selective Acknowledgment (SACK)-based Loss
712	              Recovery Algorithm for TCP", RFC 3517, April 2003.

714	    [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate
715	              Requirement Levels", BCP 14, RFC 2119, March 1997.

717	    [FHG04]   Floyd, S., Henderson, T., and A. Gurtov, "The NewReno
718	              Modification to TCP's Fast Recovery Algorithm", RFC 3782,
719	              April 2004.

721	    [MMFR96]  Mathis, M., Mahdavi, J., Floyd, S., and A. Romanow, "TCP
722	              Selective Acknowledgement Options", RFC 2018,
723	              October 1996.

725	    [PA00]    Paxson, V. and M. Allman, "Computing TCP's Retransmission
726	              Timer", RFC 2988, November 2000.

728	    [Pos81]   Postel, J., "Transmission Control Protocol", STD 7, RFC
729	              793, September 1981.

731	Informative References

733	    [ABF01]   Allman, M., Balakrishnan, H., and S. Floyd, "Enhancing
734	              TCP's Loss Recovery Using Limited Transmit", RFC 3042,
735	              January 2001.

737	    [BA04]    Blanton, E. and M. Allman, "Using TCP Duplicate Selective
738	              Acknowledgement (DSACKs) and Stream Control Transmission
739	              Protocol (SCTP) Duplicate Transmission Sequence Numbers
740	              (TSNs) to Detect Spurious Retransmissions", RFC 3708,
741	              February 2004.

743	    [BBA06]   Blanton, J., Blanton, E., and M. Allman, "Using Spurious
744	              Retransmissions to Adapt the Retransmission Timeout",
745	              Internet-Draft "draft-allman-rto-backoff-04.txt", December
746	              2006. Work in progress.

748	    [BBJ92]   Jacobson, V., Braden, R., and D. Borman, "TCP Extensions
749	              for High Performance", RFC 1323, May 1992.

751	    [FMMP00]  Floyd, S., Mahdavi, J., Mathis, M., and M. Podolsky, "An
752	              Extension to the Selective Acknowledgement (SACK) Option
753	              for TCP", RFC 2883, July 2000.

755	    [GL02]    Gurtov, A. and R. Ludwig, "Evaluating the Eifel Algorithm
756	              for TCP in a GPRS Network", In Proc. European Wireless,
757	              Florence, Italy, February 2002.

759	    [GL03]    Gurtov, A. and R. Ludwig, "Responding to Spurious Timeouts
760	              in TCP", In Proc. IEEE INFOCOM 03, San Francisco, CA, USA,
761	              March 2003.

763	    [Jac88]   Jacobson, V., "Congestion Avoidance and Control", In
764	              Proc. ACM SIGCOMM 88.

766	    [Hok05]   Hokamura, A., et al., "Performance Evaluation of F-RTO and
767	              Eifel Response Algorithms over W-CDMA packet network", In
768	              Proc. Wireless Personal Multimedia Communications
769	              (WPMC'05),
770	              September 2005.

772	    [KYHS07]  Kojo, M., Yamamoto, K., Hata, M., and P. Sarolahti,
773	              "Evaluation of RFC 4138", Internet-draft
774	              "draft-kojo-tcpm-frto-eval-00.txt", June 2007. Work
775	              in progress.

777	    [LG04]    Ludwig, R. and A. Gurtov, "The Eifel Response Algorithm
778	              for TCP", RFC 4015, February 2005.

780	    [LK00]    Ludwig, R. and R.H. Katz, "The Eifel Algorithm: Making TCP
781	              Robust Against Spurious Retransmissions", ACM SIGCOMM
782	              Computer Communication Review, 30(1), January 2000.

784	    [LM03]    Ludwig, R. and M. Meyer, "The Eifel Detection Algorithm
785	              for TCP", RFC 3522, April 2003.

787	    [Nag84]   Nagle, J., "Congestion Control in IP/TCP Internetworks",
788	              RFC 896, January 1984.

790	    [SK05]    Sarolahti, P. and M. Kojo, "Forward RTO-Recovery (F-RTO):
791	              An Algorithm for Detecting Spurious Retransmission
792	              Timeouts with TCP and the Stream Control Transmission
793	              Protocol (SCTP)", RFC 4138, August 2005.

795	    [SKR03]   Sarolahti, P., Kojo, M., and K. Raatikainen, "F-RTO: An
796	              Enhanced Recovery Algorithm for TCP Retransmission
797	              Timeouts", ACM SIGCOMM Computer Communication Review,
798	              33(2), April 2003.

800	    [Sar03]   Sarolahti, P., "Congestion Control on Spurious TCP
801	              Retransmission Timeouts", In Proc. of IEEE Globecom
802	              2003, San Francisco, CA, USA. December 2003.

804	    [SL03]    Swami, Y. and K. Le, "DCLOR: De-correlated Loss Recovery
805	              using SACK Option for Spurious Timeouts", Expired
806	              Internet-Draft, September 2003.

808	    [Ste07]   Stewart, R., Ed., "Stream Control Transmission Protocol",
809	              RFC 4960, September 2007.

811	    [Yam05]   Yamamoto, K., et al., "Effects of F-RTO and Eifel Response
812	              Algorithms for W-CDMA and HSDPA networks", In Proc.
813	              Wireless
814	              Personal Multimedia Communications (WPMC'05), September
815	              2005.

817	AUTHORS' ADDRESSES

819	    Pasi Sarolahti
820	    Nokia Research Center
821	    P.O. Box 407
822	    FI-00045 NOKIA GROUP
823	    Finland
824	    Phone: +358 50 4876607
825	    Email: pasi.sarolahti@nokia.com

827	    Markku Kojo
828	    University of Helsinki
829	    P.O. Box 68
830	    FI-00014 UNIVERSITY OF HELSINKI
831	    Finland
832	    Email: kojo@cs.helsinki.fi

834	    Kazunori Yamamoto
835	    NTT Docomo, Inc.
836	    3-5 Hikarinooka, Yokosuka, Kanagawa, 239-8536, Japan
837	    Phone: +81-46-840-3812
838	    Email: yamamotokaz@nttdocomo.co.jp

840	    Max Hata
841	    NTT Docomo, Inc.
842	    3-5 Hikarinooka, Yokosuka, Kanagawa, 239-8536, Japan
843	    Phone: +81-46-840-3812
844	    Email: hatama@s1.nttdocomo.co.jp

846	Full Copyright Statement

848	    Copyright (C) The IETF Trust (2008).

850	    This document is subject to the rights, licenses and restrictions
851	    contained in BCP 78, and except as set forth therein, the authors
852	    retain all their rights.

854	    This document and the information contained herein are provided on
855	    an "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE
856	    REPRESENTS OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY, THE
857	    IETF TRUST AND THE INTERNET ENGINEERING TASK FORCE DISCLAIM ALL
858	    WARRANTIES, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY
859	    WARRANTY THAT THE USE OF THE INFORMATION HEREIN WILL NOT INFRINGE
860	    ANY RIGHTS OR ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS
861	    FOR A PARTICULAR PURPOSE.

863	Intellectual Property

865	    The IETF takes no position regarding the validity or scope of any
866	    Intellectual Property Rights or other rights that might be claimed
867	    to pertain to the implementation or use of the technology described
868	    in this document or the extent to which any license under such
869	    rights might or might not be available; nor does it represent that
870	    it has made any independent effort to identify any such rights.
871	    Information on the procedures with respect to rights in RFC
872	    documents can be found in BCP 78 and BCP 79.

874	    Copies of IPR disclosures made to the IETF Secretariat and any
875	    assurances of licenses to be made available, or the result of an
876	    attempt made to obtain a general license or permission for the use
877	    of such proprietary rights by implementers or users of this
878	    specification can be obtained from the IETF on-line IPR repository
879	    at http://www.ietf.org/ipr.

881	    The IETF invites any interested party to bring to its attention any
882	    copyrights, patents or patent applications, or other proprietary
883	    rights that may cover technology that may be required to implement
884	    this standard.  Please address the information to the IETF at ietf-
885	    ipr@ietf.org.