idnits 2.17.1 

draft-allman-rto-backoff-05.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

  ** It looks like you're using RFC 3978 boilerplate.  You should update this
     to the boilerplate described in the IETF Trust License Policy document
     (see https://trustee.ietf.org/license-info), which is required now.

  -- Found old boilerplate from RFC 3978, Section 5.1 on line 17.

  -- Found old boilerplate from RFC 3978, Section 5.5, updated by RFC 4748 on
     line 375.

  -- Found old boilerplate from RFC 3979, Section 5, paragraph 1 on line 351.

  -- Found old boilerplate from RFC 3979, Section 5, paragraph 2 on line 358.

  -- Found old boilerplate from RFC 3979, Section 5, paragraph 3 on line 364.


  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

  == No 'Intended status' indicated for this document; assuming Proposed
     Standard


  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

  ** There is 1 instance of too long lines in the document, the longest one
     being 1 character in excess of 72.

  ** The abstract seems to contain references ([RFC2119], [RFC2988]), which
     it shouldn't.  Please replace those with straight textual mentions of the
     documents in question.


  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == The copyright year in the IETF Trust Copyright Line does not match the
     current year

  -- The document seems to lack a disclaimer for pre-RFC5378 work, but may
     have content which was first submitted before 10 November 2008.  If you
     have contacted all the original authors and they are all willing to grant
     the BCP78 rights to the IETF Trust, then this is fine, and you can ignore
     this comment.  If not, you may need to add the pre-RFC5378 disclaimer. 
     (See the Legal Provisions document at
     https://trustee.ietf.org/license-info for more information.)

  -- The document date (July 2007) is 6127 days in the past.  Is this
     intentional?


  Checking references for intended status: Proposed Standard
  ----------------------------------------------------------------------------

     (See RFCs 3967 and 4897 for information about using normative references
     to lower-maturity documents in RFCs)

  == Missing Reference: 'RFC1323' is mentioned on line 95, but not defined

  ** Obsolete undefined reference: RFC 1323 (Obsoleted by RFC 7323)

  ** Obsolete normative reference: RFC 2988 (Obsoleted by RFC 6298)

  ** Downref: Normative reference to an Experimental RFC: RFC 3522

  ** Downref: Normative reference to an Experimental RFC: RFC 3708

  ** Downref: Normative reference to an Experimental RFC: RFC 4138

  -- Obsolete informational reference (is this intentional?): RFC 1323 (ref.
     'Flo98') (Obsoleted by RFC 7323)

  -- Obsolete informational reference (is this intentional?): RFC 3517
     (Obsoleted by RFC 6675)

  -- Obsolete informational reference (is this intentional?): RFC 3782
     (Obsoleted by RFC 6582)


     Summary: 8 errors (**), 0 flaws (~~), 3 warnings (==), 10 comments (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.

--------------------------------------------------------------------------------

1	Internet Engineering Task Force                             Josh Blanton
2	INTERNET DRAFT                                           Ohio University
3	draft-allman-rto-backoff-05.txt                            Ethan Blanton
4	Expires: January 2008                                  Purdue University
5	                                                             Mark Allman
6	                                                               ICIR/ICSI
7	                                                               July 2007

9	   Using Spurious Retransmissions to Adapt the Retransmission Timeout
10	                    draft-allman-rto-backoff-05.txt

12	Status of this Memo

14	    By submitting this Internet-Draft, each author represents that any
15	    applicable patent or other IPR claims of which he or she is aware
16	    have been or will be disclosed, and any of which he or she becomes
17	    aware will be disclosed, in accordance with Section 6 of BCP 79.

19	    Internet-Drafts are working documents of the Internet Engineering
20	    Task Force (IETF), its areas, and its working groups.  Note that
21	    other groups may also distribute working documents as
22	    Internet-Drafts.

24	    Internet-Drafts are draft documents valid for a maximum of six
25	    months and may be updated, replaced, or obsoleted by other documents
26	    at any time.  It is inappropriate to use Internet-Drafts as
27	    reference material or to cite them other than as "work in progress."

29	    The list of current Internet-Drafts can be accessed at
30	    http://www.ietf.org/ietf/1id-abstracts.txt.

32	    The list of Internet-Draft Shadow Directories can be accessed at
33	    http://www.ietf.org/shadow.html.

35	Copyright Notice

37	    Copyright (C) The IETF Trust (2007).

39	Abstract

41	    This document describes a method for using spurious retransmission
42	    timeouts as the trigger for slightly changing the way TCP's
43	    retransmission timeout is computed in an effort to avoid subsequent
44	    unnecessary retransmissions.

46	Terminology

48	    The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL
49	    NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and
50	    "OPTIONAL" in this document are to be interpreted as described
51	    in [RFC2119].

53	    The reader is expected to be familiar with the algorithm and
54	    terminology from [RFC2988].

56	1.  Introduction

58	    Various studies have shown that the retransmission timeout (RTO)
59	    estimator in [RFC2988] can trigger spurious retransmissions.  [AP99]
60	    shows that such unnecessary retransmissions are generally fairly
61	    rare.  However, [LK00] shows that in some networks (e.g., wireless
62	    networks) spurious retransmissions are more problematic due to
63	    occasional delay spikes that are not well predicted by TCP's RTO
64	    estimator.  In this document we outline one possible approach to
65	    mitigate the impact of premature RTO firings by altering the RTO
66	    estimator specified in [RFC2988].

68	    Several methods for detecting spurious timeouts have been developed
69	    [RFC3522,RFC3708,RFC4138].  Additionally, [RFC4015] outlines one
70	    possible response to detecting spurious timeouts.  This document
71	    outlines an alternative to [RFC4015].  In general terms, [RFC4015]
72	    specifies two actions upon the detection of an unnecessary RTO-based
73	    retransmission.  First, the sending rate prior to the spurious
74	    retransmission is restored.  Furthermore, the RTO is adapted by
75	    re-initializing the RTO estimator with the long round-trip time
76	    (RTT) measurement that caused the spurious RTO.  The approach given
77	    in [RFC4015] is reasonable if the underlying cause of the problem is
78	    a shift in the path RTT.  For instance, if the route a TCP
79	    connection is traversing changes and the new path's RTT is
80	    significantly longer than the previous path's RTT then simply
81	    re-initializing the RTO is a reasonable action.

83	    As specified in the next section this document takes a slightly
84	    different approach than [RFC4015].  Generally, this document uses
85	    the failure of the RTO to wait long enough before triggering a
86	    retransmit as an indication that the RTO estimator itself is not
87	    properly capturing the variance present in the RTTs experienced by
88	    the TCP connection.  Therefore, this document calls for an
89	    additive contribution to the variance component in the RTO
90	    estimator upon the detection of spurious retransmission timeouts,
91	    so that the RTO can absorb similar delay spikes.  This change
92	    represents a preference to try to avoid future spurious timeouts
93	    rather than simply reacting to each spurious retransmission.

95	    We note that TCP implementations using the RTTM mechanism [RFC1323]
96	    to assess the RTT multiple times per RTT with the standard
97	    exponentially-weighted moving average (EWMA) gains from [RFC2988]
98	    retain less RTT history than when taking one RTT measurement per RTT.
99	    [AP99] shows that "fast" EWMAs yield more spurious retransmissions
100	    than when using the standard gains with one RTT sample per RTT.
101	    Therefore, an orthogonal change to TCP implementations that use RTTM
102	    that may prevent spurious RTOs is to set the EWMA gains based on the
103	    number of RTT samples taken per RTT such that the amount of history
104	    kept, in terms of time, is the same regardless of the number of RTT
105	    samples taken [Flo98,LS00].
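
As an illustration of the gain adjustment described above, the
following Python sketch shows one possible way to scale an EWMA gain
by the number of RTT samples taken per RTT.  The scaling rule and the
name scaled_gain are assumptions made for this example; they are not
taken from this document, [Flo98] or [LS00].  The rule is chosen so
that the weight left on old history after one RTT's worth of updates
equals the weight left by a single update with the standard gain.

```python
def scaled_gain(gain: float, samples_per_rtt: int) -> float:
    """Return a per-sample EWMA gain (hypothetical scaling rule).

    After `samples_per_rtt` updates with the returned gain, old history
    is weighted by (1 - gain), the same as after one update with the
    unscaled gain.
    """
    if samples_per_rtt < 1:
        raise ValueError("samples_per_rtt must be at least 1")
    return 1.0 - (1.0 - gain) ** (1.0 / samples_per_rtt)


# Standard RFC 2988 gains, scaled for a sender taking 8 samples per RTT.
alpha_8 = scaled_gain(1 / 8, 8)
beta_8 = scaled_gain(1 / 4, 8)
```

With one sample per RTT the function returns the standard gain
unchanged.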

107	2.  Parameter Changes

109	    As the basis for the changes proposed below, a TCP MUST support an
110	    IETF-specified spurious timeout detection method.  Currently,
111	    [RFC3522], [RFC3708] and [RFC4138] are such detection methods.  We
112	    note that the research literature includes alternate methods for
113	    detecting spurious retransmissions, e.g., the "retransmit bit"
114	    [LK00], but these schemes MUST NOT be used as part of the changes
115	    specified in this document until such time that the IETF approves a
116	    specification of these schemes.

118	    We also note that [RFC2988] explicitly allows for an RTO estimator
119	    that is more conservative than that given in [RFC2988] (which this
120	    document specifies).

122	    Also we note that, given that the TCP is savvy enough to untangle
123	    needed and unneeded retransmission timeouts, the TCP does not need
124	    to use Karn's algorithm [KP87,RFC2988] and can accurately determine
125	    the RTT that causes spurious retransmissions.

127	    This document specifies that a TCP MAY change the RTO estimator
128	    given in [RFC2988] upon detection of a spurious timeout, as follows.

130	    The general idea behind the mechanism is to introduce an additive
131	    variance term, V, in addition to the multiplier K which is applied
132	    to RTTVAR in the RTO calculation given in step (2.3) of [RFC2988],
133	    to allow for additional variance in the path's RTT.  The specific
134	    mechanism for TCPs using this change is:

136	    (A) A TCP using this method MUST replace the calculation of RTO in
137	        step (2.3) of [RFC2988] with:

139	          RTO <- SRTT + max(G, K*RTTVAR) + V                       (1)

141	        to include the additional variance term.

143	    (B) When a TCP connection is initiated, V is set to 0.

145	    (C) Upon the first expiration of the retransmission timer for a
146	        given sequence number, the values of SRTT and RTTVAR MUST be
147	        saved as SRTT_prev and RTTVAR_prev, respectively.

149	    (D) Upon detecting that a previous RTO-based retransmission was
150	        spurious, a TCP MUST calculate a V' using the RTT sample
151	        R', which is the time between when the original transmission of
152	        the given segment was sent and when that original
153	        transmission is acknowledged, as follows:

155	          V' = R' - (SRTT_prev + max(G, K*RTTVAR_prev))           (2)

157	        V' then becomes the difference between the previously
158	        calculated RTO and the RTO value which would have prevented
159	        the spurious retransmission.

161	        The value of V' MUST NOT be reduced for the remainder of the
162	        connection (as discussed in more detail below).

164	    (E) The values of SRTT and RTTVAR in use when the spurious
165	        retransmit occurred MUST replace the current values:

167	          SRTT = SRTT_prev                                         (3)
168	          RTTVAR = RTTVAR_prev                                     (4)

170	    (F) The R' RTT sample MUST be used to adjust SRTT and RTTVAR and
171	        therefore the RTO, per [RFC2988].
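
Steps (A) through (F) can be sketched in Python as follows.  This is
a minimal illustration under stated assumptions, not a reference
implementation.  The class and method names, the default value of G,
and the use of seconds are hypothetical.  The sketch reads V' as R'
minus the previously calculated RTO, per the prose of step (D), and
assumes V is set to the larger of its current value and V' so that it
never shrinks.  The RFC 2988 rounding of the RTO up to 1 second and
the exponential timer backoff are omitted for brevity.

```python
from dataclasses import dataclass
from typing import Optional

K = 4          # RTTVAR multiplier from RFC 2988
ALPHA = 1 / 8  # SRTT gain from RFC 2988
BETA = 1 / 4   # RTTVAR gain from RFC 2988


@dataclass
class RtoEstimator:
    """Hypothetical holder for the estimator state, in seconds."""
    g: float = 0.1                      # assumed clock granularity G
    srtt: Optional[float] = None
    rttvar: Optional[float] = None
    v: float = 0.0                      # step (B): V starts at 0
    srtt_prev: Optional[float] = None
    rttvar_prev: Optional[float] = None

    def rto(self) -> float:
        # Step (A): RTO <- SRTT + max(G, K*RTTVAR) + V
        return self.srtt + max(self.g, K * self.rttvar) + self.v

    def on_rtt_sample(self, r: float) -> None:
        # Standard RFC 2988 update (steps 2.2 and 2.3).
        if self.srtt is None:
            self.srtt, self.rttvar = r, r / 2
        else:
            self.rttvar = (1 - BETA) * self.rttvar + BETA * abs(self.srtt - r)
            self.srtt = (1 - ALPHA) * self.srtt + ALPHA * r

    def on_first_timeout(self) -> None:
        # Step (C): save the state in use when the timer first expired.
        self.srtt_prev, self.rttvar_prev = self.srtt, self.rttvar

    def on_spurious_timeout(self, r_prime: float) -> None:
        # Step (D): V' is how far the old RTO fell short of R'.
        old_rto = self.srtt_prev + max(self.g, K * self.rttvar_prev)
        v_prime = r_prime - old_rto
        # Assumption: V only grows, so keep the larger value.
        self.v = max(self.v, v_prime)
        # Step (E): restore the saved SRTT and RTTVAR.
        self.srtt, self.rttvar = self.srtt_prev, self.rttvar_prev
        # Step (F): feed R' through the normal update.
        self.on_rtt_sample(r_prime)


est = RtoEstimator()
est.on_rtt_sample(1.0)         # first RTT measurement of 1 second
rto_before = est.rto()
est.on_first_timeout()         # the timer expires ...
est.on_spurious_timeout(5.0)   # ... but the original was ACKed after 5 s
rto_after = est.rto()
```

In this usage the RTO before the timeout is 3 seconds, so a 5 second
R' yields a V' of 2 seconds, which is then added on top of the
updated SRTT and RTTVAR terms.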

173	    The actual V that is used in the RTO calculation is determined by
174	    the size of the congestion window.  When a TCP has only a small
175	    number of outstanding segments, advanced loss recovery that relies
176	    on the receipt of three duplicate acknowledgments as a recovery
177	    trigger is not as effective as when the congestion window is larger.
178	    Therefore, TCP relies more heavily on the RTO in this regime.
179	    Furthermore, the impact caused by spurious timeouts in this
180	    situation---in terms of congestion window reduction and resource
181	    wastage by go-back-N transmission---is small.  Hence, when the
182	    congestion window is less than or equal to 4*SMSS bytes then a
183	    V of 0 SHOULD be used when calculating the RTO.  Once the congestion
184	    window size grows beyond 4*SMSS bytes, the calculated value of V
185	    SHOULD be used in the calculation of the RTO.
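
The congestion window rule above can be sketched as a small Python
helper.  The function name effective_v and its parameters are
hypothetical and serve only to illustrate the 4*SMSS threshold.

```python
def effective_v(cwnd_bytes: int, smss_bytes: int, v: float) -> float:
    """Return the V to use in equation (1) for the current window.

    A V of 0 is used while the congestion window is at most 4*SMSS
    bytes; the calculated V is used once the window grows beyond that.
    """
    if cwnd_bytes <= 4 * smss_bytes:
        return 0.0
    return v


small_window = effective_v(4 * 1460, 1460, 2.0)   # at the threshold
large_window = effective_v(10 * 1460, 1460, 2.0)  # beyond the threshold
```

Note that the stored V is left untouched in the small-window case;
only the value applied to the RTO calculation changes.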

187	    This specification explicitly offers no way to reduce V after it
188	    has been inflated.  V is never reduced because the presence of
189	    spurious timeouts which inflated V indicates that the standard
190	    estimator is inadequate for accurately estimating the variance of
191	    the RTT across the network path and therefore reducing V would
192	    increase the chances of further spurious retransmissions.

194	    Finally, we note that bounding V' is not advisable.  Say V' would be
195	    set to 20 via equation (2).  If V' were, instead, bounded by 10 then
196	    legitimate RTOs would be forced to wait longer without offering
197	    solid protection against delay spikes (given that delay spikes that
198	    a V' of 10 will not handle have been observed).
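
The bounding argument can be checked with a few lines of Python.  The
base RTO of 3 and the helper name timeout_is_spurious are assumptions
made for this example, and the values are in arbitrary time units.

```python
def timeout_is_spurious(rto: float, rtt: float) -> bool:
    """A timer that expires before the ACK arrives fires spuriously."""
    return rto < rtt


base_rto = 3.0                # hypothetical SRTT + max(G, K*RTTVAR)
spike_rtt = base_rto + 20.0   # the delay spike that produced V' = 20

uncapped_rto = base_rto + 20.0   # V' used as calculated
capped_rto = base_rto + 10.0     # V' bounded by 10

repeat_with_cap = timeout_is_spurious(capped_rto, spike_rtt)
repeat_without_cap = timeout_is_spurious(uncapped_rto, spike_rtt)
```

With the bound in place, a repeat of the same spike still triggers a
spurious timeout, while every legitimate RTO waits 10 units longer
than the unmodified estimator would.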

200	3.  Advantages

202	    The advantage of tuning the RTO calculation to be more conservative
203	    after detecting spurious RTO-based retransmissions is in preventing
204	    further spurious RTOs.  In addition, spurious RTOs can cause
205	    go-back-N behavior [LK00] which can also be avoided by adapting the
206	    RTO to be more conservative.

208	4.  Disadvantages

210	    The disadvantage of tuning the RTO calculation to be more
211	    conservative is that legitimate RTO firings take longer and could
212	    hurt performance.  However, an important note is that the RTO should
213	    not be TCP's primary loss recovery strategy.  [RFC3782] and
214	    [RFC3517] provide methods for TCP to effectively repair multiple
215	    lost segments from a single window of data without falling back to
216	    using the RTO.  Further, research shows that these changes are
217	    widely implemented [MAF05].  Therefore, making TCP's RTO calculation
218	    more conservative should not hinder performance under normal
219	    circumstances.  Put differently, when using advanced loss recovery
220	    techniques the firing of the RTO should be an indication that the
221	    congestion situation in the network is fairly bad.  In this case, it
222	    may well be that making the RTO estimator more conservative is the
223	    right general approach.

225	    The common exception to the above argument is when the congestion
226	    window is small, such that these advanced loss recovery algorithms
227	    do not work effectively.  The mechanism in this document explicitly
228	    takes this case into account by not using the more conservative RTO
229	    estimate when the congestion window is small.

231	5.  Summary

233	    This document specifies a small change that makes the RTO
234	    calculation given in [RFC2988] more conservative upon the detection
235	    of spurious RTO-based retransmissions.  The root cause of spurious
236	    retransmits is an inaccurate assessment of the network conditions
237	    (in this case, of the RTT).  Therefore, we tackle this by making the
238	    RTO calculation take into account an additional variance term.
239	    While this does lengthen the time required for legitimate
240	    retransmissions to fire, the RTO should not be TCP's primary means
241	    for retransmitting data and therefore this lengthened interval
242	    should only minimally impact overall performance and should only
243	    come into play when conditions along the network path have
244	    deteriorated significantly.  Finally, we note that this document
245	    makes the estimator given in [RFC2988] strictly more conservative
246	    and is therefore allowed via [RFC2988].

248	6.  Security Considerations

250	    This document calls for a simple parameter tweak and does not change
251	    the security considerations given in [RFC2988].

253	7.  IANA Considerations

255	    None.

257	Acknowledgments

259	    This document has benefited from discussions with Ted Faber, Aaron
260	    Falk, Joseph Ishac, Janardhan Iyengar, Sally Floyd, Vern Paxson and
261	    Joe Touch.

263	Normative References

265	    [RFC2119] S. Bradner.  Key words for use in RFCs to Indicate
266	        Requirement Levels, March 1997.  BCP 14, RFC 2119.

268	    [RFC2988] V. Paxson, M. Allman.  Computing TCP's Retransmission
269	        Timer, November 2000.  RFC 2988.

271	    [RFC3522] R. Ludwig, M. Meyer.  The Eifel Detection Algorithm for
272	        TCP, April 2003.  RFC 3522.

274	    [RFC3708] E. Blanton, M. Allman.  Using TCP Duplicate Selective
275	        Acknowledgement (DSACKs) and Stream Control Transmission
276	        Protocol (SCTP) Duplicate Transmission Sequence Numbers (TSNs)
277	        to Detect Spurious Retransmissions, February 2004.  RFC 3708.

279	    [RFC4138] P. Sarolahti, M. Kojo.  Forward RTO-Recovery (F-RTO): An
280	        Algorithm for Detecting Spurious Retransmission Timeouts with
281	        TCP and the Stream Control Transmission Protocol (SCTP), August
282	        2005.  RFC 4138.

284	Informative References

286	    [AP99] Mark Allman, Vern Paxson. On Estimating End-to-End Network
287	        Path Properties. ACM SIGCOMM, September 1999.

289	    [Flo98] Sally Floyd.  Comments on RFC1323.bis, TCP-LW mailing list,
290	        May 1998.

292	    [KP87] Phil Karn, Craig Partridge.  Improving Round-Trip Time
293	        Estimates in Reliable Transport Protocols.  ACM SIGCOMM, August
294	        1987.

296	    [LK00] R. Ludwig, R. H. Katz.  The Eifel Algorithm: Making TCP
297	        Robust Against Spurious Retransmissions.  ACM Computer
298	        Communication Review, 30(1), January 2000.

300	    [LS00] R. Ludwig, K. Sklower, The Eifel Retransmission Timer, ACM
301	        Computer Communication Review, Vol. 30, No. 3, July 2000.

303	    [MAF05] A. Medina, M. Allman, S. Floyd.  Measuring the Evolution of
304	        Transport Protocols in the Internet. ACM Computer Communication
305	        Review, 35(2), April 2005.

307	    [RFC3517] E. Blanton, M. Allman, K. Fall, L. Wang.  A Conservative
308	        Selective Acknowledgment (SACK)-based Loss Recovery Algorithm
309	        for TCP, April 2003.  RFC 3517.

311	    [RFC3782] S. Floyd, T. Henderson, A. Gurtov.  The NewReno
312	        Modification to TCP's Fast Recovery Algorithm, April 2004.  RFC
313	        3782.

315	    [RFC4015] R. Ludwig, A. Gurtov.  The Eifel Response Algorithm for
316	        TCP, February 2005.  RFC 4015.

318	Authors' Addresses

320	    Josh Blanton
321	    Ohio University Internetworking Research Group
322	    301 Stocker Center
323	    Athens, OH  45701
324	    Email: jblanton@cs.ohiou.edu
325	    URL: http://irg.cs.ohiou.edu/~jblanton/

327	    Ethan Blanton
328	    Purdue University Computer Sciences
329	    305 North University Street
330	    West Lafayette, IN  47907
331	    Email: eblanton@cs.purdue.edu
332	    URL: http://www.cs.purdue.edu/homes/eblanton/

334	    Mark Allman
335	    ICSI Center for Internet Research
336	    1947 Center Street, Suite 600
337	    Berkeley, CA 94704-1198
338	    Phone: (440) 235-1792
339	    Email: mallman@icir.org
340	    URL: http://www.icir.org/mallman/

342	Intellectual Property Statement

344	    The IETF takes no position regarding the validity or scope of any
345	    Intellectual Property Rights or other rights that might be claimed
346	    to pertain to the implementation or use of the technology described
347	    in this document or the extent to which any license under such
348	    rights might or might not be available; nor does it represent that
349	    it has made any independent effort to identify any such rights.
350	    Information on the procedures with respect to rights in RFC
351	    documents can be found in BCP 78 and BCP 79.

353	    Copies of IPR disclosures made to the IETF Secretariat and any
354	    assurances of licenses to be made available, or the result of an
355	    attempt made to obtain a general license or permission for the use
356	    of such proprietary rights by implementers or users of this
357	    specification can be obtained from the IETF on-line IPR repository
358	    at http://www.ietf.org/ipr.

360	    The IETF invites any interested party to bring to its attention any
361	    copyrights, patents or patent applications, or other proprietary
362	    rights that may cover technology that may be required to implement
363	    this standard.  Please address the information to the IETF at
364	    ietf-ipr@ietf.org.

366	Disclaimer of Validity

368	    This document and the information contained herein are provided on
369	    an "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE
370	    REPRESENTS OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY, THE
371	    IETF TRUST AND THE INTERNET ENGINEERING TASK FORCE DISCLAIM ALL
372	    WARRANTIES, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY
373	    WARRANTY THAT THE USE OF THE INFORMATION HEREIN WILL NOT INFRINGE
374	    ANY RIGHTS OR ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS
375	    FOR A PARTICULAR PURPOSE.

377	Copyright Statement

379	    Copyright (C) The IETF Trust (2007).  This document is subject
380	    to the rights, licenses and restrictions contained in BCP 78, and
381	    except as set forth therein, the authors retain all their rights.

383	Acknowledgment

385	    Funding for the RFC Editor function is currently provided by the
386	    Internet Society.