idnits 2.17.1 

draft-ietf-tcpm-rto-consider-03.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

  == The document has an IETF Trust Provisions of 28 Dec 2009, Section 6.c(i)
     Publication Limitation clause.


  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

  ** The document seems to lack an IANA Considerations section.  (See Section
     2.2 of https://www.ietf.org/id-info/checklist for how to handle the case
     when there are no actions for IANA.)

  ** The abstract seems to contain references ([RFC2119]), which it
     shouldn't.  Please replace those with straight textual mentions of the
     documents in question.


  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == The copyright year in the IETF Trust and authors Copyright Line does not
     match the current year

  -- The document date (April 15, 2016) is 2931 days in the past.  Is this
     intentional?


  Checking references for intended status: Best Current Practice
  ----------------------------------------------------------------------------

     (See RFCs 3967 and 4897 for information about using normative references
     to lower-maturity documents in RFCs)

  == Missing Reference: 'RFC5681' is mentioned on line 301, but not defined

  -- Obsolete informational reference (is this intentional?): RFC 3940
     (Obsoleted by RFC 5740)

  -- Obsolete informational reference (is this intentional?): RFC 4960
     (Obsoleted by RFC 9260)


     Summary: 2 errors (**), 0 flaws (~~), 3 warnings (==), 3 comments (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.

--------------------------------------------------------------------------------

1	Internet Engineering Task Force                                M. Allman
2	INTERNET-DRAFT                                                      ICSI
3	File: draft-ietf-tcpm-rto-consider-03.txt                 April 15, 2016
4	Intended Status: Best Current Practice
5	Expires: October 15, 2016

7	                 Retransmission Timeout Considerations

9	Status of this Memo

11	    This document may not be modified, and derivative works of it may
12	    not be created, except to format it for publication as an RFC or to
13	    translate it into languages other than English.

15	    This Internet-Draft is submitted in full conformance with the
16	    provisions of BCP 78 and BCP 79.  Internet-Drafts are working
17	    documents of the Internet Engineering Task Force (IETF), its areas,
18	    and its working groups. Note that other groups may also distribute
19	    working documents as Internet-Drafts.

21	    Internet-Drafts are draft documents valid for a maximum of six
22	    months and may be updated, replaced, or obsoleted by other documents
23	    at any time. It is inappropriate to use Internet-Drafts as
24	    reference material or to cite them other than as "work in progress."

26	    The list of current Internet-Drafts can be accessed at
27	    http://www.ietf.org/1id-abstracts.html

29	    The list of Internet-Draft Shadow Directories can be accessed at
30	    http://www.ietf.org/shadow.html

32	    This Internet-Draft will expire on October 15, 2016.

34	Copyright Notice

36	    Copyright (c) 2016 IETF Trust and the persons identified as the
37	    document authors. All rights reserved.

39	    This document is subject to BCP 78 and the IETF Trust's Legal
40	    Provisions Relating to IETF Documents
41	    (http://trustee.ietf.org/license-info) in effect on the date of
42	    publication of this document. Please review these documents
43	    carefully, as they describe your rights and restrictions with
44	    respect to this document. Code Components extracted from this
45	    document must include Simplified BSD License text as described in
46	    Section 4.e of the Trust Legal Provisions and are provided without
47	    warranty as described in the Simplified BSD License.

49	Abstract

51	    Each implementation of a retransmission timeout mechanism represents
52	    a balance between correctness and timeliness and therefore no
53	    implementation suits all situations.  This document provides
54	    high-level requirements for retransmission timeout schemes
55	    appropriate for general use in the Internet.  Within the
56	    requirements, implementations have latitude to define particulars
57	    that best address each situation.

59	Terminology

61	    The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
62	    "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
63	    document are to be interpreted as described in BCP 14, RFC 2119
64	    [RFC2119].

66	1   Introduction

68	    Despite our best intentions and most robust mechanisms, reliability
69	    in networking ultimately requires a timeout and re-try mechanism.
70	    Often there are more timely and precise mechanisms than a timeout
71	    for repairing loss (e.g., TCP's fast retransmit [RFC5681], NewReno
72	    [RFC6582] or selective acknowledgment scheme [RFC2018,RFC6675]),
73	    which require information exchange between components in the system.
74	    Such communication cannot be guaranteed.  Alternatively, information
75	    coding---e.g., FEC---can allow the recipient to recover from some
76	    amount of lost information without use of a retransmission.  This
77	    approach provides probabilistic reliability.  Finally, negative
78	    acknowledgment schemes exist that do not depend on continuous
79	    feedback to trigger retransmissions (e.g., [RFC3940]).  However,
80	    regardless of these useful alternatives, the only thing we can truly
81	    depend on is the passage of time and therefore our ultimate backstop
82	    to ensuring reliability is a timeout.  (Note: There is a case when
83	    we cannot count on the passage of time, but in this case we believe
84	    repairing loss will be a moot point and hence we do not further
85	    consider this case in this document.)

87	    Various protocols have defined their own timeout mechanisms (e.g.,
88	    TCP [RFC6298], SCTP [RFC4960], SIP [RFC3261]).  Ideally, if we knew
89	    a segment would be lost before reaching the destination, a second
90	    copy of it would be sent immediately after the first transmission.
91	    However, in reality the specifics of retransmission timeouts often
92	    represent a particular tradeoff between correctness and
93	    responsiveness [AP99].  In other words we want to simultaneously:

95	      - Wait long enough to ensure the decision to retransmit is
96	        correct.

98	      - Bound the delay we impose on applications before
99	        retransmitting.

101	    However, serving both of these goals is difficult as they pull in
102	    opposite directions: towards either (a) withholding needed
103	    retransmissions too long in order to be sure they are truly
104	    needed, or (b) retransmitting too early to aid application
105	    responsiveness and thereby sending spurious ones.  Given this
106	    fundamental tradeoff [AP99], we have found that even though the
107	    retransmission timeout (RTO) procedures are standardized,
108	    implementations often add their own subtle imprint on the specifics
109	    of the process to tilt the tradeoff between correctness and
110	    responsiveness in some particular way.

112	    At this point we recognize that often these specific tweaks are not
113	    crucial for network safety.  Hence, in this document we outline the
114	    high-level requirements that are crucial for any retransmission
115	    timeout scheme to follow.  The intent is to then allow
116	    implementations to instantiate mechanisms that best realize their
117	    specific goals within this framework.  These specific mechanisms
118	    could be standardized by the IETF or ad-hoc, but as long as they
119	    adhere to the requirements given in this document they would be
120	    considered consistent with the standards.

122	    Finally, we note the requirements in this document are applicable to
123	    any protocol that uses a retransmission timeout mechanism.  The
124	    examples and discussion are framed in terms of TCP; however, that is
125	    an artifact of where much of our experience with RTOs comes from and
126	    should not be read as narrowing the scope of the requirements.

128	2   Scope

130	    This document offers high-level requirements based on experience
131	    with retransmission timer algorithms.  However, this document
132	    explicitly does not update or obsolete currently standardized
133	    algorithms nor limit future standardization of specific RTO
134	    mechanisms.  Specifically:

136	    (a) RTO mechanisms that are currently standardized are not updated
137	        or obsoleted by this document.  This holds even in cases where
138	        the existing specification differs from the requirements in this
139	        document (e.g., [RFC3261] uses a smaller initial RTO than this
140	        document specifies).  Existing standard specifications enjoy
141	        their own consensus which this document does not change.

143	    (b) Future standardization efforts that specify RTO mechanisms
144	        SHOULD follow the requirements in this document.  This follows
145	        the definition of "SHOULD" [RFC2119] and is explicitly not a
146	        "MUST".  That is, the requirements in this document hold unless
147	        the community has consensus that specific deviations in a
148	        particular context are warranted.

150	    (c) RTO mechanisms that are not standardized but adhere to the
151	        requirements in the following section are deemed consistent with
152	        the standards.  This includes RTO mechanisms that are deviations
153	        from a specific standardized algorithm, but are still within the
154	        requirements below.

156	    More colloquially we note that each RTO implementation can be placed
157	    into one of the following four categories:

159	    - The implementation precisely follows a standard RTO mechanism
160	      (e.g., [RFC6298]), as well as adhering to the requirements in this
161	      document.

163	      This document represents no change for this situation as such an
164	      implementation is clearly standards compliant.

166	    - The implementation does not precisely follow a standard RTO
167	      mechanism and does not adhere to the requirements in this
168	      document.

170	      This document makes no change to this situation as such an
171	      implementation is clearly not standards compliant.

173	    - The implementation precisely follows a standard RTO mechanism
174	      (e.g., [RFC3261]), but does not precisely adhere to the
175	      requirements in this document.

177	      This document represents no change for this situation as such an
178	      implementation is considered standards compliant by virtue of
179	      precisely implementing a standard mechanism that has community
180	      consensus as a reasonable approach.  That is, this document's
181	      stance is to not limit the community's ability to make exceptions
182	      to the requirements herein for particular cases.

184	    - The implementation does not precisely follow a standard RTO
185	      mechanism, yet does adhere to the requirements in this document.

187	      This document represents a change for these implementations and
188	      considers them to be consistent with the standards by virtue of
189	      following the requirements herein that provide for an RTO safe for
190	      operation in the Internet.
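
	    The four categories above reduce to a decision on two inputs.  As
	    an informal illustration only, they can be sketched as below.  The
	    function name and the shape of the return value are assumptions of
	    this sketch and are not defined by this document.

```python
def standards_status(follows_standard_rto, meets_requirements):
    """Classify an RTO implementation per the four categories above.

    Returns (consistent, changed): `consistent` says whether the
    implementation is considered consistent with the standards, and
    `changed` says whether this document alters that status.
    Hypothetical helper, for illustration only.
    """
    consistent = follows_standard_rto or meets_requirements
    # Only non-standard mechanisms that meet the requirements gain a
    # new status from this document (the fourth category).
    changed = meets_requirements and not follows_standard_rto
    return consistent, changed
```

	    For example, standards_status(False, True) corresponds to the
	    fourth category: consistent, and newly so because of this document.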

192	    In other words, the requirements in this document can be viewed as
193	    specifying the default properties of an RTO mechanism.
194	    Specifications can more concretely nail down specifics within these
195	    defaults or work outside the defaults as necessary.  However,
196	    implementations that fall within the defaults do not require
197	    explicit specifications to be considered consistent with the
198	    standards.

200	3   Requirements

202	    We now list the requirements that SHOULD apply when designing
203	    retransmission timeout (RTO) mechanisms.

205	    (1) In the absence of any knowledge about the latency of a path, the
206	        RTO MUST be conservatively set to no less than 1 second.

208	        This requirement ensures two important aspects of the RTO.
209	        First, when transmitting into an unknown network,
210	        retransmissions will not be sent before an ACK would reasonably
211	        be expected to arrive and hence possibly waste scarce network
212	        resources.  Second, as noted below, sometimes retransmissions
213	        can lead to ambiguities in assessing the latency of a network
214	        path.  Therefore, it is especially important for the first
215	        latency sample to be free of ambiguities such that there is a
216	        baseline for the remainder of the communication.

218	        The specific constant (1 second) comes from the analysis of
219	        Internet RTTs found in Appendix A of [RFC6298].
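
	        A minimal sketch of requirement (1) follows.  The function and
	        constant names are hypothetical; only the 1 second floor comes
	        from the requirement itself.

```python
INITIAL_RTO_FLOOR = 1.0  # seconds; the floor stated in requirement (1)


def initial_rto(configured=None):
    """Return the RTO to use before any path latency is known.

    `configured` is an optional, implementation-chosen starting value
    (an assumption of this sketch); it is never allowed below the floor.
    """
    if configured is None:
        return INITIAL_RTO_FLOOR
    return max(configured, INITIAL_RTO_FLOOR)
```

	        A configured value below the floor, such as 0.2 seconds, is
	        raised to 1 second; larger values are left unchanged.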

221	    (2) We specify three requirements that pertain to the sampling of
222	        the latency across a path.

224	        Often measuring the latency is framed as assessing the
225	        round-trip time (RTT)---e.g., in TCP's RTO computation
226	        specification [RFC6298].  This is somewhat misleading as the
227	        latency is better framed as the "feedback time" (FT).  In other
228	        words, it is not simply a network property, but the length of
229	        time before a sender should reasonably expect a response to a
230	        query.

232	        For instance, consider a DNS request from a client to a
233	        resolver.  When the request can be served from the resolver's
234	        cache the FT likely well approximates the network RTT between
235	        the client and resolver.  However, on a cache miss the resolver
236	        will have to request the needed information from authoritative
237	        DNS servers, which will non-trivially increase the FT.  In this
238	        case the FT between the client and resolver does not closely
239	        match the network-based RTT between the two hosts.

241	        (a) In steady state the RTO MUST be set based on recent
242	            observations of both the FT and the variance of the FT.

244	            In other words, the RTO should be based on a reasonable
245	            amount of time that the sender should wait for an
246	            acknowledgment of the data before retransmitting the given
247	            data.

249	        (b) FT observations MUST be taken regularly.

251	            The exact definition of "regularly" is deliberately left
252	            vague.  TCP takes an FT sample roughly once per RTT or, if
253	            using the timestamp option [RFC7323], on each acknowledgment
254	            arrival.  [AP99] shows that both these approaches result in
255	            roughly equivalent performance for the RTO estimator.
256	            Additionally, [AP99] shows that taking only a single FT
257	            sample per TCP connection is suboptimal and hence the
258	            requirement that the FT be sampled continuously throughout
259	            the lifetime of a connection.  For the purpose of this
260	            requirement, we state that FT samples SHOULD be taken at
261	            least once per RTT or as frequently as data is exchanged and
262	            ACKed if that happens less frequently than every RTT.
263	            However, we also recognize that it may not always be
264	            practical to take an FT sample this often in all cases.
265	            Hence, this once-per-RTT sampling requirement is explicitly
266	            a "SHOULD" and not a "MUST".

268	        (c) FT samples used in the computation of the RTO MUST NOT be
269	            ambiguous.

271	            Assume two copies of some segment X are transmitted at times
272	            t0 and t1 and then segment X is acknowledged at time t2.  In
273	            some cases, it is not clear which copy of X triggered the
274	            ACK and hence the actual FT is either t2-t1 or t2-t0, but
275	            which one is unknown.  Therefore, in this situation an
276	            implementation MUST use Karn's algorithm [KP87,RFC6298] and
277	            use neither version of the FT sample and hence not update
278	            the RTO.

280	            There are cases where two copies of some data are
281	            transmitted in a way whereby the sender can tell which is
282	            being acknowledged by an incoming ACK.  E.g., TCP's
283	            timestamp option [RFC7323] allows for segments to be
284	            uniquely identified and hence avoid the ambiguity.  In such
285	            cases there is no ambiguity and the resulting samples can
286	            update the RTO.
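
	        One way to satisfy requirements (2)(a) and (2)(c) together is
	        sketched below.  It is modeled on the estimator in [RFC6298]
	        and is not the only mechanism that meets the requirements.
	        The gains (1/8 and 1/4) and the multiplier (4) are the
	        [RFC6298] values; the class, method, and attribute names are
	        assumptions of this sketch.  The clock granularity term and
	        any minimum RTO applied after the first sample are omitted
	        for brevity.

```python
class FeedbackTimeEstimator:
    """EWMA-based RTO estimator in the style of RFC 6298 (illustrative)."""

    ALPHA = 1 / 8  # gain for the smoothed FT (RFC 6298 value)
    BETA = 1 / 4   # gain for the FT variation (RFC 6298 value)
    K = 4          # multiplier applied to the variation (RFC 6298 value)

    def __init__(self, initial_rto=1.0):
        self.sft = None         # smoothed feedback time, in seconds
        self.ftvar = None       # smoothed FT variation, in seconds
        self.rto = initial_rto  # used until the first valid sample

    def on_ack(self, ft_sample, retransmitted, disambiguated=False):
        """Feed one FT sample; return True if it updated the RTO."""
        # Karn's algorithm: ignore samples for retransmitted data unless
        # something (e.g., a timestamp echo) identifies which copy was
        # acknowledged.
        if retransmitted and not disambiguated:
            return False
        if self.sft is None:
            # First valid sample seeds both estimates.
            self.sft = ft_sample
            self.ftvar = ft_sample / 2
        else:
            # Update the variation first, using the old smoothed FT.
            deviation = abs(self.sft - ft_sample)
            self.ftvar = (1 - self.BETA) * self.ftvar + self.BETA * deviation
            self.sft = (1 - self.ALPHA) * self.sft + self.ALPHA * ft_sample
        self.rto = self.sft + self.K * self.ftvar
        return True
```

	        An ambiguous sample leaves the RTO untouched, while a sample
	        that can be attributed to a specific transmission updates both
	        the smoothed FT and its variation.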

288	    (3) Each time the RTO fires and causes a retransmission the value of
289	        the RTO MUST be exponentially backed off such that the next
290	        firing requires a longer interval.  The backoff may be removed
291	        after the successful transmission of non-retransmitted data.

293	        A maximum value MAY be placed on the RTO provided it is at least
294	        60 seconds (as in [RFC6298]).

296	        This ensures network safety.
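
	        The backoff and the optional cap can be sketched as follows.
	        The doubling factor follows [RFC6298] and is an assumption
	        here, since the requirement asks only for exponential backoff.
	        The function and constant names are hypothetical.

```python
MIN_ALLOWED_CAP = 60.0  # seconds; a cap, if used, must be at least this


def backed_off_rto(rto, max_rto=None):
    """Return the RTO to arm after a timeout-driven retransmission.

    `max_rto` is an optional upper bound on the RTO; None means no cap.
    """
    if max_rto is not None and max_rto < MIN_ALLOWED_CAP:
        raise ValueError("an RTO cap below 60 seconds is not permitted")
    doubled = rto * 2  # factor of 2 as in RFC 6298 (assumed here)
    return doubled if max_rto is None else min(doubled, max_rto)
```

	        Starting from 1 second, three consecutive timeouts yield RTOs
	        of 2, 4, and 8 seconds; with a 60 second cap, a 40 second RTO
	        backs off to 60 seconds rather than 80.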

298	    (4) Retransmission timeouts MUST be taken as indications of
299	        congestion in the network and the sending rate adapted using a
300	        standard mechanism (e.g., TCP collapses the congestion window to
301	        one segment [RFC5681]).

303	        This ensures network safety.

305	        An exception is made to this rule if an IETF standardized
306	        mechanism is used to determine that a particular loss is due to
307	        a non-congestion event (e.g., packet corruption).  In such a
308	        case a congestion control action is not required.  Additionally,
309	        RTO-triggered congestion control actions may be reversed when a
310	        standard mechanism determines that the cause of the loss was not
311	        congestion after all.
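
	        As an illustration of a standard response, the TCP behavior
	        cited above can be sketched as below, following [RFC5681].
	        The function name is hypothetical and values are in bytes.
	        Other protocols would substitute their own standard mechanism,
	        and the exception for non-congestion losses is not modeled.

```python
def on_rto_expiry(flight_size, smss):
    """Return (ssthresh, cwnd) in bytes after a retransmission timeout.

    flight_size: bytes sent but not yet acknowledged
    smss: sender maximum segment size
    """
    # RFC 5681, equation (4): halve the flight size, but keep at least
    # two segments.
    ssthresh = max(flight_size // 2, 2 * smss)
    # The loss window is one full-sized segment.
    cwnd = smss
    return ssthresh, cwnd
```

	        With ten 1460 byte segments in flight, the timeout leaves the
	        sender with a one-segment window and a slow start threshold of
	        five segments.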

313	4   Discussion

315	    We note that research has shown the tension between the
316	    responsiveness and correctness of retransmission timeouts seems to
317	    be a fundamental tradeoff [AP99].  That is, making the RTO more
318	    aggressive (e.g., via changing TCP's EWMA gains, lowering the
319	    minimum RTO, etc.) can reduce the time spent waiting on needed
320	    retransmissions.  However, at the same time, such aggressiveness
321	    leads to more needless retransmissions.  Therefore, being as
322	    aggressive as the requirements given in the previous section allow
323	    in any particular situation may not be the best course of action
324	    because an RTO expiration carries a requirement to slow down.

326	    While the tradeoff between responsiveness and correctness seems
327	    fundamental, the tradeoff can be made less relevant if the sender
328	    can detect and recover from spurious RTOs.  Several mechanisms have
329	    been proposed for this purpose, such as Eifel [RFC3522], F-RTO
330	    [RFC5682] and DSACK [RFC2883,RFC3708].  Using such mechanisms may
331	    allow a data originator to tip towards being more responsive without
332	    incurring (as much of) the attendant costs of needless retransmits.

334	    Also note that, in addition to the experiments discussed in [AP99],
335	    the Linux TCP implementation has been using various non-standard RTO
336	    mechanisms for many years seemingly without large-scale problems
337	    (e.g., using different EWMA gains).  Further, a number of
338	    implementations use minimum RTOs that are less than the 1 second
339	    specified in [RFC6298].  While the implication of these deviations
340	    from the standard may be more spurious retransmits (per [AP99]), we
341	    are aware of no large-scale problems caused by this change to the
342	    minimum RTO.

344	    Finally, we note that while allowing implementations to be more
345	    aggressive may in fact increase the number of needless
346	    retransmissions, the above requirements fail safe in that they
347	    insist on exponential backoff of the RTO and a transmission rate
348	    reduction.  Therefore, allowing implementers latitude in their
349	    instantiations of an RTO mechanism does not open the flood gates to
350	    aggressive behavior.  Since there is a downside to being aggressive,
351	    the incentives for proper behavior are retained in the mechanism.

353	5   Security Considerations

355	    This document does not alter the security properties of
356	    retransmission timeout mechanisms.  See [RFC6298] for a discussion
357	    of these within the context of TCP.

359	Acknowledgments

361	    This document benefits from years of discussions with Ethan Blanton,
362	    Sally Floyd, Jana Iyengar, Shawn Ostermann, Vern Paxson, and the
363	    members of the TCPM and TCP-IMPL working groups.  Ran Atkinson,
364	    Yuchung Cheng, Jonathan Looney and Michael Scharf provided useful
365	    comments on a previous version of this draft.

367	Normative References

369	    [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate
370	        Requirement Levels", BCP 14, RFC 2119, March 1997.

372	Informative References

374	    [AP99] Allman, M., V. Paxson, "On Estimating End-to-End Network Path
375	        Properties", Proceedings of the ACM SIGCOMM Technical Symposium,
376	        September 1999.

378	    [KP87] Karn, P. and C. Partridge, "Improving Round-Trip Time
379	        Estimates in Reliable Transport Protocols", SIGCOMM 87.

381	    [RFC2018] Mathis, M., Mahdavi, J., Floyd, S., and A. Romanow, "TCP
382	        Selective Acknowledgment Options", RFC 2018, October 1996.

384	    [RFC2883] Floyd, S., Mahdavi, J., Mathis, M., and M. Podolsky, "An
385	        Extension to the Selective Acknowledgement (SACK) Option for
386	        TCP", RFC 2883, July 2000.

388	    [RFC3261] Rosenberg, J., Schulzrinne, H., Camarillo, G., Johnston,
389	        A., Peterson, J., Sparks, R., Handley, M., and E. Schooler,
390	        "SIP: Session Initiation Protocol", RFC 3261, June 2002.

392	    [RFC3522] Ludwig, R., M. Meyer, "The Eifel Detection Algorithm for
393	        TCP", RFC 3522, April 2003.

395	    [RFC3708] Blanton, E., M. Allman, "Using TCP Duplicate Selective
396	        Acknowledgement (DSACKs) and Stream Control Transmission
397	        Protocol (SCTP) Duplicate Transmission Sequence Numbers (TSNs)
398	        to Detect Spurious Retransmissions", RFC 3708, February 2004.

400	    [RFC3940] Adamson, B., C. Bormann, M. Handley, J. Macker,
401	        "Negative-acknowledgment (NACK)-Oriented Reliable Multicast
402	        (NORM) Protocol", RFC 3940, November 2004.

404	    [RFC4960] Stewart, R., "Stream Control Transmission Protocol", RFC
405	        4960, September 2007.

	    [RFC5681] Allman, M., V. Paxson, E. Blanton, "TCP Congestion
	        Control", RFC 5681, September 2009.

407	    [RFC5682] Sarolahti, P., M. Kojo, K. Yamamoto, M. Hata, "Forward
408	        RTO-Recovery (F-RTO): An Algorithm for Detecting Spurious
409	        Retransmission Timeouts with TCP", RFC 5682, September 2009.

411	    [RFC6298] Paxson, V., M. Allman, H.K. Chu, M. Sargent, "Computing
412	        TCP's Retransmission Timer", RFC 6298, June 2011.

414	    [RFC6582] Henderson, T., S. Floyd, A. Gurtov, Y. Nishida, "The
415	        NewReno Modification to TCP's Fast Recovery Algorithm", RFC
416	        6582, April 2012.

418	    [RFC6675] Blanton, E., M. Allman, L. Wang, I. Jarvinen, M. Kojo,
419	        Y. Nishida, "A Conservative Loss Recovery Algorithm Based on
420	        Selective Acknowledgment (SACK) for TCP", RFC 6675, August 2012.

422	    [RFC7323] Borman, D., B. Braden, V. Jacobson, R. Scheffenegger, "TCP
423	        Extensions for High Performance", RFC 7323, September 2014.

425	Authors' Addresses

427	   Mark Allman
428	   International Computer Science Institute
429	   1947 Center St.  Suite 600
430	   Berkeley, CA  94704
431	   EMail: mallman@icir.org
432	   http://www.icir.org/mallman