idnits 2.17.1 

draft-ietf-tcpm-rto-consider-04.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

  == The document has an IETF Trust Provisions of 28 Dec 2009, Section 6.c(i)
     Publication Limitation clause.


  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

  ** The document seems to lack an IANA Considerations section.  (See Section
     2.2 of https://www.ietf.org/id-info/checklist for how to handle the case
     when there are no actions for IANA.)

  ** The abstract seems to contain references ([RFC2119]), which it
     shouldn't.  Please replace those with straight textual mentions of the
     documents in question.


  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == The copyright year in the IETF Trust and authors Copyright Line does not
     match the current year

  -- The document date (June 15, 2016) is 2866 days in the past.  Is this
     intentional?


  Checking references for intended status: Best Current Practice
  ----------------------------------------------------------------------------

     (See RFCs 3967 and 4897 for information about using normative references
     to lower-maturity documents in RFCs)

  == Missing Reference: 'RFC5681' is mentioned on line 357, but not defined

  == Unused Reference: 'RFC3940' is defined on line 465, but no explicit
     reference was found in the text

  == Unused Reference: 'RFC6582' is defined on line 490, but no explicit
     reference was found in the text

  -- Obsolete informational reference (is this intentional?): RFC 2140
     (Obsoleted by RFC 9040)

  -- Obsolete informational reference (is this intentional?): RFC 3940
     (Obsoleted by RFC 5740)

  -- Obsolete informational reference (is this intentional?): RFC 4960
     (Obsoleted by RFC 9260)


     Summary: 2 errors (**), 0 flaws (~~), 5 warnings (==), 4 comments (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.

--------------------------------------------------------------------------------

1	Internet Engineering Task Force                                M. Allman
2	INTERNET-DRAFT                                                      ICSI
3	File: draft-ietf-tcpm-rto-consider-04.txt                  June 15, 2016
4	Intended Status: Best Current Practice
5	Expires: December 15, 2016

7	                  Retransmission Timeout Requirements

9	Status of this Memo

11	    This document may not be modified, and derivative works of it may
12	    not be created, except to format it for publication as an RFC or to
13	    translate it into languages other than English.

15	    This Internet-Draft is submitted in full conformance with the
16	    provisions of BCP 78 and BCP 79.  Internet-Drafts are working
17	    documents of the Internet Engineering Task Force (IETF), its areas,
18	    and its working groups. Note that other groups may also distribute
19	    working documents as Internet-Drafts.

21	    Internet-Drafts are draft documents valid for a maximum of six
22	    months and may be updated, replaced, or obsoleted by other documents
23	    at any time. It is inappropriate to use Internet-Drafts as
24	    reference material or to cite them other than as "work in progress."

26	    The list of current Internet-Drafts can be accessed at
27	    http://www.ietf.org/1id-abstracts.html

29	    The list of Internet-Draft Shadow Directories can be accessed at
30	    http://www.ietf.org/shadow.html

32	    This Internet-Draft will expire on December 15, 2016.

34	Copyright Notice

36	    Copyright (c) 2016 IETF Trust and the persons identified as the
37	    document authors. All rights reserved.

39	    This document is subject to BCP 78 and the IETF Trust's Legal
40	    Provisions Relating to IETF Documents
41	    (http://trustee.ietf.org/license-info) in effect on the date of
42	    publication of this document. Please review these documents
43	    carefully, as they describe your rights and restrictions with
44	    respect to this document. Code Components extracted from this
45	    document must include Simplified BSD License text as described in
46	    Section 4.e of the Trust Legal Provisions and are provided without
47	    warranty as described in the Simplified BSD License.

49	Abstract

51	    Ensuring reliable communication often manifests in a timeout and
52	    retry mechanism.  Each implementation of a retransmission timeout
53	    mechanism represents a balance between correctness and timeliness
54	    and therefore no implementation suits all situations.  This document
55	    provides high-level requirements for retransmission timeout schemes
56	    appropriate for general use in the Internet.  Within the
57	    requirements, implementations have latitude to define particulars
58	    that best address each situation.

60	Terminology

62	    The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
63	    "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
64	    document are to be interpreted as described in BCP 14, RFC 2119
65	    [RFC2119].

67	1   Introduction

69	    Reliable transmission is a key property for many network protocols
70	    and applications.  Our protocols use various mechanisms to achieve
71	    reliable data transmission.  Often we use continuous or periodic
72	    reports from the recipient to inform the sender's notion of which
73	    pieces of data are missing and need to be retransmitted to ensure
74	    reliability.  Alternatively, information coding---e.g., FEC---can be
75	    used to achieve probabilistic reliability without retransmissions.
76	    However, despite our best intentions and most robust mechanisms, the
77	    only thing we can truly depend on is the passage of time and
78	    therefore our ultimate backstop to ensuring reliability is a timeout
79	    and re-try mechanism.  That is, the sender sets some expectation for
80	    how long to wait for confirmation of delivery for a given piece of
81	    data.  When this time period passes without delivery confirmation
82	    the sender assumes the data was lost in transit and therefore
83	    schedules a retransmission.  This process of ensuring reliability
84	    via time-based loss detection and resending lost data is commonly
85	    referred to as a "retransmission timeout (RTO)" mechanism.

87	    Various protocols have defined their own RTO mechanisms (e.g., TCP
88	    [RFC6298], SCTP [RFC4960], SIP [RFC3261]).  The specifics of
89	    retransmission timeouts often represent a particular tradeoff
90	    between correctness and responsiveness [AP99].  In other words we
91	    want to simultaneously:

93	      - wait long enough to ensure the detection of loss is correct and
94	        therefore a retransmission is in fact needed, and

96	      - bound the delay we impose on applications before repairing
97	        loss.

99	    Serving both of these goals is difficult as they pull in opposite
100	    directions.  I.e., towards either (a) withholding needed
101	    retransmissions too long to ensure the original transmission is
102	    truly lost or (b) not waiting long enough to help application
103	    responsiveness and hence sending unnecessary (often denoted
104	    "spurious") retransmissions.  We have found that even though the RTO
105	    procedure is standardized for some protocols (e.g., TCP [RFC6298]),
106	    implementations often add their own subtle imprint on the specifics
107	    of the process to tilt the tradeoff between correctness and
108	    responsiveness in some particular way.

110	    At this point we recognize that often these specific tweaks that
111	    deviate from standardized RTO mechanisms do not materially impact
112	    network safety.  Therefore, in this document we outline a set of
113	    high-level protocol-agnostic requirements for RTO mechanisms that
114	    provide for network safety.  The intent is to provide a safe
115	    foundation on which implementations have the flexibility to
116	    instantiate mechanisms that best realize their specific goals.

118	2   Scope

120	    The principles we outline in this document are protocol-agnostic and
121	    widely applicable.  We make the following scope statements about
122	    the application of the requirements discussed in Section 3:

124	    (S.1) The requirements in this document apply only to timer-based
125	          loss detection and retransmission.

127	          While there are a bevy of uses for timers in protocols---from
128	          rate-based pacing to connection failure detection to making
129	          congestion control decisions and beyond---these are outside
130	          the scope of this document.

132	    (S.2) The requirements in this document only apply to cases where
133	          loss detected via a timer is repaired by a retransmission of
134	          the original data.

136	          Other cases are certainly possible---e.g., replacing the lost
137	          data with an updated version---but fall outside the scope of
138	          this document.

140	    (S.3) The requirements in this document apply only to endpoint-to-
141	          endpoint unicast communication.  Reliable multicast (e.g.,
142	          [RFC5740]) protocols are explicitly outside the scope of this
143	          document.

145	          Protocols such as SCTP [RFC4960] and MP-TCP [RFC6182] that
146	          communicate in a unicast fashion with multiple specific
147	          endpoints can leverage the requirements in this document
148	          provided they track state and follow the requirements for each
149	          endpoint independently.  I.e., if host A communicates with
150	          hosts B and C, A must use independent RTOs for traffic sent to
151	          B and C.

153	    (S.4) There are cases where state is shared across connections or
154	          flows (e.g., [RFC2140], [RFC3124]).  The RTO is one piece of
155	          state that is often discussed as sharable.  These situations
156	          raise issues that the simple flow-oriented RTO mechanism
157	          discussed in this document does not consider (e.g., how long
158	          to preserve state between connections).  Therefore, while the
159	          general principles given in Section 3 are likely applicable,
160	          sharing RTOs across flows is outside the scope of this
161	          document.

163	    (S.5) The requirements in this document apply to reliable
164	          transmission, but do not assume that all data transmitted
165	          within a connection or flow is reliably sent.

167	          E.g., a protocol like DCCP [RFC4340] could leverage the
168	          requirements in this document for the initial reliable
169	          handshake even though the protocol reverts to unreliable
170	          transmission after the handshake.

172	          E.g., a protocol like SCTP [RFC4960] could leverage the
173	          requirements for data that is sent only "partially reliably".
174	          In this case, the protocol uses two phases for each message.
175	          In the first phase, the protocol attempts to ensure
176	          reliability and can leverage the requirements in this
177	          document.  At some point the value of the data is gone and the
178	          protocol transitions to the second phase where the data is
179	          treated as unreliably transmitted and therefore the protocol
180	          will no longer attempt to repair the loss---and hence there
181	          are no more retransmissions and the requirements in this
182	          document are moot.

184	    (S.6) The requirements for RTO mechanisms in this document can be
185	          applied regardless of whether the RTO mechanism is the sole
186	          loss repair strategy or works in concert with other
187	          mechanisms.

189	          E.g., for a simple protocol like UDP-based DNS [] a timeout
190	          and re-try mechanism is likely to act alone to ensure
191	          reliability.

193	          E.g., within a complex protocol like TCP or SCTP we have
194	          designed methods to detect and repair loss based on explicit
195	          endpoint state sharing [RFC2018,RFC4960,RFC6675].  These
196	          mechanisms are preferred over the RTO as they are often more
197	          timely and precise than the coarse-grained RTO.  In these
198	          cases, the RTO becomes a last resort when the more advanced
199	          mechanisms fail.

201	    Additionally, the following statements detail the relationship of
202	    the requirements in this document to other specifications and
203	    implementations:

205	    (R.1) RTO mechanisms that are currently standardized are not updated
206	          or obsoleted by this document.  Implementations are free to
207	          use these existing specifications as they do now.

209	          This holds even in cases where the existing specification
210	          differs from the requirements in this document (e.g.,
211	          [RFC3261] uses a smaller initial timeout than this document
212	          specifies).  Existing standard specifications enjoy their own
213	          consensus which this document does not change.

215	    (R.2) Future standardization efforts that specify RTO mechanisms
216	          SHOULD follow the requirements in this document.

218	          There may be reasons for future RTO mechanisms to deviate from
219	          the requirements in Section 3.  In these cases, we expect only
220	          that the standards process does so after reasonable
221	          deliberation and with good reason.

223	    (R.3) Alternatively, future RTO mechanism implementations may be
224	          made directly against the requirements in Section 3 without
225	          another protocol-specific specification.

227	    (R.4) There will no doubt be cases where applying the requirements
228	          in this document directly is not possible due to the structure
229	          or operation of a protocol.  For instance, a case where a
230	          timeout is used to detect loss, but the loss is not repaired
231	          with a direct retransmission of the original data.  In these
232	          situations, an alternate specification is required.  We
233	          encourage such future efforts to leverage the spirit of the
234	          requirements in this document to inform alternate
235	          specifications.

237	3   Requirements

239	    We now list the requirements that apply when designing
240	    retransmission timeout (RTO) mechanisms.

242	    (1) In the absence of any knowledge about the latency of a path, the
243	        RTO MUST be conservatively set to no less than 1 second.

245	        This requirement ensures two important aspects of the RTO.
246	        First, when transmitting into an unknown network,
247	        retransmissions will not be sent before an ACK would reasonably
248	        be expected to arrive and hence possibly waste scarce network
249	        resources.  Second, as noted below, sometimes retransmissions
250	        can lead to ambiguities in assessing the latency of a network
251	        path.  Therefore, it is especially important for the first
252	        latency sample to be free of ambiguities such that there is a
253	        baseline for the remainder of the communication.

255	        The specific constant (1 second) comes from the analysis of
256	        Internet RTTs found in Appendix A of [RFC6298].

258	    (2) As we note above, loss detection happens when a sender does not
259	        receive delivery confirmation within some expected period of
260	        time.  We now specify four requirements that pertain to setting
261	        the length of this expectation.

263	        Often measuring the time required for delivery confirmation is
264	        framed as the round-trip time (RTT) of the network path as
265	        this is the minimum amount of time required to receive delivery
266	        confirmation and also often follows protocol behavior whereby
267	        acknowledgments are generated quickly after data arrives.  For
268	        instance, this is the case for the RTO used by TCP [RFC6298] and
269	        SCTP [RFC4960].  However, this is somewhat misleading as the
270	        expected latency is better framed as the "feedback time" (FT).

272	        In other words, the expectation is not always simply a network
273	        property, but includes additional time before a sender should
274	        reasonably expect a response to a query.

276	        For instance, consider a UDP-based DNS request from a client to
277	        a resolver.  When the request can be served from the resolver's
278	        cache the FT likely well approximates the network RTT between
279	        the client and resolver.  However, on a cache miss the resolver
280	        will have to request the needed information from authoritative
281	        DNS servers, which will non-trivially increase the FT compared
282	        to the RTT between the client and resolver.

284	        (a) In steady state the RTO MUST be set based on recent
285	            observations of both the FT and the variance of the FT.

287	            In other words, the RTO should be based on a reasonable
288	            amount of time that the sender should wait for delivery
289	            confirmation before retransmitting the given data.

291	        (b) FT observations MUST be taken regularly.

293	            Internet measurements show that taking only a single FT
294	            sample per TCP connection results in a relatively poorly
295	            performing RTO mechanism [AP99], hence the requirement that
296	            the FT be sampled continuously throughout the lifetime of a
297	            connection.

299	            TCP takes an FT sample roughly once per RTT, or if using the
300	            timestamp option [RFC7323] on each acknowledgment arrival.
301	            [AP99] shows that both these approaches result in roughly
302	            equivalent performance for the RTO estimator.

304	            Therefore, "regularly" SHOULD be defined as at least once
305	            per RTT or as frequently as data is exchanged in cases where
306	            that happens less frequently than once per RTT.  However, we
307	            also recognize that it may not always be practical to take
308	            an FT sample this often in all cases.  Hence, this
309	            once-per-RTT definition of "regularly" is explicitly a
310	            "SHOULD" and not a "MUST".

312	        (c) FT observations MAY be taken from non-data exchanges.

314	            Some protocols use keepalives, heartbeats or other messages
315	            to exchange control information.  To the extent that the
316	            latency of these transactions mirrors data exchange, they
317	            can be leveraged to take FT samples within the RTO
318	            mechanism.  Such samples can help protocols keep their RTO
319	            accurate during lulls in data transmission.  However, given
320	            that these messages may not be subject to the same delays as
321	            data transmission, we do not take a general view on whether
322	            this is useful or not.

324	        (d) An RTO mechanism MUST NOT use ambiguous FT samples.

326	            Assume two copies of some segment X are transmitted at times
327	            t0 and t1 and then at time t2 the sender receives
328	            confirmation that X in fact arrived.  In some cases, it is
329	            not clear which copy of X triggered the confirmation and
330	            hence the actual FT is either t2-t1 or t2-t0, but which is a
331	            mystery.  Therefore, in this situation an implementation
332	            MUST use Karn's algorithm [KP87,RFC6298] and use neither
333	            version of the FT sample and hence not update the RTO.

335	            There are cases where two copies of some data are
336	            transmitted in a way whereby the sender can tell which is
337	            being acknowledged by an incoming ACK.  E.g., TCP's
338	            timestamp option [RFC7323] allows for segments to be
339	            uniquely identified and hence avoid the ambiguity.  In such
340	            cases there is no ambiguity and the resulting samples can
341	            update the RTO.
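
Requirements (1), (2)(a) and (2)(d) can be illustrated with a minimal
sketch of an FT-driven estimator.  The smoothing gains (1/8 and 1/4) and
the variance multiplier (4) are borrowed from [RFC6298] purely as an
assumption; this document does not mandate them.  The class and method
names are hypothetical, and no steady-state minimum RTO or clock
granularity term is applied.

```python
class FeedbackTimeRto:
    """Illustrative RTO estimator driven by feedback-time (FT) samples."""

    INITIAL_RTO = 1.0  # seconds; used until the first FT sample, per (1)
    ALPHA = 1 / 8      # smoothing gain (assumption, taken from RFC 6298)
    BETA = 1 / 4       # variance gain (assumption, taken from RFC 6298)
    K = 4              # variance multiplier (assumption, from RFC 6298)

    def __init__(self):
        self.smoothed_ft = None
        self.ft_variance = None
        self.rto = self.INITIAL_RTO

    def on_confirmation(self, sent_times, now):
        """Update the RTO when delivery confirmation arrives at `now`.

        `sent_times` lists the transmission time of every copy of the
        data that this confirmation could be acknowledging.
        """
        if len(sent_times) != 1:
            # Ambiguous sample: discard it and leave the RTO unchanged,
            # as Karn's algorithm requires in (2)(d).
            return self.rto
        sample = now - sent_times[0]
        if self.smoothed_ft is None:
            # First sample seeds both the mean and the variance.
            self.smoothed_ft = sample
            self.ft_variance = sample / 2
        else:
            # Update the variance first, using the old smoothed value.
            deviation = abs(self.smoothed_ft - sample)
            self.ft_variance = ((1 - self.BETA) * self.ft_variance
                                + self.BETA * deviation)
            self.smoothed_ft = ((1 - self.ALPHA) * self.smoothed_ft
                                + self.ALPHA * sample)
        # The RTO reflects both the FT and its variance, per (2)(a).
        self.rto = self.smoothed_ft + self.K * self.ft_variance
        return self.rto
```

A caller would invoke on_confirmation() for each delivery confirmation,
passing a single send time when the confirmation is unambiguous (e.g.,
when a timestamp identifies the copy) and all candidate send times
otherwise.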

343	    (3) Each time the RTO detects a loss and a retransmission is
344	        scheduled, the value of the RTO MUST be exponentially backed off
345	        such that the next firing requires a longer interval.  The
346	        backoff SHOULD be removed after the successful repair of the
347	        lost data and subsequent transmission of non-retransmitted data.

349	        A maximum value MAY be placed on the RTO.  The maximum RTO MUST
350	        NOT be less than 60 seconds (a la [RFC6298]).

352	        This ensures network safety.
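
As a rough sketch of requirement (3), the backoff and the optional cap
might be tracked as follows.  The doubling factor is an assumption (the
requirement says only "exponentially"), and the class and method names
are hypothetical.

```python
MAX_RTO_FLOOR = 60.0  # seconds; a configured maximum may not be lower


class BackoffTimer:
    """Illustrative exponential backoff for an RTO, per requirement (3)."""

    def __init__(self, base_rto, max_rto=None):
        if max_rto is not None and max_rto < MAX_RTO_FLOOR:
            raise ValueError("maximum RTO must not be less than 60 seconds")
        self.base_rto = base_rto
        self.max_rto = max_rto
        self.backoff_exponent = 0

    def current_rto(self):
        # Doubling per timeout is an assumption of this sketch.
        rto = self.base_rto * (2 ** self.backoff_exponent)
        return rto if self.max_rto is None else min(rto, self.max_rto)

    def on_timeout(self):
        """The RTO fired and a retransmission is scheduled: back off."""
        self.backoff_exponent += 1
        return self.current_rto()

    def on_repair_and_new_data(self):
        """Loss repaired and new data sent: remove the backoff."""
        self.backoff_exponent = 0
```

With a base RTO of 1 second and a 60 second cap, successive timeouts
yield 2, 4, 8, 16 and 32 seconds, and then 60 seconds thereafter.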

354	    (4) Retransmissions triggered by the RTO mechanism MUST be taken as
355	        indications of network congestion and the sending rate adapted
356	        using a standard mechanism (e.g., TCP collapses the congestion
357	        window to one segment [RFC5681]).

359	        This ensures network safety.

361	        An exception could be made to this rule if an IETF standardized
362	        mechanism is used to determine that a particular loss is due to
363	        a non-congestion event (e.g., packet corruption).  In such a
364	        case a congestion control action is not required.  Additionally,
365	        RTO-triggered congestion control actions may be reversed when a
366	        standard mechanism determines that the cause of the loss was not
367	        congestion after all (e.g., [RFC5682]).
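
The TCP example in requirement (4) can be sketched as below, assuming
the [RFC5681] response to a timeout: the slow start threshold is set to
half the outstanding data (but at least two segments) and the congestion
window collapses to one segment.  The function and parameter names are
hypothetical, and other protocols may use a different standard
mechanism.

```python
def on_rto_retransmission(flight_size, smss):
    """Return (cwnd, ssthresh) in bytes after an RTO-triggered resend.

    flight_size: bytes sent but not yet acknowledged.
    smss: sender maximum segment size in bytes.
    """
    # Remember half of the outstanding data as the new threshold,
    # but never less than two segments.
    ssthresh = max(flight_size // 2, 2 * smss)
    # Collapse the congestion window to a single segment.
    cwnd = smss
    return cwnd, ssthresh
```

For example, with 14600 bytes in flight and a 1460 byte segment size the
sender would resume with a window of 1460 bytes and a threshold of 7300
bytes.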

369	4   Discussion

371	    We note that research has shown the tension between the
372	    responsiveness and correctness of retransmission timeouts seems to
373	    be a fundamental tradeoff in the context of TCP [AP99].  That is,
374	    making the RTO more aggressive (e.g., via changing TCP's EWMA gains,
375	    lowering the minimum RTO, etc.) can reduce the time spent waiting on
376	    needed retransmissions.  However, at the same time, such
377	    aggressiveness leads to more needless retransmissions.  Therefore,
378	    being as aggressive as the requirements given in the previous
379	    section allow in any particular situation may not be the best course
380	    of action because an RTO expiration carries a requirement to invoke
381	    a congestion response and hence slow transmission down.

383	    While the tradeoff between responsiveness and correctness seems
384	    fundamental, the tradeoff can be made less relevant if the sender
385	    can detect and recover from spurious RTOs.  Several mechanisms have
386	    been proposed for this purpose, such as Eifel [RFC3522], F-RTO
387	    [RFC5682] and DSACK [RFC2883,RFC3708].  Using such mechanisms may
388	    allow a data originator to tip towards being more responsive without
389	    incurring (as much of) the attendant costs of needless retransmits.

391	    Also note that, in addition to the experiments discussed in [AP99],
392	    the Linux TCP implementation has been using various non-standard RTO
393	    mechanisms for many years seemingly without large scale problems
394	    (e.g., using different EWMA gains than specified in [RFC6298]).
395	    Further, a number of implementations use minimum RTOs that are less
396	    than the 1 second specified in [RFC6298].  While the implication of
397	    these deviations from the standard may be more spurious retransmits
398	    (per [AP99]), we are aware of no large scale problems caused by this
399	    change to the minimum RTO.

401	    Finally, we note that while allowing implementations to be more
402	    aggressive may in fact increase the number of needless
403	    retransmissions, the above requirements fail safe in that they insist
404	    on exponential backoff of the RTO and a transmission rate reduction.
405	    Therefore, providing implementers more latitude than they have
406	    traditionally been given in IETF specifications of RTO mechanisms
407	    does not somehow open the flood gates to aggressive behavior.  Since
408	    there is a downside to being aggressive the incentives for proper
409	    behavior are retained in the mechanism.

411	5   Security Considerations

413	    This document does not alter the security properties of
414	    retransmission timeout mechanisms.  See [RFC6298] for a discussion
415	    of these within the context of TCP.

417	Acknowledgments

419	    This document benefits from years of discussions with Ethan Blanton,
420	    Sally Floyd, Jana Iyengar, Shawn Ostermann, Vern Paxson, and the
421	    members of the TCPM and TCP-IMPL working groups.  Ran Atkinson,
422	    Yuchung Cheng, David Black, Gorry Fairhurst, Jonathan Looney and
423	    Michael Scharf provided useful comments on a previous version of
424	    this draft.

426	Normative References

428	    [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate
429	        Requirement Levels", BCP 14, RFC 2119, March 1997.

431	Informative References

433	    [AP99] Allman, M., V. Paxson, "On Estimating End-to-End Network Path
434	        Properties", Proceedings of the ACM SIGCOMM Technical Symposium,
435	        September 1999.

437	    [KP87] Karn, P. and C. Partridge, "Improving Round-Trip Time
438	        Estimates in Reliable Transport Protocols", SIGCOMM 87.

440	    [RFC2018] Mathis, M., Mahdavi, J., Floyd, S., and A. Romanow, "TCP
441	        Selective Acknowledgment Options", RFC 2018, October 1996.

443	    [RFC2140] Touch, J., "TCP Control Block Interdependence", RFC 2140,
444	        April 1997.

446	    [RFC2883] Floyd, S., Mahdavi, J., Mathis, M., and M. Podolsky, "An
447	        Extension to the Selective Acknowledgement (SACK) Option for
448	        TCP", RFC 2883, July 2000.

450	    [RFC3124] Balakrishnan, H., S. Seshan, "The Congestion Manager", RFC
451	        3124, June 2001.

453	    [RFC3261] Rosenberg, J., Schulzrinne, H., Camarillo, G., Johnston,
454	        A., Peterson, J., Sparks, R., Handley, M., and E. Schooler,
455	        "SIP: Session Initiation Protocol", RFC 3261, June 2002.

457	    [RFC3522] Ludwig, R., M. Meyer, "The Eifel Detection Algorithm for
458	        TCP", RFC 3522, April 2003.

460	    [RFC3708] Blanton, E., M. Allman, "Using TCP Duplicate Selective
461	        Acknowledgement (DSACKs) and Stream Control Transmission
462	        Protocol (SCTP) Duplicate Transmission Sequence Numbers (TSNs)
463	        to Detect Spurious Retransmissions", RFC 3708, February 2004.

465	    [RFC3940] Adamson, B., C. Bormann, M. Handley, J. Macker,
466	        "Negative-acknowledgment (NACK)-Oriented Reliable Multicast
467	        (NORM) Protocol", November 2004, RFC 3940.

469	    [RFC4340] Kohler, E., M. Handley, S. Floyd, "Datagram Congestion
470	        Control Protocol (DCCP)", March 2006, RFC 4340.

472	    [RFC4960] Stewart, R., "Stream Control Transmission Protocol", RFC
473	        4960, September 2007.

475	    [RFC5682] Sarolahti, P., M. Kojo, K. Yamamoto, M. Hata, "Forward
476	        RTO-Recovery (F-RTO): An Algorithm for Detecting Spurious
477	        Retransmission Timeouts with TCP", RFC 5682, September 2009.

479	    [RFC5740] Adamson, B., C. Bormann, M. Handley, J. Macker,
480	        "NACK-Oriented Reliable Multicast (NORM) Transport Protocol",
481	        November 2009, RFC 5740.

483	    [RFC6182] Ford, A., C. Raiciu, M. Handley, S. Barre, J. Iyengar,
484	        "Architectural Guidelines for Multipath TCP Development", March
485	        2011, RFC 6182.

487	    [RFC6298] Paxson, V., M. Allman, H.K. Chu, M. Sargent, "Computing
488	        TCP's Retransmission Timer", June 2011, RFC 6298.

490	    [RFC6582] Henderson, T., S. Floyd, A. Gurtov, Y. Nishida, "The
491	        NewReno Modification to TCP's Fast Recovery Algorithm", April
492	        2012, RFC 6582.

494	    [RFC6675] Blanton, E., M. Allman, L. Wang, I. Jarvinen, M.  Kojo,
495	        Y. Nishida, "A Conservative Loss Recovery Algorithm Based on
496	        Selective Acknowledgment (SACK) for TCP", August 2012, RFC 6675.

498	    [RFC7323] Borman, D., B. Braden, V. Jacobson, R. Scheffenegger, "TCP
499	        Extensions for High Performance", September 2014, RFC 7323.

501	Authors' Addresses

503	   Mark Allman
504	   International Computer Science Institute
505	   1947 Center St.  Suite 600
506	   Berkeley, CA  94704

508	   EMail: mallman@icir.org
509	   http://www.icir.org/mallman