idnits 2.17.1 

draft-ietf-tcpm-rto-consider-05.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

  == The document has an IETF Trust Provisions of 28 Dec 2009, Section 6.c(i)
     Publication Limitation clause.


  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

  ** The document seems to lack an IANA Considerations section.  (See Section
     2.2 of https://www.ietf.org/id-info/checklist for how to handle the case
     when there are no actions for IANA.)

  ** The abstract seems to contain references ([RFC2119]), which it
     shouldn't.  Please replace those with straight textual mentions of the
     documents in question.


  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == The copyright year in the IETF Trust and authors Copyright Line does not
     match the current year

  -- The document date (March 10, 2017) is 2603 days in the past.  Is this
     intentional?


  Checking references for intended status: Best Current Practice
  ----------------------------------------------------------------------------

     (See RFCs 3967 and 4897 for information about using normative references
     to lower-maturity documents in RFCs)

  == Missing Reference: 'RFC5681' is mentioned on line 324, but not defined

  == Unused Reference: 'RFC3940' is defined on line 431, but no explicit
     reference was found in the text

  == Unused Reference: 'RFC6582' is defined on line 456, but no explicit
     reference was found in the text

  -- Obsolete informational reference (is this intentional?): RFC 2140
     (Obsoleted by RFC 9040)

  -- Obsolete informational reference (is this intentional?): RFC 3940
     (Obsoleted by RFC 5740)

  -- Obsolete informational reference (is this intentional?): RFC 4960
     (Obsoleted by RFC 9260)


     Summary: 2 errors (**), 0 flaws (~~), 5 warnings (==), 4 comments (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.

--------------------------------------------------------------------------------

1	Internet Engineering Task Force                                M. Allman
2	INTERNET-DRAFT                                                      ICSI
3	File: draft-ietf-tcpm-rto-consider-05.txt                 March 10, 2017
4	Intended Status: Best Current Practice
5	Expires: September 10, 2017

7	                  Retransmission Timeout Requirements

9	Status of this Memo

11	    This document may not be modified, and derivative works of it may
12	    not be created, except to format it for publication as an RFC or to
13	    translate it into languages other than English.

15	    This Internet-Draft is submitted in full conformance with the
16	    provisions of BCP 78 and BCP 79.  Internet-Drafts are working
17	    documents of the Internet Engineering Task Force (IETF), its areas,
18	    and its working groups. Note that other groups may also distribute
19	    working documents as Internet-Drafts.

21	    Internet-Drafts are draft documents valid for a maximum of six
22	    months and may be updated, replaced, or obsoleted by other documents
23	    at any time. It is inappropriate to use Internet-Drafts as
24	    reference material or to cite them other than as "work in progress."

26	    The list of current Internet-Drafts can be accessed at
27	    http://www.ietf.org/1id-abstracts.html

29	    The list of Internet-Draft Shadow Directories can be accessed at
30	    http://www.ietf.org/shadow.html

32	    This Internet-Draft will expire on September 10, 2017.

34	Copyright Notice

36	    Copyright (c) 2017 IETF Trust and the persons identified as the
37	    document authors. All rights reserved.

39	    This document is subject to BCP 78 and the IETF Trust's Legal
40	    Provisions Relating to IETF Documents
41	    (http://trustee.ietf.org/license-info) in effect on the date of
42	    publication of this document. Please review these documents
43	    carefully, as they describe your rights and restrictions with
44	    respect to this document. Code Components extracted from this
45	    document must include Simplified BSD License text as described in
46	    Section 4.e of the Trust Legal Provisions and are provided without
47	    warranty as described in the Simplified BSD License.

49	Abstract

51	    Ensuring reliable communication often manifests in a timeout and
52	    retry mechanism.  Each implementation of a retransmission timeout
53	    mechanism represents a balance between correctness and timeliness
54	    and therefore no implementation suits all situations.  This document
55	    provides high-level requirements for retransmission timeout schemes
56	    appropriate for general use in the Internet.  Within the
57	    requirements, implementations have latitude to define particulars
58	    that best address each situation.

60	Terminology

62	    The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
63	    "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
64	    document are to be interpreted as described in BCP 14, RFC 2119
65	    [RFC2119].

67	1   Introduction

69	    Reliable transmission is a key property for many network protocols
70	    and applications.  Our protocols use various mechanisms to achieve
71	    reliable data transmission.  Often we use continuous or periodic
72	    reports from the recipient to inform the sender's notion of which
73	    pieces of data are missing and need to be retransmitted to ensure
74	    reliability.  Alternatively, information coding---e.g., FEC---can be
75	    used to achieve probabilistic reliability without retransmissions.
76	    However, despite our best intentions and most robust mechanisms, the
77	    only thing we can truly depend on is the passage of time and
78	    therefore our ultimate backstop to ensuring reliability is a timeout
79	    and re-try mechanism.  That is, the sender sets some expectation for
80	    how long to wait for confirmation of delivery for a given piece of
81	    data.  When this time period passes without delivery confirmation,
82	    the sender assumes the data was lost in transit and therefore
83	    schedules a retransmission.  This process of ensuring reliability
84	    via time-based loss detection and resending lost data is commonly
85	    referred to as a "retransmission timeout (RTO)" mechanism.

87	    Various protocols have defined their own RTO mechanisms (e.g., TCP
88	    [RFC6298], SCTP [RFC4960], SIP [RFC3261]).  The specifics of
89	    retransmission timeouts often represent a particular tradeoff
90	    between correctness and responsiveness [AP99].  In other words, we
91	    want to simultaneously:

93	      - wait long enough to ensure the detection of loss is correct and
94	        therefore a retransmission is in fact needed, and

96	      - bound the delay we impose on applications before repairing
97	        loss.

99	    Serving both of these goals is difficult because they pull in
100	    opposite directions: towards either (a) withholding needed
101	    retransmissions for too long in order to be sure the original
102	    transmission is truly lost, or (b) not waiting long enough, for
103	    the sake of application responsiveness, and hence sending
104	    unnecessary (often denoted "spurious") retransmissions.

106	    We have found that even though the RTO procedure is standardized for
107	    some protocols (e.g., TCP [RFC6298]), implementations often add
108	    their own subtle imprint on the specifics of the process to tilt the
109	    tradeoff between correctness and responsiveness in some particular
110	    way.

112	    At this point we recognize that often these specific tweaks that
113	    deviate from standardized RTO mechanisms do not materially impact
114	    network safety.  Therefore, in this document we outline a set of
115	    high-level protocol-agnostic requirements for RTO mechanisms.  The
116	    intent is to provide a safe foundation on which implementations have
117	    the flexibility to instantiate mechanisms that best realize their
118	    specific goals.

120	2   Scope

122	    The principles we outline in this document are protocol-agnostic and
123	    widely applicable.  We make the following scope statements about
124	    the application of the requirements discussed in Section 3:

126	    (S.1) The requirements in this document apply only to timer-based
127	          loss detection and retransmission.

129	          While there are a bevy of uses for timers in protocols---from
130	          rate-based pacing to connection failure detection to making
131	          congestion control decisions and beyond---these are outside
132	          the scope of this document.

134	    (S.2) The requirements in this document only apply to cases where
135	          loss detected via a timer is repaired by a retransmission of
136	          the original data.

138	          Other cases are certainly possible---e.g., replacing the lost
139	          data with an updated version---but fall outside the scope of
140	          this document.

142	    (S.3) The requirements in this document apply only to endpoint-to-
143	          endpoint unicast communication.  Reliable multicast (e.g.,
144	          [RFC5740]) protocols are explicitly outside the scope of this
145	          document.

147	          Protocols such as SCTP [RFC4960] and MP-TCP [RFC6182] that
148	          communicate in a unicast fashion with multiple specific
149	          endpoints can leverage the requirements in this document
150	          provided they track state and follow the requirements for each
151	          endpoint independently.  I.e., if host A communicates with
152	          hosts B and C, A must use independent RTOs for traffic sent to
153	          B and C.

155	    (S.4) There are cases where state is shared across connections or
156	          flows (e.g., [RFC2140], [RFC3124]).  The RTO is one piece of
157	          state that is often discussed as sharable.  These situations
158	          raise issues that the simple flow-oriented RTO mechanism
159	          discussed in this document does not consider (e.g., how long
160	          to preserve state between connections).  Therefore, while the
161	          general principles given in Section 3 are likely applicable,
162	          sharing RTOs across flows is outside the scope of this
163	          document.

165	    (S.5) The requirements in this document apply to reliable
166	          transmission, but do not assume that all data transmitted
167	          within a connection or flow is reliably sent.

169	          E.g., a protocol like DCCP [RFC4340] could leverage the
170	          requirements in this document for the initial reliable
171	          handshake even though the protocol reverts to unreliable
172	          transmission after the handshake.

174	          E.g., a protocol like SCTP [RFC4960] could leverage the
175	          requirements for data that is sent only "partially reliably".
176	          In this case, the protocol uses two phases for each message.
177	          In the first phase, the protocol attempts to ensure
178	          reliability and can leverage the requirements in this
179	          document.  At some point the value of the data is gone and the
180	          protocol transitions to the second phase where the data is
181	          treated as unreliably transmitted and therefore the protocol
182	          will no longer attempt to repair the loss---and hence there
183	          are no more retransmissions and the requirements in this
184	          document are moot.

186	    (S.6) The requirements for RTO mechanisms in this document can be
187	          applied regardless of whether the RTO mechanism is the sole
188	          loss repair strategy or works in concert with other
189	          mechanisms.

191	          E.g., for a simple protocol like UDP-based DNS [] a timeout
192	          and re-try mechanism is likely to act alone to ensure
193	          reliability.

195	          E.g., within a complex protocol like TCP or SCTP, we have
196	          designed methods to detect and repair loss based on explicit
197	          endpoint state sharing [RFC2018,RFC4960,RFC6675].  These
198	          mechanisms are preferred over the RTO as they are often more
199	          timely and precise than the coarse-grained RTO.  In these
200	          cases, the RTO becomes a last resort when the more advanced
201	          mechanisms fail.
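
    As a minimal illustration of scope item (S.3), the Python sketch
    below keeps a separate timer state for each remote endpoint.  The
    class and field names are hypothetical.  The sketch deliberately
    omits FT estimation and any maximum RTO so that it shows only the
    per-endpoint separation.

```python
from dataclasses import dataclass, field

INITIAL_RTO = 1.0  # seconds; requirement (1) applies to every new peer


@dataclass
class PeerTimerState:
    """Hypothetical per-endpoint timer state (illustrative only)."""
    rto: float = INITIAL_RTO
    backoffs: int = 0


@dataclass
class MultiPeerSender:
    """Keeps one independent RTO per remote endpoint, as (S.3) requires."""
    peers: dict = field(default_factory=dict)

    def state_for(self, peer: str) -> PeerTimerState:
        # First contact with a peer starts from the conservative default;
        # state learned from other peers is never reused.
        if peer not in self.peers:
            self.peers[peer] = PeerTimerState()
        return self.peers[peer]

    def on_timeout(self, peer: str) -> None:
        # A timeout toward one peer backs off only that peer's timer.
        state = self.state_for(peer)
        state.rto *= 2
        state.backoffs += 1


# Host A talks to hosts B and C; loss toward B does not slow retransmissions to C.
sender = MultiPeerSender()
sender.on_timeout("B")
print(sender.state_for("B").rto, sender.state_for("C").rto)  # 2.0 1.0
```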

203	3   Requirements

205	    We now list the requirements that apply when designing
206	    retransmission timeout (RTO) mechanisms.

208	    (1) In the absence of any knowledge about the latency of a path, the
209	        RTO MUST be conservatively set to no less than 1 second.

211	        This requirement ensures two important aspects of the RTO.
212	        First, when transmitting into an unknown network,
213	        retransmissions will not be sent before an ACK would reasonably
214	        be expected to arrive, since such premature retransmissions
215	        could waste scarce network resources.  Second, as noted below,
216	        retransmissions can sometimes lead to ambiguities in assessing
217	        the latency of a network path.  Therefore, it is especially
218	        important for the first latency sample to be free of ambiguity
219	        so that there is a baseline for the remainder of the
220	        communication.

221	        The specific constant (1 second) comes from the analysis of
222	        Internet RTTs found in Appendix A of [RFC6298].

224	    (2) As we note above, loss detection happens when a sender does not
225	        receive delivery confirmation within some expected period of
226	        time.  We now specify four requirements that pertain to setting
227	        the length of this expectation.

229	        The time required for delivery confirmation is often framed in
230	        terms of the "round-trip time (RTT)" of the network path, since
231	        the RTT is the minimum amount of time required to receive
232	        delivery confirmation and many protocols generate
233	        acknowledgments quickly after data arrives.  For instance, this
234	        is the case for the RTO used by TCP [RFC6298] and SCTP
235	        [RFC4960].  However, this framing is somewhat misleading, as the
236	        expected latency is better framed as the "feedback time" (FT).
237	        In other words, the expectation is not always simply a network
238	        property, but includes any additional time before a sender
239	        should reasonably expect a response to a query.

241	        For instance, consider a UDP-based DNS request from a client to
242	        a recursive resolver.  When the request can be served from the
243	        resolver's cache the FT likely well approximates the network RTT
244	        between the client and resolver.  However, on a cache miss the
245	        resolver will request the needed information from one or more
246	        authoritative DNS servers, which will non-trivially increase the
247	        FT compared to the RTT between the client and resolver.

249	        Therefore, we express the following requirements in terms of FT:

251	        (a) In steady state the RTO SHOULD be set based on recent
252	            observations of both the FT and the variance of the FT.

254	            In other words, the RTO should be based on a reasonable
255	            amount of time that the sender should wait for delivery
256	            confirmation before retransmitting the given data.

258	        (b) FT observations SHOULD be taken regularly.

260	            Internet measurements show that taking only a single FT
261	            sample per TCP connection results in a relatively poorly
262	            performing RTO mechanism [AP99], hence this requirement that
263	            the FT be sampled continuously throughout the lifetime of
264	            communication.

266	            The notion of "regularly" SHOULD be defined as at least once
267	            per RTT or as frequently as data is exchanged in cases where
268	            that happens less frequently than once per RTT.  However, we
269	            also recognize that it may not always be practical to take
270	            an FT sample this often in all cases.  Hence, this
271	            once-per-RTT definition of "regularly" is explicitly a
272	            "SHOULD" and not a "MUST".

274	            TCP takes an FT sample roughly once per RTT, or, if using the
275	            timestamp option [RFC7323], on each acknowledgment arrival.
276	            [AP99] shows that both these approaches result in roughly
277	            equivalent performance for the RTO estimator.

279	        (c) FT observations MAY be taken from non-data exchanges.

281	            Some protocols use keepalives, heartbeats or other messages
282	            to exchange control information.  To the extent that the
283	            latency of these transactions mirrors data exchange, they
284	            can be leveraged to take FT samples within the RTO
285	            mechanism.  Such samples can help protocols keep their RTO
286	            accurate during lulls in data transmission.  However, given
287	            that these messages may not be subject to the same delays as
288	            data transmission, we do not take a general view on whether
289	            this is useful or not.

291	        (d) An RTO mechanism MUST NOT use ambiguous FT samples.

293	            Assume two copies of some segment X are transmitted at times
294	            t0 and t1, and then at time t2 the sender receives
295	            confirmation that X in fact arrived.  In some cases, it is
296	            not clear which copy of X triggered the confirmation, and
297	            hence the actual FT is either t2-t1 or t2-t0, but the sender
298	            cannot tell which.  Therefore, in this situation an
299	            implementation MUST use Karn's algorithm [KP87,RFC6298],
300	            using neither version of the FT sample and hence not
301	            updating the RTO.

302	            There are cases where two copies of some data are
303	            transmitted in a way whereby the sender can tell which is
304	            being acknowledged by an incoming ACK.  E.g., TCP's
305	            timestamp option [RFC7323] allows each transmitted copy to
306	            be uniquely identified.  In such cases there is no
307	            ambiguity, and the resulting samples can be used to update
308	            the RTO.

310	    (3) Each time the RTO is used to detect a loss and a retransmission
311	        is scheduled, the value of the RTO MUST be exponentially backed
312	        off such that the next firing requires a longer interval.  The
313	        backoff SHOULD be removed after the successful repair of the
314	        lost data and subsequent transmission of non-retransmitted data.

316	        A maximum value MAY be placed on the RTO.  The maximum RTO MUST
317	        NOT be less than 60 seconds (as in [RFC6298]).

319	        This ensures network safety.

321	    (4) Retransmissions triggered by the RTO mechanism MUST be taken as
322	        indications of network congestion and the sending rate adapted
323	        using a standard mechanism (e.g., TCP collapses the congestion
324	        window to one segment [RFC5681]).

326	        This ensures network safety.

328	        An exception can be made to this rule if an IETF-standardized
329	        mechanism is used to determine that a particular loss is due to
330	        a non-congestion event (e.g., packet corruption).  In such a
331	        case a congestion control action is not required.  Additionally,
332	        RTO-triggered congestion control actions may be reversed when a
333	        standard mechanism determines that the cause of the loss was not
334	        congestion after all (e.g., [RFC5682]).
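
    The following Python sketch shows one way to combine requirements
    (1) through (3).  It borrows the estimator, gains, and 1-second floor
    of TCP's RTO [RFC6298] only as a familiar example.  The class and
    helper names, the 1 ms clock granularity, and the choice of a
    60-second cap are assumptions for illustration and are not part of
    any specification.  How often samples are taken, per requirement
    (2)(b), is left to the caller.

```python
from typing import List, Optional

# Constants taken from TCP's estimator [RFC6298]; other protocols may choose
# different values as long as the requirements above are met.
ALPHA = 1 / 8        # gain applied to the smoothed FT (SRTT)
BETA = 1 / 4         # gain applied to the FT variation (RTTVAR)
K = 4                # multiplier on the FT variation
INITIAL_RTO = 1.0    # requirement (1): at least 1 second with no path knowledge
MIN_RTO = 1.0        # floor used by RFC 6298
MAX_RTO = 60.0       # requirement (3): optional cap, never below 60 seconds
GRANULARITY = 0.001  # assumed clock granularity G, in seconds


class RtoEstimator:
    """Sketch of an RTO mechanism meeting requirements (1) through (3)."""

    def __init__(self) -> None:
        self.srtt: Optional[float] = None
        self.rttvar: Optional[float] = None
        self.rto = INITIAL_RTO

    def on_sample(self, ft: Optional[float]) -> None:
        """Feed one FT sample in seconds; None marks an ambiguous sample."""
        if ft is None:
            # (2d) Karn's algorithm: discard the sample and keep the current
            # (possibly backed-off) RTO.
            return
        if self.srtt is None:
            self.srtt, self.rttvar = ft, ft / 2
        else:
            # (2a) RTTVAR is updated first so that it uses the previous SRTT.
            self.rttvar = (1 - BETA) * self.rttvar + BETA * abs(self.srtt - ft)
            self.srtt = (1 - ALPHA) * self.srtt + ALPHA * ft
        # Recomputing from the estimator also removes any earlier backoff.
        rto = self.srtt + max(GRANULARITY, K * self.rttvar)
        self.rto = min(max(rto, MIN_RTO), MAX_RTO)

    def on_timeout(self) -> float:
        """(3) Back the RTO off exponentially, honoring the optional cap."""
        self.rto = min(self.rto * 2, MAX_RTO)
        return self.rto


def ft_sample(send_times: List[float], ack_time: float,
              echoed_send_time: Optional[float] = None) -> Optional[float]:
    """Hypothetical helper: derive an FT sample, or None if it is ambiguous.

    send_times holds every transmission time of the acknowledged data, and
    echoed_send_time is the send time the ACK identifies (e.g., via TCP
    timestamps), when such information is available.
    """
    if echoed_send_time is not None:
        return ack_time - echoed_send_time  # the ACK names the copy it covers
    if len(send_times) == 1:
        return ack_time - send_times[0]     # only one copy, so no ambiguity
    return None                             # t2-t0 or t2-t1? Unknown.


est = RtoEstimator()
est.on_sample(ft_sample([0.0], 2.0))        # first sample: 2 + 4 * 1 = 6 s
est.on_timeout()                            # backed off to 12 s
est.on_sample(ft_sample([5.0, 6.0], 8.0))   # ambiguous: RTO stays 12 s
print(est.rto)                              # 12.0
```

    This sketch removes the backoff as soon as an unambiguous sample
    arrives, which simplifies the condition stated in requirement (3).
    It also applies RFC 6298's 1-second floor.  Section 4 discusses
    implementations that use lower floors.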

336	4   Discussion

338	    We note that research has shown that the tension between the
339	    responsiveness and correctness of retransmission timeouts appears
340	    to be a fundamental tradeoff in the context of TCP [AP99].  That is,
341	    making the RTO more aggressive (e.g., via changing TCP's EWMA gains,
342	    lowering the minimum RTO, etc.) can reduce the time spent waiting on
343	    needed retransmissions.  However, at the same time, such
344	    aggressiveness leads to more needless retransmissions.  Therefore,
345	    being as aggressive as the requirements given in the previous
346	    section allow in any particular situation may not be the best course
347	    of action because an RTO expiration carries a requirement to invoke
348	    a congestion response and hence slow transmission down.
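
    Requirement (4) attaches a congestion response to RTO-triggered
    retransmissions, and that response is what makes aggressiveness
    costly.  For TCP, the response is the one specified in [RFC5681].
    The minimal sketch below assumes a byte-counted window and uses
    hypothetical class and parameter names.

```python
from dataclasses import dataclass


@dataclass
class CongestionWindow:
    """Hypothetical byte-counted window state for a TCP-like sender."""
    smss: int      # sender maximum segment size
    cwnd: int      # congestion window
    ssthresh: int  # slow start threshold

    def on_rto_retransmission(self, flight_size: int,
                              repeat_timeout: bool = False) -> None:
        """Apply TCP's response to an RTO-triggered retransmission [RFC5681].

        repeat_timeout is True when the same data has already been resent by
        the retransmission timer; in that case ssthresh is left unchanged.
        """
        if not repeat_timeout:
            # Halve the threshold relative to the data in flight, but keep
            # room for at least two segments.
            self.ssthresh = max(flight_size // 2, 2 * self.smss)
        # Collapse the window to one segment (the loss window), so the
        # sender restarts with slow start.
        self.cwnd = self.smss


w = CongestionWindow(smss=1460, cwnd=14600, ssthresh=65535)
w.on_rto_retransmission(flight_size=14600)
print(w.cwnd, w.ssthresh)  # 1460 7300
```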

350	    While the tradeoff between responsiveness and correctness seems
351	    fundamental, the tradeoff can be made less relevant if the sender
352	    can detect and recover from spurious RTOs.  Several mechanisms have
353	    been proposed for this purpose, such as Eifel [RFC3522], F-RTO
354	    [RFC5682] and DSACK [RFC2883,RFC3708].  Using such mechanisms may
355	    allow a data originator to tip towards being more responsive without
356	    incurring (as much of) the attendant costs of needless retransmits.
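
    The sketch below shows how timestamps make such detection possible.
    It captures the core test of Eifel [RFC3522] in simplified form.  If
    the first acceptable ACK after an RTO retransmission echoes a
    timestamp older than the one carried by the retransmission, an
    earlier copy triggered the ACK and the timeout was spurious.  The
    names are hypothetical, other details of RFC 3522 are omitted, and
    undoing the congestion response is left to the caller.

```python
TS_MOD = 1 << 32  # TCP timestamp values are 32-bit and wrap around


def ts_before(a: int, b: int) -> bool:
    """True if timestamp a is earlier than b, allowing for wraparound."""
    return (a - b) % TS_MOD >= TS_MOD // 2


class SpuriousRtoDetector:
    """Hypothetical helper modeled on the idea behind Eifel detection."""

    def __init__(self) -> None:
        self.retransmit_ts = None  # timestamp value sent on the RTO retransmission

    def on_rto_retransmission(self, ts_val: int) -> None:
        self.retransmit_ts = ts_val

    def on_first_acceptable_ack(self, ts_ecr: int) -> bool:
        """Return True if the RTO that triggered the retransmission was spurious.

        An echoed timestamp older than the retransmission's own timestamp
        means an earlier transmission generated the ACK.
        """
        if self.retransmit_ts is None:
            return False
        spurious = ts_before(ts_ecr, self.retransmit_ts)
        self.retransmit_ts = None  # only the first acceptable ACK is examined
        return spurious


d = SpuriousRtoDetector()
d.on_rto_retransmission(ts_val=1000)
print(d.on_first_acceptable_ack(ts_ecr=990))  # True: the original arrived
```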

358	    Also note that, in addition to the experiments discussed in [AP99],
359	    the Linux TCP implementation has been using various non-standard RTO
360	    mechanisms for many years seemingly without large-scale problems
361	    (e.g., using different EWMA gains than specified in [RFC6298]).
362	    Further, a number of implementations use minimum RTOs that are less
363	    than the 1 second specified in [RFC6298].  While the implication of
364	    these deviations from the standard may be more spurious retransmits
365	    (per [AP99]), we are aware of no large-scale network safety issues
366	    caused by this change to the minimum RTO.
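
    A small calculation shows the effect of a lower floor.  The Python
    sketch below is a hypothetical helper, not any implementation's code.
    It runs an RFC 6298-style estimator over a path whose FT is steadily
    50 ms.  With a 1-second floor, the floor alone sets the RTO.  With an
    illustrative 0.2-second floor (an arbitrary value, not a
    recommendation), the sender waits far less before retransmitting.
    The cost is more spurious retransmissions (per [AP99]), for instance
    whenever the FT briefly rises above the lowered RTO.

```python
from typing import Iterable


def rto_after(samples: Iterable[float], min_rto: float,
              alpha: float = 1 / 8, beta: float = 1 / 4,
              k: float = 4, g: float = 0.001) -> float:
    """RTO after feeding FT samples to an RFC 6298-style estimator.

    The gains (alpha, beta) and the floor (min_rto) are parameters so that
    more aggressive settings can be compared with the standard ones.
    """
    srtt = rttvar = None
    for r in samples:
        if srtt is None:
            srtt, rttvar = r, r / 2
        else:
            rttvar = (1 - beta) * rttvar + beta * abs(srtt - r)
            srtt = (1 - alpha) * srtt + alpha * r
    if srtt is None:
        return max(1.0, min_rto)  # no samples yet: conservative default
    return max(srtt + max(g, k * rttvar), min_rto)


path = [0.05] * 20  # a path whose FT is steadily 50 ms
print(rto_after(path, min_rto=1.0), rto_after(path, min_rto=0.2))  # 1.0 0.2
```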

368	    Finally, we note that while allowing implementations to be more
369	    aggressive may in fact increase the number of needless
370	    retransmissions, the above requirements fail safe in that they insist
371	    on exponential backoff of the RTO and a transmission rate reduction.
372	    Therefore, providing implementers more latitude than they have
373	    traditionally been given in IETF specifications of RTO mechanisms
374	    does not open the floodgates to aggressive behavior.  Since
375	    there is a downside to being aggressive, the incentives for proper
376	    behavior are retained in the mechanism.
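
    A short calculation shows why this backstop holds.  Assume an initial
    RTO of r, pure doubling with no cap, and no acknowledgments arriving.
    The k-th timer-driven retransmission is then sent at time t_k, and
    the number of such retransmissions sent within T seconds is n(T):

```latex
t_k = \sum_{i=0}^{k-1} 2^i r = r\,(2^k - 1),
\qquad
n(T) = \left\lfloor \log_2\!\left(\frac{T}{r} + 1\right) \right\rfloor
```

    For T = 60 s, an initial RTO of 1 s yields 5 retransmissions, while a
    five times more aggressive 0.2 s yields 8.  The extra load therefore
    grows only logarithmically as the RTO shrinks, before any reduction
    in transmission rate is even taken into account.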

378	5   Security Considerations

379	    This document does not alter the security properties of
380	    retransmission timeout mechanisms.  See [RFC6298] for a discussion
381	    of these within the context of TCP.

383	Acknowledgments

385	    This document benefits from years of discussions with Ethan Blanton,
386	    Sally Floyd, Jana Iyengar, Shawn Ostermann, Vern Paxson, and the
387	    members of the TCPM and TCP-IMPL working groups.  Ran Atkinson,
388	    Yuchung Cheng, David Black, Gorry Fairhurst, Mirja Kuhlewind,
389	    Jonathan Looney and Michael Scharf provided useful comments on a
390	    previous version of this draft.

392	Normative References

394	    [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate
395	        Requirement Levels", BCP 14, RFC 2119, March 1997.

397	Informative References

399	    [AP99] Allman, M., V. Paxson, "On Estimating End-to-End Network Path
400	        Properties", Proceedings of the ACM SIGCOMM Technical Symposium,
401	        September 1999.

403	    [KP87] Karn, P. and C. Partridge, "Improving Round-Trip Time
404	        Estimates in Reliable Transport Protocols", SIGCOMM 87.

406	    [RFC2018] Mathis, M., Mahdavi, J., Floyd, S., and A. Romanow, "TCP
407	        Selective Acknowledgment Options", RFC 2018, October 1996.

409	    [RFC2140] Touch, J., "TCP Control Block Interdependence", RFC 2140,
410	        April 1997.

412	    [RFC2883] Floyd, S., Mahdavi, J., Mathis, M., and M. Podolsky, "An
413	        Extension to the Selective Acknowledgement (SACK) Option for
414	        TCP", RFC 2883, July 2000.

416	    [RFC3124] Balakrishnan, H., S. Seshan, "The Congestion Manager", RFC
417	        3124, June 2001.

419	    [RFC3261] Rosenberg, J., Schulzrinne, H., Camarillo, G., Johnston,
420	        A., Peterson, J., Sparks, R., Handley, M., and E. Schooler,
421	        "SIP: Session Initiation Protocol", RFC 3261, June 2002.

423	    [RFC3522] Ludwig, R., M. Meyer, "The Eifel Detection Algorithm for
424	        TCP", RFC 3522, April 2003.

426	    [RFC3708] Blanton, E., M. Allman, "Using TCP Duplicate Selective
427	        Acknowledgement (DSACKs) and Stream Control Transmission
428	        Protocol (SCTP) Duplicate Transmission Sequence Numbers (TSNs)
429	        to Detect Spurious Retransmissions", RFC 3708, February 2004.

431	    [RFC3940] Adamson, B., C. Bormann, M. Handley, J. Macker,
432	        "Negative-acknowledgment (NACK)-Oriented Reliable Multicast
433	        (NORM) Protocol", RFC 3940, November 2004.

435	    [RFC4340] Kohler, E., M. Handley, S. Floyd, "Datagram Congestion
436	        Control Protocol (DCCP)", RFC 4340, March 2006.

438	    [RFC4960] Stewart, R., "Stream Control Transmission Protocol", RFC
439	        4960, September 2007.

    [RFC5681] Allman, M., V. Paxson, E. Blanton, "TCP Congestion
        Control", RFC 5681, September 2009.

441	    [RFC5682] Sarolahti, P., M. Kojo, K. Yamamoto, M. Hata, "Forward
442	        RTO-Recovery (F-RTO): An Algorithm for Detecting Spurious
443	        Retransmission Timeouts with TCP", RFC 5682, September 2009.

445	    [RFC5740] Adamson, B., C. Bormann, M. Handley, J. Macker,
446	        "NACK-Oriented Reliable Multicast (NORM) Transport Protocol",
447	        RFC 5740, November 2009.

449	    [RFC6182] Ford, A., C. Raiciu, M. Handley, S. Barre, J. Iyengar,
450	        "Architectural Guidelines for Multipath TCP Development", RFC
451	        6182, March 2011.

453	    [RFC6298] Paxson, V., M. Allman, H.K. Chu, M. Sargent, "Computing
454	        TCP's Retransmission Timer", RFC 6298, June 2011.

456	    [RFC6582] Henderson, T., S. Floyd, A. Gurtov, Y. Nishida, "The
457	        NewReno Modification to TCP's Fast Recovery Algorithm", RFC
458	        6582, April 2012.

460	    [RFC6675] Blanton, E., M. Allman, L. Wang, I. Jarvinen, M. Kojo,
461	        Y. Nishida, "A Conservative Loss Recovery Algorithm Based on
462	        Selective Acknowledgment (SACK) for TCP", RFC 6675, August 2012.

464	    [RFC7323] Borman, D., B. Braden, V. Jacobson, R. Scheffenegger, "TCP
465	        Extensions for High Performance", RFC 7323, September 2014.

467	Authors' Addresses

469	   Mark Allman
470	   International Computer Science Institute
471	   1947 Center St.  Suite 600
472	   Berkeley, CA  94704

474	   EMail: mallman@icir.org
475	   http://www.icir.org/mallman