idnits 2.17.1 draft-ietf-tcpm-rto-consider-05.txt: Checking boilerplate required by RFC 5378 and the IETF Trust (see https://trustee.ietf.org/license-info): ---------------------------------------------------------------------------- == The document has an IETF Trust Provisions of 28 Dec 2009, Section 6.c(i) Publication Limitation clause. Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt: ---------------------------------------------------------------------------- No issues found here. Checking nits according to https://www.ietf.org/id-info/checklist : ---------------------------------------------------------------------------- ** The document seems to lack an IANA Considerations section. (See Section 2.2 of https://www.ietf.org/id-info/checklist for how to handle the case when there are no actions for IANA.) ** The abstract seems to contain references ([RFC2119]), which it shouldn't. Please replace those with straight textual mentions of the documents in question. Miscellaneous warnings: ---------------------------------------------------------------------------- == The copyright year in the IETF Trust and authors Copyright Line does not match the current year -- The document date (March 10, 2017) is 2603 days in the past. Is this intentional? 
Checking references for intended status: Best Current Practice ---------------------------------------------------------------------------- (See RFCs 3967 and 4897 for information about using normative references to lower-maturity documents in RFCs) == Missing Reference: 'RFC5681' is mentioned on line 324, but not defined == Unused Reference: 'RFC3940' is defined on line 431, but no explicit reference was found in the text == Unused Reference: 'RFC6582' is defined on line 456, but no explicit reference was found in the text -- Obsolete informational reference (is this intentional?): RFC 2140 (Obsoleted by RFC 9040) -- Obsolete informational reference (is this intentional?): RFC 3940 (Obsoleted by RFC 5740) -- Obsolete informational reference (is this intentional?): RFC 4960 (Obsoleted by RFC 9260) Summary: 2 errors (**), 0 flaws (~~), 5 warnings (==), 4 comments (--). Run idnits with the --verbose option for more detailed information about the items above. -------------------------------------------------------------------------------- 1 Internet Engineering Task Force M. Allman 2 INTERNET-DRAFT ICSI 3 File: draft-ietf-tcpm-rto-consider-05.txt March 10, 2017 4 Intended Status: Best Current Practice 5 Expires: September 10, 2017 7 Retransmission Timeout Requirements 9 Status of this Memo 11 This document may not be modified, and derivative works of it may 12 not be created, except to format it for publication as an RFC or to 13 translate it into languages other than English. 15 This Internet-Draft is submitted in full conformance with the 16 provisions of BCP 78 and BCP 79. Internet-Drafts are working 17 documents of the Internet Engineering Task Force (IETF), its areas, 18 and its working groups. Note that other groups may also distribute 19 working documents as Internet-Drafts. 21 Internet-Drafts are draft documents valid for a maximum of six 22 months and may be updated, replaced, or obsoleted by other documents 23 at any time. 
It is inappropriate to use Internet-Drafts as 24 reference material or to cite them other than as "work in progress." 26 The list of current Internet-Drafts can be accessed at 27 http://www.ietf.org/1id-abstracts.html 29 The list of Internet-Draft Shadow Directories can be accessed at 30 http://www.ietf.org/shadow.html 32 This Internet-Draft will expire on September 10, 2017. 34 Copyright Notice 36 Copyright (c) 2017 IETF Trust and the persons identified as the 37 document authors. All rights reserved. 39 This document is subject to BCP 78 and the IETF Trust's Legal 40 Provisions Relating to IETF Documents 41 (http://trustee.ietf.org/license-info) in effect on the date of 42 publication of this document. Please review these documents 43 carefully, as they describe your rights and restrictions with 44 respect to this document. Code Components extracted from this 45 document must include Simplified BSD License text as described in 46 Section 4.e of the Trust Legal Provisions and are provided without 47 warranty as described in the Simplified BSD License. 49 Abstract 51 Ensuring reliable communication often manifests in a timeout and 52 retry mechanism. Each implementation of a retransmission timeout 53 mechanism represents a balance between correctness and timeliness 54 and therefore no implementation suits all situations. This document 55 provides high-level requirements for retransmission timeout schemes 56 appropriate for general use in the Internet. Within the 57 requirements, implementations have latitude to define particulars 58 that best address each situation. 60 Terminology 62 The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", 63 "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this 64 document are to be interpreted as described in BCP 14, RFC 2119 65 [RFC2119]. 67 1 Introduction 69 Reliable transmission is a key property for many network protocols 70 and applications. 
Our protocols use various mechanisms to achieve 71 reliable data transmission. Often we use continuous or periodic 72 reports from the recipient to inform the sender's notion of which 73 pieces of data are missing and need to be retransmitted to ensure 74 reliability. Alternatively, information coding---e.g., FEC---can be 75 used to achieve probabilistic reliability without retransmissions. 76 However, despite our best intentions and most robust mechanisms, the 77 only thing we can truly depend on is the passage of time and 78 therefore our ultimate backstop to ensuring reliability is a timeout 79 and re-try mechanism. That is, the sender sets some expectation for 80 how long to wait for confirmation of delivery for a given piece of 81 data. When this time period passes without delivery confirmation 82 the sender assumes the data was lost in transit and therefore 83 schedules a retransmission. This process of ensuring reliability 84 via time-based loss detection and resending lost data is commonly 85 referred to as a "retransmission timeout (RTO)" mechanism. 87 Various protocols have defined their own RTO mechanisms (e.g., TCP 88 [RFC6298], SCTP [RFC4960], SIP [RFC3261]). The specifics of 89 retransmission timeouts often represent a particular tradeoff 90 between correctness and responsiveness [AP99]. In other words we 91 want to simultaneously: 93 - wait long enough to ensure the detection of loss is correct and 94 therefore a retransmission is in fact needed, and 96 - bound the delay we impose on applications before repairing 97 loss. 99 Serving both of these goals is difficult as they pull in opposite 100 directions. I.e., towards either (a) withholding needed 101 retransmissions too long to ensure the original transmission is 102 truly lost or (b) not waiting long enough---to help application 103 responsiveness---and hence sending unnecessary (often denoted 104 "spurious") retransmissions. 
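As a rough illustration of this tension, the following Python sketch evaluates a fixed timeout against a handful of feedback times. The function name, the sample values and the two candidate timeouts are all hypothetical and chosen only for illustration; they are not measurements. Under these assumptions, a shorter timeout repairs real loss sooner but fires spuriously on slow confirmations, while a longer timeout does the opposite.

```python
# Illustrative only: the feedback times and candidate timeouts below are
# made-up values, not measurements taken from any network.

def evaluate_fixed_rto(rto, feedback_times):
    """Count spurious timeouts and the wait imposed by a fixed RTO.

    feedback_times holds one entry per transmission: the seconds until
    delivery confirmation arrived, or None when the data was truly lost.
    """
    spurious = 0      # timer fired although confirmation was on its way
    loss_wait = 0.0   # total seconds spent waiting before repairing real loss
    for ft in feedback_times:
        if ft is None:
            loss_wait += rto      # real loss: the sender waits one full RTO
        elif ft > rto:
            spurious += 1         # confirmation arrived after the timer fired
    return spurious, loss_wait


samples = [0.10, 0.12, None, 0.45, 0.11, None, 0.30, 0.13]
aggressive = evaluate_fixed_rto(0.2, samples)     # favors responsiveness
conservative = evaluate_fixed_rto(1.0, samples)   # favors correctness
print("aggressive:", aggressive)
print("conservative:", conservative)
```

The sketch deliberately uses a fixed timeout so that the two costs are easy to compare; it does not model backoff or any congestion response.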
106 We have found that even though the RTO procedure is standardized for 107 some protocols (e.g., TCP [RFC6298]), implementations often add 108 their own subtle imprint on the specifics of the process to tilt the 109 tradeoff between correctness and responsiveness in some particular 110 way. 112 At this point we recognize that often these specific tweaks that 113 deviate from standardized RTO mechanisms do not materially impact 114 network safety. Therefore, in this document we outline a set of 115 high-level protocol-agnostic requirements for RTO mechanisms. The 116 intent is to provide a safe foundation on which implementations have 117 the flexibility to instantiate mechanisms that best realize their 118 specific goals. 120 2 Scope 122 The principles we outline in this document are protocol-agnostic and 123 widely applicable. We make the following scope statements about 124 the application of the requirements discussed in Section 3: 126 (S.1) The requirements in this document apply only to timer-based 127 loss detection and retransmission. 129 While there are a bevy of uses for timers in protocols---from 130 rate-based pacing to connection failure detection to making 131 congestion control decisions and beyond---these are outside 132 the scope of this document. 134 (S.2) The requirements in this document only apply to cases where 135 loss detected via a timer is repaired by a retransmission of 136 the original data. 138 Other cases are certainly possible---e.g., replacing the lost 139 data with an updated version---but fall outside the scope of 140 this document. 142 (S.3) The requirements in this document apply only to endpoint-to- 143 endpoint unicast communication. Reliable multicast (e.g., 144 [RFC5740]) protocols are explicitly outside the scope of this 145 document. 
147 Protocols such as SCTP [RFC4960] and MP-TCP [RFC6182] that 148 communicate in a unicast fashion with multiple specific 149 endpoints can leverage the requirements in this document 150 provided they track state and follow the requirements for each 151 endpoint independently. I.e., if host A communicates with 152 hosts B and C, A must use independent RTOs for traffic sent to 153 B and C. 155 (S.4) There are cases where state is shared across connections or 156 flows (e.g., [RFC2140], [RFC3124]). The RTO is one piece of 157 state that is often discussed as sharable. These situations 158 raise issues that the simple flow-oriented RTO mechanism 159 discussed in this document does not consider (e.g., how long 160 to preserve state between connections). Therefore, while the 161 general principles given in Section 3 are likely applicable, 162 sharing RTOs across flows is outside the scope of this 163 document. 165 (S.5) The requirements in this document apply to reliable 166 transmission, but do not assume that all data transmitted 167 within a connection or flow is reliably sent. 169 E.g., a protocol like DCCP [RFC4340] could leverage the 170 requirements in this document for the initial reliable 171 handshake even though the protocol reverts to unreliable 172 transmission after the handshake. 174 E.g., a protocol like SCTP [RFC4960] could leverage the 175 requirements for data that is sent only "partially reliably". 176 In this case, the protocol uses two phases for each message. 177 In the first phase, the protocol attempts to ensure 178 reliability and can leverage the requirements in this 179 document. At some point the value of the data is gone and the 180 protocol transitions to the second phase where the data is 181 treated as unreliably transmitted and therefore the protocol 182 will no longer attempt to repair the loss---and hence there 183 are no more retransmissions and the requirements in this 184 document are moot.
186 (S.6) The requirements for RTO mechanisms in this document can be 187 applied regardless of whether the RTO mechanism is the sole 188 loss repair strategy or works in concert with other 189 mechanisms. 191 E.g., for a simple protocol like UDP-based DNS [] a timeout 192 and re-try mechanism is likely to act alone to ensure 193 reliability. 195 E.g., within a complex protocol like TCP or SCTP we have 196 designed methods to detect and repair loss based on explicit 197 endpoint state sharing [RFC2018,RFC4960,RFC6675]. These 198 mechanisms are preferred over the RTO as they are often more 199 timely and precise than the coarse-grained RTO. In these 200 cases, the RTO becomes a last resort when the more advanced 201 mechanisms fail. 203 3 Requirements 205 We now list the requirements that apply when designing 206 retransmission timeout (RTO) mechanisms. 208 (1) In the absence of any knowledge about the latency of a path, the 209 RTO MUST be conservatively set to no less than 1 second. 211 This requirement ensures two important aspects of the RTO. 212 First, when transmitting into an unknown network, 213 retransmissions will not be sent before an ACK would reasonably 214 be expected to arrive and hence possibly waste scarce network 215 resources. Second, as noted below, sometimes retransmissions 216 can lead to ambiguities in assessing the latency of a network 217 path. Therefore, it is especially important for the first 218 latency sample to be free of ambiguities such that there is a 219 baseline for the remainder of the communication. 221 The specific constant (1 second) comes from the analysis of 222 Internet RTTs found in Appendix A of [RFC6298]. 224 (2) As we note above, loss detection happens when a sender does not 225 receive delivery confirmation within some expected period of 226 time. We now specify four requirements that pertain to setting 227 the length of this expectation.
229 Often measuring the time required for delivery confirmation 230 is framed as involving the "round-trip time (RTT)" of the 231 network path as this is the minimum amount of time required to 232 receive delivery confirmation and also often follows protocol 233 behavior whereby acknowledgments are generated quickly after 234 data arrives. For instance, this is the case for the RTO used 235 by TCP [RFC6298] and SCTP [RFC4960]. However, this is somewhat 236 misleading as the expected latency is better framed as the 237 "feedback time" (FT). In other words, the expectation is not 238 always simply a network property, but includes additional time 239 before a sender should reasonably expect a response to a query. 241 For instance, consider a UDP-based DNS request from a client to 242 a recursive resolver. When the request can be served from the 243 resolver's cache the FT likely well approximates the network RTT 244 between the client and resolver. However, on a cache miss the 245 resolver will request the needed information from one or more 246 authoritative DNS servers, which will non-trivially increase the 247 FT compared to the RTT between the client and resolver. 249 Therefore, we express the following requirements in terms of FT: 251 (a) In steady state the RTO SHOULD be set based on recent 252 observations of both the FT and the variance of the FT. 254 In other words, the RTO should be based on a reasonable 255 amount of time that the sender should wait for delivery 256 confirmation before retransmitting the given data. 258 (b) FT observations SHOULD be taken regularly. 260 Internet measurements show that taking only a single FT 261 sample per TCP connection results in a relatively poorly 262 performing RTO mechanism [AP99], hence this requirement that 263 the FT be sampled continuously throughout the lifetime of 264 communication.
266 The notion of "regularly" SHOULD be defined as at least once 267 per RTT or as frequently as data is exchanged in cases where 268 that happens less frequently than once per RTT. However, we 269 also recognize that it may not always be practical to take 270 an FT sample this often in all cases. Hence, this 271 once-per-RTT definition of "regularly" is explicitly a 272 "SHOULD" and not a "MUST". 274 TCP takes an FT sample roughly once per RTT, or if using the 275 timestamp option [RFC7323] on each acknowledgment arrival. 276 [AP99] shows that both these approaches result in roughly 277 equivalent performance for the RTO estimator. 279 (c) FT observations MAY be taken from non-data exchanges. 281 Some protocols use keepalives, heartbeats or other messages 282 to exchange control information. To the extent that the 283 latency of these transactions mirrors data exchange, they 284 can be leveraged to take FT samples within the RTO 285 mechanism. Such samples can help protocols keep their RTO 286 accurate during lulls in data transmission. However, given 287 that these messages may not be subject to the same delays as 288 data transmission, we do not take a general view on whether 289 this is useful or not. 291 (d) An RTO mechanism MUST NOT use ambiguous FT samples. 293 Assume two copies of some segment X are transmitted at times 294 t0 and t1 and then at time t2 the sender receives 295 confirmation that X in fact arrived. In some cases, it is 296 not clear which copy of X triggered the confirmation and 297 hence the actual FT is either t2-t1 or t2-t0, but which is a 298 mystery. Therefore, in this situation an implementation 299 MUST use Karn's algorithm [KP87,RFC6298] and use neither 300 version of the FT sample and hence not update the RTO. 302 There are cases where two copies of some data are 303 transmitted in a way whereby the sender can tell which is 304 being acknowledged by an incoming ACK. 
E.g., TCP's 305 timestamp option [RFC7323] allows for segments to be 306 uniquely identified and hence avoid the ambiguity. In such 307 cases there is no ambiguity and the resulting samples can 308 update the RTO. 310 (3) Each time the RTO is used to detect a loss and a retransmission 311 is scheduled, the value of the RTO MUST be exponentially backed 312 off such that the next firing requires a longer interval. The 313 backoff SHOULD be removed after the successful repair of the 314 lost data and subsequent transmission of non-retransmitted data. 316 A maximum value MAY be placed on the RTO. The maximum RTO MUST 317 NOT be less than 60 seconds (a la [RFC6298]). 319 This ensures network safety. 321 (4) Retransmissions triggered by the RTO mechanism MUST be taken as 322 indications of network congestion and the sending rate adapted 323 using a standard mechanism (e.g., TCP collapses the congestion 324 window to one segment [RFC5681]). 326 This ensures network safety. 328 Exception could be made to this rule if an IETF standardized 329 mechanism is used to determine that a particular loss is due to 330 a non-congestion event (e.g., packet corruption). In such a 331 case a congestion control action is not required. Additionally, 332 RTO-triggered congestion control actions may be reversed when a 333 standard mechanism determines that the cause of the loss was not 334 congestion after all (e.g., [RFC5682]). 336 4 Discussion 338 We note that research has shown the tension between the 339 responsiveness and correctness of retransmission timeouts seems to 340 be a fundamental tradeoff in the context of TCP [AP99]. That is, 341 making the RTO more aggressive (e.g., via changing TCP's EWMA gains, 342 lowering the minimum RTO, etc.) can reduce the time spent waiting on 343 needed retransmissions. However, at the same time, such 344 aggressiveness leads to more needless retransmissions. 
Therefore, 345 being as aggressive as the requirements given in the previous 346 section allow in any particular situation may not be the best course 347 of action because an RTO expiration carries a requirement to invoke 348 a congestion response and hence slow transmission down. 350 While the tradeoff between responsiveness and correctness seems 351 fundamental, the tradeoff can be made less relevant if the sender 352 can detect and recover from spurious RTOs. Several mechanisms have 353 been proposed for this purpose, such as Eifel [RFC3522], F-RTO 354 [RFC5682] and DSACK [RFC2883,RFC3708]. Using such mechanisms may 355 allow a data originator to tip towards being more responsive without 356 incurring (as much of) the attendant costs of needless retransmits. 358 Also, note, that in addition to the experiments discussed in [AP99], 359 the Linux TCP implementation has been using various non-standard RTO 360 mechanisms for many years seemingly without large scale problems 361 (e.g., using different EWMA gains than specified in [RFC6298]). 362 Further, a number of implementations use minimum RTOs that are less 363 than the 1 second specified in [RFC6298]. While the implication of 364 these deviations from the standard may be more spurious retransmits 365 (per [AP99]), we are aware of no large scale network safety issues 366 caused by this change to the minimum RTO. 368 Finally, we note that while allowing implementations to be more 369 aggressive may in fact increase the number of needless 370 retransmissions the above requirements fail safe in that they insist 371 on exponential backoff of the RTO and a transmission rate reduction. 372 Therefore, providing implementers more latitude than they have 373 traditionally been given in IETF specifications of RTO mechanisms 374 does not somehow open the flood gates to aggressive behavior. Since 375 there is a downside to being aggressive the incentives for proper 376 behavior are retained in the mechanism. 
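The requirements in Section 3 can be sketched in code. The Python class below is a minimal illustration under stated assumptions, not a reference implementation: the class and method names are hypothetical, the EWMA gains and the "smoothed FT plus four times the variation" form are borrowed from [RFC6298] as one possible choice rather than something this document mandates, and the congestion response required by (4) is left to the caller.

```python
class RtoEstimator:
    """Hypothetical sketch of an RTO mechanism following Section 3."""

    INITIAL_RTO = 1.0   # (1): at least 1 second when path latency is unknown
    MAX_RTO = 60.0      # (3): an optional cap MUST NOT be below 60 seconds
    ALPHA = 1 / 8       # gain for the smoothed FT (assumption, from RFC 6298)
    BETA = 1 / 4        # gain for the FT variation (assumption, from RFC 6298)

    def __init__(self):
        self.smoothed_ft = None   # no feedback time (FT) sample seen yet
        self.ft_var = None
        self.base_rto = self.INITIAL_RTO
        self.backoff = 1          # multiplier applied after each timeout

    @property
    def rto(self):
        """Current timeout in seconds, including backoff and the cap."""
        return min(self.base_rto * self.backoff, self.MAX_RTO)

    def on_sample(self, ft, ambiguous=False):
        """(2): fold one FT sample (in seconds) into the estimate."""
        if ambiguous:
            return  # (2)(d): Karn's algorithm, ambiguous samples are ignored
        if self.smoothed_ft is None:
            self.smoothed_ft = ft
            self.ft_var = ft / 2
        else:
            deviation = abs(self.smoothed_ft - ft)
            self.ft_var = (1 - self.BETA) * self.ft_var + self.BETA * deviation
            self.smoothed_ft = (1 - self.ALPHA) * self.smoothed_ft + self.ALPHA * ft
        self.base_rto = self.smoothed_ft + 4 * self.ft_var

    def on_timeout(self):
        """(3): double the timeout; the caller must also apply (4)."""
        self.backoff *= 2
        return self.rto

    def on_new_data_confirmed(self):
        """(3): drop the backoff once loss is repaired and new data flows."""
        self.backoff = 1


est = RtoEstimator()
print(est.rto)        # timeout before any FT sample is available
est.on_sample(0.1)    # first unambiguous FT sample of 100 ms
print(est.rto)
est.on_timeout()      # a loss is detected by the timer
print(est.rto)
```

Keeping the backoff multiplier separate from the FT-based estimate is a design choice of this sketch: it lets the backoff be removed without discarding the FT history, and it makes the fail-safe behavior discussed above (exponential backoff on every expiration) easy to see.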
378 5 Security Considerations 379 This document does not alter the security properties of 380 retransmission timeout mechanisms. See [RFC6298] for a discussion 381 of these within the context of TCP. 383 Acknowledgments 385 This document benefits from years of discussions with Ethan Blanton, 386 Sally Floyd, Jana Iyengar, Shawn Ostermann, Vern Paxson, and the 387 members of the TCPM and TCP-IMPL working groups. Ran Atkinson, 388 Yuchung Cheng, David Black, Gorry Fairhurst, Mirja Kuhlewind, 389 Jonathan Looney and Michael Scharf provided useful comments on a 390 previous version of this draft. 392 Normative References 394 [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate 395 Requirement Levels", BCP 14, RFC 2119, March 1997. 397 Informative References 399 [AP99] Allman, M., V. Paxson, "On Estimating End-to-End Network Path 400 Properties", Proceedings of the ACM SIGCOMM Technical Symposium, 401 September 1999. 403 [KP87] Karn, P. and C. Partridge, "Improving Round-Trip Time 404 Estimates in Reliable Transport Protocols", SIGCOMM 87. 406 [RFC2018] Mathis, M., Mahdavi, J., Floyd, S., and A. Romanow, "TCP 407 Selective Acknowledgment Options", RFC 2018, October 1996. 409 [RFC2140] Touch, J., "TCP Control Block Interdependence", RFC 2140, 410 April 1997. 412 [RFC2883] Floyd, S., Mahdavi, J., Mathis, M., and M. Podolsky, "An 413 Extension to the Selective Acknowledgement (SACK) Option for 414 TCP", RFC 2883, July 2000. 416 [RFC3124] Balakrishnan, H., S. Seshan, "The Congestion Manager", RFC 417 3124, June 2001. 419 [RFC3261] Rosenberg, J., Schulzrinne, H., Camarillo, G., Johnston, 420 A., Peterson, J., Sparks, R., Handley, M., and E. Schooler, 421 "SIP: Session Initiation Protocol", RFC 3261, June 2002. 423 [RFC3522] Ludwig, R., M. Meyer, "The Eifel Detection Algorithm for 424 TCP", RFC 3522, April 2003. 426 [RFC3708] Blanton, E., M.
Allman, "Using TCP Duplicate Selective 427 Acknowledgement (DSACKs) and Stream Control Transmission 428 Protocol (SCTP) Duplicate Transmission Sequence Numbers (TSNs) 429 to Detect Spurious Retransmissions", RFC 3708, February 2004. 431 [RFC3940] Adamson, B., C. Bormann, M. Handley, J. Macker, 432 "Negative-acknowledgment (NACK)-Oriented Reliable Multicast 433 (NORM) Protocol", November 2004, RFC 3940. 435 [RFC4340] Kohler, E., M. Handley, S. Floyd, "Datagram Congestion 436 Control Protocol (DCCP)", March 2006, RFC 4340. 438 [RFC4960] Stewart, R., "Stream Control Transmission Protocol", RFC 439 4960, September 2007. 441 [RFC5682] Sarolahti, P., M. Kojo, K. Yamamoto, M. Hata, "Forward 442 RTO-Recovery (F-RTO): An Algorithm for Detecting Spurious 443 Retransmission Timeouts with TCP", RFC 5682, September 2009. 445 [RFC5740] Adamson, B., C. Bormann, M. Handley, J. Macker, 446 "NACK-Oriented Reliable Multicast (NORM) Transport Protocol", 447 November 2009, RFC 5740. 449 [RFC6182] Ford, A., C. Raiciu, M. Handley, S. Barre, J. Iyengar, 450 "Architectural Guidelines for Multipath TCP Development", March 451 2011, RFC 6182. 453 [RFC6298] Paxson, V., M. Allman, H.K. Chu, M. Sargent, "Computing 454 TCP's Retransmission Timer", June 2011, RFC 6298. 456 [RFC6582] Henderson, T., S. Floyd, A. Gurtov, Y. Nishida, "The 457 NewReno Modification to TCP's Fast Recovery Algorithm", April 458 2012, RFC 6582. 460 [RFC6675] Blanton, E., M. Allman, L. Wang, I. Jarvinen, M. Kojo, 461 Y. Nishida, "A Conservative Loss Recovery Algorithm Based on 462 Selective Acknowledgment (SACK) for TCP", August 2012, RFC 6675. 464 [RFC7323] Borman, D., B. Braden, V. Jacobson, R. Scheffenegger, "TCP 465 Extensions for High Performance", September 2014, RFC 7323. 467 Authors' Addresses 469 Mark Allman 470 International Computer Science Institute 471 1947 Center St. Suite 600 472 Berkeley, CA 94704 474 EMail: mallman@icir.org 475 http://www.icir.org/mallman