idnits 2.17.1 draft-ietf-tcpm-rto-consider-06.txt: Checking boilerplate required by RFC 5378 and the IETF Trust (see https://trustee.ietf.org/license-info): ---------------------------------------------------------------------------- == The document has an IETF Trust Provisions of 28 Dec 2009, Section 6.c(i) Publication Limitation clause. Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt: ---------------------------------------------------------------------------- No issues found here. Checking nits according to https://www.ietf.org/id-info/checklist : ---------------------------------------------------------------------------- ** The document seems to lack an IANA Considerations section. (See Section 2.2 of https://www.ietf.org/id-info/checklist for how to handle the case when there are no actions for IANA.) ** The abstract seems to contain references ([RFC2119]), which it shouldn't. Please replace those with straight textual mentions of the documents in question. Miscellaneous warnings: ---------------------------------------------------------------------------- == The copyright year in the IETF Trust and authors Copyright Line does not match the current year -- The document date (October 19, 2018) is 2008 days in the past. Is this intentional? 
Checking references for intended status: Best Current Practice ---------------------------------------------------------------------------- (See RFCs 3967 and 4897 for information about using normative references to lower-maturity documents in RFCs) == Missing Reference: 'RFC5681' is mentioned on line 367, but not defined == Unused Reference: 'RFC3940' is defined on line 474, but no explicit reference was found in the text == Unused Reference: 'RFC6582' is defined on line 499, but no explicit reference was found in the text -- Obsolete informational reference (is this intentional?): RFC 2140 (Obsoleted by RFC 9040) -- Obsolete informational reference (is this intentional?): RFC 3940 (Obsoleted by RFC 5740) -- Obsolete informational reference (is this intentional?): RFC 4960 (Obsoleted by RFC 9260) Summary: 2 errors (**), 0 flaws (~~), 5 warnings (==), 4 comments (--). Run idnits with the --verbose option for more detailed information about the items above. -------------------------------------------------------------------------------- 1 Internet Engineering Task Force M. Allman 2 INTERNET-DRAFT ICSI 3 File: draft-ietf-tcpm-rto-consider-06.txt October 19, 2018 4 Intended Status: Best Current Practice 5 Expires: April 19, 2019 7 Retransmission Timeout Requirements 9 Status of this Memo 11 This document may not be modified, and derivative works of it may 12 not be created, except to format it for publication as an RFC or to 13 translate it into languages other than English. 15 This Internet-Draft is submitted in full conformance with the 16 provisions of BCP 78 and BCP 79. Internet-Drafts are working 17 documents of the Internet Engineering Task Force (IETF), its areas, 18 and its working groups. Note that other groups may also distribute 19 working documents as Internet-Drafts. 21 Internet-Drafts are draft documents valid for a maximum of six 22 months and may be updated, replaced, or obsoleted by other documents 23 at any time. 
It is inappropriate to use Internet-Drafts as 24 reference material or to cite them other than as "work in progress." 26 The list of current Internet-Drafts can be accessed at 27 http://www.ietf.org/1id-abstracts.html 29 The list of Internet-Draft Shadow Directories can be accessed at 30 http://www.ietf.org/shadow.html 32 This Internet-Draft will expire on April 19, 2019. 34 Copyright Notice 36 Copyright (c) 2018 IETF Trust and the persons identified as the 37 document authors. All rights reserved. 39 This document is subject to BCP 78 and the IETF Trust's Legal 40 Provisions Relating to IETF Documents 41 (http://trustee.ietf.org/license-info) in effect on the date of 42 publication of this document. Please review these documents 43 carefully, as they describe your rights and restrictions with 44 respect to this document. Code Components extracted from this 45 document must include Simplified BSD License text as described in 46 Section 4.e of the Trust Legal Provisions and are provided without 47 warranty as described in the Simplified BSD License. 49 Abstract 51 Ensuring reliable communication often manifests in a timeout and 52 retry mechanism. Each implementation of a retransmission timeout 53 mechanism represents a balance between correctness and timeliness 54 and therefore no implementation suits all situations. This document 55 provides high-level requirements for retransmission timeout schemes 56 appropriate for general use in the Internet. Within the 57 requirements, implementations have latitude to define particulars 58 that best address each situation. 60 Terminology 62 The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", 63 "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this 64 document are to be interpreted as described in BCP 14, RFC 2119 65 [RFC2119]. 67 1 Introduction 69 Reliable transmission is a key property for many network protocols 70 and applications. 
Our protocols use various mechanisms to achieve 71 reliable data transmission. Often we use continuous or periodic 72 acknowledgments from the recipient to inform the sender's notion of 73 which pieces of data are missing and need to be retransmitted to 74 ensure reliability. Alternatively, information coding---e.g., 75 FEC---can be used to achieve probabilistic reliability without 76 retransmissions. However, despite our best intentions and most 77 robust mechanisms, the only thing we can truly depend on is the 78 passage of time and therefore our ultimate backstop to ensuring 79 reliability is a timeout and re-try mechanism. That is, the sender 80 sets some expectation for how long to wait for confirmation of 81 delivery for a given piece of data. When this time period passes 82 without delivery confirmation the sender assumes the data was lost 83 in transit and therefore schedules a retransmission. This process 84 of ensuring reliability via time-based loss detection and resending 85 lost data is commonly referred to as a "retransmission timeout 86 (RTO)" mechanism. 88 Various protocols have defined their own RTO mechanisms (e.g., TCP 89 [RFC6298], SCTP [RFC4960], SIP [RFC3261]). The specifics of 90 retransmission timeouts often represent a particular tradeoff 91 between correctness and responsiveness [AP99]. In other words we 92 want to simultaneously: 94 - wait long enough to ensure the detection of loss is correct and 95 therefore a retransmission is in fact needed, and 97 - bound the delay we impose on applications before repairing 98 loss. 100 Serving both of these goals is difficult as they pull in opposite 101 directions. I.e., towards either (a) withholding needed 102 retransmissions too long to ensure the original transmission is 103 truly lost or (b) not waiting long enough---to help application 104 responsiveness---and hence sending unnecessary (often denoted 105 "spurious") retransmissions. 
107 At this point, our experience has led to a recognition that often 108 specific tweaks that deviate from standardized RTO mechanisms do not 109 materially impact network safety. Therefore, in this document we 110 outline a set of high-level protocol-agnostic requirements for RTO 111 mechanisms. The intent is to provide a safe foundation on which 112 implementations have the flexibility to instantiate mechanisms that 113 best realize their specific goals. 115 2 Context 117 This document is a bit "weird" in that it is backwards from the way 118 we generally like to engineer systems. Usually, we strive to 119 understand high-level requirements as a starting point. We then 120 methodically proceed to engineer specific protocols, algorithms and 121 systems that meet these requirements. Within the standards process 122 we have derived many retransmission timeouts without the benefit of 123 some over-arching requirements document---because we had no idea how 124 to write such a requirements document! Therefore, we made the best 125 specific decisions we could in response to specific needs. 127 At this point, however, we believe the community's experience has 128 matured to the point where we can define a set of high-level 129 requirements for retransmission timers. That is, we now understand 130 how to separate the aspects of retransmission timers that are 131 crucial for network safety from those small details that do not 132 materially impact network safety. There are two basic benefits of 133 writing this high-level document post-facto: 135 - Existing retransmission timer mechanisms may be revisited with 136 an eye towards changing the small and less crucial details to 137 facilitate some benefit (e.g., performance), while at the same 138 time not sacrificing network safety.
140 - Future retransmission timers will have a solid basis of 141 experience to lean on rather than cobbling together a new 142 retransmission timer from scratch and/or pieces of other 143 specifications. 145 However, adding a requirements umbrella to a body of existing 146 specific retransmission timer specifications is inherently messy and 147 we run the risk of creating "inconsistencies". The correct way to 148 view this document is as the default case and these other 149 specifications as agreed-upon deviations from the default. For 150 instance, [RFC3261] uses a smaller initial timeout than this 151 document specifies (requirement (1) in section 4). This deviation 152 does not render the general guidance in this document useless; 153 rather, [RFC3261] develops an initial retransmission timeout that is 154 appropriate in a specific context. Likewise, TCP's retransmission 155 timer has a minimum value of 1 second [RFC6298], whereas this 156 document does not specify that a minimum retransmission timeout is 157 necessary at all. Again, this situation should be viewed as 158 [RFC6298] providing a refinement for a specific case. 160 3 Scope 162 The principles we outline in this document are protocol-agnostic and 163 widely applicable. We make the following scope statements about 164 the application of the requirements discussed in Section 4: 166 (S.1) The requirements in this document apply only to timer-based 167 loss detection and retransmission. 169 While there are a bevy of uses for timers in protocols---from 170 rate-based pacing to connection failure detection to making 171 congestion control decisions and beyond---these are outside 172 the scope of this document. 174 (S.2) The requirements in this document only apply to cases where 175 loss detected via a timer is repaired by a retransmission of 176 the original data.
178 Other cases are certainly possible---e.g., replacing the lost 179 data with an updated version---but fall outside the scope of 180 this document. 182 (S.3) The requirements in this document apply only to endpoint-to- 183 endpoint unicast communication. Reliable multicast (e.g., 184 [RFC5740]) protocols are explicitly outside the scope of this 185 document. 187 Protocols such as SCTP [RFC4960] and MP-TCP [RFC6182] that 188 communicate in a unicast fashion with multiple specific 189 endpoints can leverage the requirements in this document 190 provided they track state and follow the requirements for each 191 endpoint independently. I.e., if host A communicates with 192 hosts B and C, A must use independent RTOs for traffic sent to 193 B and C. 195 (S.4) There are cases where state is shared across connections or 196 flows (e.g., [RFC2140], [RFC3124]). The RTO is one piece of 197 state that is often discussed as sharable. These situations 198 raise issues that the simple flow-oriented RTO mechanism 199 discussed in this document does not consider (e.g., how long 200 to preserve state between connections). Therefore, while the 201 general principles given in Section 4 are likely applicable, 202 sharing RTOs across flows is outside the scope of this 203 document. 205 (S.5) The requirements in this document apply to reliable 206 transmission, but do not assume that all data transmitted 207 within a connection or flow is reliably sent. 209 E.g., a protocol like DCCP [RFC4340] could leverage the 210 requirements in this document for the initial reliable 211 handshake even though the protocol reverts to unreliable 212 transmission after the handshake. 214 E.g., a protocol like SCTP [RFC4960] could leverage the 215 requirements for data that is sent only "partially reliably". 216 In this case, the protocol uses two phases for each message. 218 In the first phase, the protocol attempts to ensure 219 reliability and can leverage the requirements in this 220 document.
At some point the value of the data is gone and the 221 protocol transitions to the second phase where the data is 222 treated as unreliably transmitted and therefore the protocol 223 will no longer attempt to repair the loss---and hence there 224 are no more retransmissions and the requirements in this 225 document are moot. 227 (S.6) The requirements for RTO mechanisms in this document can be 228 applied regardless of whether the RTO mechanism is the sole 229 loss repair strategy or works in concert with other 230 mechanisms. 232 E.g., for a simple protocol like UDP-based DNS [] a timeout 233 and re-try mechanism is likely to act alone to ensure 234 reliability. 236 E.g., within a complex protocol like TCP or SCTP we have 237 designed methods to detect and repair loss based on explicit 238 endpoint state sharing [RFC2018,RFC4960,RFC6675]. These 239 mechanisms are preferred over the RTO as they are often more 240 timely and precise than the coarse-grained RTO. In these 241 cases, the RTO becomes a last resort when the more advanced 242 mechanisms fail. 244 4 Requirements 246 We now list the requirements that apply when designing 247 retransmission timeout (RTO) mechanisms. 249 (1) In the absence of any knowledge about the latency of a path, the 250 RTO MUST be conservatively set to no less than 1 second. 252 This requirement ensures two important aspects of the RTO. 253 First, when transmitting into an unknown network, 254 retransmissions will not be sent before an ACK would reasonably 255 be expected to arrive and hence possibly waste scarce network 256 resources. Second, as noted below, sometimes retransmissions 257 can lead to ambiguities in assessing the latency of a network 258 path. Therefore, it is especially important for the first 259 latency sample to be free of ambiguities such that there is a 260 baseline for the remainder of the communication. 
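As a minimal sketch of requirement (1), assuming a sender that has not yet taken any latency sample for the path, the initial RTO could be chosen as shown below. The names MIN_INITIAL_RTO and initial_rto are hypothetical and purely illustrative; they are not taken from this document or from any protocol specification.

```python
# Illustrative sketch only: the names here are hypothetical, not from any specification.

MIN_INITIAL_RTO = 1.0  # seconds; the conservative floor stated in requirement (1)


def initial_rto(proposed=None):
    """Return the RTO (in seconds) to use before any path latency is known.

    `proposed` is an optional protocol-specific initial value in seconds.
    A value below the 1 second floor is raised to the floor; a larger
    (more conservative) value is kept unchanged.
    """
    if proposed is None:
        return MIN_INITIAL_RTO
    return max(float(proposed), MIN_INITIAL_RTO)
```

With this sketch, initial_rto() and initial_rto(0.2) both yield 1.0, while initial_rto(3) yields 3.0. A specification that deliberately deviates from the default, as discussed in Section 2, would use its own agreed-upon value instead of this floor.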
262 The specific constant (1 second) comes from the analysis of 263 Internet RTTs found in Appendix A of [RFC6298]. 265 (2) As we note above, loss detection happens when a sender does not 266 receive delivery confirmation within some expected period of 267 time. We now specify four requirements that pertain to setting 268 the length of this expectation. 270 Often, measuring the time required for delivery confirmation is 271 framed as involving the "round-trip time (RTT)" of the 272 network path as this is the minimum amount of time required to 273 receive delivery confirmation and also often follows protocol 274 behavior whereby acknowledgments are generated quickly after 275 data arrives. For instance, this is the case for the RTO used 276 by TCP [RFC6298] and SCTP [RFC4960]. However, this is somewhat 277 misleading as the expected latency is better framed as the 278 "feedback time" (FT). In other words, the expectation is not 279 always simply a network property, but includes additional time 280 before a sender should reasonably expect a response to a query. 282 For instance, consider a UDP-based DNS request from a client to 283 a recursive resolver. When the request can be served from the 284 resolver's cache the FT likely well approximates the network RTT 285 between the client and resolver. However, on a cache miss the 286 resolver will request the needed information from one or more 287 authoritative DNS servers, which will non-trivially increase the 288 FT compared to the RTT between the client and resolver. 290 Therefore, we express the following requirements in terms of FT: 292 (a) In steady state the RTO SHOULD be set based on recent 293 observations of both the FT and the variance of the FT. 295 In other words, the RTO should represent an 296 empirically-derived reasonable amount of time that the 297 sender should wait for delivery confirmation before 298 retransmitting the given data. 300 (b) FT observations SHOULD be taken regularly.
302 Internet measurements show that taking only a single FT 303 sample per TCP connection results in a relatively poorly 304 performing RTO mechanism [AP99], hence this requirement that 305 the FT be sampled continuously throughout the lifetime of 306 communication. 308 The notion of "regularly" SHOULD be defined as at least once 309 per RTT or as frequently as data is exchanged in cases where 310 that happens less frequently than once per RTT. However, we 311 also recognize that it may not always be practical to take 312 an FT sample this often in all cases. Hence, this 313 once-per-RTT definition of "regularly" is explicitly a 314 "SHOULD" and not a "MUST". 316 As an example, TCP takes an FT sample roughly once per RTT, 317 or if using the timestamp option [RFC7323] on each 318 acknowledgment arrival. [AP99] shows that both these 319 approaches result in roughly equivalent performance for the 320 RTO estimator. 322 (c) FT observations MAY be taken from non-data exchanges. 324 Some protocols use keepalives, heartbeats or other messages 325 to exchange control information. To the extent that the 326 latency of these transactions mirrors data exchange, they 327 can be leveraged to take FT samples within the RTO 328 mechanism. Such samples can help protocols keep their RTO 329 accurate during lulls in data transmission. However, given 330 that these messages may not be subject to the same delays as 331 data transmission, we do not take a general view on whether 332 this is useful or not. 334 (d) An RTO mechanism MUST NOT use ambiguous FT samples. 336 Assume two copies of some segment X are transmitted at times 337 t0 and t1 and then at time t2 the sender receives 338 confirmation that X in fact arrived. In some cases, it is 339 not clear which copy of X triggered the confirmation and 340 hence the actual FT is either t2-t1 or t2-t0, but which is a 341 mystery. 
Therefore, in this situation an implementation 342 MUST use Karn's algorithm [KP87,RFC6298] and use neither 343 version of the FT sample and hence not update the RTO. 345 There are cases where two copies of some data are 346 transmitted in a way whereby the sender can tell which is 347 being acknowledged by an incoming ACK. E.g., TCP's 348 timestamp option [RFC7323] allows for segments to be 349 uniquely identified and hence avoid the ambiguity. In such 350 cases there is no ambiguity and the resulting samples can 351 update the RTO. 353 (3) Each time the RTO is used to detect a loss and a retransmission 354 is scheduled, the value of the RTO MUST be exponentially backed 355 off such that the next firing requires a longer interval. The 356 backoff SHOULD be removed after the successful repair of the 357 lost data and subsequent transmission of non-retransmitted data. 359 A maximum value MAY be placed on the RTO. The maximum RTO MUST 360 NOT be less than 60 seconds (a la [RFC6298]). 362 This ensures network safety. 364 (4) Retransmissions triggered by the RTO mechanism MUST be taken as 365 indications of network congestion and the sending rate adapted 366 using a standard mechanism (e.g., TCP collapses the congestion 367 window to one segment [RFC5681]). 369 This ensures network safety. 371 An exception could be made to this rule if an IETF standardized 372 mechanism is used to determine that a particular loss is due to 373 a non-congestion event (e.g., packet corruption). In such a 374 case a congestion control action is not required. Additionally, 375 RTO-triggered congestion control actions may be reversed when a 376 standard mechanism determines that the cause of the loss was not 377 congestion after all (e.g., [RFC5682]). 379 5 Discussion 380 We note that research has shown the tension between the 381 responsiveness and correctness of retransmission timeouts seems to 382 be a fundamental tradeoff in the context of TCP [AP99]. 
That is, 383 making the RTO more aggressive (e.g., via changing TCP's EWMA gains, 384 lowering the minimum RTO, etc.) can reduce the time spent waiting on 385 needed retransmissions. However, at the same time, such 386 aggressiveness leads to more needless retransmissions. Therefore, 387 being as aggressive as the requirements given in the previous 388 section allow in any particular situation may not be the best course 389 of action because an RTO expiration carries a requirement to invoke 390 a congestion response and hence slow transmission down. 392 While the tradeoff between responsiveness and correctness seems 393 fundamental, the tradeoff can be made less relevant if the sender 394 can detect and recover from spurious RTOs. Several mechanisms have 395 been proposed for this purpose, such as Eifel [RFC3522], F-RTO 396 [RFC5682] and DSACK [RFC2883,RFC3708]. Using such mechanisms may 397 allow a data originator to tip towards being more responsive without 398 incurring (as much of) the attendant costs of needless retransmits. 400 Also note that in addition to the experiments discussed in [AP99], 401 the Linux TCP implementation has been using various non-standard RTO 402 mechanisms for many years seemingly without large scale problems 403 (e.g., using different EWMA gains than specified in [RFC6298]). 404 Further, a number of implementations use minimum RTOs that are less 405 than the 1 second specified in [RFC6298]. While the implication of 406 these deviations from the standard may be more spurious retransmits 407 (per [AP99]), we are aware of no large scale network safety issues 408 caused by this change to the minimum RTO. 410 Finally, we note that while allowing implementations to be more 411 aggressive may in fact increase the number of needless 412 retransmissions, the above requirements fail safe in that they insist 413 on exponential backoff of the RTO and a transmission rate reduction.
414 Therefore, providing implementers more latitude than they have 415 traditionally been given in IETF specifications of RTO mechanisms 416 does not somehow open the flood gates to aggressive behavior. Since 417 there is a downside to being aggressive the incentives for proper 418 behavior are retained in the mechanism. 420 6 Security Considerations 422 This document does not alter the security properties of 423 retransmission timeout mechanisms. See [RFC6298] for a discussion 424 of these within the context of TCP. 426 Acknowledgments 428 This document benefits from years of discussions with Ethan Blanton, 429 Sally Floyd, Jana Iyengar, Shawn Ostermann, Vern Paxson, and the 430 members of the TCPM and TCP-IMPL working groups. Ran Atkinson, 431 Yuchung Cheng, David Black, Gorry Fairhurst, Mirja Kuhlewind, 432 Jonathan Looney and Michael Scharf provided useful comments on a 433 previous version of this draft. 435 Normative References 437 [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate 438 Requirement Levels", BCP 14, RFC 2119, March 1997. 440 Informative References 442 [AP99] Allman, M., V. Paxson, "On Estimating End-to-End Network Path 443 Properties", Proceedings of the ACM SIGCOMM Technical Symposium, 444 September 1999. 446 [KP87] Karn, P. and C. Partridge, "Improving Round-Trip Time 447 Estimates in Reliable Transport Protocols", SIGCOMM 87. 449 [RFC2018] Mathis, M., Mahdavi, J., Floyd, S., and A. Romanow, "TCP 450 Selective Acknowledgment Options", RFC 2018, October 1996. 452 [RFC2140] Touch, J., "TCP Control Block Interdependence", RFC 2140, 453 April 1997. 455 [RFC2883] Floyd, S., Mahdavi, J., Mathis, M., and M. Podolsky, "An 456 Extension to the Selective Acknowledgement (SACK) Option for 457 TCP", RFC 2883, July 2000. 459 [RFC3124] Balakrishnan, H., S. Seshan, "The Congestion Manager", RFC 460 3124, June 2001. 462 [RFC3261] Rosenberg, J., Schulzrinne, H., Camarillo, G., Johnston, 463 A., Peterson, J., Sparks, R., Handley, M., and E.
Schooler, 464 "SIP: Session Initiation Protocol", RFC 3261, June 2002. 466 [RFC3522] Ludwig, R., M. Meyer, "The Eifel Detection Algorithm for 467 TCP", RFC 3522, April 2003. 469 [RFC3708] Blanton, E., M. Allman, "Using TCP Duplicate Selective 470 Acknowledgement (DSACKs) and Stream Control Transmission 471 Protocol (SCTP) Duplicate Transmission Sequence Numbers (TSNs) 472 to Detect Spurious Retransmissions", RFC 3708, February 2004. 474 [RFC3940] Adamson, B., C. Bormann, M. Handley, J. Macker, 475 "Negative-acknowledgment (NACK)-Oriented Reliable Multicast 476 (NORM) Protocol", November 2004, RFC 3940. 478 [RFC4340] Kohler, E., M. Handley, S. Floyd, "Datagram Congestion 479 Control Protocol (DCCP)", March 2006, RFC 4340. 481 [RFC4960] Stewart, R., "Stream Control Transmission Protocol", RFC 482 4960, September 2007. 484 [RFC5682] Sarolahti, P., M. Kojo, K. Yamamoto, M. Hata, "Forward 485 RTO-Recovery (F-RTO): An Algorithm for Detecting Spurious 486 Retransmission Timeouts with TCP", RFC 5682, September 2009. 488 [RFC5740] Adamson, B., C. Bormann, M. Handley, J. Macker, 489 "NACK-Oriented Reliable Multicast (NORM) Transport Protocol", 490 November 2009, RFC 5740. 492 [RFC6182] Ford, A., C. Raiciu, M. Handley, S. Barre, J. Iyengar, 493 "Architectural Guidelines for Multipath TCP Development", March 494 2011, RFC 6182. 496 [RFC6298] Paxson, V., M. Allman, H.K. Chu, M. Sargent, "Computing 497 TCP's Retransmission Timer", June 2011, RFC 6298. 499 [RFC6582] Henderson, T., S. Floyd, A. Gurtov, Y. Nishida, "The 500 NewReno Modification to TCP's Fast Recovery Algorithm", April 501 2012, RFC 6582. 503 [RFC6675] Blanton, E., M. Allman, L. Wang, I. Jarvinen, M. Kojo, 504 Y. Nishida, "A Conservative Loss Recovery Algorithm Based on 505 Selective Acknowledgment (SACK) for TCP", August 2012, RFC 6675. 507 [RFC7323] Borman D., B. Braden, V. Jacobson, R. Scheffenegger, "TCP 508 Extensions for High Performance", September 2014, RFC 7323.
510 Authors' Addresses 512 Mark Allman 513 International Computer Science Institute 514 1947 Center St. Suite 600 515 Berkeley, CA 94704 517 EMail: mallman@icir.org 518 http://www.icir.org/mallman