Internet Engineering Task Force                                M. Allman
INTERNET-DRAFT                                                      ICSI
File: draft-ietf-tcpm-rto-consider-08.txt              February 22, 2019
Intended Status: Best Current Practice
Expires: August 22, 2019


                  Retransmission Timeout Requirements


Status of this Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.  Internet-Drafts are working
   documents of the Internet Engineering Task Force (IETF), its areas,
   and its working groups.  Note that other groups may also distribute
   working documents as Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six
   months and may be updated, replaced, or obsoleted by other documents
   at any time.  It is inappropriate to use Internet-Drafts as
   reference material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/1id-abstracts.html

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html

   This Internet-Draft will expire on August 22, 2019.

Copyright Notice

   Copyright (c) 2019 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with
   respect to this document.  Code Components extracted from this
   document must include Simplified BSD License text as described in
   Section 4.e of the Trust Legal Provisions and are provided without
   warranty as described in the Simplified BSD License.

Abstract

   Ensuring reliable communication often manifests in a timeout and
   retry mechanism.  Each implementation of a retransmission timeout
   mechanism represents a balance between correctness and timeliness
   and therefore no implementation suits all situations.  This document
   provides high-level requirements for retransmission timeout schemes
   appropriate for general use in the Internet.  Within the
   requirements, implementations have latitude to define particulars
   that best address each situation.

Terminology

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in BCP 14, RFC 2119
   [RFC2119].

1 Introduction

   Reliable transmission is a key property for many network protocols
   and applications.  Our protocols use various mechanisms to achieve
   reliable data transmission.
   Often we use continuous or periodic acknowledgments from the
   recipient to inform the sender's notion of which pieces of data are
   missing and need to be retransmitted to ensure reliability.
   Alternatively, information coding---e.g., forward error correction
   (FEC)---can be used to achieve probabilistic reliability without
   retransmissions.  However, despite our best intentions and most
   robust mechanisms, the only thing we can truly depend on is the
   passage of time, and therefore our ultimate backstop to ensuring
   reliability is a timeout and re-try mechanism.  That is, the sender
   sets some expectation for how long to wait for confirmation of
   delivery for a given piece of data.  When this time period passes
   without delivery confirmation, the sender assumes the data was lost
   in transit and therefore schedules a retransmission.  This process
   of ensuring reliability via time-based loss detection and resending
   lost data is commonly referred to as a "retransmission timeout
   (RTO)" mechanism.

   Various protocols have defined their own RTO mechanisms (e.g., TCP
   [RFC6298], SCTP [RFC4960], SIP [RFC3261]).  In this document, our
   use of "RTO" does not refer to any one specific scheme, but rather
   is a generic term that includes all timer-based retransmission
   mechanisms.  The specifics of retransmission timeouts often
   represent a particular tradeoff between correctness and
   responsiveness [AP99].  In other words, we want to simultaneously:

   - wait long enough to ensure the detection of loss is correct and
     therefore a retransmission is in fact needed, and

   - bound the delay we impose on applications before repairing
     loss.

   Serving both of these goals is difficult because they pull in
   opposite directions.
   That is, an RTO design risks either (a) withholding needed
   retransmissions too long in order to be sure the original
   transmission is truly lost, or (b) not waiting long enough---to help
   application responsiveness---and hence sending unnecessary (often
   denoted "spurious") retransmissions.

   At this point, our experience has led to a recognition that specific
   tweaks that deviate from standardized RTO mechanisms often do not
   materially impact network safety.  Therefore, in this document we
   outline a set of high-level protocol-agnostic requirements for RTO
   mechanisms.  The intent is to provide a safe foundation on which
   implementations have the flexibility to instantiate mechanisms that
   best realize their specific goals.

2 Context

   This document is a bit "weird" in that it is backwards from the way
   we generally like to engineer systems.  Usually, we strive to
   understand high-level requirements as a starting point.  We then
   methodically proceed to engineer specific protocols, algorithms and
   systems that meet these requirements.  Within the standards process,
   we have derived many retransmission timeouts without the benefit of
   an overarching requirements document---because we had no idea how
   to write such a requirements document!  Therefore, we made the best
   specific decisions we could in response to specific needs.

   At this point, however, we believe the community's experience has
   matured to the point where we can define a set of high-level
   requirements for retransmission timers.  That is, we now understand
   how to separate the aspects of retransmission timers that are
   crucial for network safety from those small details that do not
   materially impact network safety.
   There are two basic benefits of writing this high-level document
   post facto:

   - Existing retransmission timer mechanisms may be revisited with
     an eye towards changing the small and less crucial details to
     facilitate some benefit (e.g., performance), while at the same
     time not sacrificing network safety.

   - Future retransmission timers will have a solid basis of
     experience to lean on rather than cobbling together a new
     retransmission timer from scratch and/or piece parts of other
     specifications.

   However, adding a requirements umbrella to a body of existing
   specific retransmission timer specifications is inherently messy,
   and we run the risk of creating "inconsistencies".  The correct way
   to view this document is as the default case, with these other
   specifications as agreed-upon deviations from the default.  For
   instance, [RFC3261] uses a smaller initial timeout than this
   document specifies (requirement (1) in Section 4).  This does not
   render the general guidance in this document useless; rather,
   [RFC3261] develops an initial retransmission timeout that is
   appropriate in a specific context.  Likewise, TCP's retransmission
   timer has a minimum value of 1 second [RFC6298], whereas this
   document does not specify that a minimum retransmission timeout is
   necessary at all.  Again, this situation should be viewed as
   [RFC6298] providing a refinement for a specific case.

3 Scope

   The principles we outline in this document are protocol-agnostic and
   widely applicable.  We make the following scope statements about
   the application of the requirements discussed in Section 4:

   (S.1) The requirements in this document apply only to timer-based
         loss detection and retransmission.

         While timers have a bevy of uses in protocols---from
         rate-based pacing to connection failure detection to making
         congestion control decisions and beyond---these uses are
         outside the scope of this document.

   (S.2) The requirements in this document only apply to cases where
         loss detected via a timer is repaired by a retransmission of
         the original data.

         Other cases are certainly possible---e.g., replacing the lost
         data with an updated version---but fall outside the scope of
         this document.

   (S.3) The requirements in this document apply only to
         endpoint-to-endpoint unicast communication.  Reliable
         multicast protocols (e.g., [RFC5740]) are explicitly outside
         the scope of this document.

         Protocols such as SCTP [RFC4960] and MP-TCP [RFC6182] that
         communicate in a unicast fashion with multiple specific
         endpoints can leverage the requirements in this document
         provided they track state and follow the requirements for each
         endpoint independently.  For example, if host A communicates
         with hosts B and C, A must use independent RTOs for traffic
         sent to B and C.

   (S.4) There are cases where state is shared across connections or
         flows (e.g., [RFC2140], [RFC3124]).  The RTO is one piece of
         state that is often discussed as shareable.  These situations
         raise issues that the simple flow-oriented RTO mechanism
         discussed in this document does not consider (e.g., how long
         to preserve state between connections).  Therefore, while the
         general principles given in Section 4 are likely applicable,
         sharing RTOs across flows is outside the scope of this
         document.

   (S.5) The requirements in this document apply to reliable
         transmission, but do not assume that all data transmitted
         within a connection or flow is reliably sent.

         E.g., a protocol like DCCP [RFC4340] could leverage the
         requirements in this document for the initial reliable
         handshake even though the protocol reverts to unreliable
         transmission after the handshake.

         E.g., a protocol like SCTP [RFC4960] could leverage the
         requirements for data that is sent only "partially reliably".
         In this case, the protocol uses two phases for each message.
         In the first phase, the protocol attempts to ensure
         reliability and can leverage the requirements in this
         document.  At some point, the data loses its value and the
         protocol transitions to the second phase, where the data is
         treated as unreliably transmitted.  The protocol then no
         longer attempts to repair the loss---and hence there are no
         more retransmissions and the requirements in this document
         are moot.

   (S.6) The requirements for RTO mechanisms in this document can be
         applied regardless of whether the RTO mechanism is the sole
         loss repair strategy or works in concert with other
         mechanisms.

         E.g., for a simple protocol like UDP-based DNS [], a timeout
         and re-try mechanism is likely to act alone to ensure
         reliability.

         E.g., within a complex protocol like TCP or SCTP, we have
         designed methods to detect and repair loss based on explicit
         endpoint state sharing [RFC2018,RFC4960,RFC6675].  These
         mechanisms are preferred over the RTO because they are often
         more timely and precise than the coarse-grained RTO.  In these
         cases, the RTO becomes a last resort when the more advanced
         mechanisms fail.

         E.g., some protocols may leverage more than one retransmission
         timer simultaneously.  In these cases, the general guidance in
         this document can be applied to all such timers.

4 Requirements

   We now list the requirements that apply when designing
   retransmission timeout (RTO) mechanisms.

   (1) In the absence of any knowledge about the latency of a path, the
       RTO MUST be conservatively set to no less than 1 second.

       This requirement ensures two important aspects of the RTO.
       First, when transmitting into an unknown network,
       retransmissions will not be sent before an ACK would reasonably
       be expected to arrive; such premature retransmissions could
       waste scarce network resources.  Second, as noted below,
       retransmissions can sometimes lead to ambiguities in assessing
       the latency of a network path.  Therefore, it is especially
       important for the first latency sample to be free of
       ambiguities, such that there is a baseline for the remainder of
       the communication.

       The specific constant (1 second) comes from the analysis of
       Internet RTTs found in Appendix A of [RFC6298].

   (2) As we note above, loss detection happens when a sender does not
       receive delivery confirmation within some expected period of
       time.  We now specify four requirements that pertain to setting
       the length of this expectation.

       Measuring the time required for delivery confirmation is often
       framed in terms of the "round-trip time (RTT)" of the network
       path, as this is the minimum amount of time required to receive
       delivery confirmation and also often reflects protocol behavior
       whereby acknowledgments are generated quickly after data
       arrives.  For instance, this is the case for the RTO used by TCP
       [RFC6298] and SCTP [RFC4960].  However, this is somewhat
       misleading, as the expected latency is better framed as the
       "feedback time" (FT).  In other words, the expectation is not
       always simply a network property, but includes additional time
       before a sender should reasonably expect a response to a query.

       For instance, consider a UDP-based DNS request from a client to
       a recursive resolver.
       When the request can be served from the resolver's cache, the FT
       likely approximates the network RTT between the client and
       resolver well.  However, on a cache miss, the resolver will
       request the needed information from one or more authoritative
       DNS servers, which will non-trivially increase the FT compared
       to the RTT between the client and resolver.

       Therefore, we express the following requirements in terms of FT:

       (a) In steady state, the RTO SHOULD be set based on recent
           observations of both the FT and the variance of the FT.

           In other words, the RTO should represent an empirically
           derived, reasonable amount of time that the sender should
           wait for delivery confirmation before retransmitting the
           given data.

       (b) FT observations SHOULD be taken regularly.

           Internet measurements show that taking only a single FT
           sample per TCP connection results in a relatively poorly
           performing RTO mechanism [AP99], hence the requirement that
           the FT be sampled continuously throughout the lifetime of
           communication.

           The notion of "regularly" SHOULD be defined as at least once
           per RTT, or as frequently as data is exchanged in cases where
           that happens less frequently than once per RTT.  However, we
           also recognize that it may not always be practical to take
           an FT sample this often in all cases.  Hence, this
           once-per-RTT definition of "regularly" is explicitly a
           "SHOULD" and not a "MUST".

           As an example, TCP takes an FT sample roughly once per RTT,
           or, if using the timestamp option [RFC7323], on each
           acknowledgment arrival.  [AP99] shows that both these
           approaches result in roughly equivalent performance for the
           RTO estimator.

       (c) FT observations MAY be taken from non-data exchanges.

           Some protocols use keepalives, heartbeats or other messages
           to exchange control information.
           To the extent that the latency of these transactions mirrors
           data exchange, they can be leveraged to take FT samples
           within the RTO mechanism.  Such samples can help protocols
           keep their RTO accurate during lulls in data transmission.
           However, given that these messages may not be subject to the
           same delays as data transmission, we do not take a general
           view on whether this is useful or not.

       (d) An RTO mechanism MUST NOT use ambiguous FT samples.

           Assume two copies of some segment X are transmitted at times
           t0 and t1, and then at time t2 the sender receives
           confirmation that X in fact arrived.  In some cases, it is
           not clear which copy of X triggered the confirmation, and
           hence the actual FT is either t2-t1 or t2-t0, with no way to
           tell which.  Therefore, in this situation an implementation
           MUST use Karn's algorithm [KP87,RFC6298], using neither
           version of the FT sample and hence not updating the RTO.

           There are cases where two copies of some data are
           transmitted in a way whereby the sender can tell which is
           being acknowledged by an incoming ACK.  E.g., TCP's
           timestamp option [RFC7323] allows segments to be uniquely
           identified and hence avoids the ambiguity.  In such cases,
           there is no ambiguity and the resulting samples can update
           the RTO.

   (3) Each time the RTO is used to detect a loss and a retransmission
       is scheduled, the value of the RTO MUST be exponentially backed
       off such that the next firing requires a longer interval.  The
       backoff SHOULD be removed after the successful repair of the
       lost data and subsequent transmission of non-retransmitted data.

       A maximum value MAY be placed on the RTO.  The maximum RTO MUST
       NOT be less than 60 seconds (as in [RFC6298]).

       This ensures network safety, because a sender facing persistent
       loss retransmits ever less frequently.

   (4) Retransmissions triggered by the RTO mechanism MUST be taken as
       indications of network congestion and the sending rate adapted
       using a standard mechanism (e.g., TCP collapses the congestion
       window to one segment [RFC5681]).

       This ensures network safety.

       An exception could be made to this rule if an IETF-standardized
       mechanism is used to determine that a particular loss is due to
       a non-congestion event (e.g., packet corruption).  In such a
       case, a congestion control action is not required.
       Additionally, RTO-triggered congestion control actions may be
       reversed when a standard mechanism determines that the cause of
       the loss was not congestion after all (e.g., [RFC5682]).

5 Discussion

   We note that research has shown that the tension between the
   responsiveness and correctness of retransmission timeouts seems to
   be a fundamental tradeoff in the context of TCP [AP99].  That is,
   making the RTO more aggressive (e.g., via changing the gains of
   TCP's exponentially weighted moving average (EWMA) estimators,
   lowering the minimum RTO, etc.) can reduce the time spent waiting on
   needed retransmissions.  However, at the same time, such
   aggressiveness leads to more needless retransmissions.  Therefore,
   being as aggressive as the requirements given in the previous
   section allow in any particular situation may not be the best course
   of action, because an RTO expiration carries a requirement to invoke
   a congestion response and hence slow transmission down.

   While the tradeoff between responsiveness and correctness seems
   fundamental, the tradeoff can be made less relevant if the sender
   can detect and recover from spurious RTOs.  Several mechanisms have
   been proposed for this purpose, such as Eifel [RFC3522], F-RTO
   [RFC5682] and DSACK [RFC2883,RFC3708].  Using such mechanisms may
   allow a data originator to tip towards being more responsive without
   incurring (as much of) the attendant costs of needless retransmits.
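   As an illustration of the spurious-RTO detection mentioned above,
   the following minimal Python sketch captures what we understand to
   be the core test of the Eifel detection algorithm [RFC3522].  The
   sender remembers the timestamp carried by its first retransmission.
   If the first ACK covering the retransmitted data echoes an older
   timestamp, that ACK was elicited by the original transmission and
   the retransmission was spurious.  The class and method names are
   hypothetical, TCP-style 32-bit timestamps [RFC7323] are assumed, and
   the rest of [RFC3522] is omitted.

```python
from typing import Optional

TS_MOD = 2 ** 32  # TCP timestamp values are 32-bit and wrap around (RFC 7323)


def ts_before(a: int, b: int) -> bool:
    """Return True if timestamp a is earlier than timestamp b, modulo 2**32."""
    return a != b and (b - a) % TS_MOD < TS_MOD // 2


class EifelDetector:
    """Hypothetical sketch of the Eifel detection test (RFC 3522)."""

    def __init__(self) -> None:
        # TSval carried by the first retransmission of the current loss episode.
        self.retransmit_ts: Optional[int] = None

    def on_first_retransmit(self, tsval: int) -> None:
        self.retransmit_ts = tsval

    def on_first_ack_of_retransmitted_data(self, tsecr: int) -> bool:
        """Return True if the retransmission is judged spurious.

        An echoed timestamp older than the retransmission's own timestamp
        means the ACK was elicited by the original transmission, i.e. the
        original arrived and the timer fired too early.
        """
        if self.retransmit_ts is None:
            return False  # no retransmission pending a decision
        spurious = ts_before(tsecr, self.retransmit_ts)
        self.retransmit_ts = None  # one decision per loss episode
        return spurious
```

   A sender that detects a spurious RTO this way could then reverse
   the RTO-triggered congestion action, as requirement (4) permits;
   that response is deliberately left out of the sketch.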
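   One well-known way to meet requirements (1) and (2)(a), and a
   concrete example of the EWMA gains and minimum RTO discussed above,
   is the estimator specified for TCP in [RFC6298].  The following
   hypothetical Python sketch applies its update rules to generic FT
   samples.  The default gains (1/8 and 1/4), the variance multiplier
   of 4, and the 1 second initial value come from [RFC6298]; the 1 ms
   clock granularity is an assumption, and min_rto defaults to zero
   because this document does not require a minimum RTO.  It is a
   sketch under these assumptions, not a definitive implementation.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FtEstimator:
    """Hypothetical RTO estimator fed with feedback-time (FT) samples.

    Applies the SRTT/RTTVAR rules of RFC 6298 to FT samples. Times in seconds.
    """
    alpha: float = 1 / 8               # gain for the smoothed FT
    beta: float = 1 / 4                # gain for the FT variation
    k: float = 4.0                     # weight of the variation term
    granularity: float = 0.001         # clock granularity G (assumed 1 ms)
    initial_rto: float = 1.0           # requirement (1): at least 1 second
    min_rto: float = 0.0               # optional floor (RFC 6298 uses 1 s)
    srtt: Optional[float] = None       # smoothed FT
    rttvar: Optional[float] = None     # FT variation

    @property
    def rto(self) -> float:
        if self.srtt is None or self.rttvar is None:
            return self.initial_rto    # no FT knowledge yet: be conservative
        rto = self.srtt + max(self.granularity, self.k * self.rttvar)
        return max(rto, self.min_rto)

    def add_sample(self, ft: float) -> None:
        """Fold one unambiguous FT sample into the estimate."""
        if self.srtt is None or self.rttvar is None:
            self.srtt = ft             # first sample initializes both terms
            self.rttvar = ft / 2
        else:
            # RTTVAR is updated first, using the SRTT from before this sample.
            self.rttvar = ((1 - self.beta) * self.rttvar
                           + self.beta * abs(self.srtt - ft))
            self.srtt = (1 - self.alpha) * self.srtt + self.alpha * ft
```

   Setting min_rto=1.0 reproduces the floor [RFC6298] specifies for
   TCP, while changing alpha, beta, or min_rto moves the estimator
   along the responsiveness/correctness tradeoff described above.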
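   Requirement (2)(d) amounts to filtering samples before they reach
   an estimator.  The following hypothetical Python sketch implements
   the sample-rejection part of Karn's algorithm [KP87]: it records
   every transmission of a segment and yields an FT sample only if the
   segment was sent once, or if the acknowledgment identifies which
   copy it covers.  The echoed_copy argument is an assumed stand-in
   for a mechanism such as the TCP timestamp option [RFC7323]; it is
   not an API of any real stack.

```python
from typing import Dict, List, Optional


class KarnSampler:
    """Hypothetical FT sampler that discards ambiguous samples (Karn)."""

    def __init__(self) -> None:
        # segment id -> send time of each copy, in transmission order
        self._sends: Dict[int, List[float]] = {}

    def on_send(self, seg: int, now: float) -> int:
        """Record a (re)transmission and return an index naming this copy."""
        copies = self._sends.setdefault(seg, [])
        copies.append(now)
        return len(copies) - 1

    def on_ack(self, seg: int, now: float,
               echoed_copy: Optional[int] = None) -> Optional[float]:
        """Return an FT sample for an acknowledged segment, or None.

        echoed_copy stands in for an echoed per-transmission identifier.
        Without it, a segment sent more than once yields no sample.
        """
        copies = self._sends.pop(seg, [])
        if not copies:
            return None                       # unknown or already acknowledged
        if echoed_copy is not None:
            return now - copies[echoed_copy]  # the ACK names its copy
        if len(copies) == 1:
            return now - copies[0]            # single transmission: unambiguous
        return None                           # t2-t0 or t2-t1: do not guess
```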
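   Requirement (3) can be sketched as a small wrapper around whatever
   RTO the estimator currently produces.  This hypothetical Python
   sketch assumes a backoff factor of 2 (the doubling TCP uses
   [RFC6298]; this document requires only exponential growth) and
   applies the optional ceiling at 60 seconds, the smallest maximum
   the requirement allows.  Deciding when the lost data has been
   repaired and new data sent is left to the caller.

```python
class BackoffTimer:
    """Hypothetical exponential backoff applied on top of an estimated RTO."""

    MAX_RTO = 60.0  # requirement (3): any ceiling MUST be at least 60 seconds

    def __init__(self, factor: float = 2.0) -> None:
        if factor <= 1.0:
            raise ValueError("backoff factor must be > 1 so the RTO grows")
        self.factor = factor
        self.backoffs = 0  # RTO expirations since the backoff was last removed

    def current(self, base_rto: float) -> float:
        """Timer interval to use, given the estimator's current RTO."""
        rto = min(base_rto, self.MAX_RTO)
        for _ in range(self.backoffs):
            rto = min(rto * self.factor, self.MAX_RTO)  # grow, never past the cap
        return rto

    def on_timeout(self) -> None:
        """The RTO fired and a retransmission is scheduled."""
        self.backoffs += 1

    def on_repair_and_new_data(self) -> None:
        """Lost data repaired and new (non-retransmitted) data sent."""
        self.backoffs = 0
```

   For example, with a 1 second base RTO the timer fires after 1, 2,
   4, 8, ... seconds on successive losses of the same data, never
   exceeding 60 seconds.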
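   For requirement (4), the example mechanism cited above is TCP's
   response to a retransmission timeout in [RFC5681]: ssthresh is set
   to max(FlightSize / 2, 2 * SMSS), and cwnd is collapsed to the loss
   window of one full-sized segment.  The following hypothetical Python
   sketch models only that step, with all quantities in bytes; the
   repeat_timeout flag (a hypothetical name) reflects the [RFC5681]
   rule that ssthresh is reduced only on the first timeout of a given
   segment.  Slow start, fast recovery, and the rest of [RFC5681] are
   omitted.

```python
from dataclasses import dataclass


@dataclass
class CongestionState:
    """Hypothetical per-path congestion state, in bytes (RFC 5681 terms)."""
    smss: int      # sender maximum segment size
    cwnd: int      # congestion window
    ssthresh: int  # slow start threshold

    def on_rto(self, flight_size: int, repeat_timeout: bool = False) -> None:
        """React to an RTO-triggered retransmission as a congestion signal.

        repeat_timeout is True when the same segment has already been
        retransmitted by the timer; ssthresh is then left unchanged.
        """
        if not repeat_timeout:
            # RFC 5681, equation (4): half the outstanding data, but at
            # least two segments.
            self.ssthresh = max(flight_size // 2, 2 * self.smss)
        # Collapse cwnd to the loss window (LW): one full-sized segment.
        self.cwnd = self.smss
```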

   Also, note that in addition to the experiments discussed in [AP99],
   the Linux TCP implementation has been using various non-standard RTO
   mechanisms for many years, seemingly without large-scale problems
   (e.g., using EWMA gains different from those specified in
   [RFC6298]).  Further, a number of implementations use minimum RTOs
   that are less than the 1 second specified in [RFC6298].  While these
   deviations from the standard may result in more spurious retransmits
   (per [AP99]), we are aware of no large-scale network safety issues
   caused by this change to the minimum RTO.

   Finally, we note that while allowing implementations to be more
   aggressive may in fact increase the number of needless
   retransmissions, the above requirements fail safe in that they insist
   on exponential backoff of the RTO and a transmission rate reduction.
   Therefore, providing implementers more latitude than they have
   traditionally been given in IETF specifications of RTO mechanisms
   does not somehow open the floodgates to aggressive behavior.  Since
   there is a downside to being aggressive, the incentives for proper
   behavior are retained in the mechanism.

6 Security Considerations

   This document does not alter the security properties of
   retransmission timeout mechanisms.  See [RFC6298] for a discussion
   of these within the context of TCP.

Acknowledgments

   This document benefits from years of discussions with Ethan Blanton,
   Sally Floyd, Jana Iyengar, Shawn Ostermann, Vern Paxson, and the
   members of the TCPM and TCP-IMPL working groups.  Ran Atkinson,
   Yuchung Cheng, David Black, Gorry Fairhurst, Mirja Kuhlewind,
   Nicolas Kuhn, Jonathan Looney, and Michael Scharf provided useful
   comments on a previous version of this draft.

Normative References

   [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate
             Requirement Levels", BCP 14, RFC 2119, March 1997.

Informative References

   [AP99]    Allman, M. and V. Paxson, "On Estimating End-to-End
             Network Path Properties", Proceedings of the ACM SIGCOMM
             Technical Symposium, September 1999.

   [KP87]    Karn, P. and C. Partridge, "Improving Round-Trip Time
             Estimates in Reliable Transport Protocols", Proceedings
             of ACM SIGCOMM '87, 1987.

   [RFC2018] Mathis, M., Mahdavi, J., Floyd, S., and A. Romanow, "TCP
             Selective Acknowledgment Options", RFC 2018, October 1996.

   [RFC2140] Touch, J., "TCP Control Block Interdependence", RFC 2140,
             April 1997.

   [RFC2883] Floyd, S., Mahdavi, J., Mathis, M., and M. Podolsky, "An
             Extension to the Selective Acknowledgement (SACK) Option
             for TCP", RFC 2883, July 2000.

   [RFC3124] Balakrishnan, H. and S. Seshan, "The Congestion Manager",
             RFC 3124, June 2001.

   [RFC3261] Rosenberg, J., Schulzrinne, H., Camarillo, G., Johnston,
             A., Peterson, J., Sparks, R., Handley, M., and E.
             Schooler, "SIP: Session Initiation Protocol", RFC 3261,
             June 2002.

   [RFC3522] Ludwig, R. and M. Meyer, "The Eifel Detection Algorithm
             for TCP", RFC 3522, April 2003.

   [RFC3708] Blanton, E. and M. Allman, "Using TCP Duplicate Selective
             Acknowledgement (DSACKs) and Stream Control Transmission
             Protocol (SCTP) Duplicate Transmission Sequence Numbers
             (TSNs) to Detect Spurious Retransmissions", RFC 3708,
             February 2004.

   [RFC3940] Adamson, B., Bormann, C., Handley, M., and J. Macker,
             "Negative-acknowledgment (NACK)-Oriented Reliable
             Multicast (NORM) Protocol", RFC 3940, November 2004.

   [RFC4340] Kohler, E., Handley, M., and S. Floyd, "Datagram
             Congestion Control Protocol (DCCP)", RFC 4340, March 2006.

   [RFC4960] Stewart, R., Ed., "Stream Control Transmission Protocol",
             RFC 4960, September 2007.

   [RFC5681] Allman, M., Paxson, V., and E. Blanton, "TCP Congestion
             Control", RFC 5681, September 2009.

   [RFC5682] Sarolahti, P., Kojo, M., Yamamoto, K., and M. Hata,
             "Forward RTO-Recovery (F-RTO): An Algorithm for Detecting
             Spurious Retransmission Timeouts with TCP", RFC 5682,
             September 2009.

   [RFC5740] Adamson, B., Bormann, C., Handley, M., and J. Macker,
             "NACK-Oriented Reliable Multicast (NORM) Transport
             Protocol", RFC 5740, November 2009.

   [RFC6182] Ford, A., Raiciu, C., Handley, M., Barre, S., and J.
             Iyengar, "Architectural Guidelines for Multipath TCP
             Development", RFC 6182, March 2011.

   [RFC6298] Paxson, V., Allman, M., Chu, H.K., and M. Sargent,
             "Computing TCP's Retransmission Timer", RFC 6298, June
             2011.

   [RFC6582] Henderson, T., Floyd, S., Gurtov, A., and Y. Nishida, "The
             NewReno Modification to TCP's Fast Recovery Algorithm",
             RFC 6582, April 2012.

   [RFC6675] Blanton, E., Allman, M., Wang, L., Jarvinen, I., Kojo, M.,
             and Y. Nishida, "A Conservative Loss Recovery Algorithm
             Based on Selective Acknowledgment (SACK) for TCP",
             RFC 6675, August 2012.

   [RFC7323] Borman, D., Braden, B., Jacobson, V., and R.
             Scheffenegger, "TCP Extensions for High Performance",
             RFC 7323, September 2014.

Authors' Addresses

   Mark Allman
   International Computer Science Institute
   1947 Center St. Suite 600
   Berkeley, CA 94704

   EMail: mallman@icir.org
   http://www.icir.org/mallman