Internet Engineering Task Force                                M. Allman
INTERNET-DRAFT                                                      ICSI
File: draft-ietf-tcpm-rto-consider-04.txt                  June 15, 2016
Intended Status: Best Current Practice
Expires: December 15, 2016

                  Retransmission Timeout Requirements

Status of this Memo

   This document may not be modified, and derivative works of it may
   not be created, except to format it for publication as an RFC or to
   translate it into languages other than English.

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.  Internet-Drafts are working
   documents of the Internet Engineering Task Force (IETF), its areas,
   and its working groups.  Note that other groups may also distribute
   working documents as Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six
   months and may be updated, replaced, or obsoleted by other documents
   at any time.
   It is inappropriate to use Internet-Drafts as reference material or
   to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/1id-abstracts.html

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html

   This Internet-Draft will expire on October 15, 2016.

Copyright Notice

   Copyright (c) 2016 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with
   respect to this document.  Code Components extracted from this
   document must include Simplified BSD License text as described in
   Section 4.e of the Trust Legal Provisions and are provided without
   warranty as described in the Simplified BSD License.

Abstract

   Ensuring reliable communication often manifests in a timeout and
   retry mechanism.  Each implementation of a retransmission timeout
   mechanism represents a balance between correctness and timeliness,
   and therefore no implementation suits all situations.  This document
   provides high-level requirements for retransmission timeout schemes
   appropriate for general use in the Internet.  Within the
   requirements, implementations have latitude to define particulars
   that best address each situation.

Terminology

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in BCP 14, RFC 2119
   [RFC2119].

1 Introduction

   Reliable transmission is a key property for many network protocols
   and applications.
   Our protocols use various mechanisms to achieve reliable data
   transmission.  Often we use continuous or periodic reports from the
   recipient to inform the sender's notion of which pieces of data are
   missing and need to be retransmitted to ensure reliability.
   Alternatively, information coding (e.g., FEC) can be used to achieve
   probabilistic reliability without retransmissions.  However,
   despite our best intentions and most robust mechanisms, the only
   thing we can truly depend on is the passage of time, and therefore
   our ultimate backstop to ensuring reliability is a timeout and retry
   mechanism.  That is, the sender sets some expectation for how long
   to wait for confirmation of delivery for a given piece of data.
   When this time period passes without delivery confirmation, the
   sender assumes the data was lost in transit and therefore schedules
   a retransmission.  This process of ensuring reliability via
   time-based loss detection and resending lost data is commonly
   referred to as a "retransmission timeout (RTO)" mechanism.

   Various protocols have defined their own RTO mechanisms (e.g., TCP
   [RFC6298], SCTP [RFC4960], SIP [RFC3261]).  The specifics of
   retransmission timeouts often represent a particular tradeoff
   between correctness and responsiveness [AP99].  In other words, we
   want to simultaneously:

      - wait long enough to ensure the detection of loss is correct and
        therefore a retransmission is in fact needed, and

      - bound the delay we impose on applications before repairing
        loss.

   Serving both of these goals is difficult as they pull in opposite
   directions: towards either (a) withholding needed retransmissions
   too long to ensure the original transmission is truly lost or (b)
   not waiting long enough to help application responsiveness and
   hence sending unnecessary (often denoted "spurious")
   retransmissions.
   We have found that even though the RTO procedure is standardized for
   some protocols (e.g., TCP [RFC6298]), implementations often add
   their own subtle imprint on the specifics of the process to tilt the
   tradeoff between correctness and responsiveness in some particular
   way.

   At this point we recognize that these specific tweaks that deviate
   from standardized RTO mechanisms often do not materially impact
   network safety.  Therefore, in this document we outline a set of
   high-level, protocol-agnostic requirements for RTO mechanisms that
   provide for network safety.  The intent is to provide a safe
   foundation on which implementations have the flexibility to
   instantiate mechanisms that best realize their specific goals.

2 Scope

   The principles we outline in this document are protocol-agnostic and
   widely applicable.  We make the following scope statements about
   the application of the requirements discussed in Section 3:

   (S.1) The requirements in this document apply only to timer-based
         loss detection and retransmission.

         While there are a bevy of uses for timers in protocols (from
         rate-based pacing to connection failure detection to making
         congestion control decisions and beyond), these are outside
         the scope of this document.

   (S.2) The requirements in this document only apply to cases where
         loss detected via a timer is repaired by a retransmission of
         the original data.

         Other cases are certainly possible (e.g., replacing the lost
         data with an updated version), but they fall outside the
         scope of this document.

   (S.3) The requirements in this document apply only to
         endpoint-to-endpoint unicast communication.  Reliable
         multicast protocols (e.g., [RFC5740]) are explicitly outside
         the scope of this document.

         Protocols such as SCTP [RFC4960] and MP-TCP [RFC6182] that
         communicate in a unicast fashion with multiple specific
         endpoints can leverage the requirements in this document
         provided they track state and follow the requirements for each
         endpoint independently.  For instance, if host A communicates
         with hosts B and C, A must use independent RTOs for traffic
         sent to B and C.

   (S.4) There are cases where state is shared across connections or
         flows (e.g., [RFC2140], [RFC3124]).  The RTO is one piece of
         state that is often discussed as sharable.  These situations
         raise issues that the simple flow-oriented RTO mechanism
         discussed in this document does not consider (e.g., how long
         to preserve state between connections).  Therefore, while the
         general principles given in Section 3 are likely applicable,
         sharing RTOs across flows is outside the scope of this
         document.

   (S.5) The requirements in this document apply to reliable
         transmission, but do not assume that all data transmitted
         within a connection or flow is reliably sent.

         E.g., a protocol like DCCP [RFC4340] could leverage the
         requirements in this document for the initial reliable
         handshake even though the protocol reverts to unreliable
         transmission after the handshake.

         E.g., a protocol like SCTP [RFC4960] could leverage the
         requirements for data that is sent only "partially reliably".
         In this case, the protocol uses two phases for each message.
         In the first phase, the protocol attempts to ensure
         reliability and can leverage the requirements in this
         document.  At some point the value of the data is gone and the
         protocol transitions to the second phase, where the data is
         treated as unreliably transmitted and therefore the protocol
         will no longer attempt to repair the loss.  Hence there are no
         more retransmissions and the requirements in this document are
         moot.

   (S.6) The requirements for RTO mechanisms in this document can be
         applied regardless of whether the RTO mechanism is the sole
         loss repair strategy or works in concert with other
         mechanisms.

         E.g., for a simple protocol like UDP-based DNS [] a timeout
         and retry mechanism is likely to act alone to ensure
         reliability.

         E.g., within a complex protocol like TCP or SCTP we have
         designed methods to detect and repair loss based on explicit
         endpoint state sharing [RFC2018, RFC4960, RFC6675].  These
         mechanisms are preferred over the RTO as they are often more
         timely and precise than the coarse-grained RTO.  In these
         cases, the RTO becomes a last resort when the more advanced
         mechanisms fail.

   Additionally, the following statements detail the relationship of
   the requirements in this document to other specifications and
   implementations:

   (R.1) RTO mechanisms that are currently standardized are not updated
         or obsoleted by this document.  Implementations are free to
         use these existing specifications as they do now.

         This holds even in cases where the existing specification
         differs from the requirements in this document (e.g.,
         [RFC3261] uses a smaller initial timeout than this document
         specifies).  Existing standard specifications enjoy their own
         consensus, which this document does not change.

   (R.2) Future standardization efforts that specify RTO mechanisms
         SHOULD follow the requirements in this document.

         There may be reasons for future RTO mechanisms to deviate from
         the requirements in Section 3.  In these cases, we expect only
         that the standards process does so after reasonable
         deliberation and with good reason.

   (R.3) Alternatively, future RTO mechanism implementations may be
         made directly against the requirements in Section 3 without
         another protocol-specific specification.

   (R.4) There will no doubt be cases where applying the requirements
         in this document directly is not possible due to the structure
         or operation of a protocol.  For instance, a protocol may use a
         timeout to detect loss but not repair that loss with a direct
         retransmission of the original data.  In these situations, an
         alternate specification is required.  We encourage such future
         efforts to leverage the spirit of the requirements in this
         document to inform alternate specifications.

3 Requirements

   We now list the requirements that apply when designing
   retransmission timeout (RTO) mechanisms.

   (1) In the absence of any knowledge about the latency of a path, the
       RTO MUST be conservatively set to no less than 1 second.

       This requirement ensures two important aspects of the RTO.
       First, when transmitting into an unknown network,
       retransmissions will not be sent before an ACK would reasonably
       be expected to arrive; such premature retransmissions would
       needlessly consume scarce network resources.  Second, as noted
       below, retransmissions can sometimes lead to ambiguities in
       assessing the latency of a network path.  Therefore, it is
       especially important for the first latency sample to be free of
       ambiguities such that there is a baseline for the remainder of
       the communication.

       The specific constant (1 second) comes from the analysis of
       Internet RTTs found in Appendix A of [RFC6298].

   (2) As we note above, loss detection happens when a sender does not
       receive delivery confirmation within some expected period of
       time.  We now specify four requirements that pertain to setting
       the length of this expectation.

       Measuring the time required for delivery confirmation is often
       framed in terms of the round-trip time (RTT) of the network
       path, because the RTT is the minimum amount of time required to
       receive delivery confirmation and because many protocols
       generate acknowledgments quickly after data arrives.  For
       instance, this is the case for the RTO used by TCP [RFC6298] and
       SCTP [RFC4960].  However, this framing is somewhat misleading, as
       the expected latency is better framed as the "feedback time"
       (FT).

       In other words, the expectation is not always simply a network
       property, but includes additional time before a sender should
       reasonably expect a response to a query.

       For instance, consider a UDP-based DNS request from a client to
       a resolver.  When the request can be served from the resolver's
       cache, the FT likely approximates the network RTT between the
       client and resolver well.  However, on a cache miss the resolver
       will have to request the needed information from authoritative
       DNS servers, which will non-trivially increase the FT compared
       to the RTT between the client and resolver.

       (a) In steady state the RTO MUST be set based on recent
           observations of both the FT and the variance of the FT.

           In other words, the RTO should be based on a reasonable
           amount of time that the sender should wait for delivery
           confirmation before retransmitting the given data.

       (b) FT observations MUST be taken regularly.

           Internet measurements show that taking only a single FT
           sample per TCP connection results in a relatively poorly
           performing RTO mechanism [AP99], hence the requirement that
           the FT be sampled continuously throughout the lifetime of a
           connection.

           TCP takes an FT sample roughly once per RTT or, if using the
           timestamp option [RFC7323], on each acknowledgment arrival.
           [AP99] shows that both these approaches result in roughly
           equivalent performance for the RTO estimator.

           Therefore, "regularly" SHOULD be defined as at least once
           per RTT or as frequently as data is exchanged in cases where
           that happens less frequently than once per RTT.  However, we
           also recognize that it may not always be practical to take
           an FT sample this often in all cases.  Hence, this
           once-per-RTT definition of "regularly" is explicitly a
           "SHOULD" and not a "MUST".

       (c) FT observations MAY be taken from non-data exchanges.

           Some protocols use keepalives, heartbeats or other messages
           to exchange control information.  To the extent that the
           latency of these transactions mirrors data exchange, they
           can be leveraged to take FT samples within the RTO
           mechanism.  Such samples can help protocols keep their RTO
           accurate during lulls in data transmission.  However, given
           that these messages may not be subject to the same delays as
           data transmission, we do not take a general view on whether
           this is useful or not.

       (d) An RTO mechanism MUST NOT use ambiguous FT samples.

           Assume two copies of some segment X are transmitted at times
           t0 and t1 and then at time t2 the sender receives
           confirmation that X in fact arrived.  In some cases, it is
           not clear which copy of X triggered the confirmation, and
           hence the actual FT is either t2-t1 or t2-t0, but which one
           is a mystery.  Therefore, in this situation an implementation
           MUST use Karn's algorithm [KP87, RFC6298], i.e., use neither
           version of the FT sample and hence not update the RTO.

           There are cases where two copies of some data are
           transmitted in a way whereby the sender can tell which is
           being acknowledged by an incoming ACK.  E.g., TCP's
           timestamp option [RFC7323] allows for segments to be
           uniquely identified and hence avoid the ambiguity.
           In such cases there is no ambiguity and the resulting samples
           can update the RTO.

   (3) Each time the RTO detects a loss and a retransmission is
       scheduled, the value of the RTO MUST be exponentially backed off
       such that the next firing requires a longer interval.  The
       backoff SHOULD be removed after the successful repair of the
       lost data and subsequent transmission of non-retransmitted data.

       A maximum value MAY be placed on the RTO.  The maximum RTO MUST
       NOT be less than 60 seconds (as in [RFC6298]).

       This ensures network safety: when repeated timeouts signal a
       persistent problem, the sender retransmits progressively less
       often.

   (4) Retransmissions triggered by the RTO mechanism MUST be taken as
       indications of network congestion and the sending rate adapted
       using a standard mechanism (e.g., TCP collapses the congestion
       window to one segment [RFC5681]).

       This ensures network safety.

       An exception could be made to this rule if an IETF standardized
       mechanism is used to determine that a particular loss is due to
       a non-congestion event (e.g., packet corruption).  In such a
       case a congestion control action is not required.  Additionally,
       RTO-triggered congestion control actions may be reversed when a
       standard mechanism determines that the cause of the loss was not
       congestion after all (e.g., [RFC5682]).

4 Discussion

   We note that research has shown that the tension between the
   responsiveness and correctness of retransmission timeouts appears
   to be a fundamental tradeoff in the context of TCP [AP99].  That is,
   making the RTO more aggressive (e.g., via changing TCP's EWMA gains,
   lowering the minimum RTO, etc.) can reduce the time spent waiting on
   needed retransmissions.  However, at the same time, such
   aggressiveness leads to more needless retransmissions.
   Therefore, being as aggressive as the requirements given in the
   previous section allow in any particular situation may not be the
   best course of action, because an RTO expiration carries a
   requirement to invoke a congestion response and hence slow
   transmission down.

   While the tradeoff between responsiveness and correctness seems
   fundamental, the tradeoff can be made less relevant if the sender
   can detect and recover from spurious RTOs.  Several mechanisms have
   been proposed for this purpose, such as Eifel [RFC3522], F-RTO
   [RFC5682] and DSACK [RFC2883, RFC3708].  Using such mechanisms may
   allow a data originator to tip towards being more responsive without
   incurring (as much of) the attendant costs of needless retransmits.

   Also, note that in addition to the experiments discussed in [AP99],
   the Linux TCP implementation has been using various non-standard RTO
   mechanisms for many years, seemingly without large scale problems
   (e.g., using different EWMA gains than specified in [RFC6298]).
   Further, a number of implementations use minimum RTOs that are less
   than the 1 second specified in [RFC6298].  While the implication of
   these deviations from the standard may be more spurious retransmits
   (per [AP99]), we are aware of no large scale problems caused by this
   change to the minimum RTO.

   Finally, we note that while allowing implementations to be more
   aggressive may in fact increase the number of needless
   retransmissions, the above requirements fail safe in that they
   insist on exponential backoff of the RTO and a transmission rate
   reduction.  Therefore, providing implementers more latitude than
   they have traditionally been given in IETF specifications of RTO
   mechanisms does not somehow open the floodgates to aggressive
   behavior.  Since there is a downside to being aggressive, the
   incentives for proper behavior are retained in the mechanism.
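
   As a non-normative illustration of how the requirements in Section 3
   fit together, the following Python sketch shows one possible
   per-peer RTO estimator.  It assumes TCP's estimator from [RFC6298]
   (gains 1/8 and 1/4, variance multiplier K = 4) as one way to meet
   requirement (2a), an assumed clock granularity of 1 ms, and a
   60-second cap for requirement (3).  The names RtoEstimator,
   on_sample, on_timeout, on_recovery_complete and tcp_timeout_response
   are hypothetical.  The sketch deliberately omits the 1-second
   steady-state floor of [RFC6298], since requirement (1) only applies
   when nothing is known about the path.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

# Gains and multiplier borrowed from TCP's RTO computation [RFC6298];
# this document does not mandate these particular values.
ALPHA = 1 / 8        # gain for the smoothed FT estimate
BETA = 1 / 4         # gain for the FT variation estimate
K = 4                # variance multiplier
INITIAL_RTO = 1.0    # requirement (1): at least 1 second with no path knowledge
MAX_RTO = 60.0       # requirement (3): optional cap, which must not be below 60 s


@dataclass
class RtoEstimator:
    """Hypothetical per-peer RTO state (one instance per peer, see S.3)."""
    granularity: float = 0.001       # assumed clock granularity G, in seconds
    srtt: Optional[float] = None     # smoothed feedback time (FT)
    rttvar: Optional[float] = None   # estimate of the FT variation
    base_rto: float = INITIAL_RTO    # RTO derived from FT samples, before backoff
    backoff: int = 0                 # number of times the timer has been doubled

    @property
    def rto(self) -> float:
        """Timer value to arm: base RTO doubled per backoff step, then capped."""
        return min(self.base_rto * 2.0 ** self.backoff, MAX_RTO)

    def on_sample(self, ft: float, ambiguous: bool = False) -> None:
        """Requirements (2a) and (2d): fold in an unambiguous FT sample."""
        if ambiguous:
            return  # Karn's algorithm: never learn from an ambiguous sample
        if self.srtt is None or self.rttvar is None:
            # First sample: initialize as in [RFC6298].
            self.srtt = ft
            self.rttvar = ft / 2
        else:
            # RTTVAR is updated with the SRTT value from before this sample.
            self.rttvar = (1 - BETA) * self.rttvar + BETA * abs(self.srtt - ft)
            self.srtt = (1 - ALPHA) * self.srtt + ALPHA * ft
        self.base_rto = self.srtt + max(self.granularity, K * self.rttvar)

    def on_timeout(self) -> float:
        """Requirement (3): back off exponentially; returns the next timer."""
        if self.rto < MAX_RTO:  # stop doubling once the cap is reached
            self.backoff += 1
        return self.rto

    def on_recovery_complete(self) -> None:
        """Requirement (3): drop the backoff once the lost data is repaired
        and non-retransmitted data has been sent (the caller decides when)."""
        self.backoff = 0


def tcp_timeout_response(flight_size: int, smss: int) -> Tuple[int, int]:
    """Requirement (4), TCP-style [RFC5681]: returns (cwnd, ssthresh) in bytes."""
    ssthresh = max(flight_size // 2, 2 * smss)
    return smss, ssthresh  # cwnd collapses to one full-sized segment
```

   A sender would keep one RtoEstimator per peer, arm its timer with
   the rto property, pass each FT measurement to on_sample (with
   ambiguous=True whenever it cannot tell which copy of the data was
   acknowledged) and, when the timer fires, call on_timeout and apply
   a congestion response such as tcp_timeout_response.  Deciding when
   recovery is complete is left to the protocol.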

5 Security Considerations

   This document does not alter the security properties of
   retransmission timeout mechanisms.  See [RFC6298] for a discussion
   of these within the context of TCP.

Acknowledgments

   This document benefits from years of discussions with Ethan Blanton,
   Sally Floyd, Jana Iyengar, Shawn Ostermann, Vern Paxson, and the
   members of the TCPM and TCP-IMPL working groups.  Ran Atkinson,
   Yuchung Cheng, David Black, Gorry Fairhurst, Jonathan Looney and
   Michael Scharf provided useful comments on a previous version of
   this draft.

Normative References

   [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate
       Requirement Levels", BCP 14, RFC 2119, March 1997.

Informative References

   [AP99] Allman, M. and V. Paxson, "On Estimating End-to-End Network
       Path Properties", Proceedings of the ACM SIGCOMM Technical
       Symposium, September 1999.

   [KP87] Karn, P. and C. Partridge, "Improving Round-Trip Time
       Estimates in Reliable Transport Protocols", SIGCOMM 87.

   [RFC2018] Mathis, M., Mahdavi, J., Floyd, S., and A. Romanow, "TCP
       Selective Acknowledgment Options", RFC 2018, October 1996.

   [RFC2140] Touch, J., "TCP Control Block Interdependence", RFC 2140,
       April 1997.

   [RFC2883] Floyd, S., Mahdavi, J., Mathis, M., and M. Podolsky, "An
       Extension to the Selective Acknowledgement (SACK) Option for
       TCP", RFC 2883, July 2000.

   [RFC3124] Balakrishnan, H. and S. Seshan, "The Congestion Manager",
       RFC 3124, June 2001.

   [RFC3261] Rosenberg, J., Schulzrinne, H., Camarillo, G., Johnston,
       A., Peterson, J., Sparks, R., Handley, M., and E. Schooler,
       "SIP: Session Initiation Protocol", RFC 3261, June 2002.

   [RFC3522] Ludwig, R. and M. Meyer, "The Eifel Detection Algorithm for
       TCP", RFC 3522, April 2003.

   [RFC3708] Blanton, E. and M. Allman, "Using TCP Duplicate Selective
       Acknowledgement (DSACKs) and Stream Control Transmission
       Protocol (SCTP) Duplicate Transmission Sequence Numbers (TSNs)
       to Detect Spurious Retransmissions", RFC 3708, February 2004.

   [RFC3940] Adamson, B., Bormann, C., Handley, M., and J. Macker,
       "Negative-acknowledgment (NACK)-Oriented Reliable Multicast
       (NORM) Protocol", RFC 3940, November 2004.

   [RFC4340] Kohler, E., Handley, M., and S. Floyd, "Datagram Congestion
       Control Protocol (DCCP)", RFC 4340, March 2006.

   [RFC4960] Stewart, R., "Stream Control Transmission Protocol", RFC
       4960, September 2007.

   [RFC5681] Allman, M., Paxson, V., and E. Blanton, "TCP Congestion
       Control", RFC 5681, September 2009.

   [RFC5682] Sarolahti, P., Kojo, M., Yamamoto, K., and M. Hata,
       "Forward RTO-Recovery (F-RTO): An Algorithm for Detecting
       Spurious Retransmission Timeouts with TCP", RFC 5682, September
       2009.

   [RFC5740] Adamson, B., Bormann, C., Handley, M., and J. Macker,
       "NACK-Oriented Reliable Multicast (NORM) Transport Protocol",
       RFC 5740, November 2009.

   [RFC6182] Ford, A., Raiciu, C., Handley, M., Barre, S., and J.
       Iyengar, "Architectural Guidelines for Multipath TCP
       Development", RFC 6182, March 2011.

   [RFC6298] Paxson, V., Allman, M., Chu, H.K., and M. Sargent,
       "Computing TCP's Retransmission Timer", RFC 6298, June 2011.

   [RFC6582] Henderson, T., Floyd, S., Gurtov, A., and Y. Nishida, "The
       NewReno Modification to TCP's Fast Recovery Algorithm", RFC
       6582, April 2012.

   [RFC6675] Blanton, E., Allman, M., Wang, L., Jarvinen, I., Kojo, M.,
       and Y. Nishida, "A Conservative Loss Recovery Algorithm Based on
       Selective Acknowledgment (SACK) for TCP", RFC 6675, August 2012.

   [RFC7323] Borman, D., Braden, B., Jacobson, V., and R.
       Scheffenegger, "TCP Extensions for High Performance", RFC 7323,
       September 2014.

Authors' Addresses

   Mark Allman
   International Computer Science Institute
   1947 Center St.  Suite 600
   Berkeley, CA 94704

   EMail: mallman@icir.org
   http://www.icir.org/mallman