idnits 2.17.1

draft-ietf-tcpm-rto-consider-11.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

  ** The document seems to lack an IANA Considerations section. (See Section
     2.2 of https://www.ietf.org/id-info/checklist for how to handle the case
     when there are no actions for IANA.)

  ** The abstract seems to contain references ([RFC2119]), which it
     shouldn't. Please replace those with straight textual mentions of the
     documents in question.

  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == The copyright year in the IETF Trust and authors Copyright Line does not
     match the current year

  -- The document date (April 30, 2020) is 1429 days in the past. Is this
     intentional?

  Checking references for intended status: Best Current Practice
  ----------------------------------------------------------------------------

     (See RFCs 3967 and 4897 for information about using normative references
     to lower-maturity documents in RFCs)

  == Missing Reference: 'RFC5681' is mentioned on line 364, but not defined

  == Unused Reference: 'RFC3940' is defined on line 480, but no explicit
     reference was found in the text

  == Unused Reference: 'RFC4340' is defined on line 484, but no explicit
     reference was found in the text

  == Unused Reference: 'RFC6582' is defined on line 505, but no explicit
     reference was found in the text

  -- Obsolete informational reference (is this intentional?): RFC 2140
     (Obsoleted by RFC 9040)

  -- Obsolete informational reference (is this intentional?): RFC 3940
     (Obsoleted by RFC 5740)

  -- Obsolete informational reference (is this intentional?): RFC 4960
     (Obsoleted by RFC 9260)

     Summary: 2 errors (**), 0 flaws (~~), 5 warnings (==), 4 comments (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.

--------------------------------------------------------------------------------

1    Internet Engineering Task Force                                M. Allman
2    INTERNET-DRAFT                                                      ICSI
3    File: draft-ietf-tcpm-rto-consider-11.txt                 April 30, 2020
4    Intended Status: Best Current Practice
5    Expires: October 30, 2020

7                 Requirements for Time-Based Loss Detection

9    Status of this Memo

11       This Internet-Draft is submitted in full conformance with the
12       provisions of BCP 78 and BCP 79. Internet-Drafts are working
13       documents of the Internet Engineering Task Force (IETF), its areas,
14       and its working groups. Note that other groups may also distribute
15       working documents as Internet-Drafts.

17       Internet-Drafts are draft documents valid for a maximum of six
18       months and may be updated, replaced, or obsoleted by other documents
19       at any time. It is inappropriate to use Internet-Drafts as
20       reference material or to cite them other than as "work in progress."

22       The list of current Internet-Drafts can be accessed at
23       http://www.ietf.org/1id-abstracts.html

25       The list of Internet-Draft Shadow Directories can be accessed at
26       http://www.ietf.org/shadow.html

28       This Internet-Draft will expire on October 30, 2020.

30   Copyright Notice

32       Copyright (c) 2020 IETF Trust and the persons identified as the
33       document authors. All rights reserved.

35       This document is subject to BCP 78 and the IETF Trust's Legal
36       Provisions Relating to IETF Documents
37       (http://trustee.ietf.org/license-info) in effect on the date of
38       publication of this document. Please review these documents
39       carefully, as they describe your rights and restrictions with
40       respect to this document. Code Components extracted from this
41       document must include Simplified BSD License text as described in
42       Section 4.e of the Trust Legal Provisions and are provided without
43       warranty as described in the Simplified BSD License.

45   Abstract

47       Many protocols must detect packet loss for various reasons (e.g., to
48       ensure reliability using retransmissions or to understand the level
49       of congestion along a network path). While many mechanisms have
50       been designed to detect loss, protocols ultimately can only count on
51       the passage of time without delivery confirmation to declare a
52       packet "lost". Each implementation of a time-based loss detection
53       mechanism represents a balance between correctness and timeliness
54       and therefore no implementation suits all situations. This document
55       provides high-level requirements for time-based loss detectors
56       appropriate for general use in the Internet. Within the
57       requirements, implementations have latitude to define particulars
58       that best address each situation.

60   Terminology

62       The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
63       "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
64       document are to be interpreted as described in BCP 14, RFC 2119
65       [RFC2119].

67   1   Introduction

69       Loss detection is a crucial activity for many protocols and
70       applications and is generally undertaken for two major reasons:

72         (1) Ensuring reliable data delivery.

74             This requires a data sender to develop an understanding of
75             which transmitted packets have not arrived at the receiver.
76             This knowledge allows the sender to retransmit missing data.

78         (2) Congestion control.

80             Packet loss is often taken as an indication that the sender
81             is transmitting too fast and is overwhelming some portion of
82             the network path. Data senders can therefore use loss to
83             trigger transmission rate reductions.

85       Various mechanisms are used to detect losses in a packet stream.
86       Often we use continuous or periodic acknowledgments from the
87       recipient to inform the sender's notion of which pieces of data are
88       missing. However, despite our best intentions and most robust
89       mechanisms we cannot place ultimate faith in receiving such
90       acknowledgments, but can only truly depend on the passage of time.
91       Therefore, our ultimate backstop to ensuring that we detect all loss
92       is a timeout. That is, the sender sets some expectation for how
93       long to wait for confirmation of delivery for a given piece of data.
94       When this time period passes without delivery confirmation the
95       sender concludes the data was lost in transit.

97       The specifics of time-based loss detection schemes represent a
98       tradeoff between correctness and responsiveness. In other words we
99       wish to simultaneously:

101        - wait long enough to ensure the detection of loss is correct, and

103        - minimize the amount of delay we impose on applications (before
104          repairing loss) and the network (before we reduce the
105          congestion).

107      Serving both of these goals is difficult as they pull in opposite
108      directions [AP99].
         By not waiting long enough to accurately
109      determine a packet has been lost we risk sending unnecessary
110      ("spurious") retransmissions and needlessly lowering the
111      transmission rate. By waiting long enough that we are unambiguously
112      certain a packet has been lost we cannot repair losses in a timely
113      manner and we risk prolonging network congestion.

115      Many protocols and applications use their own time-based loss
116      detection mechanisms (e.g., TCP [RFC6298], SCTP [RFC4960], SIP
117      [RFC3261]). At this point, our experience has led to a recognition
118      that often specific tweaks that deviate from standardized time-based
119      loss detectors do not materially impact network safety. Therefore,
120      in this document we outline a set of high-level protocol-agnostic
121      requirements for time-based loss detection. The intent is to
122      provide a safe foundation on which implementations have the
123      flexibility to instantiate mechanisms that best realize their
124      specific goals.

126  2   Context

128      This document is different from other standards documents in that it
129      is backwards from the way we generally like to engineer systems.
130      Usually, we strive to understand high-level requirements as a
131      starting point. We then methodically engineer specific protocols,
132      algorithms and systems that meet these requirements. Within the
133      standards process we have derived many time-based loss detection
134      schemes without benefit from some over-arching requirements
135      document---because we had no idea how to write such a document!
136      Therefore, we made the best specific decisions we could in response
137      to specific needs.

139      At this point, however, the community's experience has matured to
140      the point where we can define a set of high-level requirements for
141      time-based loss detection schemes.
         We now understand how to
142      separate the strategies these mechanisms use that are crucial for
143      network safety from those small details that do not materially
144      impact network safety. However, adding a requirements umbrella to a
145      body of existing specifications is inherently messy and we run the
146      risk of creating inconsistencies with both past and future
147      mechanisms. The correct way to view this document is as the default
148      case. Specifically:

150        - This document does not update or obsolete any existing RFC.
151          These previous specifications---while generally consistent with
152          the requirements in this document---reflect community consensus
153          and this document does not change that consensus.

155        - The requirements in this document are meant to provide for
156          network safety and, as such, SHOULD be used by all time-based
157          loss detection mechanisms.

159        - The requirements in this document may not be appropriate in all
160          cases and, therefore, inconsistent deviations may be necessary
161          (hence the "SHOULD" in the last bullet). However,
162          inconsistencies MUST (a) be explained and (b) gather consensus.

164  3   Scope

166      The principles we outline in this document are protocol-agnostic and
167      widely applicable. We make the following scope statements about
168      the application of the requirements discussed in Section 4:

170      (S.1) The requirements in this document apply only to time-based
171            loss detection.

173            While there are a bevy of uses for timers in protocols---from
174            rate-based pacing to connection failure detection and
175            beyond---these are outside the scope of this document.

177      (S.2) The requirements in this document apply only to endpoint-to-
178            endpoint unicast communication. Reliable multicast (e.g.,
179            [RFC5740]) protocols are explicitly outside the scope of this
180            document.

182            Protocols such as SCTP [RFC4960] and MP-TCP [RFC6182] that
183            communicate in a unicast fashion with multiple specific
184            endpoints can leverage the requirements in this document
185            provided they track state and follow the requirements for each
186            endpoint independently. I.e., if host A communicates with
187            hosts B and C, A needs to use independent time-based loss
188            detector instances for traffic sent to B and C.

190      (S.3) There are cases where state is shared across connections
191            or flows (e.g., [RFC2140], [RFC3124]). State pertaining to
192            time-based loss detection is often discussed as sharable.
193            These situations raise issues that the simple flow-oriented
194            time-based loss detection mechanism discussed in this document
195            does not consider (e.g., how long to preserve state between
196            connections). Therefore, while the general principles given
197            in Section 4 are likely applicable, sharing time-based loss
198            detection information across flows is outside the scope of
199            this document.

201      (S.4) The requirements for time-based loss detection mechanisms in
202            this document can be applied regardless of whether the
203            mechanism is the sole loss repair strategy or works in concert
204            with other mechanisms.

206            E.g., for a simple protocol like UDP-based DNS
207            [RFC1034,RFC1035] a timeout and re-try mechanism is likely to
208            act alone to ensure reliability.

210            E.g., complex protocols like TCP or SCTP have methods to
211            detect (and repair) loss based on explicit endpoint state
212            sharing [RFC2018,RFC4960,RFC6675]. These mechanisms are
213            preferred over time-based loss detection as they are often
214            more timely and precise than time-based schemes. In these
215            cases, a time-based scheme---called a "retransmission timeout"
216            or "RTO"---becomes a last resort when the more advanced
217            mechanisms fail.

219            E.g., some protocols may leverage more than one time-based
220            loss detector simultaneously.
               In these cases, the general
221            guidance in this document can be applied to all such timers.

223  4   Requirements

225      We now list the requirements that apply when designing time-based
226      loss detection mechanisms. For historical reasons and ease of
227      exposition, we refer to the time between sending a packet and
228      determining the packet has been lost due to lack of delivery
229      confirmation as the "retransmission timeout" or "RTO". However, the
230      detected loss need not be repaired (i.e., the loss could be detected
231      only for congestion control and not reliability purposes).

233      (1) As we note above, loss detection happens when a sender does not
234          receive delivery confirmation within some expected period of
235          time. In the absence of any knowledge about the latency of a
236          path, the initial RTO MUST be conservatively set to no less than
237          1 second.

239          Correctness is of the utmost importance when transmitting into a
240          network with unknown properties because:

242          - Premature loss detection can trigger spurious retransmits that
243            could cause issues when a network is already congested.

245          - Premature loss detection can needlessly cause congestion
246            control to dramatically lower the sender's allowed
247            transmission rate---especially since the rate is already
248            likely low at this stage of the communication. Recovering
249            from such a rate change can take a relatively long time.

251          - Finally, as discussed below, sometimes using time-based
252            loss detection and retransmissions can cause ambiguities in
253            assessing the latency of a network path. Therefore, it is
254            especially important for the first latency sample to be free
255            of ambiguities such that there is a baseline for the remainder
256            of the communication.

258          The specific constant (1 second) comes from the analysis of
259          Internet RTTs found in Appendix A of [RFC6298].

261      (2) We now specify four requirements that pertain to setting
262          an expected time interval for delivery confirmation.

264          Often measuring the time required for delivery confirmation is
265          framed as assessing the "round-trip time (RTT)" of the
266          network path as this is the minimum amount of time required to
267          receive delivery confirmation and also often follows protocol
268          behavior whereby acknowledgments are generated quickly after
269          data arrives. For instance, this is the case for the RTO used
270          by TCP [RFC6298] and SCTP [RFC4960]. However, this is somewhat
271          misleading and the expected latency is better framed as the
272          "feedback time" (FT). In other words, the expectation is not
273          always simply a network property, but can include additional
274          time before a sender should reasonably expect a response.

276          For instance, consider a UDP-based DNS request from a client to
277          a recursive resolver. When the request can be served from the
278          resolver's cache the FT likely well approximates the network RTT
279          between the client and resolver. However, on a cache miss the
280          resolver will request the needed information from one or more
281          authoritative DNS servers, which will non-trivially increase the
282          FT compared to the network RTT between the client and resolver.

284          Therefore, we express the requirements in terms of FT. Again,
285          for ease of exposition we use "RTO" to indicate the interval
286          between a packet transmission and the decision the packet has
287          been lost---regardless of whether the packet will be
288          retransmitted.

290          (a) In steady state the RTO SHOULD be set based on observations
291              of both the FT and the variance of the FT.

293              In other words, the RTO should represent an empirically-
294              derived reasonable amount of time that the sender should
295              wait for delivery confirmation before deciding the given
296              data is lost.
                 Networks are inherently dynamic and therefore
297              it is crucial to allow for some variance in the FT when
298              developing the expectation.

300          (b) FT observations SHOULD be taken and incorporated into the
301              RTO at least once per RTT or as frequently as data is
302              exchanged in cases where that happens less frequently than
303              once per RTT.

305              Internet measurements show that taking only a single FT
306              sample per TCP connection results in a relatively poorly
307              performing RTO mechanism [AP99], hence this requirement that
308              the FT be sampled continuously throughout the lifetime of
309              communication.

311              As an example, TCP takes an FT sample roughly once per RTT,
312              or, if using the timestamp option [RFC7323], on each
313              acknowledgment arrival. [AP99] shows that both these
314              approaches result in roughly equivalent performance for the
315              RTO estimator.

317          (c) FT observations MAY be taken from non-data exchanges.

319              Some protocols use keepalives, heartbeats or other messages
320              to exchange control information. To the extent that the
321              latency of these transactions mirrors data exchange, they
322              can be leveraged to take FT samples within the RTO
323              mechanism. Such samples can help protocols keep their RTO
324              accurate during lulls in data transmission. However, given
325              that these messages may not be subject to the same delays as
326              data transmission, we do not take a general view on whether
327              this is useful or not.

329          (d) An RTO mechanism MUST NOT use ambiguous FT samples.

331              Assume two copies of some segment X are transmitted at times
332              t0 and t1 and then at time t2 the sender receives
333              confirmation that X in fact arrived. In some cases, it is
334              not clear which copy of X triggered the confirmation and
335              hence the actual FT is either t2-t1 or t2-t0, but which is a
336              mystery. Therefore, in this situation an implementation
337              MUST use Karn's algorithm [KP87,RFC6298] and use neither
338              version of the FT sample and hence not update the RTO.

340              There are cases where two copies of some data are
341              transmitted in a way whereby the sender can tell which is
342              being acknowledged by an incoming ACK. E.g., TCP's
343              timestamp option [RFC7323] allows for segments to be
344              uniquely identified and hence avoid the ambiguity. In such
345              cases there is no ambiguity and the resulting samples can
346              update the RTO.

348      (3) Each time the RTO is used to detect a loss, the value of the RTO
349          MUST be exponentially backed off such that the next firing
350          requires a longer interval. The backoff SHOULD be removed after
351          either (a) the subsequent successful transmission of
352          non-retransmitted data, or (b) an RTO passes without detecting
353          additional losses. The former will generally be quicker. The
354          latter covers cases where loss is detected, but not repaired.

356          A maximum value MAY be placed on the RTO. The maximum RTO MUST
357          NOT be less than 60 seconds (as specified in [RFC6298]).

359          This ensures network safety.

361      (4) Loss detected by the RTO mechanism MUST be taken as an
362          indication of network congestion and the sending rate adapted
363          using a standard mechanism (e.g., TCP collapses the congestion
364          window to one segment [RFC5681]).

366          This ensures network safety.

368          An exception to this rule is if an IETF standardized mechanism
369          determines that a particular loss is due to a non-congestion
370          event (e.g., packet corruption). In such a case a congestion
371          control action is not required. Additionally, congestion
372          control actions taken based on time-based loss detection could
373          be reversed when a standard mechanism post-facto determines that
374          the cause of the loss was not congestion (e.g., [RFC5682]).

376  5   Discussion
377      We note that research has shown the tension between the
378      responsiveness and correctness of time-based loss detection seems to
379      be a fundamental tradeoff in the context of TCP [AP99].
         That is,
380      making the RTO more aggressive (e.g., via changing TCP's EWMA gains,
381      lowering the minimum RTO, etc.) can reduce the time required to
382      detect actual loss. However, at the same time, such aggressiveness
383      leads to more cases of mistakenly declaring packets lost that
384      ultimately arrived at the receiver. Therefore, being as aggressive
385      as the requirements given in the previous section allow in any
386      particular situation may not be the best course of action because
387      detecting loss---even if falsely---carries a requirement to invoke a
388      congestion response which will ultimately reduce the transmission
389      rate.

391      While the tradeoff between responsiveness and correctness seems
392      fundamental, the tradeoff can be made less relevant if the sender
393      can detect and recover from mistaken loss detection. Several
394      mechanisms have been proposed for this purpose, such as Eifel
395      [RFC3522], F-RTO [RFC5682] and DSACK [RFC2883,RFC3708]. Using such
396      mechanisms may allow a data originator to tip towards being more
397      responsive without incurring (as much of) the attendant costs of
398      mistakenly declaring packets to be lost.

400      Also note that, in addition to the experiments discussed in [AP99],
401      the Linux TCP implementation has been using various non-standard RTO
402      mechanisms for many years seemingly without large scale problems
403      (e.g., using different EWMA gains than specified in [RFC6298]).
404      Further, a number of implementations use minimum RTOs that are less
405      than the 1 second specified in [RFC6298]. While the implication of
406      these deviations from the standard may be more spurious retransmits
407      (per [AP99]), we are aware of no large scale network safety issues
408      caused by this change to the minimum RTO.

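As an illustration only, the requirements in Section 4 and the tunable gains and minimum RTO discussed above can be sketched as a small per-flow estimator. This is a minimal sketch under stated assumptions, not a definitive implementation: the class and method names are hypothetical, the default gains (1/8, 1/4) and the multiplier 4 are borrowed from [RFC6298], the zero default for `min_rto` is an assumption made here for illustration, and the congestion response of requirement (4) is left to the caller.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RtoEstimator:
    """Hypothetical per-flow time-based loss detector state (illustrative)."""
    alpha: float = 1 / 8        # gain for the smoothed FT (value from RFC 6298)
    beta: float = 1 / 4         # gain for the FT variation (value from RFC 6298)
    k: float = 4.0              # variation multiplier (value from RFC 6298)
    initial_rto: float = 1.0    # requirement (1): no less than 1 second
    min_rto: float = 0.0        # assumption: no floor once FT samples exist
    max_rto: float = 60.0       # requirement (3): a cap must be >= 60 seconds
    sft: Optional[float] = None     # smoothed feedback time, seconds
    ftvar: Optional[float] = None   # feedback time variation, seconds
    backoff: int = 0                # consecutive timer firings

    def __post_init__(self) -> None:
        if self.initial_rto < 1.0:
            raise ValueError("initial RTO must be at least 1 second")
        if self.max_rto < 60.0:
            raise ValueError("maximum RTO must be at least 60 seconds")

    def on_sample(self, ft: float, ambiguous: bool = False) -> None:
        """Requirement (2): fold an FT sample into the estimate."""
        if ambiguous:
            return  # requirement (2)(d): discard ambiguous samples
        if self.sft is None:
            self.sft, self.ftvar = ft, ft / 2
        else:
            self.ftvar = (1 - self.beta) * self.ftvar + self.beta * abs(self.sft - ft)
            self.sft = (1 - self.alpha) * self.sft + self.alpha * ft

    def on_timeout(self) -> None:
        """Requirement (3): back off exponentially each time the timer fires."""
        self.backoff += 1

    def on_new_data_acked(self) -> None:
        """Requirement (3): remove the backoff once new data is confirmed."""
        self.backoff = 0

    def rto(self) -> float:
        """Current timeout in seconds, including any backoff and the cap."""
        if self.sft is None:
            base = self.initial_rto
        else:
            base = max(self.min_rto, self.sft + self.k * self.ftvar)
        return min(self.max_rto, base * 2 ** self.backoff)
```

A caller would invoke `on_sample()` for each unambiguous confirmation, `on_timeout()` whenever the timer fires (together with its own congestion response), and `on_new_data_acked()` once non-retransmitted data is confirmed. Making the estimator more aggressive amounts to changing `alpha`, `beta`, `k` or `min_rto`; the backoff and the cap are unaffected by those choices.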
410      Finally, we note that while allowing implementations to be more
411      aggressive could in fact increase the number of needless
412      retransmissions, the above requirements fail safe in that they insist
413      on exponential backoff and a transmission rate reduction.
414      Therefore, providing implementers more latitude than they have
415      traditionally been given in IETF specifications of RTO mechanisms
416      does not somehow open the flood gates to aggressive behavior. Since
417      there is a downside to being aggressive the incentives for proper
418      behavior are retained in the mechanism.

420  6   Security Considerations

422      This document does not alter the security properties of time-based
423      loss detection mechanisms. See [RFC6298] for a discussion of these
424      within the context of TCP.

426  Acknowledgments

428      This document benefits from years of discussions with Ethan Blanton,
429      Sally Floyd, Jana Iyengar, Shawn Ostermann, Vern Paxson, and the
430      members of the TCPM and TCP-IMPL working groups. Ran Atkinson,
431      Yuchung Cheng, David Black, Gorry Fairhurst, Rahul Arvind Jadhav,
432      Mirja Kuhlewind, Nicolas Kuhn, Jonathan Looney and Michael Scharf
433      provided useful comments on previous versions of this document.

435  Normative References

437      [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate
438          Requirement Levels", BCP 14, RFC 2119, March 1997.

440  Informative References

442      [AP99] Allman, M., V. Paxson, "On Estimating End-to-End Network Path
443          Properties", Proceedings of the ACM SIGCOMM Technical Symposium,
444          September 1999.

446      [KP87] Karn, P. and C. Partridge, "Improving Round-Trip Time
447          Estimates in Reliable Transport Protocols", SIGCOMM 87.

449      [RFC1034] Mockapetris, P., "Domain Names - Concepts and Facilities",
450          RFC 1034, November 1987.

452      [RFC1035] Mockapetris, P., "Domain Names - Implementation and
453          Specification", RFC 1035, November 1987.

455      [RFC2018] Mathis, M., Mahdavi, J., Floyd, S., and A.
         Romanow, "TCP
456          Selective Acknowledgment Options", RFC 2018, October 1996.

458      [RFC2140] Touch, J., "TCP Control Block Interdependence", RFC 2140,
459          April 1997.

461      [RFC2883] Floyd, S., Mahdavi, J., Mathis, M., and M. Podolsky, "An
462          Extension to the Selective Acknowledgement (SACK) Option for
463          TCP", RFC 2883, July 2000.

465      [RFC3124] Balakrishnan, H., S. Seshan, "The Congestion Manager", RFC
466          3124, June 2001.

468      [RFC3261] Rosenberg, J., Schulzrinne, H., Camarillo, G., Johnston,
469          A., Peterson, J., Sparks, R., Handley, M., and E. Schooler,
470          "SIP: Session Initiation Protocol", RFC 3261, June 2002.

472      [RFC3522] Ludwig, R., M. Meyer, "The Eifel Detection Algorithm for
473          TCP", RFC 3522, April 2003.

475      [RFC3708] Blanton, E., M. Allman, "Using TCP Duplicate Selective
476          Acknowledgement (DSACKs) and Stream Control Transmission
477          Protocol (SCTP) Duplicate Transmission Sequence Numbers (TSNs)
478          to Detect Spurious Retransmissions", RFC 3708, February 2004.

480      [RFC3940] Adamson, B., C. Bormann, M. Handley, J. Macker,
481          "Negative-acknowledgment (NACK)-Oriented Reliable Multicast
482          (NORM) Protocol", November 2004, RFC 3940.

484      [RFC4340] Kohler, E., M. Handley, S. Floyd, "Datagram Congestion
485          Control Protocol (DCCP)", March 2006, RFC 4340.

487      [RFC4960] Stewart, R., "Stream Control Transmission Protocol", RFC
488          4960, September 2007.

490      [RFC5682] Sarolahti, P., M. Kojo, K. Yamamoto, M. Hata, "Forward
491          RTO-Recovery (F-RTO): An Algorithm for Detecting Spurious
492          Retransmission Timeouts with TCP", RFC 5682, September 2009.

494      [RFC5740] Adamson, B., C. Bormann, M. Handley, J. Macker,
495          "NACK-Oriented Reliable Multicast (NORM) Transport Protocol",
496          November 2009, RFC 5740.

498      [RFC6182] Ford, A., C. Raiciu, M. Handley, S. Barre, J. Iyengar,
499          "Architectural Guidelines for Multipath TCP Development", March
500          2011, RFC 6182.

502      [RFC6298] Paxson, V., M. Allman, H.K. Chu, M.
         Sargent, "Computing
503          TCP's Retransmission Timer", June 2011, RFC 6298.

505      [RFC6582] Henderson, T., S. Floyd, A. Gurtov, Y. Nishida, "The
506          NewReno Modification to TCP's Fast Recovery Algorithm", April
507          2012, RFC 6582.

509      [RFC6675] Blanton, E., M. Allman, L. Wang, I. Jarvinen, M. Kojo,
510          Y. Nishida, "A Conservative Loss Recovery Algorithm Based on
511          Selective Acknowledgment (SACK) for TCP", August 2012, RFC 6675.

513      [RFC7323] Borman, D., B. Braden, V. Jacobson, R. Scheffenegger, "TCP
514          Extensions for High Performance", September 2014, RFC 7323.

516  Authors' Addresses

518      Mark Allman
519      International Computer Science Institute
520      1947 Center St. Suite 600
521      Berkeley, CA 94704

523      EMail: mallman@icir.org
524      http://www.icir.org/mallman