QUIC                                                     J. Iyengar, Ed.
Internet-Draft                                                    Fastly
Intended status: Standards Track                           I. Swett, Ed.
Expires: April 6, 2019                                            Google
                                                        October 03, 2018

               QUIC Loss Detection and Congestion Control
                      draft-ietf-quic-recovery-15

Abstract

   This document describes loss detection and congestion control
   mechanisms for QUIC.

Note to Readers

   Discussion of this draft takes place on the QUIC working group
   mailing list (quic@ietf.org), which is archived at
   https://mailarchive.ietf.org/arch/search/?email_list=quic [1].

   Working Group information can be found at https://github.com/quicwg
   [2]; source code and issues list for this draft can be found at
   https://github.com/quicwg/base-drafts/labels/-recovery [3].

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF). Note that other groups may also distribute
   working documents as Internet-Drafts. The list of current
   Internet-Drafts is at https://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time. It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on April 6, 2019.

Copyright Notice

   Copyright (c) 2018 IETF Trust and the persons identified as the
   document authors. All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (https://trustee.ietf.org/license-info) in effect on the date of
   publication of this document. Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.
   Code Components extracted from this document must include
   Simplified BSD License text as described in Section 4.e of the
   Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1. Introduction
   2. Conventions and Definitions
   3. Design of the QUIC Transmission Machinery
     3.1. Relevant Differences Between QUIC and TCP
       3.1.1. Separate Packet Number Spaces
       3.1.2. Monotonically Increasing Packet Numbers
       3.1.3. No Reneging
       3.1.4. More ACK Ranges
       3.1.5. Explicit Correction For Delayed ACKs
   4. Loss Detection
     4.1. Computing the RTT estimate
     4.2. Ack-based Detection
       4.2.1. Fast Retransmit
       4.2.2. Early Retransmit
     4.3. Timer-based Detection
       4.3.1. Crypto Handshake Timeout
       4.3.2. Tail Loss Probe
       4.3.3. Retransmission Timeout
     4.4. Generating Acknowledgements
       4.4.1. Crypto Handshake Data
       4.4.2. ACK Ranges
       4.4.3. Receiver Tracking of ACK Frames
     4.5. Pseudocode
       4.5.1. Constants of interest
       4.5.2. Variables of interest
       4.5.3. Initialization
       4.5.4. On Sending a Packet
       4.5.5. On Receiving an Acknowledgment
       4.5.6. On Packet Acknowledgment
       4.5.7. Setting the Loss Detection Timer
       4.5.8. On Timeout
       4.5.9. Detecting Lost Packets
     4.6. Discussion
   5. Congestion Control
     5.1. Explicit Congestion Notification
     5.2. Slow Start
     5.3. Congestion Avoidance
     5.4. Recovery Period
     5.5. Tail Loss Probe
     5.6. Retransmission Timeout
     5.7. Pacing
     5.8. Pseudocode
       5.8.1. Constants of interest
       5.8.2. Variables of interest
       5.8.3. Initialization
       5.8.4. On Packet Sent
       5.8.5. On Packet Acknowledgement
       5.8.6. On New Congestion Event
       5.8.7. Process ECN Information
       5.8.8. On Packets Lost
       5.8.9. On Retransmission Timeout Verified
   6. Security Considerations
     6.1. Congestion Signals
     6.2. Traffic Analysis
     6.3. Misreporting ECN Markings
   7. IANA Considerations
   8. References
     8.1. Normative References
     8.2. Informative References
     8.3. URIs
   Appendix A. Change Log
     A.1. Since draft-ietf-quic-recovery-14
     A.2. Since draft-ietf-quic-recovery-13
     A.3. Since draft-ietf-quic-recovery-12
     A.4. Since draft-ietf-quic-recovery-11
     A.5. Since draft-ietf-quic-recovery-10
     A.6. Since draft-ietf-quic-recovery-09
     A.7. Since draft-ietf-quic-recovery-08
     A.8. Since draft-ietf-quic-recovery-07
     A.9. Since draft-ietf-quic-recovery-06
     A.10. Since draft-ietf-quic-recovery-05
     A.11. Since draft-ietf-quic-recovery-04
     A.12. Since draft-ietf-quic-recovery-03
     A.13. Since draft-ietf-quic-recovery-02
     A.14. Since draft-ietf-quic-recovery-01
     A.15. Since draft-ietf-quic-recovery-00
     A.16. Since draft-iyengar-quic-loss-recovery-01
   Acknowledgments
   Authors' Addresses

1. Introduction

   QUIC is a new multiplexed and secure transport atop UDP. QUIC builds
   on decades of transport and security experience, and implements
   mechanisms that make it attractive as a modern general-purpose
   transport. The QUIC protocol is described in [QUIC-TRANSPORT].

   QUIC implements the spirit of known TCP loss recovery mechanisms,
   described in RFCs, various Internet-drafts, and also those prevalent
   in the Linux TCP implementation. This document describes QUIC
   congestion control and loss recovery, and where applicable,
   attributes the TCP equivalent in RFCs, Internet-drafts, academic
   papers, and/or TCP implementations.

2. Conventions and Definitions

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and
   "OPTIONAL" in this document are to be interpreted as described in BCP
   14 [RFC2119] [RFC8174] when, and only when, they appear in all
   capitals, as shown here.

   Definitions of terms that are used in this document:

   ACK-only: Any packet containing only an ACK frame.

   In-flight: Packets are considered in-flight when they have been sent
      and neither acknowledged nor declared lost, and they are not
      ACK-only.

   Retransmittable Frames: All frames besides ACK or PADDING are
      considered retransmittable.

   Retransmittable Packets: Packets that contain retransmittable frames
      elicit an ACK from the receiver and are called retransmittable
      packets.

3. Design of the QUIC Transmission Machinery

   All transmissions in QUIC are sent with a packet-level header, which
   indicates the encryption level and includes a packet sequence number
   (referred to below as a packet number). The encryption level
   indicates the packet number space, as described in [QUIC-TRANSPORT].
   Packet numbers never repeat within a packet number space for the
   lifetime of a connection. Packet numbers monotonically increase
   within a space, preventing ambiguity.

   This design obviates the need for disambiguating between
   transmissions and retransmissions and eliminates significant
   complexity from QUIC's interpretation of TCP loss detection
   mechanisms.
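
   As an illustration of the packet numbering described above, the
   following Python sketch hands out packet numbers that never repeat
   and always increase within a packet number space. The names Space
   and PacketNumberAllocator are hypothetical and do not appear in this
   document; the grouping of 0-RTT and 1-RTT into one space follows
   Section 3.1.1. It is a minimal sketch, not a definitive
   implementation.

```python
from enum import Enum


class Space(Enum):
    # Hypothetical labels for the packet number spaces.
    INITIAL = 0
    HANDSHAKE = 1
    APPLICATION = 2  # assumed shared by 0-RTT and all 1-RTT keys


class PacketNumberAllocator:
    """Hands out strictly increasing packet numbers, one counter per space."""

    def __init__(self):
        self._next = {space: 0 for space in Space}

    def allocate(self, space):
        # A number is used once and never handed out again in this space.
        number = self._next[space]
        self._next[space] = number + 1
        return number


alloc = PacketNumberAllocator()
first = alloc.allocate(Space.APPLICATION)
# Frames from a packet deemed lost are rebundled into a new packet that
# receives a fresh, larger number, so an ACK is never ambiguous.
rebundled = alloc.allocate(Space.APPLICATION)
```

   Because each space has its own counter, an acknowledgement for a
   number in one space says nothing about packets sent in another.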

   QUIC packets can contain multiple frames of different types. The
   recovery mechanisms ensure that data and frames that need reliable
   delivery are acknowledged or declared lost and sent in new packets as
   necessary. The types of frames contained in a packet affect recovery
   and congestion control logic:

   o  All packets are acknowledged, though packets that contain only ACK
      and PADDING frames are not acknowledged immediately.

   o  Long header packets that contain CRYPTO frames are critical to the
      performance of the QUIC handshake and use shorter timers for
      acknowledgement and retransmission.

   o  Packets that contain only ACK frames do not count toward
      congestion control limits and are not considered in-flight. Note
      that this means PADDING frames cause packets to contribute toward
      bytes in flight without directly causing an acknowledgment to be
      sent.

3.1. Relevant Differences Between QUIC and TCP

   Readers familiar with TCP's loss detection and congestion control
   will find algorithms here that parallel well-known TCP ones.
   Protocol differences between QUIC and TCP, however, contribute to
   algorithmic differences. We briefly describe these protocol
   differences below.

3.1.1. Separate Packet Number Spaces

   QUIC uses separate packet number spaces for each encryption level,
   except that 0-RTT and all generations of 1-RTT keys use the same
   packet number space. Separate packet number spaces ensure that
   acknowledgement of packets sent with one level of encryption will
   not cause spurious retransmission of packets sent with a different
   encryption level. Congestion control and RTT measurement are unified
   across packet number spaces.

3.1.2. Monotonically Increasing Packet Numbers

   TCP conflates transmission sequence number at the sender with
   delivery sequence number at the receiver, which results in
   retransmissions of the same data carrying the same sequence number,
   and consequently in problems caused by "retransmission ambiguity".

   QUIC separates the two: QUIC uses a packet number for transmissions,
   and any application data is sent in one or more streams, with
   delivery order determined by stream offsets encoded within STREAM
   frames.

   QUIC's packet number is strictly increasing, and directly encodes
   transmission order. A higher QUIC packet number signifies that the
   packet was sent later, and a lower QUIC packet number signifies that
   the packet was sent earlier. When a packet containing frames is
   deemed lost, QUIC rebundles necessary frames in a new packet with a
   new packet number, removing ambiguity about which packet is
   acknowledged when an ACK is received. Consequently, more accurate
   RTT measurements can be made, spurious retransmissions are trivially
   detected, and mechanisms such as Fast Retransmit can be applied
   universally, based only on packet number.

   This design point significantly simplifies loss detection mechanisms
   for QUIC. Most TCP mechanisms implicitly attempt to infer
   transmission ordering based on TCP sequence numbers - a non-trivial
   task, especially when TCP timestamps are not available.

3.1.3. No Reneging

   QUIC ACKs contain information that is similar to TCP SACK, but QUIC
   does not allow any acked packet to be reneged, greatly simplifying
   implementations on both sides and reducing memory pressure on the
   sender.

3.1.4. More ACK Ranges

   QUIC supports many ACK ranges, as opposed to TCP's 3 SACK ranges. In
   high loss environments, this speeds recovery, reduces spurious
   retransmits, and ensures forward progress without relying on
   timeouts.

3.1.5. Explicit Correction For Delayed ACKs

   QUIC ACKs explicitly encode the delay incurred at the receiver
   between when a packet is received and when the corresponding ACK is
   sent. This allows the receiver of the ACK to adjust for receiver
   delays, specifically the delayed ack timer, when estimating the path
   RTT. This mechanism also allows a receiver to measure and report the
   delay from when a packet was received by the OS kernel, which is
   useful in receivers that may incur delays such as context-switch
   latency before a userspace QUIC receiver processes a received packet.

4. Loss Detection

   QUIC senders use both ack information and timeouts to detect lost
   packets, and this section provides a description of these algorithms.
   Estimating the network round-trip time (RTT) is critical to these
   algorithms and is described first.

4.1. Computing the RTT estimate

   RTT is calculated when an ACK frame arrives by computing the
   difference between the current time and the time the largest newly
   acked packet was sent. If no packets are newly acknowledged, RTT
   cannot be calculated. When RTT is calculated, the ack delay field
   from the ACK frame SHOULD be subtracted from the RTT as long as the
   result is larger than the min RTT. If the result is smaller than the
   min RTT, the unadjusted RTT should be used and the ack delay field
   should be ignored.

   Like TCP, QUIC calculates both smoothed RTT and RTT variance similar
   to those specified in [RFC6298].

   Min RTT is the minimum RTT measured over the connection, prior to
   adjusting by ack delay. Ignoring ack delay for min RTT prevents
   intentional or unintentional underestimation of min RTT, which in
   turn prevents underestimating smoothed RTT.

4.2. Ack-based Detection

   Ack-based loss detection implements the spirit of TCP's Fast
   Retransmit [RFC5681], Early Retransmit [RFC5827], FACK, and SACK loss
   recovery [RFC6675].
   This section provides an overview of how these algorithms are
   implemented in QUIC.

4.2.1. Fast Retransmit

   An unacknowledged packet is marked as lost when an acknowledgment is
   received for a packet that was sent a threshold number of packets
   (kReorderingThreshold) and/or a threshold amount of time after the
   unacknowledged packet. Receipt of the acknowledgement indicates that
   a later packet was received, while the reordering threshold provides
   some tolerance for reordering of packets in the network.

   The RECOMMENDED initial value for kReorderingThreshold is 3, based on
   TCP loss recovery [RFC5681] [RFC6675]. Some networks may exhibit
   higher degrees of reordering, causing a sender to detect spurious
   losses. Spuriously declaring packets lost leads to unnecessary
   retransmissions and may result in degraded performance due to the
   actions of the congestion controller upon detecting loss.

   Implementers MAY use algorithms developed for TCP, such as TCP-NCR
   [RFC4653], to improve QUIC's reordering resilience.

   QUIC implementations can use time-based loss detection to handle
   reordering based on time elapsed since the packet was sent. This may
   be used either as a replacement for a packet reordering threshold or
   in addition to it. The RECOMMENDED time threshold, expressed as a
   fraction of the round-trip time (kTimeReorderingFraction), is 1/8.

4.2.2. Early Retransmit

   Unacknowledged packets close to the tail may have fewer than
   kReorderingThreshold retransmittable packets sent after them. Loss
   of such packets cannot be detected via Fast Retransmit. To enable
   ack-based loss detection of such packets, receipt of an
   acknowledgment for the last outstanding retransmittable packet
   triggers the Early Retransmit process, as follows.

   If there are unacknowledged in-flight packets still pending, they
   should be marked as lost.
   To compensate for the reduced reordering resilience, the sender
   SHOULD set a timer for a small period of time. If the unacknowledged
   in-flight packets are not acknowledged during this time, then these
   packets MUST be marked as lost.

   An endpoint SHOULD set the timer such that a packet is marked as lost
   no earlier than 1.125 * max(SRTT, latest_RTT) after it was sent.

   Using max(SRTT, latest_RTT) protects from the two following cases:

   o  the latest RTT sample is lower than the SRTT, perhaps due to
      reordering where the packet whose ack triggered the Early
      Retransmit process encountered a shorter path;

   o  the latest RTT sample is higher than the SRTT, perhaps due to a
      sustained increase in the actual RTT, but the smoothed SRTT has
      not yet caught up.

   The 1.125 multiplier increases reordering resilience. Implementers
   MAY experiment with using other multipliers, bearing in mind that a
   lower multiplier reduces reordering resilience and increases spurious
   retransmissions, and a higher multiplier increases loss recovery
   delay.

   This mechanism is based on Early Retransmit for TCP [RFC5827].
   However, [RFC5827] does not include the timer described above.
   Without the timer, Early Retransmit is prone to spurious
   retransmissions due to its reduced reordering resilience. This
   observation led Linux TCP implementers to implement a timer for TCP
   as well, and this document incorporates this advancement.

4.3. Timer-based Detection

   Timer-based loss detection recovers from losses that cannot be
   handled by ack-based loss detection. It uses a single timer that
   switches between three modes: a handshake retransmission timer, a
   Tail Loss Probe timer, and a Retransmission Timeout timer.

4.3.1. Crypto Handshake Timeout

   Data in CRYPTO frames is critical to QUIC transport and crypto
   negotiation, so a more aggressive timeout is used to retransmit it.
   Below, the term "handshake packet" is used to refer to packets
   containing CRYPTO frames, not packets with the specific long header
   packet type Handshake.

   The initial handshake timeout SHOULD be set to twice the initial RTT.

   At the beginning, there are no prior RTT samples within a connection.
   Resumed connections over the same network SHOULD use the previous
   connection's final smoothed RTT value as the resumed connection's
   initial RTT.

   If no previous RTT is available, or if the network changes, the
   initial RTT SHOULD be set to 100ms.

   When CRYPTO frames are sent, the sender SHOULD set a timer for the
   handshake timeout period. Upon timeout, the sender MUST retransmit
   all unacknowledged CRYPTO data by calling
   RetransmitAllUnackedHandshakeData(). On each consecutive expiration
   of the handshake timer without receiving an acknowledgement for a new
   packet, the sender SHOULD double the handshake timeout and set a
   timer for this period.

   When CRYPTO frames are outstanding, the TLP and RTO timers are not
   active unless the CRYPTO frames were sent at 1-RTT encryption.

   When an acknowledgement is received for a handshake packet, the new
   RTT is computed and the timer SHOULD be set for twice the newly
   computed smoothed RTT.

4.3.1.1. Retry and Version Negotiation

   A Retry or Version Negotiation packet causes a client to send another
   Initial packet, effectively restarting the connection process.

   Either packet indicates that the Initial was received but not
   processed. Neither packet can be treated as an acknowledgment for
   the Initial, but they MAY be used to improve the RTT estimate.

4.3.2. Tail Loss Probe

   The algorithm described in this section is an adaptation of the Tail
   Loss Probe algorithm proposed for TCP [TLP].

   A packet sent at the tail is particularly vulnerable to slow loss
   detection, since acks of subsequent packets are needed to trigger
   ack-based detection. To ameliorate this weakness of tail packets,
   the sender schedules a timer when the last retransmittable packet
   before quiescence is transmitted. Upon timeout, a Tail Loss Probe
   (TLP) packet is sent to evoke an acknowledgement from the receiver.

   The timer duration, or Probe Timeout (PTO), is set based on the
   following conditions:

   o  PTO SHOULD be scheduled for max(1.5*SRTT+MaxAckDelay,
      kMinTLPTimeout)

   o  If RTO (Section 4.3.3) is earlier, schedule a TLP in its place.
      That is, PTO SHOULD be scheduled for min(RTO, PTO).

   QUIC includes MaxAckDelay in all probe timeouts, because it assumes
   the ack delay may come into play, regardless of the number of packets
   outstanding. TCP's TLP assumes that if at least 2 packets are
   outstanding, acks will not be delayed.

   A PTO value of at least 1.5*SRTT ensures that the ACK is overdue.
   The 1.5 is based on [TLP], but implementations MAY experiment with
   other constants.

   To reduce latency, it is RECOMMENDED that the sender set and allow
   the TLP timer to fire twice before setting an RTO timer. In other
   words, when the TLP timer expires the first time, a TLP packet is
   sent, and it is RECOMMENDED that the TLP timer be scheduled for a
   second time. When the TLP timer expires the second time, a second
   TLP packet is sent, and an RTO timer SHOULD be scheduled
   (Section 4.3.3).

   A TLP packet SHOULD carry new data when possible. If new data is
   unavailable or new data cannot be sent due to flow control, a TLP
   packet MAY retransmit unacknowledged data to potentially reduce
   recovery time. Since a TLP timer is used to send a probe into the
   network prior to establishing any packet loss, prior unacknowledged
   packets SHOULD NOT be marked as lost when a TLP timer expires.
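
   The PTO conditions listed in this section can be sketched in Python
   as follows. This is a minimal illustration under stated
   assumptions: all durations are in milliseconds, the function names
   rto_period and probe_timeout are hypothetical, the minimum timeouts
   use the RECOMMENDED values from Section 4.5.1, and the RTO formula
   is the one given in Section 4.3.3.

```python
K_MIN_TLP_TIMEOUT_MS = 10   # kMinTLPTimeout, RECOMMENDED value
K_MIN_RTO_TIMEOUT_MS = 200  # kMinRTOTimeout, RECOMMENDED value


def rto_period(srtt, rttvar, max_ack_delay):
    """RTO period per Section 4.3.3, before any exponential backoff."""
    return max(srtt + 4 * rttvar + max_ack_delay, K_MIN_RTO_TIMEOUT_MS)


def probe_timeout(srtt, rttvar, max_ack_delay):
    """Probe Timeout (PTO) for scheduling a Tail Loss Probe."""
    # First condition: at least 1.5 smoothed RTTs plus the peer's
    # maximum ack delay, floored at the minimum TLP timeout.
    pto = max(1.5 * srtt + max_ack_delay, K_MIN_TLP_TIMEOUT_MS)
    # Second condition: if the RTO would fire earlier, use it instead.
    return min(rto_period(srtt, rttvar, max_ack_delay), pto)


# SRTT 100 ms, RTTVAR 50 ms, MaxAckDelay 25 ms: the 1.5*SRTT term wins.
typical = probe_timeout(100, 50, 25)
# Very stable 400 ms path with no ack delay: the RTO (400 ms) is earlier.
capped = probe_timeout(400, 0, 0)
```

   With these inputs, typical evaluates to 175 ms and capped to 400 ms,
   which shows how the min(RTO, PTO) rule only matters when the RTT
   variance is small relative to the smoothed RTT.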

   A sender may not know that a packet being sent is a tail packet.
   Consequently, a sender may have to arm or adjust the TLP timer on
   every sent retransmittable packet.

4.3.3. Retransmission Timeout

   A Retransmission Timeout (RTO) timer is the final backstop for loss
   detection. The algorithm used in QUIC is based on the RTO algorithm
   for TCP [RFC5681] and is additionally resilient to spurious RTO
   events [RFC5682].

   When the last TLP packet is sent, a timer is set for the RTO period.
   When this timer expires, the sender sends two packets, to evoke
   acknowledgements from the receiver, and restarts the RTO timer.

   Similar to TCP [RFC6298], the RTO period is set based on the
   following conditions:

   o  When the final TLP packet is sent, the RTO period is set to
      max(SRTT + 4*RTTVAR + MaxAckDelay, kMinRTOTimeout)

   o  When an RTO timer expires, the RTO period is doubled.

   The sender typically has incurred a high latency penalty by the time
   an RTO timer expires, and this penalty increases exponentially in
   subsequent consecutive RTO events. Sending a single packet on an RTO
   event therefore makes the connection very sensitive to single packet
   loss. Sending two packets instead of one significantly increases
   resilience to packet drop in both directions, thus reducing the
   probability of consecutive RTO events.

   QUIC's RTO algorithm differs from TCP in that the firing of an RTO
   timer is not considered a strong enough signal of packet loss, so it
   does not result in an immediate change to congestion window or
   recovery state. An RTO timer expires only when there's a prolonged
   period of network silence, which could be caused by a change in the
   underlying network RTT.

   QUIC also diverges from TCP by including MaxAckDelay in the RTO
   period.
   Since QUIC corrects for this delay in its SRTT and RTTVAR
   computations, it is necessary to add this delay explicitly in the TLP
   and RTO computation.

   When an acknowledgment is received for a packet sent on an RTO event,
   any unacknowledged packets with lower packet numbers than those
   acknowledged MUST be marked as lost. If an acknowledgement for a
   packet sent on an RTO is received at the same time packets sent prior
   to the first RTO are acknowledged, the RTO is considered spurious and
   standard loss detection rules apply.

   A packet sent when an RTO timer expires MAY carry new data if
   available or unacknowledged data to potentially reduce recovery time.
   Since this packet is sent as a probe into the network prior to
   establishing any packet loss, prior unacknowledged packets SHOULD NOT
   be marked as lost.

   A packet sent on an RTO timer MUST NOT be blocked by the sender's
   congestion controller. A sender MUST however count these bytes as
   additional bytes in flight, since this packet adds network load
   without establishing packet loss.

4.4. Generating Acknowledgements

   QUIC SHOULD delay sending acknowledgements in response to packets,
   but MUST NOT excessively delay acknowledgements of packets containing
   frames other than ACK. Specifically, implementations MUST attempt to
   enforce a maximum ack delay to avoid causing the peer spurious
   timeouts. The maximum ack delay is communicated in the
   "max_ack_delay" transport parameter and the default value is 25ms.

   An acknowledgement SHOULD be sent immediately upon receipt of a
   second packet but the delay SHOULD NOT exceed the maximum ack delay.
   QUIC recovery algorithms do not assume the peer generates an
   acknowledgement immediately when receiving a second full-packet.

   Out-of-order packets SHOULD be acknowledged more quickly, in order to
   accelerate loss recovery.
   The receiver SHOULD send an immediate ACK when it receives a new
   packet which is not one greater than the largest received packet
   number.

   Similarly, packets marked with the ECN Congestion Experienced (CE)
   codepoint in the IP header SHOULD be acknowledged immediately, to
   reduce the peer's response time to congestion events.

   As an optimization, a receiver MAY process multiple packets before
   sending any ACK frames in response. In this case it can determine
   whether an immediate or delayed acknowledgement should be generated
   after processing incoming packets.

4.4.1. Crypto Handshake Data

   In order to quickly complete the handshake and avoid spurious
   retransmissions due to handshake timeouts, handshake packets SHOULD
   use a very short ack delay, such as 1ms. ACK frames MAY be sent
   immediately when the crypto stack indicates all data for that
   encryption level has been received.

4.4.2. ACK Ranges

   When an ACK frame is sent, one or more ranges of acknowledged packets
   are included. Including older packets reduces the chance of spurious
   retransmits caused by losing previously sent ACK frames, at the cost
   of larger ACK frames.

   ACK frames SHOULD always acknowledge the most recently received
   packets, and the more out-of-order the packets are, the more
   important it is to send an updated ACK frame quickly, to prevent the
   peer from declaring a packet as lost and spuriously retransmitting
   the frames it contains.

   Below is one recommended approach for determining what packets to
   include in an ACK frame.

4.4.3. Receiver Tracking of ACK Frames

   When a packet containing an ACK frame is sent, the largest
   acknowledged in that frame may be saved. When a packet containing an
   ACK frame is acknowledged, the receiver can stop acknowledging
   packets less than or equal to the largest acknowledged in the sent
   ACK frame.
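
   The tracking approach just described can be sketched in Python as
   follows. This is one possible illustration, not a definitive
   implementation: the class AckTracker and its method names are
   hypothetical, received packet numbers are kept in a plain set rather
   than as ranges, and a single packet number space is assumed.

```python
class AckTracker:
    """Tracks which received packets still need to be acknowledged."""

    def __init__(self):
        # Packet numbers received from the peer and still reported in ACKs.
        self.received = set()
        # Our packet number -> largest acknowledged in the ACK frame that
        # packet carried.
        self.sent_acks = {}

    def on_packet_received(self, packet_number):
        self.received.add(packet_number)

    def on_ack_frame_sent(self, carrying_packet_number):
        """Returns the packet numbers to acknowledge and saves the largest."""
        if self.received:
            self.sent_acks[carrying_packet_number] = max(self.received)
        return sorted(self.received)

    def on_own_packet_acked(self, carrying_packet_number):
        """The peer saw our ACK frame, so older packets can be dropped."""
        largest = self.sent_acks.pop(carrying_packet_number, None)
        if largest is None:
            return  # that packet carried no ACK frame
        self.received = {pn for pn in self.received if pn > largest}


tracker = AckTracker()
for pn in (1, 2, 3):
    tracker.on_packet_received(pn)
first_ack = tracker.on_ack_frame_sent(10)   # our packet 10 carries the ACK
tracker.on_packet_received(5)
second_ack = tracker.on_ack_frame_sent(11)  # still repeats 1, 2 and 3
tracker.on_own_packet_acked(10)             # peer acknowledged packet 10
third_ack = tracker.on_ack_frame_sent(12)   # only packet 5 remains
```

   Until the peer acknowledges a packet that carried an ACK frame, the
   older packet numbers keep being repeated, which is what protects
   against the loss of individual ACK frames.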

   In cases without ACK frame loss, this algorithm allows for a minimum
   of 1 RTT of reordering. In cases with ACK frame loss, this approach
   does not guarantee that every acknowledgement is seen by the sender
   before it is no longer included in the ACK frame. Packets could be
   received out of order and all subsequent ACK frames containing them
   could be lost. In this case, the loss recovery algorithm may cause
   spurious retransmits, but the sender will continue making forward
   progress.

4.5. Pseudocode

4.5.1. Constants of interest

   Constants used in loss recovery are based on a combination of RFCs,
   papers, and common practice. Some may need to be changed or
   negotiated in order to better suit a variety of environments.

   kMaxTLPs: Maximum number of tail loss probes before an RTO expires.
      The RECOMMENDED value is 2.

   kReorderingThreshold: Maximum reordering in packet number space
      before FACK style loss detection considers a packet lost. The
      RECOMMENDED value is 3.

   kTimeReorderingFraction: Maximum reordering in time space before
      time based loss detection considers a packet lost, expressed as a
      fraction of an RTT. The RECOMMENDED value is 1/8.

   kUsingTimeLossDetection: Whether time based loss detection is in
      use. If false, uses FACK style loss detection. The RECOMMENDED
      value is false.

   kMinTLPTimeout: Minimum time in the future a tail loss probe timer
      may be set for. The RECOMMENDED value is 10ms.

   kMinRTOTimeout: Minimum time in the future an RTO timer may be set
      for. The RECOMMENDED value is 200ms.

   kDelayedAckTimeout: The length of the peer's delayed ack timer. The
      RECOMMENDED value is 25ms.

   kInitialRtt: The RTT used before an RTT sample is taken. The
      RECOMMENDED value is 100ms.

4.5.2. Variables of interest

   Variables required to implement the loss detection mechanisms are
   described in this section.

   loss_detection_timer:  Multi-modal timer used for loss detection.

   handshake_count:  The number of times all unacknowledged handshake
      data has been retransmitted without receiving an ack.

   tlp_count:  The number of times a tail loss probe has been sent
      without receiving an ack.

   rto_count:  The number of times an RTO has been sent without
      receiving an ack.

   largest_sent_before_rto:  The last packet number sent prior to the
      first retransmission timeout.

   time_of_last_sent_retransmittable_packet:  The time the most recent
      retransmittable packet was sent.

   time_of_last_sent_handshake_packet:  The time the most recent packet
      containing a CRYPTO frame was sent.

   largest_sent_packet:  The packet number of the most recently sent
      packet.

   largest_acked_packet:  The largest packet number acknowledged in an
      ACK frame.

   latest_rtt:  The most recent RTT measurement made when receiving an
      ack for a previously unacked packet.

   smoothed_rtt:  The smoothed RTT of the connection, computed as
      described in [RFC6298].

   rttvar:  The RTT variance, computed as described in [RFC6298].

   min_rtt:  The minimum RTT seen in the connection, ignoring ack delay.

   max_ack_delay:  The maximum amount of time by which the receiver
      intends to delay acknowledgments, in milliseconds.  The actual
      ack_delay in a received ACK frame may be larger due to late
      timers, reordering, or lost ACKs.

   reordering_threshold:  The largest packet number gap between the
      largest acknowledged retransmittable packet and an unacknowledged
      retransmittable packet before it is declared lost.

   time_reordering_fraction:  The reordering window as a fraction of
      max(smoothed_rtt, latest_rtt).

   loss_time:  The time at which the next packet will be considered lost
      based on early retransmit or exceeding the reordering window in
      time.

   sent_packets:  An association of packet numbers to information about
      them, including a number field indicating the packet number, a
      time field indicating the time a packet was sent, a boolean
      indicating whether the packet is ack-only, a boolean indicating
      whether it counts towards bytes in flight, and a bytes field
      indicating the packet's size.  sent_packets is ordered by packet
      number, and packets remain in sent_packets until acknowledged or
      lost.  A sent_packets data structure is maintained per packet
      number space, and ACK processing only applies to a single space.

4.5.3.  Initialization

   At the beginning of the connection, initialize the loss detection
   variables as follows:

      loss_detection_timer.reset()
      handshake_count = 0
      tlp_count = 0
      rto_count = 0
      if (kUsingTimeLossDetection):
        reordering_threshold = infinite
        time_reordering_fraction = kTimeReorderingFraction
      else:
        reordering_threshold = kReorderingThreshold
        time_reordering_fraction = infinite
      loss_time = 0
      smoothed_rtt = 0
      rttvar = 0
      min_rtt = infinite
      largest_sent_before_rto = 0
      time_of_last_sent_retransmittable_packet = 0
      time_of_last_sent_handshake_packet = 0
      largest_sent_packet = 0

4.5.4.  On Sending a Packet

   After any packet is sent, be it a new transmission or a rebundled
   transmission, the following OnPacketSent function is called.  The
   parameters to OnPacketSent are as follows:

   o  packet_number: The packet number of the sent packet.

   o  ack_only: A boolean that indicates whether a packet contains only
      ACK or PADDING frame(s).  If true, it is still expected an ack
      will be received for this packet, but it is not retransmittable.

   o  in_flight: A boolean that indicates whether the packet counts
      towards bytes in flight.

   o  is_handshake_packet: A boolean that indicates whether the packet
      contains cryptographic handshake messages critical to the
      completion of the QUIC handshake.  In this version of QUIC, this
      includes any packet with the long header that includes a CRYPTO
      frame.

   o  sent_bytes: The number of bytes sent in the packet, not including
      UDP or IP overhead, but including QUIC framing overhead.

   Pseudocode for OnPacketSent follows:

     OnPacketSent(packet_number, ack_only, in_flight,
                  is_handshake_packet, sent_bytes):
       largest_sent_packet = packet_number
       sent_packets[packet_number].packet_number = packet_number
       sent_packets[packet_number].time = now
       sent_packets[packet_number].ack_only = ack_only
       sent_packets[packet_number].in_flight = in_flight
       if !ack_only:
         if is_handshake_packet:
           time_of_last_sent_handshake_packet = now
         time_of_last_sent_retransmittable_packet = now
         OnPacketSentCC(sent_bytes)
         sent_packets[packet_number].bytes = sent_bytes
         SetLossDetectionTimer()

4.5.5.  On Receiving an Acknowledgment

   When an ACK frame is received, it may newly acknowledge any number of
   packets.

   Pseudocode for OnAckReceived and UpdateRtt follows:

     OnAckReceived(ack):
       largest_acked_packet = ack.largest_acked
       // If the largest acknowledged is newly acked,
       // update the RTT.
       if (sent_packets[ack.largest_acked]):
         latest_rtt = now - sent_packets[ack.largest_acked].time
         UpdateRtt(latest_rtt, ack.ack_delay)

       // Find all newly acked packets in this ACK frame
       newly_acked_packets = DetermineNewlyAckedPackets(ack)
       for acked_packet in newly_acked_packets:
         OnPacketAcked(acked_packet)

       if !newly_acked_packets.empty():
         // Find the smallest newly acknowledged packet
         smallest_newly_acked =
           FindSmallestNewlyAcked(newly_acked_packets)
         // If any packets sent prior to RTO were acked, then the
         // RTO was spurious.
         // Otherwise, inform congestion control.
         if (rto_count > 0 &&
               smallest_newly_acked > largest_sent_before_rto):
           OnRetransmissionTimeoutVerified(smallest_newly_acked)
         handshake_count = 0
         tlp_count = 0
         rto_count = 0

       DetectLostPackets(ack.largest_acked)
       SetLossDetectionTimer()

       // Process ECN information if present.
       if (ACK frame contains ECN information):
         ProcessECN(ack)


     UpdateRtt(latest_rtt, ack_delay):
       // min_rtt ignores ack delay.
       min_rtt = min(min_rtt, latest_rtt)
       // Adjust for ack delay if it's plausible.
       if (latest_rtt - min_rtt > ack_delay):
         latest_rtt -= ack_delay
       // Based on [RFC6298].
       if (smoothed_rtt == 0):
         smoothed_rtt = latest_rtt
         rttvar = latest_rtt / 2
       else:
         rttvar_sample = abs(smoothed_rtt - latest_rtt)
         rttvar = 3/4 * rttvar + 1/4 * rttvar_sample
         smoothed_rtt = 7/8 * smoothed_rtt + 1/8 * latest_rtt

4.5.6.  On Packet Acknowledgment

   When a packet is acked for the first time, the following
   OnPacketAcked function is called.  Note that a single ACK frame may
   newly acknowledge several packets.  OnPacketAcked must be called once
   for each of these newly acked packets.

   OnPacketAcked takes one parameter, acked_packet, which is the struct
   of the newly acked packet.

   If this is the first acknowledgement following RTO, check if the
   smallest newly acknowledged packet is one sent by the RTO, and if so,
   inform congestion control of a verified RTO, similar to F-RTO
   [RFC5682].

   Pseudocode for OnPacketAcked follows:

     OnPacketAcked(acked_packet):
       if (!acked_packet.ack_only):
         OnPacketAckedCC(acked_packet)
       sent_packets.remove(acked_packet.packet_number)

4.5.7.  Setting the Loss Detection Timer

   QUIC loss detection uses a single timer for all timer-based loss
   detection.  The duration of the timer is based on the timer's mode,
   which is set in the packet and timer events further below.
   The function SetLossDetectionTimer defined below shows how the single
   timer is set.

4.5.7.1.  Handshake Timer

   When a connection has unacknowledged handshake data, the handshake
   timer is set and when it expires, all unacknowledged handshake data
   is retransmitted.

   When stateless rejects are in use, the connection is considered
   immediately closed once a reject is sent, so no timer is set to
   retransmit the reject.

   Version negotiation packets are always stateless, and MUST be sent
   once per handshake packet that uses an unsupported QUIC version, and
   MAY be sent in response to 0-RTT packets.

4.5.7.2.  Tail Loss Probe and Retransmission Timer

   Tail loss probes [TLP] and retransmission timeouts [RFC6298] are
   timer based mechanisms to recover from cases when there are
   outstanding retransmittable packets, but an acknowledgement has not
   been received in a timely manner.

   The TLP and RTO timers are armed when there is no unacknowledged
   handshake data.  The TLP timer is set until the max number of TLP
   packets have been sent, and then the RTO timer is set.

4.5.7.3.  Early Retransmit Timer

   Early retransmit [RFC5827] is implemented with a 1/8 RTT timer.  It
   is part of QUIC's time based loss detection, but is always enabled,
   even when only packet reordering loss detection is enabled.

4.5.7.4.  Pseudocode

   Pseudocode for SetLossDetectionTimer follows:

     SetLossDetectionTimer():
       // Don't arm timer if there are no retransmittable packets
       // in flight.
       if (bytes_in_flight == 0):
         loss_detection_timer.cancel()
         return

       if (handshake packets are outstanding):
         // Handshake retransmission timer.
         if (smoothed_rtt == 0):
           timeout = 2 * kInitialRtt
         else:
           timeout = 2 * smoothed_rtt
         timeout = max(timeout, kMinTLPTimeout)
         timeout = timeout * (2 ^ handshake_count)
         loss_detection_timer.set(
           time_of_last_sent_handshake_packet + timeout)
         return
       else if (loss_time != 0):
         // Early retransmit timer or time loss detection.
         timeout = loss_time -
           time_of_last_sent_retransmittable_packet
       else:
         // RTO or TLP timer
         // Calculate RTO duration
         timeout =
           smoothed_rtt + 4 * rttvar + max_ack_delay
         timeout = max(timeout, kMinRTOTimeout)
         timeout = timeout * (2 ^ rto_count)
         if (tlp_count < kMaxTLPs):
           // Tail Loss Probe
           tlp_timeout = max(1.5 * smoothed_rtt
                               + max_ack_delay, kMinTLPTimeout)
           timeout = min(tlp_timeout, timeout)

       loss_detection_timer.set(
         time_of_last_sent_retransmittable_packet + timeout)

4.5.8.  On Timeout

   QUIC uses one loss recovery timer, which when set, can be in one of
   several modes.  When the timer expires, the mode determines the
   action to be performed.

   Pseudocode for OnLossDetectionTimeout follows:

     OnLossDetectionTimeout():
       if (handshake packets are outstanding):
         // Handshake timeout.
         RetransmitAllUnackedHandshakeData()
         handshake_count++
       else if (loss_time != 0):
         // Early retransmit or Time Loss Detection
         DetectLostPackets(largest_acked_packet)
       else if (tlp_count < kMaxTLPs):
         // Tail Loss Probe.
         SendOnePacket()
         tlp_count++
       else:
         // RTO.
         if (rto_count == 0):
           largest_sent_before_rto = largest_sent_packet
         SendTwoPackets()
         rto_count++

       SetLossDetectionTimer()

4.5.9.  Detecting Lost Packets

   Packets in QUIC are only considered lost once a larger packet number
   in the same packet number space is acknowledged.  DetectLostPackets
   is called every time an ack is received and operates on the
   sent_packets for that packet number space.
   If the loss detection timer expires and the loss_time is set, the
   previous largest acked packet is supplied.

4.5.9.1.  Pseudocode

   DetectLostPackets takes one parameter, largest_acked, which is the
   packet number of the largest acked packet.

   Pseudocode for DetectLostPackets follows:

     DetectLostPackets(largest_acked):
       loss_time = 0
       lost_packets = {}
       delay_until_lost = infinite
       if (kUsingTimeLossDetection):
         delay_until_lost =
           (1 + time_reordering_fraction) *
               max(latest_rtt, smoothed_rtt)
       else if (largest_acked == largest_sent_packet):
         // Early retransmit timer.
         delay_until_lost = 9/8 * max(latest_rtt, smoothed_rtt)
       foreach (unacked in sent_packets with
                unacked.packet_number < largest_acked):
         time_since_sent = now - unacked.time
         delta = largest_acked - unacked.packet_number
         if (time_since_sent > delay_until_lost ||
             delta > reordering_threshold):
           sent_packets.remove(unacked.packet_number)
           if (!unacked.ack_only):
             lost_packets.insert(unacked)
         else if (loss_time == 0 && delay_until_lost != infinite):
           loss_time = now + delay_until_lost - time_since_sent

       // Inform the congestion controller of lost packets and
       // let it decide whether to retransmit immediately.
       if (!lost_packets.empty()):
         OnPacketsLost(lost_packets)

4.6.  Discussion

   The majority of constants were derived from best common practices
   among widely deployed TCP implementations on the internet.
   Exceptions follow.

   A shorter delayed ack time of 25ms was chosen because longer delayed
   acks can delay loss recovery and for the small number of connections
   where less than one packet per 25ms is delivered, acking every packet
   is beneficial to congestion control and loss recovery.

   The default initial RTT of 100ms was chosen because it is slightly
   higher than both the median and mean min_rtt typically observed on
   the public internet.

5.  Congestion Control

   QUIC's congestion control is based on TCP NewReno [RFC6582].  NewReno
   is a congestion window based congestion control.  QUIC specifies the
   congestion window in bytes rather than packets due to finer control
   and the ease of appropriate byte counting [RFC3465].

   QUIC hosts MUST NOT send packets if they would increase
   bytes_in_flight (defined in Section 5.8.2) beyond the available
   congestion window, unless the packet is a probe packet sent after the
   TLP or RTO timer expires, as described in Section 4.3.2 and
   Section 4.3.3.

   Implementations MAY use other congestion control algorithms, and
   endpoints MAY use different algorithms from one another.  The signals
   QUIC provides for congestion control are generic and are designed to
   support different algorithms.

5.1.  Explicit Congestion Notification

   If a path has been verified to support ECN, QUIC treats a Congestion
   Experienced codepoint in the IP header as a signal of congestion.
   This document specifies an endpoint's response when its peer receives
   packets with the Congestion Experienced codepoint.  As discussed in
   [RFC8311], endpoints are permitted to experiment with other response
   functions.

5.2.  Slow Start

   QUIC begins every connection in slow start and exits slow start upon
   loss or upon increase in the ECN-CE counter.  QUIC re-enters slow
   start any time the congestion window is less than ssthresh, which
   typically only occurs after an RTO.  While in slow start, QUIC
   increases the congestion window by the number of bytes acknowledged
   when each ack is processed.

5.3.  Congestion Avoidance

   Slow start exits to congestion avoidance.  Congestion avoidance in
   NewReno uses an additive increase multiplicative decrease (AIMD)
   approach that increases the congestion window by one maximum packet
   size per congestion window acknowledged.
   When a loss is detected, NewReno halves the congestion window and
   sets the slow start threshold to the new congestion window.

5.4.  Recovery Period

   Recovery is a period of time beginning with detection of a lost
   packet or an increase in the ECN-CE counter.  Because QUIC
   retransmits stream data and control frames, not packets, it defines
   the end of recovery as a packet sent after the start of recovery
   being acknowledged.  This is slightly different from TCP's definition
   of recovery, which ends when the lost packet that started recovery is
   acknowledged.

   The recovery period limits congestion window reduction to once per
   round trip.  During recovery, the congestion window remains unchanged
   irrespective of new losses or increases in the ECN-CE counter.

5.5.  Tail Loss Probe

   A TLP packet MUST NOT be blocked by the sender's congestion
   controller.  The sender MUST however count these bytes as additional
   bytes-in-flight, since a TLP adds network load without establishing
   packet loss.

   Acknowledgement or loss of a tail loss probe is treated like that of
   any other packet.

5.6.  Retransmission Timeout

   When retransmissions are sent due to a retransmission timeout timer,
   no change is made to the congestion window until the next
   acknowledgement arrives.  The retransmission timeout is considered
   spurious when this acknowledgement acknowledges packets sent prior to
   the first retransmission timeout.  The retransmission timeout is
   considered valid when this acknowledgement acknowledges no packets
   sent prior to the first retransmission timeout.  In this case, the
   congestion window MUST be reduced to the minimum congestion window
   and slow start is re-entered.

5.7.  Pacing

   This document does not specify a pacer, but it is RECOMMENDED that a
   sender pace sending of all in-flight packets based on input from the
   congestion controller.
   For example, a pacer might distribute the congestion window over the
   SRTT when used with a window-based controller, and a pacer might use
   the rate estimate of a rate-based controller.

   An implementation should take care to architect its congestion
   controller to work well with a pacer.  For instance, a pacer might
   wrap the congestion controller and control the availability of the
   congestion window, or a pacer might pace out packets handed to it by
   the congestion controller.  Timely delivery of ACK frames is
   important for efficient loss recovery.  Packets containing only ACK
   frames should therefore not be paced, to avoid delaying their
   delivery to the peer.

   As an example of a well-known and publicly available implementation
   of a flow pacer, implementers are referred to the Fair Queue packet
   scheduler (fq qdisc) in Linux (3.11 onwards).

5.8.  Pseudocode

5.8.1.  Constants of interest

   Constants used in congestion control are based on a combination of
   RFCs, papers, and common practice.  Some may need to be changed or
   negotiated in order to better suit a variety of environments.

   kMaxDatagramSize:  The sender's maximum payload size.  Does not
      include UDP or IP overhead.  The max packet size is used for
      calculating initial and minimum congestion windows.  The
      RECOMMENDED value is 1200 bytes.

   kInitialWindow:  Default limit on the initial amount of outstanding
      data in bytes.  Taken from [RFC6928].  The RECOMMENDED value is
      the minimum of 10 * kMaxDatagramSize and
      max(2 * kMaxDatagramSize, 14600).

   kMinimumWindow:  Minimum congestion window in bytes.  The RECOMMENDED
      value is 2 * kMaxDatagramSize.

   kLossReductionFactor:  Reduction in congestion window when a new loss
      event is detected.  The RECOMMENDED value is 0.5.

5.8.2.  Variables of interest

   Variables required to implement the congestion control mechanisms are
   described in this section.

   ecn_ce_counter:  The highest value reported for the ECN-CE counter by
      the peer in an ACK frame.  This variable is used to detect
      increases in the reported ECN-CE counter.

   bytes_in_flight:  The sum of the size in bytes of all sent packets
      that contain at least one retransmittable or PADDING frame, and
      have not been acked or declared lost.  The size does not include
      IP or UDP overhead, but does include the QUIC header and AEAD
      overhead.  Packets only containing ACK frames do not count towards
      bytes_in_flight to ensure congestion control does not impede
      congestion feedback.

   congestion_window:  Maximum number of bytes-in-flight that may be
      sent.

   end_of_recovery:  The largest packet number sent when QUIC detects a
      loss.  When a larger packet is acknowledged, QUIC exits recovery.

   ssthresh:  Slow start threshold in bytes.  When the congestion window
      is below ssthresh, the mode is slow start and the window grows by
      the number of bytes acknowledged.

5.8.3.  Initialization

   At the beginning of the connection, initialize the congestion control
   variables as follows:

      congestion_window = kInitialWindow
      bytes_in_flight = 0
      end_of_recovery = 0
      ssthresh = infinite
      ecn_ce_counter = 0

5.8.4.  On Packet Sent

   Whenever a packet is sent, and it contains non-ACK frames, the packet
   increases bytes_in_flight.

      OnPacketSentCC(bytes_sent):
        bytes_in_flight += bytes_sent

5.8.5.  On Packet Acknowledgement

   Invoked from loss detection's OnPacketAcked and is supplied with
   acked_packet from sent_packets.

      InRecovery(packet_number):
        return packet_number <= end_of_recovery

      OnPacketAckedCC(acked_packet):
        // Remove from bytes_in_flight.
        bytes_in_flight -= acked_packet.bytes
        if (InRecovery(acked_packet.packet_number)):
          // Do not increase congestion window in recovery period.
          return
        if (congestion_window < ssthresh):
          // Slow start.
          congestion_window += acked_packet.bytes
        else:
          // Congestion avoidance.
          congestion_window += kMaxDatagramSize * acked_packet.bytes
              / congestion_window

5.8.6.  On New Congestion Event

   Invoked from ProcessECN and OnPacketsLost when a new congestion event
   is detected.  Starts a new recovery period and reduces the congestion
   window.

      CongestionEvent(packet_number):
        // Start a new congestion event if packet_number
        // is larger than the end of the previous recovery epoch.
        if (!InRecovery(packet_number)):
          end_of_recovery = largest_sent_packet
          congestion_window *= kLossReductionFactor
          congestion_window = max(congestion_window, kMinimumWindow)
          ssthresh = congestion_window

5.8.7.  Process ECN Information

   Invoked when an ACK frame with an ECN section is received from the
   peer.

      ProcessECN(ack):
        // If the ECN-CE counter reported by the peer has increased,
        // this could be a new congestion event.
        if (ack.ce_counter > ecn_ce_counter):
          ecn_ce_counter = ack.ce_counter
          // Start a new congestion event if the last acknowledged
          // packet is past the end of the previous recovery epoch.
          CongestionEvent(ack.largest_acked)

5.8.8.  On Packets Lost

   Invoked by loss detection from DetectLostPackets when new packets are
   detected lost.

      OnPacketsLost(lost_packets):
        // Remove lost packets from bytes_in_flight.
        for (lost_packet : lost_packets):
          bytes_in_flight -= lost_packet.bytes
        largest_lost_packet = lost_packets.last()

        // Start a new congestion epoch if the last lost packet
        // is past the end of the previous recovery epoch.
        CongestionEvent(largest_lost_packet.packet_number)

5.8.9.  On Retransmission Timeout Verified

   QUIC decreases the congestion window to the minimum value once the
   retransmission timeout has been verified and removes any packets sent
   before the newly acknowledged RTO packet.

      OnRetransmissionTimeoutVerified(packet_number):
        congestion_window = kMinimumWindow
        // Declare all packets prior to packet_number lost.
        for (sent_packet : sent_packets):
          if (sent_packet.packet_number < packet_number):
            bytes_in_flight -= sent_packet.bytes
            sent_packets.remove(sent_packet.packet_number)

6.  Security Considerations

6.1.  Congestion Signals

   Congestion control fundamentally involves the consumption of signals
   - both loss and ECN codepoints - from unauthenticated entities.
   On-path attackers can spoof or alter these signals.  An attacker can
   cause endpoints to reduce their sending rate by dropping packets, or
   alter send rate by changing ECN codepoints.

6.2.  Traffic Analysis

   Packets that carry only ACK frames can be heuristically identified by
   observing packet size.  Acknowledgement patterns may expose
   information about link characteristics or application behavior.
   Endpoints can use PADDING frames or bundle acknowledgments with other
   frames to reduce leaked information.

6.3.  Misreporting ECN Markings

   A receiver can misreport ECN markings to alter the congestion
   response of a sender.  Suppressing reports of ECN-CE markings could
   cause a sender to increase their send rate.  This increase could
   result in congestion and loss.

   A sender MAY attempt to detect suppression of reports by marking
   occasional packets that they send with ECN-CE.  If a packet marked
   with ECN-CE is not reported as having been marked when the packet is
   acknowledged, the sender SHOULD then disable ECN for that path.
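
   One way a sender might implement this check is sketched below in
   Python.  It is an illustration only: the class EcnProbeValidator and
   its fields are hypothetical, and the sketch assumes that the peer
   reports a cumulative ECN-CE counter, so the reported value can never
   legitimately be smaller than the number of acknowledged packets the
   sender itself marked with ECN-CE.

```python
class EcnProbeValidator:
    """Hypothetical sender-side check for suppressed ECN-CE reports."""

    def __init__(self):
        self.ecn_enabled = True
        # Packet numbers this sender deliberately marked with ECN-CE.
        self.ce_marked_outstanding = set()
        # How many of those marked packets the peer has acknowledged.
        self.ce_marked_acked = 0

    def on_packet_sent(self, packet_number, marked_ce):
        if marked_ce:
            self.ce_marked_outstanding.add(packet_number)

    def on_ack_received(self, newly_acked, reported_ce_counter):
        for packet_number in newly_acked:
            if packet_number in self.ce_marked_outstanding:
                self.ce_marked_outstanding.remove(packet_number)
                self.ce_marked_acked += 1
        # Assumption: the counter is cumulative.  The network may add CE
        # marks of its own, so the report may be larger, never smaller.
        if reported_ce_counter < self.ce_marked_acked:
            self.ecn_enabled = False  # Disable ECN for this path.


validator = EcnProbeValidator()
validator.on_packet_sent(1, marked_ce=True)
validator.on_packet_sent(2, marked_ce=False)
validator.on_ack_received([1, 2], reported_ce_counter=1)  # mark reported
validator.on_packet_sent(3, marked_ce=True)
validator.on_ack_received([3], reported_ce_counter=1)     # mark suppressed
print(validator.ecn_enabled)
```

   The sketch ignores packet number spaces and reordered ACK frames; a
   real implementation would need to account for both before disabling
   ECN.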

   Reporting additional ECN-CE markings will cause a sender to reduce
   their sending rate, which is similar in effect to advertising reduced
   connection flow control limits and so no advantage is gained by doing
   so.

   Endpoints choose the congestion controller that they use.  Though
   congestion controllers generally treat reports of ECN-CE markings as
   equivalent to loss [RFC8311], the exact response for each controller
   could be different.  Failure to correctly respond to information
   about ECN markings is therefore difficult to detect.

7.  IANA Considerations

   This document has no IANA actions.  Yet.

8.  References

8.1.  Normative References

   [QUIC-TRANSPORT]
              Iyengar, J., Ed. and M. Thomson, Ed., "QUIC: A UDP-Based
              Multiplexed and Secure Transport", draft-ietf-quic-
              transport-15 (work in progress), October 2018.

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119,
              DOI 10.17487/RFC2119, March 1997.

   [RFC8174]  Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC
              2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174,
              May 2017.

   [RFC8311]  Black, D., "Relaxing Restrictions on Explicit Congestion
              Notification (ECN) Experimentation", RFC 8311,
              DOI 10.17487/RFC8311, January 2018.

8.2.  Informative References

   [RFC3465]  Allman, M., "TCP Congestion Control with Appropriate Byte
              Counting (ABC)", RFC 3465, DOI 10.17487/RFC3465, February
              2003.

   [RFC4653]  Bhandarkar, S., Reddy, A., Allman, M., and E. Blanton,
              "Improving the Robustness of TCP to Non-Congestion
              Events", RFC 4653, DOI 10.17487/RFC4653, August 2006.

   [RFC5681]  Allman, M., Paxson, V., and E. Blanton, "TCP Congestion
              Control", RFC 5681, DOI 10.17487/RFC5681, September 2009.

   [RFC5682]  Sarolahti, P., Kojo, M., Yamamoto, K., and M.
              Hata, "Forward RTO-Recovery (F-RTO): An Algorithm for
              Detecting Spurious Retransmission Timeouts with TCP",
              RFC 5682, DOI 10.17487/RFC5682, September 2009.

   [RFC5827]  Allman, M., Avrachenkov, K., Ayesta, U., Blanton, J., and
              P. Hurtig, "Early Retransmit for TCP and Stream Control
              Transmission Protocol (SCTP)", RFC 5827,
              DOI 10.17487/RFC5827, May 2010.

   [RFC6298]  Paxson, V., Allman, M., Chu, J., and M. Sargent,
              "Computing TCP's Retransmission Timer", RFC 6298,
              DOI 10.17487/RFC6298, June 2011.

   [RFC6582]  Henderson, T., Floyd, S., Gurtov, A., and Y. Nishida, "The
              NewReno Modification to TCP's Fast Recovery Algorithm",
              RFC 6582, DOI 10.17487/RFC6582, April 2012.

   [RFC6675]  Blanton, E., Allman, M., Wang, L., Jarvinen, I., Kojo, M.,
              and Y. Nishida, "A Conservative Loss Recovery Algorithm
              Based on Selective Acknowledgment (SACK) for TCP",
              RFC 6675, DOI 10.17487/RFC6675, August 2012.

   [RFC6928]  Chu, J., Dukkipati, N., Cheng, Y., and M. Mathis,
              "Increasing TCP's Initial Window", RFC 6928,
              DOI 10.17487/RFC6928, April 2013.

   [TLP]      Dukkipati, N., Cardwell, N., Cheng, Y., and M. Mathis,
              "Tail Loss Probe (TLP): An Algorithm for Fast Recovery of
              Tail Losses", draft-dukkipati-tcpm-tcp-loss-probe-01 (work
              in progress), February 2013.

8.3.  URIs

   [1] https://mailarchive.ietf.org/arch/search/?email_list=quic

   [2] https://github.com/quicwg

   [3] https://github.com/quicwg/base-drafts/labels/-recovery

Appendix A.  Change Log

      *RFC Editor's Note:* Please remove this section prior to
      publication of a final version of this document.

A.1.  Since draft-ietf-quic-recovery-14

   o  Used max_ack_delay from transport params (#1796, #1782)

   o  Merge ACK and ACK_ECN (#1783)

A.2.  Since draft-ietf-quic-recovery-13

   o  Corrected the lack of ssthresh reduction in CongestionEvent
      pseudocode (#1598)

   o  Considerations for ECN spoofing (#1426, #1626)

   o  Clarifications for PADDING and congestion control (#837, #838,
      #1517, #1531, #1540)

   o  Reduce early retransmission timer to RTT/8 (#945, #1581)

   o  Packets are declared lost after an RTO is verified (#935, #1582)

A.3.  Since draft-ietf-quic-recovery-12

   o  Changes to manage separate packet number spaces and encryption
      levels (#1190, #1242, #1413, #1450)

   o  Added ECN feedback mechanisms and handling; new ACK_ECN frame
      (#804, #805, #1372)

A.4.  Since draft-ietf-quic-recovery-11

   No significant changes.

A.5.  Since draft-ietf-quic-recovery-10

   o  Improved text on ack generation (#1139, #1159)

   o  Make references to TCP recovery mechanisms informational (#1195)

   o  Define time_of_last_sent_handshake_packet (#1171)

   o  Added signal from TLS the data it includes needs to be sent in a
      Retry packet (#1061, #1199)

   o  Minimum RTT (min_rtt) is initialized with an infinite value
      (#1169)

A.6.  Since draft-ietf-quic-recovery-09

   No significant changes.

A.7.  Since draft-ietf-quic-recovery-08

   o  Clarified pacing and RTO (#967, #977)

A.8.  Since draft-ietf-quic-recovery-07

   o  Include Ack Delay in RTO (and TLP) computations (#981)

   o  Ack Delay in SRTT computation (#961)

   o  Default RTT and Slow Start (#590)

   o  Many editorial fixes.

A.9.  Since draft-ietf-quic-recovery-06

   No significant changes.

A.10.  Since draft-ietf-quic-recovery-05

   o  Add more congestion control text (#776)

A.11.  Since draft-ietf-quic-recovery-04

   No significant changes.

A.12.  Since draft-ietf-quic-recovery-03

   No significant changes.

A.13.  Since draft-ietf-quic-recovery-02

   o  Integrate F-RTO (#544, #409)

   o  Add congestion control (#545, #395)

   o  Require connection abort if a skipped packet was acknowledged
      (#415)

   o  Simplify RTO calculations (#142, #417)

A.14.  Since draft-ietf-quic-recovery-01

   o  Overview added to loss detection

   o  Changes initial default RTT to 100ms

   o  Added time-based loss detection and fixes early retransmit

   o  Clarified loss recovery for handshake packets

   o  Fixed references and made TCP references informative

A.15.  Since draft-ietf-quic-recovery-00

   o  Improved description of constants and ACK behavior

A.16.  Since draft-iyengar-quic-loss-recovery-01

   o  Adopted as base for draft-ietf-quic-recovery

   o  Updated authors/editors list

   o  Added table of contents

Acknowledgments

Authors' Addresses

   Jana Iyengar (editor)
   Fastly

   Email: jri.ietf@gmail.com


   Ian Swett (editor)
   Google

   Email: ianswett@google.com