idnits 2.17.1

draft-ietf-quic-recovery-13.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

  ** The document seems to lack a Security Considerations section.

  ** The abstract seems to contain references ([2], [3], [1]), which it
     shouldn't. Please replace those with straight textual mentions of the
     documents in question.

  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == The copyright year in the IETF Trust and authors Copyright Line does not
     match the current year

  -- The document date (June 28, 2018) is 2128 days in the past. Is this
     intentional?

  Checking references for intended status: Proposed Standard
  ----------------------------------------------------------------------------

     (See RFCs 3967 and 4897 for information about using normative references
     to lower-maturity documents in RFCs)

  -- Looks like a reference, but probably isn't: '1' on line 1300

  -- Looks like a reference, but probably isn't: '2' on line 1302

  -- Looks like a reference, but probably isn't: '3' on line 1304

  == Outdated reference: A later version (-34) exists of
     draft-ietf-quic-transport-13

     Summary: 2 errors (**), 0 flaws (~~), 2 warnings (==), 4 comments (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.

--------------------------------------------------------------------------------

QUIC                                                     J. Iyengar, Ed.
Internet-Draft                                                    Fastly
Intended status: Standards Track                           I. Swett, Ed.
Expires: December 30, 2018                                        Google
                                                           June 28, 2018

               QUIC Loss Detection and Congestion Control
                      draft-ietf-quic-recovery-13

Abstract

   This document describes loss detection and congestion control
   mechanisms for QUIC.

Note to Readers

   Discussion of this draft takes place on the QUIC working group
   mailing list (quic@ietf.org), which is archived at
   https://mailarchive.ietf.org/arch/search/?email_list=quic [1].

   Working Group information can be found at https://github.com/quicwg
   [2]; source code and issues list for this draft can be found at
   https://github.com/quicwg/base-drafts/labels/-recovery [3].

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF). Note that other groups may also distribute
   working documents as Internet-Drafts. The list of current
   Internet-Drafts is at https://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time. It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on December 30, 2018.

Copyright Notice

   Copyright (c) 2018 IETF Trust and the persons identified as the
   document authors. All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (https://trustee.ietf.org/license-info) in effect on the date of
   publication of this document. Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.
   Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1.  Introduction
     1.1.  Notational Conventions
   2.  Design of the QUIC Transmission Machinery
     2.1.  Relevant Differences Between QUIC and TCP
       2.1.1.  Separate Packet Number Spaces
       2.1.2.  Monotonically Increasing Packet Numbers
       2.1.3.  No Reneging
       2.1.4.  More ACK Ranges
       2.1.5.  Explicit Correction For Delayed ACKs
   3.  Loss Detection
     3.1.  Computing the RTT estimate
     3.2.  Ack-based Detection
       3.2.1.  Fast Retransmit
       3.2.2.  Early Retransmit
     3.3.  Timer-based Detection
       3.3.1.  Crypto Handshake Timeout
       3.3.2.  Tail Loss Probe
       3.3.3.  Retransmission Timeout
     3.4.  Generating Acknowledgements
       3.4.1.  Crypto Handshake Data
       3.4.2.  ACK Ranges
       3.4.3.  Receiver Tracking of ACK Frames
     3.5.  Pseudocode
       3.5.1.  Constants of interest
       3.5.2.  Variables of interest
       3.5.3.  Initialization
       3.5.4.  On Sending a Packet
       3.5.5.  On Receiving an Acknowledgment
       3.5.6.  On Packet Acknowledgment
       3.5.7.  Setting the Loss Detection Alarm
       3.5.8.  On Alarm Firing
       3.5.9.  Detecting Lost Packets
     3.6.  Discussion
   4.  Congestion Control
     4.1.  Explicit Congestion Notification
     4.2.  Slow Start
     4.3.  Congestion Avoidance
     4.4.  Recovery Period
     4.5.  Tail Loss Probe
     4.6.  Retransmission Timeout
     4.7.  Pacing
     4.8.  Pseudocode
       4.8.1.  Constants of interest
       4.8.2.  Variables of interest
       4.8.3.  Initialization
       4.8.4.  On Packet Sent
       4.8.5.  On Packet Acknowledgement
       4.8.6.  On New Congestion Event
       4.8.7.  Process ECN Information
       4.8.8.  On Packets Lost
       4.8.9.  On Retransmission Timeout Verified
   5.  IANA Considerations
   6.  References
     6.1.  Normative References
     6.2.  Informative References
     6.3.  URIs
   Appendix A.  Change Log
     A.1.  Since draft-ietf-quic-recovery-12
     A.2.  Since draft-ietf-quic-recovery-11
     A.3.  Since draft-ietf-quic-recovery-10
     A.4.  Since draft-ietf-quic-recovery-09
     A.5.  Since draft-ietf-quic-recovery-08
     A.6.  Since draft-ietf-quic-recovery-07
     A.7.  Since draft-ietf-quic-recovery-06
     A.8.  Since draft-ietf-quic-recovery-05
     A.9.  Since draft-ietf-quic-recovery-04
     A.10. Since draft-ietf-quic-recovery-03
     A.11. Since draft-ietf-quic-recovery-02
     A.12. Since draft-ietf-quic-recovery-01
     A.13. Since draft-ietf-quic-recovery-00
     A.14. Since draft-iyengar-quic-loss-recovery-01
   Acknowledgments
   Authors' Addresses

1.  Introduction

   QUIC is a new multiplexed and secure transport atop UDP. QUIC builds
   on decades of transport and security experience, and implements
   mechanisms that make it attractive as a modern general-purpose
   transport. The QUIC protocol is described in [QUIC-TRANSPORT].

   QUIC implements the spirit of known TCP loss recovery mechanisms,
   described in RFCs, various Internet-drafts, and also those prevalent
   in the Linux TCP implementation. This document describes QUIC
   congestion control and loss recovery, and where applicable,
   attributes the TCP equivalent in RFCs, Internet-drafts, academic
   papers, and/or TCP implementations.

1.1.  Notational Conventions

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and
   "OPTIONAL" in this document are to be interpreted as described in BCP
   14 [RFC2119] [RFC8174] when, and only when, they appear in all
   capitals, as shown here.

2.  Design of the QUIC Transmission Machinery

   All transmissions in QUIC are sent with a packet-level header, which
   indicates the encryption level and includes a packet sequence number
   (referred to below as a packet number). The encryption level
   indicates the packet number space, as described in [QUIC-TRANSPORT].
   Packet numbers never repeat within a packet number space for the
   lifetime of a connection. Packet numbers monotonically increase
   within a space, preventing ambiguity.

   This design obviates the need for disambiguating between
   transmissions and retransmissions and eliminates significant
   complexity from QUIC's interpretation of TCP loss detection
   mechanisms.

   Every packet may contain several frames. We outline the frames that
   are important to the loss detection and congestion control machinery
   below.

   o  Retransmittable frames are those that count towards bytes in
      flight and need acknowledgement. The most common are STREAM
      frames, which typically contain application data.

   o  Retransmittable packets are those that contain at least one
      retransmittable frame.

   o  Cryptographic handshake data is sent in CRYPTO frames, and uses
      the reliability machinery of QUIC underneath.

   o  ACK and ACK_ECN frames contain acknowledgment information.
      ACK_ECN frames additionally contain information about ECN
      codepoints seen by the peer. (The rest of this document uses ACK
      frames to refer to both ACK and ACK_ECN frames.)

2.1.  Relevant Differences Between QUIC and TCP

   Readers familiar with TCP's loss detection and congestion control
   will find algorithms here that parallel well-known TCP ones.
   Protocol differences between QUIC and TCP however contribute to
   algorithmic differences. We briefly describe these protocol
   differences below.

2.1.1.  Separate Packet Number Spaces

   QUIC uses separate packet number spaces for each encryption level,
   except that 0-RTT and all generations of 1-RTT keys use the same
   packet number space. Separate packet number spaces ensure that
   acknowledgement of packets sent with one level of encryption will
   not cause spurious retransmission of packets sent with a different
   encryption level. Congestion control and RTT measurement are unified
   across packet number spaces.

2.1.2.  Monotonically Increasing Packet Numbers

   TCP conflates transmission sequence number at the sender with
   delivery sequence number at the receiver, which results in
   retransmissions of the same data carrying the same sequence number,
   and consequently in problems caused by "retransmission ambiguity".
   QUIC separates the two: QUIC uses a packet number for transmissions,
   and any data that is to be delivered to the receiving application(s)
   is sent in one or more streams, with delivery order determined by
   stream offsets encoded within STREAM frames.

   QUIC's packet number is strictly increasing, and directly encodes
   transmission order. A higher QUIC packet number signifies that the
   packet was sent later, and a lower QUIC packet number signifies that
   the packet was sent earlier. When a packet containing frames is
   deemed lost, QUIC rebundles necessary frames in a new packet with a
   new packet number, removing ambiguity about which packet is
   acknowledged when an ACK is received.
   Consequently, more accurate
   RTT measurements can be made, spurious retransmissions are trivially
   detected, and mechanisms such as Fast Retransmit can be applied
   universally, based only on packet number.

   This design point significantly simplifies loss detection mechanisms
   for QUIC. Most TCP mechanisms implicitly attempt to infer
   transmission ordering based on TCP sequence numbers - a non-trivial
   task, especially when TCP timestamps are not available.

2.1.3.  No Reneging

   QUIC ACKs contain information that is similar to TCP SACK, but QUIC
   does not allow any acked packet to be reneged, greatly simplifying
   implementations on both sides and reducing memory pressure on the
   sender.

2.1.4.  More ACK Ranges

   QUIC supports many ACK ranges, as opposed to TCP's 3 SACK ranges. In
   high loss environments, this speeds recovery, reduces spurious
   retransmits, and ensures forward progress without relying on
   timeouts.

2.1.5.  Explicit Correction For Delayed ACKs

   QUIC ACKs explicitly encode the delay incurred at the receiver
   between when a packet is received and when the corresponding ACK is
   sent. This allows the receiver of the ACK to adjust for receiver
   delays, specifically the delayed ack timer, when estimating the path
   RTT. This mechanism also allows a receiver to measure and report the
   delay from when a packet was received by the OS kernel, which is
   useful in receivers which may incur delays such as context-switch
   latency before a userspace QUIC receiver processes a received packet.

3.  Loss Detection

   QUIC senders use both ack information and timeouts to detect lost
   packets, and this section provides a description of these algorithms.
   Estimating the network round-trip time (RTT) is critical to these
   algorithms and is described first.

3.1.  Computing the RTT estimate

   RTT is calculated when an ACK frame arrives by computing the
   difference between the current time and the time the largest newly
   acked packet was sent. If no packets are newly acknowledged, RTT
   cannot be calculated. When RTT is calculated, the ack delay field
   from the ACK frame SHOULD be subtracted from the RTT as long as the
   result is larger than the Min RTT. If the result is smaller than
   min_rtt, the unadjusted RTT should be used and the ack delay field
   should be ignored.

   Like TCP, QUIC calculates both smoothed RTT and RTT variance similar
   to those specified in [RFC6298].

   Min RTT is the minimum RTT measured over the connection, prior to
   adjusting by ack delay. Ignoring ack delay for min RTT prevents
   intentional or unintentional underestimation of min RTT, which in
   turn prevents underestimating smoothed RTT.

3.2.  Ack-based Detection

   Ack-based loss detection implements the spirit of TCP's Fast
   Retransmit [RFC5681], Early Retransmit [RFC5827], FACK, and SACK loss
   recovery [RFC6675]. This section provides an overview of how these
   algorithms are implemented in QUIC.

3.2.1.  Fast Retransmit

   An unacknowledged packet is marked as lost when an acknowledgment is
   received for a packet that was sent a threshold number of packets
   (kReorderingThreshold) after the unacknowledged packet. Receipt of
   the ack indicates that a later packet was received, while
   kReorderingThreshold provides some tolerance for reordering of
   packets in the network.

   The RECOMMENDED initial value for kReorderingThreshold is 3.

   We derive this recommendation from TCP loss recovery [RFC5681]
   [RFC6675]. It is possible for networks to exhibit higher degrees of
   reordering, causing a sender to detect spurious losses.
   Detecting
   spurious losses leads to unnecessary retransmissions and may result
   in degraded performance due to the actions of the congestion
   controller upon detecting loss. Implementers MAY use algorithms
   developed for TCP, such as TCP-NCR [RFC4653], to improve QUIC's
   reordering resilience, though care should be taken to map TCP
   specifics to QUIC correctly. Similarly, using time-based loss
   detection to deal with reordering, such as in PR-TCP, should be more
   readily usable in QUIC. Making QUIC deal with such networks is
   important open research, and implementers are encouraged to explore
   this space.

3.2.2.  Early Retransmit

   Unacknowledged packets close to the tail may have fewer than
   kReorderingThreshold retransmittable packets sent after them. Loss
   of such packets cannot be detected via Fast Retransmit. To enable
   ack-based loss detection of such packets, receipt of an
   acknowledgment for the last outstanding retransmittable packet
   triggers the Early Retransmit process, as follows.

   If there are unacknowledged retransmittable packets still pending,
   they should be marked as lost. To compensate for the reduced
   reordering resilience, the sender SHOULD set an alarm for a small
   period of time. If the unacknowledged retransmittable packets are
   not acknowledged during this time, then these packets MUST be marked
   as lost.

   An endpoint SHOULD set the alarm such that a packet is marked as lost
   no earlier than 1.25 * max(SRTT, latest_RTT) since when it was sent.

   Using max(SRTT, latest_RTT) protects from the two following cases:

   o  the latest RTT sample is lower than the SRTT, perhaps due to
      reordering where the packet whose ack triggered the Early
      Retransmit process encountered a shorter path;

   o  the latest RTT sample is higher than the SRTT, perhaps due to a
      sustained increase in the actual RTT, but the smoothed SRTT has
      not yet caught up.
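   As a rough illustration of the timing rule above, the following
   Python sketch computes the earliest time at which a packet may be
   declared lost under Early Retransmit. The function name, the
   argument names, and the use of seconds as the time unit are
   assumptions made for this example; only the 1.25 factor and the
   max(SRTT, latest_RTT) term come from the text.

```python
# Illustrative sketch only; names and units (seconds) are assumptions.
EARLY_RETRANSMIT_MULTIPLIER = 1.25  # factor given in the text above


def early_retransmit_loss_time(time_sent, smoothed_rtt, latest_rtt,
                               multiplier=EARLY_RETRANSMIT_MULTIPLIER):
    """Earliest time at which a packet sent at time_sent may be marked lost."""
    # Taking the larger of the two RTT values covers both cases in the list:
    # a low latest sample (reordering) and a smoothed RTT that lags behind.
    return time_sent + multiplier * max(smoothed_rtt, latest_rtt)


# Latest sample below SRTT: the smoothed RTT governs the delay.
low_sample = early_retransmit_loss_time(10.0, smoothed_rtt=0.100,
                                        latest_rtt=0.060)
# Latest sample above SRTT: the latest sample governs the delay.
high_sample = early_retransmit_loss_time(10.0, smoothed_rtt=0.100,
                                         latest_rtt=0.160)
```

   With a 100 ms smoothed RTT, the first call waits 125 ms after the
   send time and the second waits 200 ms, so a single unusually fast
   sample does not shorten the alarm.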
   The 1.25 multiplier increases reordering resilience. Implementers
   MAY experiment with using other multipliers, bearing in mind that a
   lower multiplier reduces reordering resilience and increases spurious
   retransmissions, and a higher multiplier increases loss recovery
   delay.

   This mechanism is based on Early Retransmit for TCP [RFC5827].
   However, [RFC5827] does not include the alarm described above. Early
   Retransmit is prone to spurious retransmissions due to its reduced
   reordering resilience without the alarm. This observation led Linux
   TCP implementers to implement an alarm for TCP as well, and this
   document incorporates this advancement.

3.3.  Timer-based Detection

   Timer-based loss detection implements a handshake retransmission
   timer that is optimized for QUIC as well as the spirit of TCP's Tail
   Loss Probe and Retransmission Timeout mechanisms.

3.3.1.  Crypto Handshake Timeout

   Data in CRYPTO frames is critical to QUIC transport and crypto
   negotiation, so a more aggressive timeout is used to retransmit it.
   Below, the term "handshake packet" is used to refer to packets
   containing CRYPTO frames, not packets with the specific long header
   packet type Handshake.

   The initial handshake timeout SHOULD be set to twice the initial RTT.

   At the beginning, there are no prior RTT samples within a connection.
   Resumed connections over the same network SHOULD use the previous
   connection's final smoothed RTT value as the resumed connection's
   initial RTT.

   If no previous RTT is available, or if the network changes, the
   initial RTT SHOULD be set to 100ms.

   When CRYPTO frames are sent, the sender SHOULD set an alarm for the
   handshake timeout period. When the alarm fires, the sender MUST
   retransmit all unacknowledged CRYPTO data by calling
   RetransmitAllUnackedHandshakeData().
   On each consecutive firing of
   the handshake alarm without receiving an acknowledgement for a new
   packet, the sender SHOULD double the handshake timeout and set an
   alarm for this period.

   When CRYPTO frames are outstanding, the TLP and RTO timers are not
   active unless the CRYPTO frames were sent at 1-RTT encryption.

   When an acknowledgement is received for a handshake packet, the new
   RTT is computed and the alarm SHOULD be set for twice the newly
   computed smoothed RTT.

3.3.1.1.  Retry

   A Retry packet causes the content of the client's Initial packet to
   be immediately retransmitted along with the token present in the
   Retry.

   The Retry indicates that the Initial was received but not processed.
   It MUST NOT be treated as an acknowledgment for the Initial, but it
   MAY be used for an RTT measurement.

3.3.2.  Tail Loss Probe

   The algorithm described in this section is an adaptation of the Tail
   Loss Probe algorithm proposed for TCP [TLP].

   A packet sent at the tail is particularly vulnerable to slow loss
   detection, since acks of subsequent packets are needed to trigger
   ack-based detection. To ameliorate this weakness of tail packets,
   the sender schedules an alarm when the last retransmittable packet
   before quiescence is transmitted. When this alarm fires, a Tail Loss
   Probe (TLP) packet is sent to evoke an acknowledgement from the
   receiver.

   The alarm duration, or Probe Timeout (PTO), is set based on the
   following conditions:

   o  PTO SHOULD be scheduled for max(1.5*SRTT+MaxAckDelay,
      kMinTLPTimeout)

   o  If RTO (Section 3.3.3) is earlier, schedule a TLP alarm in its
      place. That is, PTO SHOULD be scheduled for min(RTO, PTO).

   MaxAckDelay is the maximum ack delay supplied in an incoming ACK
   frame. MaxAckDelay excludes ack delays that aren't included in an
   RTT sample because they're too large and excludes those which
   reference an ack-only packet.
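   The two PTO conditions above can be sketched in Python as follows.
   This is a minimal illustration, not a normative algorithm: the
   function and parameter names are assumptions, times are in seconds,
   and the RTO value is taken as an input because its computation is
   covered in Section 3.3.3. The 10 ms default mirrors the RECOMMENDED
   kMinTLPTimeout listed in Section 3.5.1.

```python
# Illustrative sketch only; names and units (seconds) are assumptions.
K_MIN_TLP_TIMEOUT = 0.010  # kMinTLPTimeout, RECOMMENDED 10ms


def probe_timeout(smoothed_rtt, max_ack_delay, rto,
                  min_tlp_timeout=K_MIN_TLP_TIMEOUT):
    """Return the duration of the TLP alarm (PTO) for the given estimates."""
    # First condition: 1.5 * SRTT plus the largest observed ack delay,
    # bounded below by the minimum TLP timeout.
    pto = max(1.5 * smoothed_rtt + max_ack_delay, min_tlp_timeout)
    # Second condition: if the RTO would expire earlier, use it instead.
    return min(rto, pto)


typical = probe_timeout(smoothed_rtt=0.100, max_ack_delay=0.025, rto=0.500)
floor_applies = probe_timeout(smoothed_rtt=0.002, max_ack_delay=0.0, rto=0.200)
rto_is_earlier = probe_timeout(smoothed_rtt=0.400, max_ack_delay=0.025,
                               rto=0.465)
```

   In the first call the alarm is 175 ms; in the second the 10 ms
   floor applies; in the third the supplied RTO of 465 ms is earlier
   than the 625 ms probe timeout and is used in its place.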
   QUIC diverges from TCP by calculating MaxAckDelay dynamically,
   instead of assuming a constant delayed ack timeout for all
   connections. QUIC includes this in all probe timeouts, because it
   assumes the ack delay may come into play, regardless of the number of
   packets outstanding. TCP's TLP assumes if at least 2 packets are
   outstanding, acks will not be delayed.

   A PTO value of at least 1.5*SRTT ensures that the ACK is overdue.
   The 1.5 is based on [TLP], but implementations MAY experiment with
   other constants.

   To reduce latency, it is RECOMMENDED that the sender set and allow
   the TLP alarm to fire twice before setting an RTO alarm. In other
   words, when the TLP alarm fires the first time, a TLP packet is sent,
   and it is RECOMMENDED that the TLP alarm be scheduled for a second
   time. When the TLP alarm fires the second time, a second TLP packet
   is sent, and an RTO alarm SHOULD be scheduled (Section 3.3.3).

   A TLP packet SHOULD carry new data when possible. If new data is
   unavailable or new data cannot be sent due to flow control, a TLP
   packet MAY retransmit unacknowledged data to potentially reduce
   recovery time. Since a TLP alarm is used to send a probe into the
   network prior to establishing any packet loss, prior unacknowledged
   packets SHOULD NOT be marked as lost when a TLP alarm fires.

   A sender may not know that a packet being sent is a tail packet.
   Consequently, a sender may have to arm or adjust the TLP alarm on
   every sent retransmittable packet.

3.3.3.  Retransmission Timeout

   A Retransmission Timeout (RTO) alarm is the final backstop for loss
   detection. The algorithm used in QUIC is based on the RTO algorithm
   for TCP [RFC5681] and is additionally resilient to spurious RTO
   events [RFC5682].

   When the last TLP packet is sent, an alarm is scheduled for the RTO
   period.
   When this alarm fires, the sender sends two packets, to
   evoke acknowledgements from the receiver, and restarts the RTO alarm.

   Similar to TCP [RFC6298], the RTO period is set based on the
   following conditions:

   o  When the final TLP packet is sent, the RTO period is set to
      max(SRTT + 4*RTTVAR + MaxAckDelay, kMinRTOTimeout)

   o  When an RTO alarm fires, the RTO period is doubled.

   The sender typically has incurred a high latency penalty by the time
   an RTO alarm fires, and this penalty increases exponentially in
   subsequent consecutive RTO events. Sending a single packet on an RTO
   event therefore makes the connection very sensitive to single packet
   loss. Sending two packets instead of one significantly increases
   resilience to packet drop in both directions, thus reducing the
   probability of consecutive RTO events.

   QUIC's RTO algorithm differs from TCP in that the firing of an RTO
   alarm is not considered a strong enough signal of packet loss, so
   does not result in an immediate change to congestion window or
   recovery state. An RTO alarm fires only when there's a prolonged
   period of network silence, which could be caused by a change in the
   underlying network RTT.

   QUIC also diverges from TCP by including MaxAckDelay in the RTO
   period. QUIC is able to explicitly model delay at the receiver via
   the ack delay field in the ACK frame. Since QUIC corrects for this
   delay in its SRTT and RTTVAR computations, it is necessary to add
   this delay explicitly in the TLP and RTO computation.

   When an acknowledgment is received for a packet sent on an RTO event,
   any unacknowledged packets with lower packet numbers than those
   acknowledged MUST be marked as lost.

   A packet sent when an RTO alarm fires MAY carry new data if available
   or unacknowledged data to potentially reduce recovery time.
   Since
   this packet is sent as a probe into the network prior to establishing
   any packet loss, prior unacknowledged packets SHOULD NOT be marked as
   lost.

   A packet sent on an RTO alarm MUST NOT be blocked by the sender's
   congestion controller. A sender MUST however count these bytes as
   additional bytes in flight, since this packet adds network load
   without establishing packet loss.

3.4.  Generating Acknowledgements

   QUIC SHOULD delay sending acknowledgements in response to packets,
   but MUST NOT excessively delay acknowledgements of packets containing
   frames other than ACK or ACK_ECN. Specifically, implementations MUST
   attempt to enforce a maximum ack delay to avoid causing the peer
   spurious timeouts. The RECOMMENDED maximum ack delay in QUIC is
   25ms.

   An acknowledgement MAY be sent for every second full-sized packet, as
   TCP does [RFC5681], or may be sent less frequently, as long as the
   delay does not exceed the maximum ack delay. QUIC recovery
   algorithms do not assume the peer generates an acknowledgement
   immediately when receiving a second full-sized packet.

   Out-of-order packets SHOULD be acknowledged more quickly, in order to
   accelerate loss recovery. The receiver SHOULD send an immediate ACK
   when it receives a new packet which is not one greater than the
   largest received packet number.

   Similarly, packets marked with the ECN Congestion Experienced (CE)
   codepoint in the IP header SHOULD be acknowledged immediately, to
   reduce the peer's response time to congestion events.

   As an optimization, a receiver MAY process multiple packets before
   sending any ACK frames in response. In this case the receiver can
   determine whether an immediate or delayed acknowledgement should be
   generated after processing incoming packets.

3.4.1.  Crypto Handshake Data

   In order to quickly complete the handshake and avoid spurious
   retransmissions due to handshake alarm timeouts, handshake packets
   SHOULD use a very short ack delay, such as 1ms. ACK frames MAY be
   sent immediately when the crypto stack indicates all data for that
   encryption level has been received.

3.4.2.  ACK Ranges

   When an ACK frame is sent, one or more ranges of acknowledged packets
   are included. Including older packets reduces the chance of spurious
   retransmits caused by losing previously sent ACK frames, at the cost
   of larger ACK frames.

   ACK frames SHOULD always acknowledge the most recently received
   packets, and the more out-of-order the packets are, the more
   important it is to send an updated ACK frame quickly, to prevent the
   peer from declaring a packet as lost and spuriously retransmitting
   the frames it contains.

   Below is one recommended approach for determining what packets to
   include in an ACK frame.

3.4.3.  Receiver Tracking of ACK Frames

   When a packet containing an ACK frame is sent, the largest
   acknowledged in that frame may be saved. When a packet containing an
   ACK frame is acknowledged, the receiver can stop acknowledging
   packets less than or equal to the largest acknowledged in the sent
   ACK frame.

   In cases without ACK frame loss, this algorithm allows for a minimum
   of 1 RTT of reordering. In cases with ACK frame loss, this approach
   does not guarantee that every acknowledgement is seen by the sender
   before it is no longer included in the ACK frame. Packets could be
   received out of order and all subsequent ACK frames containing them
   could be lost. In this case, the loss recovery algorithm may cause
   spurious retransmits, but the sender will continue making forward
   progress.

3.5.  Pseudocode

3.5.1.  Constants of interest

   Constants used in loss recovery are based on a combination of RFCs,
   papers, and common practice. Some may need to be changed or
   negotiated in order to better suit a variety of environments.

   kMaxTLPs (RECOMMENDED 2):  Maximum number of tail loss probes before
      an RTO fires.

   kReorderingThreshold (RECOMMENDED 3):  Maximum reordering in packet
      number space before FACK style loss detection considers a packet
      lost.

   kTimeReorderingFraction (RECOMMENDED 1/8):  Maximum reordering in
      time space before time based loss detection considers a packet
      lost. In fraction of an RTT.

   kUsingTimeLossDetection (RECOMMENDED false):  Whether time based loss
      detection is in use. If false, uses FACK style loss detection.

   kMinTLPTimeout (RECOMMENDED 10ms):  Minimum time in the future a tail
      loss probe alarm may be set for.

   kMinRTOTimeout (RECOMMENDED 200ms):  Minimum time in the future an
      RTO alarm may be set for.

   kDelayedAckTimeout (RECOMMENDED 25ms):  The length of the peer's
      delayed ack timer.

   kInitialRtt (RECOMMENDED 100ms):  The RTT used before an RTT sample
      is taken.

3.5.2.  Variables of interest

   Variables required to implement the loss detection mechanisms are
   described in this section.

   loss_detection_alarm:  Multi-modal alarm used for loss detection.

   handshake_count:  The number of times all unacknowledged handshake
      data has been retransmitted without receiving an ack.

   tlp_count:  The number of times a tail loss probe has been sent
      without receiving an ack.

   rto_count:  The number of times an rto has been sent without
      receiving an ack.

   largest_sent_before_rto:  The last packet number sent prior to the
      first retransmission timeout.

   time_of_last_sent_retransmittable_packet:  The time the most recent
      retransmittable packet was sent.

   time_of_last_sent_handshake_packet: The time the most recent packet
      containing a CRYPTO frame was sent.

   largest_sent_packet: The packet number of the most recently sent
      packet.

   largest_acked_packet: The largest packet number acknowledged in an
      ACK frame.

   latest_rtt: The most recent RTT measurement made when receiving an
      ack for a previously unacked packet.

   smoothed_rtt: The smoothed RTT of the connection, computed as
      described in [RFC6298].

   rttvar: The RTT variance, computed as described in [RFC6298].

   min_rtt: The minimum RTT seen in the connection, ignoring ack delay.

   max_ack_delay: The maximum ack delay in an incoming ACK frame for
      this connection. Excludes ack delays for ack only packets and
      those that create an RTT sample less than min_rtt.

   reordering_threshold: The largest packet number gap between the
      largest acked retransmittable packet and an unacknowledged
      retransmittable packet before it is declared lost.

   time_reordering_fraction: The reordering window as a fraction of
      max(smoothed_rtt, latest_rtt).

   loss_time: The time at which the next packet will be considered lost
      based on early retransmit or exceeding the reordering window in
      time.

   sent_packets: An association of packet numbers to information about
      them, including a number field indicating the packet number, a
      time field indicating the time a packet was sent, a boolean
      indicating whether the packet is ack only, and a bytes field
      indicating the packet's size. sent_packets is ordered by packet
      number, and packets remain in sent_packets until acknowledged or
      lost. A sent_packets data structure is maintained per packet
      number space, and ACK processing only applies to a single space.

3.5.3.
Initialization

   At the beginning of the connection, initialize the loss detection
   variables as follows:

      loss_detection_alarm.reset()
      handshake_count = 0
      tlp_count = 0
      rto_count = 0
      if (kUsingTimeLossDetection):
        reordering_threshold = infinite
        time_reordering_fraction = kTimeReorderingFraction
      else:
        reordering_threshold = kReorderingThreshold
        time_reordering_fraction = infinite
      loss_time = 0
      smoothed_rtt = 0
      rttvar = 0
      min_rtt = infinite
      max_ack_delay = 0
      largest_sent_before_rto = 0
      time_of_last_sent_retransmittable_packet = 0
      time_of_last_sent_handshake_packet = 0
      largest_sent_packet = 0

3.5.4. On Sending a Packet

   After any packet is sent, be it a new transmission or a rebundled
   transmission, the following OnPacketSent function is called. The
   parameters to OnPacketSent are as follows:

   o  packet_number: The packet number of the sent packet.

   o  is_ack_only: A boolean that indicates whether a packet only
      contains an ACK frame. If true, it is still expected an ack will
      be received for this packet, but it is not retransmittable.

   o  is_handshake_packet: A boolean that indicates whether a packet
      contains handshake data.

   o  sent_bytes: The number of bytes sent in the packet, not including
      UDP or IP overhead, but including QUIC framing overhead.

   Pseudocode for OnPacketSent follows:

      OnPacketSent(packet_number, is_ack_only, is_handshake_packet,
                   sent_bytes):
        largest_sent_packet = packet_number
        sent_packets[packet_number].packet_number = packet_number
        sent_packets[packet_number].time = now
        sent_packets[packet_number].ack_only = is_ack_only
        if !is_ack_only:
          if is_handshake_packet:
            time_of_last_sent_handshake_packet = now
          time_of_last_sent_retransmittable_packet = now
          OnPacketSentCC(sent_bytes)
          sent_packets[packet_number].bytes = sent_bytes
          SetLossDetectionAlarm()

3.5.5.
On Receiving an Acknowledgment

   When an ACK frame is received, it may acknowledge 0 or more packets.

   Pseudocode for OnAckReceived and UpdateRtt follows:

      OnAckReceived(ack):
        largest_acked_packet = ack.largest_acked
        // If the largest acked is newly acked, update the RTT.
        if (sent_packets[ack.largest_acked]):
          latest_rtt = now - sent_packets[ack.largest_acked].time
          UpdateRtt(latest_rtt, ack.ack_delay)
        // Find all newly acked packets.
        for acked_packet in DetermineNewlyAckedPackets():
          OnPacketAcked(acked_packet)

        DetectLostPackets(ack.largest_acked)
        SetLossDetectionAlarm()

        // Process ECN information if present.
        if (ACK frame contains ECN information):
          ProcessECN(ack)

      UpdateRtt(latest_rtt, ack_delay):
        // min_rtt ignores ack delay.
        min_rtt = min(min_rtt, latest_rtt)
        // Adjust for ack delay if it's plausible.
        if (latest_rtt - min_rtt > ack_delay):
          latest_rtt -= ack_delay
          // Only save into max ack delay if it's used
          // for rtt calculation and is not ack only.
          if (!sent_packets[largest_acked_packet].ack_only):
            max_ack_delay = max(max_ack_delay, ack_delay)
        // Based on [RFC6298].
        if (smoothed_rtt == 0):
          smoothed_rtt = latest_rtt
          rttvar = latest_rtt / 2
        else:
          rttvar_sample = abs(smoothed_rtt - latest_rtt)
          rttvar = 3/4 * rttvar + 1/4 * rttvar_sample
          smoothed_rtt = 7/8 * smoothed_rtt + 1/8 * latest_rtt

3.5.6. On Packet Acknowledgment

   When a packet is acked for the first time, the following
   OnPacketAcked function is called. Note that a single ACK frame may
   newly acknowledge several packets. OnPacketAcked must be called once
   for each of these newly acked packets.

   OnPacketAcked takes one parameter, acked_packet, which is the struct
   of the newly acked packet.
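
   As an illustration only, the UpdateRtt pseudocode in Section 3.5.5
   can be sketched in runnable form as below. The class name
   RttEstimator, the method name update, and the largest_is_ack_only
   parameter are assumptions made for this sketch and do not appear in
   this document; times are taken to be in milliseconds.

```python
import math
from dataclasses import dataclass


@dataclass
class RttEstimator:
    """Hypothetical holder for the RTT variables of Section 3.5.2 (ms)."""
    smoothed_rtt: float = 0.0
    rttvar: float = 0.0
    min_rtt: float = math.inf
    max_ack_delay: float = 0.0
    latest_rtt: float = 0.0

    def update(self, latest_rtt, ack_delay, largest_is_ack_only=False):
        # min_rtt ignores ack delay.
        self.min_rtt = min(self.min_rtt, latest_rtt)
        # Subtract the ack delay only when doing so is plausible, that
        # is, when the sample would still exceed min_rtt.
        if latest_rtt - self.min_rtt > ack_delay:
            latest_rtt -= ack_delay
            if not largest_is_ack_only:
                self.max_ack_delay = max(self.max_ack_delay, ack_delay)
        if self.smoothed_rtt == 0:
            # First sample: seed the estimator.
            self.smoothed_rtt = latest_rtt
            self.rttvar = latest_rtt / 2
        else:
            # rttvar is updated before smoothed_rtt, as in the pseudocode.
            rttvar_sample = abs(self.smoothed_rtt - latest_rtt)
            self.rttvar = 0.75 * self.rttvar + 0.25 * rttvar_sample
            self.smoothed_rtt = (0.875 * self.smoothed_rtt
                                 + 0.125 * latest_rtt)
        self.latest_rtt = latest_rtt


if __name__ == "__main__":
    est = RttEstimator()
    est.update(100.0, 10.0)   # first sample, ack delay not subtracted
    est.update(140.0, 20.0)   # adjusted to 120 before smoothing
    print(est.smoothed_rtt, est.rttvar)   # prints "102.5 42.5"
```

   In the second call the raw sample exceeds min_rtt by more than the
   reported ack delay, so the delay is subtracted and recorded in
   max_ack_delay. In the first call it is not, because subtracting it
   would push the sample below min_rtt.
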

   If this is the first acknowledgement following RTO, check if the
   smallest newly acknowledged packet is one sent by the RTO, and if so,
   inform congestion control of a verified RTO, similar to F-RTO
   [RFC5682].

   Pseudocode for OnPacketAcked follows:

      OnPacketAcked(acked_packet):
        if (!acked_packet.ack_only):
          OnPacketAckedCC(acked_packet)
        // If a packet sent prior to RTO was acked, then the RTO
        // was spurious. Otherwise, inform congestion control.
        if (rto_count > 0 &&
            acked_packet.packet_number > largest_sent_before_rto):
          OnRetransmissionTimeoutVerified()
        handshake_count = 0
        tlp_count = 0
        rto_count = 0
        sent_packets.remove(acked_packet.packet_number)

3.5.7. Setting the Loss Detection Alarm

   QUIC loss detection uses a single alarm for all timer-based loss
   detection. The duration of the alarm is based on the alarm's mode,
   which is set in the packet and timer events further below. The
   function SetLossDetectionAlarm defined below shows how the single
   timer is set based on the alarm mode.

3.5.7.1. Handshake Alarm

   When a connection has unacknowledged handshake data, the handshake
   alarm is set and when it expires, all unacknowledged handshake data
   is retransmitted.

   When stateless rejects are in use, the connection is considered
   immediately closed once a reject is sent, so no timer is set to
   retransmit the reject.

   Version negotiation packets are always stateless, and MUST be sent
   once per handshake packet that uses an unsupported QUIC version, and
   MAY be sent in response to 0-RTT packets.

3.5.7.2. Tail Loss Probe and Retransmission Alarm

   Tail loss probes [TLP] and retransmission timeouts [RFC6298] are an
   alarm based mechanism to recover from cases when there are
   outstanding retransmittable packets, but an acknowledgement has not
   been received in a timely manner.

   The TLP and RTO timers are armed when there is no unacknowledged
   handshake data. The TLP alarm is set until the maximum number of TLP
   packets has been sent, and then the RTO timer is set.

3.5.7.3. Early Retransmit Alarm

   Early retransmit [RFC5827] is implemented with a 1/4 RTT timer. It
   is part of QUIC's time based loss detection, but is always enabled,
   even when only packet reordering loss detection is enabled.

3.5.7.4. Pseudocode

   Pseudocode for SetLossDetectionAlarm follows:

      SetLossDetectionAlarm():
        // Don't arm the alarm if there are no packets with
        // retransmittable data in flight.
        if (bytes_in_flight == 0):
          loss_detection_alarm.cancel()
          return

        if (handshake packets are outstanding):
          // Handshake retransmission alarm.
          if (smoothed_rtt == 0):
            alarm_duration = 2 * kInitialRtt
          else:
            alarm_duration = 2 * smoothed_rtt
          alarm_duration = max(alarm_duration + max_ack_delay,
                               kMinTLPTimeout)
          alarm_duration = alarm_duration * (2 ^ handshake_count)
          loss_detection_alarm.set(
            time_of_last_sent_handshake_packet + alarm_duration)
          return
        else if (loss_time != 0):
          // Early retransmit timer or time loss detection.
          alarm_duration = loss_time -
            time_of_last_sent_retransmittable_packet
        else:
          // RTO or TLP alarm
          // Calculate RTO duration
          alarm_duration =
            smoothed_rtt + 4 * rttvar + max_ack_delay
          alarm_duration = max(alarm_duration, kMinRTOTimeout)
          alarm_duration = alarm_duration * (2 ^ rto_count)
          if (tlp_count < kMaxTLPs):
            // Tail Loss Probe
            tlp_alarm_duration = max(1.5 * smoothed_rtt
                               + max_ack_delay, kMinTLPTimeout)
            alarm_duration = min(tlp_alarm_duration, alarm_duration)

        loss_detection_alarm.set(
          time_of_last_sent_retransmittable_packet + alarm_duration)

3.5.8. On Alarm Firing

   QUIC uses one loss recovery alarm, which when set, can be in one of
   several modes.
   When the alarm fires, the mode determines the action to be
   performed.

   Pseudocode for OnLossDetectionAlarm follows:

      OnLossDetectionAlarm():
        if (handshake packets are outstanding):
          // Handshake retransmission alarm.
          RetransmitAllUnackedHandshakeData()
          handshake_count++
        else if (loss_time != 0):
          // Early retransmit or Time Loss Detection
          DetectLostPackets(largest_acked_packet)
        else if (tlp_count < kMaxTLPs):
          // Tail Loss Probe.
          SendOnePacket()
          tlp_count++
        else:
          // RTO.
          if (rto_count == 0):
            largest_sent_before_rto = largest_sent_packet
          SendTwoPackets()
          rto_count++

        SetLossDetectionAlarm()

3.5.9. Detecting Lost Packets

   Packets in QUIC are only considered lost once a larger packet number
   in the same packet number space is acknowledged. DetectLostPackets
   is called every time an ack is received and operates on the
   sent_packets for that packet number space. If the loss detection
   alarm fires and the loss_time is set, the previous largest acked
   packet is supplied.

3.5.9.1. Pseudocode

   DetectLostPackets takes one parameter, largest_acked, which is the
   largest acked packet.

   Pseudocode for DetectLostPackets follows:

      DetectLostPackets(largest_acked):
        loss_time = 0
        lost_packets = {}
        delay_until_lost = infinite
        if (kUsingTimeLossDetection):
          delay_until_lost =
            (1 + time_reordering_fraction) *
                max(latest_rtt, smoothed_rtt)
        else if (largest_acked.packet_number == largest_sent_packet):
          // Early retransmit alarm.
          delay_until_lost = 5/4 * max(latest_rtt, smoothed_rtt)
        foreach (unacked < largest_acked.packet_number):
          time_since_sent = now() - unacked.time
          delta = largest_acked.packet_number - unacked.packet_number
          if (time_since_sent > delay_until_lost ||
              delta > reordering_threshold):
            sent_packets.remove(unacked.packet_number)
            if (!unacked.ack_only):
              lost_packets.insert(unacked)
          else if (loss_time == 0 && delay_until_lost != infinite):
            loss_time = now() + delay_until_lost - time_since_sent

        // Inform the congestion controller of lost packets and
        // let it decide whether to retransmit immediately.
        if (!lost_packets.empty()):
          OnPacketsLost(lost_packets)

3.6. Discussion

   The majority of constants were derived from best common practices
   among widely deployed TCP implementations on the internet.
   Exceptions follow.

   A shorter delayed ack time of 25ms was chosen because longer delayed
   acks can delay loss recovery and for the small number of connections
   where less than one packet per 25ms is delivered, acking every
   packet is beneficial to congestion control and loss recovery.

   The default initial RTT of 100ms was chosen because it is slightly
   higher than both the median and mean min_rtt typically observed on
   the public internet.

4. Congestion Control

   QUIC's congestion control is based on TCP NewReno [RFC6582]
   congestion control to determine the congestion window. QUIC
   congestion control is specified in bytes due to finer control and the
   ease of appropriate byte counting [RFC3465].

   QUIC hosts MUST NOT send packets if they would increase
   bytes_in_flight (defined in Section 4.8.2) beyond the available
   congestion window, unless the packet is a probe packet sent after the
   TLP or RTO alarm fires, as described in Section 3.3.2 and
   Section 3.3.3.

4.1.
Explicit Congestion Notification

   If a path has been verified to support ECN, QUIC treats a Congestion
   Experienced codepoint in the IP header as a signal of congestion.
   This document specifies an endpoint's response when its peer receives
   packets with the Congestion Experienced codepoint. As discussed in
   [RFC8311], endpoints are permitted to experiment with other response
   functions.

4.2. Slow Start

   QUIC begins every connection in slow start and exits slow start upon
   loss or upon increase in the ECN-CE counter. QUIC re-enters slow
   start anytime the congestion window is less than ssthresh, which
   typically only occurs after an RTO. While in slow start, QUIC
   increases the congestion window by the number of bytes acknowledged
   when each ack is processed.

4.3. Congestion Avoidance

   Slow start exits to congestion avoidance. Congestion avoidance in
   NewReno uses an additive increase multiplicative decrease (AIMD)
   approach that increases the congestion window by one MSS of bytes per
   congestion window acknowledged. When a loss is detected, NewReno
   halves the congestion window and sets the slow start threshold to the
   new congestion window.

4.4. Recovery Period

   Recovery is a period of time beginning with detection of a lost
   packet or an increase in the ECN-CE counter. Because QUIC
   retransmits stream data and control frames, not packets, it defines
   the end of recovery as a packet sent after the start of recovery
   being acknowledged. This is slightly different from TCP's definition
   of recovery, which ends when the lost packet that started recovery is
   acknowledged.

   The recovery period limits congestion window reduction to once per
   round trip. During recovery, the congestion window remains unchanged
   irrespective of new losses or increases in the ECN-CE counter.

4.5.
Tail Loss Probe

   A TLP packet MUST NOT be blocked by the sender's congestion
   controller. The sender MUST however count these bytes as additional
   bytes-in-flight, since a TLP adds network load without establishing
   packet loss.

   Acknowledgement or loss of a tail loss probe is treated like that of
   any other packet.

4.6. Retransmission Timeout

   When retransmissions are sent due to a retransmission timeout alarm,
   no change is made to the congestion window until the next
   acknowledgement arrives. The retransmission timeout is considered
   spurious when this acknowledgement acknowledges packets sent prior to
   the first retransmission timeout. The retransmission timeout is
   considered valid when this acknowledgement acknowledges no packets
   sent prior to the first retransmission timeout. In this case, the
   congestion window MUST be reduced to the minimum congestion window
   and slow start is re-entered.

4.7. Pacing

   This document does not specify a pacer, but it is RECOMMENDED that a
   sender pace sending of all retransmittable packets based on input
   from the congestion controller. For example, a pacer might
   distribute the congestion window over the SRTT when used with a
   window-based controller, and a pacer might use the rate estimate of a
   rate-based controller.

   An implementation should take care to architect its congestion
   controller to work well with a pacer. For instance, a pacer might
   wrap the congestion controller and control the availability of the
   congestion window, or a pacer might pace out packets handed to it by
   the congestion controller. Timely delivery of ACK frames is
   important for efficient loss recovery. Packets containing only ACK
   frames should therefore not be paced, to avoid delaying their
   delivery to the peer.
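
   As a rough illustration of the window-based case above, a pacer
   that spreads one congestion window over one SRTT could compute its
   send times as sketched below. This is not a pacer specified by this
   document; the function names pacing_interval and earliest_send_time
   and all of their parameters are hypothetical, and times are assumed
   to be in seconds.

```python
def pacing_interval(congestion_window, smoothed_rtt, packet_size):
    """Gap in seconds between packets so that congestion_window bytes
    are spread evenly over one smoothed_rtt."""
    if congestion_window <= 0 or smoothed_rtt <= 0:
        # No usable estimate yet: do not delay sending.
        return 0.0
    pacing_rate = congestion_window / smoothed_rtt  # bytes per second
    return packet_size / pacing_rate


def earliest_send_time(now, last_paced_send, congestion_window,
                       smoothed_rtt, packet_size, is_ack_only=False):
    """Earliest time at which the next packet may leave the sender."""
    if is_ack_only:
        # Packets containing only ACK frames are not paced.
        return now
    interval = pacing_interval(congestion_window, smoothed_rtt,
                               packet_size)
    return max(now, last_paced_send + interval)


if __name__ == "__main__":
    # 10 packets of 1460 bytes per 100ms RTT: roughly one every 10ms.
    print(pacing_interval(14600, 0.1, 1460))
```

   The sketch keeps the pacer separate from the congestion controller
   and only reads the window from it, which corresponds to the second
   arrangement mentioned above (pacing out packets handed over by the
   congestion controller).
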

   As an example of a well-known and publicly available implementation
   of a flow pacer, implementers are referred to the Fair Queue packet
   scheduler (fq qdisc) in Linux (3.11 onwards).

4.8. Pseudocode

4.8.1. Constants of interest

   Constants used in congestion control are based on a combination of
   RFCs, papers, and common practice. Some may need to be changed or
   negotiated in order to better suit a variety of environments.

   kInitialMss (RECOMMENDED 1460 bytes): The max packet size is used
      for calculating initial and minimum congestion windows.

   kInitialWindow (RECOMMENDED 10 * kInitialMss): Limit on the initial
      amount of outstanding data in bytes.

   kMinimumWindow (RECOMMENDED 2 * kInitialMss): Minimum congestion
      window in bytes.

   kLossReductionFactor (RECOMMENDED 0.5): Reduction in congestion
      window when a new loss event is detected.

4.8.2. Variables of interest

   Variables required to implement the congestion control mechanisms are
   described in this section.

   ecn_ce_counter: The highest value reported for the ECN-CE counter by
      the peer in an ACK_ECN frame. This variable is used to detect
      increases in the reported ECN-CE counter.

   bytes_in_flight: The sum of the size in bytes of all sent packets
      that contain at least one retransmittable frame, and have not been
      acked or declared lost. The size does not include IP or UDP
      overhead. Packets only containing ACK frames do not count towards
      bytes_in_flight to ensure congestion control does not impede
      congestion feedback.

   congestion_window: Maximum number of bytes-in-flight that may be
      sent.

   end_of_recovery: The largest packet number sent when QUIC detects a
      loss. When a larger packet is acknowledged, QUIC exits recovery.

   ssthresh: Slow start threshold in bytes.
      When the congestion window is below ssthresh, the mode is slow
      start and the window grows by the number of bytes acknowledged.

4.8.3. Initialization

   At the beginning of the connection, initialize the congestion control
   variables as follows:

      congestion_window = kInitialWindow
      bytes_in_flight = 0
      end_of_recovery = 0
      ssthresh = infinite
      ecn_ce_counter = 0

4.8.4. On Packet Sent

   Whenever a packet is sent, and it contains non-ACK frames, the packet
   increases bytes_in_flight.

      OnPacketSentCC(bytes_sent):
        bytes_in_flight += bytes_sent

4.8.5. On Packet Acknowledgement

   Invoked from loss detection's OnPacketAcked and is supplied with
   acked_packet from sent_packets.

      InRecovery(packet_number):
        return packet_number <= end_of_recovery

      OnPacketAckedCC(acked_packet):
        // Remove from bytes_in_flight.
        bytes_in_flight -= acked_packet.bytes
        if (InRecovery(acked_packet.packet_number)):
          // Do not increase congestion window in recovery period.
          return
        if (congestion_window < ssthresh):
          // Slow start.
          congestion_window += acked_packet.bytes
        else:
          // Congestion avoidance.
          congestion_window +=
            kInitialMss * acked_packet.bytes / congestion_window

4.8.6. On New Congestion Event

   Invoked from ProcessECN and OnPacketsLost when a new congestion event
   is detected. Starts a new recovery period and reduces the congestion
   window.

      CongestionEvent(packet_number):
        // Start a new congestion event if packet_number
        // is larger than the end of the previous recovery epoch.
        if (!InRecovery(packet_number)):
          end_of_recovery = largest_sent_packet
          congestion_window *= kLossReductionFactor
          congestion_window = max(congestion_window, kMinimumWindow)
          // Per Section 4.3, the slow start threshold is set to
          // the new congestion window.
          ssthresh = congestion_window

4.8.7. Process ECN Information

   Invoked when an ACK_ECN frame is received from the peer.

      ProcessECN(ack):
        // If the ECN-CE counter reported by the peer has increased,
        // this could be a new congestion event.
        if (ack.ce_counter > ecn_ce_counter):
          ecn_ce_counter = ack.ce_counter
          // Start a new congestion event if the last acknowledged
          // packet is past the end of the previous recovery epoch.
          CongestionEvent(ack.largest_acked)

4.8.8. On Packets Lost

   Invoked by loss detection from DetectLostPackets when new packets are
   detected lost.

      OnPacketsLost(lost_packets):
        // Remove lost packets from bytes_in_flight.
        for (lost_packet : lost_packets):
          bytes_in_flight -= lost_packet.bytes
        largest_lost_packet = lost_packets.last()

        // Start a new congestion epoch if the last lost packet
        // is past the end of the previous recovery epoch.
        CongestionEvent(largest_lost_packet.packet_number)

4.8.9. On Retransmission Timeout Verified

   QUIC decreases the congestion window to the minimum value once the
   retransmission timeout has been verified.

      OnRetransmissionTimeoutVerified():
        congestion_window = kMinimumWindow

5. IANA Considerations

   This document has no IANA actions. Yet.

6. References

6.1. Normative References

   [QUIC-TRANSPORT]
              Iyengar, J., Ed. and M. Thomson, Ed., "QUIC: A UDP-Based
              Multiplexed and Secure Transport",
              draft-ietf-quic-transport-13 (work in progress),
              June 2018.

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119,
              DOI 10.17487/RFC2119, March 1997.

   [RFC8174]  Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC
              2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174,
              May 2017.

   [RFC8311]  Black, D., "Relaxing Restrictions on Explicit Congestion
              Notification (ECN) Experimentation", RFC 8311,
              DOI 10.17487/RFC8311, January 2018.

6.2.
Informative References

   [RFC3465]  Allman, M., "TCP Congestion Control with Appropriate Byte
              Counting (ABC)", RFC 3465, DOI 10.17487/RFC3465,
              February 2003.

   [RFC4653]  Bhandarkar, S., Reddy, A., Allman, M., and E. Blanton,
              "Improving the Robustness of TCP to Non-Congestion
              Events", RFC 4653, DOI 10.17487/RFC4653, August 2006.

   [RFC5681]  Allman, M., Paxson, V., and E. Blanton, "TCP Congestion
              Control", RFC 5681, DOI 10.17487/RFC5681, September 2009.

   [RFC5682]  Sarolahti, P., Kojo, M., Yamamoto, K., and M. Hata,
              "Forward RTO-Recovery (F-RTO): An Algorithm for Detecting
              Spurious Retransmission Timeouts with TCP", RFC 5682,
              DOI 10.17487/RFC5682, September 2009.

   [RFC5827]  Allman, M., Avrachenkov, K., Ayesta, U., Blanton, J., and
              P. Hurtig, "Early Retransmit for TCP and Stream Control
              Transmission Protocol (SCTP)", RFC 5827,
              DOI 10.17487/RFC5827, May 2010.

   [RFC6298]  Paxson, V., Allman, M., Chu, J., and M. Sargent,
              "Computing TCP's Retransmission Timer", RFC 6298,
              DOI 10.17487/RFC6298, June 2011.

   [RFC6582]  Henderson, T., Floyd, S., Gurtov, A., and Y. Nishida, "The
              NewReno Modification to TCP's Fast Recovery Algorithm",
              RFC 6582, DOI 10.17487/RFC6582, April 2012.

   [RFC6675]  Blanton, E., Allman, M., Wang, L., Jarvinen, I., Kojo, M.,
              and Y. Nishida, "A Conservative Loss Recovery Algorithm
              Based on Selective Acknowledgment (SACK) for TCP",
              RFC 6675, DOI 10.17487/RFC6675, August 2012.

   [TLP]      Dukkipati, N., Cardwell, N., Cheng, Y., and M. Mathis,
              "Tail Loss Probe (TLP): An Algorithm for Fast Recovery of
              Tail Losses", draft-dukkipati-tcpm-tcp-loss-probe-01 (work
              in progress), February 2013.

6.3. URIs

   [1] https://mailarchive.ietf.org/arch/search/?email_list=quic

   [2] https://github.com/quicwg

   [3] https://github.com/quicwg/base-drafts/labels/-recovery

Appendix A.
Change Log

   *RFC Editor's Note:* Please remove this section prior to
   publication of a final version of this document.

A.1. Since draft-ietf-quic-recovery-12

   o  Changes to manage separate packet number spaces and encryption
      levels (#1190, #1242, #1413, #1450)

   o  Added ECN feedback mechanisms and handling; new ACK_ECN frame
      (#804, #805, #1372)

A.2. Since draft-ietf-quic-recovery-11

   No significant changes.

A.3. Since draft-ietf-quic-recovery-10

   o  Improved text on ack generation (#1139, #1159)

   o  Make references to TCP recovery mechanisms informational (#1195)

   o  Define time_of_last_sent_handshake_packet (#1171)

   o  Added signal from TLS that the data it includes needs to be sent
      in a Retry packet (#1061, #1199)

   o  Minimum RTT (min_rtt) is initialized with an infinite value
      (#1169)

A.4. Since draft-ietf-quic-recovery-09

   No significant changes.

A.5. Since draft-ietf-quic-recovery-08

   o  Clarified pacing and RTO (#967, #977)

A.6. Since draft-ietf-quic-recovery-07

   o  Include Ack Delay in RTO (and TLP) computations (#981)

   o  Ack Delay in SRTT computation (#961)

   o  Default RTT and Slow Start (#590)

   o  Many editorial fixes.

A.7. Since draft-ietf-quic-recovery-06

   No significant changes.

A.8. Since draft-ietf-quic-recovery-05

   o  Add more congestion control text (#776)

A.9. Since draft-ietf-quic-recovery-04

   No significant changes.

A.10. Since draft-ietf-quic-recovery-03

   No significant changes.

A.11. Since draft-ietf-quic-recovery-02

   o  Integrate F-RTO (#544, #409)

   o  Add congestion control (#545, #395)

   o  Require connection abort if a skipped packet was acknowledged
      (#415)

   o  Simplify RTO calculations (#142, #417)

A.12.
Since draft-ietf-quic-recovery-01

   o  Overview added to loss detection

   o  Changes initial default RTT to 100ms

   o  Added time-based loss detection and fixes early retransmit

   o  Clarified loss recovery for handshake packets

   o  Fixed references and made TCP references informative

A.13. Since draft-ietf-quic-recovery-00

   o  Improved description of constants and ACK behavior

A.14. Since draft-iyengar-quic-loss-recovery-01

   o  Adopted as base for draft-ietf-quic-recovery

   o  Updated authors/editors list

   o  Added table of contents

Acknowledgments

Authors' Addresses

   Jana Iyengar (editor)
   Fastly

   Email: jri.ietf@gmail.com

   Ian Swett (editor)
   Google

   Email: ianswett@google.com