QUIC                                                     J. Iyengar, Ed.
Internet-Draft                                                    Fastly
Intended status: Standards Track                           I. Swett, Ed.
Expires: 24 August 2020                                           Google
                                                        21 February 2020

               QUIC Loss Detection and Congestion Control
                      draft-ietf-quic-recovery-26

Abstract

   This document describes loss detection and congestion control
   mechanisms for QUIC.

Note to Readers

   Discussion of this draft takes place on the QUIC working group
   mailing list (quic@ietf.org), which is archived at
   https://mailarchive.ietf.org/arch/search/?email_list=quic.

   Working Group information can be found at https://github.com/quicwg;
   source code and issues list for this draft can be found at
   https://github.com/quicwg/base-drafts/labels/-recovery.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF). Note that other groups may also distribute
   working documents as Internet-Drafts. The list of current
   Internet-Drafts is at https://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time. It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on 24 August 2020.

Copyright Notice

   Copyright (c) 2020 IETF Trust and the persons identified as the
   document authors. All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (https://trustee.ietf.org/license-info) in effect on the date of
   publication of this document. Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document. Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1. Introduction
   2. Conventions and Definitions
   3. Design of the QUIC Transmission Machinery
     3.1. Relevant Differences Between QUIC and TCP
       3.1.1. Separate Packet Number Spaces
       3.1.2. Monotonically Increasing Packet Numbers
       3.1.3. Clearer Loss Epoch
       3.1.4. No Reneging
       3.1.5. More ACK Ranges
       3.1.6. Explicit Correction For Delayed Acknowledgements
   4. Estimating the Round-Trip Time
     4.1. Generating RTT samples
     4.2. Estimating min_rtt
     4.3. Estimating smoothed_rtt and rttvar
   5. Loss Detection
     5.1. Acknowledgement-based Detection
       5.1.1. Packet Threshold
       5.1.2. Time Threshold
     5.2. Probe Timeout
       5.2.1. Computing PTO
     5.3. Handshakes and New Paths
       5.3.1. Sending Probe Packets
       5.3.2. Loss Detection
     5.4. Handling Retry Packets
     5.5. Discarding Keys and Packet State
   6. Congestion Control
     6.1. Explicit Congestion Notification
     6.2. Slow Start
     6.3. Congestion Avoidance
     6.4. Recovery Period
     6.5. Ignoring Loss of Undecryptable Packets
     6.6. Probe Timeout
     6.7. Persistent Congestion
     6.8. Pacing
     6.9. Under-utilizing the Congestion Window
   7. Security Considerations
     7.1. Congestion Signals
     7.2. Traffic Analysis
     7.3. Misreporting ECN Markings
   8. IANA Considerations
   9. References
     9.1. Normative References
     9.2. Informative References
   Appendix A. Loss Recovery Pseudocode
     A.1. Tracking Sent Packets
       A.1.1. Sent Packet Fields
     A.2. Constants of interest
     A.3. Variables of interest
     A.4. Initialization
     A.5. On Sending a Packet
     A.6. On Receiving an Acknowledgment
     A.7. On Packet Acknowledgment
     A.8. Setting the Loss Detection Timer
     A.9. On Timeout
     A.10. Detecting Lost Packets
   Appendix B. Congestion Control Pseudocode
     B.1. Constants of interest
     B.2. Variables of interest
     B.3. Initialization
     B.4. On Packet Sent
     B.5. On Packet Acknowledgement
     B.6. On New Congestion Event
     B.7. Process ECN Information
     B.8. On Packets Lost
   Appendix C. Change Log
     C.1. Since draft-ietf-quic-recovery-25
     C.2. Since draft-ietf-quic-recovery-24
     C.3. Since draft-ietf-quic-recovery-23
     C.4. Since draft-ietf-quic-recovery-22
     C.5. Since draft-ietf-quic-recovery-21
     C.6. Since draft-ietf-quic-recovery-20
     C.7. Since draft-ietf-quic-recovery-19
     C.8. Since draft-ietf-quic-recovery-18
     C.9. Since draft-ietf-quic-recovery-17
     C.10. Since draft-ietf-quic-recovery-16
     C.11. Since draft-ietf-quic-recovery-14
     C.12. Since draft-ietf-quic-recovery-13
     C.13. Since draft-ietf-quic-recovery-12
     C.14. Since draft-ietf-quic-recovery-11
     C.15. Since draft-ietf-quic-recovery-10
     C.16. Since draft-ietf-quic-recovery-09
     C.17. Since draft-ietf-quic-recovery-08
     C.18. Since draft-ietf-quic-recovery-07
     C.19. Since draft-ietf-quic-recovery-06
     C.20. Since draft-ietf-quic-recovery-05
     C.21. Since draft-ietf-quic-recovery-04
     C.22. Since draft-ietf-quic-recovery-03
     C.23. Since draft-ietf-quic-recovery-02
     C.24. Since draft-ietf-quic-recovery-01
     C.25. Since draft-ietf-quic-recovery-00
     C.26. Since draft-iyengar-quic-loss-recovery-01
   Appendix D. Contributors
   Acknowledgments
   Authors' Addresses

1. Introduction

   QUIC is a new multiplexed and secure transport atop UDP. QUIC builds
   on decades of transport and security experience, and implements
   mechanisms that make it attractive as a modern general-purpose
   transport. The QUIC protocol is described in [QUIC-TRANSPORT].

   QUIC implements the spirit of existing TCP congestion control and
   loss recovery mechanisms, described in RFCs, various Internet-drafts,
   and also those prevalent in the Linux TCP implementation. This
   document describes QUIC congestion control and loss recovery, and
   where applicable, attributes the TCP equivalent in RFCs,
   Internet-drafts, academic papers, and/or TCP implementations.

2. Conventions and Definitions

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and
   "OPTIONAL" in this document are to be interpreted as described in BCP
   14 [RFC2119] [RFC8174] when, and only when, they appear in all
   capitals, as shown here.

   Definitions of terms that are used in this document:

   Ack-eliciting Frames: All frames other than ACK, PADDING, and
      CONNECTION_CLOSE are considered ack-eliciting.

   Ack-eliciting Packets: Packets that contain ack-eliciting frames
      elicit an ACK from the receiver within the maximum ack delay and
      are called ack-eliciting packets.

   In-flight: Packets are considered in-flight when they are
      ack-eliciting or contain a PADDING frame, and they have been sent
      but are not acknowledged, declared lost, or abandoned along with
      old keys.

3. Design of the QUIC Transmission Machinery

   All transmissions in QUIC are sent with a packet-level header, which
   indicates the encryption level and includes a packet sequence number
   (referred to below as a packet number). The encryption level
   indicates the packet number space, as described in [QUIC-TRANSPORT].
   Packet numbers never repeat within a packet number space for the
   lifetime of a connection. Packet numbers are sent in monotonically
   increasing order within a space, preventing ambiguity.

   This design obviates the need for disambiguating between
   transmissions and retransmissions and eliminates significant
   complexity from QUIC's interpretation of TCP loss detection
   mechanisms.

   QUIC packets can contain multiple frames of different types. The
   recovery mechanisms ensure that data and frames that need reliable
   delivery are acknowledged or declared lost and sent in new packets as
   necessary.
   The types of frames contained in a packet affect recovery
   and congestion control logic:

   *  All packets are acknowledged, though packets that contain no
      ack-eliciting frames are only acknowledged along with
      ack-eliciting packets.

   *  Long header packets that contain CRYPTO frames are critical to the
      performance of the QUIC handshake and use shorter timers for
      acknowledgement.

   *  Packets containing frames besides ACK or CONNECTION_CLOSE frames
      count toward congestion control limits and are considered
      in-flight.

   *  PADDING frames cause packets to contribute toward bytes in flight
      without directly causing an acknowledgment to be sent.

3.1. Relevant Differences Between QUIC and TCP

   Readers familiar with TCP's loss detection and congestion control
   will find algorithms here that parallel well-known TCP ones.
   Protocol differences between QUIC and TCP, however, contribute to
   algorithmic differences. We briefly describe these protocol
   differences below.

3.1.1. Separate Packet Number Spaces

   QUIC uses separate packet number spaces for each encryption level,
   except that 0-RTT and all generations of 1-RTT keys use the same
   packet number space. Separate packet number spaces ensure that
   acknowledgement of packets sent with one level of encryption will not
   cause spurious retransmission of packets sent with a different
   encryption level. Congestion control and round-trip time (RTT)
   measurement are unified across packet number spaces.

3.1.2. Monotonically Increasing Packet Numbers

   TCP conflates transmission order at the sender with delivery order at
   the receiver, which results in retransmissions of the same data
   carrying the same sequence number, and consequently leads to
   "retransmission ambiguity". QUIC separates the two. QUIC uses a
   packet number to indicate transmission order.
   Application data is
   sent in one or more streams and delivery order is determined by
   stream offsets encoded within STREAM frames.

   QUIC's packet number is strictly increasing within a packet number
   space, and directly encodes transmission order. A higher packet
   number signifies that the packet was sent later, and a lower packet
   number signifies that the packet was sent earlier. When a packet
   containing ack-eliciting frames is detected lost, QUIC rebundles
   necessary frames in a new packet with a new packet number, removing
   ambiguity about which packet is acknowledged when an ACK is received.
   Consequently, more accurate RTT measurements can be made, spurious
   retransmissions are trivially detected, and mechanisms such as Fast
   Retransmit can be applied universally, based only on packet number.

   This design point significantly simplifies loss detection mechanisms
   for QUIC. Most TCP mechanisms implicitly attempt to infer
   transmission ordering based on TCP sequence numbers - a non-trivial
   task, especially when TCP timestamps are not available.

3.1.3. Clearer Loss Epoch

   QUIC starts a loss epoch when a packet is lost and ends one when any
   packet sent after the epoch starts is acknowledged. TCP waits for
   the gap in the sequence number space to be filled, and so if a
   segment is lost multiple times in a row, the loss epoch may not end
   for several round trips. Because both should reduce their congestion
   windows only once per epoch, QUIC will do it once for every round
   trip that experiences loss, while TCP may only do it once across
   multiple round trips.

3.1.4. No Reneging

   QUIC ACKs contain information that is similar to TCP SACK, but QUIC
   does not allow any acked packet to be reneged, greatly simplifying
   implementations on both sides and reducing memory pressure on the
   sender.

3.1.5. More ACK Ranges

   QUIC supports many ACK ranges, as opposed to TCP's 3 SACK ranges. In
   high loss environments, this speeds recovery, reduces spurious
   retransmits, and ensures forward progress without relying on
   timeouts.

3.1.6. Explicit Correction For Delayed Acknowledgements

   QUIC endpoints measure the delay incurred between when a packet is
   received and when the corresponding acknowledgment is sent, allowing
   a peer to maintain a more accurate round-trip time estimate (see
   Section 13.2 of [QUIC-TRANSPORT]).

4. Estimating the Round-Trip Time

   At a high level, an endpoint measures the time from when a packet was
   sent to when it is acknowledged as a round-trip time (RTT) sample.
   The endpoint uses RTT samples and peer-reported host delays (see
   Section 13.2 of [QUIC-TRANSPORT]) to generate a statistical
   description of the network path's RTT. An endpoint computes the
   following three values for each path: the minimum value observed over
   the lifetime of the path (min_rtt), an exponentially-weighted moving
   average (smoothed_rtt), and the mean deviation (referred to as
   "variation" in the rest of this document) in the observed RTT samples
   (rttvar).

4.1. Generating RTT samples

   An endpoint generates an RTT sample on receiving an ACK frame that
   meets the following two conditions:

   *  the largest acknowledged packet number is newly acknowledged, and

   *  at least one of the newly acknowledged packets was ack-eliciting.

   The RTT sample, latest_rtt, is generated as the time elapsed since
   the largest acknowledged packet was sent:

   latest_rtt = ack_time - send_time_of_largest_acked

   An RTT sample is generated using only the largest acknowledged packet
   in the received ACK frame. This is because a peer reports ACK delays
   for only the largest acknowledged packet in an ACK frame.
   While the
   reported ACK delay is not used by the RTT sample measurement, it is
   used to adjust the RTT sample in subsequent computations of
   smoothed_rtt and rttvar (see Section 4.3).

   To avoid generating multiple RTT samples for a single packet, an ACK
   frame SHOULD NOT be used to update RTT estimates if it does not newly
   acknowledge the largest acknowledged packet.

   An RTT sample MUST NOT be generated on receiving an ACK frame that
   does not newly acknowledge at least one ack-eliciting packet. A peer
   usually does not send an ACK frame when only non-ack-eliciting
   packets are received. Therefore an ACK frame that contains
   acknowledgements for only non-ack-eliciting packets could include an
   arbitrarily large Ack Delay value. Ignoring such ACK frames avoids
   complications in subsequent smoothed_rtt and rttvar computations.

   A sender might generate multiple RTT samples per RTT when multiple
   ACK frames are received within an RTT. As suggested in [RFC6298],
   doing so might result in inadequate history in smoothed_rtt and
   rttvar. Ensuring that RTT estimates retain sufficient history is an
   open research question.

4.2. Estimating min_rtt

   min_rtt is the minimum RTT observed for a given network path.
   min_rtt is set to the latest_rtt on the first RTT sample, and to the
   lesser of min_rtt and latest_rtt on subsequent samples. In this
   document, min_rtt is used by loss detection to reject implausibly
   small RTT samples.

   An endpoint uses only locally observed times in computing the min_rtt
   and does not adjust for ACK delays reported by the peer. Doing so
   allows the endpoint to set a lower bound for the smoothed_rtt based
   entirely on what it observes (see Section 4.3), and limits potential
   underestimation due to erroneously-reported delays by the peer.

   The RTT for a network path may change over time.
   If a path's actual
   RTT decreases, the min_rtt will adapt immediately on the first low
   sample. If the path's actual RTT increases, the min_rtt will not
   adapt to it, allowing future RTT samples that are smaller than the
   new RTT to be included in smoothed_rtt.

4.3. Estimating smoothed_rtt and rttvar

   smoothed_rtt is an exponentially-weighted moving average of an
   endpoint's RTT samples, and rttvar is the variation in the RTT
   samples, estimated using a mean variation.

   The calculation of smoothed_rtt uses path latency after adjusting RTT
   samples for acknowledgement delays. These delays are computed using
   the ACK Delay field of the ACK frame as described in Section 19.3 of
   [QUIC-TRANSPORT]. For packets sent in the ApplicationData packet
   number space, a peer limits any delay in sending an acknowledgement
   for an ack-eliciting packet to no greater than the value it
   advertised in the max_ack_delay transport parameter. Consequently,
   when a peer reports an Ack Delay that is greater than its
   max_ack_delay, the delay is attributed to reasons out of the peer's
   control, such as scheduler latency at the peer or loss of previous
   ACK frames. Any delays beyond the peer's max_ack_delay are therefore
   considered effectively part of path delay and incorporated into the
   smoothed_rtt estimate.

   When adjusting an RTT sample using peer-reported acknowledgement
   delays, an endpoint:

   *  MUST ignore the Ack Delay field of the ACK frame for packets sent
      in the Initial and Handshake packet number spaces.

   *  MUST use the lesser of the value reported in the Ack Delay field
      of the ACK frame and the peer's max_ack_delay transport parameter.

   *  MUST NOT apply the adjustment if the resulting RTT sample is
      smaller than the min_rtt. This limits the underestimation that a
      misreporting peer can cause to the smoothed_rtt.
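
   The three rules above can be sketched as a single function. This is
   an illustrative sketch only, not normative text: the function name,
   the Space enumeration, and the use of integer milliseconds are
   assumptions made for the example. The comparison is strict so that
   the adjustment is never applied when it would take the sample below
   min_rtt.

```python
from enum import Enum


class Space(Enum):
    """Packet number spaces named in this document."""
    INITIAL = "Initial"
    HANDSHAKE = "Handshake"
    APPLICATION_DATA = "ApplicationData"


def adjust_rtt_sample(latest_rtt_ms, ack_delay_ms, max_ack_delay_ms,
                      min_rtt_ms, space):
    """Return the RTT sample after applying the peer-reported delay.

    All times are integer milliseconds (an assumption of this sketch).
    """
    # Rule 1: ignore Ack Delay for Initial and Handshake packets.
    if space in (Space.INITIAL, Space.HANDSHAKE):
        return latest_rtt_ms
    # Rule 2: use the lesser of the reported delay and max_ack_delay.
    ack_delay_ms = min(ack_delay_ms, max_ack_delay_ms)
    # Rule 3: adjust only if the result stays strictly above min_rtt.
    if latest_rtt_ms > min_rtt_ms + ack_delay_ms:
        return latest_rtt_ms - ack_delay_ms
    return latest_rtt_ms
```

   A reported delay larger than max_ack_delay is clamped before it is
   subtracted, so the excess stays in the sample and is treated as path
   delay.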

   On the first RTT sample for a network path, the smoothed_rtt is set
   to the latest_rtt.

   smoothed_rtt and rttvar are computed as follows, similar to
   [RFC6298]. On the first RTT sample for a network path:

   smoothed_rtt = latest_rtt
   rttvar = latest_rtt / 2

   On subsequent RTT samples, smoothed_rtt and rttvar evolve as follows:

   ack_delay = min(Ack Delay in ACK Frame, max_ack_delay)
   adjusted_rtt = latest_rtt
   if (min_rtt + ack_delay < latest_rtt):
     adjusted_rtt = latest_rtt - ack_delay
   smoothed_rtt = 7/8 * smoothed_rtt + 1/8 * adjusted_rtt
   rttvar_sample = abs(smoothed_rtt - adjusted_rtt)
   rttvar = 3/4 * rttvar + 1/4 * rttvar_sample

5. Loss Detection

   QUIC senders use acknowledgements to detect lost packets, and a probe
   timeout (see Section 5.2) to ensure acknowledgements are received.
   This section provides a description of these algorithms.

   If a packet is lost, the QUIC transport needs to recover from that
   loss, such as by retransmitting the data, sending an updated frame,
   or abandoning the frame. For more information, see Section 13.3 of
   [QUIC-TRANSPORT].

5.1. Acknowledgement-based Detection

   Acknowledgement-based loss detection implements the spirit of TCP's
   Fast Retransmit [RFC5681], Early Retransmit [RFC5827], FACK [FACK],
   SACK loss recovery [RFC6675], and RACK [RACK]. This section provides
   an overview of how these algorithms are implemented in QUIC.

   A packet is declared lost if it meets all the following conditions:

   *  The packet is unacknowledged, in-flight, and was sent prior to an
      acknowledged packet.

   *  Either its packet number is at least kPacketThreshold smaller than
      that of an acknowledged packet (Section 5.1.1), or it was sent
      long enough in the past (Section 5.1.2).

   The acknowledgement indicates that a packet sent later was delivered,
   and the packet and time thresholds provide some tolerance for packet
   reordering.
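
   The two conditions can be read as one predicate over a sent packet.
   The following is a minimal sketch, not the pseudocode of this
   document: the SentPacket record, the is_lost name, and the integer
   millisecond clock are assumptions made for illustration. The caller
   supplies the packet threshold and the time threshold (loss_delay_ms)
   described in the following sections.

```python
from dataclasses import dataclass


@dataclass
class SentPacket:
    """Hypothetical per-packet record kept by a sender."""
    packet_number: int
    time_sent_ms: int      # send time on a monotonic clock
    in_flight: bool
    acked: bool = False


def is_lost(pkt, largest_acked, now_ms, loss_delay_ms, packet_threshold):
    """Return True if pkt meets both conditions for being declared lost."""
    # Condition 1: unacknowledged, in-flight, and sent before a packet
    # that has been acknowledged. Packet numbers only increase within a
    # space, so a smaller number means the packet was sent earlier.
    if pkt.acked or not pkt.in_flight:
        return False
    if pkt.packet_number >= largest_acked:
        return False
    # Condition 2: either the packet threshold or the time threshold.
    by_packet = largest_acked - pkt.packet_number >= packet_threshold
    by_time = now_ms - pkt.time_sent_ms >= loss_delay_ms
    return by_packet or by_time
```

   Both thresholds are evaluated only after a later packet has been
   acknowledged, which is what gives the sender evidence that the
   earlier packet could have arrived by now.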

   Spuriously declaring packets as lost leads to unnecessary
   retransmissions and may result in degraded performance due to the
   actions of the congestion controller upon detecting loss.
   Implementations that detect spurious retransmissions and increase the
   reordering threshold in packets or time MAY choose to start with
   smaller initial reordering thresholds to minimize recovery latency.

5.1.1. Packet Threshold

   The RECOMMENDED initial value for the packet reordering threshold
   (kPacketThreshold) is 3, based on best practices for TCP loss
   detection [RFC5681] [RFC6675]. Implementations SHOULD NOT use a
   packet threshold less than 3, to keep in line with TCP [RFC5681].

   Some networks may exhibit higher degrees of reordering, causing a
   sender to detect spurious losses. Implementers MAY use algorithms
   developed for TCP, such as TCP-NCR [RFC4653], to improve QUIC's
   reordering resilience.

5.1.2. Time Threshold

   Once a later packet within the same packet number space has been
   acknowledged, an endpoint SHOULD declare an earlier packet lost if it
   was sent a threshold amount of time in the past. To avoid declaring
   packets as lost too early, this time threshold MUST be set to at
   least kGranularity. The time threshold is:

   max(kTimeThreshold * max(smoothed_rtt, latest_rtt), kGranularity)

   If packets sent prior to the largest acknowledged packet cannot yet
   be declared lost, then a timer SHOULD be set for the remaining time.

   Using max(smoothed_rtt, latest_rtt) protects against the following
   two cases:

   *  the latest RTT sample is lower than the smoothed RTT, perhaps due
      to reordering where the acknowledgement encountered a shorter
      path;

   *  the latest RTT sample is higher than the smoothed RTT, perhaps due
      to a sustained increase in the actual RTT, but the smoothed RTT
      has not yet caught up.
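
   As a worked illustration of the time threshold, the sketch below
   computes the threshold and the time remaining before an earlier
   packet can be declared lost. It is not normative. The function names
   are hypothetical, the 9/8 multiplier is the value this section
   recommends for kTimeThreshold, and the 1 ms value used for
   kGranularity is an assumption of the example. Exact fractions are
   used so that the results are not subject to floating-point rounding.

```python
from fractions import Fraction

K_TIME_THRESHOLD = Fraction(9, 8)  # RTT multiplier recommended in 5.1.2
K_GRANULARITY_MS = 1               # assumed timer granularity for this sketch


def time_threshold_ms(smoothed_rtt_ms, latest_rtt_ms):
    """Time threshold: max(kTimeThreshold * max(srtt, latest), kGranularity)."""
    rtt = max(smoothed_rtt_ms, latest_rtt_ms)
    return max(K_TIME_THRESHOLD * rtt, K_GRANULARITY_MS)


def remaining_before_lost_ms(time_sent_ms, now_ms, smoothed_rtt_ms,
                             latest_rtt_ms):
    """Time to wait before a packet sent at time_sent_ms can be declared lost.

    A result of 0 means the time threshold has already been reached.
    """
    deadline = time_sent_ms + time_threshold_ms(smoothed_rtt_ms,
                                                latest_rtt_ms)
    return max(deadline - now_ms, 0)
```

   With smoothed_rtt at 80 ms and latest_rtt at 100 ms, the larger value
   is used, giving a threshold of 112.5 ms. A packet sent 100 ms ago
   would therefore need a timer for the remaining 12.5 ms.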

   The RECOMMENDED time threshold (kTimeThreshold), expressed as a
   round-trip time multiplier, is 9/8.

   Implementations MAY experiment with absolute thresholds, thresholds
   from previous connections, adaptive thresholds, or including RTT
   variation. Smaller thresholds reduce reordering resilience and
   increase spurious retransmissions, and larger thresholds increase
   loss detection delay.

5.2. Probe Timeout

   A Probe Timeout (PTO) triggers sending one or two probe datagrams
   when ack-eliciting packets are not acknowledged within the expected
   period of time or the handshake has not been completed. A PTO
   enables a connection to recover from loss of tail packets or
   acknowledgements.

   As with loss detection, the probe timeout is per packet number space.
   The PTO algorithm used in QUIC implements the reliability functions
   of Tail Loss Probe [RACK], RTO [RFC5681], and F-RTO algorithms for
   TCP [RFC5682]. The timeout computation is based on TCP's
   retransmission timeout period [RFC6298].

5.2.1. Computing PTO

   When an ack-eliciting packet is transmitted, the sender schedules a
   timer for the PTO period as follows:

   PTO = smoothed_rtt + max(4*rttvar, kGranularity) + max_ack_delay

   kGranularity, smoothed_rtt, rttvar, and max_ack_delay are defined in
   Appendix A.2 and Appendix A.3.

   The PTO period is the amount of time that a sender ought to wait for
   an acknowledgement of a sent packet. This time period includes the
   estimated network round-trip time (smoothed_rtt), the variation in
   the estimate (4*rttvar), and max_ack_delay, to account for the
   maximum time by which a receiver might delay sending an
   acknowledgement. When the PTO is armed for Initial or Handshake
   packet number spaces, the max_ack_delay is 0, as specified in
   Section 13.2.1 of [QUIC-TRANSPORT].

   The PTO value MUST be set to at least kGranularity, to avoid the
   timer expiring immediately.
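
   The PTO formula can be sketched directly. The function below is an
   illustration rather than normative text: its name, the string labels
   for packet number spaces, the integer millisecond units, and the
   1 ms default for kGranularity are assumptions made for the example.

```python
def pto_period_ms(smoothed_rtt_ms, rttvar_ms, max_ack_delay_ms, space,
                  granularity_ms=1):
    """PTO = smoothed_rtt + max(4*rttvar, kGranularity) + max_ack_delay.

    space is one of "initial", "handshake", or "application"; these
    labels are hypothetical names chosen for this sketch.
    """
    # max_ack_delay is treated as 0 for Initial and Handshake spaces.
    if space in ("initial", "handshake"):
        max_ack_delay_ms = 0
    return (smoothed_rtt_ms
            + max(4 * rttvar_ms, granularity_ms)
            + max_ack_delay_ms)
```

   For smoothed_rtt = 100 ms, rttvar = 10 ms, and max_ack_delay = 25 ms,
   the period is 165 ms in the ApplicationData space and 140 ms in the
   Initial or Handshake space. When rttvar is 0, the granularity term
   keeps the period above smoothed_rtt + max_ack_delay.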

   A sender computes its PTO timer every time an ack-eliciting packet is
   sent. When ack-eliciting packets are in-flight in multiple packet
   number spaces, the timer MUST be set for the packet number space with
   the earliest timeout, except for ApplicationData, which MUST be
   ignored until the handshake completes; see Section 4.1.1 of
   [QUIC-TLS]. Not arming the PTO for ApplicationData prioritizes
   completing the handshake and prevents the server from sending a 1-RTT
   packet on a PTO before it has the keys to process a 1-RTT packet.

   When a PTO timer expires, the PTO period MUST be set to twice its
   current value. This exponential reduction in the sender's rate is
   important because consecutive PTOs might be caused by loss of packets
   or acknowledgements due to severe congestion. Even when there are
   ack-eliciting packets in-flight in multiple packet number spaces, the
   exponential increase in probe timeout occurs across all spaces to
   prevent excess load on the network. For example, a timeout in the
   Initial packet number space doubles the length of the timeout in the
   Handshake packet number space.

   The life of a connection that is experiencing consecutive PTOs is
   limited by the endpoint's idle timeout.

   The probe timer MUST NOT be set if the time threshold (Section 5.1.2)
   loss detection timer is set. The time threshold loss detection timer
   is expected to both expire earlier than the PTO and be less likely to
   spuriously retransmit data.

5.3. Handshakes and New Paths

   The initial probe timeout for a new connection or new path SHOULD be
   set to twice the initial RTT. Resumed connections over the same
   network SHOULD use the previous connection's final smoothed RTT value
   as the resumed connection's initial RTT. If no previous RTT is
   available, the initial RTT SHOULD be set to 500ms, resulting in a 1
   second initial timeout as recommended in [RFC6298].
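
   The doubling on each expiry and the choice of the initial timeout can
   be sketched as follows. This is an illustration under stated
   assumptions, not normative text: the function names and the pto_count
   counter (the number of consecutive PTO expirations, shared across
   packet number spaces) are hypothetical, and times are integer
   milliseconds.

```python
K_INITIAL_RTT_MS = 500  # default initial RTT when no previous RTT is known


def initial_pto_ms(previous_smoothed_rtt_ms=None):
    """Initial probe timeout: twice the initial RTT.

    A resumed connection over the same network passes the previous
    connection's final smoothed RTT; otherwise the default is used.
    """
    if previous_smoothed_rtt_ms is None:
        initial_rtt_ms = K_INITIAL_RTT_MS
    else:
        initial_rtt_ms = previous_smoothed_rtt_ms
    return 2 * initial_rtt_ms


def backed_off_pto_ms(base_pto_ms, pto_count):
    """PTO period after pto_count consecutive expirations.

    Each expiration doubles the period, so the result is the base
    period multiplied by 2 ** pto_count.
    """
    return base_pto_ms * (2 ** pto_count)
```

   With no previous RTT, the initial timeout is 1000 ms. A base period
   of 165 ms grows to 330 ms, 660 ms, and 1320 ms after one, two, and
   three consecutive expirations.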

   A connection MAY use the delay between sending a PATH_CHALLENGE and
   receiving a PATH_RESPONSE to set the initial RTT (see kInitialRtt in
   Appendix A.2) for a new path, but the delay SHOULD NOT be considered
   an RTT sample.

   Until the server has validated the client's address on the path, the
   amount of data it can send is limited to three times the amount of
   data received, as specified in Section 8.1 of [QUIC-TRANSPORT]. If
   no data can be sent, then the PTO timer MUST NOT be armed until
   datagrams have been received from the client.

   Since the server could be blocked until more packets are received
   from the client, it is the client's responsibility to send packets to
   unblock the server until it is certain that the server has finished
   its address validation (see Section 8 of [QUIC-TRANSPORT]). That is,
   the client MUST set the probe timer if the client has not received an
   acknowledgement for one of its Handshake or 1-RTT packets.

   Prior to handshake completion, when few or no RTT samples have been
   generated, it is possible that the probe timer expiration is due to
   an incorrect RTT estimate at the client. To allow the client to
   improve its RTT estimate, the new packet that it sends MUST be
   ack-eliciting. If Handshake keys are available to the client, it MUST
   send a Handshake packet, and otherwise it MUST send an Initial packet
   in a UDP datagram of at least 1200 bytes.

   Initial packets and Handshake packets might never be acknowledged,
   but they are removed from bytes in flight when the Initial and
   Handshake keys are discarded.

5.3.1. Sending Probe Packets

   When a PTO timer expires, a sender MUST send at least one
   ack-eliciting packet in the packet number space as a probe, unless
   there is no data available to send.
An endpoint MAY send up to two full- 617 sized datagrams containing ack-eliciting packets, to avoid an 618 expensive consecutive PTO expiration due to a single lost datagram or 619 transmit data from multiple packet number spaces. 621 In addition to sending data in the packet number space for which the 622 timer expired, the sender SHOULD send ack-eliciting packets from 623 other packet number spaces with in-flight data, coalescing packets if 624 possible. 626 When the PTO timer expires, and there is new or previously sent 627 unacknowledged data, it MUST be sent. 629 It is possible the sender has no new or previously-sent data to send. 630 As an example, consider the following sequence of events: new 631 application data is sent in a STREAM frame, deemed lost, then 632 retransmitted in a new packet, and then the original transmission is 633 acknowledged. When there is no data to send, the sender SHOULD send 634 a PING or other ack-eliciting frame in a single packet, re-arming the 635 PTO timer. 637 Alternatively, instead of sending an ack-eliciting packet, the sender 638 MAY mark any packets still in flight as lost. Doing so avoids 639 sending an additional packet, but increases the risk that loss is 640 declared too aggressively, resulting in an unnecessary rate reduction 641 by the congestion controller. 643 Consecutive PTO periods increase exponentially, and as a result, 644 connection recovery latency increases exponentially as packets 645 continue to be dropped in the network. Sending two packets on PTO 646 expiration increases resilience to packet drops, thus reducing the 647 probability of consecutive PTO events. 649 Probe packets sent on a PTO MUST be ack-eliciting. A probe packet 650 SHOULD carry new data when possible. A probe packet MAY carry 651 retransmitted unacknowledged data when new data is unavailable, when 652 flow control does not permit new data to be sent, or to 653 opportunistically reduce loss recovery delay. 
Implementations MAY 654 use alternative strategies for determining the content of probe 655 packets, including sending new or retransmitted data based on the 656 application's priorities. 658 When the PTO timer expires multiple times and new data cannot be 659 sent, implementations must choose between sending the same payload 660 every time or sending different payloads. Sending the same payload 661 may be simpler and ensures the highest priority frames arrive first. 662 Sending different payloads each time reduces the chances of spurious 663 retransmission. 665 5.3.2. Loss Detection 667 Delivery or loss of packets in flight is established when an ACK 668 frame is received that newly acknowledges one or more packets. 670 A PTO timer expiration event does not indicate packet loss and MUST 671 NOT cause prior unacknowledged packets to be marked as lost. When an 672 acknowledgement is received that newly acknowledges packets, loss 673 detection proceeds as dictated by packet and time threshold 674 mechanisms; see Section 5.1. 676 5.4. Handling Retry Packets 678 A Retry packet causes a client to send another Initial packet, 679 effectively restarting the connection process. A Retry packet 680 indicates that the Initial was received, but not processed. A Retry 681 packet cannot be treated as an acknowledgment, because it does not 682 indicate that a packet was processed or specify the packet number. 684 Clients that receive a Retry packet reset congestion control and loss 685 recovery state, including resetting any pending timers. Other 686 connection state, in particular cryptographic handshake messages, is 687 retained; see Section 17.2.5 of [QUIC-TRANSPORT]. 689 The client MAY compute an RTT estimate to the server as the time 690 period from when the first Initial was sent to when a Retry or a 691 Version Negotiation packet is received. The client MAY use this 692 value in place of its default for the initial RTT estimate. 694 5.5. 
Discarding Keys and Packet State 696 When packet protection keys are discarded (see Section 4.10 of 697 [QUIC-TLS]), all packets that were sent with those keys can no longer 698 be acknowledged because their acknowledgements cannot be processed 699 anymore. The sender MUST discard all recovery state associated with 700 those packets and MUST remove them from the count of bytes in flight. 702 Endpoints stop sending and receiving Initial packets once they start 703 exchanging Handshake packets (see Section 17.2.2.1 of 704 [QUIC-TRANSPORT]). At this point, recovery state for all in-flight 705 Initial packets is discarded. 707 When 0-RTT is rejected, recovery state for all in-flight 0-RTT 708 packets is discarded. 710 If a server accepts 0-RTT, but does not buffer 0-RTT packets that 711 arrive before Initial packets, early 0-RTT packets will be declared 712 lost, but that is expected to be infrequent. 714 It is expected that keys are discarded after packets encrypted with 715 them would be acknowledged or declared lost. Initial secrets however 716 might be destroyed sooner, as soon as handshake keys are available 717 (see Section 4.10.1 of [QUIC-TLS]). 719 6. Congestion Control 721 This document specifies a Reno congestion controller for QUIC 722 [RFC6582]. 724 The signals QUIC provides for congestion control are generic and are 725 designed to support different algorithms. Endpoints can unilaterally 726 choose a different algorithm to use, such as Cubic [RFC8312]. 728 If an endpoint uses a different controller than that specified in 729 this document, the chosen controller MUST conform to the congestion 730 control guidelines specified in Section 3.1 of [RFC8085]. 732 The algorithm in this document specifies and uses the controller's 733 congestion window in bytes. 
735 An endpoint MUST NOT send a packet if it would cause bytes_in_flight 736 (see Appendix B.2) to be larger than the congestion window, unless 737 the packet is sent on a PTO timer expiration (see Section 5.2). 739 6.1. Explicit Congestion Notification 741 If a path has been verified to support ECN [RFC3168] [RFC8311], QUIC 742 treats a Congestion Experienced(CE) codepoint in the IP header as a 743 signal of congestion. This document specifies an endpoint's response 744 when its peer receives packets with the Congestion Experienced 745 codepoint. 747 6.2. Slow Start 749 QUIC begins every connection in slow start and exits slow start upon 750 loss or upon increase in the ECN-CE counter. QUIC re-enters slow 751 start any time the congestion window is less than ssthresh, which 752 only occurs after persistent congestion is declared. While in slow 753 start, QUIC increases the congestion window by the number of bytes 754 acknowledged when each acknowledgment is processed. 756 6.3. Congestion Avoidance 758 Slow start exits to congestion avoidance. Congestion avoidance in 759 NewReno uses an additive increase multiplicative decrease (AIMD) 760 approach that increases the congestion window by one maximum packet 761 size per congestion window acknowledged. When a loss is detected, 762 NewReno halves the congestion window and sets the slow start 763 threshold to the new congestion window. 765 6.4. Recovery Period 767 A recovery period is entered when loss or ECN-CE marking of a packet 768 is detected. A recovery period ends when a packet sent during the 769 recovery period is acknowledged. This is slightly different from 770 TCP's definition of recovery, which ends when the lost packet that 771 started recovery is acknowledged. 773 The recovery period limits congestion window reduction to once per 774 round trip. During recovery, the congestion window remains unchanged 775 irrespective of new losses or increases in the ECN-CE counter. 777 6.5. 
Ignoring Loss of Undecryptable Packets 779 During the handshake, some packet protection keys might not be 780 available when a packet arrives. In particular, Handshake and 0-RTT 781 packets cannot be processed until the Initial packets arrive, and 782 1-RTT packets cannot be processed until the handshake completes. 783 Endpoints MAY ignore the loss of Handshake, 0-RTT, and 1-RTT packets 784 that might arrive before the peer has packet protection keys to 785 process those packets. 787 6.6. Probe Timeout 789 Probe packets MUST NOT be blocked by the congestion controller. A 790 sender MUST however count these packets as being additionally in 791 flight, since these packets add network load without establishing 792 packet loss. Note that sending probe packets might cause the 793 sender's bytes in flight to exceed the congestion window until an 794 acknowledgement is received that establishes loss or delivery of 795 packets. 797 6.7. Persistent Congestion 799 When an ACK frame is received that establishes loss of all in-flight 800 packets sent over a long enough period of time, the network is 801 considered to be experiencing persistent congestion. Commonly, this 802 can be established by consecutive PTOs, but since the PTO timer is 803 reset when a new ack-eliciting packet is sent, an explicit duration 804 must be used to account for those cases where PTOs do not occur or 805 are substantially delayed. 
This duration is computed as follows:

807     (smoothed_rtt + 4 * rttvar + max_ack_delay) *
808         kPersistentCongestionThreshold

810     For example, assume:

812     smoothed_rtt = 1 rttvar = 0 max_ack_delay = 0
813     kPersistentCongestionThreshold = 3

815     If an ack-eliciting packet is sent at time = 0, the following
816     scenario would illustrate persistent congestion:

818     +-----+------------------------+
819     | t=0 | Send Pkt #1 (App Data) |
820     +=====+========================+
821     | t=1 | Send Pkt #2 (PTO 1)    |
822     +-----+------------------------+
823     | t=3 | Send Pkt #3 (PTO 2)    |
824     +-----+------------------------+
825     | t=7 | Send Pkt #4 (PTO 3)    |
826     +-----+------------------------+
827     | t=8 | Recv ACK of Pkt #4     |
828     +-----+------------------------+

830                  Table 1

832     The first three packets are determined to be lost when the
833     acknowledgement of packet 4 is received at t=8. The congestion period
834     is calculated as the time between the oldest and newest lost packets:

836     (3 - 0) = 3. The duration for persistent congestion is equal to: (1
837     * kPersistentCongestionThreshold) = 3. Because the threshold was
838     reached and because none of the packets between the oldest and the
839     newest packets are acknowledged, the network is considered to have
840     experienced persistent congestion.

842     When persistent congestion is established, the sender's congestion
843     window MUST be reduced to the minimum congestion window
844     (kMinimumWindow). This response of collapsing the congestion window
845     on persistent congestion is functionally similar to a sender's
846     response on a Retransmission Timeout (RTO) in TCP [RFC5681] after
847     Tail Loss Probes (TLP) [RACK].

849  6.8. Pacing

851     This document does not specify a pacer, but it is RECOMMENDED that a
852     sender pace sending of all in-flight packets based on input from the
853     congestion controller.
For example, a pacer might distribute the 854 congestion window over the smoothed RTT when used with a window-based 855 controller, and a pacer might use the rate estimate of a rate-based 856 controller. 858 An implementation should take care to architect its congestion 859 controller to work well with a pacer. For instance, a pacer might 860 wrap the congestion controller and control the availability of the 861 congestion window, or a pacer might pace out packets handed to it by 862 the congestion controller. Timely delivery of ACK frames is 863 important for efficient loss recovery. Packets containing only ACK 864 frames should therefore not be paced, to avoid delaying their 865 delivery to the peer. 867 Sending multiple packets into the network without any delay between 868 them creates a packet burst that might cause short-term congestion 869 and losses. Implementations MUST either use pacing or limit such 870 bursts to the initial congestion window, which is recommended to be 871 the minimum of 10 * max_datagram_size and max(2 * max_datagram_size, 872 14720), where max_datagram_size is the current maximum size of a 873 datagram for the connection, not including UDP or IP overhead. 875 As an example of a well-known and publicly available implementation 876 of a flow pacer, implementers are referred to the Fair Queue packet 877 scheduler (fq qdisc) in Linux (3.11 onwards). 879 6.9. Under-utilizing the Congestion Window 881 When bytes in flight is smaller than the congestion window and 882 sending is not pacing limited, the congestion window is under- 883 utilized. When this occurs, the congestion window SHOULD NOT be 884 increased in either slow start or congestion avoidance. This can 885 happen due to insufficient application data or flow control credit. 887 A sender MAY use the pipeACK method described in Section 4.3 of 888 [RFC7661] to determine if the congestion window is sufficiently 889 utilized.
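The basic under-utilization rule at the start of this section can be sketched as a simple predicate. This is an illustrative Python sketch and not the pipeACK method of [RFC7661]. The function name may_increase_cwnd and the pacing_limited flag are assumptions made for the example; the sketch also assumes that bytes_in_flight is sampled before the newly acknowledged packet is subtracted from it.

```python
def may_increase_cwnd(bytes_in_flight, congestion_window, pacing_limited):
    """Return True if the congestion window may grow on this acknowledgement.

    The window counts as under-utilized when fewer bytes are in flight
    than the window allows and the shortfall is not caused by pacing.
    """
    underutilized = (bytes_in_flight < congestion_window
                     and not pacing_limited)
    return not underutilized
```

A sender that is held back only by its pacer reports pacing_limited as true, so it is still allowed to grow the window.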
891 A sender that paces packets (see Section 6.8) might delay sending 892 packets and not fully utilize the congestion window due to this 893 delay. A sender should not consider itself application limited if it 894 would have fully utilized the congestion window without pacing delay. 896 A sender MAY implement alternative mechanisms to update its 897 congestion window after periods of under-utilization, such as those 898 proposed for TCP in [RFC7661]. 900 7. Security Considerations 902 7.1. Congestion Signals 904 Congestion control fundamentally involves the consumption of signals 905 - both loss and ECN codepoints - from unauthenticated entities. On- 906 path attackers can spoof or alter these signals. An attacker can 907 cause endpoints to reduce their sending rate by dropping packets, or 908 alter send rate by changing ECN codepoints. 910 7.2. Traffic Analysis 912 Packets that carry only ACK frames can be heuristically identified by 913 observing packet size. Acknowledgement patterns may expose 914 information about link characteristics or application behavior. 915 Endpoints can use PADDING frames or bundle acknowledgments with other 916 frames to reduce leaked information. 918 7.3. Misreporting ECN Markings 920 A receiver can misreport ECN markings to alter the congestion 921 response of a sender. Suppressing reports of ECN-CE markings could 922 cause a sender to increase their send rate. This increase could 923 result in congestion and loss. 925 A sender MAY attempt to detect suppression of reports by marking 926 occasional packets that they send with ECN-CE. If a packet sent with 927 ECN-CE is not reported as having been CE marked when the packet is 928 acknowledged, then the sender SHOULD disable ECN for that path. 930 Reporting additional ECN-CE markings will cause a sender to reduce 931 their sending rate, which is similar in effect to advertising reduced 932 connection flow control limits and so no advantage is gained by doing 933 so. 
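The suppression check described earlier in this section can be sketched as follows. This is an illustrative Python sketch, not a specified algorithm. The class name EcnSuppressionDetector, its methods, and the ce_count_increase argument (the growth of the peer's reported ECN-CE counter since the previously processed ACK frame) are assumptions made for the example. The sketch relies on the observation that every acknowledged packet that was sent with ECN-CE should raise the reported counter by at least one.

```python
class EcnSuppressionDetector:
    """Tracks packets deliberately sent with ECN-CE (hypothetical helper)."""

    def __init__(self):
        self.ce_probes_in_flight = set()  # packet numbers sent with ECN-CE
        self.ecn_enabled = True

    def on_packet_sent(self, packet_number, sent_with_ce):
        if sent_with_ce:
            self.ce_probes_in_flight.add(packet_number)

    def on_ack(self, newly_acked, ce_count_increase):
        """Process newly acknowledged packet numbers.

        Returns False once ECN has been disabled for the path.
        """
        acked_probes = self.ce_probes_in_flight.intersection(newly_acked)
        self.ce_probes_in_flight -= acked_probes
        # Fewer CE reports than acknowledged CE-marked packets means
        # that at least one marking was not reported.
        if len(acked_probes) > ce_count_increase:
            self.ecn_enabled = False
        return self.ecn_enabled
```

A real implementation would also need to handle reordered ACK frames and per-path state, which this sketch leaves out.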
935 Endpoints choose the congestion controller that they use. Though 936 congestion controllers generally treat reports of ECN-CE markings as 937 equivalent to loss [RFC8311], the exact response for each controller 938 could be different. Failure to correctly respond to information 939 about ECN markings is therefore difficult to detect. 941 8. IANA Considerations 943 This document has no IANA actions. Yet. 945 9. References 947 9.1. Normative References 949 [QUIC-TLS] Thomson, M., Ed. and S. Turner, Ed., "Using TLS to Secure 950 QUIC", Work in Progress, Internet-Draft, draft-ietf-quic- 951 tls-26, 21 February 2020, 952 . 954 [QUIC-TRANSPORT] 955 Iyengar, J., Ed. and M. Thomson, Ed., "QUIC: A UDP-Based 956 Multiplexed and Secure Transport", Work in Progress, 957 Internet-Draft, draft-ietf-quic-transport-26, 21 February 958 2020, . 961 [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate 962 Requirement Levels", BCP 14, RFC 2119, 963 DOI 10.17487/RFC2119, March 1997, 964 . 966 [RFC8085] Eggert, L., Fairhurst, G., and G. Shepherd, "UDP Usage 967 Guidelines", BCP 145, RFC 8085, DOI 10.17487/RFC8085, 968 March 2017, . 970 [RFC8174] Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC 971 2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174, 972 May 2017, . 974 9.2. Informative References 976 [FACK] Mathis, M. and J. Mahdavi, "Forward Acknowledgement: 977 Refining TCP Congestion Control", ACM SIGCOMM , August 978 1996. 980 [RACK] Cheng, Y., Cardwell, N., Dukkipati, N., and P. Jha, "RACK: 981 a time-based fast loss detection algorithm for TCP", Work 982 in Progress, Internet-Draft, draft-ietf-tcpm-rack-05, 26 983 April 2019, . 986 [RFC3168] Ramakrishnan, K., Floyd, S., and D. Black, "The Addition 987 of Explicit Congestion Notification (ECN) to IP", 988 RFC 3168, DOI 10.17487/RFC3168, September 2001, 989 . 991 [RFC4653] Bhandarkar, S., Reddy, A. L. N., Allman, M., and E. 
992 Blanton, "Improving the Robustness of TCP to Non- 993 Congestion Events", RFC 4653, DOI 10.17487/RFC4653, August 994 2006, . 996 [RFC5681] Allman, M., Paxson, V., and E. Blanton, "TCP Congestion 997 Control", RFC 5681, DOI 10.17487/RFC5681, September 2009, 998 . 1000 [RFC5682] Sarolahti, P., Kojo, M., Yamamoto, K., and M. Hata, 1001 "Forward RTO-Recovery (F-RTO): An Algorithm for Detecting 1002 Spurious Retransmission Timeouts with TCP", RFC 5682, 1003 DOI 10.17487/RFC5682, September 2009, 1004 . 1006 [RFC5827] Allman, M., Avrachenkov, K., Ayesta, U., Blanton, J., and 1007 P. Hurtig, "Early Retransmit for TCP and Stream Control 1008 Transmission Protocol (SCTP)", RFC 5827, 1009 DOI 10.17487/RFC5827, May 2010, 1010 . 1012 [RFC6298] Paxson, V., Allman, M., Chu, J., and M. Sargent, 1013 "Computing TCP's Retransmission Timer", RFC 6298, 1014 DOI 10.17487/RFC6298, June 2011, 1015 . 1017 [RFC6582] Henderson, T., Floyd, S., Gurtov, A., and Y. Nishida, "The 1018 NewReno Modification to TCP's Fast Recovery Algorithm", 1019 RFC 6582, DOI 10.17487/RFC6582, April 2012, 1020 . 1022 [RFC6675] Blanton, E., Allman, M., Wang, L., Jarvinen, I., Kojo, M., 1023 and Y. Nishida, "A Conservative Loss Recovery Algorithm 1024 Based on Selective Acknowledgment (SACK) for TCP", 1025 RFC 6675, DOI 10.17487/RFC6675, August 2012, 1026 . 1028 [RFC6928] Chu, J., Dukkipati, N., Cheng, Y., and M. Mathis, 1029 "Increasing TCP's Initial Window", RFC 6928, 1030 DOI 10.17487/RFC6928, April 2013, 1031 . 1033 [RFC7661] Fairhurst, G., Sathiaseelan, A., and R. Secchi, "Updating 1034 TCP to Support Rate-Limited Traffic", RFC 7661, 1035 DOI 10.17487/RFC7661, October 2015, 1036 . 1038 [RFC8311] Black, D., "Relaxing Restrictions on Explicit Congestion 1039 Notification (ECN) Experimentation", RFC 8311, 1040 DOI 10.17487/RFC8311, January 2018, 1041 . 1043 [RFC8312] Rhee, I., Xu, L., Ha, S., Zimmermann, A., Eggert, L., and 1044 R. 
Scheffenegger, "CUBIC for Fast Long-Distance Networks", 1045 RFC 8312, DOI 10.17487/RFC8312, February 2018, 1046 . 1048 Appendix A. Loss Recovery Pseudocode 1050 We now describe an example implementation of the loss detection 1051 mechanisms described in Section 5. 1053 A.1. Tracking Sent Packets 1055 To correctly implement congestion control, a QUIC sender tracks every 1056 ack-eliciting packet until the packet is acknowledged or lost. It is 1057 expected that implementations will be able to access this information 1058 by packet number and crypto context and store the per-packet fields 1059 (Appendix A.1.1) for loss recovery and congestion control. 1061 After a packet is declared lost, the endpoint can track it for an 1062 amount of time comparable to the maximum expected packet reordering, 1063 such as 1 RTT. This allows for detection of spurious 1064 retransmissions. 1066 Sent packets are tracked for each packet number space, and ACK 1067 processing only applies to a single space. 1069 A.1.1. Sent Packet Fields 1071 packet_number: The packet number of the sent packet. 1073 ack_eliciting: A boolean that indicates whether a packet is ack- 1074 eliciting. If true, it is expected that an acknowledgement will 1075 be received, though the peer could delay sending the ACK frame 1076 containing it by up to the MaxAckDelay. 1078 in_flight: A boolean that indicates whether the packet counts 1079 towards bytes in flight. 1081 sent_bytes: The number of bytes sent in the packet, not including 1082 UDP or IP overhead, but including QUIC framing overhead. 1084 time_sent: The time the packet was sent. 1086 A.2. Constants of interest 1088 Constants used in loss recovery are based on a combination of RFCs, 1089 papers, and common practice. 1091 kPacketThreshold: Maximum reordering in packets before packet 1092 threshold loss detection considers a packet lost. The RECOMMENDED 1093 value is 3. 
1095 kTimeThreshold: Maximum reordering in time before time threshold 1096 loss detection considers a packet lost. Specified as an RTT 1097 multiplier. The RECOMMENDED value is 9/8. 1099 kGranularity: Timer granularity. This is a system-dependent value. 1100 However, implementations SHOULD use a value no smaller than 1ms. 1102 kInitialRtt: The RTT used before an RTT sample is taken. The 1103 RECOMMENDED value is 500ms. 1105 kPacketNumberSpace: An enum to enumerate the three packet number 1106 spaces. 1108 enum kPacketNumberSpace { 1109 Initial, 1110 Handshake, 1111 ApplicationData, 1112 } 1114 A.3. Variables of interest 1116 Variables required to implement the congestion control mechanisms are 1117 described in this section. 1119 latest_rtt: The most recent RTT measurement made when receiving an 1120 ack for a previously unacked packet. 1122 smoothed_rtt: The smoothed RTT of the connection, computed as 1123 described in [RFC6298] 1125 rttvar: The RTT variation, computed as described in [RFC6298] 1127 min_rtt: The minimum RTT seen in the connection, ignoring ack delay. 1129 max_ack_delay: The maximum amount of time by which the receiver 1130 intends to delay acknowledgments for packets in the 1131 ApplicationData packet number space. The actual ack_delay in a 1132 received ACK frame may be larger due to late timers, reordering, 1133 or lost ACK frames. 1135 loss_detection_timer: Multi-modal timer used for loss detection. 1137 pto_count: The number of times a PTO has been sent without receiving 1138 an ack. 1140 time_of_last_sent_ack_eliciting_packet[kPacketNumberSpace]: The time 1141 the most recent ack-eliciting packet was sent. 1143 largest_acked_packet[kPacketNumberSpace]: The largest packet number 1144 acknowledged in the packet number space so far. 1146 loss_time[kPacketNumberSpace]: The time at which the next packet in 1147 that packet number space will be considered lost based on 1148 exceeding the reordering window in time. 
1150     sent_packets[kPacketNumberSpace]: An association of packet numbers
1151        in a packet number space to information about them. Described in
1152        detail above in Appendix A.1.

1154  A.4. Initialization

1156     At the beginning of the connection, initialize the loss detection
1157     variables as follows:

1159     loss_detection_timer.reset()
1160     pto_count = 0
1161     latest_rtt = 0
1162     smoothed_rtt = 0
1163     rttvar = 0
1164     min_rtt = 0
1165     max_ack_delay = 0
1166     for pn_space in [ Initial, Handshake, ApplicationData ]:
1167       largest_acked_packet[pn_space] = infinite
1168       time_of_last_sent_ack_eliciting_packet[pn_space] = 0
1169       loss_time[pn_space] = 0

1171  A.5. On Sending a Packet

1173     After a packet is sent, information about the packet is stored. The
1174     parameters to OnPacketSent are described in detail above in
1175     Appendix A.1.1.

1177     Pseudocode for OnPacketSent follows:

1179     OnPacketSent(packet_number, pn_space, ack_eliciting,
1180                  in_flight, sent_bytes):
1181       sent_packets[pn_space][packet_number].packet_number =
1182                                                packet_number
1183       sent_packets[pn_space][packet_number].time_sent = now
1184       sent_packets[pn_space][packet_number].ack_eliciting =
1185                                                ack_eliciting
1186       sent_packets[pn_space][packet_number].in_flight = in_flight
1187       if (in_flight):
1188         if (ack_eliciting):
1189           time_of_last_sent_ack_eliciting_packet[pn_space] = now
1190         OnPacketSentCC(sent_bytes)
1191         sent_packets[pn_space][packet_number].size = sent_bytes
1192         SetLossDetectionTimer()

1194  A.6. On Receiving an Acknowledgment

1196     When an ACK frame is received, it may newly acknowledge any number of
1197     packets.

1199     Pseudocode for OnAckReceived and UpdateRtt follows:

1201     OnAckReceived(ack, pn_space):
1202       if (largest_acked_packet[pn_space] == infinite):
1203         largest_acked_packet[pn_space] = ack.largest_acked
1204       else:
1205         largest_acked_packet[pn_space] =
1206             max(largest_acked_packet[pn_space], ack.largest_acked)

1208       // Nothing to do if there are no newly acked packets.
1209       newly_acked_packets = DetermineNewlyAckedPackets(ack, pn_space)
1210       if (newly_acked_packets.empty()):
1211         return

1213       // If the largest acknowledged is newly acked and
1214       // at least one ack-eliciting was newly acked, update the RTT.
1215       if (sent_packets[pn_space].contains(ack.largest_acked) &&
1216           IncludesAckEliciting(newly_acked_packets)):
1217         latest_rtt =
1218           now - sent_packets[pn_space][ack.largest_acked].time_sent
1219         ack_delay = 0
1220         if (pn_space == ApplicationData):
1221           ack_delay = ack.ack_delay
1222         UpdateRtt(ack_delay)

1224       // Process ECN information if present.
1225       if (ACK frame contains ECN information):
1226         ProcessECN(ack, pn_space)

1228       for acked_packet in newly_acked_packets:
1229         OnPacketAcked(acked_packet, pn_space)

1231       DetectLostPackets(pn_space)

1233       pto_count = 0

1235       SetLossDetectionTimer()

1237     UpdateRtt(ack_delay):
1238       // First RTT sample.
1239       if (smoothed_rtt == 0):
1240         min_rtt = latest_rtt
1241         smoothed_rtt = latest_rtt
1242         rttvar = latest_rtt / 2
1243         return

1245       // min_rtt ignores ack delay.
1246       min_rtt = min(min_rtt, latest_rtt)
1247       // Limit ack_delay by max_ack_delay
1248       ack_delay = min(ack_delay, max_ack_delay)
1249       // Adjust for ack delay if plausible.
1250       adjusted_rtt = latest_rtt
1251       if (latest_rtt > min_rtt + ack_delay):
1252         adjusted_rtt = latest_rtt - ack_delay

1254       rttvar = 3/4 * rttvar + 1/4 * abs(smoothed_rtt - adjusted_rtt)
1255       smoothed_rtt = 7/8 * smoothed_rtt + 1/8 * adjusted_rtt

1257  A.7. On Packet Acknowledgment

1259     When a packet is acknowledged for the first time, the following
1260     OnPacketAcked function is called. Note that a single ACK frame may
1261     newly acknowledge several packets. OnPacketAcked must be called once
1262     for each of these newly acknowledged packets.

1264     OnPacketAcked takes two parameters: acked_packet, which is the struct
1265     detailed in Appendix A.1.1, and the packet number space that this ACK
1266     frame was sent for.
1268     Pseudocode for OnPacketAcked follows:

1270     OnPacketAcked(acked_packet, pn_space):
1271       if (acked_packet.in_flight):
1272         OnPacketAckedCC(acked_packet)
1273       sent_packets[pn_space].remove(acked_packet.packet_number)

1275  A.8. Setting the Loss Detection Timer

1277     QUIC loss detection uses a single timer for all timeout loss
1278     detection. The duration of the timer is based on the timer's mode,
1279     which is set in the packet and timer events further below. The
1280     function SetLossDetectionTimer defined below shows how the single
1281     timer is set.

1283     This algorithm may result in the timer being set in the past,
1284     particularly if timers wake up late. Timers set in the past SHOULD
1285     fire immediately.

1287     Pseudocode for SetLossDetectionTimer follows:

1289     GetEarliestTimeAndSpace(times):
1290       time = times[Initial]
1291       space = Initial
1292       for pn_space in [ Handshake, ApplicationData ]:
1293         if (times[pn_space] != 0 &&
1294             (time == 0 || times[pn_space] < time) &&
1295             // Skip ApplicationData until handshake completion.
1296             (pn_space != ApplicationData ||
1297               IsHandshakeComplete())):
1298           time = times[pn_space]
1299           space = pn_space
1300       return time, space

1302     PeerNotAwaitingAddressValidation():
1303       // Assume clients validate the server's address implicitly.
1304       if (endpoint is server):
1305         return true
1306       // Servers complete address validation when a
1307       // protected packet is received.
1308       return has received Handshake ACK ||
1309              has received 1-RTT ACK

1311     SetLossDetectionTimer():
1312       earliest_loss_time, _ = GetEarliestTimeAndSpace(loss_time)
1313       if (earliest_loss_time != 0):
1314         // Time threshold loss detection.
1315         loss_detection_timer.update(earliest_loss_time)
1316         return

1318       if (no ack-eliciting packets in flight &&
1319           PeerNotAwaitingAddressValidation()):
1320         loss_detection_timer.cancel()
1321         return

1323       // Use a default timeout if there are no RTT measurements
1324       if (smoothed_rtt == 0):
1325         timeout = 2 * kInitialRtt
1326       else:
1327         // Calculate PTO duration
1328         timeout = smoothed_rtt + max(4 * rttvar, kGranularity) +
1329           max_ack_delay
1330       timeout = timeout * (2 ^ pto_count)

1332       sent_time, _ = GetEarliestTimeAndSpace(
1333         time_of_last_sent_ack_eliciting_packet)
1334       loss_detection_timer.update(sent_time + timeout)

1336  A.9. On Timeout

1338     When the loss detection timer expires, the timer's mode determines
1339     the action to be performed.

1341     Pseudocode for OnLossDetectionTimeout follows:

1343     OnLossDetectionTimeout():
1344       earliest_loss_time, pn_space =
1345         GetEarliestTimeAndSpace(loss_time)
1346       if (earliest_loss_time != 0):
1347         // Time threshold loss detection.
1348         DetectLostPackets(pn_space)
1349         SetLossDetectionTimer()
1350         return

1352       if (endpoint is client without 1-RTT keys):
1353         // Client sends an anti-deadlock packet: Initial is padded
1354         // to earn more anti-amplification credit,
1355         // a Handshake packet proves address ownership.
1356         if (has Handshake keys):
1357           SendOneAckElicitingHandshakePacket()
1358         else:
1359           SendOneAckElicitingPaddedInitialPacket()
1360       else:
1361         // PTO. Send new data if available, else retransmit old data.
1362         // If neither is available, send a single PING frame.
1363         _, pn_space = GetEarliestTimeAndSpace(
1364           time_of_last_sent_ack_eliciting_packet)
1365         SendOneOrTwoAckElicitingPackets(pn_space)

1367       pto_count++
1368       SetLossDetectionTimer()

1370  A.10. Detecting Lost Packets

1372     DetectLostPackets is called every time an ACK is received and
1373     operates on the sent_packets for that packet number space.
1375     Pseudocode for DetectLostPackets follows:

1377     DetectLostPackets(pn_space):
1378       assert(largest_acked_packet[pn_space] != infinite)
1379       loss_time[pn_space] = 0
1380       lost_packets = {}
1381       loss_delay = kTimeThreshold * max(latest_rtt, smoothed_rtt)

1383       // Minimum time of kGranularity before packets are deemed lost.
1384       loss_delay = max(loss_delay, kGranularity)

1386       // Packets sent before this time are deemed lost.
1387       lost_send_time = now - loss_delay

1389       foreach unacked in sent_packets[pn_space]:
1390         if (unacked.packet_number > largest_acked_packet[pn_space]):
1391           continue

1393         // Mark packet as lost, or set time when it should be marked.
1394         if (unacked.time_sent <= lost_send_time ||
1395             largest_acked_packet[pn_space] >=
1396               unacked.packet_number + kPacketThreshold):
1397           sent_packets[pn_space].remove(unacked.packet_number)
1398           if (unacked.in_flight):
1399             lost_packets.insert(unacked)
1400         else:
1401           if (loss_time[pn_space] == 0):
1402             loss_time[pn_space] = unacked.time_sent + loss_delay
1403           else:
1404             loss_time[pn_space] = min(loss_time[pn_space],
1405                                       unacked.time_sent + loss_delay)

1407       // Inform the congestion controller of lost packets and
1408       // let it decide whether to retransmit immediately.
1409       if (!lost_packets.empty()):
1410         OnPacketsLost(lost_packets)

1412  Appendix B. Congestion Control Pseudocode

1414     We now describe an example implementation of the congestion
1415     controller described in Section 6.

1417  B.1. Constants of interest

1419     Constants used in congestion control are based on a combination of
1420     RFCs, papers, and common practice.

1422     kInitialWindow: Default limit on the initial amount of data in
1423        flight, in bytes. The RECOMMENDED value is the minimum of 10 *
1424        max_datagram_size and max(2 * max_datagram_size, 14720). This
1425        follows the analysis and recommendations in [RFC6928], increasing
1426        the byte limit to account for the smaller 8 byte overhead of UDP
1427        compared to the 20 byte overhead for TCP.

   kMinimumWindow:  Minimum congestion window in bytes.  The RECOMMENDED
      value is 2 * max_datagram_size.

   kLossReductionFactor:  Reduction in congestion window when a new loss
      event is detected.  The RECOMMENDED value is 0.5.

   kPersistentCongestionThreshold:  Period of time for persistent
      congestion to be established, specified as a PTO multiplier.  The
      rationale for this threshold is to enable a sender to use initial
      PTOs for aggressive probing, as TCP does with Tail Loss Probe
      (TLP) [RACK], before establishing persistent congestion, as TCP
      does with a Retransmission Timeout (RTO) [RFC5681].  The
      RECOMMENDED value for kPersistentCongestionThreshold is 3, which
      is approximately equivalent to having two TLPs before an RTO in
      TCP.

B.2.  Variables of interest

   Variables required to implement the congestion control mechanisms are
   described in this section.

   max_datagram_size:  The sender's current maximum payload size.  Does
      not include UDP or IP overhead.  The max datagram size is used for
      congestion window computations.  An endpoint sets the value of
      this variable based on its PMTU (see Section 14.1 of
      [QUIC-TRANSPORT]), with a minimum value of 1200 bytes.

   ecn_ce_counters[kPacketNumberSpace]:  The highest value reported for
      the ECN-CE counter in the packet number space by the peer in an
      ACK frame.  This value is used to detect increases in the reported
      ECN-CE counter.

   bytes_in_flight:  The sum of the size in bytes of all sent packets
      that contain at least one ack-eliciting or PADDING frame, and have
      not been acked or declared lost.  The size does not include IP or
      UDP overhead, but does include the QUIC header and AEAD overhead.
      Packets only containing ACK frames do not count towards
      bytes_in_flight to ensure congestion control does not impede
      congestion feedback.

   congestion_window:  Maximum number of bytes-in-flight that may be
      sent.

   congestion_recovery_start_time:  The time when QUIC first detects
      congestion due to loss or ECN, causing it to enter congestion
      recovery.  When a packet sent after this time is acknowledged,
      QUIC exits congestion recovery.

   ssthresh:  Slow start threshold in bytes.  When the congestion window
      is below ssthresh, the mode is slow start and the window grows by
      the number of bytes acknowledged.

B.3.  Initialization

   At the beginning of the connection, initialize the congestion control
   variables as follows:

   congestion_window = kInitialWindow
   bytes_in_flight = 0
   congestion_recovery_start_time = 0
   ssthresh = infinite
   for pn_space in [ Initial, Handshake, ApplicationData ]:
     ecn_ce_counters[pn_space] = 0

B.4.  On Packet Sent

   Whenever a packet is sent, and it contains non-ACK frames, the packet
   increases bytes_in_flight.

   OnPacketSentCC(bytes_sent):
     bytes_in_flight += bytes_sent

B.5.  On Packet Acknowledgement

   Invoked from loss detection's OnPacketAcked and supplied with the
   acked_packet from sent_packets.

   InCongestionRecovery(sent_time):
     return sent_time <= congestion_recovery_start_time

   OnPacketAckedCC(acked_packet):
     // Remove from bytes_in_flight.
     bytes_in_flight -= acked_packet.size
     if (InCongestionRecovery(acked_packet.time_sent)):
       // Do not increase congestion window in recovery period.
       return
     if (IsAppOrFlowControlLimited()):
       // Do not increase congestion_window if application
       // limited or flow control limited.
       return
     if (congestion_window < ssthresh):
       // Slow start.
       congestion_window += acked_packet.size
     else:
       // Congestion avoidance.
       congestion_window += max_datagram_size * acked_packet.size
           / congestion_window

B.6.  On New Congestion Event

   Invoked from ProcessECN and OnPacketsLost when a new congestion event
   is detected.  May start a new recovery period and reduce the
   congestion window.

   CongestionEvent(sent_time):
     // Start a new congestion event if packet was sent after the
     // start of the previous congestion recovery period.
     if (!InCongestionRecovery(sent_time)):
       congestion_recovery_start_time = now()
       congestion_window *= kLossReductionFactor
       congestion_window = max(congestion_window, kMinimumWindow)
       ssthresh = congestion_window

B.7.  Process ECN Information

   Invoked when an ACK frame with an ECN section is received from the
   peer.

   ProcessECN(ack, pn_space):
     // If the ECN-CE counter reported by the peer has increased,
     // this could be a new congestion event.
     if (ack.ce_counter > ecn_ce_counters[pn_space]):
       ecn_ce_counters[pn_space] = ack.ce_counter
       // sent_packets is keyed by packet number space first.
       CongestionEvent(
         sent_packets[pn_space][ack.largest_acked].time_sent)

B.8.  On Packets Lost

   Invoked from DetectLostPackets when packets are deemed lost.

   InPersistentCongestion(largest_lost_packet):
     pto = smoothed_rtt + max(4 * rttvar, kGranularity) +
       max_ack_delay
     congestion_period = pto * kPersistentCongestionThreshold
     // Determine if all packets in the time period before the
     // newest lost packet, including the edges, are marked
     // lost
     return AreAllPacketsLost(largest_lost_packet,
                              congestion_period)

   OnPacketsLost(lost_packets):
     // Remove lost packets from bytes_in_flight.
     for (lost_packet : lost_packets):
       bytes_in_flight -= lost_packet.size
     largest_lost_packet = lost_packets.last()
     CongestionEvent(largest_lost_packet.time_sent)

     // Collapse congestion window if persistent congestion
     if (InPersistentCongestion(largest_lost_packet)):
       congestion_window = kMinimumWindow

Appendix C.  Change Log

      *RFC Editor's Note:*  Please remove this section prior to
      publication of a final version of this document.

   Issue and pull request numbers are listed with a leading octothorp.

C.1.  Since draft-ietf-quic-recovery-25

   No significant changes.

C.2.  Since draft-ietf-quic-recovery-24

   *  Require congestion control of some sort (#3247, #3244, #3248)

   *  Set a minimum reordering threshold (#3256, #3240)

   *  PTO is specific to a packet number space (#3067, #3074, #3066)

C.3.  Since draft-ietf-quic-recovery-23

   *  Define under-utilizing the congestion window (#2630, #2686, #2675)

   *  PTO MUST send data if possible (#3056, #3057)

   *  Connection Close is not ack-eliciting (#3097, #3098)

   *  MUST limit bursts to the initial congestion window (#3160)

   *  Define the current max_datagram_size for congestion control
      (#3041, #3167)

C.4.  Since draft-ietf-quic-recovery-22

   *  PTO should always send an ack-eliciting packet (#2895)

   *  Unify the Handshake Timer with the PTO timer (#2648, #2658, #2886)

   *  Move ACK generation text to transport draft (#1860, #2916)

C.5.  Since draft-ietf-quic-recovery-21

   *  No changes

C.6.  Since draft-ietf-quic-recovery-20

   *  Path validation can be used as initial RTT value (#2644, #2687)

   *  max_ack_delay transport parameter defaults to 0 (#2638, #2646)

   *  Ack Delay only measures intentional delays induced by the
      implementation (#2596, #2786)

C.7.  Since draft-ietf-quic-recovery-19

   *  Change kPersistentThreshold from an exponent to a multiplier
      (#2557)

   *  Send a PING if the PTO timer fires and there's nothing to send
      (#2624)

   *  Set loss delay to at least kGranularity (#2617)

   *  Merge application limited and sending after idle sections.  Always
      limit burst size instead of requiring resetting CWND to initial
      CWND after idle (#2605)

   *  Rewrite RTT estimation, allow RTT samples where a newly acked
      packet is ack-eliciting but the largest_acked is not (#2592)

   *  Don't arm the handshake timer if there is no handshake data
      (#2590)

   *  Clarify that the time threshold loss alarm takes precedence over
      the crypto handshake timer (#2590, #2620)

   *  Change initial RTT to 500ms to align with RFC6298 (#2184)

C.8.  Since draft-ietf-quic-recovery-18

   *  Change IW byte limit to 14720 from 14600 (#2494)

   *  Update PTO calculation to match RFC6298 (#2480, #2489, #2490)

   *  Improve loss detection's description of multiple packet number
      spaces and pseudocode (#2485, #2451, #2417)

   *  Declare persistent congestion even if non-probe packets are sent
      and don't make persistent congestion more aggressive than RTO
      verified was (#2365, #2244)

   *  Move pseudocode to the appendices (#2408)

   *  What to send on multiple PTOs (#2380)

C.9.  Since draft-ietf-quic-recovery-17

   *  After Probe Timeout discard in-flight packets or send another
      (#2212, #1965)

   *  Endpoints discard initial keys as soon as handshake keys are
      available (#1951, #2045)

   *  0-RTT state is discarded when 0-RTT is rejected (#2300)

   *  Loss detection timer is cancelled when ack-eliciting frames are in
      flight (#2117, #2093)

   *  Packets are declared lost if they are in flight (#2104)

   *  After becoming idle, either pace packets or reset the congestion
      controller (#2138, #2187)

   *  Process ECN counts before marking packets lost (#2142)

   *  Mark packets lost before resetting crypto_count and pto_count
      (#2208, #2209)

   *  Congestion and loss recovery state are discarded when keys are
      discarded (#2327)

C.10.  Since draft-ietf-quic-recovery-16

   *  Unify TLP and RTO into a single PTO; eliminate min RTO, min TLP
      and min crypto timeouts; eliminate timeout validation (#2114,
      #2166, #2168, #1017)

   *  Redefine congestion avoidance in terms of when the period
      starts (#1928, #1930)

   *  Document what needs to be tracked for packets that are in flight
      (#765, #1724, #1939)

   *  Integrate both time and packet thresholds into loss detection
      (#1969, #1212, #934, #1974)

   *  Reduce congestion window after idle, unless pacing is used (#2007,
      #2023)

   *  Disable RTT calculation for packets that don't elicit
      acknowledgment (#2060, #2078)

   *  Limit ack_delay by max_ack_delay (#2060, #2099)

   *  Initial keys are discarded once Handshake keys are available
      (#1951, #2045)

   *  Reorder ECN and loss detection in pseudocode (#2142)

   *  Only cancel loss detection timer if ack-eliciting packets are in
      flight (#2093, #2117)

C.11.  Since draft-ietf-quic-recovery-14

   *  Used max_ack_delay from transport params (#1796, #1782)

   *  Merge ACK and ACK_ECN (#1783)

C.12.  Since draft-ietf-quic-recovery-13

   *  Corrected the lack of ssthresh reduction in CongestionEvent
      pseudocode (#1598)

   *  Considerations for ECN spoofing (#1426, #1626)

   *  Clarifications for PADDING and congestion control (#837, #838,
      #1517, #1531, #1540)

   *  Reduce early retransmission timer to RTT/8 (#945, #1581)

   *  Packets are declared lost after an RTO is verified (#935, #1582)

C.13.  Since draft-ietf-quic-recovery-12

   *  Changes to manage separate packet number spaces and encryption
      levels (#1190, #1242, #1413, #1450)

   *  Added ECN feedback mechanisms and handling; new ACK_ECN frame
      (#804, #805, #1372)

C.14.  Since draft-ietf-quic-recovery-11

   No significant changes.

C.15.  Since draft-ietf-quic-recovery-10

   *  Improved text on ack generation (#1139, #1159)

   *  Make references to TCP recovery mechanisms informational (#1195)

   *  Define time_of_last_sent_handshake_packet (#1171)

   *  Added signal from TLS that the data it includes needs to be sent
      in a Retry packet (#1061, #1199)

   *  Minimum RTT (min_rtt) is initialized with an infinite value
      (#1169)

C.16.  Since draft-ietf-quic-recovery-09

   No significant changes.

C.17.  Since draft-ietf-quic-recovery-08

   *  Clarified pacing and RTO (#967, #977)

C.18.  Since draft-ietf-quic-recovery-07

   *  Include Ack Delay in RTO (and TLP) computations (#981)

   *  Ack Delay in SRTT computation (#961)

   *  Default RTT and Slow Start (#590)

   *  Many editorial fixes.

C.19.  Since draft-ietf-quic-recovery-06

   No significant changes.

C.20.  Since draft-ietf-quic-recovery-05

   *  Add more congestion control text (#776)

C.21.  Since draft-ietf-quic-recovery-04

   No significant changes.

C.22.  Since draft-ietf-quic-recovery-03

   No significant changes.

C.23.  Since draft-ietf-quic-recovery-02

   *  Integrate F-RTO (#544, #409)

   *  Add congestion control (#545, #395)

   *  Require connection abort if a skipped packet was acknowledged
      (#415)

   *  Simplify RTO calculations (#142, #417)

C.24.  Since draft-ietf-quic-recovery-01

   *  Overview added to loss detection

   *  Changes initial default RTT to 100ms

   *  Added time-based loss detection and fixes early retransmit

   *  Clarified loss recovery for handshake packets

   *  Fixed references and made TCP references informative

C.25.  Since draft-ietf-quic-recovery-00

   *  Improved description of constants and ACK behavior

C.26.  Since draft-iyengar-quic-loss-recovery-01

   *  Adopted as base for draft-ietf-quic-recovery

   *  Updated authors/editors list

   *  Added table of contents

Appendix D.  Contributors

   The IETF QUIC Working Group received an enormous amount of support
   from many people.  The following people provided substantive
   contributions to this document: Alessandro Ghedini, Benjamin
   Saunders, Gorry Fairhurst, 奥 一穂 (Kazuho Oku), Lars Eggert, Magnus
   Westerlund, Marten Seemann, Martin Duke, Martin Thomson, Nick Banks,
   Praveen Balasubramaniam.

Acknowledgments

Authors' Addresses

   Jana Iyengar (editor)
   Fastly

   Email: jri.ietf@gmail.com

   Ian Swett (editor)
   Google

   Email: ianswett@google.com