QUIC                                                     J. Iyengar, Ed.
Internet-Draft                                                    Fastly
Intended status: Standards Track                           I. Swett, Ed.
Expires: 10 September 2020                                        Google
                                                            9 March 2020


               QUIC Loss Detection and Congestion Control
                      draft-ietf-quic-recovery-27

Abstract

   This document describes loss detection and congestion control
   mechanisms for QUIC.

Note to Readers

   Discussion of this draft takes place on the QUIC working group
   mailing list (quic@ietf.org), which is archived at
   https://mailarchive.ietf.org/arch/search/?email_list=quic.

   Working Group information can be found at https://github.com/quicwg;
   source code and issues list for this draft can be found at
   https://github.com/quicwg/base-drafts/labels/-recovery.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current
   Internet-Drafts is at https://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on 10 September 2020.

Copyright Notice

   Copyright (c) 2020 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (https://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.  Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1.  Introduction
   2.  Conventions and Definitions
   3.  Design of the QUIC Transmission Machinery
     3.1.  Relevant Differences Between QUIC and TCP
       3.1.1.  Separate Packet Number Spaces
       3.1.2.  Monotonically Increasing Packet Numbers
       3.1.3.  Clearer Loss Epoch
       3.1.4.  No Reneging
       3.1.5.  More ACK Ranges
       3.1.6.  Explicit Correction For Delayed Acknowledgements
   4.  Estimating the Round-Trip Time
     4.1.  Generating RTT samples
     4.2.  Estimating min_rtt
     4.3.  Estimating smoothed_rtt and rttvar
   5.  Loss Detection
     5.1.  Acknowledgement-based Detection
       5.1.1.  Packet Threshold
       5.1.2.  Time Threshold
     5.2.  Probe Timeout
       5.2.1.  Computing PTO
     5.3.  Handshakes and New Paths
       5.3.1.  Sending Probe Packets
       5.3.2.  Loss Detection
     5.4.  Handling Retry Packets
     5.5.  Discarding Keys and Packet State
   6.  Congestion Control
     6.1.  Explicit Congestion Notification
     6.2.  Slow Start
     6.3.  Congestion Avoidance
     6.4.  Recovery Period
     6.5.  Ignoring Loss of Undecryptable Packets
     6.6.  Probe Timeout
     6.7.  Persistent Congestion
     6.8.  Pacing
     6.9.  Under-utilizing the Congestion Window
   7.  Security Considerations
     7.1.  Congestion Signals
     7.2.  Traffic Analysis
     7.3.  Misreporting ECN Markings
   8.  IANA Considerations
   9.  References
     9.1.  Normative References
     9.2.  Informative References
   Appendix A.  Loss Recovery Pseudocode
     A.1.  Tracking Sent Packets
       A.1.1.  Sent Packet Fields
     A.2.  Constants of interest
     A.3.  Variables of interest
     A.4.  Initialization
     A.5.  On Sending a Packet
     A.6.  On Receiving an Acknowledgment
     A.7.  On Packet Acknowledgment
     A.8.  Setting the Loss Detection Timer
     A.9.  On Timeout
     A.10. Detecting Lost Packets
   Appendix B.  Congestion Control Pseudocode
     B.1.  Constants of interest
     B.2.  Variables of interest
     B.3.  Initialization
     B.4.  On Packet Sent
     B.5.  On Packet Acknowledgement
     B.6.  On New Congestion Event
     B.7.  Process ECN Information
     B.8.  On Packets Lost
   Appendix C.  Change Log
     C.1.  Since draft-ietf-quic-recovery-26
     C.2.  Since draft-ietf-quic-recovery-25
     C.3.  Since draft-ietf-quic-recovery-24
     C.4.  Since draft-ietf-quic-recovery-23
     C.5.  Since draft-ietf-quic-recovery-22
     C.6.  Since draft-ietf-quic-recovery-21
     C.7.  Since draft-ietf-quic-recovery-20
     C.8.  Since draft-ietf-quic-recovery-19
     C.9.  Since draft-ietf-quic-recovery-18
     C.10. Since draft-ietf-quic-recovery-17
     C.11. Since draft-ietf-quic-recovery-16
     C.12. Since draft-ietf-quic-recovery-14
     C.13. Since draft-ietf-quic-recovery-13
     C.14. Since draft-ietf-quic-recovery-12
     C.15. Since draft-ietf-quic-recovery-11
     C.16. Since draft-ietf-quic-recovery-10
     C.17. Since draft-ietf-quic-recovery-09
     C.18. Since draft-ietf-quic-recovery-08
     C.19. Since draft-ietf-quic-recovery-07
     C.20. Since draft-ietf-quic-recovery-06
     C.21. Since draft-ietf-quic-recovery-05
     C.22. Since draft-ietf-quic-recovery-04
     C.23. Since draft-ietf-quic-recovery-03
     C.24. Since draft-ietf-quic-recovery-02
     C.25. Since draft-ietf-quic-recovery-01
     C.26. Since draft-ietf-quic-recovery-00
     C.27. Since draft-iyengar-quic-loss-recovery-01
   Appendix D.  Contributors
   Acknowledgments
   Authors' Addresses

1.  Introduction

   QUIC is a new multiplexed and secure transport atop UDP.  QUIC builds
   on decades of transport and security experience, and implements
   mechanisms that make it attractive as a modern general-purpose
   transport.  The QUIC protocol is described in [QUIC-TRANSPORT].

   QUIC implements the spirit of existing TCP congestion control and
   loss recovery mechanisms, described in RFCs, various Internet-drafts,
   and also those prevalent in the Linux TCP implementation.  This
   document describes QUIC congestion control and loss recovery, and
   where applicable, attributes the TCP equivalent in RFCs,
   Internet-drafts, academic papers, and/or TCP implementations.

2.  Conventions and Definitions

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and
   "OPTIONAL" in this document are to be interpreted as described in BCP
   14 [RFC2119] [RFC8174] when, and only when, they appear in all
   capitals, as shown here.

   Definitions of terms that are used in this document:

   Ack-eliciting Frames:  All frames other than ACK, PADDING, and
      CONNECTION_CLOSE are considered ack-eliciting.

   Ack-eliciting Packets:  Packets that contain ack-eliciting frames
      elicit an ACK from the receiver within the maximum ack delay and
      are called ack-eliciting packets.

   In-flight:  Packets are considered in-flight when they are
      ack-eliciting or contain a PADDING frame, and they have been sent
      but are not acknowledged, declared lost, or abandoned along with
      old keys.

3.  Design of the QUIC Transmission Machinery

   All transmissions in QUIC are sent with a packet-level header, which
   indicates the encryption level and includes a packet sequence number
   (referred to below as a packet number).  The encryption level
   indicates the packet number space, as described in [QUIC-TRANSPORT].
   Packet numbers never repeat within a packet number space for the
   lifetime of a connection.  Packet numbers are sent in monotonically
   increasing order within a space, preventing ambiguity.

   This design obviates the need for disambiguating between
   transmissions and retransmissions and eliminates significant
   complexity from QUIC's interpretation of TCP loss detection
   mechanisms.

   QUIC packets can contain multiple frames of different types.  The
   recovery mechanisms ensure that data and frames that need reliable
   delivery are acknowledged or declared lost and sent in new packets as
   necessary.
   The types of frames contained in a packet affect recovery and
   congestion control logic:

   *  All packets are acknowledged, though packets that contain no
      ack-eliciting frames are only acknowledged along with
      ack-eliciting packets.

   *  Long header packets that contain CRYPTO frames are critical to the
      performance of the QUIC handshake and use shorter timers for
      acknowledgement.

   *  Packets containing frames besides ACK or CONNECTION_CLOSE frames
      count toward congestion control limits and are considered
      in-flight.

   *  PADDING frames cause packets to contribute toward bytes in flight
      without directly causing an acknowledgment to be sent.

3.1.  Relevant Differences Between QUIC and TCP

   Readers familiar with TCP's loss detection and congestion control
   will find algorithms here that parallel well-known TCP ones.
   Protocol differences between QUIC and TCP, however, contribute to
   algorithmic differences.  We briefly describe these protocol
   differences below.

3.1.1.  Separate Packet Number Spaces

   QUIC uses separate packet number spaces for each encryption level,
   except that 0-RTT and all generations of 1-RTT keys use the same
   packet number space.  Separate packet number spaces ensure that
   acknowledgement of packets sent with one level of encryption will not
   cause spurious retransmission of packets sent with a different
   encryption level.  Congestion control and round-trip time (RTT)
   measurement are unified across packet number spaces.

3.1.2.  Monotonically Increasing Packet Numbers

   TCP conflates transmission order at the sender with delivery order at
   the receiver, which results in retransmissions of the same data
   carrying the same sequence number, and consequently leads to
   "retransmission ambiguity".  QUIC separates the two.  QUIC uses a
   packet number to indicate transmission order.  Application data is
   sent in one or more streams and delivery order is determined by
   stream offsets encoded within STREAM frames.

   QUIC's packet number is strictly increasing within a packet number
   space, and directly encodes transmission order.  A higher packet
   number signifies that the packet was sent later, and a lower packet
   number signifies that the packet was sent earlier.  When a packet
   containing ack-eliciting frames is detected lost, QUIC rebundles
   necessary frames in a new packet with a new packet number, removing
   ambiguity about which packet is acknowledged when an ACK is received.
   Consequently, more accurate RTT measurements can be made, spurious
   retransmissions are trivially detected, and mechanisms such as Fast
   Retransmit can be applied universally, based only on packet number.

   This design point significantly simplifies loss detection mechanisms
   for QUIC.  Most TCP mechanisms implicitly attempt to infer
   transmission ordering based on TCP sequence numbers - a non-trivial
   task, especially when TCP timestamps are not available.

3.1.3.  Clearer Loss Epoch

   QUIC starts a loss epoch when a packet is lost and ends one when any
   packet sent after the epoch starts is acknowledged.  TCP waits for
   the gap in the sequence number space to be filled, and so if a
   segment is lost multiple times in a row, the loss epoch may not end
   for several round trips.  Because both should reduce their congestion
   windows only once per epoch, QUIC will do it once for every round
   trip that experiences loss, while TCP may only do it once across
   multiple round trips.

3.1.4.  No Reneging

   QUIC ACKs contain information that is similar to TCP SACK, but QUIC
   does not allow any acked packet to be reneged, greatly simplifying
   implementations on both sides and reducing memory pressure on the
   sender.

3.1.5.  More ACK Ranges

   QUIC supports many ACK ranges, as opposed to TCP's 3 SACK ranges.  In
   high loss environments, this speeds recovery, reduces spurious
   retransmits, and ensures forward progress without relying on
   timeouts.

3.1.6.  Explicit Correction For Delayed Acknowledgements

   QUIC endpoints measure the delay incurred between when a packet is
   received and when the corresponding acknowledgment is sent, allowing
   a peer to maintain a more accurate round-trip time estimate (see
   Section 13.2 of [QUIC-TRANSPORT]).

4.  Estimating the Round-Trip Time

   At a high level, an endpoint measures the time from when a packet was
   sent to when it is acknowledged as a round-trip time (RTT) sample.
   The endpoint uses RTT samples and peer-reported host delays (see
   Section 13.2 of [QUIC-TRANSPORT]) to generate a statistical
   description of the network path's RTT.  An endpoint computes the
   following three values for each path: the minimum value observed over
   the lifetime of the path (min_rtt), an exponentially-weighted moving
   average (smoothed_rtt), and the mean deviation (referred to as
   "variation" in the rest of this document) in the observed RTT samples
   (rttvar).

4.1.  Generating RTT samples

   An endpoint generates an RTT sample on receiving an ACK frame that
   meets the following two conditions:

   *  the largest acknowledged packet number is newly acknowledged, and

   *  at least one of the newly acknowledged packets was ack-eliciting.

   The RTT sample, latest_rtt, is generated as the time elapsed since
   the largest acknowledged packet was sent:

   latest_rtt = ack_time - send_time_of_largest_acked

   An RTT sample is generated using only the largest acknowledged packet
   in the received ACK frame.  This is because a peer reports ACK delays
   for only the largest acknowledged packet in an ACK frame.
   While the reported ACK delay is not used by the RTT sample
   measurement, it is used to adjust the RTT sample in subsequent
   computations of smoothed_rtt and rttvar (see Section 4.3).

   To avoid generating multiple RTT samples for a single packet, an ACK
   frame SHOULD NOT be used to update RTT estimates if it does not newly
   acknowledge the largest acknowledged packet.

   An RTT sample MUST NOT be generated on receiving an ACK frame that
   does not newly acknowledge at least one ack-eliciting packet.  A peer
   usually does not send an ACK frame when only non-ack-eliciting
   packets are received.  Therefore an ACK frame that contains
   acknowledgements for only non-ack-eliciting packets could include an
   arbitrarily large Ack Delay value.  Ignoring such ACK frames avoids
   complications in subsequent smoothed_rtt and rttvar computations.

   A sender might generate multiple RTT samples per RTT when multiple
   ACK frames are received within an RTT.  As suggested in [RFC6298],
   doing so might result in inadequate history in smoothed_rtt and
   rttvar.  Ensuring that RTT estimates retain sufficient history is an
   open research question.

4.2.  Estimating min_rtt

   min_rtt is the minimum RTT observed for a given network path.
   min_rtt is set to the latest_rtt on the first RTT sample, and to the
   lesser of min_rtt and latest_rtt on subsequent samples.  In this
   document, min_rtt is used by loss detection to reject implausibly
   small RTT samples.

   An endpoint uses only locally observed times in computing the min_rtt
   and does not adjust for ACK delays reported by the peer.  Doing so
   allows the endpoint to set a lower bound for the smoothed_rtt based
   entirely on what it observes (see Section 4.3), and limits potential
   underestimation due to erroneously-reported delays by the peer.

   The RTT for a network path may change over time.
   If a path's actual RTT decreases, the min_rtt will adapt immediately
   on the first low sample.  If the path's actual RTT increases, the
   min_rtt will not adapt to it, allowing future RTT samples that are
   smaller than the new RTT to be included in smoothed_rtt.

4.3.  Estimating smoothed_rtt and rttvar

   smoothed_rtt is an exponentially-weighted moving average of an
   endpoint's RTT samples, and rttvar is the variation in the RTT
   samples, estimated using a mean variation.

   The calculation of smoothed_rtt uses path latency after adjusting RTT
   samples for acknowledgement delays.  These delays are computed using
   the ACK Delay field of the ACK frame as described in Section 19.3 of
   [QUIC-TRANSPORT].  For packets sent in the ApplicationData packet
   number space, a peer limits any delay in sending an acknowledgement
   for an ack-eliciting packet to no greater than the value it
   advertised in the max_ack_delay transport parameter.  Consequently,
   when a peer reports an Ack Delay that is greater than its
   max_ack_delay, the delay is attributed to reasons out of the peer's
   control, such as scheduler latency at the peer or loss of previous
   ACK frames.  Any delays beyond the peer's max_ack_delay are therefore
   considered effectively part of path delay and incorporated into the
   smoothed_rtt estimate.

   When adjusting an RTT sample using peer-reported acknowledgement
   delays, an endpoint:

   *  MUST ignore the Ack Delay field of the ACK frame for packets sent
      in the Initial and Handshake packet number spaces.

   *  MUST use the lesser of the value reported in the Ack Delay field
      of the ACK frame and the peer's max_ack_delay transport parameter.

   *  MUST NOT apply the adjustment if the resulting RTT sample is
      smaller than the min_rtt.  This limits the underestimation that a
      misreporting peer can cause to the smoothed_rtt.
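
   The three adjustment rules above can be sketched in Python as
   follows.  This is an illustration under stated assumptions, not part
   of the specification: the function name, the string labels for
   packet number spaces, and the use of integer milliseconds are all
   hypothetical choices made for this example.  The comparison mirrors
   the strict inequality used in this section's pseudocode.

```python
def adjust_rtt_sample(latest_rtt, min_rtt, reported_ack_delay,
                      max_ack_delay, space):
    """Return the RTT sample after accounting for peer-reported ack delay.

    All times are in milliseconds. `space` is a hypothetical label for the
    packet number space: "Initial", "Handshake", or "ApplicationData".
    """
    # Rule 1: ignore the Ack Delay field for Initial and Handshake packets.
    if space in ("Initial", "Handshake"):
        return latest_rtt

    # Rule 2: never trust more delay than the peer's advertised maximum.
    ack_delay = min(reported_ack_delay, max_ack_delay)

    # Rule 3: apply the adjustment only if the result stays above min_rtt
    # (same strict comparison as the draft's pseudocode).
    if min_rtt + ack_delay < latest_rtt:
        return latest_rtt - ack_delay
    return latest_rtt
```

   For example, with latest_rtt = 100, min_rtt = 60, and max_ack_delay =
   25, a reported delay of 50 is capped at 25, giving an adjusted sample
   of 75.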

   On the first RTT sample for a network path, the smoothed_rtt is set
   to the latest_rtt.

   smoothed_rtt and rttvar are computed as follows, similar to
   [RFC6298].  On the first RTT sample for a network path:

   smoothed_rtt = latest_rtt
   rttvar = latest_rtt / 2

   On subsequent RTT samples, smoothed_rtt and rttvar evolve as follows:

   ack_delay = min(Ack Delay in ACK Frame, max_ack_delay)
   adjusted_rtt = latest_rtt
   if (min_rtt + ack_delay < latest_rtt):
     adjusted_rtt = latest_rtt - ack_delay
   smoothed_rtt = 7/8 * smoothed_rtt + 1/8 * adjusted_rtt
   rttvar_sample = abs(smoothed_rtt - adjusted_rtt)
   rttvar = 3/4 * rttvar + 1/4 * rttvar_sample

5.  Loss Detection

   QUIC senders use acknowledgements to detect lost packets, and a probe
   timeout (see Section 5.2) to ensure acknowledgements are received.
   This section provides a description of these algorithms.

   If a packet is lost, the QUIC transport needs to recover from that
   loss, such as by retransmitting the data, sending an updated frame,
   or abandoning the frame.  For more information, see Section 13.3 of
   [QUIC-TRANSPORT].

5.1.  Acknowledgement-based Detection

   Acknowledgement-based loss detection implements the spirit of TCP's
   Fast Retransmit [RFC5681], Early Retransmit [RFC5827], FACK [FACK],
   SACK loss recovery [RFC6675], and RACK [RACK].  This section provides
   an overview of how these algorithms are implemented in QUIC.

   A packet is declared lost if it meets all the following conditions:

   *  The packet is unacknowledged, in-flight, and was sent prior to an
      acknowledged packet.

   *  Either its packet number is kPacketThreshold smaller than an
      acknowledged packet (Section 5.1.1), or it was sent long enough in
      the past (Section 5.1.2).

   The acknowledgement indicates that a packet sent later was delivered,
   and the packet and time thresholds provide some tolerance for packet
   reordering.
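
   As a rough illustration of the two conditions above, the following
   Python sketch scans the sent packets of one packet number space.  It
   is not the draft's own pseudocode: the SentPacket record, the
   detect_lost function, and the loss_delay parameter (the time
   threshold, in seconds) are hypothetical names introduced here, and
   "sent prior to an acknowledged packet" is approximated by comparing
   packet numbers, which is valid because packet numbers increase in
   send order.

```python
from dataclasses import dataclass

K_PACKET_THRESHOLD = 3  # initial packet reordering threshold


@dataclass
class SentPacket:
    packet_number: int
    time_sent: float        # seconds, on any monotonic clock
    in_flight: bool = True
    acked: bool = False


def detect_lost(sent, largest_acked, now, loss_delay):
    """Return packet numbers declared lost within one packet number space."""
    lost = []
    for p in sent:
        # Condition 1: unacknowledged, in-flight, and sent before an
        # acknowledged packet (lower packet number means sent earlier).
        if p.acked or not p.in_flight or p.packet_number >= largest_acked:
            continue
        # Condition 2a: at least K_PACKET_THRESHOLD behind the largest acked.
        by_packet_threshold = (
            p.packet_number <= largest_acked - K_PACKET_THRESHOLD)
        # Condition 2b: sent at least loss_delay ago.
        by_time_threshold = p.time_sent <= now - loss_delay
        if by_packet_threshold or by_time_threshold:
            lost.append(p.packet_number)
    return lost
```

   With packets 1 through 5 outstanding and only packet 5 acknowledged,
   packets 1 and 2 are declared lost by the packet threshold as soon as
   the ACK arrives, while packets 3 and 4 remain outstanding until the
   time threshold passes.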

   Spuriously declaring packets as lost leads to unnecessary
   retransmissions and may result in degraded performance due to the
   actions of the congestion controller upon detecting loss.
   Implementations that detect spurious retransmissions and increase the
   reordering threshold in packets or time MAY choose to start with
   smaller initial reordering thresholds to minimize recovery latency.

5.1.1.  Packet Threshold

   The RECOMMENDED initial value for the packet reordering threshold
   (kPacketThreshold) is 3, based on best practices for TCP loss
   detection [RFC5681] [RFC6675].  Implementations SHOULD NOT use a
   packet threshold less than 3, to keep in line with TCP [RFC5681].

   Some networks may exhibit higher degrees of reordering, causing a
   sender to detect spurious losses.  Implementers MAY use algorithms
   developed for TCP, such as TCP-NCR [RFC4653], to improve QUIC's
   reordering resilience.

5.1.2.  Time Threshold

   Once a later packet within the same packet number space has been
   acknowledged, an endpoint SHOULD declare an earlier packet lost if it
   was sent a threshold amount of time in the past.  To avoid declaring
   packets as lost too early, this time threshold MUST be set to at
   least kGranularity.  The time threshold is:

   max(kTimeThreshold * max(smoothed_rtt, latest_rtt), kGranularity)

   If packets sent prior to the largest acknowledged packet cannot yet
   be declared lost, then a timer SHOULD be set for the remaining time.

   Using max(smoothed_rtt, latest_rtt) protects against the following
   two cases:

   *  the latest RTT sample is lower than the smoothed RTT, perhaps due
      to reordering where the acknowledgement encountered a shorter
      path;

   *  the latest RTT sample is higher than the smoothed RTT, perhaps due
      to a sustained increase in the actual RTT, but the smoothed RTT
      has not yet caught up.
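
   The time threshold and the "remaining time" timer can be worked
   through numerically.  The sketch below is illustrative only: the
   function names are hypothetical, times are in milliseconds, the
   multiplier is the draft's recommended 9/8, and kGranularity is
   assumed to be 1 ms for the purposes of this example.

```python
K_TIME_THRESHOLD = 9 / 8   # RTT multiplier recommended by the draft
K_GRANULARITY_MS = 1.0     # assumed timer granularity for this sketch


def loss_delay_ms(smoothed_rtt_ms, latest_rtt_ms):
    """Time threshold after which an earlier packet is declared lost."""
    # Taking the larger RTT covers both a misleadingly low latest sample
    # and a smoothed estimate that lags behind a real RTT increase.
    rtt = max(smoothed_rtt_ms, latest_rtt_ms)
    return max(K_TIME_THRESHOLD * rtt, K_GRANULARITY_MS)


def time_until_lost_ms(time_sent_ms, now_ms, smoothed_rtt_ms, latest_rtt_ms):
    """Remaining time before a packet sent at time_sent_ms can be declared
    lost; 0.0 means the threshold has already passed."""
    deadline = time_sent_ms + loss_delay_ms(smoothed_rtt_ms, latest_rtt_ms)
    return max(0.0, deadline - now_ms)
```

   With smoothed_rtt = 100 ms and latest_rtt = 80 ms the threshold is
   112.5 ms; if latest_rtt jumps to 160 ms it becomes 180 ms.  A packet
   sent 50 ms ago in the first case would get a timer for the remaining
   62.5 ms.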

   The RECOMMENDED time threshold (kTimeThreshold), expressed as a
   round-trip time multiplier, is 9/8.

   Implementations MAY experiment with absolute thresholds, thresholds
   from previous connections, adaptive thresholds, or including RTT
   variation.  Smaller thresholds reduce reordering resilience and
   increase spurious retransmissions, and larger thresholds increase
   loss detection delay.

5.2.  Probe Timeout

   A Probe Timeout (PTO) triggers sending one or two probe datagrams
   when ack-eliciting packets are not acknowledged within the expected
   period of time or the handshake has not been completed.  A PTO
   enables a connection to recover from loss of tail packets or
   acknowledgements.

   As with loss detection, the probe timeout is per packet number space.
   The PTO algorithm used in QUIC implements the reliability functions
   of Tail Loss Probe [RACK], RTO [RFC5681], and F-RTO algorithms for
   TCP [RFC5682].  The timeout computation is based on TCP's
   retransmission timeout period [RFC6298].

5.2.1.  Computing PTO

   When an ack-eliciting packet is transmitted, the sender schedules a
   timer for the PTO period as follows:

   PTO = smoothed_rtt + max(4*rttvar, kGranularity) + max_ack_delay

   kGranularity, smoothed_rtt, rttvar, and max_ack_delay are defined in
   Appendix A.2 and Appendix A.3.

   The PTO period is the amount of time that a sender ought to wait for
   an acknowledgement of a sent packet.  This time period includes the
   estimated network round-trip time (smoothed_rtt), the variation in
   the estimate (4*rttvar), and max_ack_delay, to account for the
   maximum time by which a receiver might delay sending an
   acknowledgement.  When the PTO is armed for Initial or Handshake
   packet number spaces, the max_ack_delay is 0, as specified in Section
   13.2.1 of [QUIC-TRANSPORT].

   The PTO value MUST be set to at least kGranularity, to avoid the
   timer expiring immediately.
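
   The PTO formula above can be written as a small Python function.
   This is a sketch under assumptions rather than a reference
   implementation: the function name and the string labels for packet
   number spaces are hypothetical, times are integer milliseconds, and
   kGranularity is assumed to be 1 ms.

```python
K_GRANULARITY_MS = 1  # assumed timer granularity for this sketch


def pto_ms(smoothed_rtt, rttvar, max_ack_delay, space="ApplicationData"):
    """Probe timeout period in milliseconds, before any backoff."""
    # The peer does not intentionally delay acknowledgements of Initial
    # and Handshake packets, so max_ack_delay is treated as 0 there.
    if space in ("Initial", "Handshake"):
        max_ack_delay = 0
    # The variation term is floored at the timer granularity, which also
    # keeps the whole period at or above kGranularity.
    return smoothed_rtt + max(4 * rttvar, K_GRANULARITY_MS) + max_ack_delay
```

   With smoothed_rtt = 100, rttvar = 10, and max_ack_delay = 25, the
   period is 165 ms in the ApplicationData space and 140 ms in the
   Handshake space.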

   A sender computes its PTO timer every time an ack-eliciting packet is
   sent.  When ack-eliciting packets are in-flight in multiple packet
   number spaces, the timer MUST be set for the packet number space with
   the earliest timeout, except for ApplicationData, which MUST be
   ignored until the handshake completes; see Section 4.1.1 of
   [QUIC-TLS].  Not arming the PTO for ApplicationData prioritizes
   completing the handshake and prevents the server from sending a 1-RTT
   packet on a PTO before it has the keys to process a 1-RTT packet.

   When a PTO timer expires, the PTO period MUST be set to twice its
   current value.  This exponential reduction in the sender's rate is
   important because consecutive PTOs might be caused by loss of packets
   or acknowledgements due to severe congestion.  Even when there are
   ack-eliciting packets in-flight in multiple packet number spaces, the
   exponential increase in probe timeout occurs across all spaces to
   prevent excess load on the network.  For example, a timeout in the
   Initial packet number space doubles the length of the timeout in the
   Handshake packet number space.

   The life of a connection that is experiencing consecutive PTOs is
   limited by the endpoint's idle timeout.

   The probe timer MUST NOT be set if the time threshold (Section 5.1.2)
   loss detection timer is set.  The time threshold loss detection timer
   is expected to both expire earlier than the PTO and be less likely to
   spuriously retransmit data.

5.3.  Handshakes and New Paths

   The initial probe timeout for a new connection or new path SHOULD be
   set to twice the initial RTT.  Resumed connections over the same
   network SHOULD use the previous connection's final smoothed RTT value
   as the resumed connection's initial RTT.  If no previous RTT is
   available, the initial RTT SHOULD be set to 500ms, resulting in a 1
   second initial timeout as recommended in [RFC6298].
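
   The initial timeout and the doubling on each expiry can be combined
   in a short Python sketch.  The names below (initial_pto_ms,
   backed_off_pto_ms, and the pto_count argument counting consecutive
   expirations) are hypothetical and chosen for this example; times are
   integer milliseconds.

```python
K_INITIAL_RTT_MS = 500  # default initial RTT when no prior estimate exists


def initial_pto_ms(previous_smoothed_rtt_ms=None):
    """Initial probe timeout for a new connection or path."""
    # A resumed connection reuses the previous connection's final
    # smoothed RTT; otherwise fall back to the 500 ms default.
    if previous_smoothed_rtt_ms is None:
        initial_rtt = K_INITIAL_RTT_MS
    else:
        initial_rtt = previous_smoothed_rtt_ms
    return 2 * initial_rtt


def backed_off_pto_ms(base_pto_ms, pto_count):
    """PTO period after pto_count consecutive expirations.

    The count is shared across packet number spaces, so an expiry in the
    Initial space also doubles the Handshake timeout.
    """
    return base_pto_ms * (2 ** pto_count)
```

   A fresh connection starts with a 1000 ms timeout; after three
   consecutive expirations, a 165 ms base period has grown to 1320 ms.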

   A connection MAY use the delay between sending a PATH_CHALLENGE and
   receiving a PATH_RESPONSE to set the initial RTT (see kInitialRtt in
   Appendix A.2) for a new path, but the delay SHOULD NOT be considered
   an RTT sample.

   Until the server has validated the client's address on the path, the
   amount of data it can send is limited to three times the amount of
   data received, as specified in Section 8.1 of [QUIC-TRANSPORT].  If
   no data can be sent, then the PTO timer MUST NOT be armed until
   datagrams have been received from the client.

   Since the server could be blocked until more packets are received
   from the client, it is the client's responsibility to send packets to
   unblock the server until it is certain that the server has finished
   its address validation (see Section 8 of [QUIC-TRANSPORT]).  That is,
   the client MUST set the probe timer if the client has not received an
   acknowledgement for one of its Handshake or 1-RTT packets.

   Prior to handshake completion, when few or no RTT samples have been
   generated, it is possible that the probe timer expiration is due to
   an incorrect RTT estimate at the client.  To allow the client to
   improve its RTT estimate, the new packet that it sends MUST be
   ack-eliciting.  If Handshake keys are available to the client, it
   MUST send a Handshake packet, and otherwise it MUST send an Initial
   packet in a UDP datagram of at least 1200 bytes.

   Initial packets and Handshake packets might never be acknowledged,
   but they are removed from bytes in flight when the Initial and
   Handshake keys are discarded.

5.3.1.  Sending Probe Packets

   When a PTO timer expires, a sender MUST send at least one
   ack-eliciting packet in the packet number space as a probe, unless
   there is no data available to send.
An endpoint MAY send up to two full-sized datagrams containing
ack-eliciting packets, either to avoid an expensive consecutive PTO
expiration due to a single lost datagram or to transmit data from
multiple packet number spaces.

In addition to sending data in the packet number space for which the
timer expired, the sender SHOULD send ack-eliciting packets from
other packet number spaces with in-flight data, coalescing packets if
possible.

When the PTO timer expires, and there is new or previously sent
unacknowledged data, it MUST be sent.

It is possible the sender has no new or previously sent data to send.
As an example, consider the following sequence of events: new
application data is sent in a STREAM frame, deemed lost, then
retransmitted in a new packet, and then the original transmission is
acknowledged. When there is no data to send, the sender SHOULD send
a PING or other ack-eliciting frame in a single packet, re-arming the
PTO timer.

Alternatively, instead of sending an ack-eliciting packet, the sender
MAY mark any packets still in flight as lost. Doing so avoids
sending an additional packet, but increases the risk that loss is
declared too aggressively, resulting in an unnecessary rate reduction
by the congestion controller.

Consecutive PTO periods increase exponentially, and as a result,
connection recovery latency increases exponentially as packets
continue to be dropped in the network. Sending two packets on PTO
expiration increases resilience to packet drops, thus reducing the
probability of consecutive PTO events.

Probe packets sent on a PTO MUST be ack-eliciting. A probe packet
SHOULD carry new data when possible. A probe packet MAY carry
retransmitted unacknowledged data when new data is unavailable, when
flow control does not permit new data to be sent, or to
opportunistically reduce loss recovery delay.
Implementations MAY use alternative strategies for determining the
content of probe packets, including sending new or retransmitted data
based on the application's priorities.

When the PTO timer expires multiple times and new data cannot be
sent, implementations must choose between sending the same payload
every time or sending different payloads. Sending the same payload
may be simpler and ensures the highest priority frames arrive first.
Sending different payloads each time reduces the chances of spurious
retransmission.

5.3.2. Loss Detection

Delivery or loss of packets in flight is established when an ACK
frame is received that newly acknowledges one or more packets.

A PTO timer expiration event does not indicate packet loss and MUST
NOT cause prior unacknowledged packets to be marked as lost. When an
acknowledgement is received that newly acknowledges packets, loss
detection proceeds as dictated by packet and time threshold
mechanisms; see Section 5.1.

5.4. Handling Retry Packets

A Retry packet causes a client to send another Initial packet,
effectively restarting the connection process. A Retry packet
indicates that the Initial was received, but not processed. A Retry
packet cannot be treated as an acknowledgment, because it does not
indicate that a packet was processed or specify the packet number.

Clients that receive a Retry packet reset congestion control and loss
recovery state, including resetting any pending timers. Other
connection state, in particular cryptographic handshake messages, is
retained; see Section 17.2.5 of [QUIC-TRANSPORT].

The client MAY compute an RTT estimate to the server as the time
period from when the first Initial was sent to when a Retry or a
Version Negotiation packet is received. The client MAY use this
value in place of its default for the initial RTT estimate.

5.5. Discarding Keys and Packet State

When packet protection keys are discarded (see Section 4.10 of
[QUIC-TLS]), all packets that were sent with those keys can no longer
be acknowledged because their acknowledgements cannot be processed
anymore. The sender MUST discard all recovery state associated with
those packets and MUST remove them from the count of bytes in flight.

Endpoints stop sending and receiving Initial packets once they start
exchanging Handshake packets (see Section 17.2.2.1 of
[QUIC-TRANSPORT]). At this point, recovery state for all in-flight
Initial packets is discarded.

When 0-RTT is rejected, recovery state for all in-flight 0-RTT
packets is discarded.

If a server accepts 0-RTT, but does not buffer 0-RTT packets that
arrive before Initial packets, early 0-RTT packets will be declared
lost, but that is expected to be infrequent.

It is expected that keys are discarded after packets encrypted with
them would be acknowledged or declared lost. Initial secrets,
however, might be destroyed sooner, as soon as handshake keys are
available (see Section 4.10.1 of [QUIC-TLS]).

6. Congestion Control

This document specifies a NewReno congestion controller for QUIC
[RFC6582].

The signals QUIC provides for congestion control are generic and are
designed to support different algorithms. Endpoints can unilaterally
choose a different algorithm to use, such as Cubic [RFC8312].

If an endpoint uses a different controller than that specified in
this document, the chosen controller MUST conform to the congestion
control guidelines specified in Section 3.1 of [RFC8085].

The algorithm in this document specifies and uses the controller's
congestion window in bytes.
An endpoint MUST NOT send a packet if it would cause bytes_in_flight
(see Appendix B.2) to be larger than the congestion window, unless
the packet is sent on a PTO timer expiration (see Section 5.2).

6.1. Explicit Congestion Notification

If a path has been verified to support ECN [RFC3168] [RFC8311], QUIC
treats a Congestion Experienced (CE) codepoint in the IP header as a
signal of congestion. This document specifies an endpoint's response
when its peer receives packets with the Congestion Experienced
codepoint.

6.2. Slow Start

QUIC begins every connection in slow start and exits slow start upon
loss or upon increase in the ECN-CE counter. QUIC re-enters slow
start any time the congestion window is less than ssthresh, which
only occurs after persistent congestion is declared. While in slow
start, QUIC increases the congestion window by the number of bytes
acknowledged when each acknowledgment is processed.

6.3. Congestion Avoidance

Slow start exits to congestion avoidance. Congestion avoidance in
NewReno uses an additive increase multiplicative decrease (AIMD)
approach that increases the congestion window by one maximum packet
size per congestion window acknowledged. When a loss is detected,
NewReno halves the congestion window and sets the slow start
threshold to the new congestion window.

6.4. Recovery Period

A recovery period is entered when loss or ECN-CE marking of a packet
is detected. A recovery period ends when a packet sent during the
recovery period is acknowledged. This is slightly different from
TCP's definition of recovery, which ends when the lost packet that
started recovery is acknowledged.

The recovery period limits congestion window reduction to once per
round trip. During recovery, the congestion window remains unchanged
irrespective of new losses or increases in the ECN-CE counter.

6.5. Ignoring Loss of Undecryptable Packets

During the handshake, some packet protection keys might not be
available when a packet arrives. In particular, Handshake and 0-RTT
packets cannot be processed until the Initial packets arrive, and
1-RTT packets cannot be processed until the handshake completes.
Endpoints MAY ignore the loss of Handshake, 0-RTT, and 1-RTT packets
that might arrive before the peer has packet protection keys to
process those packets.

6.6. Probe Timeout

Probe packets MUST NOT be blocked by the congestion controller. A
sender MUST however count these packets as being additionally in
flight, since these packets add network load without establishing
packet loss. Note that sending probe packets might cause the
sender's bytes in flight to exceed the congestion window until an
acknowledgement is received that establishes loss or delivery of
packets.

6.7. Persistent Congestion

When an ACK frame is received that establishes loss of all in-flight
packets sent over a long enough period of time, the network is
considered to be experiencing persistent congestion. Commonly, this
can be established by consecutive PTOs, but since the PTO timer is
reset when a new ack-eliciting packet is sent, an explicit duration
must be used to account for those cases where PTOs do not occur or
are substantially delayed.
This duration is computed as follows:

   (smoothed_rtt + 4 * rttvar + max_ack_delay) *
       kPersistentCongestionThreshold

For example, assume:

   smoothed_rtt = 1
   rttvar = 0
   max_ack_delay = 0
   kPersistentCongestionThreshold = 3

If an ack-eliciting packet is sent at time = 0, the following
scenario would illustrate persistent congestion:

   +------+------------------------+
   | Time | Action                 |
   +======+========================+
   | t=0  | Send Pkt #1 (App Data) |
   +------+------------------------+
   | t=1  | Send Pkt #2 (PTO 1)    |
   +------+------------------------+
   | t=3  | Send Pkt #3 (PTO 2)    |
   +------+------------------------+
   | t=7  | Send Pkt #4 (PTO 3)    |
   +------+------------------------+
   | t=8  | Recv ACK of Pkt #4     |
   +------+------------------------+

                Table 1

The first three packets are determined to be lost when the
acknowledgement of packet 4 is received at t=8. The congestion period
is calculated as the time between the oldest and newest lost packets:
(3 - 0) = 3. The duration for persistent congestion is equal to:
(1 * kPersistentCongestionThreshold) = 3. Because the threshold was
reached and because none of the packets between the oldest and the
newest packets are acknowledged, the network is considered to have
experienced persistent congestion.

When persistent congestion is established, the sender's congestion
window MUST be reduced to the minimum congestion window
(kMinimumWindow). This response of collapsing the congestion window
on persistent congestion is functionally similar to a sender's
response on a Retransmission Timeout (RTO) in TCP [RFC5681] after
Tail Loss Probes (TLP) [RACK].

6.8. Pacing

This document does not specify a pacer, but it is RECOMMENDED that a
sender pace sending of all in-flight packets based on input from the
congestion controller.
For example, a pacer might distribute the congestion window over the
smoothed RTT when used with a window-based controller, and a pacer
might use the rate estimate of a rate-based controller.

An implementation should take care to architect its congestion
controller to work well with a pacer. For instance, a pacer might
wrap the congestion controller and control the availability of the
congestion window, or a pacer might pace out packets handed to it by
the congestion controller. Timely delivery of ACK frames is
important for efficient loss recovery. Packets containing only ACK
frames should therefore not be paced, to avoid delaying their
delivery to the peer.

Sending multiple packets into the network without any delay between
them creates a packet burst that might cause short-term congestion
and losses. Implementations MUST either use pacing or limit such
bursts to the initial congestion window, which is recommended to be
the minimum of 10 * max_datagram_size and
max(2 * max_datagram_size, 14720), where max_datagram_size is the
current maximum size of a datagram for the connection, not including
UDP or IP overhead.

As an example of a well-known and publicly available implementation
of a flow pacer, implementers are referred to the Fair Queue packet
scheduler (fq qdisc) in Linux (3.11 onwards).

6.9. Under-utilizing the Congestion Window

When bytes in flight is smaller than the congestion window and
sending is not pacing limited, the congestion window is
under-utilized. When this occurs, the congestion window SHOULD NOT be
increased in either slow start or congestion avoidance. This can
happen due to insufficient application data or flow control credit.

A sender MAY use the pipeACK method described in Section 4.3 of
[RFC7661] to determine if the congestion window is sufficiently
utilized.
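The basic under-utilization test from the first paragraph of this
section (not the pipeACK method) can be sketched as follows. This is
a minimal illustration, not part of the specified algorithm: the
function names are hypothetical, only slow start growth is shown, and
the point at which an implementation samples bytes_in_flight (for
example, when it last had the opportunity to send) is left as an
assumption.

```python
def is_underutilized(bytes_in_flight, congestion_window, pacing_limited):
    """True when the window is not full and pacing is not the cause."""
    return bytes_in_flight < congestion_window and not pacing_limited


def slow_start_on_ack(congestion_window, acked_bytes, bytes_in_flight,
                      pacing_limited):
    """Return the new congestion window after an ACK in slow start.

    The window grows by the acknowledged bytes, except when the
    sender was not making full use of the window it already had.
    """
    if is_underutilized(bytes_in_flight, congestion_window, pacing_limited):
        return congestion_window
    return congestion_window + acked_bytes
```

In this sketch a sender that is held back only by its pacer still
grows its window, while a sender with too little application data
does not.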
A sender that paces packets (see Section 6.8) might delay sending
packets and not fully utilize the congestion window due to this
delay. A sender should not consider itself application limited if it
would have fully utilized the congestion window without pacing delay.

A sender MAY implement alternative mechanisms to update its
congestion window after periods of under-utilization, such as those
proposed for TCP in [RFC7661].

7. Security Considerations

7.1. Congestion Signals

Congestion control fundamentally involves the consumption of signals
(both loss and ECN codepoints) from unauthenticated entities.
On-path attackers can spoof or alter these signals. An attacker can
cause endpoints to reduce their sending rate by dropping packets, or
alter the sending rate by changing ECN codepoints.

7.2. Traffic Analysis

Packets that carry only ACK frames can be heuristically identified by
observing packet size. Acknowledgement patterns may expose
information about link characteristics or application behavior.
Endpoints can use PADDING frames or bundle acknowledgments with other
frames to reduce leaked information.

7.3. Misreporting ECN Markings

A receiver can misreport ECN markings to alter the congestion
response of a sender. Suppressing reports of ECN-CE markings could
cause a sender to increase its send rate. This increase could
result in congestion and loss.

A sender MAY attempt to detect suppression of reports by marking
occasional packets that it sends with ECN-CE. If a packet sent with
ECN-CE is not reported as having been CE marked when the packet is
acknowledged, then the sender SHOULD disable ECN for that path.

Reporting additional ECN-CE markings will cause a sender to reduce
its sending rate, which is similar in effect to advertising reduced
connection flow control limits, so no advantage is gained by doing
so.
Endpoints choose the congestion controller that they use. Though
congestion controllers generally treat reports of ECN-CE markings as
equivalent to loss [RFC8311], the exact response for each controller
could be different. Failure to correctly respond to information
about ECN markings is therefore difficult to detect.

8. IANA Considerations

This document has no IANA actions. Yet.

9. References

9.1. Normative References

[QUIC-TLS] Thomson, M., Ed. and S. Turner, Ed., "Using TLS to Secure
           QUIC", Work in Progress, Internet-Draft,
           draft-ietf-quic-tls-27, 9 March 2020.

[QUIC-TRANSPORT]
           Iyengar, J., Ed. and M. Thomson, Ed., "QUIC: A UDP-Based
           Multiplexed and Secure Transport", Work in Progress,
           Internet-Draft, draft-ietf-quic-transport-27, 9 March
           2020.

[RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
           Requirement Levels", BCP 14, RFC 2119,
           DOI 10.17487/RFC2119, March 1997.

[RFC8085]  Eggert, L., Fairhurst, G., and G. Shepherd, "UDP Usage
           Guidelines", BCP 145, RFC 8085, DOI 10.17487/RFC8085,
           March 2017.

[RFC8174]  Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC
           2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174,
           May 2017.

9.2. Informative References

[FACK]     Mathis, M. and J. Mahdavi, "Forward Acknowledgement:
           Refining TCP Congestion Control", ACM SIGCOMM, August
           1996.

[RACK]     Cheng, Y., Cardwell, N., Dukkipati, N., and P. Jha, "RACK:
           a time-based fast loss detection algorithm for TCP", Work
           in Progress, Internet-Draft, draft-ietf-tcpm-rack-07, 17
           January 2020.

[RFC3168]  Ramakrishnan, K., Floyd, S., and D. Black, "The Addition
           of Explicit Congestion Notification (ECN) to IP",
           RFC 3168, DOI 10.17487/RFC3168, September 2001.

[RFC4653]  Bhandarkar, S., Reddy, A. L. N., Allman, M., and E.
           Blanton, "Improving the Robustness of TCP to
           Non-Congestion Events", RFC 4653, DOI 10.17487/RFC4653,
           August 2006.

[RFC5681]  Allman, M., Paxson, V., and E. Blanton, "TCP Congestion
           Control", RFC 5681, DOI 10.17487/RFC5681, September 2009.

[RFC5682]  Sarolahti, P., Kojo, M., Yamamoto, K., and M. Hata,
           "Forward RTO-Recovery (F-RTO): An Algorithm for Detecting
           Spurious Retransmission Timeouts with TCP", RFC 5682,
           DOI 10.17487/RFC5682, September 2009.

[RFC5827]  Allman, M., Avrachenkov, K., Ayesta, U., Blanton, J., and
           P. Hurtig, "Early Retransmit for TCP and Stream Control
           Transmission Protocol (SCTP)", RFC 5827,
           DOI 10.17487/RFC5827, May 2010.

[RFC6298]  Paxson, V., Allman, M., Chu, J., and M. Sargent,
           "Computing TCP's Retransmission Timer", RFC 6298,
           DOI 10.17487/RFC6298, June 2011.

[RFC6582]  Henderson, T., Floyd, S., Gurtov, A., and Y. Nishida, "The
           NewReno Modification to TCP's Fast Recovery Algorithm",
           RFC 6582, DOI 10.17487/RFC6582, April 2012.

[RFC6675]  Blanton, E., Allman, M., Wang, L., Jarvinen, I., Kojo, M.,
           and Y. Nishida, "A Conservative Loss Recovery Algorithm
           Based on Selective Acknowledgment (SACK) for TCP",
           RFC 6675, DOI 10.17487/RFC6675, August 2012.

[RFC6928]  Chu, J., Dukkipati, N., Cheng, Y., and M. Mathis,
           "Increasing TCP's Initial Window", RFC 6928,
           DOI 10.17487/RFC6928, April 2013.

[RFC7661]  Fairhurst, G., Sathiaseelan, A., and R. Secchi, "Updating
           TCP to Support Rate-Limited Traffic", RFC 7661,
           DOI 10.17487/RFC7661, October 2015.

[RFC8311]  Black, D., "Relaxing Restrictions on Explicit Congestion
           Notification (ECN) Experimentation", RFC 8311,
           DOI 10.17487/RFC8311, January 2018.

[RFC8312]  Rhee, I., Xu, L., Ha, S., Zimmermann, A., Eggert, L., and
           R. Scheffenegger, "CUBIC for Fast Long-Distance Networks",
           RFC 8312, DOI 10.17487/RFC8312, February 2018.

Appendix A. Loss Recovery Pseudocode

We now describe an example implementation of the loss detection
mechanisms described in Section 5.

A.1. Tracking Sent Packets

To correctly implement congestion control, a QUIC sender tracks every
ack-eliciting packet until the packet is acknowledged or lost. It is
expected that implementations will be able to access this information
by packet number and crypto context and store the per-packet fields
(Appendix A.1.1) for loss recovery and congestion control.

After a packet is declared lost, the endpoint can track it for an
amount of time comparable to the maximum expected packet reordering,
such as 1 RTT. This allows for detection of spurious
retransmissions.

Sent packets are tracked for each packet number space, and ACK
processing only applies to a single space.

A.1.1. Sent Packet Fields

packet_number:  The packet number of the sent packet.

ack_eliciting:  A boolean that indicates whether a packet is
   ack-eliciting. If true, it is expected that an acknowledgement
   will be received, though the peer could delay sending the ACK
   frame containing it by up to max_ack_delay.

in_flight:  A boolean that indicates whether the packet counts
   towards bytes in flight.

sent_bytes:  The number of bytes sent in the packet, not including
   UDP or IP overhead, but including QUIC framing overhead.

time_sent:  The time the packet was sent.

A.2. Constants of interest

Constants used in loss recovery are based on a combination of RFCs,
papers, and common practice.

kPacketThreshold:  Maximum reordering in packets before packet
   threshold loss detection considers a packet lost. The RECOMMENDED
   value is 3.
kTimeThreshold:  Maximum reordering in time before time threshold
   loss detection considers a packet lost. Specified as an RTT
   multiplier. The RECOMMENDED value is 9/8.

kGranularity:  Timer granularity. This is a system-dependent value.
   However, implementations SHOULD use a value no smaller than 1ms.

kInitialRtt:  The RTT used before an RTT sample is taken. The
   RECOMMENDED value is 500ms.

kPacketNumberSpace:  An enum to enumerate the three packet number
   spaces.

   enum kPacketNumberSpace {
     Initial,
     Handshake,
     ApplicationData,
   }

A.3. Variables of interest

Variables required to implement the loss detection mechanisms are
described in this section.

latest_rtt:  The most recent RTT measurement made when receiving an
   ack for a previously unacked packet.

smoothed_rtt:  The smoothed RTT of the connection, computed as
   described in [RFC6298].

rttvar:  The RTT variation, computed as described in [RFC6298].

min_rtt:  The minimum RTT seen in the connection, ignoring ack delay.

max_ack_delay:  The maximum amount of time by which the receiver
   intends to delay acknowledgments for packets in the
   ApplicationData packet number space. The actual ack_delay in a
   received ACK frame may be larger due to late timers, reordering,
   or lost ACK frames.

loss_detection_timer:  Multi-modal timer used for loss detection.

pto_count:  The number of times a PTO has been sent without receiving
   an ack.

time_of_last_sent_ack_eliciting_packet[kPacketNumberSpace]:  The time
   the most recent ack-eliciting packet was sent.

largest_acked_packet[kPacketNumberSpace]:  The largest packet number
   acknowledged in the packet number space so far.

loss_time[kPacketNumberSpace]:  The time at which the next packet in
   that packet number space will be considered lost based on
   exceeding the reordering window in time.
sent_packets[kPacketNumberSpace]:  An association of packet numbers
   in a packet number space to information about them. Described in
   detail above in Appendix A.1.

A.4. Initialization

At the beginning of the connection, initialize the loss detection
variables as follows:

   loss_detection_timer.reset()
   pto_count = 0
   latest_rtt = 0
   smoothed_rtt = 0
   rttvar = 0
   min_rtt = 0
   max_ack_delay = 0
   for pn_space in [ Initial, Handshake, ApplicationData ]:
     largest_acked_packet[pn_space] = infinite
     time_of_last_sent_ack_eliciting_packet[pn_space] = 0
     loss_time[pn_space] = 0

A.5. On Sending a Packet

After a packet is sent, information about the packet is stored. The
parameters to OnPacketSent are described in detail above in
Appendix A.1.1.

Pseudocode for OnPacketSent follows:

   OnPacketSent(packet_number, pn_space, ack_eliciting,
                in_flight, sent_bytes):
     sent_packets[pn_space][packet_number].packet_number =
                                              packet_number
     sent_packets[pn_space][packet_number].time_sent = now
     sent_packets[pn_space][packet_number].ack_eliciting =
                                              ack_eliciting
     sent_packets[pn_space][packet_number].in_flight = in_flight
     if (in_flight):
       if (ack_eliciting):
         time_of_last_sent_ack_eliciting_packet[pn_space] = now
       OnPacketSentCC(sent_bytes)
       sent_packets[pn_space][packet_number].size = sent_bytes
       SetLossDetectionTimer()

A.6. On Receiving an Acknowledgment

When an ACK frame is received, it may newly acknowledge any number of
packets.

Pseudocode for OnAckReceived and UpdateRtt follows:

   OnAckReceived(ack, pn_space):
     if (largest_acked_packet[pn_space] == infinite):
       largest_acked_packet[pn_space] = ack.largest_acked
     else:
       largest_acked_packet[pn_space] =
           max(largest_acked_packet[pn_space], ack.largest_acked)

     // Nothing to do if there are no newly acked packets.
     newly_acked_packets = DetermineNewlyAckedPackets(ack, pn_space)
     if (newly_acked_packets.empty()):
       return

     // If the largest acknowledged is newly acked and
     // at least one ack-eliciting was newly acked, update the RTT.
     if (sent_packets[pn_space].contains(ack.largest_acked) &&
         IncludesAckEliciting(newly_acked_packets)):
       latest_rtt =
         now - sent_packets[pn_space][ack.largest_acked].time_sent
       ack_delay = 0
       if (pn_space == ApplicationData):
         ack_delay = ack.ack_delay
       UpdateRtt(ack_delay)

     // Process ECN information if present.
     if (ACK frame contains ECN information):
       ProcessECN(ack, pn_space)

     // OnPacketAcked takes the sent-packet record itself.
     for acked_packet in newly_acked_packets:
       OnPacketAcked(acked_packet, pn_space)

     DetectLostPackets(pn_space)

     pto_count = 0

     SetLossDetectionTimer()


   UpdateRtt(ack_delay):
     // First RTT sample.
     if (smoothed_rtt == 0):
       min_rtt = latest_rtt
       smoothed_rtt = latest_rtt
       rttvar = latest_rtt / 2
       return

     // min_rtt ignores ack delay.
     min_rtt = min(min_rtt, latest_rtt)
     // Limit ack_delay by max_ack_delay
     ack_delay = min(ack_delay, max_ack_delay)
     // Adjust for ack delay if plausible.
     adjusted_rtt = latest_rtt
     if (latest_rtt > min_rtt + ack_delay):
       adjusted_rtt = latest_rtt - ack_delay

     rttvar = 3/4 * rttvar + 1/4 * abs(smoothed_rtt - adjusted_rtt)
     smoothed_rtt = 7/8 * smoothed_rtt + 1/8 * adjusted_rtt

A.7. On Packet Acknowledgment

When a packet is acknowledged for the first time, the following
OnPacketAcked function is called. Note that a single ACK frame may
newly acknowledge several packets. OnPacketAcked must be called once
for each of these newly acknowledged packets.

OnPacketAcked takes two parameters: acked_packet, which is the struct
detailed in Appendix A.1.1, and the packet number space that this ACK
frame was sent for.
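For orientation, the per-packet record referred to above might be
represented as in the following Python sketch. The field names follow
Appendix A.1.1. The class name SentPacket, the helper functions, the
string keys for packet number spaces, and the dictionary layout are
assumptions made for this illustration.

```python
from dataclasses import dataclass


@dataclass
class SentPacket:
    """Per-packet fields from Appendix A.1.1."""
    packet_number: int
    ack_eliciting: bool
    in_flight: bool
    sent_bytes: int
    time_sent: float


# sent_packets[pn_space][packet_number] -> SentPacket
sent_packets = {"Initial": {}, "Handshake": {}, "ApplicationData": {}}


def record_sent(pn_space, packet):
    """Store the record for a packet that was just sent."""
    sent_packets[pn_space][packet.packet_number] = packet


def take_acked(pn_space, packet_number):
    """Remove and return the record of a newly acknowledged packet.

    Returns None if the packet is not tracked, for example because
    it was already acknowledged or declared lost.
    """
    return sent_packets[pn_space].pop(packet_number, None)
```

Keeping one map per packet number space matches the rule that ACK
processing applies to a single space at a time.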
Pseudocode for OnPacketAcked follows:

   OnPacketAcked(acked_packet, pn_space):
     if (acked_packet.in_flight):
       OnPacketAckedCC(acked_packet)
     sent_packets[pn_space].remove(acked_packet.packet_number)

A.8. Setting the Loss Detection Timer

QUIC loss detection uses a single timer for all timeout loss
detection. The duration of the timer is based on the timer's mode,
which is set in the packet and timer events further below. The
function SetLossDetectionTimer defined below shows how the single
timer is set.

This algorithm may result in the timer being set in the past,
particularly if timers wake up late. Timers set in the past SHOULD
fire immediately.

Pseudocode for SetLossDetectionTimer follows:

   GetEarliestTimeAndSpace(times):
     time = times[Initial]
     space = Initial
     for pn_space in [ Handshake, ApplicationData ]:
       if (times[pn_space] != 0 &&
           (time == 0 || times[pn_space] < time) &&
           // Skip ApplicationData until handshake completion.
           (pn_space != ApplicationData ||
             IsHandshakeComplete())):
         time = times[pn_space]
         space = pn_space
     return time, space

   PeerNotAwaitingAddressValidation():
     // Assume clients validate the server's address implicitly.
     if (endpoint is server):
       return true
     // Servers complete address validation when a
     // protected packet is received.
     return has received Handshake ACK ||
            has received 1-RTT ACK

   SetLossDetectionTimer():
     earliest_loss_time, _ = GetEarliestTimeAndSpace(loss_time)
     if (earliest_loss_time != 0):
       // Time threshold loss detection.
       loss_detection_timer.update(earliest_loss_time)
       return

     if (no ack-eliciting packets in flight &&
         PeerNotAwaitingAddressValidation()):
       loss_detection_timer.cancel()
       return

     // Use a default timeout if there are no RTT measurements
     if (smoothed_rtt == 0):
       timeout = 2 * kInitialRtt
     else:
       // Calculate PTO duration
       timeout = smoothed_rtt + max(4 * rttvar, kGranularity) +
         max_ack_delay
     // Exponential backoff: 2 raised to the power pto_count.
     timeout = timeout * (2 ^ pto_count)

     sent_time, _ = GetEarliestTimeAndSpace(
       time_of_last_sent_ack_eliciting_packet)
     loss_detection_timer.update(sent_time + timeout)

A.9. On Timeout

When the loss detection timer expires, the timer's mode determines
the action to be performed.

Pseudocode for OnLossDetectionTimeout follows:

   OnLossDetectionTimeout():
     earliest_loss_time, pn_space =
       GetEarliestTimeAndSpace(loss_time)
     if (earliest_loss_time != 0):
       // Time threshold loss detection
       DetectLostPackets(pn_space)
       SetLossDetectionTimer()
       return

     if (endpoint is client without 1-RTT keys):
       // Client sends an anti-deadlock packet: Initial is padded
       // to earn more anti-amplification credit,
       // a Handshake packet proves address ownership.
       if (has Handshake keys):
         SendOneAckElicitingHandshakePacket()
       else:
         SendOneAckElicitingPaddedInitialPacket()
     else:
       // PTO. Send new data if available, else retransmit old data.
       // If neither is available, send a single PING frame.
       _, pn_space = GetEarliestTimeAndSpace(
         time_of_last_sent_ack_eliciting_packet)
       SendOneOrTwoAckElicitingPackets(pn_space)

     pto_count++
     SetLossDetectionTimer()

A.10. Detecting Lost Packets

DetectLostPackets is called every time an ACK is received or the time
threshold loss detection timer expires, and operates on the
sent_packets for that packet number space.
Pseudocode for DetectLostPackets follows:

   DetectLostPackets(pn_space):
     assert(largest_acked_packet[pn_space] != infinite)
     loss_time[pn_space] = 0
     lost_packets = {}
     loss_delay = kTimeThreshold * max(latest_rtt, smoothed_rtt)

     // Minimum time of kGranularity before packets are deemed lost.
     loss_delay = max(loss_delay, kGranularity)

     // Packets sent before this time are deemed lost.
     lost_send_time = now - loss_delay

     foreach unacked in sent_packets[pn_space]:
       if (unacked.packet_number > largest_acked_packet[pn_space]):
         continue

       // Mark packet as lost, or set time when it should be marked.
       if (unacked.time_sent <= lost_send_time ||
           largest_acked_packet[pn_space] >=
             unacked.packet_number + kPacketThreshold):
         sent_packets[pn_space].remove(unacked.packet_number)
         if (unacked.in_flight):
           lost_packets.insert(unacked)
       else:
         if (loss_time[pn_space] == 0):
           loss_time[pn_space] = unacked.time_sent + loss_delay
         else:
           loss_time[pn_space] = min(loss_time[pn_space],
                                     unacked.time_sent + loss_delay)

     // Inform the congestion controller of lost packets and
     // let it decide whether to retransmit immediately.
     if (!lost_packets.empty()):
       OnPacketsLost(lost_packets)

Appendix B. Congestion Control Pseudocode

We now describe an example implementation of the congestion
controller described in Section 6.

B.1. Constants of interest

Constants used in congestion control are based on a combination of
RFCs, papers, and common practice.

kInitialWindow:  Default limit on the initial amount of data in
   flight, in bytes. The RECOMMENDED value is the minimum of
   10 * max_datagram_size and max(2 * max_datagram_size, 14720).
   This follows the analysis and recommendations in [RFC6928],
   increasing the byte limit to account for the smaller 8 byte
   overhead of UDP compared to the 20 byte overhead for TCP.

   kMinimumWindow:  Minimum congestion window in bytes. The RECOMMENDED
      value is 2 * max_datagram_size.

   kLossReductionFactor:  Reduction in congestion window when a new loss
      event is detected. The RECOMMENDED value is 0.5.

   kPersistentCongestionThreshold:  Period of time for persistent
      congestion to be established, specified as a PTO multiplier. The
      rationale for this threshold is to enable a sender to use initial
      PTOs for aggressive probing, as TCP does with Tail Loss Probe
      (TLP) [RACK], before establishing persistent congestion, as TCP
      does with a Retransmission Timeout (RTO) [RFC5681]. The
      RECOMMENDED value for kPersistentCongestionThreshold is 3, which
      is approximately equivalent to having two TLPs before an RTO in
      TCP.

B.2.  Variables of interest

   Variables required to implement the congestion control mechanisms are
   described in this section.

   max_datagram_size:  The sender's current maximum payload size. Does
      not include UDP or IP overhead. The max datagram size is used for
      congestion window computations. An endpoint sets the value of
      this variable based on its PMTU (see Section 14.1 of
      [QUIC-TRANSPORT]), with a minimum value of 1200 bytes.

   ecn_ce_counters[kPacketNumberSpace]:  The highest value reported for
      the ECN-CE counter in the packet number space by the peer in an
      ACK frame. This value is used to detect increases in the reported
      ECN-CE counter.

   bytes_in_flight:  The sum of the size in bytes of all sent packets
      that contain at least one ack-eliciting or PADDING frame, and have
      not been acked or declared lost. The size does not include IP or
      UDP overhead, but does include the QUIC header and AEAD overhead.
      Packets only containing ACK frames do not count towards
      bytes_in_flight to ensure congestion control does not impede
      congestion feedback.

   congestion_window:  Maximum number of bytes-in-flight that may be
      sent.

   congestion_recovery_start_time:  The time when QUIC first detects
      congestion due to loss or ECN, causing it to enter congestion
      recovery. When a packet sent after this time is acknowledged,
      QUIC exits congestion recovery.

   ssthresh:  Slow start threshold in bytes. When the congestion window
      is below ssthresh, the mode is slow start and the window grows by
      the number of bytes acknowledged.

B.3.  Initialization

   At the beginning of the connection, initialize the congestion control
   variables as follows:

   congestion_window = kInitialWindow
   bytes_in_flight = 0
   congestion_recovery_start_time = 0
   ssthresh = infinite
   for pn_space in [ Initial, Handshake, ApplicationData ]:
     ecn_ce_counters[pn_space] = 0

B.4.  On Packet Sent

   Whenever a packet is sent, and it contains non-ACK frames, the packet
   increases bytes_in_flight.

   OnPacketSentCC(bytes_sent):
     bytes_in_flight += bytes_sent

B.5.  On Packet Acknowledgement

   Invoked from loss detection's OnPacketAcked and supplied with the
   acked_packet from sent_packets.

   InCongestionRecovery(sent_time):
     return sent_time <= congestion_recovery_start_time

   OnPacketAckedCC(acked_packet):
     // Remove from bytes_in_flight.
     bytes_in_flight -= acked_packet.size
     if (InCongestionRecovery(acked_packet.time_sent)):
       // Do not increase congestion window in recovery period.
       return
     if (IsAppOrFlowControlLimited()):
       // Do not increase congestion_window if application
       // limited or flow control limited.
       return
     if (congestion_window < ssthresh):
       // Slow start.
       congestion_window += acked_packet.size
     else:
       // Congestion avoidance.
       congestion_window += max_datagram_size * acked_packet.size
           / congestion_window

B.6.  On New Congestion Event

   Invoked from ProcessECN and OnPacketsLost when a new congestion event
   is detected. May start a new recovery period, in which case it also
   reduces the congestion window.

   CongestionEvent(sent_time):
     // Start a new congestion event if packet was sent after the
     // start of the previous congestion recovery period.
     if (!InCongestionRecovery(sent_time)):
       congestion_recovery_start_time = now()
       congestion_window *= kLossReductionFactor
       congestion_window = max(congestion_window, kMinimumWindow)
       ssthresh = congestion_window

B.7.  Process ECN Information

   Invoked when an ACK frame with an ECN section is received from the
   peer.

   ProcessECN(ack, pn_space):
     // If the ECN-CE counter reported by the peer has increased,
     // this could be a new congestion event.
     if (ack.ce_counter > ecn_ce_counters[pn_space]):
       ecn_ce_counters[pn_space] = ack.ce_counter
       sent_time =
         sent_packets[pn_space][ack.largest_acked].time_sent
       CongestionEvent(sent_time)

B.8.  On Packets Lost

   Invoked from DetectLostPackets when packets are deemed lost.

   InPersistentCongestion(largest_lost_packet):
     pto = smoothed_rtt + max(4 * rttvar, kGranularity) +
       max_ack_delay
     congestion_period = pto * kPersistentCongestionThreshold
     // Determine if all packets in the time period before the
     // newest lost packet, including the edges, are marked
     // lost
     return AreAllPacketsLost(largest_lost_packet,
                              congestion_period)

   OnPacketsLost(lost_packets):
     // Remove lost packets from bytes_in_flight.
     for (lost_packet : lost_packets):
       bytes_in_flight -= lost_packet.size
     largest_lost_packet = lost_packets.last()
     CongestionEvent(largest_lost_packet.time_sent)

     // Collapse congestion window if persistent congestion
     if (InPersistentCongestion(largest_lost_packet)):
       congestion_window = kMinimumWindow

Appendix C.  Change Log

      *RFC Editor's Note:* Please remove this section prior to
      publication of a final version of this document.

   Issue and pull request numbers are listed with a leading octothorp.

C.1.  Since draft-ietf-quic-recovery-26

   No changes.

C.2.  Since draft-ietf-quic-recovery-25

   No significant changes.

C.3.  Since draft-ietf-quic-recovery-24

   *  Require congestion control of some sort (#3247, #3244, #3248)

   *  Set a minimum reordering threshold (#3256, #3240)

   *  PTO is specific to a packet number space (#3067, #3074, #3066)

C.4.  Since draft-ietf-quic-recovery-23

   *  Define under-utilizing the congestion window (#2630, #2686, #2675)

   *  PTO MUST send data if possible (#3056, #3057)

   *  Connection Close is not ack-eliciting (#3097, #3098)

   *  MUST limit bursts to the initial congestion window (#3160)

   *  Define the current max_datagram_size for congestion control
      (#3041, #3167)

C.5.  Since draft-ietf-quic-recovery-22

   *  PTO should always send an ack-eliciting packet (#2895)

   *  Unify the Handshake Timer with the PTO timer (#2648, #2658, #2886)

   *  Move ACK generation text to transport draft (#1860, #2916)

C.6.  Since draft-ietf-quic-recovery-21

   *  No changes

C.7.  Since draft-ietf-quic-recovery-20

   *  Path validation can be used as initial RTT value (#2644, #2687)

   *  max_ack_delay transport parameter defaults to 0 (#2638, #2646)

   *  Ack Delay only measures intentional delays induced by the
      implementation (#2596, #2786)

C.8.  Since draft-ietf-quic-recovery-19

   *  Change kPersistentThreshold from an exponent to a multiplier
      (#2557)

   *  Send a PING if the PTO timer fires and there's nothing to send
      (#2624)

   *  Set loss delay to at least kGranularity (#2617)

   *  Merge application limited and sending after idle sections. Always
      limit burst size instead of requiring resetting CWND to initial
      CWND after idle (#2605)

   *  Rewrite RTT estimation, allow RTT samples where a newly acked
      packet is ack-eliciting but the largest_acked is not (#2592)

   *  Don't arm the handshake timer if there is no handshake data
      (#2590)

   *  Clarify that the time threshold loss alarm takes precedence over
      the crypto handshake timer (#2590, #2620)

   *  Change initial RTT to 500ms to align with RFC6298 (#2184)

C.9.  Since draft-ietf-quic-recovery-18

   *  Change IW byte limit to 14720 from 14600 (#2494)

   *  Update PTO calculation to match RFC6298 (#2480, #2489, #2490)

   *  Improve loss detection's description of multiple packet number
      spaces and pseudocode (#2485, #2451, #2417)

   *  Declare persistent congestion even if non-probe packets are sent
      and don't make persistent congestion more aggressive than RTO
      verified was (#2365, #2244)

   *  Move pseudocode to the appendices (#2408)

   *  What to send on multiple PTOs (#2380)

C.10.  Since draft-ietf-quic-recovery-17

   *  After Probe Timeout discard in-flight packets or send another
      (#2212, #1965)

   *  Endpoints discard initial keys as soon as handshake keys are
      available (#1951, #2045)

   *  0-RTT state is discarded when 0-RTT is rejected (#2300)

   *  Loss detection timer is cancelled when ack-eliciting frames are in
      flight (#2117, #2093)

   *  Packets are declared lost if they are in flight (#2104)

   *  After becoming idle, either pace packets or reset the congestion
      controller (#2138, #2187)

   *  Process ECN counts before marking packets lost (#2142)

   *  Mark packets lost before resetting crypto_count and pto_count
      (#2208, #2209)

   *  Congestion and loss recovery state are discarded when keys are
      discarded (#2327)

C.11.  Since draft-ietf-quic-recovery-16

   *  Unify TLP and RTO into a single PTO; eliminate min RTO, min TLP
      and min crypto timeouts; eliminate timeout validation (#2114,
      #2166, #2168, #1017)

   *  Redefine congestion avoidance in terms of when the period starts
      (#1928, #1930)

   *  Document what needs to be tracked for packets that are in flight
      (#765, #1724, #1939)

   *  Integrate both time and packet thresholds into loss detection
      (#1969, #1212, #934, #1974)

   *  Reduce congestion window after idle, unless pacing is used (#2007,
      #2023)

   *  Disable RTT calculation for packets that don't elicit
      acknowledgment (#2060, #2078)

   *  Limit ack_delay by max_ack_delay (#2060, #2099)

   *  Initial keys are discarded once Handshake keys are available
      (#1951, #2045)

   *  Reorder ECN and loss detection in pseudocode (#2142)

   *  Only cancel loss detection timer if ack-eliciting packets are in
      flight (#2093, #2117)

C.12.  Since draft-ietf-quic-recovery-14

   *  Used max_ack_delay from transport params (#1796, #1782)

   *  Merge ACK and ACK_ECN (#1783)

C.13.  Since draft-ietf-quic-recovery-13

   *  Corrected the lack of ssthresh reduction in CongestionEvent
      pseudocode (#1598)

   *  Considerations for ECN spoofing (#1426, #1626)

   *  Clarifications for PADDING and congestion control (#837, #838,
      #1517, #1531, #1540)

   *  Reduce early retransmission timer to RTT/8 (#945, #1581)

   *  Packets are declared lost after an RTO is verified (#935, #1582)

C.14.  Since draft-ietf-quic-recovery-12

   *  Changes to manage separate packet number spaces and encryption
      levels (#1190, #1242, #1413, #1450)

   *  Added ECN feedback mechanisms and handling; new ACK_ECN frame
      (#804, #805, #1372)

C.15.  Since draft-ietf-quic-recovery-11

   No significant changes.

C.16.  Since draft-ietf-quic-recovery-10

   *  Improved text on ack generation (#1139, #1159)

   *  Make references to TCP recovery mechanisms informational (#1195)

   *  Define time_of_last_sent_handshake_packet (#1171)

   *  Added signal from TLS that the data it includes needs to be sent
      in a Retry packet (#1061, #1199)

   *  Minimum RTT (min_rtt) is initialized with an infinite value
      (#1169)

C.17.  Since draft-ietf-quic-recovery-09

   No significant changes.

C.18.  Since draft-ietf-quic-recovery-08

   *  Clarified pacing and RTO (#967, #977)

C.19.  Since draft-ietf-quic-recovery-07

   *  Include Ack Delay in RTO (and TLP) computations (#981)

   *  Ack Delay in SRTT computation (#961)

   *  Default RTT and Slow Start (#590)

   *  Many editorial fixes.

C.20.  Since draft-ietf-quic-recovery-06

   No significant changes.

C.21.  Since draft-ietf-quic-recovery-05

   *  Add more congestion control text (#776)

C.22.  Since draft-ietf-quic-recovery-04

   No significant changes.

C.23.  Since draft-ietf-quic-recovery-03

   No significant changes.

C.24.  Since draft-ietf-quic-recovery-02

   *  Integrate F-RTO (#544, #409)

   *  Add congestion control (#545, #395)

   *  Require connection abort if a skipped packet was acknowledged
      (#415)

   *  Simplify RTO calculations (#142, #417)

C.25.  Since draft-ietf-quic-recovery-01

   *  Overview added to loss detection

   *  Changes initial default RTT to 100ms

   *  Added time-based loss detection and fixes early retransmit

   *  Clarified loss recovery for handshake packets

   *  Fixed references and made TCP references informative

C.26.  Since draft-ietf-quic-recovery-00

   *  Improved description of constants and ACK behavior

C.27.  Since draft-iyengar-quic-loss-recovery-01

   *  Adopted as base for draft-ietf-quic-recovery

   *  Updated authors/editors list

   *  Added table of contents

Appendix D.  Contributors

   The IETF QUIC Working Group received an enormous amount of support
   from many people. The following people provided substantive
   contributions to this document: Alessandro Ghedini, Benjamin
   Saunders, Gorry Fairhurst, 奥 一穂 (Kazuho Oku), Lars Eggert, Magnus
   Westerlund, Marten Seemann, Martin Duke, Martin Thomson, Nick Banks,
   Praveen Balasubramaniam.

Acknowledgments

Authors' Addresses

   Jana Iyengar (editor)
   Fastly

   Email: jri.ietf@gmail.com


   Ian Swett (editor)
   Google

   Email: ianswett@google.com