idnits 2.17.1 draft-ietf-quic-recovery-20.txt: Checking boilerplate required by RFC 5378 and the IETF Trust (see https://trustee.ietf.org/license-info): ---------------------------------------------------------------------------- No issues found here. Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt: ---------------------------------------------------------------------------- No issues found here. Checking nits according to https://www.ietf.org/id-info/checklist : ---------------------------------------------------------------------------- ** The abstract seems to contain references ([2], [3], [1]), which it shouldn't. Please replace those with straight textual mentions of the documents in question. Miscellaneous warnings: ---------------------------------------------------------------------------- == The copyright year in the IETF Trust and authors Copyright Line does not match the current year -- The document date (April 23, 2019) is 1830 days in the past. Is this intentional? -- Found something which looks like a code comment -- if you have code sections in the document, please surround them with '' and '' lines. 
Checking references for intended status: Proposed Standard ---------------------------------------------------------------------------- (See RFCs 3967 and 4897 for information about using normative references to lower-maturity documents in RFCs) -- Looks like a reference, but probably isn't: '1' on line 1102 -- Looks like a reference, but probably isn't: '2' on line 1104 -- Looks like a reference, but probably isn't: '3' on line 1106 == Missing Reference: 'Initial' is mentioned on line 1361, but not defined == Outdated reference: A later version (-34) exists of draft-ietf-quic-tls-20 == Outdated reference: A later version (-34) exists of draft-ietf-quic-transport-20 == Outdated reference: A later version (-15) exists of draft-ietf-tcpm-rack-04 -- Obsolete informational reference (is this intentional?): RFC 8312 (Obsoleted by RFC 9438) Summary: 1 error (**), 0 flaws (~~), 5 warnings (==), 6 comments (--). Run idnits with the --verbose option for more detailed information about the items above. -------------------------------------------------------------------------------- 2 QUIC J. Iyengar, Ed. 3 Internet-Draft Fastly 4 Intended status: Standards Track I. Swett, Ed. 5 Expires: October 25, 2019 Google 6 April 23, 2019 8 QUIC Loss Detection and Congestion Control 9 draft-ietf-quic-recovery-20 11 Abstract 13 This document describes loss detection and congestion control 14 mechanisms for QUIC. 16 Note to Readers 18 Discussion of this draft takes place on the QUIC working group 19 mailing list (quic@ietf.org), which is archived at 20 https://mailarchive.ietf.org/arch/search/?email_list=quic [1]. 22 Working Group information can be found at https://github.com/quicwg 23 [2]; source code and issues list for this draft can be found at 24 https://github.com/quicwg/base-drafts/labels/-recovery [3]. 26 Status of This Memo 28 This Internet-Draft is submitted in full conformance with the 29 provisions of BCP 78 and BCP 79. 
31 Internet-Drafts are working documents of the Internet Engineering 32 Task Force (IETF). Note that other groups may also distribute 33 working documents as Internet-Drafts. The list of current Internet- 34 Drafts is at https://datatracker.ietf.org/drafts/current/. 36 Internet-Drafts are draft documents valid for a maximum of six months 37 and may be updated, replaced, or obsoleted by other documents at any 38 time. It is inappropriate to use Internet-Drafts as reference 39 material or to cite them other than as "work in progress." 41 This Internet-Draft will expire on October 25, 2019. 43 Copyright Notice 45 Copyright (c) 2019 IETF Trust and the persons identified as the 46 document authors. All rights reserved. 48 This document is subject to BCP 78 and the IETF Trust's Legal 49 Provisions Relating to IETF Documents 50 (https://trustee.ietf.org/license-info) in effect on the date of 51 publication of this document. Please review these documents 52 carefully, as they describe your rights and restrictions with respect 53 to this document. Code Components extracted from this document must 54 include Simplified BSD License text as described in Section 4.e of 55 the Trust Legal Provisions and are provided without warranty as 56 described in the Simplified BSD License. 58 Table of Contents 60 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 4 61 2. Conventions and Definitions . . . . . . . . . . . . . . . . . 4 62 3. Design of the QUIC Transmission Machinery . . . . . . . . . . 5 63 3.1. Relevant Differences Between QUIC and TCP . . . . . . . . 5 64 3.1.1. Separate Packet Number Spaces . . . . . . . . . . . . 6 65 3.1.2. Monotonically Increasing Packet Numbers . . . . . . . 6 66 3.1.3. No Reneging . . . . . . . . . . . . . . . . . . . . . 6 67 3.1.4. More ACK Ranges . . . . . . . . . . . . . . . . . . . 7 68 3.1.5. Explicit Correction For Delayed Acknowledgements . . 7 69 4. Generating Acknowledgements . . . . . . . . . . . . . . . . . 7 70 4.1. 
Crypto Handshake Data . . . . . . . . . . . . . . . . . . 7 71 4.2. ACK Ranges . . . . . . . . . . . . . . . . . . . . . . . 8 72 4.3. Receiver Tracking of ACK Frames . . . . . . . . . . . . . 8 73 4.4. Measuring and Reporting Host Delay . . . . . . . . . . . 8 74 5. Estimating the Round-Trip Time . . . . . . . . . . . . . . . 9 75 5.1. Generating RTT samples . . . . . . . . . . . . . . . . . 9 76 5.2. Estimating min_rtt . . . . . . . . . . . . . . . . . . . 10 77 5.3. Estimating smoothed_rtt and rttvar . . . . . . . . . . . 10 78 6. Loss Detection . . . . . . . . . . . . . . . . . . . . . . . 11 79 6.1. Acknowledgement-based Detection . . . . . . . . . . . . . 11 80 6.1.1. Packet Threshold . . . . . . . . . . . . . . . . . . 12 81 6.1.2. Time Threshold . . . . . . . . . . . . . . . . . . . 12 82 6.2. Crypto Retransmission Timeout . . . . . . . . . . . . . . 13 83 6.2.1. Retry and Version Negotiation . . . . . . . . . . . . 14 84 6.2.2. Discarding Keys and Packet State . . . . . . . . . . 14 85 6.3. Probe Timeout . . . . . . . . . . . . . . . . . . . . . . 15 86 6.3.1. Computing PTO . . . . . . . . . . . . . . . . . . . . 15 87 6.3.2. Sending Probe Packets . . . . . . . . . . . . . . . . 16 88 6.3.3. Loss Detection . . . . . . . . . . . . . . . . . . . 17 89 6.4. Discussion . . . . . . . . . . . . . . . . . . . . . . . 17 90 7. Congestion Control . . . . . . . . . . . . . . . . . . . . . 17 91 7.1. Explicit Congestion Notification . . . . . . . . . . . . 17 92 7.2. Slow Start . . . . . . . . . . . . . . . . . . . . . . . 18 93 7.3. Congestion Avoidance . . . . . . . . . . . . . . . . . . 18 94 7.4. Recovery Period . . . . . . . . . . . . . . . . . . . . . 18 95 7.5. Ignoring Loss of Undecryptable Packets . . . . . . . . . 18 96 7.6. Probe Timeout . . . . . . . . . . . . . . . . . . . . . . 18 97 7.7. Persistent Congestion . . . . . . . . . . . . . . . . . . 19 98 7.8. Pacing . . . . . . . . . . . . . . . . . . . . . . . . . 20 99 7.9. 
Under-utilizing the Congestion Window . . . . . . . . . . 20 100 8. Security Considerations . . . . . . . . . . . . . . . . . . . 21 101 8.1. Congestion Signals . . . . . . . . . . . . . . . . . . . 21 102 8.2. Traffic Analysis . . . . . . . . . . . . . . . . . . . . 21 103 8.3. Misreporting ECN Markings . . . . . . . . . . . . . . . . 21 104 9. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 22 105 10. References . . . . . . . . . . . . . . . . . . . . . . . . . 22 106 10.1. Normative References . . . . . . . . . . . . . . . . . . 22 107 10.2. Informative References . . . . . . . . . . . . . . . . . 22 108 10.3. URIs . . . . . . . . . . . . . . . . . . . . . . . . . . 24 109 Appendix A. Loss Recovery Pseudocode . . . . . . . . . . . . . . 24 110 A.1. Tracking Sent Packets . . . . . . . . . . . . . . . . . . 24 111 A.1.1. Sent Packet Fields . . . . . . . . . . . . . . . . . 24 112 A.2. Constants of interest . . . . . . . . . . . . . . . . . . 25 113 A.3. Variables of interest . . . . . . . . . . . . . . . . . . 25 114 A.4. Initialization . . . . . . . . . . . . . . . . . . . . . 26 115 A.5. On Sending a Packet . . . . . . . . . . . . . . . . . . . 27 116 A.6. On Receiving an Acknowledgment . . . . . . . . . . . . . 27 117 A.7. On Packet Acknowledgment . . . . . . . . . . . . . . . . 29 118 A.8. Setting the Loss Detection Timer . . . . . . . . . . . . 29 119 A.9. On Timeout . . . . . . . . . . . . . . . . . . . . . . . 31 120 A.10. Detecting Lost Packets . . . . . . . . . . . . . . . . . 31 121 Appendix B. Congestion Control Pseudocode . . . . . . . . . . . 32 122 B.1. Constants of interest . . . . . . . . . . . . . . . . . . 32 123 B.2. Variables of interest . . . . . . . . . . . . . . . . . . 33 124 B.3. Initialization . . . . . . . . . . . . . . . . . . . . . 34 125 B.4. On Packet Sent . . . . . . . . . . . . . . . . . . . . . 34 126 B.5. On Packet Acknowledgement . . . . . . . . . . . . . . . . 34 127 B.6. On New Congestion Event . . . . . . . . . 
. . . . . . . . 35 128 B.7. Process ECN Information . . . . . . . . . . . . . . . . . 35 129 B.8. On Packets Lost . . . . . . . . . . . . . . . . . . . . . 36 130 Appendix C. Change Log . . . . . . . . . . . . . . . . . . . . . 36 131 C.1. Since draft-ietf-quic-recovery-19 . . . . . . . . . . . . 36 132 C.2. Since draft-ietf-quic-recovery-18 . . . . . . . . . . . . 37 133 C.3. Since draft-ietf-quic-recovery-17 . . . . . . . . . . . . 37 134 C.4. Since draft-ietf-quic-recovery-16 . . . . . . . . . . . . 38 135 C.5. Since draft-ietf-quic-recovery-14 . . . . . . . . . . . . 38 136 C.6. Since draft-ietf-quic-recovery-13 . . . . . . . . . . . . 38 137 C.7. Since draft-ietf-quic-recovery-12 . . . . . . . . . . . . 39 138 C.8. Since draft-ietf-quic-recovery-11 . . . . . . . . . . . . 39 139 C.9. Since draft-ietf-quic-recovery-10 . . . . . . . . . . . . 39 140 C.10. Since draft-ietf-quic-recovery-09 . . . . . . . . . . . . 39 141 C.11. Since draft-ietf-quic-recovery-08 . . . . . . . . . . . . 39 142 C.12. Since draft-ietf-quic-recovery-07 . . . . . . . . . . . . 39 143 C.13. Since draft-ietf-quic-recovery-06 . . . . . . . . . . . . 40 144 C.14. Since draft-ietf-quic-recovery-05 . . . . . . . . . . . . 40 145 C.15. Since draft-ietf-quic-recovery-04 . . . . . . . . . . . . 40 146 C.16. Since draft-ietf-quic-recovery-03 . . . . . . . . . . . . 40 147 C.17. Since draft-ietf-quic-recovery-02 . . . . . . . . . . . . 40 148 C.18. Since draft-ietf-quic-recovery-01 . . . . . . . . . . . . 40 149 C.19. Since draft-ietf-quic-recovery-00 . . . . . . . . . . . . 40 150 C.20. Since draft-iyengar-quic-loss-recovery-01 . . . . . . . . 40 151 Acknowledgments . . . . . . . . . . . . . . . . . . . . . . . . . 41 152 Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 41 154 1. Introduction 156 QUIC is a new multiplexed and secure transport atop UDP. 
QUIC builds 157 on decades of transport and security experience, and implements 158 mechanisms that make it attractive as a modern general-purpose 159 transport. The QUIC protocol is described in [QUIC-TRANSPORT]. 161 QUIC implements the spirit of existing TCP loss recovery mechanisms, 162 described in RFCs, various Internet-drafts, and also those prevalent 163 in the Linux TCP implementation. This document describes QUIC 164 congestion control and loss recovery, and where applicable, 165 attributes the TCP equivalent in RFCs, Internet-drafts, academic 166 papers, and/or TCP implementations. 168 2. Conventions and Definitions 170 The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", 171 "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and 172 "OPTIONAL" in this document are to be interpreted as described in BCP 173 14 [RFC2119] [RFC8174] when, and only when, they appear in all 174 capitals, as shown here. 176 Definitions of terms that are used in this document: 178 ACK-only: Any packet containing only one or more ACK frame(s). 180 In-flight: Packets are considered in-flight when they have been sent 181 and neither acknowledged nor declared lost, and they are not ACK- 182 only. 184 Ack-eliciting Frames: All frames besides ACK or PADDING are 185 considered ack-eliciting. 187 Ack-eliciting Packets: Packets that contain ack-eliciting frames 188 elicit an ACK from the receiver within the maximum ack delay and 189 are called ack-eliciting packets. 191 Crypto Packets: Packets containing CRYPTO data sent in Initial or 192 Handshake packets. 194 Out-of-order Packets: Packets that do not increase the largest 195 received packet number for their packet number space by exactly one. 196 Packets arrive out of order when earlier packets are lost or 197 delayed. 199 3. 
Design of the QUIC Transmission Machinery 201 All transmissions in QUIC are sent with a packet-level header, which 202 indicates the encryption level and includes a packet sequence number 203 (referred to below as a packet number). The encryption level 204 indicates the packet number space, as described in [QUIC-TRANSPORT]. 205 Packet numbers never repeat within a packet number space for the 206 lifetime of a connection. Packet numbers monotonically increase 207 within a space, preventing ambiguity. 209 This design obviates the need for disambiguating between 210 transmissions and retransmissions and eliminates significant 211 complexity from QUIC's interpretation of TCP loss detection 212 mechanisms. 214 QUIC packets can contain multiple frames of different types. The 215 recovery mechanisms ensure that data and frames that need reliable 216 delivery are acknowledged or declared lost and sent in new packets as 217 necessary. The types of frames contained in a packet affect recovery 218 and congestion control logic: 220 o All packets are acknowledged, though packets that contain no ack- 221 eliciting frames are only acknowledged along with ack-eliciting 222 packets. 224 o Long header packets that contain CRYPTO frames are critical to the 225 performance of the QUIC handshake and use shorter timers for 226 acknowledgement and retransmission. 228 o Packets that contain only ACK frames do not count toward 229 congestion control limits and are not considered in-flight. 231 o PADDING frames cause packets to contribute toward bytes in flight 232 without directly causing an acknowledgment to be sent. 234 3.1. Relevant Differences Between QUIC and TCP 236 Readers familiar with TCP's loss detection and congestion control 237 will find algorithms here that parallel well-known TCP ones. 238 Protocol differences between QUIC and TCP however contribute to 239 algorithmic differences. We briefly describe these protocol 240 differences below. 242 3.1.1. 
Separate Packet Number Spaces 244 QUIC uses separate packet number spaces for each encryption level, 245 except that 0-RTT and all generations of 1-RTT keys use the same packet 246 number space. Separate packet number spaces ensure that acknowledgement 247 of packets sent with one level of encryption will not cause spurious 248 retransmission of packets sent with a different encryption level. 249 Congestion control and round-trip time (RTT) measurement are unified 250 across packet number spaces. 252 3.1.2. Monotonically Increasing Packet Numbers 254 TCP conflates transmission order at the sender with delivery order at 255 the receiver, which results in retransmissions of the same data 256 carrying the same sequence number, and consequently leads to 257 "retransmission ambiguity". QUIC separates the two: QUIC uses a 258 packet number to indicate transmission order, and any application 259 data is sent in one or more streams, with delivery order determined 260 by stream offsets encoded within STREAM frames. 262 QUIC's packet number is strictly increasing within a packet number 263 space, and directly encodes transmission order. A higher packet 264 number signifies that the packet was sent later, and a lower packet 265 number signifies that the packet was sent earlier. When a packet 266 containing ack-eliciting frames is detected lost, QUIC rebundles 267 necessary frames in a new packet with a new packet number, removing 268 ambiguity about which packet is acknowledged when an ACK is received. 269 Consequently, more accurate RTT measurements can be made, spurious 270 retransmissions are trivially detected, and mechanisms such as Fast 271 Retransmit can be applied universally, based only on packet number. 273 This design point significantly simplifies loss detection mechanisms 274 for QUIC. Most TCP mechanisms implicitly attempt to infer 275 transmission ordering based on TCP sequence numbers - a non-trivial 276 task, especially when TCP timestamps are not available. 
278 3.1.3. No Reneging 280 QUIC ACKs contain information that is similar to TCP SACK, but QUIC 281 does not allow any acked packet to be reneged, greatly simplifying 282 implementations on both sides and reducing memory pressure on the 283 sender. 285 3.1.4. More ACK Ranges 287 QUIC supports many ACK ranges, as opposed to TCP's 3 SACK ranges. In 288 high loss environments, this speeds recovery, reduces spurious 289 retransmits, and ensures forward progress without relying on 290 timeouts. 292 3.1.5. Explicit Correction For Delayed Acknowledgements 294 QUIC endpoints measure the delay incurred between when a packet is 295 received and when the corresponding acknowledgment is sent, allowing 296 a peer to maintain a more accurate round-trip time estimate (see 297 Section 4.4). 299 4. Generating Acknowledgements 301 An acknowledgement SHOULD be sent immediately upon receipt of a 302 second ack-eliciting packet. QUIC recovery algorithms do not assume 303 the peer sends an ACK immediately when receiving a second ack- 304 eliciting packet. 306 In order to accelerate loss recovery and reduce timeouts, the 307 receiver SHOULD send an immediate ACK after it receives an out-of- 308 order packet. It could send immediate ACKs for in-order packets for 309 a period of time that SHOULD NOT exceed 1/8 RTT unless more out-of- 310 order packets arrive. If every packet arrives out-of-order, then an 311 immediate ACK SHOULD be sent for every received packet. 313 Similarly, packets marked with the ECN Congestion Experienced (CE) 314 codepoint in the IP header SHOULD be acknowledged immediately, to 315 reduce the peer's response time to congestion events. 317 As an optimization, a receiver MAY process multiple packets before 318 sending any ACK frames in response. In this case the receiver can 319 determine whether an immediate or delayed acknowledgement should be 320 generated after processing incoming packets. 322 4.1. 
Crypto Handshake Data 324 In order to quickly complete the handshake and avoid spurious 325 retransmissions due to crypto retransmission timeouts, crypto packets 326 SHOULD use a very short ack delay, such as the local timer 327 granularity. ACK frames SHOULD be sent immediately when the crypto 328 stack indicates all data for that packet number space has been 329 received. 331 4.2. ACK Ranges 333 When an ACK frame is sent, one or more ranges of acknowledged packets 334 are included. Including older packets reduces the chance of spurious 335 retransmits caused by losing previously sent ACK frames, at the cost 336 of larger ACK frames. 338 ACK frames SHOULD always acknowledge the most recently received 339 packets, and the more out-of-order the packets are, the more 340 important it is to send an updated ACK frame quickly, to prevent the 341 peer from declaring a packet as lost and spuriously retransmitting 342 the frames it contains. 344 Below is one recommended approach for determining what packets to 345 include in an ACK frame. 347 4.3. Receiver Tracking of ACK Frames 349 When a packet containing an ACK frame is sent, the largest 350 acknowledged in that frame may be saved. When a packet containing an 351 ACK frame is acknowledged, the receiver can stop acknowledging 352 packets less than or equal to the largest acknowledged in the sent 353 ACK frame. 355 In cases without ACK frame loss, this algorithm allows for a minimum 356 of 1 RTT of reordering. In cases with ACK frame loss and reordering, 357 this approach does not guarantee that every acknowledgement is seen 358 by the sender before it is no longer included in the ACK frame. 359 Packets could be received out of order and all subsequent ACK frames 360 containing them could be lost. In this case, the loss recovery 361 algorithm may cause spurious retransmits, but the sender will 362 continue making forward progress. 364 4.4. 
Measuring and Reporting Host Delay 366 An endpoint measures the delay incurred between when a packet is 367 received and when the corresponding acknowledgment is sent. The 368 endpoint encodes this host delay for the largest acknowledged packet 369 in the Ack Delay field of an ACK frame (see Section 19.3 of 370 [QUIC-TRANSPORT]). This allows the receiver of the ACK to adjust for 371 any host delays, which is important for delayed acknowledgements, 372 when estimating the path RTT. In certain deployments, a packet might 373 be held in the OS kernel or elsewhere on the host before being 374 processed by the QUIC stack. Where possible, an endpoint MAY include 375 these delays when populating the Ack Delay field in an ACK frame. 377 An endpoint MUST NOT excessively delay acknowledgements of ack- 378 eliciting packets. The maximum ack delay is communicated in the 379 max_ack_delay transport parameter, see Section 18.1 of 380 [QUIC-TRANSPORT]. max_ack_delay implies an explicit contract: an 381 endpoint promises to never delay acknowledgments of an ack-eliciting 382 packet by more than the indicated value. If it does, any excess 383 accrues to the RTT estimate and could result in spurious 384 retransmissions from the peer. 386 5. Estimating the Round-Trip Time 388 At a high level, an endpoint measures the time from when a packet was 389 sent to when it is acknowledged as a round-trip time (RTT) sample. 390 The endpoint uses RTT samples and peer-reported host delays 391 (Section 4.4) to generate a statistical description of the 392 connection's RTT. An endpoint computes the following three values: 393 the minimum value observed over the lifetime of the connection 394 (min_rtt), an exponentially-weighted moving average (smoothed_rtt), 395 and the variance in the observed RTT samples (rttvar). 397 5.1. 
Generating RTT samples 399 An endpoint generates an RTT sample on receiving an ACK frame that 400 meets the following two conditions: 402 o the largest acknowledged packet number is newly acknowledged, and 404 o at least one of the newly acknowledged packets was ack-eliciting. 406 The RTT sample, latest_rtt, is generated as the time elapsed since 407 the largest acknowledged packet was sent: 409 latest_rtt = ack_time - send_time_of_largest_acked 411 An RTT sample is generated using only the largest acknowledged packet 412 in the received ACK frame. This is because a peer reports host 413 delays for only the largest acknowledged packet in an ACK frame. 414 While the reported host delay is not used by the RTT sample 415 measurement, it is used to adjust the RTT sample in subsequent 416 computations of smoothed_rtt and rttvar (Section 5.3). 418 To avoid generating multiple RTT samples using the same packet, an 419 ACK frame SHOULD NOT be used to update RTT estimates if it does not 420 newly acknowledge the largest acknowledged packet. 422 An RTT sample MUST NOT be generated on receiving an ACK frame that 423 does not newly acknowledge at least one ack-eliciting packet. A peer 424 does not send an ACK frame on receiving only non-ack-eliciting 425 packets, so an ACK frame that is subsequently sent can include an 426 arbitrarily large Ack Delay field. Ignoring such ACK frames avoids 427 complications in subsequent smoothed_rtt and rttvar computations. 429 A sender might generate multiple RTT samples per RTT when multiple 430 ACK frames are received within an RTT. As suggested in [RFC6298], 431 doing so might result in inadequate history in smoothed_rtt and 432 rttvar. Ensuring that RTT estimates retain sufficient history is an 433 open research question. 435 5.2. Estimating min_rtt 437 min_rtt is the minimum RTT observed over the lifetime of the 438 connection. 
min_rtt is set to the latest_rtt on the first sample in 439 a connection, and to the lesser of min_rtt and latest_rtt on 440 subsequent samples. 442 An endpoint uses only locally observed times in computing the min_rtt 443 and does not adjust for host delays reported by the peer 444 (Section 4.4). Doing so allows the endpoint to set a lower bound for 445 the smoothed_rtt based entirely on what it observes (see 446 Section 5.3), and limits potential underestimation due to 447 erroneously-reported delays by the peer. 449 5.3. Estimating smoothed_rtt and rttvar 451 smoothed_rtt is an exponentially-weighted moving average of an 452 endpoint's RTT samples, and rttvar is the endpoint's estimated 453 variance in the RTT samples. 455 smoothed_rtt uses path latency after adjusting RTT samples for peer- 456 reported host delays (Section 4.4). A peer limits any delay in 457 sending an acknowledgement for an ack-eliciting packet to no greater 458 than the advertised max_ack_delay transport parameter. Consequently, 459 when a peer reports an Ack Delay that is greater than its 460 max_ack_delay, the delay is attributed to reasons out of the peer's 461 control, such as scheduler latency at the peer or loss of previous 462 ACK frames. Any delays beyond the peer's max_ack_delay are therefore 463 considered effectively part of path delay and incorporated into the 464 smoothed_rtt estimate. 466 When adjusting an RTT sample using peer-reported acknowledgement 467 delays, an endpoint: 469 o MUST use the lesser of the value reported in Ack Delay field of 470 the ACK frame and the peer's max_ack_delay transport parameter 471 (Section 4.4). 473 o MUST NOT apply the adjustment if the resulting RTT sample is 474 smaller than the min_rtt. This limits the underestimation that a 475 misreporting peer can cause to the smoothed_rtt. 477 On the first RTT sample in a connection, the smoothed_rtt is set to 478 the latest_rtt. 
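As an illustration only, the min_rtt rule of Section 5.2 and the two adjustment rules listed above can be sketched in Python as follows. The function names and the use of integer milliseconds are assumptions of this sketch and do not appear in this document. The sketch skips the adjustment unless the adjusted sample would be strictly greater than min_rtt, which is slightly more conservative than the second rule requires.

```python
from typing import Optional


def update_min_rtt(min_rtt: Optional[int], latest_rtt: int) -> int:
    """Return the new min_rtt (hypothetical helper; times in milliseconds)."""
    if min_rtt is None:
        # First sample in the connection: min_rtt is the sample itself.
        return latest_rtt
    # Later samples: keep the lesser of the two. No ack-delay adjustment
    # is applied, since min_rtt uses only locally observed times.
    return min(min_rtt, latest_rtt)


def adjust_rtt_sample(latest_rtt: int, min_rtt: int,
                      reported_ack_delay: int, max_ack_delay: int) -> int:
    """Return the RTT sample after removing peer-reported host delay."""
    # Use the lesser of the reported Ack Delay and the peer's max_ack_delay.
    ack_delay = min(reported_ack_delay, max_ack_delay)
    # Only subtract the delay if the result stays above min_rtt.
    if min_rtt + ack_delay < latest_rtt:
        return latest_rtt - ack_delay
    return latest_rtt
```

For example, with min_rtt of 50 ms and max_ack_delay of 25 ms, a 100 ms sample carrying a reported Ack Delay of 200 ms is reduced by only 25 ms, while a 60 ms sample is left unadjusted because subtracting 25 ms would take it below min_rtt.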
480 smoothed_rtt and rttvar are computed as follows, similar to 481 [RFC6298]. On the first RTT sample in a connection:

483    smoothed_rtt = latest_rtt
484    rttvar = latest_rtt / 2

486 On subsequent RTT samples, smoothed_rtt and rttvar evolve as follows:

488    ack_delay = min(Ack Delay in ACK Frame, max_ack_delay)
489    adjusted_rtt = latest_rtt
490    if (min_rtt + ack_delay < latest_rtt):
491      adjusted_rtt = latest_rtt - ack_delay
492    smoothed_rtt = 7/8 * smoothed_rtt + 1/8 * adjusted_rtt
493    rttvar_sample = abs(smoothed_rtt - adjusted_rtt)
494    rttvar = 3/4 * rttvar + 1/4 * rttvar_sample

496 6. Loss Detection 498 QUIC senders use both ack information and timeouts to detect lost 499 packets, and this section provides a description of these algorithms. 501 If a packet is lost, the QUIC transport needs to recover from that 502 loss, such as by retransmitting the data, sending an updated frame, 503 or abandoning the frame. For more information, see Section 13.2 of 504 [QUIC-TRANSPORT]. 506 6.1. Acknowledgement-based Detection 508 Acknowledgement-based loss detection implements the spirit of TCP's 509 Fast Retransmit [RFC5681], Early Retransmit [RFC5827], FACK [FACK], 510 SACK loss recovery [RFC6675], and RACK [RACK]. This section provides 511 an overview of how these algorithms are implemented in QUIC. 513 A packet is declared lost if it meets all the following conditions: 515 o The packet is unacknowledged, in-flight, and was sent prior to an 516 acknowledged packet. 518 o Either its packet number is kPacketThreshold smaller than an 519 acknowledged packet (Section 6.1.1), or it was sent long enough in 520 the past (Section 6.1.2). 522 The acknowledgement indicates that a packet sent later was delivered, 523 while the packet and time thresholds provide some tolerance for 524 packet reordering. 
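A minimal Python sketch of the two conditions above is given here for illustration; it is not the normative algorithm. The SentPacket record, the detect_lost function, and their field names are hypothetical. The sketch assumes that "an acknowledged packet" is the largest acknowledged packet, takes the time threshold as a caller-supplied value (its computation is the subject of Section 6.1.2), and uses the initial kPacketThreshold of 3 recommended in Section 6.1.1.

```python
from dataclasses import dataclass
from typing import List

K_PACKET_THRESHOLD = 3  # initial value recommended in Section 6.1.1


@dataclass
class SentPacket:
    """Hypothetical per-packet record kept by the sender."""
    packet_number: int
    time_sent: int          # milliseconds on a local clock
    in_flight: bool = True
    acked: bool = False


def detect_lost(sent: List[SentPacket], largest_acked: int,
                now: int, time_threshold: int) -> List[int]:
    """Return packet numbers that meet both loss conditions."""
    lost = []
    for p in sent:
        # Condition 1: unacknowledged, in-flight, and sent before an
        # acknowledged packet (packet numbers encode transmission order).
        if p.acked or not p.in_flight:
            continue
        if p.packet_number >= largest_acked:
            continue
        # Condition 2: far enough behind in packet number, or old enough.
        by_packets = largest_acked - p.packet_number >= K_PACKET_THRESHOLD
        by_time = now - p.time_sent >= time_threshold
        if by_packets or by_time:
            lost.append(p.packet_number)
    return lost
```

A packet that is only one or two packet numbers behind the largest acknowledged packet, and was sent recently, is left alone by this sketch; that gap is the reordering tolerance the thresholds provide.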
526 Spuriously declaring packets as lost leads to unnecessary 527 retransmissions and may result in degraded performance due to the 528 actions of the congestion controller upon detecting loss. 529 Implementations that detect spurious retransmissions and increase the 530 reordering threshold in packets or time MAY choose to start with 531 smaller initial reordering thresholds to minimize recovery latency. 533 6.1.1. Packet Threshold 535 The RECOMMENDED initial value for the packet reordering threshold 536 (kPacketThreshold) is 3, based on best practices for TCP loss 537 detection [RFC5681] [RFC6675]. 539 Some networks may exhibit higher degrees of reordering, causing a 540 sender to detect spurious losses. Implementers MAY use algorithms 541 developed for TCP, such as TCP-NCR [RFC4653], to improve QUIC's 542 reordering resilience. 544 6.1.2. Time Threshold 546 Once a later packet has been acknowledged, an endpoint SHOULD declare 547 an earlier packet lost if it was sent a threshold amount of time in 548 the past. The time threshold is computed as kTimeThreshold * 549 max(SRTT, latest_RTT), where SRTT denotes smoothed_rtt and latest_RTT denotes latest_rtt (Section 5). If packets sent prior to the largest 550 acknowledged packet cannot yet be declared lost, then a timer SHOULD 551 be set for the remaining time. 553 The RECOMMENDED time threshold (kTimeThreshold), expressed as a 554 round-trip time multiplier, is 9/8. 556 Using max(SRTT, latest_RTT) protects against the following two cases: 558 o the latest RTT sample is lower than the SRTT, perhaps due to 559 reordering where the acknowledgement encountered a shorter path; 561 o the latest RTT sample is higher than the SRTT, perhaps due to a 562 sustained increase in the actual RTT, but the SRTT has 563 not yet caught up. 565 An endpoint might consistently record RTT samples as 0 in extremely 566 low latency networks, leading to a smoothed_rtt of 0. 
Consequently, 567 the endpoint could declare all earlier packets as lost immediately 568 upon receiving an acknowledgement for a later packet. That is, the 569 endpoint would not provide any reordering tolerance. To avoid 570 declaring packets as lost too early, the time threshold MUST be set 571 to at least kGranularity (defined in Appendix A.2). 573 Implementations MAY experiment with absolute thresholds, thresholds 574 from previous connections, adaptive thresholds, or including RTT 575 variance. Smaller thresholds reduce reordering resilience and 576 increase spurious retransmissions, and larger thresholds increase 577 loss detection delay. 579 6.2. Crypto Retransmission Timeout 581 Data in CRYPTO frames is critical to QUIC transport and crypto 582 negotiation, so a more aggressive timeout is used to retransmit it. 584 The initial crypto retransmission timeout SHOULD be set to twice the 585 initial RTT. 587 At the beginning, there are no prior RTT samples within a connection. 588 Resumed connections over the same network SHOULD use the previous 589 connection's final smoothed RTT value as the resumed connection's 590 initial RTT. If no previous RTT is available, or if the network 591 changes, the initial RTT SHOULD be set to 500ms, resulting in a 1 592 second initial handshake timeout as recommended in [RFC6298]. 594 When a crypto packet is sent, the sender MUST set a timer for twice 595 the smoothed RTT. This timer MUST be updated when a new crypto 596 packet is sent and when an acknowledgement is received which computes 597 a new RTT sample. Upon timeout, the sender MUST retransmit all 598 unacknowledged CRYPTO data if possible. The sender MUST NOT declare 599 in-flight crypto packets as lost when the crypto timer expires. 601 On each consecutive expiration of the crypto timer without receiving 602 an acknowledgement for a new packet, the sender MUST double the 603 crypto retransmission timeout and set a timer for this period. 
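The timer arithmetic described in this section can be sketched in Python as follows. This is an illustration under stated assumptions: the function name, its parameters, and the use of milliseconds are hypothetical, and no lower bound for timer granularity is applied.

```python
from typing import Optional

K_INITIAL_RTT_MS = 500  # initial RTT used when no earlier RTT is available


def crypto_timeout_ms(smoothed_rtt_ms: Optional[int],
                      consecutive_expirations: int,
                      initial_rtt_ms: int = K_INITIAL_RTT_MS) -> int:
    """Return the crypto retransmission timeout in milliseconds.

    smoothed_rtt_ms is None until the connection has an RTT sample, in
    which case the initial RTT is used instead. consecutive_expirations
    counts crypto timer expirations since the last acknowledgement of a
    new packet.
    """
    rtt = smoothed_rtt_ms if smoothed_rtt_ms is not None else initial_rtt_ms
    # Base timeout is twice the RTT; each consecutive expiration doubles it.
    return 2 * rtt * (2 ** consecutive_expirations)
```

With no RTT sample, successive expirations in this sketch yield timeouts of 1000 ms, 2000 ms, and 4000 ms; a caller would reset the expiration count to zero once an acknowledgement for a new packet arrives.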
605 Until the server has validated the client's address on the path, the 606 amount of data it can send is limited, as specified in Section 8.1 of 607 [QUIC-TRANSPORT]. If not all unacknowledged CRYPTO data can be sent, 608 then all unacknowledged CRYPTO data sent in Initial packets should be 609 retransmitted. If no data can be sent, then no timer should be armed 610 until data has been received from the client. 612 Because the server could be blocked until more packets are received, 613 the client MUST ensure that the crypto retransmission timer is set if 614 there is unacknowledged crypto data or if the client does not yet 615 have 1-RTT keys. If the crypto retransmission timer expires before 616 the client has 1-RTT keys, the client might not 617 have any crypto data to retransmit. However, the client MUST send a 618 new packet, containing only PING or PADDING frames if necessary, to 619 allow the server to continue sending data. If Handshake keys are 620 available to the client, it MUST send a Handshake packet, and 621 otherwise it MUST send an Initial packet in a UDP datagram of at 622 least 1200 bytes. 624 The crypto retransmission timer is not set if the time threshold 625 loss detection timer (Section 6.1.2) is set. The time threshold loss 626 detection timer is expected to both expire earlier than the crypto 627 retransmission timeout and be less likely to spuriously retransmit 628 data. The Initial and Handshake packet number spaces will typically 629 contain a small number of packets, so losses are less likely to be 630 detected using packet-threshold loss detection. 632 When the crypto retransmission timer is active, the probe timer 633 (Section 6.3) is not active. 635 6.2.1. 
Retry and Version Negotiation 637 A Retry or Version Negotiation packet causes a client to send another 638 Initial packet, effectively restarting the connection process and 639 resetting congestion control and loss recovery state, including 640 resetting any pending timers. Either packet indicates that the 641 Initial was received but not processed. Neither packet can be 642 treated as an acknowledgment for the Initial. 644 The client MAY however compute an RTT estimate to the server as the 645 time period from when the first Initial was sent to when a Retry or a 646 Version Negotiation packet is received. The client MAY use this 647 value to seed the RTT estimator for a subsequent connection attempt 648 to the server. 650 6.2.2. Discarding Keys and Packet State 652 When packet protection keys are discarded (see Section 4.9 of 653 [QUIC-TLS]), all packets that were sent with those keys can no longer 654 be acknowledged because their acknowledgements cannot be processed 655 anymore. The sender MUST discard all recovery state associated with 656 those packets and MUST remove them from the count of bytes in flight. 658 Endpoints stop sending and receiving Initial packets once they start 659 exchanging Handshake packets (see Section 17.2.2.1 of 660 [QUIC-TRANSPORT]). At this point, recovery state for all in-flight 661 Initial packets is discarded. 663 When 0-RTT is rejected, recovery state for all in-flight 0-RTT 664 packets is discarded. 666 If a server accepts 0-RTT, but does not buffer 0-RTT packets that 667 arrive before Initial packets, early 0-RTT packets will be declared 668 lost, but that is expected to be infrequent. 670 It is expected that keys are discarded after packets encrypted with 671 them would be acknowledged or declared lost. Initial secrets however 672 might be destroyed sooner, as soon as handshake keys are available 673 (see Section 4.10 of [QUIC-TLS]). 675 6.3. 
Probe Timeout 677 A Probe Timeout (PTO) triggers a probe packet when ack-eliciting data 678 is in flight but an acknowledgement is not received within the 679 expected period of time. A PTO enables a connection to recover from 680 loss of tail packets or acks. The PTO algorithm used in QUIC 681 implements the reliability functions of Tail Loss Probe [TLP] [RACK], 682 RTO [RFC5681] and F-RTO algorithms for TCP [RFC5682], and the timeout 683 computation is based on TCP's retransmission timeout period 684 [RFC6298]. 686 6.3.1. Computing PTO 688 When an ack-eliciting packet is transmitted, the sender schedules a 689 timer for the PTO period as follows: 691 PTO = smoothed_rtt + max(4*rttvar, kGranularity) + max_ack_delay 693 kGranularity, smoothed_rtt, rttvar, and max_ack_delay are defined in 694 Appendix A.2 and Appendix A.3. 696 The PTO period is the amount of time that a sender ought to wait for 697 an acknowledgement of a sent packet. This time period includes the 698 estimated network roundtrip-time (smoothed_rtt), the variance in the 699 estimate (4*rttvar), and max_ack_delay, to account for the maximum 700 time by which a receiver might delay sending an acknowledgement. 702 The PTO value MUST be set to at least kGranularity, to avoid the 703 timer expiring immediately. 705 When a PTO timer expires, the sender probes the network as described 706 in the next section. The PTO period MUST be set to twice its current 707 value. This exponential reduction in the sender's rate is important 708 because the PTOs might be caused by loss of packets or 709 acknowledgements due to severe congestion. 711 A sender computes its PTO timer every time an ack-eliciting packet is 712 sent. A sender might choose to optimize this by setting the timer 713 fewer times if it knows that more ack-eliciting packets will be sent 714 within a short period of time. 716 6.3.2. 
Sending Probe Packets 718 When a PTO timer expires, a sender MUST send at least one ack- 719 eliciting packet as a probe, unless there is no data available to 720 send. An endpoint MAY send up to two ack-eliciting packets, to avoid 721 an expensive consecutive PTO expiration due to a single packet loss. 723 It is possible that the sender has no new or previously-sent data to 724 send. As an example, consider the following sequence of events: new 725 application data is sent in a STREAM frame, deemed lost, then 726 retransmitted in a new packet, and then the original transmission is 727 acknowledged. In the absence of any new application data, a PTO 728 timer expiration now would find the sender with no new or previously- 729 sent data to send. 731 When there is no data to send, the sender SHOULD send a PING or other 732 ack-eliciting frame in a single packet, re-arming the PTO timer. 734 Alternatively, instead of sending an ack-eliciting packet, the sender 735 MAY mark any packets still in flight as lost. Doing so avoids 736 sending an additional packet, but increases the risk that loss is 737 declared too aggressively, resulting in an unnecessary rate reduction 738 by the congestion controller. 740 Consecutive PTO periods increase exponentially, and as a result, 741 connection recovery latency increases exponentially as packets 742 continue to be dropped in the network. Sending two packets on PTO 743 expiration increases resilience to packet drops, thus reducing the 744 probability of consecutive PTO events. 746 Probe packets sent on a PTO MUST be ack-eliciting. A probe packet 747 SHOULD carry new data when possible. A probe packet MAY carry 748 retransmitted unacknowledged data when new data is unavailable, when 749 flow control does not permit new data to be sent, or to 750 opportunistically reduce loss recovery delay. 
   Implementations MAY use alternate strategies for determining the
   content of probe packets, including sending new or retransmitted
   data based on the application's priorities.

   When the PTO timer expires multiple times and new data cannot be
   sent, implementations must choose between sending the same payload
   every time or sending different payloads. Sending the same payload
   may be simpler and ensures the highest priority frames arrive first.
   Sending different payloads each time reduces the chances of spurious
   retransmission.

6.3.3. Loss Detection

   Delivery or loss of packets in flight is established when an ACK
   frame is received that newly acknowledges one or more packets.

   A PTO timer expiration event does not indicate packet loss and MUST
   NOT cause prior unacknowledged packets to be marked as lost. When an
   acknowledgement is received that newly acknowledges packets, loss
   detection proceeds as dictated by the packet and time threshold
   mechanisms; see Section 6.1.

6.4. Discussion

   The majority of constants were derived from best common practices
   among widely deployed TCP implementations on the internet.
   Exceptions follow.

   A shorter delayed ack time of 25ms was chosen because longer delayed
   acks can delay loss recovery, and because for the small number of
   connections where fewer than one packet per 25ms is delivered,
   acking every packet is beneficial to congestion control and loss
   recovery.

7. Congestion Control

   QUIC's congestion control is based on TCP NewReno [RFC6582]. NewReno
   is a congestion window based congestion controller. QUIC specifies
   the congestion window in bytes rather than packets due to finer
   control and the ease of appropriate byte counting [RFC3465].
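   Because the window is counted in bytes, the send-permission rule of
   this section (a packet must not push bytes_in_flight beyond the
   congestion window, except for a probe packet sent after a PTO timer
   expires) reduces to an integer comparison. The following is a
   minimal sketch; the function name can_send and its parameters are
   hypothetical and chosen only for this illustration.

```python
def can_send(bytes_in_flight, congestion_window, packet_size,
             is_pto_probe=False):
    """Return True if a packet of packet_size bytes may be sent now.

    All quantities are in bytes. Probe packets sent after a PTO timer
    expires are exempt from the congestion window check.
    """
    if is_pto_probe:
        return True
    return bytes_in_flight + packet_size <= congestion_window


# Assumed example: a 12000-byte window with 10800 bytes in flight has
# room for exactly one more 1200-byte packet.
fits = can_send(10800, 12000, 1200)
blocked = can_send(12000, 12000, 1200)
probe = can_send(12000, 12000, 1200, is_pto_probe=True)
```

   Note that a probe sent this way still counts towards bytes in flight
   (Section 7.6), so the sender can temporarily exceed the window.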
   QUIC hosts MUST NOT send packets if they would increase
   bytes_in_flight (defined in Appendix B.2) beyond the available
   congestion window, unless the packet is a probe packet sent after a
   PTO timer expires, as described in Section 6.3.

   Implementations MAY use other congestion control algorithms, such as
   Cubic [RFC8312], and endpoints MAY use different algorithms from one
   another. The signals QUIC provides for congestion control are
   generic and are designed to support different algorithms.

7.1. Explicit Congestion Notification

   If a path has been verified to support ECN, QUIC treats a Congestion
   Experienced codepoint in the IP header as a signal of congestion.
   This document specifies an endpoint's response when its peer receives
   packets with the Congestion Experienced codepoint. As discussed in
   [RFC8311], endpoints are permitted to experiment with other response
   functions.

7.2. Slow Start

   QUIC begins every connection in slow start and exits slow start upon
   loss or upon increase in the ECN-CE counter. QUIC re-enters slow
   start any time the congestion window is less than ssthresh, which
   typically only occurs after a PTO. While in slow start, QUIC
   increases the congestion window by the number of bytes acknowledged
   when each acknowledgment is processed.

7.3. Congestion Avoidance

   Slow start exits to congestion avoidance. Congestion avoidance in
   NewReno uses an additive increase multiplicative decrease (AIMD)
   approach that increases the congestion window by one maximum packet
   size per congestion window acknowledged. When a loss is detected,
   NewReno halves the congestion window and sets the slow start
   threshold to the new congestion window.

7.4. Recovery Period

   Recovery is a period of time beginning with detection of a lost
   packet or an increase in the ECN-CE counter.
Because QUIC does not 832 retransmit packets, it defines the end of recovery as a packet sent 833 after the start of recovery being acknowledged. This is slightly 834 different from TCP's definition of recovery, which ends when the lost 835 packet that started recovery is acknowledged. 837 The recovery period limits congestion window reduction to once per 838 round trip. During recovery, the congestion window remains unchanged 839 irrespective of new losses or increases in the ECN-CE counter. 841 7.5. Ignoring Loss of Undecryptable Packets 843 During the handshake, some packet protection keys might not be 844 available when a packet arrives. In particular, Handshake and 0-RTT 845 packets cannot be processed until the Initial packets arrive, and 846 1-RTT packets cannot be processed until the handshake completes. 847 Endpoints MAY ignore the loss of Handshake, 0-RTT, and 1-RTT packets 848 that might arrive before the peer has packet protection keys to 849 process those packets. 851 7.6. Probe Timeout 853 Probe packets MUST NOT be blocked by the congestion controller. A 854 sender MUST however count these packets as being additionally in 855 flight, since these packets add network load without establishing 856 packet loss. Note that sending probe packets might cause the 857 sender's bytes in flight to exceed the congestion window until an 858 acknowledgement is received that establishes loss or delivery of 859 packets. 861 7.7. Persistent Congestion 863 When an ACK frame is received that establishes loss of all in-flight 864 packets sent over a long enough period of time, the network is 865 considered to be experiencing persistent congestion. Commonly, this 866 can be established by consecutive PTOs, but since the PTO timer is 867 reset when a new ack-eliciting packet is sent, an explicit duration 868 must be used to account for those cases where PTOs do not occur or 869 are substantially delayed. 
   This duration is computed as follows:

   (smoothed_rtt + 4 * rttvar + max_ack_delay) *
       kPersistentCongestionThreshold

   For example, assume:

      smoothed_rtt = 1
      rttvar = 0
      max_ack_delay = 0
      kPersistentCongestionThreshold = 3

   If an ack-eliciting packet is sent at time = 0, the following
   scenario would illustrate persistent congestion:

                    +-----+------------------------+
                    | t=0 | Send Pkt #1 (App Data) |
                    +-----+------------------------+
                    | t=1 | Send Pkt #2 (PTO 1)    |
                    |     |                        |
                    | t=3 | Send Pkt #3 (PTO 2)    |
                    |     |                        |
                    | t=7 | Send Pkt #4 (PTO 3)    |
                    |     |                        |
                    | t=8 | Recv ACK of Pkt #4     |
                    +-----+------------------------+

   The first three packets are determined to be lost when the ACK of
   packet 4 is received at t=8. The congestion period is calculated as
   the time between the oldest and newest lost packets: (3 - 0) = 3.
   The duration for persistent congestion is equal to: (1 + 4 * 0 + 0) *
   kPersistentCongestionThreshold = 3. Because the threshold was
   reached and because none of the packets between the oldest and the
   newest packets are acknowledged, the network is considered to have
   experienced persistent congestion.

   When persistent congestion is established, the sender's congestion
   window MUST be reduced to the minimum congestion window
   (kMinimumWindow). This response of collapsing the congestion window
   on persistent congestion is functionally similar to a sender's
   response on a Retransmission Timeout (RTO) in TCP [RFC5681] after
   Tail Loss Probes (TLP) [TLP].

7.8. Pacing

   This document does not specify a pacer, but it is RECOMMENDED that a
   sender pace sending of all in-flight packets based on input from the
   congestion controller. For example, a pacer might distribute the
   congestion window over the SRTT when used with a window-based
   controller, and a pacer might use the rate estimate of a rate-based
   controller.
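   As an illustration of the two pacing inputs just mentioned, the
   sketch below derives an inter-packet gap either from a congestion
   window spread over one SRTT or from a rate estimate. This document
   does not specify a pacer, so everything here is an assumption made
   for the example: the function names, the use of seconds and bytes as
   units, and the choice of an evenly spaced schedule.

```python
def pacing_interval_from_window(congestion_window, smoothed_rtt,
                                packet_size):
    """Gap in seconds between packets so that one congestion window
    (bytes) is spread evenly over one smoothed RTT (seconds)."""
    if congestion_window <= 0:
        raise ValueError("congestion_window must be positive")
    return smoothed_rtt * packet_size / congestion_window


def pacing_interval_from_rate(rate_bytes_per_second, packet_size):
    """Gap in seconds between packets for a rate-based controller."""
    if rate_bytes_per_second <= 0:
        raise ValueError("rate must be positive")
    return packet_size / rate_bytes_per_second


# Assumed example: a 12000-byte window, 100ms SRTT, 1200-byte packets.
# Ten packets fit in the window, so the gap is about 10ms.
gap = pacing_interval_from_window(12000, 0.1, 1200)
```

   A real pacer would typically also allow small bursts and would send
   packets containing only ACK frames without delay.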
   An implementation should take care to architect its congestion
   controller to work well with a pacer. For instance, a pacer might
   wrap the congestion controller and control the availability of the
   congestion window, or a pacer might pace out packets handed to it by
   the congestion controller. Timely delivery of ACK frames is
   important for efficient loss recovery. Packets containing only ACK
   frames should therefore not be paced, to avoid delaying their
   delivery to the peer.

   As an example of a well-known and publicly available implementation
   of a flow pacer, implementers are referred to the Fair Queue packet
   scheduler (fq qdisc) in Linux (3.11 onwards).

7.9. Under-utilizing the Congestion Window

   A congestion window that is under-utilized SHOULD NOT be increased in
   either slow start or congestion avoidance. This can happen due to
   insufficient application data or flow control credit.

   A sender MAY use the pipeACK method described in Section 4.3 of
   [RFC7661] to determine if the congestion window is sufficiently
   utilized.

   A sender that paces packets (see Section 7.8) might delay sending
   packets and not fully utilize the congestion window due to this
   delay. A sender should not consider itself application limited if it
   would have fully utilized the congestion window without pacing delay.

   Bursting more than an initial window's worth of data into the network
   might cause short-term congestion and losses. Implementations
   SHOULD either use pacing or reduce their congestion window to limit
   such bursts.

   A sender MAY implement alternate mechanisms to update its congestion
   window after periods of under-utilization, such as those proposed for
   TCP in [RFC7661].

8. Security Considerations

8.1. Congestion Signals

   Congestion control fundamentally involves the consumption of signals
   - both loss and ECN codepoints - from unauthenticated entities.
On- 962 path attackers can spoof or alter these signals. An attacker can 963 cause endpoints to reduce their sending rate by dropping packets, or 964 alter send rate by changing ECN codepoints. 966 8.2. Traffic Analysis 968 Packets that carry only ACK frames can be heuristically identified by 969 observing packet size. Acknowledgement patterns may expose 970 information about link characteristics or application behavior. 971 Endpoints can use PADDING frames or bundle acknowledgments with other 972 frames to reduce leaked information. 974 8.3. Misreporting ECN Markings 976 A receiver can misreport ECN markings to alter the congestion 977 response of a sender. Suppressing reports of ECN-CE markings could 978 cause a sender to increase their send rate. This increase could 979 result in congestion and loss. 981 A sender MAY attempt to detect suppression of reports by marking 982 occasional packets that they send with ECN-CE. If a packet marked 983 with ECN-CE is not reported as having been marked when the packet is 984 acknowledged, the sender SHOULD then disable ECN for that path. 986 Reporting additional ECN-CE markings will cause a sender to reduce 987 their sending rate, which is similar in effect to advertising reduced 988 connection flow control limits and so no advantage is gained by doing 989 so. 991 Endpoints choose the congestion controller that they use. Though 992 congestion controllers generally treat reports of ECN-CE markings as 993 equivalent to loss [RFC8311], the exact response for each controller 994 could be different. Failure to correctly respond to information 995 about ECN markings is therefore difficult to detect. 997 9. IANA Considerations 999 This document has no IANA actions. Yet. 1001 10. References 1003 10.1. Normative References 1005 [QUIC-TLS] 1006 Thomson, M., Ed. and S. Turner, Ed., "Using TLS to Secure 1007 QUIC", draft-ietf-quic-tls-20 (work in progress), April 1008 2019. 1010 [QUIC-TRANSPORT] 1011 Iyengar, J., Ed. and M. 
Thomson, Ed., "QUIC: A UDP-Based 1012 Multiplexed and Secure Transport", draft-ietf-quic- 1013 transport-20 (work in progress), April 2019. 1015 [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate 1016 Requirement Levels", BCP 14, RFC 2119, 1017 DOI 10.17487/RFC2119, March 1997, 1018 . 1020 [RFC8174] Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC 1021 2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174, 1022 May 2017, . 1024 [RFC8311] Black, D., "Relaxing Restrictions on Explicit Congestion 1025 Notification (ECN) Experimentation", RFC 8311, 1026 DOI 10.17487/RFC8311, January 2018, 1027 . 1029 10.2. Informative References 1031 [FACK] Mathis, M. and J. Mahdavi, "Forward Acknowledgement: 1032 Refining TCP Congestion Control", ACM SIGCOMM , August 1033 1996. 1035 [RACK] Cheng, Y., Cardwell, N., Dukkipati, N., and P. Jha, "RACK: 1036 a time-based fast loss detection algorithm for TCP", 1037 draft-ietf-tcpm-rack-04 (work in progress), July 2018. 1039 [RFC3465] Allman, M., "TCP Congestion Control with Appropriate Byte 1040 Counting (ABC)", RFC 3465, DOI 10.17487/RFC3465, February 1041 2003, . 1043 [RFC4653] Bhandarkar, S., Reddy, A., Allman, M., and E. Blanton, 1044 "Improving the Robustness of TCP to Non-Congestion 1045 Events", RFC 4653, DOI 10.17487/RFC4653, August 2006, 1046 . 1048 [RFC5681] Allman, M., Paxson, V., and E. Blanton, "TCP Congestion 1049 Control", RFC 5681, DOI 10.17487/RFC5681, September 2009, 1050 . 1052 [RFC5682] Sarolahti, P., Kojo, M., Yamamoto, K., and M. Hata, 1053 "Forward RTO-Recovery (F-RTO): An Algorithm for Detecting 1054 Spurious Retransmission Timeouts with TCP", RFC 5682, 1055 DOI 10.17487/RFC5682, September 2009, 1056 . 1058 [RFC5827] Allman, M., Avrachenkov, K., Ayesta, U., Blanton, J., and 1059 P. Hurtig, "Early Retransmit for TCP and Stream Control 1060 Transmission Protocol (SCTP)", RFC 5827, 1061 DOI 10.17487/RFC5827, May 2010, 1062 . 1064 [RFC6298] Paxson, V., Allman, M., Chu, J., and M. 
Sargent, 1065 "Computing TCP's Retransmission Timer", RFC 6298, 1066 DOI 10.17487/RFC6298, June 2011, 1067 . 1069 [RFC6582] Henderson, T., Floyd, S., Gurtov, A., and Y. Nishida, "The 1070 NewReno Modification to TCP's Fast Recovery Algorithm", 1071 RFC 6582, DOI 10.17487/RFC6582, April 2012, 1072 . 1074 [RFC6675] Blanton, E., Allman, M., Wang, L., Jarvinen, I., Kojo, M., 1075 and Y. Nishida, "A Conservative Loss Recovery Algorithm 1076 Based on Selective Acknowledgment (SACK) for TCP", 1077 RFC 6675, DOI 10.17487/RFC6675, August 2012, 1078 . 1080 [RFC6928] Chu, J., Dukkipati, N., Cheng, Y., and M. Mathis, 1081 "Increasing TCP's Initial Window", RFC 6928, 1082 DOI 10.17487/RFC6928, April 2013, 1083 . 1085 [RFC7661] Fairhurst, G., Sathiaseelan, A., and R. Secchi, "Updating 1086 TCP to Support Rate-Limited Traffic", RFC 7661, 1087 DOI 10.17487/RFC7661, October 2015, 1088 . 1090 [RFC8312] Rhee, I., Xu, L., Ha, S., Zimmermann, A., Eggert, L., and 1091 R. Scheffenegger, "CUBIC for Fast Long-Distance Networks", 1092 RFC 8312, DOI 10.17487/RFC8312, February 2018, 1093 . 1095 [TLP] Dukkipati, N., Cardwell, N., Cheng, Y., and M. Mathis, 1096 "Tail Loss Probe (TLP): An Algorithm for Fast Recovery of 1097 Tail Losses", draft-dukkipati-tcpm-tcp-loss-probe-01 (work 1098 in progress), February 2013. 1100 10.3. URIs 1102 [1] https://mailarchive.ietf.org/arch/search/?email_list=quic 1104 [2] https://github.com/quicwg 1106 [3] https://github.com/quicwg/base-drafts/labels/-recovery 1108 Appendix A. Loss Recovery Pseudocode 1110 We now describe an example implementation of the loss detection 1111 mechanisms described in Section 6. 1113 A.1. Tracking Sent Packets 1115 To correctly implement congestion control, a QUIC sender tracks every 1116 ack-eliciting packet until the packet is acknowledged or lost. 
It is 1117 expected that implementations will be able to access this information 1118 by packet number and crypto context and store the per-packet fields 1119 (Appendix A.1.1) for loss recovery and congestion control. 1121 After a packet is declared lost, it SHOULD be tracked for an amount 1122 of time comparable to the maximum expected packet reordering, such as 1123 1 RTT. This allows for detection of spurious retransmissions. 1125 Sent packets are tracked for each packet number space, and ACK 1126 processing only applies to a single space. 1128 A.1.1. Sent Packet Fields 1130 packet_number: The packet number of the sent packet. 1132 ack_eliciting: A boolean that indicates whether a packet is ack- 1133 eliciting. If true, it is expected that an acknowledgement will 1134 be received, though the peer could delay sending the ACK frame 1135 containing it by up to the MaxAckDelay. 1137 in_flight: A boolean that indicates whether the packet counts 1138 towards bytes in flight. 1140 is_crypto_packet: A boolean that indicates whether the packet 1141 contains cryptographic handshake messages critical to the 1142 completion of the QUIC handshake. In this version of QUIC, this 1143 includes any packet with the long header that includes a CRYPTO 1144 frame. 1146 sent_bytes: The number of bytes sent in the packet, not including 1147 UDP or IP overhead, but including QUIC framing overhead. 1149 time_sent: The time the packet was sent. 1151 A.2. Constants of interest 1153 Constants used in loss recovery are based on a combination of RFCs, 1154 papers, and common practice. Some may need to be changed or 1155 negotiated in order to better suit a variety of environments. 1157 kPacketThreshold: Maximum reordering in packets before packet 1158 threshold loss detection considers a packet lost. The RECOMMENDED 1159 value is 3. 1161 kTimeThreshold: Maximum reordering in time before time threshold 1162 loss detection considers a packet lost. Specified as an RTT 1163 multiplier. 
The RECOMMENDED value is 9/8. 1165 kGranularity: Timer granularity. This is a system-dependent value. 1166 However, implementations SHOULD use a value no smaller than 1ms. 1168 kInitialRtt: The RTT used before an RTT sample is taken. The 1169 RECOMMENDED value is 500ms. 1171 kPacketNumberSpace: An enum to enumerate the three packet number 1172 spaces. 1174 enum kPacketNumberSpace { 1175 Initial, 1176 Handshake, 1177 ApplicationData, 1178 } 1180 A.3. Variables of interest 1182 Variables required to implement the congestion control mechanisms are 1183 described in this section. 1185 loss_detection_timer: Multi-modal timer used for loss detection. 1187 crypto_count: The number of times all unacknowledged CRYPTO data has 1188 been retransmitted without receiving an ack. 1190 pto_count: The number of times a PTO has been sent without receiving 1191 an ack. 1193 time_of_last_sent_ack_eliciting_packet: The time the most recent 1194 ack-eliciting packet was sent. 1196 time_of_last_sent_crypto_packet: The time the most recent crypto 1197 packet was sent. 1199 largest_acked_packet[kPacketNumberSpace]: The largest packet number 1200 acknowledged in the packet number space so far. 1202 latest_rtt: The most recent RTT measurement made when receiving an 1203 ack for a previously unacked packet. 1205 smoothed_rtt: The smoothed RTT of the connection, computed as 1206 described in [RFC6298] 1208 rttvar: The RTT variance, computed as described in [RFC6298] 1210 min_rtt: The minimum RTT seen in the connection, ignoring ack delay. 1212 max_ack_delay: The maximum amount of time by which the receiver 1213 intends to delay acknowledgments, in milliseconds. The actual 1214 ack_delay in a received ACK frame may be larger due to late 1215 timers, reordering, or lost ACKs. 1217 loss_time[kPacketNumberSpace]: The time at which the next packet in 1218 that packet number space will be considered lost based on 1219 exceeding the reordering window in time. 
   sent_packets[kPacketNumberSpace]: An association of packet numbers
      in a packet number space to information about them. Described in
      detail above in Appendix A.1.

A.4. Initialization

   At the beginning of the connection, initialize the loss detection
   variables as follows:

      loss_detection_timer.reset()
      crypto_count = 0
      pto_count = 0
      latest_rtt = 0
      smoothed_rtt = 0
      rttvar = 0
      min_rtt = 0
      time_of_last_sent_ack_eliciting_packet = 0
      time_of_last_sent_crypto_packet = 0
      for pn_space in [ Initial, Handshake, ApplicationData ]:
        largest_acked_packet[pn_space] = 0
        loss_time[pn_space] = 0

A.5. On Sending a Packet

   After a packet is sent, information about the packet is stored. The
   parameters to OnPacketSent are described in detail above in
   Appendix A.1.1.

   Pseudocode for OnPacketSent follows:

    OnPacketSent(packet_number, pn_space, ack_eliciting,
                 in_flight, is_crypto_packet, sent_bytes):
      sent_packets[pn_space][packet_number].packet_number =
                                               packet_number
      sent_packets[pn_space][packet_number].time_sent = now
      sent_packets[pn_space][packet_number].ack_eliciting =
                                               ack_eliciting
      sent_packets[pn_space][packet_number].in_flight = in_flight
      if (in_flight):
        if (is_crypto_packet):
          time_of_last_sent_crypto_packet = now
        if (ack_eliciting):
          time_of_last_sent_ack_eliciting_packet = now
        OnPacketSentCC(sent_bytes)
        sent_packets[pn_space][packet_number].size = sent_bytes
        SetLossDetectionTimer()

A.6. On Receiving an Acknowledgment

   When an ACK frame is received, it may newly acknowledge any number of
   packets.

   Pseudocode for OnAckReceived and UpdateRtt follows:

   OnAckReceived(ack, pn_space):
     largest_acked_packet[pn_space] =
         max(largest_acked_packet[pn_space], ack.largest_acked)

     // Nothing to do if there are no newly acked packets.
     newly_acked_packets = DetermineNewlyAckedPackets(ack, pn_space)
     if (newly_acked_packets.empty()):
       return

     // If the largest acknowledged is newly acked and
     // at least one ack-eliciting was newly acked, update the RTT.
     if (sent_packets[pn_space][ack.largest_acked] &&
         IncludesAckEliciting(newly_acked_packets)):
       latest_rtt =
         now - sent_packets[pn_space][ack.largest_acked].time_sent
       UpdateRtt(ack.ack_delay)

     // Process ECN information if present.
     if (ACK frame contains ECN information):
       ProcessECN(ack)

     // OnPacketAcked takes the sent-packet struct, see Appendix A.7.
     for acked_packet in newly_acked_packets:
       OnPacketAcked(acked_packet, pn_space)

     DetectLostPackets(pn_space)

     crypto_count = 0
     pto_count = 0

     SetLossDetectionTimer()

   UpdateRtt(ack_delay):
     // First RTT sample.
     if (smoothed_rtt == 0):
       min_rtt = latest_rtt
       smoothed_rtt = latest_rtt
       rttvar = latest_rtt / 2
       return

     // min_rtt ignores ack delay.
     min_rtt = min(min_rtt, latest_rtt)
     // Limit ack_delay by max_ack_delay
     ack_delay = min(ack_delay, max_ack_delay)
     // Adjust for ack delay if plausible.
     adjusted_rtt = latest_rtt
     if (latest_rtt > min_rtt + ack_delay):
       adjusted_rtt = latest_rtt - ack_delay

     rttvar = 3/4 * rttvar + 1/4 * abs(smoothed_rtt - adjusted_rtt)
     smoothed_rtt = 7/8 * smoothed_rtt + 1/8 * adjusted_rtt

A.7. On Packet Acknowledgment

   When a packet is acknowledged for the first time, the following
   OnPacketAcked function is called. Note that a single ACK frame may
   newly acknowledge several packets. OnPacketAcked must be called once
   for each of these newly acknowledged packets.

   OnPacketAcked takes two parameters: acked_packet, which is the struct
   detailed in Appendix A.1.1, and the packet number space that this ACK
   frame was sent for.
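   The UpdateRtt pseudocode in Appendix A.6 and the PTO formula in
   Section 6.3.1 can be exercised together with a small runnable
   sketch. The class name RttEstimator, its method names, the use of
   milliseconds as the unit, and the 1ms kGranularity value are
   assumptions made for this example and are not part of the
   pseudocode.

```python
K_GRANULARITY_MS = 1.0  # assumed timer granularity, in milliseconds


class RttEstimator:
    """RTT state following the UpdateRtt pseudocode (units: ms)."""

    def __init__(self, max_ack_delay):
        self.max_ack_delay = max_ack_delay
        self.latest_rtt = 0.0
        self.smoothed_rtt = 0.0
        self.rttvar = 0.0
        self.min_rtt = 0.0

    def update(self, latest_rtt, ack_delay):
        self.latest_rtt = latest_rtt
        # First RTT sample seeds all estimates.
        if self.smoothed_rtt == 0:
            self.min_rtt = latest_rtt
            self.smoothed_rtt = latest_rtt
            self.rttvar = latest_rtt / 2
            return
        # min_rtt ignores ack delay.
        self.min_rtt = min(self.min_rtt, latest_rtt)
        # Limit ack_delay by max_ack_delay.
        ack_delay = min(ack_delay, self.max_ack_delay)
        # Subtract ack delay only if the result stays above min_rtt.
        adjusted_rtt = latest_rtt
        if latest_rtt > self.min_rtt + ack_delay:
            adjusted_rtt = latest_rtt - ack_delay
        self.rttvar = (0.75 * self.rttvar
                       + 0.25 * abs(self.smoothed_rtt - adjusted_rtt))
        self.smoothed_rtt = 0.875 * self.smoothed_rtt + 0.125 * adjusted_rtt

    def pto(self, pto_count=0):
        """PTO period, doubled for each consecutive PTO expiration."""
        period = (self.smoothed_rtt
                  + max(4 * self.rttvar, K_GRANULARITY_MS)
                  + self.max_ack_delay)
        return period * (2 ** pto_count)


est = RttEstimator(max_ack_delay=25.0)
est.update(latest_rtt=100.0, ack_delay=0.0)   # first sample
est.update(latest_rtt=140.0, ack_delay=20.0)  # adjusted sample is 120
```

   After the two samples above, smoothed_rtt is 102.5ms and rttvar is
   42.5ms, so the first PTO period is 102.5 + 170 + 25 = 297.5ms.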
1337 Pseudocode for OnPacketAcked follows: 1339 OnPacketAcked(acked_packet, pn_space): 1340 if (acked_packet.in_flight): 1341 OnPacketAckedCC(acked_packet) 1342 sent_packets[pn_space].remove(acked_packet.packet_number) 1344 A.8. Setting the Loss Detection Timer 1346 QUIC loss detection uses a single timer for all timeout loss 1347 detection. The duration of the timer is based on the timer's mode, 1348 which is set in the packet and timer events further below. The 1349 function SetLossDetectionTimer defined below shows how the single 1350 timer is set. 1352 This algorithm may result in the timer being set in the past, 1353 particularly if timers wake up late. Timers set in the past SHOULD 1354 fire immediately. 1356 Pseudocode for SetLossDetectionTimer follows: 1358 // Returns the earliest loss_time and the packet number 1359 // space it's from. Returns 0 if all times are 0. 1360 GetEarliestLossTime(): 1361 time = loss_time[Initial] 1362 space = Initial 1363 for pn_space in [ Handshake, ApplicationData ]: 1364 if loss_time[pn_space] != 0 && 1365 (time == 0 || loss_time[pn_space] < time): 1366 time = loss_time[pn_space]; 1367 space = pn_space 1368 return time, space 1370 SetLossDetectionTimer(): 1371 loss_time, _ = GetEarliestLossTime() 1372 if (loss_time != 0): 1373 // Time threshold loss detection. 1374 loss_detection_timer.update(loss_time) 1375 return 1377 if (has unacknowledged crypto data 1378 || endpoint is client without 1-RTT keys): 1379 // Crypto retransmission timer. 1380 if (smoothed_rtt == 0): 1381 timeout = 2 * kInitialRtt 1382 else: 1383 timeout = 2 * smoothed_rtt 1384 timeout = max(timeout, kGranularity) 1385 timeout = timeout * (2 ^ crypto_count) 1386 loss_detection_timer.update( 1387 time_of_last_sent_crypto_packet + timeout) 1388 return 1390 // Don't arm timer if there are no ack-eliciting packets 1391 // in flight. 
1392 if (no ack-eliciting packets in flight): 1393 loss_detection_timer.cancel() 1394 return 1396 // Calculate PTO duration 1397 timeout = 1398 smoothed_rtt + max(4 * rttvar, kGranularity) + max_ack_delay 1399 timeout = timeout * (2 ^ pto_count) 1401 loss_detection_timer.update( 1402 time_of_last_sent_ack_eliciting_packet + timeout) 1404 A.9. On Timeout 1406 When the loss detection timer expires, the timer's mode determines 1407 the action to be performed. 1409 Pseudocode for OnLossDetectionTimeout follows: 1411 OnLossDetectionTimeout(): 1412 loss_time, pn_space = GetEarliestLossTime() 1413 if (loss_time != 0): 1414 // Time threshold loss Detection 1415 DetectLostPackets(pn_space) 1416 // Retransmit crypto data if no packets were lost 1417 // and there is crypto data to retransmit. 1418 else if (has unacknowledged crypto data): 1419 // Crypto retransmission timeout. 1420 RetransmitUnackedCryptoData() 1421 crypto_count++ 1422 else if (endpoint is client without 1-RTT keys): 1423 // Client sends an anti-deadlock packet: Initial is padded 1424 // to earn more anti-amplification credit, 1425 // a Handshake packet proves address ownership. 1426 if (has Handshake keys): 1427 SendOneHandshakePacket() 1428 else: 1429 SendOnePaddedInitialPacket() 1430 crypto_count++ 1431 else: 1432 // PTO. Send new data if available, else retransmit old data. 1433 // If neither is available, send a single PING frame. 1434 SendOneOrTwoPackets() 1435 pto_count++ 1437 SetLossDetectionTimer() 1439 A.10. Detecting Lost Packets 1441 DetectLostPackets is called every time an ACK is received and 1442 operates on the sent_packets for that packet number space. 1444 Pseudocode for DetectLostPackets follows: 1446 DetectLostPackets(pn_space): 1447 loss_time[pn_space] = 0 1448 lost_packets = {} 1449 loss_delay = kTimeThreshold * max(latest_rtt, smoothed_rtt) 1451 // Minimum time of kGranularity before packets are deemed lost. 
       loss_delay = max(loss_delay, kGranularity)

       // Packets sent before this time are deemed lost.
       lost_send_time = now() - loss_delay

       // Packets with packet numbers before this are deemed lost.
       lost_pn = largest_acked_packet[pn_space] - kPacketThreshold

       foreach unacked in sent_packets[pn_space]:
         if (unacked.packet_number > largest_acked_packet[pn_space]):
           continue

         // Mark packet as lost, or set time when it should be marked.
         if (unacked.time_sent <= lost_send_time ||
             unacked.packet_number <= lost_pn):
           sent_packets[pn_space].remove(unacked.packet_number)
           if (unacked.in_flight):
             lost_packets.insert(unacked)
         else:
           if (loss_time[pn_space] == 0):
             loss_time[pn_space] = unacked.time_sent + loss_delay
           else:
             loss_time[pn_space] = min(loss_time[pn_space],
                                       unacked.time_sent + loss_delay)

       // Inform the congestion controller of lost packets and
       // let it decide whether to retransmit immediately.
       if (!lost_packets.empty()):
         OnPacketsLost(lost_packets)

Appendix B.  Congestion Control Pseudocode

   We now describe an example implementation of the congestion
   controller described in Section 7.

B.1.  Constants of interest

   Constants used in congestion control are based on a combination of
   RFCs, papers, and common practice.  Some may need to be changed or
   negotiated in order to better suit a variety of environments.

   kMaxDatagramSize:  The sender's maximum payload size.  Does not
      include UDP or IP overhead.  The max packet size is used for
      calculating initial and minimum congestion windows.  The
      RECOMMENDED value is 1200 bytes.

   kInitialWindow:  Default limit on the initial amount of data in
      flight, in bytes.  Taken from [RFC6928], but increased slightly to
      account for the smaller 8 byte overhead of UDP vs 20 bytes for
      TCP.
      The RECOMMENDED value is the minimum of 10 * kMaxDatagramSize
      and max(2 * kMaxDatagramSize, 14720).

   kMinimumWindow:  Minimum congestion window in bytes.  The RECOMMENDED
      value is 2 * kMaxDatagramSize.

   kLossReductionFactor:  Reduction in congestion window when a new loss
      event is detected.  The RECOMMENDED value is 0.5.

   kPersistentCongestionThreshold:  Period of time for persistent
      congestion to be established, specified as a PTO multiplier.  The
      rationale for this threshold is to enable a sender to use initial
      PTOs for aggressive probing, as TCP does with Tail Loss Probe
      (TLP) [TLP] [RACK], before establishing persistent congestion, as
      TCP does with a Retransmission Timeout (RTO) [RFC5681].  The
      RECOMMENDED value for kPersistentCongestionThreshold is 3, which
      is approximately equivalent to having two TLPs before an RTO in
      TCP.

B.2.  Variables of interest

   Variables required to implement the congestion control mechanisms are
   described in this section.

   ecn_ce_counter:  The highest value reported for the ECN-CE counter by
      the peer in an ACK frame.  This variable is used to detect
      increases in the reported ECN-CE counter.

   bytes_in_flight:  The sum of the size in bytes of all sent packets
      that contain at least one ack-eliciting or PADDING frame, and have
      not been acked or declared lost.  The size does not include IP or
      UDP overhead, but does include the QUIC header and AEAD overhead.
      Packets only containing ACK frames do not count towards
      bytes_in_flight to ensure congestion control does not impede
      congestion feedback.

   congestion_window:  Maximum number of bytes-in-flight that may be
      sent.

   congestion_recovery_start_time:  The time when QUIC first detects
      congestion due to loss or ECN, causing it to enter congestion
      recovery.  When a packet sent after this time is acknowledged,
      QUIC exits congestion recovery.

   ssthresh:  Slow start threshold in bytes.  When the congestion window
      is below ssthresh, the mode is slow start and the window grows by
      the number of bytes acknowledged.

B.3.  Initialization

   At the beginning of the connection, initialize the congestion control
   variables as follows:

     congestion_window = kInitialWindow
     bytes_in_flight = 0
     congestion_recovery_start_time = 0
     ssthresh = infinite
     ecn_ce_counter = 0

B.4.  On Packet Sent

   Whenever a packet is sent, and it contains non-ACK frames, the packet
   increases bytes_in_flight.

     OnPacketSentCC(bytes_sent):
       bytes_in_flight += bytes_sent

B.5.  On Packet Acknowledgement

   Invoked from loss detection's OnPacketAcked and supplied with the
   acked_packet from sent_packets.

     InCongestionRecovery(sent_time):
       return sent_time <= congestion_recovery_start_time

     OnPacketAckedCC(acked_packet):
       // Remove from bytes_in_flight.
       bytes_in_flight -= acked_packet.size
       if (InCongestionRecovery(acked_packet.time_sent)):
         // Do not increase congestion window in recovery period.
         return
       if (IsAppLimited()):
         // Do not increase congestion_window if application
         // limited.
         return
       if (congestion_window < ssthresh):
         // Slow start.
         congestion_window += acked_packet.size
       else:
         // Congestion avoidance.
         congestion_window += kMaxDatagramSize * acked_packet.size
             / congestion_window

B.6.  On New Congestion Event

   Invoked from ProcessECN and OnPacketsLost when a new congestion event
   is detected.  May start a new recovery period and reduces the
   congestion window.

     CongestionEvent(sent_time):
       // Start a new congestion event if packet was sent after the
       // start of the previous congestion recovery period.
       if (!InCongestionRecovery(sent_time)):
         congestion_recovery_start_time = now()
         congestion_window *= kLossReductionFactor
         congestion_window = max(congestion_window, kMinimumWindow)
         ssthresh = congestion_window

B.7.  Process ECN Information

   Invoked when an ACK frame with an ECN section is received from the
   peer.

     ProcessECN(ack):
       // If the ECN-CE counter reported by the peer has increased,
       // this could be a new congestion event.
       if (ack.ce_counter > ecn_ce_counter):
         ecn_ce_counter = ack.ce_counter
         CongestionEvent(sent_packets[ack.largest_acked].time_sent)

B.8.  On Packets Lost

   Invoked from DetectLostPackets when packets are deemed lost.

     InPersistentCongestion(largest_lost_packet):
       pto = smoothed_rtt + max(4 * rttvar, kGranularity) +
         max_ack_delay
       congestion_period = pto * kPersistentCongestionThreshold
       // Determine if all packets in the window before the
       // newest lost packet, including the edges, are marked
       // lost
       return IsWindowLost(largest_lost_packet, congestion_period)

     OnPacketsLost(lost_packets):
       // Remove lost packets from bytes_in_flight.
       for (lost_packet : lost_packets):
         bytes_in_flight -= lost_packet.size
       largest_lost_packet = lost_packets.last()
       CongestionEvent(largest_lost_packet.time_sent)

       // Collapse congestion window if persistent congestion
       if (InPersistentCongestion(largest_lost_packet)):
         congestion_window = kMinimumWindow

Appendix C.  Change Log

      *RFC Editor's Note:* Please remove this section prior to
      publication of a final version of this document.

   Issue and pull request numbers are listed with a leading octothorp.

C.1.  Since draft-ietf-quic-recovery-19

   o  Send a PING if the PTO timer fires and there's nothing to send
      (#2624)

   o  Set loss delay to at least kGranularity (#2617)

   o  Merge application limited and sending after idle sections.
      Always limit burst size instead of requiring resetting CWND to
      initial CWND after idle (#2605)

   o  Rewrite RTT estimation, allow RTT samples where a newly acked
      packet is ack-eliciting but the largest_acked is not (#2592)

   o  Don't arm the handshake timer if there is no handshake data
      (#2590)

   o  Clarify that the time threshold loss alarm takes precedence over
      the crypto handshake timer (#2590, #2620)

   o  Change initial RTT to 500ms to align with RFC6298 (#2184)

C.2.  Since draft-ietf-quic-recovery-18

   o  Change IW byte limit to 14720 from 14600 (#2494)

   o  Update PTO calculation to match RFC6298 (#2480, #2489, #2490)

   o  Improve loss detection's description of multiple packet number
      spaces and pseudocode (#2485, #2451, #2417)

   o  Declare persistent congestion even if non-probe packets are sent
      and don't make persistent congestion more aggressive than RTO
      verified was (#2365, #2244)

   o  Move pseudocode to the appendices (#2408)

   o  What to send on multiple PTOs (#2380)

C.3.  Since draft-ietf-quic-recovery-17

   o  After Probe Timeout discard in-flight packets or send another
      (#2212, #1965)

   o  Endpoints discard initial keys as soon as handshake keys are
      available (#1951, #2045)

   o  0-RTT state is discarded when 0-RTT is rejected (#2300)

   o  Loss detection timer is cancelled when ack-eliciting frames are in
      flight (#2117, #2093)

   o  Packets are declared lost if they are in flight (#2104)

   o  After becoming idle, either pace packets or reset the congestion
      controller (#2138, #2187)

   o  Process ECN counts before marking packets lost (#2142)

   o  Mark packets lost before resetting crypto_count and pto_count
      (#2208, #2209)

   o  Congestion and loss recovery state are discarded when keys are
      discarded (#2327)

C.4.  Since draft-ietf-quic-recovery-16

   o  Unify TLP and RTO into a single PTO; eliminate min RTO, min TLP
      and min crypto timeouts; eliminate timeout validation (#2114,
      #2166, #2168, #1017)

   o  Redefine congestion avoidance in terms of when the period starts
      (#1928, #1930)

   o  Document what needs to be tracked for packets that are in flight
      (#765, #1724, #1939)

   o  Integrate both time and packet thresholds into loss detection
      (#1969, #1212, #934, #1974)

   o  Reduce congestion window after idle, unless pacing is used (#2007,
      #2023)

   o  Disable RTT calculation for packets that don't elicit
      acknowledgment (#2060, #2078)

   o  Limit ack_delay by max_ack_delay (#2060, #2099)

   o  Initial keys are discarded once Handshake keys are available
      (#1951, #2045)

   o  Reorder ECN and loss detection in pseudocode (#2142)

   o  Only cancel loss detection timer if ack-eliciting packets are in
      flight (#2093, #2117)

C.5.  Since draft-ietf-quic-recovery-14

   o  Used max_ack_delay from transport params (#1796, #1782)

   o  Merge ACK and ACK_ECN (#1783)

C.6.  Since draft-ietf-quic-recovery-13

   o  Corrected the lack of ssthresh reduction in CongestionEvent
      pseudocode (#1598)

   o  Considerations for ECN spoofing (#1426, #1626)

   o  Clarifications for PADDING and congestion control (#837, #838,
      #1517, #1531, #1540)

   o  Reduce early retransmission timer to RTT/8 (#945, #1581)

   o  Packets are declared lost after an RTO is verified (#935, #1582)

C.7.  Since draft-ietf-quic-recovery-12

   o  Changes to manage separate packet number spaces and encryption
      levels (#1190, #1242, #1413, #1450)

   o  Added ECN feedback mechanisms and handling; new ACK_ECN frame
      (#804, #805, #1372)

C.8.  Since draft-ietf-quic-recovery-11

   No significant changes.

C.9.  Since draft-ietf-quic-recovery-10

   o  Improved text on ack generation (#1139, #1159)

   o  Make references to TCP recovery mechanisms informational (#1195)

   o  Define time_of_last_sent_handshake_packet (#1171)

   o  Added signal from TLS that the data it includes needs to be sent
      in a Retry packet (#1061, #1199)

   o  Minimum RTT (min_rtt) is initialized with an infinite value
      (#1169)

C.10.  Since draft-ietf-quic-recovery-09

   No significant changes.

C.11.  Since draft-ietf-quic-recovery-08

   o  Clarified pacing and RTO (#967, #977)

C.12.  Since draft-ietf-quic-recovery-07

   o  Include Ack Delay in RTO (and TLP) computations (#981)

   o  Ack Delay in SRTT computation (#961)

   o  Default RTT and Slow Start (#590)

   o  Many editorial fixes.

C.13.  Since draft-ietf-quic-recovery-06

   No significant changes.

C.14.  Since draft-ietf-quic-recovery-05

   o  Add more congestion control text (#776)

C.15.  Since draft-ietf-quic-recovery-04

   No significant changes.

C.16.  Since draft-ietf-quic-recovery-03

   No significant changes.

C.17.  Since draft-ietf-quic-recovery-02

   o  Integrate F-RTO (#544, #409)

   o  Add congestion control (#545, #395)

   o  Require connection abort if a skipped packet was acknowledged
      (#415)

   o  Simplify RTO calculations (#142, #417)

C.18.  Since draft-ietf-quic-recovery-01

   o  Overview added to loss detection

   o  Changes initial default RTT to 100ms

   o  Added time-based loss detection and fixes early retransmit

   o  Clarified loss recovery for handshake packets

   o  Fixed references and made TCP references informative

C.19.  Since draft-ietf-quic-recovery-00

   o  Improved description of constants and ACK behavior

C.20.  Since draft-iyengar-quic-loss-recovery-01

   o  Adopted as base for draft-ietf-quic-recovery

   o  Updated authors/editors list

   o  Added table of contents

Acknowledgments

Authors' Addresses

   Jana Iyengar (editor)
   Fastly

   Email: jri.ietf@gmail.com


   Ian Swett (editor)
   Google

   Email: ianswett@google.com