idnits 2.17.1 draft-ietf-quic-recovery-22.txt: Checking boilerplate required by RFC 5378 and the IETF Trust (see https://trustee.ietf.org/license-info): ---------------------------------------------------------------------------- No issues found here. Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt: ---------------------------------------------------------------------------- No issues found here. Checking nits according to https://www.ietf.org/id-info/checklist : ---------------------------------------------------------------------------- ** The abstract seems to contain references ([2], [3], [1]), which it shouldn't. Please replace those with straight textual mentions of the documents in question. Miscellaneous warnings: ---------------------------------------------------------------------------- == The copyright year in the IETF Trust and authors Copyright Line does not match the current year -- The document date (July 09, 2019) is 1746 days in the past. Is this intentional? -- Found something which looks like a code comment -- if you have code sections in the document, please surround them with '<CODE BEGINS>' and '<CODE ENDS>' lines. 
Checking references for intended status: Proposed Standard ---------------------------------------------------------------------------- (See RFCs 3967 and 4897 for information about using normative references to lower-maturity documents in RFCs) -- Looks like a reference, but probably isn't: '1' on line 1123 -- Looks like a reference, but probably isn't: '2' on line 1125 -- Looks like a reference, but probably isn't: '3' on line 1127 == Missing Reference: 'Initial' is mentioned on line 1391, but not defined == Outdated reference: A later version (-34) exists of draft-ietf-quic-tls-22 == Outdated reference: A later version (-34) exists of draft-ietf-quic-transport-22 == Outdated reference: A later version (-15) exists of draft-ietf-tcpm-rack-05 -- Obsolete informational reference (is this intentional?): RFC 8312 (Obsoleted by RFC 9438) Summary: 1 error (**), 0 flaws (~~), 5 warnings (==), 6 comments (--). Run idnits with the --verbose option for more detailed information about the items above. -------------------------------------------------------------------------------- 2 QUIC J. Iyengar, Ed. 3 Internet-Draft Fastly 4 Intended status: Standards Track I. Swett, Ed. 5 Expires: January 10, 2020 Google 6 July 09, 2019 8 QUIC Loss Detection and Congestion Control 9 draft-ietf-quic-recovery-22 11 Abstract 13 This document describes loss detection and congestion control 14 mechanisms for QUIC. 16 Note to Readers 18 Discussion of this draft takes place on the QUIC working group 19 mailing list (quic@ietf.org), which is archived at 20 https://mailarchive.ietf.org/arch/search/?email_list=quic [1]. 22 Working Group information can be found at https://github.com/quicwg 23 [2]; source code and issues list for this draft can be found at 24 https://github.com/quicwg/base-drafts/labels/-recovery [3]. 26 Status of This Memo 28 This Internet-Draft is submitted in full conformance with the 29 provisions of BCP 78 and BCP 79. 
31 Internet-Drafts are working documents of the Internet Engineering 32 Task Force (IETF). Note that other groups may also distribute 33 working documents as Internet-Drafts. The list of current Internet- 34 Drafts is at https://datatracker.ietf.org/drafts/current/. 36 Internet-Drafts are draft documents valid for a maximum of six months 37 and may be updated, replaced, or obsoleted by other documents at any 38 time. It is inappropriate to use Internet-Drafts as reference 39 material or to cite them other than as "work in progress." 41 This Internet-Draft will expire on January 10, 2020. 43 Copyright Notice 45 Copyright (c) 2019 IETF Trust and the persons identified as the 46 document authors. All rights reserved. 48 This document is subject to BCP 78 and the IETF Trust's Legal 49 Provisions Relating to IETF Documents 50 (https://trustee.ietf.org/license-info) in effect on the date of 51 publication of this document. Please review these documents 52 carefully, as they describe your rights and restrictions with respect 53 to this document. Code Components extracted from this document must 54 include Simplified BSD License text as described in Section 4.e of 55 the Trust Legal Provisions and are provided without warranty as 56 described in the Simplified BSD License. 58 Table of Contents 60 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 4 61 2. Conventions and Definitions . . . . . . . . . . . . . . . . . 4 62 3. Design of the QUIC Transmission Machinery . . . . . . . . . . 5 63 3.1. Relevant Differences Between QUIC and TCP . . . . . . . . 6 64 3.1.1. Separate Packet Number Spaces . . . . . . . . . . . . 6 65 3.1.2. Monotonically Increasing Packet Numbers . . . . . . . 6 66 3.1.3. Clearer Loss Epoch . . . . . . . . . . . . . . . . . 7 67 3.1.4. No Reneging . . . . . . . . . . . . . . . . . . . . . 7 68 3.1.5. More ACK Ranges . . . . . . . . . . . . . . . . . . . 7 69 3.1.6. Explicit Correction For Delayed Acknowledgements . . 7 70 4. 
Generating Acknowledgements . . . . . . . . . . . . . . . . . 7 71 4.1. Crypto Handshake Data . . . . . . . . . . . . . . . . . . 8 72 4.2. ACK Ranges . . . . . . . . . . . . . . . . . . . . . . . 8 73 4.3. Receiver Tracking of ACK Frames . . . . . . . . . . . . . 8 74 4.4. Measuring and Reporting Host Delay . . . . . . . . . . . 9 75 5. Estimating the Round-Trip Time . . . . . . . . . . . . . . . 9 76 5.1. Generating RTT samples . . . . . . . . . . . . . . . . . 9 77 5.2. Estimating min_rtt . . . . . . . . . . . . . . . . . . . 10 78 5.3. Estimating smoothed_rtt and rttvar . . . . . . . . . . . 10 79 6. Loss Detection . . . . . . . . . . . . . . . . . . . . . . . 11 80 6.1. Acknowledgement-based Detection . . . . . . . . . . . . . 12 81 6.1.1. Packet Threshold . . . . . . . . . . . . . . . . . . 12 82 6.1.2. Time Threshold . . . . . . . . . . . . . . . . . . . 12 83 6.2. Crypto Retransmission Timeout . . . . . . . . . . . . . . 13 84 6.3. Probe Timeout . . . . . . . . . . . . . . . . . . . . . . 14 85 6.3.1. Computing PTO . . . . . . . . . . . . . . . . . . . . 15 86 6.3.2. Sending Probe Packets . . . . . . . . . . . . . . . . 15 87 6.3.3. Loss Detection . . . . . . . . . . . . . . . . . . . 16 88 6.4. Retry and Version Negotiation . . . . . . . . . . . . . . 16 89 6.5. Discarding Keys and Packet State . . . . . . . . . . . . 17 90 6.6. Discussion . . . . . . . . . . . . . . . . . . . . . . . 17 91 7. Congestion Control . . . . . . . . . . . . . . . . . . . . . 17 92 7.1. Explicit Congestion Notification . . . . . . . . . . . . 18 93 7.2. Slow Start . . . . . . . . . . . . . . . . . . . . . . . 18 94 7.3. Congestion Avoidance . . . . . . . . . . . . . . . . . . 18 95 7.4. Recovery Period . . . . . . . . . . . . . . . . . . . . . 18 96 7.5. Ignoring Loss of Undecryptable Packets . . . . . . . . . 19 97 7.6. Probe Timeout . . . . . . . . . . . . . . . . . . . . . . 19 98 7.7. Persistent Congestion . . . . . . . . . . . . . . . . . . 19 99 7.8. Pacing . . . . . . . . 
. . . . . . . . . . . . . . . . . 20 100 7.9. Under-utilizing the Congestion Window . . . . . . . . . . 21 101 8. Security Considerations . . . . . . . . . . . . . . . . . . . 21 102 8.1. Congestion Signals . . . . . . . . . . . . . . . . . . . 21 103 8.2. Traffic Analysis . . . . . . . . . . . . . . . . . . . . 21 104 8.3. Misreporting ECN Markings . . . . . . . . . . . . . . . . 22 105 9. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 22 106 10. References . . . . . . . . . . . . . . . . . . . . . . . . . 22 107 10.1. Normative References . . . . . . . . . . . . . . . . . . 22 108 10.2. Informative References . . . . . . . . . . . . . . . . . 23 109 10.3. URIs . . . . . . . . . . . . . . . . . . . . . . . . . . 24 110 Appendix A. Loss Recovery Pseudocode . . . . . . . . . . . . . . 24 111 A.1. Tracking Sent Packets . . . . . . . . . . . . . . . . . . 25 112 A.1.1. Sent Packet Fields . . . . . . . . . . . . . . . . . 25 113 A.2. Constants of interest . . . . . . . . . . . . . . . . . . 25 114 A.3. Variables of interest . . . . . . . . . . . . . . . . . . 26 115 A.4. Initialization . . . . . . . . . . . . . . . . . . . . . 27 116 A.5. On Sending a Packet . . . . . . . . . . . . . . . . . . . 27 117 A.6. On Receiving an Acknowledgment . . . . . . . . . . . . . 28 118 A.7. On Packet Acknowledgment . . . . . . . . . . . . . . . . 29 119 A.8. Setting the Loss Detection Timer . . . . . . . . . . . . 30 120 A.9. On Timeout . . . . . . . . . . . . . . . . . . . . . . . 32 121 A.10. Detecting Lost Packets . . . . . . . . . . . . . . . . . 32 122 Appendix B. Congestion Control Pseudocode . . . . . . . . . . . 33 123 B.1. Constants of interest . . . . . . . . . . . . . . . . . . 33 124 B.2. Variables of interest . . . . . . . . . . . . . . . . . . 34 125 B.3. Initialization . . . . . . . . . . . . . . . . . . . . . 35 126 B.4. On Packet Sent . . . . . . . . . . . . . . . . . . . . . 35 127 B.5. On Packet Acknowledgement . . . . . . . . . . . . . . . . 
35 128 B.6. On New Congestion Event . . . . . . . . . . . . . . . . . 36 129 B.7. Process ECN Information . . . . . . . . . . . . . . . . . 36 130 B.8. On Packets Lost . . . . . . . . . . . . . . . . . . . . . 37 131 Appendix C. Change Log . . . . . . . . . . . . . . . . . . . . . 37 132 C.1. Since draft-ietf-quic-recovery-21 . . . . . . . . . . . . 37 133 C.2. Since draft-ietf-quic-recovery-20 . . . . . . . . . . . . 37 134 C.3. Since draft-ietf-quic-recovery-19 . . . . . . . . . . . . 38 135 C.4. Since draft-ietf-quic-recovery-18 . . . . . . . . . . . . 38 136 C.5. Since draft-ietf-quic-recovery-17 . . . . . . . . . . . . 38 137 C.6. Since draft-ietf-quic-recovery-16 . . . . . . . . . . . . 39 138 C.7. Since draft-ietf-quic-recovery-14 . . . . . . . . . . . . 40 139 C.8. Since draft-ietf-quic-recovery-13 . . . . . . . . . . . . 40 140 C.9. Since draft-ietf-quic-recovery-12 . . . . . . . . . . . . 40 141 C.10. Since draft-ietf-quic-recovery-11 . . . . . . . . . . . . 40 142 C.11. Since draft-ietf-quic-recovery-10 . . . . . . . . . . . . 40 143 C.12. Since draft-ietf-quic-recovery-09 . . . . . . . . . . . . 41 144 C.13. Since draft-ietf-quic-recovery-08 . . . . . . . . . . . . 41 145 C.14. Since draft-ietf-quic-recovery-07 . . . . . . . . . . . . 41 146 C.15. Since draft-ietf-quic-recovery-06 . . . . . . . . . . . . 41 147 C.16. Since draft-ietf-quic-recovery-05 . . . . . . . . . . . . 41 148 C.17. Since draft-ietf-quic-recovery-04 . . . . . . . . . . . . 41 149 C.18. Since draft-ietf-quic-recovery-03 . . . . . . . . . . . . 41 150 C.19. Since draft-ietf-quic-recovery-02 . . . . . . . . . . . . 41 151 C.20. Since draft-ietf-quic-recovery-01 . . . . . . . . . . . . 42 152 C.21. Since draft-ietf-quic-recovery-00 . . . . . . . . . . . . 42 153 C.22. Since draft-iyengar-quic-loss-recovery-01 . . . . . . . . 42 154 Acknowledgments . . . . . . . . . . . . . . . . . . . . . . . . . 42 155 Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 42 157 1. 
Introduction 159 QUIC is a new multiplexed and secure transport atop UDP. QUIC builds 160 on decades of transport and security experience, and implements 161 mechanisms that make it attractive as a modern general-purpose 162 transport. The QUIC protocol is described in [QUIC-TRANSPORT]. 164 QUIC implements the spirit of existing TCP loss recovery mechanisms, 165 described in RFCs, various Internet-drafts, and also those prevalent 166 in the Linux TCP implementation. This document describes QUIC 167 congestion control and loss recovery, and where applicable, 168 attributes the TCP equivalent in RFCs, Internet-drafts, academic 169 papers, and/or TCP implementations. 171 2. Conventions and Definitions 173 The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", 174 "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and 175 "OPTIONAL" in this document are to be interpreted as described in BCP 176 14 [RFC2119] [RFC8174] when, and only when, they appear in all 177 capitals, as shown here. 179 Definitions of terms that are used in this document: 181 ACK-only: Any packet containing only one or more ACK frame(s). 183 In-flight: Packets are considered in-flight when they have been sent 184 and are not ACK-only, and they are not acknowledged, declared 185 lost, or abandoned along with old keys. 187 Ack-eliciting Frames: All frames besides ACK or PADDING are 188 considered ack-eliciting. 190 Ack-eliciting Packets: Packets that contain ack-eliciting frames 191 elicit an ACK from the receiver within the maximum ack delay and 192 are called ack-eliciting packets. 194 Crypto Packets: Packets containing CRYPTO data sent in Initial or 195 Handshake packets. 197 Out-of-order Packets: Packets that do not increase the largest 198 received packet number for its packet number space by exactly one. 199 Packets arrive out of order when earlier packets are lost or 200 delayed. 202 3. 
Design of the QUIC Transmission Machinery 204 All transmissions in QUIC are sent with a packet-level header, which 205 indicates the encryption level and includes a packet sequence number 206 (referred to below as a packet number). The encryption level 207 indicates the packet number space, as described in [QUIC-TRANSPORT]. 208 Packet numbers never repeat within a packet number space for the 209 lifetime of a connection. Packet numbers monotonically increase 210 within a space, preventing ambiguity. 212 This design obviates the need for disambiguating between 213 transmissions and retransmissions and eliminates significant 214 complexity from QUIC's interpretation of TCP loss detection 215 mechanisms. 217 QUIC packets can contain multiple frames of different types. The 218 recovery mechanisms ensure that data and frames that need reliable 219 delivery are acknowledged or declared lost and sent in new packets as 220 necessary. The types of frames contained in a packet affect recovery 221 and congestion control logic: 223 o All packets are acknowledged, though packets that contain no ack- 224 eliciting frames are only acknowledged along with ack-eliciting 225 packets. 227 o Long header packets that contain CRYPTO frames are critical to the 228 performance of the QUIC handshake and use shorter timers for 229 acknowledgement and retransmission. 231 o Packets that contain only ACK frames do not count toward 232 congestion control limits and are not considered in-flight. 234 o PADDING frames cause packets to contribute toward bytes in flight 235 without directly causing an acknowledgment to be sent. 237 3.1. Relevant Differences Between QUIC and TCP 239 Readers familiar with TCP's loss detection and congestion control 240 will find algorithms here that parallel well-known TCP ones. 241 Protocol differences between QUIC and TCP however contribute to 242 algorithmic differences. We briefly describe these protocol 243 differences below. 245 3.1.1. 
Separate Packet Number Spaces 247 QUIC uses separate packet number spaces for each encryption level, 248 except that 0-RTT and all generations of 1-RTT keys use the same packet 249 number space. Separate packet number spaces ensure that acknowledgement 250 of packets sent with one level of encryption will not cause spurious 251 retransmission of packets sent with a different encryption level. 252 Congestion control and round-trip time (RTT) measurement are unified 253 across packet number spaces. 255 3.1.2. Monotonically Increasing Packet Numbers 257 TCP conflates transmission order at the sender with delivery order at 258 the receiver, which results in retransmissions of the same data 259 carrying the same sequence number, and consequently leads to 260 "retransmission ambiguity". QUIC separates the two: QUIC uses a 261 packet number to indicate transmission order, and any application 262 data is sent in one or more streams, with delivery order determined 263 by stream offsets encoded within STREAM frames. 265 QUIC's packet number is strictly increasing within a packet number 266 space, and directly encodes transmission order. A higher packet 267 number signifies that the packet was sent later, and a lower packet 268 number signifies that the packet was sent earlier. When a packet 269 containing ack-eliciting frames is detected lost, QUIC rebundles 270 necessary frames in a new packet with a new packet number, removing 271 ambiguity about which packet is acknowledged when an ACK is received. 272 Consequently, more accurate RTT measurements can be made, spurious 273 retransmissions are trivially detected, and mechanisms such as Fast 274 Retransmit can be applied universally, based only on packet number. 276 This design point significantly simplifies loss detection mechanisms 277 for QUIC. Most TCP mechanisms implicitly attempt to infer 278 transmission ordering based on TCP sequence numbers - a non-trivial 279 task, especially when TCP timestamps are not available. 
281 3.1.3. Clearer Loss Epoch 283 QUIC ends a loss epoch when a packet sent after loss is declared is 284 acknowledged. TCP waits for the gap in the sequence number space to 285 be filled, and so if a segment is lost multiple times in a row, the 286 loss epoch may not end for several round trips. Because both should 287 reduce their congestion windows only once per epoch, QUIC will do it 288 correctly once for every round trip that experiences loss, while TCP 289 may only do it once across multiple round trips. 291 3.1.4. No Reneging 293 QUIC ACKs contain information that is similar to TCP SACK, but QUIC 294 does not allow any acked packet to be reneged, greatly simplifying 295 implementations on both sides and reducing memory pressure on the 296 sender. 298 3.1.5. More ACK Ranges 300 QUIC supports many ACK ranges, as opposed to TCP's 3 SACK ranges. In 301 high loss environments, this speeds recovery, reduces spurious 302 retransmits, and ensures forward progress without relying on 303 timeouts. 305 3.1.6. Explicit Correction For Delayed Acknowledgements 307 QUIC endpoints measure the delay incurred between when a packet is 308 received and when the corresponding acknowledgment is sent, allowing 309 a peer to maintain a more accurate round-trip time estimate (see 310 Section 4.4). 312 4. Generating Acknowledgements 314 An acknowledgement SHOULD be sent immediately upon receipt of a 315 second ack-eliciting packet. QUIC recovery algorithms do not assume 316 the peer sends an ACK immediately when receiving a second ack- 317 eliciting packet. 319 In order to accelerate loss recovery and reduce timeouts, the 320 receiver SHOULD send an immediate ACK after it receives an out-of- 321 order packet. It could send immediate ACKs for in-order packets for 322 a period of time that SHOULD NOT exceed 1/8 RTT unless more out-of- 323 order packets arrive. If every packet arrives out-of-order, then an 324 immediate ACK SHOULD be sent for every received packet. 
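The immediate-acknowledgement rules above can be sketched as a small per-packet-number-space decision function. This is an illustrative sketch only, not part of the draft: the class name AckDecider and its fields are hypothetical, packet numbers are assumed to start at 0 in each space, and the sketch assumes that an ACK is only worth sending while at least one ack-eliciting packet is pending. The optional 1/8 RTT window for in-order packets and the ECN-CE case are deliberately left out.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AckDecider:
    """Hypothetical helper: decides when to ACK immediately in one space."""

    largest_received: Optional[int] = None  # largest packet number seen so far
    pending_eliciting: int = 0              # ack-eliciting packets not yet ACKed

    def on_packet(self, packet_number: int, ack_eliciting: bool) -> bool:
        """Return True if an ACK should be sent immediately."""
        # In order means: raises the largest received number by exactly one.
        # Assumption: the first packet in a space carries packet number 0.
        expected = 0 if self.largest_received is None else self.largest_received + 1
        out_of_order = packet_number != expected
        if self.largest_received is None or packet_number > self.largest_received:
            self.largest_received = packet_number
        if ack_eliciting:
            self.pending_eliciting += 1
        # Assumption: nothing is sent unless something ack-eliciting is pending.
        send_now = self.pending_eliciting > 0 and (
            out_of_order or self.pending_eliciting >= 2
        )
        if send_now:
            self.pending_eliciting = 0
        return send_now
```

With this sketch, a first in-order ack-eliciting packet is delayed, the second triggers an ACK, and any gap or late arrival triggers an ACK at once as long as ack-eliciting data is outstanding.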
326 Similarly, packets marked with the ECN Congestion Experienced (CE) 327 codepoint in the IP header SHOULD be acknowledged immediately, to 328 reduce the peer's response time to congestion events. 330 As an optimization, a receiver MAY process multiple packets before 331 sending any ACK frames in response. In this case the receiver can 332 determine whether an immediate or delayed acknowledgement should be 333 generated after processing incoming packets. 335 4.1. Crypto Handshake Data 337 In order to quickly complete the handshake and avoid spurious 338 retransmissions due to crypto retransmission timeouts, crypto packets 339 SHOULD use a very short ack delay, such as the local timer 340 granularity. ACK frames SHOULD be sent immediately when the crypto 341 stack indicates all data for that packet number space has been 342 received. 344 4.2. ACK Ranges 346 When an ACK frame is sent, one or more ranges of acknowledged packets 347 are included. Including older packets reduces the chance of spurious 348 retransmits caused by losing previously sent ACK frames, at the cost 349 of larger ACK frames. 351 ACK frames SHOULD always acknowledge the most recently received 352 packets, and the more out-of-order the packets are, the more 353 important it is to send an updated ACK frame quickly, to prevent the 354 peer from declaring a packet as lost and spuriously retransmitting 355 the frames it contains. 357 Below is one recommended approach for determining what packets to 358 include in an ACK frame. 360 4.3. Receiver Tracking of ACK Frames 362 When a packet containing an ACK frame is sent, the largest 363 acknowledged in that frame may be saved. When a packet containing an 364 ACK frame is acknowledged, the receiver can stop acknowledging 365 packets less than or equal to the largest acknowledged in the sent 366 ACK frame. 368 In cases without ACK frame loss, this algorithm allows for a minimum 369 of 1 RTT of reordering. 
In cases with ACK frame loss and reordering, 370 this approach does not guarantee that every acknowledgement is seen 371 by the sender before it is no longer included in the ACK frame. 372 Packets could be received out of order and all subsequent ACK frames 373 containing them could be lost. In this case, the loss recovery 374 algorithm may cause spurious retransmits, but the sender will 375 continue making forward progress. 377 4.4. Measuring and Reporting Host Delay 379 An endpoint measures the delays intentionally introduced between when 380 an ACK-eliciting packet is received and the corresponding 381 acknowledgment is sent. The endpoint encodes this delay for the 382 largest acknowledged packet in the Ack Delay field of an ACK frame 383 (see Section 19.3 of [QUIC-TRANSPORT]). This allows the receiver of 384 the ACK to adjust for any intentional delays, which is important for 385 delayed acknowledgements, when estimating the path RTT. A packet 386 might be held in the OS kernel or elsewhere on the host before being 387 processed. An endpoint SHOULD NOT include these unintentional delays 388 when populating the Ack Delay field in an ACK frame. 390 An endpoint MUST NOT excessively delay acknowledgements of ack- 391 eliciting packets. The maximum ack delay is communicated in the 392 max_ack_delay transport parameter; see Section 18.1 of 393 [QUIC-TRANSPORT]. max_ack_delay implies an explicit contract: an 394 endpoint promises to never delay acknowledgments of an ack-eliciting 395 packet by more than the indicated value. If it does, any excess 396 accrues to the RTT estimate and could result in spurious 397 retransmissions from the peer. For Initial and Handshake packets, a 398 max_ack_delay of 0 is used. 400 5. Estimating the Round-Trip Time 402 At a high level, an endpoint measures the time from when a packet was 403 sent to when it is acknowledged as a round-trip time (RTT) sample. 
404 The endpoint uses RTT samples and peer-reported host delays 405 (Section 4.4) to generate a statistical description of the 406 connection's RTT. An endpoint computes the following three values: 407 the minimum value observed over the lifetime of the connection 408 (min_rtt), an exponentially-weighted moving average (smoothed_rtt), 409 and the variance in the observed RTT samples (rttvar). 411 5.1. Generating RTT samples 413 An endpoint generates an RTT sample on receiving an ACK frame that 414 meets the following two conditions: 416 o the largest acknowledged packet number is newly acknowledged, and 418 o at least one of the newly acknowledged packets was ack-eliciting. 420 The RTT sample, latest_rtt, is generated as the time elapsed since 421 the largest acknowledged packet was sent:
423    latest_rtt = ack_time - send_time_of_largest_acked
424 An RTT sample is generated using only the largest acknowledged packet 425 in the received ACK frame. This is because a peer reports host 426 delays for only the largest acknowledged packet in an ACK frame. 427 While the reported host delay is not used by the RTT sample 428 measurement, it is used to adjust the RTT sample in subsequent 429 computations of smoothed_rtt and rttvar (Section 5.3). 431 To avoid generating multiple RTT samples using the same packet, an 432 ACK frame SHOULD NOT be used to update RTT estimates if it does not 433 newly acknowledge the largest acknowledged packet. 435 An RTT sample MUST NOT be generated on receiving an ACK frame that 436 does not newly acknowledge at least one ack-eliciting packet. A peer 437 does not send an ACK frame on receiving only non-ack-eliciting 438 packets, so an ACK frame that is subsequently sent can include an 439 arbitrarily large Ack Delay field. Ignoring such ACK frames avoids 440 complications in subsequent smoothed_rtt and rttvar computations. 442 A sender might generate multiple RTT samples per RTT when multiple 443 ACK frames are received within an RTT. 
As suggested in [RFC6298], 444 doing so might result in inadequate history in smoothed_rtt and 445 rttvar. Ensuring that RTT estimates retain sufficient history is an 446 open research question. 448 5.2. Estimating min_rtt 450 min_rtt is the minimum RTT observed over the lifetime of the 451 connection. min_rtt is set to the latest_rtt on the first sample in 452 a connection, and to the lesser of min_rtt and latest_rtt on 453 subsequent samples. 455 An endpoint uses only locally observed times in computing the min_rtt 456 and does not adjust for host delays reported by the peer 457 (Section 4.4). Doing so allows the endpoint to set a lower bound for 458 the smoothed_rtt based entirely on what it observes (see 459 Section 5.3), and limits potential underestimation due to 460 erroneously-reported delays by the peer. 462 5.3. Estimating smoothed_rtt and rttvar 464 smoothed_rtt is an exponentially-weighted moving average of an 465 endpoint's RTT samples, and rttvar is the endpoint's estimated 466 variance in the RTT samples. 468 The calculation of smoothed_rtt uses path latency after adjusting RTT 469 samples for host delays (Section 4.4). For packets sent in the 470 ApplicationData packet number space, a peer limits any delay in 471 sending an acknowledgement for an ack-eliciting packet to no greater 472 than the value it advertised in the max_ack_delay transport 473 parameter. Consequently, when a peer reports an Ack Delay that is 474 greater than its max_ack_delay, the delay is attributed to reasons 475 out of the peer's control, such as scheduler latency at the peer or 476 loss of previous ACK frames. Any delays beyond the peer's 477 max_ack_delay are therefore considered effectively part of path delay 478 and incorporated into the smoothed_rtt estimate. 
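As an illustration of the two ideas just described, the following sketch tracks min_rtt from locally observed samples and caps a peer-reported Ack Delay at the peer's advertised max_ack_delay. It is a minimal sketch under stated assumptions, not the draft's own pseudocode: the function names update_min_rtt and usable_ack_delay are hypothetical, times are in milliseconds, and the further checks applied before a sample is adjusted are not shown.

```python
from typing import Optional


def update_min_rtt(min_rtt: Optional[int], latest_rtt: int) -> int:
    """Track the minimum RTT using only locally observed samples.

    The first sample initializes min_rtt; later samples can only lower it.
    No peer-reported delay is subtracted here.
    """
    return latest_rtt if min_rtt is None else min(min_rtt, latest_rtt)


def usable_ack_delay(reported_ack_delay: int, max_ack_delay: int) -> int:
    """Cap the peer-reported Ack Delay at the peer's max_ack_delay.

    Whatever the peer reports beyond max_ack_delay is not subtracted,
    so it stays in the sample and is treated as path delay.
    """
    return min(reported_ack_delay, max_ack_delay)


# Example: the peer advertised max_ack_delay = 25 ms but reports 40 ms.
min_rtt = update_min_rtt(None, 80)        # first sample sets min_rtt
min_rtt = update_min_rtt(min_rtt, 95)     # a larger sample leaves it unchanged
delay = usable_ack_delay(40, 25)          # only 25 ms may be removed
excess_kept_as_path_delay = 40 - delay    # the remaining 15 ms counts as path
```

Keeping min_rtt free of peer input is the design point here: a peer that misreports its delays can influence the smoothed estimate only within the bound that min_rtt provides.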
480 When adjusting an RTT sample using peer-reported acknowledgement 481 delays, an endpoint: 483 o MUST ignore the Ack Delay field of the ACK frame for packets sent 484 in the Initial and Handshake packet number spaces. 486 o MUST use the lesser of the value reported in Ack Delay field of 487 the ACK frame and the peer's max_ack_delay transport parameter 488 (Section 4.4). 490 o MUST NOT apply the adjustment if the resulting RTT sample is 491 smaller than the min_rtt. This limits the underestimation that a 492 misreporting peer can cause to the smoothed_rtt. 494 On the first RTT sample in a connection, the smoothed_rtt is set to 495 the latest_rtt. 497 smoothed_rtt and rttvar are computed as follows, similar to 498 [RFC6298]. On the first RTT sample in a connection:
500    smoothed_rtt = latest_rtt
501    rttvar = latest_rtt / 2
503 On subsequent RTT samples, smoothed_rtt and rttvar evolve as follows:
505    ack_delay = min(Ack Delay in ACK Frame, max_ack_delay)
506    adjusted_rtt = latest_rtt
507    if (min_rtt + ack_delay < latest_rtt):
508      adjusted_rtt = latest_rtt - ack_delay
509    smoothed_rtt = 7/8 * smoothed_rtt + 1/8 * adjusted_rtt
510    rttvar_sample = abs(smoothed_rtt - adjusted_rtt)
511    rttvar = 3/4 * rttvar + 1/4 * rttvar_sample
513 6. Loss Detection 515 QUIC senders use both ack information and timeouts to detect lost 516 packets, and this section provides a description of these algorithms. 518 If a packet is lost, the QUIC transport needs to recover from that 519 loss, such as by retransmitting the data, sending an updated frame, 520 or abandoning the frame. For more information, see Section 13.2 of 521 [QUIC-TRANSPORT]. 523 6.1. Acknowledgement-based Detection 525 Acknowledgement-based loss detection implements the spirit of TCP's 526 Fast Retransmit [RFC5681], Early Retransmit [RFC5827], FACK [FACK], 527 SACK loss recovery [RFC6675], and RACK [RACK]. This section provides 528 an overview of how these algorithms are implemented in QUIC. 
530 A packet is declared lost if it meets all the following conditions: 532 o The packet is unacknowledged, in-flight, and was sent prior to an 533 acknowledged packet. 535 o Either its packet number is kPacketThreshold smaller than an 536 acknowledged packet (Section 6.1.1), or it was sent long enough in 537 the past (Section 6.1.2). 539 The acknowledgement indicates that a packet sent later was delivered, 540 while the packet and time thresholds provide some tolerance for 541 packet reordering. 543 Spuriously declaring packets as lost leads to unnecessary 544 retransmissions and may result in degraded performance due to the 545 actions of the congestion controller upon detecting loss. 546 Implementations that detect spurious retransmissions and increase the 547 reordering threshold in packets or time MAY choose to start with 548 smaller initial reordering thresholds to minimize recovery latency. 550 6.1.1. Packet Threshold 552 The RECOMMENDED initial value for the packet reordering threshold 553 (kPacketThreshold) is 3, based on best practices for TCP loss 554 detection [RFC5681] [RFC6675]. 556 Some networks may exhibit higher degrees of reordering, causing a 557 sender to detect spurious losses. Implementers MAY use algorithms 558 developed for TCP, such as TCP-NCR [RFC4653], to improve QUIC's 559 reordering resilience. 561 6.1.2. Time Threshold 563 Once a later packet within the same packet number space has 564 been acknowledged, an endpoint SHOULD declare an earlier packet lost 565 if it was sent a threshold amount of time in the past. To avoid 566 declaring packets as lost too early, this time threshold MUST be set 567 to at least kGranularity. The time threshold is:
569    kTimeThreshold * max(SRTT, latest_RTT, kGranularity)
571 If packets sent prior to the largest acknowledged packet cannot yet 572 be declared lost, then a timer SHOULD be set for the remaining time. 
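The packet and time thresholds above can be combined into a single pass over the sent packets of one packet number space. The sketch below is illustrative rather than normative: the SentPacket type, the detect_lost function, and the 1 ms value for kGranularity are assumptions made for the example, times are in milliseconds, and the 9/8 multiplier is the RECOMMENDED kTimeThreshold from Section 6.1.2. Because packet numbers encode transmission order, "sent prior to an acknowledged packet" is checked by comparing packet numbers against the largest acknowledged one.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

K_PACKET_THRESHOLD = 3     # RECOMMENDED initial value (Section 6.1.1)
K_TIME_THRESHOLD = 9 / 8   # RECOMMENDED RTT multiplier (Section 6.1.2)
K_GRANULARITY_MS = 1.0     # assumption: 1 ms timer granularity


@dataclass
class SentPacket:
    packet_number: int
    time_sent_ms: float
    in_flight: bool = True
    acked: bool = False


def detect_lost(
    sent: List[SentPacket],
    largest_acked: int,
    now_ms: float,
    smoothed_rtt_ms: float,
    latest_rtt_ms: float,
) -> Tuple[List[int], Optional[float]]:
    """Return (lost packet numbers, time at which to re-check or None)."""
    loss_delay = K_TIME_THRESHOLD * max(
        smoothed_rtt_ms, latest_rtt_ms, K_GRANULARITY_MS
    )
    lost: List[int] = []
    timer: Optional[float] = None
    for p in sent:
        # Only unacknowledged, in-flight packets sent before an acked one.
        if p.acked or not p.in_flight or p.packet_number >= largest_acked:
            continue
        by_packets = largest_acked - p.packet_number >= K_PACKET_THRESHOLD
        by_time = now_ms - p.time_sent_ms >= loss_delay
        if by_packets or by_time:
            lost.append(p.packet_number)
        else:
            # Not lost yet: remember when it would cross the time threshold.
            deadline = p.time_sent_ms + loss_delay
            timer = deadline if timer is None else min(timer, deadline)
    return lost, timer
```

The returned deadline corresponds to the timer that is set for the remaining time when earlier packets cannot yet be declared lost.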
574 Using max(SRTT, latest_RTT) protects against the following two cases: 576 o the latest RTT sample is lower than the SRTT, perhaps due to 577 reordering where the acknowledgement encountered a shorter path; 579 o the latest RTT sample is higher than the SRTT, perhaps due to a 580 sustained increase in the actual RTT, but the SRTT has 581 not yet caught up. 583 The RECOMMENDED time threshold (kTimeThreshold), expressed as a 584 round-trip time multiplier, is 9/8. 586 Implementations MAY experiment with absolute thresholds, thresholds 587 from previous connections, adaptive thresholds, or including RTT 588 variance. Smaller thresholds reduce reordering resilience and 589 increase spurious retransmissions, and larger thresholds increase 590 loss detection delay. 592 6.2. Crypto Retransmission Timeout 594 Data in CRYPTO frames is critical to QUIC transport and crypto 595 negotiation, so a more aggressive timeout is used to retransmit it. 597 The initial crypto retransmission timeout SHOULD be set to twice the 598 initial RTT. 600 At the beginning, there are no prior RTT samples within a connection. 601 Resumed connections over the same network SHOULD use the previous 602 connection's final smoothed RTT value as the resumed connection's 603 initial RTT. If no previous RTT is available, or if the network 604 changes, the initial RTT SHOULD be set to 500ms, resulting in a 1 605 second initial handshake timeout as recommended in [RFC6298]. 607 A connection MAY use the delay between sending a PATH_CHALLENGE and 608 receiving a PATH_RESPONSE to seed initial_rtt for a new path, but the 609 delay SHOULD NOT be considered an RTT sample. 611 When a crypto packet is sent, the sender MUST set a timer for twice 612 the smoothed RTT. This timer MUST be updated when a new crypto 613 packet is sent and when an acknowledgement that produces a new RTT 614 sample is received. Upon timeout, the sender MUST retransmit all 615 unacknowledged CRYPTO data if possible. 
The sender MUST NOT declare 616 in-flight crypto packets as lost when the crypto timer expires. 618 On each consecutive expiration of the crypto timer without receiving 619 an acknowledgement for a new packet, the sender MUST double the 620 crypto retransmission timeout and set a timer for this period. 622 Until the server has validated the client's address on the path, the 623 amount of data it can send is limited, as specified in Section 8.1 of 624 [QUIC-TRANSPORT]. If not all unacknowledged CRYPTO data can be sent, 625 then all unacknowledged CRYPTO data sent in Initial packets should be 626 retransmitted. If no data can be sent, then no timer should be armed 627 until data has been received from the client. 629 Because the server could be blocked until more packets are received, 630 the client MUST ensure that the crypto retransmission timer is set if 631 there is unacknowledged crypto data or if the client does not yet 632 have 1-RTT keys. If the crypto retransmission timer expires before 633 the client has 1-RTT keys, it is possible that the client may not 634 have any crypto data to retransmit. However, the client MUST send a 635 new packet, containing only PADDING frames if necessary, to allow the 636 server to continue sending data. If Handshake keys are available to 637 the client, it MUST send a Handshake packet, and otherwise it MUST 638 send an Initial packet in a UDP datagram of at least 1200 bytes. 640 Because packets only containing PADDING do not elicit an 641 acknowledgement, they may never be acknowledged, but they are removed 642 from bytes in flight when the client gets Handshake keys and the 643 Initial keys are discarded. 645 The crypto retransmission timer is not set if the time threshold 646 (Section 6.1.2) loss detection timer is set. The time threshold loss 647 detection timer is expected to both expire earlier than the crypto 648 retransmission timeout and be less likely to spuriously retransmit 649 data.
The Initial and Handshake packet number spaces will typically 650 contain a small number of packets, so losses are less likely to be 651 detected using packet-threshold loss detection. 653 When the crypto retransmission timer is active, the probe timer 654 (Section 6.3) is not active. 656 6.3. Probe Timeout 658 A Probe Timeout (PTO) triggers a probe packet when ack-eliciting data 659 is in flight but an acknowledgement is not received within the 660 expected period of time. A PTO enables a connection to recover from 661 loss of tail packets or acks. The PTO algorithm used in QUIC 662 implements the reliability functions of Tail Loss Probe [TLP] [RACK], 663 RTO [RFC5681] and F-RTO algorithms for TCP [RFC5682], and the timeout 664 computation is based on TCP's retransmission timeout period 665 [RFC6298]. 667 6.3.1. Computing PTO 669 When an ack-eliciting packet is transmitted, the sender schedules a 670 timer for the PTO period as follows: 672 PTO = smoothed_rtt + max(4*rttvar, kGranularity) + max_ack_delay 674 kGranularity, smoothed_rtt, rttvar, and max_ack_delay are defined in 675 Appendix A.2 and Appendix A.3. 677 The PTO period is the amount of time that a sender ought to wait for 678 an acknowledgement of a sent packet. This time period includes the 679 estimated network roundtrip-time (smoothed_rtt), the variance in the 680 estimate (4*rttvar), and max_ack_delay, to account for the maximum 681 time by which a receiver might delay sending an acknowledgement. 683 The PTO value MUST be set to at least kGranularity, to avoid the 684 timer expiring immediately. 686 When a PTO timer expires, the sender probes the network as described 687 in the next section. The PTO period MUST be set to twice its current 688 value. This exponential reduction in the sender's rate is important 689 because the PTOs might be caused by loss of packets or 690 acknowledgements due to severe congestion. 692 A sender computes its PTO timer every time an ack-eliciting packet is 693 sent. 
A sender might choose to optimize this by setting the timer 694 fewer times if it knows that more ack-eliciting packets will be sent 695 within a short period of time. 697 6.3.2. Sending Probe Packets 699 When a PTO timer expires, a sender MUST send at least one ack- 700 eliciting packet as a probe, unless there is no data available to 701 send. An endpoint MAY send up to two ack-eliciting packets, to avoid 702 an expensive consecutive PTO expiration due to a single packet loss. 704 It is possible that the sender has no new or previously-sent data to 705 send. As an example, consider the following sequence of events: new 706 application data is sent in a STREAM frame, deemed lost, then 707 retransmitted in a new packet, and then the original transmission is 708 acknowledged. In the absence of any new application data, a PTO 709 timer expiration now would find the sender with no new or previously- 710 sent data to send. 712 When there is no data to send, the sender SHOULD send a PING or other 713 ack-eliciting frame in a single packet, re-arming the PTO timer. 715 Alternatively, instead of sending an ack-eliciting packet, the sender 716 MAY mark any packets still in flight as lost. Doing so avoids 717 sending an additional packet, but increases the risk that loss is 718 declared too aggressively, resulting in an unnecessary rate reduction 719 by the congestion controller. 721 Consecutive PTO periods increase exponentially, and as a result, 722 connection recovery latency increases exponentially as packets 723 continue to be dropped in the network. Sending two packets on PTO 724 expiration increases resilience to packet drops, thus reducing the 725 probability of consecutive PTO events. 727 Probe packets sent on a PTO MUST be ack-eliciting. A probe packet 728 SHOULD carry new data when possible. 
A probe packet MAY carry 729 retransmitted unacknowledged data when new data is unavailable, when 730 flow control does not permit new data to be sent, or to 731 opportunistically reduce loss recovery delay. Implementations MAY 732 use alternate strategies for determining the content of probe 733 packets, including sending new or retransmitted data based on the 734 application's priorities. 736 When the PTO timer expires multiple times and new data cannot be 737 sent, implementations must choose between sending the same payload 738 every time or sending different payloads. Sending the same payload 739 may be simpler and ensures the highest priority frames arrive first. 740 Sending different payloads each time reduces the chances of spurious 741 retransmission. 743 6.3.3. Loss Detection 745 Delivery or loss of packets in flight is established when an ACK 746 frame is received that newly acknowledges one or more packets. 748 A PTO timer expiration event does not indicate packet loss and MUST 749 NOT cause prior unacknowledged packets to be marked as lost. When an 750 acknowledgement is received that newly acknowledges packets, loss 751 detection proceeds as dictated by packet and time threshold 752 mechanisms; see Section 6.1. 754 6.4. Retry and Version Negotiation 756 A Retry or Version Negotiation packet causes a client to send another 757 Initial packet, effectively restarting the connection process and 758 resetting congestion control and loss recovery state, including 759 resetting any pending timers. Either packet indicates that the 760 Initial was received but not processed. Neither packet can be 761 treated as an acknowledgment for the Initial. 763 The client MAY however compute an RTT estimate to the server as the 764 time period from when the first Initial was sent to when a Retry or a 765 Version Negotiation packet is received. The client MAY use this 766 value to seed the RTT estimator for a subsequent connection attempt 767 to the server. 769 6.5. 
Discarding Keys and Packet State 771 When packet protection keys are discarded (see Section 4.9 of 772 [QUIC-TLS]), all packets that were sent with those keys can no longer 773 be acknowledged because their acknowledgements cannot be processed 774 anymore. The sender MUST discard all recovery state associated with 775 those packets and MUST remove them from the count of bytes in flight. 777 Endpoints stop sending and receiving Initial packets once they start 778 exchanging Handshake packets (see Section 17.2.2.1 of 779 [QUIC-TRANSPORT]). At this point, recovery state for all in-flight 780 Initial packets is discarded. 782 When 0-RTT is rejected, recovery state for all in-flight 0-RTT 783 packets is discarded. 785 If a server accepts 0-RTT, but does not buffer 0-RTT packets that 786 arrive before Initial packets, early 0-RTT packets will be declared 787 lost, but that is expected to be infrequent. 789 It is expected that keys are discarded after packets encrypted with 790 them would be acknowledged or declared lost. Initial secrets however 791 might be destroyed sooner, as soon as handshake keys are available 792 (see Section 4.9.1 of [QUIC-TLS]). 794 6.6. Discussion 796 The majority of constants were derived from best common practices 797 among widely deployed TCP implementations on the internet. 798 Exceptions follow. 800 A shorter delayed ack time of 25ms was chosen because longer delayed 801 acks can delay loss recovery and for the small number of connections 802 where less than one packet per 25ms is delivered, acking every packet is 803 beneficial to congestion control and loss recovery. 805 7. Congestion Control 807 QUIC's congestion control is based on TCP NewReno [RFC6582]. NewReno 808 is a congestion window based congestion control. QUIC specifies the 809 congestion window in bytes rather than packets due to finer control 810 and the ease of appropriate byte counting [RFC3465].
812 QUIC hosts MUST NOT send packets if they would increase 813 bytes_in_flight (defined in Appendix B.2) beyond the available 814 congestion window, unless the packet is a probe packet sent after a 815 PTO timer expires, as described in Section 6.3. 817 Implementations MAY use other congestion control algorithms, such as 818 Cubic [RFC8312], and endpoints MAY use different algorithms from one 819 another. The signals QUIC provides for congestion control are 820 generic and are designed to support different algorithms. 822 7.1. Explicit Congestion Notification 824 If a path has been verified to support ECN, QUIC treats a Congestion 825 Experienced codepoint in the IP header as a signal of congestion. 826 This document specifies an endpoint's response when its peer receives 827 packets with the Congestion Experienced codepoint. As discussed in 828 [RFC8311], endpoints are permitted to experiment with other response 829 functions. 831 7.2. Slow Start 833 QUIC begins every connection in slow start and exits slow start upon 834 loss or upon increase in the ECN-CE counter. QUIC re-enters slow 835 start anytime the congestion window is less than ssthresh, which only 836 occurs after persistent congestion is declared. While in slow start, 837 QUIC increases the congestion window by the number of bytes 838 acknowledged when each acknowledgment is processed. 840 7.3. Congestion Avoidance 842 Slow start exits to congestion avoidance. Congestion avoidance in 843 NewReno uses an additive increase multiplicative decrease (AIMD) 844 approach that increases the congestion window by one maximum packet 845 size per congestion window acknowledged. When a loss is detected, 846 NewReno halves the congestion window and sets the slow start 847 threshold to the new congestion window. 849 7.4. Recovery Period 851 Recovery is a period of time beginning with detection of a lost 852 packet or an increase in the ECN-CE counter. 
Because QUIC does not 853 retransmit packets, it defines the end of recovery as a packet sent 854 after the start of recovery being acknowledged. This is slightly 855 different from TCP's definition of recovery, which ends when the lost 856 packet that started recovery is acknowledged. 858 The recovery period limits congestion window reduction to once per 859 round trip. During recovery, the congestion window remains unchanged 860 irrespective of new losses or increases in the ECN-CE counter. 862 7.5. Ignoring Loss of Undecryptable Packets 864 During the handshake, some packet protection keys might not be 865 available when a packet arrives. In particular, Handshake and 0-RTT 866 packets cannot be processed until the Initial packets arrive, and 867 1-RTT packets cannot be processed until the handshake completes. 868 Endpoints MAY ignore the loss of Handshake, 0-RTT, and 1-RTT packets 869 that might arrive before the peer has packet protection keys to 870 process those packets. 872 7.6. Probe Timeout 874 Probe packets MUST NOT be blocked by the congestion controller. A 875 sender MUST however count these packets as being additionally in 876 flight, since these packets add network load without establishing 877 packet loss. Note that sending probe packets might cause the 878 sender's bytes in flight to exceed the congestion window until an 879 acknowledgement is received that establishes loss or delivery of 880 packets. 882 7.7. Persistent Congestion 884 When an ACK frame is received that establishes loss of all in-flight 885 packets sent over a long enough period of time, the network is 886 considered to be experiencing persistent congestion. Commonly, this 887 can be established by consecutive PTOs, but since the PTO timer is 888 reset when a new ack-eliciting packet is sent, an explicit duration 889 must be used to account for those cases where PTOs do not occur or 890 are substantially delayed. 
This duration is computed as follows: 892 (smoothed_rtt + 4 * rttvar + max_ack_delay) * 893 kPersistentCongestionThreshold 895 For example, assume: 897 smoothed_rtt = 1 rttvar = 0 max_ack_delay = 0 898 kPersistentCongestionThreshold = 3 900 If an ack-eliciting packet is sent at time = 0, the following 901 scenario would illustrate persistent congestion: 903 +-----+------------------------+ 904 | t=0 | Send Pkt #1 (App Data) | 905 +-----+------------------------+ 906 | t=1 | Send Pkt #2 (PTO 1) | 907 | | | 908 | t=3 | Send Pkt #3 (PTO 2) | 909 | | | 910 | t=7 | Send Pkt #4 (PTO 3) | 911 | | | 912 | t=8 | Recv ACK of Pkt #4 | 913 +-----+------------------------+ 915 The first three packets are determined to be lost when the ACK of 916 packet 4 is received at t=8. The congestion period is calculated as 917 the time between the oldest and newest lost packets: (3 - 0) = 3. 918 The duration for persistent congestion is equal to: (1 * 919 kPersistentCongestionThreshold) = 3. Because the threshold was 920 reached and because none of the packets between the oldest and the 921 newest packets are acknowledged, the network is considered to have 922 experienced persistent congestion. 924 When persistent congestion is established, the sender's congestion 925 window MUST be reduced to the minimum congestion window 926 (kMinimumWindow). This response of collapsing the congestion window 927 on persistent congestion is functionally similar to a sender's 928 response on a Retransmission Timeout (RTO) in TCP [RFC5681] after 929 Tail Loss Probes (TLP) [TLP]. 931 7.8. Pacing 933 This document does not specify a pacer, but it is RECOMMENDED that a 934 sender pace sending of all in-flight packets based on input from the 935 congestion controller. For example, a pacer might distribute the 936 congestion window over the SRTT when used with a window-based 937 controller, and a pacer might use the rate estimate of a rate-based 938 controller.
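The window-based case, spreading the congestion window over the SRTT, can be sketched as follows. This is only an illustration: the function name, its parameters, and the choice of bytes and seconds as units are assumptions, and concerns such as timer granularity are not modelled.

```python
def pacing_interval(congestion_window: int, smoothed_rtt: float,
                    packet_size: int) -> float:
    """Return the gap, in seconds, between consecutive packet departures
    so that one congestion window of data is spread evenly over one SRTT.

    All three parameters are hypothetical inputs for this sketch:
    congestion_window and packet_size are in bytes, smoothed_rtt in seconds.
    """
    if congestion_window <= 0 or smoothed_rtt <= 0 or packet_size <= 0:
        raise ValueError("all inputs must be positive")
    # Sending rate that drains exactly one window per smoothed RTT.
    pacing_rate = congestion_window / smoothed_rtt  # bytes per second
    return packet_size / pacing_rate
```

For a congestion window of 12000 bytes and an SRTT of 100ms, 1200-byte packets would leave about every 10ms under this scheme.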
940 An implementation should take care to architect its congestion 941 controller to work well with a pacer. For instance, a pacer might 942 wrap the congestion controller and control the availability of the 943 congestion window, or a pacer might pace out packets handed to it by 944 the congestion controller. Timely delivery of ACK frames is 945 important for efficient loss recovery. Packets containing only ACK 946 frames should therefore not be paced, to avoid delaying their 947 delivery to the peer. 949 As an example of a well-known and publicly available implementation 950 of a flow pacer, implementers are referred to the Fair Queue packet 951 scheduler (fq qdisc) in Linux (3.11 onwards). 953 7.9. Under-utilizing the Congestion Window 955 A congestion window that is under-utilized SHOULD NOT be increased in 956 either slow start or congestion avoidance. This can happen due to 957 insufficient application data or flow control credit. 959 A sender MAY use the pipeACK method described in section 4.3 of 960 [RFC7661] to determine if the congestion window is sufficiently 961 utilized. 963 A sender that paces packets (see Section 7.8) might delay sending 964 packets and not fully utilize the congestion window due to this 965 delay. A sender should not consider itself application limited if it 966 would have fully utilized the congestion window without pacing delay. 968 Bursting more than an initial window's worth of data into the network 969 might cause short-term congestion and losses. Implementations 970 SHOULD either use pacing or reduce their congestion window to limit 971 such bursts. 973 A sender MAY implement alternate mechanisms to update its congestion 974 window after periods of under-utilization, such as those proposed for 975 TCP in [RFC7661]. 977 8. Security Considerations 979 8.1. Congestion Signals 981 Congestion control fundamentally involves the consumption of signals 982 - both loss and ECN codepoints - from unauthenticated entities.
On- 983 path attackers can spoof or alter these signals. An attacker can 984 cause endpoints to reduce their sending rate by dropping packets, or 985 alter send rate by changing ECN codepoints. 987 8.2. Traffic Analysis 989 Packets that carry only ACK frames can be heuristically identified by 990 observing packet size. Acknowledgement patterns may expose 991 information about link characteristics or application behavior. 992 Endpoints can use PADDING frames or bundle acknowledgments with other 993 frames to reduce leaked information. 995 8.3. Misreporting ECN Markings 997 A receiver can misreport ECN markings to alter the congestion 998 response of a sender. Suppressing reports of ECN-CE markings could 999 cause a sender to increase their send rate. This increase could 1000 result in congestion and loss. 1002 A sender MAY attempt to detect suppression of reports by marking 1003 occasional packets that they send with ECN-CE. If a packet marked 1004 with ECN-CE is not reported as having been marked when the packet is 1005 acknowledged, the sender SHOULD then disable ECN for that path. 1007 Reporting additional ECN-CE markings will cause a sender to reduce 1008 their sending rate, which is similar in effect to advertising reduced 1009 connection flow control limits and so no advantage is gained by doing 1010 so. 1012 Endpoints choose the congestion controller that they use. Though 1013 congestion controllers generally treat reports of ECN-CE markings as 1014 equivalent to loss [RFC8311], the exact response for each controller 1015 could be different. Failure to correctly respond to information 1016 about ECN markings is therefore difficult to detect. 1018 9. IANA Considerations 1020 This document has no IANA actions. Yet. 1022 10. References 1024 10.1. Normative References 1026 [QUIC-TLS] 1027 Thomson, M., Ed. and S. Turner, Ed., "Using TLS to Secure 1028 QUIC", draft-ietf-quic-tls-22 (work in progress), July 1029 2019. 1031 [QUIC-TRANSPORT] 1032 Iyengar, J., Ed. 
and M. Thomson, Ed., "QUIC: A UDP-Based 1033 Multiplexed and Secure Transport", draft-ietf-quic- 1034 transport-22 (work in progress), July 2019. 1036 [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate 1037 Requirement Levels", BCP 14, RFC 2119, 1038 DOI 10.17487/RFC2119, March 1997, 1039 . 1041 [RFC8174] Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC 1042 2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174, 1043 May 2017, . 1045 [RFC8311] Black, D., "Relaxing Restrictions on Explicit Congestion 1046 Notification (ECN) Experimentation", RFC 8311, 1047 DOI 10.17487/RFC8311, January 2018, 1048 . 1050 10.2. Informative References 1052 [FACK] Mathis, M. and J. Mahdavi, "Forward Acknowledgement: 1053 Refining TCP Congestion Control", ACM SIGCOMM , August 1054 1996. 1056 [RACK] Cheng, Y., Cardwell, N., Dukkipati, N., and P. Jha, "RACK: 1057 a time-based fast loss detection algorithm for TCP", 1058 draft-ietf-tcpm-rack-05 (work in progress), April 2019. 1060 [RFC3465] Allman, M., "TCP Congestion Control with Appropriate Byte 1061 Counting (ABC)", RFC 3465, DOI 10.17487/RFC3465, February 1062 2003, . 1064 [RFC4653] Bhandarkar, S., Reddy, A., Allman, M., and E. Blanton, 1065 "Improving the Robustness of TCP to Non-Congestion 1066 Events", RFC 4653, DOI 10.17487/RFC4653, August 2006, 1067 . 1069 [RFC5681] Allman, M., Paxson, V., and E. Blanton, "TCP Congestion 1070 Control", RFC 5681, DOI 10.17487/RFC5681, September 2009, 1071 . 1073 [RFC5682] Sarolahti, P., Kojo, M., Yamamoto, K., and M. Hata, 1074 "Forward RTO-Recovery (F-RTO): An Algorithm for Detecting 1075 Spurious Retransmission Timeouts with TCP", RFC 5682, 1076 DOI 10.17487/RFC5682, September 2009, 1077 . 1079 [RFC5827] Allman, M., Avrachenkov, K., Ayesta, U., Blanton, J., and 1080 P. Hurtig, "Early Retransmit for TCP and Stream Control 1081 Transmission Protocol (SCTP)", RFC 5827, 1082 DOI 10.17487/RFC5827, May 2010, 1083 . 1085 [RFC6298] Paxson, V., Allman, M., Chu, J., and M. 
Sargent, 1086 "Computing TCP's Retransmission Timer", RFC 6298, 1087 DOI 10.17487/RFC6298, June 2011, 1088 . 1090 [RFC6582] Henderson, T., Floyd, S., Gurtov, A., and Y. Nishida, "The 1091 NewReno Modification to TCP's Fast Recovery Algorithm", 1092 RFC 6582, DOI 10.17487/RFC6582, April 2012, 1093 . 1095 [RFC6675] Blanton, E., Allman, M., Wang, L., Jarvinen, I., Kojo, M., 1096 and Y. Nishida, "A Conservative Loss Recovery Algorithm 1097 Based on Selective Acknowledgment (SACK) for TCP", 1098 RFC 6675, DOI 10.17487/RFC6675, August 2012, 1099 . 1101 [RFC6928] Chu, J., Dukkipati, N., Cheng, Y., and M. Mathis, 1102 "Increasing TCP's Initial Window", RFC 6928, 1103 DOI 10.17487/RFC6928, April 2013, 1104 . 1106 [RFC7661] Fairhurst, G., Sathiaseelan, A., and R. Secchi, "Updating 1107 TCP to Support Rate-Limited Traffic", RFC 7661, 1108 DOI 10.17487/RFC7661, October 2015, 1109 . 1111 [RFC8312] Rhee, I., Xu, L., Ha, S., Zimmermann, A., Eggert, L., and 1112 R. Scheffenegger, "CUBIC for Fast Long-Distance Networks", 1113 RFC 8312, DOI 10.17487/RFC8312, February 2018, 1114 . 1116 [TLP] Dukkipati, N., Cardwell, N., Cheng, Y., and M. Mathis, 1117 "Tail Loss Probe (TLP): An Algorithm for Fast Recovery of 1118 Tail Losses", draft-dukkipati-tcpm-tcp-loss-probe-01 (work 1119 in progress), February 2013. 1121 10.3. URIs 1123 [1] https://mailarchive.ietf.org/arch/search/?email_list=quic 1125 [2] https://github.com/quicwg 1127 [3] https://github.com/quicwg/base-drafts/labels/-recovery 1129 Appendix A. Loss Recovery Pseudocode 1131 We now describe an example implementation of the loss detection 1132 mechanisms described in Section 6. 1134 A.1. Tracking Sent Packets 1136 To correctly implement congestion control, a QUIC sender tracks every 1137 ack-eliciting packet until the packet is acknowledged or lost. 
It is 1138 expected that implementations will be able to access this information 1139 by packet number and crypto context and store the per-packet fields 1140 (Appendix A.1.1) for loss recovery and congestion control. 1142 After a packet is declared lost, the endpoint can track it for an 1143 amount of time comparable to the maximum expected packet reordering, 1144 such as 1 RTT. This allows for detection of spurious 1145 retransmissions. 1147 Sent packets are tracked for each packet number space, and ACK 1148 processing only applies to a single space. 1150 A.1.1. Sent Packet Fields 1152 packet_number: The packet number of the sent packet. 1154 ack_eliciting: A boolean that indicates whether a packet is ack- 1155 eliciting. If true, it is expected that an acknowledgement will 1156 be received, though the peer could delay sending the ACK frame 1157 containing it by up to the MaxAckDelay. 1159 in_flight: A boolean that indicates whether the packet counts 1160 towards bytes in flight. 1162 is_crypto_packet: A boolean that indicates whether the packet 1163 contains cryptographic handshake messages critical to the 1164 completion of the QUIC handshake. In this version of QUIC, this 1165 includes any packet with the long header that includes a CRYPTO 1166 frame. 1168 sent_bytes: The number of bytes sent in the packet, not including 1169 UDP or IP overhead, but including QUIC framing overhead. 1171 time_sent: The time the packet was sent. 1173 A.2. Constants of interest 1175 Constants used in loss recovery are based on a combination of RFCs, 1176 papers, and common practice. Some may need to be changed or 1177 negotiated in order to better suit a variety of environments. 1179 kPacketThreshold: Maximum reordering in packets before packet 1180 threshold loss detection considers a packet lost. The RECOMMENDED 1181 value is 3. 1183 kTimeThreshold: Maximum reordering in time before time threshold 1184 loss detection considers a packet lost. Specified as an RTT 1185 multiplier. 
The RECOMMENDED value is 9/8. 1187 kGranularity: Timer granularity. This is a system-dependent value. 1188 However, implementations SHOULD use a value no smaller than 1ms. 1190 kInitialRtt: The RTT used before an RTT sample is taken. The 1191 RECOMMENDED value is 500ms. 1193 kPacketNumberSpace: An enum to enumerate the three packet number 1194 spaces. 1196 enum kPacketNumberSpace { 1197 Initial, 1198 Handshake, 1199 ApplicationData, 1200 } 1202 A.3. Variables of interest 1204 Variables required to implement the congestion control mechanisms are 1205 described in this section. 1207 loss_detection_timer: Multi-modal timer used for loss detection. 1209 crypto_count: The number of times all unacknowledged CRYPTO data has 1210 been retransmitted without receiving an ack. 1212 pto_count: The number of times a PTO has been sent without receiving 1213 an ack. 1215 time_of_last_sent_ack_eliciting_packet: The time the most recent 1216 ack-eliciting packet was sent. 1218 time_of_last_sent_crypto_packet: The time the most recent crypto 1219 packet was sent. 1221 largest_acked_packet[kPacketNumberSpace]: The largest packet number 1222 acknowledged in the packet number space so far. 1224 latest_rtt: The most recent RTT measurement made when receiving an 1225 ack for a previously unacked packet. 1227 smoothed_rtt: The smoothed RTT of the connection, computed as 1228 described in [RFC6298] 1230 rttvar: The RTT variance, computed as described in [RFC6298] 1231 min_rtt: The minimum RTT seen in the connection, ignoring ack delay. 1233 max_ack_delay: The maximum amount of time by which the receiver 1234 intends to delay acknowledgments for packets in the 1235 ApplicationData packet number space. The actual ack_delay in a 1236 received ACK frame may be larger due to late timers, reordering, 1237 or lost ACKs. 
1239 loss_time[kPacketNumberSpace]: The time at which the next packet in 1240 that packet number space will be considered lost based on 1241 exceeding the reordering window in time. 1243 sent_packets[kPacketNumberSpace]: An association of packet numbers 1244 in a packet number space to information about them. Described in 1245 detail above in Appendix A.1. 1247 A.4. Initialization 1249 At the beginning of the connection, initialize the loss detection 1250 variables as follows: 1252 loss_detection_timer.reset() 1253 crypto_count = 0 1254 pto_count = 0 1255 latest_rtt = 0 1256 smoothed_rtt = 0 1257 rttvar = 0 1258 min_rtt = 0 1259 max_ack_delay = 0 1260 time_of_last_sent_ack_eliciting_packet = 0 1261 time_of_last_sent_crypto_packet = 0 1262 for pn_space in [ Initial, Handshake, ApplicationData ]: 1263 largest_acked_packet[pn_space] = infinite 1264 loss_time[pn_space] = 0 1266 A.5. On Sending a Packet 1268 After a packet is sent, information about the packet is stored. The 1269 parameters to OnPacketSent are described in detail above in 1270 Appendix A.1.1. 1272 Pseudocode for OnPacketSent follows: 1274 OnPacketSent(packet_number, pn_space, ack_eliciting, 1275 in_flight, is_crypto_packet, sent_bytes): 1276 sent_packets[pn_space][packet_number].packet_number = 1277 packet_number 1278 sent_packets[pn_space][packet_number].time_sent = now 1279 sent_packets[pn_space][packet_number].ack_eliciting = 1280 ack_eliciting 1281 sent_packets[pn_space][packet_number].in_flight = in_flight 1282 if (in_flight): 1283 if (is_crypto_packet): 1284 time_of_last_sent_crypto_packet = now 1285 if (ack_eliciting): 1286 time_of_last_sent_ack_eliciting_packet = now 1287 OnPacketSentCC(sent_bytes) 1288 sent_packets[pn_space][packet_number].size = sent_bytes 1289 SetLossDetectionTimer() 1291 A.6. On Receiving an Acknowledgment 1293 When an ACK frame is received, it may newly acknowledge any number of 1294 packets. 
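Finding the packets that an ACK frame newly acknowledges can be sketched as below. The representation of ACK ranges as inclusive (smallest, largest) pairs and the function name are assumptions made for this illustration. sent_packets is taken to be a mapping keyed by packet number for a single packet number space, as in Appendix A.3.

```python
def determine_newly_acked(ack_ranges, sent_packets):
    """Return, in ascending order, the tracked packet numbers covered by
    the ACK ranges.

    ack_ranges:   iterable of inclusive (smallest, largest) pairs
                  (a hypothetical decoding of the ACK frame's ranges).
    sent_packets: mapping of packet number to per-packet state for one
                  packet number space. Packets that were already
                  acknowledged or declared lost are assumed to have been
                  removed, so any match is newly acknowledged.
    """
    ranges = list(ack_ranges)
    newly_acked = []
    for packet_number in sorted(sent_packets):
        if any(smallest <= packet_number <= largest
               for smallest, largest in ranges):
            newly_acked.append(packet_number)
    return newly_acked
```

An empty result corresponds to the case where an ACK frame acknowledges nothing new, for instance a duplicate or reordered ACK frame.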
1296 Pseudocode for OnAckReceived and UpdateRtt follows: 1298 OnAckReceived(ack, pn_space): 1299 if (largest_acked_packet[pn_space] == infinite): 1300 largest_acked_packet[pn_space] = ack.largest_acked 1301 else: 1302 largest_acked_packet[pn_space] = 1303 max(largest_acked_packet[pn_space], ack.largest_acked) 1305 // Nothing to do if there are no newly acked packets. 1306 newly_acked_packets = DetermineNewlyAckedPackets(ack, pn_space) 1307 if (newly_acked_packets.empty()): 1308 return 1310 // If the largest acknowledged is newly acked and 1311 // at least one ack-eliciting was newly acked, update the RTT. 1312 if (sent_packets[pn_space][ack.largest_acked] && 1313 IncludesAckEliciting(newly_acked_packets)): 1314 latest_rtt = 1315 now - sent_packets[pn_space][ack.largest_acked].time_sent 1316 ack_delay = 0 1317 if pn_space == ApplicationData: 1318 ack_delay = ack.ack_delay 1319 UpdateRtt(ack_delay) 1321 // Process ECN information if present. 1323 if (ACK frame contains ECN information): 1324 ProcessECN(ack) 1326 // Pass the whole sent-packet struct, which OnPacketAcked expects. for acked_packet in newly_acked_packets: 1327 OnPacketAcked(acked_packet, pn_space) 1329 DetectLostPackets(pn_space) 1331 crypto_count = 0 1332 pto_count = 0 1334 SetLossDetectionTimer() 1336 UpdateRtt(ack_delay): 1337 // First RTT sample. 1338 if (smoothed_rtt == 0): 1339 min_rtt = latest_rtt 1340 smoothed_rtt = latest_rtt 1341 rttvar = latest_rtt / 2 1342 return 1344 // min_rtt ignores ack delay. 1345 min_rtt = min(min_rtt, latest_rtt) 1346 // Limit ack_delay by max_ack_delay 1347 ack_delay = min(ack_delay, max_ack_delay) 1348 // Adjust for ack delay if plausible. 1349 adjusted_rtt = latest_rtt 1350 if (latest_rtt > min_rtt + ack_delay): 1351 adjusted_rtt = latest_rtt - ack_delay 1353 rttvar = 3/4 * rttvar + 1/4 * abs(smoothed_rtt - adjusted_rtt) 1354 smoothed_rtt = 7/8 * smoothed_rtt + 1/8 * adjusted_rtt 1356 A.7. On Packet Acknowledgment 1358 When a packet is acknowledged for the first time, the following 1359 OnPacketAcked function is called.
Note that a single ACK frame may 1360 newly acknowledge several packets. OnPacketAcked must be called once 1361 for each of these newly acknowledged packets. 1363 OnPacketAcked takes two parameters: acked_packet, which is the struct 1364 detailed in Appendix A.1.1, and the packet number space that this ACK 1365 frame was sent for. 1367 Pseudocode for OnPacketAcked follows: 1369 OnPacketAcked(acked_packet, pn_space): 1370 if (acked_packet.in_flight): 1371 OnPacketAckedCC(acked_packet) 1372 sent_packets[pn_space].remove(acked_packet.packet_number) 1374 A.8. Setting the Loss Detection Timer 1376 QUIC loss detection uses a single timer for all timeout loss 1377 detection. The duration of the timer is based on the timer's mode, 1378 which is set in the packet and timer events further below. The 1379 function SetLossDetectionTimer defined below shows how the single 1380 timer is set. 1382 This algorithm may result in the timer being set in the past, 1383 particularly if timers wake up late. Timers set in the past SHOULD 1384 fire immediately. 1386 Pseudocode for SetLossDetectionTimer follows: 1388 // Returns the earliest loss_time and the packet number 1389 // space it's from. Returns 0 if all times are 0. 1390 GetEarliestLossTime(): 1391 time = loss_time[Initial] 1392 space = Initial 1393 for pn_space in [ Handshake, ApplicationData ]: 1394 if loss_time[pn_space] != 0 && 1395 (time == 0 || loss_time[pn_space] < time): 1396 time = loss_time[pn_space]; 1397 space = pn_space 1398 return time, space 1400 SetLossDetectionTimer(): 1401 loss_time, _ = GetEarliestLossTime() 1402 if (loss_time != 0): 1403 // Time threshold loss detection. 1404 loss_detection_timer.update(loss_time) 1405 return 1407 if (has unacknowledged crypto data 1408 || endpoint is client without 1-RTT keys): 1409 // Crypto retransmission timer. 
1410        if (smoothed_rtt == 0):
1411          timeout = 2 * kInitialRtt
1412        else:
1413          timeout = 2 * smoothed_rtt
1414        timeout = max(timeout, kGranularity)
1415        timeout = timeout * (2 ^ crypto_count)
1416        loss_detection_timer.update(
1417          time_of_last_sent_crypto_packet + timeout)
1418        return

1420      // Don't arm timer if there are no ack-eliciting packets
1421      // in flight.
1422      if (no ack-eliciting packets in flight):
1423        loss_detection_timer.cancel()
1424        return

1426      // Calculate PTO duration
1427      timeout =
1428        smoothed_rtt + max(4 * rttvar, kGranularity) + max_ack_delay
1429      timeout = timeout * (2 ^ pto_count)

1431      loss_detection_timer.update(
1432        time_of_last_sent_ack_eliciting_packet + timeout)

1434 A.9. On Timeout

1436    When the loss detection timer expires, the timer's mode determines
1437    the action to be performed.

1439    Pseudocode for OnLossDetectionTimeout follows:

1441    OnLossDetectionTimeout():
1442      loss_time, pn_space = GetEarliestLossTime()
1443      if (loss_time != 0):
1444        // Time threshold loss detection
1445        DetectLostPackets(pn_space)
1446      // Retransmit crypto data if no packets were lost
1447      // and there is crypto data to retransmit.
1448      else if (has unacknowledged crypto data):
1449        // Crypto retransmission timeout.
1450        RetransmitUnackedCryptoData()
1451        crypto_count++
1452      else if (endpoint is client without 1-RTT keys):
1453        // Client sends an anti-deadlock packet: Initial is padded
1454        // to earn more anti-amplification credit,
1455        // a Handshake packet proves address ownership.
1456        if (has Handshake keys):
1457          SendOneHandshakePacket()
1458        else:
1459          SendOnePaddedInitialPacket()
1460        crypto_count++
1461      else:
1462        // PTO. Send new data if available, else retransmit old data.
1463        // If neither is available, send a single PING frame.
1464        SendOneOrTwoPackets()
1465        pto_count++

1467      SetLossDetectionTimer()

1469 A.10. Detecting Lost Packets

1471    DetectLostPackets is called when an ACK is received or the loss
1472    detection timer fires; it operates on that space's sent_packets.

1474    Pseudocode for DetectLostPackets follows:

1476    DetectLostPackets(pn_space):
1477      assert(largest_acked_packet[pn_space] != infinite)
1478      loss_time[pn_space] = 0
1479      lost_packets = {}
1480      loss_delay = kTimeThreshold * max(latest_rtt, smoothed_rtt)

1482      // Minimum time of kGranularity before packets are deemed lost.
1483      loss_delay = max(loss_delay, kGranularity)

1485      // Packets sent before this time are deemed lost.
1486      lost_send_time = now() - loss_delay

1488      foreach unacked in sent_packets[pn_space]:
1489        if (unacked.packet_number > largest_acked_packet[pn_space]):
1490          continue

1492        // Mark packet as lost, or set time when it should be marked.
1493        if (unacked.time_sent <= lost_send_time ||
1494            largest_acked_packet[pn_space] >=
1495              unacked.packet_number + kPacketThreshold):
1496          sent_packets[pn_space].remove(unacked.packet_number)
1497          if (unacked.in_flight):
1498            lost_packets.insert(unacked)
1499        else:
1500          if (loss_time[pn_space] == 0):
1501            loss_time[pn_space] = unacked.time_sent + loss_delay
1502          else:
1503            loss_time[pn_space] = min(loss_time[pn_space],
1504                                      unacked.time_sent + loss_delay)

1506      // Inform the congestion controller of lost packets and
1507      // let it decide whether to retransmit immediately.
1508      if (!lost_packets.empty()):
1509        OnPacketsLost(lost_packets)

1511 Appendix B. Congestion Control Pseudocode

1513    We now describe an example implementation of the congestion
1514    controller described in Section 7.

1516 B.1. Constants of interest

1518    Constants used in congestion control are based on a combination of
1519    RFCs, papers, and common practice. Some may need to be changed or
1520    negotiated in order to better suit a variety of environments.

1522    kMaxDatagramSize: The sender's maximum payload size. Does not
1523       include UDP or IP overhead. The max packet size is used for
1524       calculating initial and minimum congestion windows. The
1525       RECOMMENDED value is 1200 bytes.

1527    kInitialWindow: Default limit on the initial amount of data in
1528       flight, in bytes. Taken from [RFC6928], but increased slightly to
1529       account for the smaller 8-byte overhead of UDP vs. 20 bytes for
1530       TCP. The RECOMMENDED value is the minimum of 10 *
1531       kMaxDatagramSize and max(2 * kMaxDatagramSize, 14720).

1533    kMinimumWindow: Minimum congestion window in bytes. The RECOMMENDED
1534       value is 2 * kMaxDatagramSize.

1536    kLossReductionFactor: Reduction in congestion window when a new loss
1537       event is detected. The RECOMMENDED value is 0.5.

1539    kPersistentCongestionThreshold: Period of time for persistent
1540       congestion to be established, specified as a PTO multiplier. The
1541       rationale for this threshold is to enable a sender to use initial
1542       PTOs for aggressive probing, as TCP does with Tail Loss Probe
1543       (TLP) [TLP] [RACK], before establishing persistent congestion, as
1544       TCP does with a Retransmission Timeout (RTO) [RFC5681]. The
1545       RECOMMENDED value for kPersistentCongestionThreshold is 3, which
1546       is approximately equivalent to having two TLPs before an RTO in
1547       TCP.

1549 B.2. Variables of interest

1551    Variables required to implement the congestion control mechanisms are
1552    described in this section.

1554    ecn_ce_counter: The highest value reported for the ECN-CE counter by
1555       the peer in an ACK frame. This variable is used to detect
1556       increases in the reported ECN-CE counter.

1558    bytes_in_flight: The sum of the size in bytes of all sent packets
1559       that contain at least one ack-eliciting or PADDING frame, and have
1560       not been acked or declared lost. The size does not include IP or
1561       UDP overhead, but does include the QUIC header and AEAD overhead.
1562       Packets only containing ACK frames do not count towards
1563       bytes_in_flight to ensure congestion control does not impede
1564       congestion feedback.

1566    congestion_window: Maximum number of bytes-in-flight that may be
1567       sent.

1569    congestion_recovery_start_time: The time when QUIC first detects
1570       congestion due to loss or ECN, causing it to enter congestion
1571       recovery. When a packet sent after this time is acknowledged,
1572       QUIC exits congestion recovery.

1574    ssthresh: Slow start threshold in bytes. When the congestion window
1575       is below ssthresh, the mode is slow start and the window grows by
1576       the number of bytes acknowledged.

1578 B.3. Initialization

1580    At the beginning of the connection, initialize the congestion control
1581    variables as follows:

1583       congestion_window = kInitialWindow
1584       bytes_in_flight = 0
1585       congestion_recovery_start_time = 0
1586       ssthresh = infinite
1587       ecn_ce_counter = 0

1589 B.4. On Packet Sent

1591    Whenever a packet is sent, and it contains non-ACK frames, the packet
1592    increases bytes_in_flight.

1594    OnPacketSentCC(bytes_sent):
1595      bytes_in_flight += bytes_sent

1597 B.5. On Packet Acknowledgment

1599    Invoked from loss detection's OnPacketAcked; it is supplied with the
1600    acked_packet from sent_packets.

1602    InCongestionRecovery(sent_time):
1603      return sent_time <= congestion_recovery_start_time

1605    OnPacketAckedCC(acked_packet):
1606      // Remove from bytes_in_flight.
1607      bytes_in_flight -= acked_packet.size
1608      if (InCongestionRecovery(acked_packet.time_sent)):
1609        // Do not increase congestion window in recovery period.
1610        return
1611      if (IsAppLimited()):
1612        // Do not increase congestion_window if application
1613        // limited.
1614        return
1615      if (congestion_window < ssthresh):
1616        // Slow start.
1617        congestion_window += acked_packet.size
1618      else:
1619        // Congestion avoidance.
1620        congestion_window += kMaxDatagramSize * acked_packet.size
1621          / congestion_window

1623 B.6. On New Congestion Event

1625    Invoked from ProcessECN and OnPacketsLost when a new congestion event
1626    is detected. It may start a new recovery period and reduce the
1627    congestion window.

1629    CongestionEvent(sent_time):
1630      // Start a new congestion event if packet was sent after the
1631      // start of the previous congestion recovery period.
1632      if (!InCongestionRecovery(sent_time)):
1633        congestion_recovery_start_time = now()
1634        congestion_window *= kLossReductionFactor
1635        congestion_window = max(congestion_window, kMinimumWindow)
1636        ssthresh = congestion_window

1638 B.7. Process ECN Information

1640    Invoked when an ACK frame with an ECN section is received from the
1641    peer.

1643    ProcessECN(ack):
1644      // If the ECN-CE counter reported by the peer has increased,
1645      // this could be a new congestion event.
1646      if (ack.ce_counter > ecn_ce_counter):
1647        ecn_ce_counter = ack.ce_counter
1648        CongestionEvent(sent_packets[ack.largest_acked].time_sent)

1650 B.8. On Packets Lost

1652    Invoked from DetectLostPackets when packets are deemed lost.

1654    InPersistentCongestion(largest_lost_packet):
1655      pto = smoothed_rtt + max(4 * rttvar, kGranularity) +
1656        max_ack_delay
1657      congestion_period = pto * kPersistentCongestionThreshold
1658      // Determine if all packets in the time period before the
1659      // newest lost packet, including the edges, are marked
1660      // lost
1661      return AreAllPacketsLost(largest_lost_packet,
1662                               congestion_period)

1664    OnPacketsLost(lost_packets):
1665      // Remove lost packets from bytes_in_flight.
1666      for lost_packet in lost_packets:
1667        bytes_in_flight -= lost_packet.size
1668      largest_lost_packet = lost_packets.last()
1669      CongestionEvent(largest_lost_packet.time_sent)

1671      // Collapse congestion window if persistent congestion
1672      if (InPersistentCongestion(largest_lost_packet)):
1673        congestion_window = kMinimumWindow

1675 Appendix C. Change Log

1677       *RFC Editor's Note:* Please remove this section prior to
1678       publication of a final version of this document.

1680    Issue and pull request numbers are listed with a leading octothorp.

1682 C.1. Since draft-ietf-quic-recovery-21

1684    o  No changes

1686 C.2. Since draft-ietf-quic-recovery-20

1688    o  Path validation can be used as initial RTT value (#2644, #2687)

1690    o  max_ack_delay transport parameter defaults to 0 (#2638, #2646)

1692    o  Ack Delay only measures intentional delays induced by the
1693       implementation (#2596, #2786)

1695 C.3. Since draft-ietf-quic-recovery-19

1697    o  Change kPersistentThreshold from an exponent to a multiplier
1698       (#2557)

1700    o  Send a PING if the PTO timer fires and there's nothing to send
1701       (#2624)

1703    o  Set loss delay to at least kGranularity (#2617)

1705    o  Merge application limited and sending after idle sections. Always
1706       limit burst size instead of requiring resetting CWND to initial
1707       CWND after idle (#2605)

1709    o  Rewrite RTT estimation, allow RTT samples where a newly acked
1710       packet is ack-eliciting but the largest_acked is not (#2592)

1712    o  Don't arm the handshake timer if there is no handshake data
1713       (#2590)

1715    o  Clarify that the time threshold loss alarm takes precedence over
1716       the crypto handshake timer (#2590, #2620)

1718    o  Change initial RTT to 500ms to align with RFC6298 (#2184)

1720 C.4. Since draft-ietf-quic-recovery-18

1722    o  Change IW byte limit to 14720 from 14600 (#2494)

1724    o  Update PTO calculation to match RFC6298 (#2480, #2489, #2490)

1726    o  Improve loss detection's description of multiple packet number
1727       spaces and pseudocode (#2485, #2451, #2417)

1729    o  Declare persistent congestion even if non-probe packets are sent
1730       and don't make persistent congestion more aggressive than RTO
1731       verified was (#2365, #2244)

1733    o  Move pseudocode to the appendices (#2408)

1735    o  What to send on multiple PTOs (#2380)

1737 C.5. Since draft-ietf-quic-recovery-17

1739    o  After Probe Timeout discard in-flight packets or send another
1740       (#2212, #1965)

1742    o  Endpoints discard initial keys as soon as handshake keys are
1743       available (#1951, #2045)

1745    o  0-RTT state is discarded when 0-RTT is rejected (#2300)

1747    o  Loss detection timer is cancelled when ack-eliciting frames are in
1748       flight (#2117, #2093)

1750    o  Packets are declared lost if they are in flight (#2104)

1752    o  After becoming idle, either pace packets or reset the congestion
1753       controller (#2138, #2187)

1755    o  Process ECN counts before marking packets lost (#2142)

1757    o  Mark packets lost before resetting crypto_count and pto_count
1758       (#2208, #2209)

1760    o  Congestion and loss recovery state are discarded when keys are
1761       discarded (#2327)

1763 C.6. Since draft-ietf-quic-recovery-16

1765    o  Unify TLP and RTO into a single PTO; eliminate min RTO, min TLP
1766       and min crypto timeouts; eliminate timeout validation (#2114,
1767       #2166, #2168, #1017)

1769    o  Redefine congestion avoidance in terms of when the period
1770       starts (#1928, #1930)

1772    o  Document what needs to be tracked for packets that are in flight
1773       (#765, #1724, #1939)

1775    o  Integrate both time and packet thresholds into loss detection
1776       (#1969, #1212, #934, #1974)

1778    o  Reduce congestion window after idle, unless pacing is used (#2007,
1779       #2023)

1781    o  Disable RTT calculation for packets that don't elicit
1782       acknowledgment (#2060, #2078)

1784    o  Limit ack_delay by max_ack_delay (#2060, #2099)

1786    o  Initial keys are discarded once Handshake keys are available
1787       (#1951, #2045)

1789    o  Reorder ECN and loss detection in pseudocode (#2142)
1790    o  Only cancel loss detection timer if ack-eliciting packets are in
1791       flight (#2093, #2117)

1793 C.7. Since draft-ietf-quic-recovery-14

1795    o  Used max_ack_delay from transport params (#1796, #1782)

1797    o  Merge ACK and ACK_ECN (#1783)

1799 C.8. Since draft-ietf-quic-recovery-13

1801    o  Corrected the lack of ssthresh reduction in CongestionEvent
1802       pseudocode (#1598)

1804    o  Considerations for ECN spoofing (#1426, #1626)

1806    o  Clarifications for PADDING and congestion control (#837, #838,
1807       #1517, #1531, #1540)

1809    o  Reduce early retransmission timer to RTT/8 (#945, #1581)

1811    o  Packets are declared lost after an RTO is verified (#935, #1582)

1813 C.9. Since draft-ietf-quic-recovery-12

1815    o  Changes to manage separate packet number spaces and encryption
1816       levels (#1190, #1242, #1413, #1450)

1818    o  Added ECN feedback mechanisms and handling; new ACK_ECN frame
1819       (#804, #805, #1372)

1821 C.10. Since draft-ietf-quic-recovery-11

1823    No significant changes.

1825 C.11. Since draft-ietf-quic-recovery-10

1827    o  Improved text on ack generation (#1139, #1159)

1829    o  Make references to TCP recovery mechanisms informational (#1195)

1831    o  Define time_of_last_sent_handshake_packet (#1171)

1833    o  Added signal from TLS that the data it includes needs to be sent
1834       in a Retry packet (#1061, #1199)

1836    o  Minimum RTT (min_rtt) is initialized with an infinite value
1837       (#1169)

1839 C.12. Since draft-ietf-quic-recovery-09

1841    No significant changes.

1843 C.13. Since draft-ietf-quic-recovery-08

1845    o  Clarified pacing and RTO (#967, #977)

1847 C.14. Since draft-ietf-quic-recovery-07

1849    o  Include Ack Delay in RTO (and TLP) computations (#981)

1851    o  Ack Delay in SRTT computation (#961)

1853    o  Default RTT and Slow Start (#590)

1855    o  Many editorial fixes.

1857 C.15. Since draft-ietf-quic-recovery-06

1859    No significant changes.

1861 C.16. Since draft-ietf-quic-recovery-05

1863    o  Add more congestion control text (#776)

1865 C.17. Since draft-ietf-quic-recovery-04

1867    No significant changes.

1869 C.18. Since draft-ietf-quic-recovery-03

1871    No significant changes.

1873 C.19. Since draft-ietf-quic-recovery-02

1875    o  Integrate F-RTO (#544, #409)

1877    o  Add congestion control (#545, #395)

1879    o  Require connection abort if a skipped packet was acknowledged
1880       (#415)

1882    o  Simplify RTO calculations (#142, #417)

1884 C.20. Since draft-ietf-quic-recovery-01

1886    o  Overview added to loss detection

1888    o  Changes initial default RTT to 100ms

1890    o  Added time-based loss detection and fixes early retransmit

1892    o  Clarified loss recovery for handshake packets

1894    o  Fixed references and made TCP references informative

1896 C.21. Since draft-ietf-quic-recovery-00

1898    o  Improved description of constants and ACK behavior

1900 C.22. Since draft-iyengar-quic-loss-recovery-01

1902    o  Adopted as base for draft-ietf-quic-recovery

1904    o  Updated authors/editors list

1906    o  Added table of contents

1908 Acknowledgments

1910 Authors' Addresses

1912    Jana Iyengar (editor)
1913    Fastly

1915    Email: jri.ietf@gmail.com

1917    Ian Swett (editor)
1918    Google

1920    Email: ianswett@google.com