idnits 2.17.1

draft-ietf-quic-recovery-28.txt:
-(2081): Line appears to be too long, but this could be caused by non-ascii characters in UTF-8 encoding


  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

  == There is 1 instance of lines with non-ascii characters in the document.


  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

     No issues found here.

  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == The copyright year in the IETF Trust and authors Copyright Line does not
     match the current year

  -- The document date (20 May 2020) is 1437 days in the past.  Is this
     intentional?

  -- Found something which looks like a code comment -- if you have code
     sections in the document, please surround them with '<CODE BEGINS>' and
     '<CODE ENDS>' lines.


  Checking references for intended status: Proposed Standard
  ----------------------------------------------------------------------------

     (See RFCs 3967 and 4897 for information about using normative references
     to lower-maturity documents in RFCs)

  == Missing Reference: 'Initial' is mentioned on line 1468, but not defined

  == Outdated reference: A later version (-34) exists of
     draft-ietf-quic-tls-28

  == Outdated reference: A later version (-34) exists of
     draft-ietf-quic-transport-28

  == Outdated reference: A later version (-15) exists of
     draft-ietf-tcpm-rack-08

  -- Obsolete informational reference (is this intentional?): RFC 8312
     (Obsoleted by RFC 9438)


     Summary: 0 errors (**), 0 flaws (~~), 6 warnings (==), 3 comments (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.

--------------------------------------------------------------------------------


2	QUIC                                                     J. Iyengar, Ed.
3	Internet-Draft                                                    Fastly
4	Intended status: Standards Track                           I. Swett, Ed.
5	Expires: 21 November 2020                                         Google
6	                                                             20 May 2020

8	               QUIC Loss Detection and Congestion Control
9	                      draft-ietf-quic-recovery-28

11	Abstract

13	   This document describes loss detection and congestion control
14	   mechanisms for QUIC.

16	Note to Readers

18	   Discussion of this draft takes place on the QUIC working group
19	   mailing list (quic@ietf.org (mailto:quic@ietf.org)), which is
20	   archived at https://mailarchive.ietf.org/arch/
21	   search/?email_list=quic.

23	   Working Group information can be found at https://github.com/quicwg;
24	   source code and issues list for this draft can be found at
25	   https://github.com/quicwg/base-drafts/labels/-recovery.

27	Status of This Memo

29	   This Internet-Draft is submitted in full conformance with the
30	   provisions of BCP 78 and BCP 79.

32	   Internet-Drafts are working documents of the Internet Engineering
33	   Task Force (IETF).  Note that other groups may also distribute
34	   working documents as Internet-Drafts.  The list of current Internet-
35	   Drafts is at https://datatracker.ietf.org/drafts/current/.

37	   Internet-Drafts are draft documents valid for a maximum of six months
38	   and may be updated, replaced, or obsoleted by other documents at any
39	   time.  It is inappropriate to use Internet-Drafts as reference
40	   material or to cite them other than as "work in progress."

42	   This Internet-Draft will expire on 21 November 2020.

44	Copyright Notice

46	   Copyright (c) 2020 IETF Trust and the persons identified as the
47	   document authors.  All rights reserved.

49	   This document is subject to BCP 78 and the IETF Trust's Legal
50	   Provisions Relating to IETF Documents (https://trustee.ietf.org/
51	   license-info) in effect on the date of publication of this document.
52	   Please review these documents carefully, as they describe your rights
53	   and restrictions with respect to this document.  Code Components
54	   extracted from this document must include Simplified BSD License text
55	   as described in Section 4.e of the Trust Legal Provisions and are
56	   provided without warranty as described in the Simplified BSD License.

58	Table of Contents

60	   1.  Introduction
61	   2.  Conventions and Definitions
62	   3.  Design of the QUIC Transmission Machinery
63	     3.1.  Relevant Differences Between QUIC and TCP
64	       3.1.1.  Separate Packet Number Spaces
65	       3.1.2.  Monotonically Increasing Packet Numbers
66	       3.1.3.  Clearer Loss Epoch
67	       3.1.4.  No Reneging
68	       3.1.5.  More ACK Ranges
69	       3.1.6.  Explicit Correction For Delayed Acknowledgements
70	       3.1.7.  Probe Timeout Replaces RTO and TLP
71	       3.1.8.  The Minimum Congestion Window is Two Packets
72	   4.  Estimating the Round-Trip Time
73	     4.1.  Generating RTT samples
74	     4.2.  Estimating min_rtt
75	     4.3.  Estimating smoothed_rtt and rttvar
76	   5.  Loss Detection
77	     5.1.  Acknowledgement-based Detection
78	       5.1.1.  Packet Threshold
79	       5.1.2.  Time Threshold
80	     5.2.  Probe Timeout
81	       5.2.1.  Computing PTO
82	       5.2.2.  Handshakes and New Paths
83	       5.2.3.  Speeding Up Handshake Completion
84	       5.2.4.  Sending Probe Packets
85	     5.3.  Handling Retry Packets
86	     5.4.  Discarding Keys and Packet State
87	   6.  Congestion Control
88	     6.1.  Explicit Congestion Notification
89	     6.2.  Initial and Minimum Congestion Window
90	     6.3.  Slow Start
91	     6.4.  Congestion Avoidance
92	     6.5.  Recovery Period
93	     6.6.  Ignoring Loss of Undecryptable Packets
94	     6.7.  Probe Timeout
95	     6.8.  Persistent Congestion
96	     6.9.  Pacing
97	     6.10. Under-utilizing the Congestion Window
98	   7.  Security Considerations
99	     7.1.  Congestion Signals
100	     7.2.  Traffic Analysis
101	     7.3.  Misreporting ECN Markings
102	   8.  IANA Considerations
103	   9.  References
104	     9.1.  Normative References
105	     9.2.  Informative References
106	   Appendix A.  Loss Recovery Pseudocode
107	     A.1.  Tracking Sent Packets
108	       A.1.1.  Sent Packet Fields
109	     A.2.  Constants of Interest
110	     A.3.  Variables of interest
111	     A.4.  Initialization
112	     A.5.  On Sending a Packet
113	     A.6.  On Receiving a Datagram
114	     A.7.  On Receiving an Acknowledgment
115	     A.8.  Setting the Loss Detection Timer
116	     A.9.  On Timeout
117	     A.10. Detecting Lost Packets
118	   Appendix B.  Congestion Control Pseudocode
119	     B.1.  Constants of interest
120	     B.2.  Variables of interest
121	     B.3.  Initialization
122	     B.4.  On Packet Sent
123	     B.5.  On Packet Acknowledgement
124	     B.6.  On New Congestion Event
125	     B.7.  Process ECN Information
126	     B.8.  On Packets Lost
127	     B.9.  Upon dropping Initial or Handshake keys
128	   Appendix C.  Change Log
129	     C.1.  Since draft-ietf-quic-recovery-27
130	     C.2.  Since draft-ietf-quic-recovery-26
131	     C.3.  Since draft-ietf-quic-recovery-25
132	     C.4.  Since draft-ietf-quic-recovery-24
133	     C.5.  Since draft-ietf-quic-recovery-23
134	     C.6.  Since draft-ietf-quic-recovery-22
135	     C.7.  Since draft-ietf-quic-recovery-21
136	     C.8.  Since draft-ietf-quic-recovery-20
137	     C.9.  Since draft-ietf-quic-recovery-19
138	     C.10. Since draft-ietf-quic-recovery-18
139	     C.11. Since draft-ietf-quic-recovery-17
140	     C.12. Since draft-ietf-quic-recovery-16
141	     C.13. Since draft-ietf-quic-recovery-14
142	     C.14. Since draft-ietf-quic-recovery-13
143	     C.15. Since draft-ietf-quic-recovery-12
144	     C.16. Since draft-ietf-quic-recovery-11
145	     C.17. Since draft-ietf-quic-recovery-10
146	     C.18. Since draft-ietf-quic-recovery-09
147	     C.19. Since draft-ietf-quic-recovery-08
148	     C.20. Since draft-ietf-quic-recovery-07
149	     C.21. Since draft-ietf-quic-recovery-06
150	     C.22. Since draft-ietf-quic-recovery-05
151	     C.23. Since draft-ietf-quic-recovery-04
152	     C.24. Since draft-ietf-quic-recovery-03
153	     C.25. Since draft-ietf-quic-recovery-02
154	     C.26. Since draft-ietf-quic-recovery-01
155	     C.27. Since draft-ietf-quic-recovery-00
156	     C.28. Since draft-iyengar-quic-loss-recovery-01
157	   Appendix D.  Contributors
158	   Acknowledgments
159	   Authors' Addresses

161	1.  Introduction

163	   QUIC is a new multiplexed and secure transport protocol atop UDP,
164	   specified in [QUIC-TRANSPORT].  This document describes congestion
165	   control and loss recovery for QUIC.  Mechanisms described in this
166	   document follow the spirit of existing TCP congestion control and
167	   loss recovery mechanisms, described in RFCs, various Internet-drafts,
168	   or academic papers, and also those prevalent in TCP implementations.

170	2.  Conventions and Definitions

172	   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
173	   "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and
174	   "OPTIONAL" in this document are to be interpreted as described in
175	   BCP 14 [RFC2119] [RFC8174] when, and only when, they appear in all
176	   capitals, as shown here.

178	   Definitions of terms that are used in this document:

180	   Ack-eliciting Frames:  All frames other than ACK, PADDING, and
181	      CONNECTION_CLOSE are considered ack-eliciting.

183	   Ack-eliciting Packets:  Packets that contain ack-eliciting frames
184	      elicit an ACK from the receiver within the maximum ack delay and
185	      are called ack-eliciting packets.

187	   In-flight:  Packets are considered in-flight when they are ack-
188	      eliciting or contain a PADDING frame, and they have been sent but
189	      are not acknowledged, declared lost, or abandoned along with old
190	      keys.
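
The three definitions above can be read as simple predicates.  The
following is a minimal Python sketch, not part of the specification.
The SentPacket class, its field names, and the use of frame-type name
strings are assumptions made purely for illustration, and the packet is
assumed to have already been sent.

```python
from dataclasses import dataclass
from typing import List

# Frame types that are NOT ack-eliciting, per the definition above.
NON_ACK_ELICITING = {"ACK", "PADDING", "CONNECTION_CLOSE"}


def is_ack_eliciting_frame(frame_type: str) -> bool:
    """A frame is ack-eliciting unless it is ACK, PADDING, or CONNECTION_CLOSE."""
    return frame_type not in NON_ACK_ELICITING


@dataclass
class SentPacket:
    """Hypothetical record of a packet that has been sent."""
    frames: List[str]             # frame type names carried by the packet
    acknowledged: bool = False
    declared_lost: bool = False
    keys_discarded: bool = False  # abandoned along with old keys

    @property
    def ack_eliciting(self) -> bool:
        # A packet is ack-eliciting if it carries any ack-eliciting frame.
        return any(is_ack_eliciting_frame(f) for f in self.frames)

    @property
    def in_flight(self) -> bool:
        # Counts toward bytes in flight if ack-eliciting or padded,
        # and only while it is still outstanding.
        counts = self.ack_eliciting or "PADDING" in self.frames
        outstanding = not (self.acknowledged or self.declared_lost
                           or self.keys_discarded)
        return counts and outstanding
```

Under this sketch, a packet carrying only ACK and PADDING frames is
in-flight without being ack-eliciting, while a packet carrying only an
ACK frame is neither.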

192	3.  Design of the QUIC Transmission Machinery

194	   All transmissions in QUIC are sent with a packet-level header, which
195	   indicates the encryption level and includes a packet sequence number
196	   (referred to below as a packet number).  The encryption level
197	   indicates the packet number space, as described in [QUIC-TRANSPORT].
198	   Packet numbers never repeat within a packet number space for the
199	   lifetime of a connection.  Packet numbers are sent in monotonically
200	   increasing order within a space, preventing ambiguity.

202	   This design obviates the need for disambiguating between
203	   transmissions and retransmissions and eliminates significant
204	   complexity from QUIC's interpretation of TCP loss detection
205	   mechanisms.

207	   QUIC packets can contain multiple frames of different types.  The
208	   recovery mechanisms ensure that data and frames that need reliable
209	   delivery are acknowledged or declared lost and sent in new packets as
210	   necessary.  The types of frames contained in a packet affect recovery
211	   and congestion control logic:

213	   *  All packets are acknowledged, though packets that contain no ack-
214	      eliciting frames are only acknowledged along with ack-eliciting
215	      packets.

217	   *  Long header packets that contain CRYPTO frames are critical to the
218	      performance of the QUIC handshake and use shorter timers for
219	      acknowledgement.

221	   *  Packets containing frames besides ACK or CONNECTION_CLOSE frames
222	      count toward congestion control limits and are considered in-
223	      flight.

225	   *  PADDING frames cause packets to contribute toward bytes in flight
226	      without directly causing an acknowledgment to be sent.

228	3.1.  Relevant Differences Between QUIC and TCP

230	   Readers familiar with TCP's loss detection and congestion control
231	   will find algorithms here that parallel well-known TCP ones.
232	   Protocol differences between QUIC and TCP, however, contribute to
233	   algorithmic differences.  We briefly describe these protocol
234	   differences below.

236	3.1.1.  Separate Packet Number Spaces

238	   QUIC uses separate packet number spaces for each encryption level,
239	   except 0-RTT and all generations of 1-RTT keys use the same packet
240	   number space.  Separate packet number spaces ensure acknowledgement
241	   of packets sent with one level of encryption will not cause spurious
242	   retransmission of packets sent with a different encryption level.
243	   Congestion control and round-trip time (RTT) measurement are unified
244	   across packet number spaces.

246	3.1.2.  Monotonically Increasing Packet Numbers

248	   TCP conflates transmission order at the sender with delivery order at
249	   the receiver, which results in retransmissions of the same data
250	   carrying the same sequence number, and consequently leads to
251	   "retransmission ambiguity".  QUIC separates the two.  QUIC uses a
252	   packet number to indicate transmission order.  Application data is
253	   sent in one or more streams and delivery order is determined by
254	   stream offsets encoded within STREAM frames.

256	   QUIC's packet number is strictly increasing within a packet number
257	   space, and directly encodes transmission order.  A higher packet
258	   number signifies that the packet was sent later, and a lower packet
259	   number signifies that the packet was sent earlier.  When a packet
260	   containing ack-eliciting frames is detected lost, QUIC rebundles
261	   necessary frames in a new packet with a new packet number, removing
262	   ambiguity about which packet is acknowledged when an ACK is received.
263	   Consequently, more accurate RTT measurements can be made, spurious
264	   retransmissions are trivially detected, and mechanisms such as Fast
265	   Retransmit can be applied universally, based only on packet number.

267	   This design point significantly simplifies loss detection mechanisms
268	   for QUIC.  Most TCP mechanisms implicitly attempt to infer
269	   transmission ordering based on TCP sequence numbers - a non-trivial
270	   task, especially when TCP timestamps are not available.

272	3.1.3.  Clearer Loss Epoch

274	   QUIC starts a loss epoch when a packet is lost and ends one when any
275	   packet sent after the epoch starts is acknowledged.  TCP waits for
276	   the gap in the sequence number space to be filled, and so if a
277	   segment is lost multiple times in a row, the loss epoch may not end
278	   for several round trips.  Because both should reduce their congestion
279	   windows only once per epoch, QUIC will do it once for every round
280	   trip that experiences loss, while TCP may only do it once across
281	   multiple round trips.

283	3.1.4.  No Reneging

285	   QUIC ACKs contain information that is similar to TCP SACK, but QUIC
286	   does not allow any acked packet to be reneged, greatly simplifying
287	   implementations on both sides and reducing memory pressure on the
288	   sender.

290	3.1.5.  More ACK Ranges

292	   QUIC supports many ACK ranges, as opposed to TCP's three SACK
293	   ranges.  In high loss environments, this speeds recovery, reduces
294	   spurious retransmits, and ensures forward progress without relying
295	   on timeouts.

297	3.1.6.  Explicit Correction For Delayed Acknowledgements

299	   QUIC endpoints measure the delay incurred between when a packet is
300	   received and when the corresponding acknowledgment is sent, allowing
301	   a peer to maintain a more accurate round-trip time estimate; see
302	   Section 13.2 of [QUIC-TRANSPORT].

304	3.1.7.  Probe Timeout Replaces RTO and TLP

306	   QUIC uses a probe timeout (see Section 5.2), with a timer based on
307	   TCP's RTO computation.  QUIC's PTO includes the peer's maximum
308	   expected acknowledgement delay instead of using a fixed minimum
309	   timeout.  QUIC does not collapse the congestion window until
310	   persistent congestion (Section 6.8) is declared, unlike TCP, which
311	   collapses the congestion window upon expiry of an RTO.  Instead of
312	   collapsing the congestion window and declaring everything in-flight
313	   lost, QUIC allows probe packets to temporarily exceed the congestion
314	   window whenever the timer expires.

316	   In doing this, QUIC avoids unnecessary congestion window reductions,
317	   obviating the need for correcting mechanisms such as F-RTO [RFC5682].
318	   Since QUIC does not collapse the congestion window on a PTO
319	   expiration, a QUIC sender is not limited from sending more in-flight
320	   packets after a PTO expiration if it still has available congestion
321	   window.  This occurs when a sender is application-limited and the PTO
322	   timer expires.  This is more aggressive than TCP's RTO mechanism when
323	   application-limited, but identical when not application-limited.

325	   A single packet loss at the tail does not indicate persistent
326	   congestion, so QUIC specifies a time-based definition to ensure one
327	   or more packets are sent prior to a dramatic decrease in congestion
328	   window; see Section 6.8.

330	3.1.8.  The Minimum Congestion Window is Two Packets

332	   TCP uses a minimum congestion window of one packet.  However, loss of
333	   that single packet means that the sender needs to wait for a PTO
334	   (Section 5.2) to recover, which can be much longer than a round-trip
335	   time.  Sending a single ack-eliciting packet also increases the
336	   chances of incurring additional latency when a receiver delays its
337	   acknowledgement.

339	   QUIC therefore recommends that the minimum congestion window be two
340	   packets.  While this increases network load, it is considered safe,
341	   since the sender will still reduce its sending rate exponentially
342	   under persistent congestion (Section 5.2).

344	4.  Estimating the Round-Trip Time

346	   At a high level, an endpoint measures the time from when a packet was
347	   sent to when it is acknowledged as a round-trip time (RTT) sample.
348	   The endpoint uses RTT samples and peer-reported host delays (see
349	   Section 13.2 of [QUIC-TRANSPORT]) to generate a statistical
350	   description of the network path's RTT.  An endpoint computes the
351	   following three values for each path: the minimum value observed over
352	   the lifetime of the path (min_rtt), an exponentially-weighted moving
353	   average (smoothed_rtt), and the mean deviation (referred to as
354	   "variation" in the rest of this document) in the observed RTT samples
355	   (rttvar).

357	4.1.  Generating RTT samples

359	   An endpoint generates an RTT sample on receiving an ACK frame that
360	   meets the following two conditions:

362	   *  the largest acknowledged packet number is newly acknowledged, and

364	   *  at least one of the newly acknowledged packets was ack-eliciting.

366	   The RTT sample, latest_rtt, is generated as the time elapsed since
367	   the largest acknowledged packet was sent:

369	   latest_rtt = ack_time - send_time_of_largest_acked

371	   An RTT sample is generated using only the largest acknowledged packet
372	   in the received ACK frame.  This is because a peer reports ACK delays
373	   for only the largest acknowledged packet in an ACK frame.  While the
374	   reported ACK delay is not used by the RTT sample measurement, it is
375	   used to adjust the RTT sample in subsequent computations of
376	   smoothed_rtt and rttvar (Section 4.3).

378	   To avoid generating multiple RTT samples for a single packet, an ACK
379	   frame SHOULD NOT be used to update RTT estimates if it does not newly
380	   acknowledge the largest acknowledged packet.

382	   An RTT sample MUST NOT be generated on receiving an ACK frame that
383	   does not newly acknowledge at least one ack-eliciting packet.  A peer
384	   usually does not send an ACK frame when only non-ack-eliciting
385	   packets are received.  Therefore an ACK frame that contains
386	   acknowledgements for only non-ack-eliciting packets could include an
387	   arbitrarily large Ack Delay value.  Ignoring such ACK frames avoids
388	   complications in subsequent smoothed_rtt and rttvar computations.
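
The two conditions for generating an RTT sample can be sketched as
follows.  This is an illustrative Python fragment rather than normative
text; the function name and the layout of the newly_acked mapping are
assumptions.

```python
from typing import Dict, Optional, Tuple


def generate_rtt_sample(
    ack_time: float,
    largest_acked: int,
    newly_acked: Dict[int, Tuple[float, bool]],
) -> Optional[float]:
    """Return latest_rtt, or None if this ACK frame must not produce a sample.

    newly_acked maps each newly acknowledged packet number to a
    (send_time, ack_eliciting) pair; this layout is hypothetical.
    """
    if largest_acked not in newly_acked:
        # The largest acknowledged packet was already acknowledged earlier.
        return None
    if not any(eliciting for _, eliciting in newly_acked.values()):
        # None of the newly acknowledged packets was ack-eliciting.
        return None
    send_time, _ = newly_acked[largest_acked]
    return ack_time - send_time
```

Note that the largest acknowledged packet itself need not be
ack-eliciting; it is enough that some newly acknowledged packet is.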

390	   A sender might generate multiple RTT samples per RTT when multiple
391	   ACK frames are received within an RTT.  As suggested in [RFC6298],
392	   doing so might result in inadequate history in smoothed_rtt and
393	   rttvar.  Ensuring that RTT estimates retain sufficient history is an
394	   open research question.

396	4.2.  Estimating min_rtt

398	   min_rtt is the minimum RTT observed for a given network path.
399	   min_rtt is set to the latest_rtt on the first RTT sample, and to the
400	   lesser of min_rtt and latest_rtt on subsequent samples.  In this
401	   document, min_rtt is used by loss detection to reject implausibly
402	   small RTT samples.

404	   An endpoint uses only locally observed times in computing the min_rtt
405	   and does not adjust for ACK delays reported by the peer.  Doing so
406	   allows the endpoint to set a lower bound for the smoothed_rtt based
407	   entirely on what it observes (see Section 4.3), and limits potential
408	   underestimation due to erroneously-reported delays by the peer.

410	   The RTT for a network path may change over time.  If a path's actual
411	   RTT decreases, the min_rtt will adapt immediately on the first low
412	   sample.  If the path's actual RTT increases, the min_rtt will not
413	   adapt to it, allowing future RTT samples that are smaller than the
414	   new RTT to be included in smoothed_rtt.

416	4.3.  Estimating smoothed_rtt and rttvar

418	   smoothed_rtt is an exponentially-weighted moving average of an
419	   endpoint's RTT samples, and rttvar is the variation in the RTT
420	   samples, estimated using a mean variation.

422	   The calculation of smoothed_rtt uses path latency after adjusting RTT
423	   samples for acknowledgement delays.  These delays are computed using
424	   the ACK Delay field of the ACK frame as described in Section 19.3 of
425	   [QUIC-TRANSPORT].  For packets sent in the ApplicationData packet
426	   number space, a peer limits any delay in sending an acknowledgement
427	   for an ack-eliciting packet to no greater than the value it
428	   advertised in the max_ack_delay transport parameter.  Consequently,
429	   when a peer reports an Ack Delay that is greater than its
430	   max_ack_delay, the delay is attributed to reasons out of the peer's
431	   control, such as scheduler latency at the peer or loss of previous
432	   ACK frames.  Any delays beyond the peer's max_ack_delay are therefore
433	   considered effectively part of path delay and incorporated into the
434	   smoothed_rtt estimate.

436	   When adjusting an RTT sample using peer-reported acknowledgement
437	   delays, an endpoint:

439	   *  MUST ignore the Ack Delay field of the ACK frame for packets sent
440	      in the Initial and Handshake packet number space.

442	   *  MUST use the lesser of the value reported in Ack Delay field of
443	      the ACK frame and the peer's max_ack_delay transport parameter.

445	   *  MUST NOT apply the adjustment if the resulting RTT sample is
446	      smaller than the min_rtt.  This limits the underestimation that a
447	      misreporting peer can cause to the smoothed_rtt.

449	   smoothed_rtt and rttvar are computed as follows, similar to
450	   [RFC6298].

452	   When there are no samples for a network path, and on the first RTT
453	   sample for the network path:

455	   smoothed_rtt = rtt_sample
456	   rttvar = rtt_sample / 2

458	   Before any RTT samples are available, the initial RTT is used as
459	   rtt_sample.  On the first RTT sample for the network path, that
460	   sample is used as rtt_sample.  This ensures that the first
461	   measurement erases the history of any persisted or default values.

463	   On subsequent RTT samples, smoothed_rtt and rttvar evolve as follows:

465	   ack_delay = min(Ack Delay in ACK Frame, max_ack_delay)
466	   adjusted_rtt = latest_rtt
467	   if (min_rtt + ack_delay < latest_rtt):
468	     adjusted_rtt = latest_rtt - ack_delay
469	   smoothed_rtt = 7/8 * smoothed_rtt + 1/8 * adjusted_rtt
470	   rttvar_sample = abs(smoothed_rtt - adjusted_rtt)
471	   rttvar = 3/4 * rttvar + 1/4 * rttvar_sample
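
As a worked illustration, the pseudocode above can be turned into a
small runnable Python class.  This is a sketch under stated
assumptions: the class name RttEstimator, the method name
on_rtt_sample, the ignore_ack_delay flag, and the use of milliseconds
are not taken from this document.  The statements are kept in the same
order as the pseudocode above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RttEstimator:
    """Hypothetical per-path RTT state; all times are in milliseconds."""
    max_ack_delay: float                  # peer's max_ack_delay transport parameter
    min_rtt: Optional[float] = None
    smoothed_rtt: Optional[float] = None
    rttvar: Optional[float] = None

    def on_rtt_sample(self, latest_rtt: float, ack_delay: float,
                      ignore_ack_delay: bool = False) -> None:
        """Fold one RTT sample into min_rtt, smoothed_rtt, and rttvar.

        ignore_ack_delay should be True for packets sent in the Initial
        and Handshake packet number spaces.
        """
        if self.smoothed_rtt is None:
            # First sample for the path: it replaces any default values.
            self.min_rtt = latest_rtt
            self.smoothed_rtt = latest_rtt
            self.rttvar = latest_rtt / 2
            return

        # min_rtt uses only locally observed times, never the peer's delay.
        self.min_rtt = min(self.min_rtt, latest_rtt)

        ack_delay = 0.0 if ignore_ack_delay else min(ack_delay, self.max_ack_delay)
        adjusted_rtt = latest_rtt
        if self.min_rtt + ack_delay < latest_rtt:
            # Only adjust when the result stays above min_rtt.
            adjusted_rtt = latest_rtt - ack_delay

        # Same statement order as the pseudocode above.
        self.smoothed_rtt = 7 / 8 * self.smoothed_rtt + 1 / 8 * adjusted_rtt
        rttvar_sample = abs(self.smoothed_rtt - adjusted_rtt)
        self.rttvar = 3 / 4 * self.rttvar + 1 / 4 * rttvar_sample
```

For example, with a max_ack_delay of 25 ms, a first sample of 100 ms
gives smoothed_rtt = 100 and rttvar = 50.  A second sample of 140 ms
with a reported Ack Delay of 20 ms is adjusted to 120 ms, giving
smoothed_rtt = 102.5 and rttvar = 41.875.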

473	5.  Loss Detection

475	   QUIC senders use acknowledgements to detect lost packets, and a probe
476	   timeout (see Section 5.2) to ensure acknowledgements are received.
477	   This section provides a description of these algorithms.

479	   If a packet is lost, the QUIC transport needs to recover from that
480	   loss, such as by retransmitting the data, sending an updated frame,
481	   or abandoning the frame.  For more information, see Section 13.3 of
482	   [QUIC-TRANSPORT].

484	5.1.  Acknowledgement-based Detection

486	   Acknowledgement-based loss detection implements the spirit of TCP's
487	   Fast Retransmit [RFC5681], Early Retransmit [RFC5827], FACK [FACK],
488	   SACK loss recovery [RFC6675], and RACK [RACK].  This section provides
489	   an overview of how these algorithms are implemented in QUIC.

491	   A packet is declared lost if it meets all the following conditions:

493	   *  The packet is unacknowledged, in-flight, and was sent prior to an
494	      acknowledged packet.

496	   *  Either its packet number is kPacketThreshold smaller than an
497	      acknowledged packet (Section 5.1.1), or it was sent long enough in
498	      the past (Section 5.1.2).

500	   The acknowledgement indicates that a packet sent later was delivered,
501	   and the packet and time thresholds provide some tolerance for packet
502	   reordering.

504	   Spuriously declaring packets as lost leads to unnecessary
505	   retransmissions and may result in degraded performance due to the
506	   actions of the congestion controller upon detecting loss.
507	   Implementations can detect spurious retransmissions and increase the
508	   reordering threshold in packets or time to reduce future spurious
509	   retransmissions and loss events.  Implementations with adaptive time
510	   thresholds MAY choose to start with smaller initial reordering
511	   thresholds to minimize recovery latency.

513	5.1.1.  Packet Threshold

515	   The RECOMMENDED initial value for the packet reordering threshold
516	   (kPacketThreshold) is 3, based on best practices for TCP loss
517	   detection [RFC5681] [RFC6675].  Implementations SHOULD NOT use a
518	   packet threshold less than 3, to keep in line with TCP [RFC5681].
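
The packet threshold test can be sketched in a few lines of Python.
The function name and argument layout are assumptions for illustration.
The caller is assumed to pass only packets that are unacknowledged,
in-flight, and in the same packet number space as the largest
acknowledged packet.

```python
from typing import Iterable, List

K_PACKET_THRESHOLD = 3  # RECOMMENDED initial value


def lost_by_packet_threshold(
    unacked: Iterable[int],
    largest_acked: int,
    threshold: int = K_PACKET_THRESHOLD,
) -> List[int]:
    """Return packet numbers at least `threshold` below largest_acked."""
    return sorted(pn for pn in unacked if largest_acked >= pn + threshold)
```

With packet 10 acknowledged and packets 5, 7, 8, and 9 outstanding,
packets 5 and 7 are declared lost, while 8 and 9 remain within the
reordering tolerance.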

520	   Some networks may exhibit higher degrees of reordering, causing a
521	   sender to detect spurious losses.  Algorithms that increase the
522	   reordering threshold after spuriously detecting losses, such as
523	   TCP-NCR [RFC4653], have proven to be useful in TCP and are expected
524	   to be at least as useful in QUIC.  Reordering could be more common
525	   with QUIC than TCP, because network elements cannot observe and fix
526	   the order of out-of-order packets.

528	5.1.2.  Time Threshold

530	   Once a later packet within the same packet number space has been
531	   acknowledged, an endpoint SHOULD declare an earlier packet lost if it
532	   was sent a threshold amount of time in the past.  To avoid declaring
533	   packets as lost too early, this time threshold MUST be set to at
534	   least the local timer granularity, as indicated by the kGranularity
535	   constant.  The time threshold is:

537	   max(kTimeThreshold * max(smoothed_rtt, latest_rtt), kGranularity)

539	   If packets sent prior to the largest acknowledged packet cannot yet
540	   be declared lost, then a timer SHOULD be set for the remaining time.

542	   Using max(smoothed_rtt, latest_rtt) protects against the following two
543	   cases:

545	   *  the latest RTT sample is lower than the smoothed RTT, perhaps due
546	      to reordering where the acknowledgement encountered a shorter
547	      path;

549	   *  the latest RTT sample is higher than the smoothed RTT, perhaps due
550	      to a sustained increase in the actual RTT, but the smoothed RTT
551	      has not yet caught up.

553	   The RECOMMENDED time threshold (kTimeThreshold), expressed as a
554	   round-trip time multiplier, is 9/8.  The RECOMMENDED value of the
555	   timer granularity (kGranularity) is 1ms.
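
   The time threshold computation above can be sketched as follows.
   This is a minimal illustration, not a definitive implementation: the
   function names are hypothetical, times are plain numbers in
   milliseconds, and the caller is assumed to apply the check only to
   packets sent before an acknowledged packet in the same packet number
   space.

```python
from fractions import Fraction

K_TIME_THRESHOLD = Fraction(9, 8)  # RECOMMENDED RTT multiplier
K_GRANULARITY_MS = 1               # RECOMMENDED timer granularity (ms)


def loss_delay_ms(smoothed_rtt_ms, latest_rtt_ms):
    """max(kTimeThreshold * max(smoothed_rtt, latest_rtt), kGranularity)"""
    return max(K_TIME_THRESHOLD * max(smoothed_rtt_ms, latest_rtt_ms),
               K_GRANULARITY_MS)


def lost_by_time_threshold(sent_time_ms, now_ms,
                           smoothed_rtt_ms, latest_rtt_ms):
    """True if the packet was sent at least one loss delay ago."""
    elapsed = now_ms - sent_time_ms
    return elapsed >= loss_delay_ms(smoothed_rtt_ms, latest_rtt_ms)
```

   If the check returns False for a packet, the remaining time (the loss
   delay minus the elapsed time) is the value a timer would be set for.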

557	   Implementations MAY experiment with absolute thresholds, thresholds
558	   from previous connections, adaptive thresholds, or including RTT
559	   variation.  Smaller thresholds reduce reordering resilience and
560	   increase spurious retransmissions, and larger thresholds increase
561	   loss detection delay.

563	5.2.  Probe Timeout

565	   A Probe Timeout (PTO) triggers sending one or two probe datagrams
566	   when ack-eliciting packets are not acknowledged within the expected
567	   period of time or when the server may not have validated the
568	   client's address.  A PTO enables a connection to recover from loss
569	   of tail packets or acknowledgements.

571	   A PTO timer expiration event does not indicate packet loss and MUST
572	   NOT cause prior unacknowledged packets to be marked as lost.  When an
573	   acknowledgement is received that newly acknowledges packets, loss
574	   detection proceeds as dictated by packet and time threshold
575	   mechanisms; see Section 5.1.

577	   As with loss detection, the probe timeout is per packet number space.
578	   The PTO algorithm used in QUIC implements the reliability functions
579	   of Tail Loss Probe [RACK], RTO [RFC5681], and F-RTO algorithms for
580	   TCP [RFC5682].  The timeout computation is based on TCP's
581	   retransmission timeout period [RFC6298].

583	5.2.1.  Computing PTO

585	   When an ack-eliciting packet is transmitted, the sender schedules a
586	   timer for the PTO period as follows:

588	   PTO = smoothed_rtt + max(4*rttvar, kGranularity) + max_ack_delay

590	   The PTO period is the amount of time that a sender ought to wait for
591	   an acknowledgement of a sent packet.  This time period includes the
592	   estimated network round-trip time (smoothed_rtt), the variation in the
593	   estimate (4*rttvar), and max_ack_delay, to account for the maximum
594	   time by which a receiver might delay sending an acknowledgement.
595	   When the PTO is armed for the Initial or Handshake packet number
596	   spaces, max_ack_delay is 0; see Section 13.2.1 of [QUIC-TRANSPORT].

598	   The PTO value MUST be set to at least kGranularity, to avoid the
599	   timer expiring immediately.
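
   A minimal sketch of the PTO period computation follows.  The function
   name and the handshake_space flag are hypothetical; times are plain
   numbers in milliseconds.

```python
K_GRANULARITY_MS = 1  # RECOMMENDED timer granularity (ms)


def pto_period_ms(smoothed_rtt_ms, rttvar_ms, max_ack_delay_ms,
                  handshake_space=False):
    """PTO = smoothed_rtt + max(4*rttvar, kGranularity) + max_ack_delay.

    For the Initial and Handshake packet number spaces the
    max_ack_delay term is treated as 0.
    """
    ack_delay = 0 if handshake_space else max_ack_delay_ms
    pto = smoothed_rtt_ms + max(4 * rttvar_ms, K_GRANULARITY_MS) + ack_delay
    # The result is never below kGranularity, so the timer cannot
    # expire immediately.
    return max(pto, K_GRANULARITY_MS)
```

   For example, with smoothed_rtt = 100 ms, rttvar = 10 ms, and
   max_ack_delay = 25 ms, the period is 165 ms for ApplicationData and
   140 ms for the Initial or Handshake spaces.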

601	   A sender recomputes and may need to reset its PTO timer every time an
602	   ack-eliciting packet is sent or acknowledged, when the handshake is
603	   confirmed, or when Initial or Handshake keys are discarded.  This
604	   ensures the PTO is always set based on the latest RTT information and
605	   for the last sent packet in the correct packet number space.

607	   When ack-eliciting packets in multiple packet number spaces are in
608	   flight, the timer MUST be set for the packet number space with the
609	   earliest timeout, with one exception.  The ApplicationData packet
610	   number space (Section 4.1.1 of [QUIC-TLS]) MUST be ignored until the
611	   handshake completes.  Not arming the PTO for ApplicationData prevents
612	   a client from retransmitting a 0-RTT packet on a PTO expiration
613	   before confirming that the server is able to decrypt 0-RTT packets,
614	   and prevents a server from sending a 1-RTT packet on a PTO expiration
615	   before it has the keys to process an acknowledgement.
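
   The space selection rule above might be sketched as follows.  This is
   an illustration under stated assumptions: the dictionary layout, the
   function name, and the per-space PTO periods passed in are all
   hypothetical, and the client-side case in which no ack-eliciting
   packets are in flight (Section 5.2.2.1) is not modeled.

```python
SPACES = ("Initial", "Handshake", "ApplicationData")


def select_pto_space(last_sent_ms, pto_ms, handshake_complete):
    """Pick the packet number space whose PTO would expire first.

    last_sent_ms maps a space name to the send time of its most recent
    ack-eliciting packet in flight; spaces with nothing in flight are
    absent.  pto_ms maps a space name to its current PTO period.
    Returns (expiry_time_ms, space), or None if no timer is armed.
    """
    earliest = None
    for space in SPACES:
        if space not in last_sent_ms:
            continue
        # ApplicationData is ignored until the handshake completes.
        if space == "ApplicationData" and not handshake_complete:
            continue
        expiry = last_sent_ms[space] + pto_ms[space]
        if earliest is None or expiry < earliest[0]:
            earliest = (expiry, space)
    return earliest
```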

617	   When a PTO timer expires, the PTO backoff MUST be increased,
618	   resulting in the PTO period being set to twice its current value.
619	   The PTO backoff factor is reset when an acknowledgement is received,
620	   except in the following case.  A server might take longer to respond
621	   to packets during the handshake than otherwise.  To protect such a
622	   server from repeated client probes, the PTO backoff is not reset at a
623	   client that is not yet certain that the server has finished
624	   validating the client's address.  That is, a client does not reset
625	   the PTO backoff factor on receiving acknowledgements until it
626	   receives a HANDSHAKE_DONE frame or an acknowledgement for one of its
627	   Handshake or 1-RTT packets.

629	   This exponential reduction in the sender's rate is important because
630	   consecutive PTOs might be caused by loss of packets or
631	   acknowledgements due to severe congestion.  Even when there are ack-
632	   eliciting packets in-flight in multiple packet number spaces, the
633	   exponential increase in probe timeout occurs across all spaces to
634	   prevent excess load on the network.  For example, a timeout in the
635	   Initial packet number space doubles the length of the timeout in the
636	   Handshake packet number space.
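
   The backoff behavior can be illustrated with a small sketch.  The
   class, method, and parameter names are hypothetical.  A single
   counter is shared by all packet number spaces, which is one way to
   obtain the cross-space doubling described above; the
   server_validated_address flag stands for "the client has received a
   HANDSHAKE_DONE frame or an acknowledgement for one of its Handshake
   or 1-RTT packets".

```python
class PtoBackoff:
    """Tracks consecutive PTO expirations across all packet number spaces."""

    def __init__(self):
        self.pto_count = 0

    def on_pto_expired(self):
        # Each expiration doubles the effective PTO period.
        self.pto_count += 1

    def on_ack_received(self, is_client, server_validated_address):
        # A client keeps its backoff until it is certain that the
        # server has finished validating the client's address.
        if is_client and not server_validated_address:
            return
        self.pto_count = 0

    def period_ms(self, base_pto_ms):
        return base_pto_ms * (2 ** self.pto_count)
```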

638	   The life of a connection that is experiencing consecutive PTOs is
639	   limited by the endpoint's idle timeout.

641	   The probe timer MUST NOT be set if the time threshold (Section 5.1.2)
642	   loss detection timer is set.  The time threshold loss detection timer
643	   is expected to both expire earlier than the PTO and be less likely to
644	   spuriously retransmit data.

646	5.2.2.  Handshakes and New Paths

648	   Resumed connections over the same network MAY use the previous
649	   connection's final smoothed RTT value as the resumed connection's
650	   initial RTT.  When no previous RTT is available, the initial RTT
651	   SHOULD be set to 333ms, resulting in a 1 second initial timeout, as
652	   recommended in [RFC6298].
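
   The 1 second figure can be checked with a short calculation.  The
   sketch below assumes that rttvar is initialized to half of the
   initial RTT; that initialization is not stated in this section and is
   an assumption of the example.  max_ack_delay is taken as 0, as it is
   for the Initial and Handshake packet number spaces.

```python
K_INITIAL_RTT_MS = 333
K_GRANULARITY_MS = 1

smoothed_rtt_ms = K_INITIAL_RTT_MS
rttvar_ms = K_INITIAL_RTT_MS / 2  # assumption: half of the initial RTT
max_ack_delay_ms = 0              # Initial and Handshake spaces

initial_pto_ms = (smoothed_rtt_ms
                  + max(4 * rttvar_ms, K_GRANULARITY_MS)
                  + max_ack_delay_ms)
# 333 + 4 * 166.5 + 0 = 999 ms, that is, approximately 1 second
```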

654	   A connection MAY use the delay between sending a PATH_CHALLENGE and
655	   receiving a PATH_RESPONSE to set the initial RTT (see kInitialRtt in
656	   Appendix A.2) for a new path, but the delay SHOULD NOT be considered
657	   an RTT sample.

659	   Prior to handshake completion, when few or no RTT samples have been
660	   generated, it is possible that the probe timer expiration is due to
661	   an incorrect RTT estimate at the client.  To allow the client to
662	   improve its RTT estimate, the new packet that it sends MUST be ack-
663	   eliciting.

665	   Initial packets and Handshake packets might never be acknowledged,
666	   but they are removed from bytes in flight when the Initial and
667	   Handshake keys are discarded, as described below in
668	   Section 5.4.  When Initial or Handshake keys are discarded,
669	   the PTO and loss detection timers MUST be reset, because discarding
670	   keys indicates forward progress and the loss detection timer might
671	   have been set for a now discarded packet number space.

673	5.2.2.1.  Before Address Validation

675	   Until the server has validated the client's address on the path, the
676	   amount of data it can send is limited to three times the amount of
677	   data received, as specified in Section 8.1 of [QUIC-TRANSPORT].  If
678	   no additional data can be sent, the server's PTO timer MUST NOT be
679	   armed until datagrams have been received from the client, because
680	   packets sent on PTO count against the anti-amplification limit.  Note
681	   that the server could fail to validate the client's address even if
682	   0-RTT is accepted.

684	   Since the server could be blocked until more packets are received
685	   from the client, it is the client's responsibility to send packets to
686	   unblock the server until it is certain that the server has finished
687	   its address validation (see Section 8 of [QUIC-TRANSPORT]).  That is,
688	   the client MUST set the probe timer if the client has not received an
689	   acknowledgement for one of its Handshake or 1-RTT packets, and has
690	   not received a HANDSHAKE_DONE frame.  If Handshake keys are available
691	   to the client, it MUST send a Handshake packet, and otherwise it MUST
692	   send an Initial packet in a UDP datagram of at least 1200 bytes.

694	   A client could have received and acknowledged a Handshake packet,
695	   causing it to discard state for the Initial packet number space, but
696	   not sent any ack-eliciting Handshake packets.  In this case, the PTO
697	   is set from the current time.

699	5.2.3.  Speeding Up Handshake Completion

701	   When a server receives an Initial packet containing duplicate CRYPTO
702	   data, it can assume the client did not receive all of the server's
703	   CRYPTO data sent in Initial packets, or the client's estimated RTT is
704	   too small.  When a client receives Handshake or 1-RTT packets prior
705	   to obtaining Handshake keys, it may assume some or all of the
706	   server's Initial packets were lost.

708	   To speed up handshake completion under these conditions, an endpoint
709	   MAY send a packet containing unacknowledged CRYPTO data earlier than
710	   the PTO expiry, subject to address validation limits; see Section 8.1
711	   of [QUIC-TRANSPORT].

713	   Peers can also use coalesced packets to ensure that each datagram
714	   elicits at least one acknowledgement.  For example, clients can
715	   coalesce an Initial packet containing PING and PADDING frames with a
716	   0-RTT data packet and a server can coalesce an Initial packet
717	   containing a PING frame with one or more packets in its first flight.

719	5.2.4.  Sending Probe Packets

721	   When a PTO timer expires, a sender MUST send at least one ack-
722	   eliciting packet in the packet number space as a probe, unless there
723	   is no data available to send.  An endpoint MAY send up to two full-
724	   sized datagrams containing ack-eliciting packets, to avoid an
725	   expensive consecutive PTO expiration due to a single lost datagram or
726	   to transmit data from multiple packet number spaces.  All probe
727	   packets sent on a PTO MUST be ack-eliciting.

729	   In addition to sending data in the packet number space for which the
730	   timer expired, the sender SHOULD send ack-eliciting packets from
731	   other packet number spaces with in-flight data, coalescing packets if
732	   possible.  This is particularly valuable when the server has both
733	   Initial and Handshake data in-flight or the client has both Handshake
734	   and ApplicationData in-flight, because the peer might only have
735	   receive keys for one of the two packet number spaces.

737	   If the sender wants to elicit a faster acknowledgement on PTO, it can
738	   skip a packet number to eliminate the ack delay.

740	   When the PTO timer expires, and there is new or previously sent
741	   unacknowledged data, it MUST be sent.  A probe packet SHOULD carry
742	   new data when possible.  A probe packet MAY carry retransmitted
743	   unacknowledged data when new data is unavailable, when flow control
744	   does not permit new data to be sent, or to opportunistically reduce
745	   loss recovery delay.  Implementations MAY use alternative strategies
746	   for determining the content of probe packets, including sending new
747	   or retransmitted data based on the application's priorities.

749	   It is possible the sender has no new or previously-sent data to send.
750	   As an example, consider the following sequence of events: new
751	   application data is sent in a STREAM frame, deemed lost, then
752	   retransmitted in a new packet, and then the original transmission is
753	   acknowledged.  When there is no data to send, the sender SHOULD send
754	   a PING or other ack-eliciting frame in a single packet, re-arming the
755	   PTO timer.

757	   Alternatively, instead of sending an ack-eliciting packet, the sender
758	   MAY mark any packets still in flight as lost.  Doing so avoids
759	   sending an additional packet, but increases the risk that loss is
760	   declared too aggressively, resulting in an unnecessary rate reduction
761	   by the congestion controller.

763	   Consecutive PTO periods increase exponentially, and as a result,
764	   connection recovery latency increases exponentially as packets
765	   continue to be dropped in the network.  Sending two packets on PTO
766	   expiration increases resilience to packet drops, thus reducing the
767	   probability of consecutive PTO events.

769	   When the PTO timer expires multiple times and new data cannot be
770	   sent, implementations must choose between sending the same payload
771	   every time or sending different payloads.  Sending the same payload
772	   may be simpler and ensures the highest priority frames arrive first.
773	   Sending different payloads each time reduces the chances of spurious
774	   retransmission.

776	5.3.  Handling Retry Packets

778	   A Retry packet causes a client to send another Initial packet,
779	   effectively restarting the connection process.  A Retry packet
780	   indicates that the Initial was received, but not processed.  A Retry
781	   packet cannot be treated as an acknowledgment, because it does not
782	   indicate that a packet was processed or specify the packet number.

784	   Clients that receive a Retry packet reset congestion control and loss
785	   recovery state, including resetting any pending timers.  Other
786	   connection state, in particular cryptographic handshake messages, is
787	   retained; see Section 17.2.5 of [QUIC-TRANSPORT].

789	   The client MAY compute an RTT estimate to the server as the time
790	   period from when the first Initial was sent to when a Retry or a
791	   Version Negotiation packet is received.  The client MAY use this
792	   value in place of its default for the initial RTT estimate.

794	5.4.  Discarding Keys and Packet State

796	   When packet protection keys are discarded (see Section 4.10 of
797	   [QUIC-TLS]), all packets that were sent with those keys can no longer
798	   be acknowledged because their acknowledgements cannot be processed
799	   anymore.  The sender MUST discard all recovery state associated with
800	   those packets and MUST remove them from the count of bytes in flight.

802	   Endpoints stop sending and receiving Initial packets once they start
803	   exchanging Handshake packets; see Section 17.2.2.1 of
804	   [QUIC-TRANSPORT].  At this point, recovery state for all in-flight
805	   Initial packets is discarded.

807	   When 0-RTT is rejected, recovery state for all in-flight 0-RTT
808	   packets is discarded.

810	   If a server accepts 0-RTT, but does not buffer 0-RTT packets that
811	   arrive before Initial packets, early 0-RTT packets will be declared
812	   lost, but that is expected to be infrequent.

814	   It is expected that keys are discarded after packets encrypted with
815	   them would be acknowledged or declared lost.  Initial secrets however
816	   might be destroyed sooner, as soon as handshake keys are available;
817	   see Section 4.11.1 of [QUIC-TLS].

819	6.  Congestion Control

821	   This document specifies a congestion controller for QUIC similar to
822	   TCP NewReno [RFC6582].

824	   The signals QUIC provides for congestion control are generic and are
825	   designed to support different algorithms.  Endpoints can unilaterally
826	   choose a different algorithm to use, such as Cubic [RFC8312].

828	   If an endpoint uses a different controller than that specified in
829	   this document, the chosen controller MUST conform to the congestion
830	   control guidelines specified in Section 3.1 of [RFC8085].

832	   Similar to TCP, packets containing only ACK frames do not count
833	   towards bytes in flight and are not congestion controlled.  Unlike
834	   TCP, QUIC can detect the loss of these packets and MAY use that
835	   information to adjust the congestion controller or the rate of ACK-
836	   only packets being sent, but this document does not describe a
837	   mechanism for doing so.

839	   The algorithm in this document specifies and uses the controller's
840	   congestion window in bytes.

842	   An endpoint MUST NOT send a packet if it would cause bytes_in_flight
843	   (see Appendix B.2) to be larger than the congestion window, unless
844	   the packet is sent on a PTO timer expiration; see Section 5.2.

846	6.1.  Explicit Congestion Notification

848	   If a path has been verified to support ECN [RFC3168] [RFC8311], QUIC
849	   treats a Congestion Experienced (CE) codepoint in the IP header as a
850	   signal of congestion.  This document specifies an endpoint's response
851	   when its peer receives packets with the ECN-CE codepoint.

853	6.2.  Initial and Minimum Congestion Window

855	   QUIC begins every connection in slow start with the congestion window
856	   set to an initial value.  Endpoints SHOULD use an initial congestion
857	   window of 10 times the maximum datagram size (max_datagram_size),
858	   limited to the larger of 14720 bytes or twice the maximum datagram
859	   size.  This follows the analysis and recommendations in [RFC6928],
860	   increasing the byte limit to account for the smaller 8-byte overhead
861	   of UDP compared to the 20-byte overhead for TCP.

863	   Prior to validating the client's address, the server can be further
864	   limited by the anti-amplification limit as specified in Section 8.1
865	   of [QUIC-TRANSPORT].  Though the anti-amplification limit can prevent
866	   the congestion window from being fully utilized and therefore slow
867	   down the increase in congestion window, it does not directly affect
868	   the congestion window.

870	   The minimum congestion window is the smallest value the congestion
871	   window can decrease to as a response to loss, ECN-CE, or persistent
872	   congestion.  The RECOMMENDED value is 2 * max_datagram_size.
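
   The initial and minimum window values can be sketched as follows.
   The function names are hypothetical; all values are in bytes.

```python
def initial_window(max_datagram_size):
    """10 * max_datagram_size, limited to the larger of 14720 bytes
    or twice the maximum datagram size."""
    return min(10 * max_datagram_size,
               max(14720, 2 * max_datagram_size))


def minimum_window(max_datagram_size):
    """RECOMMENDED minimum congestion window."""
    return 2 * max_datagram_size
```

   With a maximum datagram size of 1200 bytes the initial window is
   12000 bytes; with 1472 bytes the 14720 byte limit applies; with 9000
   bytes the window is limited to 18000 bytes (twice the datagram size).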

874	6.3.  Slow Start

876	   While in slow start, QUIC increases the congestion window by the
877	   number of bytes acknowledged when each acknowledgment is processed,
878	   resulting in exponential growth of the congestion window.

880	   QUIC exits slow start upon loss or upon increase in the ECN-CE
881	   counter.  When slow start is exited, the congestion window halves and
882	   the slow start threshold is set to the new congestion window.  QUIC
883	   re-enters slow start any time the congestion window is less than the
884	   slow start threshold, which only occurs after persistent congestion
885	   is declared.

887	6.4.  Congestion Avoidance

889	   Slow start exits to congestion avoidance.  Congestion avoidance uses
890	   an Additive Increase Multiplicative Decrease (AIMD) approach that
891	   increases the congestion window by one maximum datagram size per
892	   congestion window acknowledged.  When a loss or ECN-CE marking is
893	   detected, NewReno halves the congestion window, sets the slow start
894	   threshold to the new congestion window, and then enters the recovery
895	   period.

897	6.5.  Recovery Period

899	   A recovery period is entered when loss or ECN-CE marking of a packet
900	   is detected in congestion avoidance after the congestion window and
901	   slow start threshold have been decreased.  A recovery period ends
902	   when a packet sent during the recovery period is acknowledged.  This
903	   is slightly different from TCP's definition of recovery, which ends
904	   when the lost packet that started recovery is acknowledged.

906	   The recovery period aims to limit congestion window reduction to once
907	   per round trip.  Therefore, during recovery, the congestion window
908	   remains unchanged irrespective of new losses or increases in the ECN-
909	   CE counter.

911	   When entering recovery, a single packet MAY be sent even if bytes in
912	   flight now exceeds the recently reduced congestion window.  This
913	   speeds up loss recovery if the data in the lost packet is
914	   retransmitted and is similar to TCP as described in Section 5 of
915	   [RFC6675].  If further packets are lost while the sender is in
916	   recovery, sending any packets in response MUST obey the congestion
917	   window limit.
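
   The interaction of slow start, congestion avoidance, and the recovery
   period (Sections 6.3 to 6.5) can be sketched as a small state
   machine.  This is an illustration under stated assumptions, not a
   complete controller: the class and method names are hypothetical,
   times are plain numbers, and pacing, under-utilization checks, and
   persistent congestion are not modeled.

```python
class NewRenoSketch:
    def __init__(self, max_datagram_size=1200):
        self.mds = max_datagram_size
        self.min_window = 2 * max_datagram_size
        self.cwnd = min(10 * max_datagram_size,
                        max(14720, 2 * max_datagram_size))
        self.ssthresh = float("inf")
        self.recovery_start_time = None

    def in_recovery(self, sent_time):
        # Packets sent at or before the start of recovery belong to
        # the round trip that already triggered a window reduction.
        return (self.recovery_start_time is not None
                and sent_time <= self.recovery_start_time)

    def on_packet_acked(self, sent_time, acked_bytes):
        if self.in_recovery(sent_time):
            return  # window is unchanged during recovery
        if self.cwnd < self.ssthresh:
            self.cwnd += acked_bytes  # slow start
        else:
            # Congestion avoidance: about one datagram per window acked.
            self.cwnd += self.mds * acked_bytes // self.cwnd

    def on_congestion_event(self, sent_time, now):
        """Called on loss or on an increase of the ECN-CE counter."""
        if self.in_recovery(sent_time):
            return  # at most one reduction per round trip
        self.recovery_start_time = now
        self.cwnd = max(self.cwnd // 2, self.min_window)
        self.ssthresh = self.cwnd
```

   Acknowledging a packet that was sent after recovery_start_time takes
   the window-growth path again, which corresponds to the end of the
   recovery period.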

919	6.6.  Ignoring Loss of Undecryptable Packets

921	   During the handshake, some packet protection keys might not be
922	   available when a packet arrives and the receiver can choose to drop
923	   the packet.  In particular, Handshake and 0-RTT packets cannot be
924	   processed until the Initial packets arrive and 1-RTT packets cannot
925	   be processed until the handshake completes.  Endpoints MAY ignore the
926	   loss of Handshake, 0-RTT, and 1-RTT packets that might have arrived
927	   before the peer had packet protection keys to process those packets.
928	   Endpoints MUST NOT ignore the loss of packets that were sent after
929	   the earliest acknowledged packet in a given packet number space.

931	6.7.  Probe Timeout

933	   Probe packets MUST NOT be blocked by the congestion controller.  A
934	   sender MUST however count these packets as being additionally in
935	   flight, since these packets add network load without establishing
936	   packet loss.  Note that sending probe packets might cause the
937	   sender's bytes in flight to exceed the congestion window until an
938	   acknowledgement is received that establishes loss or delivery of
939	   packets.

941	6.8.  Persistent Congestion

943	   When an ACK frame is received that establishes loss of all in-flight
944	   packets sent over a long enough period of time, the network is
945	   considered to be experiencing persistent congestion.  Commonly, this
946	   can be established by consecutive PTOs, but since the PTO timer is
947	   reset when a new ack-eliciting packet is sent, an explicit duration
948	   must be used to account for those cases where PTOs do not occur or
949	   are substantially delayed.  The rationale for this threshold is to
950	   enable a sender to use initial PTOs for aggressive probing, as TCP
951	   does with Tail Loss Probe (TLP) [RACK], before establishing
952	   persistent congestion, as TCP does with a Retransmission Timeout
953	   (RTO) [RFC5681].  The RECOMMENDED value for
954	   kPersistentCongestionThreshold is 3, which is approximately
955	   equivalent to two TLPs before an RTO in TCP.

957	   This duration is computed as follows:

959	   (smoothed_rtt + 4 * rttvar + max_ack_delay) *
960	       kPersistentCongestionThreshold

962	   For example, assume:

964	   smoothed_rtt = 1
965	   rttvar = 0
966	   max_ack_delay = 0
967	   kPersistentCongestionThreshold = 3

969	   If an ack-eliciting packet is sent at time t = 0, the following
970	   scenario would illustrate persistent congestion:

972	                     +------+------------------------+
973	                     | Time | Action                 |
974	                     +======+========================+
975	                     | t=0  | Send Pkt #1 (App Data) |
976	                     +------+------------------------+
977	                     | t=1  | Send Pkt #2 (PTO 1)    |
978	                     +------+------------------------+
979	                     | t=3  | Send Pkt #3 (PTO 2)    |
980	                     +------+------------------------+
981	                     | t=7  | Send Pkt #4 (PTO 3)    |
982	                     +------+------------------------+
983	                     | t=8  | Recv ACK of Pkt #4     |
984	                     +------+------------------------+

986	                                  Table 1

988	   The first three packets are determined to be lost when the
989	   acknowledgement of packet 4 is received at t = 8.  The congestion
990	   period is calculated as the time between the oldest and newest lost
991	   packets: (3 - 0) = 3.  The duration for persistent congestion is
992	   (1 + 4 * 0 + 0) * kPersistentCongestionThreshold = 3.  Because the
993	   threshold was reached and because none of the packets between the
994	   oldest and the newest packets are acknowledged, the network is
995	   considered to have experienced persistent congestion.
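
   The example can be reproduced with a short sketch.  The function
   names are hypothetical, and the caller is assumed to pass the send
   times of a run of lost packets with no acknowledged packet sent in
   between.

```python
K_PERSISTENT_CONGESTION_THRESHOLD = 3  # RECOMMENDED value


def persistent_congestion_duration(smoothed_rtt, rttvar, max_ack_delay):
    return ((smoothed_rtt + 4 * rttvar + max_ack_delay)
            * K_PERSISTENT_CONGESTION_THRESHOLD)


def is_persistent_congestion(lost_send_times, smoothed_rtt, rttvar,
                             max_ack_delay):
    """Compare the span between the oldest and newest lost packets
    with the persistent congestion duration."""
    if len(lost_send_times) < 2:
        return False
    congestion_period = max(lost_send_times) - min(lost_send_times)
    return congestion_period >= persistent_congestion_duration(
        smoothed_rtt, rttvar, max_ack_delay)
```

   For the packets sent at t = 0, 1, and 3 with smoothed_rtt = 1,
   rttvar = 0, and max_ack_delay = 0, the congestion period of 3
   reaches the duration of 3, so the check returns True.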

997	   When persistent congestion is established, the sender's congestion
998	   window MUST be reduced to the minimum congestion window
999	   (kMinimumWindow).  This response of collapsing the congestion window
1000	   on persistent congestion is functionally similar to a sender's
1001	   response on a Retransmission Timeout (RTO) in TCP [RFC5681] after
1002	   Tail Loss Probes (TLP) [RACK].

1004	6.9.  Pacing

1006	   This document does not specify a pacer, but it is RECOMMENDED that a
1007	   sender pace sending of all in-flight packets based on input from the
1008	   congestion controller.  For example, a pacer might distribute the
1009	   congestion window over the smoothed RTT when used with a window-based
1010	   controller, or a pacer might use the rate estimate of a rate-based
1011	   controller.

1013	   An implementation should take care to architect its congestion
1014	   controller to work well with a pacer.  For instance, a pacer might
1015	   wrap the congestion controller and control the availability of the
1016	   congestion window, or a pacer might pace out packets handed to it by
1017	   the congestion controller.

1019	   Timely delivery of ACK frames is important for efficient loss
1020	   recovery.  Packets containing only ACK frames SHOULD therefore not be
1021	   paced, to avoid delaying their delivery to the peer.

1023	   Endpoints can implement pacing as they choose.  A perfectly paced
1024	   sender spreads packets exactly evenly over time.  For a window-based
1025	   congestion controller, such as the one in this document, that rate
1026	   can be computed by averaging the congestion window over the
1027	   round-trip time.  Expressed as a rate in bytes per unit of time:

1029	   rate = N * congestion_window / smoothed_rtt

1031	   Or, expressed as an inter-packet interval:

1033	   interval = smoothed_rtt * packet_size / congestion_window / N

1035	   Using a value for "N" that is small, but at least 1 (for example,
1036	   1.25) ensures that variations in round-trip time don't result in
1037	   under-utilization of the congestion window.  Values of "N" larger
1038	   than 1 ultimately result in sending packets as acknowledgments are
1039	   received rather than when timers fire, provided the congestion window
1040	   is fully utilized and acknowledgments arrive at regular intervals.
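
   The two expressions above can be sketched as follows.  The function
   names are hypothetical; the congestion window and packet size are in
   bytes and the smoothed RTT is in milliseconds, so the rate is in
   bytes per millisecond and the interval is in milliseconds.

```python
def pacing_rate(congestion_window, smoothed_rtt, n=1.25):
    """rate = N * congestion_window / smoothed_rtt"""
    return n * congestion_window / smoothed_rtt


def pacing_interval(packet_size, congestion_window, smoothed_rtt, n=1.25):
    """interval = smoothed_rtt * packet_size / congestion_window / N"""
    return smoothed_rtt * packet_size / congestion_window / n
```

   For a 12000 byte window, a 100 ms smoothed RTT, and N = 1.25, the
   rate is 150 bytes per millisecond, and a 1200 byte packet is sent
   every 8 ms.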

1042	   Practical considerations, such as packetization, scheduling delays,
1043	   and computational efficiency, can cause a sender to deviate from this
1044	   rate over time periods that are much shorter than a round-trip time.
1045	   Sending multiple packets into the network without any delay between
1046	   them creates a packet burst that might cause short-term congestion
1047	   and losses.  Implementations MUST either use pacing or limit such
1048	   bursts to the initial congestion window; see Section 6.2.

1050	   One possible implementation strategy for pacing uses a leaky bucket
1051	   algorithm, where the capacity of the "bucket" is limited to the
1052	   maximum burst size and the rate the "bucket" fills is determined by
1053	   the above function.
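
   One way to sketch such a bucket is shown below, written in the
   credit-accounting form in which the bucket fills with sending credit
   at the pacing rate and a packet can be sent only when enough credit
   is available.  The class and method names are hypothetical, and
   times are in milliseconds; this is an illustration, not a
   recommended pacer design.

```python
class BucketPacer:
    def __init__(self, max_burst_bytes, rate_bytes_per_ms, now_ms=0.0):
        self.capacity = max_burst_bytes   # limits the burst size
        self.rate = rate_bytes_per_ms     # fill rate from the pacing rate
        self.credit = max_burst_bytes     # start with a full bucket
        self.last_ms = now_ms

    def _refill(self, now_ms):
        elapsed = now_ms - self.last_ms
        self.credit = min(self.capacity, self.credit + elapsed * self.rate)
        self.last_ms = now_ms

    def try_send(self, packet_size, now_ms):
        """Return True and consume credit if the packet may be sent now."""
        self._refill(now_ms)
        if self.credit >= packet_size:
            self.credit -= packet_size
            return True
        return False
```

   With a 2400 byte bucket filling at 150 bytes per millisecond, two
   1200 byte packets can be sent back to back, and the next one becomes
   eligible 8 ms later.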

1055	6.10.  Under-utilizing the Congestion Window

1057	   When bytes in flight is smaller than the congestion window and
1058	   sending is not pacing limited, the congestion window is under-
1059	   utilized.  When this occurs, the congestion window SHOULD NOT be
1060	   increased in either slow start or congestion avoidance.  This can
1061	   happen due to insufficient application data or flow control limits.

1063	   A sender MAY use the pipeACK method described in Section 4.3 of
1064	   [RFC7661] to determine if the congestion window is sufficiently
1065	   utilized.

1067	   A sender that paces packets (see Section 6.9) might delay sending
1068	   packets and not fully utilize the congestion window due to this
1069	   delay.  A sender SHOULD NOT consider itself application limited if it
1070	   would have fully utilized the congestion window without pacing delay.

1072	   A sender MAY implement alternative mechanisms to update its
1073	   congestion window after periods of under-utilization, such as those
1074	   proposed for TCP in [RFC7661].

1076	7.  Security Considerations

1078	7.1.  Congestion Signals

1080	   Congestion control fundamentally involves the consumption of signals
1081	   - both loss and ECN codepoints - from unauthenticated entities.  On-
1082	   path attackers can spoof or alter these signals.  An attacker can
1083	   cause endpoints to reduce their sending rate by dropping packets, or
1084	   alter send rate by changing ECN codepoints.

1086	7.2.  Traffic Analysis

1088	   Packets that carry only ACK frames can be heuristically identified by
1089	   observing packet size.  Acknowledgement patterns may expose
1090	   information about link characteristics or application behavior.
1091	   Endpoints can use PADDING frames or bundle acknowledgments with other
1092	   frames to reduce leaked information.

1094	7.3.  Misreporting ECN Markings

1096	   A receiver can misreport ECN markings to alter the congestion
1097	   response of a sender.  Suppressing reports of ECN-CE markings could
1098	   cause a sender to increase their send rate.  This increase could
1099	   result in congestion and loss.

1101	   A sender MAY attempt to detect suppression of reports by marking
1102	   occasional packets that they send with ECN-CE.  If a packet sent with
1103	   ECN-CE is not reported as having been CE marked when the packet is
1104	   acknowledged, then the sender SHOULD disable ECN for that path.

1106	   Reporting additional ECN-CE markings will cause a sender to reduce
1107	   their sending rate, which is similar in effect to advertising reduced
1108	   connection flow control limits and so no advantage is gained by doing
1109	   so.

1111	   Endpoints choose the congestion controller that they use.  Though
1112	   congestion controllers generally treat reports of ECN-CE markings as
1113	   equivalent to loss [RFC8311], the exact response for each controller
1114	   could be different.  Failure to correctly respond to information
1115	   about ECN markings is therefore difficult to detect.

1117	8.  IANA Considerations

1119	   This document has no IANA actions.

1121	9.  References

1123	9.1.  Normative References

1125	   [QUIC-TLS] Thomson, M., Ed. and S. Turner, Ed., "Using TLS to Secure
1126	              QUIC", Work in Progress, Internet-Draft,
1127	              draft-ietf-quic-tls-28, 20 May 2020.

1130	   [QUIC-TRANSPORT]
1131	              Iyengar, J., Ed. and M. Thomson, Ed., "QUIC: A UDP-Based
1132	              Multiplexed and Secure Transport", Work in Progress,
1133	              Internet-Draft, draft-ietf-quic-transport-28, 20 May 2020.

1137	   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
1138	              Requirement Levels", BCP 14, RFC 2119,
1139	              DOI 10.17487/RFC2119, March 1997.

1142	   [RFC8085]  Eggert, L., Fairhurst, G., and G. Shepherd, "UDP Usage
1143	              Guidelines", BCP 145, RFC 8085, DOI 10.17487/RFC8085,
1144	              March 2017.

1146	   [RFC8174]  Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC
1147	              2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174,
1148	              May 2017.

1150	9.2.  Informative References

1152	   [FACK]     Mathis, M. and J. Mahdavi, "Forward Acknowledgement:
1153	              Refining TCP Congestion Control", ACM SIGCOMM, August
1154	              1996.

1156	   [RACK]     Cheng, Y., Cardwell, N., Dukkipati, N., and P. Jha, "RACK:
1157	              a time-based fast loss detection algorithm for TCP", Work
1158	              in Progress, Internet-Draft, draft-ietf-tcpm-rack-08, 9
1159	              March 2020.

1162	   [RFC3168]  Ramakrishnan, K., Floyd, S., and D. Black, "The Addition
1163	              of Explicit Congestion Notification (ECN) to IP",
1164	              RFC 3168, DOI 10.17487/RFC3168, September 2001.

1167	   [RFC4653]  Bhandarkar, S., Reddy, A. L. N., Allman, M., and E.
1168	              Blanton, "Improving the Robustness of TCP to Non-
1169	              Congestion Events", RFC 4653, DOI 10.17487/RFC4653, August
1170	              2006.

1172	   [RFC5681]  Allman, M., Paxson, V., and E. Blanton, "TCP Congestion
1173	              Control", RFC 5681, DOI 10.17487/RFC5681, September 2009.

1176	   [RFC5682]  Sarolahti, P., Kojo, M., Yamamoto, K., and M. Hata,
1177	              "Forward RTO-Recovery (F-RTO): An Algorithm for Detecting
1178	              Spurious Retransmission Timeouts with TCP", RFC 5682,
1179	              DOI 10.17487/RFC5682, September 2009.

1182	   [RFC5827]  Allman, M., Avrachenkov, K., Ayesta, U., Blanton, J., and
1183	              P. Hurtig, "Early Retransmit for TCP and Stream Control
1184	              Transmission Protocol (SCTP)", RFC 5827,
1185	              DOI 10.17487/RFC5827, May 2010.

1188	   [RFC6298]  Paxson, V., Allman, M., Chu, J., and M. Sargent,
1189	              "Computing TCP's Retransmission Timer", RFC 6298,
1190	              DOI 10.17487/RFC6298, June 2011.

1193	   [RFC6582]  Henderson, T., Floyd, S., Gurtov, A., and Y. Nishida, "The
1194	              NewReno Modification to TCP's Fast Recovery Algorithm",
1195	              RFC 6582, DOI 10.17487/RFC6582, April 2012.

1198	   [RFC6675]  Blanton, E., Allman, M., Wang, L., Jarvinen, I., Kojo, M.,
1199	              and Y. Nishida, "A Conservative Loss Recovery Algorithm
1200	              Based on Selective Acknowledgment (SACK) for TCP",
1201	              RFC 6675, DOI 10.17487/RFC6675, August 2012.

1204	   [RFC6928]  Chu, J., Dukkipati, N., Cheng, Y., and M. Mathis,
1205	              "Increasing TCP's Initial Window", RFC 6928,
1206	              DOI 10.17487/RFC6928, April 2013.

1209	   [RFC7661]  Fairhurst, G., Sathiaseelan, A., and R. Secchi, "Updating
1210	              TCP to Support Rate-Limited Traffic", RFC 7661,
1211	              DOI 10.17487/RFC7661, October 2015.

1214	   [RFC8311]  Black, D., "Relaxing Restrictions on Explicit Congestion
1215	              Notification (ECN) Experimentation", RFC 8311,
1216	              DOI 10.17487/RFC8311, January 2018.

1219	   [RFC8312]  Rhee, I., Xu, L., Ha, S., Zimmermann, A., Eggert, L., and
1220	              R. Scheffenegger, "CUBIC for Fast Long-Distance Networks",
1221	              RFC 8312, DOI 10.17487/RFC8312, February 2018.

1224	Appendix A.  Loss Recovery Pseudocode

1226	   We now describe an example implementation of the loss detection
1227	   mechanisms described in Section 5.

1229	A.1.  Tracking Sent Packets

1231	   To correctly implement congestion control, a QUIC sender tracks every
1232	   ack-eliciting packet until the packet is acknowledged or lost.  It is
1233	   expected that implementations will be able to access this information
1234	   by packet number and crypto context and store the per-packet fields
1235	   (Appendix A.1.1) for loss recovery and congestion control.

1237	   After a packet is declared lost, the endpoint can track it for an
1238	   amount of time comparable to the maximum expected packet reordering,
1239	   such as 1 RTT.  This allows for detection of spurious
1240	   retransmissions.

1242	   Sent packets are tracked for each packet number space, and ACK
1243	   processing only applies to a single space.

1245	A.1.1.  Sent Packet Fields

1247	   packet_number:  The packet number of the sent packet.

1249	   ack_eliciting:  A boolean that indicates whether a packet is ack-
1250	      eliciting.  If true, it is expected that an acknowledgement will
1251	      be received, though the peer could delay sending the ACK frame
1252	      containing it by up to the MaxAckDelay.

1254	   in_flight:  A boolean that indicates whether the packet counts
1255	      towards bytes in flight.

1257	   sent_bytes:  The number of bytes sent in the packet, not including
1258	      UDP or IP overhead, but including QUIC framing overhead.

1260	   time_sent:  The time the packet was sent.
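
   The per-packet fields listed in this section could be held in a small
   record type.  The following Python sketch is illustrative only and
   is not part of the specification.  The class name SentPacket, the
   string keys for the packet number spaces, and the use of
   milliseconds for time_sent are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class SentPacket:
    """Hypothetical record type; field names follow Appendix A.1.1."""
    packet_number: int
    ack_eliciting: bool   # an acknowledgement is expected
    in_flight: bool       # counts towards bytes in flight
    sent_bytes: int       # QUIC bytes only, no UDP or IP overhead
    time_sent: float      # assumption: milliseconds on a monotonic clock


# One table per packet number space, keyed by packet number.
sent_packets = {"Initial": {}, "Handshake": {}, "ApplicationData": {}}

pkt = SentPacket(packet_number=0, ack_eliciting=True, in_flight=True,
                 sent_bytes=1200, time_sent=0.0)
sent_packets["Initial"][pkt.packet_number] = pkt
```

   Keeping one table per packet number space mirrors the rule that ACK
   processing only applies to a single space.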

1262	A.2.  Constants of Interest

1264	   Constants used in loss recovery are based on a combination of RFCs,
1265	   papers, and common practice.

1267	   kPacketThreshold:  Maximum reordering in packets before packet
1268	      threshold loss detection considers a packet lost.  The value
1269	      recommended in Section 5.1.1 is 3.

1271	   kTimeThreshold:  Maximum reordering in time before time threshold
1272	      loss detection considers a packet lost.  Specified as an RTT
1273	      multiplier.  The value recommended in Section 5.1.2 is 9/8.

1275	   kGranularity:  Timer granularity.  This is a system-dependent value,
1276	      and Section 5.1.2 recommends a value of 1ms.

1278	   kInitialRtt:  The RTT used before an RTT sample is taken.  The value
1279	      recommended in Section 5.2.2 is 500ms.

1281	   kPacketNumberSpace:  An enum to enumerate the three packet number
1282	      spaces.

1284	     enum kPacketNumberSpace {
1285	       Initial,
1286	       Handshake,
1287	       ApplicationData,
1288	     }

1290	A.3.  Variables of Interest

1292	   Variables required to implement the loss detection mechanisms are
1293	   described in this section.

1295	   latest_rtt:  The most recent RTT measurement made when receiving an
1296	      ack for a previously unacked packet.

1298	   smoothed_rtt:  The smoothed RTT of the connection, computed as
1299	      described in Section 4.3.

1301	   rttvar:  The RTT variation, computed as described in Section 4.3.

1303	   min_rtt:  The minimum RTT seen in the connection, ignoring ack delay,
1304	      as described in Section 4.2.

1306	   max_ack_delay:  The maximum amount of time by which the receiver
1307	      intends to delay acknowledgments for packets in the
1308	      ApplicationData packet number space.  The actual ack_delay in a
1309	      received ACK frame may be larger due to late timers, reordering,
1310	      or lost ACK frames.

1312	   loss_detection_timer:  Multi-modal timer used for loss detection.

1314	   pto_count:  The number of times a PTO has been sent without receiving
1315	      an ack.

1317	   time_of_last_sent_ack_eliciting_packet[kPacketNumberSpace]:  The time
1318	      the most recent ack-eliciting packet was sent.

1320	   largest_acked_packet[kPacketNumberSpace]:  The largest packet number
1321	      acknowledged in the packet number space so far.

1323	   loss_time[kPacketNumberSpace]:  The time at which the next packet in
1324	      that packet number space will be considered lost based on
1325	      exceeding the reordering window in time.

1327	   sent_packets[kPacketNumberSpace]:  An association of packet numbers
1328	      in a packet number space to information about them.  Described in
1329	      detail above in Appendix A.1.

1331	A.4.  Initialization

1333	   At the beginning of the connection, initialize the loss detection
1334	   variables as follows:

1336	      loss_detection_timer.reset()
1337	      pto_count = 0
1338	      latest_rtt = 0
1339	      smoothed_rtt = kInitialRtt
1340	      rttvar = kInitialRtt / 2
1341	      min_rtt = 0
1342	      max_ack_delay = 0
1343	      for pn_space in [ Initial, Handshake, ApplicationData ]:
1344	        largest_acked_packet[pn_space] = infinite
1345	        time_of_last_sent_ack_eliciting_packet[pn_space] = 0
1346	        loss_time[pn_space] = 0

1348	A.5.  On Sending a Packet

1350	   After a packet is sent, information about the packet is stored.  The
1351	   parameters to OnPacketSent are described in detail above in
1352	   Appendix A.1.1.

1354	   Pseudocode for OnPacketSent follows:

1356	    OnPacketSent(packet_number, pn_space, ack_eliciting,
1357	                 in_flight, sent_bytes):
1358	      sent_packets[pn_space][packet_number].packet_number =
1359	                                               packet_number
1360	      sent_packets[pn_space][packet_number].time_sent = now()
1361	      sent_packets[pn_space][packet_number].ack_eliciting =
1362	                                               ack_eliciting
1363	      sent_packets[pn_space][packet_number].in_flight = in_flight
1364	      if (in_flight):
1365	        if (ack_eliciting):
1366	          time_of_last_sent_ack_eliciting_packet[pn_space] = now()
1367	        OnPacketSentCC(sent_bytes)
1368	        sent_packets[pn_space][packet_number].size = sent_bytes
1369	        SetLossDetectionTimer()

1371	A.6.  On Receiving a Datagram

1373	   When a server is blocked by anti-amplification limits, receiving a
1374	   datagram unblocks it, even if none of the packets in the datagram are
1375	   successfully processed.  In such a case, the PTO timer will need to
1376	   be re-armed.

1378	   Pseudocode for OnDatagramReceived follows:

1380	   OnDatagramReceived(datagram):
1381	     // If this datagram unblocks the server, arm the
1382	     // PTO timer to avoid deadlock.
1383	     if (server was at anti-amplification limit):
1384	       SetLossDetectionTimer()

1386	A.7.  On Receiving an Acknowledgment

1388	   When an ACK frame is received, it may newly acknowledge any number of
1389	   packets.

1391	   Pseudocode for OnAckReceived and UpdateRtt follows:

1393	   OnAckReceived(ack, pn_space):
1394	     if (largest_acked_packet[pn_space] == infinite):
1395	       largest_acked_packet[pn_space] = ack.largest_acked
1396	     else:
1397	       largest_acked_packet[pn_space] =
1398	           max(largest_acked_packet[pn_space], ack.largest_acked)

1400	     // DetectAndRemoveAckedPackets finds packets that are newly
1401	     // acknowledged and removes them from sent_packets.
1402	     newly_acked_packets =
1403	         DetectAndRemoveAckedPackets(ack, pn_space)
1404	     // Nothing to do if there are no newly acked packets.
1405	     if (newly_acked_packets.empty()):
1406	       return

1408	     // If the largest acknowledged is newly acked and at least
1409	     // one ack-eliciting packet was newly acked, update the RTT.
1410	     if (newly_acked_packets.largest().packet_number ==
1411	             ack.largest_acked &&
1412	         IncludesAckEliciting(newly_acked_packets)):
1413	       latest_rtt =
1414	         now() - newly_acked_packets.largest().time_sent
1415	       ack_delay = 0
1416	       if (pn_space == ApplicationData):
1417	         ack_delay = ack.ack_delay
1418	       UpdateRtt(ack_delay)

1420	     // Process ECN information if present.
1421	     if (ACK frame contains ECN information):
1422	         ProcessECN(ack, pn_space)

1424	     lost_packets = DetectAndRemoveLostPackets(pn_space)
1425	     if (!lost_packets.empty()):
1426	       OnPacketsLost(lost_packets)
1427	     OnPacketsAcked(newly_acked_packets)
1428	     // Reset pto_count unless the client is unsure if
1429	     // the server has validated the client's address.
1430	     if (PeerCompletedAddressValidation()):
1431	       pto_count = 0
1432	     SetLossDetectionTimer()

1434	   UpdateRtt(ack_delay):
1435	     if (is first RTT sample):
1436	       min_rtt = latest_rtt
1437	       smoothed_rtt = latest_rtt
1438	       rttvar = latest_rtt / 2
1439	       return

1441	     // min_rtt ignores ack delay.
1442	     min_rtt = min(min_rtt, latest_rtt)
1443	     // Limit ack_delay by max_ack_delay
1444	     ack_delay = min(ack_delay, max_ack_delay)
1445	     // Adjust for ack delay if plausible.
1446	     adjusted_rtt = latest_rtt
1447	     if (latest_rtt > min_rtt + ack_delay):
1448	       adjusted_rtt = latest_rtt - ack_delay

1450	     rttvar = 3/4 * rttvar + 1/4 * abs(smoothed_rtt - adjusted_rtt)
1451	     smoothed_rtt = 7/8 * smoothed_rtt + 1/8 * adjusted_rtt
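
   As a worked illustration of UpdateRtt, the following Python sketch
   transcribes the pseudocode and runs it on two samples.  It is a
   minimal sketch rather than a reference implementation: the class
   name RttEstimator, the has_sample flag, and the choice of
   milliseconds as the time unit are assumptions made for the example.

```python
K_INITIAL_RTT_MS = 500.0  # value recommended for kInitialRtt


class RttEstimator:
    """Hypothetical holder for the RTT variables of Appendix A.3."""

    def __init__(self, max_ack_delay_ms):
        self.smoothed_rtt = K_INITIAL_RTT_MS
        self.rttvar = K_INITIAL_RTT_MS / 2
        self.min_rtt = 0.0
        self.max_ack_delay = max_ack_delay_ms
        self.has_sample = False

    def update(self, latest_rtt, ack_delay):
        if not self.has_sample:
            # First sample seeds all three estimates.
            self.has_sample = True
            self.min_rtt = latest_rtt
            self.smoothed_rtt = latest_rtt
            self.rttvar = latest_rtt / 2
            return
        # min_rtt ignores ack delay.
        self.min_rtt = min(self.min_rtt, latest_rtt)
        # Limit ack_delay by max_ack_delay.
        ack_delay = min(ack_delay, self.max_ack_delay)
        # Subtract ack delay only if the result stays above min_rtt.
        adjusted_rtt = latest_rtt
        if latest_rtt > self.min_rtt + ack_delay:
            adjusted_rtt = latest_rtt - ack_delay
        self.rttvar = (0.75 * self.rttvar
                       + 0.25 * abs(self.smoothed_rtt - adjusted_rtt))
        self.smoothed_rtt = (0.875 * self.smoothed_rtt
                             + 0.125 * adjusted_rtt)


est = RttEstimator(max_ack_delay_ms=25.0)
est.update(latest_rtt=100.0, ack_delay=0.0)   # first sample
est.update(latest_rtt=200.0, ack_delay=40.0)  # ack_delay capped at 25 ms
```

   With these inputs the second sample is adjusted to 175 ms, which
   gives rttvar = 56.25 ms and smoothed_rtt = 109.375 ms, while min_rtt
   stays at 100 ms.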

1453	A.8.  Setting the Loss Detection Timer

1455	   QUIC loss detection uses a single timer for all timeout loss
1456	   detection.  The duration of the timer is based on the timer's mode,
1457	   which is set in the packet and timer events further below.  The
1458	   function SetLossDetectionTimer defined below shows how the single
1459	   timer is set.

1461	   This algorithm may result in the timer being set in the past,
1462	   particularly if timers wake up late.  Timers set in the past fire
1463	   immediately.

1465	   Pseudocode for SetLossDetectionTimer follows:

1467	   GetEarliestTimeAndSpace(times):
1468	     time = times[Initial]
1469	     space = Initial
1470	     for pn_space in [ Handshake, ApplicationData ]:
1471	       if (times[pn_space] != 0 &&
1472	           (time == 0 || times[pn_space] < time) &&
1473	           // Skip ApplicationData until handshake completion.
1474	           (pn_space != ApplicationData ||
1475	             IsHandshakeComplete())):
1476	         time = times[pn_space]
1477	         space = pn_space
1478	     return time, space

1480	   PeerCompletedAddressValidation():
1481	     // Assume clients validate the server's address implicitly.
1482	     if (endpoint is server):
1483	       return true
1484	     // Servers complete address validation when a
1485	     // protected packet is received.
1486	     return has received Handshake ACK ||
1487	          has received 1-RTT ACK ||
1488	          has received HANDSHAKE_DONE

1490	   SetLossDetectionTimer():
1491	     earliest_loss_time, _ = GetEarliestTimeAndSpace(loss_time)
1492	     if (earliest_loss_time != 0):
1493	       // Time threshold loss detection.
1494	       loss_detection_timer.update(earliest_loss_time)
1495	       return

1497	     if (server is at anti-amplification limit):
1498	       // The server's timer is not set if nothing can be sent.
1499	       loss_detection_timer.cancel()
1500	       return

1502	     if (no ack-eliciting packets in flight &&
1503	         PeerCompletedAddressValidation()):
1504	       // There is nothing to detect as lost, so no timer is set.
1505	       // However, the client needs to arm the timer if the
1506	       // server might be blocked by the anti-amplification limit.
1507	       loss_detection_timer.cancel()
1508	       return

1510	     // Determine which PN space to arm PTO for.
1511	     sent_time, pn_space = GetEarliestTimeAndSpace(
1512	       time_of_last_sent_ack_eliciting_packet)
1513	     // Don't arm PTO for ApplicationData until handshake complete.
1514	     if (pn_space == ApplicationData &&
1515	         handshake is not confirmed):
1516	       loss_detection_timer.cancel()
1517	       return
1518	     if (sent_time == 0):
1519	       assert(!PeerCompletedAddressValidation())
1520	       sent_time = now()

1522	     // Calculate PTO duration
1523	     timeout = smoothed_rtt + max(4 * rttvar, kGranularity) +
1524	       max_ack_delay
1525	     timeout = timeout * (2 ^ pto_count)

1527	     loss_detection_timer.update(sent_time + timeout)
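
   The PTO duration computed at the end of SetLossDetectionTimer doubles
   with each consecutive expiry.  The Python sketch below shows only
   that arithmetic.  The function name pto_timeout_ms, the millisecond
   unit, and the sample input values are assumptions chosen for
   illustration.

```python
K_GRANULARITY_MS = 1.0  # value recommended for kGranularity


def pto_timeout_ms(smoothed_rtt, rttvar, max_ack_delay, pto_count):
    """Return the PTO duration in milliseconds for a given pto_count."""
    timeout = (smoothed_rtt + max(4 * rttvar, K_GRANULARITY_MS)
               + max_ack_delay)
    # The pseudocode's "2 ^ pto_count" is exponentiation ("**" here).
    return timeout * (2 ** pto_count)


# Assumed inputs: smoothed_rtt = 100 ms, rttvar = 50 ms,
# max_ack_delay = 25 ms, for pto_count = 0, 1, 2.
timeouts = [pto_timeout_ms(100.0, 50.0, 25.0, n) for n in range(3)]
```

   For these inputs the successive timeouts are 325 ms, 650 ms, and
   1300 ms.  The timer itself is armed at sent_time plus this duration.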

1529	A.9.  On Timeout

1531	   When the loss detection timer expires, the timer's mode determines
1532	   the action to be performed.

1534	   Pseudocode for OnLossDetectionTimeout follows:

1536	   OnLossDetectionTimeout():
1537	     earliest_loss_time, pn_space =
1538	       GetEarliestTimeAndSpace(loss_time)
1539	     if (earliest_loss_time != 0):
1540	       // Time threshold loss detection
1541	       lost_packets = DetectAndRemoveLostPackets(pn_space)
1542	       assert(!lost_packets.empty())
1543	       OnPacketsLost(lost_packets)
1544	       SetLossDetectionTimer()
1545	       return

1547	     if (bytes_in_flight > 0):
1548	       // PTO. Send new data if available, else retransmit old data.
1549	       // If neither is available, send a single PING frame.
1550	       _, pn_space = GetEarliestTimeAndSpace(
1551	         time_of_last_sent_ack_eliciting_packet)
1552	       SendOneOrTwoAckElicitingPackets(pn_space)
1553	     else:
1554	       assert(endpoint is client without 1-RTT keys)
1555	       // Client sends an anti-deadlock packet: Initial is padded
1556	       // to earn more anti-amplification credit,
1557	       // a Handshake packet proves address ownership.
1558	       if (has Handshake keys):
1559	         SendOneAckElicitingHandshakePacket()
1560	       else:
1561	         SendOneAckElicitingPaddedInitialPacket()

1563	     pto_count++
1564	     SetLossDetectionTimer()

1566	A.10.  Detecting Lost Packets

1568	   DetectAndRemoveLostPackets is called every time an ACK is received or
1569	   the time threshold loss detection timer expires.  This function
1570	   operates on the sent_packets for that packet number space and returns
1571	   a list of packets newly detected as lost.

1573	   Pseudocode for DetectAndRemoveLostPackets follows:

1575	   DetectAndRemoveLostPackets(pn_space):
1576	     assert(largest_acked_packet[pn_space] != infinite)
1577	     loss_time[pn_space] = 0
1578	     lost_packets = {}
1579	     loss_delay = kTimeThreshold * max(latest_rtt, smoothed_rtt)

1581	     // Minimum time of kGranularity before packets are deemed lost.
1582	     loss_delay = max(loss_delay, kGranularity)

1584	     // Packets sent before this time are deemed lost.
1585	     lost_send_time = now() - loss_delay

1587	     foreach unacked in sent_packets[pn_space]:
1588	       if (unacked.packet_number > largest_acked_packet[pn_space]):
1589	         continue

1591	       // Mark packet as lost, or set time when it should be marked.
1592	       if (unacked.time_sent <= lost_send_time ||
1593	           largest_acked_packet[pn_space] >=
1594	             unacked.packet_number + kPacketThreshold):
1595	         sent_packets[pn_space].remove(unacked.packet_number)
1596	         if (unacked.in_flight):
1597	           lost_packets.insert(unacked)
1598	       else:
1599	         if (loss_time[pn_space] == 0):
1600	           loss_time[pn_space] = unacked.time_sent + loss_delay
1601	         else:
1602	           loss_time[pn_space] = min(loss_time[pn_space],
1603	                                     unacked.time_sent + loss_delay)
1604	     return lost_packets
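
   The interaction of the packet threshold and the time threshold can be
   seen in a small runnable example.  The Python sketch below follows
   the pseudocode for a single packet number space.  The function
   signature, the tuple layout of the sent table, and the millisecond
   unit are assumptions made for illustration.

```python
K_PACKET_THRESHOLD = 3
K_TIME_THRESHOLD = 9 / 8
K_GRANULARITY_MS = 1.0


def detect_and_remove_lost(sent, largest_acked, latest_rtt,
                           smoothed_rtt, now):
    """sent maps packet number -> (time_sent, in_flight) for one space.

    Returns (lost packet numbers, next loss_time or 0 if none).
    """
    loss_delay = K_TIME_THRESHOLD * max(latest_rtt, smoothed_rtt)
    loss_delay = max(loss_delay, K_GRANULARITY_MS)
    lost_send_time = now - loss_delay
    lost, loss_time = [], 0
    for pn in sorted(sent):  # sorted() copies the keys before deletion
        if pn > largest_acked:
            continue
        time_sent, in_flight = sent[pn]
        if (time_sent <= lost_send_time
                or largest_acked >= pn + K_PACKET_THRESHOLD):
            del sent[pn]
            if in_flight:
                lost.append(pn)
        else:
            candidate = time_sent + loss_delay
            if loss_time == 0:
                loss_time = candidate
            else:
                loss_time = min(loss_time, candidate)
    return lost, loss_time


# Packet 4 was just acknowledged and removed; 1-3 and 5 remain.
sent = {1: (0.0, True), 2: (10.0, True), 3: (20.0, True),
        5: (40.0, True)}
lost, loss_time = detect_and_remove_lost(
    sent, largest_acked=4, latest_rtt=80.0, smoothed_rtt=80.0, now=95.0)
```

   Here loss_delay is 90 ms, so packet 1 is declared lost (it also
   meets the packet threshold), packets 2 and 3 remain outstanding, and
   loss_time is set to 100 ms, the earliest time at which packet 2
   would exceed the time threshold.  Packet 5 is skipped because it is
   newer than the largest acknowledged packet.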

1606	Appendix B.  Congestion Control Pseudocode

1608	   We now describe an example implementation of the congestion
1609	   controller described in Section 6.

1611	B.1.  Constants of Interest

1613	   Constants used in congestion control are based on a combination of
1614	   RFCs, papers, and common practice.

1616	   kInitialWindow:  Default limit on the initial bytes in flight as
1617	      described in Section 6.2.

1619	   kMinimumWindow:  Minimum congestion window in bytes as described in
1620	      Section 6.2.

1622	   kLossReductionFactor:  Reduction in congestion window when a new loss
1623	      event is detected.  Section 6 recommends a value of 0.5.

1626	   kPersistentCongestionThreshold:  Period of time for persistent
1627	      congestion to be established, specified as a PTO multiplier.
1628	      Section 6.8 recommends a value of 3.

1630	B.2.  Variables of Interest

1632	   Variables required to implement the congestion control mechanisms are
1633	   described in this section.

1635	   max_datagram_size:  The sender's current maximum payload size.  Does
1636	      not include UDP or IP overhead.  The max datagram size is used for
1637	      congestion window computations.  An endpoint sets the value of
1638	      this variable based on its PMTU (see Section 14.1 of
1639	      [QUIC-TRANSPORT]), with a minimum value of 1200 bytes.

1641	   ecn_ce_counters[kPacketNumberSpace]:  The highest value reported for
1642	      the ECN-CE counter in the packet number space by the peer in an
1643	      ACK frame.  This value is used to detect increases in the reported
1644	      ECN-CE counter.

1646	   bytes_in_flight:  The sum of the size in bytes of all sent packets
1647	      that contain at least one ack-eliciting or PADDING frame, and have
1648	      not been acked or declared lost.  The size does not include IP or
1649	      UDP overhead, but does include the QUIC header and AEAD overhead.
1650	      Packets only containing ACK frames do not count towards
1651	      bytes_in_flight to ensure congestion control does not impede
1652	      congestion feedback.

1654	   congestion_window:  Maximum number of bytes-in-flight that may be
1655	      sent.

1657	   congestion_recovery_start_time:  The time when QUIC first detects
1658	      congestion due to loss or ECN, causing it to enter congestion
1659	      recovery.  When a packet sent after this time is acknowledged,
1660	      QUIC exits congestion recovery.

1662	   ssthresh:  Slow start threshold in bytes.  When the congestion window
1663	      is below ssthresh, the mode is slow start and the window grows by
1664	      the number of bytes acknowledged.

1666	B.3.  Initialization

1668	   At the beginning of the connection, initialize the congestion control
1669	   variables as follows:

1671	      congestion_window = kInitialWindow
1672	      bytes_in_flight = 0
1673	      congestion_recovery_start_time = 0
1674	      ssthresh = infinite
1675	      for pn_space in [ Initial, Handshake, ApplicationData ]:
1676	        ecn_ce_counters[pn_space] = 0

1678	B.4.  On Packet Sent

1680	   Whenever a packet is sent, and it contains non-ACK frames, the packet
1681	   increases bytes_in_flight.

1683	      OnPacketSentCC(bytes_sent):
1684	        bytes_in_flight += bytes_sent

1686	B.5.  On Packet Acknowledgement

1688	   Invoked from loss detection's OnAckReceived and supplied with the
1689	   newly_acked_packets from sent_packets.

1691	      InCongestionRecovery(sent_time):
1692	        return sent_time <= congestion_recovery_start_time

1694	      OnPacketsAcked(acked_packets):
1695	        for (packet in acked_packets):
1696	          // Remove from bytes_in_flight.
1697	          bytes_in_flight -= packet.size
1698	          if (InCongestionRecovery(packet.time_sent)):
1699	            // Do not increase congestion window in recovery period.
1700	            continue
1701	          if (IsAppOrFlowControlLimited()):
1702	            // Do not increase congestion_window if application
1703	            // limited or flow control limited.
1704	            continue
1705	          if (congestion_window < ssthresh):
1706	            // Slow start.
1707	            congestion_window += packet.size
1708	            continue
1709	          // Congestion avoidance.
1710	          congestion_window += max_datagram_size * packet.size
1711	              / congestion_window
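
   The two growth modes can be compared with a short calculation.  The
   Python sketch below covers only the window update for one
   acknowledged packet.  It assumes the sender is neither in recovery
   nor application or flow control limited.  The function name
   grow_window and the use of integer division are also assumptions.

```python
import math


def grow_window(congestion_window, ssthresh, acked_bytes,
                max_datagram_size):
    """Return the congestion window after acked_bytes are acknowledged."""
    if congestion_window < ssthresh:
        # Slow start: grow by the number of bytes acknowledged.
        return congestion_window + acked_bytes
    # Congestion avoidance: about one datagram per window of acks.
    return (congestion_window
            + max_datagram_size * acked_bytes // congestion_window)


slow_start = grow_window(12000, math.inf, 1200, 1200)
avoidance = grow_window(12000, 12000, 1200, 1200)
```

   Acknowledging one 1200-byte packet grows a 12000-byte window to
   13200 bytes in slow start but only to 12120 bytes in congestion
   avoidance.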

1713	B.6.  On New Congestion Event

1715	   Invoked from ProcessECN and OnPacketsLost when a new congestion event
1716	   is detected.  May start a new recovery period and reduce the
1717	   congestion window.

1719	      CongestionEvent(sent_time):
1720	        // Start a new congestion event if packet was sent after the
1721	        // start of the previous congestion recovery period.
1722	        if (!InCongestionRecovery(sent_time)):
1723	          congestion_recovery_start_time = now()
1724	          congestion_window *= kLossReductionFactor
1725	          congestion_window = max(congestion_window, kMinimumWindow)
1726	          ssthresh = congestion_window
1727	          // A packet can be sent to speed up loss recovery.
1728	          MaybeSendOnePacket()
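
   The effect of the recovery period on repeated loss signals is easiest
   to see by example.  The following Python sketch is a hypothetical
   transcription of CongestionEvent: the class name Controller, the
   explicit now argument, the millisecond unit, and the value used for
   kMinimumWindow are assumptions, and the call to MaybeSendOnePacket
   is omitted.

```python
K_LOSS_REDUCTION_FACTOR = 0.5
K_MINIMUM_WINDOW = 2400  # assumption: two 1200-byte datagrams


class Controller:
    """Hypothetical holder for the congestion control variables."""

    def __init__(self, congestion_window):
        self.congestion_window = congestion_window
        self.ssthresh = float("inf")
        self.congestion_recovery_start_time = 0.0

    def in_congestion_recovery(self, sent_time):
        return sent_time <= self.congestion_recovery_start_time

    def congestion_event(self, sent_time, now):
        # Packets sent before recovery started do not trigger
        # another reduction.
        if self.in_congestion_recovery(sent_time):
            return
        self.congestion_recovery_start_time = now
        reduced = int(self.congestion_window * K_LOSS_REDUCTION_FACTOR)
        self.congestion_window = max(reduced, K_MINIMUM_WINDOW)
        self.ssthresh = self.congestion_window


cc = Controller(congestion_window=24000)
cc.congestion_event(sent_time=50.0, now=100.0)   # window halves
cc.congestion_event(sent_time=90.0, now=120.0)   # same period, ignored
cc.congestion_event(sent_time=110.0, now=200.0)  # new period, halves
```

   The second event is ignored because the packet was sent before the
   recovery period began at 100 ms, so the window ends at 6000 bytes
   rather than 3000 bytes.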

1730	B.7.  Process ECN Information

1732	   Invoked when an ACK frame with an ECN section is received from the
1733	   peer.

1735	      ProcessECN(ack, pn_space):
1736	        // If the ECN-CE counter reported by the peer has increased,
1737	        // this could be a new congestion event.
1738	        if (ack.ce_counter > ecn_ce_counters[pn_space]):
1739	          ecn_ce_counters[pn_space] = ack.ce_counter
1740	          CongestionEvent(sent_packets[ack.largest_acked].time_sent)

1742	B.8.  On Packets Lost

1744	   Invoked when DetectAndRemoveLostPackets deems packets lost.

1746	      InPersistentCongestion(lost_packets):
1747	        pto = smoothed_rtt + max(4 * rttvar, kGranularity) +
1748	          max_ack_delay
1749	        congestion_period = pto * kPersistentCongestionThreshold
1750	        // Determine if all packets in the time period before the
1751	        // largest newly lost packet, including the edges, are
1752	        // marked lost
1753	        return AreAllPacketsLost(lost_packets, congestion_period)

1755	      OnPacketsLost(lost_packets):
1756	        // Remove lost packets from bytes_in_flight.
1757	        for (lost_packet in lost_packets):
1758	          bytes_in_flight -= lost_packet.size
1759	        CongestionEvent(lost_packets.largest().time_sent)

1761	        // Collapse congestion window if persistent congestion
1762	        if (InPersistentCongestion(lost_packets)):
1763	          congestion_window = kMinimumWindow
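
   The length of the persistent congestion period follows directly from
   the PTO.  The Python sketch below computes only that duration and
   does not attempt to implement AreAllPacketsLost.  The function name,
   the millisecond unit, and the sample inputs are assumptions.

```python
K_GRANULARITY_MS = 1.0
K_PERSISTENT_CONGESTION_THRESHOLD = 3


def persistent_congestion_period_ms(smoothed_rtt, rttvar,
                                    max_ack_delay):
    """Return the period that lost packets must span, in milliseconds."""
    pto = (smoothed_rtt + max(4 * rttvar, K_GRANULARITY_MS)
           + max_ack_delay)
    return pto * K_PERSISTENT_CONGESTION_THRESHOLD


# Assumed inputs: smoothed_rtt = 100 ms, rttvar = 50 ms,
# max_ack_delay = 25 ms.
period = persistent_congestion_period_ms(100.0, 50.0, 25.0)
```

   With a PTO of 325 ms the period is 975 ms, so losses confined to a
   shorter interval reduce the window through CongestionEvent but do
   not collapse it to kMinimumWindow.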

1765	B.9.  Upon Dropping Initial or Handshake Keys

1767	   When Initial or Handshake keys are discarded, packets from the space
1768	   are discarded and loss detection state is updated.

1770	   Pseudocode for OnPacketNumberSpaceDiscarded follows:

1772	   OnPacketNumberSpaceDiscarded(pn_space):
1773	     assert(pn_space != ApplicationData)
1774	     // Remove any unacknowledged packets from flight.
1775	     foreach packet in sent_packets[pn_space]:
1776	       if (packet.in_flight):
1777	         bytes_in_flight -= packet.size
1778	     sent_packets[pn_space].clear()
1779	     // Reset the loss detection and PTO timer
1780	     time_of_last_sent_ack_eliciting_packet[pn_space] = 0
1781	     loss_time[pn_space] = 0
1782	     pto_count = 0
1783	     SetLossDetectionTimer()

1785	Appendix C.  Change Log

1787	      *RFC Editor's Note:* Please remove this section prior to
1788	      publication of a final version of this document.

1790	   Issue and pull request numbers are listed with a leading octothorp.

1792	C.1.  Since draft-ietf-quic-recovery-27

1794	   *  Added recommendations for speeding up handshake under some loss
1795	      conditions (#3078, #3080)

1797	   *  PTO count is reset when handshake progress is made (#3272, #3415)

1799	   *  PTO count is not reset by a client when the server might be
1800	      awaiting address validation (#3546, #3551)

1802	   *  Recommend repairing losses immediately after entering the recovery
1803	      period (#3335, #3443)

1805	   *  Clarified what loss conditions can be ignored during the handshake
1806	      (#3456, #3450)

1808	   *  Allow, but don't recommend, using RTT from previous connection to
1809	      seed RTT (#3464, #3496)

1811	   *  Recommend use of adaptive loss detection thresholds (#3571, #3572)

1813	C.2.  Since draft-ietf-quic-recovery-26

1815	   No changes.

1817	C.3.  Since draft-ietf-quic-recovery-25

1819	   No significant changes.

1821	C.4.  Since draft-ietf-quic-recovery-24

1823	   *  Require congestion control of some sort (#3247, #3244, #3248)

1825	   *  Set a minimum reordering threshold (#3256, #3240)

1827	   *  PTO is specific to a packet number space (#3067, #3074, #3066)

1829	C.5.  Since draft-ietf-quic-recovery-23

1831	   *  Define under-utilizing the congestion window (#2630, #2686, #2675)

1833	   *  PTO MUST send data if possible (#3056, #3057)

1835	   *  Connection Close is not ack-eliciting (#3097, #3098)

1837	   *  MUST limit bursts to the initial congestion window (#3160)

1839	   *  Define the current max_datagram_size for congestion control
1840	      (#3041, #3167)

1842	C.6.  Since draft-ietf-quic-recovery-22

1844	   *  PTO should always send an ack-eliciting packet (#2895)

1846	   *  Unify the Handshake Timer with the PTO timer (#2648, #2658, #2886)

1848	   *  Move ACK generation text to transport draft (#1860, #2916)

1850	C.7.  Since draft-ietf-quic-recovery-21

1852	   *  No changes

1854	C.8.  Since draft-ietf-quic-recovery-20

1856	   *  Path validation can be used as initial RTT value (#2644, #2687)

1858	   *  max_ack_delay transport parameter defaults to 0 (#2638, #2646)

1860	   *  Ack Delay only measures intentional delays induced by the
1861	      implementation (#2596, #2786)

1863	C.9.  Since draft-ietf-quic-recovery-19

1864	   *  Change kPersistentThreshold from an exponent to a multiplier
1865	      (#2557)

1867	   *  Send a PING if the PTO timer fires and there's nothing to send
1868	      (#2624)

1870	   *  Set loss delay to at least kGranularity (#2617)

1872	   *  Merge application limited and sending after idle sections.  Always
1873	      limit burst size instead of requiring resetting CWND to initial
1874	      CWND after idle (#2605)

1876	   *  Rewrite RTT estimation, allow RTT samples where a newly acked
1877	      packet is ack-eliciting but the largest_acked is not (#2592)

1879	   *  Don't arm the handshake timer if there is no handshake data
1880	      (#2590)

1882	   *  Clarify that the time threshold loss alarm takes precedence over
1883	      the crypto handshake timer (#2590, #2620)

1885	   *  Change initial RTT to 500ms to align with RFC6298 (#2184)

1887	C.10.  Since draft-ietf-quic-recovery-18

1889	   *  Change IW byte limit to 14720 from 14600 (#2494)

1891	   *  Update PTO calculation to match RFC6298 (#2480, #2489, #2490)

1893	   *  Improve loss detection's description of multiple packet number
1894	      spaces and pseudocode (#2485, #2451, #2417)

1896	   *  Declare persistent congestion even if non-probe packets are sent
1897	      and don't make persistent congestion more aggressive than RTO
1898	      verified was (#2365, #2244)

1900	   *  Move pseudocode to the appendices (#2408)

1902	   *  What to send on multiple PTOs (#2380)

1904	C.11.  Since draft-ietf-quic-recovery-17

1906	   *  After Probe Timeout discard in-flight packets or send another
1907	      (#2212, #1965)

1909	   *  Endpoints discard initial keys as soon as handshake keys are
1910	      available (#1951, #2045)

1912	   *  0-RTT state is discarded when 0-RTT is rejected (#2300)

1914	   *  Loss detection timer is cancelled when ack-eliciting frames are in
1915	      flight (#2117, #2093)

1917	   *  Packets are declared lost if they are in flight (#2104)

1919	   *  After becoming idle, either pace packets or reset the congestion
1920	      controller (#2138, #2187)

1922	   *  Process ECN counts before marking packets lost (#2142)

1924	   *  Mark packets lost before resetting crypto_count and pto_count
1925	      (#2208, #2209)

1927	   *  Congestion and loss recovery state are discarded when keys are
1928	      discarded (#2327)

1930	C.12.  Since draft-ietf-quic-recovery-16

1932	   *  Unify TLP and RTO into a single PTO; eliminate min RTO, min TLP
1933	      and min crypto timeouts; eliminate timeout validation (#2114,
1934	      #2166, #2168, #1017)

1936	   *  Redefine congestion avoidance in terms of when the period
1937	      starts (#1928, #1930)

1939	   *  Document what needs to be tracked for packets that are in flight
1940	      (#765, #1724, #1939)

1942	   *  Integrate both time and packet thresholds into loss detection
1943	      (#1969, #1212, #934, #1974)

1945	   *  Reduce congestion window after idle, unless pacing is used (#2007,
1946	      #2023)

1948	   *  Disable RTT calculation for packets that don't elicit
1949	      acknowledgment (#2060, #2078)

1951	   *  Limit ack_delay by max_ack_delay (#2060, #2099)

1953	   *  Initial keys are discarded once Handshake keys are available
1954	      (#1951, #2045)

1956	   *  Reorder ECN and loss detection in pseudocode (#2142)

1958	   *  Only cancel loss detection timer if ack-eliciting packets are in
1959	      flight (#2093, #2117)

1961	C.13.  Since draft-ietf-quic-recovery-14

1963	   *  Used max_ack_delay from transport params (#1796, #1782)

1965	   *  Merge ACK and ACK_ECN (#1783)

1967	C.14.  Since draft-ietf-quic-recovery-13

1969	   *  Corrected the lack of ssthresh reduction in CongestionEvent
1970	      pseudocode (#1598)

1972	   *  Considerations for ECN spoofing (#1426, #1626)

1974	   *  Clarifications for PADDING and congestion control (#837, #838,
1975	      #1517, #1531, #1540)

1977	   *  Reduce early retransmission timer to RTT/8 (#945, #1581)

1979	   *  Packets are declared lost after an RTO is verified (#935, #1582)

1981	C.15.  Since draft-ietf-quic-recovery-12

1983	   *  Changes to manage separate packet number spaces and encryption
1984	      levels (#1190, #1242, #1413, #1450)

1986	   *  Added ECN feedback mechanisms and handling; new ACK_ECN frame
1987	      (#804, #805, #1372)

1989	C.16.  Since draft-ietf-quic-recovery-11

1991	   No significant changes.

1993	C.17.  Since draft-ietf-quic-recovery-10

1995	   *  Improved text on ack generation (#1139, #1159)

1997	   *  Make references to TCP recovery mechanisms informational (#1195)

1999	   *  Define time_of_last_sent_handshake_packet (#1171)

2001	   *  Added signal from TLS that the data it includes needs to be sent
2002	      in a Retry packet (#1061, #1199)

2004	   *  Minimum RTT (min_rtt) is initialized with an infinite value
2005	      (#1169)

2007	C.18.  Since draft-ietf-quic-recovery-09

2009	   No significant changes.

2011	C.19.  Since draft-ietf-quic-recovery-08

2013	   *  Clarified pacing and RTO (#967, #977)

2015	C.20.  Since draft-ietf-quic-recovery-07

2017	   *  Include Ack Delay in RTO (and TLP) computations (#981)

2019	   *  Ack Delay in SRTT computation (#961)

2021	   *  Default RTT and Slow Start (#590)

2023	   *  Many editorial fixes.

2025	C.21.  Since draft-ietf-quic-recovery-06

2027	   No significant changes.

2029	C.22.  Since draft-ietf-quic-recovery-05

2031	   *  Add more congestion control text (#776)

2033	C.23.  Since draft-ietf-quic-recovery-04

2035	   No significant changes.

2037	C.24.  Since draft-ietf-quic-recovery-03

2039	   No significant changes.

2041	C.25.  Since draft-ietf-quic-recovery-02

2043	   *  Integrate F-RTO (#544, #409)

2045	   *  Add congestion control (#545, #395)

2047	   *  Require connection abort if a skipped packet was acknowledged
2048	      (#415)

2050	   *  Simplify RTO calculations (#142, #417)

2052	C.26.  Since draft-ietf-quic-recovery-01

2054	   *  Overview added to loss detection

2056	   *  Changed initial default RTT to 100ms

2058	   *  Added time-based loss detection and fixed early retransmit

2060	   *  Clarified loss recovery for handshake packets

2062	   *  Fixed references and made TCP references informative

2064	C.27.  Since draft-ietf-quic-recovery-00

2066	   *  Improved description of constants and ACK behavior

2068	C.28.  Since draft-iyengar-quic-loss-recovery-01

2070	   *  Adopted as base for draft-ietf-quic-recovery

2072	   *  Updated authors/editors list

2074	   *  Added table of contents

2076	Appendix D.  Contributors

2078	   The IETF QUIC Working Group received an enormous amount of support
2079	   from many people.  The following people provided substantive
2080	   contributions to this document: Alessandro Ghedini, Benjamin
2081	   Saunders, Gorry Fairhurst, 奥 一穂 (Kazuho Oku), Lars Eggert, Magnus
2082	   Westerlund, Marten Seemann, Martin Duke, Martin Thomson, Nick Banks,
2083	   Praveen Balasubramaniam.

2085	Acknowledgments

2087	Authors' Addresses

2089	   Jana Iyengar (editor)
2090	   Fastly

2092	   Email: jri.ietf@gmail.com

2094	   Ian Swett (editor)
2095	   Google

2097	   Email: ianswett@google.com