2 Internet Congestion Control Research Group                 N. Cardwell
3 Internet-Draft                                                 Y. Cheng
4 Intended status: Experimental                        S. Hassas Yeganeh
5 Expires: 8 September 2022                                      I. Swett
6                                                             V. Jacobson
7                                                                   Google
8                                                             7 March 2022

10                         BBR Congestion Control
11             draft-cardwell-iccrg-bbr-congestion-control-02

13 Abstract

15 This document specifies the BBR congestion control algorithm. BBR
16 ("Bottleneck Bandwidth and Round-trip propagation time") uses recent
17 measurements of a transport connection's delivery rate, round-trip
18 time, and packet loss rate to build an explicit model of the network
19 path. BBR then uses this model to control both how fast it sends
20 data and the maximum volume of data it allows in flight in the
21 network at any time. Relative to loss-based congestion control
22 algorithms such as Reno [RFC5681] or CUBIC [RFC8312], BBR offers
23 substantially higher throughput for bottlenecks with shallow buffers
24 or random losses, and substantially lower queueing delays for
25 bottlenecks with deep buffers (avoiding "bufferbloat"). BBR can be
26 implemented in any transport protocol that supports packet-delivery
27 acknowledgment.
Thus far, open source implementations are available 28 for TCP [RFC793] and QUIC [RFC9000]. This document specifies version 29 2 of the BBR algorithm, also sometimes referred to as BBRv2 or bbr2. 31 Status of This Memo 33 This Internet-Draft is submitted in full conformance with the 34 provisions of BCP 78 and BCP 79. 36 Internet-Drafts are working documents of the Internet Engineering 37 Task Force (IETF). Note that other groups may also distribute 38 working documents as Internet-Drafts. The list of current Internet- 39 Drafts is at https://datatracker.ietf.org/drafts/current/. 41 Internet-Drafts are draft documents valid for a maximum of six months 42 and may be updated, replaced, or obsoleted by other documents at any 43 time. It is inappropriate to use Internet-Drafts as reference 44 material or to cite them other than as "work in progress." 46 This Internet-Draft will expire on 8 September 2022. 48 Copyright Notice 50 Copyright (c) 2022 IETF Trust and the persons identified as the 51 document authors. All rights reserved. 53 This document is subject to BCP 78 and the IETF Trust's Legal 54 Provisions Relating to IETF Documents (https://trustee.ietf.org/ 55 license-info) in effect on the date of publication of this document. 56 Please review these documents carefully, as they describe your rights 57 and restrictions with respect to this document. Code Components 58 extracted from this document must include Revised BSD License text as 59 described in Section 4.e of the Trust Legal Provisions and are 60 provided without warranty as described in the Revised BSD License. 62 Table of Contents 64 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 3 65 2. Terminology . . . . . . . . . . . . . . . . . . . . . . . . . 5 66 2.1. Transport Connection State . . . . . . . . . . . . . . . 5 67 2.2. Per-Packet State . . . . . . . . . . . . . . . . . . . . 5 68 2.3. Per-ACK Rate Sample State . . . . . . . . . . . . . . . . 5 69 2.4. Output Control Parameters . . . . . . . . . . . . . . . . 6 70 2.5. Pacing State and Parameters . . . . . . . . . . . . . . . 6 71 2.6. cwnd State and Parameters . . . . . . . . . . . . . . . . 7 72 2.7. General Algorithm State . . . . . . . . . . . . . . . . . 7 73 2.8. Core Algorithm Design Parameters . . . . . . . . . . . . 7 74 2.9. Network Path Model Parameters . . . . . . . . . . . . . . 8 75 2.9.1. Data Rate Network Path Model Parameters . . . . . . . 8 76 2.9.2. Data Volume Network Path Model Parameters . . . . . . 8 77 2.10. State for Responding to Congestion . . . . . . . . . . . 9 78 2.11. Estimating BBR.max_bw . . . . . . . . . . . . . . . . . . 10 79 2.12. Estimating BBR.extra_acked . . . . . . . . . . . . . . . 10 80 2.13. Startup Parameters and State . . . . . . . . . . . . . . 10 81 2.14. ProbeRTT and min_rtt Parameters and State . . . . . . . . 10 82 2.14.1. Parameters for Estimating BBR.min_rtt . . . . . . . 10 83 2.14.2. Parameters for Scheduling ProbeRTT . . . . . . . . . 11 84 3. Design Overview . . . . . . . . . . . . . . . . . . . . . . . 11 85 3.1. High-Level Design Goals . . . . . . . . . . . . . . . . . 11 86 3.2. Algorithm Overview . . . . . . . . . . . . . . . . . . . 12 87 3.3. State Machine Overview . . . . . . . . . . . . . . . . . 13 88 3.4. Network Path Model Overview . . . . . . . . . . . . . . . 13 89 3.4.1. High-Level Design Goals for the Network Path Model . 13 90 3.4.2. Time Scales for the Network Model . . . . . . . . . . 14 91 3.5. Control Parameter Overview . . . . . . . . . . . . . . . 15 92 3.6. 
Environment and Usage . . . . . . . . . . . . . . . . . . 15 93 4. Detailed Algorithm . . . . . . . . . . . . . . . . . . . . . 15 94 4.1. State Machine . . . . . . . . . . . . . . . . . . . . . . 15 95 4.1.1. State Transition Diagram . . . . . . . . . . . . . . 15 96 4.1.2. State Machine Operation Overview . . . . . . . . . . 16 97 4.1.3. State Machine Tactics . . . . . . . . . . . . . . . . 17 98 4.2. Algorithm Organization . . . . . . . . . . . . . . . . . 18 99 4.2.1. Initialization . . . . . . . . . . . . . . . . . . . 18 100 4.2.2. Per-Transmit Steps . . . . . . . . . . . . . . . . . 18 101 4.2.3. Per-ACK Steps . . . . . . . . . . . . . . . . . . . . 19 102 4.2.4. Per-Loss Steps . . . . . . . . . . . . . . . . . . . 19 103 4.3. State Machine Operation . . . . . . . . . . . . . . . . . 19 104 4.3.1. Startup . . . . . . . . . . . . . . . . . . . . . . . 19 105 4.3.2. Drain . . . . . . . . . . . . . . . . . . . . . . . . 22 106 4.3.3. ProbeBW . . . . . . . . . . . . . . . . . . . . . . . 23 107 4.3.4. ProbeRTT . . . . . . . . . . . . . . . . . . . . . . 34 108 4.4. Restarting From Idle . . . . . . . . . . . . . . . . . . 39 109 4.4.1. Setting Pacing Rate in ProbeBW . . . . . . . . . . . 39 110 4.4.2. Checking for ProberRTT Completion . . . . . . . . . . 40 111 4.4.3. Logic . . . . . . . . . . . . . . . . . . . . . . . . 40 112 4.5. Updating Network Path Model Parameters . . . . . . . . . 40 113 4.5.1. BBR.round_count: Tracking Packet-Timed Round Trips . 41 114 4.5.2. BBR.max_bw: Estimated Maximum Bandwidth . . . . . . . 42 115 4.5.3. BBR.min_rtt: Estimated Minimum Round-Trip Time . . . 44 116 4.5.4. BBR.offload_budget . . . . . . . . . . . . . . . . . 46 117 4.5.5. BBR.extra_acked . . . . . . . . . . . . . . . . . . . 47 118 4.5.6. Updating the Model Upon Packet Loss . . . . . . . . . 48 119 4.6. Updating Control Parameters . . . . . . . . . . . . . . . 52 120 4.6.1. Summary of Control Behavior in the State Machine . . 52 121 4.6.2. Pacing Rate: BBR.pacing_rate . . . . . . . . . . . . 53 122 4.6.3. Send Quantum: BBR.send_quantum . . . . . . . . . . . 55 123 4.6.4. Congestion Window . . . . . . . . . . . . . . . . . . 55 124 5. Implementation Status . . . . . . . . . . . . . . . . . . . . 61 125 6. Security Considerations . . . . . . . . . . . . . . . . . . . 62 126 7. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 62 127 8. Acknowledgments . . . . . . . . . . . . . . . . . . . . . . . 63 128 9. References . . . . . . . . . . . . . . . . . . . . . . . . . 63 129 9.1. Normative References . . . . . . . . . . . . . . . . . . 63 130 9.2. Informative References . . . . . . . . . . . . . . . . . 64 131 Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 66 133 1. Introduction 135 The Internet has traditionally used loss-based congestion control 136 algorithms like Reno ([Jac88], [Jac90], [WS95] [RFC5681]) and CUBIC 137 ([HRX08], [RFC8312]). These algorithms worked well for many years 138 because they were sufficiently well-matched to the prevalent range of 139 bandwidth-delay products and degrees of buffering in Internet paths. 140 As the Internet has evolved, loss-based congestion control is 141 increasingly problematic in several important scenarios: 143 1. Shallow buffers: In shallow buffers, packet loss can happen even 144 when a link has low utilization. 
With high-speed, long-haul
145 links employing commodity switches with shallow buffers, loss-
146 based congestion control can cause abysmal throughput because it
147 overreacts, multiplicatively decreasing the sending rate upon
148 packet loss, and only slowly growing its sending rate thereafter.
149 This can happen even if the packet loss arises from transient
150 traffic bursts when the link is mostly idle.

152 2. Deep buffers: At the edge of today's Internet, loss-based
153 congestion control can cause the problem of "bufferbloat", by
154 repeatedly filling deep buffers in last-mile links and causing
155 high queuing delays.

157 3. Dynamic traffic workloads: With buffers of any depth, dynamic
158 mixes of newly-entering flows or flights of data from recently
159 idle flows can cause frequent packet loss. In such scenarios
160 loss-based congestion control can fail to maintain its fair share
161 of bandwidth, leading to poor application performance.

163 In both the shallow-buffer (1.) and dynamic-traffic (3.) scenarios
164 mentioned above, it is difficult to achieve full throughput with loss-
165 based congestion control in practice: for CUBIC, sustaining 10Gbps
166 over 100ms RTT needs a packet loss rate below 0.000003% (i.e., more
167 than 40 seconds between packet losses), and over a 100ms RTT path a
168 more feasible loss rate like 1% can only sustain at most 3 Mbps
169 [RFC8312]. These limitations apply no matter what the bottleneck
170 link is capable of or what the connection's fair share is.
171 Furthermore, failure to reach the fair share can cause poor
172 throughput and poor tail latency for latency-sensitive applications.

174 The BBR ("Bottleneck Bandwidth and Round-trip propagation time")
175 congestion control algorithm is a model-based algorithm that takes an
176 approach different from loss-based congestion control: BBR uses
177 recent measurements of a transport connection's delivery rate, round-
178 trip time, and packet loss rate to build an explicit model of the
179 network path, including its estimated available bandwidth, bandwidth-
180 delay product, and the maximum volume of data that the connection can
181 place in-flight in the network without causing excessive queue
182 pressure. It then uses this model in order to guide its control
183 behavior in seeking high throughput and low queue pressure.

185 This document describes the current version of the BBR algorithm,
186 BBRv2. The previous version of the algorithm, BBRv1, was described
187 at a high level in [CCGHJ16][CCGHJ17]. The implications of
188 BBR in allowing high utilization of high-speed networks with shallow
189 buffers have been discussed in other work [MM19]. Active work on the
190 BBR algorithm is continuing.

192 This document is organized as follows. Section 2 provides various
193 definitions that will be used throughout this document. Section 3
194 provides an overview of the design of the BBR algorithm, and section
195 4 describes the BBR algorithm in detail, including BBR's network path
196 model, control parameters, and state machine. Section 5 describes
197 the implementation status, section 6 describes security
198 considerations, section 7 notes that there are no IANA
199 considerations, and section 8 closes with Acknowledgments.

201 2. Terminology

203 This document defines state variables and constants for the BBR
204 algorithm.

206 The variables starting with C, P, or rs not defined below are defined
207 in [draft-cheng-iccrg-delivery-rate-estimation].

209 2.1.
Transport Connection State 211 C.delivered: The total amount of data (tracked in octets or in 212 packets) delivered so far over the lifetime of the transport 213 connection C. 215 SMSS: The Sender Maximum Segment Size. 217 is_cwnd_limited: True if the connection has fully utilized its cwnd 218 at any point in the last packet-timed round trip. 220 InitialCwnd: The initial congestion window set by the transport 221 protocol implementation for the connection at initialization time. 223 2.2. Per-Packet State 225 packet.delivered: C.delivered when the given packet was sent by 226 transport connection C. 228 packet.departure_time: The earliest pacing departure time for the 229 given packet. 231 packet.tx_in_flight: The volume of data that was estimated to be in 232 flight at the time of the transmission of the packet. 234 2.3. Per-ACK Rate Sample State 236 rs.delivered: The volume of data delivered between the transmission 237 of the packet that has just been ACKed and the current time. 239 rs.delivery_rate: The delivery rate (aka bandwidth) sample obtained 240 from the packet that has just been ACKed. 242 rs.rtt: The RTT sample calculated based on the most recently-sent 243 segment of the segments that have just been ACKed. 245 rs.newly_acked: The volume of data cumulatively or selectively 246 acknowledged upon the ACK that was just received. (This quantity is 247 referred to as "DeliveredData" in [RFC6937].) 249 rs.newly_lost: The volume of data newly marked lost upon the ACK that 250 was just received. 252 rs.tx_in_flight: The volume of data that was estimated to be in 253 flight at the time of the transmission of the packet that has just 254 been ACKed (the most recently sent segment among segments ACKed by 255 the ACK that was just received). 257 rs.lost: The volume of data that was declared lost between the 258 transmission and acknowledgement of the packet that has just been 259 ACKed (the most recently sent segment among segments ACKed by the ACK 260 that was just received). 262 2.4. Output Control Parameters 264 cwnd: The transport sender's congestion window, which limits the 265 amount of data in flight. 267 BBR.pacing_rate: The current pacing rate for a BBR flow, which 268 controls inter-packet spacing. 270 BBR.send_quantum: The maximum size of a data aggregate scheduled and 271 transmitted together. 273 2.5. Pacing State and Parameters 275 BBR.pacing_gain: The dynamic gain factor used to scale BBR.bw to 276 produce BBR.pacing_rate. 278 BBRPacingMarginPercent: The static discount factor of 1% used to 279 scale BBR.bw to produce BBR.pacing_rate. 281 BBR.next_departure_time: The earliest pacing departure time for the 282 next packet BBR schedules for transmission. 284 2.6. cwnd State and Parameters 286 BBR.cwnd_gain: The dynamic gain factor used to scale the estimated 287 BDP to produce a congestion window (cwnd). 289 BBRStartupPacingGain: A constant specifying the minimum gain value 290 for calculating the pacing rate that will allow the sending rate to 291 double each round (4*ln(2) ~= 2.77) [BBRStartupPacingGain]; used in 292 Startup mode for BBR.pacing_gain. 294 BBRStartupCwndGain: A constant specifying the minimum gain value for 295 calculating the cwnd that will allow the sending rate to double each 296 round (2.0); used in Startup mode for BBR.cwnd_gain. 298 BBR.packet_conservation: A boolean indicating whether BBR is 299 currently using packet conservation dynamics to bound cwnd. 301 2.7. 
General Algorithm State 303 BBR.state: The current state of a BBR flow in the BBR state machine. 305 BBR.round_count: Count of packet-timed round trips elapsed so far. 307 BBR.round_start: A boolean that BBR sets to true once per packet- 308 timed round trip, on ACKs that advance BBR.round_count. 310 BBR.next_round_delivered: packet.delivered value denoting the end of 311 a packet-timed round trip. 313 BBR.idle_restart: A boolean that is true if and only if a connection 314 is restarting after being idle. 316 2.8. Core Algorithm Design Parameters 318 BBRLossThresh: The maximum tolerated per-round-trip packet loss rate 319 when probing for bandwidth (the default is 2%). 321 BBRBeta: The default multiplicative decrease to make upon each round 322 trip during which the connection detects packet loss (the value is 323 0.7). 325 BBRHeadroom: The multiplicative factor to apply to BBR.inflight_hi 326 when attempting to leave free headroom in the path (e.g. free space 327 in the bottleneck buffer or free time slots in the bottleneck link) 328 that can be used by cross traffic (the value is 0.85). 330 BBRMinPipeCwnd: The minimal cwnd value BBR targets, to allow 331 pipelining with TCP endpoints that follow an "ACK every other packet" 332 delayed-ACK policy: 4 * SMSS. 334 2.9. Network Path Model Parameters 336 2.9.1. Data Rate Network Path Model Parameters 338 The data rate model parameters together estimate both the sending 339 rate required to reach the full bandwidth available to the flow 340 (BBR.max_bw), and the maximum pacing rate control parameter that is 341 consistent with the queue pressure objective (BBR.bw). 343 BBR.max_bw: The windowed maximum recent bandwidth sample - obtained 344 using the BBR delivery rate sampling algorithm [draft-cheng-iccrg- 345 delivery-rate-estimation] - measured during the current or previous 346 bandwidth probing cycle (or during Startup, if the flow is still in 347 that state). (Part of the long-term model.) 349 BBR.bw_hi: The long-term maximum sending bandwidth that the algorithm 350 estimates will produce acceptable queue pressure, based on signals in 351 the current or previous bandwidth probing cycle, as measured by loss. 352 (Part of the long-term model.) 354 BBR.bw_lo: The short-term maximum sending bandwidth that the 355 algorithm estimates is safe for matching the current network path 356 delivery rate, based on any loss signals in the current bandwidth 357 probing cycle. This is generally lower than max_bw or bw_hi (thus 358 the name). (Part of the short-term model.) 360 BBR.bw: The maximum sending bandwidth that the algorithm estimates is 361 appropriate for matching the current network path delivery rate, 362 given all available signals in the model, at any time scale. It is 363 the min() of max_bw, bw_hi, and bw_lo. 365 2.9.2. Data Volume Network Path Model Parameters 367 The data volume model parameters together estimate both the volume of 368 in-flight data required to reach the full bandwidth available to the 369 flow (BBR.max_inflight), and the maximum volume of data that is 370 consistent with the queue pressure objective (cwnd). 372 BBR.min_rtt: The windowed minimum round-trip time sample measured 373 over the last MinRTTFilterLen = 10 seconds. 
This attempts to 374 estimate the two-way propagation delay of the network path when all 375 connections sharing a bottleneck are using BBR, but also allows BBR 376 to estimate the value required for a bdp estimate that allows full 377 throughput if there are legacy loss-based Reno or CUBIC flows sharing 378 the bottleneck. 380 BBR.bdp: The estimate of the network path's BDP (Bandwidth-Delay 381 Product), computed as: BBR.bdp = BBR.bw * BBR.min_rtt. 383 BBR.extra_acked: A volume of data that is the estimate of the recent 384 degree of aggregation in the network path. 386 BBR.offload_budget: The estimate of the minimum volume of data 387 necessary to achieve full throughput when using sender (TSO/GSO) and 388 receiver (LRO, GRO) host offload mechanisms. 390 BBR.max_inflight: The estimate of the volume of in-flight data 391 required to fully utilize the bottleneck bandwidth available to the 392 flow, based on the BDP estimate (BBR.bdp), the aggregation estimate 393 (BBR.extra_acked), the offload budget (BBR.offload_budget), and 394 BBRMinPipeCwnd. 396 BBR.inflight_hi: Analogous to BBR.bw_hi, the long-term maximum volume 397 of in-flight data that the algorithm estimates will produce 398 acceptable queue pressure, based on signals in the current or 399 previous bandwidth probing cycle, as measured by loss. That is, if a 400 flow is probing for bandwidth, and observes that sending a particular 401 volume of in-flight data causes a loss rate higher than the loss rate 402 objective, it sets inflight_hi to that volume of data. (Part of the 403 long-term model.) 405 BBR.inflight_lo: Analogous to BBR.bw_lo, the short-term maximum 406 volume of in-flight data that the algorithm estimates is safe for 407 matching the current network path delivery process, based on any loss 408 signals in the current bandwidth probing cycle. This is generally 409 lower than max_inflight or inflight_hi (thus the name). (Part of the 410 short-term model.) 412 2.10. State for Responding to Congestion 414 BBR.bw_latest: a 1-round-trip max of delivered bandwidth 415 (rs.delivery_rate). 417 BBR.inflight_latest: a 1-round-trip max of delivered volume of data 418 (rs.delivered). 420 2.11. Estimating BBR.max_bw 422 BBR.MaxBwFilter: The filter for tracking the maximum recent 423 rs.delivery_rate sample, for estimating BBR.max_bw. 425 MaxBwFilterLen: The filter window length for BBR.MaxBwFilter = 2 426 (representing up to 2 ProbeBW cycles, the current cycle and the 427 previous full cycle). 429 BBR.cycle_count: The virtual time used by the BBR.max_bw filter 430 window. Note that BBR.cycle_count only needs to be tracked with a 431 single bit, since the BBR.MaxBwFilter only needs to track samples 432 from two time slots: the previous ProbeBW cycle and the current 433 ProbeBW cycle. 435 2.12. Estimating BBR.extra_acked 437 BBR.extra_acked_interval_start: the start of the time interval for 438 estimating the excess amount of data acknowledged due to aggregation 439 effects. 441 BBR.extra_acked_delivered: the volume of data marked as delivered 442 since BBR.extra_acked_interval_start. 444 BBR.ExtraACKedFilter: the max filter tracking the recent maximum 445 degree of aggregation in the path. 447 BBRExtraAckedFilterLen = The window length of the 448 BBR.ExtraACKedFilter max filter window: 10 (in units of packet-timed 449 round trips). 451 2.13. Startup Parameters and State 453 BBR.filled_pipe: A boolean that records whether BBR estimates that it 454 has ever fully utilized its available bandwidth ("filled the pipe"). 
456 BBR.full_bw: A recent baseline BBR.max_bw to estimate if BBR has 457 "filled the pipe" in Startup. 459 BBR.full_bw_count: The number of non-app-limited round trips without 460 large increases in BBR.full_bw. 462 2.14. ProbeRTT and min_rtt Parameters and State 464 2.14.1. Parameters for Estimating BBR.min_rtt 466 BBR.min_rtt_stamp: The wall clock time at which the current 467 BBR.min_rtt sample was obtained. 469 MinRTTFilterLen: A constant specifying the length of the BBR.min_rtt 470 min filter window, MinRTTFilterLen is 10 secs. 472 2.14.2. Parameters for Scheduling ProbeRTT 474 BBRProbeRTTCwndGain = A constant specifying the gain value for 475 calculating the cwnd during ProbeRTT: 0.5 (meaning that ProbeRTT 476 attempts to reduce in-flight data to 50% of the estimated BDP). 478 ProbeRTTDuration: A constant specifying the minimum duration for 479 which ProbeRTT state holds inflight to BBRMinPipeCwnd or fewer 480 packets: 200 ms. 482 ProbeRTTInterval: A constant specifying the minimum time interval 483 between ProbeRTT states: 5 secs. 485 BBR.probe_rtt_min_delay: The minimum RTT sample recorded in the last 486 ProbeRTTInterval. 488 BBR.probe_rtt_min_stamp: The wall clock time at which the current 489 BBR.probe_rtt_min_delay sample was obtained. 491 BBR.probe_rtt_expired: A boolean recording whether the 492 BBR.probe_rtt_min_delay has expired and is due for a refresh with an 493 application idle period or a transition into ProbeRTT state. 495 The keywords "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", 496 "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this 497 document are to be interpreted as described in [RFC2119]. 499 3. Design Overview 501 3.1. High-Level Design Goals 503 The high-level goal of BBR is to achieve both: 505 1. The full throughput (or approximate fair share thereof) available 506 to a flow 508 * Achieved in a fast and scalable manner (using bandwidth in 509 O(log(BDP)) time). 511 * Achieved with average packet loss rates of up to 1%. 513 2. Low queue pressure (low queuing delay and low packet loss). 515 These goals are in tension: sending faster improves the odds of 516 achieving (1) but reduces the odds of achieving (2), while sending 517 slower improves the odds of achieving (2) but reduces the odds of 518 achieving (1). Thus the algorithm cannot maximize throughput or 519 minimize queue pressure independently, and must jointly optimize 520 both. 522 To try to achieve these goals, and seek an operating point with high 523 throughput and low delay [K79] [GK81], BBR aims to adapt its sending 524 process to match the network delivery process, in two dimensions: 526 1. data rate: the rate at which the flow sends data should ideally 527 match the rate at which the network delivers the flow's data (the 528 available bottleneck bandwidth) 530 2. data volume: the amount of unacknowledged data in flight in the 531 network should ideally match the bandwidth-delay product (BDP) of 532 the path 534 Both the control of the data rate (via the pacing rate) and data 535 volume (directly via the congestion window or cwnd; and indirectly 536 via the pacing rate) are important. A mismatch in either dimension 537 can cause the sender to fail to meet its high-level design goals: 539 1. volume mismatch: If a sender perfectly matches its sending rate 540 to the available bandwidth, but its volume of in-flight data 541 exceeds the BDP, then the sender can maintain a large standing 542 queue, increasing network latency and risking packet loss. 544 2. 
rate mismatch: If a sender's volume of in-flight data matches the 545 BDP perfectly but its sending rate exceeds the available 546 bottleneck bandwidth (e.g. the sender transmits a BDP of data in 547 an unpaced fashion, at the sender's link rate), then up to a full 548 BDP of data can burst into the bottleneck queue, causing high 549 delay and/or high loss. 551 3.2. Algorithm Overview 553 Based on the rationale above, BBR tries to spend most of its time 554 matching its sending process (data rate and data volume) to the 555 network path's delivery process. To do this, it explores the 556 2-dimensional control parameter space of (1) data rate ("bandwidth" 557 or "throughput") and (2) data volume ("in-flight data"), with a goal 558 of finding the maximum values of each control parameter that are 559 consistent with its objective for queue pressure. 561 Depending on what signals a given network path manifests at a given 562 time, the objective for queue pressure is measured in terms of the 563 most strict among: 565 * the amount of data that is estimated to be queued in the 566 bottleneck buffer (data_in_flight - estimated_BDP): the objective 567 is to maintain this amount at or below 1.5 * estimated_BDP 569 * the packet loss rate: the objective is a maximum per-round-trip 570 packet loss rate of BBRLossThresh=2% (and an average packet loss 571 rate considerably lower) 573 3.3. State Machine Overview 575 BBR varies its control parameters with a simple state machine that 576 aims for high throughput, low latency, and an approximately fair 577 sharing of bandwidth, while maintaining an up-to-date model of the 578 network path. 580 A BBR flow starts in the Startup state, and ramps up its sending rate 581 quickly, to rapidly estimate the maximum available bandwidth 582 (BBR.max_bw). When it estimates the bottleneck bandwidth has been 583 fully utilized, it enters the Drain state to drain the estimated 584 queue. In steady state a BBR flow mostly uses the ProbeBW states, to 585 periodically briefly send faster to probe for higher capacity and 586 then briefly send slower to try to drain any resulting queue. If 587 needed, it briefly enters the ProbeRTT state, to lower the sending 588 rate to probe for lower BBR.min_rtt samples. The detailed behavior 589 for each state is described below. 591 3.4. Network Path Model Overview 593 3.4.1. High-Level Design Goals for the Network Path Model 595 At a high level, the BBR model is trying to reflect two aspects of 596 the network path: 598 * Model what's required for achieving full throughput: Estimate the 599 minimum data rate and data volume required to fully utilize the 600 fair share of the bottleneck bandwidth available to the flow. 601 This must incorporate estimates of the maximum available bandwidth 602 (BBR.max_bw), the BDP of the path (BBR.bdp), and the requirements 603 of any offload features on the end hosts or mechanisms on the 604 network path that produce aggregation effects (summing up to 605 BBR.max_inflight). 607 * Model what's permitted for achieving low queue pressure: Estimate 608 the maximum data rate (BBR.bw) and data volume (cwnd) consistent 609 with the queue pressure objective, as measured by the estimated 610 degree of queuing and packet loss. 
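As a concrete, non-normative illustration of these two aspects, the
sketch below (in Python; the helper names and the example bandwidth,
RTT, and offload values are assumptions, not part of this
specification) combines the rate parameters into BBR.bw via min(), as
described in Section 2.9.1, and combines the data volume inputs
listed in Section 2.9.2 into one plausible BBR.max_inflight estimate:

   # Non-normative sketch: one way the model parameters above can be
   # combined (example values and helper names are illustrative only).

   SMSS = 1514                   # sender maximum segment size, in bytes
   BBRMinPipeCwnd = 4 * SMSS     # minimal cwnd target: 4 * SMSS

   def model_bw(max_bw, bw_hi, bw_lo):
       """BBR.bw: the min() of max_bw, bw_hi, and bw_lo."""
       return min(max_bw, bw_hi, bw_lo)

   def model_max_inflight(bw, min_rtt, extra_acked, offload_budget):
       """A plausible BBR.max_inflight: the BDP plus the aggregation
       estimate, floored by the offload budget and BBRMinPipeCwnd."""
       bdp = bw * min_rtt        # BBR.bdp = BBR.bw * BBR.min_rtt
       return max(bdp + extra_acked, offload_budget, BBRMinPipeCwnd)

   # Example: 100 Mbit/s of available bandwidth (12.5e6 bytes/sec)
   # and a 40 ms min_rtt, with no aggregation or offload requirements.
   bw = model_bw(max_bw=12.5e6, bw_hi=float("inf"), bw_lo=float("inf"))
   print(model_max_inflight(bw, 0.040, 0, 0))   # -> 500000.0 bytes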
612 Note that those two aspects are in tension: the highest throughput is
613 available to the flow when it sends as fast as possible and occupies
614 as many bottleneck buffer slots as possible; the lowest queue pressure
615 is achieved by the flow when it sends as slowly as possible and
616 occupies as few bottleneck buffer slots as possible. To resolve the
617 tension, the algorithm aims to achieve the maximum throughput
618 possible while still meeting the queue pressure objective.

620 3.4.2. Time Scales for the Network Model

622 At a high level, the BBR model is trying to reflect the properties of
623 the network path on two different time scales:

625 3.4.2.1. Long-term model

627 One goal is for BBR to maintain high average utilization of the fair
628 share of the available bandwidth, over long time intervals. This
629 requires estimates of the path's data rate and volume capacities that
630 are robust over long time intervals. This means being robust to
631 congestion signals that may be noisy or may reflect short-term
632 congestion that has already abated by the time an ACK arrives. This
633 also means providing a robust history of the best recently-achievable
634 performance on the path so that the flow can quickly and robustly aim
635 to re-probe that level of performance whenever it decides to probe
636 the capacity of the path.

638 3.4.2.2. Short-term model

640 A second goal of BBR is to react to every congestion signal,
641 including loss, as if it may indicate a persistent/long-term increase
642 in congestion and/or decrease in the bandwidth available to the flow,
643 because that may indeed be the case.

645 3.4.2.3. Time Scale Strategy

647 BBR sequentially alternates between spending most of its time using
648 short-term models to conservatively respect all congestion signals in
649 case they represent persistent congestion, and periodically using its
650 long-term model to robustly probe the limits of the available path
651 capacity in case the congestion has abated and more capacity is
652 available.

654 3.5. Control Parameter Overview

656 BBR uses its model to control the connection's sending behavior.
657 Rather than using a single control parameter, like the cwnd parameter
658 that limits the volume of in-flight data in the Reno and CUBIC
659 congestion control algorithms, BBR uses three distinct control
660 parameters:

662 1. pacing rate: the maximum rate at which BBR sends data.

664 2. send quantum: the maximum size of any aggregate that the
665 transport sender implementation may need to transmit as a unit to
666 amortize per-packet transmission overheads.

668 3. cwnd: the maximum volume of data BBR allows in-flight in the
669 network at any time.

671 3.6. Environment and Usage

673 BBR is a congestion control algorithm that is agnostic to transport-
674 layer and link-layer technologies, requires only sender-side changes,
675 and does not require changes in the network. Open source
676 implementations of BBR are available for the TCP [RFC793] and QUIC
677 [RFC9000] transport protocols, and these implementations have been
678 used in production for a large volume of Internet traffic. An open
679 source implementation of BBR is also available for DCCP [RFC4340]
680 [draft-romo-iccrg-ccid5].

682 4. Detailed Algorithm

684 4.1. State Machine

686 BBR implements a state machine that uses the network path model to
687 guide its decisions, and the control parameters to enact its
688 decisions.

690 4.1.1. State Transition Diagram
692 The following state transition diagram summarizes the flow of control
693 and the relationship between the different states:

695             |
696             V
697     +---> Startup  ------------+
698     |       |                  |
699     |       V                  |
700     |     Drain  --------------+
701     |       |                  |
702     |       V                  |
703     +---> ProbeBW_DOWN  -------+
704     | ^     |                  |
705     | |     V                  |
706     | |   ProbeBW_CRUISE ------+
707     | |     |                  |
708     | |     V                  |
709     | |   ProbeBW_REFILL ------+
710     | |     |                  |
711     | |     V                  |
712     | |   ProbeBW_UP ----------+
713     | |     |                  |
714     | +-----+                  |
715     |                          |
716     +---- ProbeRTT <-----------+

718 4.1.2. State Machine Operation Overview

720 When starting up, BBR probes to try to quickly build a model of the
721 network path; to adapt to later changes to the path or its traffic,
722 BBR must continue to probe to update its model. If the available
723 bottleneck bandwidth increases, BBR must send faster to discover
724 this. Likewise, if the round-trip propagation delay changes, this
725 changes the BDP, and thus BBR must send slower to get inflight below
726 the new BDP in order to measure the new BBR.min_rtt. Thus, BBR's
727 state machine runs periodic, sequential experiments, sending faster
728 to check for BBR.bw increases or sending slower to yield bandwidth,
729 drain the queue, and check for BBR.min_rtt decreases. The frequency,
730 magnitude, duration, and structure of these experiments differ
731 depending on what's already known (startup or steady-state) and
732 application sending behavior (intermittent or continuous).

734 This state machine has several goals:

736 * Achieve high throughput by efficiently utilizing available
737   bandwidth.

739 * Achieve low latency and packet loss rates by keeping queues
740   bounded and small.

742 * Share bandwidth with other flows in an approximately fair manner.

744 * Feed samples to the model estimators to refresh and update the
745   model.

747 4.1.3. State Machine Tactics

749 In the BBR framework, at any given time the sender can choose one of
750 the following tactics:

752 * Acceleration: Send faster than the network is delivering data: to
753   probe the maximum bandwidth available to the flow

755 * Cruising: Send at the same rate the network is delivering data:
756   try to match the sending rate to the flow's current available
757   bandwidth, to try to achieve high utilization of the available
758   bandwidth without increasing queue pressure

760 * Deceleration: Send slower than the network is delivering data: to
761   reduce the amount of data in flight, with a number of overlapping
762   motivations:

764   - Reducing queuing delay: to reduce queuing delay, to reduce
765     latency for request/response cross-traffic (e.g. RPC, web
766     traffic).

768   - Reducing packet loss: to reduce packet loss, to reduce tail
769     latency for request/response cross-traffic (e.g. RPC, web
770     traffic) and improve coexistence with Reno/CUBIC.

772   - Probing BBR.min_rtt: to probe the path's BBR.min_rtt

774   - Bandwidth convergence: to aid bandwidth fairness convergence,
775     by leaving unused capacity in the bottleneck link or bottleneck
776     buffer, to allow other flows that may have lower sending rates
777     to discover and utilize the unused capacity

779   - Burst tolerance: to allow bursty arrivals of cross-traffic
780     (e.g. short web or RPC requests) to be able to share the
781     bottleneck link without causing excessive queuing delay or
782     packet loss

784 Throughout its lifetime, a BBR flow sequentially cycles through
785 all three tactics, to measure the network path and try to optimize
786 its operating point.
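As a rough, non-normative illustration of these three tactics, the
following sketch (in Python; the packet size and bandwidth are
assumed example values, and the gain values are the ProbeBW_UP,
ProbeBW_CRUISE, and ProbeBW_DOWN pacing gains described later in this
document) shows how a pacing_gain above, equal to, or below 1.0
translates into inter-packet spacing relative to BBR.bw:

   # Non-normative sketch: how each tactic's pacing_gain maps to the
   # spacing between packet transmissions, relative to BBR.bw.
   # (Ignores the 1% BBRPacingMarginPercent discount for simplicity.)

   PACKET_SIZE = 1514        # bytes per packet (assumed example value)

   TACTIC_GAIN = {
       "acceleration": 1.25,  # send faster than the delivery rate
       "cruising":     1.00,  # match the estimated delivery rate
       "deceleration": 0.90,  # send slower, letting inflight drain
   }

   def interpacket_interval(bw, tactic):
       """Seconds between departures when pacing at pacing_gain * BBR.bw,
       where bw is in bytes per second."""
       return PACKET_SIZE / (TACTIC_GAIN[tactic] * bw)

   bw = 12.5e6   # example BBR.bw: 100 Mbit/s expressed in bytes/sec
   for tactic in TACTIC_GAIN:
       print(tactic, round(interpacket_interval(bw, tactic) * 1e6), "us")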
788 BBR's state machine uses two control mechanisms. Primarily, it uses
789 the pacing_gain (see the "Pacing Rate" section), which controls how
790 fast packets are sent relative to BBR.bw. A pacing_gain > 1
791 decreases inter-packet time and increases inflight. A pacing_gain <
792 1 has the opposite effect, increasing inter-packet time while
793 aiming to decrease inflight. Second, if the state machine needs to
794 quickly reduce inflight to a particular absolute value, it uses the
795 cwnd.

797 4.2. Algorithm Organization

799 The BBR algorithm is an event-driven algorithm that executes steps
800 upon the following events: connection initialization, upon each ACK,
801 upon the transmission of each quantum, and upon loss detection
802 events. All of the sub-steps invoked below are described
803 below.

805 4.2.1. Initialization

807 Upon transport connection initialization, BBR executes its
808 initialization steps:

810 BBROnInit():
811   init_windowed_max_filter(filter=BBR.MaxBwFilter, value=0, time=0)
812   BBR.min_rtt = SRTT ? SRTT : Inf
813   BBR.min_rtt_stamp = Now()
814   BBR.probe_rtt_done_stamp = 0
815   BBR.probe_rtt_round_done = false
816   BBR.prior_cwnd = 0
817   BBR.idle_restart = false
818   BBR.extra_acked_interval_start = Now()
819   BBR.extra_acked_delivered = 0
820   BBRResetCongestionSignals()
821   BBRResetLowerBounds()
822   BBRInitRoundCounting()
823   BBRInitFullPipe()
824   BBRInitPacingRate()
825   BBREnterStartup()

827 4.2.2. Per-Transmit Steps

829 When transmitting, BBR merely needs to check for the case where the
830 flow is restarting from idle:

832 BBROnTransmit():
833   BBRHandleRestartFromIdle()

835 4.2.3. Per-ACK Steps

837 On every ACK, the BBR algorithm executes the following
838 BBRUpdateOnACK() steps in order to update its network path model,
839 update its state machine, and adjust its control parameters to adapt
840 to the updated model:

842 BBRUpdateOnACK():
843   BBRUpdateModelAndState()
844   BBRUpdateControlParameters()

846 BBRUpdateModelAndState():
847   BBRUpdateLatestDeliverySignals()
848   BBRUpdateCongestionSignals()
849   BBRUpdateACKAggregation()
850   BBRCheckStartupDone()
851   BBRCheckDrain()
852   BBRUpdateProbeBWCyclePhase()
853   BBRUpdateMinRTT()
854   BBRCheckProbeRTT()
855   BBRAdvanceLatestDeliverySignals()
856   BBRBoundBWForModel()

858 BBRUpdateControlParameters():
859   BBRSetPacingRate()
860   BBRSetSendQuantum()
861   BBRSetCwnd()

863 4.2.4. Per-Loss Steps

865 On every packet loss event, where some sequence range "packet" is
866 marked lost, the BBR algorithm executes the following
867 BBRUpdateOnLoss() steps in order to update its network path model:

869 BBRUpdateOnLoss(packet):
870   BBRHandleLostPacket(packet)

872 4.3. State Machine Operation

874 4.3.1. Startup

875 4.3.1.1. Startup Dynamics

877 When a BBR flow starts up, it performs its first (and most rapid)
878 sequential probe/drain process in the Startup and Drain states.
879 Network link bandwidths currently span a range of at least 11 orders
880 of magnitude, from a few bps to 200 Gbps. To quickly learn
881 BBR.max_bw, given this huge range to explore, BBR's Startup state
882 does an exponential search of the rate space, doubling the sending
883 rate each round. This finds BBR.max_bw in O(log_2(BDP)) round trips.
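As a non-normative illustration of this exponential search, the
sketch below (in Python; the initial sending rate and the bottleneck
bandwidth are assumed example values) estimates how many packet-timed
round trips a rate-doubling Startup needs before its sending rate
first reaches a given bottleneck bandwidth:

   # Non-normative sketch: rounds needed by a rate-doubling Startup to
   # first reach a given bottleneck bandwidth (O(log2) in the ratio).

   import math

   def startup_rounds(initial_rate, bottleneck_bw):
       """Smallest r such that initial_rate * 2**r >= bottleneck_bw."""
       if initial_rate >= bottleneck_bw:
           return 0
       return math.ceil(math.log2(bottleneck_bw / initial_rate))

   # Example: roughly 10 packets per 100 ms round trip (~1.2 Mbit/s)
   # growing toward a 10 Gbit/s bottleneck.
   initial = 10 * 1514 * 8 / 0.100        # bits per second
   print(startup_rounds(initial, 10e9))   # -> 14 rounds in this example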
885 To achieve this rapid probing in the smoothest possible fashion, in 886 Startup BBR uses the minimum gain values that will allow the sending 887 rate to double each round: in Startup BBR sets BBR.pacing_gain to 888 BBRStartupPacingGain (2.77) [BBRStartupPacingGain] and BBR.cwnd_gain 889 to BBRStartupCwndGain (2). 891 When initializing a connection, or upon any later entry into Startup 892 mode, BBR executes the following BBREnterStartup() steps: 894 BBREnterStartup(): 895 BBR.state = Startup 896 BBR.pacing_gain = BBRStartupPacingGain 897 BBR.cwnd_gain = BBRStartupCwndGain 899 As BBR grows its sending rate rapidly, it obtains higher delivery 900 rate samples, BBR.max_bw increases, and the pacing rate and cwnd both 901 adapt by smoothly growing in proportion. Once the pipe is full, a 902 queue typically forms, but the cwnd_gain bounds any queue to 903 (cwnd_gain - 1) * estimated_BDP, which is approximately (2.77 - 1) * 904 estimated_BDP = 1.77 * estimated_BDP. The immediately following 905 Drain state is designed to quickly drain that queue. 907 During Startup, BBR estimates whether the pipe is full using two 908 estimators. The first looks for a plateau in the BBR.max_bw 909 estimate. The second looks for packet loss. The following 910 subsections discuss these estimators. 912 BBRCheckStartupDone(): 913 BBRCheckStartupFullBandwidth() 914 BBRCheckStartupHighLoss() 915 if (BBR.state == Startup and BBR.filled_pipe) 916 BBREnterDrain() 918 4.3.1.2. Exiting Startup Based on Bandwidth Plateau 920 During Startup, BBR estimates whether the pipe is full by looking for 921 a plateau in the BBR.max_bw estimate. The output of this "full pipe" 922 estimator is tracked in BBR.filled_pipe, a boolean that records 923 whether BBR estimates that it has ever fully utilized its available 924 bandwidth ("filled the pipe"). If BBR notices that there are several 925 (three) rounds where attempts to double the delivery rate actually 926 result in little increase (less than 25 percent), then it estimates 927 that it has reached BBR.max_bw, sets BBR.filled_pipe to true, exits 928 Startup and enters Drain. 930 Upon connection initialization the full pipe estimator runs: 932 BBRInitFullPipe(): 933 BBR.filled_pipe = false 934 BBR.full_bw = 0 935 BBR.full_bw_count = 0 937 Once per round trip, upon an ACK that acknowledges new data, and when 938 the delivery rate sample is not application-limited (see [draft- 939 cheng-iccrg-delivery-rate-estimation]), BBR runs the "full pipe" 940 estimator, if needed: 942 BBRCheckStartupFullBandwidth(): 943 if BBR.filled_pipe or 944 !BBR.round_start or rs.is_app_limited 945 return /* no need to check for a full pipe now */ 946 if (BBR.max_bw >= BBR.full_bw * 1.25) /* still growing? */ 947 BBR.full_bw = BBR.max_bw /* record new baseline level */ 948 BBR.full_bw_count = 0 949 return 950 BBR.full_bw_count++ /* another round w/o much growth */ 951 if (BBR.full_bw_count >= 3) 952 BBR.filled_pipe = true 954 BBR waits three rounds to have solid evidence that the sender is not 955 detecting a delivery-rate plateau that was temporarily imposed by the 956 receive window. Allowing three rounds provides time for the 957 receiver's receive-window auto-tuning to open up the receive window 958 and for the BBR sender to realize that BBR.max_bw should be higher: 959 in the first round the receive-window auto-tuning algorithm grows the 960 receive window; in the second round the sender fills the higher 961 receive window; in the third round the sender gets higher delivery- 962 rate samples. 
This three-round threshold was validated by YouTube
963 experimental data.

965 4.3.1.3. Exiting Startup Based on Packet Loss

967 A second method BBR uses for estimating whether the bottleneck is
968 full is to look at sustained packet losses, specifically a case where
969 the following criteria are all met:

971 * The connection has been in fast recovery for at least one full
972   round trip.

974 * The loss rate over the time scale of a single full round trip
975   exceeds BBRLossThresh (2%).

977 * There are at least BBRStartupFullLossCnt=3 discontiguous sequence
978   ranges lost in that round trip.

980 If these criteria are all met, then BBRCheckStartupHighLoss() sets
981 BBR.filled_pipe = true and exits Startup and enters Drain.

983 The algorithm waits until all three criteria are met to filter out
984 noise from burst losses, and to try to ensure that the bottleneck is
985 fully utilized on a sustained basis and that the full bottleneck
986 bandwidth has been measured, before attempting to drain the level of
987 in-flight data to the estimated BDP.

989 4.3.2. Drain

991 Upon exiting Startup, BBR enters its Drain state. In Drain, BBR aims
992 to quickly drain any queue created in Startup by switching to a
993 pacing_gain well below 1.0, until any estimated queue has been
994 drained. It uses a pacing_gain that is the inverse of the value used
995 during Startup, chosen to try to drain the queue in one round
996 [BBRDrainPacingGain]:

998 BBREnterDrain():
999   BBR.state = Drain
1000   BBR.pacing_gain = 1/BBRStartupCwndGain  /* pace slowly */
1001   BBR.cwnd_gain = BBRStartupCwndGain      /* maintain cwnd */

1003 In Drain, when the amount of data in flight is less than or equal to
1004 the estimated BDP, meaning BBR estimates that the queue has been
1005 fully drained, then BBR exits Drain and enters ProbeBW. To implement
1006 this, upon every ACK BBR executes:

1008 BBRCheckDrain():
1009   if (BBR.state == Drain and packets_in_flight <= BBRInflight(1.0))
1010     BBREnterProbeBW()  /* BBR estimates the queue was drained */

1012 4.3.3. ProbeBW

1014 Long-lived BBR flows tend to spend the vast majority of their time in
1015 the ProbeBW states. In the ProbeBW states, a BBR flow sequentially
1016 accelerates, decelerates, and cruises, to measure the network path,
1017 improve its operating point (increase throughput and reduce queue
1018 pressure), and converge toward a more fair allocation of bottleneck
1019 bandwidth. To do this, the flow sequentially cycles through all
1020 three tactics: trying to send faster than, slower than, and at the
1021 same rate as the network delivery process. To achieve this, a BBR
1022 flow in ProbeBW mode cycles through the four ProbeBW states - DOWN,
1023 CRUISE, REFILL, and UP - described below in turn.

1025 4.3.3.1. ProbeBW_DOWN

1027 In the ProbeBW_DOWN phase of the cycle, a BBR flow pursues the
1028 deceleration tactic, to try to send slower than the network is
1029 delivering data, to reduce the amount of data in flight, with all of
1030 the standard motivations for the deceleration tactic (discussed in
1031 "State Machine Tactics", above). It does this by switching to a
1032 BBR.pacing_gain of 0.9, sending at 90% of BBR.bw. The pacing_gain
1033 value of 0.9 is derived based on the ProbeBW_UP pacing gain of 1.25,
1034 as the minimum pacing_gain value that allows bandwidth-based
1035 convergence to approximate fairness.
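As a non-normative illustration, the following sketch (in Python; the
bandwidth is an assumed example value) shows the ProbeBW_DOWN pacing
rate that results when the 0.9 pacing_gain is applied to BBR.bw,
assuming the dynamic gain and the static BBRPacingMarginPercent
discount from Section 2.5 are applied multiplicatively:

   # Non-normative sketch: the pacing rate implied by the ProbeBW_DOWN
   # pacing_gain of 0.9, assuming the static 1% margin is applied
   # multiplicatively on top of the dynamic gain.

   BBRPacingMarginPercent = 1    # static 1% discount (Section 2.5)

   def pacing_rate(bw, pacing_gain):
       """Scale BBR.bw by the dynamic gain and the static margin."""
       return pacing_gain * bw * (100 - BBRPacingMarginPercent) / 100.0

   bw = 12.5e6                 # example BBR.bw: 100 Mbit/s in bytes/sec
   print(pacing_rate(bw, 0.9)) # ProbeBW_DOWN: ~11137500.0 bytes/sec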
1037 Exit conditions: The flow exits this phase and enters CRUISE when the 1038 flow estimates that both of the following conditions have been met: 1040 * There is free headroom: If inflight_hi is set, then BBR remains in 1041 DOWN at least until the volume of in-flight data is less than or 1042 equal to BBRHeadroom*BBR.inflight_hi. The goal of this constraint 1043 is to ensure that in cases where loss signals suggest an upper 1044 limit on the volume of in-flight data, then the flow attempts to 1045 leave some free headroom in the path (e.g. free space in the 1046 bottleneck buffer or free time slots in the bottleneck link) that 1047 can be used by cross traffic (both for volume-based convergence of 1048 bandwidth shares and for burst tolerance). 1050 * The volume of in-flight data is less than or equal to BBR.BDP, 1051 i.e. the flow estimates that it has drained any queue at the 1052 bottleneck. 1054 4.3.3.2. ProbeBW_CRUISE 1056 In the ProbeBW_CRUISE phase of the cycle, a BBR flow pursues the 1057 "cruising" tactic (discussed in "State Machine Tactics", above), 1058 attempting to send at the same rate the network is delivering data. 1059 It tries to match the sending rate to the flow's current available 1060 bandwidth, to try to achieve high utilization of the available 1061 bandwidth without increasing queue pressure. It does this by 1062 switching to a pacing_gain of 1.0, sending at 100% of BBR.bw. 1063 Notably, while in this state it responds to concrete congestion 1064 signals (loss) by reducing BBR.bw_lo and BBR.inflight_lo, because 1065 these signals suggest that the available bandwidth and deliverable 1066 volume of in-flight data have likely reduced, and the flow needs to 1067 change to adapt, slowing down to match the latest delivery process. 1069 Exit conditions: The connection adaptively holds this state until it 1070 decides that it is time to probe for bandwidth, at which time it 1071 enters ProbeBW_REFILL (see "Time Scale for Bandwidth Probing", 1072 below). 1074 4.3.3.3. ProbeBW_REFILL 1076 The goal of the ProbeBW_REFILL state is to "refill the pipe", to try 1077 to fully utilize the network bottleneck without creating any 1078 significant queue pressure. 1080 To do this, BBR first resets the short-term model parameters bw_lo 1081 and inflight_lo, setting both to "Infinity". This is the key moment 1082 in the BBR time scale strategy (see "Time Scale Strategy", above) 1083 where the flow pivots, discarding its conservative short-term bw_lo 1084 and inflight_lo parameters and beginning to robustly probe the 1085 bottleneck's long-term available bandwidth. During this time bw_hi 1086 and inflight_hi, if set, constrain the connection. 1088 During ProbeBW_REFILL BBR uses a BBR.pacing_gain of 1.0, to send at a 1089 rate that matches the current estimated available bandwidth, for one 1090 packet-timed round trip. The goal is to fully utilize the bottleneck 1091 link before transitioning into ProbeBW_UP and significantly 1092 increasing the chances of causing a loss signal. The motivating 1093 insight is that, as soon as a flow starts acceleration, sending 1094 faster than the available bandwidth, it will start building a queue 1095 at the bottleneck. And if the buffer is shallow enough, then the 1096 flow can cause loss signals very shortly after the first accelerating 1097 packets arrive at the bottleneck. 
If the flow were to neglect to
1098 fill the pipe before it causes this loss signal, then these very
1099 quick signals of excess queue could cause the flow's estimate of the
1100 path's capacity (i.e. inflight_hi) to be a significant underestimate.
1101 In particular, if the flow were to transition directly from
1102 ProbeBW_CRUISE to ProbeBW_UP, the volume of in-flight data (at the
1103 time the first accelerating packets were sent) may often still be
1104 very close to the volume of in-flight data maintained in CRUISE,
1105 which may be only BBRHeadroom*inflight_hi.

1107 Exit conditions: The flow exits ProbeBW_REFILL after one packet-timed
1108 round trip, and enters UP. This is because after one full round trip
1109 of sending in ProbeBW_REFILL the flow (if not application-limited)
1110 has had an opportunity to place as many packets in flight as its
1111 BBR.bw estimate permits. Correspondingly, at this point the flow
1112 starts to see bandwidth samples reflecting its ProbeBW_REFILL
1113 behavior, which may be putting too much data in flight.

1115 4.3.3.4. ProbeBW_UP

1117 After ProbeBW_REFILL refills the pipe, ProbeBW_UP probes for possible
1118 increases in available bandwidth by using a BBR.pacing_gain of 1.25,
1119 sending faster than the current estimated available bandwidth.

1121 If the flow has not set BBR.inflight_hi or BBR.bw_hi, it tries to
1122 raise the volume of in-flight data to at least BBR.pacing_gain *
1123 BBR.bdp = 1.25 * BBR.bdp; note that this may take more than
1124 BBR.min_rtt if BBR.min_rtt is small (e.g. on a LAN).

1126 If the flow has set BBR.inflight_hi or BBR.bw_hi, it moves to an
1127 operating point based on those limits and then gradually increases
1128 the upper volume bound (BBR.inflight_hi) and rate bound (BBR.bw_hi)
1129 using the following approach:

1131 * bw_hi: The flow raises bw_hi to the latest measured bandwidth
1132   sample if the latest measured bandwidth sample is above bw_hi and
1133   the loss rate for the sample is not above BBRLossThresh.

1135 * inflight_hi: The flow raises inflight_hi in ProbeBW_UP in a manner
1136   that is slow and cautious at first, but increasingly rapid and
1137   bold over time. The initial caution is motivated by the fact that
1138   a given BBR flow may be sharing a shallow buffer with thousands of
1139   other flows, so that the buffer space available to the flow may be
1140   quite tight - even just a single packet. The increasingly rapid
1141   growth over time is motivated by the fact that in a high-speed WAN
1142   the increase in available bandwidth (and thus the estimated BDP)
1143   may require the flow to grow the volume of its inflight data by up
1144   to O(1,000,000); even a quite typical BDP like 10Gbps * 100ms is
1145   82,563 packets. BBR takes an approach where the additive increase
1146   to BBR.inflight_hi exponentially doubles each round trip; in each
1147   successive round trip, inflight_hi grows by 1, 2, 4, 8, 16, etc.,
1148   with the increases spread uniformly across the entire round trip.
1149   This helps allow BBR to utilize a larger BDP in O(log(BDP)) round
1150   trips, meeting the design goal for scalable utilization of newly-
1151   available bandwidth.

1153 Exit conditions: The BBR flow ends ProbeBW_UP bandwidth probing and
1154 transitions to ProbeBW_DOWN to try to drain the bottleneck queue when
1155 any of the following conditions are met:

1157 1. Estimated queue: The flow has been in ProbeBW_UP for at least
1158 1*min_rtt, and the estimated queue is high enough that the flow
1159 judges it has robustly probed for available bandwidth
1160 (packets_in_flight > 1.25 * BBR.bdp).

1162 2. Loss: The current loss rate exceeds BBRLossThresh (2%).

1164 4.3.3.5. Time Scale for Bandwidth Probing

1166 Choosing the time scale for probing bandwidth is tied to the question
1167 of how to coexist with legacy Reno/CUBIC flows, since probing for
1168 bandwidth runs a significant risk of causing packet loss, and causing
1169 packet loss can significantly limit the throughput of such legacy
1170 Reno/CUBIC flows.

1172 4.3.3.5.1. Bandwidth Probing and Coexistence with Reno/CUBIC

1174 BBR has an explicit strategy for coexistence with Reno/CUBIC: to try
1175 to behave in a manner so that Reno/CUBIC flows coexisting with BBR
1176 can continue to work well in the primary contexts where they do
1177 today:

1179 * Intra-datacenter/LAN traffic: we want Reno/CUBIC to be able to
1180   perform well in 100M through 40G enterprise and datacenter
1181   Ethernet
1182   - BDP = 40 Gbps * 20 us / (1514 bytes) ~= 66 packets

1184 * Public Internet last mile traffic: we want Reno/CUBIC to be able
1185   to support up to 25Mbps (for 4K Video) at an RTT of 30ms, typical
1186   parameters for common CDNs for large video services:

1188   - BDP = 25Mbps * 30 ms / (1514 bytes) ~= 62 packets

1190 The challenge in meeting these goals is that Reno/CUBIC need long
1191 periods of no loss to utilize large BDPs. The good news is that in
1192 the environments where Reno/CUBIC work well today (mentioned above),
1193 the BDPs are small, roughly ~100 packets or less.

1195 4.3.3.5.2. A Dual-Time-Scale Approach for Coexistence

1197 The BBR strategy has several aspects:

1199 1. The highest priority is to estimate the bandwidth available to
1200    the BBR flow in question.

1202 2. Secondarily, a given BBR flow adapts (within bounds) the
1203    frequency at which it probes bandwidth and knowingly risks packet
1204    loss, to allow Reno/CUBIC to reach a bandwidth at least as high
1205    as that of the given BBR flow.

1207 To adapt the frequency of bandwidth probing, BBR considers two time
1208 scales: a BBR-native time scale, and a bounded Reno-conscious time
1209 scale:

1211 * T_bbr: BBR-native time-scale

1213   - T_bbr = uniformly randomly distributed between 2 and 3 secs

1215 * T_reno: Reno-coexistence time scale

1217   - T_reno_bound = pick_randomly_either({62, 63})

1219   - reno_bdp = min(BBR.bdp, cwnd)

1221   - T_reno = min(reno_bdp, T_reno_bound) round trips

1223 * T_probe: The time between bandwidth probe UP phases:

1225   - T_probe = min(T_bbr, T_reno)

1227 This dual-time-scale approach is similar to that used by CUBIC, which
1228 has a CUBIC-native time scale given by a cubic curve, and a "Reno
1229 emulation" module that estimates what cwnd would give the flow Reno-
1230 equivalent throughput. At any given moment, CUBIC chooses the cwnd
1231 implied by the more aggressive strategy.

1233 We randomize both the T_bbr and T_reno parameters, for better mixing
1234 and fairness convergence.

1236 4.3.3.5.3. Design Considerations for Choosing Constant Parameters

1238 We design the wall-clock bounds of the BBR-native inter-
1239 bandwidth-probe time, T_bbr, to be:

1241 * Higher than 2 sec to try to avoid causing loss for a long enough
1242   time to allow a Reno flow with RTT=30ms to get 25Mbps (4K video)
1243   throughput. For this workload, given the Reno sawtooth that
For this workload, given the Reno sawtooth that 1244 raises cwnd from roughly BDP to 2*BDP, growing by one MSS per round trip, the 1245 inter-bandwidth-probe time must be at least: BDP * RTT = 25Mbps * 1246 .030 sec / (1514 bytes) * 0.030 sec = 1.9 secs 1248 * Lower than 3 sec to ensure flows can start probing in a reasonable 1249 amount of time to discover unutilized bw on human-scale 1250 interactive time-scales (e.g. perhaps traffic from a competing web 1251 page download is now complete). 1253 The maximum round-trip bounds of the Reno-coexistence time scale, 1254 T_reno, are chosen to be 62-63 with the following considerations in 1255 mind: 1257 * Choosing a value smaller than roughly 60 would imply that when BBR 1258 flows coexisted with Reno/CUBIC flows (e.g. Netflix Reno flows) 1259 on public Internet broadband links, the Reno/CUBIC flows would not 1260 be able to achieve enough bandwidth to show 4K video. 1262 * Choosing a value larger than roughly 65 would prevent BBR from 1263 reaching its goal of tolerating 1% loss per round trip. Given 1264 that the steady-state (non-bandwidth-probing) BBR response to a 1265 round trip with X% packet loss is to reduce the sending rate by X% 1266 (see the "Updating the Model Upon Packet Loss" section), this 1267 means that the BBR sending rate after N rounds of packet loss at a 1268 rate loss_rate is (1 - loss_rate)^N. This means that for a flow 1269 that encounters 1% loss in 65 round trips of ProbeBW_CRUISE, and 1270 then doubles its cwnd (back to BBR.inflight_hi) in ProbeBW_REFILL 1271 and ProbeBW_UP, it will be able to restore and reprobe its 1272 original sending rate, since: BBR.max_bw * (1 - loss_rate)^N * 2 = 1273 BBR.max_bw * (1 - .01)^65 * 2 ~= 1.04 * BBR.max_bw. That is, the flow 1274 will be able to fully respond to packet loss signals in 1275 ProbeBW_CRUISE while also fully re-measuring its maximum 1276 achievable throughput in ProbeBW_UP. 1278 The resulting behavior is that for BBR flows with small BDPs, the 1279 bandwidth probing will be on roughly the same time scale as Reno/ 1280 CUBIC; flows with large BDPs will intentionally probe more rapidly/ 1281 frequently than Reno/CUBIC would (roughly every 62 round trips for 1282 low-RTT flows, or 2-3 secs for high-RTT flows). 1284 The considerations above for timing bandwidth probing can be 1285 implemented as follows: 1287 /* Is it time to transition from DOWN or CRUISE to REFILL? */ 1288 BBRCheckTimeToProbeBW(): 1289 if (BBRHasElapsedInPhase(BBR.bw_probe_wait) || 1290 BBRIsRenoCoexistenceProbeTime()) 1291 BBRStartProbeBW_REFILL() 1292 return true 1293 return false 1295 /* Randomized decision about how long to wait until 1296 * probing for bandwidth, using round count and wall clock. 1297 */ 1298 BBRPickProbeWait(): 1299 /* Decide random round-trip bound for wait: */ 1300 BBR.rounds_since_bw_probe = 1301 random_int_between(0, 1); /* 0 or 1 */ 1302 /* Decide the random wall clock bound for wait: */ 1303 BBR.bw_probe_wait = 1304 2sec + random_float_between(0.0, 1.0) /* 0..1 sec */ 1306 BBRIsRenoCoexistenceProbeTime(): 1307 reno_rounds = BBRTargetInflight() 1308 rounds = min(reno_rounds, 63) 1309 return BBR.rounds_since_bw_probe >= rounds 1311 /* How much data do we want in flight? 1312 * Our estimated BDP, unless congestion cut cwnd. */ 1313 BBRTargetInflight(): 1314 return min(BBR.bdp, cwnd) 1316 4.3.3.6. ProbeBW Algorithm Details 1318 BBR's ProbeBW algorithm operates as follows. 
1320 Upon entering ProbeBW, BBR executes: 1322 BBREnterProbeBW(): 1323 BBRStartProbeBW_DOWN() 1325 The core logic for entering each state: 1327 BBRStartProbeBW_DOWN(): 1328 BBRResetCongestionSignals() 1329 BBR.probe_up_cnt = Infinity /* not growing inflight_hi */ 1330 BBRPickProbeWait() 1331 BBR.cycle_stamp = Now() /* start wall clock */ 1332 BBR.ack_phase = ACKS_PROBE_STOPPING 1333 BBRStartRound() 1334 BBR.state = ProbeBW_DOWN 1336 BBRStartProbeBW_CRUISE(): 1337 BBR.state = ProbeBW_CRUISE 1339 BBRStartProbeBW_REFILL(): 1340 BBRResetLowerBounds() 1341 BBR.bw_probe_up_rounds = 0 1342 BBR.bw_probe_up_acks = 0 1343 BBR.ack_phase = ACKS_REFILLING 1344 BBRStartRound() 1345 BBR.state = ProbeBW_REFILL 1347 BBRStartProbeBW_UP(): 1348 BBR.ack_phase = ACKS_PROBE_STARTING 1349 BBRStartRound() 1350 BBR.cycle_stamp = Now() /* start wall clock */ 1351 BBR.state = ProbeBW_UP 1352 BBRRaiseInflightHiSlope() 1354 BBR executes the following BBRUpdateProbeBWCyclePhase() logic on each 1355 ACK that ACKs or SACKs new data, to advance the ProbeBW state 1356 machine: 1358 /* The core state machine logic for ProbeBW: */ 1359 BBRUpdateProbeBWCyclePhase(): 1360 if (!BBR.filled_pipe) 1361 return /* only handling steady-state behavior here */ 1362 BBRAdaptUpperBounds() 1363 if (!IsInAProbeBWState()) 1364 return /* only handling ProbeBW states here: */ 1366 switch (state) 1368 ProbeBW_DOWN: 1369 if (BBRCheckTimeToProbeBW()) 1370 return /* already decided state transition */ 1371 if (BBRCheckTimeToCruise()) 1372 BBRStartProbeBW_CRUISE() 1374 ProbeBW_CRUISE: 1375 if (BBRCheckTimeToProbeBW()) 1376 return /* already decided state transition */ 1378 ProbeBW_REFILL: 1379 /* After one round of REFILL, start UP */ 1380 if (BBR.round_start) 1381 BBR.bw_probe_samples = 1 1382 BBRStartProbeBW_UP() 1384 ProbeBW_UP: 1385 if (BBRHasElapsedInPhase(BBR.min_rtt) and 1386 inflight > BBRInflight(BBR.max_bw, 1.25)) 1387 BBRStartProbeBW_DOWN() 1389 The ancillary logic to implement the ProbeBW state machine: 1391 IsInAProbeBWState() 1392 state = BBR.state 1393 return (state == ProbeBW_DOWN or 1394 state == ProbeBW_CRUISE or 1395 state == ProbeBW_REFILL or 1396 state == ProbeBW_UP) 1398 /* Time to transition from DOWN to CRUISE? */ 1399 BBRCheckTimeToCruise(): 1400 if (inflight > BBRInflightWithHeadroom()) 1401 return false /* not enough headroom */ 1402 if (inflight <= BBRInflight(BBR.max_bw, 1.0)) 1403 return true /* inflight <= estimated BDP */ 1405 BBRHasElapsedInPhase(interval): 1407 return Now() > BBR.cycle_stamp + interval 1409 /* Return a volume of data that tries to leave free 1410 * headroom in the bottleneck buffer or link for 1411 * other flows, for fairness convergence and lower 1412 * RTTs and loss */ 1413 BBRInflightWithHeadroom(): 1414 if (BBR.inflight_hi == Infinity) 1415 return Infinity 1416 headroom = max(1, BBRHeadroom * BBR.inflight_hi) 1417 return max(BBR.inflight_hi - headroom, 1418 BBRMinPipeCwnd) 1420 /* Raise inflight_hi slope if appropriate. */ 1421 BBRRaiseInflightHiSlope(): 1422 growth_this_round = 1MSS << BBR.bw_probe_up_rounds 1423 BBR.bw_probe_up_rounds = min(BBR.bw_probe_up_rounds + 1, 30) 1424 BBR.probe_up_cnt = max(cwnd / growth_this_round, 1) 1426 /* Increase inflight_hi if appropriate. 
*/ 1427 BBRProbeInflightHiUpward(): 1428 if (!is_cwnd_limited or cwnd < BBR.inflight_hi) 1429 return /* not fully using inflight_hi, so don't grow it */ 1430 BBR.bw_probe_up_acks += rs.newly_acked 1431 if (BBR.bw_probe_up_acks >= BBR.probe_up_cnt) 1432 delta = BBR.bw_probe_up_acks / BBR.probe_up_cnt 1433 BBR.bw_probe_up_acks -= delta * BBR.probe_up_cnt 1434 BBR.inflight_hi += delta 1435 if (BBR.round_start) 1436 BBRRaiseInflightHiSlope() 1438 /* Track ACK state and update BBR.max_bw window and 1439 * BBR.inflight_hi and BBR.bw_hi. */ 1440 BBRAdaptUpperBounds(): 1441 if (BBR.ack_phase == ACKS_PROBE_STARTING and BBR.round_start) 1442 /* starting to get bw probing samples */ 1443 BBR.ack_phase = ACKS_PROBE_FEEDBACK 1444 if (BBR.ack_phase == ACKS_PROBE_STOPPING and BBR.round_start) 1445 /* end of samples from bw probing phase */ 1446 if (IsInAProbeBWState() and !rs.is_app_limited) 1447 BBRAdvanceMaxBwFilter() 1449 if (!CheckInflightTooHigh()) 1450 /* Loss rate is safe. Adjust upper bounds upward. */ 1451 if (BBR.inflight_hi == Infinity or BBR.bw_hi == Infinity) 1452 return /* no upper bounds to raise */ 1453 if (rs.tx_in_flight > BBR.inflight_hi) 1454 BBR.inflight_hi = rs.tx_in_flight 1456 if (rs.delivery_rate > BBR.bw_hi) 1457 BBR.bw_hi = rs.delivery_rate 1458 if (BBR.state == ProbeBW_UP) 1459 BBRProbeInflightHiUpward() 1461 4.3.4. ProbeRTT 1463 4.3.4.1. ProbeRTT Overview 1465 To help probe for BBR.min_rtt, on an as-needed basis BBR flows enter 1466 the ProbeRTT state to try to cooperate to periodically drain the 1467 bottleneck queue - and thus improve their BBR.min_rtt estimate of the 1468 unloaded two-way propagation delay. 1470 A critical point is that before BBR raises its BBR.min_rtt estimate 1471 (which would in turn raise its maximum permissible cwnd), it first 1472 enters ProbeRTT to try to make a concerted and coordinated effort to 1473 drain the bottleneck queue and make a robust BBR.min_rtt measurement. 1474 This allows the BBR.min_rtt estimates of ensembles of BBR flows to 1475 converge, avoiding feedback loops of ever-increasing queues and RTT 1476 samples. 1478 The ProbeRTT state works in concert with BBR.min_rtt estimation. Up 1479 to once every ProbeRTTInterval = 5 seconds, the flow enters ProbeRTT, 1480 decelerating by setting its cwnd_gain to BBRProbeRTTCwndGain = 0.5 to 1481 reduce its volume of inflight data to half of its estimated BDP, to 1482 try to allow the flow to measure the unloaded two-way propagation 1483 delay. 1485 There are two main motivations for making the MinRTTFilterLen roughly 1486 twice the ProbeRTTInterval. First, this ensures that during a 1487 ProbeRTT episode the flow will "remember" the BBR.min_rtt value it 1488 measured during the previous ProbeRTT episode, providing a robust bdp 1489 estimate for the cwnd = 0.5*bdp calculation, increasing the 1490 likelihood of fully draining the bottleneck queue. Second, this 1491 allows the flow's BBR.min_rtt filter window to generally include RTT 1492 samples from two ProbeRTT episodes, providing a more robust estimate. 1494 The algorithm for ProbeRTT is as follows: 1496 Entry conditions: In any state other than ProbeRTT itself, if the 1497 BBR.probe_rtt_min_delay estimate has not been updated (i.e., by 1498 getting a lower RTT measurement) for more than ProbeRTTInterval = 5 1499 seconds, then BBR enters ProbeRTT and reduces the BBR.cwnd_gain to 1500 BBRProbeRTTCwndGain = 0.5. 
1502 Exit conditions: After maintaining the volume of in-flight data at 1503 BBRProbeRTTCwndGain*BBR.bdp for at least ProbeRTTDuration (200 ms) 1504 and at least one round trip, BBR leaves ProbeRTT and transitions to 1505 ProbeBW if it estimates the pipe was filled already, or Startup 1506 otherwise. 1508 4.3.4.2. ProbeRTT Design Rationale 1510 BBR is designed to have ProbeRTT sacrifice no more than roughly 2% of 1511 a flow's available bandwidth. It is also designed to spend the vast 1512 majority of its time (at least roughly 96 percent) in ProbeBW and the 1513 rest in ProbeRTT, based on a set of tradeoffs. ProbeRTT lasts long 1514 enough (at least ProbeRTTDuration = 200 ms) to allow flows with 1515 different RTTs to have overlapping ProbeRTT states, while still being 1516 short enough to bound the throughput penalty of ProbeRTT's cwnd 1517 capping to roughly 2%, with the average throughput targeted at: 1519 throughput = (200ms*0.5*BBR.bw + (5s - 200ms)*BBR.bw) / 5s 1520 = (.1s + 4.8s)/5s * BBR.bw = 0.98 * BBR.bw 1522 As discussed above, BBR's BBR.min_rtt filter window, MinRTTFilterLen, 1523 and time interval between ProbeRTT states, ProbeRTTInterval, work in 1524 concert. BBR uses a MinRTTFilterLen equal to or longer than 1525 ProbeRTTInterval to allow the filter window to include at least one 1526 ProbeRTT. 1528 To allow coordination with other BBR flows, each flow MUST use the 1529 standard ProbeRTTInterval of 5 secs. 1531 An ProbeRTTInterval of 5 secs is short enough to allow quick 1532 convergence if traffic levels or routes change, but long enough so 1533 that interactive applications (e.g., Web, remote procedure calls, 1534 video chunks) often have natural silences or low-rate periods within 1535 the window where the flow's rate is low enough for long enough to 1536 drain its queue in the bottleneck. Then the BBR.probe_rtt_min_delay 1537 filter opportunistically picks up these measurements, and the 1538 BBR.probe_rtt_min_delay estimate refreshes without requiring 1539 ProbeRTT. This way, flows typically need only pay the 2 percent 1540 throughput penalty if there are multiple bulk flows busy sending over 1541 the entire ProbeRTTInterval window. 1543 As an optimization, when restarting from idle and finding that the 1544 BBR.probe_rtt_min_delay has expired, BBR does not enter ProbeRTT; the 1545 idleness is deemed a sufficient attempt to coordinate to drain the 1546 queue. 1548 4.3.4.3. Calculating the rs.rtt RTT Sample 1550 Upon transmitting each packet, BBR (or the associated transport 1551 protocol) stores in per-packet data the wall-clock scheduled 1552 transmission time of the packet in packet.departure_time (see the 1553 "Pacing Rate: BBR.pacing_rate" section for how this is calculated). 1555 For every ACK that newly acknowledges some data (whether cumulatively 1556 or selectively), the sender's BBR implementation (or the associated 1557 transport protocol implementation) attempts to calculate an RTT 1558 sample. The sender MUST consider any potential retransmission 1559 ambiguities that can arise in some transport protocols. If some of 1560 the acknowledged data was not retransmitted, or some of the data was 1561 retransmitted but the sender can still unambiguously determine the 1562 RTT of the data (e.g. if the transport supports [RFC7323] TCP 1563 timestamps or an equivalent mechanism), then the sender calculates an 1564 RTT sample, rs.rtt, as follows: 1566 rs.rtt = Now() - packet.departure_time 1568 4.3.4.4. 
ProbeRTT Logic 1570 On every ACK BBR executes BBRUpdateMinRTT() to update its ProbeRTT 1571 scheduling state (BBR.probe_rtt_min_delay and 1572 BBR.probe_rtt_min_stamp) and its BBR.min_rtt estimate: 1574 BBRUpdateMinRTT(): 1575 BBR.probe_rtt_expired = 1576 Now() > BBR.probe_rtt_min_stamp + ProbeRTTInterval 1577 if (rs.rtt >= 0 and 1578 (rs.rtt < BBR.probe_rtt_min_delay or 1579 BBR.probe_rtt_expired)) 1580 BBR.probe_rtt_min_delay = rs.rtt 1581 BBR.probe_rtt_min_stamp = Now() 1583 min_rtt_expired = 1584 Now() > BBR.min_rtt_stamp + MinRTTFilterLen 1585 if (BBR.probe_rtt_min_delay < BBR.min_rtt or 1586 min_rtt_expired) 1587 BBR.min_rtt = BBR.probe_rtt_min_delay 1588 BBR.min_rtt_stamp = BBR.probe_rtt_min_stamp 1590 Here BBR.probe_rtt_expired is a boolean recording whether the 1591 BBR.probe_rtt_min_delay has expired and is due for a refresh, via 1592 either an application idle period or a transition into ProbeRTT 1593 state. 1595 On every ACK BBR executes BBRCheckProbeRTT() to handle the steps 1596 related to the ProbeRTT state as follows: 1598 BBRCheckProbeRTT(): 1599 if (BBR.state != ProbeRTT and 1600 BBR.probe_rtt_expired and 1601 not BBR.idle_restart) 1602 BBREnterProbeRTT() 1603 BBR.prior_cwnd = BBRSaveCwnd() 1604 BBR.probe_rtt_done_stamp = 0 1605 BBR.ack_phase = ACKS_PROBE_STOPPING 1606 BBRStartRound() 1607 if (BBR.state == ProbeRTT) 1608 BBRHandleProbeRTT() 1609 if (rs.delivered > 0) 1610 BBR.idle_restart = false 1612 BBREnterProbeRTT(): 1613 BBR.state = ProbeRTT 1614 BBR.pacing_gain = 1 1615 BBR.cwnd_gain = BBRProbeRTTCwndGain /* 0.5 */ 1617 BBRHandleProbeRTT(): 1618 /* Ignore low rate samples during ProbeRTT: */ 1619 MarkConnectionAppLimited() 1620 if (BBR.probe_rtt_done_stamp == 0 and 1621 packets_in_flight <= BBRProbeRTTCwnd()) 1622 /* Wait for at least ProbeRTTDuration to elapse: */ 1623 BBR.probe_rtt_done_stamp = 1624 Now() + ProbeRTTDuration 1625 /* Wait for at least one round to elapse: */ 1626 BBR.probe_rtt_round_done = false 1627 BBRStartRound() 1628 else if (BBR.probe_rtt_done_stamp != 0) 1629 if (BBR.round_start) 1630 BBR.probe_rtt_round_done = true 1631 if (BBR.probe_rtt_round_done) 1632 BBRCheckProbeRTTDone() 1634 BBRCheckProbeRTTDone(): 1635 if (BBR.probe_rtt_done_stamp != 0 and 1636 Now() > BBR.probe_rtt_done_stamp) 1637 /* schedule next ProbeRTT: */ 1638 BBR.probe_rtt_min_stamp = Now() 1639 BBRRestoreCwnd() 1640 BBRExitProbeRTT() 1642 MarkConnectionAppLimited(): 1643 C.app_limited = 1644 (C.delivered + packets_in_flight) ? : 1 1646 4.3.4.5. Exiting ProbeRTT 1648 When exiting ProbeRTT, BBR transitions to ProbeBW if it estimates the 1649 pipe was filled already, or Startup otherwise. 1651 When transitioning out of ProbeRTT, BBR calls BBRResetLowerBounds() 1652 to reset the lower bounds, since any congestion encountered in 1653 ProbeRTT may have pulled the short-term model far below the capacity 1654 of the path. 1656 But the algorithm is cautious in timing the next bandwidth probe: 1657 raising inflight after ProbeRTT may cause loss, so the algorithm 1658 resets the bandwidth-probing clock by starting the cycle in 1659 ProbeBW_DOWN. But then as an optimization, since the connection is 1660 exiting ProbeRTT, we know that inflight is already below the estimated 1661 BDP, so the connection can proceed immediately to ProbeBW_CRUISE. 1663 To summarize, the logic for exiting ProbeRTT is as follows: 1665 BBRExitProbeRTT(): 1666 BBRResetLowerBounds() 1667 if (BBR.filled_pipe) 1668 BBRStartProbeBW_DOWN() 1669 BBRStartProbeBW_CRUISE() 1670 else 1671 BBREnterStartup() 1673 4.4. 
Restarting From Idle 1675 4.4.1. Setting Pacing Rate in ProbeBW 1677 When restarting from idle in ProbeBW states, BBR leaves its cwnd as- 1678 is and paces packets at exactly BBR.bw, aiming to return as quickly 1679 as possible to its target operating point of rate balance and a full 1680 pipe. Specifically, if the flow's BBR.state is ProbeBW, and the flow 1681 is application-limited, and there are no packets in flight currently, 1682 then at the moment the flow sends one or more packets BBR sets 1683 BBR.pacing_rate to exactly BBR.bw. More precisely, the BBR algorithm 1684 takes the following steps in BBRHandleRestartFromIdle() before 1685 sending a packet for a flow. 1687 The "Restarting Idle Connections" section of [RFC5681] suggests 1688 restarting from idle by slow-starting from the initial window. 1689 However, this approach assumed a congestion control algorithm 1690 that had no estimate of the bottleneck bandwidth and no pacing, and 1691 thus resorted to relying on slow-starting driven by an ACK clock. 1692 The long (log_2(BDP)*RTT) delays required to reach full utilization 1693 with that "slow start after idle" approach caused many large 1694 deployments to disable this mechanism, resulting in a "BDP-scale 1695 line-rate burst" approach instead. Instead of these two approaches, 1696 BBR restarts by pacing at BBR.bw, typically achieving approximate 1697 rate balance and a full pipe after only one BBR.min_rtt has elapsed. 1699 4.4.2. Checking for ProbeRTT Completion 1701 As an optimization, when restarting from idle BBR checks to see if 1702 the connection is in ProbeRTT and has met the exit conditions for 1703 ProbeRTT. If a connection goes idle during ProbeRTT then often it 1704 will have met those exit conditions by the time it restarts, so that 1705 the connection can restore the cwnd to its full value before it 1706 starts transmitting a new flight of data. 1708 4.4.3. Logic 1710 The BBR algorithm takes the following steps in 1711 BBRHandleRestartFromIdle() before sending a packet for a flow: 1713 BBRHandleRestartFromIdle(): 1714 if (packets_in_flight == 0 and C.app_limited) 1715 BBR.idle_restart = true 1716 BBR.extra_acked_interval_start = Now() 1717 if (IsInAProbeBWState()) 1718 BBRSetPacingRateWithGain(1) 1719 else if (BBR.state == ProbeRTT) 1720 BBRCheckProbeRTTDone() 1722 4.5. Updating Network Path Model Parameters 1724 BBR is a model-based congestion control algorithm: it is based on an 1725 explicit model of the network path over which a transport flow 1726 travels. The following is a summary of each parameter, including its 1727 meaning and how the algorithm calculates and uses its value. We can 1728 group the parameters into three groups: 1730 * core state machine parameters 1732 * parameters to model the data rate 1734 * parameters to model the volume of in-flight data 1736 4.5.1. BBR.round_count: Tracking Packet-Timed Round Trips 1738 Several aspects of the BBR algorithm depend on counting the progress 1739 of "packet-timed" round trips, which start at the transmission of 1740 some segment, and then end at the acknowledgement of that segment. 1741 BBR.round_count is a count of the number of these "packet-timed" 1742 round trips elapsed so far. BBR uses this virtual BBR.round_count 1743 because it is more robust than using wall clock time. 
In particular, 1744 arbitrary intervals of wall clock time can elapse due to application 1745 idleness, variations in RTTs, or timer delays for retransmission 1746 timeouts, causing wall-clock-timed model parameter estimates to "time 1747 out" or to be "forgotten" too quickly to provide robustness. 1749 BBR counts packet-timed round trips by recording state about a 1750 sentinel packet, and waiting for an ACK of any data packet that was 1751 sent after that sentinel packet, using the following pseudocode: 1753 Upon connection initialization: 1755 BBRInitRoundCounting(): 1756 BBR.next_round_delivered = 0 1757 BBR.round_start = false 1758 BBR.round_count = 0 1760 Upon sending each packet, the rate estimation algorithm [draft-cheng- 1761 iccrg-delivery-rate-estimation] records the amount of data thus far 1762 acknowledged as delivered: 1764 packet.delivered = C.delivered 1766 Upon receiving an ACK for a given data packet, the rate estimation 1767 algorithm [draft-cheng-iccrg-delivery-rate-estimation] updates the 1768 amount of data thus far acknowledged as delivered: 1770 C.delivered += packet.size 1772 Upon receiving an ACK for a given data packet, the BBR algorithm 1773 first executes the following logic to see if a round trip has 1774 elapsed, and if so, increments the count of such round trips elapsed: 1776 BBRUpdateRound(): 1777 if (packet.delivered >= BBR.next_round_delivered) 1778 BBRStartRound() 1779 BBR.round_count++ 1780 BBR.rounds_since_bw_probe++ 1781 BBR.round_start = true 1782 else 1783 BBR.round_start = false 1785 BBRStartRound(): 1786 BBR.next_round_delivered = C.delivered 1788 4.5.2. BBR.max_bw: Estimated Maximum Bandwidth 1790 BBR.max_bw is BBR's estimate of the maximum bottleneck bandwidth 1791 available to data transmissions for the transport flow. At any time, 1792 a transport connection's data transmissions experience some slowest 1793 link or bottleneck. The bottleneck's delivery rate determines the 1794 connection's maximum data-delivery rate. BBR tries to closely match 1795 its sending rate to this bottleneck delivery rate to help seek "rate 1796 balance", where the flow's packet arrival rate at the bottleneck 1797 equals the departure rate. The bottleneck rate varies over the life 1798 of a connection, so BBR continually estimates BBR.max_bw using recent 1799 signals. 1801 4.5.2.1. Delivery Rate Samples for Estimating BBR.max_bw 1803 Since calculating delivery rate samples is subtle, and the samples 1804 are useful independent of congestion control, the approach BBR uses 1805 for measuring each single delivery rate sample is specified in a 1806 separate Internet Draft [draft-cheng-iccrg-delivery-rate-estimation]. 1808 4.5.2.2. BBR.max_bw Max Filter 1810 Delivery rate samples are often below the typical bottleneck 1811 bandwidth available to the flow, due to "noise" introduced by random 1812 variation in physical transmission processes (e.g. radio link layer 1813 noise) or queues along the network path. To filter these effects 1814 BBR uses a max filter: BBR estimates BBR.max_bw using the windowed 1815 maximum recent delivery rate sample seen by the connection over 1816 recent history. 1818 The BBR.max_bw max filter window covers a time period extending over 1819 the past two ProbeBW cycles. The BBR.max_bw max filter window length 1820 is driven by trade-offs among several considerations: 1822 * It is long enough to cover at least one entire ProbeBW cycle (see 1823 the "ProbeBW" section). 
This ensures that the window contains at 1824 least some delivery rate samples that are the result of data 1825 transmitted with a super-unity pacing_gain (a pacing_gain larger 1826 than 1.0). Such super-unity delivery rate samples are 1827 instrumental in revealing the path's underlying available 1828 bandwidth even when there is noise from delivery rate shortfalls 1829 due to aggregation delays, queuing delays from variable cross- 1830 traffic, lossy link layers with uncorrected losses, or short-term 1831 buffer exhaustion (e.g., brief coincident bursts in a shallow 1832 buffer). 1834 * It aims to be long enough to cover short-term fluctuations in the 1835 network's delivery rate due to the aforementioned sources of 1836 noise. In particular, the delivery rate for radio link layers 1837 (e.g., wifi and cellular technologies) can be highly variable, and 1838 the filter window needs to be long enough to remember "good" 1839 delivery rate samples in order to be robust to such variations. 1841 * It aims to be short enough to respond in a timely manner to 1842 sustained reductions in the bandwidth available to a flow, whether 1843 this is because other flows are using a larger share of the 1844 bottleneck, or the bottleneck link service rate has reduced due to 1845 layer 1 or layer 2 changes, policy changes, or routing changes. 1846 In any of these cases, existing BBR flows traversing the 1847 bottleneck should, in a timely manner, reduce their BBR.max_bw 1848 estimates and thus pacing rate and in-flight data, in order to 1849 match the sending behavior to the new available bandwidth. 1851 4.5.2.3. BBR.max_bw and Application-limited Delivery Rate Samples 1853 Transmissions can be application-limited, meaning the transmission 1854 rate is limited by the application rather than the congestion control 1855 algorithm. This is quite common because of request/response traffic. 1856 When there is a transmission opportunity but no data to send, the 1857 delivery rate sampler marks the corresponding bandwidth sample(s) as 1858 application-limited [draft-cheng-iccrg-delivery-rate-estimation]. 1859 The BBR.max_bw estimator carefully decides which samples to include 1860 in the bandwidth model to ensure that BBR.max_bw reflects network 1861 limits, not application limits. By default, the estimator discards 1862 application-limited samples, since by definition they reflect 1863 application limits. However, the estimator does use application- 1864 limited samples if the measured delivery rate happens to be larger 1865 than the current BBR.max_bw estimate, since this indicates the 1866 current BBR.max_bw estimate is too low. 1868 4.5.2.4. Updating the BBR.max_bw Max Filter 1870 For every ACK that acknowledges some data packets as delivered, BBR 1871 invokes BBRUpdateMaxBw() to update the BBR.max_bw estimator as 1872 follows (here rs.delivery_rate is the delivery rate sample obtained 1873 from the ACK that is being processed, as specified in [draft-cheng- 1874 iccrg-delivery-rate-estimation]): 1876 BBRUpdateMaxBw(): 1877 BBRUpdateRound() 1878 if (rs.delivery_rate >= BBR.max_bw || !rs.is_app_limited) 1879 BBR.max_bw = update_windowed_max_filter( 1880 filter=BBR.MaxBwFilter, 1881 value=rs.delivery_rate, 1882 time=BBR.cycle_count, 1883 window_length=MaxBwFilterLen)
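The update_windowed_max_filter() helper used above is not further specified in this document. As a non-normative illustration only, the BBR.max_bw filter can be kept in two time slots, one per ProbeBW cycle, as the next subsection describes; the struct and function names in the following C sketch are assumptions made for this example, not part of the algorithm:

   #include <stdint.h>

   struct max_bw_filter {
       /* Max delivery rate sample seen in each of the two most
        * recent ProbeBW cycles (previous and current). */
       uint64_t slot[2];
   };

   /* Record a delivery rate sample at virtual time cycle_count.
    * Only the low bit of the virtual time is needed. */
   static void max_bw_filter_update(struct max_bw_filter *f,
                                    uint32_t cycle_count,
                                    uint64_t sample)
   {
       uint32_t i = cycle_count & 1;

       if (sample > f->slot[i])
           f->slot[i] = sample;
   }

   /* Advance virtual time by one ProbeBW cycle; the slot that now
    * represents the new cycle starts out empty. */
   static void max_bw_filter_advance(struct max_bw_filter *f,
                                     uint32_t new_cycle_count)
   {
       f->slot[new_cycle_count & 1] = 0;
   }

   /* The filter output is the max over the two tracked cycles. */
   static uint64_t max_bw_filter_get(const struct max_bw_filter *f)
   {
       return f->slot[0] > f->slot[1] ? f->slot[0] : f->slot[1];
   }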
1885 4.5.2.5. Tracking Time for the BBR.max_bw Max Filter 1887 BBR tracks time for the BBR.max_bw filter window using a virtual 1888 (non-wall-clock) time tracked by counting the cyclical progression 1889 through ProbeBW cycles. Each time through the ProbeBW cycle, one 1890 round trip after exiting ProbeBW_UP (the point at which the flow has 1891 its best chance to measure the highest throughput of the cycle), BBR 1892 increments BBR.cycle_count, the virtual time used by the BBR.max_bw 1893 filter window. Note that BBR.cycle_count only needs to be tracked 1894 with a single bit, since the BBR.max_bw filter only needs to track 1895 samples from two time slots: the previous ProbeBW cycle and the 1896 current ProbeBW cycle: 1898 BBRAdvanceMaxBwFilter(): 1899 BBR.cycle_count++ 1901 4.5.3. BBR.min_rtt: Estimated Minimum Round-Trip Time 1903 BBR.min_rtt is BBR's estimate of the round-trip propagation delay of 1904 the path over which a transport connection is sending. The path's 1905 round-trip propagation delay determines the minimum amount of time 1906 over which the connection must be willing to sustain transmissions at 1907 the BBR.bw rate, and thus the minimum amount of data needed in- 1908 flight, for the connection to reach full utilization (a "Full Pipe"). 1909 The round-trip propagation delay can vary over the life of a 1910 connection, so BBR continually estimates BBR.min_rtt using recent 1911 round-trip delay samples. 1913 4.5.3.1. Round-Trip Time Samples for Estimating BBR.min_rtt 1915 For every data packet a connection sends, BBR calculates an RTT 1916 sample that measures the time interval from sending a data packet 1917 until that packet is acknowledged. 1919 For the most part, the same considerations and mechanisms that apply 1920 to RTT estimation for the purposes of retransmission timeout 1921 calculations [RFC6298] apply to BBR RTT samples. Namely, BBR does 1922 not use RTT samples based on the transmission time of retransmitted 1923 packets, since these are ambiguous, and thus unreliable. Also, BBR 1924 calculates RTT samples using both cumulative and selective 1925 acknowledgments (if the transport supports [RFC2018] SACK options or 1926 an equivalent mechanism), or transport-layer timestamps (if the 1927 transport supports [RFC7323] TCP timestamps or an equivalent 1928 mechanism). 1930 The only divergence from RTT estimation for retransmission timeouts 1931 is in the case where a given acknowledgment ACKs more than one data 1932 packet. In order to be conservative and schedule long timeouts to 1933 avoid spurious retransmissions, the maximum among such potential RTT 1934 samples is typically used for computing retransmission timeouts; 1935 i.e., SRTT is typically calculated using the data packet with the 1936 earliest transmission time. By contrast, in order for BBR to try to 1937 reach the minimum amount of data in flight to fill the pipe, BBR uses 1938 the minimum among such potential RTT samples; i.e., BBR calculates 1939 the RTT using the data packet with the latest transmission time. 1941 4.5.3.2. BBR.min_rtt Min Filter 1943 RTT samples tend to be above the round-trip propagation delay of the 1944 path, due to "noise" introduced by random variation in physical 1945 transmission processes (e.g. radio link layer noise), queues along 1946 the network path, the receiver's delayed ack strategy, ack 1947 aggregation, etc. Thus to filter out these effects BBR uses a min 1948 filter: BBR estimates BBR.min_rtt using the minimum recent RTT sample 1949 seen by the connection over the past MinRTTFilterLen seconds. 
(Many 1950 of the same network effects that can decrease delivery rate 1951 measurements can increase RTT samples, which is why BBR's min- 1952 filtering approach for RTTs is the complement of its max-filtering 1953 approach for delivery rates.) 1955 The length of the BBR.min_rtt min filter window is MinRTTFilterLen = 1956 10 secs. This is driven by trade-offs among several considerations: 1958 * The MinRTTFilterLen is longer than ProbeRTTInterval, so that it 1959 covers an entire ProbeRTT cycle (see the "ProbeRTT" section 1960 below). This helps ensure that the window can contain RTT samples 1961 that are the result of data transmitted with inflight below the 1962 estimated BDP of the flow. Such RTT samples are important for 1963 helping to reveal the path's underlying two-way propagation delay 1964 even when the aforementioned "noise" effects can often obscure it. 1966 * The MinRTTFilterLen aims to be long enough to avoid needing to cut 1967 in-flight and throughput often. Measuring two-way propagation 1968 delay requires in-flight to be at or below BDP, which risks some 1969 amount of underutilization, so BBR uses a filter window long 1970 enough that such underutilization events can be rare. 1972 * The MinRTTFilterLen aims to be long enough that many applications 1973 have a "natural" moment of silence or low utilization that can cut 1974 in-flight below BDP and naturally serve to refresh the 1975 BBR.min_rtt, without requiring BBR to force an artificial cut in 1976 in-flight. This applies to many popular applications, including 1977 Web, RPC, chunked audio or video traffic. 1979 * The MinRTTFilterLen aims to be short enough to respond in a timely 1980 manner to real increases in the two-way propagation delay of the 1981 path, e.g. due to route changes, which are expected to typically 1982 happen on longer time scales. 1984 A BBR implementation MAY use a generic windowed min filter to track 1985 BBR.min_rtt. However, a significant savings in space and improvement 1986 in freshness can be achieved by integrating the BBR.min_rtt 1987 estimation into the ProbeRTT state machine, so this document 1988 discusses that approach in the ProbeRTT section. 1990 4.5.4. BBR.offload_budget 1992 BBR.offload_budget is the estimate of the minimum volume of data 1993 necessary to achieve full throughput using sender (TSO/GSO) and 1994 receiver (LRO, GRO) host offload mechanisms, computed as follows: 1996 BBRUpdateOffloadBudget(): 1997 BBR.offload_budget = 3 * BBR.send_quantum 1999 The factor of 3 is chosen to allow maintaining at least: 2001 * 1 quantum in the sending host's queuing discipline layer 2003 * 1 quantum being segmented in the sending host TSO/GSO engine 2005 * 1 quantum being reassembled or otherwise remaining unacknowledged 2006 due to the receiver host's LRO/GRO/delayed-ACK engine 2008 4.5.5. BBR.extra_acked 2010 BBR.extra_acked is a volume of data that is the estimate of the 2011 recent degree of aggregation in the network path. For each ACK, the 2012 algorithm computes a sample of the estimated extra ACKed data beyond 2013 the amount of data that the sender expected to be ACKed over the 2014 timescale of a round-trip, given the BBR.bw. Then it computes 2015 BBR.extra_acked as the windowed maximum sample over the last 2016 BBRExtraAckedFilterLen=10 packet-timed round-trips. If the ACK rate 2017 falls below the expected bandwidth, then the algorithm estimates an 2018 aggregation episode has terminated, and resets the sampling interval 2019 to start from the current time. 
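As an illustrative example (the numbers here are chosen only for concreteness and are not part of the algorithm): suppose BBR.bw is 1.25 MBytes/sec (10 Mbps) and the path includes a wifi hop that delivers ACKs in bursts. If 100 ms have elapsed since BBR.extra_acked_interval_start, the sender expects roughly

   expected_delivered = BBR.bw * interval
                      = 1.25 MBytes/sec * 0.100 sec = 125 KBytes

to have been newly ACKed over that interval. If the ACKs that actually arrived in the interval cover 150 KBytes, then the sample of extra ACKed data is 150 KBytes - 125 KBytes = 25 KBytes (capped at cwnd). If that is the windowed maximum over the last BBRExtraAckedFilterLen round trips, BBR.extra_acked is 25 KBytes, and the cwnd is allowed to run roughly that much above the estimated BDP so the flow can keep sending through similar ACK silences.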
2021 The BBR.extra_acked thus reflects the recently-measured magnitude of 2022 data and ACK aggregation effects such as batching and slotting at 2023 shared-medium L2 hops (wifi, cellular, DOCSIS), as well as end-host 2024 offload mechanisms (TSO, GSO, LRO, GRO), and end host or middlebox 2025 ACK decimation/thinning. 2027 BBR augments its cwnd by BBR.extra_acked to allow the connection to 2028 keep sending during inter-ACK silences, to an extent that matches the 2029 recently measured degree of aggregation. 2031 More precisely, this is computed as: 2033 BBRUpdateACKAggregation(): 2034 /* Find excess ACKed beyond expected amount over this interval */ 2035 interval = (Now() - BBR.extra_acked_interval_start) 2036 expected_delivered = BBR.bw * interval 2037 /* Reset interval if ACK rate is below expected rate: */ 2038 if (BBR.extra_acked_delivered <= expected_delivered) 2039 BBR.extra_acked_delivered = 0 2040 BBR.extra_acked_interval_start = Now() 2041 expected_delivered = 0 2042 BBR.extra_acked_delivered += rs.newly_acked 2043 extra = BBR.extra_acked_delivered - expected_delivered 2044 extra = min(extra, cwnd) 2045 BBR.extra_acked = 2046 update_windowed_max_filter( 2047 filter=BBR.ExtraACKedFilter, 2048 value=extra, 2049 time=BBR.round_count, 2050 window_length=BBRExtraAckedFilterLen) 2052 4.5.6. Updating the Model Upon Packet Loss 2054 In every state, BBR responds to (filtered) congestion signals, 2055 including loss. The response to those congestion signals depends on 2056 the flow's current state, since the information that the flow can 2057 infer depends on what the flow was doing when the flow experienced 2058 the signal. 2060 4.5.6.1. Probing for Bandwidth In Startup 2062 In Startup, if the congestion signals meet the Startup exit criteria, 2063 the flow exits Startup and enters Drain. 2065 4.5.6.2. Probing for Bandwidth In ProbeBW 2067 BBR searches for the maximum volume of data that can be sensibly 2068 placed in-flight in the network. A key precondition is that the flow 2069 is actually trying robustly to find that operating point. To 2070 implement this, when a flow is in ProbeBW, and an ACK covers data 2071 sent in one of the accelerating phases (REFILL or UP), and the ACK 2072 indicates that the loss rate over the past round trip exceeds the 2073 queue pressure objective, and the flow is not application limited, 2074 and has not yet responded to congestion signals from the most recent 2075 REFILL or UP phase, then the flow estimates that the volume of data 2076 it allowed in flight exceeded what matches the current delivery 2077 process on the path, and reduces BBR.inflight_hi: 2079 /* Do loss signals suggest inflight is too high? 2080 * If so, react. */ 2081 CheckInflightTooHigh(): 2082 if (IsInflightTooHigh(rs)) 2083 if (BBR.bw_probe_samples) 2084 BBRHandleInflightTooHigh() 2085 return true /* inflight too high */ 2086 else 2087 return false /* inflight not too high */ 2089 IsInflightTooHigh(): 2090 return (rs.lost > rs.tx_in_flight * BBRLossThresh) 2092 BBRHandleInflightTooHigh(): 2093 BBR.bw_probe_samples = 0; /* only react once per bw probe */ 2094 if (!rs.is_app_limited) 2095 BBR.inflight_hi = max(rs.tx_in_flight, 2096 BBRTargetInflight() * BBRBeta) 2097 if (BBR.state == ProbeBW_UP) 2098 BBRStartProbeBW_DOWN() 2100 Here rs.tx_in_flight is the amount of data that was estimated to be 2101 in flight when the most recently ACKed packet was sent. 
And the 2102 BBRBeta (0.7x) bound is to try to ensure that BBR does not react more 2103 dramatically than CUBIC's 0.7x multiplicative decrease factor. 2105 Some loss detection algorithms, including algorithms like RACK 2106 [RFC8985] that delay loss marking while waiting for potential 2107 reordering to resolve, may mark packets as lost long after the loss 2108 itself happened. In such cases, the tx_in_flight for the delivered 2109 sequence range that allowed the loss to be detected may be 2110 considerably smaller than the tx_in_flight of the lost packet itself. 2111 In such cases using the former tx_in_flight rather than the latter 2112 can cause BBR.inflight_hi to be significantly underestimated. To 2113 avoid such issues, BBR processes each loss detection event to more 2114 precisely estimate the volume of in-flight data at which loss rates 2115 cross BBRLossThresh, noting that this may have happened mid-way 2116 through some packet. To estimate this value, we can solve for 2117 "lost_prefix" in the following equation, where inflight_prev 2118 represents the volume of in-flight data preceding this packet, 2119 lost_prev represents the data lost among that previous in-flight 2120 data: 2122 lost / inflight >= BBRLossThresh 2123 (lost_prev + lost_prefix) / (inflight_prev + lost_prefix) >= BBRLossThresh 2124 /* solving for lost_prefix we arrive at: */ 2125 lost_prefix = (BBRLossThresh * inflight_prev - lost_prev) / (1 - BBRLossThresh) 2127 In pseudocode: 2129 BBRHandleLostPacket(packet): 2130 if (!BBR.bw_probe_samples) 2131 return /* not a packet sent while probing bandwidth */ 2132 rs.tx_in_flight = packet.tx_in_flight /* inflight at transmit */ 2133 rs.lost = C.lost - packet.lost /* data lost since transmit */ 2134 rs.is_app_limited = packet.is_app_limited; 2135 if (IsInflightTooHigh(rs)) 2136 rs.tx_in_flight = BBRInflightHiFromLostPacket(rs, packet) 2137 BBRHandleInflightTooHigh(rs) 2139 /* At what prefix of packet did losses exceed BBRLossThresh? */ 2140 BBRInflightHiFromLostPacket(rs, packet): 2141 size = packet.size 2142 /* What was in flight before this packet? */ 2143 inflight_prev = rs.tx_in_flight - size 2144 /* What was lost before this packet? */ 2145 lost_prev = rs.lost - size 2146 lost_prefix = (BBRLossThresh * inflight_prev - lost_prev) / 2147 (1 - BBRLossThresh) 2148 /* At what inflight value did losses cross BBRLossThresh? */ 2149 inflight = inflight_prev + lost_prefix 2150 return inflight 2152 4.5.6.3. When not Probing for Bandwidth 2154 When not explicitly accelerating to probe for bandwidth (Drain, 2155 ProbeRTT, ProbeBW_DOWN, ProbeBW_CRUISE), BBR responds to loss by 2156 slowing down to some extent. This is because loss suggests that the 2157 available bandwidth and safe volume of in-flight data may have 2158 decreased recently, and the flow needs to adapt, slowing down toward 2159 the latest delivery process. BBR flows implement this response by 2160 reducing the short-term model parameters, BBR.bw_lo and 2161 BBR.inflight_lo. 2163 When encountering packet loss when the flow is not probing for 2164 bandwidth, the strategy is to gradually adapt to the current measured 2165 delivery process (the rate and volume of data that is delivered 2166 through the network path over the last round trip). This applies 2167 generally: whether in fast recovery, RTO recovery, TLP recovery; 2168 whether application-limited or not. 
2170 There are two key parameters the algorithm tracks, to measure the 2171 current delivery process: 2173 BBR.bw_latest: a 1-round-trip max of delivered bandwidth 2174 (rs.delivery_rate). 2176 BBR.inflight_latest: a 1-round-trip max of delivered volume of data 2177 (rs.delivered). 2179 Upon the ACK at the end of each round that encountered a newly-marked 2180 loss, the flow updates its model (bw_lo and inflight_lo) as follows: 2182 bw_lo = max( bw_latest, BBRBeta * bw_lo ) 2184 inflight_lo = max( inflight_latest, BBRBeta * inflight_lo ) 2186 This logic can be represented as follows: 2188 /* Near start of ACK processing: */ 2189 BBRUpdateLatestDeliverySignals(): 2190 BBR.loss_round_start = 0 2191 BBR.bw_latest = max(BBR.bw_latest, rs.delivery_rate) 2192 BBR.inflight_latest = max(BBR.inflight_latest, rs.delivered) 2193 if (rs.prior_delivered >= BBR.loss_round_delivered) 2194 BBR.loss_round_delivered = C.delivered 2195 BBR.loss_round_start = 1 2197 /* Near end of ACK processing: */ 2198 BBRAdvanceLatestDeliverySignals(): 2199 if (BBR.loss_round_start) 2200 BBR.bw_latest = rs.delivery_rate 2201 BBR.inflight_latest = rs.delivered 2203 BBRResetCongestionSignals(): 2204 BBR.loss_in_round = 0 2205 BBR.bw_latest = 0 2206 BBR.inflight_latest = 0 2208 /* Update congestion state on every ACK */ 2209 BBRUpdateCongestionSignals(): 2210 BBRUpdateMaxBw() 2211 if (rs.losses > 0) 2212 BBR.loss_in_round = 1 2213 if (!BBR.loss_round_start) 2214 return /* wait until end of round trip */ 2215 BBRAdaptLowerBoundsFromCongestion() 2216 BBR.loss_in_round = 0 2218 /* Once per round-trip respond to congestion */ 2219 BBRAdaptLowerBoundsFromCongestion(): 2220 if (BBRIsProbingBW()) 2221 return 2222 if (BBR.loss_in_round) 2223 BBRInitLowerBounds() 2224 BBRLossLowerBounds() 2226 /* Handle the first congestion episode in this cycle */ 2227 BBRInitLowerBounds(): 2228 if (BBR.bw_lo == Infinity) 2229 BBR.bw_lo = BBR.max_bw 2230 if (BBR.inflight_lo == Infinity) 2231 BBR.inflight_lo = cwnd 2233 /* Adjust model once per round based on loss */ 2234 BBRLossLowerBounds(): 2235 BBR.bw_lo = max(BBR.bw_latest, 2236 BBRBeta * BBR.bw_lo) 2237 BBR.inflight_lo = max(BBR.inflight_latest, 2238 BBRBeta * BBR.inflight_lo) 2240 BBRResetLowerBounds(): 2241 BBR.bw_lo = Infinity 2242 BBR.inflight_lo = Infinity 2244 BBRBoundBWForModel(): 2245 BBR.bw = min(BBR.max_bw, BBR.bw_lo, BBR.bw_hi) 2247 4.6. Updating Control Parameters 2249 BBR uses three distinct but interrelated control parameters: pacing 2250 rate, send quantum, and congestion window (cwnd). 2252 4.6.1. Summary of Control Behavior in the State Machine 2254 The following table summarizes how BBR modulates the control 2255 parameters in each state. In the table below, the semantics of the 2256 columns are as follows: 2258 * State: the state in the BBR state machine, as depicted in the 2259 "State Transition Diagram" section above. 2261 * Tactic: The tactic chosen from the "State Machine Tactics" 2262 subsection above: "accel" refers to acceleration, "decel" to 2263 deceleration, and "cruise" to cruising. 2265 * Pacing Gain: the value used for BBR.pacing_gain in the given 2266 state. 2268 * Cwnd Gain: the value used for BBR.cwnd_gain in the given state. 2270 * Rate Cap: the rate values applied as bounds on the BBR.max_bw 2271 value used to compute BBR.bw. 2273 * Volume Cap: the volume values applied as bounds on the 2274 BBR.max_inflight value to compute cwnd. 2276 The control behavior can be summarized as follows. 
Upon processing 2276 each ACK, BBR uses the values in the table below to compute BBR.bw in 2277 BBRBoundBWForModel(), and the cwnd in BBRBoundCwndForModel(): 2279 +-----------------+--------+--------+------+--------+------------------+ 2280 | State | Tactic | Pacing | Cwnd | Rate | Volume | 2281 | | | Gain | Gain | Cap | Cap | 2282 +-----------------+--------+--------+------+--------+------------------+ 2283 | Startup | accel | 2.77 | 2 | | | 2284 | | | | | | | 2285 +-----------------+--------+--------+------+--------+------------------+ 2286 | Drain | decel | 0.5 | 2 | bw_hi, | inflight_hi, | 2287 | | | | | bw_lo | inflight_lo | 2288 +-----------------+--------+--------+------+--------+------------------+ 2289 | ProbeBW_DOWN | decel | 0.9 | 2 | bw_hi, | inflight_hi, | 2290 | | | | | bw_lo | inflight_lo | 2291 +-----------------+--------+--------+------+--------+------------------+ 2292 | ProbeBW_CRUISE | cruise | 1.0 | 2 | bw_hi, | 0.85*inflight_hi | 2293 | | | | | bw_lo | inflight_lo | 2294 +-----------------+--------+--------+------+--------+------------------+ 2295 | ProbeBW_REFILL | accel | 1.0 | 2 | bw_hi | inflight_hi | 2296 | | | | | | | 2297 +-----------------+--------+--------+------+--------+------------------+ 2298 | ProbeBW_UP | accel | 1.25 | 2 | bw_hi | inflight_hi | 2299 | | | | | | | 2300 +-----------------+--------+--------+------+--------+------------------+ 2301 | ProbeRTT | decel | 1.0 | 0.5 | bw_hi, | 0.85*inflight_hi | 2302 | | | | | bw_lo | inflight_lo | 2303 +-----------------+--------+--------+------+--------+------------------+ 2305 4.6.2. Pacing Rate: BBR.pacing_rate 2307 To help match the packet-arrival rate to the bottleneck bandwidth 2308 available to the flow, BBR paces data packets. Pacing enforces a 2309 maximum rate at which BBR schedules quanta of packets for 2310 transmission. 2312 The sending host implements pacing by maintaining inter-quantum 2313 spacing at the time each packet is scheduled for departure, 2314 calculating the next departure time for a packet for a given flow 2315 (BBR.next_departure_time) as a function of the most recent packet 2316 size and the current pacing rate, as follows: 2318 BBR.next_departure_time = max(Now(), BBR.next_departure_time) 2319 packet.departure_time = BBR.next_departure_time 2320 pacing_delay = packet.size / BBR.pacing_rate 2321 BBR.next_departure_time = BBR.next_departure_time + pacing_delay 2323 To adapt to the bottleneck, in general BBR sets the pacing rate to be 2324 proportional to bw, with a dynamic gain, or scaling factor of 2325 proportionality, called pacing_gain. 2327 When a BBR flow starts it has no bw estimate (bw is 0). So in this 2328 case it sets an initial pacing rate based on the transport sender 2329 implementation's initial congestion window ("InitialCwnd", e.g. from 2330 [RFC6928]), the initial SRTT (smoothed round-trip time) after the 2331 first non-zero RTT sample, and the initial pacing_gain: 2333 BBRInitPacingRate(): 2334 nominal_bandwidth = InitialCwnd / (SRTT ? SRTT : 1ms) 2335 BBR.pacing_rate = BBRStartupPacingGain * nominal_bandwidth 2337 After initialization, on each data ACK BBR updates its pacing rate to 2338 be proportional to bw, as long as it estimates that it has filled the 2339 pipe (BBR.filled_pipe is true; see the "Startup" section for 2340 details), or doing so increases the pacing rate. 
Limiting the pacing 2341 rate updates in this way helps the connection probe robustly for 2342 bandwidth until it estimates it has reached its full available 2343 bandwidth ("filled the pipe"). In particular, this prevents the 2344 pacing rate from being reduced when the connection has only seen 2345 application-limited bandwidth samples. BBR updates the pacing rate 2346 on each ACK by executing the BBRSetPacingRate() step as follows: 2348 BBRSetPacingRateWithGain(pacing_gain): 2349 rate = pacing_gain * bw * (100 - BBRPacingMarginPercent) / 100 2350 if (BBR.filled_pipe || rate > BBR.pacing_rate) 2351 BBR.pacing_rate = rate 2353 BBRSetPacingRate(): 2354 BBRSetPacingRateWithGain(BBR.pacing_gain) 2356 To help drive the network toward lower queues and low latency while 2357 maintaining high utilization, the BBRPacingMarginPercent constant of 2358 1 aims to cause BBR to pace at 1% below the bw, on average. 2360 4.6.3. Send Quantum: BBR.send_quantum 2362 In order to amortize per-packet overheads involved in the sending 2363 process (host CPU, NIC processing, and interrupt processing delays), 2364 high-performance transport sender implementations (e.g., Linux TCP) 2365 often schedule an aggregate containing multiple packets (multiple 2366 SMSS) worth of data as a single quantum (using TSO, GSO, or other 2367 offload mechanisms). The BBR congestion control algorithm makes this 2368 control decision explicitly, dynamically calculating a quantum 2369 control parameter that specifies the maximum size of these 2370 transmission aggregates. This decision is based on a trade-off: 2372 * A smaller quantum is preferred at lower data rates because it 2373 results in shorter packet bursts, shorter queues, lower queueing 2374 delays, and lower rates of packet loss. 2376 * A bigger quantum can be required at higher data rates because it 2377 results in lower CPU overheads at the sending and receiving hosts, 2378 who can ship larger amounts of data with a single trip through the 2379 networking stack. 2381 On each ACK, BBR runs BBRSetSendQuantum() to update BBR.send_quantum 2382 as follows: 2384 BBRSetSendQuantum(): 2385 if (BBR.pacing_rate < 1.2 Mbps) 2386 floor = 1 * SMSS 2387 else 2388 floor = 2 * SMSS 2389 BBR.send_quantum = min(BBR.pacing_rate * 1ms, 64KBytes) 2390 BBR.send_quantum = max(BBR.send_quantum, floor) 2392 A BBR implementation MAY use alternate approaches to select a 2393 BBR.send_quantum, as appropriate for the CPU overheads anticipated 2394 for senders and receivers, and buffering considerations anticipated 2395 in the network path. However, for the sake of the network and other 2396 users, a BBR implementation SHOULD attempt to use the smallest 2397 feasible quanta. 2399 4.6.4. Congestion Window 2401 The congestion window, or cwnd, controls the maximum volume of data 2402 BBR allows in flight in the network at any time. It is the maximum 2403 volume of in-flight data that the algorithm estimates is appropriate 2404 for matching the current network path delivery process, given all 2405 available signals in the model, at any time scale. BBR adapts the 2406 cwnd based on its model of the network path and the state machine's 2407 decisions about how to probe that path. 2409 By default, BBR grows its cwnd to meet its BBR.max_inflight, which 2410 models what's required for achieving full throughput, and as such is 2411 scaled to adapt to the estimated BDP computed from its path model. 
2412 But BBR's selection of cwnd is designed to explicitly trade off among 2413 competing considerations that dynamically adapt to various 2414 conditions. So in loss recovery BBR more conservatively adjusts its 2415 sending behavior based on more recent delivery samples, and if BBR 2416 needs to re-probe the current BBR.min_rtt of the path then it cuts 2417 its cwnd accordingly. The following sections describe the various 2418 considerations that impact cwnd. 2420 4.6.4.1. Initial cwnd 2422 BBR generally uses measurements to build a model of the network path 2423 and then adapts control decisions to the path based on that model. 2424 As such, the selection of the initial cwnd is considered to be 2425 outside the scope of the BBR algorithm, since at initialization there 2426 are no measurements yet upon which BBR can operate. Thus, at 2427 initialization, BBR uses the transport sender implementation's 2428 initial congestion window (e.g. from [RFC6928] for TCP). 2430 4.6.4.2. Computing BBR.max_inflight 2432 The BBR.max_inflight is the upper bound on the volume of data BBR 2433 allows in flight. This bound is always in place, and dominates when 2434 all other considerations have been satisfied: the flow is not in loss 2435 recovery, does not need to probe BBR.min_rtt, and has accumulated 2436 confidence in its model parameters by receiving enough ACKs to 2437 gradually grow the current cwnd to meet the BBR.max_inflight. 2439 On each ACK, BBR calculates the BBR.max_inflight in 2440 BBRUpdateMaxInflight() as follows: 2442 BBRBDPMultiple(gain): 2443 if (BBR.min_rtt == Infinity) 2444 return InitialCwnd /* no valid RTT samples yet */ 2445 BBR.bdp = BBR.bw * BBR.min_rtt 2446 return gain * BBR.bdp 2448 BBRQuantizationBudget(inflight): 2449 BBRUpdateOffloadBudget() 2450 inflight = max(inflight, BBR.offload_budget) 2451 inflight = max(inflight, BBRMinPipeCwnd) 2452 if (BBR.state == ProbeBW_UP) 2453 inflight += 2 2454 return inflight 2456 BBRInflight(gain): 2457 inflight = BBRBDPMultiple(gain) 2458 return BBRQuantizationBudget(inflight) 2460 BBRUpdateMaxInflight(): 2461 BBRUpdateAggregationBudget() 2462 inflight = BBRBDPMultiple(BBR.cwnd_gain) 2463 inflight += BBR.extra_acked 2464 BBR.max_inflight = BBRQuantizationBudget(inflight) 2466 The estimated BDP term (BBR.bdp in BBRBDPMultiple()) tries to allow enough packets in flight to 2467 fully utilize the estimated BDP of the path, by allowing the flow to 2468 send at BBR.bw for a duration of BBR.min_rtt. Scaling up the BDP by 2469 BBR.cwnd_gain bounds in-flight data to a small multiple of the BDP, 2470 to handle common network and receiver behavior, such as delayed, 2471 stretched, or aggregated ACKs [A15]. The quantization budget term (BBRQuantizationBudget()) allows enough 2472 quanta in flight on the sending and receiving hosts to reach high 2473 throughput even in environments using offload mechanisms. 2475 4.6.4.3. Minimum cwnd for Pipelining 2477 For BBR.max_inflight, BBR imposes a floor of BBRMinPipeCwnd (4 2478 packets, i.e. 4 * SMSS). This floor helps ensure that even at very 2479 low BDPs, and with a transport like TCP where a receiver may ACK only 2480 every alternate SMSS of data, there are enough packets in flight to 2481 maintain full pipelining. In particular BBR tries to allow at least 2482 2 data packets in flight and ACKs for at least 2 data packets on the 2483 path from receiver to sender.
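As a concrete illustration (the numbers here are examples only and are not normative): consider a path with BBR.bw = 12.5 MBytes/sec (100 Mbps) and BBR.min_rtt = 40 ms, so BBR.bdp = 500 KBytes, or roughly 330 packets of 1514 bytes. With BBR.cwnd_gain = 2 and BBR.extra_acked = 0, BBRUpdateMaxInflight() yields approximately

   BBR.max_inflight = 2 * 500 KBytes + 0 = 1000 KBytes ~= 660 packets

The quantization budget does not bind in this case: with BBR.pacing_rate near 12.5 MBytes/sec, BBR.send_quantum is 12.5 KBytes (see the "Send Quantum" section, above), so BBR.offload_budget is 37.5 KBytes, and the BBRMinPipeCwnd floor of 4 packets is far below the BDP-derived value.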
2485 4.6.4.4. Modulating cwnd in Loss Recovery 2487 BBR interprets loss as a hint that there may be recent changes in 2488 path behavior that are not yet fully reflected in its model of the 2489 path, and thus it needs to be more conservative. 2491 Upon a retransmission timeout (RTO), BBR conservatively reduces cwnd 2492 to a value that will allow 1 SMSS to be transmitted. Then BBR 2493 gradually increases cwnd using the normal approach outlined below in 2494 "Core cwnd Adjustment Mechanism". 2496 When a BBR sender detects packet loss but there are still packets in 2497 flight, on the first round of the loss-repair process BBR temporarily 2498 reduces the cwnd to match the current delivery rate as ACKs arrive. 2499 On second and later rounds of loss repair, it ensures the sending 2500 rate never exceeds twice the current delivery rate as ACKs arrive. 2502 When BBR exits loss recovery it restores the cwnd to the "last known 2503 good" value that cwnd held before entering recovery. This applies 2504 equally whether the flow exits loss recovery because it finishes 2505 repairing all losses or because it executes an "undo" event after 2506 inferring that a loss recovery event was spurious. 2508 There are several ways to implement this high-level design for 2509 updating cwnd in loss recovery. One is as follows: 2511 Upon retransmission timeout (RTO): 2513 BBROnEnterRTO(): 2514 BBR.prior_cwnd = BBRSaveCwnd() 2515 cwnd = packets_in_flight + 1 2517 Upon entering Fast Recovery, set cwnd to the number of packets still 2518 in flight (allowing at least one for a fast retransmit): 2520 BBROnEnterFastRecovery(): 2521 BBR.prior_cwnd = BBRSaveCwnd() 2522 cwnd = packets_in_flight + max(rs.newly_acked, 1) 2523 BBR.packet_conservation = true 2525 Upon every ACK in Fast Recovery, run the following 2526 BBRModulateCwndForRecovery() steps, which help ensure packet 2527 conservation on the first round of recovery, and sending at no more 2528 than twice the current delivery rate on later rounds of recovery 2529 (given that "rs.newly_acked" packets were newly marked ACKed or 2530 SACKed and "rs.newly_lost" were newly marked lost): 2532 BBRModulateCwndForRecovery(): 2533 if (rs.newly_lost > 0) 2534 cwnd = max(cwnd - rs.newly_lost, 1) 2535 if (BBR.packet_conservation) 2536 cwnd = max(cwnd, packets_in_flight + rs.newly_acked) 2538 After one round-trip in Fast Recovery: 2540 BBR.packet_conservation = false 2542 Upon exiting loss recovery (RTO recovery or Fast Recovery), either by 2543 repairing all losses or undoing recovery, BBR restores the best-known 2544 cwnd value it had upon entering loss recovery: 2546 BBR.packet_conservation = false 2547 BBRRestoreCwnd() 2549 Note that exiting loss recovery happens during ACK processing, and at 2550 the end of ACK processing BBRBoundCwndForModel() will bound the cwnd 2551 based on the current model parameters. Thus the cwnd and pacing rate 2552 after loss recovery will generally be smaller than the values 2553 entering loss recovery. 2555 The BBRSaveCwnd() and BBRRestoreCwnd() helpers remember and 2556 restore the last-known good cwnd (the latest cwnd unmodulated by loss 2557 recovery or ProbeRTT), and are defined as follows: 2559 BBRSaveCwnd(): 2560 if (!InLossRecovery() and BBR.state != ProbeRTT) 2561 return cwnd 2562 else 2563 return max(BBR.prior_cwnd, cwnd) 2565 BBRRestoreCwnd(): 2566 cwnd = max(cwnd, BBR.prior_cwnd) 2568 4.6.4.5. 
4.6.4.5.  Modulating cwnd in ProbeRTT

   If BBR decides it needs to enter the ProbeRTT state (see the
   "ProbeRTT" section below), its goal is to quickly reduce the volume
   of in-flight data and drain the bottleneck queue, thereby allowing
   measurement of BBR.min_rtt.  To implement this mode, BBR bounds the
   cwnd to BBRProbeRTTCwnd(): a BBRProbeRTTCwndGain multiple of the
   estimated BDP, with a floor of BBRMinPipeCwnd, the minimal value
   that allows pipelining (see the "Minimum cwnd for Pipelining"
   section, above):

   BBRProbeRTTCwnd():
     probe_rtt_cwnd = BBRBDPMultiple(BBRProbeRTTCwndGain)
     probe_rtt_cwnd = max(probe_rtt_cwnd, BBRMinPipeCwnd)
     return probe_rtt_cwnd

   BBRBoundCwndForProbeRTT():
     if (BBR.state == ProbeRTT)
       cwnd = min(cwnd, BBRProbeRTTCwnd())

4.6.4.6.  Core cwnd Adjustment Mechanism

   The network path, and the traffic traveling over it, can change
   suddenly and dramatically.  To adapt to these changes smoothly and
   robustly, and to reduce packet losses in such cases, BBR uses a
   conservative strategy.  When cwnd is above the BBR.max_inflight
   derived from BBR's path model, BBR cuts the cwnd immediately to the
   BBR.max_inflight.  When cwnd is below BBR.max_inflight, BBR raises
   the cwnd gradually and cautiously, increasing cwnd by no more than
   the amount of data acknowledged (cumulatively or selectively) upon
   each ACK.

   Specifically, on each ACK that acknowledges "rs.newly_acked" packets
   as newly ACKed or SACKed, BBR runs the following BBRSetCwnd() steps
   to update cwnd:

   BBRSetCwnd():
     BBRUpdateMaxInflight()
     BBRModulateCwndForRecovery()
     if (!BBR.packet_conservation) {
       if (BBR.filled_pipe)
         cwnd = min(cwnd + rs.newly_acked, BBR.max_inflight)
       else if (cwnd < BBR.max_inflight || C.delivered < InitialCwnd)
         cwnd = cwnd + rs.newly_acked
       cwnd = max(cwnd, BBRMinPipeCwnd)
     }
     BBRBoundCwndForProbeRTT()
     BBRBoundCwndForModel()

   There are several considerations embodied in the logic above.  If
   BBR has measured enough samples to achieve confidence that it has
   filled the pipe (see the description of BBR.filled_pipe in the
   "Startup" section below), then it increases its cwnd based on the
   number of packets delivered, while bounding its cwnd to be no larger
   than the BBR.max_inflight adapted to the estimated BDP.  Otherwise,
   if the cwnd is below the BBR.max_inflight, or the sender has had so
   little data marked as delivered (less than InitialCwnd) that it does
   not yet judge its BBR.max_bw estimate and BBR.max_inflight as
   useful, then it increases cwnd without bounding it to be below
   BBR.max_inflight.  Finally, BBR imposes a floor of BBRMinPipeCwnd in
   order to allow pipelining even with small BDPs (see the "Minimum
   cwnd for Pipelining" section, above).

4.6.4.7.  Bounding cwnd Based on Recent Congestion

   Finally, BBR bounds the cwnd based on recent congestion, as outlined
   in the "Volume Cap" column of the table in the "Summary of Control
   Behavior in the State Machine" section:

   BBRBoundCwndForModel():
     cap = Infinity
     if (IsInAProbeBWState() and
         BBR.state != ProbeBW_CRUISE)
       cap = BBR.inflight_hi
     else if (BBR.state == ProbeRTT or
              BBR.state == ProbeBW_CRUISE)
       cap = BBRInflightWithHeadroom()

     /* apply inflight_lo (possibly infinite): */
     cap = min(cap, BBR.inflight_lo)
     cap = max(cap, BBRMinPipeCwnd)
     cwnd = min(cwnd, cap)
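   To make the ordering of these steps concrete, here is a compact,
   non-normative Python sketch of the per-ACK cwnd update path:
   BBRSetCwnd()-style growth followed by the ProbeRTT bound and the
   model-based cap.  The bbr argument is a hypothetical object assumed
   to already carry the quantities computed in the preceding sections
   (cwnd, max_inflight, filled_pipe, packet_conservation, delivered,
   initial_cwnd, state, probe_rtt_cwnd, inflight_hi, inflight_lo,
   inflight_with_headroom, and the ProbeBW flags); BBRUpdateMaxInflight()
   and BBRModulateCwndForRecovery() are assumed to have run first, as
   in the sketches above.

     import math

     BBRMinPipeCwnd = 4   # packets

     def set_cwnd(bbr, newly_acked):
         # Growth phase of BBRSetCwnd(), skipped during packet
         # conservation.
         if not bbr.packet_conservation:
             if bbr.filled_pipe:
                 # Confident in the model: grow toward max_inflight,
                 # but no further.
                 bbr.cwnd = min(bbr.cwnd + newly_acked, bbr.max_inflight)
             elif (bbr.cwnd < bbr.max_inflight or
                   bbr.delivered < bbr.initial_cwnd):
                 # Not yet confident: grow by the amount newly delivered.
                 bbr.cwnd = bbr.cwnd + newly_acked
             bbr.cwnd = max(bbr.cwnd, BBRMinPipeCwnd)

         # BBRBoundCwndForProbeRTT(): keep inflight low while probing
         # min_rtt.
         if bbr.state == "ProbeRTT":
             bbr.cwnd = min(bbr.cwnd, bbr.probe_rtt_cwnd)

         # BBRBoundCwndForModel(): cap cwnd based on recent congestion.
         cap = math.inf
         if bbr.in_probe_bw and not bbr.in_probe_bw_cruise:
             cap = bbr.inflight_hi
         elif bbr.state == "ProbeRTT" or bbr.in_probe_bw_cruise:
             cap = bbr.inflight_with_headroom
         cap = min(cap, bbr.inflight_lo)
         cap = max(cap, BBRMinPipeCwnd)
         bbr.cwnd = min(bbr.cwnd, cap)

   A caller might construct bbr as, for example, a
   types.SimpleNamespace with those fields filled in from its own
   connection state; nothing in the sketch depends on a particular
   transport.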
5.  Implementation Status

   This section records the status of known implementations of the
   algorithm defined by this specification at the time of posting of
   this Internet-Draft, and is based on a proposal described in
   [RFC7942].  The description of implementations in this section is
   intended to assist the IETF in its decision processes in progressing
   drafts to RFCs.  Please note that the listing of any individual
   implementation here does not imply endorsement by the IETF.
   Furthermore, no effort has been spent to verify the information
   presented here that was supplied by IETF contributors.  This is not
   intended as, and must not be construed to be, a catalog of available
   implementations or their features.  Readers are advised to note that
   other implementations may exist.

   According to [RFC7942], "this will allow reviewers and working
   groups to assign due consideration to documents that have the
   benefit of running code, which may serve as evidence of valuable
   experimentation and feedback that have made the implemented
   protocols more mature.  It is up to the individual working groups to
   use this information as they see fit".

   As of the time of writing, the following implementations of BBR have
   been publicly released:

   *  Linux TCP

      -  Source code URLs:

         o  https://github.com/google/bbr/blob/v2alpha/README.md

         o  https://github.com/google/bbr/blob/v2alpha/net/ipv4/tcp_bbr2.c

      -  Source: Google

      -  Maturity: production

      -  License: dual-licensed: GPLv2 / BSD

      -  Contact: https://groups.google.com/d/forum/bbr-dev

      -  Last updated: August 21, 2021

   *  QUIC

      -  Source code URLs:

         o  https://cs.chromium.org/chromium/src/net/third_party/quiche/src/quic/core/congestion_control/bbr2_sender.cc

         o  https://cs.chromium.org/chromium/src/net/third_party/quiche/src/quic/core/congestion_control/bbr2_sender.h

      -  Source: Google

      -  Maturity: production

      -  License: BSD-style

      -  Contact: https://groups.google.com/d/forum/bbr-dev

      -  Last updated: October 21, 2021

6.  Security Considerations

   This proposal makes no changes to the underlying security of
   transport protocols or congestion control algorithms.  BBR shares
   the same security considerations as the existing standard congestion
   control algorithm [RFC5681].

7.  IANA Considerations

   This document has no IANA actions.  That phrase, suggested by
   [RFC5226], is used here because BBR does not modify or extend the
   wire format of any network protocol, nor does it add new
   dependencies on assigned numbers.  BBR involves only a change to the
   congestion control algorithm of a transport sender, and does not
   involve changes in the network, the receiver, or any network
   protocol.

   Note to RFC Editor: this section may be removed on publication as an
   RFC.

8.  Acknowledgments

   The authors are grateful to Len Kleinrock for his work on the theory
   underlying congestion control.  We are indebted to Larry Brakmo for
   pioneering work on the Vegas [BP95] and New Vegas [B15] congestion
   control algorithms, which presaged many elements of BBR, and for his
   advice and guidance during BBR's early development.
   The authors would also like to thank Kevin Yang, Priyaranjan Jha,
   Yousuk Seung, and Luke Hsiao for their work on TCP BBR; Jana
   Iyengar, Victor Vasiliev, and Bin Wu for their work on QUIC BBR; and
   Matt Mathis for his research work on the BBR algorithm and its
   implications [MM19].  We would also like to thank C. Stephen Gunn,
   Eric Dumazet, Nandita Dukkipati, Pawel Jurczyk, Biren Roy, David
   Wetherall, Amin Vahdat, Leonidas Kontothanassis, and the YouTube,
   google.com, Bandwidth Enforcer, and Google SRE teams for their
   invaluable help and support.  We would like to thank Randall R.
   Stewart, Jim Warner, Loganaden Velvindron, Hiren Panchasara, and
   Adrian Zapletal for feedback and suggestions on earlier versions of
   this document.

9.  References

9.1.  Normative References

   [RFC793]   Postel, J., "Transmission Control Protocol", September
              1981.

   [RFC2018]  Mathis, M. and J. Mahdavi, "TCP Selective Acknowledgment
              Options", RFC 2018, October 1996.

   [RFC7323]  Borman, D., Braden, B., Jacobson, V., and R.
              Scheffenegger, "TCP Extensions for High Performance",
              September 2014.

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", RFC 2119, March 1997.

   [RFC5226]  Narten, T. and H. Alvestrand, "Guidelines for Writing an
              IANA Considerations Section in RFCs", May 2008.

   [RFC6298]  Paxson, V., "Computing TCP's Retransmission Timer",
              RFC 6298, June 2011.

   [RFC5681]  Allman, M., Paxson, V., and E. Blanton, "TCP Congestion
              Control", RFC 5681, September 2009.

   [RFC7942]  Sheffer, Y. and A. Farrel, "Improving Awareness of
              Running Code: The Implementation Status Section", July
              2016.

   [RFC8312]  Rhee, I., Xu, L., Ha, S., Zimmermann, A., Eggert, L., and
              R. Scheffenegger, "CUBIC for Fast Long-Distance
              Networks", February 2018.

   [RFC8985]  Cheng, Y., Cardwell, N., Dukkipati, N., and P. Jha, "The
              RACK-TLP Loss Detection Algorithm for TCP", RFC 8985,
              DOI 10.17487/RFC8985, February 2021.

   [RFC9000]  Iyengar, J., Ed. and M. Thomson, Ed., "QUIC: A UDP-Based
              Multiplexed and Secure Transport", RFC 9000,
              DOI 10.17487/RFC9000, May 2021.

   [RFC4340]  Kohler, E., Handley, M., and S. Floyd, "Datagram
              Congestion Control Protocol (DCCP)", RFC 4340,
              DOI 10.17487/RFC4340, March 2006.

9.2.  Informative References

   [draft-cheng-iccrg-delivery-rate-estimation]
              Cheng, Y., Cardwell, N., Hassas Yeganeh, S., and V.
              Jacobson, "Delivery Rate Estimation", Work in Progress,
              Internet-Draft, draft-cheng-iccrg-delivery-rate-
              estimation, November 2021.

   [CCGHJ16]  Cardwell, N., Cheng, Y., Gunn, C., Hassas Yeganeh, S.,
              and V. Jacobson, "BBR: Congestion-Based Congestion
              Control", ACM Queue, October 2016.

   [CCGHJ17]  Cardwell, N., Cheng, Y., Gunn, C., Hassas Yeganeh, S.,
              and V. Jacobson, "BBR: Congestion-Based Congestion
              Control", Communications of the ACM, February 2017.

   [MM19]     Mathis, M. and J. Mahdavi, "Deprecating The TCP
              Macroscopic Model", Computer Communication Review, vol.
              49, no. 5, pp. 63-68, October 2019.

   [BBRStartupPacingGain]
              Cardwell, N., Cheng, Y., Hassas Yeganeh, S., and V.
              Jacobson, "BBR Startup Pacing Gain: a Derivation", June
              2018.

   [BBRDrainPacingGain]
              Cardwell, N., Cheng, Y., Hassas Yeganeh, S., and V.
              Jacobson, "BBR Drain Pacing Gain: a Derivation",
              September 2021.

   [draft-romo-iccrg-ccid5]
              Romo, N., Kim, J., and M. Amend, "Profile for Datagram
              Congestion Control Protocol (DCCP) Congestion Control ID
              5", Work in Progress, Internet-Draft, draft-romo-iccrg-
              ccid5, 25 October 2021.

   [DC13]     Dumazet, E. and Y. Cheng, "TSO, fair queuing, pacing:
              three's a charm", IETF 88, November 2013.

   [A15]      Abrahamsson, M., "TCP ACK suppression", IETF AQM mailing
              list, November 2015.

   [Jac88]    Jacobson, V., "Congestion Avoidance and Control", SIGCOMM
              1988, Computer Communication Review, vol. 18, no. 4, pp.
              314-329, August 1988.

   [Jac90]    Jacobson, V., "Modified TCP Congestion Avoidance
              Algorithm", end2end-interest mailing list, April 1990.

   [BP95]     Brakmo, L. and L. Peterson, "TCP Vegas: end-to-end
              congestion avoidance on a global Internet", IEEE Journal
              on Selected Areas in Communications 13(8): 1465-1480,
              October 1995.

   [B15]      Brakmo, L., "TCP-NV: An Update to TCP-Vegas", August
              2015.

   [WS95]     Wright, G. and W. Stevens, "TCP/IP Illustrated, Volume 2:
              The Implementation", Addison-Wesley, 1995.

   [HRX08]    Ha, S., Rhee, I., and L. Xu, "CUBIC: A New TCP-Friendly
              High-Speed TCP Variant", ACM SIGOPS Operating System
              Review, 2008.

   [GK81]     Gail, R. and L. Kleinrock, "An Invariant Property of
              Computer Network Power", Proceedings of the International
              Conference on Communications, June 1981.

   [K79]      Kleinrock, L., "Power and deterministic rules of thumb
              for probabilistic problems in computer communications",
              Proceedings of the International Conference on
              Communications, 1979.

Authors' Addresses

   Neal Cardwell
   Google
   Email: ncardwell@google.com

   Yuchung Cheng
   Google
   Email: ycheng@google.com

   Soheil Hassas Yeganeh
   Google
   Email: soheil@google.com

   Ian Swett
   Google
   Email: ianswett@google.com

   Van Jacobson
   Google
   Email: vanj@google.com