idnits 2.17.1 draft-cheng-iccrg-delivery-rate-estimation-02.txt: Checking boilerplate required by RFC 5378 and the IETF Trust (see https://trustee.ietf.org/license-info): ---------------------------------------------------------------------------- No issues found here. Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt: ---------------------------------------------------------------------------- No issues found here. Checking nits according to https://www.ietf.org/id-info/checklist : ---------------------------------------------------------------------------- ** The abstract seems to contain references ([RFC793], [RFC9000]), which it shouldn't. Please replace those with straight textual mentions of the documents in question. ** The document seems to lack a both a reference to RFC 2119 and the recommended RFC 2119 boilerplate, even if it appears to use RFC 2119 keywords. RFC 2119 keyword, line 141: '...f data delivered MAY be tracked in uni...' RFC 2119 keyword, line 331: '...owing parameters MUST be tracked to im...' RFC 2119 keyword, line 536: '... it is RECOMMENDED that the per-pack...' RFC 2119 keyword, line 550: '... is RECOMMENDED that applications or...' RFC 2119 keyword, line 575: '..., then the following approaches MAY be...' (2 more instances...) Miscellaneous warnings: ---------------------------------------------------------------------------- == The copyright year in the IETF Trust and authors Copyright Line does not match the current year -- The document date (7 March 2022) is 780 days in the past. Is this intentional? -- Found something which looks like a code comment -- if you have code sections in the document, please surround them with '
' and
     '' lines.


  Checking references for intended status: Experimental
  ----------------------------------------------------------------------------

  ** Obsolete normative reference: RFC  793 (Obsoleted by RFC 9293)


     Summary: 3 errors (**), 0 flaws (~~), 1 warning (==), 2 comments (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.

--------------------------------------------------------------------------------


2	Internet Congestion Control Research Group                      Y. Cheng
3	Internet-Draft                                               N. Cardwell
4	Intended status: Experimental                          S. Hassas Yeganeh
5	Expires: 8 September 2022                                    V. Jacobson
6	                                                                  Google
7	                                                            7 March 2022

9	                        Delivery Rate Estimation
10	             draft-cheng-iccrg-delivery-rate-estimation-02

12	Abstract

14	   This document describes a generic algorithm for a transport protocol
15	   sender to estimate the current delivery rate of its data.  At a high
16	   level, the algorithm estimates the rate at which the network
17	   delivered the most recent flight of outbound data packets for a
18	   single flow.  In addition, it tracks whether the rate sample was
19	   application-limited, meaning the transmission rate was limited by the
20	   sending application rather than the congestion control algorithm.
21	   This algorithm can be implemented in any transport protocol that
22	   supports packet-delivery acknowledgment (thus far, open source
23	   implementations are available for TCP [RFC793] and QUIC [RFC9000]).

25	Status of This Memo

27	   This Internet-Draft is submitted in full conformance with the
28	   provisions of BCP 78 and BCP 79.

30	   Internet-Drafts are working documents of the Internet Engineering
31	   Task Force (IETF).  Note that other groups may also distribute
32	   working documents as Internet-Drafts.  The list of current Internet-
33	   Drafts is at https://datatracker.ietf.org/drafts/current/.

35	   Internet-Drafts are draft documents valid for a maximum of six months
36	   and may be updated, replaced, or obsoleted by other documents at any
37	   time.  It is inappropriate to use Internet-Drafts as reference
38	   material or to cite them other than as "work in progress."

40	   This Internet-Draft will expire on 8 September 2022.

42	Copyright Notice

44	   Copyright (c) 2022 IETF Trust and the persons identified as the
45	   document authors.  All rights reserved.

47	   This document is subject to BCP 78 and the IETF Trust's Legal
48	   Provisions Relating to IETF Documents (https://trustee.ietf.org/
49	   license-info) in effect on the date of publication of this document.
50	   Please review these documents carefully, as they describe your rights
51	   and restrictions with respect to this document.  Code Components
52	   extracted from this document must include Revised BSD License text as
53	   described in Section 4.e of the Trust Legal Provisions and are
54	   provided without warranty as described in the Revised BSD License.

56	Table of Contents

58	   1.  Introduction  . . . . . . . . . . . . . . . . . . . . . . . .   3
59	   2.  Algorithm Overview  . . . . . . . . . . . . . . . . . . . . .   3
60	     2.1.  Requirements  . . . . . . . . . . . . . . . . . . . . . .   3
61	     2.2.  Estimating Delivery Rate  . . . . . . . . . . . . . . . .   3
62	       2.2.1.  ACK Rate  . . . . . . . . . . . . . . . . . . . . . .   4
63	       2.2.2.  Compression and Aggregation . . . . . . . . . . . . .   5
64	       2.2.3.  Send Rate . . . . . . . . . . . . . . . . . . . . . .   5
65	       2.2.4.  Delivery Rate . . . . . . . . . . . . . . . . . . . .   6
66	     2.3.  Tracking application-limited phases . . . . . . . . . . .   6
67	   3.  Detailed Algorithm  . . . . . . . . . . . . . . . . . . . . .   7
68	     3.1.  Variables . . . . . . . . . . . . . . . . . . . . . . . .   7
69	       3.1.1.  Per-connection (C) state  . . . . . . . . . . . . . .   7
70	       3.1.2.  Per-packet (P) state  . . . . . . . . . . . . . . . .   8
71	       3.1.3.  Rate Sample (rs) Output . . . . . . . . . . . . . . .   8
72	     3.2.  Transmitting or retransmitting a data packet  . . . . . .   9
73	     3.3.  Upon receiving an ACK . . . . . . . . . . . . . . . . . .  10
74	     3.4.  Detecting application-limited phases  . . . . . . . . . .  11
75	   4.  Discussion  . . . . . . . . . . . . . . . . . . . . . . . . .  12
76	     4.1.  Offload Mechanisms  . . . . . . . . . . . . . . . . . . .  12
77	     4.2.  Impact of ACK losses  . . . . . . . . . . . . . . . . . .  12
78	     4.3.  Impact of packet reordering . . . . . . . . . . . . . . .  13
79	     4.4.  Impact of packet loss and retransmissions . . . . . . . .  13
80	     4.5.  Connections without SACK support  . . . . . . . . . . . .  13
81	   5.  Implementation Status . . . . . . . . . . . . . . . . . . . .  14
82	   6.  Security Considerations . . . . . . . . . . . . . . . . . . .  15
83	   7.  IANA Considerations . . . . . . . . . . . . . . . . . . . . .  15
84	   8.  Acknowledgments . . . . . . . . . . . . . . . . . . . . . . .  15
85	   9.  References  . . . . . . . . . . . . . . . . . . . . . . . . .  15
86	     9.1.  Normative References  . . . . . . . . . . . . . . . . . .  15
87	     9.2.  Informative References  . . . . . . . . . . . . . . . . .  16
88	   Authors' Addresses  . . . . . . . . . . . . . . . . . . . . . . .  16

90	1.  Introduction

92	   This document describes a generic algorithm for a transport protocol
93	   sender to estimate the current delivery rate of its data on the fly.
94	   This technique has been used for a congestion control algorithm that
95	   relies on fresh, reliable, and inexpensive delivery rate information
96	   [draft-cardwell-iccrg-bbr-congestion-control] [CCGHJ17].

98	   At a high level, the algorithm estimates the rate at which the
99	   network delivered the most recent flight of outbound data packets for
100	   a single flow.  In addition, it tracks whether the rate sample was
101	   application-limited, meaning the transmission rate was limited by the
102	   sending application rather than the congestion control algorithm.

104	   Each acknowledgment that cumulatively or selectively acknowledges
105	   that the network has delivered new data produces a rate sample which
106	   records the amount of data delivered over the time interval between
107	   the transmission of a data packet and the acknowledgment of that
108	   packet.  The samples reflect the recent goodput through some
109	   bottleneck, which may reside either in the network or on the end
110	   hosts (sender or receiver).

112	2.  Algorithm Overview

114	2.1.  Requirements

116	   This algorithm can be implemented in any transport protocol that
117	   supports packet-delivery acknowledgment (so far, implementations are
118	   available for TCP [RFC793] and QUIC [RFC9000]).  This algorithm
119	   requires a small amount of added logic on the sender, and requires
120	   that the sender maintain a small amount of additional per-packet
121	   state for packets sent but not yet delivered.  In the most general
122	   case it requires high-precision (microsecond-granularity or better)
123	   timestamps on the sender (though millisecond-granularity may suffice
124	   for lower bandwidths).  It does not require any receiver or network
125	   changes.  While selective acknowledgments for out-of-order data
126	   (e.g., [RFC2018]) are not required, such a mechanism is highly
127	   recommended for accurate estimation during reordering and loss
128	   recovery phases.

130	2.2.  Estimating Delivery Rate

132	   A delivery rate sample records the estimated rate at which the
133	   network delivered packets for a single flow, calculated over the time
134	   interval between the transmission of a data packet and the
135	   acknowledgment of that packet.  Since the rate samples only include
136	   packets actually cumulatively and/or selectively acknowledged, the
137	   sender knows the exact octets that were delivered to the receiver
138	   (not lost), and the sender can compute an estimate of a bottleneck
139	   delivery rate over that time interval.

141	   The amount of data delivered MAY be tracked in units of either octets
142	   or packets.  Tracking data in units of octets is more accurate, since
143	   packet sizes can vary.  But for some purposes, including congestion
144	   control, tracking data in units of packets may suffice.

146	2.2.1.  ACK Rate

148	   First, consider the rate at which data is acknowledged by the
149	   receiver.  In this algorithm, the computation of the ACK rate models
150	   the average slope of a hypothetical "delivered" curve that tracks the
151	   cumulative quantity of data delivered so far on the Y axis, and time
152	   elapsed on the X axis.  Since ACKs arrive in discrete events, this
153	   "delivered" curve forms a step function, where each ACK causes a
154	   discrete increase in the "delivered" count that causes a vertical
155	   upward step up in the curve.  This "ack_rate" computation is the
156	   average slope of the "delivered" step function, as measured from the
157	   "knee" of the step (ACK) preceding the transmit to the "knee" of the
158	   step (ACK) for packet P.

160	   Given this model, the ack rate sample "slope" is computed as the
161	   ratio between the amount of data marked as delivered over this time
162	   interval, and the time over which it is marked as delivered:

164	     ack_rate = data_acked / ack_elapsed

166	   To calculate the amount of data ACKed over the interval, the sender
167	   records in per-packet state "P.delivered", the amount of data that
168	   had been marked delivered before transmitting packet P, and then
169	   records how much data had been marked delivered by the time the ACK
170	   for the packet arrives (in "C.delivered"), and computes the
171	   difference:

173	     data_acked = C.delivered - P.delivered

175	   To compute the time interval, "ack_elapsed", one might imagine that
176	   it would be feasible to use the round-trip time (RTT) of the packet.
177	   But it is not safe to simply calculate a bandwidth estimate by using
178	   the time between the transmit of a packet and the acknowledgment of
179	   that packet.  Transmits and ACKs can happen out of phase with each
180	   other, clocked in separate processes.  In general, transmissions
181	   often happen at some point later than the most recent ACK, due to
182	   processing or pacing delays.  Because of this effect, drastic over-
183	   estimates can happen if a sender were to attempt to estimate
184	   bandwidth by using the round-trip time.

186	   This document specifies the following approach for computing
187	   "ack_elapsed".  The starting time is "P.delivered_time", the time of
188	   the delivery curve "knee" from the ACK preceding the transmit.  The
189	   ending time is "C.delivered_time", the time of the delivery curve
190	   "knee" from the ACK for P.  Then we compute "ack_elapsed" as:

192	     ack_elapsed = C.delivered_time - P.delivered_time

194	   This yields our equation for computing the ACK rate, as the "slope"
195	   from the "knee" preceding the transmit to the "knee" at ACK:

197	     ack_rate = data_acked / ack_elapsed
198	     ack_rate = (C.delivered - P.delivered) /
199	                (C.delivered_time - P.delivered_time)

201	2.2.2.  Compression and Aggregation

203	   For computing the delivery_rate, the sender prefers ack_rate, the
204	   rate at which packets were acknowledged, since this usually the most
205	   reliable metric.  However, this approach of directly using "ack_rate"
206	   faces a challenge when used with paths featuring aggregation,
207	   compression, or ACK decimation, which are prevalent [A15].  In such
208	   cases, ACK arrivals can temporarily make it appear as if data packets
209	   were delivered much faster than the bottleneck rate.  To filter out
210	   such implausible ack_rate samples, we consider the send rate for each
211	   flight of data, as follows.

213	2.2.3.  Send Rate

215	   The sender calculates the send rate, "send_rate", for a flight of
216	   data as follows.  Define "P.first_sent_time" as the time of the first
217	   send in a flight of data, and "P.sent_time" as the time the final
218	   send in that flight of data (the send that transmits packet "P").
219	   The elapsed time for sending the flight is:

221	     send_elapsed = (P.sent_time - P.first_sent_time)

223	   Then we calculate the send_rate as:

225	     send_rate = data_acked / send_elapsed

227	   Using our "delivery" curve model above, the send_rate can be viewed
228	   as the average slope of a "send" curve that traces the amount of data
229	   sent on the Y axis, and the time elapsed on the X axis: the average
230	   slope of the transmission of this flight of data.

232	2.2.4.  Delivery Rate

234	   Since it is physically impossible to have data delivered faster than
235	   it is sent in a sustained fashion, when the estimator notices that
236	   the ack_rate for a flight is faster than the send rate for the
237	   flight, it filters out the implausible ack_rate by capping the
238	   delivery rate sample to be no higher than the send rate.

240	   More precisely, over the interval between each transmission and
241	   corresponding ACK, the sender calculates a delivery rate sample,
242	   "delivery_rate", using the minimum of the rate at which packets were
243	   acknowledged or the rate at which they were sent:

245	     delivery_rate = min(send_rate, ack_rate)

247	   Since ack_rate and send_rate both have data_acked as a numerator,
248	   this can be computed more efficiently with a single division (instead
249	   of two), as follows:

251	     delivery_elapsed = max(ack_elapsed, send_elapsed)
252	     delivery_rate = data_acked / delivery_elapsed

254	2.3.  Tracking application-limited phases

256	   In application-limited phases the transmission rate is limited by the
257	   application rather than the congestion control algorithm.  Modern
258	   transport protocol connections are often application-limited, either
259	   due to request/response workloads (e.g.  Web traffic, RPC traffic) or
260	   because the sender transmits data in chunks (e.g. adaptive streaming
261	   video).

263	   Knowing whether a delivery rate sample was application-limited is
264	   crucial for congestion control algorithms and applications to use the
265	   estimated delivery rate samples properly.  For example, congestion
266	   control algorithms may not want to react to a delivery rate that is
267	   lower simply because the sender is application-limited; for
268	   congestion control the key metric is the rate at which the network
269	   path can deliver data, and not simply the rate at which the
270	   application happens to be transmitting data at any moment.

272	   To track this, the estimator marks a bandwidth sample as application-
273	   limited if there was some moment during the sampled flight of data
274	   packets when there was no data ready to send.

276	   An application-limited phase starts when the sending application
277	   requests to send new data, or the connection's retransmission
278	   mechanisms decide to retransmit data, and the connection meets all of
279	   the following conditions:

281	   1.  The transport send buffer has less than one SMSS of unsent data
282	       available to send.

284	   2.  The sending flow is not currently in the process of transmitting
285	       a packet.

287	   3.  The amount of data considered in flight is less than the
288	       congestion window (cwnd).

290	   4.  All the packets considered lost have been retransmitted.

292	   If these conditions are all met then the sender has run out of data
293	   to feed the network.  This would effectively create a "bubble" of
294	   idle time in the data pipeline.  This idle time means that any
295	   delivery rate sample obtained from this data packet, and any rate
296	   sample from a packet that follows it in the next round trip, is going
297	   to be an application-limited sample that potentially underestimates
298	   the true available bandwidth.  Thus, when the algorithm marks a
299	   transport flow as application-limited, it marks all bandwidth samples
300	   for the next round trip as application-limited (at which point, the
301	   "bubble" can be said to have exited the data pipeline).

303	3.  Detailed Algorithm

305	3.1.  Variables

307	3.1.1.  Per-connection (C) state

309	   This algorithm requires the following new state variables for each
310	   transport connection:

312	   C.delivered: The total amount of data (measured in octets or in
313	   packets) delivered so far over the lifetime of the transport
314	   connection.  This does not include pure ACK packets.

316	   C.delivered_time: The wall clock time when C.delivered was last
317	   updated.

319	   C.first_sent_time: If packets are in flight, then this holds the send
320	   time of the packet that was most recently marked as delivered.  Else,
321	   if the connection was recently idle, then this holds the send time of
322	   most recently sent packet.

324	   C.app_limited: The index of the last transmitted packet marked as
325	   application-limited, or 0 if the connection is not currently
326	   application-limited.

328	   We also assume that the transport protocol sender implementation
329	   tracks the following state per connection.  If the following state
330	   variables are not tracked by an existing implementation, all the
331	   following parameters MUST be tracked to implement this algorithm:

333	   C.write_seq: The data sequence number one higher than that of the
334	   last octet queued for transmission in the transport layer write
335	   buffer.

337	   C.pending_transmissions: The number of bytes queued for transmission
338	   on the sending host at layers lower than the transport layer (i.e.
339	   network layer, traffic shaping layer, network device layer).

341	   C.lost_out: The number of packets in the current outstanding window
342	   that are marked as lost.

344	   C.retrans_out: The number of packets in the current outstanding
345	   window that are being retransmitted.

347	   C.pipe: The sender's estimate of the amount of data outstanding in
348	   the network (measured in octets or packets).  This includes data
349	   packets in the current outstanding window that are being transmitted
350	   or retransmitted and have not been SACKed or marked lost (e.g. "pipe"
351	   from [RFC6675]).  This does not include pure ACK packets.

353	3.1.2.  Per-packet (P) state

355	   This algorithm requires the following new state variables for each
356	   packet that has been transmitted but not yet ACKed or SACKed:

358	   P.delivered: C.delivered when the packet was sent from transport
359	   connection C.

361	   P.delivered_time: C.delivered_time when the packet was sent.

363	   P.first_sent_time: C.first_sent_time when the packet was sent.

365	   P.is_app_limited: true if C.app_limited was non-zero when the packet
366	   was sent, else false.

368	   P.sent_time: The time when the packet was sent.

370	3.1.3.  Rate Sample (rs) Output

372	   This algorithm provides its output in a RateSample structure rs,
373	   containing the following fields:

375	   rs.delivery_rate: The delivery rate sample (in most cases
376	   rs.delivered / rs.interval).

378	   rs.is_app_limited: The P.is_app_limited from the most recent packet
379	   delivered; indicates whether the rate sample is application-limited.

381	   rs.interval: The length of the sampling interval.

383	   rs.delivered: The amount of data marked as delivered over the
384	   sampling interval.

386	   rs.prior_delivered: The P.delivered count from the most recent packet
387	   delivered.

389	   rs.prior_time: The P.delivered_time from the most recent packet
390	   delivered.

392	   rs.send_elapsed: Send time interval calculated from the most recent
393	   packet delivered (see the "Send Rate" section above).

395	   rs.ack_elapsed: ACK time interval calculated from the most recent
396	   packet delivered (see the "ACK Rate" section above).

398	3.2.  Transmitting or retransmitting a data packet

400	   Upon transmitting or retransmitting a data packet, the sender
401	   snapshots the current delivery information in per-packet state.  This
402	   will allow the sender to generate a rate sample later, in the
403	   UpdateRateSample() step, when the packet is (S)ACKed.

405	   If there are packets already in flight, then we need to start
406	   delivery rate samples from the time we received the most recent ACK,
407	   to try to ensure that we include the full time the network needs to
408	   deliver all in-flight packets.  If there are no packets in flight
409	   yet, then we can start the delivery rate interval at the current
410	   time, since we know that any ACKs after now indicate that the network
411	   was able to deliver those packets completely in the sampling interval
412	   between now and the next ACK.

414	   Upon each packet transmission, the sender executes the following
415	   steps:

417	     SendPacket(Packet P):
418	       if (SND.NXT == SND.UNA)  /* no packets in flight yet? */
419	         C.first_sent_time  = C.delivered_time = Now()
420	       P.first_sent_time = C.first_sent_time
421	       P.delivered_time  = C.delivered_time
422	       P.delivered       = C.delivered
423	       P.is_app_limited  = (C.app_limited != 0)

425	3.3.  Upon receiving an ACK

427	   When an ACK arrives, the sender invokes GenerateRateSample() to fill
428	   in a rate sample.  For each packet that was newly SACKed or ACKed,
429	   UpdateRateSample() updates the rate sample based on a snapshot of
430	   connection delivery information from the time at which the packet was
431	   last transmitted.  UpdateRateSample() is invoked multiple times when
432	   a stretched ACK acknowledges multiple data packets.  In this case we
433	   use the information from the most recently sent packet, i.e., the
434	   packet with the highest "P.delivered" value.

436	     /* Upon receiving ACK, fill in delivery rate sample rs. */
437	     GenerateRateSample(RateSample rs):
438	       for each newly SACKed or ACKed packet P
439	         UpdateRateSample(P, rs)

441	       /* Clear app-limited field if bubble is ACKed and gone. */
442	       if (C.app_limited and C.delivered > C.app_limited)
443	         C.app_limited = 0

445	       if (rs.prior_time == 0)
446	         return false  /* nothing delivered on this ACK */

448	       /* Use the longer of the send_elapsed and ack_elapsed */
449	       rs.interval = max(rs.send_elapsed, rs.ack_elapsed)

451	       rs.delivered = C.delivered - rs.prior_delivered

453	       /* Normally we expect interval >= MinRTT.
454	        * Note that rate may still be overestimated when a spuriously
455	        * retransmitted skb was first (s)acked because "interval"
456	        * is under-estimated (up to an RTT). However, continuously
457	        * measuring the delivery rate during loss recovery is crucial
458	        * for connections that suffer heavy or prolonged losses.
459	        */
460	       if (rs.interval <  MinRTT(tp))
461	         rs.interval = -1
462	         return false  /* no reliable sample */

464	       if (rs.interval != 0)
465	         rs.delivery_rate = rs.delivered / rs.interval

467	       return true;  /* we filled in rs with a rate sample */

469	     /* Update rs when a packet is SACKed or ACKed. */
470	     UpdateRateSample(Packet P, RateSample rs):
471	       if P.delivered_time == 0
472	         return /* P already SACKed */

474	       C.delivered += P.data_length
475	       C.delivered_time = Now()

477	       /* Update info using the newest packet: */
478	       if (P.delivered > rs.prior_delivered)
479	         rs.prior_delivered  = P.delivered
480	         rs.prior_time       = P.delivered_time
481	         rs.is_app_limited   = P.is_app_limited
482	         rs.send_elapsed     = P.sent_time - P.first_sent_time
483	         rs.ack_elapsed      = C.delivered_time - P.delivered_time
484	         C.first_sent_time   = P.sent_time

486	       /* Mark the packet as delivered once it's SACKed to
487	        * avoid being used again when it's cumulatively acked.
488	        */
489	       P.delivered_time = 0

491	3.4.  Detecting application-limited phases

493	   An application-limited phase starts when the connection decides to
494	   send more data, at a point in time when the connection has run out of
495	   data.  Some decisions to send more data are triggered by the
496	   application writing more data to the connection, and some are
497	   triggered by loss detection (during ACK processing or upon the
498	   triggering of a timer) estimating that some sequence ranges need to
499	   be retransmitted.  To detect all such cases, the algorithm calls
500	   CheckIfApplicationLimited() to check for application-limited behavior
501	   in the following situations:

503	   *  The sending application asks the transport layer to send more
504	      data; i.e., upon each write from the application, before new
505	      application data is enqueued in the transport send buffer or
506	      transmitted.

508	   *  At the beginning of ACK processing, before updating the estimated
509	      number of packets in flight, and before congestion control
510	      modifies the cwnd or pacing rate.

512	   *  At the beginning of connection timer processing, for all timers
513	      that might result in the transmission of one or more data
514	      segments.  For example: RTO timers, TLP timers, RACK reordering
515	      timers, or Zero Window Probe timers.

517	   When checking for application-limited behavior, the connection checks
518	   all the conditions previously described in the "Tracking application-
519	   limited phases" section, and if all are met then it marks the
520	   connection as application-limited:

522	     CheckIfApplicationLimited():
523	       if (C.write_seq - SND.NXT < SND.MSS and
524	           C.pending_transmissions == 0 and
525	           C.pipe < cwnd and
526	           C.lost_out <= C.retrans_out)
527	         C.app_limited = (C.delivered + C.pipe) ? : 1

529	4.  Discussion

531	4.1.  Offload Mechanisms

533	   If a transport sender implementation uses an offload mechanism (such
534	   as TSO, GSO, etc.) to combine multiple SMSS of data into a single
535	   packet "aggregate" for the purposes of scheduling transmissions, then
536	   it is RECOMMENDED that the per-packet state be tracked for each
537	   packet "aggregate" rather than each SMSS.  For simplicity this
538	   document refers to such state as "per-packet", whether it is per
539	   "aggregate" or per SMSS.

541	4.2.  Impact of ACK losses

543	   Delivery rate samples are generated upon receiving each ACK; ACKs may
544	   contain both cumulative and selective acknowledgment information.
545	   Losing an ACK results in losing the delivery rate sample
546	   corresponding to that ACK, and generating a delivery rate sample at
547	   later a time (upon the arrival of the next ACK).  This can
548	   underestimate the delivery rate due the artificially inflated
549	   "rs.interval".  As with any effect that can cause underestimation, it
550	   is RECOMMENDED that applications or congestion control algorithms
551	   using the output of this algorithm apply appropriate filtering to
552	   mitigate the impact of this effect.

554	4.3.  Impact of packet reordering

556	   This algorithm is robust to packet reordering; it makes no
557	   assumptions about the order in which packets are delivered or ACKed.
558	   In particular, for a particular packet P, it does not matter which
559	   packets are delivered between the transmission of P and the ACK of
560	   packet P, since C.delivered will be incremented appropriately in any
561	   case.

563	4.4.  Impact of packet loss and retransmissions

565	   There are several possible approaches for handling cases where a
566	   delivery rate sample is based on an ACK or SACK for a retransmitted
567	   packet.

569	   If the transport protocol supports unambiguous ACKs for retransmitted
570	   data sequence ranges (as in QUIC [RFC9000]) then the algorithm is
571	   perfectly robust to retransmissions, because the starting packet, P,
572	   for the sample can be unambiguously retrieved.

574	   If the transport protocol, like TCP [RFC793], has ambiguous ACKs for
575	   retransmitted sequence ranges, then the following approaches MAY be
576	   used:

578	   1.  The sender MAY choose to filter out implausible delivery rate
579	       samples, as described in the GenerateRateSample() step in the
580	       "Upon receiving an ACK" section, by discarding samples whose
581	       rs.interval is lower than the minimum RTT seen on the connection.

583	   2.  The sender MAY choose to skip the generation of a delivery rate
584	       sample for a retransmitted sequence range.

586	4.5.  Connections without SACK support

588	   If the transport connection does not use SACK (i.e., either or both
589	   ends of the connections do not accept SACK), then this algorithm can
590	   be extended to estimate approximate delivery rates using duplicate
591	   ACKs (much like Reno and [RFC5681] estimates that each duplicate ACK
592	   indicates that a data packet has been delivered).  The details of
593	   this extension will be described in a future version of this draft.

595	5.  Implementation Status

597	   This section records the status of known implementations of the
598	   algorithm defined by this specification at the time of posting of
599	   this Internet-Draft, and is based on a proposal described in
600	   [RFC7942].  The description of implementations in this section is
601	   intended to assist the IETF in its decision processes in progressing
602	   drafts to RFCs.  Please note that the listing of any individual
603	   implementation here does not imply endorsement by the IETF.
604	   Furthermore, no effort has been spent to verify the information
605	   presented here that was supplied by IETF contributors.  This is not
606	   intended as, and must not be construed to be, a catalog of available
607	   implementations or their features.  Readers are advised to note that
608	   other implementations may exist.

610	   According to [RFC7942], "this will allow reviewers and working groups
611	   to assign due consideration to documents that have the benefit of
612	   running code, which may serve as evidence of valuable experimentation
613	   and feedback that have made the implemented protocols more mature.
614	   It is up to the individual working groups to use this information as
615	   they see fit".

617	   As of the time of writing, the following implementations of this
618	   algorithm have been publicly released:

620	   *  Linux TCP

622	      -  Source code URL:

624	         o  GPLv2 license: https://git.kernel.org/pub/scm/linux/kernel/g
625	            it/torvalds/linux.git/tree/net/ipv4/tcp_rate.c

627	         o  BSD-style license: https://groups.google.com/d/msg/bbr-
628	            dev/X0LbDptlOzo/EVgkRjVHBQAJ

630	      -  Source: Google

632	      -  Maturity: production

634	      -  License: dual-licensed: GPLv2 / BSD-style

636	      -  Contact: https://groups.google.com/d/forum/bbr-dev

638	      -  Last updated: September 24, 2021

640	   *  QUIC

642	      -  Source code URLs:

644	         o  https://source.chromium.org/chromium/chromium/src/+/master:n
645	            et/third_party/quiche/src/quic/core/congestion_control/
646	            bandwidth_sampler.cc

648	         o  https://source.chromium.org/chromium/chromium/src/+/master:n
649	            et/third_party/quiche/src/quic/core/congestion_control/
650	            bandwidth_sampler.h

652	      -  Source: Google

654	      -  Maturity: production

656	      -  License: BSD-style

658	      -  Contact: https://groups.google.com/d/forum/bbr-dev

660	      -  Last updated: October 5, 2021

662	6.  Security Considerations

664	   This proposal makes no changes to the underlying security of
665	   transport protocols or congestion control algorithms.  This algorithm
666	   adds no security considerations beyond those involved in the existing
667	   standard congestion control algorithm [RFC5681].

669	7.  IANA Considerations

671	   This document makes no request of IANA.

673	   Note to RFC Editor: this section may be removed on publication as an
674	   RFC.

676	8.  Acknowledgments

678	   The authors would like to thank C.  Stephen Gunn, Eric Dumazet, Ian
679	   Swett, Jana Iyengar, Victor Vasiliev, Nandita Dukkipati, Pawel
680	   Jurczyk, Biren Roy, David Wetherall, Amin Vahdat, Leonidas
681	   Kontothanassis, and the YouTube, google.com, Bandwidth Enforcer, and
682	   Google SRE teams for their invaluable help and support.  We would
683	   like to thank Hiren Panchasara, Bao Zheng, and Jonathan Morton for
684	   feedback and suggestions on earlier versions of this document.

686	9.  References

688	9.1.  Normative References

690	   [RFC793]   Postel, J., "Transmission Control Protocol", September
691	              1981.

693	   [RFC2018]  Mathis, M. and J. Mahdavi, "TCP Selective Acknowledgment
694	              Options", RFC 2018, October 1996,
695	              .

697	   [RFC5681]  Allman, M., Paxson, V., and E. Blanton, "TCP Congestion
698	              Control", RFC 5681, September 2009,
699	              .

701	   [RFC7942]  Sheffer, Y. and A. Farrel, "Improving Awareness of Running
702	              Code: The Implementation Status Section", July 2016.

704	   [RFC6675]  Blanton, E., Allman, M., Wang, L., Jarvinen, I., Kojo, M.,
705	              and Y. Nishida, "A Conservative Loss Recovery Algorithm
706	              Based on Selective Acknowledgment (SACK) for TCP",
707	              RFC 6675, August 2012,
708	              .

710	   [RFC9000]  Iyengar, J., Ed. and M. Thomson, Ed., "QUIC: A UDP-Based
711	              Multiplexed and Secure Transport", RFC 9000,
712	              DOI 10.17487/RFC9000, May 2021,
713	              .

715	9.2.  Informative References

717	   [draft-cardwell-iccrg-bbr-congestion-control]
718	              Cardwell, N., Cheng, Y., Hassas Yeganeh, S., and V.
719	              Jacobson, "BBR Congestion Control", Work in Progress,
720	              Internet-Draft, draft-cardwell-iccrg-bbr-congestion-
721	              control, November 2021, .

724	   [CCGHJ17]  Cardwell, N., Cheng, Y., Gunn, C., Hassas Yeganeh, S., and
725	              V. Jacobson, "BBR: Congestion-Based Congestion Control",
726	              Communications of the ACM Feb 2017, February 2017,
727	              .

730	   [A15]      Abrahamsson, M., "TCP ACK suppression", IETF AQM mailing
731	              list , November 2015, .

734	Authors' Addresses

736	   Yuchung Cheng
737	   Google
738	   Email: ycheng@google.com
739	   Neal Cardwell
740	   Google
741	   Email: ncardwell@google.com

743	   Soheil Hassas Yeganeh
744	   Google
745	   Email: soheil@google.com

747	   Van Jacobson
748	   Google
749	   Email: vanj@google.com