TCP Maintenance Working Group                                  M. Mathis
Internet-Draft                                              N. Dukkipati
Obsoletes: 6937 (if approved)                                   Y. Cheng
Intended status: Standards Track                            Google, Inc.
Expires: 5 December 2022                                     3 June 2022

                  Proportional Rate Reduction for TCP
                    draft-ietf-tcpm-prr-rfc6937bis-02

Abstract

   This document updates the experimental Proportional Rate Reduction
   (PRR) algorithm, described in RFC 6937, to standards track.  PRR
   potentially replaces Fast Recovery to regulate the amount of data
   sent by TCP or other transport protocols during loss recovery.  PRR
   accurately regulates the actual flight size through recovery such
   that at the end of recovery it will be as close as possible to the
   slow start threshold (ssthresh), as determined by the congestion
   control algorithm.
Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current Internet-
   Drafts is at https://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on 5 December 2022.

Copyright Notice

   Copyright (c) 2022 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (https://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.  Code Components extracted from this document must
   include Revised BSD License text as described in Section 4.e of the
   Trust Legal Provisions and are provided without warranty as described
   in the Revised BSD License.

Table of Contents

   1.  Introduction
     1.1.  Document and WG Information
   2.  Background
   3.  Changes From RFC 6937
   4.  Relationships to other standards
   5.  Definitions
   6.  Algorithms
   7.  Examples
   8.  Properties
   9.  Adapting PRR to other transport protocols
   10. Acknowledgements
   11. Security Considerations
   12. Normative References
   13. Informative References
   Appendix A.  Strong Packet Conservation Bound
   Authors' Addresses

1.  Introduction

   This document updates the Proportional Rate Reduction (PRR)
   algorithm described in [RFC6937] from experimental to standards
   track.  PRR accurately regulates the amount of data sent during loss
   recovery, such that at the end of recovery the flight size will be
   as close as possible to the slow start threshold (ssthresh), as
   determined by the congestion control algorithm.  PRR has been
   deployed in at least three major TCP implementations covering the
   vast majority of today's web traffic.

   The main changes from RFC 6937 are the introduction of a new
   heuristic that replaces a manual configuration parameter, and
   support for connections without SACK.  The new heuristic only
   changes behaviors in corner cases that were not relevant before the
   Lost Retransmission Detection (LRD) algorithm.  LRD was not
   implemented until after RFC 6937 was published.  This document also
   includes additional discussion about integration into other
   congestion control and loss detection algorithms.

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in [RFC2119].

1.1.  Document and WG Information

   Formatted: 2022-06-03 14:55:13-07:00

   Please send all comments, questions and feedback to tcpm@ietf.org

   About revision 00:

   The introduction above was drawn from draft-mathis-tcpm-rfc6937bis-
   00.  All of the text below was copied verbatim from RFC 6937, to
   facilitate comparison between RFC 6937 and this document as it
   evolves.

   About revision 01:

   *  Recast the RFC 6937 introduction as background

   *  Made "Changes From RFC 6937" an explicit section

   *  Made "Relationships to other standards" more explicit

   *  Added a generalized safeACK heuristic

   *  Provided hints for non-TCP implementations

   *  Added language about detecting ACK splitting, but have no advice
      on actions (yet)

   About revision 02:

   *  Companion RACK loss detection RECOMMENDED

   *  Non-SACK accounting in the pseudocode

   *  cwnd computation in the pseudocode

   *  Force fast retransmit at the beginning of Fast Recovery

   *  Remove deprecated Rate-Halving text

   *  Fixed bugs in the example traces

2.  Background

   This section is copied almost verbatim from the introduction to
   [RFC6937].

   Standard congestion control [RFC5681] requires that TCP (and other
   protocols) reduce their congestion window (cwnd) in response to
   losses.  Fast Recovery, described in the same document, is the
   reference algorithm for making this adjustment.  Its stated goal is
   to recover TCP's self clock by relying on returning ACKs during
   recovery to clock more data into the network.  Fast Recovery
   typically adjusts the window by waiting for one half round-trip time
   (RTT) of ACKs to pass before sending any data.  It is fragile
   because it cannot compensate for the implicit window reduction
   caused by the losses themselves.
   [RFC6675] makes Fast Recovery with Selective Acknowledgement (SACK)
   [RFC2018] more accurate by computing "pipe", a sender side estimate
   of the number of bytes still outstanding in the network.  With
   [RFC6675], Fast Recovery is implemented by sending data as necessary
   on each ACK to prevent pipe from falling below ssthresh, the window
   size as determined by the congestion control algorithm.  This
   protects Fast Recovery from timeouts in many cases where there are
   heavy losses, although not if the entire second half of the window
   of data or ACKs are lost.  However, a single ACK carrying a SACK
   option that implies a large quantity of missing data can cause a
   step discontinuity in the pipe estimator, which can cause Fast
   Retransmit to send a burst of data.

   PRR avoids these excess window adjustments such that at the end of
   recovery the actual window size will be as close as possible to
   ssthresh, the window size as determined by the congestion control
   algorithm.  It uses the fraction that is appropriate for the target
   window chosen by the congestion control algorithm.  During PRR, one
   of two additional Reduction Bound algorithms limits the total window
   reduction due to all mechanisms, including transient application
   stalls and the losses themselves.

   We describe two slightly different Reduction Bound algorithms:
   Conservative Reduction Bound (CRB), which is strictly packet
   conserving; and a Slow Start Reduction Bound (SSRB), which is more
   aggressive than CRB by, at most, 1 segment per ACK.  PRR-CRB meets
   the Strong Packet Conservation Bound described in Appendix A;
   however, in real networks it does not perform as well as the
   algorithms described in [RFC6675], which prove to be more aggressive
   in a significant number of cases.  SSRB offers a compromise by
   allowing TCP to send 1 additional segment per ACK relative to CRB in
   some situations.  Although SSRB is less aggressive than [RFC6675]
   (transmitting fewer segments or taking more time to transmit them),
   it outperforms [RFC6675] due to the lower probability of additional
   losses during recovery.

   The Strong Packet Conservation Bound on which PRR and both Reduction
   Bounds are based is patterned after Van Jacobson's packet
   conservation principle: segments delivered to the receiver are used
   as the clock to trigger sending the same number of segments back
   into the network.  As much as possible, PRR and the Reduction Bound
   algorithms rely on this self clock process, and are only slightly
   affected by the accuracy of other estimators, such as pipe [RFC6675]
   and cwnd.  This is what gives the algorithms their precision in the
   presence of events that cause uncertainty in other estimators.

   The original definition of the packet conservation principle
   [Jacobson88] treated packets that are presumed to be lost (e.g.,
   marked as candidates for retransmission) as having left the network.
   This idea is reflected in the pipe estimator defined in RFC 6675 and
   used here, but it is distinct from the Strong Packet Conservation
   Bound as described in Appendix A, which is defined solely on the
   basis of data arriving at the receiver.

3.  Changes From RFC 6937

   The largest change since [RFC6937] is the introduction of a new
   heuristic that uses good recovery progress (for TCP, when the latest
   ACK advances snd.una and does not indicate that a prior fast
   retransmit has been lost) to select which Reduction Bound to use.
   [RFC6937] left the choice of Reduction Bound to the discretion of
   the implementer but recommended using SSRB by default.  For all of
   the environments explored in earlier PRR research, the new heuristic
   is consistent with the old recommendation.
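   The heuristic's effect on the per-ACK send quota can be sketched in
   Python, mirroring the Figure 2 pseudocode in Section 6.  This is a
   non-normative sketch: the function name and calling convention are
   illustrative, and the caller is assumed to maintain the PRR state
   variables and to compute safeACK as described in this document.

```python
import math

def prr_sndcnt(prr_delivered, prr_out, delivered_data, pipe,
               ssthresh, recover_fs, safe_ack, mss):
    """Bytes permitted in response to one ACK (sketch of Section 6)."""
    if pipe > ssthresh:
        # Proportional Rate Reduction: pace sending so that prr_out
        # tracks prr_delivered scaled by ssthresh/RecoverFS.
        sndcnt = math.ceil(prr_delivered * ssthresh / recover_fs) - prr_out
    else:
        # PRR-CRB by default: strictly packet conserving.
        sndcnt = max(prr_delivered - prr_out, delivered_data)
        if safe_ack:
            # PRR-SSRB: one extra segment per ACK while the recovery is
            # making good progress (snd.una advances, no new losses).
            sndcnt += mss
        # Catch up toward ssthresh, but never beyond it.
        sndcnt = min(ssthresh - pipe, sndcnt)
    return sndcnt
```

   For example, with ssthresh = 10000 bytes, RecoverFS = 20000 bytes,
   and pipe above ssthresh, the proportional branch permits roughly one
   segment for every two segments delivered; once pipe falls below
   ssthresh, the same ACK permits one extra MSS only when safe_ack is
   true.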
   The paper "An Internet-Wide Analysis of Traffic Policing"
   [Flach2016policing] uncovered a crucial situation not previously
   explored, where both Reduction Bounds perform very poorly, but for
   different reasons.  Under many configurations, token bucket traffic
   policers [token_bucket] can suddenly start discarding a large
   fraction of the traffic when tokens are depleted, without any
   warning to the end systems.  The transport congestion control has no
   opportunity to measure the token rate, and sets ssthresh based on
   the previously observed path performance.  This value for ssthresh
   may cause a data rate that is substantially larger than the token
   rate, causing high loss.  Under these conditions, both Reduction
   Bounds perform very poorly.  PRR-CRB is too timid, sometimes causing
   very long recovery times at smaller than necessary windows, and
   PRR-SSRB is too aggressive, often causing many retransmissions to be
   lost over multiple rounds.  Both cases lead to prolonged recovery
   that decimates application goodput.

   Investigating these environments led to the development of a
   "safeACK" heuristic to dynamically switch between Reduction Bounds:
   by default conservatively use PRR-CRB, and only switch to PRR-SSRB
   when ACKs indicate that the recovery is making good progress
   (snd.una is advancing without any new losses).

   This heuristic is only invoked when application-limited behavior,
   losses, or other events cause the flight size to fall below
   ssthresh.  The extreme loss rates that make the heuristic essential
   are only common in the presence of heavy losses such as those caused
   by traffic policers [Flach2016policing].  In these environments the
   heuristic serves to salvage a bad situation, and any reasonable
   implementation of the heuristic performs far better than either
   bound by itself.

   Another subtle change is to force a fast retransmit upon the first
   ACK that triggers the recovery.  Previously, PRR might not allow a
   fast retransmit (i.e., sndcnt is 0) on the first ACK, depending on
   the loss situation.  Forcing a fast retransmit is important to
   maintain the ACK clock and avoid potential RTO events.  The forced
   fast retransmit happens only once during the entire recovery and
   still follows the packet conservation principles in PRR.  This
   heuristic has been part of the first widely deployed PRR
   implementation.

   Since [RFC6937] was written, PRR has also been adapted to perform
   multiplicative window reduction for non-loss based congestion
   control algorithms, such as for [RFC3168] style ECN.  This is
   typically done by using some parts of the loss recovery state
   machine (in particular the RecoveryPoint from [RFC6675]) to invoke
   the PRR ACK processing for exactly one round trip worth of ACKs.

   For [RFC6937] we published a companion paper [IMC11] in which we
   evaluated [RFC3517] and various experimental PRR versions in a large
   scale measurement study.  Today, the legacy algorithms used in that
   study have already faded from the code base, making such comparisons
   impossible without recreating historical algorithms.  Readers
   interested in the measurement study should review Section 5 of RFC
   6937 and the IMC paper [IMC11].

4.  Relationships to other standards

   PRR is described as modifications to "TCP Congestion Control"
   [RFC5681] and "A Conservative Loss Recovery Algorithm Based on
   Selective Acknowledgment (SACK) for TCP" [RFC6675].  It is most
   accurate with SACK [RFC2018] but does not require SACK.

   The safeACK heuristic came about as a result of robust Lost
   Retransmission Detection under development in an early precursor to
   [RFC8985].  Without LRD, policers that cause very high loss rates
   are guaranteed to also cause retransmission timeouts, because both
   [RFC5681] and [RFC6675] will send retransmissions above the policed
   rate.
   It is RECOMMENDED that PRR be implemented together with [RFC8985].

5.  Definitions

   The following terms, parameters, and state variables are used as
   they are defined in earlier documents:

   RFC 793: snd.una (send unacknowledged).

   RFC 5681: duplicate ACK, FlightSize, Sender Maximum Segment Size
   (SMSS).

   RFC 6675: covered (as in "covered sequence numbers").

   Voluntary window reductions: choosing not to send data in response
   to some ACKs, for the purpose of reducing the sending window size
   and data rate.

   PRR defines additional variables:

   DeliveredData: the number of bytes newly delivered by the most
   recent ACK.  With SACK, DeliveredData is the change in snd.una plus
   the bytes newly selectively acknowledged.  Without SACK,
   DeliveredData is the change in snd.una for a partial ACK, or 1 MSS
   worth of bytes for a DUPACK.

   Note that without SACK, a poorly behaved receiver that returns
   extraneous DUPACKs, as depicted in [Savage99], can artificially
   inflate DeliveredData.  As a mitigation, PRR disallows incrementing
   DeliveredData once the total bytes delivered exceeds the data
   outstanding upon entering recovery (i.e., RecoverFS).

   safeACK: a local boolean variable indicating that the current ACK
   reported good progress.  SafeACK is true only when the ACK has
   cumulatively acknowledged new data and the ACK does not indicate
   further losses.  Both conditions indicate that the recovery is
   making good progress and the sender can send more aggressively.

   sndcnt: a local variable indicating exactly how many bytes should be
   sent in response to each ACK.  Note that the decision of which data
   to send (e.g., retransmit missing data or send more new data) is out
   of scope for this document.

6.  Algorithms

   At the beginning of recovery, initialize the PRR state.
   This assumes a modern congestion control algorithm, CongCtrlAlg(),
   that might set ssthresh to something other than FlightSize/2:

      ssthresh = CongCtrlAlg()      // Target flight size in recovery
      prr_delivered = 0             // Total bytes delivered in recovery
      prr_out = 0                   // Total bytes sent in recovery
      RecoverFS = snd.nxt - snd.una // FlightSize right before recovery

                                 Figure 1

   On every ACK starting or during the recovery:

      DeliveredData = bytes newly cumulatively acknowledged
      if (SACK is used) {
         DeliveredData += bytes newly selectively acknowledged
      } else if (ACK is a DUPACK and prr_delivered < RecoverFS) {
         DeliveredData += MSS
      }
      if (DeliveredData is 0)
         Return

      prr_delivered += DeliveredData
      pipe = (RFC 6675 pipe algorithm)
      safeACK = (snd.una advances and no further loss indicated)
      if (pipe > ssthresh) {
         // Proportional Rate Reduction
         sndcnt = CEIL(prr_delivered * ssthresh / RecoverFS) - prr_out
      } else {
         // PRR-CRB by default
         sndcnt = MAX(prr_delivered - prr_out, DeliveredData)
         if (safeACK) {
            // PRR-SSRB when recovery is in good progress
            sndcnt += MSS
         }
         // Attempt to catch up, as permitted
         sndcnt = MIN(ssthresh - pipe, sndcnt)
      }

      if (prr_out is 0 AND sndcnt is 0) {
         // Force a fast retransmit upon entering recovery
         sndcnt = MSS
      }
      cwnd = pipe + sndcnt

                                 Figure 2

   On any data transmission or retransmission:

      prr_out += (data sent)

                                 Figure 3

7.  Examples

   We illustrate these algorithms by showing their different behaviors
   for two scenarios: TCP experiencing either a single loss or a burst
   of 15 consecutive losses.  In all cases we assume bulk data (no
   application pauses), standard Additive Increase Multiplicative
   Decrease (AIMD) congestion control [RFC5681], and cwnd = FlightSize =
   pipe = 20 segments, so ssthresh will be set to 10 at the beginning of
   recovery.
   We also assume standard Fast Retransmit and Limited Transmit
   [RFC3042], so TCP will send 2 new segments followed by 1 retransmit
   in response to the first 3 duplicate ACKs following the losses.

   Each of the diagrams below shows the per-ACK response to the first
   round trip for the various recovery algorithms when the zeroth
   segment is lost.  The top line indicates the transmitted segment
   number triggering the ACKs, with an X for the lost segment.  "cwnd"
   and "pipe" indicate the values of these variables after processing
   each returning ACK but before further (re)transmission.  "Sent"
   indicates how much 'N'ew or 'R'etransmitted data would be sent.
   Note that the algorithms for deciding which data to send are out of
   scope of this document.

   RFC 6675
   ack#   X  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19
   cwnd:    20 20 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11
   pipe:    19 19 18 18 17 16 15 14 13 12 11 10 10 10 10 10 10 10 10
   sent:     N  N  R                          N  N  N  N  N  N  N  N

   PRR
   ack#   X  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19
   cwnd:    20 20 19 18 18 17 17 16 16 15 15 14 14 13 13 12 12 11 10
   pipe:    19 19 18 18 17 17 16 16 15 15 14 14 13 13 12 12 11 11 10
   sent:     N  N  R     N     N     N     N     N     N     N     N

                                 Figure 4

   Note that both algorithms send the same total amount of data.  RFC
   6675 experiences a "half window of silence", while PRR spreads the
   voluntary window reduction across an entire RTT.

   Next, we consider the same initial conditions when the first 15
   packets (0-14) are lost.  During the remainder of the lossy RTT,
   only 5 ACKs are returned to the sender.  We examine each of these
   algorithms in succession.
   RFC 6675
   ack#   X  X  X  X  X  X  X  X  X  X  X  X  X  X  X 15 16 17 18 19
   cwnd:                                              20 20 11 11 11
   pipe:                                              19 19  4 10 10
   sent:                                               N  N 7R  R  R

   PRR
   ack#   X  X  X  X  X  X  X  X  X  X  X  X  X  X  X 15 16 17 18 19
   cwnd:                                              20 20  5  5  5
   pipe:                                              19 19  4  4  4
   sent:                                               N  N  R  R  R

                                 Figure 5

   In this specific situation, RFC 6675 is more aggressive, because
   once Fast Retransmit is triggered (on the ACK for segment 17), TCP
   immediately retransmits sufficient data to bring pipe up to cwnd.
   Our earlier measurements (Section 5 of RFC 6937) indicate that RFC
   6675 significantly outperforms PRR and some other similarly
   conservative algorithms that we tested, showing that it is quite
   common for the actual losses to exceed the window reduction
   determined by the congestion control algorithm.

   Under such heavy losses, PRR uses PRR-CRB to follow the packet
   conservation principle.  Since the total losses bring pipe below
   ssthresh, data is sent such that the total data transmitted,
   prr_out, follows the total data delivered to the receiver as
   reported by returning ACKs.  Transmission is controlled by the
   sending limit, which is set to prr_delivered - prr_out.  PRR-CRB's
   conservative window reduction causes it to take excessively long to
   recover the losses and exposes it to additional timeouts.

   While not shown in the figure above, once the fast retransmits sent
   upon ACK#17 are delivered and solicit further ACKs that advance
   snd.una, PRR enters PRR-SSRB and increases the window by exactly 1
   segment per ACK until pipe rises to ssthresh during recovery.
   Under heavy losses when cwnd is large, PRR-SSRB recovers the losses
   exponentially faster than PRR-CRB.
   Although increasing the window during recovery seems ill-advised,
   it is important to remember that this is actually less aggressive
   than permitted by RFC 5681, which sends the same quantity of
   additional data as a single burst in response to the ACK that
   triggered Fast Retransmit.

   For less severe loss events, where the total losses are smaller
   than the difference between FlightSize and ssthresh, PRR-CRB and
   PRR-SSRB are not invoked, since PRR stays in the proportional rate
   reduction mode.

8.  Properties

   The following properties are common to both PRR-CRB and PRR-SSRB,
   except as noted:

   PRR maintains TCP's ACK clocking across most recovery events,
   including burst losses.  RFC 6675 can send large unclocked bursts
   following burst losses.

   Normally, PRR will spread voluntary window reductions out evenly
   across a full RTT.  This has the potential to generally reduce the
   burstiness of Internet traffic, and could be considered to be a type
   of soft pacing.  Hypothetically, any pacing increases the
   probability that different flows are interleaved, reducing the
   opportunity for ACK compression and other phenomena that increase
   traffic burstiness.  However, these effects have not been
   quantified.

   If there are minimal losses, PRR will converge to exactly the target
   window chosen by the congestion control algorithm.  Note that as TCP
   approaches the end of recovery, prr_delivered will approach
   RecoverFS, and sndcnt will be computed such that prr_out approaches
   ssthresh.

   Implicit window reductions, due to multiple isolated losses during
   recovery, cause later voluntary reductions to be skipped.  For small
   numbers of losses, the window size ends at exactly the window chosen
   by the congestion control algorithm.

   For burst losses, earlier voluntary window reductions can be undone
   by sending extra segments in response to ACKs arriving later during
   recovery.  Note that as long as some voluntary window reductions are
   not undone, the final value for pipe will be the same as ssthresh,
   the target cwnd value chosen by the congestion control algorithm.

   PRR with either Reduction Bound improves the situation when there
   are application stalls, e.g., when the sending application does not
   queue data for transmission quickly enough or the receiver stops
   advancing rwnd (receiver window).  When there is an application
   stall early during recovery, prr_out will fall behind the sum of the
   transmissions permitted by sndcnt.  The missed opportunities to send
   due to stalls are treated like banked voluntary window reductions;
   specifically, they cause prr_delivered - prr_out to be significantly
   positive.  If the application catches up while TCP is still in
   recovery, TCP will send a partial window burst to catch up to
   exactly where it would have been had the application never stalled.
   Although this burst might be viewed as being hard on the network,
   this is exactly what happens every time there is a partial RTT
   application stall while not in recovery.  We have made the partial
   RTT stall behavior uniform in all states.  Changing this behavior is
   out of scope for this document.

   PRR with either Reduction Bound is less sensitive to errors in the
   pipe estimator.  While in recovery, pipe is intrinsically an
   estimator, using incomplete information to estimate if un-SACKed
   segments are actually lost or merely out of order in the network.
   Under some conditions, pipe can have significant errors; for
   example, pipe is underestimated when a burst of reordered data is
   prematurely assumed to be lost and marked for retransmission.  If
   the transmissions are regulated directly by pipe, as they are with
   RFC 6675, a step discontinuity in the pipe estimator causes a burst
   of data, which cannot be retracted once the pipe estimator is
   corrected a few ACKs later.  For PRR, pipe merely determines which
   algorithm, PRR or the Reduction Bound, is used to compute sndcnt
   from DeliveredData.  While pipe is underestimated, the algorithms
   are different by at most 1 segment per ACK.  Once pipe is updated,
   they converge to the same final window at the end of recovery.

   Under all conditions and sequences of events during recovery,
   PRR-CRB strictly bounds the data transmitted to be equal to or less
   than the amount of data delivered to the receiver.  We claim that
   this Strong Packet Conservation Bound is the most aggressive
   algorithm that does not lead to additional forced losses in some
   environments.  It has the property that if there is a standing queue
   at a bottleneck with no cross traffic, the queue will maintain
   exactly constant length for the duration of the recovery, except for
   +1/-1 fluctuation due to differences in packet arrival and exit
   times.  See Appendix A for a detailed discussion of this property.

   Although the Strong Packet Conservation Bound is very appealing for
   a number of reasons, our earlier measurements (Section 5 of RFC
   6937) demonstrate that it is less aggressive and does not perform as
   well as RFC 6675, which permits bursts of data when there are bursts
   of losses.  PRR-SSRB is a compromise that permits TCP to send 1
   extra segment per ACK as compared to the Strong Packet Conservation
   Bound when the ACK indicates that the recovery is making good
   progress without further losses.  From the perspective of a strict
   Packet Conservation Bound, PRR-SSRB does indeed open the window
   during recovery; however, it is significantly less aggressive than
   RFC 6675 in the presence of burst losses.

9.  Adapting PRR to other transport protocols

   The main PRR algorithm and Reduction Bounds can be adapted to any
   transport that can support RFC 6675.
   In one major implementation (Linux TCP), PRR has been the default
   fast recovery algorithm for its default and supported congestion
   control modules.

   The safeACK heuristic can be generalized to any ACK of a
   retransmission that does not cause some other segment to be marked
   for retransmission.  That is, PRR-SSRB is safe on any ACK that
   reduces the total number of pending and outstanding retransmissions.

10.  Acknowledgements

   This document is based in part on previous incomplete work by Matt
   Mathis, Jeff Semke, and Jamshid Mahdavi [RHID] and influenced by
   several discussions with John Heffner.

   Monia Ghobadi and Sivasankar Radhakrishnan helped analyze the
   experiments.

   Ilpo Jarvinen reviewed the initial implementation.

   Neal Cardwell and Mark Allman improved the document through their
   insightful reviews.

11.  Security Considerations

   PRR does not change the risk profile for TCP.

   Implementers that change PRR from counting bytes to segments have to
   be cautious about the effects of ACK splitting attacks [Savage99],
   where the receiver acknowledges partial segments for the purpose of
   confusing the sender's congestion accounting.

12.  Normative References

   [RFC0793]  Postel, J., "Transmission Control Protocol", STD 7,
              RFC 793, DOI 10.17487/RFC0793, September 1981,
              <https://www.rfc-editor.org/info/rfc793>.

   [RFC2018]  Mathis, M., Mahdavi, J., Floyd, S., and A. Romanow, "TCP
              Selective Acknowledgment Options", RFC 2018,
              DOI 10.17487/RFC2018, October 1996,
              <https://www.rfc-editor.org/info/rfc2018>.

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119,
              DOI 10.17487/RFC2119, March 1997,
              <https://www.rfc-editor.org/info/rfc2119>.

   [RFC5681]  Allman, M., Paxson, V., and E. Blanton, "TCP Congestion
              Control", RFC 5681, DOI 10.17487/RFC5681, September 2009,
              <https://www.rfc-editor.org/info/rfc5681>.

   [RFC6675]  Blanton, E., Allman, M., Wang, L., Jarvinen, I., Kojo,
              M., and Y.
              Nishida, "A Conservative Loss Recovery Algorithm Based
              on Selective Acknowledgment (SACK) for TCP", RFC 6675,
              DOI 10.17487/RFC6675, August 2012,
              <https://www.rfc-editor.org/info/rfc6675>.

   [RFC8985]  Cheng, Y., Cardwell, N., Dukkipati, N., and P. Jha, "The
              RACK-TLP Loss Detection Algorithm for TCP", RFC 8985,
              DOI 10.17487/RFC8985, February 2021,
              <https://www.rfc-editor.org/info/rfc8985>.

13.  Informative References

   [CUBIC]    Rhee, I. and L. Xu, "CUBIC: A new TCP-friendly high-
              speed TCP variant", PFLDnet 2005, February 2005.

   [FACK]     Mathis, M. and J. Mahdavi, "Forward Acknowledgment:
              Refining TCP Congestion Control", ACM SIGCOMM, August
              1996.

   [Flach2016policing]
              Flach, T., Papageorge, P., Terzis, A., Pedrosa, L.,
              Cheng, Y., Al Karim, T., Katz-Bassett, E., and R.
              Govindan, "An Internet-Wide Analysis of Traffic
              Policing", ACM SIGCOMM, August 2016.

   [IMC11]    Dukkipati, N., Mathis, M., Cheng, Y., and M. Ghobadi,
              "Proportional Rate Reduction for TCP", Proceedings of
              the 11th ACM SIGCOMM Conference on Internet Measurement,
              Berlin, Germany, November 2011.

   [Jacobson88]
              Jacobson, V., "Congestion Avoidance and Control", SIGCOMM
              Comput. Commun. Rev. 18(4), August 1988.

   [Laminar]  Mathis, M., "Laminar TCP and the case for refactoring
              TCP congestion control", Work in Progress, 16 July 2012.

   [RFC3042]  Allman, M., Balakrishnan, H., and S. Floyd, "Enhancing
              TCP's Loss Recovery Using Limited Transmit", RFC 3042,
              DOI 10.17487/RFC3042, January 2001,
              <https://www.rfc-editor.org/info/rfc3042>.

   [RFC3168]  Ramakrishnan, K., Floyd, S., and D. Black, "The Addition
              of Explicit Congestion Notification (ECN) to IP",
              RFC 3168, DOI 10.17487/RFC3168, September 2001,
              <https://www.rfc-editor.org/info/rfc3168>.

   [RFC3517]  Blanton, E., Allman, M., Fall, K., and L. Wang, "A
              Conservative Selective Acknowledgment (SACK)-based Loss
              Recovery Algorithm for TCP", RFC 3517,
              DOI 10.17487/RFC3517, April 2003,
              <https://www.rfc-editor.org/info/rfc3517>.

   [RFC6937]  Mathis, M., Dukkipati, N., and Y.
Cheng, "Proportional Rate Reduction for TCP", RFC 6937, DOI 10.17487/RFC6937, May 2013, <https://www.rfc-editor.org/info/rfc6937>.

[RHID] Mathis, M., Semke, J., and J. Mahdavi, "The Rate-Halving Algorithm for TCP Congestion Control", Work in Progress, August 1999.

[RHweb] Mathis, M. and J. Mahdavi, "TCP Rate-Halving with Bounding Parameters", Web publication, December 1997.

[Savage99] Savage, S., Cardwell, N., Wetherall, D., and T. Anderson, "TCP congestion control with a misbehaving receiver", SIGCOMM Comput. Commun. Rev. 29(5), October 1999.

Appendix A. Strong Packet Conservation Bound

PRR-CRB is based on a conservative, philosophically pure, and aesthetically appealing Strong Packet Conservation Bound, described here. Although inspired by the packet conservation principle [Jacobson88], it differs in how it treats segments that are missing and presumed lost. Under all conditions and sequences of events during recovery, PRR-CRB strictly bounds the data transmitted to be equal to or less than the amount of data delivered to the receiver. Note that the effects of presumed losses are included in the pipe calculation, but they do not affect the outcome of PRR-CRB once pipe has fallen below ssthresh.

We claim that this Strong Packet Conservation Bound is the most aggressive algorithm that does not lead to additional forced losses in some environments. It has the property that if there is a standing queue at a bottleneck that is carrying no other traffic, the queue will maintain exactly constant length for the entire duration of the recovery, except for +1/-1 fluctuations due to differences in packet arrival and exit times. Any less aggressive algorithm will result in a declining queue at the bottleneck. Any more aggressive algorithm will result in an increasing queue, or in additional losses if it is a full drop-tail queue.
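The bound can be seen directly in the per-ACK sndcnt computation of RFC 6937. The following Python sketch is illustrative only, not normative; variable names follow the RFC's pseudocode, and the PRR-SSRB refinement is omitted for brevity:

```python
import math

def prr_sndcnt(prr_delivered, prr_out, pipe, ssthresh, recover_fs):
    """Per-ACK send quota, following the RFC 6937 pseudocode (sketch).

    prr_delivered: total data delivered to the receiver during recovery
    prr_out:       total data transmitted during recovery
    pipe:          RFC 6675 estimate of data outstanding in the network
    recover_fs:    FlightSize at the start of recovery
    """
    if pipe > ssthresh:
        # Proportional Rate Reduction: pace the reduction so that
        # pipe converges toward ssthresh as data is delivered.
        sndcnt = math.ceil(prr_delivered * ssthresh / recover_fs) - prr_out
    else:
        # PRR-CRB: cumulative transmissions never exceed cumulative
        # deliveries, and pipe never overshoots ssthresh.
        sndcnt = min(ssthresh - pipe, prr_delivered - prr_out)
    return max(sndcnt, 0)

# Once pipe < ssthresh, sending is capped by delivery: with 6 units
# delivered and 5 already sent, at most 1 more may be transmitted.
assert prr_sndcnt(prr_delivered=6, prr_out=5, pipe=4,
                  ssthresh=10, recover_fs=20) == 1
```

Note that in the pipe <= ssthresh branch, sndcnt can never exceed prr_delivered - prr_out, which is exactly the Strong Packet Conservation Bound described above.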
We demonstrate this property with a simple thought experiment:

Imagine a network path that has insignificant delays in both directions, except for the processing time and queue at a single bottleneck in the forward path. By insignificant delay, we mean that when a packet is "served" at the head of the bottleneck queue, the following events all happen in much less than one bottleneck packet time: the packet arrives at the receiver; the receiver sends an ACK that arrives at the sender; the sender processes the ACK and sends some data; the data is queued at the bottleneck.

If sndcnt is set to DeliveredData and nothing else is inhibiting sending data, then clearly the data arriving at the bottleneck queue will exactly replace the data that was just served at the head of the queue, so the queue will have a constant length. If the queue is drop tail and full, then it will stay exactly full. Losses or reordering on the ACK path only cause wider fluctuations in the queue size, but do not raise its peak size, independent of whether the data is in order or out of order (including loss recovery from an earlier RTT). Any more aggressive algorithm that sends additional data will overflow the drop-tail queue and cause loss. Any less aggressive algorithm will under-fill the queue. Therefore, setting sndcnt to DeliveredData is the most aggressive algorithm that does not cause forced losses in this simple network. Relaxing the assumptions (e.g., making delays more authentic and adding more flows, delayed ACKs, etc.) is likely to increase the fine-grained fluctuations in queue size but does not change its basic behavior.

Note that the congestion control algorithm implements a broader notion of optimal that includes appropriately sharing the network.
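The constant-queue claim in the thought experiment can be checked with a toy simulation. This is an illustrative sketch only: the single-queue model, the one-packet tick granularity, and the policy callback are assumptions of the sketch, not part of the specification:

```python
# Toy model of the thought experiment: a single bottleneck queue with
# negligible delays elsewhere.  Each time the queue serves one packet,
# the resulting ACK reaches the sender within the same "tick", and the
# sender's response is immediately queued at the bottleneck.

def run(queue_len, ticks, sndcnt_policy):
    """Return the queue length after `ticks` service events.

    sndcnt_policy(delivered) -> packets the sender emits per ACK.
    """
    q = queue_len
    for _ in range(ticks):
        if q == 0:
            break
        q -= 1                         # bottleneck serves one packet...
        delivered = 1                  # ...which the receiver ACKs
        q += sndcnt_policy(delivered)  # sender's response joins the queue
    return q

# sndcnt = DeliveredData: the queue holds exactly constant length.
assert run(10, 1000, lambda d: d) == 10
# Any less aggressive policy drains the queue.
assert run(10, 1000, lambda d: 0) == 0
# Any more aggressive policy grows the queue (forcing losses if the
# queue is a full drop-tail queue).
assert run(10, 1000, lambda d: d + 1) == 1010
```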
Typical congestion control algorithms are likely to reduce the data sent relative to the Packet Conserving Bound implemented by PRR, bringing TCP's actual window down to ssthresh.

Authors' Addresses

Matt Mathis
Google, Inc.
1600 Amphitheatre Parkway
Mountain View, California 94043
United States of America
Email: mattmathis@google.com

Nandita Dukkipati
Google, Inc.
1600 Amphitheatre Parkway
Mountain View, California 94043
United States of America
Email: nanditad@google.com

Yuchung Cheng
Google, Inc.
1600 Amphitheatre Parkway
Mountain View, California 94043
United States of America
Email: ycheng@google.com