TCP Maintenance Working Group                                  M. Mathis
Internet-Draft                                              N. Dukkipati
Obsoletes: 6937 (if approved)                                   Y. Cheng
Intended status: Standards Track                            Google, Inc.
Expires: 5 December 2022                                     3 June 2022

                  Proportional Rate Reduction for TCP
                    draft-ietf-tcpm-prr-rfc6937bis-02

Abstract

   This document updates the experimental Proportional Rate Reduction
   (PRR) algorithm, described in RFC 6937, to standards track.  PRR
   potentially replaces Fast Recovery to regulate the amount of data
   sent by TCP or other transport protocols during loss recovery.  PRR
   accurately regulates the actual flight size through recovery such
   that at the end of recovery it will be as close as possible to the
   slow start threshold (ssthresh), as determined by the congestion
   control algorithm.
Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current Internet-
   Drafts is at https://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on 5 December 2022.

Copyright Notice

   Copyright (c) 2022 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (https://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.  Code Components extracted from this document must
   include Revised BSD License text as described in Section 4.e of the
   Trust Legal Provisions and are provided without warranty as described
   in the Revised BSD License.

Table of Contents

   1.  Introduction
     1.1.  Document and WG Information
   2.  Background
   3.  Changes From RFC 6937
   4.  Relationships to other standards
   5.  Definitions
   6.  Algorithms
   7.  Examples
   8.  Properties
   9.  Adapting PRR to other transport protocols
   10. Acknowledgements
   11. Security Considerations
   12. Normative References
   13. Informative References
   Appendix A.  Strong Packet Conservation Bound
   Authors' Addresses

1.  Introduction

   This document updates the Proportional Rate Reduction (PRR)
   algorithm described in [RFC6937] from experimental to standards
   track.  PRR accurately regulates the amount of data sent during loss
   recovery, such that at the end of recovery the flight size will be
   as close as possible to the slow start threshold (ssthresh), as
   determined by the congestion control algorithm.  PRR has been
   deployed in at least three major TCP implementations covering the
   vast majority of today's web traffic.

   The main changes from RFC 6937 are the introduction of a new
   heuristic that replaces a manual configuration parameter, and
   support for connections without SACK.  The new heuristic only
   changes behaviors in corner cases that were not relevant before the
   Lost Retransmission Detection (LRD) algorithm.  LRD was not
   implemented until after RFC 6937 was published.  This document also
   includes additional discussion about integration into other
   congestion control and loss detection algorithms.

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in [RFC2119].

1.1.  Document and WG Information

   Formatted: 2022-06-03 14:55:13-07:00

   Please send all comments, questions and feedback to tcpm@ietf.org

   About revision 00:

   The introduction above was drawn from draft-mathis-tcpm-rfc6937bis-
   00.  All of the text below was copied verbatim from RFC 6937, to
   facilitate comparison between RFC 6937 and this document as it
   evolves.

   About revision 01:

   *  Recast the RFC 6937 introduction as background

   *  Made "Changes From RFC 6937" an explicit section

   *  Made "Relationships to other standards" more explicit

   *  Added a generalized safeACK heuristic

   *  Provided hints for non-TCP implementations

   *  Added language about detecting ACK splitting, but have no advice
      on actions (yet)

   About revision 02:

   *  Companion RACK loss detection RECOMMENDED

   *  Non-SACK accounting in the pseudocode

   *  cwnd computation in the pseudocode

   *  Force fast retransmit at the beginning of Fast Recovery

   *  Remove deprecated Rate-Halving text

   *  Fixed bugs in the example traces

2.  Background

   This section is copied almost verbatim from the introduction to
   [RFC6937].

   Standard congestion control [RFC5681] requires that TCP (and other
   protocols) reduce their congestion window (cwnd) in response to
   losses.  Fast Recovery, described in the same document, is the
   reference algorithm for making this adjustment.  Its stated goal is
   to recover TCP's self clock by relying on returning ACKs during
   recovery to clock more data into the network.  Fast Recovery
   typically adjusts the window by waiting for one half round-trip time
   (RTT) of ACKs to pass before sending any data.  It is fragile
   because it cannot compensate for the implicit window reduction
   caused by the losses themselves.
   [RFC6675] makes Fast Recovery with Selective Acknowledgement (SACK)
   [RFC2018] more accurate by computing "pipe", a sender side estimate
   of the number of bytes still outstanding in the network.  With
   [RFC6675], Fast Recovery is implemented by sending data as necessary
   on each ACK to prevent pipe from falling below ssthresh, the window
   size as determined by the congestion control algorithm.  This
   protects Fast Recovery from timeouts in many cases where there are
   heavy losses, although not if the entire second half of the window
   of data or ACKs are lost.  However, a single ACK carrying a SACK
   option that implies a large quantity of missing data can cause a
   step discontinuity in the pipe estimator, which can cause Fast
   Retransmit to send a burst of data.

   PRR avoids these excess window adjustments such that at the end of
   recovery the actual window size will be as close as possible to
   ssthresh, the window size as determined by the congestion control
   algorithm.  It uses the fraction that is appropriate for the target
   window chosen by the congestion control algorithm.  During PRR, one
   of two additional Reduction Bound algorithms limits the total window
   reduction due to all mechanisms, including transient application
   stalls and the losses themselves.

   We describe two slightly different Reduction Bound algorithms:
   Conservative Reduction Bound (CRB), which is strictly packet
   conserving; and a Slow Start Reduction Bound (SSRB), which is more
   aggressive than CRB by, at most, 1 segment per ACK.  PRR-CRB meets
   the Strong Packet Conservation Bound described in Appendix A;
   however, in real networks it does not perform as well as the
   algorithms described in [RFC6675], which prove to be more aggressive
   in a significant number of cases.  SSRB offers a compromise by
   allowing TCP to send 1 additional segment per ACK relative to CRB in
   some situations.  Although SSRB is less aggressive than [RFC6675]
   (transmitting fewer segments or taking more time to transmit them),
   it outperforms [RFC6675] due to the lower probability of additional
   losses during recovery.

   The Strong Packet Conservation Bound on which PRR and both Reduction
   Bounds are based is patterned after Van Jacobson's packet
   conservation principle: segments delivered to the receiver are used
   as the clock to trigger sending the same number of segments back
   into the network.  As much as possible, PRR and the Reduction Bound
   algorithms rely on this self clock process, and are only slightly
   affected by the accuracy of other estimators, such as pipe [RFC6675]
   and cwnd.  This is what gives the algorithms their precision in the
   presence of events that cause uncertainty in other estimators.

   The original definition of the packet conservation principle
   [Jacobson88] treated packets that are presumed to be lost (e.g.,
   marked as candidates for retransmission) as having left the network.
   This idea is reflected in the pipe estimator defined in RFC 6675 and
   used here, but it is distinct from the Strong Packet Conservation
   Bound as described in Appendix A, which is defined solely on the
   basis of data arriving at the receiver.

3.  Changes From RFC 6937

   The largest change since [RFC6937] is the introduction of a new
   heuristic that uses good recovery progress (for TCP, when the latest
   ACK advances snd.una and does not indicate that a prior fast
   retransmit has been lost) to select which Reduction Bound to use.
   [RFC6937] left the choice of Reduction Bound to the discretion of
   the implementer but recommended using SSRB by default.  For all of
   the environments explored in earlier PRR research, the new heuristic
   is consistent with the old recommendation.
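   The heuristic's effect on the per-ACK send quota can be sketched in
   Python, mirroring the Figure 2 pseudocode in Section 6.  This is a
   non-normative sketch: the function name and calling convention are
   illustrative, and the caller is assumed to maintain the PRR state
   variables and to compute safeACK as described in this document.

```python
import math

def prr_sndcnt(prr_delivered, prr_out, delivered_data, pipe,
               ssthresh, recover_fs, safe_ack, mss):
    """Bytes permitted in response to one ACK (sketch of Section 6)."""
    if pipe > ssthresh:
        # Proportional Rate Reduction: pace sending so that prr_out
        # tracks prr_delivered scaled by ssthresh/RecoverFS.
        sndcnt = math.ceil(prr_delivered * ssthresh / recover_fs) - prr_out
    else:
        # PRR-CRB by default: strictly packet conserving.
        sndcnt = max(prr_delivered - prr_out, delivered_data)
        if safe_ack:
            # PRR-SSRB: one extra segment per ACK while the recovery is
            # making good progress (snd.una advances, no new losses).
            sndcnt += mss
        # Catch up toward ssthresh, but never beyond it.
        sndcnt = min(ssthresh - pipe, sndcnt)
    return sndcnt
```

   For example, with ssthresh = 10000 bytes, RecoverFS = 20000 bytes,
   and pipe above ssthresh, the proportional branch permits roughly one
   segment for every two segments delivered; once pipe falls below
   ssthresh, the same ACK permits one extra MSS only when safe_ack is
   true.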
   The paper "An Internet-Wide Analysis of Traffic Policing"
   [Flach2016policing] uncovered a crucial situation not previously
   explored, where both Reduction Bounds perform very poorly, but for
   different reasons.  Under many configurations, token bucket traffic
   policers [token_bucket] can suddenly start discarding a large
   fraction of the traffic when tokens are depleted, without any
   warning to the end systems.  The transport congestion control has no
   opportunity to measure the token rate, and sets ssthresh based on
   the previously observed path performance.  This value for ssthresh
   may cause a data rate that is substantially larger than the token
   rate, causing high loss.  Under these conditions, both Reduction
   Bounds perform very poorly.  PRR-CRB is too timid, sometimes causing
   very long recovery times at smaller than necessary windows, and
   PRR-SSRB is too aggressive, often causing many retransmissions to be
   lost over multiple rounds.  Both cases lead to prolonged recovery
   that decimates application goodput.

   Investigating these environments led to the development of a
   "safeACK" heuristic to dynamically switch between Reduction Bounds:
   by default conservatively use PRR-CRB, and only switch to PRR-SSRB
   when ACKs indicate that the recovery is making good progress
   (snd.una is advancing without any new losses).

   This heuristic is only invoked when application-limited behavior,
   losses, or other events cause the flight size to fall below
   ssthresh.  The extreme loss rates that make the heuristic essential
   are only common in the presence of heavy losses such as those caused
   by traffic policers [Flach2016policing].  In these environments the
   heuristic serves to salvage a bad situation, and any reasonable
   implementation of the heuristic performs far better than either
   bound by itself.

   Another subtle change is to force a fast retransmit upon the first
   ACK that triggers the recovery.  Previously, PRR might not allow a
   fast retransmit (i.e., sndcnt is 0) on the first ACK, depending on
   the loss situation.  Forcing a fast retransmit is important to
   maintain the ACK clock and avoid potential RTO events.  The forced
   fast retransmit happens only once during the entire recovery and
   still follows the packet conservation principles in PRR.  This
   heuristic has been part of the first widely deployed PRR
   implementation.

   Since [RFC6937] was written, PRR has also been adapted to perform
   multiplicative window reduction for non-loss based congestion
   control algorithms, such as for [RFC3168] style ECN.  This is
   typically done by using some parts of the loss recovery state
   machine (in particular the RecoveryPoint from [RFC6675]) to invoke
   the PRR ACK processing for exactly one round trip worth of ACKs.

   For [RFC6937] we published a companion paper [IMC11] in which we
   evaluated [RFC3517] and various experimental PRR versions in a large
   scale measurement study.  Today, the legacy algorithms used in that
   study have already faded from the code base, making such comparisons
   impossible without recreating historical algorithms.  Readers
   interested in the measurement study should review Section 5 of RFC
   6937 and the IMC paper [IMC11].

4.  Relationships to other standards

   PRR is described as modifications to "TCP Congestion Control"
   [RFC5681] and "A Conservative Loss Recovery Algorithm Based on
   Selective Acknowledgment (SACK) for TCP" [RFC6675].  It is most
   accurate with SACK [RFC2018] but does not require SACK.

   The safeACK heuristic came about as a result of robust Lost
   Retransmission Detection under development in an early precursor to
   [RFC8985].  Without LRD, policers that cause very high loss rates
   are guaranteed to also cause retransmission timeouts, because both
   [RFC5681] and [RFC6675] will send retransmissions above the policed
   rate.
   It is RECOMMENDED that PRR be implemented together with [RFC8985].

5.  Definitions

   The following terms, parameters, and state variables are used as
   they are defined in earlier documents:

   RFC 793: snd.una (send unacknowledged).

   RFC 5681: duplicate ACK, FlightSize, Sender Maximum Segment Size
   (SMSS).

   RFC 6675: covered (as in "covered sequence numbers").

   Voluntary window reductions: choosing not to send data in response
   to some ACKs, for the purpose of reducing the sending window size
   and data rate.

   PRR defines additional variables:

   DeliveredData: the number of bytes newly delivered by the most
   recent ACK.  With SACK, DeliveredData is the change in snd.una plus
   the bytes newly selectively acknowledged.  Without SACK,
   DeliveredData is the change in snd.una for a partial ACK, or 1 MSS
   worth of bytes for a DUPACK.

   Note that without SACK, a poorly behaved receiver that returns
   extraneous DUPACKs, as depicted in [Savage99], can artificially
   inflate DeliveredData.  As a mitigation, PRR disallows incrementing
   DeliveredData once the total bytes delivered exceeds the data
   outstanding upon entering recovery (i.e., RecoverFS).

   safeACK: a local boolean variable indicating that the current ACK
   reported good progress.  SafeACK is true only when the ACK has
   cumulatively acknowledged new data and the ACK does not indicate
   further losses.  Both conditions indicate that the recovery is
   making good progress and the sender can send more aggressively.

   sndcnt: a local variable indicating exactly how many bytes should be
   sent in response to each ACK.  Note that the decision of which data
   to send (e.g., retransmit missing data or send more new data) is out
   of scope for this document.

6.  Algorithms

   At the beginning of recovery, initialize the PRR state.
   This assumes a modern congestion control algorithm, CongCtrlAlg(),
   that might set ssthresh to something other than FlightSize/2:

      ssthresh = CongCtrlAlg()      // Target flight size in recovery
      prr_delivered = 0             // Total bytes delivered in recovery
      prr_out = 0                   // Total bytes sent in recovery
      RecoverFS = snd.nxt - snd.una // FlightSize right before recovery

                                 Figure 1

   On every ACK starting or during the recovery:

      DeliveredData = bytes newly cumulatively acknowledged
      if (SACK is used) {
         DeliveredData += bytes newly selectively acknowledged
      } else if (ACK is a DUPACK and prr_delivered < RecoverFS) {
         DeliveredData += MSS
      }
      if (DeliveredData is 0)
         Return

      prr_delivered += DeliveredData
      pipe = (RFC 6675 pipe algorithm)
      safeACK = (snd.una advances and no further loss indicated)
      if (pipe > ssthresh) {
         // Proportional Rate Reduction
         sndcnt = CEIL(prr_delivered * ssthresh / RecoverFS) - prr_out
      } else {
         // PRR-CRB by default
         sndcnt = MAX(prr_delivered - prr_out, DeliveredData)
         if (safeACK) {
            // PRR-SSRB when recovery is in good progress
            sndcnt += MSS
         }
         // Attempt to catch up, as permitted
         sndcnt = MIN(ssthresh - pipe, sndcnt)
      }

      if (prr_out is 0 AND sndcnt is 0) {
         // Force a fast retransmit upon entering recovery
         sndcnt = MSS
      }
      cwnd = pipe + sndcnt

                                 Figure 2

   On any data transmission or retransmission:

      prr_out += (data sent)

                                 Figure 3

7.  Examples

   We illustrate these algorithms by showing their different behaviors
   for two scenarios: TCP experiencing either a single loss or a burst
   of 15 consecutive losses.  In all cases we assume bulk data (no
   application pauses), standard Additive Increase Multiplicative
   Decrease (AIMD) congestion control [RFC5681], and cwnd = FlightSize =
   pipe = 20 segments, so ssthresh will be set to 10 at the beginning of
   recovery.
   We also assume standard Fast Retransmit and Limited Transmit
   [RFC3042], so TCP will send 2 new segments followed by 1 retransmit
   in response to the first 3 duplicate ACKs following the losses.

   Each of the diagrams below shows the per-ACK response to the first
   round trip for the various recovery algorithms when the zeroth
   segment is lost.  The top line indicates the transmitted segment
   number triggering the ACKs, with an X for the lost segment.  "cwnd"
   and "pipe" indicate the values of these variables after processing
   each returning ACK but before further (re)transmission.  "Sent"
   indicates how much 'N'ew or 'R'etransmitted data would be sent.
   Note that the algorithms for deciding which data to send are out of
   scope of this document.

   RFC 6675
   ack#   X  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19
   cwnd:    20 20 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11
   pipe:    19 19 18 18 17 16 15 14 13 12 11 10 10 10 10 10 10 10 10
   sent:     N  N  R                          N  N  N  N  N  N  N  N

   PRR
   ack#   X  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19
   cwnd:    20 20 19 18 18 17 17 16 16 15 15 14 14 13 13 12 12 11 10
   pipe:    19 19 18 18 17 17 16 16 15 15 14 14 13 13 12 12 11 11 10
   sent:     N  N  R     N     N     N     N     N     N     N     N

                                 Figure 4

   Note that both algorithms send the same total amount of data.  RFC
   6675 experiences a "half window of silence", while PRR spreads the
   voluntary window reduction across an entire RTT.

   Next, we consider the same initial conditions when the first 15
   packets (0-14) are lost.  During the remainder of the lossy RTT,
   only 5 ACKs are returned to the sender.  We examine each of these
   algorithms in succession.
   RFC 6675
   ack#   X  X  X  X  X  X  X  X  X  X  X  X  X  X  X 15 16 17 18 19
   cwnd:                                              20 20 11 11 11
   pipe:                                              19 19  4 10 10
   sent:                                               N  N 7R  R  R

   PRR
   ack#   X  X  X  X  X  X  X  X  X  X  X  X  X  X  X 15 16 17 18 19
   cwnd:                                              20 20  5  5  5
   pipe:                                              19 19  4  4  4
   sent:                                               N  N  R  R  R

                                 Figure 5

   In this specific situation, RFC 6675 is more aggressive, because
   once Fast Retransmit is triggered (on the ACK for segment 17), TCP
   immediately retransmits sufficient data to bring pipe up to cwnd.
   Our earlier measurements (Section 5 of RFC 6937) indicate that RFC
   6675 significantly outperforms PRR and some other similarly
   conservative algorithms that we tested, showing that it is quite
   common for the actual losses to exceed the window reduction
   determined by the congestion control algorithm.

   Under such heavy losses, PRR uses PRR-CRB to follow the packet
   conservation principle.  Since the total losses bring pipe below
   ssthresh, data is sent such that the total data transmitted,
   prr_out, follows the total data delivered to the receiver as
   reported by returning ACKs.  Transmission is controlled by the
   sending limit, which is set to prr_delivered - prr_out.  PRR-CRB's
   conservative window reduction causes it to take excessively long to
   recover the losses and exposes it to additional timeouts.

   While not shown in the figure above, once the fast retransmits sent
   upon ACK#17 are delivered and solicit further ACKs that advance
   snd.una, PRR enters PRR-SSRB and increases the window by exactly 1
   segment per ACK until pipe rises to ssthresh during recovery.
   Under heavy losses when cwnd is large, PRR-SSRB recovers the losses
   exponentially faster than PRR-CRB.
   Although increasing the window during recovery seems ill-advised,
   it is important to remember that this is actually less aggressive
   than permitted by RFC 5681, which sends the same quantity of
   additional data as a single burst in response to the ACK that
   triggered Fast Retransmit.

   For less severe loss events, where the total losses are smaller
   than the difference between FlightSize and ssthresh, PRR-CRB and
   PRR-SSRB are not invoked, since PRR stays in the proportional rate
   reduction mode.

8.  Properties

   The following properties are common to both PRR-CRB and PRR-SSRB,
   except as noted:

   PRR maintains TCP's ACK clocking across most recovery events,
   including burst losses.  RFC 6675 can send large unclocked bursts
   following burst losses.

   Normally, PRR will spread voluntary window reductions out evenly
   across a full RTT.  This has the potential to generally reduce the
   burstiness of Internet traffic, and could be considered to be a type
   of soft pacing.  Hypothetically, any pacing increases the
   probability that different flows are interleaved, reducing the
   opportunity for ACK compression and other phenomena that increase
   traffic burstiness.  However, these effects have not been
   quantified.

   If there are minimal losses, PRR will converge to exactly the target
   window chosen by the congestion control algorithm.  Note that as TCP
   approaches the end of recovery, prr_delivered will approach
   RecoverFS, and sndcnt will be computed such that prr_out approaches
   ssthresh.

   Implicit window reductions, due to multiple isolated losses during
   recovery, cause later voluntary reductions to be skipped.  For small
   numbers of losses, the window size ends at exactly the window chosen
   by the congestion control algorithm.

   For burst losses, earlier voluntary window reductions can be undone
   by sending extra segments in response to ACKs arriving later during
   recovery.  Note that as long as some voluntary window reductions are
   not undone, the final value for pipe will be the same as ssthresh,
   the target cwnd value chosen by the congestion control algorithm.

   PRR with either Reduction Bound improves the situation when there
   are application stalls, e.g., when the sending application does not
   queue data for transmission quickly enough or the receiver stops
   advancing rwnd (receiver window).  When there is an application
   stall early during recovery, prr_out will fall behind the sum of the
   transmissions permitted by sndcnt.  The missed opportunities to send
   due to stalls are treated like banked voluntary window reductions;
   specifically, they cause prr_delivered - prr_out to be significantly
   positive.  If the application catches up while TCP is still in
   recovery, TCP will send a partial window burst to catch up to
   exactly where it would have been had the application never stalled.
   Although this burst might be viewed as being hard on the network,
   this is exactly what happens every time there is a partial RTT
   application stall while not in recovery.  We have made the partial
   RTT stall behavior uniform in all states.  Changing this behavior is
   out of scope for this document.

   PRR with either Reduction Bound is less sensitive to errors in the
   pipe estimator.  While in recovery, pipe is intrinsically an
   estimator, using incomplete information to estimate if un-SACKed
   segments are actually lost or merely out of order in the network.
   Under some conditions, pipe can have significant errors; for
   example, pipe is underestimated when a burst of reordered data is
   prematurely assumed to be lost and marked for retransmission.  If
   the transmissions are regulated directly by pipe, as they are with
   RFC 6675, a step discontinuity in the pipe estimator causes a burst
   of data, which cannot be retracted once the pipe estimator is
   corrected a few ACKs later.  For PRR, pipe merely determines which
   algorithm, PRR or the Reduction Bound, is used to compute sndcnt
   from DeliveredData.  While pipe is underestimated, the algorithms
   are different by at most 1 segment per ACK.  Once pipe is updated,
   they converge to the same final window at the end of recovery.

   Under all conditions and sequences of events during recovery,
   PRR-CRB strictly bounds the data transmitted to be equal to or less
   than the amount of data delivered to the receiver.  We claim that
   this Strong Packet Conservation Bound is the most aggressive
   algorithm that does not lead to additional forced losses in some
   environments.  It has the property that if there is a standing queue
   at a bottleneck with no cross traffic, the queue will maintain
   exactly constant length for the duration of the recovery, except for
   +1/-1 fluctuation due to differences in packet arrival and exit
   times.  See Appendix A for a detailed discussion of this property.

   Although the Strong Packet Conservation Bound is very appealing for
   a number of reasons, our earlier measurements (Section 5 of RFC
   6937) demonstrate that it is less aggressive and does not perform as
   well as RFC 6675, which permits bursts of data when there are bursts
   of losses.  PRR-SSRB is a compromise that permits TCP to send 1
   extra segment per ACK as compared to the Strong Packet Conservation
   Bound when the ACK indicates that the recovery is making good
   progress without further losses.  From the perspective of a strict
   Packet Conservation Bound, PRR-SSRB does indeed open the window
   during recovery; however, it is significantly less aggressive than
   RFC 6675 in the presence of burst losses.

9.  Adapting PRR to other transport protocols

   The main PRR algorithm and Reduction Bounds can be adapted to any
   transport that can support RFC 6675.
   In one major implementation (Linux TCP), PRR has been the default
   fast recovery algorithm for its default and supported congestion
   control modules.

   The safeACK heuristic can be generalized to any ACK of a
   retransmission that does not cause some other segment to be marked
   for retransmission.  That is, PRR-SSRB is safe on any ACK that
   reduces the total number of pending and outstanding retransmissions.

10.  Acknowledgements

   This document is based in part on previous incomplete work by Matt
   Mathis, Jeff Semke, and Jamshid Mahdavi [RHID] and influenced by
   several discussions with John Heffner.

   Monia Ghobadi and Sivasankar Radhakrishnan helped analyze the
   experiments.

   Ilpo Jarvinen reviewed the initial implementation.

   Neal Cardwell and Mark Allman improved the document through their
   insightful reviews.

11.  Security Considerations

   PRR does not change the risk profile for TCP.

   Implementers that change PRR from counting bytes to segments have to
   be cautious about the effects of ACK splitting attacks [Savage99],
   where the receiver acknowledges partial segments for the purpose of
   confusing the sender's congestion accounting.

12.  Normative References

   [RFC0793]  Postel, J., "Transmission Control Protocol", STD 7,
              RFC 793, DOI 10.17487/RFC0793, September 1981,
              <https://www.rfc-editor.org/info/rfc793>.

   [RFC2018]  Mathis, M., Mahdavi, J., Floyd, S., and A. Romanow, "TCP
              Selective Acknowledgment Options", RFC 2018,
              DOI 10.17487/RFC2018, October 1996,
              <https://www.rfc-editor.org/info/rfc2018>.

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119,
              DOI 10.17487/RFC2119, March 1997,
              <https://www.rfc-editor.org/info/rfc2119>.

   [RFC5681]  Allman, M., Paxson, V., and E. Blanton, "TCP Congestion
              Control", RFC 5681, DOI 10.17487/RFC5681, September 2009,
              <https://www.rfc-editor.org/info/rfc5681>.

   [RFC6675]  Blanton, E., Allman, M., Wang, L., Jarvinen, I., Kojo,
              M., and Y.
              Nishida, "A Conservative Loss Recovery Algorithm Based
              on Selective Acknowledgment (SACK) for TCP", RFC 6675,
              DOI 10.17487/RFC6675, August 2012,
              <https://www.rfc-editor.org/info/rfc6675>.

   [RFC8985]  Cheng, Y., Cardwell, N., Dukkipati, N., and P. Jha, "The
              RACK-TLP Loss Detection Algorithm for TCP", RFC 8985,
              DOI 10.17487/RFC8985, February 2021,
              <https://www.rfc-editor.org/info/rfc8985>.

13.  Informative References

   [CUBIC]    Rhee, I. and L. Xu, "CUBIC: A new TCP-friendly high-
              speed TCP variant", PFLDnet 2005, February 2005.

   [FACK]     Mathis, M. and J. Mahdavi, "Forward Acknowledgment:
              Refining TCP Congestion Control", ACM SIGCOMM, August
              1996.

   [Flach2016policing]
              Flach, T., Papageorge, P., Terzis, A., Pedrosa, L.,
              Cheng, Y., Al Karim, T., Katz-Bassett, E., and R.
              Govindan, "An Internet-Wide Analysis of Traffic
              Policing", ACM SIGCOMM, August 2016.

   [IMC11]    Dukkipati, N., Mathis, M., Cheng, Y., and M. Ghobadi,
              "Proportional Rate Reduction for TCP", Proceedings of
              the 11th ACM SIGCOMM Conference on Internet Measurement,
              Berlin, Germany, November 2011.

   [Jacobson88]
              Jacobson, V., "Congestion Avoidance and Control", SIGCOMM
              Comput. Commun. Rev. 18(4), August 1988.

   [Laminar]  Mathis, M., "Laminar TCP and the case for refactoring
              TCP congestion control", Work in Progress, 16 July 2012.

   [RFC3042]  Allman, M., Balakrishnan, H., and S. Floyd, "Enhancing
              TCP's Loss Recovery Using Limited Transmit", RFC 3042,
              DOI 10.17487/RFC3042, January 2001,
              <https://www.rfc-editor.org/info/rfc3042>.

   [RFC3168]  Ramakrishnan, K., Floyd, S., and D. Black, "The Addition
              of Explicit Congestion Notification (ECN) to IP",
              RFC 3168, DOI 10.17487/RFC3168, September 2001,
              <https://www.rfc-editor.org/info/rfc3168>.

   [RFC3517]  Blanton, E., Allman, M., Fall, K., and L. Wang, "A
              Conservative Selective Acknowledgment (SACK)-based Loss
              Recovery Algorithm for TCP", RFC 3517,
              DOI 10.17487/RFC3517, April 2003,
              <https://www.rfc-editor.org/info/rfc3517>.

   [RFC6937]  Mathis, M., Dukkipati, N., and Y.
Cheng, "Proportional Rate Reduction for TCP", RFC 6937, DOI 10.17487/RFC6937, May 2013, <https://www.rfc-editor.org/info/rfc6937>.

[RHID] Mathis, M., Semke, J., and J. Mahdavi, "The Rate-Halving Algorithm for TCP Congestion Control", Work in Progress, August 1999.

[RHweb] Mathis, M. and J. Mahdavi, "TCP Rate-Halving with Bounding Parameters", Web publication, December 1997.

[Savage99] Savage, S., Cardwell, N., Wetherall, D., and T. Anderson, "TCP congestion control with a misbehaving receiver", SIGCOMM Comput. Commun. Rev. 29(5), October 1999.

Appendix A. Strong Packet Conservation Bound

PRR-CRB is based on a conservative, philosophically pure, and aesthetically appealing Strong Packet Conservation Bound, described here. Although inspired by the packet conservation principle [Jacobson88], it differs in how it treats segments that are missing and presumed lost. Under all conditions and sequences of events during recovery, PRR-CRB strictly bounds the data transmitted to be equal to or less than the amount of data delivered to the receiver. Note that the effects of presumed losses are included in the pipe calculation, but they do not affect the outcome of PRR-CRB once pipe has fallen below ssthresh.

We claim that this Strong Packet Conservation Bound is the most aggressive algorithm that does not lead to additional forced losses in some environments. It has the property that if there is a standing queue at a bottleneck that is carrying no other traffic, the queue will maintain exactly constant length for the entire duration of the recovery, except for +1/-1 fluctuations due to differences in packet arrival and exit times. Any less aggressive algorithm will result in a declining queue at the bottleneck. Any more aggressive algorithm will result in an increasing queue, or in additional losses if it is a full drop-tail queue.
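The bound can be seen directly in the per-ACK sndcnt computation of RFC 6937. The following Python sketch is illustrative only, not normative; variable names follow the RFC's pseudocode, and the PRR-SSRB refinement is omitted for brevity:

```python
import math

def prr_sndcnt(prr_delivered, prr_out, pipe, ssthresh, recover_fs):
    """Per-ACK send quota, following the RFC 6937 pseudocode (sketch).

    prr_delivered: total data delivered to the receiver during recovery
    prr_out:       total data transmitted during recovery
    pipe:          RFC 6675 estimate of data outstanding in the network
    recover_fs:    FlightSize at the start of recovery
    """
    if pipe > ssthresh:
        # Proportional Rate Reduction: pace the reduction so that
        # pipe converges toward ssthresh as data is delivered.
        sndcnt = math.ceil(prr_delivered * ssthresh / recover_fs) - prr_out
    else:
        # PRR-CRB: cumulative transmissions never exceed cumulative
        # deliveries, and pipe never overshoots ssthresh.
        sndcnt = min(ssthresh - pipe, prr_delivered - prr_out)
    return max(sndcnt, 0)

# Once pipe < ssthresh, sending is capped by delivery: with 6 units
# delivered and 5 already sent, at most 1 more may be transmitted.
assert prr_sndcnt(prr_delivered=6, prr_out=5, pipe=4,
                  ssthresh=10, recover_fs=20) == 1
```

Note that in the pipe <= ssthresh branch, sndcnt can never exceed prr_delivered - prr_out, which is exactly the Strong Packet Conservation Bound described above.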
We demonstrate this property with a simple thought experiment:

Imagine a network path that has insignificant delays in both directions, except for the processing time and queue at a single bottleneck in the forward path. By insignificant delay, we mean that when a packet is "served" at the head of the bottleneck queue, the following events all happen in much less than one bottleneck packet time: the packet arrives at the receiver; the receiver sends an ACK that arrives at the sender; the sender processes the ACK and sends some data; the data is queued at the bottleneck.

If sndcnt is set to DeliveredData and nothing else is inhibiting sending data, then clearly the data arriving at the bottleneck queue will exactly replace the data that was just served at the head of the queue, so the queue will have a constant length. If the queue is drop tail and full, then it will stay exactly full. Losses or reordering on the ACK path only cause wider fluctuations in the queue size, but do not raise its peak size, independent of whether the data is in order or out of order (including loss recovery from an earlier RTT). Any more aggressive algorithm that sends additional data will overflow the drop-tail queue and cause loss. Any less aggressive algorithm will under-fill the queue. Therefore, setting sndcnt to DeliveredData is the most aggressive algorithm that does not cause forced losses in this simple network. Relaxing the assumptions (e.g., making delays more authentic and adding more flows, delayed ACKs, etc.) is likely to increase the fine-grained fluctuations in queue size but does not change its basic behavior.

Note that the congestion control algorithm implements a broader notion of optimal that includes appropriately sharing the network.
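The constant-queue claim in the thought experiment can be checked with a toy simulation. This is an illustrative sketch only: the single-queue model, the one-packet tick granularity, and the policy callback are assumptions of the sketch, not part of the specification:

```python
# Toy model of the thought experiment: a single bottleneck queue with
# negligible delays elsewhere.  Each time the queue serves one packet,
# the resulting ACK reaches the sender within the same "tick", and the
# sender's response is immediately queued at the bottleneck.

def run(queue_len, ticks, sndcnt_policy):
    """Return the queue length after `ticks` service events.

    sndcnt_policy(delivered) -> packets the sender emits per ACK.
    """
    q = queue_len
    for _ in range(ticks):
        if q == 0:
            break
        q -= 1                         # bottleneck serves one packet...
        delivered = 1                  # ...which the receiver ACKs
        q += sndcnt_policy(delivered)  # sender's response joins the queue
    return q

# sndcnt = DeliveredData: the queue holds exactly constant length.
assert run(10, 1000, lambda d: d) == 10
# Any less aggressive policy drains the queue.
assert run(10, 1000, lambda d: 0) == 0
# Any more aggressive policy grows the queue (forcing losses if the
# queue is a full drop-tail queue).
assert run(10, 1000, lambda d: d + 1) == 1010
```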
Typical congestion control algorithms are likely to reduce the data sent relative to the Packet Conserving Bound implemented by PRR, bringing TCP's actual window down to ssthresh.

Authors' Addresses

Matt Mathis
Google, Inc.
1600 Amphitheatre Parkway
Mountain View, California 94043
United States of America
Email: mattmathis@google.com

Nandita Dukkipati
Google, Inc.
1600 Amphitheatre Parkway
Mountain View, California 94043
United States of America
Email: nanditad@google.com

Yuchung Cheng
Google, Inc.
1600 Amphitheatre Parkway
Mountain View, California 94043
United States of America
Email: ycheng@google.com