Network Working Group                                 P. Balasubramanian
Internet-Draft                                                O. Ertugay
Intended status: Informational                                  D. Havey
Expires: February 26, 2021                                     Microsoft
                                                         August 25, 2020

          LEDBAT++: Congestion Control for Background Traffic
                  draft-irtf-iccrg-ledbat-plus-plus-01

Abstract

This informational memo describes LEDBAT++, a set of enhancements to
the LEDBAT (Low Extra Delay Background Transport) congestion control
algorithm for background traffic. The LEDBAT congestion control
algorithm has several shortcomings that prevent it from working
effectively in practice.
LEDBAT++ extends LEDBAT by adding a set of improvements, including
reduced congestion window gain, modified slow-start, multiplicative
decrease and periodic slowdowns. This set of improvements mitigates the
known issues with the LEDBAT algorithm, such as latency drift, latecomer
advantage and inter-LEDBAT fairness. LEDBAT++ has been implemented as a
TCP congestion control algorithm in the Windows operating system.
LEDBAT++ has been deployed in production at scale on a variety of
networks and has been experimentally verified to achieve the original
stated goals of LEDBAT.

Status of This Memo

This Internet-Draft is submitted in full conformance with the
provisions of BCP 78 and BCP 79.

Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF). Note that other groups may also distribute
working documents as Internet-Drafts. The list of current
Internet-Drafts is at https://datatracker.ietf.org/drafts/current/.

Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress."

This Internet-Draft will expire on February 26, 2021.

Copyright Notice

Copyright (c) 2020 IETF Trust and the persons identified as the
document authors. All rights reserved.

This document is subject to BCP 78 and the IETF Trust's Legal
Provisions Relating to IETF Documents
(https://trustee.ietf.org/license-info) in effect on the date of
publication of this document. Please review these documents
carefully, as they describe your rights and restrictions with respect
to this document.
Code Components extracted from this document must include Simplified
BSD License text as described in Section 4.e of the Trust Legal
Provisions and are provided without warranty as described in the
Simplified BSD License.

Table of Contents

1.  Introduction
2.  Terminology
3.  LEDBAT Issues
  3.1.  Latecomer advantage
  3.2.  Inter-LEDBAT fairness
  3.3.  Latency drift
  3.4.  Low latency competition
  3.5.  Dependency on one-way delay measurements
4.  LEDBAT++ Mechanisms
  4.1.  Modified slow start
  4.2.  Slower than Reno increase
  4.3.  Multiplicative decrease
  4.4.  Initial and periodic slowdown
  4.5.  Use of Round Trip Time instead of one way delay
5.  Deployment Issues
6.  Security Considerations
7.  IANA Considerations
8.  Acknowledgements
9.  References
  9.1.  Normative References
  9.2.  Informative References
Authors' Addresses

1.  Introduction

Operating systems and applications use background connections for a
variety of tasks, such as software updates, large media downloads,
telemetry, or error reporting.
These connections should operate without affecting the general
usability of the system. Usability is measured in terms of available
network bandwidth and network latency. LEDBAT [RFC6817] is designed to
minimize the impact of lower-than-best-effort connections on the
latency and bandwidth of other connections. To achieve that, each
LEDBAT connection monitors the transmission delay of packets and
compares it to the minimum delay observed on the connection. The
difference between the transmission delay and the minimum delay is
used as an estimate of the queuing delay. If the queuing delay is
above a target, LEDBAT directs the connection to reduce its bandwidth.
If the queuing delay is below the target, the connection is allowed to
increase its transmission rate. The bandwidth increase and decrease
are proportional to the difference between the observed values and the
target. LEDBAT reacts to packet losses and other congestion signals in
the same way as standard TCP.

However, there are a few issues that plague LEDBAT, some previously
documented and some discovered by experiments. LEDBAT++ adds
mechanisms on top of (and in some cases deviates from) LEDBAT to
overcome these problems. The remaining sections describe the problems
and the mechanisms in detail. The objective of this informational RFC
is to document the LEDBAT++ enhancements on top of a base LEDBAT
implementation in the Windows operating system, and to encourage its
use so that the algorithm can be further verified and improved.

2.  Terminology

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
document are to be interpreted as described in [RFC2119].

3.  LEDBAT Issues

This section lists some known LEDBAT issues from existing literature
and also lists some new problems observed as a result of
experimentation with an implementation of [RFC6817].

3.1.  Latecomer advantage

Delay based congestion control protocols like LEDBAT are known to
suffer from a latecomer advantage. When the newcomer establishes a
connection, the transmission delay that it encounters incorporates
queuing delay caused by the existing connections. The newcomer
considers this large delay the minimum, and thereby increases its
transmission rate while other LEDBAT connections slow down.
Eventually, the latecomer will end up using the entire bandwidth of
the connection. Standard TCP congestion control, as described in
[RFC0793] and [RFC5681], causes some queuing; the LEDBAT delay
measurements incorporate that queuing, and the base delay as measured
by the connection is thus set to a larger value than the actual
minimum. As a result, the queues remain mostly full. In some cases,
this queuing persists even after the closing of the competing TCP
connection. This phenomenon was already known during the design of
LEDBAT, but there is no mitigation in the LEDBAT design. The
designers of the protocol relied instead on the inherent burstiness
of network traffic. Small gaps in transmission schedules would allow
the latecomer to measure the true delay of the connection. This
reasoning is not satisfactory because workloads can upload large
amounts of data and would not always see such gaps.

3.2.  Inter-LEDBAT fairness

The latecomer advantage is caused by the improper evaluation of the
base delay, with the latecomer using a larger value than the
preexisting connections. However, even when all competing
connections have a correct evaluation of the base delay, some of them
will receive a larger share of resources.
The reason for that persistent unfairness is explained in
[RethinkLEDBAT]. LEDBAT specifies proportional feedback based on a
ratio between the measured queuing delay and a target. Proportional
feedback uses both additive increases and additive decreases. This
does stabilize the queue sizes, but it does not guarantee fair sharing
between the competing connections.

3.3.  Latency drift

LEDBAT estimates the base delay of a connection as the minimum of all
observed transmission delays over a 10-minute interval. It uses an
interval rather than a measurement over the whole duration of the
connection, because network conditions may change over time. For
example, an existing connection may be transparently rerouted over a
longer path, with a longer transmission delay. Keeping the old
estimate would then cause LEDBAT to unnecessarily reduce the
connection throughput. However, experiments show that this causes a
ratcheting effect when LEDBAT connections are allowed to operate for
a long time. The delay feedback in LEDBAT causes the queuing delay
to stabilize just below the target. After an initial interval, all
new measurements are equal to the initial transmission delay plus a
fraction of the target. Every 10 minutes, the measured base delay
increases by that fraction of the target queuing delay, leading to
potentially large values over time.

3.4.  Low latency competition

LEDBAT compares the observed queuing delays to a fixed target. The
target value cannot be set too low, because that would cause poor
operation on slow networks. In practice, it is set to 60ms, a value
that allows proper operation of latency sensitive applications like
VoIP. But if the bottleneck buffer is so small that the queuing
delay will never reach the target, then the LEDBAT connection behaves
just like an ordinary connection.
It competes aggressively, and obtains the same share of the bandwidth
as regular TCP connections. On high speed links the problem is
exacerbated.

3.5.  Dependency on one-way delay measurements

The LEDBAT algorithm requires use of one-way delay measurements.
This makes it harder to use with transport protocols like TCP that
have no reliable way to obtain one-way delay measurements. TCP
timestamps do not standardize clock frequency, and the endpoints will
need to rely on heuristics to guess the clock frequency of the remote
peer to detect and correct for clock skew. TCP timestamps do not
include clock synchronization, and would need some non-standard
invention to compensate for clock skew. Any such mechanism is very
fragile.

4.  LEDBAT++ Mechanisms

4.1.  Modified slow start

Traditional initial slow start can cause spikes in bandwidth usage.
However, skipping the exponential congestion window increase results
in poor performance on long delay links. LEDBAT++ applies the
dynamic GAIN parameter to the congestion window increases. In
standard TCP operation, the congestion window increases for every ACK
by exactly the number of bytes acknowledged. A LEDBAT++ sender
increases the congestion window by that number multiplied by the
dynamic GAIN value. On low latency links, this ensures that LEDBAT++
connections ramp up slower than regular connections. A LEDBAT++
sender limits the initial window to 2 packets. A LEDBAT++ sender
monitors the transmission delays during the slow start period. If the
queuing delay is larger than 3/4ths of the target delay, the sender
exits slow start and immediately moves to the congestion avoidance
phase. After initial slow start, the increase of congestion window is
bounded by the SSTHRESH estimate acquired during congestion avoidance,
and the risk of creating congestion spikes is very low.
Exiting slow start on excessive delay SHOULD be applied only during
the initial slow start.

4.2.  Slower than Reno increase

When the queuing delays are below the target delay, LEDBAT behaves
like standard TCP [RFC0793]. LEDBAT introduces a GAIN parameter
which can be set between 0 and 1. In order to solve the low latency
competition problem, LEDBAT++ makes the GAIN parameter dynamic. When
standard and reduced-GAIN connections share the same bottleneck, they
experience the same packet drop rate. The GAIN value ensures that
the throughput of the LEDBAT connection will be a fraction
(1/SQRT(1/GAIN)) of the throughput of the regular connections. Small
values of GAIN work well when the base delay is small, and ensure that
the LEDBAT connection will yield to regular connections in these
networks. However, small values of GAIN do not work well on long
delay links. In the absence of competing traffic, combining large
base delays with small GAIN values causes the connection bandwidth to
remain well under capacity for a long time. In LEDBAT++, GAIN is a
function of the ratio between the base delay and the target delay:

    GAIN = 1 / (min (16, CEIL (2*TARGET/base)))

where CEIL(X) is defined as the smallest integer larger than X.
Implementations MAY experiment with the constant value 16 as a
tradeoff between responsiveness and performance.

4.3.  Multiplicative decrease

[RethinkLEDBAT] suggests combining additive increases and
multiplicative decreases in order to solve the Inter-LEDBAT fairness
problem. It proposes to change the way LEDBAT increases and
decreases the congestion window based on the ratio between the
observed delay and the target. The formulas below assume that the
congestion window is changed once per round trip measurement.
In standard LEDBAT, the per RTT window update when the delay is less
than the target is:

    W += GAIN * (1 - delay/target)

In LEDBAT++, with multiplicative decrease, the per RTT window update
when the delay is less than the target is:

    W += GAIN

Similarly, in standard LEDBAT, the per RTT window update when the
delay is higher than the target is:

    W -= GAIN * (delay/target - 1)

In LEDBAT++, with multiplicative decrease, the per RTT window update
when the delay is higher than the target is:

    W += max( (GAIN - Constant * W * (delay/target - 1)), -W/2 )

It is RECOMMENDED that the Constant be set to 1. Implementations MAY
experiment with this value. If the connections have different
estimates of the base delay, capping the multiplicative decrease to at
most W/2 is required. Otherwise, spikes in delay can cause the
window to immediately drop to its minimal value. A LEDBAT++ sender
MUST also ensure that the congestion window never decreases below 2
packets, in order to avoid completely starving the connection.

4.4.  Initial and periodic slowdown

The LEDBAT specification assumes that there will be natural gaps in
traffic, and that during those gaps the observed delay corresponds to
a state where the queues are empty. However, there are workloads
where the traffic is sustained for long periods. This causes base
delay estimates to be inaccurate and is one of the major reasons
behind latency drift as well as the lack of inter-LEDBAT fairness.
To ensure stability, LEDBAT++ forces these gaps, or slowdown
periods. A slowdown is an interval during which the LEDBAT++
connection voluntarily reduces its traffic, allowing queues to drain
and transmission delay measurements to converge to the base delay.
The slowdown works as follows:

o  Upon entering slowdown, set SSTHRESH to the current value of the
   congestion window CWND, and then reduce CWND to 2 packets.

o  Keep CWND frozen at 2 packets for 2 RTT.
o  After 2 RTT, ramp up the congestion window according to the slow
   start algorithm, until the congestion window reaches SSTHRESH.

Keeping the CWND frozen at 2 packets for 2 RTT allows the queues to
drain, and is key to obtaining accurate delay measurements. The
initial slowdown starts shortly after the connection completes the
initial slow start phase, that is, 2 RTT after the initial slow start
completes. After the initial slowdown, a LEDBAT++ sender performs
periodic slowdowns. The interval between slowdowns is computed so
that slowdown does not cause more than a 10% drop in the utilization
of the bottleneck. A LEDBAT++ sender measures the duration of the
slowdown, from the time of entry to the time at which the congestion
window regrows to the previous SSTHRESH value. The next slowdown is
then scheduled to occur at 9 times this duration after the exit
point. The combination of initial and periodic slowdowns allows
competing LEDBAT connections to obtain good estimates of the base
delay, and when combined with multiplicative decrease solves both the
latecomer advantage and the Inter-LEDBAT fairness problems.

4.5.  Use of Round Trip Time instead of one way delay

LEDBAT++ uses Round Trip Time measurements instead of one way delay.
One possible shortcoming of round trip delay measurements is that
they incorporate queuing delays in both directions. This can lead to
unnecessary slowdowns, such as slowing down an upload connection
because a download is saturating the downlink. In practice, however,
this seems to benefit the workloads, because the bottleneck link can
carry ACK traffic in the other direction for the competing flows.
Round trip measurements also include the delay at the receiver between
receiving a packet and sending the corresponding acknowledgement.
These delays are normally quite small, except when the delayed
acknowledgment logic kicks in.
The effect of delayed ACKs can be particularly acute when the
congestion window only includes a few packets, for example at the
beginning of the connection.

The problems of using round trip delay are mitigated through a set of
implementation choices. First, a LEDBAT++ sender enables the TCP
Timestamp option, in order to obtain RTT samples with each
acknowledgement. Second, a LEDBAT++ sender SHOULD filter the round
trip measurements by using the minimum of the 4 most recent delay
samples, as suggested in the LEDBAT specification. Finally, the
queuing delay target is set larger than the typical TCP maximum
acknowledgement delay. This avoids overreacting to a single delayed
ACK measurement. The LEDBAT++ default delay target of 60ms is
different from the 100ms value recommended in [RFC6817].

5.  Deployment Issues

LEDBAT++ is a sender-side algorithmic improvement. This implies that
for many workloads it requires changes to the servers serving
content. It does not address workloads or scenarios where the only
entities that can be updated are clients.

Transparent proxies prevent measurement of end-to-end delay and might
interfere with the effective operation of LEDBAT++.

The interaction between Active Queue Management (AQM) and LEDBAT++ is
an area of research.

6.  Security Considerations

LEDBAT++ enhances LEDBAT and inherits the general security
considerations discussed in [RFC6817].

7.  IANA Considerations

This document has no actions for IANA.

8.  Acknowledgements

The LEDBAT++ algorithm was designed and implemented by Osman Ertugay,
Christian Huitema, Praveen Balasubramanian, and Daniel Havey.

9.  References

9.1.  Normative References

[RFC0793]  Postel, J., "Transmission Control Protocol", STD 7,
           RFC 793, DOI 10.17487/RFC0793, September 1981.
[RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
           Requirement Levels", BCP 14, RFC 2119,
           DOI 10.17487/RFC2119, March 1997.

[RFC5681]  Allman, M., Paxson, V., and E. Blanton, "TCP Congestion
           Control", RFC 5681, DOI 10.17487/RFC5681, September 2009.

[RFC6817]  Shalunov, S., Hazel, G., Iyengar, J., and M. Kuehlewind,
           "Low Extra Delay Background Transport (LEDBAT)", RFC 6817,
           DOI 10.17487/RFC6817, December 2012.

9.2.  Informative References

[RethinkLEDBAT]
           Carofiglio, G., Muscariello, L., Rossi, D., Testa, C.,
           and S. Valenti, "Rethinking the Low Extra Delay Background
           Transport (LEDBAT) Protocol", Computer Networks, Volume
           57, Issue 8, Pages 1838-1852, 4 June 2013.

Authors' Addresses

Praveen Balasubramanian
Microsoft
One Microsoft Way
Redmond, WA 98052
USA

Phone: +1 425 538 2782
Email: pravb@microsoft.com

Osman Ertugay
Microsoft

Phone: +1 425 706 2684
Email: osmaner@microsoft.com

Daniel Havey
Microsoft

Phone: +1 425 538 5871
Email: dahavey@microsoft.com
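
The dynamic GAIN of Section 4.2, the per RTT window update of Section
4.3, the slowdown scheduling of Section 4.4 and the RTT filter of
Section 4.5 can be illustrated with a short sketch. This is not the
Windows implementation; it is a minimal Python model under stated
assumptions. All function and constant names are hypothetical, the
window is counted in packets, math.ceil stands in for CEIL (which
differs from the "smallest integer larger than X" wording when X is
already an integer), and the handling of a non-positive base delay is
an illustrative choice that the memo does not specify.

```python
import math
from collections import deque

TARGET_MS = 60.0          # LEDBAT++ default delay target (Section 4.5)
MIN_CWND = 2.0            # packets; lower bound required in Section 4.3
GAIN_DIVISOR_CAP = 16     # the constant 16 in the GAIN formula (Section 4.2)
DECREASE_CONSTANT = 1.0   # RECOMMENDED value of "Constant" (Section 4.3)


def dynamic_gain(base_delay_ms, target_ms=TARGET_MS):
    """GAIN = 1 / min(16, CEIL(2 * TARGET / base))."""
    if base_delay_ms <= 0:
        # Assumption: with no usable base delay, fall back to the
        # smallest GAIN. The memo does not cover this case.
        return 1.0 / GAIN_DIVISOR_CAP
    # Assumption: math.ceil is used for CEIL.
    divisor = min(GAIN_DIVISOR_CAP, math.ceil(2.0 * target_ms / base_delay_ms))
    return 1.0 / divisor


def filtered_rtt(samples_ms):
    """Minimum of the 4 most recent RTT samples (Section 4.5)."""
    recent = deque(samples_ms, maxlen=4)  # keeps only the last 4 samples
    return min(recent)


def per_rtt_window_update(cwnd, gain, queuing_delay_ms, target_ms=TARGET_MS):
    """Apply one per RTT LEDBAT++ window update and return the new window."""
    if queuing_delay_ms <= target_ms:
        # Additive increase. At delay == target the decrease formula
        # below also yields +GAIN, so the two branches agree there.
        delta = gain
    else:
        # Multiplicative decrease, capped at half the window.
        excess = queuing_delay_ms / target_ms - 1.0
        delta = max(gain - DECREASE_CONSTANT * cwnd * excess, -cwnd / 2.0)
    # The window never drops below 2 packets.
    return max(MIN_CWND, cwnd + delta)


def next_slowdown_time(slowdown_entry_s, slowdown_exit_s):
    """Schedule the next slowdown 9 slowdown-durations after the exit."""
    duration = slowdown_exit_s - slowdown_entry_s
    return slowdown_exit_s + 9.0 * duration
```

For example, with a 50ms base delay and the 60ms target, dynamic_gain
returns 1/3, while a 5ms base delay hits the cap and yields 1/16. A
window of 20 packets facing a queuing delay of three times the target
is halved to 10 rather than collapsing, which is the effect of the
-W/2 cap.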