Network Working Group                                         M. Bagnulo
Internet-Draft                                        A. Garcia-Martinez
Intended status: Experimental                                       UC3M
Expires: January 9, 2020                                   G. Montenegro
                                                      P. Balasubramanian
                                                               Microsoft
                                                            July 8, 2019

 rLEDBAT: receiver-driven Low Extra Delay Background Transport for TCP
                   draft-bagnulo-iccrg-rledbat-00.txt

Abstract

   This document specifies rLEDBAT, a receiver-driven,
   less-than-best-effort congestion control algorithm for TCP.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current
   Internet-Drafts is at https://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on January 9, 2020.

Copyright Notice

   Copyright (c) 2019 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (https://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.  Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1.  Introduction
   2.  Motivations for rLEDBAT
   3.  rLEDBAT overview
   4.  rLEDBAT design rationale
     4.1.  Controlling the receive window
       4.1.1.  Avoiding window shrinking
       4.1.2.  Window Scale Option
     4.2.  Using the RTT to estimate the queueing delay
     4.3.  Inter-rLEDBAT fairness
     4.4.  Reacting to packet loss
     4.5.  Bootstrapping
     4.6.  Reaction to path changes
   5.  rLEDBAT algorithm
     5.1.  Data structures
     5.2.  Algorithm
     5.3.  rLEDBAT parameters
   6.  Security Considerations
   7.  IANA Considerations
   8.  Acknowledgements
   9.  Informative References
   Authors' Addresses

1.  Introduction

   LEDBAT (Low Extra Delay Background Transport) [RFC6817] is a
   congestion-control algorithm that implements a less-than-best-effort
   (LBE) traffic class.

   When LEDBAT traffic shares a bottleneck with one or more TCP
   connections using standard congestion control algorithms such as
   Cubic [RFC8312] (hereafter standard-TCP for short), it reduces its
   sending rate earlier and more aggressively than standard-TCP
   congestion control, allowing standard-TCP traffic to use more of the
   available capacity.  In the absence of competing standard-TCP
   traffic, LEDBAT aims to make efficient use of the available
   capacity, while keeping the queueing delay within predefined bounds.

   LEDBAT reacts both to packet loss and to variations in delay.
   With regard to packet loss, LEDBAT reacts with a multiplicative
   decrease, similar to most TCP congestion controllers.  With regard
   to delay, LEDBAT aims for a target queueing delay.  When the measured
   current queueing delay is below the target, LEDBAT increases the
   sending rate, and when the delay is above the target, it reduces the
   sending rate.  LEDBAT estimates the queueing delay by subtracting the
   estimated base one-way delay (i.e., the one-way delay in the absence
   of queues) from the measured current one-way delay.

   The LEDBAT specification [RFC6817] defines the LEDBAT
   congestion-control algorithm, implemented in the sender to control
   its sending rate.  LEDBAT is specified in a protocol- and
   layer-agnostic manner.

   In this document, we describe rLEDBAT, a receiver-based,
   less-than-best-effort congestion control algorithm.  rLEDBAT is
   inspired by LEDBAT but with the following differences:

      rLEDBAT is implemented in the TCP receiver and controls the
      sending rate of the sender through the TCP Receive Window.

      rLEDBAT uses the round-trip time (RTT) to estimate the queueing
      delay.

      rLEDBAT uses an Additive Increase/Multiplicative Decrease
      algorithm to achieve inter-(r)LEDBAT fairness and avoid the
      late-comer advantage observed in LEDBAT.

2.  Motivations for rLEDBAT

   rLEDBAT enables new use cases and new deployment models, fostering
   the use of LBE traffic and benefitting the global Internet by
   improving overall allocation of resources.  The following scenarios
   are enabled by rLEDBAT:

      Content Delivery Networks and more sophisticated file distribution
      scenarios: Consider the case where the source of a file to be
      distributed (e.g., a software developer that wishes to distribute
      a software update) would prefer to use LEDBAT and enables
      LEDBAT in the servers containing the source file.
      However, because the file is being distributed through a CDN whose
      surrogates do not support LEDBAT, the file transfers originating
      from the CDN surrogates will not use LEDBAT.  Interestingly
      enough, in the case of the software update, the developer also
      controls the software performing the download in the client, the
      receiver of the file, but because current LEDBAT is a sender-based
      algorithm, controlling the client is not enough to enable LEDBAT
      in the communication.  rLEDBAT would enable the use of the LBE
      traffic class for file distribution in this setup.

      Interference from proxies and other middleboxes: Proxies and other
      middleboxes are commonplace in the Internet.  For instance,
      proxies are frequently used in mobile networks.  In enterprise
      networks, it is common to deploy corporate proxies for filtering
      and firewalling.  On satellite links, Performance Enhancing
      Proxies (PEPs) are deployed to mitigate the effect of the long
      delay on TCP connections.  These proxies terminate the TCP
      connection on both ends and prevent the use of LEDBAT in the
      segment between the proxy and the sink of the content, the client.
      By enabling rLEDBAT, clients would be able to enable LBE traffic
      between them and the proxy.

      Receiver-defined preferences: The bottleneck of the communication
      is frequently the access link.  This is particularly true in the
      case of mobile devices.  It is then especially relevant for mobile
      devices to properly manage the capacity of the access link.  With
      current technologies, it is possible for the mobile device to use
      different congestion control algorithms expressing different
      preferences for the traffic.  For instance, a device can choose to
      use standard-TCP for some traffic and to use LEDBAT for other
      traffic.
      However, this would only affect the outgoing traffic, since both
      standard-TCP and LEDBAT are sender-driven.  The mobile device has
      no means to manage the traffic in the downlink, which is, in most
      cases, the most critical hop for a typical eyeball end-user.
      rLEDBAT enables the mobile device to selectively use the LBE
      traffic class for some of the incoming traffic.  For instance, by
      using rLEDBAT, a user can use regular standard-TCP/UDP for video
      streaming (e.g., YouTube) and use rLEDBAT for other background
      file downloads.

3.  rLEDBAT overview

   rLEDBAT is a congestion control mechanism implemented at the
   receiver end of a TCP connection.  The rLEDBAT receiver controls the
   sender's rate through the Receive Window announced to the sender in
   the TCP header.

   rLEDBAT implements an Additive Increase/Multiplicative Decrease
   (AIMD) algorithm that reacts to both delay and packet loss.
   Similarly to LEDBAT, rLEDBAT limits the queueing delay in the path to
   a target delay T.  rLEDBAT uses the RTT to estimate the queueing
   delay.  The rLEDBAT receiver uses the TCP Timestamps option to
   measure the RTT.  rLEDBAT estimates the base RTT (i.e., the RTT when
   there is no queueing delay) as the minimum observed RTT in the last n
   minutes.  The rLEDBAT estimate of the queueing delay (qd) is obtained
   by subtracting the base RTT from the latest sample(s) of the RTT.

   The rLEDBAT algorithm at the receiver calculates a window value
   (rl.WND) which is then conveyed to the sender through the Receive
   Window field of the TCP header.  We describe next how the rl.WND
   value is calculated.

   Suppose that rl.WND was last updated at time t0, so that its current
   value is rl.WND(t0), and that at time t1 a packet is received.
   The rLEDBAT receiver updates rl.WND as follows:

      if qd < T, then rl.WND(t1) = rl.WND(t0) + alpha*MSS/rl.WND(t0)

      if qd > T, then rl.WND(t1) = rl.WND(t0)*betad

   with MSS being the Maximum Segment Size of the TCP connection, and
   alpha and betad being the additive increase and multiplicative
   decrease parameters respectively.  With this base algorithm, while
   the queueing delay is below the target T, rl.WND increases by
   alpha * MSS per RTT (with alpha > 1), while if the queueing delay is
   above the target T, rl.WND is multiplied by betad, with
   0 < betad < 1.  The multiplicative reduction is applied at most once
   per RTT.

   rLEDBAT also performs a multiplicative decrease (with parameter
   betal) in case there is packet loss.  Packet losses are detected at
   the receiver through the observation of retransmitted packets.
   Retransmissions by the sender are detected at the receiver by
   observing the sequence number of the segment and the timestamp value,
   as we describe later on.

4.  rLEDBAT design rationale

4.1.  Controlling the receive window

   rLEDBAT uses the Receive Window (RCV.WND) of TCP to enable the
   receiver to control the sender's rate.  [I-D.ietf-tcpm-rfc793bis]
   defines that the RCV.WND is used to announce the available receive
   buffer to the sender for flow control purposes.  In order to avoid
   confusion, we will call fc.WND the value that a standard RFC793bis
   TCP receiver calculates to set in the receive window for flow control
   purposes.  We call rl.WND the window value calculated by the rLEDBAT
   algorithm and we call RCV.WND the value actually included in the
   Receive Window field of the TCP header.  For an RFC793bis receiver,
   RCV.WND == fc.WND.

   An rLEDBAT receiver sets the RCV.WND to the minimum of rl.WND and
   fc.WND, honoring both.
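
   As an illustration only, the per-packet update of rl.WND given in
   Section 3 and the min() rule just stated might be sketched in Python
   as follows.  This is a minimal sketch under stated assumptions, not
   part of the specification: the function names and all numeric values
   are hypothetical, the window is assumed to be counted in MSS-sized
   segments (so the MSS term equals one unit), and the floor of 1 is
   borrowed from the pseudo-code in Section 5.2.

```python
def update_rl_wnd(rl_wnd, qd, target, alpha, betad):
    """Return the new rl.WND after one received packet.

    Assumption: rl_wnd is counted in MSS-sized segments, so the
    alpha*MSS/rl.WND term of Section 3 becomes alpha/rl_wnd.
    """
    if qd < target:
        # Additive increase: close to alpha segments per RTT overall.
        return rl_wnd + alpha / rl_wnd
    if qd > target:
        # Multiplicative decrease; the caller is expected to apply it
        # at most once per RTT.
        return max(rl_wnd * betad, 1.0)
    return rl_wnd  # qd == target: the text defines no change


def advertised_window(rl_wnd, fc_wnd):
    """RCV.WND honours both the rLEDBAT and the flow-control window."""
    return min(rl_wnd, fc_wnd)


# Hypothetical values; the draft leaves the parameters unspecified.
wnd = 10.0
for _ in range(10):  # roughly one RTT worth of packets
    wnd = update_rl_wnd(wnd, qd=0.010, target=0.060, alpha=2.0, betad=0.8)

decreased = update_rl_wnd(20.0, qd=0.100, target=0.060, alpha=2.0, betad=0.8)
rcv_wnd = advertised_window(decreased, fc_wnd=12.0)
```

   With these hypothetical values, ten below-target packets grow a
   10-segment window by a little under alpha = 2 segments, and the
   advertised window ends up bounded by fc.WND.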

   When using rLEDBAT, two congestion controllers are in action in the
   flow of data from the sender to the receiver, namely, the congestion
   control algorithm of TCP on the sender side and the rLEDBAT
   congestion control algorithm executed in the receiver and conveyed to
   the sender through the RCV.WND.  In normal TCP operation, the
   sender uses the minimum of the congestion window cwnd and the
   receiver window RCV.WND to calculate the sender's window SND.WND.
   This is also true for rLEDBAT, as the sender is a regular TCP sender.
   Because rLEDBAT is designed to react earlier and more aggressively to
   congestion than regular TCP congestion control, the rl.WND contained
   in the RCV.WND field of TCP will in general be smaller than the
   congestion window calculated by the TCP sender, implying that the
   rLEDBAT congestion control algorithm will be effectively controlling
   the sender's window.  Moreover, this also guarantees that even if the
   queueing delay is mis-estimated, the flow will never transmit more
   aggressively than a TCP flow, as the sender's congestion window
   limits the sending rate.

   In summary, the sender's window is: SND.WND = min(cwnd, rl.WND,
   fc.WND)

4.1.1.  Avoiding window shrinking

   The rLEDBAT algorithm increases or decreases the rl.WND according to
   congestion signals (variations in the estimated queueing delay and
   packet loss).  If the new window is smaller than the current one,
   directly announcing it in the RCV.WND may result in shrinking the
   window, i.e., moving the right window edge to the left.  Shrinking
   the window is discouraged as per [I-D.ietf-tcpm-rfc793bis], as it may
   cause unnecessary packet loss and a performance penalty.

   In order to avoid window shrinking, upon the reception of a data
   packet, the announced window can be reduced by at most the number of
   bytes contained in the packet.
   This may not always be enough to honor the newly calculated value of
   the rl.WND.  So, in order to reduce the window as dictated by the
   rLEDBAT algorithm, the receiver will progressively reduce the
   advertised RCV.WND, always ensuring that the reduction is less than
   or equal to the number of received bytes, until the target window
   determined by the rLEDBAT algorithm is reached.  Because the rLEDBAT
   algorithm performs at most one multiplicative decrease per RTT, the
   receiver can drain enough packets from the packets in flight to reach
   the reduced window resulting from the rLEDBAT algorithm without
   resorting to shrinking the receiver window.

4.1.2.  Window Scale Option

   The Window Scale (WS) option [RFC7323] is a means to increase the
   maximum window size permitted by the Receive Window.  The use of the
   WS option implies that the changes in the window are expressed in the
   units resulting from the WS option used in the TCP connection.  This
   means that the rLEDBAT client will have to accumulate the increases
   resulting from the different received packets, and only convey a
   change in the window when the accumulated sum of increases is equal
   to or higher than one unit used to express the receive window
   according to the WS option in place for the TCP connection.

   Changes in the receive window that are smaller than 1 MSS are
   unlikely to have any immediate impact on the sender's rate, as usual
   TCP segmentation practice results in sending full segments (i.e.,
   segments of size equal to the MSS).  So, accumulating changes in the
   receive window until completing a full MSS in the sender or in the
   receiver makes little difference.

   The current WS option specification [RFC7323] defines that allowed
   values for the WS option are between 0 and 14.
   Assuming an MSS of around 1500 bytes, WS option values between 0 and
   11 result in the receive window being expressed in units that are
   about 1 MSS or smaller.  So, WS option values between 0 and 11 have
   no impact on rLEDBAT.

   WS option values higher than 11 can affect the dynamics of rLEDBAT,
   since control may become too coarse (e.g., with a WS of 14, a change
   of one unit of the receive window implies a change of 2^14 = 16384
   bytes, roughly 11 MSS, in the effective window).

   For the above reasons, we recommend that when rLEDBAT is used, the
   rLEDBAT client should set WS option values lower than 12.  Additional
   experimentation is required to explore the impact of larger WS values
   on rLEDBAT dynamics.

   Note that the recommendation for rLEDBAT to set the WS option to
   lower values does not preclude communication with servers that set
   the WS option to larger values, since the WS option value used is set
   independently for each direction of the TCP connection.

4.2.  Using the RTT to estimate the queueing delay

   rLEDBAT uses the round-trip time (RTT) instead of the one-way delay
   to estimate the queueing delay.  In order to estimate the queueing
   delay using the RTT, the rLEDBAT receiver estimates the base RTT
   (i.e., the constant components of the RTT) and also measures the
   current RTT.  By subtracting the base RTT from the current RTT, we
   obtain the queueing delay to be used by the rLEDBAT controller.

   rLEDBAT discovers the base RTT (RTTb) by taking the minimum value of
   the measured RTTs over a period of time.  The current RTT (RTTc) is
   estimated using a number of recent samples and applying a filter,
   such as the minimum (or the mean) of the last k samples.  Using the
   RTT to estimate the queueing delay has a number of shortcomings and
   difficulties that we discuss next.
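
   The estimation just described (RTTb as a windowed minimum, RTTc as a
   minimum filter over the last k samples) might be sketched in Python
   as follows.  This is a minimal sketch under stated assumptions: the
   class and method names are hypothetical, the defaults k = 4 and
   n = 10 minutes are placeholders rather than values taken from this
   document, and the per-minute bucketing follows the description of
   base_RTTs in Section 5.

```python
from collections import deque


class QueueingDelayEstimator:
    """Estimate queueing delay as RTTc - RTTb (all times in seconds)."""

    def __init__(self, k=4, n_minutes=10):
        self.n_minutes = n_minutes
        self.current_rtts = deque(maxlen=k)  # last k RTT samples
        self.base_rtts = deque()             # [minute, min RTT] pairs

    def add_sample(self, rtt, now):
        """Record one RTT sample taken at local time `now`."""
        self.current_rtts.append(rtt)
        minute = int(now // 60)
        if self.base_rtts and self.base_rtts[-1][0] == minute:
            # Same minute: keep only the smallest RTT seen in it.
            self.base_rtts[-1][1] = min(self.base_rtts[-1][1], rtt)
        else:
            self.base_rtts.append([minute, rtt])
        # Forget history older than n minutes (reaction to path changes).
        while self.base_rtts[0][0] <= minute - self.n_minutes:
            self.base_rtts.popleft()

    def queueing_delay(self):
        rtt_c = min(self.current_rtts)                 # minimum filter
        rtt_b = min(m for _, m in self.base_rtts)      # base RTT
        return rtt_c - rtt_b


est = QueueingDelayEstimator(k=3, n_minutes=2)
for t, rtt in [(0, 0.050), (1, 0.080), (2, 0.090), (3, 0.085)]:
    est.add_sample(rtt, now=t)
qd_before = est.queueing_delay()   # RTTc = 0.080, RTTb = 0.050

est.add_sample(0.070, now=130)     # minute 0 falls out of the history
qd_after = est.queueing_delay()    # RTTc = RTTb = 0.070
```

   Using the minimum rather than the mean for RTTc is one of the two
   filters the text mentions; it is chosen here because it also discards
   occasional inflated samples.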

   The queueing delay measured using the RTT also includes the queueing
   delay experienced by the return packets in the direction from the
   rLEDBAT receiver to the sender.  This is a fundamental limitation of
   this approach.  The impact of this error is that the rLEDBAT
   controller will also react to congestion in the reverse path
   direction, which results in an even more conservative mechanism.

   In order to measure the RTT, rLEDBAT relies on the Timestamps (TS)
   option [RFC7323].  By matching the TSVal value carried in outgoing
   packets with the TSecr value observed in incoming packets, it is
   possible to measure the RTT.  This allows the rLEDBAT receiver to
   measure the RTT even if it is acting as a pure receiver.  In a pure
   receiver there is no data flowing from the rLEDBAT receiver to the
   sender, making it impossible to match data packets with
   acknowledgement packets to measure the RTT, as is usually done in TCP
   for other purposes.

   Several issues must be addressed when using this approach in order to
   avoid an artificial increase of the observed RTT.  Consider a TCP
   communication involving two hosts, host A, which is the legacy
   server, and host B, the rLEDBAT client, which is a pure receiver,
   i.e., it has no data to send.  Following the proposed method for
   estimating the RTT, host B will include a TSVal value in a TS option
   when sending packets to A.  Since we are assuming that B has no data
   to send, the TS option will be carried in pure acknowledgement
   packets.  Upon the reception of the TS option, host A will copy the
   value of the TSVal into the TSecr field of the TS option and include
   that option in the next data packet towards host B.  However, there
   are two reasons why A may not send a packet immediately back to B,
   artificially increasing the measured RTT.  The first reason is when A
   has no data to send.
   The second is when A has no available window to put more packets in
   flight.  We describe next how each of these cases is addressed.

   The case where the sender has no data to send when it receives the
   pure acknowledgement carrying the TSVal to be echoed is rare in the
   expected rLEDBAT use cases.  rLEDBAT will be used mostly for
   background file transfers, so the sender will have data to send
   throughout the lifetime of the communication.  If the file is
   structured in blocks of data, it may occasionally be the case that
   the sender has to wait until the next block is available to proceed
   with the data transfer.  We propose to address this situation by
   using a minimum filter over the last k samples when measuring the
   current RTT to discard the (rare) artificially bloated samples.

   The limitation of the sender's available window to send more packets
   can come either from the congestion window in host A or from the
   receive window announced by rLEDBAT in host B.  Normally, the
   receive window will be the one to limit the sender's transmission
   rate, since rLEDBAT is designed to be more restrictive on the
   sender's rate than standard-TCP.  In any case, if the limiting factor
   is the congestion window in the sender, it is irrelevant if rLEDBAT
   further reduces the receive window due to a bloated RTT measurement,
   since rLEDBAT is not actively controlling the sender's rate.  To
   address the case in which the limiting factor is the receive window
   announced by rLEDBAT, the receiver should discard the RTT
   measurements done while reducing the window and avoid including
   bloated samples in the queueing delay estimation.  The rLEDBAT
   receiver is aware of whether a given TSVal value was sent in a packet
   where the window was reduced, and if so, it can discard the
   corresponding RTT measurement.
   In the proposed algorithm, the affected samples are used for the
   current RTT estimation, but are not used for updating the rl.WND, as
   rl.WND remains unchanged for one RTT after a decrease episode.

   Finally, depending on the frequency of the local clock used to
   generate the values included in the TS option, several packets may
   carry the same TSVal value.  If that happens, the rLEDBAT receiver
   will be unable to match the different outgoing packets carrying the
   same TSVal value with the different incoming packets carrying the
   same TSecr value.  However, it is not necessary for rLEDBAT to use
   all packets to estimate the RTT; sampling a subset of the in-flight
   packets per RTT is enough to properly assess the queueing delay.
   rLEDBAT mitigates this issue by applying a minimum filter to the last
   sampled RTT values to estimate the current RTT.

4.3.  Inter-rLEDBAT fairness

   The use of an additive increase/multiplicative decrease (AIMD)
   algorithm provides inter-rLEDBAT fairness.  When using AIMD, the
   congestion control algorithm causes the larger flow to reduce its
   rate more aggressively and leave room for the new flow to grow,
   resulting in the well-known AIMD fairness property.  Moreover, in the
   case of LEDBAT, after a multiplicative decrease, the buffer is
   drained and the base RTT can be more accurately estimated.

   In rLEDBAT the window is decreased by a multiplicative factor betad
   when the measured queueing delay is larger than the target T.

4.4.  Reacting to packet loss

   The rLEDBAT receiver is capable of detecting retransmitted packets in
   the following way.  We call RCV.HGH the highest sequence number
   corresponding to a received byte of data (not assuming that all bytes
   with smaller sequence numbers have been received already; there may
   be holes) and we call TSV.HGH the TSVal value corresponding to the
   segment in which that byte was carried.
   SEG.SEQ stands for the sequence number of a newly received segment
   and we call TSV.SEQ the TSVal value of the newly received segment.

   If SEG.SEQ < RCV.HGH and TSV.SEQ > TSV.HGH then the newly received
   segment is a retransmission.  This is so because the newly received
   segment was generated later than another already received segment
   which contained data with a larger sequence number.  This means that
   this segment was lost and was retransmitted.

   rLEDBAT reduces the rl.WND by a factor betal when it detects a
   retransmission.  rLEDBAT reacts to retransmitted packets at most once
   per RTT.

   If the sender has detected the packet loss via a timeout, the
   standard-TCP sender reduces its congestion window to 1 MSS and enters
   slow start/exponential increase mode.  During the exponential
   growth, the connection rate will be determined by standard-TCP
   congestion control at the sender, until the congestion window reaches
   the receive window announced by rLEDBAT, at which point rLEDBAT takes
   the control back from the TCP sender.

   rLEDBAT has two different multiplicative decrease factors, betal and
   betad.  betad is the multiplicative decrease factor used for
   decreasing the window when the measured queueing delay exceeds the
   target T and betal is the one used when packet losses are detected.
   These two parameters may have different values.

   The proposed mechanism to detect retransmissions at the receiver
   fails when there are window tail drops.  If all packets in the tail
   of the window are lost, the receiver will not be able to detect a
   mismatch between the sequence numbers of the packets and the order of
   the timestamps.
   In this case, rLEDBAT will not react to losses but the TCP congestion
   controller at the sender will, most likely reducing its window to
   1 MSS and taking over the control of the sending rate, until slow
   start ramps up and catches up with the current value of the rLEDBAT
   window.

4.5.  Bootstrapping

   rLEDBAT uses an additive increase mechanism to grow the window.
   While this algorithm works well in steady state, it performs poorly
   for bootstrapping, as it takes significant time to increase the
   sending rate.  In order to ramp up to the available capacity faster,
   rLEDBAT starts from the initial window used by the flow control
   algorithm.

   This implies that when a flow starts, the rLEDBAT algorithm starts
   with a large window and the sending rate is in fact limited by the
   slow start algorithm of the sender's TCP.  This means that while the
   queueing delay is no larger than the target T, the flow increases its
   sending rate in the same way as standard-TCP, but if the queueing
   delay exceeds the target, rLEDBAT takes over.

4.6.  Reaction to path changes

   rLEDBAT adopts the mechanism defined by LEDBAT to deal with path
   changes.  The LEDBAT algorithm [RFC6817] estimates the base delay by
   calculating the minimum observed delay in an n-minute window.
   Historical data older than n minutes is not taken into account to
   estimate the base delay.  The reason for this is to react when path
   changes occur.  If the new path has a larger base delay, LEDBAT will
   keep on using the base delay of the former path and will impose a
   queueing delay that is larger than the target T.  LEDBAT addresses
   this issue by limiting historical data to n minutes.  If there is a
   path change, LEDBAT will use the outdated base delay estimation for a
   maximum time of n minutes.  After that, all the historical data used
   for the base delay estimation will be from the new path.

5.  rLEDBAT algorithm

5.1.  Data structures

   Parameters:

      T: target delay

      betal: multiplicative decrease factor in case of packet loss

      betad: multiplicative decrease factor in case the queueing delay
      exceeds T

      alpha: additive increase factor

   Variables:

      current_RTTs is an array with the last k measured RTTs

      base_RTTs is an array with the minimum observed RTTs in the last n
      minutes

      RCV.SEQ is the sequence number of the last byte that was received
      and acknowledged

      RCV.HGH is the highest sequence number of a received byte (which
      may not have been acknowledged yet)

      TSE.HGH is the TSecr value contained in the segment containing the
      byte with sequence number RCV.HGH

      SEG.SEQ is the sequence number of the incoming segment

      SEG.TSE is the TSecr value of the incoming segment

      SEG.time is the local time at which the incoming segment was
      received

      SEG.RTT is the latest sample of the RTT

      QD is the latest estimate of the queueing delay

      rl.WND is the window calculated by rLEDBAT without taking into
      account the window shrinking avoidance constraints

      rl.WND.WS is the window calculated by rLEDBAT after taking into
      account the window shrinking avoidance constraints

      DRAINED.BYTES is the number of bytes drained from the flight size
      since the last packet sent

      fc.WND is the window calculated by a standard TCP receiver

      end.reduction.time is an auxiliary variable used to prevent rl.WND
      from being updated after a window reduction

5.2.  Algorithm

   on initialization
      DRAINED.BYTES = 0
      base_RTTs set to maximum value
      current_RTTs set to maximum value
      rl.WND set to max value
      end.reduction.time = 0

   on packet arrival
      DRAINED.BYTES = DRAINED.BYTES + SEG.LEN
      RTT calculation
         SEG.RTT = SEG.time - SEG.TSE (the new sample of the RTT is the
         time of arrival of the segment minus the time at which the
         segment containing the TSVal value was issued)
         Update current_RTTs with SEG.RTT (substitute the oldest RTT
         sample in the current_RTTs array by SEG.RTT)
         Update base_RTTs with SEG.RTT (store SEG.RTT in the current
         minute position, if SEG.RTT is smaller than the value in that
         position)

      QD = min(current_RTTs) - min(base_RTTs)

      If local.time > end.reduction.time then
         If SEG.SEQ < RCV.HGH AND SEG.TSE > TSE.HGH then
            rl.WND = max(rl.WND*betal, 1)
            end.reduction.time = local.time + min(current_RTTs)
         else
            If QD < T then
               rl.WND = rl.WND + alpha*MSS/rl.WND
            else if QD > T then
               rl.WND = max(rl.WND*betad, 1)
               end.reduction.time = local.time + min(current_RTTs)

   on sending a packet
      if rl.WND > rl.WND.WS or (rl.WND.WS - rl.WND) < DRAINED.BYTES then
         rl.WND.WS = rl.WND
      else
         rl.WND.WS = rl.WND.WS - DRAINED.BYTES
      DRAINED.BYTES = 0
      RCV.WND = min(fc.WND, rl.WND.WS)

   The presented algorithm assumes the WS option is not being used.  The
   algorithm also assumes that the precision of the clock used to
   populate the TS option is fine-grained enough for this purpose (e.g.,
   1 ms).  If this is not the case, then the receiver should store the
   local time at which a packet carrying each TSVal value was issued and
   the time at which the same value was received in the TSecr, and
   calculate the RTT by subtracting these two values.

5.3.  rLEDBAT parameters

6.  Security Considerations

7.  IANA Considerations

8.  Acknowledgements

   This work was supported by the EU through the H2020 5G-RANGE project
   and by the Spanish Ministry of Economy and Competitiveness through
   the 5G-City project (TEC2016-76795-C6-3-R).

9.  Informative References

   [I-D.ietf-tcpm-rfc793bis]
              Eddy, W., "Transmission Control Protocol Specification",
              draft-ietf-tcpm-rfc793bis-13 (work in progress), June
              2019.

   [RFC6817]  Shalunov, S., Hazel, G., Iyengar, J., and M. Kuehlewind,
              "Low Extra Delay Background Transport (LEDBAT)", RFC 6817,
              DOI 10.17487/RFC6817, December 2012.

   [RFC7323]  Borman, D., Braden, B., Jacobson, V., and R.
              Scheffenegger, Ed., "TCP Extensions for High Performance",
              RFC 7323, DOI 10.17487/RFC7323, September 2014.

   [RFC8312]  Rhee, I., Xu, L., Ha, S., Zimmermann, A., Eggert, L., and
              R. Scheffenegger, "CUBIC for Fast Long-Distance Networks",
              RFC 8312, DOI 10.17487/RFC8312, February 2018.

Authors' Addresses

   Marcelo Bagnulo
   UC3M

   Email: marcelo@it.uc3m.es

   Alberto Garcia-Martinez
   UC3M

   Email: alberto@it.uc3m.es

   Gabriel Montenegro
   Microsoft

   Email: Gabriel.Montenegro@microsoft.com

   Praveen Balasubramanian
   Microsoft

   Email: pravb@microsoft.com