Internet Engineering Task Force                                 M. Welzl
Internet-Draft                                        University of Oslo
Intended status: Informational                                    D. Ros
Expires: June 20, 2011                                Institut Telecom /
                                                        Telecom Bretagne
                                                       December 17, 2010

         A Survey of Lower-than-Best-Effort Transport Protocols
                     draft-ietf-ledbat-survey-03.txt

Abstract

   This document provides a survey of transport protocols which are
   designed to have a smaller bandwidth and/or delay impact on standard
   TCP than standard TCP itself when they share a bottleneck with it.
   Such protocols could be used for delay-insensitive "background"
   traffic, as they provide what is sometimes called a "less than" (or
   "lower than") best-effort service.

Status of this Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current Internet-
   Drafts is at http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six
   months and may be updated, replaced, or obsoleted by other documents
   at any time.  It is inappropriate to use Internet-Drafts as
   reference material or to cite them other than as "work in progress."

   This Internet-Draft will expire on June 20, 2011.

Copyright Notice

   Copyright (c) 2010 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with
   respect to this document.
   Code Components extracted from this document must include Simplified
   BSD License text as described in Section 4.e of the Trust Legal
   Provisions and are provided without warranty as described in the
   Simplified BSD License.

Table of Contents

   1.  Introduction
   2.  Delay-based transport protocols
     2.1.  Accuracy of delay-based congestion predictors
     2.2.  Potential issues with delay-based congestion control for
           LBE transport
   3.  Non-delay-based transport protocols
   4.  Upper-layer approaches
     4.1.  Receiver-oriented, flow-control based approaches
   5.  Network-assisted approaches
   6.  Acknowledgements
   7.  IANA Considerations
   8.  Security Considerations
   9.  Changes from the previous version (section to be removed later)
   10. Informative References
   Authors' Addresses

1.  Introduction

   This document presents a brief survey of proposals to attain a Less
   than Best Effort (LBE) service by means of end-host mechanisms.  We
   loosely define an LBE service as a service which results in a
   smaller bandwidth and/or delay impact on standard TCP than standard
   TCP itself, when sharing a bottleneck with it.  We refer to systems
   that provide this service as LBE systems.

   Generally, LBE behavior can be achieved by reacting to queue growth
   earlier than standard TCP would, or by changing the congestion
   avoidance behavior of TCP without utilizing any additional implicit
   feedback.  It is therefore assumed that readers are familiar with
   TCP congestion control [RFC5681].  Some mechanisms achieve an LBE
   behavior without modifying transport protocol standards (e.g., by
   changing the receiver window of standard TCP), whereas others
   leverage network-level mechanisms at the transport layer for LBE
   purposes.  Following this classification, the solutions surveyed in
   this document are categorized as delay-based transport protocols,
   non-delay-based transport protocols, upper-layer approaches, and
   network-assisted approaches.

   This document is a product of the Low Extra Delay Background
   Transport (LEDBAT) Working Group; it is intended to provide a point
   of comparison for the approach chosen by the Working Group.  Most
   techniques discussed here were tested only in limited simulations or
   experimental testbeds, whereas LEDBAT's algorithm is already widely
   deployed.  This survey is not exhaustive, as that would be neither
   possible nor useful; the authors/editors have selected key, well-
   known, or otherwise interesting techniques for inclusion at their
   discretion.  There is also a substantial amount of work that is
   related to the LBE concept but does not present a solution that can
   be installed in end hosts or be expected to work over the Internet
   (e.g., a DiffServ-based Lower-Effort service [RFC3662]); such
   mechanisms are outside the scope of this document.

2.  Delay-based transport protocols

   It is wrong, in general, to equate "little impact on standard TCP"
   with "small sending rate".  Without ECN support, standard TCP will
   normally increase its congestion window (and effective sending
   rate) until a queue overflows, causing one or more packets to be
   dropped and the effective rate to be reduced.  A protocol which
   stops increasing its rate before this event happens can, in
   principle, achieve better performance than standard TCP.  In the
   absence of any other traffic, this is even true for TCP itself when
   its maximum send window is limited to the bandwidth * round-trip
   time (RTT) product of the path.
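   To make the last point concrete, the window limit in question
   follows directly from the bandwidth * RTT product.  A small
   computation in Python illustrates this (a sketch only; the path
   parameters are arbitrary example values, not taken from any cited
   work):

      # Window limit from the bandwidth * RTT product; all path
      # parameters below are made-up example values.
      link_rate = 10e6 / 8   # 10 Mbit/s bottleneck, in bytes/s
      rtt = 0.050            # 50 ms round-trip time, in seconds
      mss = 1460             # segment size, in bytes

      bdp = link_rate * rtt  # at most this many bytes "in flight"
      print(f"{bdp:.0f} bytes = {bdp / mss:.1f} segments")
      # A sender whose window never exceeds this value can fill the
      # link without building a persistent queue at the bottleneck.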
   TCP Vegas [Bra94] is one of the first protocols known to have a
   smaller sending rate than standard TCP when the two protocols share
   a bottleneck [Kur00] -- yet it was designed to achieve more, not
   less, throughput than standard TCP.  Indeed, when it is the only
   protocol at the bottleneck, the throughput of TCP Vegas is greater
   than the throughput of standard TCP.  Depending on the bottleneck
   queue length, TCP Vegas itself can be starved by standard TCP
   flows; this can be remedied to some degree by the RED Active Queue
   Management mechanism [RFC2309].  Vegas linearly increases or
   decreases the sending rate, based on the difference between the
   expected throughput and the actual throughput; the estimation is
   based on RTT measurements.

   The congestion avoidance behavior is the protocol's most important
   feature, in terms of historical relevance as well as relevance in
   the context of this document (it has been shown that other elements
   of the protocol can sometimes play a greater role in its overall
   behavior [Hen00]).  In congestion avoidance, once per RTT, TCP
   Vegas calculates the expected throughput as WindowSize / BaseRTT,
   where WindowSize is the current congestion window and BaseRTT is
   the minimum of all measured RTTs.  The expected throughput is then
   compared with the actual throughput, measured from recent
   acknowledgements.  If the actual throughput is smaller than the
   expected throughput minus a threshold called "beta", this is taken
   as a sign of congestion, causing the protocol to linearly decrease
   its rate.  If the actual throughput is greater than the expected
   throughput minus a threshold called "alpha" (with alpha < beta),
   this is taken as a sign that the network is underutilized, causing
   the protocol to linearly increase its rate.
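   These congestion avoidance rules can be summarized in a short
   sketch (illustrative Python pseudocode; the once-per-RTT driver and
   all names are ours, and alpha/beta are expressed in throughput
   units here, whereas implementations commonly express them in
   packets of queued data):

      def vegas_update(cwnd, base_rtt, actual_tput, alpha, beta):
          """Called once per RTT; returns the new congestion window."""
          expected_tput = cwnd / base_rtt
          if actual_tput < expected_tput - beta:
              return cwnd - 1   # congestion building: linear decrease
          elif actual_tput > expected_tput - alpha:
              return cwnd + 1   # underutilized: linear increase
          return cwnd           # between the thresholds: hold the rate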
   TCP Vegas has been analyzed extensively.  One of the most prominent
   properties of TCP Vegas is its fairness between multiple flows of
   the same kind: it does not penalize flows with large propagation
   delays in the way that standard TCP does.  While Vegas was not the
   first protocol to use delay as a congestion indication, its
   predecessors (like CARD [Jai89], Tri-S [Wan91] or DUAL [Wan92]) are
   not discussed here because of the historical "landmark" role that
   TCP Vegas has taken in the literature.

   Delay-based transport protocols which were designed to be non-
   intrusive include TCP Nice [Ven02] and TCP Low Priority (TCP-LP)
   [Kuz06].  TCP Nice [Ven02] follows the same basic approach as TCP
   Vegas but improves upon it in some aspects.  Because of its
   moderate linear-decrease congestion response, TCP Vegas can affect
   standard TCP despite its ability to detect congestion early.  TCP
   Nice removes this issue by halving the congestion window (at most
   once per RTT, like standard TCP) instead of decreasing it linearly.
   To avoid being too conservative, it does so only if a fixed,
   predefined fraction of delay-based incipient-congestion signals
   appears within one RTT; otherwise, TCP Nice falls back to the
   congestion avoidance rules of TCP Vegas if no packet was lost, or
   of standard TCP if a packet was lost.  One more feature of TCP Nice
   is its ability to support a congestion window of less than one
   packet, by clocking out single packets over more than one RTT.
   With ns-2 simulations and real-life experiments using a Linux
   implementation, the authors of [Ven02] show that TCP Nice achieves
   its goal of efficiently utilizing spare capacity while being non-
   intrusive to standard TCP.

   Unlike TCP Vegas and TCP Nice, TCP-LP [Kuz06] uses the one-way
   delay (OWD) instead of the RTT as an indicator of incipient
   congestion.  This is done to avoid reacting to delay fluctuations
   that are caused by reverse cross-traffic.  Using the TCP Timestamps
   option [RFC1323], the OWD is determined as the difference between
   the receiver's Timestamp value in the ACK and the original
   Timestamp value that the receiver copied into the ACK.  While the
   result of this subtraction can only precisely represent the OWD if
   clocks are synchronized, its absolute value is of no concern to
   TCP-LP, and hence clock synchronization is unnecessary.  Using a
   constant smoothing parameter, TCP-LP calculates an Exponentially
   Weighted Moving Average (EWMA) of the measured OWD and checks
   whether the result exceeds a threshold within the range of the
   minimum and maximum OWD seen during the connection's lifetime; if
   it does, this condition is interpreted as an "early congestion
   indication".  The minimum and maximum OWD values are initialized
   during the slow-start phase.

   In its reaction to an early congestion indication, TCP-LP tries to
   strike a middle ground between the overly conservative choice of
   _immediately_ setting the congestion window to one packet, and the
   presumably too aggressive choice of simply halving the congestion
   window like standard TCP: it delays the former action by an
   additional RTT, to see whether congestion is persistent.  It does
   so by halving the window at first in response to an early
   congestion indication, then initializing an "inference time-out
   timer" and maintaining the current congestion window until this
   timer fires.  If another early congestion indication appears during
   this "inference phase", the window is set to one packet; otherwise,
   the window is maintained and TCP-LP continues to increase it in the
   standard additive-increase fashion.  This method ensures that it
   takes at least two RTTs for a TCP-LP flow to decrease its window to
   one packet, and that, like standard TCP, TCP-LP reacts to
   congestion at most once per RTT.
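   The detection and reaction logic just described can be sketched as
   follows (Python pseudocode; the smoothing parameter GAMMA, the
   threshold factor DELTA, and all names are illustrative assumptions
   rather than the constants defined in [Kuz06]):

      GAMMA = 0.125   # assumed EWMA smoothing parameter
      DELTA = 0.15    # assumed position of the threshold in [min, max]

      class TcpLpSketch:
          def __init__(self):
              self.owd_ewma = None
              self.owd_min = float("inf")   # set during slow start
              self.owd_max = 0.0
              self.cwnd = 1.0
              self.inference_phase = False

          def on_owd_sample(self, owd):
              # owd = receiver timestamp - sender timestamp; a fixed
              # clock offset shifts min, max and EWMA alike, so the
              # comparison below needs no clock synchronization.
              self.owd_min = min(self.owd_min, owd)
              self.owd_max = max(self.owd_max, owd)
              self.owd_ewma = (owd if self.owd_ewma is None else
                               (1 - GAMMA) * self.owd_ewma + GAMMA * owd)
              threshold = (self.owd_min +
                           DELTA * (self.owd_max - self.owd_min))
              if self.owd_ewma > threshold:
                  self.early_congestion_indication()

          def early_congestion_indication(self):
              if self.inference_phase:
                  self.cwnd = 1.0     # persistent congestion
              else:
                  self.cwnd /= 2      # first indication: halve window
                  self.inference_phase = True
                  # ...arm the inference time-out timer here; if it
                  # fires with no further indication, clear the flag
                  # and resume additive increase.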
   Using a simple analytical model, the authors of TCP-LP [Kuz06]
   illustrate the feasibility of a delay-based LBE transport by
   showing that, due to the non-linear relationship between throughput
   and RTT, it is possible to avoid interfering with standard TCP
   traffic even when the flows under consideration have a larger RTT
   than the standard TCP flows.  With ns-2 simulations and real-life
   experiments using a Linux implementation, the authors of [Kuz06]
   show that TCP-LP is largely non-intrusive to TCP traffic while at
   the same time enabling it to utilize a large portion of the excess
   network bandwidth, which is fairly shared among competing TCP-LP
   flows.  They also show that using their protocol for bulk data
   transfers greatly reduces the file transfer times of competing
   best-effort web traffic.

   Sync-TCP [Wei05] follows an approach similar to that of TCP-LP, by
   adapting its reaction to congestion according to changes in the
   OWD.  By comparing the estimated (average) forward queuing delay to
   the maximum observed delay, Sync-TCP adapts its AIMD parameters
   depending on the trend followed by the average delay over an
   observation window.  Even though the authors of [Wei05] did not
   explicitly consider its use as an LBE protocol, Sync-TCP was
   designed to react early to incipient congestion, while grabbing
   available bandwidth more aggressively than a standard TCP in
   congestion-avoidance mode.

   Delay-based congestion control is also the basis of proposals that
   aim to adapt TCP's congestion avoidance to very high-speed
   networks.  Some of these proposals, like Compound TCP [Tan06]
   [Sri08] and TCP Illinois [Liu08], are hybrid loss- and delay-based
   mechanisms, whereas others (e.g., NewVegas [Dev03], FAST TCP
   [Wei06] or CODE TCP [Cha10]) are variants of Vegas based primarily
   on delays.

2.1.  Accuracy of delay-based congestion predictors

   The accuracy of delay-based congestion predictors has been the
   subject of a good deal of research; see, e.g., [Bia03], [Mar03],
   [Pra04], [Rew06], [McC08].  The main result of most of these
   studies is that delays (or, more precisely, round-trip times) are,
   in general, weakly correlated with congestion.  Several factors may
   induce such a poor correlation:

   o  Bottleneck buffer size: in principle, a delay-based mechanism
      could be made "more than TCP friendly" _if_ buffers are "large
      enough", so that RTT fluctuations and/or deviations from the
      minimum RTT can be detected by the end-host with reasonable
      accuracy.  Otherwise, it may be hard to distinguish real delay
      variations from measurement noise.

   o  RTT measurement issues: RTT samples may suffer from poor
      resolution, due to timers which are too coarse-grained with
      respect to the scale of delay fluctuations.  Also, a flow may
      obtain a very noisy estimate of the RTT due to undersampling
      under some circumstances (e.g., when the flow rate is much lower
      than the link bandwidth).  For TCP, other potential sources of
      measurement noise include TCP segmentation offloading (TSO) and
      the use of delayed ACKs [Hay10]; a simple noise-smoothing sketch
      is given after this list.  A congested reverse path may also
      result in an erroneous assessment of the congestion state of the
      forward path.  Finally, in the case of fast or short-distance
      links, the majority of the measured delay can in fact be due to
      processing in the involved hosts; typically, this processing
      delay is not of interest, and it can undergo fluctuations that
      are not related to the network at all.

   o  Level of statistical multiplexing and RTT sampling: it may be
      easy for an individual flow to "miss" loss/queue overflow
      events, especially if the number of flows sharing a bottleneck
      buffer is significant.  This is nicely illustrated, e.g., in
      Fig. 1 of [McC08].

   o  Impact of wireless links: several mechanisms that are typical of
      wireless links, like link-layer scheduling and error recovery,
      may induce strong delay fluctuations over short time scales
      [Gur04].
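   One simple way to mitigate some of this measurement noise, and one
   of the techniques advocated by Bhandarkar et al. as discussed
   below, is to smooth raw RTT samples with an EWMA before using them
   for congestion inference.  A minimal sketch in Python, with an
   assumed smoothing weight:

      SMOOTHING_WEIGHT = 0.125   # assumed value; higher values react
                                 # faster but smooth less

      def smooth_rtt(samples):
          """Yield a smoothed estimate for each raw RTT sample."""
          estimate = None
          for sample in samples:
              if estimate is None:
                  estimate = sample
              else:
                  estimate += SMOOTHING_WEIGHT * (sample - estimate)
              yield estimate

      # A single spiky sample barely moves the smoothed estimate:
      print(list(smooth_rtt([0.050, 0.051, 0.120, 0.050])))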
   Interestingly, the results of Bhandarkar et al. [Bha07] seem to
   paint a slightly different picture regarding the accuracy of delay-
   based congestion prediction.  Bhandarkar et al. claim that it is
   possible to significantly improve prediction accuracy by adopting
   some simple techniques (smoothing of RTT samples, increasing the
   RTT sampling frequency).  Nonetheless, they acknowledge that even
   with such techniques, detection errors cannot be eradicated.  Their
   proposed delay-based congestion avoidance method, PERT
   (Probabilistic Early Response TCP), mitigates the impact of
   residual detection errors by means of a probabilistic response to
   congestion detection events.
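   The idea of a probabilistic response can be illustrated as follows
   (a sketch in the spirit of PERT, not the published algorithm; the
   thresholds and the linear probability ramp are our assumptions):

      import random

      MIN_TH = 0.005   # assumed: smoothed queuing delay (s) at which
      MAX_TH = 0.025   # the response probability ramps from 0 to 1

      def should_respond(queuing_delay):
          """Treat a delay signal as congestion only probabilistically,
          so that occasional false detections do little damage."""
          if queuing_delay < MIN_TH:
              return False
          if queuing_delay >= MAX_TH:
              return True
          ramp = (queuing_delay - MIN_TH) / (MAX_TH - MIN_TH)
          return random.random() < ramp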
2.2.  Potential issues with delay-based congestion control for LBE
      transport

   Whether a delay-based protocol behaves in its intended manner
   (e.g., it is "more than TCP friendly", or it grabs available
   bandwidth in a very aggressive manner) may therefore depend on the
   accuracy issues listed in Section 2.1.  Moreover, protocols like
   Vegas need to keep an estimate of the minimum ("base") delay; this
   makes such protocols highly sensitive to changes in the end-to-end
   route that may occur during the lifetime of the flow [Mo99].

   Regarding the issue of false positives/false negatives with a
   delay-based congestion detector, most studies focus on the loss of
   throughput coming from the erroneous detection of queue build-up
   and of alleviation of congestion.  Arguably, for an LBE transport
   protocol it is better to err on the "more-than-TCP-friendly" side,
   that is, to always yield to _perceived_ congestion whether it is
   "real" or not; however, failure to detect congestion (due to one of
   the above accuracy problems) would result in behavior that is not
   LBE.  For instance, consider the case in which the bottleneck
   buffer is small, so that the contribution of queueing delay at the
   bottleneck to the global end-to-end delay is small.  In such a
   case, a flow using a delay-based mechanism might end up consuming a
   good deal of bandwidth with respect to a competing standard TCP
   flow, unless it also incorporates a suitable reaction to loss.

   A delay-based mechanism may also suffer from the so-called
   "latecomer advantage" (or latecomer unfairness) problem.  Consider
   the case in which the bottleneck link is already (very) congested.
   In such a scenario, delay variations may be quite small; hence, it
   may be very difficult to tell an empty queue from a heavily-loaded
   queue in terms of delay fluctuation.  Therefore, a newly-arriving
   delay-based flow may start sending faster when there is already
   heavy congestion, eventually driving away loss-based flows
   [Sha05][Car10].

3.  Non-delay-based transport protocols

   A few transport-layer proposals achieve an LBE service without
   relying on delay as an indicator of congestion.  In the algorithms
   discussed below, the loss rate of the flow determines, either
   implicitly or explicitly, the sending rate (which is adapted so as
   to obtain a lower share of the available bandwidth than standard
   TCP); such mechanisms likely react to congestion more slowly than
   delay-based ones.

   4CP [Liu07], which stands for "Competitive and Considerate
   Congestion Control", is a protocol which provides an LBE service by
   changing the window control rules of standard TCP.  A "virtual
   window" is maintained which, during a so-called "bad congestion
   phase", is reduced below a predefined minimum value of the actual
   congestion window.  The congestion window is only increased again
   once the virtual window exceeds this minimum; in this way, the
   virtual window controls the duration during which the sender
   transmits at a fixed minimum rate.  Whether the congestion state is
   "bad" or "good" depends on whether the loss event rate is above or
   below a threshold (or target) value.  The 4CP congestion avoidance
   algorithm allows for setting a target average window, and it avoids
   starvation of "background" flows while bounding the impact on
   "foreground" flows.  Its performance was evaluated in ns-2
   simulations and in real-life experiments with a kernel-level
   implementation in Microsoft Windows Vista.
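   The virtual-window mechanism can be sketched as follows (Python
   pseudocode built from the description above; the constants, the
   update step, and all names are our illustration rather than the
   exact rules of [Liu07]):

      CWND_MIN = 2.0       # assumed minimum congestion window (pkts)
      TARGET_LOSS = 0.01   # assumed loss-event rate separating "good"
                           # from "bad" congestion states

      class FourCpSketch:
          def __init__(self):
              self.vwnd = CWND_MIN   # virtual window; may go below min
              self.cwnd = CWND_MIN   # actual congestion window

          def control_step(self, loss_rate, delta=1.0):
              if loss_rate > TARGET_LOSS:   # "bad" congestion phase
                  self.vwnd -= delta        # may sink below CWND_MIN
              else:                         # "good" congestion phase
                  self.vwnd += delta
              if self.vwnd >= CWND_MIN:
                  self.cwnd = self.vwnd     # normal window control
              else:
                  self.cwnd = CWND_MIN      # pinned at the minimum
                  # rate; the deficit accumulated in vwnd determines
                  # how long the sender stays pinned before growing.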
   The MulTFRC [Dam09] protocol is an extension of TCP-Friendly Rate
   Control (TFRC) [RFC5348] for multiple flows.  MulTFRC takes the
   main idea of MulTCP [Cro98] and similar proposals (e.g., [Hac04],
   [Hac08], [Kuo08]) a step further.  A single MulTCP flow tries to
   emulate (and be as friendly as) a number N > 1 of parallel TCP
   flows.  By supporting values of N between 0 and 1, MulTFRC can be
   used as a mechanism for an LBE service.  Since it does not react to
   delay like the protocols described in Section 2, but adjusts its
   rate like TFRC, MulTFRC can probably be expected to be more
   aggressive than mechanisms such as TCP Nice or TCP-LP.  This also
   means that MulTFRC is less likely to be prone to starvation, as its
   aggressiveness is tunable at a fine granularity, even when N is
   between 0 and 1.

4.  Upper-layer approaches

   The proposals described in this section do not require modifying
   transport protocol standards.  Most of them can be regarded as
   running "on top" of an existing transport, even though they may be
   implemented either at the application layer (i.e., in user-level
   processes) or in the kernel of the end hosts' operating system.
   Such "upper-layer" mechanisms may arguably be easier to deploy than
   transport-layer approaches, since they do not require any changes
   to the transport itself.

   A simplistic, application-level approach to a background transport
   service may consist of scheduling automated transfers at times when
   the network is lightly loaded, as described, e.g., in [Dyk02] for
   cooperative proxy caching.  An issue with such a technique is that
   it may not be appropriate for applications like peer-to-peer file
   transfer, since the notion of an "off-peak hour" is not meaningful
   when end-hosts may be located anywhere in the world.

   The so-called Background Intelligent Transfer Service (BITS) [BITS]
   is implemented in several versions of Microsoft Windows.  BITS uses
   a system of application-layer priority levels for file-transfer
   jobs, together with monitoring of the bandwidth usage of the
   network interface (or, in more recent versions, of the network
   gateway connected to the end-host), so that low-priority transfers
   at a given end-host give way to both high-priority (foreground)
   transfers and traffic from interactive applications at the same
   host.

   A different approach is taken in [Egg05]: here, the priority of a
   flow is reduced via a generic idletime scheduling strategy in a
   host's operating system.  While the results presented in this paper
   show that the new scheduler can effectively shield regular tasks
   from low-priority ones (e.g., TCP from greedy UDP) with only a
   minor performance impact, it is an underlying assumption that all
   involved end hosts use the idletime scheduler.  In other words, it
   is not the focus of this work to protect a standard TCP flow which
   originates from a host where the presented scheduling scheme may
   not be implemented.

4.1.  Receiver-oriented, flow-control based approaches

   Some proposals achieve an LBE behavior by exploiting existing
   transport-layer features -- typically, at the "receiving" side.  In
   particular, TCP's built-in flow control can be used as a means to
   achieve a low-priority transport service, as illustrated by the
   sketch at the end of this section.

   The mechanism described in [Spr00] is an example of this technique:
   it controls bandwidth by letting the receiver intelligently
   manipulate the receiver window of standard TCP.  This is possible
   because the authors assume a client-server setting where the
   receiver's access link is typically the bottleneck.  The scheme
   incorporates a delay-based calculation of the expected queue length
   at the bottleneck, which is quite similar to the calculation in the
   delay-based protocols above, e.g., TCP Vegas.  Using a Linux
   implementation, where TCP flows are classified according to their
   application's needs, Spring et al. show in [Spr00] that a
   significant improvement in packet latency can be attained over an
   unmodified system, while maintaining good link utilization.

   A similar method is employed by Mehra et al. [Meh03], where both
   the advertised receiver window and the delay in sending ACK
   messages are dynamically adapted to attain a given rate.  Like the
   authors of [Spr00], Mehra et al. assume that the bottleneck is
   located at the receiver's access link.  However, they also propose
   a bandwidth-sharing system, which makes it possible to control the
   bandwidth allocated to different flows, as well as to allot a
   minimum rate to some flows.

   Receiver window tuning is also done in [Key04], where choosing the
   right value for the window is phrased as an optimization problem.
   On this basis, two algorithms are presented: binary search, which
   is faster than the other at reaching a good operating point but
   fluctuates, and stochastic optimization, which does not fluctuate
   but converges more slowly than binary search.  These algorithms
   merely use the previous receiver window and the amount of data
   received during the previous control interval as input.  According
   to [Key04], the encouraging simulation results suggest that such an
   application-level mechanism can work almost as well as a transport-
   layer scheme like TCP-LP.

   Another way of dealing with non-interactive flows, like, e.g., web
   prefetching, is to rate-limit the transfer of such bursty traffic
   [Cro98b].  Note that one of the techniques used in [Cro98b] is,
   precisely, to have the downloading application adapt the TCP
   receiver window, so as to reduce the data rate to the minimum
   needed (thus disturbing other flows as little as possible while
   respecting a deadline for the transfer of the data).
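   The receiver-window techniques above all rest on the same basic
   relation: a receiver that never advertises more than rate * RTT
   bytes caps the sender's average throughput at roughly that rate.  A
   minimal sketch (names and values are illustrative; real schemes
   such as [Spr00] adapt the target dynamically):

      def advertised_window(target_rate_bps, rtt_s, mss=1460,
                            free_buffer=65535):
          """Window (in bytes) to advertise so that the sender averages
          at most target_rate_bps; never below one MSS and never above
          the actually available buffer space."""
          cap = int(target_rate_bps / 8 * rtt_s)   # bytes per RTT
          return max(mss, min(cap, free_buffer))

      # Example: cap a background flow at 1 Mbit/s over a 100 ms path.
      print(advertised_window(1_000_000, 0.100))   # -> 12500 bytes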
5.  Network-assisted approaches

   Network-layer mechanisms, like active queue management (AQM) and
   packet scheduling in routers, can be exploited by a transport
   protocol to achieve an LBE service.  Such approaches may result in
   improved protection of non-LBE flows (e.g., when scheduling is
   used); besides, approaches using explicit, AQM-based congestion
   signaling may arguably be more robust than, say, delay-based
   transports at detecting impending congestion.  However, an obvious
   drawback of any network-assisted approach is that, in principle, it
   needs modifications in both end-hosts and intermediate network
   nodes.

   Harp [Kok04] realizes an LBE service by dissipating background
   traffic to less-utilized paths of the network, based on multipath
   routing and multipath congestion control.  This is achieved without
   changing all routers, by using edge nodes as relays.  According to
   the authors, these edge nodes should be gateways of organizations,
   in order to align their scheme with usage incentives; however, the
   technical solution would also work if Harp was only deployed in end
   hosts.  Harp detects impending congestion by looking at delay,
   similar to TCP Nice [Ven02], and manages to improve the utilization
   and fairness of TCP over pure single-path solutions without
   requiring any changes to TCP itself.

   Another technique is that used by protocols like NF-TCP [Aru10b],
   where a bandwidth-estimation module integrated into the transport
   protocol allows it to rapidly take advantage of free capacity.
   NF-TCP combines this with early congestion detection based on
   Explicit Congestion Notification (ECN) [RFC3168] and RED [RFC2309]:
   when congestion starts building up, appropriate tuning of a RED
   queue allows low-priority (i.e., NF-TCP) packets to be marked with
   a much higher probability than high-priority (i.e., standard TCP)
   packets, so that low-priority flows yield up bandwidth before
   standard TCP flows do.  NF-TCP could be implemented by adapting the
   congestion control behavior of TCP, without changing the protocol
   on the wire -- with the only exception that NF-TCP-capable routers
   must be able to somehow distinguish NF-TCP traffic from other TCP
   traffic.
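   Such priority-dependent marking can be sketched as follows (a RED-
   flavored illustration of the idea, not the queue configuration from
   [Aru10b]; the thresholds and the aggressiveness factor are assumed
   values):

      import random

      MIN_TH, MAX_TH = 5, 50   # assumed RED thresholds (packets)
      MAX_P = 0.02             # assumed max marking prob. (standard)
      LBE_FACTOR = 20          # low-priority marked this much more

      def ecn_mark(avg_queue, low_priority):
          """Decide whether to ECN-mark a packet, marking low-priority
          (LBE) packets far more aggressively than standard ones."""
          if avg_queue <= MIN_TH:
              return False
          if avg_queue >= MAX_TH:
              return True
          p = MAX_P * (avg_queue - MIN_TH) / (MAX_TH - MIN_TH)
          if low_priority:
              p = min(1.0, p * LBE_FACTOR)
          return random.random() < p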
   In [Ven08], Venkataraman et al. propose a transport-layer approach
   to leverage an existing, network-layer LBE service based on
   priority queueing.  Their transport protocol, which they call PLT
   (Priority-Layer Transport), splits a layer-4 connection into two
   flows, a high-priority one and a low-priority one.  The high-
   priority flow is sent over the higher-priority queueing class (in
   principle, offering a best-effort service) using an AIMD, TCP-like
   congestion control mechanism.  The low-priority flow, which is
   mapped to the LBE class, uses a non-TCP-friendly congestion control
   algorithm.  The goal of PLT is thus to maximize its aggregate
   throughput by exploiting unused capacity in an aggressive way,
   while protecting standard TCP flows carried by the best-effort
   class.  Similar in spirit, [Ott03] proposes simple changes to only
   the AIMD parameters of TCP for use over a network-layer LBE
   service, so that such "filler" traffic may aggressively consume
   unused bandwidth.  Note that [Ven08] also considers a mechanism for
   detecting the lack of priority queueing in the network, so that the
   non-TCP-friendly flow may be inhibited: the PLT receiver monitors
   the loss rate of both flows; if the high-priority flow starts
   seeing losses while the low-priority one does not experience 100%
   loss, this is taken as an indication of the absence of strict
   priority queueing.

6.  Acknowledgements

   The authors would like to thank Dragana Damjanovic, Melissa Chavez
   and Yinxia Zhao for reference pointers, as well as Mayutan
   Arumaithurai, Mirja Kuehlewind and Wesley Eddy for their detailed
   reviews and suggestions.

7.  IANA Considerations

   This memo includes no request to IANA.

8.  Security Considerations

   This document introduces no new security considerations.

9.  Changes from the previous version (section to be removed later)

   o  Updated the introduction to cover the intended audience and to
      say that this document is coming from the LEDBAT WG.

   o  Removed the "+" in reference anchors.

   o  Small editorial changes and various fixes based on Wes' comments
      throughout the document.

10.  Informative References

   [Aru10b]   Arumaithurai, M., Fu, X., and K. Ramakrishnan, "NF-TCP:
              A Network Friendly TCP Variant for Background Delay-
              Insensitive Applications", Technical Report No. IFI-TB-
              2010-05, Institute of Computer Science, University of
              Goettingen, Germany, September 2010.

   [BITS]     Microsoft, "Windows Background Intelligent Transfer
              Service".

   [Bha07]    Bhandarkar, S., Reddy, A., Zhang, Y., and D. Loguinov,
              "Emulating AQM from end hosts", Proceedings of ACM
              SIGCOMM 2007, 2007.

   [Bia03]    Biaz, S. and N. Vaidya, "Is the round-trip time
              correlated with the number of packets in flight?",
              Proceedings of the 3rd ACM SIGCOMM Conference on
              Internet Measurement (IMC '03), pp. 273-278, 2003.

   [Bra94]    Brakmo, L., O'Malley, S., and L. Peterson, "TCP Vegas:
              New techniques for congestion detection and avoidance",
              Proceedings of SIGCOMM '94, pp. 24-35, August 1994.

   [Car10]    Carofiglio, G., Muscariello, L., Rossi, D., and S.
              Valenti, "The quest for LEDBAT fairness", Proceedings of
              IEEE GLOBECOM 2010, December 2010.

   [Cha10]    Chan, Y., Lin, C., Chan, C., and C. Ho, "CODE TCP: A
              competitive delay-based TCP", Computer Communications,
              33(9):1013-1029, June 2010.

   [Cro98]    Crowcroft, J. and P. Oechslin, "Differentiated end-to-
              end Internet services using a weighted proportional fair
              sharing TCP", ACM SIGCOMM Computer Communication Review,
              vol. 28, no. 3, pp. 53-69, July 1998.

   [Cro98b]   Crovella, M. and P. Barford, "The network effects of
              prefetching", Proceedings of IEEE INFOCOM 1998,
              April 1998.

   [Dam09]    Damjanovic, D. and M. Welzl, "MulTFRC: Providing
              Weighted Fairness for Multimedia Applications (and
              others too!)", ACM SIGCOMM Computer Communication
              Review, vol. 39, no. 3, July 2009.

   [Dev03]    De Vendictis, A., Baiocchi, A., and M. Bonacci,
              "Analysis and enhancement of TCP Vegas congestion
              control in a mixed TCP Vegas and TCP Reno network
              scenario", Performance Evaluation, 53(3-4):225-253,
              2003.

   [Dyk02]    Dykes, S. and K. Robbins, "Limitations and benefits of
              cooperative proxy caching", IEEE Journal on Selected
              Areas in Communications, 20(7):1290-1304,
              September 2002.

   [Egg05]    Eggert, L. and J. Touch, "Idletime Scheduling with
              Preemption Intervals", Proceedings of the 20th ACM
              Symposium on Operating Systems Principles (SOSP 2005),
              Brighton, United Kingdom, pp. 249-262, October 2005.
Floyd, "Modeling wireless links for 606 transport protocols", ACM SIGCOMM Computer Communications 607 Review 34(2):85-96, April 2004. 609 [Hac04] Hacker, T., Noble, B., and B. Athey, "Improving Throughput 610 and Maintaining Fairness using Parallel TCP", Proceedings 611 of IEEE INFOCOM 2004, March 2004. 613 [Hac08] Hacker, T. and P. Smith, "Stochastic TCP: A Statistical 614 Approach to Congestion Avoidance", Proceedings of 615 PFLDnet 2008, March 2008. 617 [Hay10] Hayes, D., "Timing enhancements to the FreeBSD kernel to 618 support delay and rate based TCP mechanisms", Technical 619 Report 100219A , Centre for Advanced Internet 620 Architectures, Swinburne University of Technology, 621 February 2010. 623 [Hen00] Hengartner, U., Bolliger, J., and T. Gross, "TCP Vegas 624 revisited", Proceedings of IEEE INFOCOM 2000, March 2000. 626 [Jai89] Jain, R., "A delay-based approach for congestion avoidance 627 in interconnected heterogeneous computer networks", ACM 628 Computer Communication Review , 19(5):56-71, October 1989. 630 [Key04] Key, P., Massoulie, L., and B. Wang, "Emulating Low- 631 Priority Transport at the Application Layer: a Background 632 Transfer Service", Proceedings of ACM SIGMETRICS 2004, 633 January 2004. 635 [Kok04] Kokku, R., Bohra, A., Ganguly, S., and A. Venkataramani, 636 "A Multipath Background Network Architecture", Proceedings 637 of IEEE INFOCOM 2007, May 2007. 639 [Kuo08] Kuo, F. and X. Fu, "Probe-Aided MulTCP: an aggregate 640 congestion control mechanism", ACM SIGCOMM Computer 641 Communication Review vol. 38, no. 1 (January 2008), pp. 642 17-28, 2008. 644 [Kur00] Kurata, K., Hasegawa, G., and M. Murata, "Fairness 645 Comparisons Between TCP Reno and TCP Vegas for Future 646 Deployment of TCP Vegas", Proceedings of INET 2000, 647 July 2000. 649 [Kuz06] Kuzmanovic, A. and E. Knightly, "TCP-LP: low-priority 650 service via end-point congestion control", IEEE/ACM 651 Transactions on Networking (ToN) Volume 14, Issue 4, pp. 652 739-752., August 2006, 653 . 655 [Liu07] Liu, S., Vojnovic, M., and D. Gunawardena, "Competitive 656 and Considerate Congestion Control for Bulk Data 657 Transfers", Proceedings of IWQoS 2007, June 2007. 659 [Liu08] Liu, S., Basar, T., and R. Srikant, "TCP-Illinois: A loss- 660 and delay-based congestion control algorithm for high- 661 speed networks", Performance Evaluation , 65(6-7):417-440, 662 2008. 664 [Mar03] Martin, J., Nilsson, A., and I. Rhee, "Delay-based 665 congestion avoidance for TCP", IEEE/ACM Transactions on 666 Networking , 11(3):356-369, June 2003. 668 [McC08] McCullagh, G. and D. Leith, "Delay-based congestion 669 control: Sampling and correlation issues revisited", 670 Technical report , Hamilton Institute, 2008. 672 [Meh03] Mehra, P., Zakhor, A., and C. De Vleeschouwer, "Receiver- 673 Driven Bandwidth Sharing for TCP", Proceedings of IEEE 674 INFOCOM 2003, April 2003. 676 [Mo99] Mo, J., La, R., Anantharam, V., and J. Walrand, "Analysis 677 and Comparison of TCP Reno and TCP Vegas", Proceedings of 678 IEEE INFOCOM 1999, March 1999. 680 [Ott03] Ott, B., Warnky, T., and V. Liberatore, "Congestion 681 control for low-priority filler traffic", SPIE QoS 2003 682 (Quality of Service over Next-Generation Internet), In 683 Proc. SPIE, Vol. 5245, 154, Monterey (CA), USA, July 2003. 685 [Pra04] Prasad, R., Jain, M., and C. Dovrolis, "On the 686 effectiveness of delay-based congestion avoidance", 687 Proceedings of PFLDnet , 2004. 689 [RFC1323] Jacobson, V., Braden, B., and D. 
Borman, "TCP Extensions 690 for High Performance", RFC 1323, May 1992. 692 [RFC2309] Braden, B., Clark, D., Crowcroft, J., Davie, B., Deering, 693 S., Estrin, D., Floyd, S., Jacobson, V., Minshall, G., 694 Partridge, C., Peterson, L., Ramakrishnan, K., Shenker, 695 S., Wroclawski, J., and L. Zhang, "Recommendations on 696 Queue Management and Congestion Avoidance in the 697 Internet", RFC 2309, April 1998. 699 [RFC3168] Ramakrishnan, K., Floyd, S., and D. Black, "The Addition 700 of Explicit Congestion Notification (ECN) to IP", 701 RFC 3168, September 2001. 703 [RFC3662] Bless, R., Nichols, K., and K. Wehrle, "A Lower Effort 704 Per-Domain Behavior (PDB) for Differentiated Services", 705 RFC 3662, December 2003. 707 [RFC5348] Floyd, S., Handley, M., Padhye, J., and J. Widmer, "TCP 708 Friendly Rate Control (TFRC): Protocol Specification", 709 RFC 5348, September 2008. 711 [RFC5681] Allman, M., Paxson, V., and E. Blanton, "TCP Congestion 712 Control", RFC 5681, September 2009. 714 [Rew06] Rewaskar, S., Kaur, J., and D. Smith, "Why don't delay- 715 based congestion estimators work in the real-world?", 716 Technical report TR06-001 , University of North Carolina 717 at Chapel Hill, Dept. of Computer Science, January 2006. 719 [Sha05] Shalunov, S., Dunn, L., Gu, Y., Low, S., Rhee, I., Senger, 720 S., Wydrowski, B., and L. Xu, "Design Space for a Bulk 721 Transport Tool", Technical Report , Internet2 Transport 722 Group, May 2005. 724 [Spr00] Spring, N., Chesire, M., Berryman, M., Sahasranaman, V., 725 Anderson, T., and B. Bershad, "Receiver based management 726 of low bandwidth access links", Proceedings of IEEE 727 INFOCOM 2000, pp. 245-254, vol.1, 2000. 729 [Sri08] Sridharan, M., Tan, K., Bansala, D., and D. Thaler, 730 "Compound TCP: A new TCP congestion control for high-speed 731 and long distance networks", Internet Draft 732 draft-sridharan-tcpm-ctcp , work in progress, 733 November 2008. 735 [Tan06] Tan, K., Song, J., Zhang, Q., and M. Sridharan, "A 736 Compound TCP approach for high-speed and long distance 737 networks", Proceedings of IEEE INFOCOM 2006, Barcelona, 738 Spain, April 2008. 740 [Ven02] Venkataramani, A., Kokku, R., and M. Dahlin, "TCP Nice: a 741 mechanism for background transfers", Proceedings of 742 OSDI '02, 2002. 744 [Ven08] Venkataraman, V., Francis, P., Kodialam, M., and T. 745 Lakshman, "A priority-layered approach to transport for 746 high bandwidth-delay product networks", Proceedings of ACM 747 CoNEXT, Madrid, December 2008. 749 [Wan91] Wang, Z. and J. Crowcroft, "A new congestion control 750 scheme: slow start and search (Tri-S)", ACM Computer 751 Communication Review , 21(1):56-71, January 1991. 753 [Wan92] Wang, Z. and J. Crowcroft, "Eliminating periodic packet 754 losses in the 4.3-Tahoe BSD TCP congestion control 755 algorithm", ACM Computer Communication Review , 22(2): 756 9-16, January 1992. 758 [Wei05] Weigle, M., Jeffay, K., and F. Smith, "Delay-based early 759 congestion detection and adaptation in TCP: impact on web 760 performance", Computer Communications 28(8):837-850, 761 May 2005. 763 [Wei06] Wei, D., Jin, C., Low, S., and S. Hegde, "FAST TCP: 764 Motivation, architecture, algorithms, performance", IEEE/ 765 ACM Transactions on Networking , 14(6):1246-1259, 766 December 2006. 
Authors' Addresses

   Michael Welzl
   University of Oslo
   Department of Informatics, PO Box 1080 Blindern
   N-0316 Oslo
   Norway

   Phone: +43 512 507 6110
   Email: michawe@ifi.uio.no

   David Ros
   Institut Telecom / Telecom Bretagne
   Rue de la Chataigneraie, CS 17607
   35576 Cesson Sevigne cedex
   France

   Phone: +33 2 99 12 70 46
   Email: david.ros@telecom-bretagne.eu