idnits 2.17.1 draft-welzl-rmcat-coupled-cc-05.txt: Checking boilerplate required by RFC 5378 and the IETF Trust (see https://trustee.ietf.org/license-info): ---------------------------------------------------------------------------- No issues found here. Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt: ---------------------------------------------------------------------------- No issues found here. Checking nits according to https://www.ietf.org/id-info/checklist : ---------------------------------------------------------------------------- No issues found here. Miscellaneous warnings: ---------------------------------------------------------------------------- == The copyright year in the IETF Trust and authors Copyright Line does not match the current year -- The document date (June 18, 2015) is 3235 days in the past. Is this intentional? Checking references for intended status: Experimental ---------------------------------------------------------------------------- No issues found here. Summary: 0 errors (**), 0 flaws (~~), 1 warning (==), 1 comment (--). Run idnits with the --verbose option for more detailed information about the items above. -------------------------------------------------------------------------------- 2 RTP Media Congestion Avoidance M. Welzl 3 Techniques (rmcat) S. Islam 4 Internet-Draft S. Gjessing 5 Intended status: Experimental University of Oslo 6 Expires: December 20, 2015 June 18, 2015 8 Coupled congestion control for RTP media 9 draft-welzl-rmcat-coupled-cc-05 11 Abstract 13 When multiple congestion controlled RTP sessions traverse the same 14 network bottleneck, it can be beneficial to combine their controls 15 such that the total on-the-wire behavior is improved. This document 16 describes such a method for flows that have the same sender, in a way 17 that is as flexible and simple as possible while minimizing the 18 amount of changes needed to existing RTP applications. It specifies 19 how to apply the method for the NADA congestion control algorithm. 21 Status of this Memo 23 This Internet-Draft is submitted in full conformance with the 24 provisions of BCP 78 and BCP 79. 26 Internet-Drafts are working documents of the Internet Engineering 27 Task Force (IETF). Note that other groups may also distribute 28 working documents as Internet-Drafts. The list of current Internet- 29 Drafts is at http://datatracker.ietf.org/drafts/current/. 31 Internet-Drafts are draft documents valid for a maximum of six months 32 and may be updated, replaced, or obsoleted by other documents at any 33 time. It is inappropriate to use Internet-Drafts as reference 34 material or to cite them other than as "work in progress." 36 This Internet-Draft will expire on December 20, 2015. 38 Copyright Notice 40 Copyright (c) 2015 IETF Trust and the persons identified as the 41 document authors. All rights reserved. 43 This document is subject to BCP 78 and the IETF Trust's Legal 44 Provisions Relating to IETF Documents 45 (http://trustee.ietf.org/license-info) in effect on the date of 46 publication of this document. Please review these documents 47 carefully, as they describe your rights and restrictions with respect 48 to this document. Code Components extracted from this document must 49 include Simplified BSD License text as described in Section 4.e of 50 the Trust Legal Provisions and are provided without warranty as 51 described in the Simplified BSD License. 53 Table of Contents 55 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 3 56 2. Definitions . . . . . . . . . . . . . . . . . . . . . . . . . 3 57 3. Limitations . . . . . . . . . . . . . . . . . . . . . . . . . 4 58 4. Architectural overview . . . . . . . . . . . . . . . . . . . . 5 59 5. Roles . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6 60 5.1. SBD . . . . . . . . . . . . . . . . . . . . . . . . . . . 6 61 5.2. FSE . . . . . . . . . . . . . . . . . . . . . . . . . . . 6 62 5.3. Flows . . . . . . . . . . . . . . . . . . . . . . . . . . 7 63 5.3.1. Example algorithm 1 - Active FSE . . . . . . . . . . . 7 64 5.3.2. Example algorithm 2 - Conservative Active FSE . . . . 8 65 6. Application . . . . . . . . . . . . . . . . . . . . . . . . . 10 66 6.1. NADA . . . . . . . . . . . . . . . . . . . . . . . . . . . 10 67 6.2. General recommendations . . . . . . . . . . . . . . . . . 10 68 7. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . . 11 69 8. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 11 70 9. Security Considerations . . . . . . . . . . . . . . . . . . . 11 71 10. References . . . . . . . . . . . . . . . . . . . . . . . . . . 11 72 10.1. Normative References . . . . . . . . . . . . . . . . . . . 11 73 10.2. Informative References . . . . . . . . . . . . . . . . . . 12 74 Appendix A. Example algorithm - Passive FSE . . . . . . . . . . . 12 75 A.1. Example operation (passive) . . . . . . . . . . . . . . . 15 76 Appendix B. Change log . . . . . . . . . . . . . . . . . . . . . 19 77 B.1. Changes from -00 to -01 . . . . . . . . . . . . . . . . . 19 78 B.2. Changes from -01 to -02 . . . . . . . . . . . . . . . . . 19 79 B.3. Changes from -02 to -03 . . . . . . . . . . . . . . . . . 19 80 B.4. Changes from -03 to -04 . . . . . . . . . . . . . . . . . 20 81 B.5. Changes from -04 to -05 . . . . . . . . . . . . . . . . . 20 82 Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . . 20 84 1. Introduction 86 When there is enough data to send, a congestion controller must 87 increase its sending rate until the path's capacity has been reached; 88 depending on the controller, sometimes the rate is increased further, 89 until packets are ECN-marked or dropped. This process inevitably 90 creates undesirable queuing delay -- an effect that is amplified when 91 multiple congestion controlled connections traverse the same network 92 bottleneck. When such connections originate from the same host, it 93 would therefore be ideal to use only one single sender-side 94 congestion controller which determines the overall allowed sending 95 rate, and then use a local scheduler to assign a proportion of this 96 rate to each RTP session. This way, priorities could also be 97 implemented quite easily, as a function of the scheduler; honoring 98 user-specified priorities is, for example, required by rtcweb 99 [rtcweb-usecases]. 101 The Congestion Manager (CM) [RFC3124] provides a single congestion 102 controller with a scheduling function just as described above. It is 103 hard to implement because it requires an additional congestion 104 controller and removes all per-connection congestion control 105 functionality, which is quite a significant change to existing RTP 106 based applications. This document presents a method that is easier 107 to implement than the CM and also requires less significant changes 108 to existing RTP based applications. It attempts to roughly 109 approximate the CM behavior by sharing information between existing 110 congestion controllers. 112 2. Definitions 114 The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", 115 "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this 116 document are to be interpreted as described in RFC 2119 [RFC2119]. 118 Available Bandwidth: 119 The available bandwidth is the nominal link capacity minus the 120 amount of traffic that traversed the link during a certain time 121 interval, divided by that time interval. 123 Bottleneck: 124 The first link with the smallest available bandwidth along the 125 path between a sender and receiver. 127 Flow: 128 A flow is the entity that congestion control is operating on. 129 It could, for example, be a transport layer connection, an RTP 130 session, or a subsession that is multiplexed onto a single RTP 131 session together with other subsessions. 133 Flow Group Identifier (FGI): 134 A unique identifier for each subset of flows that is limited by 135 a common bottleneck. 137 Flow State Exchange (FSE): 138 The entity that maintains information that is exchanged between 139 flows. 141 Flow Group (FG): 142 A group of flows having the same FGI. 144 Shared Bottleneck Detection (SBD): 145 The entity that determines which flows traverse the same 146 bottleneck in the network, or the process of doing so. 148 3. Limitations 150 Sender-side only: 151 Coupled congestion control as described here only operates 152 inside a single host on the sender side. This is because, 153 irrespective of where the major decisions for congestion 154 control are taken, the sender of a flow needs to eventually 155 decide the transmission rate. Additionally, the necessary 156 information about how much data an application can currently 157 send on a flow is often only available at the sender side, 158 making the sender an obvious choice for placement of the 159 elements and mechanisms described here. 161 Shared bottlenecks do not change quickly: 162 As per the definition above, a bottleneck depends on cross 163 traffic, and since such traffic can heavily fluctuate, 164 bottlenecks can change at a high frequency (e.g., there can be 165 oscillation between two or more links). This means that, when 166 flows are partially routed along different paths, they may 167 quickly change between sharing and not sharing a bottleneck. 168 For simplicity, here it is assumed that a shared bottleneck is 169 valid for a time interval that is significantly longer than the 170 interval at which congestion controllers operate. Note that, 171 for the only SBD mechanism defined in this document 172 (multiplexing on the same five-tuple), the notion of a shared 173 bottleneck stays correct even in the presence of fast traffic 174 fluctuations: since all flows that are assumed to share a 175 bottleneck are routed in the same way, if the bottleneck 176 changes, it will still be shared. 178 4. Architectural overview 180 Figure 1 shows the elements of the architecture for coupled 181 congestion control: the Flow State Exchange (FSE), Shared Bottleneck 182 Detection (SBD) and Flows. The FSE is a storage element that can be 183 implemented in two ways: active and passive. In the active version, 184 it initiates communication with flows and SBD. However, in the 185 passive version, it does not actively initiate communication with 186 flows and SBD; its only active role is internal state maintenance 187 (e.g., an implementation could use soft state to remove a flow's data 188 after long periods of inactivity). Every time a flow's congestion 189 control mechanism would normally update its sending rate, the flow 190 instead updates information in the FSE and performs a query on the 191 FSE, leading to a sending rate that can be different from what the 192 congestion controller originally determined. Using information 193 about/from the currently active flows, SBD updates the FSE with the 194 correct Flow State Identifiers (FSIs). 196 ------- <--- Flow 1 197 | FSE | <--- Flow 2 .. 198 ------- <--- .. Flow N 199 ^ 200 | | 201 ------- | 202 | SBD | <-------| 203 ------- 205 Figure 1: Coupled congestion control architecture 207 Since everything shown in Figure 1 is assumed to operate on a single 208 host (the sender) only, this document only describes aspects that 209 have an influence on the resulting on-the-wire behavior. It does, 210 for instance, not define how many bits must be used to represent 211 FSIs, or in which way the entities communicate. Implementations can 212 take various forms: for instance, all the elements in the figure 213 could be implemented within a single application, thereby operating 214 on flows generated by that application only. Another alternative 215 could be to implement both the FSE and SBD together in a separate 216 process which different applications communicate with via some form 217 of Inter-Process Communication (IPC). Such an implementation would 218 extend the scope to flows generated by multiple applications. The 219 FSE and SBD could also be included in the Operating System kernel. 221 5. Roles 223 This section gives an overview of the roles of the elements of 224 coupled congestion control, and provides an example of how coupled 225 congestion control can operate. 227 5.1. SBD 229 SBD uses knowledge about the flows to determine which flows belong in 230 the same Flow Group (FG), and assigns FGIs accordingly. This 231 knowledge can be derived in three basic ways: 233 1. From multiplexing: it can be based on the simple assumption that 234 packets sharing the same five-tuple (IP source and destination 235 address, protocol, and transport layer port number pair) and 236 having the same Differentiated Services Code Point (DSCP) in the 237 IP header are typically treated in the same way along the path. 238 The latter method is the only one specified in this document: SBD 239 MAY consider all flows that use the same five-tuple and DSCP to 240 belong to the same FG. This classification applies to certain 241 tunnels, or RTP flows that are multiplexed over one transport 242 (cf. [transport-multiplex]). In one way or another, such 243 multiplexing will probably be recommended for use with rtcweb 244 [rtcweb-rtp-usage]. 246 2. Via configuration: e.g. by assuming that a common wireless uplink 247 is also a shared bottleneck. 249 3. From measurements: e.g. by considering correlations among 250 measured delay and loss as an indication of a shared bottleneck. 252 The methods above have some essential trade-offs: e.g., multiplexing 253 is a completely reliable measure, however it is limited in scope to 254 two end points (i.e., it cannot be applied to couple congestion 255 controllers of one sender talking to multiple receivers). A 256 measurement-based SBD mechanism is described in [sbd]. Measurements 257 can never be 100% reliable, in particular because they are based on 258 the past but applying coupled congestion control means to make an 259 assumption about the future; it is therefore recommended to implement 260 cautionary measures, e.g. by disabling coupled congestion control if 261 enabling it causes a significant increase in delay and/or packet 262 loss. Measurements also take time, which entails a certain delay for 263 turning on coupling (refer to [sbd] for details). 265 5.2. FSE 267 The FSE contains a list of all flows that have registered with it. 268 For each flow, it stores the following: 270 o a unique flow number to identify the flow 272 o the FGI of the FG that it belongs to (based on the definitions in 273 this document, a flow has only one bottleneck, and can therefore 274 be in only one FG) 276 o a priority P, which here is assumed to be represented as a 277 floating point number in the range from 0.1 (unimportant) to 1 278 (very important). A negative value is used to indicate that a 279 flow has terminated 281 o The rate used by the flow in bits per second, FSE_R. 283 The FSE can operate on window-based as well as rate-based congestion 284 controllers (TEMPORARY NOTE: and probably -- not yet tested -- 285 combinations thereof, with calculations to convert from one to the 286 other). In case of a window-based controller, FSE_R is a window, and 287 all the text below should be considered to refer to window, not 288 rates. 290 In the FSE, each FG contains one static variable S_CR which is meant 291 to be the sum of the calculated rates of all flows in the same FG 292 (including the flow itself). This value is used to calculate the 293 sending rate. 295 The information listed here is enough to implement the sample flow 296 algorithm given below. FSE implementations could easily be extended 297 to store, e.g., a flow's current sending rate for statistics 298 gathering or future potential optimizations. 300 5.3. Flows 302 Flows register themselves with SBD and FSE when they start, 303 deregister from the FSE when they stop, and carry out an UPDATE 304 function call every time their congestion controller calculates a new 305 sending rate. Via UPDATE, they provide the newly calculated rate and 306 optionally (if the algorithm supports it) the desired rate. The 307 desired rate is less than the calculated rate in case of application- 308 limited flows; otherwise, it is the same as the calculated rate. 310 Below, two example algorithms are described. While other algorithms 311 could be used instead, the same algorithm must be applied to all 312 flows. 314 5.3.1. Example algorithm 1 - Active FSE 316 This algorithm was designed to be the simplest possible method to 317 assign rates according to the priorities of flows. Simulations 318 results in [fse] indicate that it does however not significantly 319 reduce queuing delay and packet loss. 321 (1) When a flow f starts, it registers itself with SBD and the FSE. 322 FSE_R is initialized with the congestion controller's initial 323 rate. SBD will assign the correct FGI. When a flow is assigned 324 an FGI, it adds its FSE_R to S_CR. 326 (2) When a flow f stops, its entry is removed from the list. 328 (3) Every time the congestion controller of the flow f determines a 329 new sending rate CC_R, the flow calls UPDATE, which carries out 330 the tasks listed below to derive the new sending rates for all 331 the flows in the FG. A flow's UPDATE function uses a local 332 (i.e. per-flow) temporary variable S_P, which is the sum of all 333 the priorities. 335 (a) It updates S_CR. 337 S_CR = S_CR + CC_R - FSE_R(f) 339 (b) It calculates the sum of all the priorities, S_P. 341 S_P = 0 342 for all flows i in FG do 343 S_P = S_P + P(i) 344 end for 346 (c) It calculates the sending rates for all the flows in an FG 347 and distributes them. 349 for all flows i in FG do 350 FSE_R(i) = (P(i)*S_CR)/S_P 351 send FSE_R(i) to the flow i 352 end for 354 5.3.2. Example algorithm 2 - Conservative Active FSE 356 This algorithm extends algorithm 1 to conservatively emulate the 357 behavior of a single flow by proportionally reducing the aggregate 358 rate on congestion. Simulations results in [fse] indicate that it 359 can significantly reduce queuing delay and packet loss. 361 (1) When a flow f starts, it registers itself with SBD and the FSE. 362 FSE_R is initialized with the congestion controller's initial 363 rate. SBD will assign the correct FGI. When a flow is assigned 364 an FGI, it adds its FSE_R to S_CR. 366 (2) When a flow f stops, its entry is removed from the list. 368 (3) Every time the congestion controller of the flow f determines a 369 new sending rate CC_R, the flow calls UPDATE, which carries out 370 the tasks listed below to derive the new sending rates for all 371 the flows in the FG. A flow's UPDATE function uses a local 372 (i.e. per-flow) temporary variable S_P, which is the sum of all 373 the priorities, and a local variable DELTA, which is used to 374 calculate the difference between CC_R and the previously stored 375 FSE_R. To prevent flows from either ignoring congestion or 376 overreacting, a timer keeps them from changing their rates 377 immediately after the common rate reduction that follows a 378 congestion event. This timer is set to 2 RTTs of the flow that 379 experienced congestion because it is assumed that a congestion 380 event can persist for up to one RTT of that flow, with another 381 RTT added to compensate for fluctuations in the measured RTT 382 value. 384 (a) It updates S_CR based on DELTA. 386 if Timer has expired or not set then 387 DELTA = CC_R - FSE_R(f) 388 if DELTA < 0 then // Reduce S_CR proportionally 389 S_CR = S_CR * CC_R / FSE_R(f) 390 Set Timer for 2 RTTs 391 else 392 S_CR = S_CR + DELTA 393 end if 394 end if 396 (b) It calculates the sum of all the priorities, S_P. 398 S_P = 0 399 for all flows i in FG do 400 S_P = S_P + P(i) 401 end for 403 (c) It calculates the sending rates for all the flows in an FG 404 and distributes them. 406 for all flows i in FG do 407 FSE_R(i) = (P(i)*S_CR)/S_P 408 send FSE_R(i) to the flow i 409 end for 411 6. Application 413 This section specifies how the FSE can be applied to specific 414 congestion control mechanisms and makes general recommendations that 415 facilitate applying the FSE to future congestion controls. 417 6.1. NADA 419 Network-Assisted Dynamic Adapation (NADA) [nada] is a congestion 420 control scheme for rtcweb. It calculates a reference rate R_n upon 421 receiving an acknowledgment, and then, based on the reference rate, 422 it calculates a video target rate R_v and a sending rate for the 423 flows, R_s. 425 When applying the FSE to NADA, the UPDATE function call described in 426 Section 5.3 gives the FSE NADA's reference rate R_n. The recommended 427 algorithm for NADA is the Active FSE in Section 5.3.1. In step 3 428 (c), when the FSE_R(i) is "sent" to the flow i, this means updating 429 R_v and R_s of flow i with the value of FSE_R(i). 431 NADA simulation results are available from 432 http://heim.ifi.uio.no/safiquli/coupled-cc/. The next version of 433 this document will refer to a technical report that will be made 434 available at the same URL. 436 6.2. General recommendations 438 This section will provides general advice for applying the FSE to 439 congestion control mechanisms. TEMPORARY NOTE: Future versions of 440 this document will contain a longer list. 442 Receiver-side calculations: 443 When receiver-side calculations make assumptions about the rate 444 of the sender, the calculations need to be synchronized or the 445 receiver needs to be updated accordingly. This applies to TFRC 446 [RFC5348], for example, where simulations showed somewhat less 447 favorable results when using the FSE without a receiver-side 448 change [fse]. 450 7. Acknowledgements 452 This document has benefitted from discussions with and feedback from 453 David Hayes, Mirja Kuehlewind, Andreas Petlund, David Ros (who also 454 gave the FSE its name), Zaheduzzaman Sarker and Varun Singh. The 455 authors would like to thank Xiaoqing Zhu for helping with NADA. 457 This work was partially funded by the European Community under its 458 Seventh Framework Programme through the Reducing Internet Transport 459 Latency (RITE) project (ICT-317700). 461 8. IANA Considerations 463 This memo includes no request to IANA. 465 9. Security Considerations 467 In scenarios where the architecture described in this document is 468 applied across applications, various cheating possibilities arise: 469 e.g., supporting wrong values for the calculated rate, the desired 470 rate, or the priority of a flow. In the worst case, such cheating 471 could either prevent other flows from sending or make them send at a 472 rate that is unreasonably large. The end result would be unfair 473 behavior at the network bottleneck, akin to what could be achieved 474 with any UDP based application. Hence, since this is no worse than 475 UDP in general, there seems to be no significant harm in using this 476 in the absence of UDP rate limiters. 478 In the case of a single-user system, it should also be in the 479 interest of any application programmer to give the user the best 480 possible experience by using reasonable flow priorities or even 481 letting the user choose them. In a multi-user system, this interest 482 may not be given, and one could imagine the worst case of an "arms 483 race" situation, where applications end up setting their priorities 484 to the maximum value. If all applications do this, the end result is 485 a fair allocation in which the priority mechanism is implicitly 486 eliminated, and no major harm is done. 488 10. References 490 10.1. Normative References 492 [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate 493 Requirement Levels", BCP 14, RFC 2119, March 1997. 495 [RFC3124] Balakrishnan, H. and S. Seshan, "The Congestion Manager", 496 RFC 3124, June 2001. 498 [RFC5348] Floyd, S., Handley, M., Padhye, J., and J. Widmer, "TCP 499 Friendly Rate Control (TFRC): Protocol Specification", 500 RFC 5348, September 2008. 502 10.2. Informative References 504 [fse] Islam, S., Welzl, M., Gjessing, S., and N. Khademi, 505 "Coupled Congestion Control for RTP Media", ACM SIGCOMM 506 Capacity Sharing Workshop (CSWS 2014); extended version 507 available as a technical report from 508 http://safiquli.at.ifi.uio.no/paper/fse-tech-report.pdf , 509 2014. 511 [nada] Zhu, X., Pan, R., Ramalho, M., Mena, S., Ganzhorn, C., 512 Jones, P., and S. De Aronco, "NADA: A Unified Congestion 513 Control Scheme for Real-Time Media", 514 draft-ietf-rmcat-nada-00 (work in progress), April 2015. 516 [rtcweb-rtp-usage] 517 Perkins, C., Westerlund, M., and J. Ott, "Web Real-Time 518 Communication (WebRTC): Media Transport and Use of RTP", 519 draft-ietf-rtcweb-rtp-usage-18.txt (work in progress), 520 October 2014. 522 [rtcweb-usecases] 523 Holmberg, C., Hakansson, S., and G. Eriksson, "Web Real- 524 Time Communication Use-cases and Requirements", 525 draft-ietf-rtcweb-use-cases-and-requirements-14.txt (work 526 in progress), February 2014. 528 [sbd] Hayes, D., Ferlin, S., and M. Welzl, "Shared Bottleneck 529 Detection for Coupled Congestion Control for RTP Media", 530 draft-ietf-rmcat-sbd-00.txt (work in progress), May 2015. 532 [transport-multiplex] 533 Westerlund, M. and C. Perkins, "Multiple RTP Sessions on a 534 Single Lower-Layer Transport", 535 draft-westerlund-avtcore-transport-multiplexing-07.txt 536 (work in progress), October 2013. 538 Appendix A. Example algorithm - Passive FSE 540 Active algorithms calculate the rates for all the flows in the FG and 541 actively distribute them. In a passive algorithm, UPDATE returns a 542 rate that should be used instead of the rate that the congestion 543 controller has determined. This can make a passive algorithm easier 544 to implement; however, when round-trip times of flows are unequal, 545 shorter-RTT flows will update and react to the overall FSE state more 546 often than longer-RTT flows, which can produce unwanted side effects. 547 This problem is more significant when the congestion control 548 convergence depends on the RTT. While the passive algorithm works 549 better for congestion controls with RTT-independent convergence, it 550 can still produce oscillations on short time scales. The algorithm 551 described below is therefore considered as highly experimental. 553 This passive version of the FSE stores the following information in 554 addition to the variables described in Section 5.2: 556 o The desired rate DR. This can be smaller than the calculated rate 557 if the application feeding into the flow has less data to send 558 than the congestion controller would allow. In case of a bulk 559 transfer, DR must be set to CC_R received from the flow's 560 congestion module. 562 The passive version of the FSE contains one static variable per FG 563 called TLO (Total Leftover Rate -- used to let a flow 'take' 564 bandwidth from application-limited or terminated flows) which is 565 initialized to 0. For the passive version, S_CR is limited to 566 increase or decrease as conservatively as a flow's congestion 567 controller decides in order to prohibit sudden rate jumps. 569 (1) When a flow f starts, it registers itself with SBD and the FSE. 570 FSE_R and DR are initialized with the congestion controller's 571 initial rate. SBD will assign the correct FGI. When a flow is 572 assigned an FGI, it adds its FSE_R to S_CR. 574 (2) When a flow f stops, it sets its DR to 0 and sets P to -1. 576 (3) Every time the congestion controller of the flow f determines a 577 new sending rate CC_R, assuming the flow's new desired rate 578 new_DR to be "infinity" in case of a bulk data transfer with an 579 unknown maximum rate, the flow calls UPDATE, which carries out 580 the tasks listed below to derive the flow's new sending rate, 581 Rate. A flow's UPDATE function uses a few local (i.e. per-flow) 582 temporary variables, which are all initialized to 0: DELTA, 583 new_S_CR and S_P. 585 (a) For all the flows in its FG (including itself), it 586 calculates the sum of all the calculated rates, new_S_CR. 587 Then it calculates the difference between FSE_R(f) and 588 CC_R, DELTA. 590 for all flows i in FG do 591 new_S_CR = new_S_CR + FSE_R(i) 592 end for 593 DELTA = CC_R - FSE_R(f) 595 (b) It updates S_CR, FSE_R(f) and DR(f). 597 FSE_R(f) = CC_R 598 if DELTA > 0 then // the flow's rate has increased 599 S_CR = S_CR + DELTA 600 else if DELTA < 0 then 601 S_CR = new_S_CR + DELTA 602 end if 603 DR(f) = min(new_DR,FSE_R(f)) 605 (c) It calculates the leftover rate TLO, removes the terminated 606 flows from the FSE and calculates the sum of all the 607 priorities, S_P. 609 for all flows i in FG do 610 if P(i)<0 then 611 delete flow 612 else 613 S_P = S_P + P(i) 614 end if 615 end for 616 if DR(f) < FSE_R(f) then 617 TLO = TLO + (P(f)/S_P) * S_CR - DR(f)) 618 end if 620 (d) It calculates the sending rate, Rate. 622 Rate = min(new_DR, (P(f)*S_CR)/S_P + TLO) 624 if Rate != new_DR and TLO > 0 then 625 TLO = 0 // f has 'taken' TLO 626 end if 628 (e) It updates DR(f) and FSE_R(f) with Rate. 630 if Rate > DR(f) then 631 DR(f) = Rate 632 end if 633 FSE_R(f) = Rate 635 The goals of the flow algorithm are to achieve prioritization, 636 improve network utilization in the face of application-limited flows, 637 and impose limits on the increase behavior such that the negative 638 impact of multiple flows trying to increase their rate together is 639 minimized. It does that by assigning a flow a sending rate that may 640 not be what the flow's congestion controller expected. It therefore 641 builds on the assumption that no significant inefficiencies arise 642 from temporary application-limited behavior or from quickly jumping 643 to a rate that is higher than the congestion controller intended. 644 How problematic these issues really are depends on the controllers in 645 use and requires careful per-controller experimentation. The coupled 646 congestion control mechanism described here also does not require all 647 controllers to be equal; effects of heterogeneous controllers, or 648 homogeneous controllers being in different states, are also subject 649 to experimentation. 651 This algorithm gives all the leftover rate of application-limited 652 flows to the first flow that updates its sending rate, provided that 653 this flow needs it all (otherwise, its own leftover rate can be taken 654 by the next flow that updates its rate). Other policies could be 655 applied, e.g. to divide the leftover rate of a flow equally among all 656 other flows in the FGI. 658 A.1. Example operation (passive) 660 In order to illustrate the operation of the passive coupled 661 congestion control algorithm, this section presents a toy example of 662 two flows that use it. Let us assume that both flows traverse a 663 common 10 Mbit/s bottleneck and use a simplistic congestion 664 controller that starts out with 1 Mbit/s, increases its rate by 1 665 Mbit/s in the absence of congestion and decreases it by 2 Mbit/s in 666 the presence of congestion. For simplicity, flows are assumed to 667 always operate in a round-robin fashion. Rate numbers below without 668 units are assumed to be in Mbit/s. For illustration purposes, the 669 actual sending rate is also shown for every flow in FSE diagrams even 670 though it is not really stored in the FSE. 672 Flow #1 begins. It is a bulk data transfer and considers itself to 673 have top priority. This is the FSE after the flow algorithm's step 674 1: 676 ---------------------------------------- 677 | # | FGI | P | FSE_R | DR | Rate | 678 | | | | | | | 679 | 1 | 1 | 1 | 1 | 1 | 1 | 680 ---------------------------------------- 681 S_CR = 1, TLO = 0 682 Its congestion controller gradually increases its rate. Eventually, 683 at some point, the FSE should look like this: 685 -------------------------------------- 686 | # | FGI | P | FSE_R | DR | Rate | 687 | | | | | | | 688 | 1 | 1 | 1 | 10 | 10 | 10 | 689 ----------------------------------------- 690 S_CR = 10, TLO = 0 692 Now another flow joins. It is also a bulk data transfer, and has a 693 lower priority (0.5): 695 ---------------------------------------- 696 | # | FGI | P | FSE_R | DR | Rate | 697 | | | | | | | 698 | 1 | 1 | 1 | 10 | 10 | 10 | 699 | 2 | 1 | 0.5 | 1 | 1 | 1 | 700 ------------------------------------------ 701 S_CR = 11, TLO = 0 703 Now assume that the first flow updates its rate to 8, because the 704 total sending rate of 11 exceeds the total capacity. Let us take a 705 closer look at what happens in step 3 of the flow algorithm. 707 CC_R = 8. new_DR = infinity. 708 3 a) new_S_CR = 11; DELTA = 8 - 10 = -2. 709 3 b) FSE_Rf) = 8. DELTA is negative, hence S_CR = 9; 710 DR(f) = 8. 711 3 c) S_P = 1.5. 712 3 d) new sending rate = min(infinity, 1/1.5 * 9 + 0) = 6. 713 3 e) FSE_R(f) = 6. 715 The resulting FSE looks as follows: 716 ---------------------------------------- 717 | # | FGI | P | FSE_R | DR | Rate | 718 | | | | | | | 719 | 1 | 1 | 1 | 6 | 8 | 6 | 720 | 2 | 1 | 0.5 | 1 | 1 | 1 | 721 ------------------------------------------- 722 S_CR = 9, TLO = 0 723 The effect is that flow #1 is sending with 6 Mbit/s instead of the 8 724 Mbit/s that the congestion controller derived. Let us now assume 725 that flow #2 updates its rate. Its congestion controller detects 726 that the network is not fully saturated (the actual total sending 727 rate is 6+1=7) and increases its rate. 729 CC_R=2. new_DR = infinity. 730 3 a) new_S_CR = 7; DELTA = 2 - 1 = 1. 731 3 b) FSE_R(f) = 2. DELTA is positive, hence S_CR = 9 + 1 = 10; 732 DR(f) = 2. 733 3 c) S_P = 1.5. 734 3 d) new sending rate = min(infinity, 0.5/1.5 * 10 + 0) = 3.33. 735 3 e) DR(f) = FSE_R(f) = 3.33. 737 The resulting FSE looks as follows: 738 ------------------------------------------- 739 | # | FGI | P | FSE_R | DR | Rate | 740 | | | | | | | 741 | 1 | 1 | 1 | 6 | 8 | 6 | 742 | 2 | 1 | 0.5 | 3.33 | 3.33 | 3.33 | 743 ------------------------------------------- 744 S_CR = 10, TLO = 0 746 The effect is that flow #2 is now sending with 3.33 Mbit/s, which is 747 close to half of the rate of flow #1 and leads to a total utilization 748 of 6(#1) + 3.33(#2) = 9.33 Mbit/s. Flow #2's congestion controller 749 has increased its rate faster than the controller actually expected. 750 Now, flow #1 updates its rate. Its congestion controller detects 751 that the network is not fully saturated and increases its rate. 752 Additionally, the application feeding into flow #1 limits the flow's 753 sending rate to at most 2 Mbit/s. 755 CC_R=7. new_DR=2. 756 3 a) new_S_CR = 9.33; DELTA = 1. 757 3 b) FSE_R(f) = 7, DELTA is positive, hence S_CR = 10 + 1 = 11; 758 DR = min(2, 7) = 2. 759 3 c) S_P = 1.5; DR(f) < FSE_R(f), hence TLO = 1/1.5 * 11 - 2 = 5.33. 760 3 d) new sending rate = min(2, 1/1.5 * 11 + 5.33) = 2. 761 3 e) FSE_R(f) = 2. 763 The resulting FSE looks as follows: 764 ------------------------------------------- 765 | # | FGI | P | FSE_R | DR | Rate | 766 | | | | | | | 767 | 1 | 1 | 1 | 2 | 2 | 2 | 768 | 2 | 1 | 0.5 | 3.33 | 3.33 | 3.33 | 769 ------------------------------------------- 770 S_CR = 11, TLO = 5.33 772 Now, the total rate of the two flows is 2 + 3.33 = 5.33 Mbit/s, i.e. 773 the network is significantly underutilized due to the limitation of 774 flow #1. Flow #2 updates its rate. Its congestion controller 775 detects that the network is not fully saturated and increases its 776 rate. 778 CC_R=4.33. new_DR = infinity. 779 3 a) new_S_CR = 5.33; DELTA = 1. 780 3 b) FSE_R(f) = 4.33. DELTA is positive, hence S_CR = 12; 781 DR(f) = 4.33. 782 3 c) S_P = 1.5. 783 3 d) new sending rate: min(infinity, 0.5/1.5 * 12 + 5.33 ) = 9.33. 784 3 e) FSE_R(f) = 9.33, DR(f) = 9.33. 786 The resulting FSE looks as follows: 787 ------------------------------------------- 788 | # | FGI | P | FSE_R | DR | Rate | 789 | | | | | | | 790 | 1 | 1 | 1 | 2 | 2 | 2 | 791 | 2 | 1 | 0.5 | 9.33 | 9.33 | 9.33 | 792 ------------------------------------------- 793 S_CR = 12, TLO = 0 795 Now, the total rate of the two flows is 2 + 9.33 = 11.33 Mbit/s. 796 Finally, flow #1 terminates. It sets P to -1 and DR to 0. Let us 797 assume that it terminated late enough for flow #2 to still experience 798 the network in a congested state, i.e. flow #2 decreases its rate in 799 the next iteration. 801 CC_R = 7.33. new_DR = infinity. 802 3 a) new_S_CR = 11.33; DELTA = -2. 803 3 b) FSE_R(f) = 7.33. DELTA is negative, hence S_CR = 9.33; 804 DR(f) = 7.33. 805 3 c) Flow 1 has P = -1, hence it is deleted from the FSE. 806 S_P = 0.5. 807 3 d) new sending rate: min(infinity, 0.5/0.5*9.33 + 0) = 9.33. 808 3 e) FSE_R(f) = DR(f) = 9.33. 810 The resulting FSE looks as follows: 811 ------------------------------------------- 812 | # | FGI | P | FSE_R | DR | Rate | 813 | | | | | | | 814 | 2 | 1 | 0.5 | 9.33 | 9.33 | 9.33 | 815 ------------------------------------------- 816 S_CR = 9.33, TLO = 0 818 Appendix B. Change log 820 B.1. Changes from -00 to -01 822 o Added change log. 824 o Updated the example algorithm and its operation. 826 B.2. Changes from -01 to -02 828 o Included an active version of the algorithm which is simpler. 830 o Replaced "greedy flow" with "bulk data transfer" and "non-greedy" 831 with "application-limited". 833 o Updated new_CR to CC_R, and CR to FSE_R for better understanding. 835 B.3. Changes from -02 to -03 837 o Included an active conservative version of the algorithm which 838 reduces queue growth and packet loss; added a reference to a 839 technical report that shows these benefits with simulations. 841 o Moved the passive variant of the algorithm to appendix. 843 B.4. Changes from -03 to -04 845 o Extended SBD section. 847 o Added a note about window-based controllers. 849 B.5. Changes from -04 to -05 851 o Added a section about applying the FSE to specific congestion 852 control algorithms, with a subsection specifying its use with 853 NADA. 855 Authors' Addresses 857 Michael Welzl 858 University of Oslo 859 PO Box 1080 Blindern 860 Oslo, N-0316 861 Norway 863 Phone: +47 22 85 24 20 864 Email: michawe@ifi.uio.no 866 Safiqul Islam 867 University of Oslo 868 PO Box 1080 Blindern 869 Oslo, N-0316 870 Norway 872 Phone: +47 22 84 08 37 873 Email: safiquli@ifi.uio.no 875 Stein Gjessing 876 University of Oslo 877 PO Box 1080 Blindern 878 Oslo, N-0316 879 Norway 881 Phone: +47 22 85 24 44 882 Email: steing@ifi.uio.no