2 RTP Media Congestion Avoidance S. Islam 3 Techniques (rmcat) M. Welzl 4 Internet-Draft S. Gjessing 5 Intended status: Experimental University of Oslo 6 Expires: March 17, 2016 September 14, 2015 8 Coupled congestion control for RTP media 9 draft-ietf-rmcat-coupled-cc-00 11 Abstract 13 When multiple congestion controlled RTP sessions traverse the same 14 network bottleneck, it can be beneficial to combine their controls 15 such that the total on-the-wire behavior is improved. This document 16 describes such a method for flows that have the same sender, in a way 17 that is as flexible and simple as possible while minimizing the 18 amount of changes needed to existing RTP applications. It specifies 19 how to apply the method for the NADA congestion control algorithm. 21 Status of this Memo 23 This Internet-Draft is submitted in full conformance with the 24 provisions of BCP 78 and BCP 79. 26 Internet-Drafts are working documents of the Internet Engineering 27 Task Force (IETF). Note that other groups may also distribute 28 working documents as Internet-Drafts. The list of current Internet- 29 Drafts is at http://datatracker.ietf.org/drafts/current/. 31 Internet-Drafts are draft documents valid for a maximum of six months 32 and may be updated, replaced, or obsoleted by other documents at any 33 time. It is inappropriate to use Internet-Drafts as reference 34 material or to cite them other than as "work in progress." 36 This Internet-Draft will expire on March 17, 2016. 38 Copyright Notice 40 Copyright (c) 2015 IETF Trust and the persons identified as the 41 document authors. All rights reserved. 43 This document is subject to BCP 78 and the IETF Trust's Legal 44 Provisions Relating to IETF Documents 45 (http://trustee.ietf.org/license-info) in effect on the date of 46 publication of this document. Please review these documents 47 carefully, as they describe your rights and restrictions with respect 48 to this document. Code Components extracted from this document must 49 include Simplified BSD License text as described in Section 4.e of 50 the Trust Legal Provisions and are provided without warranty as 51 described in the Simplified BSD License. 53 Table of Contents 55 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . .
3 56 2. Definitions . . . . . . . . . . . . . . . . . . . . . . . . . 3 57 3. Limitations . . . . . . . . . . . . . . . . . . . . . . . . . 4 58 4. Architectural overview . . . . . . . . . . . . . . . . . . . . 5 59 5. Roles . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6 60 5.1. SBD . . . . . . . . . . . . . . . . . . . . . . . . . . . 6 61 5.2. FSE . . . . . . . . . . . . . . . . . . . . . . . . . . . 6 62 5.3. Flows . . . . . . . . . . . . . . . . . . . . . . . . . . 7 63 5.3.1. Example algorithm 1 - Active FSE . . . . . . . . . . . 7 64 5.3.2. Example algorithm 2 - Conservative Active FSE . . . . 8 65 6. Application . . . . . . . . . . . . . . . . . . . . . . . . . 10 66 6.1. NADA . . . . . . . . . . . . . . . . . . . . . . . . . . . 10 67 6.2. General recommendations . . . . . . . . . . . . . . . . . 10 68 7. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . . 11 69 8. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 11 70 9. Security Considerations . . . . . . . . . . . . . . . . . . . 11 71 10. References . . . . . . . . . . . . . . . . . . . . . . . . . . 11 72 10.1. Normative References . . . . . . . . . . . . . . . . . . . 11 73 10.2. Informative References . . . . . . . . . . . . . . . . . . 12 74 Appendix A. Scheduling . . . . . . . . . . . . . . . . . . . . . 13 75 Appendix B. Example algorithm - Passive FSE . . . . . . . . . . . 13 76 B.1. Example operation (passive) . . . . . . . . . . . . . . . 15 77 Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . . 20 79 1. Introduction 81 When there is enough data to send, a congestion controller must 82 increase its sending rate until the path's capacity has been reached; 83 depending on the controller, sometimes the rate is increased further, 84 until packets are ECN-marked or dropped. This process inevitably 85 creates undesirable queuing delay -- an effect that is amplified when 86 multiple congestion controlled connections traverse the same network 87 bottleneck. 89 The Congestion Manager (CM) [RFC3124] couples flows by providing a 90 single congestion controller. It is hard to implement because it 91 requires an additional congestion controller and removes all per- 92 connection congestion control functionality, which is quite a 93 significant change to existing RTP based applications. This document 94 presents a method to combine the behavior of congestion control 95 mechanisms that is easier to implement than the Congestion Manager 96 [RFC3124] and also requires less significant changes to existing RTP 97 based applications. It attempts to roughly approximate the CM 98 behavior by sharing information between existing congestion 99 controllers. It is able to honor user-specified priorities, which is 100 required by rtcweb [rtcweb-usecases]. 102 2. Definitions 104 The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", 105 "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this 106 document are to be interpreted as described in RFC 2119 [RFC2119]. 108 Available Bandwidth: 109 The available bandwidth is the nominal link capacity minus the 110 amount of traffic that traversed the link during a certain time 111 interval, divided by that time interval. 113 Bottleneck: 114 The first link with the smallest available bandwidth along the 115 path between a sender and receiver. 117 Flow: 118 A flow is the entity that congestion control is operating on. 
119 It could, for example, be a transport layer connection, an RTP 120 session, or a subsession that is multiplexed onto a single RTP 121 session together with other subsessions. 123 Flow Group Identifier (FGI): 124 A unique identifier for each subset of flows that is limited by 125 a common bottleneck. 127 Flow State Exchange (FSE): 128 The entity that maintains information that is exchanged between 129 flows. 131 Flow Group (FG): 132 A group of flows having the same FGI. 134 Shared Bottleneck Detection (SBD): 135 The entity that determines which flows traverse the same 136 bottleneck in the network, or the process of doing so. 138 3. Limitations 140 Sender-side only: 141 Coupled congestion control as described here only operates 142 inside a single host on the sender side. This is because, 143 irrespective of where the major decisions for congestion 144 control are taken, the sender of a flow needs to eventually 145 decide the transmission rate. Additionally, the necessary 146 information about how much data an application can currently 147 send on a flow is often only available at the sender side, 148 making the sender an obvious choice for placement of the 149 elements and mechanisms described here. 151 Shared bottlenecks do not change quickly: 152 As per the definition above, a bottleneck depends on cross 153 traffic, and since such traffic can heavily fluctuate, 154 bottlenecks can change at a high frequency (e.g., there can be 155 oscillation between two or more links). This means that, when 156 flows are partially routed along different paths, they may 157 quickly change between sharing and not sharing a bottleneck. 158 For simplicity, here it is assumed that a shared bottleneck is 159 valid for a time interval that is significantly longer than the 160 interval at which congestion controllers operate. Note that, 161 for the only SBD mechanism defined in this document 162 (multiplexing on the same five-tuple), the notion of a shared 163 bottleneck stays correct even in the presence of fast traffic 164 fluctuations: since all flows that are assumed to share a 165 bottleneck are routed in the same way, if the bottleneck 166 changes, it will still be shared. 168 4. Architectural overview 170 Figure 1 shows the elements of the architecture for coupled 171 congestion control: the Flow State Exchange (FSE), Shared Bottleneck 172 Detection (SBD) and Flows. The FSE is a storage element that can be 173 implemented in two ways: active and passive. In the active version, 174 it initiates communication with flows and SBD. However, in the 175 passive version, it does not actively initiate communication with 176 flows and SBD; its only active role is internal state maintenance 177 (e.g., an implementation could use soft state to remove a flow's data 178 after long periods of inactivity). Every time a flow's congestion 179 control mechanism would normally update its sending rate, the flow 180 instead updates information in the FSE and performs a query on the 181 FSE, leading to a sending rate that can be different from what the 182 congestion controller originally determined. Using information 183 about/from the currently active flows, SBD updates the FSE with the 184 correct Flow State Identifiers (FSIs). 186 ------- <--- Flow 1 187 | FSE | <--- Flow 2 .. 188 ------- <--- .. 
Flow N 189 ^ 190 | | 191 ------- | 192 | SBD | <-------| 193 ------- 195 Figure 1: Coupled congestion control architecture 197 Since everything shown in Figure 1 is assumed to operate on a single 198 host (the sender) only, this document only describes aspects that 199 have an influence on the resulting on-the-wire behavior. It does, 200 for instance, not define how many bits must be used to represent 201 FSIs, or in which way the entities communicate. Implementations can 202 take various forms: for instance, all the elements in the figure 203 could be implemented within a single application, thereby operating 204 on flows generated by that application only. Another alternative 205 could be to implement both the FSE and SBD together in a separate 206 process which different applications communicate with via some form 207 of Inter-Process Communication (IPC). Such an implementation would 208 extend the scope to flows generated by multiple applications. The 209 FSE and SBD could also be included in the Operating System kernel. 211 5. Roles 213 This section gives an overview of the roles of the elements of 214 coupled congestion control, and provides an example of how coupled 215 congestion control can operate. 217 5.1. SBD 219 SBD uses knowledge about the flows to determine which flows belong in 220 the same Flow Group (FG), and assigns FGIs accordingly. This 221 knowledge can be derived in three basic ways: 223 1. From multiplexing: it can be based on the simple assumption that 224 packets sharing the same five-tuple (IP source and destination 225 address, protocol, and transport layer port number pair) and 226 having the same Differentiated Services Code Point (DSCP) in the 227 IP header are typically treated in the same way along the path. 228 The latter method is the only one specified in this document: SBD 229 MAY consider all flows that use the same five-tuple and DSCP to 230 belong to the same FG. This classification applies to certain 231 tunnels, or RTP flows that are multiplexed over one transport 232 (cf. [transport-multiplex]). In one way or another, such 233 multiplexing will probably be recommended for use with rtcweb 234 [rtcweb-rtp-usage]. 236 2. Via configuration: e.g. by assuming that a common wireless uplink 237 is also a shared bottleneck. 239 3. From measurements: e.g. by considering correlations among 240 measured delay and loss as an indication of a shared bottleneck. 242 The methods above have some essential trade-offs: e.g., multiplexing 243 is a completely reliable measure, however it is limited in scope to 244 two end points (i.e., it cannot be applied to couple congestion 245 controllers of one sender talking to multiple receivers). A 246 measurement-based SBD mechanism is described in [sbd]. Measurements 247 can never be 100% reliable, in particular because they are based on 248 the past but applying coupled congestion control means to make an 249 assumption about the future; it is therefore recommended to implement 250 cautionary measures, e.g. by disabling coupled congestion control if 251 enabling it causes a significant increase in delay and/or packet 252 loss. Measurements also take time, which entails a certain delay for 253 turning on coupling (refer to [sbd] for details). 255 5.2. FSE 257 The FSE contains a list of all flows that have registered with it. 
258 For each flow, it stores the following: 260 o a unique flow number to identify the flow 262 o the FGI of the FG that it belongs to (based on the definitions in 263 this document, a flow has only one bottleneck, and can therefore 264 be in only one FG) 266 o a priority P, which here is assumed to be represented as a 267 floating point number in the range from 0.1 (unimportant) to 1 268 (very important). A negative value is used to indicate that a 269 flow has terminated 271 o The rate used by the flow in bits per second, FSE_R. 273 The FSE can operate on window-based as well as rate-based congestion 274 controllers (TEMPORARY NOTE: and probably -- not yet tested -- 275 combinations thereof, with calculations to convert from one to the 276 other). In case of a window-based controller, FSE_R is a window, and 277 all the text below should be considered to refer to windows, not 278 rates. 280 In the FSE, each FG contains one static variable S_CR which is meant 281 to be the sum of the calculated rates of all flows in the same FG 282 (including the flow itself). This value is used to calculate the 283 sending rate. 285 The information listed here is enough to implement the sample flow 286 algorithm given below. FSE implementations could easily be extended 287 to store, e.g., a flow's current sending rate for statistics 288 gathering or future potential optimizations. 290 5.3. Flows 292 Flows register themselves with SBD and FSE when they start, 293 deregister from the FSE when they stop, and carry out an UPDATE 294 function call every time their congestion controller calculates a new 295 sending rate. Via UPDATE, they provide the newly calculated rate and 296 optionally (if the algorithm supports it) the desired rate. The 297 desired rate is less than the calculated rate in case of application- 298 limited flows; otherwise, it is the same as the calculated rate. 300 Below, two example algorithms are described. While other algorithms 301 could be used instead, the same algorithm must be applied to all 302 flows. 304 5.3.1. Example algorithm 1 - Active FSE 306 This algorithm was designed to be the simplest possible method to 307 assign rates according to the priorities of flows. Simulation 308 results in [fse] indicate that it does not, however, significantly 309 reduce queuing delay and packet loss. 311 (1) When a flow f starts, it registers itself with SBD and the FSE. 312 FSE_R is initialized with the congestion controller's initial 313 rate. SBD will assign the correct FGI. When a flow is assigned 314 an FGI, it adds its FSE_R to S_CR. 316 (2) When a flow f stops, its entry is removed from the list. 318 (3) Every time the congestion controller of the flow f determines a 319 new sending rate CC_R, the flow calls UPDATE, which carries out 320 the tasks listed below to derive the new sending rates for all 321 the flows in the FG. A flow's UPDATE function uses a local 322 (i.e. per-flow) temporary variable S_P, which is the sum of all 323 the priorities. 325 (a) It updates S_CR. 327 S_CR = S_CR + CC_R - FSE_R(f) 329 (b) It calculates the sum of all the priorities, S_P. 331 S_P = 0 332 for all flows i in FG do 333 S_P = S_P + P(i) 334 end for 336 (c) It calculates the sending rates for all the flows in an FG 337 and distributes them. 339 for all flows i in FG do 340 FSE_R(i) = (P(i)*S_CR)/S_P 341 send FSE_R(i) to the flow i 342 end for
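The following non-normative sketch, written in Python purely for illustration, combines the per-flow state listed in Section 5.2 with the three steps of the Active FSE above. The class and attribute names (Flow, ActiveFSE, send_cb and so on) are invented for this sketch, and a simple callback stands in for whatever mechanism an implementation uses to "send FSE_R(i) to the flow i"; it is a sketch of the algorithm, not a definitive implementation.

   class Flow:
       def __init__(self, flow_id, priority, initial_rate, send_cb):
           self.flow_id = flow_id
           self.priority = priority    # P: 0.1 (unimportant) .. 1 (very important)
           self.fse_r = initial_rate   # FSE_R, in bits per second
           self.send_cb = send_cb      # used to "send FSE_R(i) to the flow i"

   class ActiveFSE:
       def __init__(self):
           self.groups = {}            # FGI -> {"s_cr": float, "flows": {id: Flow}}

       def register(self, fgi, flow):
           # Step (1): when a flow is assigned an FGI, its FSE_R is added to S_CR.
           fg = self.groups.setdefault(fgi, {"s_cr": 0.0, "flows": {}})
           fg["s_cr"] += flow.fse_r
           fg["flows"][flow.flow_id] = flow

       def deregister(self, fgi, flow_id):
           # Step (2): when a flow stops, its entry is removed from the list.
           self.groups[fgi]["flows"].pop(flow_id, None)

       def update(self, fgi, flow_id, cc_r):
           # Step (3): the flow's congestion controller determined a new rate CC_R.
           fg = self.groups[fgi]
           f = fg["flows"][flow_id]
           fg["s_cr"] += cc_r - f.fse_r                         # (a)
           s_p = sum(i.priority for i in fg["flows"].values())  # (b)
           for i in fg["flows"].values():                       # (c)
               i.fse_r = (i.priority * fg["s_cr"]) / s_p
               i.send_cb(i.fse_r)

   # Example: two flows with priorities 1 and 0.5 in the same FG:
   #   fse = ActiveFSE()
   #   fse.register(1, Flow(1, 1.0, 1e6, lambda r: None))
   #   fse.register(1, Flow(2, 0.5, 1e6, lambda r: None))
   #   fse.update(1, 1, 3e6)  # S_CR becomes 4 Mbit/s, split 2.67/1.33 Mbit/s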
344 5.3.2. Example algorithm 2 - Conservative Active FSE 346 This algorithm extends algorithm 1 to conservatively emulate the 347 behavior of a single flow by proportionally reducing the aggregate 348 rate on congestion. Simulation results in [fse] indicate that it 349 can significantly reduce queuing delay and packet loss. 351 (1) When a flow f starts, it registers itself with SBD and the FSE. 352 FSE_R is initialized with the congestion controller's initial 353 rate. SBD will assign the correct FGI. When a flow is assigned 354 an FGI, it adds its FSE_R to S_CR. 356 (2) When a flow f stops, its entry is removed from the list. 358 (3) Every time the congestion controller of the flow f determines a 359 new sending rate CC_R, the flow calls UPDATE, which carries out 360 the tasks listed below to derive the new sending rates for all 361 the flows in the FG. A flow's UPDATE function uses a local 362 (i.e. per-flow) temporary variable S_P, which is the sum of all 363 the priorities, and a local variable DELTA, which is used to 364 calculate the difference between CC_R and the previously stored 365 FSE_R. To prevent flows from either ignoring congestion or 366 overreacting, a timer keeps them from changing their rates 367 immediately after the common rate reduction that follows a 368 congestion event. This timer is set to 2 RTTs of the flow that 369 experienced congestion because it is assumed that a congestion 370 event can persist for up to one RTT of that flow, with another 371 RTT added to compensate for fluctuations in the measured RTT 372 value. 374 (a) It updates S_CR based on DELTA. 376 if Timer has expired or not set then 377 DELTA = CC_R - FSE_R(f) 378 if DELTA < 0 then // Reduce S_CR proportionally 379 S_CR = S_CR * CC_R / FSE_R(f) 380 Set Timer for 2 RTTs 381 else 382 S_CR = S_CR + DELTA 383 end if 384 end if 386 (b) It calculates the sum of all the priorities, S_P. 388 S_P = 0 389 for all flows i in FG do 390 S_P = S_P + P(i) 391 end for 393 (c) It calculates the sending rates for all the flows in an FG 394 and distributes them. 396 for all flows i in FG do 397 FSE_R(i) = (P(i)*S_CR)/S_P 398 send FSE_R(i) to the flow i 399 end for 401 6. Application 403 This section specifies how the FSE can be applied to specific 404 congestion control mechanisms and makes general recommendations that 405 facilitate applying the FSE to future congestion controls. 407 6.1. NADA 409 Network-Assisted Dynamic Adaptation (NADA) [nada] is a congestion 410 control scheme for rtcweb. It calculates a reference rate R_n upon 411 receiving an acknowledgment, and then, based on the reference rate, 412 it calculates a video target rate R_v and a sending rate for the 413 flows, R_s. 415 When applying the FSE to NADA, the UPDATE function call described in 416 Section 5.3 gives the FSE NADA's reference rate R_n. The recommended 417 algorithm for NADA is the Active FSE in Section 5.3.1. In step 3 418 (c), when the FSE_R(i) is "sent" to the flow i, this means updating 419 R_v and R_s of flow i with the value of FSE_R(i). 421 NADA simulation results are available from 422 http://heim.ifi.uio.no/safiquli/coupled-cc/. The next version of 423 this document will refer to a technical report that will be made 424 available at the same URL. 426 6.2. General recommendations 428 This section provides general advice for applying the FSE to 429 congestion control mechanisms. TEMPORARY NOTE: Future versions of 430 this document will contain a longer list.
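As a concrete illustration of the mapping in Section 6.1, the following non-normative sketch extends the Python sketch given after Section 5.3.1 (it assumes the Flow and ActiveFSE classes defined there). NadaFlow, apply_fse_rate and on_acknowledgment are hypothetical names; the only behavior taken from [nada] is that the controller calculates a reference rate R_n, from which the video target rate R_v and the sending rate R_s are then derived.

   class NadaFlow:
       def __init__(self, flow_id, priority, fse, fgi, initial_rate):
           self.fse, self.fgi, self.flow_id = fse, fgi, flow_id
           self.r_v = initial_rate     # video target rate R_v
           self.r_s = initial_rate     # sending rate R_s
           f = Flow(flow_id, priority, initial_rate, self.apply_fse_rate)
           fse.register(fgi, f)        # step (1) of the Active FSE

       def apply_fse_rate(self, fse_r):
           # Step 3 (c): "sending" FSE_R(i) to flow i means updating its
           # R_v and R_s with the value of FSE_R(i).
           self.r_v = fse_r
           self.r_s = fse_r

       def on_acknowledgment(self, r_n):
           # NADA calculates a new reference rate R_n upon receiving an
           # acknowledgment; the flow hands R_n to the FSE via UPDATE
           # instead of applying it directly.
           self.fse.update(self.fgi, self.flow_id, r_n)

   # Example: two coupled NADA-like flows in FG 1 with priorities 1 and 0.5:
   #   fse = ActiveFSE()
   #   f1 = NadaFlow(1, 1.0, fse, fgi=1, initial_rate=1e6)
   #   f2 = NadaFlow(2, 0.5, fse, fgi=1, initial_rate=1e6)
   #   f1.on_acknowledgment(3e6)  # f1.r_s ~ 2.67 Mbit/s, f2.r_s ~ 1.33 Mbit/s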
432 Receiver-side calculations: 433 When receiver-side calculations make assumptions about the rate 434 of the sender, the calculations need to be synchronized or the 435 receiver needs to be updated accordingly. This applies to TFRC 436 [RFC5348], for example, where simulations showed somewhat less 437 favorable results when using the FSE without a receiver-side 438 change [fse]. 440 7. Acknowledgements 442 This document has benefitted from discussions with and feedback from 443 David Hayes, Mirja Kuehlewind, Andreas Petlund, David Ros (who also 444 gave the FSE its name), Zaheduzzaman Sarker and Varun Singh. The 445 authors would like to thank Xiaoqing Zhu for helping with NADA. 447 This work was partially funded by the European Community under its 448 Seventh Framework Programme through the Reducing Internet Transport 449 Latency (RITE) project (ICT-317700). 451 8. IANA Considerations 453 This memo includes no request to IANA. 455 9. Security Considerations 457 In scenarios where the architecture described in this document is 458 applied across applications, various cheating possibilities arise: 459 e.g., supporting wrong values for the calculated rate, the desired 460 rate, or the priority of a flow. In the worst case, such cheating 461 could either prevent other flows from sending or make them send at a 462 rate that is unreasonably large. The end result would be unfair 463 behavior at the network bottleneck, akin to what could be achieved 464 with any UDP based application. Hence, since this is no worse than 465 UDP in general, there seems to be no significant harm in using this 466 in the absence of UDP rate limiters. 468 In the case of a single-user system, it should also be in the 469 interest of any application programmer to give the user the best 470 possible experience by using reasonable flow priorities or even 471 letting the user choose them. In a multi-user system, this interest 472 may not be given, and one could imagine the worst case of an "arms 473 race" situation, where applications end up setting their priorities 474 to the maximum value. If all applications do this, the end result is 475 a fair allocation in which the priority mechanism is implicitly 476 eliminated, and no major harm is done. 478 10. References 480 10.1. Normative References 482 [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate 483 Requirement Levels", BCP 14, RFC 2119, DOI 10.17487/ 484 RFC2119, March 1997, 485 . 487 [RFC3124] Balakrishnan, H. and S. Seshan, "The Congestion Manager", 488 RFC 3124, DOI 10.17487/RFC3124, June 2001, 489 . 491 [RFC5348] Floyd, S., Handley, M., Padhye, J., and J. Widmer, "TCP 492 Friendly Rate Control (TFRC): Protocol Specification", 493 RFC 5348, DOI 10.17487/RFC5348, September 2008, 494 . 496 10.2. Informative References 498 [fse] Islam, S., Welzl, M., Gjessing, S., and N. Khademi, 499 "Coupled Congestion Control for RTP Media", ACM SIGCOMM 500 Capacity Sharing Workshop (CSWS 2014); extended version 501 available as a technical report from 502 http://safiquli.at.ifi.uio.no/paper/fse-tech-report.pdf , 503 2014. 505 [nada] Zhu, X., Pan, R., Ramalho, M., Mena, S., Ganzhorn, C., 506 Jones, P., and S. De Aronco, "NADA: A Unified Congestion 507 Control Scheme for Real-Time Media", 508 draft-ietf-rmcat-nada-00 (work in progress), April 2015. 510 [rtcweb-rtp-usage] 511 Perkins, C., Westerlund, M., and J. Ott, "Web Real-Time 512 Communication (WebRTC): Media Transport and Use of RTP", 513 draft-ietf-rtcweb-rtp-usage-18.txt (work in progress), 514 October 2014. 
516 [rtcweb-usecases] 517 Holmberg, C., Hakansson, S., and G. Eriksson, "Web Real- 518 Time Communication Use-cases and Requirements", 519 draft-ietf-rtcweb-use-cases-and-requirements-14.txt (work 520 in progress), February 2014. 522 [sbd] Hayes, D., Ferlin, S., and M. Welzl, "Shared Bottleneck 523 Detection for Coupled Congestion Control for RTP Media", 524 draft-ietf-rmcat-sbd-00.txt (work in progress), May 2015. 526 [transport-multiplex] 527 Westerlund, M. and C. Perkins, "Multiple RTP Sessions on a 528 Single Lower-Layer Transport", 529 draft-westerlund-avtcore-transport-multiplexing-07.txt 530 (work in progress), October 2013. 532 Appendix A. Scheduling 534 When connections originate from the same host, it would be possible 535 to use only one single sender-side congestion controller which 536 determines the overall allowed sending rate, and then use a local 537 scheduler to assign a proportion of this rate to each RTP session. 538 This way, priorities could also be implemented as a function of the 539 scheduler. The Congestion Manager (CM) [RFC3124] also uses such a 540 scheduling function. 542 Appendix B. Example algorithm - Passive FSE 544 Active algorithms calculate the rates for all the flows in the FG and 545 actively distribute them. In a passive algorithm, UPDATE returns a 546 rate that should be used instead of the rate that the congestion 547 controller has determined. This can make a passive algorithm easier 548 to implement; however, when round-trip times of flows are unequal, 549 shorter-RTT flows will update and react to the overall FSE state more 550 often than longer-RTT flows, which can produce unwanted side effects. 551 This problem is more significant when the congestion control 552 convergence depends on the RTT. While the passive algorithm works 553 better for congestion controls with RTT-independent convergence, it 554 can still produce oscillations on short time scales. The algorithm 555 described below is therefore considered as highly experimental. 557 This passive version of the FSE stores the following information in 558 addition to the variables described in Section 5.2: 560 o The desired rate DR. This can be smaller than the calculated rate 561 if the application feeding into the flow has less data to send 562 than the congestion controller would allow. In case of a bulk 563 transfer, DR must be set to CC_R received from the flow's 564 congestion module. 566 The passive version of the FSE contains one static variable per FG 567 called TLO (Total Leftover Rate -- used to let a flow 'take' 568 bandwidth from application-limited or terminated flows) which is 569 initialized to 0. For the passive version, S_CR is limited to 570 increase or decrease as conservatively as a flow's congestion 571 controller decides in order to prohibit sudden rate jumps. 573 (1) When a flow f starts, it registers itself with SBD and the FSE. 574 FSE_R and DR are initialized with the congestion controller's 575 initial rate. SBD will assign the correct FGI. When a flow is 576 assigned an FGI, it adds its FSE_R to S_CR. 578 (2) When a flow f stops, it sets its DR to 0 and sets P to -1. 580 (3) Every time the congestion controller of the flow f determines a 581 new sending rate CC_R, assuming the flow's new desired rate 582 new_DR to be "infinity" in case of a bulk data transfer with an 583 unknown maximum rate, the flow calls UPDATE, which carries out 584 the tasks listed below to derive the flow's new sending rate, 585 Rate. A flow's UPDATE function uses a few local (i.e. 
per-flow) 586 temporary variables, which are all initialized to 0: DELTA, 587 new_S_CR and S_P. 589 (a) For all the flows in its FG (including itself), it 590 calculates the sum of all the calculated rates, new_S_CR. 591 Then it calculates the difference between CC_R and 592 FSE_R(f), DELTA. 594 for all flows i in FG do 595 new_S_CR = new_S_CR + FSE_R(i) 596 end for 597 DELTA = CC_R - FSE_R(f) 599 (b) It updates S_CR, FSE_R(f) and DR(f). 601 FSE_R(f) = CC_R 602 if DELTA > 0 then // the flow's rate has increased 603 S_CR = S_CR + DELTA 604 else if DELTA < 0 then 605 S_CR = new_S_CR + DELTA 606 end if 607 DR(f) = min(new_DR,FSE_R(f)) 609 (c) It calculates the leftover rate TLO, removes the terminated 610 flows from the FSE and calculates the sum of all the 611 priorities, S_P. 613 for all flows i in FG do 614 if P(i)<0 then 615 delete flow 616 else 617 S_P = S_P + P(i) 618 end if 619 end for 620 if DR(f) < FSE_R(f) then 621 TLO = TLO + (P(f)/S_P) * S_CR - DR(f) 622 end if 624 (d) It calculates the sending rate, Rate. 626 Rate = min(new_DR, (P(f)*S_CR)/S_P + TLO) 628 if Rate != new_DR and TLO > 0 then 629 TLO = 0 // f has 'taken' TLO 630 end if 632 (e) It updates DR(f) and FSE_R(f) with Rate. 634 if Rate > DR(f) then 635 DR(f) = Rate 636 end if 637 FSE_R(f) = Rate 639 The goals of the flow algorithm are to achieve prioritization, 640 improve network utilization in the face of application-limited flows, 641 and impose limits on the increase behavior such that the negative 642 impact of multiple flows trying to increase their rate together is 643 minimized. It does that by assigning a flow a sending rate that may 644 not be what the flow's congestion controller expected. It therefore 645 builds on the assumption that no significant inefficiencies arise 646 from temporary application-limited behavior or from quickly jumping 647 to a rate that is higher than the congestion controller intended. 648 How problematic these issues really are depends on the controllers in 649 use and requires careful per-controller experimentation. The coupled 650 congestion control mechanism described here also does not require all 651 controllers to be equal; effects of heterogeneous controllers, or 652 homogeneous controllers being in different states, are also subject 653 to experimentation. 655 This algorithm gives all the leftover rate of application-limited 656 flows to the first flow that updates its sending rate, provided that 657 this flow needs it all (otherwise, its own leftover rate can be taken 658 by the next flow that updates its rate). Other policies could be 659 applied, e.g. to divide the leftover rate of a flow equally among all 660 other flows in the FG. 662 B.1. Example operation (passive) 664 In order to illustrate the operation of the passive coupled 665 congestion control algorithm, this section presents a toy example of 666 two flows that use it. Let us assume that both flows traverse a 667 common 10 Mbit/s bottleneck and use a simplistic congestion 668 controller that starts out with 1 Mbit/s, increases its rate by 1 669 Mbit/s in the absence of congestion and decreases it by 2 Mbit/s in 670 the presence of congestion. For simplicity, flows are assumed to 671 always operate in a round-robin fashion. Rate numbers below without 672 units are assumed to be in Mbit/s. For illustration purposes, the 673 actual sending rate is also shown for every flow in FSE diagrams even 674 though it is not really stored in the FSE.
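The walkthrough below can also be reproduced with the following non-normative Python sketch of steps (1)-(3) above. It assumes a single FG; the PassiveFSE class, its dictionaries and the register/stop/update names are invented purely for illustration and are not part of this specification.

   import math

   class PassiveFSE:
       def __init__(self):
           self.flows = {}    # flow number -> {"P": .., "FSE_R": .., "DR": ..}
           self.S_CR = 0.0
           self.TLO = 0.0     # Total Leftover Rate

       def register(self, num, priority, initial_rate):
           # Step (1): FSE_R and DR start at the controller's initial rate.
           self.flows[num] = {"P": priority, "FSE_R": initial_rate,
                              "DR": initial_rate}
           self.S_CR += initial_rate

       def stop(self, num):
           # Step (2): a stopping flow sets DR to 0 and P to -1.
           self.flows[num]["DR"] = 0.0
           self.flows[num]["P"] = -1.0

       def update(self, num, cc_r, new_dr=math.inf):
           # Step (3): returns the rate the flow should use instead of CC_R.
           f = self.flows[num]
           new_s_cr = sum(i["FSE_R"] for i in self.flows.values())    # (a)
           delta = cc_r - f["FSE_R"]
           f["FSE_R"] = cc_r                                          # (b)
           if delta > 0:
               self.S_CR += delta
           elif delta < 0:
               self.S_CR = new_s_cr + delta
           f["DR"] = min(new_dr, f["FSE_R"])
           s_p = 0.0                                                  # (c)
           for n in list(self.flows):
               if self.flows[n]["P"] < 0:
                   del self.flows[n]
               else:
                   s_p += self.flows[n]["P"]
           if f["DR"] < f["FSE_R"]:
               self.TLO += (f["P"] / s_p) * self.S_CR - f["DR"]
           rate = min(new_dr, (f["P"] * self.S_CR) / s_p + self.TLO)  # (d)
           if rate != new_dr and self.TLO > 0:
               self.TLO = 0.0   # f has 'taken' the leftover rate
           f["DR"] = max(f["DR"], rate)                               # (e)
           f["FSE_R"] = rate
           return rate

   # Reproducing the first UPDATE of the walkthrough (flow #1 registered at
   # 10 Mbit/s for brevity, flow #2 at 1 Mbit/s):
   #   fse = PassiveFSE()
   #   fse.register(1, 1.0, 10.0)
   #   fse.register(2, 0.5, 1.0)
   #   fse.update(1, 8.0)  # returns 6, leaving FSE_R(1) = 6, DR(1) = 8, S_CR = 9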
676 Flow #1 begins. It is a bulk data transfer and considers itself to 677 have top priority. This is the FSE after the flow algorithm's step 678 1: 680 ------------------------------------------- 681 | # | FGI | P | FSE_R | DR | Rate | 682 | | | | | | | 683 | 1 | 1 | 1 | 1 | 1 | 1 | 684 ------------------------------------------- 685 S_CR = 1, TLO = 0 687 Its congestion controller gradually increases its rate. Eventually, 688 at some point, the FSE should look like this: 690 ------------------------------------------- 691 | # | FGI | P | FSE_R | DR | Rate | 692 | | | | | | | 693 | 1 | 1 | 1 | 10 | 10 | 10 | 694 ------------------------------------------- 695 S_CR = 10, TLO = 0 697 Now another flow joins. It is also a bulk data transfer, and has a 698 lower priority (0.5): 700 ------------------------------------------- 701 | # | FGI | P | FSE_R | DR | Rate | 702 | | | | | | | 703 | 1 | 1 | 1 | 10 | 10 | 10 | 704 | 2 | 1 | 0.5 | 1 | 1 | 1 | 705 ------------------------------------------- 706 S_CR = 11, TLO = 0 708 Now assume that the first flow updates its rate to 8, because the 709 total sending rate of 11 exceeds the total capacity. Let us take a 710 closer look at what happens in step 3 of the flow algorithm. 712 CC_R = 8. new_DR = infinity. 713 3 a) new_S_CR = 11; DELTA = 8 - 10 = -2. 714 3 b) FSE_R(f) = 8. DELTA is negative, hence S_CR = 9; 715 DR(f) = 8. 716 3 c) S_P = 1.5. 717 3 d) new sending rate = min(infinity, 1/1.5 * 9 + 0) = 6. 718 3 e) FSE_R(f) = 6. 720 The resulting FSE looks as follows: 721 ------------------------------------------- 722 | # | FGI | P | FSE_R | DR | Rate | 723 | | | | | | | 724 | 1 | 1 | 1 | 6 | 8 | 6 | 725 | 2 | 1 | 0.5 | 1 | 1 | 1 | 726 ------------------------------------------- 727 S_CR = 9, TLO = 0 729 The effect is that flow #1 is sending with 6 Mbit/s instead of the 8 730 Mbit/s that the congestion controller derived. Let us now assume 731 that flow #2 updates its rate. Its congestion controller detects 732 that the network is not fully saturated (the actual total sending 733 rate is 6+1=7) and increases its rate. 735 CC_R=2. new_DR = infinity. 736 3 a) new_S_CR = 7; DELTA = 2 - 1 = 1. 737 3 b) FSE_R(f) = 2. DELTA is positive, hence S_CR = 9 + 1 = 10; 738 DR(f) = 2. 739 3 c) S_P = 1.5. 740 3 d) new sending rate = min(infinity, 0.5/1.5 * 10 + 0) = 3.33. 741 3 e) DR(f) = FSE_R(f) = 3.33. 743 The resulting FSE looks as follows: 744 ------------------------------------------- 745 | # | FGI | P | FSE_R | DR | Rate | 746 | | | | | | | 747 | 1 | 1 | 1 | 6 | 8 | 6 | 748 | 2 | 1 | 0.5 | 3.33 | 3.33 | 3.33 | 749 ------------------------------------------- 750 S_CR = 10, TLO = 0 751 The effect is that flow #2 is now sending with 3.33 Mbit/s, which is 752 close to half of the rate of flow #1 and leads to a total utilization 753 of 6(#1) + 3.33(#2) = 9.33 Mbit/s. Flow #2's congestion controller 754 has increased its rate faster than the controller actually expected. 755 Now, flow #1 updates its rate. Its congestion controller detects 756 that the network is not fully saturated and increases its rate. 757 Additionally, the application feeding into flow #1 limits the flow's 758 sending rate to at most 2 Mbit/s. 760 CC_R=7. new_DR=2. 761 3 a) new_S_CR = 9.33; DELTA = 1. 762 3 b) FSE_R(f) = 7, DELTA is positive, hence S_CR = 10 + 1 = 11; 763 DR(f) = min(2, 7) = 2. 764 3 c) S_P = 1.5; DR(f) < FSE_R(f), hence TLO = 1/1.5 * 11 - 2 = 5.33. 765 3 d) new sending rate = min(2, 1/1.5 * 11 + 5.33) = 2. 766 3 e) FSE_R(f) = 2.
768 The resulting FSE looks as follows: 769 ------------------------------------------- 770 | # | FGI | P | FSE_R | DR | Rate | 771 | | | | | | | 772 | 1 | 1 | 1 | 2 | 2 | 2 | 773 | 2 | 1 | 0.5 | 3.33 | 3.33 | 3.33 | 774 ------------------------------------------- 775 S_CR = 11, TLO = 5.33 777 Now, the total rate of the two flows is 2 + 3.33 = 5.33 Mbit/s, i.e. 778 the network is significantly underutilized due to the limitation of 779 flow #1. Flow #2 updates its rate. Its congestion controller 780 detects that the network is not fully saturated and increases its 781 rate. 783 CC_R=4.33. new_DR = infinity. 784 3 a) new_S_CR = 5.33; DELTA = 1. 785 3 b) FSE_R(f) = 4.33. DELTA is positive, hence S_CR = 12; 786 DR(f) = 4.33. 787 3 c) S_P = 1.5. 788 3 d) new sending rate: min(infinity, 0.5/1.5 * 12 + 5.33 ) = 9.33. 789 3 e) FSE_R(f) = 9.33, DR(f) = 9.33. 791 The resulting FSE looks as follows: 792 ------------------------------------------- 793 | # | FGI | P | FSE_R | DR | Rate | 794 | | | | | | | 795 | 1 | 1 | 1 | 2 | 2 | 2 | 796 | 2 | 1 | 0.5 | 9.33 | 9.33 | 9.33 | 797 ------------------------------------------- 798 S_CR = 12, TLO = 0 800 Now, the total rate of the two flows is 2 + 9.33 = 11.33 Mbit/s. 801 Finally, flow #1 terminates. It sets P to -1 and DR to 0. Let us 802 assume that it terminated late enough for flow #2 to still experience 803 the network in a congested state, i.e. flow #2 decreases its rate in 804 the next iteration. 806 CC_R = 7.33. new_DR = infinity. 807 3 a) new_S_CR = 11.33; DELTA = -2. 808 3 b) FSE_R(f) = 7.33. DELTA is negative, hence S_CR = 9.33; 809 DR(f) = 7.33. 810 3 c) Flow 1 has P = -1, hence it is deleted from the FSE. 811 S_P = 0.5. 812 3 d) new sending rate: min(infinity, 0.5/0.5*9.33 + 0) = 9.33. 813 3 e) FSE_R(f) = DR(f) = 9.33. 815 The resulting FSE looks as follows: 816 ------------------------------------------- 817 | # | FGI | P | FSE_R | DR | Rate | 818 | | | | | | | 819 | 2 | 1 | 0.5 | 9.33 | 9.33 | 9.33 | 820 ------------------------------------------- 821 S_CR = 9.33, TLO = 0 823 Authors' Addresses 825 Safiqul Islam 826 University of Oslo 827 PO Box 1080 Blindern 828 Oslo, N-0316 829 Norway 831 Phone: +47 22 84 08 37 832 Email: safiquli@ifi.uio.no 834 Michael Welzl 835 University of Oslo 836 PO Box 1080 Blindern 837 Oslo, N-0316 838 Norway 840 Phone: +47 22 85 24 20 841 Email: michawe@ifi.uio.no 843 Stein Gjessing 844 University of Oslo 845 PO Box 1080 Blindern 846 Oslo, N-0316 847 Norway 849 Phone: +47 22 85 24 44 850 Email: steing@ifi.uio.no