RTP Media Congestion Avoidance                                  M. Welzl
Techniques (rmcat)                                              S. Islam
Internet-Draft                                               S. Gjessing
Intended status: Experimental                        University of Oslo
Expires: April 27, 2015                                 October 24, 2014

               Coupled congestion control for RTP media
                    draft-welzl-rmcat-coupled-cc-04

Abstract

   When multiple congestion controlled RTP sessions traverse the same
   network bottleneck, it can be beneficial to combine their controls
   such that the total on-the-wire behavior is improved.  This document
   describes such a method for flows that have the same sender, in a
   way that is as flexible and simple as possible while minimizing the
   amount of changes needed to existing RTP applications.

Status of this Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current
   Internet-Drafts is at http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six
   months and may be updated, replaced, or obsoleted by other documents
   at any time.  It is inappropriate to use Internet-Drafts as
   reference material or to cite them other than as "work in progress."

   This Internet-Draft will expire on April 27, 2015.

Copyright Notice

   Copyright (c) 2014 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with
   respect to this document.  Code Components extracted from this
   document must include Simplified BSD License text as described in
   Section 4.e of the Trust Legal Provisions and are provided without
   warranty as described in the Simplified BSD License.

Table of Contents

   1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 3
   2. Definitions . . . . . . . . . . . .
. . . . . . . . . . . . . 3 56 3. Limitations . . . . . . . . . . . . . . . . . . . . . . . . . 4 57 4. Architectural overview . . . . . . . . . . . . . . . . . . . . 5 58 5. Roles . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6 59 5.1. SBD . . . . . . . . . . . . . . . . . . . . . . . . . . . 6 60 5.2. FSE . . . . . . . . . . . . . . . . . . . . . . . . . . . 7 61 5.3. Flows . . . . . . . . . . . . . . . . . . . . . . . . . . 7 62 5.3.1. Example algorithm 1 - Active FSE . . . . . . . . . . . 8 63 5.3.2. Example algorithm 2 - Conservative Active FSE . . . . 9 64 6. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . . 10 65 7. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 10 66 8. Security Considerations . . . . . . . . . . . . . . . . . . . 10 67 9. References . . . . . . . . . . . . . . . . . . . . . . . . . . 11 68 9.1. Normative References . . . . . . . . . . . . . . . . . . . 11 69 9.2. Informative References . . . . . . . . . . . . . . . . . . 11 70 Appendix A. Example algorithm - Passive FSE . . . . . . . . . . . 12 71 A.1. Example operation (passive) . . . . . . . . . . . . . . . 14 72 Appendix B. Change log . . . . . . . . . . . . . . . . . . . . . 19 73 B.1. Changes from -00 to -01 . . . . . . . . . . . . . . . . . 19 74 B.2. Changes from -01 to -02 . . . . . . . . . . . . . . . . . 19 75 B.3. Changes from -02 to -03 . . . . . . . . . . . . . . . . . 19 76 B.4. Changes from -03 to -04 . . . . . . . . . . . . . . . . . 19 77 Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . . 19 79 1. Introduction 81 When there is enough data to send, a congestion controller must 82 increase its sending rate until the path's capacity has been reached; 83 depending on the controller, sometimes the rate is increased further, 84 until packets are ECN-marked or dropped. This process inevitably 85 creates undesirable queuing delay -- an effect that is amplified when 86 multiple congestion controlled connections traverse the same network 87 bottleneck. When such connections originate from the same host, it 88 would therefore be ideal to use only one single sender-side 89 congestion controller which determines the overall allowed sending 90 rate, and then use a local scheduler to assign a proportion of this 91 rate to each RTP session. This way, priorities could also be 92 implemented quite easily, as a function of the scheduler; honoring 93 user-specified priorities is, for example, required by rtcweb 94 [rtcweb-usecases]. 96 The Congestion Manager (CM) [RFC3124] provides a single congestion 97 controller with a scheduling function just as described above. It is 98 hard to implement because it requires an additional congestion 99 controller and removes all per-connection congestion control 100 functionality, which is quite a significant change to existing RTP 101 based applications. This document presents a method that is easier 102 to implement than the CM and also requires less significant changes 103 to existing RTP based applications. It attempts to roughly 104 approximate the CM behavior by sharing information between existing 105 congestion controllers, akin to "Ensemble Sharing" in [RFC2140]. 107 2. Definitions 109 The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", 110 "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this 111 document are to be interpreted as described in RFC 2119 [RFC2119]. 
113 Available Bandwidth: 114 The available bandwidth is the nominal link capacity minus the 115 amount of traffic that traversed the link during a certain time 116 interval, divided by that time interval. 118 Bottleneck: 119 The first link with the smallest available bandwidth along the 120 path between a sender and receiver. 122 Flow: 123 A flow is the entity that congestion control is operating on. 124 It could, for example, be a transport layer connection, an RTP 125 session, or a subsession that is multiplexed onto a single RTP 126 session together with other subsessions. 128 Flow Group Identifier (FGI): 129 A unique identifier for each subset of flows that is limited by 130 a common bottleneck. 132 Flow State Exchange (FSE): 133 The entity that maintains information that is exchanged between 134 flows. 136 Flow Group (FG): 137 A group of flows having the same FGI. 139 Shared Bottleneck Detection (SBD): 140 The entity that determines which flows traverse the same 141 bottleneck in the network, or the process of doing so. 143 3. Limitations 145 Sender-side only: 146 Coupled congestion control as described here only operates 147 inside a single host on the sender side. This is because, 148 irrespective of where the major decisions for congestion 149 control are taken, the sender of a flow needs to eventually 150 decide the transmission rate. Additionally, the necessary 151 information about how much data an application can currently 152 send on a flow is typically only available at the sender side, 153 making the sender an obvious choice for placement of the 154 elements and mechanisms described here. 156 When implementing a sender-side change to a congestion control 157 mechanism such as TFRC [RFC5348], where receiver-side 158 calculations make assumptions about the rate of the sender, the 159 receiver also needs to be updated accordingly. Flows that have 160 different senders but the same receiver, or different senders 161 and different receivers can also share a bottleneck; such 162 scenarios have been omitted for simplicity, and could be 163 incorporated in future versions of this document. Note that 164 limiting the number of flows on which coupled congestion 165 control operates merely limits the benefits derived from the 166 mechanism. 168 Shared bottlenecks do not change quickly: 169 As per the definition above, a bottleneck depends on cross 170 traffic, and since such traffic can heavily fluctuate, 171 bottlenecks can change at a high frequency (e.g., there can be 172 oscillation between two or more links). This means that, when 173 flows are partially routed along different paths, they may 174 quickly change between sharing and not sharing a bottleneck. 175 For simplicity, here it is assumed that a shared bottleneck is 176 valid for a time interval that is significantly longer than the 177 interval at which congestion controllers operate. Note that, 178 for the only SBD mechanism defined in this document 179 (multiplexing on the same five-tuple), the notion of a shared 180 bottleneck stays correct even in the presence of fast traffic 181 fluctuations: since all flows that are assumed to share a 182 bottleneck are routed in the same way, if the bottleneck 183 changes, it will still be shared. 185 4. Architectural overview 187 Figure 1 shows the elements of the architecture for coupled 188 congestion control: the Flow State Exchange (FSE), Shared Bottleneck 189 Detection (SBD) and Flows. The FSE is a storage element that can be 190 implemented in two ways: active and passive. 
   In the active version, it initiates communication with flows and
   SBD.  In the passive version, it does not actively initiate
   communication; its only active role is internal state maintenance
   (e.g., an implementation could use soft state to remove a flow's
   data after long periods of inactivity).  Every time a flow's
   congestion control mechanism would normally update its sending
   rate, the flow instead updates information in the FSE and performs
   a query on the FSE, leading to a sending rate that can be different
   from what the congestion controller originally determined.  Using
   information about/from the currently active flows, SBD updates the
   FSE with the correct Flow Group Identifiers (FGIs).

             -------  <--- Flow 1
             | FSE |  <--- Flow 2 ..
             -------  <--- .. Flow N
                ^
                |           |
             -------        |
             | SBD | <------|
             -------

          Figure 1: Coupled congestion control architecture

   Since everything shown in Figure 1 is assumed to operate on a
   single host (the sender) only, this document only describes aspects
   that have an influence on the resulting on-the-wire behavior.  It
   does, for instance, not define how many bits must be used to
   represent FGIs, or in which way the entities communicate.
   Implementations can take various forms: for instance, all the
   elements in the figure could be implemented within a single
   application, thereby operating on flows generated by that
   application only.  Another alternative could be to implement both
   the FSE and SBD together in a separate process which different
   applications communicate with via some form of Inter-Process
   Communication (IPC).  Such an implementation would extend the scope
   to flows generated by multiple applications.  The FSE and SBD could
   also be included in the Operating System kernel.

5.  Roles

   This section gives an overview of the roles of the elements of
   coupled congestion control, and provides an example of how coupled
   congestion control can operate.

5.1.  SBD

   SBD uses knowledge about the flows to determine which flows belong
   in the same Flow Group (FG), and assigns FGIs accordingly.  This
   knowledge can be derived in three basic ways:

   1.  From multiplexing: it can be based on the simple assumption
       that packets sharing the same five-tuple (IP source and
       destination address, protocol, and transport layer port number
       pair) and having the same Differentiated Services Code Point
       (DSCP) in the IP header are typically treated in the same way
       along the path.  This method is the only one specified in this
       document: SBD MAY consider all flows that use the same five-
       tuple and DSCP to belong to the same FG.  This classification
       applies to certain tunnels, or RTP flows that are multiplexed
       over one transport (cf. [transport-multiplex]).  In one way or
       another, such multiplexing will probably be recommended for use
       with rtcweb [rtcweb-rtp-usage].

   2.  Via configuration: e.g., by assuming that a common wireless
       uplink is also a shared bottleneck.

   3.  From measurements: e.g., by considering correlations among
       measured delay and loss as an indication of a shared
       bottleneck.

   The methods above have some essential trade-offs: e.g.,
   multiplexing is a completely reliable measure; however, it is
   limited in scope to two end points (i.e., it cannot be applied to
   couple congestion controllers of one sender talking to multiple
   receivers).
   A measurement-based SBD mechanism is described in [sbd].
   Measurements can never be 100% reliable, in particular because they
   are based on the past, whereas applying coupled congestion control
   means making an assumption about the future; it is therefore
   recommended to implement cautionary measures, e.g., by disabling
   coupled congestion control if enabling it causes a significant
   increase in delay and/or packet loss.  Measurements also take time,
   which entails a certain delay for turning on coupling (refer to
   [sbd] for details).

5.2.  FSE

   The FSE contains a list of all flows that have registered with it.
   For each flow, it stores the following:

   o  a unique flow number to identify the flow

   o  the FGI of the FG that it belongs to (based on the definitions
      in this document, a flow has only one bottleneck, and can
      therefore be in only one FG)

   o  a priority P, which here is assumed to be represented as a
      floating point number in the range from 0.1 (unimportant) to 1
      (very important).  A negative value is used to indicate that a
      flow has terminated

   o  the rate used by the flow in bits per second, FSE_R

   The FSE can operate on window-based as well as rate-based
   congestion controllers (TEMPORARY NOTE: and probably -- not yet
   tested -- combinations thereof, with calculations to convert from
   one to the other).  In case of a window-based controller, FSE_R is
   a window, and all the text below should be considered to refer to
   windows, not rates.

   In the FSE, each FG contains one static variable S_CR which is
   meant to be the sum of the calculated rates of all flows in the
   same FG (including the flow itself).  This value is used to
   calculate the sending rate.

   The information listed here is enough to implement the sample flow
   algorithm given below.  FSE implementations could easily be
   extended to store, e.g., a flow's current sending rate for
   statistics gathering or future potential optimizations.

5.3.  Flows

   Flows register themselves with SBD and the FSE when they start,
   deregister from the FSE when they stop, and carry out an UPDATE
   function call every time their congestion controller calculates a
   new sending rate.  Via UPDATE, they provide the newly calculated
   rate and the desired rate (less than the calculated rate in case of
   application-limited flows, the same otherwise).

   Below, two example algorithms are described.  While other
   algorithms could be used instead, the same algorithm must be
   applied to all flows.

5.3.1.  Example algorithm 1 - Active FSE

   This algorithm was designed to be the simplest possible method to
   assign rates according to the priorities of flows.  Simulation
   results in [fse] indicate, however, that it does not significantly
   reduce queuing delay and packet loss.

   (1)  When a flow f starts, it registers itself with SBD and the
        FSE.  FSE_R is initialized with the congestion controller's
        initial rate.  SBD will assign the correct FGI.  When a flow
        is assigned an FGI, it adds its FSE_R to S_CR.

   (2)  When a flow f stops, its entry is removed from the list.

   (3)  Every time the congestion controller of the flow f determines
        a new sending rate CC_R, the flow calls UPDATE, which carries
        out the tasks listed below to derive the new sending rates for
        all the flows in the FG.  A flow's UPDATE function uses a
        local (i.e., per-flow) temporary variable S_P, which is the
        sum of all the priorities.
        (a)  It updates S_CR.

             S_CR = S_CR + CC_R - FSE_R(f)

        (b)  It calculates the sum of all the priorities, S_P.

             S_P = 0
             for all flows i in FG do
                 S_P = S_P + P(i)
             end for

        (c)  It calculates the sending rates for all the flows in the
             FG and distributes them.

             for all flows i in FG do
                 FSE_R(i) = (P(i)*S_CR)/S_P
                 send FSE_R(i) to the flow i
             end for

5.3.2.  Example algorithm 2 - Conservative Active FSE

   This algorithm extends algorithm 1 to conservatively emulate the
   behavior of a single flow by proportionally reducing the aggregate
   rate on congestion.  Simulation results in [fse] indicate that it
   can significantly reduce queuing delay and packet loss.

   (1)  When a flow f starts, it registers itself with SBD and the
        FSE.  FSE_R is initialized with the congestion controller's
        initial rate.  SBD will assign the correct FGI.  When a flow
        is assigned an FGI, it adds its FSE_R to S_CR.

   (2)  When a flow f stops, its entry is removed from the list.

   (3)  Every time the congestion controller of the flow f determines
        a new sending rate CC_R, the flow calls UPDATE, which carries
        out the tasks listed below to derive the new sending rates for
        all the flows in the FG.  A flow's UPDATE function uses a
        local (i.e., per-flow) temporary variable S_P, which is the
        sum of all the priorities, and a local variable DELTA, which
        is used to calculate the difference between CC_R and the
        previously stored FSE_R.  To prevent flows from either
        ignoring congestion or overreacting, a timer keeps them from
        changing their rates immediately after the common rate
        reduction that follows a congestion event.  This timer is set
        to 2 RTTs of the flow that experienced congestion because it
        is assumed that a congestion event can persist for up to one
        RTT of that flow, with another RTT added to compensate for
        fluctuations in the measured RTT value.

        (a)  It updates S_CR based on DELTA.

             if Timer has expired or is not set then
                 DELTA = CC_R - FSE_R(f)
                 if DELTA < 0 then  // Reduce S_CR proportionally
                     S_CR = S_CR * CC_R / FSE_R(f)
                     Set Timer for 2 RTTs
                 else
                     S_CR = S_CR + DELTA
                 end if
             end if

        (b)  It calculates the sum of all the priorities, S_P.

             S_P = 0
             for all flows i in FG do
                 S_P = S_P + P(i)
             end for

        (c)  It calculates the sending rates for all the flows in the
             FG and distributes them.

             for all flows i in FG do
                 FSE_R(i) = (P(i)*S_CR)/S_P
                 send FSE_R(i) to the flow i
             end for

6.  Acknowledgements

   This document has benefitted from discussions with and feedback
   from David Hayes, Andreas Petlund, and David Ros (who also gave the
   FSE its name).

   This work was partially funded by the European Community under its
   Seventh Framework Programme through the Reducing Internet Transport
   Latency (RITE) project (ICT-317700).

7.  IANA Considerations

   This memo includes no request to IANA.

8.  Security Considerations

   In scenarios where the architecture described in this document is
   applied across applications, various cheating possibilities arise:
   e.g., reporting wrong values for the calculated rate, the desired
   rate, or the priority of a flow.  In the worst case, such cheating
   could either prevent other flows from sending or make them send at
   a rate that is unreasonably large.
The end result would be unfair 439 behavior at the network bottleneck, akin to what could be achieved 440 with any UDP based application. Hence, since this is no worse than 441 UDP in general, there seems to be no significant harm in using this 442 in the absence of UDP rate limiters. 444 In the case of a single-user system, it should also be in the 445 interest of any application programmer to give the user the best 446 possible experience by using reasonable flow priorities or even 447 letting the user choose them. In a multi-user system, this interest 448 may not be given, and one could imagine the worst case of an "arms 449 race" situation, where applications end up setting their priorities 450 to the maximum value. If all applications do this, the end result is 451 a fair allocation in which the priority mechanism is implicitly 452 eliminated, and no major harm is done. 454 9. References 456 9.1. Normative References 458 [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate 459 Requirement Levels", BCP 14, RFC 2119, March 1997. 461 [RFC2140] Touch, J., "TCP Control Block Interdependence", RFC 2140, 462 April 1997. 464 [RFC3124] Balakrishnan, H. and S. Seshan, "The Congestion Manager", 465 RFC 3124, June 2001. 467 [RFC5348] Floyd, S., Handley, M., Padhye, J., and J. Widmer, "TCP 468 Friendly Rate Control (TFRC): Protocol Specification", 469 RFC 5348, September 2008. 471 9.2. Informative References 473 [fse] Islam, S., Welzl, M., Gjessing, S., and N. Khademi, 474 "Coupled Congestion Control for RTP Media", ACM SIGCOMM 475 Capacity Sharing Workshop (CSWS 2014); extended version 476 available as a technical report from 477 http://safiquli.at.ifi.uio.no/paper/fse-tech-report.pdf , 478 2014. 480 [rtcweb-rtp-usage] 481 Perkins, C., Westerlund, M., and J. Ott, "Web Real-Time 482 Communication (WebRTC): Media Transport and Use of RTP", 483 draft-ietf-rtcweb-rtp-usage-18.txt (work in progress), 484 October 2014. 486 [rtcweb-usecases] 487 Holmberg, C., Hakansson, S., and G. Eriksson, "Web Real- 488 Time Communication Use-cases and Requirements", 489 draft-ietf-rtcweb-use-cases-and-requirements-14.txt (work 490 in progress), February 2014. 492 [sbd] Hayes, D., Ferlin, S., and M. Welzl, "Shared Bottleneck 493 Detection for Coupled Congestion Control for RTP Media", 494 draft-hayes-rmcat-sbd-00.txt (work in progress), 495 October 2014. 497 [transport-multiplex] 498 Westerlund, M. and C. Perkins, "Multiple RTP Sessions on a 499 Single Lower-Layer Transport", 500 draft-westerlund-avtcore-transport-multiplexing-07.txt 501 (work in progress), October 2013. 503 Appendix A. Example algorithm - Passive FSE 505 Active algorithms calculate the rates for all the flows in the FG and 506 actively distribute them. In a passive algorithm, UPDATE returns a 507 rate that should be used instead of the rate that the congestion 508 controller has determined. This can make a passive algorithm easier 509 to implement; however, the resulting dynamics are not fully 510 understood. The algorithm described below is to be considered as 511 highly experimental and did not perform as well as the active 512 variants in simulations. 514 This passive version of the FSE stores the following information in 515 addition to the variables described in Section 5.2: 517 o The desired rate DR. This can be smaller than the calculated rate 518 if the application feeding into the flow has less data to send 519 than the congestion controller would allow. 
   In case of a bulk transfer, DR must be set to CC_R received from
   the flow's congestion module.

   The passive version of the FSE contains one static variable per FG
   called TLO (Total Leftover Rate -- used to let a flow 'take'
   bandwidth from application-limited or terminated flows) which is
   initialized to 0.  For the passive version, S_CR is limited to
   increase or decrease as conservatively as a flow's congestion
   controller decides in order to prohibit sudden rate jumps.

   (1)  When a flow f starts, it registers itself with SBD and the
        FSE.  FSE_R and DR are initialized with the congestion
        controller's initial rate.  SBD will assign the correct FGI.
        When a flow is assigned an FGI, it adds its FSE_R to S_CR.

   (2)  When a flow f stops, it sets its DR to 0 and sets P to -1.

   (3)  Every time the congestion controller of the flow f determines
        a new sending rate CC_R, assuming the flow's new desired rate
        new_DR to be "infinity" in case of a bulk data transfer with
        an unknown maximum rate, the flow calls UPDATE, which carries
        out the tasks listed below to derive the flow's new sending
        rate, Rate.  A flow's UPDATE function uses a few local (i.e.,
        per-flow) temporary variables, which are all initialized to 0:
        DELTA, new_S_CR and S_P.

        (a)  For all the flows in its FG (including itself), it
             calculates the sum of all the calculated rates, new_S_CR.
             Then it calculates the difference between CC_R and
             FSE_R(f), DELTA.

             for all flows i in FG do
                 new_S_CR = new_S_CR + FSE_R(i)
             end for
             DELTA = CC_R - FSE_R(f)

        (b)  It updates S_CR, FSE_R(f) and DR(f).

             FSE_R(f) = CC_R
             if DELTA > 0 then  // the flow's rate has increased
                 S_CR = S_CR + DELTA
             else if DELTA < 0 then
                 S_CR = new_S_CR + DELTA
             end if
             DR(f) = min(new_DR, FSE_R(f))

        (c)  It calculates the leftover rate TLO, removes the
             terminated flows from the FSE and calculates the sum of
             all the priorities, S_P.

             for all flows i in FG do
                 if P(i) < 0 then
                     delete flow
                 else
                     S_P = S_P + P(i)
                 end if
             end for
             if DR(f) < FSE_R(f) then
                 TLO = TLO + ((P(f)/S_P) * S_CR - DR(f))
             end if

        (d)  It calculates the sending rate, Rate.

             Rate = min(new_DR, (P(f)*S_CR)/S_P + TLO)

             if Rate != new_DR and TLO > 0 then
                 TLO = 0  // f has 'taken' TLO
             end if

        (e)  It updates DR(f) and FSE_R(f) with Rate.

             if Rate > DR(f) then
                 DR(f) = Rate
             end if
             FSE_R(f) = Rate

   The goals of the flow algorithm are to achieve prioritization,
   improve network utilization in the face of application-limited
   flows, and impose limits on the increase behavior such that the
   negative impact of multiple flows trying to increase their rate
   together is minimized.  It does that by assigning a flow a sending
   rate that may not be what the flow's congestion controller
   expected.  It therefore builds on the assumption that no
   significant inefficiencies arise from temporary application-limited
   behavior or from quickly jumping to a rate that is higher than the
   congestion controller intended.  How problematic these issues
   really are depends on the controllers in use and requires careful
   per-controller experimentation.  The coupled congestion control
   mechanism described here also does not require all controllers to
   be equal; effects of heterogeneous controllers, or homogeneous
   controllers being in different states, are also subject to
   experimentation.
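   To make the preceding steps easier to follow, the sketch below
   models the passive FSE and its UPDATE function in Python, following
   steps (a)-(e) above.  It is purely illustrative and not part of the
   specification: the names Flow, PassiveFSE, register, stop and
   update are invented for this example, a single FG is assumed, and
   SBD, the congestion controllers themselves and any inter-process
   communication are omitted.

      # Illustrative sketch only (not part of this specification).
      INFINITY = float("inf")

      class Flow:
          def __init__(self, number, fgi, priority, initial_rate):
              self.number = number
              self.fgi = fgi
              self.P = priority          # priority; -1 marks termination
              self.FSE_R = initial_rate  # rate stored in the FSE (bits/s)
              self.DR = initial_rate     # desired rate (bits/s)

          def stop(self):
              # Step (2): a stopping flow sets its DR to 0 and P to -1.
              self.DR = 0.0
              self.P = -1

      class PassiveFSE:
          def __init__(self):
              self.flows = []   # all registered flows (one FG assumed)
              self.S_CR = 0.0   # sum of calculated rates in the FG
              self.TLO = 0.0    # total leftover rate

          def register(self, flow):
              # Step (1): a starting flow adds its initial FSE_R to S_CR.
              self.flows.append(flow)
              self.S_CR += flow.FSE_R

          def update(self, f, CC_R, new_DR=INFINITY):
              # Step (3): called whenever f's congestion controller
              # derives a new rate CC_R; new_DR is "infinity" for a
              # bulk transfer.  Assumes at least one active flow.
              # (a) sum of stored rates; difference CC_R - FSE_R(f)
              new_S_CR = sum(i.FSE_R for i in self.flows)
              DELTA = CC_R - f.FSE_R
              # (b) update S_CR, FSE_R(f) and DR(f)
              f.FSE_R = CC_R
              if DELTA > 0:
                  self.S_CR += DELTA
              elif DELTA < 0:
                  self.S_CR = new_S_CR + DELTA
              f.DR = min(new_DR, f.FSE_R)
              # (c) drop terminated flows, sum priorities, update TLO
              self.flows = [i for i in self.flows if i.P >= 0]
              S_P = sum(i.P for i in self.flows)
              if f.DR < f.FSE_R:
                  self.TLO += (f.P / S_P) * self.S_CR - f.DR
              # (d) the rate the flow should actually use
              rate = min(new_DR, (f.P * self.S_CR) / S_P + self.TLO)
              if rate != new_DR and self.TLO > 0:
                  self.TLO = 0.0  # f has 'taken' TLO
              # (e) store the result; the flow sends at this rate
              if rate > f.DR:
                  f.DR = rate
              f.FSE_R = rate
              return rate

   Replaying the two-flow example of Appendix A.1 with this sketch
   (priorities 1 and 0.5 in a common FG) yields the same FSE_R, DR and
   Rate values as the FSE diagrams shown there.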
   This algorithm gives all the leftover rate of application-limited
   flows to the first flow that updates its sending rate, provided
   that this flow needs it all (otherwise, its own leftover rate can
   be taken by the next flow that updates its rate).  Other policies
   could be applied, e.g., to divide the leftover rate of a flow
   equally among all other flows in the FG.

A.1.  Example operation (passive)

   In order to illustrate the operation of the passive coupled
   congestion control algorithm, this section presents a toy example
   of two flows that use it.  Let us assume that both flows traverse a
   common 10 Mbit/s bottleneck and use a simplistic congestion
   controller that starts out with 1 Mbit/s, increases its rate by
   1 Mbit/s in the absence of congestion and decreases it by 2 Mbit/s
   in the presence of congestion.  For simplicity, flows are assumed
   to always operate in a round-robin fashion.  Rate numbers below
   without units are assumed to be in Mbit/s.  For illustration
   purposes, the actual sending rate is also shown for every flow in
   the FSE diagrams even though it is not really stored in the FSE.

   Flow #1 begins.  It is a bulk data transfer and considers itself to
   have top priority.  This is the FSE after step (1) of the flow
   algorithm:

   ---------------------------------------
   | # | FGI |  P  | FSE_R |  DR  | Rate |
   |   |     |     |       |      |      |
   | 1 |  1  |  1  |   1   |   1  |   1  |
   ---------------------------------------
   S_CR = 1, TLO = 0

   Its congestion controller gradually increases its rate.
   Eventually, at some point, the FSE should look like this:

   ---------------------------------------
   | # | FGI |  P  | FSE_R |  DR  | Rate |
   |   |     |     |       |      |      |
   | 1 |  1  |  1  |  10   |  10  |  10  |
   ---------------------------------------
   S_CR = 10, TLO = 0

   Now another flow joins.  It is also a bulk data transfer, and has a
   lower priority (0.5):

   ---------------------------------------
   | # | FGI |  P  | FSE_R |  DR  | Rate |
   |   |     |     |       |      |      |
   | 1 |  1  |  1  |  10   |  10  |  10  |
   | 2 |  1  | 0.5 |   1   |   1  |   1  |
   ---------------------------------------
   S_CR = 11, TLO = 0

   Now assume that the first flow updates its rate to 8, because the
   total sending rate of 11 exceeds the total capacity.  Let us take a
   closer look at what happens in step (3) of the flow algorithm.

   CC_R = 8.  new_DR = infinity.
   3 a)  new_S_CR = 11; DELTA = 8 - 10 = -2.
   3 b)  FSE_R(f) = 8.  DELTA is negative, hence S_CR = 9;
         DR(f) = 8.
   3 c)  S_P = 1.5.
   3 d)  new sending rate = min(infinity, 1/1.5 * 9 + 0) = 6.
   3 e)  FSE_R(f) = 6.

   The resulting FSE looks as follows:

   ---------------------------------------
   | # | FGI |  P  | FSE_R |  DR  | Rate |
   |   |     |     |       |      |      |
   | 1 |  1  |  1  |   6   |   8  |   6  |
   | 2 |  1  | 0.5 |   1   |   1  |   1  |
   ---------------------------------------
   S_CR = 9, TLO = 0

   The effect is that flow #1 is sending at 6 Mbit/s instead of the
   8 Mbit/s that the congestion controller derived.  Let us now assume
   that flow #2 updates its rate.  Its congestion controller detects
   that the network is not fully saturated (the actual total sending
   rate is 6+1=7) and increases its rate.

   CC_R = 2.  new_DR = infinity.
   3 a)  new_S_CR = 7; DELTA = 2 - 1 = 1.
   3 b)  FSE_R(f) = 2.  DELTA is positive, hence S_CR = 9 + 1 = 10;
         DR(f) = 2.
   3 c)  S_P = 1.5.
   3 d)  new sending rate = min(infinity, 0.5/1.5 * 10 + 0) = 3.33.
   3 e)  DR(f) = FSE_R(f) = 3.33.
700 The resulting FSE looks as follows: 701 ------------------------------------------- 702 | # | FGI | P | FSE_R | DR | Rate | 703 | | | | | | | 704 | 1 | 1 | 1 | 6 | 8 | 6 | 705 | 2 | 1 | 0.5 | 3.33 | 3.33 | 3.33 | 706 ------------------------------------------- 707 S_CR = 10, TLO = 0 708 The effect is that flow #2 is now sending with 3.33 Mbit/s, which is 709 close to half of the rate of flow #1 and leads to a total utilization 710 of 6(#1) + 3.33(#2) = 9.33 Mbit/s. Flow #2's congestion controller 711 has increased its rate faster than the controller actually expected. 712 Now, flow #1 updates its rate. Its congestion controller detects 713 that the network is not fully saturated and increases its rate. 714 Additionally, the application feeding into flow #1 limits the flow's 715 sending rate to at most 2 Mbit/s. 717 CC_R=7. new_DR=2. 718 3 a) new_S_CR = 9.33; DELTA = 1. 719 3 b) FSE_R(f) = 7, DELTA is positive, hence S_CR = 10 + 1 = 11; 720 DR = min(2, 7) = 2. 721 3 c) S_P = 1.5; DR(f) < FSE_R(f), hence TLO = 1/1.5 * 11 - 2 = 5.33. 722 3 d) new sending rate = min(2, 1/1.5 * 11 + 5.33) = 2. 723 3 e) FSE_R(f) = 2. 725 The resulting FSE looks as follows: 726 ------------------------------------------- 727 | # | FGI | P | FSE_R | DR | Rate | 728 | | | | | | | 729 | 1 | 1 | 1 | 2 | 2 | 2 | 730 | 2 | 1 | 0.5 | 3.33 | 3.33 | 3.33 | 731 ------------------------------------------- 732 S_CR = 11, TLO = 5.33 734 Now, the total rate of the two flows is 2 + 3.33 = 5.33 Mbit/s, i.e. 735 the network is significantly underutilized due to the limitation of 736 flow #1. Flow #2 updates its rate. Its congestion controller 737 detects that the network is not fully saturated and increases its 738 rate. 740 CC_R=4.33. new_DR = infinity. 741 3 a) new_S_CR = 5.33; DELTA = 1. 742 3 b) FSE_R(f) = 4.33. DELTA is positive, hence S_CR = 12; 743 DR(f) = 4.33. 744 3 c) S_P = 1.5. 745 3 d) new sending rate: min(infinity, 0.5/1.5 * 12 + 5.33 ) = 9.33. 746 3 e) FSE_R(f) = 9.33, DR(f) = 9.33. 748 The resulting FSE looks as follows: 749 ------------------------------------------- 750 | # | FGI | P | FSE_R | DR | Rate | 751 | | | | | | | 752 | 1 | 1 | 1 | 2 | 2 | 2 | 753 | 2 | 1 | 0.5 | 9.33 | 9.33 | 9.33 | 754 ------------------------------------------- 755 S_CR = 12, TLO = 0 757 Now, the total rate of the two flows is 2 + 9.33 = 11.33 Mbit/s. 758 Finally, flow #1 terminates. It sets P to -1 and DR to 0. Let us 759 assume that it terminated late enough for flow #2 to still experience 760 the network in a congested state, i.e. flow #2 decreases its rate in 761 the next iteration. 763 CC_R = 7.33. new_DR = infinity. 764 3 a) new_S_CR = 11.33; DELTA = -2. 765 3 b) FSE_R(f) = 7.33. DELTA is negative, hence S_CR = 9.33; 766 DR(f) = 7.33. 767 3 c) Flow 1 has P = -1, hence it is deleted from the FSE. 768 S_P = 0.5. 769 3 d) new sending rate: min(infinity, 0.5/0.5*9.33 + 0) = 9.33. 770 3 e) FSE_R(f) = DR(f) = 9.33. 772 The resulting FSE looks as follows: 773 ------------------------------------------- 774 | # | FGI | P | FSE_R | DR | Rate | 775 | | | | | | | 776 | 2 | 1 | 0.5 | 9.33 | 9.33 | 9.33 | 777 ------------------------------------------- 778 S_CR = 9.33, TLO = 0 780 Appendix B. Change log 782 B.1. Changes from -00 to -01 784 o Added change log. 786 o Updated the example algorithm and its operation. 788 B.2. Changes from -01 to -02 790 o Included an active version of the algorithm which is simpler. 792 o Replaced "greedy flow" with "bulk data transfer" and "non-greedy" 793 with "application-limited". 
795 o Updated new_CR to CC_R, and CR to FSE_R for better understanding. 797 B.3. Changes from -02 to -03 799 o Included an active conservative version of the algorithm which 800 reduces queue growth and packet loss; added a reference to a 801 technical report that shows these benefits with simulations. 803 o Moved the passive variant of the algorithm to appendix. 805 B.4. Changes from -03 to -04 807 o Extended SBD section. 809 o Added a note about window-based controllers. 811 Authors' Addresses 813 Michael Welzl 814 University of Oslo 815 PO Box 1080 Blindern 816 Oslo, N-0316 817 Norway 819 Phone: +47 22 85 24 20 820 Email: michawe@ifi.uio.no 821 Safiqul Islam 822 University of Oslo 823 PO Box 1080 Blindern 824 Oslo, N-0316 825 Norway 827 Phone: +47 22 84 08 37 828 Email: safiquli@ifi.uio.no 830 Stein Gjessing 831 University of Oslo 832 PO Box 1080 Blindern 833 Oslo, N-0316 834 Norway 836 Phone: +47 22 85 24 44 837 Email: steing@ifi.uio.no