RTP Media Congestion Avoidance Techniques (rmcat)               S. Islam
Internet-Draft                                                  M. Welzl
Intended status: Experimental                                S. Gjessing
Expires: March 19, 2018                               University of Oslo
                                                      September 15, 2017


              Coupled congestion control for RTP media
                    draft-ietf-rmcat-coupled-cc-07

Abstract

   When multiple congestion controlled Real-time Transport Protocol
   (RTP) sessions traverse the same network bottleneck, combining their
   controls can improve the total on-the-wire behavior in terms of
   delay, loss and fairness.  This document describes such a method for
   flows that have the same sender, in a way that is as flexible and
   simple as possible while minimizing the amount of changes needed to
   existing RTP applications.  It specifies how to apply the method for
   the Network-Assisted Dynamic Adaptation (NADA) congestion control
   algorithm, and provides suggestions on how to apply it to other
   congestion control algorithms.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current Internet-
   Drafts is at http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six
   months and may be updated, replaced, or obsoleted by other documents
   at any time.  It is inappropriate to use Internet-Drafts as
   reference material or to cite them other than as "work in progress."

   This Internet-Draft will expire on March 19, 2018.

Copyright Notice

   Copyright (c) 2017 IETF Trust and the persons identified as the
   document authors.  All rights reserved.
46 This document is subject to BCP 78 and the IETF Trust's Legal 47 Provisions Relating to IETF Documents 48 (http://trustee.ietf.org/license-info) in effect on the date of 49 publication of this document. Please review these documents 50 carefully, as they describe your rights and restrictions with respect 51 to this document. Code Components extracted from this document must 52 include Simplified BSD License text as described in Section 4.e of 53 the Trust Legal Provisions and are provided without warranty as 54 described in the Simplified BSD License. 56 Table of Contents 58 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 3 59 2. Definitions . . . . . . . . . . . . . . . . . . . . . . . . . 3 60 3. Limitations . . . . . . . . . . . . . . . . . . . . . . . . . 4 61 4. Architectural overview . . . . . . . . . . . . . . . . . . . 5 62 5. Roles . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6 63 5.1. SBD . . . . . . . . . . . . . . . . . . . . . . . . . . . 6 64 5.2. FSE . . . . . . . . . . . . . . . . . . . . . . . . . . . 7 65 5.3. Flows . . . . . . . . . . . . . . . . . . . . . . . . . . 8 66 5.3.1. Example algorithm 1 - Active FSE . . . . . . . . . . 9 67 5.3.2. Example algorithm 2 - Conservative Active FSE . . . . 10 68 6. Application . . . . . . . . . . . . . . . . . . . . . . . . . 11 69 6.1. NADA . . . . . . . . . . . . . . . . . . . . . . . . . . 11 70 6.2. General recommendations . . . . . . . . . . . . . . . . . 11 71 7. Expected feedback from experiments . . . . . . . . . . . . . 12 72 8. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . 12 73 9. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 13 74 10. Security Considerations . . . . . . . . . . . . . . . . . . . 13 75 11. References . . . . . . . . . . . . . . . . . . . . . . . . . 13 76 11.1. Normative References . . . . . . . . . . . . . . . . . . 13 77 11.2. Informative References . . . . . . . . . . . . . . . . . 14 78 Appendix A. Application to GCC . . . . . . . . . . . . . . . . . 15 79 Appendix B. Scheduling . . . . . . . . . . . . . . . . . . . . . 16 80 Appendix C. Example algorithm - Passive FSE . . . . . . . . . . 16 81 C.1. Example operation (passive) . . . . . . . . . . . . . . . 19 82 Appendix D. Change log . . . . . . . . . . . . . . . . . . . . . 23 83 D.1. draft-welzl-rmcat-coupled-cc . . . . . . . . . . . . . . 23 84 D.1.1. Changes from -00 to -01 . . . . . . . . . . . . . . . 23 85 D.1.2. Changes from -01 to -02 . . . . . . . . . . . . . . . 23 86 D.1.3. Changes from -02 to -03 . . . . . . . . . . . . . . . 23 87 D.1.4. Changes from -03 to -04 . . . . . . . . . . . . . . . 24 88 D.1.5. Changes from -04 to -05 . . . . . . . . . . . . . . . 24 89 D.2. draft-ietf-rmcat-coupled-cc . . . . . . . . . . . . . . . 24 90 D.2.1. Changes from draft-welzl-rmcat-coupled-cc-05 . . . . 24 91 D.2.2. Changes from -00 to -01 . . . . . . . . . . . . . . . 24 92 D.2.3. Changes from -01 to -02 . . . . . . . . . . . . . . . 24 93 D.2.4. Changes from -02 to -03 . . . . . . . . . . . . . . . 24 94 D.2.5. Changes from -03 to -04 . . . . . . . . . . . . . . . 24 95 D.2.6. Changes from -04 to -05 . . . . . . . . . . . . . . . 25 96 D.2.7. Changes from -05 to -06 . . . . . . . . . . . . . . . 25 97 D.2.8. Changes from -06 to -07 . . . . . . . . . . . . . . . 25 98 Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 25 100 1. 
Introduction 102 When there is enough data to send, a congestion controller attempts 103 to increase its sending rate until the path's capacity has been 104 reached. Some controllers detect path capacity by increasing the 105 sending rate further, until packets are ECN-marked [RFC8087] or 106 dropped, and then decreasing the sending rate until that stops 107 happening. This process inevitably creates undesirable queuing delay 108 when multiple congestion-controlled connections traverse the same 109 network bottleneck, and each connection overshoots the path capacity 110 as it determines its sending rate. 112 The Congestion Manager (CM) [RFC3124] couples flows by providing a 113 single congestion controller. It is hard to implement because it 114 requires an additional congestion controller and removes all per- 115 connection congestion control functionality, which is quite a 116 significant change to existing RTP based applications. This document 117 presents a method to combine the behavior of congestion control 118 mechanisms that is easier to implement than the Congestion Manager 119 [RFC3124] and also requires less significant changes to existing RTP 120 based applications. It attempts to roughly approximate the CM 121 behavior by sharing information between existing congestion 122 controllers. It is able to honor user-specified priorities, which is 123 required by rtcweb [I-D.ietf-rtcweb-overview] [RFC7478]. 125 The described mechanisms are believed safe to use, but are 126 experimental and are presented for wider review and operational 127 evaluation. 129 2. Definitions 131 The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", 132 "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this 133 document are to be interpreted as described in RFC 2119 [RFC2119]. 135 Available Bandwidth: 136 The available bandwidth is the nominal link capacity minus the 137 amount of traffic that traversed the link during a certain time 138 interval, divided by that time interval. 140 Bottleneck: 141 The first link with the smallest available bandwidth along the 142 path between a sender and receiver. 144 Flow: 146 A flow is the entity that congestion control is operating on. 147 It could, for example, be a transport layer connection, or an 148 RTP stream [RFC7656], whether or not this RTP stream is 149 multiplexed onto an RTP session with other RTP streams. 151 Flow Group Identifier (FGI): 152 A unique identifier for each subset of flows that is limited by 153 a common bottleneck. 155 Flow State Exchange (FSE): 156 The entity that maintains information that is exchanged between 157 flows. 159 Flow Group (FG): 160 A group of flows having the same FGI. 162 Shared Bottleneck Detection (SBD): 163 The entity that determines which flows traverse the same 164 bottleneck in the network, or the process of doing so. 166 3. Limitations 168 Sender-side only: 169 Shared bottlenecks can exist when multiple flows originate from 170 the same sender, or when flows from different senders reach the 171 same receiver (see [I-D.ietf-rmcat-sbd], section 3). Coupled 172 congestion control as described here only supports the former 173 case, not the latter, as it operates inside a single host on 174 the sender side. 176 Shared bottlenecks do not change quickly: 177 As per the definition above, a bottleneck depends on cross 178 traffic, and since such traffic can heavily fluctuate, 179 bottlenecks can change at a high frequency (e.g., there can be 180 oscillation between two or more links). 
This means that, when 181 flows are partially routed along different paths, they may 182 quickly change between sharing and not sharing a bottleneck. 183 For simplicity, here it is assumed that a shared bottleneck is 184 valid for a time interval that is significantly longer than the 185 interval at which congestion controllers operate. Note that, 186 for the only SBD mechanism defined in this document 187 (multiplexing on the same five-tuple), the notion of a shared 188 bottleneck stays correct even in the presence of fast traffic 189 fluctuations: since all flows that are assumed to share a 190 bottleneck are routed in the same way, if the bottleneck 191 changes, it will still be shared. 193 4. Architectural overview 195 Figure 1 shows the elements of the architecture for coupled 196 congestion control: the Flow State Exchange (FSE), Shared Bottleneck 197 Detection (SBD) and Flows. The FSE is a storage element that can be 198 implemented in two ways: active and passive. In the active version, 199 it initiates communication with flows and SBD. However, in the 200 passive version, it does not actively initiate communication with 201 flows and SBD; its only active role is internal state maintenance 202 (e.g., an implementation could use soft state to remove a flow's data 203 after long periods of inactivity). Every time a flow's congestion 204 control mechanism would normally update its sending rate, the flow 205 instead updates information in the FSE and performs a query on the 206 FSE, leading to a sending rate that can be different from what the 207 congestion controller originally determined. Using information 208 about/from the currently active flows, SBD updates the FSE with the 209 correct Flow Group Identifiers (FGIs). 211 This document describes both active and passive versions. While the 212 passive algorithm works better for congestion controls with RTT- 213 independent convergence, it can still produce oscillations on short 214 time scales. The passive algorithm, described in Appendix C, is 215 therefore considered as highly experimental and not safe to deploy 216 outside of testbed environments. Figure 2 shows the interaction 217 between flows and the FSE, using the variable names defined in 218 Section 5.2. 220 ------- <--- Flow 1 221 | FSE | <--- Flow 2 .. 222 ------- <--- .. Flow N 223 ^ 224 | | 225 ------- | 226 | SBD | <-------| 227 ------- 229 Figure 1: Coupled congestion control architecture 231 Flow#1(cc) FSE Flow#2(cc) 232 ---------- --- ---------- 233 #1 JOIN ----register--> REGISTER 235 REGISTER <--register-- JOIN #1 237 #2 CC_R(1) ----UPDATE----> UPDATE (in) 239 #3 NEW RATE <---FSE_R(1)-- UPDATE (out) --FSE_R(2)-> #3 NEW RATE 241 Figure 2: Flow-FSE interaction 243 Since everything shown in Figure 1 is assumed to operate on a single 244 host (the sender) only, this document only describes aspects that 245 have an influence on the resulting on-the-wire behavior. It does 246 not, for instance, define how many bits must be used to represent 247 FGIs, or in which way the entities communicate. 249 Implementations can take various forms: for instance, all the 250 elements in the figure could be implemented within a single 251 application, thereby operating on flows generated by that application 252 only. Another alternative could be to implement both the FSE and SBD 253 together in a separate process which different applications 254 communicate with via some form of Inter-Process Communication (IPC). 
255 Such an implementation would extend the scope to flows generated by 256 multiple applications. The FSE and SBD could also be included in the 257 Operating System kernel. However, only one type of coupling 258 algorithm should be used for all flows. Combinations of multiple 259 algorithms at different aggregation levels (e.g., the Operating 260 System coupling application aggregates with one algorithm, and 261 applications coupling their flows with another) have not been tested 262 and are therefore not recommended. 264 5. Roles 266 This section gives an overview of the roles of the elements of 267 coupled congestion control, and provides an example of how coupled 268 congestion control can operate. 270 5.1. SBD 272 SBD uses knowledge about the flows to determine which flows belong in 273 the same Flow Group (FG), and assigns FGIs accordingly. This 274 knowledge can be derived in three basic ways: 276 1. From multiplexing: it can be based on the simple assumption that 277 packets sharing the same five-tuple (IP source and destination 278 address, protocol, and transport layer port number pair) and 279 having the same values for the Differentiated Services Code Point 280 (DSCP) and the ECN field in the IP header are typically treated 281 in the same way along the path. This method is the only one 282 specified in this document: SBD MAY consider all flows that use 283 the same five-tuple, DSCP and ECN field value to belong to the 284 same FG. This classification applies to certain tunnels, or RTP 285 flows that are multiplexed over one transport (cf. 286 [transport-multiplex]). Such multiplexing is also a recommended 287 usage of RTP in rtcweb [rtcweb-rtp-usage]. 289 2. Via configuration: e.g. by assuming that a common wireless uplink 290 is also a shared bottleneck. 292 3. From measurements: e.g. by considering correlations among 293 measured delay and loss as an indication of a shared bottleneck. 295 The methods above have some essential trade-offs: e.g., multiplexing 296 is a completely reliable measure, however it is limited in scope to 297 two end points (i.e., it cannot be applied to couple congestion 298 controllers of one sender talking to multiple receivers). A 299 measurement-based SBD mechanism is described in [I-D.ietf-rmcat-sbd]. 300 Measurements can never be 100% reliable, in particular because they 301 are based on the past but applying coupled congestion control means 302 to make an assumption about the future; it is therefore recommended 303 to implement cautionary measures, e.g. by disabling coupled 304 congestion control if enabling it causes a significant increase in 305 delay and/or packet loss. Measurements also take time, which entails 306 a certain delay for turning on coupling (refer to 307 [I-D.ietf-rmcat-sbd] for details). Using system configuration to 308 decide about shared bottlenecks can be more efficient (faster to 309 obtain) than using measurements, but it relies on assumptions about 310 the network environment. 312 5.2. FSE 314 The FSE contains a list of all flows that have registered with it. 315 For each flow, it stores the following: 317 o a unique flow number f to identify the flow. 319 o the FGI of the FG that it belongs to (based on the definitions in 320 this document, a flow has only one bottleneck, and can therefore 321 be in only one FG). 323 o a priority P(f), which is a positive number, greater than zero. 325 o The rate used by the flow in bits per second, FSE_R(f). 
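   As a concrete, non-normative illustration, the sketch below shows
   one possible in-memory representation of this per-flow information,
   together with the per-FG variable S_CR that is introduced below.
   It is written in Python; the class and field names are chosen for
   this document only and are not mandated by the mechanism.

      # Hypothetical FSE storage sketch; fields follow Section 5.2.
      from dataclasses import dataclass

      @dataclass
      class FlowEntry:
          f: int              # unique flow number
          fgi: int            # FGI of the FG the flow belongs to
          priority: float     # P(f), a positive number
          fse_rate: float     # FSE_R(f), in bits per second

      class FSE:
          def __init__(self):
              self.flows = {}   # flow number f -> FlowEntry
              self.s_cr = {}    # FGI -> S_CR (sum of calculated rates)

          def register(self, f, fgi, priority, initial_rate):
              # Simplification: the FGI is assumed to be known already;
              # in practice it is assigned by SBD after registration,
              # and only then is FSE_R(f) added to the group's S_CR.
              self.flows[f] = FlowEntry(f, fgi, priority, initial_rate)
              self.s_cr[fgi] = self.s_cr.get(fgi, 0.0) + initial_rate

          def deregister(self, f):
              # The flow's entry is simply removed from the list.
              self.flows.pop(f, None)

   How this state is updated on every rate calculation is defined by
   the example flow algorithms in Section 5.3.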
   Note that the absolute range of priorities does not matter: the
   algorithm works with a flow's priority portion of the sum of all
   priority values.  For example, if there are two flows, flow 1 with
   priority 1 and flow 2 with priority 2, the sum of the priorities is
   3.  Then, flow 1 will be assigned 1/3 of the aggregate sending rate
   and flow 2 will be assigned 2/3 of the aggregate sending rate.
   Priorities can be mapped to the "very-low", "low", "medium" or
   "high" priority levels described in [I-D.ietf-rtcweb-transports] by
   simply using the values 1, 2, 4 and 8, respectively.

   In the FSE, each FG contains one static variable S_CR which is the
   sum of the calculated rates of all flows in the same FG.  This value
   is used to calculate the sending rate.

   The information listed here is enough to implement the sample flow
   algorithm given below.  FSE implementations could easily be extended
   to store, e.g., a flow's current sending rate for statistics
   gathering or future potential optimizations.

5.3.  Flows

   Flows register themselves with SBD and FSE when they start,
   deregister from the FSE when they stop, and carry out an UPDATE
   function call every time their congestion controller calculates a
   new sending rate.  Via UPDATE, they provide the newly calculated
   rate and optionally (if the algorithm supports it) the desired rate.
   The desired rate is less than the calculated rate in the case of
   application-limited flows; otherwise, it is the same as the
   calculated rate.

   Below, two example algorithms are described.  While other algorithms
   could be used instead, the same algorithm must be applied to all
   flows.  Names of variables used in the algorithms are explained
   below.

   o  CC_R(f) - The rate received from the congestion controller of
      flow f when it calls UPDATE.

   o  FSE_R(f) - The rate calculated by the FSE for flow f.

   o  S_CR - The sum of the calculated rates of all flows in the same
      FG; this value is used to calculate the sending rate.

   o  FG - A group of flows having the same FGI, and hence sharing the
      same bottleneck.

   o  P(f) - The priority of flow f, which is received from the flow's
      congestion controller; the FSE uses this variable for calculating
      FSE_R(f).

   o  S_P - The sum of all the priorities.

5.3.1.  Example algorithm 1 - Active FSE

   This algorithm was designed to be the simplest possible method to
   assign rates according to the priorities of flows.  Simulation
   results in [fse] indicate that it does not, however, significantly
   reduce queuing delay and packet loss.

   (1)  When a flow f starts, it registers itself with SBD and the FSE.
        FSE_R(f) is initialized with the congestion controller's
        initial rate.  SBD will assign the correct FGI.  When a flow is
        assigned an FGI, it adds its FSE_R(f) to S_CR.

   (2)  When a flow f stops or pauses, its entry is removed from the
        list.

   (3)  Every time the congestion controller of the flow f determines a
        new sending rate CC_R(f), the flow calls UPDATE, which carries
        out the tasks listed below to derive the new sending rates for
        all the flows in the FG.  A flow's UPDATE function uses a local
        (i.e., per-flow) temporary variable S_P, which is the sum of
        all the priorities.

        (a)  It updates S_CR.

             S_CR = S_CR + CC_R(f) - FSE_R(f)

        (b)  It calculates the sum of all the priorities, S_P.
             S_P = 0
             for all flows i in FG do
                 S_P = S_P + P(i)
             end for

        (c)  It calculates the sending rates for all the flows in an FG
             and distributes them.

             for all flows i in FG do
                 FSE_R(i) = (P(i)*S_CR)/S_P
                 send FSE_R(i) to the flow i
             end for

5.3.2.  Example algorithm 2 - Conservative Active FSE

   This algorithm extends algorithm 1 to conservatively emulate the
   behavior of a single flow by proportionally reducing the aggregate
   rate on congestion.  Simulation results in [fse] indicate that it
   can significantly reduce queuing delay and packet loss.

   (1)  When a flow f starts, it registers itself with SBD and the FSE.
        FSE_R(f) is initialized with the congestion controller's
        initial rate.  SBD will assign the correct FGI.  When a flow is
        assigned an FGI, it adds its FSE_R(f) to S_CR.

   (2)  When a flow f stops or pauses, its entry is removed from the
        list.

   (3)  Every time the congestion controller of the flow f determines a
        new sending rate CC_R(f), the flow calls UPDATE, which carries
        out the tasks listed below to derive the new sending rates for
        all the flows in the FG.  A flow's UPDATE function uses a local
        (i.e., per-flow) temporary variable S_P, which is the sum of
        all the priorities, and a local variable DELTA, which is used
        to calculate the difference between CC_R(f) and the previously
        stored FSE_R(f).  To prevent flows from either ignoring
        congestion or overreacting, a timer keeps them from changing
        their rates immediately after the common rate reduction that
        follows a congestion event.  This timer is set to 2 RTTs of the
        flow that experienced congestion because it is assumed that a
        congestion event can persist for up to one RTT of that flow,
        with another RTT added to compensate for fluctuations in the
        measured RTT value.

        (a)  It updates S_CR based on DELTA.

             if Timer has expired or is not set then
                 DELTA = CC_R(f) - FSE_R(f)
                 if DELTA < 0 then    // Reduce S_CR proportionally
                     S_CR = S_CR * CC_R(f) / FSE_R(f)
                     Set Timer for 2 RTTs
                 else
                     S_CR = S_CR + DELTA
                 end if
             end if

        (b)  It calculates the sum of all the priorities, S_P.

             S_P = 0
             for all flows i in FG do
                 S_P = S_P + P(i)
             end for

        (c)  It calculates the sending rates for all the flows in an FG
             and distributes them.

             for all flows i in FG do
                 FSE_R(i) = (P(i)*S_CR)/S_P
                 send FSE_R(i) to the flow i
             end for

6.  Application

   This section specifies how the FSE can be applied to specific
   congestion control mechanisms and makes general recommendations that
   facilitate applying the FSE to future congestion controls.

6.1.  NADA

   Network-Assisted Dynamic Adaptation (NADA) [I-D.ietf-rmcat-nada] is
   a congestion control scheme for rtcweb.  It calculates a reference
   rate r_ref upon receiving an acknowledgment, and then, based on the
   reference rate, it calculates a video target rate r_vin and a
   sending rate for the flows, r_send.

   When applying the FSE to NADA, the UPDATE function call described in
   Section 5.3 gives the FSE NADA's reference rate r_ref.  The
   recommended algorithm for NADA is the Active FSE in Section 5.3.1.
   In step 3 (c), when the FSE_R(i) is "sent" to the flow i, this means
   updating r_ref (and, in turn, r_vin and r_send) of flow i with the
   value of FSE_R(i).

6.2.  General recommendations

   This section provides general advice for applying the FSE to
   congestion control mechanisms.
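   Before turning to the individual recommendations below, the
   following non-normative sketch shows how the Active FSE UPDATE of
   Section 5.3.1 could be wired into a rate-based controller such as
   NADA.  It builds on the hypothetical Python FSE sketch in
   Section 5.2; the function and callback names are illustrative only.

      def active_fse_update(fse, f, cc_rate, distribute):
          # Step (3) of Section 5.3.1: flow f's congestion controller
          # has calculated a new rate CC_R(f) = cc_rate.  'distribute'
          # is a callback delivering FSE_R(i) to flow i; for NADA it
          # would overwrite r_ref of flow i (and, in turn, r_vin and
          # r_send) with the value it receives.
          flow = fse.flows[f]
          fgi = flow.fgi
          # (a) S_CR = S_CR + CC_R(f) - FSE_R(f)
          fse.s_cr[fgi] += cc_rate - flow.fse_rate
          # (b) sum of the priorities of all flows in the FG
          group = [e for e in fse.flows.values() if e.fgi == fgi]
          s_p = sum(e.priority for e in group)
          # (c) distribute the aggregate rate in proportion to priority
          for e in group:
              e.fse_rate = (e.priority * fse.s_cr[fgi]) / s_p
              distribute(e.f, e.fse_rate)

   A NADA sender would register each flow as in Section 5.3.1 and pass
   a 'distribute' callback that assigns the received FSE_R(i) to
   r_ref(i).  The same hook applies to other rate-based congestion
   controllers, which is the situation addressed by the recommendations
   that follow.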
   Receiver-side calculations:
        When receiver-side calculations make assumptions about the rate
        of the sender, the calculations need to be synchronized or the
        receiver needs to be updated accordingly.  This applies to TFRC
        [RFC5348], for example, where simulations showed somewhat less
        favorable results when using the FSE without a receiver-side
        change [fse].

   Stateful algorithms:
        When a congestion control algorithm is stateful (e.g., TCP,
        with Slow Start, Congestion Avoidance and Fast Recovery), these
        states should be carefully considered such that the overall
        state of the aggregate flow is correct.  This may require
        sharing more information in the UPDATE call.

   Rate jumps:
        The FSE-based coupling algorithms can let a flow quickly
        increase its rate to its fair share, e.g., when a new flow
        joins or after a quiescent period.  In the case of window-based
        congestion controls, this may produce a burst which should be
        mitigated in some way.  A way to do this without using a timer
        is presented in [anrw2016], using TCP as an example.

7.  Expected feedback from experiments

   The algorithm described in this memo has so far been evaluated using
   simulations covering all the tests for more than one flow from
   [I-D.ietf-rmcat-eval-test] (see [IETF-93], [IETF-94]).  Experiments
   should confirm these results using at least the NADA congestion
   control algorithm with real-life code (e.g., browsers communicating
   over an emulated network covering the conditions in
   [I-D.ietf-rmcat-eval-test]).  The tests with real-life code should
   be repeated afterwards in real network environments and monitored.
   Experiments should investigate cases where the media coder's output
   rate is below the rate that is calculated by the coupling algorithm
   (FSE_R(i) in algorithms 1 and 2, Section 5.3).  Implementers and
   testers are invited to document their findings in an Internet-Draft.

8.  Acknowledgements

   This document has benefitted from discussions with and feedback from
   Andreas Petlund, Anna Brunstrom, Colin Perkins, David Hayes, David
   Ros (who also gave the FSE its name), Ingemar Johansson, Karen
   Nielsen, Kristian Hiorth, Mirja Kuehlewind, Martin Stiemerling,
   Spencer Dawkins, Varun Singh, Xiaoqing Zhu, and Zaheduzzaman Sarker.
   The authors would like to especially thank Xiaoqing Zhu and Stefan
   Holmer for helping with NADA and GCC.

   This work was partially funded by the European Community under its
   Seventh Framework Programme through the Reducing Internet Transport
   Latency (RITE) project (ICT-317700).

9.  IANA Considerations

   This memo includes no request to IANA.

10.  Security Considerations

   In scenarios where the architecture described in this document is
   applied across applications, various cheating possibilities arise:
   e.g., reporting wrong values for the calculated rate, the desired
   rate, or the priority of a flow.  In the worst case, such cheating
   could either prevent other flows from sending or make them send at a
   rate that is unreasonably large.  The end result would be unfair
   behavior at the network bottleneck, akin to what could be achieved
   with any UDP-based application.  Hence, since this is no worse than
   UDP in general, there seems to be no significant harm in using this
   in the absence of UDP rate limiters.
573 In the case of a single-user system, it should also be in the 574 interest of any application programmer to give the user the best 575 possible experience by using reasonable flow priorities or even 576 letting the user choose them. In a multi-user system, this interest 577 may not be given, and one could imagine the worst case of an "arms 578 race" situation, where applications end up setting their priorities 579 to the maximum value. If all applications do this, the end result is 580 a fair allocation in which the priority mechanism is implicitly 581 eliminated, and no major harm is done. 583 Implementers should also be aware of the Security Considerations 584 sections of [RFC3124], [RFC5348], and [RFC7478]. 586 11. References 588 11.1. Normative References 590 [I-D.ietf-rmcat-nada] 591 Zhu, X., Pan, R., Ramalho, M., Cruz, S., Jones, P., Fu, 592 J., and S. D'Aronco, "NADA: A Unified Congestion Control 593 Scheme for Real-Time Media", draft-ietf-rmcat-nada-04 594 (work in progress), March 2017. 596 [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate 597 Requirement Levels", BCP 14, RFC 2119, DOI 10.17487/ 598 RFC2119, March 1997, . 601 [RFC3124] Balakrishnan, H. and S. Seshan, "The Congestion Manager", 602 RFC 3124, DOI 10.17487/RFC3124, June 2001, 603 . 605 [RFC5348] Floyd, S., Handley, M., Padhye, J., and J. Widmer, "TCP 606 Friendly Rate Control (TFRC): Protocol Specification", RFC 607 5348, DOI 10.17487/RFC5348, September 2008, 608 . 610 11.2. Informative References 612 [anrw2016] 613 Islam, S. and M. Welzl, "Start Me Up:Determining and 614 Sharing TCP's Initial Congestion Window", ACM, IRTF, ISOC 615 Applied Networking Research Workshop 2016 (ANRW 2016) , 616 2016. 618 [fse] Islam, S., Welzl, M., Gjessing, S., and N. Khademi, 619 "Coupled Congestion Control for RTP Media", ACM SIGCOMM 620 Capacity Sharing Workshop (CSWS 2014) and ACM SIGCOMM CCR 621 44(4) 2014; extended version available as a technical 622 report from 623 http://safiquli.at.ifi.uio.no/paper/fse-tech-report.pdf , 624 2014. 626 [fse-noms] 627 Islam, S., Welzl, M., Hayes, D., and S. Gjessing, 628 "Managing Real-Time Media Flows through a Flow State 629 Exchange", IEEE NOMS 2016, Istanbul, Turkey , 2016. 631 [I-D.ietf-rmcat-eval-test] 632 Sarker, Z., Singh, V., Zhu, X., and M. Ramalho, "Test 633 Cases for Evaluating RMCAT Proposals", draft-ietf-rmcat- 634 eval-test-05 (work in progress), April 2017. 636 [I-D.ietf-rmcat-gcc] 637 Holmer, S., Lundin, H., Carlucci, G., Cicco, L., and S. 638 Mascolo, "A Google Congestion Control Algorithm for Real- 639 Time Communication", draft-ietf-rmcat-gcc-02 (work in 640 progress), July 2016. 642 [I-D.ietf-rmcat-sbd] 643 Hayes, D., Ferlin, S., Welzl, M., and K. Hiorth, "Shared 644 Bottleneck Detection for Coupled Congestion Control for 645 RTP Media.", draft-ietf-rmcat-sbd-08 (work in progress), 646 July 2017. 648 [I-D.ietf-rtcweb-overview] 649 Alvestrand, H., "Overview: Real Time Protocols for 650 Browser-based Applications", draft-ietf-rtcweb-overview-18 651 (work in progress), March 2017. 653 [I-D.ietf-rtcweb-transports] 654 Alvestrand, H., "Transports for WebRTC", Internet-draft 655 draft-ietf-rtcweb-transports-17.txt, October 2016. 657 [IETF-93] Islam, S., Welzl, M., and S. Gjessing, "Updates on Coupled 658 Congestion Control for RTP Media", July 2015, 659 . 661 [IETF-94] Islam, S., Welzl, M., and S. Gjessing, "Updates on Coupled 662 Congestion Control for RTP Media", November 2015, 663 . 665 [RFC7478] Holmberg, C., Hakansson, S., and G. 
Eriksson, "Web Real- 666 Time Communication Use Cases and Requirements", RFC 7478, 667 DOI 10.17487/RFC7478, March 2015, . 670 [RFC7656] Lennox, J., Gross, K., Nandakumar, S., Salgueiro, G., and 671 B. Burman, Ed., "A Taxonomy of Semantics and Mechanisms 672 for Real-Time Transport Protocol (RTP) Sources", RFC 7656, 673 DOI 10.17487/RFC7656, November 2015, . 676 [RFC8087] Fairhurst, G. and M. Welzl, "The Benefits of Using 677 Explicit Congestion Notification (ECN)", RFC 8087, DOI 678 10.17487/RFC8087, March 2017, . 681 [rtcweb-rtp-usage] 682 Perkins, C., Westerlund, M., and J. Ott, "Web Real-Time 683 Communication (WebRTC): Media Transport and Use of RTP", 684 Internet-draft draft-ietf-rtcweb-rtp-usage-26.txt, March 685 2016. 687 [transport-multiplex] 688 Westerlund, M. and C. Perkins, "Multiple RTP Sessions on a 689 Single Lower-Layer Transport", Internet-draft draft- 690 westerlund-avtcore-transport-multiplexing-07.txt, October 691 2013. 693 Appendix A. Application to GCC 695 Google Congestion Control (GCC) [I-D.ietf-rmcat-gcc] is another 696 congestion control scheme for RTP flows that is under development. 697 GCC is not yet finalised, but at the time of this writing, the rate 698 control of GCC employs two parts: controlling the bandwidth estimate 699 based on delay, and controlling the bandwidth estimate based on loss. 700 Both are designed to estimate the available bandwidth, A_hat. 702 When applying the FSE to GCC, the UPDATE function call described in 703 Section 5.3 gives the FSE GCC's estimate of available bandwidth 704 A_hat. The recommended algorithm for GCC is the Active FSE in 705 Section 5.3.1. In step 3 (c), when the FSE_R(i) is "sent" to the 706 flow i, this means updating A_hat of flow i with the value of 707 FSE_R(i). 709 Appendix B. Scheduling 711 When flows originate from the same host, it would be possible to use 712 only one single sender-side congestion controller which determines 713 the overall allowed sending rate, and then use a local scheduler to 714 assign a proportion of this rate to each RTP session. This way, 715 priorities could also be implemented as a function of the scheduler. 716 The Congestion Manager (CM) [RFC3124] also uses such a scheduling 717 function. 719 Appendix C. Example algorithm - Passive FSE 721 Active algorithms calculate the rates for all the flows in the FG and 722 actively distribute them. In a passive algorithm, UPDATE returns a 723 rate that should be used instead of the rate that the congestion 724 controller has determined. This can make a passive algorithm easier 725 to implement; however, when round-trip times of flows are unequal, 726 shorter-RTT flows may (depending on the congestion control algorithm) 727 update and react to the overall FSE state more often than longer-RTT 728 flows, which can produce unwanted side effects. This problem is more 729 significant when the congestion control convergence depends on the 730 RTT. While the passive algorithm works better for congestion 731 controls with RTT-independent convergence, it can still produce 732 oscillations on short time scales. The algorithm described below is 733 therefore considered as highly experimental and not safe to deploy 734 outside of testbed environments. Results of a simplified passive FSE 735 algorithm with both NADA and GCC can be found in [fse-noms]. 737 This passive version of the FSE stores the following information in 738 addition to the variables described in Section 5.2: 740 o The desired rate DR(f) of flow f. 
      This can be smaller than the calculated rate if the application
      feeding into the flow has less data to send than the congestion
      controller would allow.  In the case of a bulk transfer, DR(f)
      must be set to CC_R(f) received from the congestion module of
      flow f.

   The passive version of the FSE contains one static variable per FG
   called TLO (Total Leftover Rate -- used to let a flow 'take'
   bandwidth from application-limited or terminated flows) which is
   initialized to 0.  For the passive version, S_CR is limited to
   increase or decrease as conservatively as a flow's congestion
   controller decides in order to prohibit sudden rate jumps.

   (1)  When a flow f starts, it registers itself with SBD and the FSE.
        FSE_R(f) and DR(f) are initialized with the congestion
        controller's initial rate.  SBD will assign the correct FGI.
        When a flow is assigned an FGI, it adds its FSE_R(f) to S_CR.

   (2)  When a flow f stops or pauses, it sets its DR(f) to 0 and sets
        P(f) to -1.

   (3)  Every time the congestion controller of the flow f determines a
        new sending rate CC_R(f), assuming the flow's new desired rate
        new_DR(f) to be "infinity" in the case of a bulk data transfer
        with an unknown maximum rate, the flow calls UPDATE, which
        carries out the tasks listed below to derive the flow's new
        sending rate, Rate(f).  A flow's UPDATE function uses a few
        local (i.e., per-flow) temporary variables, which are all
        initialized to 0: DELTA, new_S_CR and S_P.

        (a)  For all the flows in its FG (including itself), it
             calculates the sum of all the calculated rates, new_S_CR.
             Then it calculates DELTA: the difference between CC_R(f)
             and FSE_R(f).

             for all flows i in FG do
                 new_S_CR = new_S_CR + FSE_R(i)
             end for
             DELTA = CC_R(f) - FSE_R(f)

        (b)  It updates S_CR, FSE_R(f) and DR(f).

             FSE_R(f) = CC_R(f)
             if DELTA > 0 then    // the flow's rate has increased
                 S_CR = S_CR + DELTA
             else if DELTA < 0 then
                 S_CR = new_S_CR + DELTA
             end if
             DR(f) = min(new_DR(f), FSE_R(f))

        (c)  It calculates the leftover rate TLO, removes the
             terminated flows from the FSE and calculates the sum of
             all the priorities, S_P.

             for all flows i in FG do
                 if P(i) < 0 then
                     delete flow
                 else
                     S_P = S_P + P(i)
                 end if
             end for
             if DR(f) < FSE_R(f) then
                 TLO = TLO + ((P(f)/S_P) * S_CR - DR(f))
             end if

        (d)  It calculates the sending rate, Rate(f).

             Rate(f) = min(new_DR(f), (P(f)*S_CR)/S_P + TLO)

             if Rate(f) != new_DR(f) and TLO > 0 then
                 TLO = 0    // f has 'taken' TLO
             end if

        (e)  It updates DR(f) and FSE_R(f) with Rate(f).

             if Rate(f) > DR(f) then
                 DR(f) = Rate(f)
             end if
             FSE_R(f) = Rate(f)

   The goals of the flow algorithm are to achieve prioritization,
   improve network utilization in the face of application-limited
   flows, and impose limits on the increase behavior such that the
   negative impact of multiple flows trying to increase their rate
   together is minimized.  It does that by assigning a flow a sending
   rate that may not be what the flow's congestion controller expected.
   It therefore builds on the assumption that no significant
   inefficiencies arise from temporary application-limited behavior or
   from quickly jumping to a rate that is higher than the congestion
   controller intended.  How problematic these issues really are
   depends on the controllers in use and requires careful
   per-controller experimentation.
The coupled 831 congestion control mechanism described here also does not require all 832 controllers to be equal; effects of heterogeneous controllers, or 833 homogeneous controllers being in different states, are also subject 834 to experimentation. 836 This algorithm gives all the leftover rate of application-limited 837 flows to the first flow that updates its sending rate, provided that 838 this flow needs it all (otherwise, its own leftover rate can be taken 839 by the next flow that updates its rate). Other policies could be 840 applied, e.g. to divide the leftover rate of a flow equally among all 841 other flows in the FGI. 843 C.1. Example operation (passive) 845 In order to illustrate the operation of the passive coupled 846 congestion control algorithm, this section presents a toy example of 847 two flows that use it. Let us assume that both flows traverse a 848 common 10 Mbit/s bottleneck and use a simplistic congestion 849 controller that starts out with 1 Mbit/s, increases its rate by 1 850 Mbit/s in the absence of congestion and decreases it by 2 Mbit/s in 851 the presence of congestion. For simplicity, flows are assumed to 852 always operate in a round-robin fashion. Rate numbers below without 853 units are assumed to be in Mbit/s. For illustration purposes, the 854 actual sending rate is also shown for every flow in FSE diagrams even 855 though it is not really stored in the FSE. 857 Flow #1 begins. It is a bulk data transfer and considers itself to 858 have top priority. This is the FSE after the flow algorithm's step 859 1: 861 ---------------------------------------- 862 | # | FGI | P | FSE_R | DR | Rate | 863 | | | | | | | 864 | 1 | 1 | 1 | 1 | 1 | 1 | 865 ---------------------------------------- 866 S_CR = 1, TLO = 0 868 Its congestion controller gradually increases its rate. Eventually, 869 at some point, the FSE should look like this: 871 ----------------------------------------- 872 | # | FGI | P | FSE_R | DR | Rate | 873 | | | | | | | 874 | 1 | 1 | 1 | 10 | 10 | 10 | 875 ----------------------------------------- 876 S_CR = 10, TLO = 0 878 Now another flow joins. It is also a bulk data transfer, and has a 879 lower priority (0.5): 881 ------------------------------------------ 882 | # | FGI | P | FSE_R | DR | Rate | 883 | | | | | | | 884 | 1 | 1 | 1 | 10 | 10 | 10 | 885 | 2 | 1 | 0.5 | 1 | 1 | 1 | 886 ------------------------------------------ 887 S_CR = 11, TLO = 0 889 Now assume that the first flow updates its rate to 8, because the 890 total sending rate of 11 exceeds the total capacity. Let us take a 891 closer look at what happens in step 3 of the flow algorithm. 893 CC_R(1) = 8. new_DR(1) = infinity. 894 3 a) new_S_CR = 11; DELTA = 8 - 10 = -2. 895 3 b) FSE_R(1) = 8. DELTA is negative, hence S_CR = 9; 896 DR(1) = 8. 897 3 c) S_P = 1.5. 898 3 d) new sending rate Rate(1) = min(infinity, 1/1.5 * 9 + 0) = 6. 899 3 e) FSE_R(1) = 6. 901 The resulting FSE looks as follows: 902 ------------------------------------------- 903 | # | FGI | P | FSE_R | DR | Rate | 904 | | | | | | | 905 | 1 | 1 | 1 | 6 | 8 | 6 | 906 | 2 | 1 | 0.5 | 1 | 1 | 1 | 907 ------------------------------------------- 908 S_CR = 9, TLO = 0 910 The effect is that flow #1 is sending with 6 Mbit/s instead of the 8 911 Mbit/s that the congestion controller derived. Let us now assume 912 that flow #2 updates its rate. Its congestion controller detects 913 that the network is not fully saturated (the actual total sending 914 rate is 6+1=7) and increases its rate. 916 CC_R(2) = 2. new_DR(2) = infinity. 
917 3 a) new_S_CR = 7; DELTA = 2 - 1 = 1. 918 3 b) FSE_R(2) = 2. DELTA is positive, hence S_CR = 9 + 1 = 10; 919 DR(2) = 2. 920 3 c) S_P = 1.5. 921 3 d) Rate(2) = min(infinity, 0.5/1.5 * 10 + 0) = 3.33. 922 3 e) DR(2) = FSE_R(2) = 3.33. 924 The resulting FSE looks as follows: 925 ------------------------------------------- 926 | # | FGI | P | FSE_R | DR | Rate | 927 | | | | | | | 928 | 1 | 1 | 1 | 6 | 8 | 6 | 929 | 2 | 1 | 0.5 | 3.33 | 3.33 | 3.33 | 930 ------------------------------------------- 931 S_CR = 10, TLO = 0 933 The effect is that flow #2 is now sending with 3.33 Mbit/s, which is 934 close to half of the rate of flow #1 and leads to a total utilization 935 of 6(#1) + 3.33(#2) = 9.33 Mbit/s. Flow #2's congestion controller 936 has increased its rate faster than the controller actually expected. 937 Now, flow #1 updates its rate. Its congestion controller detects 938 that the network is not fully saturated and increases its rate. 939 Additionally, the application feeding into flow #1 limits the flow's 940 sending rate to at most 2 Mbit/s. 942 CC_R(1) = 7. new_DR(1) = 2. 943 3 a) new_S_CR = 9.33; DELTA = 1. 944 3 b) FSE_R(1) = 7, DELTA is positive, hence S_CR = 10 + 1 = 11; 945 DR(1) = min(2, 7) = 2. 946 3 c) S_P = 1.5; DR(1) < FSE_R(1), hence TLO = 1/1.5 * 11 - 2 = 5.33. 947 3 d) Rate(1) = min(2, 1/1.5 * 11 + 5.33) = 2. 948 3 e) FSE_R(1) = 2. 950 The resulting FSE looks as follows: 951 ------------------------------------------- 952 | # | FGI | P | FSE_R | DR | Rate | 953 | | | | | | | 954 | 1 | 1 | 1 | 2 | 2 | 2 | 955 | 2 | 1 | 0.5 | 3.33 | 3.33 | 3.33 | 956 ------------------------------------------- 957 S_CR = 11, TLO = 5.33 958 Now, the total rate of the two flows is 2 + 3.33 = 5.33 Mbit/s, i.e. 959 the network is significantly underutilized due to the limitation of 960 flow #1. Flow #2 updates its rate. Its congestion controller 961 detects that the network is not fully saturated and increases its 962 rate. 964 CC_R(2) = 4.33. new_DR(2) = infinity. 965 3 a) new_S_CR = 5.33; DELTA = 1. 966 3 b) FSE_R(2) = 4.33. DELTA is positive, hence S_CR = 12; 967 DR(2) = 4.33. 968 3 c) S_P = 1.5. 969 3 d) Rate(2) = min(infinity, 0.5/1.5 * 12 + 5.33 ) = 9.33. 970 3 e) FSE_R(2) = 9.33, DR(2) = 9.33. 972 The resulting FSE looks as follows: 973 ------------------------------------------- 974 | # | FGI | P | FSE_R | DR | Rate | 975 | | | | | | | 976 | 1 | 1 | 1 | 2 | 2 | 2 | 977 | 2 | 1 | 0.5 | 9.33 | 9.33 | 9.33 | 978 ------------------------------------------- 979 S_CR = 12, TLO = 0 981 Now, the total rate of the two flows is 2 + 9.33 = 11.33 Mbit/s. 982 Finally, flow #1 terminates. It sets P(1) to -1 and DR(1) to 0. Let 983 us assume that it terminated late enough for flow #2 to still 984 experience the network in a congested state, i.e. flow #2 decreases 985 its rate in the next iteration. 987 CC_R(2) = 7.33. new_DR(2) = infinity. 988 3 a) new_S_CR = 11.33; DELTA = -2. 989 3 b) FSE_R(2) = 7.33. DELTA is negative, hence S_CR = 9.33; 990 DR(2) = 7.33. 991 3 c) Flow 1 has P(1) = -1, hence it is deleted from the FSE. 992 S_P = 0.5. 993 3 d) Rate(2) = min(infinity, 0.5/0.5*9.33 + 0) = 9.33. 994 3 e) FSE_R(2) = DR(2) = 9.33. 996 The resulting FSE looks as follows: 997 ------------------------------------------- 998 | # | FGI | P | FSE_R | DR | Rate | 999 | | | | | | | 1000 | 2 | 1 | 0.5 | 9.33 | 9.33 | 9.33 | 1001 ------------------------------------------- 1002 S_CR = 9.33, TLO = 0 1004 Appendix D. Change log 1006 D.1. draft-welzl-rmcat-coupled-cc 1008 D.1.1. 
Changes from -00 to -01 1010 o Added change log. 1012 o Updated the example algorithm and its operation. 1014 D.1.2. Changes from -01 to -02 1016 o Included an active version of the algorithm which is simpler. 1018 o Replaced "greedy flow" with "bulk data transfer" and "non-greedy" 1019 with "application-limited". 1021 o Updated new_CR to CC_R, and CR to FSE_R for better understanding. 1023 D.1.3. Changes from -02 to -03 1025 o Included an active conservative version of the algorithm which 1026 reduces queue growth and packet loss; added a reference to a 1027 technical report that shows these benefits with simulations. 1029 o Moved the passive variant of the algorithm to appendix. 1031 D.1.4. Changes from -03 to -04 1033 o Extended SBD section. 1035 o Added a note about window-based controllers. 1037 D.1.5. Changes from -04 to -05 1039 o Added a section about applying the FSE to specific congestion 1040 control algorithms, with a subsection specifying its use with 1041 NADA. 1043 D.2. draft-ietf-rmcat-coupled-cc 1045 D.2.1. Changes from draft-welzl-rmcat-coupled-cc-05 1047 o Moved scheduling section to the appendix. 1049 D.2.2. Changes from -00 to -01 1051 o Included how to apply the algorithm to GCC. 1053 o Updated variable names of NADA to be in line with the latest 1054 version. 1056 o Added a reference to [I-D.ietf-rtcweb-transports] to make a 1057 connection to the prioritization text there. 1059 D.2.3. Changes from -01 to -02 1061 o Minor changes. 1063 o Moved references of NADA and GCC from informative to normative. 1065 o Added a reference for the passive variant of the algorithm. 1067 D.2.4. Changes from -02 to -03 1069 o Minor changes. 1071 o Added a section about expected feedback from experiments. 1073 D.2.5. Changes from -03 to -04 1075 o Described the names of variables used in the algorithms. 1077 o Added a diagram to illustrate the interaction between flows and 1078 the FSE. 1080 o Added text on the trade-off of using the configuration based 1081 approach. 1083 o Minor changes to enhance the readability. 1085 D.2.6. Changes from -04 to -05 1087 o Changed several occurrences of "NADA and GCC" to "NADA", including 1088 the abstract. 1090 o Moved the application to GCC to an appendix, and made the GCC 1091 reference informative. 1093 o Provided a few more general recommendations on applying the 1094 coupling algorithm. 1096 D.2.7. Changes from -05 to -06 1098 o Incorporated comments by Colin Perkins. 1100 D.2.8. Changes from -06 to -07 1102 o Addressed OPSDIR, SECDIR, GENART, AD and IESG comments. 1104 Authors' Addresses 1106 Safiqul Islam 1107 University of Oslo 1108 PO Box 1080 Blindern 1109 Oslo N-0316 1110 Norway 1112 Phone: +47 22 84 08 37 1113 Email: safiquli@ifi.uio.no 1115 Michael Welzl 1116 University of Oslo 1117 PO Box 1080 Blindern 1118 Oslo N-0316 1119 Norway 1121 Phone: +47 22 85 24 20 1122 Email: michawe@ifi.uio.no 1123 Stein Gjessing 1124 University of Oslo 1125 PO Box 1080 Blindern 1126 Oslo N-0316 1127 Norway 1129 Phone: +47 22 85 24 44 1130 Email: steing@ifi.uio.no