Network Working Group                                             X. Zhu
Internet Draft                                                    R. Pan
Intended Status: Informational                              Cisco Systems
Expires: September 14, 2014                               March 13, 2014


     NADA: A Unified Congestion Control Scheme for Real-Time Media
                         draft-zhu-rmcat-nada-03


Abstract

   This document describes a scheme named network-assisted dynamic
   adaptation (NADA), a novel congestion control approach for
   interactive real-time media applications, such as video
   conferencing.  In the proposed scheme, the sender regulates its
   sending rate based on either implicit or explicit congestion
   signaling, in a unified approach.  The scheme can benefit from
   explicit congestion notification (ECN) markings from network nodes.
   It also maintains consistent sender behavior in the absence of such
   markings, by reacting to queuing delays and packet losses instead.

   We present here the overall system architecture, recommended
   behaviors at the sender and the receiver, as well as expected
   network node operations.  Results from extensive simulation studies
   of the proposed scheme are available upon request.

Status of this Memo

   This Internet-Draft is submitted to IETF in full conformance with
   the provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups.  Note that
   other groups may also distribute working documents as
   Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six
   months and may be updated, replaced, or obsoleted by other documents
   at any time.  It is inappropriate to use Internet-Drafts as
   reference material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/1id-abstracts.html

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html

Copyright and License Notice

   Copyright (c) 2014 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with
   respect to this document.  Code Components extracted from this
   document must include Simplified BSD License text as described in
   Section 4.e of the Trust Legal Provisions and are provided without
   warranty as described in the Simplified BSD License.

Table of Contents

   1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . .  3
   2. Terminology  . . . . . . . . . . . . . . . . . . . . . . . . .  3
   3. System Model . . . . . . . . . . . . . . . . . . . . . . . . .  3
   4. Network Node Operations  . . . . . . . . . . . . . . . . . . .  4
      4.1 Default behavior of drop tail  . . . . . . . . . . . . . .  4
      4.2 ECN marking  . . . . . . . . . . . . . . . . . . . . . . .  4
      4.3 PCN marking  . . . . . . . . . . . . . . . . . . . . . . .  5
      4.4 Comments and Discussions . . . . . . . . . . . . . . . . .  6
   5. Receiver Behavior  . . . . . . . . . . . . . . . . . . . . . .  6
      5.1 Monitoring per-packet statistics . . . . . . . . . . . . .  6
      5.2 Calculating time-smoothed values . . . . . . . . . . . . .  7
      5.3 Sending periodic feedback  . . . . . . . . . . . . . . . .  7
      5.4 Discussions on one-way delay measurements  . . . . . . . .  7
   6. Sender Behavior  . . . . . . . . . . . . . . . . . . . . . . .  8
      6.1 Video encoder rate control . . . . . . . . . . . . . . . .  9
      6.2 Rate shaping buffer  . . . . . . . . . . . . . . . . . . .  9
      6.3 Reference rate calculator  . . . . . . . . . . . . . . . .  9
      6.4 Video target rate and sending rate calculator  . . . . . . 11
      6.5 Slow-start behavior  . . . . . . . . . . . . . . . . . . . 11
   7. Incremental Deployment . . . . . . . . . . . . . . . . . . . . 12
   8. Implementation Status  . . . . . . . . . . . . . . . . . . . . 12
   9. IANA Considerations  . . . . . . . . . . . . . . . . . . . . . 12
   10. References  . . . . . . . . . . . . . . . . . . . . . . . . . 12
      10.1 Normative References  . . . . . . . . . . . . . . . . . . 12
      10.2 Informative References  . . . . . . . . . . . . . . . . . 12
   Authors' Addresses  . . . . . . . . . . . . . . . . . . . . . . . 13

1. Introduction

   Interactive real-time media applications introduce a unique set of
   challenges for congestion control.  Unlike TCP, the mechanism used
   for real-time media needs to adapt quickly to instantaneous
   bandwidth changes, accommodate fluctuations in the output of video
   encoder rate control, and keep queuing delay over the network low.
   An ideal scheme should also make effective use of all types of
   congestion signals, including packet losses, queuing delay, and
   explicit congestion notification (ECN) markings.

   Based on the above considerations, we present a scheme named
   network-assisted dynamic adaptation (NADA).  The proposed design
   benefits from explicit congestion control signals (e.g., ECN
   markings) from the network, and remains fully functional when only
   implicit signals (delay or loss) are available.  In addition, it
   supports weighted bandwidth sharing among competing video flows.

   This document describes the overall system architecture,
   recommended designs at the sender and the receiver, as well as
   expected network node operations.  The signaling mechanism consists
   of standard RTP timestamps [RFC3550] and standard RTCP feedback
   reports.

2. Terminology

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in RFC 2119 [RFC2119].

3. System Model

   The system consists of the following elements:

   * Incoming media stream, in the form of consecutive raw video
     frames and audio samples;

   * Media encoder with rate control capabilities.  It takes the
     incoming media stream and encodes it to an RTP stream at a target
     bit rate R_v.  Note that the actual output rate from the encoder
     R_o may fluctuate randomly around the target R_v.  Also, the
     encoder can only change its rate at rather coarse time intervals,
     e.g., once every 0.5 seconds.

   * RTP sender, responsible for calculating the target bit rate R_n
     based on network congestion signals (delay or ECN marking reports
     from the receiver), and for regulating the actual sending rate
     R_s accordingly.  A rate shaping buffer is employed to absorb the
     instantaneous difference between the video encoder output rate
     R_v and the sending rate R_s.  The buffer size L_s, together with
     R_n, influences the calculation of the actual sending rate R_s
     and the video encoder target rate R_v.  The RTP sender also
     generates RTP timestamps in outgoing packets.

   * RTP receiver, responsible for measuring and estimating the end-
     to-end delay d based on the sender RTP timestamps.  In the
     presence of packet losses and ECN markings, it also records the
     individual loss and marking events, and calculates the equivalent
     delay d_tilde that accounts for queuing delay, ECN marking, and
     packet losses.  The receiver feeds such statistics back to the
     sender via periodic RTCP reports.

   * Network node, with several modes of operation.  The system can
     work with the default behavior of a simple drop tail queue.  It
     can also benefit from advanced AQM features such as RED-based ECN
     marking, and PCN marking using a token bucket algorithm.

   In the following, we will elaborate on the respective operations at
   the network node, the receiver, and the sender.

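   As an informal, non-normative illustration of the quantities
   introduced above, the following Python sketch collects the
   per-endpoint state implied by this system model.  All names are
   hypothetical and chosen only for readability; they are not part of
   the protocol.

      # Illustrative only: per-endpoint state implied by the NADA
      # system model.  All names are hypothetical.

      from dataclasses import dataclass

      @dataclass
      class SenderState:
          r_n: float = 0.0   # reference rate from congestion feedback
          r_v: float = 0.0   # target rate handed to the video encoder
          r_o: float = 0.0   # actual encoder output rate
          r_s: float = 0.0   # regulated sending rate on the wire
          l_s: float = 0.0   # rate shaping buffer occupancy

      @dataclass
      class ReceiverState:
          d_n: float = 0.0      # estimated one-way delay, latest packet
          d_tilde: float = 0.0  # equivalent delay (delay+marking+loss)
          x_n: float = 0.0      # smoothed congestion signal for RTCP
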
4. Network Node Operations

   We consider three variations of queue management behavior at the
   network node, leading to either implicit or explicit congestion
   signals.

4.1 Default behavior of drop tail

   In a conventional network with drop tail or RED queues, congestion
   is inferred from the estimation of end-to-end delay.  No special
   action is required at the network node.

   Packet drops at the queue are detected at the receiver, and
   contribute to the calculation of the equivalent delay d_tilde.

4.2 ECN marking

   In this mode, the network node randomly marks the ECN field in the
   IP packet header following the Random Early Detection (RED)
   algorithm [RFC2309].  Calculation of the marking probability
   involves the following steps:

   * upon packet arrival, update the smoothed queue size q_avg as:

        q_avg = alpha*q + (1-alpha)*q_avg.

     The smoothing parameter alpha is a value between 0 and 1.  A
     value of alpha=1 corresponds to performing no smoothing at all.

   * calculate the marking probability p as:

        p = 0,                              if q_avg < q_lo;

                     q_avg - q_lo
        p = p_max * --------------,         if q_lo <= q_avg < q_hi;
                     q_hi - q_lo

        p = 1,                              if q_avg >= q_hi.

   Here, q_lo and q_hi correspond to the low and high thresholds of
   queue occupancy.  The maximum marking probability is p_max.

   The ECN marking events will contribute to the calculation of an
   equivalent delay d_tilde at the receiver.  No changes are required
   at the sender.

4.3 PCN marking

   As a more advanced feature, we also envision network nodes which
   support PCN marking based on virtual queues.  In such a case, the
   marking probability of the ECN bit in the IP packet header is
   calculated as follows:

   * upon packet arrival, meter the packet against the token bucket
     (r, b);

   * update the token level b_tk;

   * calculate the marking probability as:

        p = 0,                              if b-b_tk < b_lo;

                     b-b_tk-b_lo
        p = p_max * --------------,         if b_lo <= b-b_tk < b_hi;
                      b_hi-b_lo

        p = 1,                              if b-b_tk >= b_hi.

   Here, the token bucket lower and upper limits are denoted by b_lo
   and b_hi, respectively.  The parameter b indicates the size of the
   token bucket.  The parameter r is chosen as r = gamma*C, where
   gamma < 1 is the target utilization ratio and C designates the link
   capacity.  The maximum marking probability is p_max.

   The ECN marking events will contribute to the calculation of an
   equivalent delay d_tilde at the receiver.  No changes are required
   at the sender.  The virtual queuing mechanism from the PCN marking
   algorithm will lead to additional benefits such as zero standing
   queues.

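   As a non-normative illustration, the following Python sketch
   computes the marking probabilities of Sections 4.2 and 4.3 directly
   from the formulas above.  The parameter values (alpha, q_lo, q_hi,
   b_lo, b_hi, p_max) are placeholders chosen for the example only,
   not recommendations.

      # Illustrative sketch of the marking rules in Sections 4.2/4.3.
      # Parameter values are placeholders, not recommendations.

      def red_ecn_marking_prob(q, q_avg, alpha=0.1,
                               q_lo=5000, q_hi=50000, p_max=0.1):
          """RED-based ECN marking (Section 4.2); returns (p, q_avg)."""
          q_avg = alpha*q + (1 - alpha)*q_avg   # smoothed queue size
          if q_avg < q_lo:
              p = 0.0
          elif q_avg < q_hi:
              p = p_max * (q_avg - q_lo) / (q_hi - q_lo)
          else:
              p = 1.0
          return p, q_avg

      def pcn_marking_prob(b_tk, b=10000, b_lo=1000, b_hi=9000,
                           p_max=0.1):
          """PCN marking from token bucket metering (Section 4.3)."""
          vq = b - b_tk              # b - b_tk acts as a virtual queue
          if vq < b_lo:
              return 0.0
          if vq < b_hi:
              return p_max * (vq - b_lo) / (b_hi - b_lo)
          return 1.0
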
4.4 Comments and Discussions

   In all three flavors described above, the network queue operates on
   the simple first-in, first-out (FIFO) principle.  There is no need
   to maintain per-flow state.  Such a simple design ensures that the
   system can scale easily with a large number of video flows and high
   link capacity.

   The sender behavior stays the same in the presence of all types of
   congestion signals: delay, loss, and ECN markings generated by
   either the RED/ECN or the PCN algorithm.  This unified approach
   allows a graceful transition of the scheme as the level of
   congestion in the network shifts dynamically between different
   regimes.

5. Receiver Behavior

   The role of the receiver is fairly straightforward.  It is in
   charge of four steps: a) monitoring end-to-end delay/loss/marking
   statistics on a per-packet basis; b) aggregating all forms of
   congestion signals in terms of the equivalent delay; c) calculating
   the time-smoothed value of the congestion signal; and d) sending
   periodic reports back to the sender.

5.1 Monitoring per-packet statistics

   The receiver observes and estimates the one-way delay d_n for the
   n-th packet, the ECN marking event 1_M, and the packet loss event
   1_L.  Here, 1_M and 1_L are binary indicators: a value of 1
   corresponds to a marked or lost packet, and a value of 0 indicates
   no marking or loss.

   The equivalent delay d_tilde is calculated as follows:

        d_tilde = d_n + 1_M*d_M + 1_L*d_L,

   where d_M is a prescribed fictitious delay value corresponding to
   the ECN marking event (e.g., d_M = 200 ms), and d_L is a prescribed
   fictitious delay value corresponding to the packet loss event
   (e.g., d_L = 1 second).  By introducing a large fictitious delay
   penalty for ECN marking and packet losses, the proposed scheme
   leads to low actual end-to-end delays in the presence of such
   events.

   While the values of d_M and d_L are fixed and predetermined in our
   current design, we also plan to investigate a scheme for
   automatically tuning these values based on the desired bandwidth
   sharing behavior in the presence of other competing loss-based
   flows (e.g., loss-based TCP).

5.2 Calculating time-smoothed values

   The receiver smoothes its observations via exponential averaging:

        x_n = alpha*d_tilde + (1-alpha)*x_n-1.

   The weighting parameter alpha adjusts the level of smoothing.

5.3 Sending periodic feedback

   Periodically, the receiver sends back the updated value of x in
   RTCP messages, to aid the sender in its calculation of the target
   rate.  The size of the acknowledgement packets is typically on the
   order of tens of bytes, significantly smaller than average video
   packet sizes.  Therefore, the bandwidth overhead of the receiver
   acknowledgement stream is sufficiently low.

5.4 Discussions on one-way delay measurements

   At the current stage, our proposed scheme relies on one-way delay
   (OWD) as the primary form of congestion indication.  This
   implicitly relies on well-synchronized sender and receiver clocks,
   e.g., due to the presence of an auxiliary clock synchronization
   process.  For deployment in the open Internet, however, this
   assumption may not always hold.

   There are several ways to get around the clock synchronization
   issue by slightly tweaking the current design.  One option is to
   work with relative OWD instead, by maintaining the minimum value of
   observed OWD over a longer time horizon and subtracting it from the
   observed absolute OWD value.  Such an approach cancels out the
   fixed clock difference between the sender and receiver clocks, and
   has been widely adopted by other delay-based congestion control
   approaches such as LEDBAT [RFC6817].  As discussed in [RFC6817],
   the time horizon for tracking the minimum OWD needs to be chosen
   with care: long enough for an opportunity to observe the minimum
   OWD with zero queuing delay along the path, and sufficiently short
   to promptly reflect "true" changes in minimum OWD introduced by
   route changes and other rare events.

   Alternatively, one could move the per-packet statistical handling
   to the sender instead, and use RTT in lieu of OWD, assuming that
   per-packet ACKs are present.  The main drawback of this latter
   approach is that the scheme can be confused by congestion in the
   reverse direction.

   Note that either approach involves no change in the proposed rate
   adaptation algorithm at the sender.  Therefore, comparing the pros
   and cons of which delay metric to use can be kept as an orthogonal
   direction of investigation.

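   As an informal illustration of the receiver-side processing in
   Sections 5.1 and 5.2, the sketch below folds per-packet
   observations into the equivalent delay and its time-smoothed value.
   The d_M and d_L values are the examples given in the text; the
   smoothing weight alpha is an arbitrary placeholder.

      # Illustrative receiver aggregation (Sections 5.1 and 5.2).

      D_M = 0.2    # fictitious delay per ECN-marked packet (200 ms)
      D_L = 1.0    # fictitious delay per lost packet (1 second)

      def equivalent_delay(d_n, marked, lost):
          """d_tilde = d_n + 1_M*d_M + 1_L*d_L  (Section 5.1)."""
          return d_n + (D_M if marked else 0.0) + (D_L if lost else 0.0)

      def smooth(x_prev, d_tilde, alpha=0.1):
          """Exponential averaging of the congestion signal (5.2)."""
          return alpha*d_tilde + (1 - alpha)*x_prev

   The smoothed value x_n would then be carried back to the sender in
   the periodic RTCP reports described in Section 5.3.
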
6. Sender Behavior

          --------------------
         |                    |
         |   Reference Rate   | <---------------- RTCP report
         |     Calculator     |
         |                    |
          --------------------
                    |
                    | R_n
                    |
          --------------------------
         |                          |
         |                          |
        \ /                        \ /
    --------------------     -----------------
   |                    |   |                 |
   |    Video Target    |   |  Sending Rate   |
   |   Rate Calculator  |   |   Calculator    |
   |                    |   |                 |
    --------------------     -----------------
        |        /|\             /|\      |
    R_v |         |               |       |
        |          ---------------        |
        |                 |               | R_s
    ------------          | L_s           |
   |            |         |               |
   |            |  R_o    --------------  \|/
   |  Encoder   |-------> | | | | |       ----------------->
   |            |         | | | | |         video packets
    ------------           --------------
                          Rate Shaping Buffer

                 Figure 1  NADA Sender Structure

   Figure 1 provides a more detailed view of the NADA sender.  Upon
   receipt of an RTCP report from the receiver, the NADA sender
   updates its calculation of the reference rate R_n as a function of
   the network congestion signal.  It further adjusts both the target
   rate for the live video encoder R_v and the sending rate R_s over
   the network based on the updated value of R_n, as well as the size
   of the rate shaping buffer.

   The following sections describe these modules in further detail,
   and explain how they interact with each other.

6.1 Video encoder rate control

   The video encoder rate control procedure has the following
   characteristics:

   * Rate changes can happen only at large intervals, on the order of
     seconds.

   * Given a target rate R_v, the encoder output rate R_o may randomly
     fluctuate around it.

   * The encoder output rate is further constrained by video content
     complexity.  The range of the final rate output is
     [R_min, R_max].  Note that this range is content-dependent and
     may change over time.

   Note that the operation of the live video encoder is out of the
   scope of the NADA congestion control design.  Instead, its behavior
   is treated as a black box.

6.2 Rate shaping buffer

   A rate shaping buffer is employed to absorb any instantaneous
   mismatch between the encoder output rate R_o and the regulated
   sending rate R_s.  The size of the buffer evolves from time t-tau
   to time t as:

        L_s(t) = max [0, L_s(t-tau) + R_v*tau - R_s*tau].

   A large rate shaping buffer contributes to higher end-to-end delay,
   which may harm the performance of real-time media communications.
   Therefore, the sender has a strong incentive to constrain the size
   of the shaping buffer.  It can either deplete it faster by
   increasing the sending rate R_s, or limit its growth by reducing
   the target rate for the video encoder rate control, R_v.

6.3 Reference rate calculator

   The sender calculates the reference rate R_n based on the network
   congestion information in the receiver RTCP reports.  It first
   compensates for the effect of the observation being delayed by one
   round-trip time (RTT) via a linear predictor:

                       x_n - x_n-1
        x_hat = x_n + ------------- * tau_o                   (1)
                          delta

   In (1), the arrival interval between the (n-1)-th and the n-th
   packets is designated by delta.  The parameter tau_o is
   pre-configured to a fixed value.  Typically, its value is
   comparable to the RTT experienced by the flow, but it does not need
   to be an exact match.  Throughout all our simulation evaluations
   (see [Zhu-PV13]), we have been using the same fixed value of
   tau_o = 200 ms.

   The reference rate is then calculated as:

                          R_max - R_min
        R_n = R_min + w * --------------- * x_ref             (2)
                              x_hat

   Here, R_min and R_max denote the content-dependent rate range the
   encoder can produce.  The weight of the priority level is w.  The
   reference congestion signal x_ref is chosen so that the maximum
   rate of R_max can be achieved when x_hat = w*x_ref.  The final
   target rate R_n is clipped within the range of [R_min, R_max].

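   A minimal sketch of the reference rate calculation in (1) and (2),
   including the clipping step described above, is given below.  The
   values of tau_o, w, and x_ref are illustrative assumptions only.

      # Illustrative reference rate calculation (Section 6.3).

      TAU_O = 0.2     # prediction horizon, comparable to RTT (200 ms)

      def reference_rate(x_n, x_prev, delta, r_min, r_max,
                         w=1.0, x_ref=0.02):
          """Return R_n from the two most recent feedback values."""
          # Eq. (1): compensate for one RTT of observation delay.
          x_hat = x_n + (x_n - x_prev)/delta * TAU_O
          x_hat = max(x_hat, 1e-6)    # guard for this sketch only
          # Eq. (2): map the predicted congestion signal to a rate.
          r_n = r_min + w * (r_max - r_min) / x_hat * x_ref
          # Clip to the encoder's feasible rate range.
          return min(max(r_n, r_min), r_max)
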
   The rationale for choosing x_ref to be the value of the absolute
   one-way delay (i.e., only the propagation delay along the path) is
   that, ideally, we would want the video stream to reach the highest
   possible rate when the queue is empty, e.g., when the bottleneck
   link rate exceeds the R_max of the video.  In practice, the stream
   can simply set x_ref to be the minimum value of OWD observed over a
   long time horizon.  Note also that the combination of w and x_ref
   determines how sensitive the rate adaptation scheme is in reaction
   to fluctuations in the observed signal x.

   The sender does not need any explicit knowledge of the queue
   management scheme inside the network.  Rather, it reacts to the
   aggregation of all forms of congestion indications (delay, loss,
   and marking) via the composite congestion signal x_n from the
   receiver in a coherent manner.

6.4 Video target rate and sending rate calculator

   The target rate for the live video encoder is updated based on both
   the reference rate R_n and the rate shaping buffer size L_s, as
   follows:

                              L_s
        R_v = R_n - beta_v * -------.                         (3)
                              tau_v

   Similarly, the outgoing rate is regulated based on both the
   reference rate R_n and the rate shaping buffer size L_s, such that:

                              L_s
        R_s = R_n + beta_s * -------.                         (4)
                              tau_v

   In (3) and (4), the first term indicates the rate calculated from
   network congestion feedback alone.  The second term indicates the
   influence of the rate shaping buffer.  A large rate shaping buffer
   nudges the encoder target rate slightly below -- and the sending
   rate slightly above -- the reference rate R_n.  Intuitively, the
   amount of extra rate offset needed to completely drain the rate
   shaping buffer within the same time frame as the encoder rate
   adaptation, tau_v, is given by L_s/tau_v.  The scaling parameters
   beta_v and beta_s can be tuned to balance the competing goals of
   maintaining a small rate shaping buffer and limiting the deviation
   of the system from the reference rate point.

6.5 Slow-start behavior

   Finally, special care needs to be taken during the startup phase of
   a video stream, since it may take several round-trip times before
   the sender can collect statistically robust information on network
   congestion.  We propose to regulate the reference rate R_n so that
   it grows at most linearly in the beginning, capped by R_ss at time
   t:

                          t - t_0
        R_ss(t) = R_min + --------- * (R_max - R_min).
                             T

   The start time of the stream is t_0, and T represents the time
   horizon over which the slow-start mechanism is effective.  The
   encoder target rate is chosen to be the minimum of R_n and R_ss
   during the first T seconds.

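   The following sketch ties together the rate shaping buffer update
   of Section 6.2, the rate adjustments (3)-(4) of Section 6.4, and
   the slow-start cap of Section 6.5.  It is a simplified,
   non-normative illustration; beta_v, beta_s, tau_v, and T are
   hypothetical placeholder values.

      # Illustrative sender-side rate adjustments (6.2, 6.4, 6.5).

      def update_shaping_buffer(l_s, r_v, r_s, tau):
          """Section 6.2: L_s(t) = max[0, L_s(t-tau)+(R_v-R_s)*tau]."""
          return max(0.0, l_s + (r_v - r_s)*tau)

      def video_and_sending_rates(r_n, l_s,
                                  beta_v=0.1, beta_s=0.1, tau_v=0.5):
          """Section 6.4, Eqs. (3)-(4)."""
          r_v = r_n - beta_v * l_s / tau_v   # encoder target, below R_n
          r_s = r_n + beta_s * l_s / tau_v   # sending rate, above R_n
          return r_v, r_s

      def slow_start_cap(t, t_0, r_min, r_max, T=10.0):
          """Section 6.5: linear ramp R_ss(t) over the first T seconds."""
          if t - t_0 >= T:
              return r_max
          return r_min + (t - t_0)/T * (r_max - r_min)

   During the first T seconds, the encoder target rate would then be
   taken as min(R_n, R_ss(t)), as described above.
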
7. Incremental Deployment

   One nice property of the proposed design is the consistent video
   endpoint behavior irrespective of variations in network node
   support.  This facilitates gradual, incremental adoption of the
   scheme.

   To start off with, the proposed congestion control mechanism can be
   implemented without any explicit support from the network, relying
   solely on observed one-way delay measurements and packet loss
   ratios as implicit congestion signals.

   When ECN is enabled at the network nodes with RED-based marking,
   the receiver can fold its observations of ECN markings into the
   calculation of the equivalent delay.  The sender can react to these
   explicit congestion signals without any modification.

   Ultimately, networks equipped with proactive marking based on token
   bucket level metering can reap the additional benefits of zero
   standing queues and lower end-to-end delay, and work seamlessly
   with existing senders and receivers.

8. Implementation Status

   The proposed NADA scheme has been implemented in the ns-2
   simulation platform [ns2].  Extensive simulation evaluations of the
   scheme are documented in [Zhu-PV13].

   The scheme has also been implemented in Linux.  An initial set of
   testbed evaluation results has focused on the case of a single NADA
   flow over a single low-delay bottleneck link.  More investigations
   are underway.

9. IANA Considerations

   There are no actions for IANA.

10. References

10.1 Normative References

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119, March 1997.

   [RFC3550]  Schulzrinne, H., Casner, S., Frederick, R., and V.
              Jacobson, "RTP: A Transport Protocol for Real-Time
              Applications", STD 64, RFC 3550, July 2003.

10.2 Informative References

   [RFC3168]  Ramakrishnan, K., Floyd, S., and D. Black, "The Addition
              of Explicit Congestion Notification (ECN) to IP",
              RFC 3168, September 2001.

   [RFC2309]  Braden, B., Clark, D., Crowcroft, J., Davie, B.,
              Deering, S., Estrin, D., Floyd, S., Jacobson, V.,
              Minshall, G., Partridge, C., Peterson, L., Ramakrishnan,
              K., Shenker, S., Wroclawski, J., and L. Zhang,
              "Recommendations on Queue Management and Congestion
              Avoidance in the Internet", RFC 2309, April 1998.

   [RFC6817]  Shalunov, S., Hazel, G., Iyengar, J., and M. Kuehlewind,
              "Low Extra Delay Background Transport (LEDBAT)",
              RFC 6817, December 2012.

   [ns2]      "The Network Simulator - ns-2",
              http://www.isi.edu/nsnam/ns/

   [Zhu-PV13] Zhu, X. and Pan, R., "NADA: A Unified Congestion Control
              Scheme for Low-Latency Interactive Video", in Proc. IEEE
              International Packet Video Workshop (PV'13), San Jose,
              CA, USA, December 2013.

Authors' Addresses

   Xiaoqing Zhu
   Cisco Systems
   510 McCarthy Blvd
   Milpitas, CA 95134, USA
   Email: xiaoqzhu@cisco.com

   Rong Pan
   Cisco Systems
   510 McCarthy Blvd
   Milpitas, CA 95134, USA
   Email: ropan@cisco.com