idnits 2.17.1 draft-hayes-rmcat-sbd-01.txt: Checking boilerplate required by RFC 5378 and the IETF Trust (see https://trustee.ietf.org/license-info): ---------------------------------------------------------------------------- No issues found here. Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt: ---------------------------------------------------------------------------- No issues found here. Checking nits according to https://www.ietf.org/id-info/checklist : ---------------------------------------------------------------------------- No issues found here. Miscellaneous warnings: ---------------------------------------------------------------------------- == The copyright year in the IETF Trust and authors Copyright Line does not match the current year -- The document date (October 27, 2014) is 3462 days in the past. Is this intentional? Checking references for intended status: Experimental ---------------------------------------------------------------------------- == Outdated reference: A later version (-05) exists of draft-welzl-rmcat-coupled-cc-04 Summary: 0 errors (**), 0 flaws (~~), 2 warnings (==), 1 comment (--). Run idnits with the --verbose option for more detailed information about the items above. -------------------------------------------------------------------------------- 2 RTP Media Congestion Avoidance D. Hayes, Ed. 3 Techniques University of Oslo 4 Internet-Draft S. Ferlin 5 Intended status: Experimental Simula Research Laboratory 6 Expires: April 30, 2015 M. Welzl 7 University of Oslo 8 October 27, 2014 10 Shared Bottleneck Detection for Coupled Congestion Control for RTP 11 Media. 12 draft-hayes-rmcat-sbd-01 14 Abstract 16 This document describes a mechanism to detect whether end-to-end data 17 flows share a common bottleneck. It relies on summary statistics 18 that are calculated by a data receiver based on continuous 19 measurements and regularly fed to a grouping algorithm that runs 20 wherever the knowledge is needed. This mechanism complements the 21 coupled congestion control mechanism in draft-welzl-rmcat-coupled-cc. 23 Status of this Memo 25 This Internet-Draft is submitted in full conformance with the 26 provisions of BCP 78 and BCP 79. 28 Internet-Drafts are working documents of the Internet Engineering 29 Task Force (IETF). Note that other groups may also distribute 30 working documents as Internet-Drafts. The list of current Internet- 31 Drafts is at http://datatracker.ietf.org/drafts/current/. 33 Internet-Drafts are draft documents valid for a maximum of six months 34 and may be updated, replaced, or obsoleted by other documents at any 35 time. It is inappropriate to use Internet-Drafts as reference 36 material or to cite them other than as "work in progress." 38 This Internet-Draft will expire on April 30, 2015. 40 Copyright Notice 42 Copyright (c) 2014 IETF Trust and the persons identified as the 43 document authors. All rights reserved. 45 This document is subject to BCP 78 and the IETF Trust's Legal 46 Provisions Relating to IETF Documents 47 (http://trustee.ietf.org/license-info) in effect on the date of 48 publication of this document. Please review these documents 49 carefully, as they describe your rights and restrictions with respect 50 to this document. Code Components extracted from this document must 51 include Simplified BSD License text as described in Section 4.e of 52 the Trust Legal Provisions and are provided without warranty as 53 described in the Simplified BSD License. 55 Table of Contents 57 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 3 58 1.1. The signals . . . . . . . . . . . . . . . . . . . . . . . 3 59 1.1.1. Packet Loss . . . . . . . . . . . . . . . . . . . . . 3 60 1.1.2. Packet Delay . . . . . . . . . . . . . . . . . . . . . 3 61 1.1.3. Path Lag . . . . . . . . . . . . . . . . . . . . . . . 4 62 2. Definitions . . . . . . . . . . . . . . . . . . . . . . . . . 4 63 2.1. Parameter Values . . . . . . . . . . . . . . . . . . . . . 5 64 3. Mechanism . . . . . . . . . . . . . . . . . . . . . . . . . . 5 65 3.1. Key metrics and their calculation . . . . . . . . . . . . 7 66 3.1.1. Mean delay . . . . . . . . . . . . . . . . . . . . . . 7 67 3.1.2. Skewness Estimate . . . . . . . . . . . . . . . . . . 7 68 3.1.3. Variance Estimate . . . . . . . . . . . . . . . . . . 8 69 3.1.4. Oscillation Estimate . . . . . . . . . . . . . . . . . 8 70 3.1.5. Packet loss . . . . . . . . . . . . . . . . . . . . . 9 71 3.2. Flow Grouping . . . . . . . . . . . . . . . . . . . . . . 9 72 3.2.1. Flow Grouping Algorithm . . . . . . . . . . . . . . . 9 73 3.2.2. Using the flow group signal . . . . . . . . . . . . . 10 74 4. Measuring OWD . . . . . . . . . . . . . . . . . . . . . . . . 11 75 4.1. Time stamp resolution . . . . . . . . . . . . . . . . . . 11 76 5. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . . 11 77 6. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 11 78 7. Security Considerations . . . . . . . . . . . . . . . . . . . 11 79 8. References . . . . . . . . . . . . . . . . . . . . . . . . . . 12 80 8.1. Normative References . . . . . . . . . . . . . . . . . . . 12 81 8.2. Informative References . . . . . . . . . . . . . . . . . . 12 82 Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . . 13 84 1. Introduction 86 In the Internet, it is not normally known if flows (e.g., TCP 87 connections or UDP data streams) traverse the same bottlenecks. Even 88 flows that have the same sender and receiver may take different paths 89 and share a bottleneck or not. Flows that share a bottleneck link 90 usually compete with one another for their share of the capacity. 91 This competition has the potential to increase packet loss and 92 delays. This is especially relevant for interactive applications 93 that communicate simultaneously with multiple peers (such as multi- 94 party video). For RTP media applications such as RTCWEB, 95 [I-D.welzl-rmcat-coupled-cc] describes a scheme that combines the 96 congestion controllers of flows in order to honor their priorities 97 and avoid unnecessary packet loss as well as delay. This mechanism 98 relies on some form of Shared Bottleneck Detection (SBD); here, a 99 measurement-based SBD approach is described. 101 1.1. The signals 103 The current Internet is unable to explicitly inform endpoints as to 104 which flows share bottlenecks, so endpoints need to infer this from 105 packet loss and packet delay. 107 1.1.1. Packet Loss 109 Packet loss is often a relatively rare signal. Therefore, on its own 110 it is of limited use for SBD, however, it is a valuable supplementary 111 measure when it is more prevalent. 113 1.1.2. Packet Delay 115 End-to-end delay measurements include noise from every device along 116 the path in addition to the delay perturbation at the bottleneck 117 device. The noise is often significantly increased if the round-trip 118 time is used. The cleanest signal is obtained by using One-Way-Delay 119 (OWD). 121 Measuring absolute OWD is difficult since it requires both the sender 122 and receiver clocks to be synchronised. However, since the 123 statistics being collected are relative to the mean OWD, a relative 124 OWD measurement is sufficient. Clock drift is not usually 125 significant over the time intervals used by this SBD mechanism (see 126 [RFC6817] A.2 for a discussion on clock drift and OWD measurements). 128 Each packet arriving at the bottleneck buffer may experience very 129 different queue lengths, and therefore different waiting times. A 130 single OWD sample does therefore not characterize the actual OWD of a 131 path well. However, multiple OWD measurements do reflect the 132 distribution of delays experienced at the bottleneck. 134 1.1.3. Path Lag 136 Flows that share a common bottleneck may traverse different paths, 137 and these paths will often have different base delays. This makes it 138 difficult to correlate changes in delay or loss. This technique uses 139 the long term shape of the delay distribution as a base for 140 comparison to counter this. 142 2. Definitions 144 The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", 145 "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this 146 document are to be interpreted as described in RFC 2119 [RFC2119]. 148 Acronyms used in this document: 150 OWD -- One Way Delay 152 RTT -- Round Trip Time 154 SBD -- Shared Bottleneck Detection 156 Conventions used in this document: 158 T -- the base time interval over which measurements are 159 made. 161 N -- the number of base time, T, intervals used in some 162 calculations. 164 sum_T(...) -- summation of all the measurements of the variable 165 in parentheses taken over the interval T 167 sum_N(...) -- summation of N terms of the variable in parentheses 169 sum_NT(...) -- summation of all measurements taken over the 170 interval N*T 172 E_T(...) -- the expectation or mean of the measurements of the 173 variable in parentheses over T 175 E_N(...) -- The expectation or mean of the last N values of the 176 variable in parentheses 178 max_T(...) -- the maximum recorded measurement of the variable in 179 parentheses taken over the interval T 181 min_T(...) -- the minimum recorded measurement of the variable in 182 parentheses taken over the interval T 184 num_T(...) -- the count of measurements of the variable in 185 parenthesis taken in the interval T 187 p_l, p_f, p_pdf, c_s, p_s, p_d, p_v -- various thresholds used in 188 the mechanism. 190 2.1. Parameter Values 192 Reference [Hayes-LCN14] uses T=350ms, N=50, p_l = 0.1, p_f = 0.2, 193 p_pdf = 0.3, c_s = 0.0, p_s = p_d = p_v = 0.2. These are values that 194 seem to work well over a wide range of practical Internet conditions. 196 3. Mechanism 198 The mechanism described in this document is based on the observation 199 that the distribution of delay measurements of packets from flows 200 that share a common bottleneck have similar shape characteristics. 201 These shape characteristics are described using 3 key summary 202 statistics: 204 variance (estimate PDV, see Section 3.1.3) 206 skewness (estimate skew_est, see Section 3.1.2) 208 oscillation (estimate freq_est, see Section 3.1.4) 210 Summary statistics help to address both the noise and the path lag 211 problems by describing the general shape over a relatively long 212 period of time. This is sufficient for their application in coupled 213 congestion control for RTP Media. They can be signalled from a 214 receiver, which measures the OWD and calculates the summary 215 statistics, to a sender, which is the entity that is transmitting the 216 media stream. An RTP Media device may be both a sender and a 217 receiver. SBD can be performed at either Sender or receiver or both. 219 +----+ 220 | H2 | 221 +----+ 222 | 223 | L2 224 | 225 +----+ L1 | L3 +----+ 226 | H1 |------|------| H3 | 227 +----+ +----+ 229 A network with 3 hosts (H1, H2, H3) and 3 links (L1, L2, L3). 231 Figure 1 233 In Figure 1, there are two possible cases for shared bottleneck 234 detection: a sender-based and a receiver-based case. 236 1. Sender-based: consider a situation where host H1 sends media 237 streams to hosts H2 and H3, and L1 is a shared bottleneck. H2 238 and H3 measure the OWD and calculate summary statistics, which 239 they send to H1 every T. H1, having this knowledge, can determine 240 the shared bottleneck and accordingly control the send rates. 242 2. Receiver-based: consider that H2 is also sending media to H3, and 243 L3 is a shared bottleneck. If H3 sends summary statistics to H1 244 and H2, neither H1 nor H2 alone obtain enough knowledge to detect 245 this shared bottleneck; H3 can however determine it by combining 246 the summary statistics related to H1 and H2, respectively. This 247 case is applicable when send rates are controlled by the 248 receiver; then, the signal from H3 to the senders contains the 249 sending rate. 251 A discussion of the required signaling for the receiver-based case is 252 beyond the scope of this document. For the sender-based case, the 253 messages and their data format will be defined here in future 254 versions of this document. We envision that an initialization 255 message from the sender to the receiver could specify which key 256 metrics are requested out of a possibly extensible set (PL_NT, PDV, 257 skew_est, freq_est). The grouping algorithm described in this 258 document requires all four of these metrics, and receivers MUST be 259 able to provide them, but future algorithms may be able to exploit 260 other metrics (e.g. metrics based on explicit network signals). 261 Moreover, the initialization message could specify T, N, and the 262 necessary resolution and precision (number of bits per field). 264 3.1. Key metrics and their calculation 266 Measurements are calculated over a base interval, T. T should be long 267 enough to provide enough samples for a good estimate of skewness, but 268 short enough so that a measure of the oscillation can be made from N 269 of these estimates. Reference [Hayes-LCN14] uses T = 350ms and N = 270 50, which are values that seem to work well over a wide range of 271 practical Internet conditions. 273 3.1.1. Mean delay 275 The mean delay is not a useful signal for comparisons between flows 276 since flows may traverse quite different paths and clocks will not 277 necessarily be synchronized. However, it is a base measure for the 3 278 summary statistics. The mean delay, E_T(OWD), is the average one way 279 delay measured over T. 281 To facilitate the other calculations, the last N E_T(OWD) values will 282 need to be stored in a cyclic buffer along with the moving average of 283 E_T(OWD): 285 E_N(E_T(OWD)) = sum_N(E_T(OWD)) / N 287 3.1.2. Skewness Estimate 289 Skewness is difficult to calculate efficiently and accurately. 290 Ideally it should be calculated over the entire period (N * T) from 291 the mean OWD over that period. However this would require storing 292 every delay measurement over the period. Instead, an estimate is 293 made over T using the previous calculation of E_T(OWD). Comparisons 294 are made using the mean of N skew estimates. 296 The skewness is estimated using two counters, counting the number of 297 one way delay samples (OWD) above and below the mean: 299 skew_est = (sum_T(OWD < E_NT(OWD)) - sum_T(OWD > E_NT(OWD))) / 300 num_T(OWD) 302 where 304 if (OWD < E_NT(OWD)) 1 else 0 306 if (OWD > E_NT(OWD)) 1 else 0 308 skew_est is a number between -1 and 1 310 E_N(skew_est) = sum_N(skew_est) / N 312 For implementation ease, E_NT(OWD) does not include the mean of the 313 current T interval. Care must be taken when implementing the 314 comparisons to ensure that rounding does not bias skew_est. 316 3.1.3. Variance Estimate 318 Packet Delay Variation (PDV) ([RFC5481] and [ITU-Y1540]) is used as 319 an estimator of the variance of the delay signal. We define PDV as 320 follows: 322 PDV = (max_T(OWD) - E_T(OWD)) 324 E_N(PDV) = sum_N(PDV) / N 326 This modifies PDV as outlined in [RFC5481] to provide a summary 327 statistic version that best aids the grouping decisions of the 328 algorithm (see [Hayes-LCN14] section IVB). 330 The use of PDV = (min_T(OWD) - E_T(OWD)) is currently being 331 investigated as an alternative that is less sensitive to noise. 333 3.1.4. Oscillation Estimate 335 An estimate of the low frequency oscillation of the delay signal is 336 calculated by counting and normalising the significant mean, 337 E_T(OWD), crossings of E_N(E_T(OWD)): 339 freq_est = number_of_crossings / N 341 Where 343 we define a significant mean crossing as a crossing that 344 extends p_v * E_N(PDV) from E_N(E_T(OWD)). In our experiments 345 we have found that p_v = 0.2 is a good value. 347 Freq_est is a number between 0 and 1. Freq_est can be approximated 348 incrementally as follows: 350 With each new calculation of E_T(OWD) a decision is made as to 351 whether this value of E_T(OWD) significantly crosses the current 352 long term mean, E_N(E_T(OWD), with respect to the previous 353 significant mean crossing. 355 A cyclic buffer, last_N_crossings, records a 1 if there is a 356 significant mean crossing, otherwise a 0. 358 The counter, number_of_crossings, is incremented when there is a 359 significant mean crossing and subtracted from when a non zero 360 value is removed from the last_N_crossings. 362 This approximation of freq_est was not used in [Hayes-LCN14], which 363 calculated freq_est every T using the current E_N(E_T(OWD)). Our 364 tests show that this approximation of freq_est yields results that 365 are almost identical to when the full calculation is performed every 366 T. 368 3.1.5. Packet loss 370 The proportion of packets lost is used as a supplementary measure: 372 PL_NT = sum_NT(lost packets) / sum_NT(total packets) 374 3.2. Flow Grouping 376 3.2.1. Flow Grouping Algorithm 378 The following grouping algorithm is RECOMMENDED for SBD in this 379 context and is sufficient and efficient for small to moderate numbers 380 of flows. For very large numbers of flows (e.g. hundreds), a more 381 complex clustering algorithm may be substituted. 383 Since no single metric is precise enough to group flows (due to 384 noise), the algorithm uses multiple metrics. Each metric offers a 385 different "view" of the bottleneck link characteristics, and used 386 together enable a more precise grouping of flows than would otherwise 387 be possible. 389 Flows determined to be experiencing congestion are successively 390 divided into groups based on freq_est, PDV, and skew_est. 392 The first step is to determine which flows are experiencing 393 congestion. This is important, since if a flow is not experiencing 394 congestion its delay based metrics will not describe the bottleneck, 395 but the "noise" from the rest of the path. Skewness, with proportion 396 of packets loss as a supplementary measure, is used to do this: 398 1. Grouping will be performed on flows where: 400 E_N(skew_est) < c_s || PL_NT > p_l. 402 The parameter c_s controls how sensitive the mechanism is in 403 detecting congestion. C_s = 0.0 was used in [Hayes-LCN14]. A value 404 of c_s = 0.05 is a little more sensitive, and c_s = -0.05 is a little 405 less sensitive. 407 These flows, flows experiencing congestion, are then progressively 408 divided into groups based on the freq_est, PDV, and skew_est summary 409 statistics. The process proceeds according to the following steps: 411 2. Group flows whose difference in sorted freq_est is less than a 412 threshold: 414 diff(freq_est) < p_f 416 3. Group flows whose difference in sorted E_N(PDV) is less than a 417 threshold: 419 diff(E_N(PDV)) < (p_pdv * E_N(PDV)) 421 4. Group flows whose difference in sorted E_N(skew_est) or PL_NT is 422 less than a threshold: 424 if PL_NT < p_l 426 diff(E_N(skewness)) < p_s 428 otherwise 430 diff(PL_NT) < p_d 432 This procedure involves sorting the groups, according to the measure 433 being used to divide them. It is simple to implement, and efficient 434 for small numbers of flows, such as are expected in RTCWEB. 436 3.2.2. Using the flow group signal 438 A grouping decisions is made every T from the second T, though they 439 will not attain their full design accuracy until after the N'th T 440 interval. 442 Network conditions can cause bottlenecks to fluctuate. A coupled 443 congestion controller MAY decide only to couple groups that remain 444 stable, say grouped together 90% of the time, depending on its 445 objectives. Recommendations concerning this are beyond the scope of 446 this draft and will be specific to the coupled congestion controllers 447 objectives. 449 4. Measuring OWD 451 This section discusses the OWD measurements required for this 452 algorithm to detect shared bottlenecks. 454 The SBD mechanism described in this draft relies on differences 455 between OWD measurements to avoid the practical problems with 456 measuring absolute OWD (see [Hayes-LCN14] section IIIC). Since all 457 summary statistics are relative to the mean OWD and sender/receiver 458 clock offsets are approximately constant over the measurement 459 periods, the offset is subtracted out in the calculation. 461 4.1. Time stamp resolution 463 The SBD mechanism requires timing information precise enough to be 464 able to make comparisons. As a rule of thumb, the time resolution 465 should be less than one hundredth of a typical path's range of 466 delays. In general, the lower the time resolution, the more care 467 that needs to be taken to ensure rounding errors don't bias the 468 skewness calculation. 470 Typical RTP media flows use sub-millisecond timers, which should be 471 adequate in most situations. 473 5. Acknowledgements 475 This work was part-funded by the European Community under its Seventh 476 Framework Programme through the Reducing Internet Transport Latency 477 (RITE) project (ICT-317700). The views expressed are solely those of 478 the authors. 480 6. IANA Considerations 482 This memo includes no request to IANA. 484 7. Security Considerations 486 The security considerations of RFC 3550 [RFC3550], RFC 4585 487 [RFC4585], and RFC 5124 [RFC5124] are expected to apply. 489 Non-authenticated RTCP packets carrying shared bottleneck indications 490 and summary statistics could allow attackers to alter the bottleneck 491 sharing characteristics for private gain or disruption of other 492 parties communication. 494 8. References 496 8.1. Normative References 498 [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate 499 Requirement Levels", BCP 14, RFC 2119, March 1997. 501 8.2. Informative References 503 [Hayes-LCN14] 504 Hayes, D., Ferlin, S., and M. Welzl, "Practical Passive 505 Shared Bottleneck Detection using Shape Summary 506 Statistics", Proc. the IEEE Local Computer Networks 507 (LCN) p150-158, September 2014, . 511 [I-D.welzl-rmcat-coupled-cc] 512 Welzl, M., Islam, S., and S. Gjessing, "Coupled congestion 513 control for RTP media", draft-welzl-rmcat-coupled-cc-04 514 (work in progress), October 2014. 516 [ITU-Y1540] 517 ITU-T, "Internet protocol data communication service - IP 518 packet transfer and availability performance parameters", 519 Series Y: Global Information Infrastructure, Internet 520 Protocol Aspects and Next-Generation Networks , 521 March 2011, 522 . 524 [RFC3550] Schulzrinne, H., Casner, S., Frederick, R., and V. 525 Jacobson, "RTP: A Transport Protocol for Real-Time 526 Applications", STD 64, RFC 3550, July 2003. 528 [RFC4585] Ott, J., Wenger, S., Sato, N., Burmeister, C., and J. Rey, 529 "Extended RTP Profile for Real-time Transport Control 530 Protocol (RTCP)-Based Feedback (RTP/AVPF)", RFC 4585, 531 July 2006. 533 [RFC5124] Ott, J. and E. Carrara, "Extended Secure RTP Profile for 534 Real-time Transport Control Protocol (RTCP)-Based Feedback 535 (RTP/SAVPF)", RFC 5124, February 2008. 537 [RFC5481] Morton, A. and B. Claise, "Packet Delay Variation 538 Applicability Statement", RFC 5481, March 2009. 540 [RFC6817] Shalunov, S., Hazel, G., Iyengar, J., and M. Kuehlewind, 541 "Low Extra Delay Background Transport (LEDBAT)", RFC 6817, 542 December 2012. 544 Authors' Addresses 546 David Hayes (editor) 547 University of Oslo 548 PO Box 1080 Blindern 549 Oslo, N-0316 550 Norway 552 Phone: +47 2284 5566 553 Email: davihay@ifi.uio.no 555 Simone Ferlin 556 Simula Research Laboratory 557 P.O.Box 134 558 Lysaker, 1325 559 Norway 561 Phone: +47 4072 0702 562 Email: ferlin@simula.no 564 Michael Welzl 565 University of Oslo 566 PO Box 1080 Blindern 567 Oslo, N-0316 568 Norway 570 Phone: +47 2285 2420 571 Email: michawe@ifi.uio.no