RTP Media Congestion Avoidance                             D. Hayes, Ed.
Techniques                                            University of Oslo
Internet-Draft                                                 S. Ferlin
Intended status: Experimental                 Simula Research Laboratory
Expires: April 13, 2015                                         M. Welzl
                                                      University of Oslo
                                                        October 10, 2014

   Shared Bottleneck Detection for Coupled Congestion Control for RTP
                                 Media.
                        draft-hayes-rmcat-sbd-00

Abstract

   This document describes a mechanism to detect whether end-to-end data
   flows share a common bottleneck.
   It relies on summary statistics that are calculated by a data
   receiver based on continuous measurements and regularly fed to a
   grouping algorithm that runs wherever the knowledge is needed.  This
   mechanism complements the coupled congestion control mechanism in
   draft-welzl-rmcat-coupled-cc.

Status of this Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current
   Internet-Drafts is at http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on April 13, 2015.

Copyright Notice

   Copyright (c) 2014 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.  Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1.  Introduction
     1.1.  The signals
       1.1.1.  Packet Loss
       1.1.2.  Packet Delay
       1.1.3.  Path Lag
   2.  Definitions
     2.1.  Parameter Values
   3.  Mechanism
     3.1.  Key metrics and their calculation
       3.1.1.  Mean delay
       3.1.2.  Skewness Estimate
       3.1.3.  Variance Estimate
       3.1.4.  Oscillation Estimate
       3.1.5.  Packet loss
     3.2.  Flow Grouping
       3.2.1.  Flow Grouping Algorithm
       3.2.2.  Using the flow group signal
   4.  Measuring OWD
     4.1.  Time stamp resolution
   5.  Acknowledgements
   6.  IANA Considerations
   7.  Security Considerations
   8.  References
     8.1.  Normative References
     8.2.  Informative References
   Authors' Addresses

1.  Introduction

   In the Internet, it is not normally known if flows (e.g., TCP
   connections or UDP data streams) traverse the same bottlenecks.  Even
   flows that have the same sender and receiver may take different paths
   and may or may not share a bottleneck.  Flows that share a bottleneck
   link usually compete with one another for their share of the
   capacity.  This competition has the potential to increase packet loss
   and delays.
   This is especially relevant for interactive applications that
   communicate simultaneously with multiple peers (such as multi-party
   video).  For RTP media applications such as RTCWEB,
   [I-D.welzl-rmcat-coupled-cc] describes a scheme that combines the
   congestion controllers of flows in order to honor their priorities
   and avoid unnecessary packet loss as well as delay.  This mechanism
   relies on some form of Shared Bottleneck Detection (SBD); here, a
   measurement-based SBD approach is described.

1.1.  The signals

   The current Internet is unable to explicitly inform endpoints as to
   which flows share bottlenecks, so endpoints need to infer this from
   packet loss and packet delay.

1.1.1.  Packet Loss

   Packet loss is often a relatively rare signal.  On its own it is
   therefore of limited use for SBD; however, it is a valuable
   supplementary measure when it is more prevalent.

1.1.2.  Packet Delay

   End-to-end delay measurements include noise from every device along
   the path in addition to the delay perturbation at the bottleneck
   device.  The noise is often significantly increased if the round-trip
   time is used.  The cleanest signal is obtained by using One-Way-Delay
   (OWD).

   Measuring absolute OWD is difficult since it requires both the sender
   and receiver clocks to be synchronised.  However, since the
   statistics being collected are relative to the mean OWD, a relative
   OWD measurement is sufficient.  Clock drift is not usually
   significant over the time intervals used by this SBD mechanism (see
   [RFC6817] A.2 for a discussion on clock drift and OWD measurements).

   Packets arriving at the bottleneck buffer may experience very
   different queue lengths, and therefore waiting times.  A single OWD
   sample therefore does not characterize the actual OWD of a path well.
   However, multiple OWD measurements do reflect the distribution of
   delays experienced at the bottleneck.

1.1.3.
Path Lag

   Flows that share a common bottleneck may traverse different paths,
   and these paths will often have different base delays.  This makes it
   difficult to correlate changes in delay or loss.  To counter this,
   the technique uses the long term shape of the delay distribution as
   the basis for comparison.

2.  Definitions

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in RFC 2119 [RFC2119].

   Acronyms used in this document:

      OWD -- One Way Delay

      RTT -- Round Trip Time

      SBD -- Shared Bottleneck Detection

   Conventions used in this document:

      T           -- the base time interval over which measurements are
                     made.

      N           -- the number of base time intervals, T, used in some
                     calculations.

      sum_T(...)  -- summation of all the measurements of the variable
                     in parentheses taken over the interval T

      sum_N(...)  -- summation of N terms of the variable in parentheses

      sum_NT(...) -- summation of all measurements taken over the
                     interval N*T

      E_T(...)    -- the expectation or mean of the measurements of the
                     variable in parentheses over T

      E_N(...)    -- the expectation or mean of the last N values of the
                     variable in parentheses

      max_T(...)  -- the maximum recorded measurement of the variable in
                     parentheses taken over the interval T

      p_l, p_f, p_pdv, p_s, p_d, p_v -- various thresholds used in the
                     mechanism.

2.1.  Parameter Values

   Reference [Hayes-LCN14] uses T = 350ms, N = 50, p_l = 0.1, p_f = 0.2,
   p_pdv = 0.3, p_s = p_d = p_v = 0.2.  These are values that seem to
   work well over a wide range of practical Internet conditions.

3.  Mechanism

   The mechanism described in this document is based on the observation
   that the delay measurements of flows that share a common bottleneck
   have similar shape characteristics.
   The shape of these characteristics is described using 3 key summary
   statistics:

      variance (estimate PDV, see Section 3.1.3)

      skewness (estimate skewest, see Section 3.1.2)

      oscillation (estimate freqest, see Section 3.1.4)

   Summary statistics help to address both the noise and the path lag
   problems by describing the general shape over a relatively long
   period of time.  This is sufficient for their application in coupled
   congestion control for RTP Media.  They can be signalled from a
   receiver, which measures the OWD and calculates the summary
   statistics, to a sender, which is the entity that is transmitting the
   media stream.  An RTP Media device may be both a sender and a
   receiver.  SBD can be performed at both the sender and the receiver.

                               +----+
                               | H2 |
                               +----+
                                  |
                                  | L2
                                  |
                      +----+  L1  |  L3  +----+
                      | H1 |------|------| H3 |
                      +----+             +----+

      A network with 3 hosts (H1, H2, H3) and 3 links (L1, L2, L3).

                                Figure 1

   In Figure 1, there are two possible cases for shared bottleneck
   detection: a sender-based and a receiver-based case.

   1.  Sender-based: consider a situation where host H1 sends media
       streams to hosts H2 and H3, and L1 is a shared bottleneck.  H2
       and H3 measure the OWD and calculate summary statistics, which
       they send to H1 every T.  H1, having this knowledge, can
       determine the shared bottleneck and accordingly control the send
       rates.

   2.  Receiver-based: consider that H2 is also sending media to H3, and
       L3 is a shared bottleneck.  If H3 sends summary statistics to H1
       and H2, neither H1 nor H2 alone obtains enough knowledge to
       detect this shared bottleneck; H3 can however determine it by
       combining the summary statistics related to H1 and H2,
       respectively.  This case is applicable when send rates are
       controlled by the receiver; then, the signal from H3 to the
       senders contains the sending rate.
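   As a rough illustration of what a receiver such as H2 or H3 might
   compute at the end of each interval T, the following Python sketch
   derives the per-interval skewness and PDV estimates using the
   definitions given in Sections 3.1.2 and 3.1.3.  The names
   IntervalReport and summarise_interval are hypothetical, and using
   the current interval's mean for PDV is an assumption; the actual
   message format is left to future versions of this document.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class IntervalReport:
    """Hypothetical per-interval report a receiver could send every T."""
    mean_owd: float  # E_T(OWD) for this interval
    skewest: float   # skewness estimate, a number between -1 and 1
    pdv: float       # max(OWD) - E_T(OWD)


def summarise_interval(owd: Sequence[float], prev_mean: float) -> IntervalReport:
    """Summarise the relative OWD samples collected during one interval T.

    owd       -- relative OWD samples for this interval (must be non-empty)
    prev_mean -- E_T(OWD) of the previous interval, which Section 3.1.2
                 uses as the reference for the two skewness counters
    """
    if not owd:
        raise ValueError("need at least one OWD sample per interval")
    mean_owd = sum(owd) / len(owd)
    # Two counters: samples strictly below and strictly above the mean.
    below = sum(1 for d in owd if d < prev_mean)
    above = sum(1 for d in owd if d > prev_mean)
    skewest = (below - above) / len(owd)
    # Assumption: PDV is taken relative to the current interval's mean.
    pdv = max(owd) - mean_owd
    return IntervalReport(mean_owd, skewest, pdv)
```

   A sender would keep the last N such reports per flow and average
   them to obtain E_N(skewest) and E_N(PDV).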
   A discussion of the required signaling for the receiver-based case is
   beyond the scope of this document.  For the sender-based case, the
   messages and their data format will be defined here in future
   versions of this document.  We envision that an initialization
   message from the sender to the receiver could specify which key
   metrics are requested out of a possibly extensible set (losscnt, PDV,
   skewest, freqest).  The grouping algorithm described in this document
   requires all four of these metrics, and receivers MUST be able to
   provide them, but future algorithms may be able to exploit other
   metrics (e.g. metrics based on explicit network signals).  Moreover,
   the initialization message could specify T, N, and the necessary
   resolution and precision (number of bits per field).

3.1.  Key metrics and their calculation

   Measurements are calculated over a base interval, T.  T should be
   long enough to provide enough samples for a good estimate of
   skewness, but short enough so that a measure of the oscillation can
   be made from N of these estimates.  Reference [Hayes-LCN14] uses
   T = 350ms and N = 50, which are values that seem to work well over a
   wide range of practical Internet conditions.

3.1.1.  Mean delay

   The mean delay is not a useful signal for comparisons; however, it is
   a base measure for the 3 summary statistics.  The mean delay,
   E_T(OWD), is the average one way delay measured over T.

   To facilitate the other calculations, the last N E_T(OWD) values will
   need to be stored in a cyclic buffer along with the moving average of
   E_T(OWD):

      E_N(E_T(OWD)) = sum_N(E_T(OWD)) / N

3.1.2.  Skewness Estimate

   Skewness is difficult to calculate efficiently and accurately.
   Ideally it should be calculated over all measurements taken during
   the entire period (N * T); however, this would require storing every
   delay measurement over the period.
   Instead, an estimate is made over T using the previous calculation of
   E_T(OWD).  Comparisons are made using the mean of N skew estimates.

   The skewness is estimated using two counters, counting the number of
   one way delay samples above and below the mean:

      skewest = (sum_T(OWD < E_T(OWD)) - sum_T(OWD > E_T(OWD)))/num(OWD)

   where

      if (OWD < E_T(OWD)) 1 else 0

      if (OWD > E_T(OWD)) 1 else 0

      skewest is a number between -1 and 1

      E_N(skewest) = sum_N(skewest) / N

   For implementation ease, E_T(OWD) is the mean delay of the previous T
   interval.  Care must be taken when implementing the comparisons to
   ensure that rounding does not bias skewest.

3.1.3.  Variance Estimate

   Packet Delay Variation (PDV) ([RFC5481] and [ITU-Y1540]) is used as
   an estimator of the variance of the delay signal.  We define PDV as
   follows:

      PDV = (max(OWD) - E_T(OWD))

      E_N(PDV) = sum_N(PDV) / N

   This modifies PDV as outlined in [RFC5481] to provide a summary
   statistic version that best aids the grouping decisions of the
   algorithm (see [Hayes-LCN14] section IVB).

3.1.4.  Oscillation Estimate

   An estimate of the low frequency oscillation of the delay signal is
   calculated by counting, and then normalising, the number of times the
   mean, E_T(OWD), significantly crosses E_N(E_T(OWD)):

      freqest = number_of_crossings / N

   where

      we define a significant mean crossing as a crossing that
      extends p_v * E_N(PDV) from E_N(E_T(OWD)).  In our experiments
      we have found that p_v = 0.2 is a good value.

   freqest is a number between 0 and 1.  It can be approximated
   incrementally as follows:

      With each new calculation of E_T(OWD) a decision is made as to
      whether this value of E_T(OWD) significantly crosses the current
      long term mean, E_N(E_T(OWD)), with respect to the previous
      significant mean crossing.

      A cyclic buffer, last_N_crossings, records a 1 if there is a
      significant mean crossing, otherwise a 0.
      The counter, number_of_crossings, is incremented when there is a
      significant mean crossing and decremented when a non-zero value
      is removed from last_N_crossings.

   This approximation of freqest was not used in [Hayes-LCN14], which
   calculated freqest every T using the current E_N(E_T(OWD)).  Our
   tests show that this approximation of freqest yields results that are
   almost identical to when the full calculation is performed every T.

3.1.5.  Packet loss

   The proportion of packets lost is used as a supplementary measure:

      PL_NT = sum_NT(lost packets) / sum_NT(total packets)

3.2.  Flow Grouping

3.2.1.  Flow Grouping Algorithm

   The following grouping algorithm is RECOMMENDED for SBD in this
   context and is sufficient and efficient for small to moderate numbers
   of flows.  For very large numbers of flows (hundreds), a more complex
   clustering algorithm may be substituted.

   Flows determined to be experiencing congestion are successively
   divided into groups based on freqest, PDV, and skewest.

   The first step is to determine which flows are experiencing
   congestion.  This is important, since if a flow is not experiencing
   congestion its delay based metrics will not describe the bottleneck,
   but the "noise" from the rest of the path.  Skewness, with the
   proportion of packets lost as a supplementary measure, is used to do
   this:

   1.  Grouping will be performed on flows where:

          E_N(skewest) < 0 || PL_NT > p_l

   These flows, flows experiencing congestion, are then progressively
   divided into groups based on the freqest, PDV, and skewest summary
   statistics.  The process proceeds according to the following steps:

   2.  Group flows whose difference in sorted freqest is less than a
       threshold:

          diff(freqest) < p_f

   3.  Group flows whose difference in sorted E_N(PDV) is less than a
       threshold:

          diff(E_N(PDV)) < (p_pdv * E_N(PDV))

   4.
       Group flows whose difference in sorted E_N(skewest) or PL_NT is
       less than a threshold:

          if PL_NT < p_l

             diff(E_N(skewest)) < p_s

          otherwise

             diff(PL_NT) < p_d

   This procedure involves sorting the groups according to the measure
   being used to divide them.  It is simple to implement, and efficient
   for small numbers of flows, such as are expected in RTCWEB.

3.2.2.  Using the flow group signal

   A grouping decision is made every T.  Network conditions can cause
   bottlenecks to fluctuate.  A coupled congestion controller MAY decide
   only to couple groups that remain stable, say grouped together 90% of
   the time, depending on its objectives.  Recommendations concerning
   this are beyond the scope of this draft and will be specific to the
   coupled congestion controller's objectives.

4.  Measuring OWD

   This section discusses the OWD measurements required for this
   algorithm to detect shared bottlenecks.

   The SBD mechanism described in this draft relies on differences
   between OWD measurements to avoid the practical problems with
   measuring absolute OWD (see [Hayes-LCN14] section IIIC).  Since all
   summary statistics are relative to the mean OWD and sender/receiver
   clock offsets are approximately constant over the measurement
   periods, the offset is subtracted out in the calculation.

4.1.  Time stamp resolution

   The SBD mechanism requires timing information precise enough to be
   able to make comparisons.  As a rule of thumb, the time resolution
   should be less than one hundredth of a typical path's range of
   delays.  In general, the coarser the time resolution, the more care
   needs to be taken to ensure rounding errors don't bias the skewness
   calculation.

   Typical RTP media flows use sub-millisecond timers, which should be
   adequate in most situations.

5.
Acknowledgements

   This work was part-funded by the European Community under its Seventh
   Framework Programme through the Reducing Internet Transport Latency
   (RITE) project (ICT-317700).  The views expressed are solely those of
   the authors.

6.  IANA Considerations

   This memo includes no request to IANA.

7.  Security Considerations

   The security considerations of RFC 3550 [RFC3550], RFC 4585
   [RFC4585], and RFC 5124 [RFC5124] are expected to apply.

   Non-authenticated RTCP packets carrying shared bottleneck indications
   and summary statistics could allow attackers to alter the bottleneck
   sharing characteristics for private gain or disruption of other
   parties' communication.

8.  References

8.1.  Normative References

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119, March 1997.

8.2.  Informative References

   [Hayes-LCN14]
              Hayes, D., Ferlin, S., and M. Welzl, "Practical Passive
              Shared Bottleneck Detection using Shape Summary
              Statistics", Proc. the IEEE Local Computer Networks
              (LCN) p150-158, September 2014.

   [I-D.welzl-rmcat-coupled-cc]
              Welzl, M., Islam, S., and S. Gjessing, "Coupled congestion
              control for RTP media", draft-welzl-rmcat-coupled-cc-03
              (work in progress), May 2014.

   [ITU-Y1540]
              ITU-T, "Internet protocol data communication service - IP
              packet transfer and availability performance parameters",
              Series Y: Global Information Infrastructure, Internet
              Protocol Aspects and Next-Generation Networks,
              March 2011.

   [RFC3550]  Schulzrinne, H., Casner, S., Frederick, R., and V.
              Jacobson, "RTP: A Transport Protocol for Real-Time
              Applications", STD 64, RFC 3550, July 2003.

   [RFC4585]  Ott, J., Wenger, S., Sato, N., Burmeister, C., and J. Rey,
              "Extended RTP Profile for Real-time Transport Control
              Protocol (RTCP)-Based Feedback (RTP/AVPF)", RFC 4585,
              July 2006.

   [RFC5124]  Ott, J.
              and E. Carrara, "Extended Secure RTP Profile for
              Real-time Transport Control Protocol (RTCP)-Based Feedback
              (RTP/SAVPF)", RFC 5124, February 2008.

   [RFC5481]  Morton, A. and B. Claise, "Packet Delay Variation
              Applicability Statement", RFC 5481, March 2009.

   [RFC6817]  Shalunov, S., Hazel, G., Iyengar, J., and M. Kuehlewind,
              "Low Extra Delay Background Transport (LEDBAT)", RFC 6817,
              December 2012.

Authors' Addresses

   David Hayes (editor)
   University of Oslo
   PO Box 1080 Blindern
   Oslo, N-0316
   Norway

   Phone: +47 2284 5566
   Email: davihay@ifi.uio.no

   Simone Ferlin
   Simula Research Laboratory
   P.O.Box 134
   Lysaker, 1325
   Norway

   Phone: +47 4072 0702
   Email: ferlin@simula.no

   Michael Welzl
   University of Oslo
   PO Box 1080 Blindern
   Oslo, N-0316
   Norway

   Phone: +47 2285 2420
   Email: michawe@ifi.uio.no
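Illustrative Sketch of the Flow Grouping Algorithm

   The grouping steps of Section 3.2.1 can be sketched in Python as
   follows.  This is only an illustration under stated assumptions, not
   a reference implementation.  FlowStats, split_sorted, and
   group_flows are hypothetical names.  Two details that the text
   leaves open are filled in by assumption: the PDV threshold in step 3
   is taken relative to the larger of two neighbouring values, and
   step 4 compares PL_NT only when every flow in a group has
   PL_NT >= p_l (otherwise it compares E_N(skewest)).  The default
   thresholds are the values listed in Section 2.1.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class FlowStats:
    """Hypothetical per-flow record of the long-term summary statistics."""
    flow_id: str
    skewest: float  # E_N(skewest)
    pdv: float      # E_N(PDV)
    freqest: float  # freqest
    pl: float       # PL_NT


def split_sorted(group: List[FlowStats],
                 key: Callable[[FlowStats], float],
                 close: Callable[[float, float], bool]) -> List[List[FlowStats]]:
    """Sort a non-empty group by key and cut it wherever two neighbouring
    values (lower, higher) are not close enough."""
    ordered = sorted(group, key=key)
    subgroups = [[ordered[0]]]
    for prev, cur in zip(ordered, ordered[1:]):
        if close(key(prev), key(cur)):
            subgroups[-1].append(cur)
        else:
            subgroups.append([cur])
    return subgroups


def group_flows(flows: List[FlowStats], p_l: float = 0.1, p_f: float = 0.2,
                p_pdv: float = 0.3, p_s: float = 0.2,
                p_d: float = 0.2) -> List[List[FlowStats]]:
    # Step 1: keep only flows that appear to be experiencing congestion.
    congested = [f for f in flows if f.skewest < 0 or f.pl > p_l]
    groups = [congested] if congested else []

    # Step 2: divide by freqest.
    groups = [s for g in groups
              for s in split_sorted(g, lambda f: f.freqest,
                                    lambda lo, hi: hi - lo < p_f)]

    # Step 3: divide by E_N(PDV); the threshold is relative to the larger
    # neighbour (an assumption, see the text above).
    groups = [s for g in groups
              for s in split_sorted(g, lambda f: f.pdv,
                                    lambda lo, hi: hi - lo < p_pdv * hi)]

    # Step 4: divide by PL_NT if all flows in the group have PL_NT >= p_l
    # (an assumption), otherwise by E_N(skewest).
    result: List[List[FlowStats]] = []
    for g in groups:
        if all(f.pl >= p_l for f in g):
            result += split_sorted(g, lambda f: f.pl,
                                   lambda lo, hi: hi - lo < p_d)
        else:
            result += split_sorted(g, lambda f: f.skewest,
                                   lambda lo, hi: hi - lo < p_s)
    return result
```

   Each returned list is one group of flows presumed to share a
   bottleneck.  A coupled congestion controller would call group_flows
   every T and, as noted in Section 3.2.2, might act only on groups
   that remain stable over time.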