idnits 2.17.1 

draft-hayes-rmcat-sbd-01.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

     No issues found here.

  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == The copyright year in the IETF Trust and authors Copyright Line does not
     match the current year

  -- The document date (October 27, 2014) is 3462 days in the past.  Is this
     intentional?


  Checking references for intended status: Experimental
  ----------------------------------------------------------------------------

  == Outdated reference: A later version (-05) exists of
     draft-welzl-rmcat-coupled-cc-04


     Summary: 0 errors (**), 0 flaws (~~), 2 warnings (==), 1 comment (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.
--------------------------------------------------------------------------------


2	RTP Media Congestion Avoidance                             D. Hayes, Ed.
3	Techniques                                            University of Oslo
4	Internet-Draft                                                 S. Ferlin
5	Intended status: Experimental                 Simula Research Laboratory
6	Expires: April 30, 2015                                         M. Welzl
7	                                                      University of Oslo
8	                                                        October 27, 2014

10	   Shared Bottleneck Detection for Coupled Congestion Control for RTP
11	                                 Media.
12	                        draft-hayes-rmcat-sbd-01

14	Abstract

16	   This document describes a mechanism to detect whether end-to-end data
17	   flows share a common bottleneck.  It relies on summary statistics
18	   that are calculated by a data receiver based on continuous
19	   measurements and regularly fed to a grouping algorithm that runs
20	   wherever the knowledge is needed.  This mechanism complements the
21	   coupled congestion control mechanism in draft-welzl-rmcat-coupled-cc.

23	Status of this Memo

25	   This Internet-Draft is submitted in full conformance with the
26	   provisions of BCP 78 and BCP 79.

28	   Internet-Drafts are working documents of the Internet Engineering
29	   Task Force (IETF).  Note that other groups may also distribute
30	   working documents as Internet-Drafts.  The list of current Internet-
31	   Drafts is at http://datatracker.ietf.org/drafts/current/.

33	   Internet-Drafts are draft documents valid for a maximum of six months
34	   and may be updated, replaced, or obsoleted by other documents at any
35	   time.  It is inappropriate to use Internet-Drafts as reference
36	   material or to cite them other than as "work in progress."

38	   This Internet-Draft will expire on April 30, 2015.

40	Copyright Notice

42	   Copyright (c) 2014 IETF Trust and the persons identified as the
43	   document authors.  All rights reserved.

45	   This document is subject to BCP 78 and the IETF Trust's Legal
46	   Provisions Relating to IETF Documents
47	   (http://trustee.ietf.org/license-info) in effect on the date of
48	   publication of this document.  Please review these documents
49	   carefully, as they describe your rights and restrictions with respect
50	   to this document.  Code Components extracted from this document must
51	   include Simplified BSD License text as described in Section 4.e of
52	   the Trust Legal Provisions and are provided without warranty as
53	   described in the Simplified BSD License.

55	Table of Contents

57	   1.  Introduction . . . . . . . . . . . . . . . . . . . . . . . . .  3
58	     1.1.  The signals  . . . . . . . . . . . . . . . . . . . . . . .  3
59	       1.1.1.  Packet Loss  . . . . . . . . . . . . . . . . . . . . .  3
60	       1.1.2.  Packet Delay . . . . . . . . . . . . . . . . . . . . .  3
61	       1.1.3.  Path Lag . . . . . . . . . . . . . . . . . . . . . . .  4
62	   2.  Definitions  . . . . . . . . . . . . . . . . . . . . . . . . .  4
63	     2.1.  Parameter Values . . . . . . . . . . . . . . . . . . . . .  5
64	   3.  Mechanism  . . . . . . . . . . . . . . . . . . . . . . . . . .  5
65	     3.1.  Key metrics and their calculation  . . . . . . . . . . . .  7
66	       3.1.1.  Mean delay . . . . . . . . . . . . . . . . . . . . . .  7
67	       3.1.2.  Skewness Estimate  . . . . . . . . . . . . . . . . . .  7
68	       3.1.3.  Variance Estimate  . . . . . . . . . . . . . . . . . .  8
69	       3.1.4.  Oscillation Estimate . . . . . . . . . . . . . . . . .  8
70	       3.1.5.  Packet loss  . . . . . . . . . . . . . . . . . . . . .  9
71	     3.2.  Flow Grouping  . . . . . . . . . . . . . . . . . . . . . .  9
72	       3.2.1.  Flow Grouping Algorithm  . . . . . . . . . . . . . . .  9
73	       3.2.2.  Using the flow group signal  . . . . . . . . . . . . . 10
74	   4.  Measuring OWD  . . . . . . . . . . . . . . . . . . . . . . . . 11
75	     4.1.  Time stamp resolution  . . . . . . . . . . . . . . . . . . 11
76	   5.  Acknowledgements . . . . . . . . . . . . . . . . . . . . . . . 11
77	   6.  IANA Considerations  . . . . . . . . . . . . . . . . . . . . . 11
78	   7.  Security Considerations  . . . . . . . . . . . . . . . . . . . 11
79	   8.  References . . . . . . . . . . . . . . . . . . . . . . . . . . 12
80	     8.1.  Normative References . . . . . . . . . . . . . . . . . . . 12
81	     8.2.  Informative References . . . . . . . . . . . . . . . . . . 12
82	   Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . . 13

84	1.  Introduction

86	   In the Internet, it is not normally known if flows (e.g., TCP
87	   connections or UDP data streams) traverse the same bottlenecks.  Even
88	   flows that have the same sender and receiver may take different paths
89	   and share a bottleneck or not.  Flows that share a bottleneck link
90	   usually compete with one another for their share of the capacity.
91	   This competition has the potential to increase packet loss and
92	   delays.  This is especially relevant for interactive applications
93	   that communicate simultaneously with multiple peers (such as multi-
94	   party video).  For RTP media applications such as RTCWEB,
95	   [I-D.welzl-rmcat-coupled-cc] describes a scheme that combines the
96	   congestion controllers of flows in order to honor their priorities
97	   and avoid unnecessary packet loss as well as delay.  This mechanism
98	   relies on some form of Shared Bottleneck Detection (SBD); here, a
99	   measurement-based SBD approach is described.

101	1.1.  The signals

103	   The current Internet is unable to explicitly inform endpoints as to
104	   which flows share bottlenecks, so endpoints need to infer this from
105	   packet loss and packet delay.

107	1.1.1.  Packet Loss

109	   Packet loss is often a relatively rare signal.  Therefore, on its own
110	   it is of limited use for SBD, however, it is a valuable supplementary
111	   measure when it is more prevalent.

113	1.1.2.  Packet Delay

115	   End-to-end delay measurements include noise from every device along
116	   the path in addition to the delay perturbation at the bottleneck
117	   device.  The noise is often significantly increased if the round-trip
118	   time is used.  The cleanest signal is obtained by using One-Way-Delay
119	   (OWD).

121	   Measuring absolute OWD is difficult since it requires both the sender
122	   and receiver clocks to be synchronised.  However, since the
123	   statistics being collected are relative to the mean OWD, a relative
124	   OWD measurement is sufficient.  Clock drift is not usually
125	   significant over the time intervals used by this SBD mechanism (see
126	   [RFC6817] A.2 for a discussion on clock drift and OWD measurements).

128	   Each packet arriving at the bottleneck buffer may experience very
129	   different queue lengths, and therefore different waiting times.  A
130	   single OWD sample does therefore not characterize the actual OWD of a
131	   path well.  However, multiple OWD measurements do reflect the
132	   distribution of delays experienced at the bottleneck.

134	1.1.3.  Path Lag

136	   Flows that share a common bottleneck may traverse different paths,
137	   and these paths will often have different base delays.  This makes it
138	   difficult to correlate changes in delay or loss.  This technique uses
139	   the long term shape of the delay distribution as a base for
140	   comparison to counter this.

142	2.  Definitions

144	   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
145	   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
146	   document are to be interpreted as described in RFC 2119 [RFC2119].

148	   Acronyms used in this document:

150	      OWD -- One Way Delay

152	      RTT -- Round Trip Time

154	      SBD -- Shared Bottleneck Detection

156	   Conventions used in this document:

158	      T     --     the base time interval over which measurements are
159	                   made.

161	      N     --     the number of base time, T, intervals used in some
162	                   calculations.

164	      sum_T(...) --  summation of all the measurements of the variable
165	                   in parentheses taken over the interval T

167	      sum_N(...) --  summation of N terms of the variable in parentheses

169	      sum_NT(...) --  summation of all measurements taken over the
170	                   interval N*T

172	      E_T(...) --  the expectation or mean of the measurements of the
173	                   variable in parentheses over T

175	      E_N(...) --  The expectation or mean of the last N values of the
176	                   variable in parentheses

178	      max_T(...) --  the maximum recorded measurement of the variable in
179	                   parentheses taken over the interval T

181	      min_T(...) --  the minimum recorded measurement of the variable in
182	                   parentheses taken over the interval T

184	      num_T(...) --  the count of measurements of the variable in
185	                   parenthesis taken in the interval T

187	      p_l, p_f, p_pdf, c_s, p_s, p_d, p_v --  various thresholds used in
188	                   the mechanism.

190	2.1.  Parameter Values

192	   Reference [Hayes-LCN14] uses T=350ms, N=50, p_l = 0.1, p_f = 0.2,
193	   p_pdf = 0.3, c_s = 0.0, p_s = p_d = p_v = 0.2.  These are values that
194	   seem to work well over a wide range of practical Internet conditions.

196	3.  Mechanism

198	   The mechanism described in this document is based on the observation
199	   that the distribution of delay measurements of packets from flows
200	   that share a common bottleneck have similar shape characteristics.
201	   These shape characteristics are described using 3 key summary
202	   statistics:

204	      variance (estimate PDV, see Section 3.1.3)

206	      skewness (estimate skew_est, see Section 3.1.2)

208	      oscillation (estimate freq_est, see Section 3.1.4)

210	   Summary statistics help to address both the noise and the path lag
211	   problems by describing the general shape over a relatively long
212	   period of time.  This is sufficient for their application in coupled
213	   congestion control for RTP Media.  They can be signalled from a
214	   receiver, which measures the OWD and calculates the summary
215	   statistics, to a sender, which is the entity that is transmitting the
216	   media stream.  An RTP Media device may be both a sender and a
217	   receiver.  SBD can be performed at either Sender or receiver or both.

219	                                  +----+
220	                                  | H2 |
221	                                  +----+
222	                                     |
223	                                     | L2
224	                                     |
225	                         +----+  L1  |  L3  +----+
226	                         | H1 |------|------| H3 |
227	                         +----+             +----+

229	       A network with 3 hosts (H1, H2, H3) and 3 links (L1, L2, L3).

231	                                 Figure 1

233	   In Figure 1, there are two possible cases for shared bottleneck
234	   detection: a sender-based and a receiver-based case.

236	   1.  Sender-based: consider a situation where host H1 sends media
237	       streams to hosts H2 and H3, and L1 is a shared bottleneck.  H2
238	       and H3 measure the OWD and calculate summary statistics, which
239	       they send to H1 every T. H1, having this knowledge, can determine
240	       the shared bottleneck and accordingly control the send rates.

242	   2.  Receiver-based: consider that H2 is also sending media to H3, and
243	       L3 is a shared bottleneck.  If H3 sends summary statistics to H1
244	       and H2, neither H1 nor H2 alone obtain enough knowledge to detect
245	       this shared bottleneck; H3 can however determine it by combining
246	       the summary statistics related to H1 and H2, respectively.  This
247	       case is applicable when send rates are controlled by the
248	       receiver; then, the signal from H3 to the senders contains the
249	       sending rate.

251	   A discussion of the required signaling for the receiver-based case is
252	   beyond the scope of this document.  For the sender-based case, the
253	   messages and their data format will be defined here in future
254	   versions of this document.  We envision that an initialization
255	   message from the sender to the receiver could specify which key
256	   metrics are requested out of a possibly extensible set (PL_NT, PDV,
257	   skew_est, freq_est).  The grouping algorithm described in this
258	   document requires all four of these metrics, and receivers MUST be
259	   able to provide them, but future algorithms may be able to exploit
260	   other metrics (e.g. metrics based on explicit network signals).
261	   Moreover, the initialization message could specify T, N, and the
262	   necessary resolution and precision (number of bits per field).

264	3.1.  Key metrics and their calculation

266	   Measurements are calculated over a base interval, T. T should be long
267	   enough to provide enough samples for a good estimate of skewness, but
268	   short enough so that a measure of the oscillation can be made from N
269	   of these estimates.  Reference [Hayes-LCN14] uses T = 350ms and N =
270	   50, which are values that seem to work well over a wide range of
271	   practical Internet conditions.

273	3.1.1.  Mean delay

275	   The mean delay is not a useful signal for comparisons between flows
276	   since flows may traverse quite different paths and clocks will not
277	   necessarily be synchronized.  However, it is a base measure for the 3
278	   summary statistics.  The mean delay, E_T(OWD), is the average one way
279	   delay measured over T.

281	   To facilitate the other calculations, the last N E_T(OWD) values will
282	   need to be stored in a cyclic buffer along with the moving average of
283	   E_T(OWD):

285	      E_N(E_T(OWD)) = sum_N(E_T(OWD)) / N

287	3.1.2.  Skewness Estimate

289	   Skewness is difficult to calculate efficiently and accurately.
290	   Ideally it should be calculated over the entire period (N * T) from
291	   the mean OWD over that period.  However this would require storing
292	   every delay measurement over the period.  Instead, an estimate is
293	   made over T using the previous calculation of E_T(OWD).  Comparisons
294	   are made using the mean of N skew estimates.

296	   The skewness is estimated using two counters, counting the number of
297	   one way delay samples (OWD) above and below the mean:

299	      skew_est = (sum_T(OWD < E_NT(OWD)) - sum_T(OWD > E_NT(OWD))) /
300	      num_T(OWD)

302	         where

304	            if (OWD < E_NT(OWD)) 1 else 0

306	            if (OWD > E_NT(OWD)) 1 else 0

308	         skew_est is a number between -1 and 1

310	      E_N(skew_est) = sum_N(skew_est) / N

312	   For implementation ease, E_NT(OWD) does not include the mean of the
313	   current T interval.  Care must be taken when implementing the
314	   comparisons to ensure that rounding does not bias skew_est.

316	3.1.3.  Variance Estimate

318	   Packet Delay Variation (PDV) ([RFC5481] and [ITU-Y1540]) is used as
319	   an estimator of the variance of the delay signal.  We define PDV as
320	   follows:

322	      PDV = (max_T(OWD) - E_T(OWD))

324	      E_N(PDV) = sum_N(PDV) / N

326	   This modifies PDV as outlined in [RFC5481] to provide a summary
327	   statistic version that best aids the grouping decisions of the
328	   algorithm (see [Hayes-LCN14] section IVB).

330	   The use of PDV = (min_T(OWD) - E_T(OWD)) is currently being
331	   investigated as an alternative that is less sensitive to noise.

333	3.1.4.  Oscillation Estimate

335	   An estimate of the low frequency oscillation of the delay signal is
336	   calculated by counting and normalising the significant mean,
337	   E_T(OWD), crossings of E_N(E_T(OWD)):

339	      freq_est = number_of_crossings / N

341	      Where

343	         we define a significant mean crossing as a crossing that
344	         extends p_v * E_N(PDV) from E_N(E_T(OWD)).  In our experiments
345	         we have found that p_v = 0.2 is a good value.

347	   Freq_est is a number between 0 and 1.  Freq_est can be approximated
348	   incrementally as follows:

350	      With each new calculation of E_T(OWD) a decision is made as to
351	      whether this value of E_T(OWD) significantly crosses the current
352	      long term mean, E_N(E_T(OWD), with respect to the previous
353	      significant mean crossing.

355	      A cyclic buffer, last_N_crossings, records a 1 if there is a
356	      significant mean crossing, otherwise a 0.

358	      The counter, number_of_crossings, is incremented when there is a
359	      significant mean crossing and subtracted from when a non zero
360	      value is removed from the last_N_crossings.

362	   This approximation of freq_est was not used in [Hayes-LCN14], which
363	   calculated freq_est every T using the current E_N(E_T(OWD)).  Our
364	   tests show that this approximation of freq_est yields results that
365	   are almost identical to when the full calculation is performed every
366	   T.

368	3.1.5.  Packet loss

370	   The proportion of packets lost is used as a supplementary measure:

372	      PL_NT = sum_NT(lost packets) / sum_NT(total packets)

374	3.2.  Flow Grouping

376	3.2.1.  Flow Grouping Algorithm

378	   The following grouping algorithm is RECOMMENDED for SBD in this
379	   context and is sufficient and efficient for small to moderate numbers
380	   of flows.  For very large numbers of flows (e.g. hundreds), a more
381	   complex clustering algorithm may be substituted.

383	   Since no single metric is precise enough to group flows (due to
384	   noise), the algorithm uses multiple metrics.  Each metric offers a
385	   different "view" of the bottleneck link characteristics, and used
386	   together enable a more precise grouping of flows than would otherwise
387	   be possible.

389	   Flows determined to be experiencing congestion are successively
390	   divided into groups based on freq_est, PDV, and skew_est.

392	   The first step is to determine which flows are experiencing
393	   congestion.  This is important, since if a flow is not experiencing
394	   congestion its delay based metrics will not describe the bottleneck,
395	   but the "noise" from the rest of the path.  Skewness, with proportion
396	   of packets loss as a supplementary measure, is used to do this:

398	   1.  Grouping will be performed on flows where:

400	          E_N(skew_est) < c_s || PL_NT > p_l.

402	   The parameter c_s controls how sensitive the mechanism is in
403	   detecting congestion.  C_s = 0.0 was used in [Hayes-LCN14].  A value
404	   of c_s = 0.05 is a little more sensitive, and c_s = -0.05 is a little
405	   less sensitive.

407	   These flows, flows experiencing congestion, are then progressively
408	   divided into groups based on the freq_est, PDV, and skew_est summary
409	   statistics.  The process proceeds according to the following steps:

411	   2.  Group flows whose difference in sorted freq_est is less than a
412	       threshold:

414	          diff(freq_est) < p_f

416	   3.  Group flows whose difference in sorted E_N(PDV) is less than a
417	       threshold:

419	          diff(E_N(PDV)) < (p_pdv * E_N(PDV))

421	   4.  Group flows whose difference in sorted E_N(skew_est) or PL_NT is
422	       less than a threshold:

424	          if PL_NT < p_l

426	             diff(E_N(skewness)) < p_s

428	          otherwise

430	             diff(PL_NT) < p_d

432	   This procedure involves sorting the groups, according to the measure
433	   being used to divide them.  It is simple to implement, and efficient
434	   for small numbers of flows, such as are expected in RTCWEB.

436	3.2.2.  Using the flow group signal

438	   A grouping decisions is made every T from the second T, though they
439	   will not attain their full design accuracy until after the N'th T
440	   interval.

442	   Network conditions can cause bottlenecks to fluctuate.  A coupled
443	   congestion controller MAY decide only to couple groups that remain
444	   stable, say grouped together 90% of the time, depending on its
445	   objectives.  Recommendations concerning this are beyond the scope of
446	   this draft and will be specific to the coupled congestion controllers
447	   objectives.

449	4.  Measuring OWD

451	   This section discusses the OWD measurements required for this
452	   algorithm to detect shared bottlenecks.

454	   The SBD mechanism described in this draft relies on differences
455	   between OWD measurements to avoid the practical problems with
456	   measuring absolute OWD (see [Hayes-LCN14] section IIIC).  Since all
457	   summary statistics are relative to the mean OWD and sender/receiver
458	   clock offsets are approximately constant over the measurement
459	   periods, the offset is subtracted out in the calculation.

461	4.1.  Time stamp resolution

463	   The SBD mechanism requires timing information precise enough to be
464	   able to make comparisons.  As a rule of thumb, the time resolution
465	   should be less than one hundredth of a typical path's range of
466	   delays.  In general, the lower the time resolution, the more care
467	   that needs to be taken to ensure rounding errors don't bias the
468	   skewness calculation.

470	   Typical RTP media flows use sub-millisecond timers, which should be
471	   adequate in most situations.

473	5.  Acknowledgements

475	   This work was part-funded by the European Community under its Seventh
476	   Framework Programme through the Reducing Internet Transport Latency
477	   (RITE) project (ICT-317700).  The views expressed are solely those of
478	   the authors.

480	6.  IANA Considerations

482	   This memo includes no request to IANA.

484	7.  Security Considerations

486	   The security considerations of RFC 3550 [RFC3550], RFC 4585
487	   [RFC4585], and RFC 5124 [RFC5124] are expected to apply.

489	   Non-authenticated RTCP packets carrying shared bottleneck indications
490	   and summary statistics could allow attackers to alter the bottleneck
491	   sharing characteristics for private gain or disruption of other
492	   parties communication.

494	8.  References

496	8.1.  Normative References

498	   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
499	              Requirement Levels", BCP 14, RFC 2119, March 1997.

501	8.2.  Informative References

503	   [Hayes-LCN14]
504	              Hayes, D., Ferlin, S., and M. Welzl, "Practical Passive
505	              Shared Bottleneck Detection using Shape Summary
506	              Statistics", Proc. the IEEE Local Computer Networks
507	              (LCN) p150-158, September 2014, <http://heim.ifi.uio.no/
508	              davihay/
509	              hayes14__pract_passiv_shared_bottl_detec-abstract.html>.

511	   [I-D.welzl-rmcat-coupled-cc]
512	              Welzl, M., Islam, S., and S. Gjessing, "Coupled congestion
513	              control for RTP media", draft-welzl-rmcat-coupled-cc-04
514	              (work in progress), October 2014.

516	   [ITU-Y1540]
517	              ITU-T, "Internet protocol data communication service - IP
518	              packet transfer and availability performance parameters",
519	              Series Y: Global Information Infrastructure, Internet
520	              Protocol Aspects and Next-Generation Networks ,
521	              March 2011,
522	              <http://www.itu.int/rec/T-REC-Y.1540-201103-I/en>.

524	   [RFC3550]  Schulzrinne, H., Casner, S., Frederick, R., and V.
525	              Jacobson, "RTP: A Transport Protocol for Real-Time
526	              Applications", STD 64, RFC 3550, July 2003.

528	   [RFC4585]  Ott, J., Wenger, S., Sato, N., Burmeister, C., and J. Rey,
529	              "Extended RTP Profile for Real-time Transport Control
530	              Protocol (RTCP)-Based Feedback (RTP/AVPF)", RFC 4585,
531	              July 2006.

533	   [RFC5124]  Ott, J. and E. Carrara, "Extended Secure RTP Profile for
534	              Real-time Transport Control Protocol (RTCP)-Based Feedback
535	              (RTP/SAVPF)", RFC 5124, February 2008.

537	   [RFC5481]  Morton, A. and B. Claise, "Packet Delay Variation
538	              Applicability Statement", RFC 5481, March 2009.

540	   [RFC6817]  Shalunov, S., Hazel, G., Iyengar, J., and M. Kuehlewind,
541	              "Low Extra Delay Background Transport (LEDBAT)", RFC 6817,
542	              December 2012.

544	Authors' Addresses

546	   David Hayes (editor)
547	   University of Oslo
548	   PO Box 1080 Blindern
549	   Oslo,   N-0316
550	   Norway

552	   Phone: +47 2284 5566
553	   Email: davihay@ifi.uio.no

555	   Simone Ferlin
556	   Simula Research Laboratory
557	   P.O.Box 134
558	   Lysaker,   1325
559	   Norway

561	   Phone: +47 4072 0702
562	   Email: ferlin@simula.no

564	   Michael Welzl
565	   University of Oslo
566	   PO Box 1080 Blindern
567	   Oslo,   N-0316
568	   Norway

570	   Phone: +47 2285 2420
571	   Email: michawe@ifi.uio.no