2	RTP Media Congestion Avoidance                             D. Hayes, Ed.
3	Techniques                                            University of Oslo
4	Internet-Draft                                                 S. Ferlin
5	Intended status: Experimental                 Simula Research Laboratory
6	Expires: November 9, 2015                                       M. Welzl
7	                                                      University of Oslo
8	                                                             May 8, 2015

10	   Shared Bottleneck Detection for Coupled Congestion Control for RTP
11	                                 Media.
12	                        draft-ietf-rmcat-sbd-00

14	Abstract

16	   This document describes a mechanism to detect whether end-to-end data
17	   flows share a common bottleneck.  It relies on summary statistics
18	   that are calculated by a data receiver based on continuous
19	   measurements and regularly fed to a grouping algorithm that runs
20	   wherever the knowledge is needed.  This mechanism complements the
21	   coupled congestion control mechanism in draft-welzl-rmcat-coupled-cc.

23	Status of this Memo

25	   This Internet-Draft is submitted in full conformance with the
26	   provisions of BCP 78 and BCP 79.

28	   Internet-Drafts are working documents of the Internet Engineering
29	   Task Force (IETF).  Note that other groups may also distribute
30	   working documents as Internet-Drafts.  The list of current Internet-
31	   Drafts is at http://datatracker.ietf.org/drafts/current/.

33	   Internet-Drafts are draft documents valid for a maximum of six months
34	   and may be updated, replaced, or obsoleted by other documents at any
35	   time.  It is inappropriate to use Internet-Drafts as reference
36	   material or to cite them other than as "work in progress."

38	   This Internet-Draft will expire on November 9, 2015.

40	Copyright Notice

42	   Copyright (c) 2015 IETF Trust and the persons identified as the
43	   document authors.  All rights reserved.

45	   This document is subject to BCP 78 and the IETF Trust's Legal
46	   Provisions Relating to IETF Documents
47	   (http://trustee.ietf.org/license-info) in effect on the date of
48	   publication of this document.  Please review these documents
49	   carefully, as they describe your rights and restrictions with respect
50	   to this document.  Code Components extracted from this document must
51	   include Simplified BSD License text as described in Section 4.e of
52	   the Trust Legal Provisions and are provided without warranty as
53	   described in the Simplified BSD License.

55	Table of Contents

57	   1.  Introduction
58	     1.1.  The signals
59	       1.1.1.  Packet Loss
60	       1.1.2.  Packet Delay
61	       1.1.3.  Path Lag
62	   2.  Definitions
63	     2.1.  Parameter Values
64	   3.  Mechanism
65	     3.1.  Key metrics and their calculation
66	       3.1.1.  Mean delay
67	       3.1.2.  Skewness Estimate
68	       3.1.3.  Variance Estimate
69	       3.1.4.  Oscillation Estimate
70	       3.1.5.  Packet loss
71	     3.2.  Flow Grouping
72	       3.2.1.  Flow Grouping Algorithm
73	       3.2.2.  Using the flow group signal
74	     3.3.  Removing Noise from the Estimates
75	       3.3.1.  Oscillation noise
76	       3.3.2.  Clock drift
77	       3.3.3.  Bias in the skewness measure
78	     3.4.  Reducing lag and Improving Responsiveness
79	       3.4.1.  Improving the response of the skewness estimate
80	       3.4.2.  Improving the response of the variance estimate
81	   4.  Measuring OWD
82	     4.1.  Time stamp resolution
83	   5.  Acknowledgements
84	   6.  IANA Considerations
85	   7.  Security Considerations
86	   8.  Change history
87	   9.  References
88	     9.1.  Normative References
89	     9.2.  Informative References
90	   Authors' Addresses

92	1.  Introduction

94	   In the Internet, it is not normally known if flows (e.g., TCP
95	   connections or UDP data streams) traverse the same bottlenecks.  Even
96	   flows that have the same sender and receiver may take different paths
97	   and may or may not share a bottleneck.  Flows that share a bottleneck
98	   link usually compete with one another for their share of the capacity.
99	   This competition has the potential to increase packet loss and
100	   delays.  This is especially relevant for interactive applications
101	   that communicate simultaneously with multiple peers (such as multi-
102	   party video).  For RTP media applications such as RTCWEB,
103	   [I-D.welzl-rmcat-coupled-cc] describes a scheme that combines the
104	   congestion controllers of flows in order to honor their priorities
105	   and avoid unnecessary packet loss as well as delay.  This mechanism
106	   relies on some form of Shared Bottleneck Detection (SBD); here, a
107	   measurement-based SBD approach is described.

109	1.1.  The signals

111	   The current Internet is unable to explicitly inform endpoints as to
112	   which flows share bottlenecks, so endpoints need to infer this from
113	   whatever information is available to them.  The mechanism described
114	   here currently utilises packet loss and packet delay, but is not
115	   restricted to these.

117	1.1.1.  Packet Loss

119	   Packet loss is often a relatively rare signal.  On its own it is
120	   therefore of limited use for SBD; however, it is a valuable
121	   supplementary measure when it is more prevalent.

123	1.1.2.  Packet Delay

125	   End-to-end delay measurements include noise from every device along
126	   the path in addition to the delay perturbation at the bottleneck
127	   device.  The noise is often significantly increased if the round-trip
128	   time is used.  The cleanest signal is obtained by using One-Way-Delay
129	   (OWD).

131	   Measuring absolute OWD is difficult since it requires both the sender
132	   and receiver clocks to be synchronised.  However, since the
133	   statistics being collected are relative to the mean OWD, a relative
134	   OWD measurement is sufficient.  Clock drift is not usually
135	   significant over the time intervals used by this SBD mechanism (see
136	   [RFC6817] A.2 for a discussion on clock drift and OWD measurements).
137	   However, in circumstances where it is significant, Section 3.3.2
138	   outlines a way of adjusting the calculations to cater for it.

140	   Each packet arriving at the bottleneck buffer may experience very
141	   different queue lengths, and therefore different waiting times.  A
142	   single OWD sample does not, therefore, characterize the path well.
143	   However, multiple OWD measurements do reflect the distribution of
144	   delays experienced at the bottleneck.

146	1.1.3.  Path Lag

148	   Flows that share a common bottleneck may traverse different paths,
149	   and these paths will often have different base delays.  This makes it
150	   difficult to correlate changes in delay or loss.  This technique uses
151	   the long term shape of the delay distribution as a base for
152	   comparison to counter this.

154	2.  Definitions

156	   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
157	   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
158	   document are to be interpreted as described in RFC 2119 [RFC2119].

160	   Acronyms used in this document:

162	      OWD -- One Way Delay

164	      PDV -- Packet Delay Variation

166	      RTT -- Round Trip Time

168	      SBD -- Shared Bottleneck Detection

170	   Conventions used in this document:

172	      T     --     the base time interval over which measurements are
173	                   made.

175	      N     --     the number of base time, T, intervals used in some
176	                   calculations.

178	      sum_T(...) --  summation of all the measurements of the variable
179	                   in parentheses taken over the interval T

181	      sum(...) --  summation of terms of the variable in parentheses

183	      sum_N(...) --  summation of N terms of the variable in parentheses

184	      sum_NT(...) --  summation of all measurements taken over the
185	                   interval N*T

187	      E_T(...) --  the expectation or mean of the measurements of the
188	                   variable in parentheses over T

190	      E_N(...) --  The expectation or mean of the last N values of the
191	                   variable in parentheses

193	      E_M(...) --  The expectation or mean of the last M values of the
194	                   variable in parentheses, where M <= N.

196	      max_T(...) --  the maximum recorded measurement of the variable in
197	                   parentheses taken over the interval T

199	      min_T(...) --  the minimum recorded measurement of the variable in
200	                   parentheses taken over the interval T

202	      num_T(...) --  the count of measurements of the variable in
203	                   parentheses taken in the interval T

205	      num_VM(...) --  the count of valid values of the variable in
206	                   parentheses given M records

208	      PC --        a boolean variable indicating the particular flow was
209	                   identified as experiencing congestion in the previous
210	                   interval T (i.e.  Previously Congested)

212	      CD_T --      an estimate of the effect of Clock Drift on the mean
213	                   OWD per T

215	      CD_Adj(...) --  Mean OWD adjusted for clock drift

217	      p_l, p_f, p_pdv, c_s, c_h, p_s, p_d, p_v --  various thresholds
218	                   used in the mechanism.

220	      N, M, and F --  number of values (calculated over T).

222	2.1.  Parameter Values

224	   Reference [Hayes-LCN14] uses T=350ms, N=50, p_l = 0.1.  The other
225	   parameters have been tightened to reflect minor enhancements to the
226	   algorithm outlined in Section 3.3: c_s = -0.01, p_f = p_s = p_d =
227	   0.1, p_pdv = 0.2, p_v = 0.2.  M=50, F=10, and c_h = 0.3 are
228	   additional parameters defined in the document.  These are values that
229	   seem to work well over a wide range of practical Internet conditions,
230	   but are the subject of ongoing tests.

232	3.  Mechanism

234	   The mechanism described in this document is based on the observation
235	   that the distributions of delay measurements of packets from flows
236	   that share a common bottleneck have similar shape characteristics.
237	   These shape characteristics are described using 3 key summary
238	   statistics:

240	      variance (estimate var_est, see Section 3.1.3)

242	      skewness (estimate skew_est, see Section 3.1.2)

244	      oscillation (estimate freq_est, see Section 3.1.4)

246	   with packet loss (estimate pkt_loss, see Section 3.1.5) used as a
247	   supplementary statistic.

249	   Summary statistics help to address both the noise and the path lag
250	   problems by describing the general shape over a relatively long
251	   period of time.  This is sufficient for their application in coupled
252	   congestion control for RTP Media.  They can be signalled from a
253	   receiver, which measures the OWD and calculates the summary
254	   statistics, to a sender, which is the entity that is transmitting the
255	   media stream.  An RTP Media device may be both a sender and a
256	   receiver.  SBD can be performed at the sender, the receiver, or both.

258	                                  +----+
259	                                  | H2 |
260	                                  +----+
261	                                     |
262	                                     | L2
263	                                     |
264	                         +----+  L1  |  L3  +----+
265	                         | H1 |------|------| H3 |
266	                         +----+             +----+

268	       A network with 3 hosts (H1, H2, H3) and 3 links (L1, L2, L3).

270	                                 Figure 1

272	   In Figure 1, there are two possible cases for shared bottleneck
273	   detection: a sender-based and a receiver-based case.

275	   1.  Sender-based: consider a situation where host H1 sends media
276	       streams to hosts H2 and H3, and L1 is a shared bottleneck.  H2
277	       and H3 measure the OWD and calculate summary statistics, which
278	       they send to H1 every T. H1, having this knowledge, can determine
279	       the shared bottleneck and accordingly control the send rates.

281	   2.  Receiver-based: consider that H2 is also sending media to H3, and
282	       L3 is a shared bottleneck.  If H3 sends summary statistics to H1
283	       and H2, neither H1 nor H2 alone obtains enough knowledge to detect
284	       this shared bottleneck; H3 can however determine it by combining
285	       the summary statistics related to H1 and H2, respectively.  This
286	       case is applicable when send rates are controlled by the
287	       receiver; then, the signal from H3 to the senders contains the
288	       sending rate.

290	   A discussion of the required signalling for the receiver-based case
291	   is beyond the scope of this document.  For the sender-based case, the
292	   messages and their data format will be defined here in future
293	   versions of this document.  We envision that an initialization
294	   message from the sender to the receiver could specify which key
295	   metrics are requested out of a possibly extensible set (pkt_loss,
296	   var_est, skew_est, freq_est).  The grouping algorithm described in
297	   this document requires all four of these metrics, and receivers MUST
298	   be able to provide them, but future algorithms may be able to exploit
299	   other metrics (e.g. metrics based on explicit network signals).
300	   Moreover, the initialization message could specify T, N, and the
301	   necessary resolution and precision (number of bits per field).

303	3.1.  Key metrics and their calculation

305	   Measurements are calculated over a base interval, T. T should be long
306	   enough to provide enough samples for a good estimate of skewness, but
307	   short enough so that a measure of the oscillation can be made from N
308	   of these estimates.  Reference [Hayes-LCN14] uses T = 350ms and
309	   N=M=50, which are values that seem to work well over a wide range of
310	   practical Internet conditions.

312	3.1.1.  Mean delay

314	   The mean delay is not a useful signal for comparisons between flows
315	   since flows may traverse quite different paths and clocks will not
316	   necessarily be synchronized.  However, it is a base measure for the 3
317	   summary statistics.  The mean delay, E_T(OWD), is the average one way
318	   delay measured over T.

320	   To facilitate the other calculations, the last N E_T(OWD) values will
321	   need to be stored in a cyclic buffer along with the moving average of
322	   E_T(OWD):

324	      mean_delay = E_M(E_T(OWD)) = sum_M(E_T(OWD)) / M

326	   where M <= N.  Generally M=N; setting M to be less than N allows the
327	   mechanism to be more responsive to changes, but potentially at the
328	   expense of a higher error rate (see Section 3.4 for a discussion on
329	   improving the responsiveness of the mechanism).
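
   The cyclic buffer and moving average described above can be sketched
   in Python as follows.  This is only an illustration: the class and
   method names are hypothetical, and averaging over fewer than M values
   during start-up is an assumption, not something this document
   specifies.

```python
from collections import deque


class MeanDelayTracker:
    """Keeps the last N values of E_T(OWD) and their M-sample moving average."""

    def __init__(self, n=50, m=50):
        if not 0 < m <= n:
            raise ValueError("M must satisfy 0 < M <= N")
        self.m = m
        # Cyclic buffer of E_T(OWD), oldest first; old entries fall off the left.
        self.history = deque(maxlen=n)

    def end_of_interval(self, owd_samples):
        """Call once per T with the OWD samples measured in that interval."""
        if owd_samples:  # assumption: an interval with no packets is skipped
            self.history.append(sum(owd_samples) / len(owd_samples))
        return self.mean_delay()

    def mean_delay(self):
        """mean_delay = E_M(E_T(OWD)); None until one interval is stored."""
        recent = list(self.history)[-self.m:]
        return sum(recent) / len(recent) if recent else None


tracker = MeanDelayTracker(n=4, m=2)
tracker.end_of_interval([0.010, 0.020])
tracker.end_of_interval([0.030])
latest_mean = tracker.end_of_interval([0.050])  # mean of 0.030 and 0.050
```

   With M < N the buffer still holds N values (needed by other
   calculations), but only the most recent M contribute to mean_delay.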

331	3.1.2.  Skewness Estimate

333	   Skewness is difficult to calculate efficiently and accurately.
334	   Ideally it should be calculated over the entire period (M * T) from
335	   the mean OWD over that period.  However this would require storing
336	   every delay measurement over the period.  Instead, an estimate is
337	   made over T using the previous calculation of mean_delay.
338	   Comparisons are made using the mean of M skew estimates (an
339	   alternative that removes bias in the mean is given in Section 3.3.3).

341	   The skewness is estimated using two counters, counting the number of
342	   one way delay samples (OWD) above and below the mean:

344	      skew_est_T =  (sum_T(OWD < mean_delay)

346	                    - sum_T(OWD > mean_delay)) / num_T(OWD)

348	         where

350	            if (OWD < mean_delay) 1 else 0

352	            if (OWD > mean_delay) 1 else 0

354	         skew_est_T is a number between -1 and 1

356	      skew_est = E_M(skew_est_T) = sum_M(skew_est_T) / M

358	   For implementation ease, mean_delay does not include the mean of the
359	   current T interval.

361	   Note: Care must be taken when implementing the comparisons to ensure
362	   that rounding does not bias skew_est.  It is important that the mean
363	   is calculated with a higher precision than the samples.
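
   A minimal Python sketch of the two-counter estimate follows.  The
   function names are illustrative, and the handling of an empty
   interval (raising an error) is an assumption.

```python
def skew_est_T(owd_samples, mean_delay):
    """Per-interval skewness estimate from counts below and above mean_delay."""
    if not owd_samples:
        raise ValueError("no OWD samples in this interval")
    below = sum(1 for owd in owd_samples if owd < mean_delay)
    above = sum(1 for owd in owd_samples if owd > mean_delay)
    # Samples exactly equal to mean_delay count toward neither side.
    return (below - above) / len(owd_samples)


def skew_est(skew_est_T_values, m):
    """skew_est = E_M(skew_est_T): mean of the last M per-interval estimates."""
    recent = skew_est_T_values[-m:]
    return sum(recent) / len(recent)


example_T = skew_est_T([1.0, 2.0, 3.0, 10.0], mean_delay=4.0)  # (3 - 1) / 4
example = skew_est([0.5, -0.5, 1.0], m=2)                      # (-0.5 + 1.0) / 2
```

   Keeping mean_delay as a float while the samples are, for example,
   integer microseconds is one way to follow the precision advice in
   the note above.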

365	3.1.3.  Variance Estimate

367	   Packet Delay Variation (PDV) ([RFC5481] and [ITU-Y1540]) is used as
368	   an estimator of the variance of the delay signal.  We define PDV as
369	   follows:

371	      PDV = PDV_max = max_T(OWD) - E_T(OWD)

373	      var_est = E_M(PDV) = sum_M(PDV) / M

375	   This modifies PDV as outlined in [RFC5481] to provide a summary
376	   statistic version that best aids the grouping decisions of the
377	   algorithm (see [Hayes-LCN14] section IVB).

379	   The use of PDV = PDV_min = E_T(OWD) - min_T(OWD) is currently being
380	   investigated as an alternative that is less sensitive to noise.  The
381	   drawback of using PDV_min is that it does not distinguish between
382	   groups of flows with similar values of skew_est as well as PDV_max
383	   (see [Hayes-LCN14] section IVB).
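
   The two PDV variants and the resulting var_est can be sketched as
   follows (function names are illustrative only):

```python
def pdv_max(owd_samples):
    """PDV_max = max_T(OWD) - E_T(OWD) for one interval T."""
    return max(owd_samples) - sum(owd_samples) / len(owd_samples)


def pdv_min(owd_samples):
    """PDV_min = E_T(OWD) - min_T(OWD), the alternative under investigation."""
    return sum(owd_samples) / len(owd_samples) - min(owd_samples)


def var_est(pdv_values, m):
    """var_est = E_M(PDV): mean of the last M per-interval PDV values."""
    recent = pdv_values[-m:]
    return sum(recent) / len(recent)


samples = [1.0, 2.0, 6.0]          # E_T(OWD) = 3.0
upper = pdv_max(samples)           # 6.0 - 3.0
lower = pdv_min(samples)           # 3.0 - 1.0
estimate = var_est([upper, 1.0], m=2)
```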

385	3.1.4.  Oscillation Estimate

387	   An estimate of the low frequency oscillation of the delay signal is
388	   calculated by counting and normalising the significant mean,
389	   E_T(OWD), crossings of mean_delay:

391	      freq_est = number_of_crossings / N

393	      Where

395	         we define a significant mean crossing as a crossing that
396	         extends p_v * var_est from mean_delay.  In our experiments we
397	         have found that p_v = 0.2 is a good value.

399	   Freq_est is a number between 0 and 1.  Freq_est can be approximated
400	   incrementally as follows:

402	      With each new calculation of E_T(OWD) a decision is made as to
403	      whether this value of E_T(OWD) significantly crosses the current
404	      long term mean, mean_delay, with respect to the previous
405	      significant mean crossing.

407	      A cyclic buffer, last_N_crossings, records a 1 if there is a
408	      significant mean crossing, otherwise a 0.

410	      The counter, number_of_crossings, is incremented when there is a
411	      significant mean crossing and decremented when a non-zero
412	      value is removed from the last_N_crossings buffer.

414	   This approximation of freq_est was not used in [Hayes-LCN14], which
415	   calculated freq_est every T using the current E_N(E_T(OWD)).  Our
416	   tests show that this approximation of freq_est yields results that
417	   are almost identical to when the full calculation is performed every
418	   T.
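
   The incremental approximation can be sketched as below.  The text
   above does not pin down every detail of a "significant mean
   crossing", so this sketch adopts one possible reading as an
   assumption: a crossing is counted when E_T(OWD) lies more than
   p_v * var_est beyond mean_delay on the opposite side from the
   previous significant excursion.  Class and attribute names are
   hypothetical.

```python
from collections import deque


class FreqEstimator:
    """Incrementally approximates freq_est = number_of_crossings / N."""

    def __init__(self, n=50, p_v=0.2):
        self.n = n
        self.p_v = p_v
        self.last_n_crossings = deque(maxlen=n)  # 1 = crossing, 0 = none
        self.number_of_crossings = 0
        self.last_side = 0  # +1 above, -1 below, 0 = no excursion seen yet

    def update(self, e_t_owd, mean_delay, var_est):
        """Call once per T with the new E_T(OWD); returns freq_est."""
        margin = self.p_v * var_est
        if e_t_owd > mean_delay + margin:
            side = 1
        elif e_t_owd < mean_delay - margin:
            side = -1
        else:
            side = 0  # inside the band around mean_delay: not significant
        crossed = side != 0 and self.last_side != 0 and side != self.last_side
        if side != 0:
            self.last_side = side
        # Account for the record about to be dropped from the cyclic buffer.
        if len(self.last_n_crossings) == self.n:
            self.number_of_crossings -= self.last_n_crossings[0]
        self.last_n_crossings.append(1 if crossed else 0)
        self.number_of_crossings += 1 if crossed else 0
        return self.number_of_crossings / self.n


est = FreqEstimator(n=4, p_v=0.2)
trace = [est.update(x, mean_delay=10.0, var_est=5.0)
         for x in (12.0, 8.0, 10.5, 12.0, 12.0, 12.0)]
```

   In this trace the margin is 1.0, so 10.5 is ignored, and the
   crossing recorded at the second step leaves the window on the final
   update.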

420	3.1.5.  Packet loss

422	   The proportion of packets lost is used as a supplementary measure:

424	      pkt_loss = sum_NT(lost packets) / sum_NT(total packets)

426	   Note: When pkt_loss is small it is very variable; however, when
427	   pkt_loss is high it becomes a stable measure for making grouping
428	   decisions.
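
   A rolling computation of pkt_loss over the last N intervals might
   look like the sketch below.  The names are illustrative, and it is
   assumed that "total packets" means packets received plus packets
   detected as lost in each interval.

```python
from collections import deque


class LossTracker:
    """pkt_loss = sum_NT(lost packets) / sum_NT(total packets)."""

    def __init__(self, n=50):
        self.lost = deque(maxlen=n)    # lost packets per interval T
        self.total = deque(maxlen=n)   # total packets per interval T

    def end_of_interval(self, lost_packets, total_packets):
        self.lost.append(lost_packets)
        self.total.append(total_packets)
        total = sum(self.total)
        return sum(self.lost) / total if total else 0.0


loss = LossTracker(n=2)
loss.end_of_interval(1, 10)
loss.end_of_interval(0, 10)
current = loss.end_of_interval(3, 10)  # window now holds (0, 10) and (3, 10)
```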

430	3.2.  Flow Grouping

432	3.2.1.  Flow Grouping Algorithm

434	   The following grouping algorithm is RECOMMENDED for SBD in the RMCAT
435	   context and is sufficient and efficient for small to moderate numbers
436	   of flows.  For very large numbers of flows (e.g. hundreds), a more
437	   complex clustering algorithm may be substituted.

439	   Since no single metric is precise enough to group flows (due to
440	   noise), the algorithm uses multiple metrics.  Each metric offers a
441	   different "view" of the bottleneck link characteristics, and used
442	   together they enable a more precise grouping of flows than would
443	   otherwise be possible.

445	   Flows determined to be experiencing congestion are successively
446	   divided into groups based on freq_est, var_est, and skew_est.

448	   The first step is to determine which flows are experiencing
449	   congestion.  This is important, since if a flow is not experiencing
450	   congestion its delay based metrics will not describe the bottleneck,
451	   but the "noise" from the rest of the path.  Skewness, with proportion
452	   of packets lost as a supplementary measure, is used to do this:

454	   1.  Grouping will be performed on flows where:

456	          skew_est < c_s

458	             || ( skew_est < c_h && PC )

460	             || pkt_loss > p_l

462	   The parameter c_s controls how sensitive the mechanism is in
463	   detecting congestion.  C_s = 0.0 was used in [Hayes-LCN14].  A value
464	   of c_s = 0.05 is a little more sensitive, and c_s = -0.05 is a little
465	   less sensitive.  C_h controls the hysteresis on flows that were
466	   grouped as experiencing congestion last time.
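
   The congestion test of step 1 translates directly into a boolean
   expression.  The sketch below uses the parameter values from
   Section 2.1 as defaults; the function name is illustrative.

```python
def is_congested(skew_est, pkt_loss, previously_congested,
                 c_s=-0.01, c_h=0.3, p_l=0.1):
    """Step 1: decide whether a flow takes part in grouping this interval."""
    return (skew_est < c_s
            or (skew_est < c_h and previously_congested)  # hysteresis via PC
            or pkt_loss > p_l)


clearly_congested = is_congested(-0.05, 0.0, previously_congested=False)
not_congested = is_congested(0.2, 0.0, previously_congested=False)
held_by_hysteresis = is_congested(0.2, 0.0, previously_congested=True)
lossy = is_congested(0.5, 0.2, previously_congested=False)
```

   The result for each flow becomes its PC value for the next interval.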

468	   The flows experiencing congestion are then progressively divided
469	   into groups based on the freq_est, var_est, and skew_est summary
470	   statistics.  The process proceeds according to the following steps:

472	   2.  Group flows whose difference in sorted freq_est is less than a
473	       threshold:

475	          diff(freq_est) < p_f

477	   3.  Group flows whose difference in sorted var_est (highest to
478	       lowest) is less than a threshold:

480	          diff(var_est) < (p_pdv * var_est)

482	       The threshold, (p_pdv * var_est), is with respect to the highest
483	       value in the difference.

485	   4.  Group flows whose difference in sorted skew_est or pkt_loss is
486	       less than a threshold:

488	          if pkt_loss < p_l

490	             diff(skew_est) < p_s

492	          otherwise

494	             diff(pkt_loss) < (p_d * pkt_loss)

496	          The threshold, (p_d * pkt_loss), is with respect to the
497	          highest value in the difference.

499	   This procedure involves sorting estimates from highest to lowest.  It
500	   is simple to implement, and efficient for small numbers of flows,
501	   such as are expected in RTCWEB.
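
   Steps 2 to 4 can be sketched as repeated "sort, then split at large
   gaps" passes.  Several details here are assumptions made for
   illustration: flows are plain dictionaries with hypothetical keys,
   the input contains only flows that passed the congestion test in
   step 1, relative thresholds use the higher of the two adjacent
   values, and in step 4 each group is first divided into flows with
   pkt_loss below p_l (compared on skew_est) and the rest (compared on
   pkt_loss).

```python
def split_sorted(flows, key, threshold):
    """Sort by key, highest first; start a new group at each large gap.

    threshold(high) returns the allowed gap given the higher adjacent value.
    """
    groups = []
    for flow in sorted(flows, key=key, reverse=True):
        if groups:
            high = key(groups[-1][-1])
            if high - key(flow) < threshold(high):
                groups[-1].append(flow)
                continue
        groups.append([flow])
    return groups


def group_flows(congested, p_f=0.1, p_pdv=0.2, p_s=0.1, p_d=0.1, p_l=0.1):
    groups = split_sorted(congested, lambda f: f["freq_est"], lambda hi: p_f)
    groups = [sub for g in groups
              for sub in split_sorted(g, lambda f: f["var_est"],
                                      lambda hi: p_pdv * hi)]
    final = []
    for g in groups:
        low = [f for f in g if f["pkt_loss"] < p_l]
        high = [f for f in g if f["pkt_loss"] >= p_l]
        final += split_sorted(low, lambda f: f["skew_est"], lambda hi: p_s)
        final += split_sorted(high, lambda f: f["pkt_loss"],
                              lambda hi: p_d * hi)
    return final


flows = [
    {"id": "a", "freq_est": 0.30, "var_est": 0.0100, "skew_est": -0.20, "pkt_loss": 0.0},
    {"id": "b", "freq_est": 0.32, "var_est": 0.0105, "skew_est": -0.22, "pkt_loss": 0.0},
    {"id": "c", "freq_est": 0.60, "var_est": 0.0100, "skew_est": -0.20, "pkt_loss": 0.0},
]
result = sorted(sorted(f["id"] for f in g) for g in group_flows(flows))
```

   Here flow "c" is separated in step 2 by its freq_est, while "a" and
   "b" stay together through all three passes.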

503	3.2.2.  Using the flow group signal

505	   Grouping decisions are made every T from the second T, though they
506	   will not attain their full design accuracy until after the N'th T
507	   interval.

509	   Network conditions, and even the congestion controllers, can cause
510	   bottlenecks to fluctuate.  A coupled congestion controller MAY decide
511	   only to couple groups that remain stable, say grouped together 90% of
512	   the time, depending on its objectives.  Recommendations concerning
513	   this are beyond the scope of this draft and will be specific to the
514	   coupled congestion controller's objectives.

516	3.3.  Removing Noise from the Estimates

518	   The following subsections describe small changes to the calculation
519	   of the key metrics that help remove noise from them.  Currently these
520	   "tweaks" are described separately to keep the main description
521	   succinct.  In future revisions of the draft these enhancements may
522	   replace the original key metric calculations.

524	3.3.1.  Oscillation noise

526	   When a path has no congestion, the PDV will be very small and the
527	   recorded significant mean crossings will be the result of path noise.
528	   Thus up to N-1 meaningless mean crossings can be a source of error at
529	   the point a link becomes a bottleneck and flows traversing it begin
530	   to be grouped.

532	   To remove this source of noise from freq_est:

534	   1.  Set the current PDV to PDV = NaN (a value representing an invalid
535	       record, i.e., Not a Number) for flows that are deemed to not be
536	       experiencing congestion by the first skew_est based grouping test
537	       (see Section 3.2.1).

539	   2.  Then var_est = sum_M(PDV != NaN) / num_VM(PDV)

541	   3.  For freq_est, only record a significant mean crossing if the flow
542	       is experiencing congestion.

544	   These three changes will remove the non-congestion noise from
545	   freq_est.
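
   The NaN-aware var_est of step 2 can be sketched as follows.  The
   function name is illustrative, and returning NaN when no valid
   record exists is an assumption.

```python
import math


def var_est_valid(pdv_records, m):
    """var_est over the last M PDV records, skipping invalid (NaN) entries."""
    recent = pdv_records[-m:]
    valid = [pdv for pdv in recent if not math.isnan(pdv)]
    # num_VM(PDV) is len(valid); with no valid record there is no estimate.
    return sum(valid) / len(valid) if valid else math.nan


with_gap = var_est_valid([0.004, math.nan, 0.002], m=3)  # mean of 0.004, 0.002
no_data = var_est_valid([math.nan], m=1)
```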

547	3.3.2.  Clock drift

549	   Generally sender and receiver clock drift will be too small to cause
550	   significant errors in the estimators.  Skew_est is most sensitive to
551	   this type of noise.  In circumstances where clock drift is high,
552	   making M < N can reduce this error.

554	   A better method is to estimate the effect the clock drift is having
555	   on the E_N(E_T(OWD)), and then adjust mean_delay accordingly.  A
556	   simple method of doing this follows:

558	      First divide the N E_T(OWD) values into two halves (N/2 in each)
559	      -- old and new.

561	      Calculate a mean of the old half:

563	         Older_mean = E_old(E_T(OWD)) = sum_old(E_T(OWD)) / (N/2)

565	      Calculate a mean of the new (most recent) half:

567	         Newer_mean = E_new(E_T(OWD)) = sum_new(E_T(OWD)) / (N/2)

569	      A linear estimate of the clock drift per T is:

571	         CD_T = (Newer_mean - Older_mean) / (N/2)

573	      An adjusted mean estimate then is:

575	         mean_delay = CD_Adj(E_M(E_T(OWD))) = E_M(E_T(OWD)) + CD_T *
576	         (M/2 + 0.5)

578	   CD_Adj can be thought of as a prediction of what the long term mean
579	   will be in the current measurement period T. It is used as the basis
580	   for skew_est and freq_est.
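
   A sketch of the drift adjustment follows.  It reads each divisor in
   the formulas above as (N/2), which is an interpretation: the two
   half-window means are centred N/2 intervals apart, so dividing their
   difference by N/2 gives the drift per T.  Function names are
   illustrative, and for odd N the middle value is simply ignored.

```python
def clock_drift_per_T(e_t_owd_history):
    """CD_T from the stored E_T(OWD) values, ordered oldest first."""
    half = len(e_t_owd_history) // 2
    if half == 0:
        raise ValueError("need at least two E_T(OWD) values")
    older_mean = sum(e_t_owd_history[:half]) / half
    newer_mean = sum(e_t_owd_history[-half:]) / half
    return (newer_mean - older_mean) / half


def adjusted_mean_delay(e_t_owd_history, m):
    """mean_delay = E_M(E_T(OWD)) + CD_T * (M/2 + 0.5)."""
    recent = e_t_owd_history[-m:]
    e_m = sum(recent) / len(recent)
    return e_m + clock_drift_per_T(e_t_owd_history) * (m / 2 + 0.5)


history = [0.0, 1.0, 2.0, 3.0]          # pure drift of 1.0 per T
drift = clock_drift_per_T(history)
predicted = adjusted_mean_delay(history, m=4)
```

   For this purely linear history the adjusted mean equals the value
   the trend would reach in the next interval, which matches the
   description of CD_Adj as a prediction for the current T.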

582	3.3.3.  Bias in the skewness measure

584	   If successive calculations of skew_est are made with very different
585	   numbers of samples (num_T(OWD)), the simple calculation of
586	   E_M(skew_est) used for grouping decisions will be biased by the
587	   intervals that have few samples.  This bias can be corrected
588	   if necessary as follows.

590	      skew_base_T = sum_T(OWD < mean_delay) - sum_T(OWD > mean_delay)

592	         skew_est = sum_MT(skew_base_T)/num_MT(OWD)

594	   This calculation requires slightly more state, since an
595	   implementation will need to maintain two cyclic buffers storing
596	   skew_base_T and num_T(OWD) respectively to manage the rolling
597	   summations (note only one cyclic buffer is needed for the calculation
598	   of skew_est outlined previously).
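
   The two-buffer version can be sketched as below.  Names are
   illustrative, and returning 0.0 when the window holds no samples is
   an assumption.

```python
from collections import deque


class SkewEstimator:
    """skew_est = sum_MT(skew_base_T) / num_MT(OWD), using two cyclic buffers."""

    def __init__(self, m=50):
        self.skew_base = deque(maxlen=m)     # skew_base_T per interval
        self.num_samples = deque(maxlen=m)   # num_T(OWD) per interval

    def end_of_interval(self, owd_samples, mean_delay):
        below = sum(1 for owd in owd_samples if owd < mean_delay)
        above = sum(1 for owd in owd_samples if owd > mean_delay)
        self.skew_base.append(below - above)
        self.num_samples.append(len(owd_samples))
        total = sum(self.num_samples)
        return sum(self.skew_base) / total if total else 0.0


skew = SkewEstimator(m=2)
first = skew.end_of_interval([1.0, 2.0, 3.0, 10.0], mean_delay=4.0)
second = skew.end_of_interval([5.0], mean_delay=4.0)
```

   In this example the second interval has a single sample.  Averaging
   the per-interval estimates (0.5 and -1.0) would give -0.25, whereas
   weighting by sample count gives (2 - 1) / 5 = 0.2.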

600	3.4.  Reducing lag and Improving Responsiveness

602	   Measurement-based shared bottleneck detection makes decisions in the
603	   present based on what has been measured in the past.  This means that
604	   there is always a lag in responding to changing conditions.  This
605	   mechanism is based on summary statistics taken over (N*T) seconds.
606	   This mechanism can be made more responsive to changing conditions by:

608	   1.  Reducing N and/or M -- but at the expense of less accurate
609	       metrics, and/or

611	   2.  Exploiting the fact that more recent measurements are more
612	       valuable than older measurements and weighting them accordingly.

614	   Although more recent measurements are more valuable, older
615	   measurements are still needed to gain an accurate estimate of the
616	   distribution descriptor we are measuring.  Unfortunately, the simple
617	   exponentially weighted moving average weights drop off too quickly
618	   for our requirements and have an infinite tail.  A simple linearly
619	   declining weighted moving average also does not provide enough weight
620	   to the most recent measurements.  We propose a piecewise linear
621	   distribution of weights, such that the first section (samples 1:F) is
622	   flat as in a simple moving average, and the second section (samples
623	   F+1:M) is linearly declining weights to the end of the averaging
624	   window.  We choose integer weights, which allows incremental
625	   calculation without introducing rounding errors.

627	3.4.1.  Improving the response of the skewness estimate

629	   The weighted moving average for skew_est, based on skew_est in
630	   Section 3.3.3, can be calculated as follows:

632	      skew_est = ((M-F+1)*sum(skew_base_T(1:F))

634	                      + sum([(M-F):1].*skew_base_T(F+1:M)))

636	                 / ((M-F+1)*sum(numsampT(1:F))

638	                      + sum([(M-F):1].*numsampT(F+1:M)))

640	   where numsampT is an array of the number of OWD samples in each T
641	   (i.e., num_T(OWD)), and numsampT(1) is the most recent;
642	   skew_base_T(1) is the most recent calculation of skew_base_T; 1:F
643	   refers to the integer values 1 through to F, and [(M-F):1] refers to
644	   an array of the integer values (M-F) declining through to 1; and
645	   ".*" is the element-wise array multiplication operator.
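
   The piecewise linear weighting can be sketched in Python as follows.
   The arrays are assumed to hold exactly M entries with the most
   recent value first (index 0 here corresponds to index 1 in the
   notation above); function names are illustrative.

```python
def piecewise_weights(m, f):
    """Weight (M-F+1) for the F newest samples, then (M-F) declining to 1."""
    return [m - f + 1] * f + list(range(m - f, 0, -1))


def weighted_skew_est(skew_base_T, numsampT, f):
    """Weighted skew_est; both lists are ordered most recent first."""
    weights = piecewise_weights(len(skew_base_T), f)
    numerator = sum(w * s for w, s in zip(weights, skew_base_T))
    denominator = sum(w * n for w, n in zip(weights, numsampT))
    return numerator / denominator if denominator else 0.0


weights = piecewise_weights(5, 2)
weighted = weighted_skew_est([2, -1, 0, 1, 3], [4, 4, 4, 4, 4], f=2)
```

   Because the weights are integers and skew_base_T and numsampT are
   counts, the numerator and denominator stay exact integers until the
   final division.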

647	3.4.2.  Improving the response of the variance estimate

649	   The weighted moving average for var_est can be calculated as follows:

651	      var_est = ((M-F+1)*sum(PDV(1:F)) + sum([(M-F):1].*PDV(F+1:M)))

653	                / (F*(M-F+1) + sum([(M-F):1]))

655	   where 1:F refers to the integer values 1 through to F, and [(M-F):1]
656	   refers to an array of the integer values (M-F) declining through to
657	   1; and ".*" is the element-wise array multiplication operator.  When
658	   removing oscillation noise (see Section 3.3.1) this calculation must
659	   be adjusted to allow for invalid PDV records.
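
   A sketch of the weighted var_est follows.  This document does not
   say how the calculation should be adjusted for invalid PDV records;
   the approach shown here, dropping both the NaN record and its weight
   from the sums, is only one possible choice.  The list is assumed to
   hold M entries, most recent first, and the function name is
   illustrative.

```python
import math


def weighted_var_est(pdv, f):
    """Weighted var_est over M PDV records (most recent first), skipping NaN."""
    m = len(pdv)
    weights = [m - f + 1] * f + list(range(m - f, 0, -1))
    valid = [(w, p) for w, p in zip(weights, pdv) if not math.isnan(p)]
    denominator = sum(w for w, _ in valid)
    if denominator == 0:
        return math.nan  # assumption: no valid record means no estimate
    return sum(w * p for w, p in valid) / denominator


full = weighted_var_est([1.0, 2.0, 3.0, 4.0, 5.0], f=2)
gappy = weighted_var_est([1.0, math.nan, 3.0, 4.0, 5.0], f=2)
```

   With all records valid the denominator equals F*(M-F+1) plus the sum
   of the declining weights, as in the formula above (14 for M=5, F=2).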

661	4.  Measuring OWD

663	   This section discusses the OWD measurements required for this
664	   algorithm to detect shared bottlenecks.

666	   The SBD mechanism described in this draft relies on differences
667	   between OWD measurements to avoid the practical problems with
668	   measuring absolute OWD (see [Hayes-LCN14] section IIIC).  Since all
669	   summary statistics are relative to the mean OWD and sender/receiver
670	   clock offsets should be approximately constant over the measurement
671	   periods, the offset is subtracted out in the calculation.
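
   The cancellation of a constant clock offset can be checked with a
   small sketch.  The function name deviations_from_mean and the sample
   values are illustrative assumptions, not measurements.

```python
def deviations_from_mean(owd_samples):
    """Return each OWD sample relative to the mean of the samples."""
    mean = sum(owd_samples) / len(owd_samples)
    return [d - mean for d in owd_samples]


true_owd = [20.0, 22.5, 21.0, 30.0]          # hypothetical true OWDs (ms)
offset = 1234.5                              # hypothetical constant clock offset (ms)
measured = [d + offset for d in true_owd]    # what the receiver would observe

# The deviations agree (up to floating-point error) with and without the offset.
for a, b in zip(deviations_from_mean(true_owd), deviations_from_mean(measured)):
    print(round(a, 6), round(b, 6))
```

   The offset shifts every sample and the mean by the same amount, so
   it disappears from any statistic computed relative to the mean.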

673	4.1.  Time stamp resolution

675	   The SBD mechanism requires timing information precise enough to be
676	   able to make comparisons.  As a rule of thumb, the time resolution
677	   should be finer than one hundredth of a typical path's range of
678	   delays.  In general, the coarser the time resolution, the more care
679	   needs to be taken to ensure rounding errors do not bias the
680	   skewness calculation.

682	   Typical RTP media flows use sub-millisecond timers, which should be
683	   adequate in most situations.
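
   The rule of thumb can be expressed as a simple check.  The function
   name resolution_is_adequate and the example figures are assumptions
   made for illustration; both arguments must use the same time unit.

```python
def resolution_is_adequate(timer_resolution, delay_range):
    """True if the timer step is under 1/100 of the path's range of delays."""
    return timer_resolution < delay_range / 100.0


# Hypothetical path whose delays vary over a 50 ms range.
print(resolution_is_adequate(0.1, 50.0))  # True: 0.1 ms is below 0.5 ms
print(resolution_is_adequate(1.0, 50.0))  # False: 1 ms is above 0.5 ms
```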

685	5.  Acknowledgements

687	   This work was part-funded by the European Community under its Seventh
688	   Framework Programme through the Reducing Internet Transport Latency
689	   (RITE) project (ICT-317700).  The views expressed are solely those of
690	   the authors.

692	6.  IANA Considerations

694	   This memo includes no request to IANA.

696	7.  Security Considerations

698	   The security considerations of RFC 3550 [RFC3550], RFC 4585
699	   [RFC4585], and RFC 5124 [RFC5124] are expected to apply.

701	   Non-authenticated RTCP packets carrying shared bottleneck indications
702	   and summary statistics could allow attackers to alter the bottleneck
703	   sharing characteristics for private gain or disruption of other
704	   parties' communication.

706	8.  Change history

708	   Changes made to this document:

710	     02->WG-00 :   Fixed missing 0.5 in 3.3.2 and missing brace in 3.3.3

712	     01->02 :      New section describing improvements to the key metric
713	                   calculations that help to remove noise, bias, and
714	                   reduce lag.  Some revisions to the notation to make
715	                   it clearer.  Some tightening of the thresholds.

717	     00->01 :      Revisions to terminology for clarity

719	9.  References

721	9.1.  Normative References

723	   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
724	              Requirement Levels", BCP 14, RFC 2119, March 1997.

726	9.2.  Informative References

728	   [Hayes-LCN14]
729	              Hayes, D., Ferlin, S., and M. Welzl, "Practical Passive
730	              Shared Bottleneck Detection using Shape Summary
731	              Statistics", Proc. the IEEE Local Computer Networks
732	              (LCN) p150-158, September 2014, <http://heim.ifi.uio.no/
733	              davihay/
734	              hayes14__pract_passiv_shared_bottl_detec-abstract.html>.

736	   [I-D.welzl-rmcat-coupled-cc]
737	              Welzl, M., Islam, S., and S. Gjessing, "Coupled congestion
738	              control for RTP media", draft-welzl-rmcat-coupled-cc-04
739	              (work in progress), October 2014.

741	   [ITU-Y1540]
742	              ITU-T, "Internet Protocol Data Communication Service - IP
743	              Packet Transfer and Availability Performance Parameters",
744	              Series Y: Global Information Infrastructure, Internet
745	              Protocol Aspects and Next-Generation Networks ,
746	              March 2011,
747	              <http://www.itu.int/rec/T-REC-Y.1540-201103-I/en>.

749	   [RFC3550]  Schulzrinne, H., Casner, S., Frederick, R., and V.
750	              Jacobson, "RTP: A Transport Protocol for Real-Time
751	              Applications", STD 64, RFC 3550, July 2003.

753	   [RFC4585]  Ott, J., Wenger, S., Sato, N., Burmeister, C., and J. Rey,
754	              "Extended RTP Profile for Real-time Transport Control
755	              Protocol (RTCP)-Based Feedback (RTP/AVPF)", RFC 4585,
756	              July 2006.

758	   [RFC5124]  Ott, J. and E. Carrara, "Extended Secure RTP Profile for
759	              Real-time Transport Control Protocol (RTCP)-Based Feedback
760	              (RTP/SAVPF)", RFC 5124, February 2008.

762	   [RFC5481]  Morton, A. and B. Claise, "Packet Delay Variation
763	              Applicability Statement", RFC 5481, March 2009.

765	   [RFC6817]  Shalunov, S., Hazel, G., Iyengar, J., and M. Kuehlewind,
766	              "Low Extra Delay Background Transport (LEDBAT)", RFC 6817,
767	              December 2012.

769	Authors' Addresses

771	   David Hayes (editor)
772	   University of Oslo
773	   PO Box 1080 Blindern
774	   Oslo,   N-0316
775	   Norway

777	   Phone: +47 2284 5566
778	   Email: davihay@ifi.uio.no

780	   Simone Ferlin
781	   Simula Research Laboratory
782	   P.O.Box 134
783	   Lysaker,   1325
784	   Norway

786	   Phone: +47 4072 0702
787	   Email: ferlin@simula.no

789	   Michael Welzl
790	   University of Oslo
791	   PO Box 1080 Blindern
792	   Oslo,   N-0316
793	   Norway

795	   Phone: +47 2285 2420
796	   Email: michawe@ifi.uio.no