idnits 2.17.1 

draft-ietf-avt-rtpsample-02.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

  ** Looks like you're using RFC 2026 boilerplate.  This must be updated to
     follow RFC 3978/3979, as updated by RFC 4748.


  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

  ** Missing expiration date.  The document expiration date should appear on
     the first and last page.

  ** The document seems to lack a 1id_guidelines paragraph about 6 months
     document validity -- however, there's a paragraph with a matching
     beginning. Boilerplate error?

  == No 'Intended status' indicated for this document; assuming Proposed
     Standard


  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

  ** The document seems to lack a Security Considerations section.

  ** The document seems to lack an IANA Considerations section.  (See Section
     2.2 of https://www.ietf.org/id-info/checklist for how to handle the case
     when there are no actions for IANA.)


  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  -- The document seems to lack a disclaimer for pre-RFC5378 work, but may
     have content which was first submitted before 10 November 2008.  If you
     have contacted all the original authors and they are all willing to grant
     the BCP78 rights to the IETF Trust, then this is fine, and you can ignore
     this comment.  If not, you may need to add the pre-RFC5378 disclaimer. 
     (See the Legal Provisions document at
     https://trustee.ietf.org/license-info for more information.)

  -- The document date (February 26, 1999) is 9192 days in the past.  Is this
     intentional?


  Checking references for intended status: Proposed Standard
  ----------------------------------------------------------------------------

     (See RFCs 3967 and 4897 for information about using normative references
     to lower-maturity documents in RFCs)

  -- Missing reference section? '1' on line 409 looks like a reference

  -- Missing reference section? '2' on line 413 looks like a reference

  -- Missing reference section? '3' on line 417 looks like a reference

  -- Missing reference section? '4' on line 422 looks like a reference


     Summary: 5 errors (**), 0 flaws (~~), 1 warning (==), 6 comments (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.

--------------------------------------------------------------------------------


2	Internet Engineering Task Force                 Audio/Video Transport wg
3	Internet Draft                              J. Rosenberg, H. Schulzrinne
4	draft-ietf-avt-rtpsample-02.txt            Bell Laboratories/Columbia U.
5	February 26, 1999
6	Expires: August 26, 1999

8	                Sampling of the Group Membership in RTP

10	STATUS OF THIS MEMO

12	   This document is an Internet-Draft and is in full conformance with
13	   all provisions of Section 10 of RFC2026.

15	   Internet-Drafts are working documents of the Internet Engineering
16	   Task Force (IETF), its areas, and its working groups.  Note that
17	   other groups may also distribute working documents as Internet-
18	   Drafts.

20	   Internet-Drafts are draft documents valid for a maximum of six months
21	   and may be updated, replaced, or obsoleted by other documents at any
22	   time.  It is inappropriate to use Internet- Drafts as reference
23	   material or to cite them other than as work in progress.

25	   The list of current Internet-Drafts can be accessed at
26	   http://www.ietf.org/ietf/1id-abstracts.txt

28	   The list of Internet-Draft Shadow Directories can be accessed at
29	   http://www.ietf.org/shadow.html.

31	Abstract

33	   In large multicast groups, the size of the group membership table
34	   maintained by RTP (Real Time Transport Protocol) participants may
35	   become unwieldy, particularly for embedded devices with limited
36	   memory and processing power. This document discusses mechanisms for
37	   sampling of this group membership table in order to reduce the memory
38	   requirements. Several mechanisms are proposed, and the performance of
39	   each is considered.

41	   NOTE:

43	   This is to inform you that Lucent Technologies has applied for and/or
44	   has patent(s) that relates to the binning algorithm which is a part
45	   of the specification.

47	   This declaration is being made pursuant to the provisions of IETF IPR
48	   Policy , Sections 10.3.1 and 10.3.2.

50	   Lucent Technologies Inc. will offer patent licenses for submissions
51	   made by it which are adopted as a standard by your organization as
52	   follows:

54	   If part(s) of a submission by Lucent is included in a standard and
55	   Lucent has patents and/or pending applications that are essential to
56	   implementation of the included part(s) in said standard, Lucent is
57	   prepared to grant - on the basis of reciprocity (grantback) - a
58	   license on such included part(s) on reasonable, non-discriminatory
59	   terms and conditions.

61	1 Introduction

63	   RTP, the Real Time Transport protocol [1], mandates that RTCP packets
64	   be transmitted from each participant with a period roughly
65	   proportional to the group size. The group size is obtained by storing
66	   a table, containing an entry for each unique SSRC seen in RTP and
67	   RTCP packets. As members leave or time out, entries are deleted, and
68	   as new members join, entries are added. The table is thus highly
69	   dynamic.

71	   For large multicast sessions, such as an mbone broadcast or IP based
72	   TV distribution, group sizes can be extremely large, on the order of
73	   hundreds of thousands to millions of participants. In these
74	   environments, RTCP may not always be used, and thus the group
75	   membership table isn't needed. However, it is highly desirable for
76	   RTP to scale well for groups with one member to groups with one
77	   million members, without human intervention to "turn off" RTCP when
78	   its no longer appropriate. This means that the same tools and systems
79	   can be used for both small conferences and TV broadcasts in a smooth,
80	   scalable fashion.

82	   Previous work [2] has identified three major scalability problems
83	   with RTP. These are:

85	        1.   Congestion due to floods of RTCP packets in highly dynamic
86	             groups

88	        2.   Large delays between receipt of RTCP packets from a single
89	             user

91	        3.   Large size of the group membership table

93	   The reconsideration algorithm helps to alleviate the first of these.
94	   This document addresses the third, that of large group size tables.

96	   Storage of an SSRC table with one million members, for example,
97	   requires at least four megabytes. As a result, embedded devices with
98	   small memory footprints may have difficulty under these conditions.
99	   To solve this problem, SSRC sampling has been proposed. SSRC sampling
100	   uses statistical sampling to obtain a stochastic estimate of the
101	   group membership. There are many issues that arise when this is done.
102	   This document reviews these issues and discusses the mechanisms which
103	   can be applied by implementors.

105	2 Basic Operation

107	   The basic idea behind SSRC sampling is simple. Each participant
108	   maintains a key K of 32 bits, and a mask M of 32 bits. Assume that m
109	   of the M bits in the mask are 1, and the remainder are zero. When an
110	   RTCP packet arrives with some SSRC S, rather than placing it in the
111	   table, it is first sampled. The sampling is performed by ANDing the
112	   key and the mask, and also ANDing the SSRC and the mask. The
113	   resulting values are compared. If equal, the SSRC is stored in the
114	   table. If not equal, the SSRC is rejected, and the packet is treated
115	   as if it were never received.

117	   The key can be anything, but is usually chosen to be the SSRC of the
118	   user who is performing the sampling.

120	   This sampling process can be described mathematically as:

122	   D = (K*M == S*M)

124	   Where the * operator denotes AND and the = operator denotes a test
125	   for equality. D represents the sampling decision.

127	   According to the RTP specification, the SSRC's used by session
128	   participants are chosen randomly. If the distribution is also
129	   uniform, it is easy to see that the above filtering will cause 1 out
130	   of 2**m SSRC's to be placed in the table, where m is the number of
131	   bits in the mask, M, which are one. Thus, the sampling probability p
132	   is 2**-m.

134	   Then, to obtain an actual group size estimate, L, the number of
135	   entries in the table N is multiplied by 2**m:

137	   L = N * 2**m

139	   Care must be taken when choosing which bits to set to 1 in the mask.

141	   Although the RTP specification mandates randomly chosen SSRC, there
142	   are many known implementations which do not conform to this. In
143	   particular, the ITU H.323 [3] series of recommendations allows the
144	   central control element, the gatekeeper, to assign the least
145	   significant 8 bits of the SSRC, while the most significant are
146	   randomly chosen by RTP participants.

148	   The safest way to handle this problem is to first hash the SSRC using
149	   a cryptographically secure hash, such as MD5 [4], and then choose 32
150	   of the bits in the result as the SSRC used in the above computation.
151	   This provides much better randomness, and doesn't require detailed
152	   knowledge about how various implementations actually set the SSRC.

154	2.1 Performance

156	   The estimate is more accurate as the value of m decreases, less
157	   accurate as it increases. This can be demonstrated analytically. If
158	   the actual group size is G, the standard deviation to mean of the
159	   estimate L (coefficient of variation) is:

161	   sqrt{(1 - 2**-m)/(2**-m * L}

163	   This equation can be used as a guide for selecting the thresholds for
164	   when to change the sampling factor, as discussed below. For example,
165	   if the target is a 1% standard deviation to mean, the sampling
166	   probability p==2**-m should be no smaller than .5 when there are ten
167	   thousand group members. More generally, to achieve a desired standard
168	   deviation to mean ration of T, the sampling probability should be no
169	   less than:

171	   p > 1 / (1 + G*(T**2))

173	3 Increasing the Sampling Probability

175	   The above simple sampling procedure would work fine if the group size
176	   was static. However, it is not. A participant joining an RTP session
177	   will initially see just one participant (themselves). As packets are
178	   received, the group size as seen by that participant will increase.
179	   To handle this, the sampling probability must be made dynamic, and
180	   will need to increase and decrease as group sizes vary.

182	   The procedure for increasing the sampling probability is easy. A
183	   participant starts with a mask with m=0. Under these conditions,
184	   every received SSRC will be stored in the table, so there is
185	   effectively no sampling. At some point, the value of m is increased.
186	   This implies that approximately half of the SSRC already in the table
187	   will no longer match the key under the masking operation. In order to
188	   maintain a correct estimate, these SSRC must be discarded from the
189	   table. New SSRC are only added if they match the key under the new
190	   mask.

192	   The decision about when to increase the number of bits in the mask is
193	   also simple. Lets say an RTP participant has a memory with enough
194	   capacity to store C entries in the table. The best estimate of the
195	   group is obtained by the largest sampling probability. This also
196	   means that the best estimate is obtained the fuller the table is. A
197	   reasonable approach is therefore to increase the number of bits in
198	   the mask just as the table fills to C. This will generally cause its
199	   contents to be reduced by half. Once the table fills again, the
200	   number of bits in the mask is further increased.

202	4 Reducing the Sampling Probability

204	   If the group size begins to decrease, it may be necessary to reduce
205	   the number of bits in the mask. Not doing so will result in extremely
206	   poor estimates of the group size. Unfortunately, reducing the number
207	   of bits in the mask is more difficult than increasing them.

209	   When the number of bits in the mask increases, the user compensates
210	   by removing those SSRC which no longer match. When the number of bits
211	   decreases, the user should theoretically add back those users whose
212	   SSRC now match. However, these SSRC are not known, since the whole
213	   point of sampling was to not have to remember them. Therefore, if the
214	   number of bits in the mask is just reduced without any changes in the
215	   membership table, the group estimate will instantly drop by exactly
216	   half.

218	   To compensate for this, some kind of algorithm is needed. Two
219	   approaches are presented here: a corrective-factor solution, and a
220	   binning solution.

222	4.1 Corrective Factors

224	   The idea with the corrective factors is to add in or multiply the
225	   group size estimate by some corrective factor to compensate for the
226	   change in sample mask. The corrective factors should decay as the
227	   "fudged" members are eventually learned about and actually placed in
228	   the membership list.

230	   The multiplicative factor starts at 2, and gradually decays to one.

232	   The additive factor starts at the difference between the group size
233	   estimate before and after the number of bits in the mask is reduced,
234	   and decays to 0 (this is not always half the group size estimate, as
235	   the corrective factors can be compounded, see below). Both factors
236	   decay over a time of c*L(ts), where c is the average RTCP packet size
237	   divided by the RTCP bandwidth for receivers, and L(ts) is the group
238	   size estimate just before the change in the number of bits in the
239	   mask at time ts. The reason for this constant is as follows. In the
240	   case where the actual group membership has not changed, those members
241	   which were forgotten will still be sending RTCP packets. The amount
242	   of time it will take to hear an RTCP packet from each of them is the
243	   average RTCP interval, which is cL(ts). Therefore, by cL(ts) seconds
244	   after the change in the mask, those users who were fudged by the
245	   corrective factor should have sent a packet and thus appear in the
246	   table. We chose to decay both functions linearly. This is because the
247	   rate of arrival of RTCP packets is linear.

249	   What happens if the number of bits in the mask is reduced once again
250	   before the previous corrective factor has expired? In that case, we
251	   compound the factors by using yet another one. Let fi() represent the
252	   ith correction function. If ts is the time when the number of bits in
253	   the mask is reduced, we can describe the additive correction factor
254	   as:

256	             / 0                                  ,   t < ts
257	             |                   ts + cL(ts-) - t
258	   fi(t)  =  |( L(ts-) - L(ts+)) ---------------- ,ts<t<ts+cL(ts-)
259	             |                        cL(ts-)
260	             | 0                                  ,  t > ts + cL(ts-)

262	   and the multiplicative factor as:

264	             /  1                      , t < ts
265	             |
266	             |  ts + 2cL(ts-) - t
267	             |  -----------------      , ts < t < ts + cL(ts-)
268	             |       cL(ts-)
269	             |
270	               1                      , t > ts + cL(ts-)

272	   Note that in these equations, L(t) denotes the group size estimate
273	   obtained including the corrective factors except for the new factor.
274	   ts- is the time right before the reduction in the number of bits, and
275	   ts+ the time after. As a result, L(ts-) represents the group size
276	   estimate before the reduction, and L(ts+) the estimate right after,
277	   but not including the new factor.

279	   Finally, the actual group size estimate L(t) is given by:

281	          -----

283	   L(t) = /      fi(t) + N*(2**m)
284	          -----
285	            i

287	   for the additive factor, and:

289	          ------
290	           |  |
291	           |  |
292	   L(t)=   |  |  N*(2**m)*fi(t)

294	   for the multiplicative factor.

296	   Simulations showed that both algorithms performed equally well, but
297	   both tended to seriously underestimate the group size when the group
298	   membership was rapidly declining. This is demonstrated in the
299	   performance data below.

301	4.2 Binning Algorithm

303	   In order to more correctly estimate the group size even when it was
304	   rapidly decreasing, a binning algorithm can be used. The algorithm
305	   works as follows. There are 32 bins, same as the number of bits in
306	   the sample mask. When an RTCP packet from a new user arrives whose
307	   SSRC matches the key under the masking operation, it is placed in the
308	   mth bin (where m is the number of ones in the mask) otherwise it is
309	   discarded.

311	   When the number of bits in the mask is to be increased, those members
312	   in the bin who still match after the new mask are moved into the next
313	   higher bin. Those who don't match are discarded. When the number of
314	   bits in the mask is to be decreased, nothing is done. Users in the
315	   various bins stay where they are. However, when an RTCP packet for a
316	   user shows up, and the user is in a bin with a higher value than the
317	   current number of bits in the mask, it is moved into the bin
318	   corresponding to the current number of bits in the mask. Finally, the
319	   group size estimate L(t) is obtained by:

321	           31
322	          ----

324	   L(t) = /    B(i) * 2**i
325	          ----
326	           i=0

328	   Where B(i) are the number of users in the ith bin.

330	   The algorithm works by basically keeping the old estimate when the
331	   number of bits in the mask drops. As users arrive, they are gradually
332	   moved into the lower bin, reducing the amount that the higher bin
333	   contributes to the total estimate. However, the old estimate is still
334	   updated in the sense that users which timeout are removed from the
335	   higher bin, and users who send BYE packets are also removed from the
336	   higher bin. This allows the older estimate to still adapt, while
337	   gradually phasing it out. It is this adaptation which makes it
338	   perform much better than the corrective algorithms. The algorithm is
339	   also extremely simple.

341	4.3 Comparison

343	   The algorithms are all compared via simulation in Table 1. In the
344	   simulation, 10,001 users join a group at t=0. At t=10,000, 5000 of
345	   them leave. At t=20,000, another 5000 leave. All implement an SSRC
346	   sampling algorithm, unconditional forward and BYE reconsideration.
347	   The table depicts the group size estimate from time 20,000 to time
348	   25,000 as seen by the single user present throughout the entire
349	   session. In the simulation, a memory size of 1000 SSRC was assumed.
350	   The performance without sampling, and with sampling with the
351	   additive, multiplicative, and bin-based correction are depicted.

353	   As the table shows, the bin based algorithm performs particularly
354	   well at capturing the group size estimate towards the tail end of the
355	   simulation.

357	   Time    No Sampling     Binned  Additive  Multiplicative
358	   ----    -----------     ------  --------  --------------
359	   20000   5001            5024    5024      5024
360	   20250   4379            4352    4352      4352
361	   20500   3881            3888    3900      3853
362	   20750   3420            3456    3508      3272
363	   21000   3018            2992    3100      2701
364	   21250   2677            2592    2724      2225
365	   21500   2322            2272    2389      1783
366	   21750   2034            2096    2125      1414
367	   22000   1756            1760    1795      1007
368	   22250   1476            1472    1459      582
369	   22500   1243            1232    1135      230
370	   22750   1047            1040    807       80
371	   23000   856             864     468       59
372	   23250   683             704     106       44
373	   23500   535             512     32        32
374	   23750   401             369     24        24
375	   24000   290             257     17        17
376	   24250   198             177     13        13
377	   24500   119             129     11        11
378	   24750   59              65      8         8
379	   25000   18              1       2         2

381	4.4 Sender Sampling

383	   Care must be taken in handling senders when using SSRC sampling.
384	   Since the number of senders is generally small, and they contribute
385	   significantly to the computation of the RTCP interval, sampling
386	   should not be applied to them. However, they must be kept in a
387	   separate table, and not be "counted" as part of the general group
388	   membership. If this is done, the group size estimate will be inflated
389	   to overemphasize the senders.

391	   This is easily demonstrated analytically. Let Ns be the number of
392	   senders, and Nr be the number of receivers. The membership table will
393	   contain all Ns senders and (1/2)**m of the receivers. The total group
394	   size estimate in the current draft is obtained by 2**m times the
395	   number of entries in the table. Therefore, the group size estimate
396	   becomes:

398	   L(t) = (2**m) Ns + Nr

400	   which exponentially weights the senders.

402	   This is easily compensated for in the binning algorithm. A sender is
403	   always placed in the 0th bin. When a sender becomes a receiver, it is
404	   moved into the bin corresponding to the current value of m, if its
405	   SSRC matches the key under the masked comparison operation.

407	5 Bibliography

409	   [1] H. Schulzrinne, S. Casner, R. Frederick, and V. Jacobson, "RTP: a
410	   transport protocol for real-time applications," Request for Comments
411	   (Proposed Standard) 1889, Internet Engineering Task Force, Jan. 1996.

413	   [2] J. Rosenberg and H. Schulzrinne, "Timer reconsideration for
414	   enhanced RTP scalability," (San Francisco, California), March/April
415	   1998.

417	   [3] International Telecommunication Union, "Visual telephone systems
418	   and equipment for local area networks which provide a non-guaranteed
419	   quality of service," Recommendation H.323, Telecommunication
420	   Standardization Sector of ITU, Geneva, Switzerland, May 1996.

422	   [4] R. Rivest, "The MD5 message-digest algorithm," Request for
423	   Comments (Informational) 1321, Internet Engineering Task Force, Apr.
424	   1992.

426	6 Authors' Addresses

428	   Jonathan Rosenberg
429	   Bell Laboratories, Lucent Technologies
430	   101 Crawfords Corner Rd.
431	   Holmdel, NJ 07733
432	   USA
433	   Rm. 4C-526
434	   email: jdrosen@bell-labs.com

436	   Henning Schulzrinne
437	   Columbia University
438	   M/S 0401
439	   1214 Amsterdam Ave.
440	   New York, NY 10027-7003
441	   USA
442	   email: schulzrinne@cs.columbia.edu