idnits 2.17.1 draft-ietf-avt-rtpsample-02.txt: Checking boilerplate required by RFC 5378 and the IETF Trust (see https://trustee.ietf.org/license-info): ---------------------------------------------------------------------------- ** Looks like you're using RFC 2026 boilerplate. This must be updated to follow RFC 3978/3979, as updated by RFC 4748. Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt: ---------------------------------------------------------------------------- ** Missing expiration date. The document expiration date should appear on the first and last page. ** The document seems to lack a 1id_guidelines paragraph about 6 months document validity -- however, there's a paragraph with a matching beginning. Boilerplate error? == No 'Intended status' indicated for this document; assuming Proposed Standard Checking nits according to https://www.ietf.org/id-info/checklist : ---------------------------------------------------------------------------- ** The document seems to lack a Security Considerations section. ** The document seems to lack an IANA Considerations section. (See Section 2.2 of https://www.ietf.org/id-info/checklist for how to handle the case when there are no actions for IANA.) Miscellaneous warnings: ---------------------------------------------------------------------------- -- The document seems to lack a disclaimer for pre-RFC5378 work, but may have content which was first submitted before 10 November 2008. If you have contacted all the original authors and they are all willing to grant the BCP78 rights to the IETF Trust, then this is fine, and you can ignore this comment. If not, you may need to add the pre-RFC5378 disclaimer. (See the Legal Provisions document at https://trustee.ietf.org/license-info for more information.) -- The document date (February 26, 1999) is 9192 days in the past. Is this intentional? Checking references for intended status: Proposed Standard ---------------------------------------------------------------------------- (See RFCs 3967 and 4897 for information about using normative references to lower-maturity documents in RFCs) -- Missing reference section? '1' on line 409 looks like a reference -- Missing reference section? '2' on line 413 looks like a reference -- Missing reference section? '3' on line 417 looks like a reference -- Missing reference section? '4' on line 422 looks like a reference Summary: 5 errors (**), 0 flaws (~~), 1 warning (==), 6 comments (--). Run idnits with the --verbose option for more detailed information about the items above. -------------------------------------------------------------------------------- 2 Internet Engineering Task Force Audio/Video Transport wg 3 Internet Draft J. Rosenberg, H. Schulzrinne 4 draft-ietf-avt-rtpsample-02.txt Bell Laboratories/Columbia U. 5 February 26, 1999 6 Expires: August 26, 1999 8 Sampling of the Group Membership in RTP 10 STATUS OF THIS MEMO 12 This document is an Internet-Draft and is in full conformance with 13 all provisions of Section 10 of RFC2026. 15 Internet-Drafts are working documents of the Internet Engineering 16 Task Force (IETF), its areas, and its working groups. Note that 17 other groups may also distribute working documents as Internet- 18 Drafts. 20 Internet-Drafts are draft documents valid for a maximum of six months 21 and may be updated, replaced, or obsoleted by other documents at any 22 time. It is inappropriate to use Internet- Drafts as reference 23 material or to cite them other than as work in progress. 25 The list of current Internet-Drafts can be accessed at 26 http://www.ietf.org/ietf/1id-abstracts.txt 28 The list of Internet-Draft Shadow Directories can be accessed at 29 http://www.ietf.org/shadow.html. 31 Abstract 33 In large multicast groups, the size of the group membership table 34 maintained by RTP (Real Time Transport Protocol) participants may 35 become unwieldy, particularly for embedded devices with limited 36 memory and processing power. This document discusses mechanisms for 37 sampling of this group membership table in order to reduce the memory 38 requirements. Several mechanisms are proposed, and the performance of 39 each is considered. 41 NOTE: 43 This is to inform you that Lucent Technologies has applied for and/or 44 has patent(s) that relates to the binning algorithm which is a part 45 of the specification. 47 This declaration is being made pursuant to the provisions of IETF IPR 48 Policy , Sections 10.3.1 and 10.3.2. 50 Lucent Technologies Inc. will offer patent licenses for submissions 51 made by it which are adopted as a standard by your organization as 52 follows: 54 If part(s) of a submission by Lucent is included in a standard and 55 Lucent has patents and/or pending applications that are essential to 56 implementation of the included part(s) in said standard, Lucent is 57 prepared to grant - on the basis of reciprocity (grantback) - a 58 license on such included part(s) on reasonable, non-discriminatory 59 terms and conditions. 61 1 Introduction 63 RTP, the Real Time Transport protocol [1], mandates that RTCP packets 64 be transmitted from each participant with a period roughly 65 proportional to the group size. The group size is obtained by storing 66 a table, containing an entry for each unique SSRC seen in RTP and 67 RTCP packets. As members leave or time out, entries are deleted, and 68 as new members join, entries are added. The table is thus highly 69 dynamic. 71 For large multicast sessions, such as an mbone broadcast or IP based 72 TV distribution, group sizes can be extremely large, on the order of 73 hundreds of thousands to millions of participants. In these 74 environments, RTCP may not always be used, and thus the group 75 membership table isn't needed. However, it is highly desirable for 76 RTP to scale well for groups with one member to groups with one 77 million members, without human intervention to "turn off" RTCP when 78 its no longer appropriate. This means that the same tools and systems 79 can be used for both small conferences and TV broadcasts in a smooth, 80 scalable fashion. 82 Previous work [2] has identified three major scalability problems 83 with RTP. These are: 85 1. Congestion due to floods of RTCP packets in highly dynamic 86 groups 88 2. Large delays between receipt of RTCP packets from a single 89 user 91 3. Large size of the group membership table 93 The reconsideration algorithm helps to alleviate the first of these. 94 This document addresses the third, that of large group size tables. 96 Storage of an SSRC table with one million members, for example, 97 requires at least four megabytes. As a result, embedded devices with 98 small memory footprints may have difficulty under these conditions. 99 To solve this problem, SSRC sampling has been proposed. SSRC sampling 100 uses statistical sampling to obtain a stochastic estimate of the 101 group membership. There are many issues that arise when this is done. 102 This document reviews these issues and discusses the mechanisms which 103 can be applied by implementors. 105 2 Basic Operation 107 The basic idea behind SSRC sampling is simple. Each participant 108 maintains a key K of 32 bits, and a mask M of 32 bits. Assume that m 109 of the M bits in the mask are 1, and the remainder are zero. When an 110 RTCP packet arrives with some SSRC S, rather than placing it in the 111 table, it is first sampled. The sampling is performed by ANDing the 112 key and the mask, and also ANDing the SSRC and the mask. The 113 resulting values are compared. If equal, the SSRC is stored in the 114 table. If not equal, the SSRC is rejected, and the packet is treated 115 as if it were never received. 117 The key can be anything, but is usually chosen to be the SSRC of the 118 user who is performing the sampling. 120 This sampling process can be described mathematically as: 122 D = (K*M == S*M) 124 Where the * operator denotes AND and the = operator denotes a test 125 for equality. D represents the sampling decision. 127 According to the RTP specification, the SSRC's used by session 128 participants are chosen randomly. If the distribution is also 129 uniform, it is easy to see that the above filtering will cause 1 out 130 of 2**m SSRC's to be placed in the table, where m is the number of 131 bits in the mask, M, which are one. Thus, the sampling probability p 132 is 2**-m. 134 Then, to obtain an actual group size estimate, L, the number of 135 entries in the table N is multiplied by 2**m: 137 L = N * 2**m 139 Care must be taken when choosing which bits to set to 1 in the mask. 141 Although the RTP specification mandates randomly chosen SSRC, there 142 are many known implementations which do not conform to this. In 143 particular, the ITU H.323 [3] series of recommendations allows the 144 central control element, the gatekeeper, to assign the least 145 significant 8 bits of the SSRC, while the most significant are 146 randomly chosen by RTP participants. 148 The safest way to handle this problem is to first hash the SSRC using 149 a cryptographically secure hash, such as MD5 [4], and then choose 32 150 of the bits in the result as the SSRC used in the above computation. 151 This provides much better randomness, and doesn't require detailed 152 knowledge about how various implementations actually set the SSRC. 154 2.1 Performance 156 The estimate is more accurate as the value of m decreases, less 157 accurate as it increases. This can be demonstrated analytically. If 158 the actual group size is G, the standard deviation to mean of the 159 estimate L (coefficient of variation) is: 161 sqrt{(1 - 2**-m)/(2**-m * L} 163 This equation can be used as a guide for selecting the thresholds for 164 when to change the sampling factor, as discussed below. For example, 165 if the target is a 1% standard deviation to mean, the sampling 166 probability p==2**-m should be no smaller than .5 when there are ten 167 thousand group members. More generally, to achieve a desired standard 168 deviation to mean ration of T, the sampling probability should be no 169 less than: 171 p > 1 / (1 + G*(T**2)) 173 3 Increasing the Sampling Probability 175 The above simple sampling procedure would work fine if the group size 176 was static. However, it is not. A participant joining an RTP session 177 will initially see just one participant (themselves). As packets are 178 received, the group size as seen by that participant will increase. 179 To handle this, the sampling probability must be made dynamic, and 180 will need to increase and decrease as group sizes vary. 182 The procedure for increasing the sampling probability is easy. A 183 participant starts with a mask with m=0. Under these conditions, 184 every received SSRC will be stored in the table, so there is 185 effectively no sampling. At some point, the value of m is increased. 186 This implies that approximately half of the SSRC already in the table 187 will no longer match the key under the masking operation. In order to 188 maintain a correct estimate, these SSRC must be discarded from the 189 table. New SSRC are only added if they match the key under the new 190 mask. 192 The decision about when to increase the number of bits in the mask is 193 also simple. Lets say an RTP participant has a memory with enough 194 capacity to store C entries in the table. The best estimate of the 195 group is obtained by the largest sampling probability. This also 196 means that the best estimate is obtained the fuller the table is. A 197 reasonable approach is therefore to increase the number of bits in 198 the mask just as the table fills to C. This will generally cause its 199 contents to be reduced by half. Once the table fills again, the 200 number of bits in the mask is further increased. 202 4 Reducing the Sampling Probability 204 If the group size begins to decrease, it may be necessary to reduce 205 the number of bits in the mask. Not doing so will result in extremely 206 poor estimates of the group size. Unfortunately, reducing the number 207 of bits in the mask is more difficult than increasing them. 209 When the number of bits in the mask increases, the user compensates 210 by removing those SSRC which no longer match. When the number of bits 211 decreases, the user should theoretically add back those users whose 212 SSRC now match. However, these SSRC are not known, since the whole 213 point of sampling was to not have to remember them. Therefore, if the 214 number of bits in the mask is just reduced without any changes in the 215 membership table, the group estimate will instantly drop by exactly 216 half. 218 To compensate for this, some kind of algorithm is needed. Two 219 approaches are presented here: a corrective-factor solution, and a 220 binning solution. 222 4.1 Corrective Factors 224 The idea with the corrective factors is to add in or multiply the 225 group size estimate by some corrective factor to compensate for the 226 change in sample mask. The corrective factors should decay as the 227 "fudged" members are eventually learned about and actually placed in 228 the membership list. 230 The multiplicative factor starts at 2, and gradually decays to one. 232 The additive factor starts at the difference between the group size 233 estimate before and after the number of bits in the mask is reduced, 234 and decays to 0 (this is not always half the group size estimate, as 235 the corrective factors can be compounded, see below). Both factors 236 decay over a time of c*L(ts), where c is the average RTCP packet size 237 divided by the RTCP bandwidth for receivers, and L(ts) is the group 238 size estimate just before the change in the number of bits in the 239 mask at time ts. The reason for this constant is as follows. In the 240 case where the actual group membership has not changed, those members 241 which were forgotten will still be sending RTCP packets. The amount 242 of time it will take to hear an RTCP packet from each of them is the 243 average RTCP interval, which is cL(ts). Therefore, by cL(ts) seconds 244 after the change in the mask, those users who were fudged by the 245 corrective factor should have sent a packet and thus appear in the 246 table. We chose to decay both functions linearly. This is because the 247 rate of arrival of RTCP packets is linear. 249 What happens if the number of bits in the mask is reduced once again 250 before the previous corrective factor has expired? In that case, we 251 compound the factors by using yet another one. Let fi() represent the 252 ith correction function. If ts is the time when the number of bits in 253 the mask is reduced, we can describe the additive correction factor 254 as: 256 / 0 , t < ts 257 | ts + cL(ts-) - t 258 fi(t) = |( L(ts-) - L(ts+)) ---------------- ,ts ts + cL(ts-) 262 and the multiplicative factor as: 264 / 1 , t < ts 265 | 266 | ts + 2cL(ts-) - t 267 | ----------------- , ts < t < ts + cL(ts-) 268 | cL(ts-) 269 | 270 1 , t > ts + cL(ts-) 272 Note that in these equations, L(t) denotes the group size estimate 273 obtained including the corrective factors except for the new factor. 274 ts- is the time right before the reduction in the number of bits, and 275 ts+ the time after. As a result, L(ts-) represents the group size 276 estimate before the reduction, and L(ts+) the estimate right after, 277 but not including the new factor. 279 Finally, the actual group size estimate L(t) is given by: 281 ----- 283 L(t) = / fi(t) + N*(2**m) 284 ----- 285 i 287 for the additive factor, and: 289 ------ 290 | | 291 | | 292 L(t)= | | N*(2**m)*fi(t) 294 for the multiplicative factor. 296 Simulations showed that both algorithms performed equally well, but 297 both tended to seriously underestimate the group size when the group 298 membership was rapidly declining. This is demonstrated in the 299 performance data below. 301 4.2 Binning Algorithm 303 In order to more correctly estimate the group size even when it was 304 rapidly decreasing, a binning algorithm can be used. The algorithm 305 works as follows. There are 32 bins, same as the number of bits in 306 the sample mask. When an RTCP packet from a new user arrives whose 307 SSRC matches the key under the masking operation, it is placed in the 308 mth bin (where m is the number of ones in the mask) otherwise it is 309 discarded. 311 When the number of bits in the mask is to be increased, those members 312 in the bin who still match after the new mask are moved into the next 313 higher bin. Those who don't match are discarded. When the number of 314 bits in the mask is to be decreased, nothing is done. Users in the 315 various bins stay where they are. However, when an RTCP packet for a 316 user shows up, and the user is in a bin with a higher value than the 317 current number of bits in the mask, it is moved into the bin 318 corresponding to the current number of bits in the mask. Finally, the 319 group size estimate L(t) is obtained by: 321 31 322 ---- 324 L(t) = / B(i) * 2**i 325 ---- 326 i=0 328 Where B(i) are the number of users in the ith bin. 330 The algorithm works by basically keeping the old estimate when the 331 number of bits in the mask drops. As users arrive, they are gradually 332 moved into the lower bin, reducing the amount that the higher bin 333 contributes to the total estimate. However, the old estimate is still 334 updated in the sense that users which timeout are removed from the 335 higher bin, and users who send BYE packets are also removed from the 336 higher bin. This allows the older estimate to still adapt, while 337 gradually phasing it out. It is this adaptation which makes it 338 perform much better than the corrective algorithms. The algorithm is 339 also extremely simple. 341 4.3 Comparison 343 The algorithms are all compared via simulation in Table 1. In the 344 simulation, 10,001 users join a group at t=0. At t=10,000, 5000 of 345 them leave. At t=20,000, another 5000 leave. All implement an SSRC 346 sampling algorithm, unconditional forward and BYE reconsideration. 347 The table depicts the group size estimate from time 20,000 to time 348 25,000 as seen by the single user present throughout the entire 349 session. In the simulation, a memory size of 1000 SSRC was assumed. 350 The performance without sampling, and with sampling with the 351 additive, multiplicative, and bin-based correction are depicted. 353 As the table shows, the bin based algorithm performs particularly 354 well at capturing the group size estimate towards the tail end of the 355 simulation. 357 Time No Sampling Binned Additive Multiplicative 358 ---- ----------- ------ -------- -------------- 359 20000 5001 5024 5024 5024 360 20250 4379 4352 4352 4352 361 20500 3881 3888 3900 3853 362 20750 3420 3456 3508 3272 363 21000 3018 2992 3100 2701 364 21250 2677 2592 2724 2225 365 21500 2322 2272 2389 1783 366 21750 2034 2096 2125 1414 367 22000 1756 1760 1795 1007 368 22250 1476 1472 1459 582 369 22500 1243 1232 1135 230 370 22750 1047 1040 807 80 371 23000 856 864 468 59 372 23250 683 704 106 44 373 23500 535 512 32 32 374 23750 401 369 24 24 375 24000 290 257 17 17 376 24250 198 177 13 13 377 24500 119 129 11 11 378 24750 59 65 8 8 379 25000 18 1 2 2 381 4.4 Sender Sampling 383 Care must be taken in handling senders when using SSRC sampling. 384 Since the number of senders is generally small, and they contribute 385 significantly to the computation of the RTCP interval, sampling 386 should not be applied to them. However, they must be kept in a 387 separate table, and not be "counted" as part of the general group 388 membership. If this is done, the group size estimate will be inflated 389 to overemphasize the senders. 391 This is easily demonstrated analytically. Let Ns be the number of 392 senders, and Nr be the number of receivers. The membership table will 393 contain all Ns senders and (1/2)**m of the receivers. The total group 394 size estimate in the current draft is obtained by 2**m times the 395 number of entries in the table. Therefore, the group size estimate 396 becomes: 398 L(t) = (2**m) Ns + Nr 400 which exponentially weights the senders. 402 This is easily compensated for in the binning algorithm. A sender is 403 always placed in the 0th bin. When a sender becomes a receiver, it is 404 moved into the bin corresponding to the current value of m, if its 405 SSRC matches the key under the masked comparison operation. 407 5 Bibliography 409 [1] H. Schulzrinne, S. Casner, R. Frederick, and V. Jacobson, "RTP: a 410 transport protocol for real-time applications," Request for Comments 411 (Proposed Standard) 1889, Internet Engineering Task Force, Jan. 1996. 413 [2] J. Rosenberg and H. Schulzrinne, "Timer reconsideration for 414 enhanced RTP scalability," (San Francisco, California), March/April 415 1998. 417 [3] International Telecommunication Union, "Visual telephone systems 418 and equipment for local area networks which provide a non-guaranteed 419 quality of service," Recommendation H.323, Telecommunication 420 Standardization Sector of ITU, Geneva, Switzerland, May 1996. 422 [4] R. Rivest, "The MD5 message-digest algorithm," Request for 423 Comments (Informational) 1321, Internet Engineering Task Force, Apr. 424 1992. 426 6 Authors' Addresses 428 Jonathan Rosenberg 429 Bell Laboratories, Lucent Technologies 430 101 Crawfords Corner Rd. 431 Holmdel, NJ 07733 432 USA 433 Rm. 4C-526 434 email: jdrosen@bell-labs.com 436 Henning Schulzrinne 437 Columbia University 438 M/S 0401 439 1214 Amsterdam Ave. 440 New York, NY 10027-7003 441 USA 442 email: schulzrinne@cs.columbia.edu