idnits 2.17.1 draft-ietf-avt-rtpsample-05.txt: Checking boilerplate required by RFC 5378 and the IETF Trust (see https://trustee.ietf.org/license-info): ---------------------------------------------------------------------------- ** Looks like you're using RFC 2026 boilerplate. This must be updated to follow RFC 3978/3979, as updated by RFC 4748. Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt: ---------------------------------------------------------------------------- ** The document seems to lack a 1id_guidelines paragraph about 6 months document validity -- however, there's a paragraph with a matching beginning. Boilerplate error? == No 'Intended status' indicated for this document; assuming Proposed Standard == It seems as if not all pages are separated by form feeds - found 0 form feeds but 10 pages Checking nits according to https://www.ietf.org/id-info/checklist : ---------------------------------------------------------------------------- ** The document seems to lack an IANA Considerations section. (See Section 2.2 of https://www.ietf.org/id-info/checklist for how to handle the case when there are no actions for IANA.) ** There is 1 instance of too long lines in the document, the longest one being 1 character in excess of 72. Miscellaneous warnings: ---------------------------------------------------------------------------- -- The document seems to lack a disclaimer for pre-RFC5378 work, but may have content which was first submitted before 10 November 2008. If you have contacted all the original authors and they are all willing to grant the BCP78 rights to the IETF Trust, then this is fine, and you can ignore this comment. If not, you may need to add the pre-RFC5378 disclaimer. (See the Legal Provisions document at https://trustee.ietf.org/license-info for more information.) -- The document date (July 20, 1999) is 9041 days in the past. Is this intentional? Checking references for intended status: Proposed Standard ---------------------------------------------------------------------------- (See RFCs 3967 and 4897 for information about using normative references to lower-maturity documents in RFCs) -- Missing reference section? '1' on line 401 looks like a reference -- Missing reference section? '2' on line 405 looks like a reference -- Missing reference section? '3' on line 409 looks like a reference -- Missing reference section? '4' on line 414 looks like a reference Summary: 4 errors (**), 0 flaws (~~), 2 warnings (==), 6 comments (--). Run idnits with the --verbose option for more detailed information about the items above. -------------------------------------------------------------------------------- 1 Internet Engineering Task Force Audio/Video Transport wg 2 Internet Draft J. Rosenberg, H. Schulzrinne 3 draft-ietf-avt-rtpsample-05.txt Bell Laboratories/Columbia U. 4 July 20, 1999 5 Expires: January 2000 7 Sampling of the Group Membership in RTP 9 STATUS OF THIS MEMO 11 This document is an Internet-Draft and is in full conformance with 12 all provisions of Section 10 of RFC2026. 14 Internet-Drafts are working documents of the Internet Engineering 15 Task Force (IETF), its areas, and its working groups. Note that 16 other groups may also distribute working documents as Internet- 17 Drafts. 19 Internet-Drafts are draft documents valid for a maximum of six months 20 and may be updated, replaced, or obsoleted by other documents at any 21 time. It is inappropriate to use Internet- Drafts as reference 22 material or to cite them other than as work in progress. 24 The list of current Internet-Drafts can be accessed at 25 http://www.ietf.org/ietf/1id-abstracts.txt 27 The list of Internet-Draft Shadow Directories can be accessed at 28 http://www.ietf.org/shadow.html. 30 The IETF has been notified of intellectual property rights claimed in 31 regard to some or all of the specification contained in this 32 document. For more information consult the online list of claimed 33 rights. 35 Abstract 37 In large multicast groups, the size of the group membership table 38 maintained by RTP (Real Time Transport Protocol) participants may 39 become unwieldy, particularly for embedded devices with limited 40 memory and processing power. This document discusses mechanisms for 41 sampling of this group membership table in order to reduce the memory 42 requirements. Several mechanisms are proposed, and the performance of 43 each is considered. 45 1 Introduction 46 RTP, the Real Time Transport protocol [1], mandates that RTCP packets 47 be transmitted from each participant with a period roughly 48 proportional to the group size. The group size is obtained by storing 49 a table, containing an entry for each unique SSRC seen in RTP and 50 RTCP packets. As members leave or time out, entries are deleted, and 51 as new members join, entries are added. The table is thus highly 52 dynamic. 54 For large multicast sessions, such as an mbone broadcast or IP based 55 TV distribution, group sizes can be extremely large, on the order of 56 hundreds of thousands to millions of participants. In these 57 environments, RTCP may not always be used, and thus the group 58 membership table isn't needed. However, it is highly desirable for 59 RTP to scale well for groups with one member to groups with one 60 million members, without human intervention to "turn off" RTCP when 61 its no longer appropriate. This means that the same tools and systems 62 can be used for both small conferences and TV broadcasts in a smooth, 63 scalable fashion. 65 Previous work [2] has identified three major scalability problems 66 with RTP. These are: 68 1. Congestion due to floods of RTCP packets in highly dynamic 69 groups 71 2. Large delays between receipt of RTCP packets from a single 72 user 74 3. Large size of the group membership table 76 The reconsideration algorithm [2] helps to alleviate the first of 77 these. This document addresses the third, that of large group size 78 tables. 80 Storage of an SSRC table with one million members, for example, 81 requires at least four megabytes. As a result, embedded devices with 82 small memory footprints may have difficulty under these conditions. 83 To solve this problem, SSRC sampling has been proposed. SSRC sampling 84 uses statistical sampling to obtain a stochastic estimate of the 85 group membership. There are many issues that arise when this is done. 86 This document reviews these issues and discusses the mechanisms which 87 can be applied by implementors. 89 2 Basic Operation 91 The basic idea behind SSRC sampling is simple. Each participant 92 maintains a key K of 32 bits, and a mask M of 32 bits. Assume that m 93 of the bits in the mask are 1, and the remainder are zero. When an 94 RTCP packet arrives with some SSRC S, rather than placing it in the 95 table, it is first sampled. The sampling is performed by ANDing the 96 key and the mask, and also ANDing the SSRC and the mask. The 97 resulting values are compared. If equal, the SSRC is stored in the 98 table. If not equal, the SSRC is rejected, and the packet is treated 99 as if it were never received. 101 The key can be anything, but is usually chosen to be the SSRC of the 102 user who is performing the sampling. 104 This sampling process can be described mathematically as: 106 D = (K*M == S*M) 108 Where the * operator denotes AND and the == operator denotes a test 109 for equality. D represents the sampling decision. 111 According to the RTP specification, the SSRC's used by session 112 participants are chosen randomly. If the distribution is also 113 uniform, it is easy to see that the above filtering will cause 1 out 114 of 2**m SSRC's to be placed in the table, where m is the number of 115 bits in the mask, M, which are one. Thus, the sampling probability p 116 is 2**-m. 118 Then, to obtain an actual group size estimate, L, the number of 119 entries in the table N is multiplied by 2**m: 121 L = N * 2**m 123 Care must be taken when choosing which bits to set to 1 in the mask. 124 Although the RTP specification mandates randomly chosen SSRC, there 125 are many known implementations which do not conform to this. In 126 particular, the ITU H.323 [3] series of recommendations allows the 127 central control element, the gatekeeper, to assign the least 128 significant 8 bits of the SSRC, while the most significant are 129 randomly chosen by RTP participants. 131 The safest way to handle this problem is to first hash the SSRC using 132 a cryptographically secure hash, such as MD5 [4], and then choose 32 133 of the bits in the result as the SSRC used in the above computation. 134 This provides much better randomness, and doesn't require detailed 135 knowledge about how various implementations actually set the SSRC. 137 2.1 Performance 138 The estimate is more accurate as the value of m decreases, less 139 accurate as it increases. This can be demonstrated analytically. If 140 the actual group size is G, the standard deviation to mean of the 141 estimate L (coefficient of variation) is: 143 sqrt((2**m - 1)/G) 145 This equation can be used as a guide for selecting the thresholds for 146 when to change the sampling factor, as discussed below. For example, 147 if the target is a 1% standard deviation to mean, the sampling 148 probability p=2**-m should be no smaller than .5 when there are ten 149 thousand group members. More generally, to achieve a desired standard 150 deviation to mean ration of T, the sampling probability should be no 151 less than: 153 p > 1 / (1 + G*(T**2)) 155 3 Increasing the Sampling Probability 157 The above simple sampling procedure would work fine if the group size 158 was static. However, it is not. A participant joining an RTP session 159 will initially see just one participant (themselves). As packets are 160 received, the group size as seen by that participant will increase. 161 To handle this, the sampling probability must be made dynamic, and 162 will need to increase and decrease as group sizes vary. 164 The procedure for increasing the sampling probability is easy. A 165 participant starts with a mask with m=0. Under these conditions, 166 every received SSRC will be stored in the table, so there is 167 effectively no sampling. At some point, the value of m is increased. 168 This implies that approximately half of the SSRC already in the table 169 will no longer match the key under the masking operation. In order to 170 maintain a correct estimate, these SSRC must be discarded from the 171 table. New SSRC are only added if they match the key under the new 172 mask. 174 The decision about when to increase the number of bits in the mask is 175 also simple. Lets say an RTP participant has a memory with enough 176 capacity to store C entries in the table. The best estimate of the 177 group is obtained by the largest sampling probability. This also 178 means that the best estimate is obtained the fuller the table is. A 179 reasonable approach is therefore to increase the number of bits in 180 the mask just as the table fills to C. This will generally cause its 181 contents to be reduced by half. Once the table fills again, the 182 number of bits in the mask is further increased. 184 4 Reducing the Sampling Probability 186 If the group size begins to decrease, it may be necessary to reduce 187 the number of bits in the mask. Not doing so will result in extremely 188 poor estimates of the group size. Unfortunately, reducing the number 189 of bits in the mask is more difficult than increasing them. 191 When the number of bits in the mask increases, the user compensates 192 by removing those SSRC which no longer match. When the number of bits 193 decreases, the user should theoretically add back those users whose 194 SSRC now match. However, these SSRC are not known, since the whole 195 point of sampling was to not have to remember them. Therefore, if the 196 number of bits in the mask is just reduced without any changes in the 197 membership table, the group estimate will instantly drop by exactly 198 half. 200 To compensate for this, some kind of algorithm is needed. Two 201 approaches are presented here: a corrective-factor solution, and a 202 binning solution. 204 4.1 Corrective Factors 206 The idea with the corrective factors is to add in or multiply the 207 group size estimate by some corrective factor to compensate for the 208 change in sample mask. The corrective factors should decay as the 209 "fudged" members are eventually learned about and actually placed in 210 the membership list. 212 The multiplicative corrective factor starts at 2, and gradually 213 decays to one. The additive factor starts at the difference between 214 the group size estimate before and after the number of bits in the 215 mask is reduced, and decays to 0 (this is not always half the group 216 size estimate, as the corrective factors can be compounded, see 217 below). Both factors decay over a time of cL(ts-), where c is the 218 average RTCP packet size divided by the RTCP bandwidth for receivers, 219 and L(ts-) is the group size estimate just before the change in the 220 number of bits in the mask at time ts. The reason for this constant 221 is as follows. In the case where the actual group membership has not 222 changed, those members which were forgotten will still be sending 223 RTCP packets. The amount of time it will take to hear an RTCP packet 224 from each of them is the average RTCP interval, which is cL(ts-). 225 Therefore, by cL(ts-) seconds after the change in the mask, those 226 users who were fudged by the corrective factor should have sent a 227 packet and thus appear in the table. We chose to decay both functions 228 linearly. This is because the rate of arrival of RTCP packets is 229 linear. 231 What happens if the number of bits in the mask is reduced once again 232 before the previous corrective factor has expired? In that case, we 233 compound the factors by using yet another one. Let fi() represent the 234 ith correction function. If ts is the time when the number of bits in 235 the mask is reduced, we can describe the additive correction factor 236 as: 238 / 0 , t < ts 239 | ts + cL(ts-) - t 240 fi(t) = |( L(ts-) - L(ts+)) ---------------- , ts < t < ts+cL(ts-) 241 | cL(ts-) 242 | 0 , t > ts + cL(ts-) 243 \ 245 and the multiplicative factor as: 247 / 1 , t < ts 248 | 249 | ts + 2cL(ts-) - t 250 fi(t) = | ----------------- , ts < t < ts + cL(ts-) 251 | cL(ts-) 252 | 253 \ 1 , t > ts + cL(ts-) 255 Note that in these equations, L(t) denotes the group size estimate 256 obtained including the corrective factors except for the new factor. 257 ts- is the time right before the reduction in the number of bits, and 258 ts+ the time after. As a result, L(ts-) represents the group size 259 estimate before the reduction, and L(ts+) the estimate right after, 260 but not including the new factor. 262 Finally, the actual group size estimate L(t) is given by: 264 ----- 265 \ 266 L(t) = / fi(t) + N*(2**m) 267 ----- 268 i 270 for the additive factor, and: 272 ------ 273 | | 274 | | 275 L(t)= | | N*(2**m)*fi(t) 277 for the multiplicative factor. 279 Simulations showed that both algorithms performed equally well, but 280 both tended to seriously underestimate the group size when the group 281 membership was rapidly declining. This is demonstrated in the 282 performance data below. 284 4.2 Binning Algorithm 286 In order to more correctly estimate the group size even when it was 287 rapidly decreasing, a binning algorithm can be used. The algorithm 288 works as follows. There are 32 bins, same as the number of bits in 289 the sample mask. When an RTCP packet from a new user arrives whose 290 SSRC matches the key under the masking operation, it is placed in the 291 mth bin (where m is the number of ones in the mask) otherwise it is 292 discarded. 294 When the number of bits in the mask is to be increased, those members 295 in the bin who still match after the new mask are moved into the next 296 higher bin. Those who don't match are discarded. When the number of 297 bits in the mask is to be decreased, nothing is done. Users in the 298 various bins stay where they are. However, when an RTCP packet for a 299 user shows up, and the user is in a bin with a higher value than the 300 current number of bits in the mask, it is moved into the bin 301 corresponding to the current number of bits in the mask. Finally, the 302 group size estimate L(t) is obtained by: 304 31 305 ---- 306 \ 307 L(t) = / B(i) * 2**i 308 ---- 309 i=0 311 Where B(i) are the number of users in the ith bin. 313 The algorithm works by basically keeping the old estimate when the 314 number of bits in the mask drops. As users arrive, they are gradually 315 moved into the lower bin, reducing the amount that the higher bin 316 contributes to the total estimate. However, the old estimate is still 317 updated in the sense that users which timeout are removed from the 318 higher bin, and users who send BYE packets are also removed from the 319 higher bin. This allows the older estimate to still adapt, while 320 gradually phasing it out. It is this adaptation which makes it 321 perform much better than the corrective algorithms. The algorithm is 322 also extremely simple. 324 4.3 Comparison 326 The algorithms are all compared via simulation in Table 1. In the 327 simulation, 10,001 users join a group at t=0. At t=10,000, 5000 of 328 them leave. At t=20,000, another 5000 leave. All implement an SSRC 329 sampling algorithm, unconditional forward reconsideration and BYE 330 reconsideration. The table depicts the group size estimate from time 331 20,000 to time 25,000 as seen by the single user present throughout 332 the entire session. In the simulation, a memory size of 1000 SSRC was 333 assumed. The performance without sampling, and with sampling with the 334 additive, multiplicative, and bin-based correction are depicted. 336 As the table shows, the bin based algorithm performs particularly 337 well at capturing the group size estimate towards the tail end of the 338 simulation. 340 Time No Sampling Binned Additive Multiplicative 341 ---- ----------- ------ -------- -------------- 342 20000 5001 5024 5024 5024 343 20250 4379 4352 4352 4352 344 20500 3881 3888 3900 3853 345 20750 3420 3456 3508 3272 346 21000 3018 2992 3100 2701 347 21250 2677 2592 2724 2225 348 21500 2322 2272 2389 1783 349 21750 2034 2096 2125 1414 350 22000 1756 1760 1795 1007 351 22250 1476 1472 1459 582 352 22500 1243 1232 1135 230 353 22750 1047 1040 807 80 354 23000 856 864 468 59 355 23250 683 704 106 44 356 23500 535 512 32 32 357 23750 401 369 24 24 358 24000 290 257 17 17 359 24250 198 177 13 13 360 24500 119 129 11 11 361 24750 59 65 8 8 362 25000 18 1 2 2 364 4.4 Sender Sampling 366 Care must be taken in handling senders when using SSRC sampling. 367 Since the number of senders is generally small, and they contribute 368 significantly to the computation of the RTCP interval, sampling 369 should not be applied to them. However, they must be kept in a 370 separate table, and not be "counted" as part of the general group 371 membership. If this is done, the group size estimate will be inflated 372 to overemphasize the senders. 374 This is easily demonstrated analytically. Let Ns be the number of 375 senders, and Nr be the number of receivers. The membership table will 376 contain all Ns senders and (1/2)**m of the receivers. The total group 377 size estimate in the current draft is obtained by 2**m times the 378 number of entries in the table. Therefore, the group size estimate 379 becomes: 381 L(t) = (2**m) Ns + Nr 383 which exponentially weights the senders. 385 This is easily compensated for in the binning algorithm. A sender is 386 always placed in the 0th bin. When a sender becomes a receiver, it is 387 moved into the bin corresponding to the current value of m, if its 388 SSRC matches the key under the masked comparison operation. 390 5 Security Considerations 392 The mechanisms described here introduce no additional security 393 considerations beyond those described in [1]. 395 6 Acknowledgements 397 The authors wish to thank Bill Fenner for his comments. 399 7 Bibliography 401 [1] H. Schulzrinne, S. Casner, R. Frederick, and V. Jacobson, "RTP: a 402 transport protocol for real-time applications," Request for Comments 403 (Proposed Standard) 1889, Internet Engineering Task Force, Jan. 1996. 405 [2] J. Rosenberg and H. Schulzrinne, "Timer reconsideration for 406 enhanced RTP scalability," in IEEE Infocom , (San Francisco, 407 California), March/April 1998. 409 [3] International Telecommunication Union, "Visual telephone systems 410 and equipment for local area networks which provide a non-guaranteed 411 quality of service," Recommendation H.323, Telecommunication 412 Standardization Sector of ITU, Geneva, Switzerland, May 1996. 414 [4] R. Rivest, "The MD5 message-digest algorithm," Request for 415 Comments (Informational) 1321, Internet Engineering Task Force, Apr. 416 1992. 418 8 Authors' Addresses 420 Jonathan Rosenberg 421 Bell Laboratories, Lucent Technologies 422 101 Crawfords Corner Rd. 423 Holmdel, NJ 07733 424 USA 425 Rm. 4C-526 426 email: jdrosen@bell-labs.com 428 Henning Schulzrinne 429 Columbia University 430 M/S 0401 431 1214 Amsterdam Ave. 432 New York, NY 10027-7003 433 USA 434 email: schulzrinne@cs.columbia.edu