Internet Engineering Task Force                    Audio-Visual Transport WG
INTERNET-DRAFT                                                  J. Rosenberg
                                                   Lucent, Bell Laboratories
                                                              H. Schulzrinne
                                                         Columbia University
                                                           November 26, 1996
Expires: May 26, 1997

          Issues and Options for an Aggregation Service within RTP
                          draft-rosenberg-itg-00.txt

Status of this Memo

   This document is an Internet-Draft.  Internet-Drafts are working
   documents of the Internet Engineering Task Force (IETF), its areas,
   and its working groups.  Note that other groups may also distribute
   working documents as Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six
   months and may be updated, replaced, or obsoleted by other documents
   at any time.  It is inappropriate to use Internet-Drafts as
   reference material or to cite them other than as "work in progress."

   To learn the current status of any Internet-Draft, please check the
   "1id-abstracts.txt" listing contained in the Internet-Drafts Shadow
   Directories on ftp.is.co.za (Africa), nic.nordu.net (Europe),
   munnari.oz.au (Pacific Rim), ds.internic.net (US East Coast), or
   ftp.isi.edu (US West Coast).

   Distribution of this document is unlimited.

Abstract

   This memorandum discusses the issues and options involved in the
   design of a new transport protocol for multiplexed voice within a
   single packet.  The intended application is the interconnection of
   devices which provide "trunking" or long distance telephone service
   over the Internet.  Such devices carry many voice connections
   simultaneously between them.  Multiplexing these calls into a single
   connection improves efficiency, enables the use of low bitrate voice
   codecs, and improves scalability.  Options and issues concerning
   timestamping, payload type identification, length indication, and
   channel identification are discussed.  Several possible header
   formats are identified, and their efficiencies are compared.

   This document is a product of the Audio-Video Transport working
   group within the Internet Engineering Task Force.  Comments are
   solicited and should be addressed to the working group's mailing
   list at rem-conf@es.net and/or to the author(s).

J. Rosenberg, H.
Schulzrinne Expires 5/26/97 Pg. 1 55 1. Introduction 57 With the tremendous changes in the telecommunications industry, 58 and the recent growth of the Internet, there is a new opportunity 59 for offering long distance telephony over the Internet. Such a 60 service can be offered by allowing users to dial a local access 61 number, connecting them to a device called an Internet Telephony 62 Gateway (ITG). This device prompts the user for a destination 63 telephone number, and then routes the call over the Internet to a 64 similar device at the local exchange of the destination. There, 65 the call is completed when the destination ITG dials the end user. 66 The scenario is depicted in Figure 1. 68 ------- -------- ---------- 69 | Phone | --------| NY ITG |---------------| Internet | 70 ------- -------- | | 71 | | 72 | | 73 ------- -------- | | 74 | Phone | --------| LA ITG |---------------| | 75 ------- -------- ---------- 77 Figure 1: Internet Telephony Gateway 78 In this application, the Internet is used only for the long 79 distance portion of the telephone call. Access to the service is 80 still via the traditional POTS. Current implementations of this 81 service are using H.323 to set up and tear down a new connection 82 each time a user establishes or terminates a call. However, H.323 83 is the wrong protocol for many reasons. First, it is far too 84 complex, providing for capabilities and features which cannot be 85 used because both endpoints are analog telephones. Secondly, a 86 significant increase in efficiency (in excess of 30%), can be 87 readily achieved if all of the voice calls between two ITG are 88 multiplexed into a single packet, instead of using a separate 89 connection (and thus separate packets) for each. Such multiplexing 90 reduces overhead by increasing the effective payload without a 91 corresponding penalty in packetization delay. In fact, as more 92 users are multiplexed, the payload from a particular user can be 93 reduced in size, or the bitrate reduced, without an efficiency 94 penalty. Furthermore, multiplexing improves scalability. As the 95 number of users increases, the number of packets which arrive at 96 the destination does not increase. This means that computations 97 which are per-packet (such as RTCP statistics collecting, jitter 98 accumulation, header processing, etc.) do not increase. The end 99 result is that multiplexing can simultaneously improve efficiency, 100 reduce delay, and improve scalability. There are some minor side 102 J. Rosenberg, H. Schulzrinne Expires 5/26/97 Pg. 2 103 benefits in addition to these major three. For example, in the 104 aggregated scenario, when a particular user enters a silence 105 period, and stops sending data, the flow of packets will not stop 106 unless all of the other users are already in silence (generally, 107 an unlikely event). This means that packets continually arrive, 108 and that delay estimates obtained from those packets can be 109 continuously generated. Algorithms for dynamically adapting the 110 playout buffer at the receiver are based on these delay estimates 111 [1], and can now be reworked to utilize the continuous stream of 112 delays, as opposed to relying on the delays received during 113 talkspurts only. The result is likely to be an improvement in both 114 end to end delay and loss performance. 116 In order to perform such multiplexing, a new Internet protocol is 117 required. This protocol must provide for the transport of multiple 118 real time streams within a single IP packet. 
Since the intended 119 application is real-time, the requirements for timing recovery, 120 sequencing, and payload identification are nearly identical to 121 normal single user voice. Since RTP was designed to meet these 122 requirements [2], it makes sense to build this new multiplexing 123 protocol on top of RTP. In fact, RTP allows for different profiles 124 to be defined for a particular application. The goal of this 125 document is to define a variety of options for that new profile, 126 and to compare them. 128 It is important to note that this application is similar in its 129 requirements to [3], which seeks to multiplex multiple encodings 130 for a particular user into the same IP packet. 132 2. Terminology 134 User: One of the individuals who has data within the IP packet. 135 Connection: The point to point RTP session between two ITG's. 136 Channel: A "virtual connection" which is established by allowing a 137 user to send data within a packet. There are many channels per 138 connection - this represents the multiplexing. 139 Channel Identifier: A number which identifies a channel. 140 Block: The section of the payload of a packet which contains data 141 for a particular user. 143 3. Requirements: 145 The transport protocol must provide, at a minimum, the following 146 functionality: 148 1. Delineation. Data from different users must be clearly 149 delineated. 150 2. Identification. The channel to which the data belongs must be 151 identified. 152 3. Variable lengths: The protocol should support variable length 153 blocks from a particular user. This allows for variable rate 154 codecs. 155 4. Low overhead: Since the protocol is designed for low rate 156 voice, it should have low overhead. This issue is extremely 157 important. New coders are emerging which can support near toll 158 quality at 8 kbps, and acceptable quality at rates even as low as 159 4 kbps. It is desirable to support such codecs, as they can reduce 161 J. Rosenberg, H. Schulzrinne Expires 5/26/97 Pg. 3 162 the cost of providing an ITG service. Furthermore, advances in 163 coding technology indicate that it is desirable to send very low 164 bitrate information (1 kbps or less) during silence periods, so 165 that background noise can be reproduced well (as opposed to 166 sending nothing). Support of such rates requires a protocol with 167 low overhead. 168 5. Marker: A general purpose marker bit should be available for 169 all users within the connection. 170 6. Payload Identification. The codec in use for each user should 171 be indicated somehow. It is a requirement to allow for the coding 172 type to change during the lifetime of a channel. 174 4. Issues 175 The following section identifies a number of issues which have an 176 impact on the design of the protocol. It also identifies a variety 177 of options for providing the specific services of the protocol. 179 4.1 How to bind telephone numbers to channel identifiers 181 There are four options for this problem. First, the telephone 182 number can be included in the per-user header. Second, the 183 telephone number can be signaled reliably by a companion TCP 184 connection before data begins. Thirdly, the phone number can be 185 sent periodically in RTCP in a soft-state fashion. Fourthly, the 186 information can be sent periodically over a reliable TCP based 187 control channel. The first approach avoids any synchronization 188 problems, but has high overhead. 
The second approach is a more 189 traditional approach, but relies on hard state at the destination 190 ITG. The third approach allows for a refresh of state, but causes 191 longer setup delays in the face of packet loss. The fourth 192 approach guarantees reliable delivery of signaling information, 193 but also generates refreshes to allow for recovery from end-system 194 failures. 196 The most reasonable approach seems to be the second - the use of 197 TCP (or any other reliable protocol) for sending signaling 198 information. This approach guarantees that the critical 199 information is received correctly, and in a timely manner. It 200 avoids bandwidth inefficient refresh as well. 202 4.2 Payload type identification 204 There are a number of ways to identify the coding of the payload. 205 The first is through static types, identified by bits in the 206 header (like RTP is now). The second approach dynamically adjusts 207 the coding type based on external messages which bind a coding 208 type to a channel identifier. Such external messages can be either 209 UDP or TCP based. A related issue is synchronization of these 210 changes. Either the timestamps or sequence numbers can be used. 211 One approach to performing the synchronization is as follows: The 212 source sends a message reliably to the receiver, indicating that 213 it will change codings at timestamp N, where N is some future 214 timestamp (or SN). The N should be chosen far enough into the 215 future to guarantee that the receiver will get the TCP message 216 before time N. The farther away N is, the more robust the system 217 becomes, but the source also loses its ability to adapt quickly. 218 There are also several options for simple in-band signaling 219 methods which can assist in error recovery. This is based on the 220 assumption that it is better for the receiver to know that the 222 J. Rosenberg, H. Schulzrinne Expires 5/26/97 Pg. 4 223 encoding has changed (even though it doesn't know to what), than 224 to know nothing. This avoids playing garbage out. A one or two bit 225 "coding sequence number" can be used in the header. Such a number 226 starts at zero. At the timestamp where the encoding changes, the 227 SN increments, and stays incremented until the next change. In 228 this fashion, we are guaranteed that the source will never play 229 out data using the wrong coding type. Probably just one or two 230 bits of this SN is necessary. 232 Yet another approach to changing payload types is via "pseudo- 233 dynamic" payloads. Before transmission of data commences, a 234 reliable exchange occurs which downloads a table of possible 235 encodings of the payload type, based on the capabilities of the 236 source. The table then remains active for the lifetime of the 237 connection. This technique can reduce the number of bits required 238 for the payload type, since a particular gateway is likely to 239 support just a few codecs. However, it is still a hard state 240 approach, but it would only fail in the face of end system 241 failure, not network failure. 243 Our conclusion is that it is desirable to have the PTI field in 244 the payload. This makes it possible to do more robust rate 245 control, which becomes a significant issue when multiple 246 connections are multiplexed together (and therefore the aggregate 247 bitrate increases). It also makes sense to signal a table of 248 encodings for the payload type at the beginning of the connection. 
249 Any particular pair of ITG will generally only support a few 250 codecs. Therefore, dynamically setting the codings of the PTI bit 251 makes a more compact representation possible without restricting 252 the set of codecs which may be used. 254 4.3 Timestamps 255 Timing is a very complex issue for the multiplexing protocol. The 256 first question related to it is whether the protocol will support 257 mixing of media derived from separate clocks (i.e., voice and 258 video). Although doing this seems attractive, it is complex and in 259 opposition to the philosophy under which RTP was developed. RTP 260 explicitly states that separate media should be placed in separate 261 RTP streams. This allows for different QoS to be requested for 262 each media, and for clocks to be defined based on the media type. 263 Furthermore, this profile is geared towards the aggregation of 264 voice traffic generated from the POTS across the Internet. As a 265 result, the only source of data is from a single, 125us clock. 267 The next basic question is whether timestamps are needed 268 "globally", i.e., just one per packet independent of the number of 269 users, or "locally", whereby each user within a packet needs their 270 own timestamp. A separate question is the representation of these 271 timestamps in an efficient manner. When considering these 272 questions, the criteria to keep in mind are: 274 1. Can silence periods be recovered correctly 275 2. Can resynchronization occur in the face of packet loss 276 3. What is the impact on playout buffering and jitter 277 computation 279 The answer to this question depends on the desired capabilities of 281 J. Rosenberg, H. Schulzrinne Expires 5/26/97 Pg. 5 282 the protocol. In the most general case, it is possible to have 283 different frame sizes for each user (for example, 20ms, 10ms, and 284 15ms) within the same packet. These frames can be arbitrarily 285 aligned in time with respect to each other (i.e., the 20ms frame 286 starts 5.3 ms after the beginning of another user's 10 ms frame). 287 The user can send packets off at any point, containing data from 288 those users whose frames have been generated before the packet 289 departure time. A somewhat more restrictive capability is to allow 290 for different frame sizes and time alignments, but to require that 291 any packet contains all the same frame sizes, all aligned in time. 292 The most restrictive case is to require separate RTP sessions for 293 users with different frame sizes. This requires a channel to be 294 torn down and re-setup when it changes codec. The desire to 295 perform flow control on a channel-by-channel basis makes this 296 approach unacceptable, and it is not considered further. 298 4.3.1 General Case 300 First consider the general case. Packets can contain frames from 301 some or all of the users, and those frames are not the same length 302 nor time aligned in any way. An example of such a scenario is 303 depicted in Figure 2. In the figure, there are three sources, and 304 the ti correspond to the times of packet emissions. When packets 305 are lost, the variability in the amount and time alignment of data 306 in each packet makes it impossible to reconstruct how much time 307 had elapsed based solely on sequence numbers (such reconstruction 308 IS possible in the single user case). Furthermore, the amount of 309 time elapsed can easily vary from user to user, and therefore 310 local timestamps are needed. 
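As a rough, non-normative illustration (the data layout and the 8 kHz sample clock are assumptions, not part of any proposed format), the following sketch shows how a receiver would recover per-channel elapsed time from local, per-block timestamps across a packet loss; the packet sequence number alone cannot provide this when frame sizes and alignments differ:

      # Python sketch: per-channel elapsed time from local timestamps.
      # "before" and "after" describe the last packet received before a
      # loss and the first packet received after it, as dictionaries
      # mapping channel id -> (local_ts_in_ticks, samples_in_block).

      def elapsed_per_channel(before, after):
          return {ch: after[ch][0] - (ts + n)
                  for ch, (ts, n) in before.items() if ch in after}

      # Channels with 20 ms (160 tick) and 10 ms (80 tick) frames will
      # generally show different gaps across the same lost packet, which
      # is why a single global value cannot stand in for all of them.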
The general case introduces further complications which have to do with jitter and delay computation. Such computations are needed for RTCP reporting and possibly for the estimation of network delays, used in dynamic playout buffers. In the single user case, the jitter is computed between each packet as:

   D(i,j) = (Rj - Ri) - (Sj - Si)

where the Ri correspond to the reception times at the receiver measured in RTP time, and the Si are the RTP timestamps in the data packets. The delay is computed as the difference between the arrival time at the receiver and the generation time, as indicated by the RTP timestamp.

In the multiple user case, these definitions no longer make sense, as there is no single RTP timestamp any longer. Each arriving packet will have a single arrival time (Ri), but multiple sending times (Si,j) for each block j in the ith packet. There are a number of alternatives for delay and jitter computation in this case: compute such information for all users, compute such information for a single user, or generate a single delay and jitter estimate, but have it be based on information from all users. There are pros and cons to each approach.

First of all, it is possible for different blocks to experience different delays (and jitters) even though they are within the same packet. This is because the general scenario allows for significant variability, whereby blocks may either vary in size from packet to packet and within a packet, or not be transmitted immediately after their completion (the latter happens to source B in Figure 2). Thus, it is arguable that it may be desirable to perform adaptive playout buffering separately for each user, which would require the storage and computation of delays for each user.

The second alternative is to compute the delays for a single user, and use that information to size all of the other playout buffers. This may be sub-optimal in terms of delay and loss, depending on what fraction of the total delay and jitter are introduced by the packetization itself. There is a second disadvantage to this approach, however. When that particular user enters a silence period, delay and jitter information is no longer being received, and so estimates of network delay stop adapting. This implies that delay estimates will be old for certain periods of time. An alternative is to change the user from which delay and jitter estimates are being collected.

The third alternative is to compute delay estimates based on some measure derived from all of the users. There are several reasonable approaches. For example, the delay estimate can be computed as:

   Delay = max_j { Ri - Si,j }

which would yield a conservative estimate of the delay for some users. This approach requires storage of only a single set of delay information, although computation still grows with the number of users in a packet.

   [Figure not legible in this version: three sources with different
   frame sizes and time alignments; tick marks show frame boundaries,
   and t1 through t8 mark the packet emission times.]

              Figure 2: Global Timestamp Problem

Sending local timestamps also requires extra bits in the block headers.
It is possible, however, to use offsets for the local 386 timestamps. A global timestamp can be used in the RTP header (the 387 field already exists), and each user has a modifier to indicate 388 position in time relative to that timestamp. 390 A related question is how big to make the offset field. This 391 offset is bounded by the difference in time between the earliest 392 and latest samples within a packet. Clearly, this itself is 393 bounded by the packetization delay at the source. For this 394 application, if we assume a 125us sample clock, and bound 395 packetization delays to 100ms, the offset field is bounded by 800 396 ticks, requiring 10 bits. 398 J. Rosenberg, H. Schulzrinne Expires 5/26/97 Pg. 7 399 4.3.2 More Restrictive Case 401 As a more restrictive case, we allow blocks to be present in a 402 packet if their frame sizes are identical and aligned in time. 403 Note that this does not imply identical codecs or identical block 404 sizes in terms of bytes; many voice codecs operate with a 20ms or 405 50ms frame size. This case would allow all frame sizes of the same 406 size and time alignment, independent of the codec, into a packet. 408 This simplifies the timing issue tremendously. Now, the scenario 409 is much more like the single user application. The sequence 410 numbers and the frame size completely determine the timing when at 411 least one user is active. But, when all users enter silence, a 412 global timestamp is needed to indicate the duration of the silence 413 period. The global timestamp is sufficient to reconstruct the 414 timing in the face of losses. Therefore, in this case, only a 415 global timestamp is required. 417 It is desirable to support a variety of different frame sizes 418 within such an aggregated connection, however. The way to do this 419 in this case is to simply mandate that different packets can 420 contain different frame sizes; the only restriction is within a 421 packet. This is not as simple as it may seem at first. Once this 422 is done, the relationship between sequence numbers and timing is 423 lost. Consider an example. There are two frame sizes, 10ms and 424 30ms. Packet N contains 10ms frames, as does packet N+1 and N+2, 425 however, N+3 contains 30ms frames. Thus, although the difference 426 in sequence number between the first and fourth is three, the 427 relative timing is not 10ms*3 or 30ms*3. Due to this fact, the 428 measurement of jitter is complicated (for the same reasons 429 described in Section 4.3.1), as it should not be done between two 430 packets with different frame sizes. It also makes recovery 431 techniques based on sequence number more complex. To resolve this 432 problem, we use a natural concept in RTP, which is the 433 synchronization source (SSRC). The approach is to have a separate 434 SSRC for each frame size in use. Then, sequence numbers are 435 interpreted for each SSRC separately. This resolves the problem 436 with the relationship between timing and sequence numbering. It 437 also makes jitter and delay computations simpler - they are now 438 done for each SSRC separately. Furthermore, multiple jitter (and 439 delay, loss, etc.) values are reported to the source, one for each 440 frame size. This is also desirable, since the different frame 441 sizes will cause different packetization delays and packet sizes, 442 which may cause those packets to see different delays and losses 443 in the network than other packets. 445 This case has both advantages and drawbacks when compared to the 446 general case. 
As an advantage, timing is greatly simplified, and 447 the approach falls much in line with the original intentions of 448 RTP. However, it causes losses in efficiency for systems with a 449 variety of different frame sizes in operation simultaneously. Such 450 a situation arises naturally when flow control is applied to each 451 source individually, as opposed to altering the rate and codec 452 type for all of the active sources. 454 J. Rosenberg, H. Schulzrinne Expires 5/26/97 Pg. 8 455 4.4 Channel ID 457 The question of channel identification may seem at first trivial - 458 simply use a 32 bit number, much like the SSRC, and be done with 459 it. However, 32 bits adds significant overhead. Reduction of the 460 number of bits for the channel ID becomes a complex issue. Unlike 461 the single user case, the connection may remain active for long 462 periods of time (days or months). The result is that channel ID's 463 will need to be reused during the lifetime of the connection. It 464 is critical to ensure that data from different channels is not 465 confused because of this. Large channel ID spacing helps to 466 resolve this issue (although it can not eliminate it), so an added 467 side effect of reducing the number of channel ID's possible is an 468 increase in the likelihood of such confusion. 470 The first question to be addressed is how many simultaneous users 471 can one expect to find in a single packet. 473 4.4.1 Number of Users 475 There are several ways to come up with some minimums and maximums. 477 Delay-bound 479 Clearly, as we add more users, the store and forward delays 480 increase since the packet size gets larger. Therefore, if we bound 481 the per-hop delay, and provide a lower bound on the codec bitrate 482 and packetization delay, an upper bound on the number of users can 483 be obtained. Consider a 2.4 kbps codec, with a 20ms frame size. 484 This is a reasonable minimum combination. Next, consider 50ms 485 store and forward delays. For a T1, this limits the number of 486 users within a packet to 965. For a T3, it is 30 times this, or 487 nearly 29,000. If silence suppression is used, the number of users 488 within a packet is roughly half the number of active users (on 489 average), thus requiring twice as many channel identifiers (1930 490 and 58,000). This bound doesn't seem to tight. Intuitively, even 491 965 seems too large. 493 Efficiency bound 495 The entire purpose of multiplexing is to improve upon efficiency. 496 Therefore, we should be able to support at least as many users as 497 is necessary to get good efficiency. Consider the typical case, a 498 16 kbps codec, with a 20ms packetization delay. This results in 499 320 bits of data per user. If we assume IP/UDP/RTP (20+8+12=40 500 bytes = 320 bits), plus an additional word (32 bits) of overhead 501 per user, the efficiency vs. N becomes: 503 E = (320N / ((320 + 32)N + 320)) 505 This reaches an asymptote of 90%. It is desirable to be within a 506 few percent of this, say 88%. Solving for N, this requires 7 users 507 in a packet, so that we must support at least 14 active channels 508 (again, due to stat mux). The lower bound, therefore, on the 509 number of users is around 14. 511 J. Rosenberg, H. Schulzrinne Expires 5/26/97 Pg. 9 512 MTU Bound 514 In many cases, there is a maximum packet size. This is usually 515 around 1500 bytes. 
If we consider a very low bitrate codec, the minimum block size from any particular user is 32 bits (otherwise, overheads become very large, and we lose word alignment, so 32 bits is a good minimum). Dividing 1500 bytes by 4 bytes, we obtain a maximum of 375 users. Multiplying by two, the number of active channels needed is around 750.

Based on these bounds, we need to simultaneously support at least 10 users, and at most 750. This would imply that at least 8 to 10 bits of channel ID are required.

4.4.2 Channel ID Reuse Problem

It is important to guarantee that data from a particular channel is never routed to a different channel; this would mean that a user may hear pieces of conversations from different users, an error we consider catastrophic. Such misrouting becomes possible when a channel is torn down, and a new channel is set up soon after using the same channel ID. Such a scenario is depicted in Figure 3. Sometime after channel K is torn down, a new channel is set up using the same channel ID, K. If the data packets (dotted lines) are being delayed significantly, blocks from the old channel K may still be present in the data stream after the new channel K is established. These blocks will then be played out to the new user of channel K. Protocol support is needed to guarantee that this can never happen.

        |  Chnl K data here   |
        |  .......>           |
        |                     |
        |  .......>           |
        |                     |
        |  Teardown K         |
        |  --------------->   |
        |                     |
        |  Ack Teardown K     |
        |  <---------------   |
        |                     |
        |  Setup K            |
        |  --------------->   |
        |                     |
        |  Ack Setup K        |
        |  <---------------   |
        |                     |
        |  Recv old Chnl K    |
        |          .........> |
        |          .........> |
      Source            Destination

              Figure 3: Channel ID Reuse Problem

The solution lies in an intelligent signaling protocol. The protocol must support a two-way handshake for all control messages. In addition, three simple rules must be obeyed at a source when setting up or tearing down connections:

1. When a source sends a teardown message, it stops sending data in the UDP stream for that channel. Furthermore, in the signaling message, it indicates the sequence number of the packet which contained the last block for that channel; call this sequence number K.

2. A source cannot re-use a channel identifier until it has received an acknowledgement from the destination that that particular channel was successfully torn down.

3. A source cannot begin to send data from a particular channel in the UDP stream until it has received an acknowledgement from the destination that the setup is complete.

A few simple rules must also be used at the receiver:

1. When a receiver gets a teardown message, it checks the highest SN received so far (call this sequence number M). If M > K, the channel is torn down, and any further blocks containing that channel ID are discarded. If M < K, blocks from that channel are accepted until the received SN exceeds K. Once this happens, the channel is torn down and no further blocks with that channel ID are accepted.

2. When a setup message is received, the destination will begin to accept blocks with the given channel identifier, but only if the sequence numbers of the packets in which they ride are greater than K.
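The receiver side of these rules can be sketched as follows (a non-normative illustration; the handler names, the per-channel table, and the omission of sequence number wraparound are simplifying assumptions):

      # Python sketch of the receiver rules for channel ID reuse.  For
      # each channel ID the receiver remembers K, the sequence number of
      # the packet carrying the old channel's last block.

      class ChannelTable:
          def __init__(self):
              self.open = set()       # channels currently set up
              self.last_k = {}        # channel id -> K from its last teardown
              self.draining = {}      # torn-down channels still draining old blocks

          def on_teardown(self, chan, k, highest_seq_seen):
              # Receiver rule 1: tear down at once if packets past K have
              # already been seen; otherwise keep accepting blocks up to K.
              self.open.discard(chan)
              self.last_k[chan] = k
              if highest_seq_seen <= k:
                  self.draining[chan] = k

          def on_setup(self, chan):
              # Receiver rule 2: blocks for the new channel count only in
              # packets whose sequence number exceeds the old channel's K.
              self.open.add(chan)

          def classify_block(self, chan, packet_seq):
              # Returns "old", "new", or None (discard the block).
              if chan in self.draining and packet_seq <= self.draining[chan]:
                  return "old"                 # last blocks of the torn-down call
              self.draining.pop(chan, None)    # past K: the old call is finished
              if chan in self.open and packet_seq > self.last_k.get(chan, -1):
                  return "new"
              return None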
599 The use of the sequence numbers allows the receiver to separate 600 the old channel K blocks from the new ones. This guarantees that 601 the destination will not misroute packets. An additional benefit 602 is that the end of speech will not be clipped if the last data 603 packets arrive after the teardown is received. This protocol is 604 quite simple to implement, although it requires a table at the 605 receiver of the values of K for each channel ID. 607 Alternate solutions to this reuse problem exist which can operate 608 when the above restrictions are relaxed. The simplest approach is 609 to have the source keep a linked list of free channel ID's. The 610 list is initialized to contain all channel ID's, in order. When a 611 new channel is required to be established, the channel ID is taken 612 from the top of the list. When a channel is torn down, its ID is 613 placed at the bottom of the list. This makes the time between 614 channel ID reuse as long as possible, and reduces the probability 615 of confusion. With this method, it is no longer necessary to 616 include sequence numbers in the tear down messages. Also, the 617 receiver does not need to maintain a table. 619 4.4.3 Channel ID Coding 621 This section discusses some of the options for coding the channel 622 ID field. 624 4.4.3.1 Fixed Length 626 J. Rosenberg, H. Schulzrinne Expires 5/26/97 Pg. 11 627 The fixed length approach is the most straightforward. A fixed 628 number of bits is assigned to the channel ID. Issues surrounding 629 the number of bits required have been discussed above. 631 4.4.3.2 Implicit + Present Mask 633 In reality, the channel ID's are very redundant. Both source and 634 destination know the set of active connections and their channel 635 identifiers from the signalling messages. Therefore, if the blocks 636 are placed in the packet in order of increasing channel ID, very 637 little information actually needs to be sent. In fact, without 638 silence suppression, channel activity and the presence of a block 639 in a packet are likely to be equivalent, in which case NO 640 information actually needs to be sent about channel ID's. 642 Unfortunately, there are some practical problems with this. First, 643 silence suppression is used. Secondly, even if it weren't, it is 644 possible for the voice codecs at the ITG not to have their framing 645 synchronized (as in the general case above), so that a packet may 646 not contain data from all users. Thirdly, the source and 647 destination do NOT have a consistent view of the state of the 648 system. There is a delay while signaling messages are in transit. 650 A few simple mechanisms can be used to overcome these 651 complexities. In the header of the packet, a mask is sent. Each 652 bit in the mask indicates whether data from a channel is present 653 in the packet or not. Mapping of channel ID's to bits is done by 654 sorting the channel ID's, and mapping the lowest number to the 655 first bit, next lowest to the second, etc. Therefore, if a channel 656 has no data for that packet, its bit is set to zero. Given that 657 the source and destination agree on how many connections are 658 active at all points in time, the number of bits required is known 659 to both sides. 661 The next step is to deal with the differences in state. An 662 additional field, called the "state-number", perhaps 5 bits, is 663 sent in the header of the packet. This field starts at zero. Lets 664 say at some point in time, its value is N. The source wishes to 665 tear down a channel. 
It sends the tear down message to the destination, but continues to send data for that channel (or it may choose to send nothing, but must set the appropriate bit in the mask to zero). When the destination receives the message, it replies with an acknowledgement. When the acknowledgement is received by the source, the source considers the channel torn down, and no longer sends data for it, nor considers it in computing the mask. In the packet where this happens, the source also increments the state-number field to N+1. The destination knows that the source will do this, and will therefore consider the state changed for all packets whose value of the field is N+1 or greater. When the next signaling message takes effect, the field is further increased. Even if packets are lost, the value of the state-number field for any correctly received packet completely tells the destination the state of the system as seen in that packet. Furthermore, it is not necessary to wait for a particular setup or teardown to be acknowledged before requesting another setup or teardown.

The number of bits for the state-number field should be set large enough to represent the maximum number of state changes which can have taken effect during a round trip time. As an alternative, an additional exchange can occur. After the destination receives a packet with state number greater than N, it destroys the state related to N, and sends back, reliably, a "free-state N" message, indicating to the source that state N is now de-allocated and can be used again. Until such a message is received, the source cannot reuse state N. This is essentially a window based flow control, where the flow is equal to changes in state. With this addition, the number of bits for the state number can be safely reduced, and it is guaranteed that the destination will never confuse the state, independent of the number of state-number bits used. However, the use of too few state bits can cause call blocking or delay the teardown of inactive channels.

This state-difference problem appears to be similar to the channel ID reuse problem described in Section 4.4.2. However, there is an important difference. In the channel ID reuse problem, if the packet containing the last block of a user arrives before the signaling message tearing down that connection, there is no problem. The destination will generally play out silence until the signaling message is received. Here, however, the destination must know that blocks are no longer present in the data stream independent of when the signaling messages arrive.

There are some drawbacks to this approach. It requires the source and destination to maintain state. Any error in processing at either end, or a hardware failure, causes a complete loss of synchronization. This "hard-state" nature of the protocol can be relaxed by having the source send the complete state of the system with each signaling message, along with the "state-number" field for which this state takes effect. This guarantees that even in the event of end-system failure, the system state will be refreshed whenever a new connection is set up or torn down. Furthermore, the state can be sent periodically to improve performance.
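To make the mechanics concrete, the mapping of presence-mask bits to channel identifiers can be sketched as follows (non-normative; the field sizes and table layout are assumptions):

      # Python sketch: mapping presence-mask bits to channel IDs.
      # state_tables holds, for each state-number, the sorted list of
      # channel IDs that are active once that state takes effect.

      def channels_in_packet(state_tables, state_number, mask_bits):
          active = state_tables[state_number]        # sorted channel IDs
          return [ch for ch, bit in zip(active, mask_bits) if bit]

      # Example: in state 3 the active channels are [2, 9, 17, 40]; a
      # mask of 1,0,1,1 says this packet carries blocks for channels
      # 2, 17 and 40, in that order.
      channels = channels_in_packet({3: [2, 9, 17, 40]}, 3, [1, 0, 1, 1])
      # -> [2, 17, 40]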
4.5 Length Indicators

There are many ways to actually code the length indicators. The first question, however, is the range of lengths which must be coded.

4.5.1 Range of Length Indicators

Here, there is a clear tradeoff between flexibility and efficiency. A larger range can accommodate a variety of different media (such as video) where lengths may be large. However, this comes at the expense of a long length field, which may require another word of header to hold. For voice, one would expect a maximum bitrate of 64 kbps, and around 50ms packetization delay. This yields exactly 100 words of data. Therefore, an eight bit field is probably sufficient for most voice applications.

4.5.2 PTI Based Lengths

In many applications, the amount of data present depends on the voice codec in use. Frame based coders will generally send a frame at a time. Since the codec type is indicated by the PTI field, it may not always be necessary to send length information at all. Even for non-frame based codecs, such as PCM, default data sizes can be set in the standard (as in RFC 1890 [4]). An extension bit can be used to indicate a non-standard length, so that when set, a length field follows. This allows for efficient coding of the most common cases, but allows for variable lengths with little additional cost.

4.5.3 Variable Length w/ Indicator

In this approach, a variable length header is used. All of the length indicators for all of the blocks are placed together in the beginning of the packet. However, the first four bits of this header field indicate the number of bits used for each length field. What follows are the length fields themselves, each using the number of bits indicated by the first four bits. This approach scales well, using a small overhead when the block lengths are small, and a larger overhead when they are larger. The drawback is a variable length header field, plus additional complexity in the parsing. An example of this technique is depicted in Figure 4. In the first example, the four bit indicator field has a value of three, so that the length fields are all three bits long. The four lengths are then 2, 6, 3, and 4. In the second example, the 4 bit indicator has a value of two, so that the length fields are all two bits long. The four lengths are thus 3, 2, 1, and 3.

   Example A: 0011 010 110 011 100
   Example B: 0010 11 10 01 11

              Figure 4: Variable Length w/ Indicator

4.5.4 Remaining Packet Length Based Lengths

UDP always informs RTP of how many bytes are in the payload. This itself restricts the possible length of the first block, since its length must be less than the total packet length minus the RTP header. Furthermore, as each block is placed into the packet, the possible set of lengths that it can have shrinks - it must always be less than the remaining length in the packet. This approach, therefore, codes each length field with log2 of the number of words remaining in the packet. This approach works extremely well when there is a long packet followed by several shorter ones, whereas the previous approach performs poorly in this case. Furthermore, it eliminates the length indicator present in the previous approach.
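The corresponding decoding can be sketched as follows (non-normative; it assumes the number of blocks is known to the receiver and, as discussed below, ignores the bits consumed by the length fields themselves):

      # Python sketch: each length field is just wide enough to count
      # the 32-bit words still unaccounted for in the packet.

      from math import ceil, log2

      def read_lengths(header_bits, total_words, nblocks):
          lengths, remaining, pos = [], total_words, 0
          for _ in range(nblocks):
              width = max(1, ceil(log2(remaining)))
              field = header_bits[pos:pos + width]
              pos += width
              lengths.append(int("".join(str(b) for b in field), 2))
              remaining -= lengths[-1]
          return lengths, pos     # block lengths in words, bits consumed

      # For a 31-word packet holding blocks of 17, 8 and 6 words, the
      # three length fields occupy 5, 4 and 3 bits, 12 bits in all.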
However, it is even more complex than the previous 794 technique. It can result in no savings under some conditions, 795 especially since the header fields must be rounded to 32 bits. 797 Consider an example. The total size of the packet is 31 words. 798 Inside of it are three blocks, the first whose length is 17, the 799 second 8, and the third, 6. We would code the length field with 5 800 bits. After this block is read, the remaining amount of data in 801 the packet is 14 words. Therefore, the next length field is coded 802 with 4 bits. After this block, the remaining amount of data in the 803 packet is 6 words, so the final length field is coded with three 804 bits. The total is therefore 5+4+3 = 12 bits. In the previous 805 approach (Section 4.5.3), the entire length field would have 806 required 4 bits for the indicator (whose value would be 5), 807 followed by 3 five bit fields, for a total of 19 bits. 809 One may question this example since the overhead of the length 810 fields itself is not taken into account when computing the 811 remaining length of the packet. While this can be incorporated, it 812 makes things even more complex, and it is not actually necessary. 813 All that is required is that the length fields are coded with 814 log2(M), where M is any bound on the remaining amount of data 815 which can be deterministically computed from past information. A 816 simple bound is the packet length minus the data seen thus far 817 (one can also subtract away any fixed length fields), precisely 818 the metric used in the example above. 820 4.5.5 Table Based Approach 822 Realistically, most systems will operate with codecs that generate 823 data in a fixed set of lengths (a frame size, for example). In 824 that case, the set of lengths which can appear in the packet are 825 usually very restricted. To take advantage of this fact, a table 826 can be transmitted to the receiver reliably before transmission 827 commences. This table can indicate the actual length of a block, 828 and its coding. The symbols transmitted in the data packets are 829 then used in this table to look up the actual lengths. This can 830 reduce the length field to 2 or 3 bits. These lengths then all 831 occur next to each other in the header. The technique now relies 832 on state at the receiver, and the parsing process is further 833 complicated by table lookups. In addition, the approach only works 834 if you know the set of lengths before the system begins operation. 835 If you allow the table to be dynamically modified during a 836 session, synchronization problems occur, and the system becomes 837 quite complex. 839 Further gains can be achieved through the use of Huffman codes 840 instead of fixed length codes This only makes sense when different 842 J. Rosenberg, H. Schulzrinne Expires 5/26/97 Pg. 15 843 codecs (and correspondingly different lengths) are used with 844 different frequencies. An example of such a situation is when the 845 codec changes to a higher rate because of music-on-hold; a rare 846 event in general. 848 4.6 Marker Bit 850 The marker bit has a general functionality, but is normally used 851 to indicate the beginning of a talkspurt. It seems like a good 852 idea to include this bit for each user. 854 4.7 Location of Per User Overhead 856 There will generally be overhead on a per-user basis (information 857 such as channel ID, length, etc.). This information can be located 858 in one of three places. First, it can all reside in front of the 859 block to which it is applicable. 
Second, it can all be pasted 860 together and reside up front in the header of the packet. The 861 third is a hybrid solution, where some of it resides up front 862 (such as channel ID), and some resides in front of the data. There 863 are various pros and cons to the different approaches. The hybrid 864 approach can be complex, since data is split into multiple places. 865 The case where all the header is up front has a few minor 866 advantages. First, it allows for a complete separation of the data 867 from the header. The implementation is likely to be a little less 868 complex, since extracting blocks does not require actually moving 869 through the payload. 871 5. Options 873 5.1 Option I: Mixer Based 875 This option is the most straightforward to implement, but has the 876 most overhead. The basic premise is to reuse the mixer concept 877 introduced in RTP. Each user is considered a contributing source, 878 and the gateway is considered a mixer. However, instead of mixing 879 the media, separate data from each user appear in the payload. The 880 32 bit CSRC identifies each user, acting as the channel ID. Data 881 from each user is organized into blocks. Each block has its own 32 882 bit header, which includes the length (12 bits) in units of 32 bit 883 words, Marker bit (1b), TimeStamp Offset (12b), and Payload Type 884 (7b). Furthermore, the payload type and marker bit are stricken 885 from the RTP header (since they only make sense for an individual 886 user), and the CC field expanded to fill the missing bytes. This 887 allows for a 12 bit CC field, or 4096 users in a packet. Thus, 888 the packet would look like: 890 Figure 5: Option I 892 J. Rosenberg, H. Schulzrinne Expires 5/26/97 Pg. 16 893 This approach allows for the most amount of generality in terms of 894 variable length coders and coders with different frame sizes (see 895 Section 4.3.1). The channel ID is longer than necessary, but using 896 the concept of a contributing source for the channel ID 897 necessitates the use of the additional bits. There are several 898 variations on option I, many of which have been mentioned above: 900 I.A: Put the CSRC with each 32 bit length+M+PT field, instead of 901 all of them being at the beginning. This has some pros and cons. 902 As an interesting artifact of this change, it is no longer 903 necessary to have a CC field. The length passed up by UDP is 904 sufficient to recover the point at where you stop checking for 905 additional blocks from users in the payload. In fact, the length 906 field in the last block is not strictly necessary either. 908 I.B: Do the opposite of I.A. Put the length+M+PT field up front 909 along with the CSRC fields, with the pattern being CSRC 1, length 910 1, CSRC 2, length 2, etc. Here again, the CC field is not strictly 911 necessary. 913 I.C: The CSRC field can be shrunk to 8 bits. This allows for 914 either 4 or two channel ID's to be coded in the space of one word, 915 whereas only one could in the current size of the field. 917 I.D: The CSRC field can be shrunk to 16 bits. 919 5.2 Option II: One word header 921 This option eliminates the large channel ID field present in the 922 previous option. In the RTP header, the CC bit is set to zero, the 923 marker bit has no meaning, and the payload type is TBD (possible 924 uses include an indication of the number of blocks in the packet). 925 The RTP timestamp corresponds to the generation of the first 926 sample, among all blocks, enclosed in this packet. 
A one word 927 header precedes each block of data. The number of blocks is known 928 by parsing them until the end of the RTP packet. The one word 929 field has a channel ID (8 bits), length (8 bits), Marker (1 bit), 930 timestamp offset (11 bits), and payload type (4 bits). Channel ID 931 number 255 is reserved, and causes the header to be expanded to 932 allow for greater length, payload type, and possibly channel ID 933 encodings. The specific format for this expanded header is for 934 further study. Given the compacted payload type space, it may be a 935 good idea to allow negotiation of the meaning for the payload type 936 at the beginning of the connection. It may be worthwhile to expand 937 the length field at the expense of the channel ID - this issue is 938 for further study. 940 The format of the packet is thus: 942 Figure 6: Option II 944 J. Rosenberg, H. Schulzrinne Expires 5/26/97 Pg. 17 945 5.3 Option III - Restricted Case 947 Option II has the advantage of being able to support multiple 948 frame sizes within a single packet. However, it comes at the 949 expense of a 32 bit header (which can be large for low bitrate 950 codecs), and at a reduced payload type field. This option has a 16 951 bit header, but does not support different frame sizes within a 952 packet. It therefore falls into the category described in Section 953 4.3.2. Of the 16 bit header, the first bit is an expand bit (to be 954 described shortly), and the second bit is the marker bit. The 955 following 6 bits indicate payload type, and the remaining 8 are 956 for channel ID. When the expand bit is set, an additional 16 bits 957 are present, which indicate the length of the block. When expand 958 is clear, the length is derived from the payload type. Since there 959 is no timestamp offset, all the blocks in the packet must be time 960 aligned and have the same frame lengths. Different sized frames 961 are supported by using a different SSRC for each frame length (see 962 Section 4.3.2). In the RTP header, the CC field is always zero. 963 The marker bits and payload type are undefined. The timestamp 964 indicates the time of generation of the first sample of each 965 block. SSRC is randomly chosen, but always different for each 966 frame size. 968 The block headers are all located at the beginning of the packet, 969 and follow each other. If the total length of the fields is not a 970 multiple of 32 bits, it is padded out to 32. The structure of the 971 header is such that fields never break across packet boundaries. 972 An example of such a packet is given in Figure 7. There are 7 973 blocks in this example. The first two have standard lengths based 974 on the PT field. The next one uses the expansion bit to indicate 975 the length. The fourth uses the PT field, the fifth the expansion 976 bit, and the last two use the PT field. The last 16 bits of the 977 header are padded out. 979 Figure 7: Option III 981 5.4 Option IV - Stacked RTP 983 This approach uses a duplicate of the RTP header as the per-block 984 header. It is therefore extremely inefficient (12 bytes per 985 block), but has several advantages: different media types can be 986 mixed, since the timestamps are no longer related, and little 987 processing is required if the sources being combined came from a 988 single user RTP source. It also works well when one of the users 989 is actually a mixer (for example, a conference bridge), since the 990 CSRC can be used. Its main advantage is the reduction in overhead 991 due to the IP and UDP headers. 
In addition to the standard RTP 992 header, an additional header is required for length indication. 993 This header has a number of 16 bit fields, each of which indicates 994 a length for its corresponding block (including the 12 byte RTP 995 header). The number of such 16 bit lengths fields is known by 996 continuing to look for additional length fields until the total 997 length of the packet passed up from UDP has been accounted for. If 998 an odd number of such length fields is required, then an 999 additional 16 bits of padding is inserted to make the length 1001 J. Rosenberg, H. Schulzrinne Expires 5/26/97 Pg. 18 1002 header a multiple of 32 bits. 1004 The format of such a packet is given in Figure 8. 1006 Figure 8: Option IV 1008 5.5 Option V: Compacted 1010 This option uses the Implicit + Mask approach outlined in Section 1011 4.4.3.2 to code the channel ID. In all other respects it is 1012 similar to Option III. Now, however, the per-block header can be 1013 reduced to one byte: 1 bit of expansion, 1 bit of marker, and 6 1014 bits of payload type. Furthermore, the length field (present when 1015 the expansion bit is set) is reduced to 8 bits from 16 in Option 1016 III. This reduction saves on space, but it also guarantees that 1017 fields remain aligned on byte boundaries. The mask bits are 1018 present in the beginning of the packet, and they are preceded by a 1019 8 bit state-number. If the number of active channels is not a 1020 multiple of 32, the mask field is padded out to a full word. This 1021 approach is extremely efficient, but the channel identification 1022 procedure is more complex and requires additional signaling 1023 support. 1025 A diagram of a typical packet for this option is given in Figure 1026 9. The marker bits are indicated with lowercase m's. There are 1027 four active channels, each of which is present in this packet (all 1028 four mask bits would then be 1). The first block has a standard 1029 length, but the second has its expansion bit set, so that an 8 bit 1030 length field follows. The remaining two blocks have normal 8 bit 1031 headers. The last 24 bits of the header are padded to a word 1032 boundary. 1034 Figure 9: Option V 1036 6. Comparison of Options 1038 In this section, the options are compared in terms of efficiency. 1039 Issues relating to complexity, scalability, and generality have 1040 already been discussed in previous sections. The analysis here 1041 consists of two parts. The first is a table, indicating the 1042 efficiency of each option for a variety of speech codecs. Several 1043 tables are included for different numbers of users. The second 1044 analysis consists of a series of graphs which consider the 1045 efficiency vs. bitrate, assuming a fixed frame size and a certain 1046 number of users. This analysis helps to indicate the range of 1047 codecs which may be reasonably supported with each option. 1049 6.1 Specific Codecs 1051 In both Table 1 and Table 2, the efficiency vs. codec for all 1052 three options is tabulated. For G.711, G.726, G.728 and G.722, the 1053 frame size listed is a multiple of the actual frame size of the 1054 codec, which is too small to be sent one at a time. The efficiency 1055 is computed as the number of words of payload such a codec would 1057 J. Rosenberg, H. Schulzrinne Expires 5/26/97 Pg. 19 1058 occupy, times the number of users, divided by the total packet 1059 size (i.e., it does not consider inefficiencies due to padding the 1060 payload portion). 
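As an illustration of this computation, the following non-normative sketch reproduces several of the table entries; the 20+8+12 bytes of IP/UDP/RTP overhead and the per-block header sizes (4 bytes for Option II, 2 bytes for Option III, padded to a word boundary) are taken from the option descriptions above:

      # Python sketch of the per-codec efficiency computation used for
      # the tables below.

      def efficiency(bitrate_kbps, frame_ms, users, block_hdr_bytes):
          bits_per_user = round(bitrate_kbps * frame_ms)
          payload = 4 * (-(-bits_per_user // 32)) * users   # whole 32-bit words
          hdrs = -(-(block_hdr_bytes * users) // 4) * 4     # pad headers to a word
          total = 20 + 8 + 12 + hdrs + payload              # IP + UDP + RTP + rest
          return 100.0 * payload / total

      # G.729 (8 kbps, 10 ms frames) with 10 users: 4-byte block headers
      # (Option II) give 60.0%, and 2-byte block headers (Option III)
      # give 66.7%, matching Table 1.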
6. Comparison of Options

In this section, the options are compared in terms of efficiency. Issues relating to complexity, scalability, and generality have already been discussed in previous sections. The analysis consists of two parts. The first is a pair of tables giving the efficiency of each option for a variety of speech codecs and for two different numbers of users. The second is a graph of efficiency versus bitrate for a fixed frame size and number of users, which helps to indicate the range of codecs that may reasonably be supported with each option.

6.1 Specific Codecs

Tables 1 and 2 tabulate efficiency versus codec for each of the options. For G.711, G.726, G.728 and G.722, the frame size listed is a multiple of the actual frame size of the codec, which is too small to be sent one frame at a time. The efficiency is computed as the number of words of payload such a codec would occupy, times the number of users, divided by the total packet size (i.e., it does not consider inefficiencies due to padding the payload portion). Note that Option V is always the most efficient. Neighboring options generally differ by 1 to 10 percent, with Option IV trailing the others by a wider margin for low bitrate codecs. Table 1 considers the case of 10 users, and Table 2 the case of 24 users.

   Codec          Rate  Frame  Opt I  I.C    I.D    II     III    IV     V
                 (kb/s)  (ms)
   G.711            64     20  93.02  94.56  94.12  95.24  96.39  90.50  96.84
   G.726            32     20  86.96  89.69  88.89  90.91  93.02  82.64  93.88
   G.728            16  18.75  76.92  81.30  80.00  83.33  86.96  70.42  88.47
   G.729             8     10  50.00  56.60  54.55  60.00  66.67  41.67  69.72
   G.723           5.3     30  62.50  68.49  66.67  71.43  76.92  54.35  79.33
   G.723           6.3     30  66.67  72.29  70.59  75.00  80.00  58.82  82.16
   ITU 4kbps         4     20  50.00  56.60  54.55  60.00  66.67  41.67  69.72
   G.722            64     15  90.91  92.88  92.31  93.75  95.24  87.72  95.84
   GSM Full Rate    13     20  75.00  79.65  78.26  81.82  85.71  68.18  87.35
   TCH Half Rate   5.6     20  57.14  63.49  61.54  66.67  72.73  48.78  75.43
   IS54           7.95     20  62.50  68.49  66.67  71.43  76.92  54.35  79.33
   IS96            8.5     20  66.67  72.29  70.59  75.00  80.00  58.82  82.16
   EVRC            8.5     20  66.67  72.29  70.59  75.00  80.00  58.82  82.16
   PDC Full Rate   6.7     20  62.50  68.49  66.67  71.43  76.92  54.35  79.33
   PDC Half Rate  3.45     40  62.50  68.49  66.67  71.43  76.92  54.35  79.33

        Table 1: 10 Users (efficiencies in percent)

   Codec          Rate  Frame  Opt I  I.C    I.D    II     III    IV     V
                 (kb/s)  (ms)
   G.711            64     20  94.30  96.00  95.43  96.58  97.76  91.34  98.26
   G.726            32     20  89.22  92.31  91.25  93.39  95.62  84.06  96.57
   G.728            16  18.75  80.54  85.71  83.92  87.59  91.60  72.51  93.37
   G.729             8     10  55.38  64.29  61.02  67.92  76.60  44.17  80.87
   G.723           5.3     30  67.42  75.00  72.29  77.92  84.51  56.87  87.57
   G.723           6.3     30  71.29  78.26  75.79  80.90  86.75  61.28  89.42
   ITU 4kbps         4     20  55.38  64.29  61.02  67.92  76.60  44.17  80.87
   G.722            64     15  92.54  94.74  93.99  95.49  97.04  88.78  97.69
   GSM Full Rate    13     20  78.83  84.38  82.44  86.40  90.76  70.36  92.69
   TCH Half Rate   5.6     20  62.34  70.59  67.61  73.85  81.36  51.34  84.93
   IS54           7.95     20  67.42  75.00  72.29  77.92  84.51  56.87  87.57
   IS96            8.5     20  71.29  78.26  75.79  80.90  86.75  61.28  89.42
   EVRC            8.5     20  71.29  78.26  75.79  80.90  86.75  61.28  89.42
   PDC Full Rate   6.7     20  67.42  75.00  72.29  77.92  84.51  56.87  87.57
   PDC Half Rate  3.45     40  67.42  75.00  72.29  77.92  84.51  56.87  87.57

        Table 2: 24 Users (efficiencies in percent)

6.2 Efficiency vs. Bitrate

The following figure considers the efficiency of the protocol versus bitrate. For this case, the frame size is fixed at 20 ms and the number of users at 24. As the bitrate varies, the block size varies, and therefore so does the efficiency. The efficiency here is computed in a slightly different manner than in the tables above: it is the bitrate times the frame size (without padding to 32 bits), divided by the same quantity plus the per-packet and per-block overhead. This avoids a sawtooth behavior in the graph, which would otherwise make it very difficult to read.
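For readers who wish to experiment with other codecs or user counts, this efficiency measure can be written as a small function. The overhead values below are caller-supplied parameters rather than constants, since they depend on the option and on which headers one chooses to count; the numbers in the example call are assumptions for illustration and are not taken from the tables above.

   #include <stdio.h>

   /*
    * Efficiency in the sense of Section 6.2: useful payload divided by
    * useful payload plus overhead, ignoring padding of the payload to
    * 32-bit boundaries.  All sizes are in bytes.  The overhead values
    * are caller-supplied assumptions (e.g. shared IP/UDP/RTP headers
    * per packet and a per-block header whose size depends on the
    * option); they are not taken from this memo's tables.
    */
   static double efficiency(double bitrate_bps, double frame_ms,
                            unsigned users,
                            double per_packet_overhead,
                            double per_block_overhead)
   {
       double payload = bitrate_bps * (frame_ms / 1000.0) / 8.0;
       double useful  = users * payload;

       return useful / (useful + per_packet_overhead
                               + users * per_block_overhead);
   }

   int main(void)
   {
       /* Hypothetical example: 24 users of an 8 kb/s codec with 20 ms
        * frames, assuming 40 bytes of shared per-packet headers and a
        * one-byte block header. */
       printf("%.2f%%\n",
              100.0 * efficiency(8000.0, 20.0, 24, 40.0, 1.0));
       return 0;
   }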
The graph is instructive. The ordering of the efficiencies is no surprise; Option V is always superior. The differences between the options are more interesting. Despite a factor-of-two difference in per-block overhead, Option V and Option III are very close in efficiency over a wide range of bitrates. This is because at low bitrates many users are needed to overcome the IP/UDP/RTP header overhead, which affects both options equally, while at higher bitrates the payload sizes are large enough to make the difference in block headers inconsequential.

7. References

[1] R. Ramjee, J. Kurose, D. Towsley, and H. Schulzrinne, "Adaptive Playout Mechanisms for Packetized Audio Applications in Wide Area Networks," Proceedings of IEEE INFOCOM, 1994.

[2] H. Schulzrinne, S. Casner, R. Frederick, and V. Jacobson, "RTP: A Transport Protocol for Real-Time Applications," RFC 1889, IETF, January 1996.

[3] M. Handley, V. Hardman, I. Kouvelas, C. Perkins, J. Bolot, A. Vega-Garcia, and S. Fosse-Parisis, "Payload Format Issues for Redundant Encodings in RTP," Work in Progress.

[4] H. Schulzrinne, "RTP Profile for Audio and Video Conferences with Minimal Control," RFC 1890, IETF, January 1996.