Internet Engineering Task Force AVT Working Group Internet Draft J.Rosenberg, H.Schulzrinne draft-ietf-avt-muxissues-00.txt Bell Labs/Columbia U. October 1, 1998 Expires: March 1999 Issues and Options for RTP Multiplexing STATUS OF THIS MEMO This document is an Internet-Draft. Internet-Drafts are working docu- ments of the Internet Engineering Task Force (IETF), its areas, and its working groups. Note that other groups may also distribute work- ing documents as Internet-Drafts. Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference mate- rial or to cite them other than as ``work in progress''. To learn the current status of any Internet-Draft, please check the ``1id-abstracts.txt'' listing contained in the Internet-Drafts Shadow Directories on ftp.is.co.za (Africa), nic.nordu.net (Europe), munnari.oz.au (Pacific Rim), ds.internic.net (US East Coast), or ftp.isi.edu (US West Coast). Distribution of this document is unlimited. ABSTRACT This memorandum discusses the issues and options involved in the design of a new transport protocol for multiplexed voice within a single packet. The intended application is the interconnection of devices which provide trunking or long distance telephone service over the Internet. Such devices have many voice connections simultaneously between them. Multiplexing them into the same connection improves on the efficiency, enables the use of low bitrate voice codecs, and improves scalability. Options and issues con- cerning timestamping, payload type identification, length indication, and channel identification are discussed. Sev- eral possible header formats are identified, and their efficiencies are compared. J.Rosenberg, H.Schulzrinne [Page 1] Internet Draft RTP Mux Issues October 1, 1998 1 Introduction Internet telephone gateways (ITGs) allow a public switched telephony user (PSTN) user to contact another PSTN user, with the long distance portion of the call routed over the Internet. Such a scenario is depicted in Figure 1. ~~~~~~~~ ------- ~~~~~~~~~~ ------- ~~~~~~~~ A --| | | | | | | | | |-- C | PSTN |--| ITG |---| IP NET |---| ITG |--| PSTN | B --| X | | J | | | | K | | Y |-- D ~~~~~~~~ ------- ~~~~~~~~~~ ------- ~~~~~~~~ Figure 1: Internet telephony gateway architecture Subscribers A and B connect to ITG J via their local telephone net- work, X. A wishes to speak with user C, and B wishes to speak with user D, both of which are connected to local phone network Y. To complete the call, ITG J packetizes and transports the voice to and from A and B through the IP network, to remote gateway K. There, ITG K completes the calls to C and D through PSTN Y. This type of arrangement and common destination may be particularly common for connecting the PBXs of corporate branch offices across the Internet. In this scenario, ITGs J and K act as Internet hosts, which are effectively proxies for the telephone users connected to them. Unlike typical Internet telephony, however, their will often be multiple active calls between a pair of gateways, each representing a differ- ent pair of users. Gateways can signal calls using SIP [1], H.323 or proprietary signalling protocols. Media data is transported via a separate RTP [2] session for each user. We observe that using a separate RTP session for each user connected between a pair of gateways is wasteful. Rather, it would be more efficient to multiplex users between a pair of gateways into a single RTP session. A number of proposals have been made for RTP extensions to accomplish this multiplexing, [3][4] [5]. This memo discusses some of the issues and options for multiplexing users within RTP between a pair of gateways. There are other applica- tions for RTP multiplexing, such as transport of RTP in a switched RTP network, depicted in Figure 2. In this scenario, an entity which we call in RTP Switch, receives some number of RTP muxed connections. It extracts the multiplexed payloads from each of the received multi- plex streams, switches the payloads, and generates a new set of RTP Multiplexed streams. These streams may be destined for other RTP Switches, or for telephony gateways. J.Rosenberg, H.Schulzrinne [Page 2] Internet Draft RTP Mux Issues October 1, 1998 The switched scenario allows better network utilization. By allowing RTP multiplexing only between pairs of gateways, there is an effec- tive full mesh RTP network, with the number of multiplexed users between a pair of gateway potentially growing small with a large num- ber of gateways. An RTP Switched network would allow for greater mul- tiplexing. However, it comes at the significant cost of management, dynamic routing, and central point of failure requirements. These scenarios have differing requirements. In this document, we focus on the gateway to gateway case in Figure 1. ------- -------- | | RTPMux --------- RTPMux | | | ITG |--------| |--------| ITG 3 | | 1 | | | -------- ------- | | RTPMux -------- ------- | RTP |--------| | | | RTPMux | Switch | | ITG 4 | | ITG |--------| | -------- | 2 | | | RTPMux -------- ------- | |--------| | | | | ITG 5 | --------- -------- Figure 2: Internet telephony gateway architecture 2 Terminology oUser: One of the individuals who has data within the RTP packet. oConnection: The point to point RTP session between two ITGs. oChannel: A virtual connection which is established by allowing a user to send data within a packet. There are many channels per connection - this represents the multiplexing. oChannel Identifier: A number which identifies a channel. oBlock: The section of the payload of a packet which contains data for a particular user. 3 Requirements The transport protocol must provide, at a minimum, the following functionality: J.Rosenberg, H.Schulzrinne [Page 3] Internet Draft RTP Mux Issues October 1, 1998 1. Delineation: Data from different users must be clearly delin- eated. 2. Identification. The channel to which the data belongs must be identified. 3. Variable lengths: The protocol should support variable length blocks from a particular user. This allows for variable rate codecs and adjustment of packetization delays. 4. Low overhead: Since the protocol is designed for low rate voice, it should have low overhead. This issue is extremely important. New coders are emerging which can support near toll quality at 5.3 kbps, and acceptable quality at rates even as low as 4 kbps. It is desirable to support such codecs, as they can reduce the cost of providing an ITG service. Furthermore, advances in coding technology indicate that it is desirable to send very low bitrate information (1 kbps or less) during silence periods, so that background noise can be reproduced well (as opposed to sending nothing). Support of such rates requires a protocol with low overhead. 5. Marker: A general purpose marker bit should be available for all users within the connection. 6. Payload Identification. The codec in use for each user should be indicated somehow. It is a requirement to allow for the coding type to change during the lifetime of a channel. 4 Issues The following section identifies a number of issues which have an impact on the design of the protocol. It also identifies a variety of options for providing the specific services of the protocol. 4.1 Payload type identification There are a number of ways to identify the coding of the payload. They include in-band static types, in-band dynamic types, or out-of- band. The in-band approaches are based on some kind of payload type identifier, the semantics of which are either known apriori (static), or signaled ahead of time (dynamic). The out of band techniques sig- nal a binding between the channel identifier and a coder at the beginning (or even during) the lifetime of the connection. With out-of-band signaling, synchronizing the signaling with the media stream is a major issue. The synchronization can be accom- plished with either timestamps of sequence numbers. J.Rosenberg, H.Schulzrinne [Page 4] Internet Draft RTP Mux Issues October 1, 1998 One approach to performing the synchronization is as follows: The source sends a message reliably to the receiver, indicating that it will change codings at timestamp N, where N is some future timestamp (or SN). The N should be chosen far enough into the future to guaran- tee that the receiver will get the TCP message before time N. The farther away N is, the more robust the system becomes, but the source also loses its ability to adapt quickly. There are also several options for simple in- band signaling methods which can assist in error recovery. This is based on the assumption that it is better for the receiver to know that the encoding has changed (even though it doesn't know to what), than to know nothing. This avoids playing gar- bage out. A one or two bit coding sequence number can be used in the header. Such a number starts at zero. At the timestamp where the encoding changes, the SN increments, and stays incremented until the next change. In this fashion, we are guaranteed that the source will never play out data using the wrong coding type. Probably just one or two bits of this SN is necessary. Using in-band payload types allows the coding to be explicitly indi- cated for each packet. This eliminates synchronization problems, allows the sender to change encodings without out of band signaling. Its flexibility is the reason in-band payload types were used for generic RTP in the first place. By using dynamic types, the number of bits for the encoding can be reduced by limiting the number of codecs that can be used simultaneously during a session. Our conclusion is that it is desirable to have the PTI field in the payload (ie, in-band). This makes it possible to do more robust rate control, which becomes a significant issue when multiple connections are multiplexed together (and therefore the aggregate bitrate increases). It also makes sense to signal a table of encodings for the payload type at the beginning of the connection. Any particular pair of ITG will generally only support a few codecs. Therefore, dynamically setting the codings of the PTI bit makes a more compact representation possible without restricting the set of codecs which may be used. 4.2 Timestamps Timing is a very complex issue for the multiplexing protocol. The first question related to it is whether the protocol will support mixing of media derived from separate clocks (i.e., voice and video). Although doing this seems attractive, it is complex and in opposition to the philosophy under which RTP was developed. RTP explicitly states that separate media should be placed in separate RTP streams. This allows for different QoS to be requested for each media, and for clocks to be defined based on the media type. Furthermore, this pro- file is geared towards the aggregation of voice traffic generated J.Rosenberg, H.Schulzrinne [Page 5] Internet Draft RTP Mux Issues October 1, 1998 from the POTS across the Internet. As a result, the only source of data is from a single, 125us clock. The next basic question is whether timestamps are needed globally, i.e., just one per packet independent of the number of users, or locally, whereby each user within a packet needs their own timestamp. A separate question is the representation of these timestamps in an efficient manner. When considering these questions, the criteria to keep in mind are: 1. Can silence periods be recovered correctly 2. Can resynchronization occur in the face of packet loss 3. What is the impact on playout buffering and jitter computation The answer to this question depends on the desired capabilities of the protocol. In the most general case, it is possible to have different frame sizes for each user (for example, 20ms, 10ms, and 15ms) within the same packet. These frames can be arbitrarily aligned in time with respect to each other (i.e., the 20ms frame starts 5.3 ms after the beginning of another user's 10 ms frame). The user can send packets off at any point, containing data from those users whose frames have been generated before the packet departure time. A somewhat more restrictive capability is to allow for different frame sizes and time alignments, but to require that any packet contains all the same frame sizes, all aligned in time. The most restrictive case is to require separate RTP sessions for users with different frame sizes. This requires a channel to be torn down and re-setup when it changes codec. The desire to per- form flow control on a channel-by- channel basis makes this approach unacceptable, and it is not considered further. 4.2.1 General Case First consider the general case. Packets can contain frames from some or all of the users, and those frames are not the same length nor time aligned in any way. An example of such a scenario is depicted in Figure 3. In the figure, there are three sources, and the ti corre- spond to the times of packet emissions. When packets are lost, the variability in the amount and time alignment of data in each packet makes it impossible to reconstruct how much time had elapsed based solely on sequence numbers (such reconstruction IS possible in the single user case). Furthermore, the amount of time elapsed can easily vary from user to user, and therefore local timestamps are needed. The general case introduces further complications which have to do with jitter and delay computation. Such computations are needed for RTCP reporting and possibly for the estimation of network delays, J.Rosenberg, H.Schulzrinne [Page 6] Internet Draft RTP Mux Issues October 1, 1998 used in dynamic playout buffers. In the single user case, the jitter is computed between each packet as: D(i,j) = (Rj - Ri) - (Sj - Si) Where the Ri correspond to the reception times at the receiver mea- sured in RTP time, and the Si are the RTP timestamps in the data packets. The delay is computed as the difference between the arrival time at the receiver and generation time, as indicated by the RTP timestamp. In the multiple user case, these definitions no longer make sense, as there is no single RTP timestamp any longer. Each arriving packet will have a single arriving time (Ri), but multiple sending times (Si,j) for each block j in the ith packet. There are a number of alternatives for delay and jitter computation in this case: compute such information for all users, compute such information for a single user, or generate a single delay and jitter estimate, but have it be based on information from all users. There are pros and cons to each approach. First of all, it is possible for different blocks to experience dif- ferent delays (and jitters) even though they are within the same packet. This is because the general scenario allows for significant variability, whereby blocks may either vary in size from packet to packet and within a packet, or not be transmitted immediately after their completion (the latter happens to source B in Figure 3). Thus, it is arguable they it may be desirable to perform adaptive playout buffering separately for each user, which would require the storage and computation of delays for each user. The second alternative is to compute the delays for a single user, and use that information to size all of the other playout buffers. This may be sub-optimal in terms of delay and loss, depending on what fraction of the total delay and jitter are introduced by the packeti- zation itself. There is a second disadvantage to this approach, how- ever. When that particular user enters a silence period, delay and jitter information is no longer being received, and so estimates of network delay stop adapting. This implies that delay estimates will be old for certain periods of time. An alternative is to change the user from which delay and jitter estimates are being collected. The third alternative is to compute delay estimates based on some measure derived from all of the users. There are several reasonable approaches. For example, the delay estimate can be computed as: Delay = maxj, Ri - Si,j J.Rosenberg, H.Schulzrinne [Page 7] Internet Draft RTP Mux Issues October 1, 1998 which would yield a conservative estimate of the delay for some users. This approach requires storage of only a single set of delay information, although computation still grows with the number of users in a packet. --------------------------------- Source A | | | --------------------------------- Source B | | | | | | | --------------------------------- Source C | | | | | | --------------------------------- | | | | | || | t1 t2t3 t4 t5 t6t7 tt8 -------------------------------------- time / Figure 3: Global Timestamp Problem Sending local timestamps also requires extra bits in the block head- ers. It is possible, however, to use offsets for the local times- tamps. A global timestamp can be used in the RTP header (the field already exists), and each user has a modifier to indicate position in time relative to that timestamp. A related question is how big to make the offset field. This offset is bounded by the difference in time between the earliest and latest samples within a packet. Clearly, this itself is bounded by the pack- etization delay at the source. For this application, if we assume a 125us sample clock, and bound packetization delays to 100ms, the off- set field is bounded by 800 ticks, requiring 10 bits. 4.2.2 More Restrictive Case As a more restrictive case, we allow blocks to be present in a packet if their frame sizes are identical and aligned in time. Note that this does not imply identical codecs or identical block sizes in terms of bytes; many voice codecs operate with a 20ms or 50ms frame size. This case would allow all frame sizes of the same size and time alignment, independent of the codec, into a packet. This simplifies the timing issue tremendously. Now, the scenario is J.Rosenberg, H.Schulzrinne [Page 8] Internet Draft RTP Mux Issues October 1, 1998 much more like the single user application. The sequence numbers and the frame size completely determine the timing when at least one user is active. But, when all users enter silence, a global timestamp is needed to indicate the duration of the silence period. The global timestamp is sufficient to reconstruct the timing in the face of losses. Therefore, in this case, only a global timestamp is required. It is desirable to support a variety of different frame sizes within such an aggregated connection, however. The way to do this in this case is to simply mandate that different packets can contain differ- ent frame sizes; the only restriction is within a packet. This is not as simple as it may seem at first. Once this is done, the relation- ship between sequence numbers and timing is lost. Consider an exam- ple. There are two frame sizes, 10ms and 30ms. Packet N contains 10ms frames, as does packet N+1 and N+2, however, N+3 contains 30ms frames. Thus, although the difference in sequence number between the first and fourth is three, the relative timing is not 10ms*3 or 30ms*3. Due to this fact, the measurement of jitter is complicated (for the same reasons described in Section 4.3.1), as it should not be done between two packets with different frame sizes. It also makes recovery techniques based on sequence number more complex. To resolve this problem, we use a natural concept in RTP, which is the synchro- nization source (SSRC). The approach is to have a separate SSRC for each frame size in use. Then, sequence numbers are interpreted for each SSRC separately. This resolves the problem with the relationship between timing and sequence numbering. It also makes jitter and delay computations simpler - they are now done for each SSRC separately. Furthermore, multiple jitter (and delay, loss, etc.) values are reported to the source, one for each frame size. This is also desir- able, since the different frame sizes will cause different packetiza- tion delays and packet sizes, which may cause those packets to see different delays and losses in the network than other packets. This case has both advantages and drawbacks when compared to the gen- eral case. As an advantage, timing is greatly simplified, and the approach falls much in line with the original intentions of RTP. How- ever, it causes losses in efficiency for systems with a variety of different frame sizes in operation simultaneously. Such a situation arises naturally when flow control is applied to each source individ- ually, as opposed to altering the rate and codec type for all of the active sources. 4.3 Channel ID The question of channel identification may seem at first trivial - simply use a 32 bit number, much like the SSRC, and be done with it. However, 32 bits adds significant overhead. Reduction of the number of bits for the channel ID becomes a complex issue. Unlike the single J.Rosenberg, H.Schulzrinne [Page 9] Internet Draft RTP Mux Issues October 1, 1998 user case, the connection may remain active for long periods of time (days or months). The result is that channel IDs will need to be reused during the lifetime of the connection. It is critical to ensure that data from different channels is not confused because of this. Large channel ID spacing helps to resolve this issue (although it can not eliminate it), so an added side effect of reducing the number of channel IDs possible is an increase in the likelihood of such confusion. The first question to be addressed is how many simultaneous users can one expect to find in a single packet. 4.3.1 Number of Users There are several ways to come up with some minimums and maximums. 4.3.1.1 Delay-bound Clearly, as we add more users, the store and forward delays increase since the packet size gets larger. Therefore, if we bound the per-hop delay, and provide a lower bound on the codec bitrate and packetiza- tion delay, an upper bound on the number of users can be obtained. Consider a 2.4 kbps codec, with a 20ms frame size. This is a reason- able minimum combination. Next, consider 50ms store and forward delays. For a T1, this limits the number of users within a packet to 965. For a T3, it is 30 times this, or nearly 29,000. If silence sup- pression is used, the number of users within a packet is roughly half the number of active users (on average), thus requiring twice as many channel identifiers (1930 and 58,000). This bound doesn't seem to tight. Intuitively, even 965 seems too large. 4.3.1.2 Efficiency bound The entire purpose of multiplexing is to improve upon efficiency. Therefore, we should be able to support at least as many users as is necessary to get good efficiency. Consider a typical case, a 16 kbps codec, with a 20ms packetization delay. This results in 320 bits of data per user. If we assume IP/UDP/RTP (20+8+12=40 bytes = 320 bits), plus an additional word (32 bits) of overhead per user, the effi- ciency vs. N becomes: E = (320N / ((320 + 32)N + 320)) This reaches an asymptote of 90 percent of this, say 88 packet, so that we must support at least 14 active channels (again, due to stat mux). The lower bound, therefore, on the number of users is around 14. J.Rosenberg, H.Schulzrinne [Page 10] Internet Draft RTP Mux Issues October 1, 1998 4.3.1.3 MTU Bound In many cases, there is a maximum packet size. This is usually around 1500 bytes. If we consider a very low bitrate codec, the minimum block size from any particular user is 32 bits (otherwise, overheads become very large, and we lose word alignment, so 32 bits is a good minimum). Dividing 1500 bytes by 4 bytes, we obtain a maximum of 375 users. Multiplying by two, the number of active channels needed is around 750. Based on these bounds, we need to simultaneously support at least 10 users, and at most 750. This would imply that at least 8 to 10 bits of channel ID are required. 4.3.2 Channel ID Reuse Problem It is important to guarantee that data from a particular channel is never routed to a different channel; this would mean that a user may hear pieces of conversations from different users, an error we con- sider catastrophic. Such misrouting becomes possible when a channel is torn down, and a new channel is set up soon after using the same channel ID. Such a scenario is depicted in Figure 4. Sometime after channel K is torn down, a new channel is set up using the same chan- nel ID, K. If the data packets (dotted lines) are being delayed sig- nificantly, blocks from the old channel K may still be present in the data stream after the new channel K is established. These blocks will then be played out to the new user of channel K. Protocol support is needed to guarantee that this can never happen. The solution lies in an intelligent signaling protocol. The protocol must support a two-way handshake for all control messages. In addi- tion, three simple rules must be obeyed at a source when setting up or tearing down connections: 1. When a source sends a teardown message, it stops sending data in the UDP stream for that channel. Furthermore, in the sig- naling message, it indicates the sequence number of the packet which contained the last block for that channel, call this sequence number K. 2. A source cannot re-use a channel identifier until it has received an acknowledge from the destination that that partic- ular channel was successfully torn down. J.Rosenberg, H.Schulzrinne [Page 11] Internet Draft RTP Mux Issues October 1, 1998 | | t1 |------------- teardown K | |. --------------X| | .old K data |t2 | . -------| | ACK TD K --------- | t3 |X----------- | | . | | . | |------------- setup K | | .--------------X| | ....... |t4 | ACK SET K --------------| |X------------- .... | |...... ..X| | ........data new K | | .............X| Figure 4: Channel ID Reuse Problem 3. A source cannot send begin to send data from a particular channel in the UDP stream until it has received an acknowledge from the destination that the setup is complete. A few simple rules must also be used at the receiver: 1. When a receiver gets a teardown message, it checks the highest SN received so far (call this sequence number M). If M > K, the channel is torn down, and any further blocks containing that channel ID are discarded. If M < K, blocks from that channel are accepted until the received SN exceeds K. Once this happens, the channel is torn down and no further blocks with that channel ID are accepted. 2. When a setup message is received, the destination will begin to accept blocks with the given channel identifier, but only if the sequence numbers of the packets in which they ride is greater than K. The use of the sequence numbers allows the receiver to separate the old channel K blocks from the new ones. This guarantees that the destination will not misroute packets. An additional benefit is that the end of speech will not be clipped if the last data packets arrive after the teardown is received. This protocol is quite simple to implement, although it requires a table at the receiver of the values of K for each channel ID. Alternate solutions to this reuse problem exist which can operate when J.Rosenberg, H.Schulzrinne [Page 12] Internet Draft RTP Mux Issues October 1, 1998 the above restrictions are relaxed. The simplest approach is to have the source keep a linked list of free channel IDs. The list is initialized to contain all channel IDs, in order. When a new channel is required to be established, the channel ID is taken from the top of the list. When a channel is torn down, its ID is placed at the bottom of the list. This makes the time between channel ID reuse as long as possible, and reduces the probability of confusion. With this method, it is no longer neces- sary to include sequence numbers in the tear down messages. Also, the receiver does not need to maintain a table. 4.3.3 Channel ID Coding This section discusses some of the options for coding the channel ID field. 4.3.3.1 Fixed Length The fixed length approach is the most straightforward. A fixed number of bits is assigned to the channel ID. Issues surrounding the number of bits required have been discussed above. 4.3.3.2 Implicit + Present Mask In reality, the channel IDs are very redundant. Both source and des- tination know the set of active connections and their channel identi- fiers from the signalling messages. Therefore, if the blocks are placed in the packet in order of increasing channel ID, very little information actually needs to be sent. In fact, without silence sup- pression, channel activity and the presence of a block in a packet are likely to be equivalent, in which case NO information actually needs to be sent about channel IDs. Unfortunately, there are some practical problems with this. First, silence suppression is used. Secondly, even if it weren't, it is pos- sible for the voice codecs at the ITG not to have their framing syn- chronized (as in the general case above), so that a packet may not contain data from all users. Thirdly, the source and destination do NOT have a consistent view of the state of the system. There is a delay while signaling messages are in transit. A few simple mechanisms can be used to overcome these complexities. In the header of the packet, a mask is sent. Each bit in the mask indicates whether data from a channel is present in the packet or not. Mapping of channel ID's to bits is done by sorting the channel ID's, and mapping the lowest number to the first bit, next lowest to the second, etc. Therefore, if a channel has no data for that packet, its bit is set to zero. Given that the source and destination agree on how many connections are active at all points in time, the number J.Rosenberg, H.Schulzrinne [Page 13] Internet Draft RTP Mux Issues October 1, 1998 of bits required is known to both sides. The next step is to deal with the differences in state. An additional field, called the state-number, perhaps 5 bits, is sent in the header of the packet. This field starts at zero. Lets say at some point in time, its value is N. The source wishes to tear down a channel. It sends the tear down message to the destination, but continues to send data for that channel (or it may choose to send nothing, but must set the appropriate bit in the mask to zero). When the destination receives the message, it replies with an acknowledge. When the acknowledge is received by the source, the source considers the chan- nel torn down, and no longer sends data for it, nor considers it in computing the mask. In the packet where this happens, the source also increments the state-number field to N+1. The destination knows that the source will do this, and will therefore consider the state changed for all packets whose value of the field is N+1 or greater. When the next signaling message takes effect, the field is further increased. Even if packets are lost, the value of the state-number field for any correctly received packet completely tells the destina- tion the state of the system as seen in that packet. Furthermore, it is not necessary to wait for a particular setup or teardown to be acknowledged before requesting another setup or teardown. The number of bits for the state-number field should be set large enough to represent the maximum number of state changes which can have taken effect during a round trip time. As an alternative, an additional exchange can occur. After the destination receives a packet with state number greater than N, it destroys the state related to N, and sends back, reliably, a free-state N message, indi- cating to the destination that state N is now de-allocated, and can be used again. Until such a message is received, the source cannot reuse state N. This is essentially a window based flow control, where the flow is equal to changes in state. With this addition, the number of bits for the state number can be safely reduced, and it is guaran- teed that the destination will never confuse the state, independent of the number of state- number bits used. However, the use of too few state bits can cause call blocking or delay the teardown of inactive channels. This problem in state difference appears to be similar to the channel ID reuse problem described in Section 4.4.2. However, there is an important difference. In the channel ID reuse problem, if the packet containing the last block of a user arrives before the signaling mes- sage tearing down that connection, there is no problem. The destina- tion will generally play out silence until the signaling message is received. Here, however, the destination must know that blocks are no longer present in the data stream independent of when the signaling messages arrive. J.Rosenberg, H.Schulzrinne [Page 14] Internet Draft RTP Mux Issues October 1, 1998 There are some drawbacks to this approach. They require the source and destination to maintain state. Any error in processing at either end, or a hardware failure, causes a complete loss of synchroniza- tion. This hard-state nature of the protocol can be relaxed by having the source send the complete state of the system with each signaling message, along with the state-number field for which this state takes effect. This guarantees that even in the event of end- system fail- ure, the system state will be refreshed whenever a new connection is set up or torn down. Furthermore, the state can be sent periodically to improve performance. 4.4 Length Indicators There are many ways to actually code the length indicators. The first question, however, is the range of lengths which must be coded. 4.4.1 Range of Length Indicators Here, there is a clear tradeoff between flexibility and efficiency. A larger range can accommodate a variety of different media (such as video) where lengths may be large. However, this comes at the expense of a long length field, which may require another word of header to hold. For voice, one would expect a maximum bitrate to be 64 kbps, and around 50ms packetization delay. This yields exactly 100 words of data. Therefore, an eight bit field is probably sufficient for most voice applications. 4.4.2 PTI Based Lengths In many applications, the amount of data present depends on the voice codec in use. Frame based coders will generally send a frame at a time. Since the codec type is indicated by the PTI field, it may not always be necessary to send length information at all. Even for non- frame based codecs, such as PCM, default data sizes can be set in the standard (as in RFC 1890 [6]). An extension bit can be used to indi- cate a non-standard length, so that when set, a length field follows. This allows for efficient coding of the most common cases, but allows for variable lengths with little additional cost. 4.4.3 Variable Length w/ Indicator In this approach, a variable length header is used. All of the length indicators for all of the blocks are placed together in the beginning of the packet. However, the first four bits of this header field indicate the number of bits used for each length field. What follows are the length fields themselves, each using the number of bits indi- cated by the first four bits. This approach scales well, using a small overhead when the block lengths are small, and a larger J.Rosenberg, H.Schulzrinne [Page 15] Internet Draft RTP Mux Issues October 1, 1998 overhead when they are larger. The drawback is a variable length header field, plus additional complexity in the parsing. An example of this technique is depicted in Figure 5. In the first example, the four bit indicator field has a value of three, so that the length fields are all three bits long. The four lengths are then 2,6,3, and 8. In the second example, the 4 bit indicator has a value of two, so that the length fields are all two bits long. The four lengths are thus 3,2,1, and 3. 4b 3b 3b 3b 3b -------------------- Example 1 |0011|010|110|011|100| -------------------- 4b 2b 2b 2b 2b ---------------- Example 2 |0010|11|10|01|11| ---------------- Figure 5: Variable Length w/ Indicator 4.4.4 Remaining Packet Length Based Lengths UDP always informs RTP of how many bytes are in the payload. This itself restricts the possible length of the first block, since its length must be less than the total packet length minus the RTP header. Furthermore, as each block is placed into the packet, the possible set of lengths that it can have shrinks - it must always be less than the remaining length in the packet. This approach, there- fore, codes each length field with log2 of the number of bits remain- ing in the packet. This approach works extremely well when there is a long packet followed by several shorter ones, whereas the previous approach performs poorly in this case. Furthermore, it eliminates the length indicator present in the previous approach. However, it is even more complex than the previous technique. It can result in no savings under some conditions, especially since the header fields must be rounded to 32 bits. Consider an example. The total size of the packet is 31 words. Inside of it are three blocks, the first whose length is 17, the second 8, and the third, 6. We would code the length field with 5 bits. After this block is read, the remaining amount of data in the packet is 14 words. Therefore, the next length field is coded with 4 bits. After J.Rosenberg, H.Schulzrinne [Page 16] Internet Draft RTP Mux Issues October 1, 1998 this block, the remaining amount of data in the packet is 6 words, so the final length field is coded with three bits. The total is there- fore 5+4+3 = 12 bits. In the previous approach (Section 4.5.3), the entire length field would have required 4 bits for the indicator (whose value would be 5), followed by 3 five bit fields, for a total of 19 bits. One may question this example since the overhead of the length fields itself is not taken into account when computing the remaining length of the packet. While this can be incorporated, it makes things even more complex, and it is not actually necessary. All that is required is that the length fields are coded with log2(M), where M is any bound on the remaining amount of data which can be deterministically computed from past information. A simple bound is the packet length minus the data seen thus far (one can also subtract away any fixed length fields), precisely the metric used in the example above. 4.4.5 Table Based Approach Realistically, most systems will operate with codecs that generate data in a fixed set of lengths (a frame size, for example). In that case, the set of lengths which can appear in the packet are usually very restricted. To take advantage of this fact, a table can be transmitted to the receiver reliably before transmission commences. This table can indicate the actual length of a block, and its coding. The symbols transmitted in the data packets are then used in this table to look up the actual lengths. This can reduce the length field to 2 or 3 bits. These lengths then all occur next to each other in the header. The technique now relies on state at the receiver, and the parsing process is further complicated by table lookups. In addi- tion, the approach only works if you know the set of lengths before the system begins operation. If you allow the table to be dynamically modified during a session, synchronization problems occur, and the system becomes quite complex. Further gains can be achieved through the use of Huffman codes instead of fixed length codes This only makes sense when different codecs (and correspondingly different lengths) are used with differ- ent frequencies. An example of such a situation is when the codec changes to a higher rate because of music-on-hold; a rare event in general. 4.5 Marker Bit The marker bit has a general functionality, but is normally used to indicate the beginning of a talkspurt. It seems like a good idea to include this bit for each user. J.Rosenberg, H.Schulzrinne [Page 17] Internet Draft RTP Mux Issues October 1, 1998 4.6 Location of Per User Overhead There will generally be overhead on a per-user basis (information such as channel ID, length, etc.). This information can be located in one of three places. First, it can all reside in front of the block to which it is applicable. Second, it can all be pasted together and reside up front in the header of the packet. The third is a hybrid solution, where some of it resides up front (such as channel ID), and some resides in front of the data. There are various pros and cons to the different approaches. The hybrid approach can be complex, since data is split into multiple places. The case where all the header is up front has a few minor advantages. First, it allows for a complete separation of the data from the header. The implementation is likely to be a little less complex, since extracting blocks does not require actually moving through the payload. 5 Options 5.1 Option I: Mixer Based This option is the most straightforward to implement, but has the most overhead. The basic premise is to reuse the mixer concept intro- duced in RTP. Each user is considered a contributing source, and the gateway is considered a mixer. However, instead of mixing the media, separate data from each user appear in the payload. The 32 bit CSRC identifies each user, acting as the channel ID. Data from each user is organized into blocks. Each block has its own 32 bit header, which includes the length (12 bits) in units of 32 bit words, Marker bit (1b), TimeStamp Offset (12b), and Payload Type (7b). Furthermore, the payload type and marker bit are stricken from the RTP header (since they only make sense for an individual user), and the CC field expanded to fill the missing bytes. This allows for a 12 bit CC field, or 4096 users in a packet. Thus, the packet would look like: Figure 6: Option I This approach allows for the most amount of generality in terms of variable length coders and coders with different frame sizes (see Section 4.3.1). The channel ID is longer than necessary, but using the concept of a contributing source for the channel ID necessitates the use of the additional bits. There are several variations on option I, many of which have been mentioned above: I.A: Put the CSRC with each 32 bit length+M+PT field, instead of all of them being at the beginning. This has some pros and cons. As an interesting artifact of this change, it is no longer necessary to J.Rosenberg, H.Schulzrinne [Page 18] Internet Draft RTP Mux Issues October 1, 1998 0 1 2 3 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ |V=2|P|X| CC |M| PT | sequence number | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | timestamp | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | synchronization source (SSRC) identifier | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | contributing source (SSRC) identifier 1 | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | contributing source (SSRC) identifier 2 | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ .......... +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | contributing source (SSRC) identifier N | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | Length | Timestamp Offset |M| | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | | | Payload 1 | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | Length | Timestamp Offset |M| | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | | | Payload 2 | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ have a CC field. The length passed up by UDP is sufficient to recover the point at where you stop checking for additional blocks from users in the payload. In fact, the length field in the last block is not strictly necessary either. I.B: Do the opposite of I.A. Put the length+M+PT field up front along with the CSRC fields, with the pattern being CSRC 1, length 1, CSRC 2, length 2, etc. Here again, the CC field is not strictly necessary. I.C: The CSRC field can be shrunk to 8 bits. This allows for either 4 or two channel IDs to be coded in the space of one word, whereas only one could in the current size of the field. I.D: The CSRC field can be shrunk to 16 bits. 5.2 Option II: One word header J.Rosenberg, H.Schulzrinne [Page 19] Internet Draft RTP Mux Issues October 1, 1998 This option eliminates the large channel ID field present in the pre- vious option. In the RTP header, the CC bit is set to zero, the marker bit has no meaning, and the payload type is TBD (possible uses include an indication of the number of blocks in the packet). The RTP timestamp corresponds to the generation of the first sample, among all blocks, enclosed in this packet. A one word header precedes each block of data. The number of blocks is known by parsing them until the end of the RTP packet. The one word field has a channel ID (8 bits), length (8 bits), Marker (1 bit), timestamp offset (11 bits), and payload type (4 bits). Channel ID number 255 is reserved, and causes the header to be expanded to allow for greater length, payload type, and possibly channel ID encodings. The specific format for this expanded header is for further study. Given the compacted payload type space, it may be a good idea to allow negotiation of the meaning for the payload type at the beginning of the connection. It may be worthwhile to expand the length field at the expense of the channel ID - this issue is for further study. The format of the packet is thus: 0 1 2 3 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ |V=2|P|X| CC |M| PT | sequence number | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | timestamp | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | synchronization source (SSRC) identifier | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | Length | Timestamp Offset | CID |M| PTI | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | | | Payload 1 | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | Length | Timestamp Offset | CID |M| PTI | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | | | Payload 2 | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ Figure 7: Option II J.Rosenberg, H.Schulzrinne [Page 20] Internet Draft RTP Mux Issues October 1, 1998 5.3 Option III - Restricted Case Option II has the advantage of being able to support multiple frame sizes within a single packet. However, it comes at the expense of a 32 bit header (which can be large for low bitrate codecs), and at a reduced payload type field. This option has a 16 bit header, but does not support different frame sizes within a packet. It therefore falls into the category described in Section 4.3.2. Of the 16 bit header, the first bit is an expand bit (to be described shortly), and the second bit is the marker bit. The following 6 bits indicate payload type, and the remaining 8 are for channel ID. When the expand bit is set, an additional 16 bits are present, which indicate the length of the block. When expand is clear, the length is derived from the pay- load type. Since there is no timestamp offset, all the blocks in the packet must be time aligned and have the same frame lengths. Differ- ent sized frames are supported by using a different SSRC for each frame length (see Section 4.3.2). In the RTP header, the CC field is always zero. The marker bits and payload type are undefined. The timestamp indicates the time of generation of the first sample of each block. SSRC is randomly chosen, but always different for each frame size. The block headers are all located at the beginning of the packet, and follow each other. If the total length of the fields is not a multi- ple of 32 bits, it is padded out to 32. The structure of the header is such that fields never break across packet boundaries. An example of such a packet is given in Figure 8. There are 7 blocks in this example. The first two have standard lengths based on the PT field. The next one uses the expansion bit to indicate the length. The fourth uses the PT field, the fifth the expansion bit, and the last two use the PT field. The last 16 bits of the header are padded out. Figure 8: Option III 5.4 Option IV - Stacked RTP This approach uses a duplicate of the RTP header as the per-block header. It is therefore extremely inefficient (12 bytes per block), but has several advantages: different media types can be mixed, since the timestamps are no longer related, and little processing is required if the sources being combined came from a single user RTP source. It also works well when one of the users is actually a mixer (for example, a conference bridge), since the CSRC can be used. Its main advantage is the reduction in overhead due to the IP and UDP headers. In addition to the standard RTP header, an additional header is required for length indication. This header has a number of 16 bit J.Rosenberg, H.Schulzrinne [Page 21] Internet Draft RTP Mux Issues October 1, 1998 0 1 2 3 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ |V=2|P|X| CC |M| PT | sequence number | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | timestamp | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | synchronization source (SSRC) identifier | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ |E|M| PT | ID |E|M| PT | ID | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ |E|M| PT | ID | Length | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ |E|M| PT | ID |E|M| PT | ID | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | Length |E|M| PT | ID | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ |E|M| PT | ID | PAD | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | | | Payload 1 | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | | | Payload 2 | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | | | Payload 3 | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | | | Payload 4 | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | | | Payload 5 | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | | | Payload 6 | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | | | Payload 7 | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ fields, each of which indicates a length for its corresponding block (including the 12 byte RTP header). The number of such 16 bit lengths fields is known by continuing to look for additional length fields J.Rosenberg, H.Schulzrinne [Page 22] Internet Draft RTP Mux Issues October 1, 1998 until the total length of the packet passed up from UDP has been accounted for. If an odd number of such length fields is required, then an additional 16 bits of padding is inserted to make the length header a multiple of 32 bits. The format of such a packet is given in Figure 9. 0 1 2 3 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | Length 1 | Length 2 | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | Length 3 | PAD | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ |V=2|P|X| CC |M| PT | sequence number | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | timestamp | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | synchronization source (SSRC) identifier | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | | | Payload 1 | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ |V=2|P|X| CC |M| PT | sequence number | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | timestamp | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | synchronization source (SSRC) identifier | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | | | Payload 2 | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ Figure 9: Option IV 5.4 Option V: Compacted This option uses the Implicit + Mask approach outlined in Section 4.4.3.2 to code the channel ID. In all other respects it is similar to Option III. Now, however, the per-block header can be reduced to one byte: 1 bit of expansion, 1 bit of marker, and 6 bits of payload type. Furthermore, the length field (present when the expansion bit J.Rosenberg, H.Schulzrinne [Page 23] Internet Draft RTP Mux Issues October 1, 1998 is set) is reduced to 8 bits from 16 in Option III. This reduction saves on space, but it also guarantees that fields remain aligned on byte boundaries. The mask bits are present in the beginning of the packet, and they are preceded by a 8 bit state-number. If the number of active channels is not a multiple of 32, the mask field is padded out to a full word. This approach is extremely efficient, but the channel identification procedure is more complex and requires addi- tional signaling support. A diagram of a typical packet for this option is given in Figure 10. The marker bits are indicated with lowercase ms. There are four active channels, each of which is present in this packet (all four mask bits would then be 1). The first block has a standard length, but the second has its expansion bit set, so that an 8 bit length field follows. The remaining two blocks have normal 8 bit headers. The last 24 bits of the header are padded to a word boundary. Figure 10: Option V 6 Comparison of Options In this section, the options are compared in terms of efficiency. Issues relating to complexity, scalability, and generality have already been discussed in previous sections. The analysis here con- sists of a series of tables, indicating the efficiency of each option for a variety of speech codecs. Several tables are included for dif- ferent numbers of users. 6.1 Specific Codecs In both Table 1 and Table 2, the efficiency vs. codec for all three options is tabulated. For G.711, G.726, G.728 and G.722, the frame size listed is a multiple of the actual frame size of the codec, which is too small to be sent one at a time. The efficiency is com- puted as the number of words of payload such a codec would occupy, times the number of users, divided by the total packet size (i.e., it does not consider inefficiencies due to padding the payload portion). Note that Option V is always superior in efficiency. The efficiencies are generally 1 to 10 percent apart. Table 1 considers the case where there are 10 users, and Table 2 considers the case where there are 24. Codec|rate|Frame(ms)| I |I.C |I.D | II | III | IV | V G.711| 64 | 20 |93.02 |94.56 |94.12 |95.24 |96.39 |90.50 |96.84 G.726| 32 | 20 |86.96 |89.69 |88.89 |90.91 |93.02 |82.64 |93.88 J.Rosenberg, H.Schulzrinne [Page 24] Internet Draft RTP Mux Issues October 1, 1998 0 1 2 3 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ |V=2|P|X| CC |M| PT | sequence number | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | timestamp | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | synchronization source (SSRC) identifier | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | State Num |m|m|m|m| Pad | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ |E|M| PT |E|M| PT |E|M| PT |E|M| PT | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ |E|M| PT | PAD | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | timestamp | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | | | Block 1 | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | Block 2 | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | Block 3 | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | | | Block 4 | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ G.728| 16 | 18.75 |76.92 |81.30 |80.00 |83.33 |86.96 |70.42 |88.47 G.729| 8 | 10 |50.00 |56.60 |54.55 |60.00 |66.67 |41.67 |69.72 G.723|5.3 | 30 |62.50 |68.49 |66.67 |71.43 |76.92 |54.35 |79.33 G.723|6.3 | 30 |66.67 |72.29 |70.59 |75.00 |80.00 |58.82 |82.16 ITU | 4 | 20 |50.00 |56.60 |54.55 |60.00 |66.67 |41.67 |69.72 G.722| 64 | 15 |90.91 |92.88 |92.31 |93.75 |95.24 |87.72 |95.84 GSM F| 13 | 20 |75.00 |79.65 |78.26 |81.82 |85.71 |68.18 |87.35 IS54 |7.95| 20 |62.50 |68.49 |66.67 |71.43 |76.92 |54.35 |79.33 IS96 |8.5 | 20 |66.67 |72.29 |70.59 |75.00 |80.00 |58.82 |82.16 Table 1: 10 Users Codec|rate|Frame(ms)| I |I.C |I.D | II | III | IV | V J.Rosenberg, H.Schulzrinne [Page 25] Internet Draft RTP Mux Issues October 1, 1998 G.711| 64 | 20 |94.30 |96.00 |95.43 |96.58 |97.76 |91.34 |98.26 | G.726| 32 | 20 |89.22 |92.31 |91.25 |93.39 |95.62 |84.06 |96.57 | G.728| 16 | 18.75 |80.54 |85.71 |83.92 |87.59 |91.60 |72.51 |93.37 | G.729| 8 | 10 |55.38 |64.29 |61.02 |67.92 |76.60 |44.17 |80.87 | G.723| 5.3| 30 |67.42 |75.00 |72.29 |77.92 |84.51 |56.87 |87.57 | G.723| 6.3| 30 |71.29 |78.26 |75.79 |80.90 |86.75 |61.28 |89.42 | ITU | 4 | 20 |55.38 |64.29 |61.02 |67.92 |76.60 |44.17 |80.87 | G.722| 64 | 15 |92.54 |94.74 |93.99 |95.49 |97.04 |88.78 |97.69 | GSM F| 13 | 20 |78.83 |84.38 |82.44 |86.40 |90.76 |70.36 |92.69 | IS54 |7.95| 20 |67.42 |75.00 |72.29 |77.92 |84.51 |56.87 |87.57 | IS96 |8.5 | 20 |71.29 |78.26 |75.79 |80.90 |86.75 |61.28 |89.42 | Table 2: 24 Users 7 Authors' Addresses Jonathan Rosenberg Rm. 4C-526 Bell Laboratories, Lucent Technologies 101 Crawfords Corner Rd. Holmdel, NJ 07733 electronic mail: jdrosen@bell-labs.com Henning Schulzrinne Dept. of Computer Science Columbia University 1214 Amsterdam Avenue New York, NY 10027 USA electronic mail: schulzrinne@cs.columbia.edu 8 Bibliography [1] M. Handley and V. Jacobson, SDP: session description protocol, Request for Comments (Proposed Standard) 2327, Internet Engineering Task Force, Apr. 1998. [2] H. Schulzrinne, S. Casner, R. Frederick, and V. Jacobson, RTP: a transport protocol for real-time applications, Request for Comments (Proposed Standard) 1889, Internet Engineering Task Force, Jan. 1996. [3] B. Subbiah and S. Sengodan, User multiplexing in rtp payload between ip telephony gateways, (internet draft), Internet Engineering Task Force, Aug. 1998. Work in Progress. [4] J. Rosenberg and H. Schulzrinne, An RTP payload format for user J.Rosenberg, H.Schulzrinne [Page 26] Internet Draft RTP Mux Issues October 1, 1998 multiplexing, Internet Draft, Internet Engineering Task Force, May 1998. Work in progress. [5] K. Tanigawa, T. Hoshi, and K. Tsukada, An rtp simple multiplexing transfer method for internet telephony gateway, (internet draft), Internet Engineering Task Force, June 1998. Work in Progress. [6] H. Schulzrinne, RTP profile for audio and video conferences with minimal control, Request for Comments (Proposed Standard) 1890, Internet Engineering Task Force, Jan. 1996. J.Rosenberg, H.Schulzrinne [Page 27]