idnits 2.17.1 

draft-ietf-avt-muxissues-00.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

  ** Cannot find the required boilerplate sections (Copyright, IPR, etc.) in
     this document.

     Expected boilerplate is as follows today (2024-04-20) according to
     https://trustee.ietf.org/license-info :

     IETF Trust Legal Provisions of 28-dec-2009, Section 6.a:
        This Internet-Draft is submitted in full conformance with the provisions
        of BCP 78 and BCP 79.

     IETF Trust Legal Provisions of 28-dec-2009, Section 6.b(i), paragraph 2:
        Copyright (c) 2024 IETF Trust and the persons identified as the document
        authors.  All rights reserved.

     IETF Trust Legal Provisions of 28-dec-2009, Section 6.b(i), paragraph 3:
        This document is subject to BCP 78 and the IETF Trust's Legal Provisions
        Relating to IETF Documents
        (https://trustee.ietf.org/license-info) in effect on the date of
        publication of this document.  Please review these documents
        carefully, as they describe your rights and restrictions with
        respect to this document.  Code Components extracted from this
        document must include Simplified BSD License text as described in
        Section 4.e of the Trust Legal Provisions and are provided
        without warranty as described in the Simplified BSD License.


  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

  ** Missing expiration date.  The document expiration date should appear on
     the first and last page.

  ** The document seems to lack a 1id_guidelines paragraph about
     Internet-Drafts being working documents. 

  ** The document seems to lack a 1id_guidelines paragraph about 6 months
     document validity -- however, there's a paragraph with a matching
     beginning. Boilerplate error?

  ** The document seems to lack a 1id_guidelines paragraph about the list of
     current Internet-Drafts. 

  ** The document seems to lack a 1id_guidelines paragraph about the list of
     Shadow Directories. 

  ** The document is more than 15 pages and seems to lack a Table of Contents.

  == No 'Intended status' indicated for this document; assuming Proposed
     Standard

  == The page length should not exceed 58 lines per page, but there was 26
     longer pages, the longest (page 12) being 63 lines

  == It seems as if not all pages are separated by form feeds - found 0 form
     feeds but 27 pages


  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

  ** The document seems to lack a Security Considerations section.

  ** The document seems to lack an IANA Considerations section.  (See Section
     2.2 of https://www.ietf.org/id-info/checklist for how to handle the case
     when there are no actions for IANA.)

  ** There are 13 instances of too long lines in the document, the longest
     one being 11 characters in excess of 72.

  == There are 1 instance of lines with non-RFC6890-compliant IPv4 addresses
     in the document.  If these are example addresses, they should be changed.


  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == Line 853 has weird spacing: '...ibuting  sourc...'

  == Line 855 has weird spacing: '...ibuting  sourc...'

  == Line 859 has weird spacing: '...ibuting  sourc...'

  -- The document seems to lack a disclaimer for pre-RFC5378 work, but may
     have content which was first submitted before 10 November 2008.  If you
     have contacted all the original authors and they are all willing to grant
     the BCP78 rights to the IETF Trust, then this is fine, and you can ignore
     this comment.  If not, you may need to add the pre-RFC5378 disclaimer. 
     (See the Legal Provisions document at
     https://trustee.ietf.org/license-info for more information.)

  -- The document date (October 1, 1998) is 9333 days in the past.  Is this
     intentional?


  Checking references for intended status: Proposed Standard
  ----------------------------------------------------------------------------

     (See RFCs 3967 and 4897 for information about using normative references
     to lower-maturity documents in RFCs)

  -- Missing reference section? '1' on line 1181 looks like a reference

  -- Missing reference section? '2' on line 1185 looks like a reference

  -- Missing reference section? '3' on line 1189 looks like a reference

  -- Missing reference section? '4' on line 1193 looks like a reference

  -- Missing reference section? '5' on line 1197 looks like a reference

  -- Missing reference section? '6' on line 1201 looks like a reference


     Summary: 10 errors (**), 0 flaws (~~), 7 warnings (==), 8 comments (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.

--------------------------------------------------------------------------------

1	Internet Engineering Task Force                        AVT Working Group
2	Internet Draft                                J.Rosenberg, H.Schulzrinne
3	draft-ietf-avt-muxissues-00.txt                     Bell Labs/Columbia U.
4	October 1, 1998
5	Expires: March 1999

7	                Issues and Options for RTP Multiplexing

9	STATUS OF THIS MEMO

11	   This document is an Internet-Draft. Internet-Drafts are working docu-
12	   ments of the Internet Engineering Task Force (IETF), its areas, and
13	   its working groups.  Note that other groups may also distribute work-
14	   ing documents as Internet-Drafts.

16	   Internet-Drafts are draft documents valid for a maximum of six months
17	   and may be updated, replaced, or obsoleted by other documents at any
18	   time.  It is inappropriate to use Internet-Drafts as reference mate-
19	   rial or to cite them other than as ``work in progress''.

21	   To learn the current status of any Internet-Draft, please check the
22	   ``1id-abstracts.txt'' listing contained in the Internet-Drafts Shadow
23	   Directories on ftp.is.co.za (Africa), nic.nordu.net (Europe),
24	   munnari.oz.au (Pacific Rim), ds.internic.net (US East Coast), or
25	   ftp.isi.edu (US West Coast).

27	   Distribution of this document is unlimited.

29	                                 ABSTRACT

31	        This memorandum discusses the issues and options involved
32	        in the design of a new transport protocol for multiplexed
33	        voice within a single packet. The intended application is
34	        the interconnection of devices which provide trunking or
35	        long distance telephone service over the Internet. Such
36	        devices have many voice connections simultaneously between
37	        them. Multiplexing them into the same connection improves
38	        on the efficiency, enables the use of low bitrate voice
39	        codecs, and improves scalability. Options and issues con-
40	        cerning timestamping, payload type identification, length
41	        indication, and channel identification are discussed. Sev-
42	        eral possible header formats are identified, and their
43	        efficiencies are compared.

45	1 Introduction

47	   Internet telephone gateways (ITGs) allow a public switched telephone
48	   network (PSTN) user to contact another PSTN user, with the long distance
49	   portion of the call routed over the Internet. Such a scenario is
50	   depicted in Figure 1.

52	        ~~~~~~~~    -------     ~~~~~~~~~~     -------    ~~~~~~~~
53	   A --|        |  |       |   |          |   |       |  |        |-- C
54	       |  PSTN  |--|  ITG  |---|  IP NET  |---|  ITG  |--|  PSTN  |
55	   B --|   X    |  |   J   |   |          |   |   K   |  |   Y    |-- D
56	        ~~~~~~~~    -------     ~~~~~~~~~~     -------    ~~~~~~~~

58	   Figure 1: Internet telephony gateway architecture

60	   Subscribers A and B connect to ITG J via their local telephone net-
61	   work, X. A wishes to speak with user C, and B wishes to speak with
62	   user D, both of which are connected to local phone network Y.  To
63	   complete the call, ITG J packetizes and transports the voice to and
64	   from A and B through the IP network, to remote gateway K. There, ITG
65	   K completes the calls to C and D through PSTN Y. This arrangement,
66	   with many calls sharing a destination, may be typical when connecting
67	   the PBXs of corporate branch offices across the Internet.

69	   In this scenario, ITGs J and K act as Internet hosts, which are
70	   effectively proxies for the telephone users connected to them. Unlike
71	   typical Internet telephony, however, there will often be multiple
72	   active calls between a pair of gateways, each representing a differ-
73	   ent pair of users. Gateways can signal calls using SIP [1], H.323 or
74	   proprietary signalling protocols. Media data is transported via a
75	   separate RTP [2] session for each user.

77	   We observe that using a separate RTP session for each user connected
78	   between a pair of gateways is wasteful. Rather, it would be more
79	   efficient to multiplex users between a pair of gateways into a single
80	   RTP session. A number of proposals have been made for RTP extensions
81	   to accomplish this multiplexing [3][4][5].

83	   This memo discusses some of the issues and options for multiplexing
84	   users within RTP between a pair of gateways. There are other applica-
85	   tions for RTP multiplexing, such as transport of RTP in a switched
86	   RTP network, depicted in Figure 2. In this scenario, an entity which
87	   we call an RTP Switch receives some number of RTP muxed connections.
88	   It extracts the multiplexed payloads from each of the received multi-
89	   plex streams, switches the payloads, and generates a new set of RTP
90	   Multiplexed streams. These streams may be destined for other RTP
91	   Switches, or for telephony gateways.

93	   The switched scenario allows better network utilization. By allowing
94	   RTP multiplexing only between pairs of gateways, there is an effec-
95	   tive full mesh RTP network, with the number of multiplexed users
96	   between a pair of gateways potentially growing small with a large
97	   number of gateways. An RTP Switched network would allow for greater
98	   multiplexing. However, it comes at the significant cost of management,
99	   dynamic routing, and central point of failure requirements.

101	   These scenarios have differing requirements. In this document, we
102	   focus on the gateway to gateway case in Figure 1.

104	                -------                             --------
105	              |       | RTPMux  ---------  RTPMux |        |
106	              |  ITG  |--------|         |--------|  ITG 3 |
107	              |   1   |        |         |         --------
108	               -------         |         | RTPMux  --------
109	               -------         |  RTP    |--------|        |
110	              |       | RTPMux | Switch  |        |  ITG 4 |
111	              |  ITG  |--------|         |         --------
112	              |   2   |        |         | RTPMux  --------
113	               -------         |         |--------|        |
114	                               |         |        |  ITG 5 |
115	                                 ---------          --------

117	   Figure 2: Internet telephony gateway architecture

119	2 Terminology

121	     o User: One of the individuals who has data within the RTP packet.

123	     o Connection: The point to point RTP session between two ITGs.

125	     o Channel: A virtual connection which is established by allowing a
126	       user to send data within a packet. There are many channels per
127	       connection - this represents the multiplexing.

129	     o Channel Identifier: A number which identifies a channel.

131	     o Block: The section of the payload of a packet which contains data
132	       for a particular user.

134	3 Requirements

136	   The transport protocol must provide, at a minimum, the following
137	   functionality:

139	     1.   Delineation: Data from different users must be clearly delin-
140	          eated.

142	     2.   Identification. The channel to which the data belongs must be
143	          identified.

145	     3.   Variable lengths: The protocol should support variable length
146	          blocks from a particular user. This allows for variable rate
147	          codecs and adjustment of packetization delays.

149	     4.   Low overhead: Since the protocol is designed for low rate
150	          voice, it should have low overhead. This issue is extremely
151	          important. New coders are emerging which can support near toll
152	          quality at 5.3 kbps, and acceptable quality at rates even as
153	          low as 4 kbps. It is desirable to support such codecs, as they
154	          can reduce the cost of providing an ITG service. Furthermore,
155	          advances in coding technology indicate that it is desirable to
156	          send very low bitrate information (1 kbps or less) during
157	          silence periods, so that background noise can be reproduced
158	          well (as opposed to sending nothing). Support of such rates
159	          requires a protocol with low overhead.

161	     5.   Marker: A general purpose marker bit should be available for
162	          all users within the connection.

164	     6.   Payload Identification. The codec in use for each user should
165	          be indicated somehow. It is a requirement to allow for the
166	          coding type to change during the lifetime of a channel.

168	4 Issues

170	   The following section identifies a number of issues which have an
171	   impact on the design of the protocol. It also identifies a variety of
172	   options for providing the specific services of the protocol.

174	4.1 Payload type identification

176	   There are a number of ways to identify the coding of the payload.
177	   They include in-band static types, in-band dynamic types, or out-of-
178	   band. The in-band approaches are based on some kind of payload type
179	   identifier, the semantics of which are either known a priori (static),
180	   or signaled ahead of time (dynamic). The out of band techniques sig-
181	   nal a binding between the channel identifier and a coder at the
182	   beginning (or even during) the lifetime of the connection.

184	   With out-of-band signaling, synchronizing the signaling with the
185	   media stream is a major issue. The synchronization can be
186	   accomplished with either timestamps or sequence numbers.

188	   One approach to performing the synchronization is as follows: The
189	   source sends a message reliably to the receiver, indicating that it
190	   will change codings at timestamp N, where N is some future timestamp
191	   (or SN). The N should be chosen far enough into the future to guaran-
192	   tee that the receiver will get the TCP message before time N. The
193	   farther away N is, the more robust the system becomes, but the source
194	   also loses its ability to adapt quickly. There are also several
195	   options for simple in-band signaling methods which can assist in
196	   error recovery. This is based on the assumption that it is better for
197	   the receiver to know that the encoding has changed (even though it
198	   doesn't know to what), than to know nothing. This avoids playing gar-
199	   bage out. A one or two bit coding sequence number can be used in the
200	   header. Such a number starts at zero. At the timestamp where the
201	   encoding changes, the SN increments, and stays incremented until the
202	   next change. In this fashion, we are guaranteed that the receiver
203	   will never play out data using the wrong coding type. Probably just
204	   one or two bits are necessary for this SN.
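
   As a minimal illustration of the coding sequence number idea above,
   the following Python sketch shows a receiver that declines to decode
   a block whose coding SN it has not yet been told about. The class
   name, the 2-bit field width, and the codec labels are assumptions
   made for the example; none of them come from an existing proposal.

```python
CSN_BITS = 2                 # assumed width of the coding SN field
CSN_MOD = 1 << CSN_BITS


class CodingTracker:
    """Receiver-side view of the in-band coding sequence number (hypothetical)."""

    def __init__(self, initial_codec):
        self.csn = 0                 # the coding SN starts at zero
        self.codec = initial_codec
        self.announced = None        # (csn, codec) learned from signaling

    def on_signal(self, codec):
        # Reliable out-of-band message: the next coding SN will use `codec`.
        self.announced = ((self.csn + 1) % CSN_MOD, codec)

    def codec_for(self, block_csn):
        # Return the codec to decode with, or None if the block must be dropped.
        if block_csn == self.csn:
            return self.codec
        if self.announced is not None and block_csn == self.announced[0]:
            self.csn, self.codec = self.announced
            self.announced = None
            return self.codec
        return None                  # coding changed, but to what is unknown
```

   A block that arrives with a new coding SN before the signaling
   message does is dropped rather than decoded with the old codec,
   which is the behavior the paragraph above argues for.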

206	   Using in-band payload types allows the coding to be explicitly
207	   indicated for each packet. This eliminates synchronization problems
208	   and lets the sender change encodings without out-of-band signaling.
209	   Its flexibility is the reason in-band payload types were used for
210	   generic RTP in the first place. By using dynamic types, the number of
211	   bits for the encoding can be reduced by limiting the number of codecs
212	   that can be used simultaneously during a session.

214	   Our conclusion is that it is desirable to have the PTI field in the
215	   payload (i.e., in-band). This makes it possible to do more robust rate
216	   control, which becomes a significant issue when multiple connections
217	   are multiplexed together (and therefore the aggregate bitrate
218	   increases). It also makes sense to signal a table of encodings for
219	   the payload type at the beginning of the connection. Any particular
220	   pair of ITGs will generally only support a few codecs. Therefore,
221	   dynamically setting the codings of the PTI bits makes a more compact
222	   representation possible without restricting the set of codecs which
223	   may be used.
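
   The table of encodings mentioned above could look roughly like the
   following Python sketch. It assumes the two ITGs have already agreed
   on an ordered list of codecs through signaling; the function name
   and the codec labels are hypothetical.

```python
def build_pti_table(codecs):
    """Map small PTI values to the codecs agreed for one connection.

    Returns the table and the number of PTI bits needed to index it.
    """
    if not codecs:
        raise ValueError("at least one codec is required")
    table = dict(enumerate(codecs))
    # Enough bits to represent the largest index, and never fewer than one.
    bits = max(1, (len(codecs) - 1).bit_length())
    return table, bits
```

   With three agreed codecs, two PTI bits suffice, whereas a static
   payload type space must be wide enough to name every codec that is
   defined.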

225	4.2 Timestamps

227	   Timing is a very complex issue for the multiplexing protocol. The
228	   first question related to it is whether the protocol will support
229	   mixing of media derived from separate clocks (e.g., voice and video).
230	   Although doing this seems attractive, it is complex and in opposition
231	   to the philosophy under which RTP was developed. RTP explicitly
232	   states that separate media should be placed in separate RTP streams.
233	   This allows for different QoS to be requested for each media, and for
234	   clocks to be defined based on the media type. Furthermore, this pro-
235	   file is geared towards the aggregation of voice traffic generated
236	   from the POTS across the Internet. As a result, the only source of
237	   data is from a single, 125us clock.

239	   The next basic question is whether timestamps are needed globally,
240	   i.e., just one per packet independent of the number of users, or
241	   locally, whereby each user within a packet needs their own timestamp.
242	   A separate question is the representation of these timestamps in an
243	   efficient manner. When considering these questions, the criteria to
244	   keep in mind are:

246	     1.   Can silence periods be recovered correctly?

248	     2.   Can resynchronization occur in the face of packet loss?

250	     3.   What is the impact on playout buffering and jitter computation?

252	   The answer to this question depends on the desired capabilities of the
253	   protocol. In the most general case, it is possible to have different
254	   frame sizes for each user (for example, 20ms, 10ms, and 15ms) within the
255	   same packet. These frames can be arbitrarily aligned in time with
256	   respect to each other (e.g., the 20ms frame starts 5.3ms after the
257	   beginning of another user's 10ms frame). The source can send packets
258	   at any point, containing data from those users whose frames have been
259	   generated before the packet departure time. A somewhat more restrictive
260	   capability is to allow for different frame sizes and time alignments,
261	   but to require that any packet contains all the same frame sizes, all
262	   aligned in time. The most restrictive case is to require separate RTP
263	   sessions for users with different frame sizes. This requires a channel
264	   to be torn down and set up again when it changes codec. The desire to
265	   perform flow control on a channel-by-channel basis makes this approach
266	   unacceptable, and it is not considered further.

268	4.2.1 General Case

270	   First consider the general case. Packets can contain frames from some
271	   or all of the users, and those frames are not the same length nor
272	   time aligned in any way. An example of such a scenario is depicted in
273	   Figure 3. In the figure, there are three sources, and the ti corre-
274	   spond to the times of packet emissions. When packets are lost, the
275	   variability in the amount and time alignment of data in each packet
276	   makes it impossible to reconstruct how much time had elapsed based
277	   solely on sequence numbers (such reconstruction IS possible in the
278	   single user case). Furthermore, the amount of time elapsed can easily
279	   vary from user to user, and therefore local timestamps are needed.

281	   The general case introduces further complications which have to do
282	   with jitter and delay computation. Such computations are needed for
283	   RTCP reporting and possibly for the estimation of network delays,
284	   used in dynamic playout buffers. In the single user case, the jitter
285	   is computed between each packet as:

287	   D(i,j) = (Rj - Ri) - (Sj - Si)

289	   Where the Ri correspond to the reception times at the receiver mea-
290	   sured in RTP time, and the Si are the RTP timestamps in the data
291	   packets. The delay is computed as the difference between the arrival
292	   time at the receiver and generation time, as indicated by the RTP
293	   timestamp.
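
   For concreteness, the single user quantities just defined can be
   written as the short Python sketch below. All times are in RTP
   timestamp ticks; the function names are illustrative only, and the
   delay value is only meaningful if sender and receiver clocks share a
   common reference, which is an assumption of the example.

```python
def interarrival_difference(r_i, s_i, r_j, s_j):
    """D(i,j) = (Rj - Ri) - (Sj - Si), in RTP timestamp ticks."""
    return (r_j - r_i) - (s_j - s_i)


def delay(r_i, s_i):
    """Arrival time minus generation time for one packet."""
    return r_i - s_i
```

   With a 125us clock (8 ticks per millisecond), two packets generated
   160 ticks apart and received 176 ticks apart give D = 16 ticks, or
   2ms of additional spacing.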

295	   In the multiple user case, these definitions no longer make sense, as
296	   there is no single RTP timestamp any longer. Each arriving packet
297	   will have a single arrival time (Ri), but multiple sending times
298	   (Si,j) for each block j in the ith packet. There are a number of
299	   alternatives for delay and jitter computation in this case: compute
300	   such information for all users, compute such information for a single
301	   user, or generate a single delay and jitter estimate, but have it be
302	   based on information from all users. There are pros and cons to each
303	   approach.

305	   First of all, it is possible for different blocks to experience dif-
306	   ferent delays (and jitters) even though they are within the same
307	   packet. This is because the general scenario allows for significant
308	   variability, whereby blocks may either vary in size from packet to
309	   packet and within a packet, or not be transmitted immediately after
310	   their completion (the latter happens to source B in Figure 3). Thus,
311	   it is arguable that it may be desirable to perform adaptive playout
312	   buffering separately for each user, which would require the storage
313	   and computation of delays for each user.

315	   The second alternative is to compute the delays for a single user,
316	   and use that information to size all of the other playout buffers.
317	   This may be sub-optimal in terms of delay and loss, depending on what
318	   fraction of the total delay and jitter are introduced by the packeti-
319	   zation itself. There is a second disadvantage to this approach, how-
320	   ever. When that particular user enters a silence period, delay and
321	   jitter information is no longer being received, and so estimates of
322	   network delay stop adapting. This implies that delay estimates will
323	   be old for certain periods of time. An alternative is to change the
324	   user from which delay and jitter estimates are being collected.

326	   The third alternative is to compute delay estimates based on some
327	   measure derived from all of the users. There are several reasonable
328	   approaches. For example, the delay estimate can be computed as:

330	   Delay = max_j (Ri - Si,j)
331	   which would yield a conservative estimate of the delay for some
332	   users. This approach requires storage of only a single set of delay
333	   information, although computation still grows with the number of
334	   users in a packet.
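
   A minimal Python sketch of this third alternative follows. It takes
   the arrival time of packet i and the sending times of its blocks,
   all in the same tick units, and keeps a single value per packet. The
   function name is an assumption made for illustration.

```python
def packet_delay(arrival, block_send_times):
    """Conservative per-packet delay: the largest Ri - Si,j over all blocks j."""
    if not block_send_times:
        raise ValueError("a packet must carry at least one block")
    return max(arrival - s for s in block_send_times)
```

   Only one number is stored per packet, but the loop still visits
   every block, which is the growth in computation noted above.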

336	                   ---------------------------------
337	         Source A |           |           |
338	                   ---------------------------------
339	         Source B |    |    |    |    |    |    |
340	                   ---------------------------------
341	         Source C |     |     |     |     |     |
342	                   ---------------------------------

344	                        |   | |  |    |   ||    |
345	                       t1  t2t3 t4   t5  t6t7  t8

347	            -------------------------------------- time

349	   Figure 3: Global Timestamp Problem

351	   Sending local timestamps also requires extra bits in the block head-
352	   ers. It is possible, however, to use offsets for the local times-
353	   tamps. A global timestamp can be used in the RTP header (the field
354	   already exists), and each user has a modifier to indicate position in
355	   time relative to that timestamp.

357	   A related question is how big to make the offset field. This offset
358	   is bounded by the difference in time between the earliest and latest
359	   samples within a packet. Clearly, this itself is bounded by the pack-
360	   etization delay at the source. For this application, if we assume a
361	   125us sample clock, and bound packetization delays to 100ms, the off-
362	   set field is bounded by 800 ticks, requiring 10 bits.
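
   The arithmetic behind this bound, together with one possible way of
   splitting timestamps into a global value and per-block offsets, is
   sketched below in Python. The sketch assumes that the global
   timestamp is the earliest block timestamp in the packet, so that all
   offsets are non-negative, and it ignores timestamp wrap-around. The
   function names are hypothetical.

```python
def offset_field_bits(max_packetization_delay_us=100_000, tick_us=125):
    """Largest possible offset in ticks and the bits needed to carry it."""
    max_offset = max_packetization_delay_us // tick_us
    return max_offset, max(1, max_offset.bit_length())


def split_timestamps(block_timestamps):
    """Return (global timestamp, per-block offsets) for one packet."""
    base = min(block_timestamps)      # assumed choice of global timestamp
    return base, [ts - base for ts in block_timestamps]


def local_timestamp(base, offset):
    """Recover a block's own timestamp at the receiver."""
    return base + offset
```

   With the defaults, offset_field_bits() gives 800 ticks and 10 bits,
   matching the figures in the text.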

364	4.2.2 More Restrictive Case

366	   As a more restrictive case, we allow blocks to be present in a packet
367	   if their frame sizes are identical and aligned in time. Note that
368	   this does not imply identical codecs or identical block sizes in
369	   terms of bytes; many voice codecs operate with a 20ms or 50ms frame
370	   size. This case would allow all frames of the same size and time
371	   alignment, independent of the codec, into a packet.

373	   This simplifies the timing issue tremendously. Now, the scenario is
374	   much more like the single user application. The sequence numbers and
375	   the frame size completely determine the timing when at least one user
376	   is active. But, when all users enter silence, a global timestamp is
377	   needed to indicate the duration of the silence period. The global
378	   timestamp is sufficient to reconstruct the timing in the face of
379	   losses. Therefore, in this case, only a global timestamp is required.

381	   It is desirable to support a variety of different frame sizes within
382	   such an aggregated connection, however. The way to do this in this
383	   case is to simply mandate that different packets can contain differ-
384	   ent frame sizes; the only restriction is within a packet. This is not
385	   as simple as it may seem at first. Once this is done, the relation-
386	   ship between sequence numbers and timing is lost. Consider an exam-
387	   ple. There are two frame sizes, 10ms and 30ms. Packet N contains 10ms
388	   frames, as do packets N+1 and N+2; however, packet N+3 contains 30ms
389	   frames. Thus, although the difference in sequence number between the
390	   first and fourth is three, the relative timing is not 10ms*3 or
391	   30ms*3. Due to this fact, the measurement of jitter is complicated
392	   (for the same reasons described in Section 4.2.1), as it should not
393	   be done between two packets with different frame sizes. It also makes
394	   recovery techniques based on sequence number more complex. To resolve
395	   this problem, we use a natural concept in RTP, which is the synchro-
396	   nization source (SSRC). The approach is to have a separate SSRC for
397	   each frame size in use. Then, sequence numbers are interpreted for
398	   each SSRC separately. This resolves the problem with the relationship
399	   between timing and sequence numbering. It also makes jitter and delay
400	   computations simpler - they are now done for each SSRC separately.
401	   Furthermore, multiple jitter (and delay, loss, etc.)  values are
402	   reported to the source, one for each frame size. This is also desir-
403	   able, since the different frame sizes will cause different packetiza-
404	   tion delays and packet sizes, which may cause those packets to see
405	   different delays and losses in the network than other packets.
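
   The following Python sketch illustrates the idea of keeping one
   sequence number space per frame size. For simplicity the frame size
   itself stands in for the SSRC, which is an assumption of the
   example (a real SSRC is a 32 bit identifier). Sequence numbers are
   16 bits wide as in RTP, and the elapsed time computation assumes
   that no silence gap lies between the two packets.

```python
SEQ_MOD = 1 << 16   # RTP sequence numbers are 16 bits wide


class FrameSizeStreams:
    """Sender-side counters, one sequence number space per frame size."""

    def __init__(self):
        self.next_seq = {}            # frame size in ms -> next sequence number

    def next_packet(self, frame_ms):
        seq = self.next_seq.get(frame_ms, 0)
        self.next_seq[frame_ms] = (seq + 1) % SEQ_MOD
        return frame_ms, seq


def elapsed_ms(frame_ms, seq_a, seq_b):
    """Media time between two packets of the same frame size stream."""
    return ((seq_b - seq_a) % SEQ_MOD) * frame_ms
```

   In the example from the text, the three 10ms packets receive
   sequence numbers 0, 1 and 2 in one space, and the 30ms packet starts
   its own space at 0, so a difference in sequence number again maps
   directly to time within each stream.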

407	   This case has both advantages and drawbacks when compared to the gen-
408	   eral case. As an advantage, timing is greatly simplified, and the
409	   approach falls much in line with the original intentions of RTP. How-
410	   ever, it causes losses in efficiency for systems with a variety of
411	   different frame sizes in operation simultaneously. Such a situation
412	   arises naturally when flow control is applied to each source individ-
413	   ually, as opposed to altering the rate and codec type for all of the
414	   active sources.

416	4.3 Channel ID

418	   The question of channel identification may seem at first trivial -
419	   simply use a 32 bit number, much like the SSRC, and be done with it.
420	   However, 32 bits adds significant overhead. Reduction of the number
421	   of bits for the channel ID becomes a complex issue. Unlike the single
422	   user case, the connection may remain active for long periods of time
423	   (days or months). The result is that channel IDs will need to be
424	   reused during the lifetime of the connection. It is critical to
425	   ensure that data from different channels is not confused because of
426	   this. Large channel ID spacing helps to resolve this issue (although
427	   it cannot eliminate it), so an added side effect of reducing the
428	   number of channel IDs possible is an increase in the likelihood of
429	   such confusion.

431	   The first question to be addressed is how many simultaneous users can
432	   one expect to find in a single packet.

434	4.3.1 Number of Users

436	   There are several ways to come up with some minimums and maximums.

438	4.3.1.1 Delay-bound

440	   Clearly, as we add more users, the store and forward delays increase
441	   since the packet size gets larger. Therefore, if we bound the per-hop
442	   delay, and provide a lower bound on the codec bitrate and packetiza-
443	   tion delay, an upper bound on the number of users can be obtained.
444	   Consider a 2.4 kbps codec, with a 20ms frame size. This is a reason-
445	   able minimum combination. Next, consider 50ms store and forward
446	   delays. For a T1, this limits the number of users within a packet to
447	   965. For a T3, it is 30 times this, or nearly 29,000. If silence sup-
448	   pression is used, the number of users within a packet is roughly half
449	   the number of active users (on average), thus requiring twice as many
450	   channel identifiers (1930 and 58,000). This bound doesn't seem too
451	   tight. Intuitively, even 965 seems too large.
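
   The T1 figure can be reproduced with the short Python sketch below.
   Two assumptions are made that the text does not state: each user
   carries 32 bits of per-block overhead in addition to the codec
   data, and the fixed IP/UDP/RTP header is ignored. The T1 rate is
   taken as 1.544 Mbps. The function name is hypothetical.

```python
def max_users_delay_bound(link_bps, hop_delay_ms, codec_bps, frame_ms,
                          per_user_overhead_bits=32):
    """Users that fit in the largest packet allowed by the per-hop delay."""
    packet_bits = link_bps * hop_delay_ms // 1000          # bits sent in the delay budget
    block_bits = codec_bps * frame_ms // 1000 + per_user_overhead_bits
    return packet_bits // block_bits


T1_BPS = 1_544_000
users_t1 = max_users_delay_bound(T1_BPS, 50, 2400, 20)
```

   Under these assumptions users_t1 is 965. Doubling it for silence
   suppression gives the 1930 channel identifiers quoted above.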

453	4.3.1.2 Efficiency bound

455	   The entire purpose of multiplexing is to improve upon efficiency.
456	   Therefore, we should be able to support at least as many users as is
457	   necessary to get good efficiency. Consider a typical case, a 16 kbps
458	   codec, with a 20ms packetization delay. This results in 320 bits of
459	   data per user. If we assume IP/UDP/RTP (20+8+12=40 bytes = 320 bits),
460	   plus an additional word (32 bits) of overhead per user, the effi-
461	   ciency vs. N becomes:

463	   E = (320N / ((320 + 32)N + 320))

465	   This approaches an asymptote of about 90 percent. Reaching 88 percent
466	   of that (E near 0.80) takes 7 users per packet, so that we must
467	   support at least 14 active channels (again, due to stat mux). The
468	   lower bound, therefore, on the number of users is around 14.
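
   The efficiency expression can be evaluated directly, as in the
   Python sketch below. The default values are the ones used in the
   text (320 bits of data per user, 32 bits of per-user overhead, 320
   bits of IP/UDP/RTP header). The function names and the 0.80 target
   are chosen for illustration.

```python
def efficiency(n, payload_bits=320, per_user_bits=32, header_bits=320):
    """E = 320N / ((320 + 32)N + 320) with the default values."""
    return payload_bits * n / ((payload_bits + per_user_bits) * n + header_bits)


def min_users_for(target, payload_bits=320, per_user_bits=32, header_bits=320):
    """Smallest N whose efficiency reaches `target`."""
    asymptote = payload_bits / (payload_bits + per_user_bits)
    if not 0 < target < asymptote:
        raise ValueError("target must lie below the asymptote")
    n = 1
    while efficiency(n, payload_bits, per_user_bits, header_bits) < target:
        n += 1
    return n
```

   With the defaults, the asymptote is 320/352, or about 0.91, and
   min_users_for(0.80) is 7, which doubles to 14 active channels once
   silence suppression is taken into account.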

470	4.3.1.3 MTU Bound

472	   In many cases, there is a maximum packet size. This is usually around
473	   1500 bytes. If we consider a very low bitrate codec, the minimum
474	   block size from any particular user is 32 bits (otherwise, overheads
475	   become very large, and we lose word alignment, so 32 bits is a good
476	   minimum). Dividing 1500 bytes by 4 bytes, we obtain a maximum of 375
477	   users. Multiplying by two, the number of active channels needed is
478	   around 750.

480	   Based on these bounds, we need to simultaneously support at least 14
481	   users, and at most 750. This would imply that at least 8 to 10 bits
482	   of channel ID are required.
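
   The step from a channel count to a field width can be checked as
   follows. This is a minimal sketch; the helper name id_bits is an
   assumption, and it simply counts the bits needed to give each channel
   a distinct fixed-length identifier.

```python
def id_bits(channels):
    """Bits needed to number `channels` channels with distinct IDs."""
    return max(1, (channels - 1).bit_length())


MTU_BYTES = 1500
MIN_BLOCK_BYTES = 4                                  # 32 bit minimum block
blocks_per_packet = MTU_BYTES // MIN_BLOCK_BYTES     # MTU bound on users per packet
active_channels = 2 * blocks_per_packet              # doubled for silence suppression

print(blocks_per_packet, active_channels, id_bits(active_channels))
```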

484	4.3.2 Channel ID Reuse Problem

486	   It is important to guarantee that data from a particular channel is
487	   never routed to a different channel; this would mean that a user may
488	   hear pieces of conversations from different users, an error we con-
489	   sider catastrophic. Such misrouting becomes possible when a channel
490	   is torn down, and a new channel is set up soon after using the same
491	   channel ID. Such a scenario is depicted in Figure 4. Sometime after
492	   channel K is torn down, a new channel is set up using the same chan-
493	   nel ID, K. If the data packets (dotted lines) are being delayed sig-
494	   nificantly, blocks from the old channel K may still be present in the
495	   data stream after the new channel K is established. These blocks will
496	   then be played out to the new user of channel K. Protocol support is
497	   needed to guarantee that this can never happen.

499	   The solution lies in an intelligent signaling protocol. The protocol
500	   must support a two-way handshake for all control messages. In addi-
501	   tion, three simple rules must be obeyed at a source when setting up
502	   or tearing down connections:

504	     1.   When a source sends a teardown message, it stops sending data
505	          in the UDP stream for that channel. Furthermore, in the sig-
506	          naling message, it indicates the sequence number of the packet
507	          which contained the last block for that channel, call this
508	          sequence number K.

510	     2.   A source cannot re-use a channel identifier until it has
511	          received an acknowledge from the destination that that partic-
512	          ular channel was successfully torn down.

514	     3.   A source cannot begin to send data from a particular
515	          channel in the UDP stream until it has received an acknowledge
516	          from the destination that the setup is complete.

518	                         |                            |
519	                      t1 |------------- teardown K    |
520	                         |.            --------------X|
521	                         |  .old K data               |t2
522	                         |    .                -------|
523	                         |  ACK TD K  ---------       |
524	                      t3 |X-----------                |
525	                         |        .                   |
526	                         |          .                 |
527	                         |------------- setup K       |
528	                         |            .--------------X|
529	                         |              .......       |t4
530	                         |   ACK SET K  --------------|
531	                         |X-------------       ....   |
532	                         |......                   ..X|
533	                         |      ........data new K    |
534	                         |              .............X|

536	       Figure 4: Channel ID Reuse Problem

538	   A few simple rules must also be used at the receiver:

540	     1.   When a receiver gets a teardown message, it checks the highest
541	          SN received so far (call this sequence number M). If M >= K,
542	          the channel is torn down, and any further blocks containing
543	          that channel ID are discarded. If M < K, blocks from that
544	          channel are accepted until the received SN exceeds K. Once
545	          this happens, the channel is torn down and no further blocks
546	          with that channel ID are accepted.

548	     2.   When a setup message is received, the destination will begin
549	          to accept blocks with the given channel identifier, but only
550	          if the sequence number of the packet in which they ride is
551	          greater than K.

553	   The use of the sequence numbers allows the receiver to separate the
554	   old channel K blocks from the new ones. This guarantees that the
555	   destination will not misroute packets. An additional benefit is that
556	   the end of speech will not be clipped if the last data packets arrive
557	   after the teardown is received. This protocol is quite simple to
558	   implement, although it requires a table at the receiver of the values
559	   of K for each channel ID.
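
   The receiver rules can be sketched as a small state table. This is
   an illustration under stated assumptions, not a definitive
   implementation: the class and method names are hypothetical,
   sequence number wraparound is ignored, and signaling messages are
   assumed to arrive reliably through the on_setup() and on_teardown()
   calls.

```python
class MuxReceiver:
    """Receiver-side channel table keyed by channel ID (illustrative names)."""

    def __init__(self):
        self.highest_sn = -1   # M: highest packet sequence number seen so far
        self.state = {}        # channel ID -> "open" or "closing"
        self.last_k = {}       # channel ID -> K from the most recent teardown

    def on_setup(self, channel):
        # Receiver rule 2: accept the ID again, but only in packets with SN > K.
        self.state[channel] = "open"

    def on_teardown(self, channel, k):
        # Receiver rule 1: tear down at once if packet K has already been seen,
        # otherwise keep accepting blocks until the received SN exceeds K.
        self.last_k[channel] = k
        if self.highest_sn >= k:
            self.state.pop(channel, None)
        else:
            self.state[channel] = "closing"

    def on_packet(self, sn, channels):
        """Return the channel IDs in this packet whose blocks are accepted."""
        self.highest_sn = max(self.highest_sn, sn)
        accepted = []
        for channel in channels:
            k = self.last_k.get(channel, -1)
            status = self.state.get(channel)
            if status == "open" and sn > k:
                accepted.append(channel)
            elif status == "closing" and sn <= k:
                accepted.append(channel)
        # Complete any pending teardown once the SN has passed K.
        for channel in [c for c, s in self.state.items() if s == "closing"]:
            if self.highest_sn > self.last_k[channel]:
                del self.state[channel]
        return accepted


rx = MuxReceiver()
rx.on_setup(5)
rx.on_packet(1, [5])       # old channel 5 data
rx.on_packet(4, [])        # packets 2 and 3 are delayed, so M = 4
rx.on_teardown(5, 3)       # K = 3 <= M: torn down immediately
rx.on_setup(5)             # channel ID 5 is reused
print(rx.on_packet(2, [5]), rx.on_packet(6, [5]))
```

   In the usage lines, the late packet 2 still carries a block of the
   old channel 5 and is rejected, while packet 6 belongs to the new
   channel and is accepted.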

561	   Alternate solutions to this reuse problem exist which can operate
562	   when the above restrictions are relaxed. The simplest approach is to
563	   have the source keep a linked list of free channel IDs. The list is
564	   initialized to contain all channel IDs, in order. When a new channel
565	   is required to be established, the channel ID is taken from the top
566	   of the list. When a channel is torn down, its ID is placed at the
567	   bottom of the list. This makes the time between channel ID reuse as
568	   long as possible, and reduces the probability of confusion. With this
569	   method, it is no longer necessary to include sequence numbers in the
570	   tear down messages. Also, the receiver does not need to maintain a
571	   table.
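
   The free list described above behaves as a first-in, first-out
   queue. A minimal sketch follows, using a deque in place of a linked
   list; the class name and the id_bits parameter are assumptions made
   for illustration.

```python
from collections import deque


class ChannelIdPool:
    """FIFO free list: the ID released longest ago is the next one reused."""

    def __init__(self, id_bits):
        self.free = deque(range(2 ** id_bits))   # all channel IDs, in order

    def allocate(self):
        if not self.free:
            raise RuntimeError("no free channel IDs")
        return self.free.popleft()               # take from the top of the list

    def release(self, channel_id):
        self.free.append(channel_id)             # place at the bottom of the list


pool = ChannelIdPool(id_bits=2)
first = pool.allocate()
second = pool.allocate()
pool.release(first)
print([pool.allocate() for _ in range(3)])
```

   The released ID comes back only after every other free ID has been
   handed out, which is what maximizes the time between reuses.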

572	4.3.3 Channel ID Coding

574	   This section discusses some of the options for coding the channel ID
575	   field.

577	4.3.3.1 Fixed Length

579	   The fixed length approach is the most straightforward. A fixed number
580	   of bits is assigned to the channel ID. Issues surrounding the number
581	   of bits required have been discussed above.

583	4.3.3.2 Implicit + Present Mask

585	   In reality, the channel IDs are very redundant. Both source and des-
586	   tination know the set of active connections and their channel identi-
587	   fiers from the signalling messages. Therefore, if the blocks are
588	   placed in the packet in order of increasing channel ID, very little
589	   information actually needs to be sent. In fact, without silence sup-
590	   pression, channel activity and the presence of a block in a packet
591	   are likely to be equivalent, in which case NO information actually
592	   needs to be sent about channel IDs.

594	   Unfortunately, there are some practical problems with this. First,
595	   silence suppression is used. Secondly, even if it weren't, it is pos-
596	   sible for the voice codecs at the ITG not to have their framing syn-
597	   chronized (as in the general case above), so that a packet may not
598	   contain data from all users. Thirdly, the source and destination do
599	   NOT have a consistent view of the state of the system. There is a
600	   delay while signaling messages are in transit.

602	   A few simple mechanisms can be used to overcome these complexities.
603	   In the header of the packet, a mask is sent. Each bit in the mask
604	   indicates whether data from a channel is present in the packet or
605	   not. Mapping of channel IDs to bits is done by sorting the channel
606	   IDs, and mapping the lowest number to the first bit, next lowest to
607	   the second, etc. Therefore, if a channel has no data for that packet,
608	   its bit is set to zero. Given that the source and destination agree
609	   on how many connections are active at all points in time, the number
610	   of bits required is known to both sides.
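
   The mapping between channel IDs and mask bits can be sketched as
   follows. The function names are illustrative, and the mask is shown
   as a list of bits rather than a packed header field.

```python
def build_mask(active_ids, present_ids):
    """One bit per active channel, in increasing channel ID order; 1 = block present."""
    present = set(present_ids)
    return [1 if channel in present else 0 for channel in sorted(active_ids)]


def blocks_in_packet(active_ids, mask):
    """Channel IDs whose blocks appear in the packet, in payload order."""
    ordered = sorted(active_ids)
    if len(mask) != len(ordered):
        raise ValueError("mask length does not match the number of active channels")
    return [channel for channel, bit in zip(ordered, mask) if bit]


active = {12, 3, 40, 7}
mask = build_mask(active, present_ids=[40, 3])
print(mask)                             # [1, 0, 0, 1]
print(blocks_in_packet(active, mask))   # [3, 40]
```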

612	   The next step is to deal with the differences in state. An additional
613	   field, called the state-number, perhaps 5 bits, is sent in the header
614	   of the packet. This field starts at zero. Let's say at some point in
615	   time, its value is N. The source wishes to tear down a channel. It
616	   sends the tear down message to the destination, but continues to send
617	   data for that channel (or it may choose to send nothing, but must set
618	   the appropriate bit in the mask to zero). When the destination
619	   receives the message, it replies with an acknowledge. When the
620	   acknowledge is received by the source, the source considers the chan-
621	   nel torn down, and no longer sends data for it, nor considers it in
622	   computing the mask. In the packet where this happens, the source also
623	   increments the state-number field to N+1. The destination knows that
624	   the source will do this, and will therefore consider the state
625	   changed for all packets whose value of the field is N+1 or greater.
626	   When the next signaling message takes effect, the field is further
627	   increased. Even if packets are lost, the value of the state-number
628	   field for any correctly received packet completely tells the destina-
629	   tion the state of the system as seen in that packet. Furthermore, it
630	   is not necessary to wait for a particular setup or teardown to be
631	   acknowledged before requesting another setup or teardown.
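
   One way to picture the destination side of this mechanism is a table
   of active channel sets indexed by state-number. The sketch below is
   an assumption-laden illustration: the names are hypothetical, each
   signaling message advances the state-number by one, and wraparound
   of the (perhaps 5 bit) field is not modeled.

```python
class MaskDecoder:
    """Destination-side view: the active channel set for each state-number."""

    def __init__(self, initial_active=()):
        self.active_by_state = {0: frozenset(initial_active)}
        self.latest = 0

    def on_signal(self, setup=(), teardown=()):
        """Record an acknowledged change; it takes effect at the next state-number."""
        current = self.active_by_state[self.latest]
        self.latest += 1
        self.active_by_state[self.latest] = (current | set(setup)) - set(teardown)
        return self.latest

    def decode(self, state_number, mask):
        """Interpret a packet's mask using the state under which it was built."""
        ordered = sorted(self.active_by_state[state_number])
        if len(mask) != len(ordered):
            raise ValueError("mask length does not match the indicated state")
        return [channel for channel, bit in zip(ordered, mask) if bit]


decoder = MaskDecoder(initial_active={1, 2, 3})
decoder.on_signal(teardown=[2])          # takes effect in packets with state-number 1
print(decoder.decode(0, [1, 1, 0]))      # packet built before the change
print(decoder.decode(1, [0, 1]))         # packet built after the change
```

   Because each packet carries its own state-number, a packet built
   before the teardown is still decoded against the old channel set
   even if it arrives after the signaling exchange has completed.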

633	   The number of bits for the state-number field should be set large
634	   enough to represent the maximum number of state changes which can
635	   have taken effect during a round trip time. As an alternative, an
636	   additional exchange can occur. After the destination receives a
637	   packet with state number greater than N, it destroys the state
638	   related to N, and sends back, reliably, a free-state N message, indi-
639	   cating to the source that state N is now de-allocated, and can
640	   be used again. Until such a message is received, the source cannot
641	   reuse state N. This is essentially a window based flow control, where
642	   the flow is equal to changes in state. With this addition, the number
643	   of bits for the state number can be safely reduced, and it is guaran-
644	   teed that the destination will never confuse the state, independent
645	   of the number of state-number bits used. However, the use of too few
646	   state bits can cause call blocking or delay the teardown of inactive
647	   channels.

649	   This problem in state difference appears to be similar to the channel
650	   ID reuse problem described in Section 4.3.2. However, there is an
651	   important difference. In the channel ID reuse problem, if the packet
652	   containing the last block of a user arrives before the signaling mes-
653	   sage tearing down that connection, there is no problem. The destina-
654	   tion will generally play out silence until the signaling message is
655	   received. Here, however, the destination must know that blocks are no
656	   longer present in the data stream independent of when the signaling
657	   messages arrive.

659	   There are some drawbacks to this approach. It requires the source
660	   and destination to maintain state. Any error in processing at either
661	   end, or a hardware failure, causes a complete loss of synchroniza-
662	   tion. This hard-state nature of the protocol can be relaxed by having
663	   the source send the complete state of the system with each signaling
664	   message, along with the state-number field for which this state takes
665	   effect. This guarantees that even in the event of end-system fail-
666	   ure, the system state will be refreshed whenever a new connection is
667	   set up or torn down. Furthermore, the state can be sent periodically
668	   to improve performance.

670	4.4 Length Indicators

672	   There are many ways to actually code the length indicators. The first
673	   question, however, is the range of lengths which must be coded.

675	4.4.1 Range of Length Indicators

677	   Here, there is a clear tradeoff between flexibility and efficiency. A
678	   larger range can accommodate a variety of different media (such as
679	   video) where lengths may be large. However, this comes at the expense
680	   of a long length field, which may require another word of header to
681	   hold. For voice, one would expect a maximum bitrate to be 64 kbps,
682	   and around 50ms packetization delay. This yields exactly 100 words of
683	   data. Therefore, an eight bit field is probably sufficient for most
684	   voice applications.
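
   The voice figure can be verified with a few lines of arithmetic. The
   helper name below is illustrative; lengths are counted in 32 bit
   words, as in the text.

```python
def block_words(bitrate_bps, packetization_ms, word_bits=32):
    """Block size in words for a constant-rate codec, rounded up."""
    bits = bitrate_bps * packetization_ms // 1000
    return -(-bits // word_bits)   # ceiling division


words = block_words(64_000, 50)
print(words)                 # length in 32 bit words
print(words.bit_length())    # bits needed to represent that length
```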

686	4.4.2 PTI Based Lengths

688	   In many applications, the amount of data present depends on the voice
689	   codec in use. Frame based coders will generally send a frame at a
690	   time. Since the codec type is indicated by the PTI field, it may not
691	   always be necessary to send length information at all. Even for non-
692	   frame based codecs, such as PCM, default data sizes can be set in the
693	   standard (as in RFC 1890 [6]). An extension bit can be used to indi-
694	   cate a non-standard length, so that when set, a length field follows.
695	   This allows for efficient coding of the most common cases, but allows
696	   for variable lengths with little additional cost.
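
   The extension bit rule can be sketched as follows. The table of
   default lengths is a placeholder only: the payload type numbers, the
   lengths, and the function name are all hypothetical, and real
   defaults would come from the payload type definitions.

```python
# Hypothetical defaults: payload type -> block length in 32 bit words.
DEFAULT_LENGTH_WORDS = {0: 40, 1: 5}


def block_length(payload_type, extension_bit, length_field=None):
    """Return the block length, reading the explicit field only when flagged."""
    if extension_bit:
        if length_field is None:
            raise ValueError("extension bit set but no length field present")
        return length_field
    return DEFAULT_LENGTH_WORDS[payload_type]


print(block_length(0, extension_bit=0))                    # default for the type
print(block_length(0, extension_bit=1, length_field=12))   # non-standard length
```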

698	4.4.3 Variable Length w/ Indicator

700	   In this approach, a variable length header is used. All of the length
701	   indicators for all of the blocks are placed together in the beginning
702	   of the packet. However, the first four bits of this header field
703	   indicate the number of bits used for each length field. What follows
704	   are the length fields themselves, each using the number of bits indi-
705	   cated by the first four bits. This approach scales well, using a
706	   small overhead when the block lengths are small, and a larger
707	   overhead when they are larger. The drawback is a variable length
708	   header field, plus additional complexity in the parsing. An example
709	   of this technique is depicted in Figure 5. In the first example, the
710	   four bit indicator field has a value of three, so that the length
711	   fields are all three bits long. The four lengths are then 2,6,3, and
712	   4. In the second example, the 4 bit indicator has a value of two, so
713	   that the length fields are all two bits long. The four lengths are
714	   thus 3,2,1, and 3.

716	                                   4b   3b 3b  3b  3b
717	                                  --------------------
718	                    Example 1   |0011|010|110|011|100|
719	                                  --------------------

721	                                     4b  2b 2b 2b 2b
722	                                    ----------------
723	                      Example 2   |0010|11|10|01|11|
724	                                    ----------------

726	   Figure 5: Variable Length w/ Indicator
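
   The two bit patterns in Figure 5 can be reproduced with a short
   encoder and decoder. This is a sketch under assumptions: bit strings
   stand in for the packed header, the function names are illustrative,
   and the decoder is told how many length fields to expect, since the
   text does not say how that count is conveyed.

```python
def encode_lengths(lengths):
    """4 bit width indicator followed by one fixed-width field per length."""
    width = max(1, max(lengths).bit_length())
    if width > 15:
        raise ValueError("length does not fit a 4 bit width indicator")
    return format(width, "04b") + "".join(format(n, f"0{width}b") for n in lengths)


def decode_lengths(bits, count):
    """Inverse of encode_lengths for a known number of length fields."""
    width = int(bits[:4], 2)
    return [int(bits[4 + i * width: 4 + (i + 1) * width], 2) for i in range(count)]


print(encode_lengths([2, 6, 3, 4]))   # bit pattern of Example 1
print(encode_lengths([3, 2, 1, 3]))   # bit pattern of Example 2
```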

728	4.4.4 Remaining Packet Length Based Lengths

730	   UDP always informs RTP of how many bytes are in the payload. This
731	   itself restricts the possible length of the first block, since its
732	   length must be less than the total packet length minus the RTP
733	   header. Furthermore, as each block is placed into the packet, the
734	   possible set of lengths that it can have shrinks - it must always be
735	   less than the remaining length in the packet. This approach, there-
736	   fore, codes each length field using log2(R) bits, rounded up, where
737	   R is the length remaining in the packet. This approach works
738	   extremely well when there is a long block followed by several shorter
739	   ones, whereas the previous approach performs poorly in this case.
740	   Furthermore, it eliminates the length indicator present in the
741	   previous approach. However, it is even more complex than the previous
742	   technique. It can result in no savings under some conditions,
743	   especially since the header fields must be rounded to 32 bits.

745	   Consider an example. The total size of the packet is 31 words. Inside
746	   of it are three blocks, the first whose length is 17, the second 8,
747	   and the third, 6. We would code the length field with 5 bits. After
748	   this block is read, the remaining amount of data in the packet is 14
749	   words. Therefore, the next length field is coded with 4 bits. After
750	   this block, the remaining amount of data in the packet is 6 words, so
751	   the final length field is coded with three bits. The total is there-
752	   fore 5+4+3 = 12 bits. In the previous approach (Section 4.4.3), the
753	   entire length field would have required 4 bits for the indicator
754	   (whose value would be 5), followed by 3 five bit fields, for a total
755	   of 19 bits.
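
   The field widths in this example can be computed mechanically. The
   sketch below takes the width of each field to be the number of bits
   needed to represent the remaining length, which is our reading of
   the log2 rule; the function name is illustrative and the overhead of
   the length fields themselves is ignored, as in the example.

```python
def remaining_length_widths(total, lengths):
    """Bits used for each length field, given the data remaining before it."""
    widths = []
    remaining = total
    for n in lengths:
        if n > remaining:
            raise ValueError("block is longer than the remaining packet")
        widths.append(max(1, remaining.bit_length()))
        remaining -= n
    return widths


lengths = [17, 8, 6]
widths = remaining_length_widths(31, lengths)
indicator_total = 4 + len(lengths) * max(lengths).bit_length()
print(widths, sum(widths))   # remaining-length coding
print(indicator_total)       # 4 bit indicator plus fixed-width fields
```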

757	   One may question this example since the overhead of the length fields
758	   itself is not taken into account when computing the remaining length
759	   of the packet. While this can be incorporated, it makes things even
760	   more complex, and it is not actually necessary. All that is required
761	   is that the length fields are coded with log2(M), where M is any
762	   bound on the remaining amount of data which can be deterministically
763	   computed from past information. A simple bound is the packet length
764	   minus the data seen thus far (one can also subtract away any fixed
765	   length fields), precisely the metric used in the example above.

767	4.4.5 Table Based Approach

769	   Realistically, most systems will operate with codecs that generate
770	   data in a fixed set of lengths (a frame size, for example). In that
771	   case, the set of lengths which can appear in the packet are usually
772	   very restricted. To take advantage of this fact, a table can be
773	   transmitted to the receiver reliably before transmission commences.
774	   This table can indicate the actual length of a block, and its coding.
775	   The symbols transmitted in the data packets are then used in this
776	   table to look up the actual lengths. This can reduce the length field
777	   to 2 or 3 bits. These lengths then all occur next to each other in
778	   the header. The technique now relies on state at the receiver, and
779	   the parsing process is further complicated by table lookups. In addi-
780	   tion, the approach only works if you know the set of lengths before
781	   the system begins operation. If you allow the table to be dynamically
782	   modified during a session, synchronization problems occur, and the
783	   system becomes quite complex.
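
   A fixed-length version of the table can be sketched as follows. The
   example lengths and all names are hypothetical; the table is assumed
   to be agreed upon once, before any data packets are sent.

```python
def build_table(lengths):
    """Return (symbol -> length list, length -> symbol map, symbol width in bits)."""
    ordered = sorted(set(lengths))
    width = max(1, (len(ordered) - 1).bit_length())
    encode = {length: symbol for symbol, length in enumerate(ordered)}
    return ordered, encode, width


# Hypothetical set of block lengths (in words) used during a session.
decode_table, encode_table, symbol_bits = build_table([6, 10, 40, 6])
symbol = encode_table[40]
print(symbol_bits, symbol, decode_table[symbol])
```

   With three distinct lengths, each length field shrinks to a 2 bit
   symbol, at the cost of a lookup at the receiver.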

785	   Further gains can be achieved through the use of Huffman codes
786	   instead of fixed length codes. This only makes sense when different
787	   codecs (and correspondingly different lengths) are used with differ-
788	   ent frequencies. An example of such a situation is when the codec
789	   changes to a higher rate because of music-on-hold; a rare event in
790	   general.

792	4.5 Marker Bit

794	   The marker bit has a general functionality, but is normally used to
795	   indicate the beginning of a talkspurt. It seems like a good idea to
796	   include this bit for each user.

798	4.6 Location of Per User Overhead

800	   There will generally be overhead on a per-user basis (information
801	   such as channel ID, length, etc.). This information can be located in
802	   one of three places. First, it can all reside in front of the block
803	   to which it is applicable. Second, it can all be pasted together and
804	   reside up front in the header of the packet. The third is a hybrid
805	   solution, where some of it resides up front (such as channel ID), and
806	   some resides in front of the data. There are various pros and cons to
807	   the different approaches. The hybrid approach can be complex, since
808	   data is split into multiple places. The case where all the header is
809	   up front has a few minor advantages. First, it allows for a complete
810	   separation of the data from the header. The implementation is likely
811	   to be a little less complex, since extracting blocks does not require
812	   actually moving through the payload.

814	5 Options

816	5.1 Option I: Mixer Based

818	   This option is the most straightforward to implement, but has the
819	   most overhead. The basic premise is to reuse the mixer concept intro-
820	   duced in RTP. Each user is considered a contributing source, and the
821	   gateway is considered a mixer. However, instead of mixing the media,
822	   separate data from each user appear in the payload. The 32 bit CSRC
823	   identifies each user, acting as the channel ID. Data from each user
824	   is organized into blocks. Each block has its own 32 bit header, which
825	   includes the length (12 bits) in units of 32 bit words, Marker bit
826	   (1b), TimeStamp Offset (12b), and Payload Type (7b). Furthermore, the
827	   payload type and marker bit are stricken from the RTP header (since
828	   they only make sense for an individual user), and the CC field is
829	   expanded to fill the freed bits. This allows for a 12 bit CC
830	   field, or 4096 users in a packet. Thus, the packet would look like:

832	         0                   1                   2                   3
833	        0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
834	      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
835	      |V=2|P|X|  CC   |M|     PT      |       sequence number         |
836	      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
837	      |                           timestamp                           |
838	      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
839	      |           synchronization source (SSRC) identifier            |
840	      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
841	      |           contributing  source (CSRC) identifier  1           |
842	      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
843	      |           contributing  source (CSRC) identifier  2           |
844	      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
845	                                              ..........
846	      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
847	      |           contributing  source (CSRC) identifier  N           |
848	      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
849	      |           Length      |      Timestamp Offset |M|     PT      |
850	      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
851	      |                                                               |
852	      |                        Payload 1                              |
853	      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
854	      |           Length      |      Timestamp Offset |M|     PT      |
855	      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
856	      |                                                               |
857	      |                        Payload 2                              |
858	      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

860	   Figure 6: Option I

862	   This approach allows for the most generality in terms of
863	   variable length coders and coders with different frame sizes (see
864	   Section 4.3.1). The channel ID is longer than necessary, but using
865	   the concept of a contributing source for the channel ID necessitates
866	   the use of the additional bits. There are several variations on
867	   option I, many of which have been mentioned above:

869	   I.A: Put the CSRC with each 32 bit length+M+PT field, instead of all
870	   of them being at the beginning. This has some pros and cons. As an
871	   interesting artifact of this change, it is no longer necessary to
872	   have a CC field. The length passed up by UDP is sufficient to recover
873	   the point at which you stop checking for additional blocks from users
874	   in the payload. In fact, the length field in the last block is not
875	   strictly necessary either.

877	   I.B: Do the opposite of I.A. Put the length+M+PT field up front along
878	   with the CSRC fields, with the pattern being CSRC 1, length 1, CSRC
879	   2, length 2, etc. Here again, the CC field is not strictly necessary.

881	   I.C: The CSRC field can be shrunk to 8 bits. This allows for either 4
882	   or two channel IDs to be coded in the space of one word, whereas only
883	   one could in the current size of the field.

885	   I.D: The CSRC field can be shrunk to 16 bits.

887	5.2 Option II: One word header

888	   This option eliminates the large channel ID field present in the pre-
889	   vious option. In the RTP header, the CC field is set to zero, the
890	   marker bit has no meaning, and the payload type is TBD (possible uses
891	   include an indication of the number of blocks in the packet). The RTP
892	   timestamp corresponds to the generation of the first sample, among
893	   all blocks, enclosed in this packet. A one word header precedes each
894	   block of data. The number of blocks is known by parsing them until
895	   the end of the RTP packet. The one word field has a channel ID (8
896	   bits), length (8 bits), Marker (1 bit), timestamp offset (11 bits),
897	   and payload type (4 bits). Channel ID number 255 is reserved, and
898	   causes the header to be expanded to allow for greater length, payload
899	   type, and possibly channel ID encodings. The specific format for this
900	   expanded header is for further study. Given the compacted payload
901	   type space, it may be a good idea to allow negotiation of the meaning
902	   for the payload type at the beginning of the connection. It may be
903	   worthwhile to expand the length field at the expense of the channel
904	   ID - this issue is for further study.

906	   The format of the packet is thus:

908	         0                   1                   2                   3
909	        0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
910	      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
911	      |V=2|P|X|  CC   |M|     PT      |       sequence number         |
912	      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
913	      |                           timestamp                           |
914	      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
915	      |           synchronization source (SSRC) identifier            |
916	      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
917	      |    Length     |    Timestamp Offset |    CID        |M|  PTI  |
918	      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
919	      |                                                               |
920	      |                        Payload 1                              |
921	      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
922	      |    Length     |    Timestamp Offset |    CID        |M|  PTI  |
923	      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
924	      |                                                               |
925	      |                        Payload 2                              |
926	      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

928	   Figure 7: Option II
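
   The one word block header can be packed and parsed as sketched
   below. The bit positions follow the field order drawn in Figure 7
   (Length, Timestamp Offset, CID, M, PTI), which is an assumption
   where the figure and the prose order differ. The function names are
   illustrative, the units of the length field are left open, and the
   expanded header selected by channel ID 255 is not modeled.

```python
import struct


def pack_block_header(length, ts_offset, cid, marker, pti):
    """Pack Length(8) | Timestamp Offset(11) | CID(8) | M(1) | PTI(4) into one word."""
    if not (0 <= length < 256 and 0 <= ts_offset < 2048 and 0 <= cid < 255
            and marker in (0, 1) and 0 <= pti < 16):
        raise ValueError("field out of range (CID 255 is reserved)")
    word = (length << 24) | (ts_offset << 13) | (cid << 5) | (marker << 4) | pti
    return struct.pack("!I", word)   # network byte order


def unpack_block_header(data):
    """Return (length, ts_offset, cid, marker, pti) from the first four bytes."""
    (word,) = struct.unpack("!I", data[:4])
    return (word >> 24, (word >> 13) & 0x7FF, (word >> 5) & 0xFF,
            (word >> 4) & 0x1, word & 0xF)


header = pack_block_header(length=10, ts_offset=5, cid=3, marker=1, pti=2)
print(header.hex(), unpack_block_header(header))
```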

930	5.3 Option III - Restricted Case

932	   Option II has the advantage of being able to support multiple frame
933	   sizes within a single packet. However, it comes at the expense of a
934	   32 bit header (which can be large for low bitrate codecs), and at a
935	   reduced payload type field. This option has a 16 bit header, but does
936	   not support different frame sizes within a packet. It therefore falls
937	   into the category described in Section 4.3.2. Of the 16 bit header,
938	   the first bit is an expand bit (to be described shortly), and the
939	   second bit is the marker bit. The following 6 bits indicate payload
940	   type, and the remaining 8 are for channel ID. When the expand bit is
941	   set, an additional 16 bits are present, which indicate the length of
942	   the block. When expand is clear, the length is derived from the pay-
943	   load type. Since there is no timestamp offset, all the blocks in the
944	   packet must be time aligned and have the same frame lengths. Differ-
945	   ent sized frames are supported by using a different SSRC for each
946	   frame length (see Section 4.3.2). In the RTP header, the CC field is
947	   always zero. The marker bits and payload type are undefined. The
948	   timestamp indicates the time of generation of the first sample of
949	   each block. SSRC is randomly chosen, but always different for each
950	   frame size.

952	   The block headers are all located at the beginning of the packet, and
953	   follow each other. If the total length of the fields is not a multi-
954	   ple of 32 bits, it is padded out to 32. The structure of the header
955	   is such that fields never break across packet boundaries. An example
956	   of such a packet is given in Figure 8. There are 7 blocks in this
957	   example. The first two have standard lengths based on the PT field.
958	   The next one uses the expansion bit to indicate the length. The
959	   fourth uses the PT field, the fifth the expansion bit, and the last
960	   two use the PT field. The last 16 bits of the header are padded out.
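
   The header layout described above can be sketched as a packing
   routine. This is an illustration only: the function name and the
   tuple layout are assumptions, the padding is assumed to be zero
   filled, and the units of the optional length field are left open.

```python
import struct


def pack_option3_headers(blocks):
    """Pack (marker, payload_type, channel_id, length_or_None) tuples.

    Each header is E(1) | M(1) | PT(6) | ID(8); a 16 bit length follows
    when E is set, and the result is padded to a multiple of 32 bits.
    """
    out = bytearray()
    for marker, payload_type, channel_id, length in blocks:
        if marker not in (0, 1) or not 0 <= payload_type < 64 or not 0 <= channel_id < 256:
            raise ValueError("field out of range")
        expand = 0 if length is None else 1
        out += struct.pack("!H", (expand << 15) | (marker << 14)
                           | (payload_type << 8) | channel_id)
        if expand:
            out += struct.pack("!H", length)
    if len(out) % 4:
        out += b"\x00\x00"   # pad the header area to a 32 bit boundary
    return bytes(out)


# Seven blocks; the third and fifth carry explicit lengths.
blocks = [(0, 5, 1, None), (0, 5, 2, None), (1, 5, 3, 9), (0, 5, 4, None),
          (0, 5, 5, 12), (0, 5, 6, None), (0, 5, 7, None)]
headers = pack_option3_headers(blocks)
print(len(headers) // 4, headers.hex())
```

   For this input the header area occupies five words, with the final
   16 bits used as padding.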

962	   Figure 8: Option III

964	   5.4 Option IV - Stacked RTP

966	   This approach uses a duplicate of the RTP header as the per-block
967	   header. It is therefore extremely inefficient (12 bytes per block),
968	   but has several advantages: different media types can be mixed, since
969	   the timestamps are no longer related, and little processing is
970	   required if the sources being combined came from a single user RTP
971	   source. It also works well when one of the users is actually a mixer
972	   (for example, a conference bridge), since the CSRC can be used. Its
973	   main advantage is the reduction in overhead due to the IP and UDP
974	   headers. In addition to the standard RTP header, an additional header
975	   is required for length indication. This header has a number of 16 bit
976	         0                   1                   2                   3
977	        0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
978	      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
979	      |V=2|P|X|  CC   |M|     PT      |       sequence number         |
980	      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
981	      |                           timestamp                           |
982	      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
983	      |           synchronization source (SSRC) identifier            |
984	      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
985	      |E|M|     PT    |      ID       |E|M|    PT     |      ID       |
986	      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
987	      |E|M|     PT    |      ID       |            Length             |
988	      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
989	      |E|M|     PT    |      ID       |E|M|    PT     |      ID       |
990	      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
991	      |         Length                |E|M|    PT     |      ID       |
992	      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
993	      |E|M|     PT    |      ID       |              PAD              |
994	      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
995	      |                                                               |
996	      |                        Payload 1                              |
997	      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
998	      |                                                               |
999	      |                        Payload 2                              |
1000	      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1001	      |                                                               |
1002	      |                        Payload 3                              |
1003	      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1004	      |                                                               |
1005	      |                        Payload 4                              |
1006	      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1007	      |                                                               |
1008	      |                        Payload 5                              |
1009	      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1010	      |                                                               |
1011	      |                        Payload 6                              |
1012	      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1013	      |                                                               |
1014	      |                        Payload 7                              |
1015	      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

1017	   fields, each of which indicates a length for its corresponding block
1018	   (including the 12-byte RTP header). The number of such 16-bit length
1019	   fields is determined by continuing to read length fields until the
1020	   total length of the packet passed up from UDP has been accounted
1021	   for. If an odd number of length fields is required, an additional
1022	   16 bits of padding is inserted to make the length header a multiple
1023	   of 32 bits.

1025	   The format of such a packet is given in Figure 9.

1027	         0                   1                   2                   3
1028	        0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
1029	      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1030	      |           Length 1            |         Length 2              |
1031	      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1032	      |           Length 3            |            PAD                |
1033	      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1034	      |V=2|P|X|  CC   |M|     PT      |       sequence number         |
1035	      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1036	      |                           timestamp                           |
1037	      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1038	      |           synchronization source (SSRC) identifier            |
1039	      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1040	      |                                                               |
1041	      |                        Payload 1                              |
1042	      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1043	      |V=2|P|X|  CC   |M|     PT      |       sequence number         |
1044	      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1045	      |                           timestamp                           |
1046	      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1047	      |           synchronization source (SSRC) identifier            |
1048	      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1049	      |                                                               |
1050	      |                        Payload 2                              |
1051	      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

1053	   Figure 9: Option IV
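
The length-header procedure described for Option IV can be sketched as follows. This is a minimal illustration, not part of the proposal: the function name split_option_iv and the error handling are hypothetical, and it assumes that each length counts the 12-byte RTP header plus its payload and that every block carries at least a full RTP header.

```python
import struct

RTP_HEADER_LEN = 12  # fixed RTP header, assuming no CSRC entries


def split_option_iv(packet):
    """Split an Option IV packet (the UDP payload) into its RTP blocks."""
    total = len(packet)
    lengths = []
    offset = 0
    while True:
        if offset + 2 > total:
            raise ValueError("ran out of data while reading length fields")
        # Each length field is 16 bits, network byte order.
        (length,) = struct.unpack_from("!H", packet, offset)
        offset += 2
        if length < RTP_HEADER_LEN:
            raise ValueError("block shorter than an RTP header")
        lengths.append(length)
        # The length header is padded to a multiple of 32 bits.
        header_len = (offset + 3) & ~3
        accounted = header_len + sum(lengths)
        if accounted == total:
            break  # the whole packet has been accounted for
        if accounted > total:
            raise ValueError("length fields exceed the packet size")
    blocks = []
    pos = header_len
    for length in lengths:
        blocks.append(packet[pos:pos + length])
        pos += length
    return blocks
```

Because every block is at least 12 bytes long, the stopping test is unambiguous: if another length field followed, the packet would have to be longer than the total accounted for so far.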

1055	5.4 Option V: Compacted

1057	   This option uses the Implicit + Mask approach outlined in Section
1058	   4.4.3.2 to code the channel ID. In all other respects it is similar
1059	   to Option III. Now, however, the per-block header can be reduced to
1060	   one byte: 1 bit of expansion, 1 bit of marker, and 6 bits of payload
1061	   type. Furthermore, the length field (present when the expansion bit
1062	   is set) is reduced from 16 bits in Option III to 8 bits. This
1063	   reduction saves space, and it also guarantees that fields remain
1064	   aligned on byte boundaries. The mask bits are present at the
1065	   beginning of the packet, preceded by an 8-bit state number. If the
1066	   number of active channels is not a multiple of 32, the mask field is
1067	   padded out to a full word. This approach is extremely efficient, but
1068	   the channel identification procedure is more complex and requires
1069	   additional signaling support.

1071	   A diagram of a typical packet for this option is given in Figure 10.
1072	   The mask bits are indicated with lowercase m's. There are four active
1073	   channels, each of which is present in this packet (so all four mask
1074	   bits are 1). The first block has a standard length, but the second
1075	   has its expansion bit set, so an 8-bit length field follows. The
1076	   remaining two blocks have normal 8-bit headers. The last 24 bits of
1077	   the header are padding to a word boundary.
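
The header layout just described might be decoded along the following lines. This is only a sketch under stated assumptions: the function name parse_option_v_header is hypothetical, the input is taken to be the bytes that follow the 12-byte RTP header, bits are read most significant first, the state number and mask bits together are assumed to be padded to a 32-bit word as drawn in Figure 10, and the unit of the 8-bit length field is left uninterpreted. The mapping from mask positions to channel IDs (Section 4.4.3.2) is not modeled.

```python
def parse_option_v_header(data, n_active):
    """Decode state number, mask bits and per-block headers (Option V)."""
    state_num = data[0]
    # One mask bit per active channel, following the state number.
    mask = []
    for i in range(n_active):
        byte = data[1 + i // 8]
        mask.append((byte >> (7 - i % 8)) & 1)
    # Assumption: state number plus mask occupy a whole number of words.
    offset = ((8 + n_active + 31) // 32) * 4
    start = offset
    headers = []
    for _ in range(sum(mask)):  # one header per channel present
        b = data[offset]
        offset += 1
        expansion = b >> 7
        marker = (b >> 6) & 1
        payload_type = b & 0x3F
        length = None
        if expansion:
            # An 8-bit length field follows when the expansion bit is set.
            length = data[offset]
            offset += 1
        headers.append({"marker": marker, "pt": payload_type,
                        "length": length})
    # The per-block headers are padded to a word boundary.
    offset = start + ((offset - start + 3) // 4) * 4
    return state_num, mask, headers, offset
```

For the four-channel example in the text, the four one-byte headers plus one length byte occupy five bytes, so the returned offset reflects 24 bits of padding.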

1079	         0                   1                   2                   3
1080	        0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
1081	      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1082	      |V=2|P|X|  CC   |M|     PT      |       sequence number         |
1083	      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1084	      |                           timestamp                           |
1085	      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1086	      |           synchronization source (SSRC) identifier            |
1087	      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1088	      |  State Num    |m|m|m|m|               Pad                     |
1089	      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1090	      |E|M|    PT     |E|M|      PT   |    Length     |E|M|   PT      |
1091	      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1092	      |E|M|    PT     |                   PAD                         |
1093	      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1094	      |                           timestamp                           |
1095	      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1096	      |                                                               |
1097	      |                        Block 1                                |
1098	      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1099	      |                        Block 2                                |
1100	      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1101	      |                        Block 3                                |
1102	      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1103	      |                                                               |
1104	      |                        Block 4                                |
1105	      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

1107	   Figure 10: Option V

1109	6 Comparison of Options

1111	   In this section, the options are compared in terms of efficiency.
1112	   Issues relating to complexity, scalability, and generality have
1113	   already been discussed in previous sections. The analysis here
1114	   consists of a series of tables indicating the efficiency of each
1115	   option for a variety of speech codecs. Separate tables are given
1116	   for different numbers of users.

1118	6.1 Specific Codecs

1120	   Tables 1 and 2 tabulate the efficiency of each option for a range
1121	   of codecs. For G.711, G.726, G.728 and G.722, the frame size listed
1122	   is a multiple of the actual frame size of the codec, which is too
1123	   small for frames to be sent one at a time. The efficiency is
1124	   computed as the number of words of payload such a codec would
1125	   occupy, times the number of users, divided by the total packet size
1126	   (i.e., it does not consider inefficiencies due to padding the
1127	   payload portion). Note that Option V is always superior in
1128	   efficiency. The efficiencies are generally 1 to 10 percent apart.
1129	   Table 1 considers the case where there are 10 users, and Table 2
1130	   considers the case where there are 24.

1132	   Codec|rate|Frame(ms)|   I  |I.C   |I.D   |  II  | III  |  IV  | V
1133	   G.711| 64 |   20    |93.02 |94.56 |94.12 |95.24 |96.39 |90.50 |96.84
1134	   G.726| 32 |   20    |86.96 |89.69 |88.89 |90.91 |93.02 |82.64 |93.88
1135	   G.728| 16 |  18.75  |76.92 |81.30 |80.00 |83.33 |86.96 |70.42 |88.47
1136	   G.729|  8 |   10    |50.00 |56.60 |54.55 |60.00 |66.67 |41.67 |69.72
1137	   G.723|5.3 |   30    |62.50 |68.49 |66.67 |71.43 |76.92 |54.35 |79.33
1138	   G.723|6.3 |   30    |66.67 |72.29 |70.59 |75.00 |80.00 |58.82 |82.16
1139	   ITU  | 4  |   20    |50.00 |56.60 |54.55 |60.00 |66.67 |41.67 |69.72
1140	   G.722| 64 |   15    |90.91 |92.88 |92.31 |93.75 |95.24 |87.72 |95.84
1141	   GSM F| 13 |   20    |75.00 |79.65 |78.26 |81.82 |85.71 |68.18 |87.35
1142	   IS54 |7.95|   20    |62.50 |68.49 |66.67 |71.43 |76.92 |54.35 |79.33
1143	   IS96 |8.5 |   20    |66.67 |72.29 |70.59 |75.00 |80.00 |58.82 |82.16

1145	                          Table 1: 10 Users

1147	   Codec|rate|Frame(ms)|   I  |I.C   |I.D   |  II  | III  |  IV  | V
1148	   G.711| 64 |   20    |94.30 |96.00 |95.43 |96.58 |97.76 |91.34 |98.26
1149	   G.726| 32 |   20    |89.22 |92.31 |91.25 |93.39 |95.62 |84.06 |96.57
1150	   G.728| 16 |  18.75  |80.54 |85.71 |83.92 |87.59 |91.60 |72.51 |93.37
1151	   G.729|  8 |   10    |55.38 |64.29 |61.02 |67.92 |76.60 |44.17 |80.87
1152	   G.723|5.3 |   30    |67.42 |75.00 |72.29 |77.92 |84.51 |56.87 |87.57
1153	   G.723|6.3 |   30    |71.29 |78.26 |75.79 |80.90 |86.75 |61.28 |89.42
1154	   ITU  | 4  |   20    |55.38 |64.29 |61.02 |67.92 |76.60 |44.17 |80.87
1155	   G.722| 64 |   15    |92.54 |94.74 |93.99 |95.49 |97.04 |88.78 |97.69
1156	   GSM F| 13 |   20    |78.83 |84.38 |82.44 |86.40 |90.76 |70.36 |92.69
1157	   IS54 |7.95|   20    |67.42 |75.00 |72.29 |77.92 |84.51 |56.87 |87.57
1158	   IS96 |8.5 |   20    |71.29 |78.26 |75.79 |80.90 |86.75 |61.28 |89.42

1160	                         Table 2: 24 Users
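
As a worked check of the efficiency definition, the sketch below recomputes the Option IV column. It rests on assumptions that are not stated in the text: rates are in kb/s, each frame is rounded up to whole 32-bit words, and the total packet size includes a 20-byte IPv4 header and an 8-byte UDP header. These assumptions were chosen because they appear to reproduce the tabulated Option IV values. The function names are hypothetical.

```python
import math

IP_UDP_WORDS = 5 + 2  # assumption: 20-byte IPv4 + 8-byte UDP, in words
RTP_WORDS = 3         # 12-byte RTP header carried by every block


def payload_words(rate_kbps, frame_ms):
    """Words of payload in one frame, rounded up to whole 32-bit words."""
    bits = rate_kbps * frame_ms  # kb/s times ms gives bits
    return math.ceil(bits / 32)


def efficiency_option_iv(rate_kbps, frame_ms, users):
    """Payload words times users, divided by total packet words (percent)."""
    payload = payload_words(rate_kbps, frame_ms) * users
    # One 16-bit length per block, padded to a whole number of words.
    length_words = math.ceil(users / 2)
    total = IP_UDP_WORDS + length_words + users * RTP_WORDS + payload
    return 100.0 * payload / total


for users in (10, 24):
    print(users, round(efficiency_option_iv(64, 20, users), 2))
```

For G.711 with 10 users this gives 400 payload words out of 442 in total, which matches the 90.50 entry in Table 1.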

1162	7 Authors' Addresses

1164	   Jonathan Rosenberg
1165	   Rm. 4C-526
1166	   Bell Laboratories, Lucent Technologies
1167	   101 Crawfords Corner Rd.
1168	   Holmdel, NJ 07733
1169	   electronic mail:  jdrosen@bell-labs.com

1171	   Henning Schulzrinne
1172	   Dept. of Computer Science
1173	   Columbia University
1174	   1214 Amsterdam Avenue
1175	   New York, NY 10027
1176	   USA
1177	   electronic mail:  schulzrinne@cs.columbia.edu

1179	8 Bibliography

1181	   [1] M. Handley and V. Jacobson, SDP: session description protocol,
1182	   Request for Comments (Proposed Standard) 2327, Internet Engineering
1183	   Task Force, Apr.  1998.

1185	   [2] H. Schulzrinne, S. Casner, R. Frederick, and V. Jacobson, RTP: a
1186	   transport protocol for real-time applications, Request for Comments
1187	   (Proposed Standard) 1889, Internet Engineering Task Force, Jan. 1996.

1189	   [3] B. Subbiah and S. Sengodan, User multiplexing in RTP payload
1190	   between IP telephony gateways, Internet Draft, Internet Engineering
1191	   Task Force, Aug. 1998.  Work in progress.

1193	   [4] J. Rosenberg and H. Schulzrinne, An RTP payload format for user
1194	   multiplexing, Internet Draft, Internet Engineering Task Force, May
1195	   1998.  Work in progress.

1197	   [5] K. Tanigawa, T. Hoshi, and K. Tsukada, An RTP simple multiplexing
1198	   transfer method for internet telephony gateway, Internet Draft,
1199	   Internet Engineering Task Force, June 1998.  Work in progress.

1201	   [6] H. Schulzrinne, RTP profile for audio and video conferences with
1202	   minimal control, Request for Comments (Proposed Standard) 1890,
1203	   Internet Engineering Task Force, Jan. 1996.