idnits 2.17.1 

draft-ietf-avt-mpeg4-simple-05.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

  ** Looks like you're using RFC 2026 boilerplate.  This must be updated to
     follow RFC 3978/3979, as updated by RFC 4748.


  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

  ** The document seems to lack a 1id_guidelines paragraph about 6 months
     document validity -- however, there's a paragraph with a matching
     beginning. Boilerplate error?

  == No 'Intended status' indicated for this document; assuming Proposed
     Standard

  == It seems as if not all pages are separated by form feeds - found 0 form
     feeds but 39 pages


  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

  ** The document seems to lack separate sections for Informative/Normative
     References.  All references will be assumed normative when checking for
     downward references.


  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == The "Author's Address" (or "Authors' Addresses") section title is
     misspelled.

  == Using lowercase 'not' together with uppercase 'MUST', 'SHALL', 'SHOULD',
     or 'RECOMMENDED' is not an accepted usage according to RFC 2119.  Please
     use uppercase 'NOT' together with RFC 2119 keywords (if that is what you
     mean).
     
     Found 'MUST not' in this paragraph:
     
     This mode is signaled by mode=CELP-cbr. In this mode one or more
     complete CELP frames of fixed size can be transported in one RTP packet;
     there is no support for interleaving. The RTP payload consists of one or
     more concatenated CELP frames, each of the same size. CELP frames MUST
     not be fragmented when using this mode. Both the AU Header Section and
     the Auxiliary Section MUST be empty.

  == Using lowercase 'not' together with uppercase 'MUST', 'SHALL', 'SHOULD',
     or 'RECOMMENDED' is not an accepted usage according to RFC 2119.  Please
     use uppercase 'NOT' together with RFC 2119 keywords (if that is what you
     mean).
     
     Found 'MUST not' in this paragraph:
     
     This mode is signaled by mode=CELP-vbr. With this mode one or more
     complete CELP frames of variable size can be transported in one RTP
     packet with optional interleaving. As CELP frames are very small, while
     the largest possible AU-size in this mode is greater than the maximum
     CELP frame size, there is no support for fragmentation of CELP frames.
     Hence CELP frames MUST not be fragmented when using this mode.

  == Using lowercase 'not' together with uppercase 'MUST', 'SHALL', 'SHOULD',
     or 'RECOMMENDED' is not an accepted usage according to RFC 2119.  Please
     use uppercase 'NOT' together with RFC 2119 keywords (if that is what you
     mean).
     
     Found 'MUST not' in this paragraph:
     
     This mode is signaled by mode=AAC-lbr. This mode supports transport
     of one or more complete AAC frames of variable size. In this mode the AAC
     frames are allowed to be interleaved and hence receivers MUST support
     de-interleaving. The maximum size of an AAC frame in this mode is 63
     octets. AAC frames MUST not be fragmented when using this mode.

  -- The document seems to lack a disclaimer for pre-RFC5378 work, but may
     have content which was first submitted before 10 November 2008.  If you
     have contacted all the original authors and they are all willing to grant
     the BCP78 rights to the IETF Trust, then this is fine, and you can ignore
     this comment.  If not, you may need to add the pre-RFC5378 disclaimer. 
     (See the Legal Provisions document at
     https://trustee.ietf.org/license-info for more information.)

  -- The document date (June 2003) is 7620 days in the past.  Is this
     intentional?


  Checking references for intended status: Proposed Standard
  ----------------------------------------------------------------------------

     (See RFCs 3967 and 4897 for information about using normative references
     to lower-maturity documents in RFCs)

  == Missing Reference: '0' is mentioned on line 1875, but not defined

  == Missing Reference: '9' is mentioned on line 1777, but not defined

  == Missing Reference: '10' is mentioned on line 1817, but not defined

  == Missing Reference: '11' is mentioned on line 1880, but not defined

  == Missing Reference: '15' is mentioned on line 1881, but not defined

  == Missing Reference: '19' is mentioned on line 1882, but not defined

  -- Possible downref: Non-RFC (?) normative reference: ref. '1'

  ** Obsolete normative reference: RFC 1889 (ref. '2') (Obsoleted by RFC 3550)

  ** Obsolete normative reference: RFC 3016 (ref. '5') (Obsoleted by RFC 6416)

  ** Obsolete normative reference: RFC 2327 (ref. '6') (Obsoleted by RFC 4566)

  ** Downref: Normative reference to an Experimental RFC: RFC 2974 (ref. '7')

  ** Obsolete normative reference: RFC 2326 (ref. '8') (Obsoleted by RFC 7826)


     Summary: 8 errors (**), 0 flaws (~~), 12 warnings (==), 3 comments (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.

--------------------------------------------------------------------------------

1	Internet Engineering Task Force                         J. van der Meer
2	Internet Draft                                      Philips Electronics
3	                                                              D. Mackie
4	                                                         Apple Computer
5	                                                         V. Swaminathan
6	                                                  Sun Microsystems Inc.
7	                                                              D. Singer
8	                                                         Apple Computer
9	                                                             P. Gentric
10	                                                    Philips Electronics

12	                                                          December 2002
13	                                                      Expires June 2003

15	   Document: draft-ietf-avt-mpeg4-simple-05.txt

17	   Transport of MPEG-4 Elementary Streams

19	Status of this Memo

21	   This document is an Internet-Draft and is in full conformance with
22	   all provisions of section 10 of RFC2026.

24	   Internet-Drafts are working documents of the Internet Engineering
25	   Task Force (IETF), its areas, and its working groups. Note that
26	   other groups may also distribute working documents as Internet-
27	   Drafts. Internet-Drafts are draft documents valid for a maximum of
28	   six months and may be updated, replaced, or obsoleted by other
29	   documents at any time. It is inappropriate to use Internet-Drafts
30	   as reference material or to cite them other than as "work in
31	   progress."

33	   The list of current Internet-Drafts can be accessed at
34	   http://www.ietf.org/ietf/1id-abstracts.txt
35	   The list of Internet-Draft Shadow Directories can be accessed at
36	   http://www.ietf.org/shadow.html.

38	   This specification is a product of the Audio/Video Transport working
39	   group within the Internet Engineering Task Force. Comments are
40	   solicited and should be addressed to the working group's mailing
41	   list at avt@ietf.org and/or the authors.

43	   << Note for the RFC editor: xxxx should be replaced with the RFC
44	   number that will be assigned. >>

46	Abstract

48	   The MPEG Committee (ISO/IEC JTC1/SC29 WG11) is a working group in
49	   ISO that produced the MPEG-4 standard. MPEG defines tools to
50	   compress content such as audio-visual information into elementary
51	   streams. This specification defines a simple, but generic RTP
52	   payload format for transport of any non-multiplexed MPEG-4
53	   elementary stream.

55	RFC xxxx        Transport of MPEG-4 Elementary Streams    December 2002

57	Table of Contents

59	   1.  Introduction . . . . . . . . . . . . . . . . . . . . . . . .   4
60	   2.  Carriage of MPEG-4 elementary streams over RTP . . . . . . .   6
61	   2.1.  Introduction . . . . . . . . . . . . . . . . . . . . . . .   6
62	   2.2.  MPEG Access Units  . . . . . . . . . . . . . . . . . . . .   6
63	   2.3.  Concatenation of Access Units  . . . . . . . . . . . . . .   6
64	   2.4.  Fragmentation of Access Units  . . . . . . . . . . . . . .   7
65	   2.5.  Interleaving . . . . . . . . . . . . . . . . . . . . . . .   7
66	   2.6.  Time stamp information . . . . . . . . . . . . . . . . . .   8
67	   2.7.  State indication of MPEG-4 system streams  . . . . . . . .   8
68	   2.8.  Random Access Indication . . . . . . . . . . . . . . . . .   8
69	   2.9.  Carriage of auxiliary information  . . . . . . . . . . . .   9
70	   2.10. MIME format parameters and configuring conditional field .   9
71	   2.11. Global structure of payload format . . . . . . . . . . . .   9
72	   2.12. Modes to transport MPEG-4 streams  . . . . . . . . . . . .  10
73	   2.13. Alignment with RFC 3016  . . . . . . . . . . . . . . . . .  10
74	   3.  Payload format . . . . . . . . . . . . . . . . . . . . . . .  11
75	   3.1.  Usage of RTP header fields and RTCP  . . . . . . . . . . .  11
76	   3.2.  RTP payload structure  . . . . . . . . . . . . . . . . . .  12
77	   3.2.1.  The AU Header Section  . . . . . . . . . . . . . . . . .  12
78	   3.2.1.1.  The AU-header  . . . . . . . . . . . . . . . . . . . .  12
79	   3.2.2.  The Auxiliary Section  . . . . . . . . . . . . . . . . .  14
80	   3.2.3.  The Access Unit Data Section . . . . . . . . . . . . . .  15
81	   3.2.3.1.  Fragmentation  . . . . . . . . . . . . . . . . . . . .  16
82	   3.2.3.2.  Interleaving . . . . . . . . . . . . . . . . . . . . .  16
83	   3.2.3.3.  Constraints for interleaving . . . . . . . . . . . . .  17
84	   3.3.  Usage of this specification  . . . . . . . . . . . . . . .  21
85	   3.3.1.  General  . . . . . . . . . . . . . . . . . . . . . . . .  21
86	   3.3.2.  The generic mode . . . . . . . . . . . . . . . . . . . .  21
87	   3.3.3.  Constant bit rate CELP . . . . . . . . . . . . . . . . .  22
88	   3.3.4.  Variable bit rate CELP . . . . . . . . . . . . . . . . .  22
89	   3.3.5.  Low bit rate AAC . . . . . . . . . . . . . . . . . . . .  23
90	   3.3.6.  High bit rate AAC  . . . . . . . . . . . . . . . . . . .  24
91	   3.3.7.  Additional modes . . . . . . . . . . . . . . . . . . . .  25
92	   4.  IANA considerations  . . . . . . . . . . . . . . . . . . . .  26
93	   4.1.  MIME type registration . . . . . . . . . . . . . . . . . .  26
94	   4.2.  Registration of mode definitions with IANA . . . . . . . .  31
95	   4.3.  Concatenation of parameters  . . . . . . . . . . . . . . .  31
96	   4.4.  Usage of SDP . . . . . . . . . . . . . . . . . . . . . . .  32
97	   4.4.1.  The a=fmtp keyword . . . . . . . . . . . . . . . . . . .  32
98	   5.  Security considerations  . . . . . . . . . . . . . . . . . .  32
99	   6.  Acknowledgements . . . . . . . . . . . . . . . . . . . . . .  33
100	   7.  References . . . . . . . . . . . . . . . . . . . . . . . . .  33
101	   8.  Author addresses . . . . . . . . . . . . . . . . . . . . . .  34

103	       APPENDIX: Usage of this payload format . . . . . . . . . . .  36
104	       A. Examples of delay analysis with interleave  . . . . . . .  36
105	       A.1 Introduction . . . . . . . . . . . . . . . . . . . . . .  36
106	       A.2 De-interleaving and error concealment  . . . . . . . . .  36
110	       A.3 Simple Group interleave  . . . . . . . . . . . . . . . .  36
111	       A.3.1 Introduction . . . . . . . . . . . . . . . . . . . . .  36
112	       A.3.2 Determining the de-interleave buffer size  . . . . . .  37
113	       A.3.3 Determining the maximum displacement . . . . . . . . .  37
114	       A.4 More subtle group interleave . . . . . . . . . . . . . .  37
115	       A.4.1 Introduction . . . . . . . . . . . . . . . . . . . . .  37
116	       A.4.2 Determining the de-interleave buffer size  . . . . . .  38
117	       A.4.3 Determining the maximum displacement . . . . . . . . .  38
118	       A.5 Continuous interleave  . . . . . . . . . . . . . . . . .  38
119	       A.5.1 Introduction . . . . . . . . . . . . . . . . . . . . .  38
120	       A.5.2 Determining the de-interleave buffer size  . . . . . .  39
121	       A.5.3 Determining the maximum displacement . . . . . . . . .  39


125	1. Introduction

127	   The MPEG Committee is Working Group 11 (WG11) in ISO/IEC JTC1 SC29
128	   that specified the MPEG-1, MPEG-2 and, more recently, the MPEG-4
129	   standards [1]. The MPEG-4 standard specifies compression of
130	   audio-visual data into, for example, an audio or video elementary
131	   stream. In the MPEG-4 standard, these streams take the form of
132	   audio-visual objects that may be arranged into an audio-visual scene
133	   by means of a scene description. Each MPEG-4 elementary stream
134	   consists of a sequence of Access Units; examples of an Access Unit
135	   (AU) are an audio frame and a video picture.

137	   This specification defines a general and configurable payload
138	   structure to transport MPEG-4 elementary streams, in particular
139	   MPEG-4 audio (including speech) streams, MPEG-4 video streams and
140	   also MPEG-4 systems streams, such as BIFS (BInary Format for
141	   Scenes), OCI (Object Content Information), OD (Object Descriptor)
142	   and IPMP (Intellectual Property Management and Protection) streams.
143	   The RTP payload defined in this document is simple to implement and
144	   reasonably efficient. It allows for optional interleaving of Access
145	   Units (such as audio frames) to increase error resiliency to packet
146	   loss.

148	   Some types of MPEG-4 elementary streams include "crucial"
149	   information whose loss cannot be tolerated, but RTP does not provide
150	   reliable transmission so receipt of that crucial information is not
151	   assured.  Section 3.2.3.4 specifies how stream state is conveyed so
152	   that the receiver can detect the loss of crucial information and
153	   cease decoding until the next random access point is received.
154	   Applications transmitting streams that include crucial information,
155	   such as OD commands, BIFS commands, or programmatic content such as
156	   MPEG-J (Java) and ECMAScript, should include random access points
157	   sufficiently often, depending upon the probability of loss, to
158	   reduce stream corruption to an acceptable level.  An example is the
159	   carousel mechanism as defined by MPEG in ISO/IEC 14496-1.

161	   Such applications may also employ additional protocols or services
162	   to reduce the probability of loss.  At the RTP layer, these measures
163	   include payload formats and profiles for retransmission or forward
164	   error correction (such as in RFC 2733), which must be employed with
165	   due consideration to congestion control.  Another solution that may
166	   be appropriate for some applications is to carry RTP over TCP (such
167	   as in RFC 2326, section 10.12).  At the network layer, resource
168	   allocation or preferential service may be available to reduce the
169	   probability of loss.  For a general description of methods to repair
170	   streaming media see RFC 2354.


174	   Though the RTP payload format defined in this document is capable
175	   of transporting any MPEG-4 stream, other, more specific, formats
176	   may exist, such as RFC 3016 for transport of MPEG-4 video (part 2).

178	   Configuration of the payload is provided to accommodate transport
179	   of any MPEG-4 stream at any possible bit rate. However, for a
180	   specific MPEG-4 elementary stream typically only very few
181	   configurations are needed. So as to allow for the design of
182	   simplified, but dedicated receivers, this specification requires
183	   that specific modes are defined for transport of MPEG-4 streams.
184	   This document defines modes for MPEG-4 CELP and AAC streams, as
185	   well as a generic mode that can be used to transport any MPEG-4
186	   stream. In the future new RFCs are expected to specify additional
187	   modes for transport of MPEG-4 streams.

189	   The RTP payload format defined in this document specifies carriage
190	   of system-related information that is often equivalent to the
191	   information that may be contained in the MPEG-4 Sync Layer (SL) as
192	   defined in MPEG-4 Systems [1]. This document does not prescribe how
193	   to transcode or map information from the SL to fields defined in
194	   the RTP payload format. Such processing, if any, is left to the
195	   discretion of the application. However, to anticipate the need for
196	   transport of any additional system-related information in the
197	   future, an auxiliary field can be configured to carry any such data.

199	   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
200	   "SHOULD", "SHOULD NOT", "RECOMMENDED",  "MAY", and "OPTIONAL" in
201	   this document are to be interpreted as described in RFC 2119 [3].


205	2. Carriage of MPEG-4 elementary streams over RTP

207	2.1 Introduction

209	   With this payload format a single MPEG-4 elementary stream can be
210	   transported. Information on the type of MPEG-4 stream carried in
211	   the payload is conveyed by MIME format parameters, for example in
212	   an SDP [6] message or by other means (see section 4). These MIME
213	   format parameters specify the configuration of the payload. To
214	   allow for simplified and dedicated receivers, a MIME format
215	   parameter is available to signal a specific mode of using this
216	   payload. A mode definition MAY include the type of MPEG-4
217	   elementary stream as well as the applied configuration, so as to
218	   avoid the need in receivers to parse all MIME format parameters.
219	   The applied mode MUST be signaled.

221	2.2 MPEG Access Units

223	   For carriage of compressed audio-visual data MPEG defines Access
224	   Units. An MPEG Access Unit (AU) is the smallest data entity to
225	   which timing information is attributed. In case of audio an Access
226	   Unit may represent an audio frame and in case of video a picture.
227	   MPEG Access Units are by definition octet-aligned. If for example
228	   an audio frame is not octet-aligned, up to 7 zero-padding bits MUST
229	   be inserted at the end of the frame to achieve the octet-aligned
230	   Access Units, as required by the MPEG-4 specification. MPEG-4
231	   decoders MUST be able to decode AUs in which such padding is
232	   applied.
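The padding rule above can be illustrated with a minimal Python sketch. The function name and the representation of a frame as a string of '0' and '1' characters are assumptions made for this example only; they are not part of the payload format.

```python
def pad_to_octet_boundary(frame_bits: str) -> bytes:
    """Append up to 7 zero bits so that the frame fills whole octets.

    frame_bits is a hypothetical bit-string representation, e.g. "10111".
    """
    padding = (-len(frame_bits)) % 8          # 0..7 zero-padding bits
    padded = frame_bits + "0" * padding
    # Convert each group of 8 bits into one octet.
    return bytes(int(padded[i:i + 8], 2) for i in range(0, len(padded), 8))
```

A 5-bit frame therefore receives 3 padding bits and occupies one octet, while a frame that is already octet-aligned is left unchanged.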

234	   Consistent with the MPEG-4 specification, this document requires
235	   that each MPEG-4 part 2 video Access Unit includes all the coded
236	   data of a picture, any video stream headers that may precede the
237	   coded picture data, and any video stream stuffing that may follow
238	   it, up to, but not including the startcode indicating the start of
239	   a new video stream or the next Access Unit.

241	2.3 Concatenation of Access Units

243	   Frequently it is possible to carry multiple Access Units in one RTP
244	   packet. This is particularly useful for audio; for example, when
245	   AAC is used for encoding of a stereo signal at 64 kbits/sec, AAC
246	   frames contain on average approximately 200 octets. On a LAN with a
247	   1500 octet MTU this would allow on average 7 complete AAC frames to
248	   be carried per RTP packet.

250	   Access Units may have a fixed size in octets, but a variable size
251	   is also possible. To facilitate parsing in case of multiple
252	   concatenated AUs in one RTP packet, the size of each AU is made
253	   known to the receiver. When concatenating in case of a constant AU
254	   size, this size is communicated "out of band" through a MIME format
255	   parameter. When concatenating in case of variable size AUs, the RTP
256	   payload carries "in band" an AU size field for each contained AU.


260	   In combination with the RTP payload length the size information
261	   allows the RTP payload to be split by the receiver back into the
262	   individual AUs.
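As a rough sketch of how a receiver might split the Access Unit data back into individual AUs, consider the Python functions below. They assume that the AU sizes are already known, either as one constant size signaled out of band or as a list of sizes recovered from the in-band size fields; parsing of those fields is not shown, and all names are hypothetical.

```python
from typing import List


def split_constant_size(au_data: bytes, au_size: int) -> List[bytes]:
    """Split concatenated AUs that all share one out-of-band size."""
    if au_size <= 0 or len(au_data) % au_size != 0:
        raise ValueError("payload does not hold an integral number of AUs")
    return [au_data[i:i + au_size] for i in range(0, len(au_data), au_size)]


def split_variable_size(au_data: bytes, au_sizes: List[int]) -> List[bytes]:
    """Split concatenated AUs using per-AU sizes carried in band."""
    if sum(au_sizes) != len(au_data):
        raise ValueError("AU sizes do not match the payload length")
    aus, offset = [], 0
    for size in au_sizes:
        aus.append(au_data[offset:offset + size])
        offset += size
    return aus
```

In both cases the payload length serves as a consistency check: a mismatch indicates a corrupt or misconfigured packet.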

264	   To simplify the implementation of RTP receivers, it is required
265	   that when multiple AUs are carried in an RTP packet, each AU MUST
266	   be complete, i.e. the number of AUs in an RTP packet MUST be
267	   integral. In addition, an AU MUST NOT be repeated in other RTP
268	   packets; hence repetition of an AU is only possible by using a
269	   duplicate RTP packet.

271	2.4 Fragmentation of Access Units

273	   MPEG allows for very large Access Units. Since most IP networks
274	   have significantly smaller MTU sizes, this payload format allows
275	   for the fragmentation of an Access Unit over multiple RTP packets
276	   so as to avoid IP layer fragmentation. To simplify the
277	   implementation of RTP receivers, an RTP packet SHALL either carry
278	   one or more complete Access Units or a single fragment of one
279	   Access Unit (i.e. packets MUST NOT contain fragments of multiple
280	   Access Units).
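The packetization constraint above can be sketched in Python as follows. This is an illustrative greedy sender, not a normative algorithm: the name max_payload is hypothetical, and the octets consumed by the AU Header Section and Auxiliary Section are ignored for simplicity.

```python
from typing import List, Tuple

# A packet is ("complete", [au, ...]) or ("fragment", [one_fragment]).
Packet = Tuple[str, List[bytes]]


def packetize(aus: List[bytes], max_payload: int) -> List[Packet]:
    """Group AUs so that no packet mixes fragments with other data."""
    if max_payload <= 0:
        raise ValueError("max_payload must be positive")
    packets: List[Packet] = []
    pending: List[bytes] = []
    pending_size = 0
    for au in aus:
        if len(au) > max_payload:
            # Flush complete AUs first; fragments travel alone.
            if pending:
                packets.append(("complete", pending))
                pending, pending_size = [], 0
            for i in range(0, len(au), max_payload):
                packets.append(("fragment", [au[i:i + max_payload]]))
            continue
        if pending_size + len(au) > max_payload:
            packets.append(("complete", pending))
            pending, pending_size = [], 0
        pending.append(au)
        pending_size += len(au)
    if pending:
        packets.append(("complete", pending))
    return packets
```

Every packet produced this way carries either one or more complete AUs or exactly one fragment of a single AU.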

282	2.5 Interleaving

284	   When an RTP packet carries a contiguous sequence of Access Units,
285	   the loss of such a packet can result in a "decoding gap" for the
286	   user. One method to alleviate this problem is to allow for the
287	   Access Units to be interleaved in the RTP packets. For a modest
288	   cost in latency and implementation complexity, significant error
289	   resiliency to packet loss can be achieved.

291	   To support optional interleaving of Access Units, this payload
292	   format allows for index information to be sent for each Access Unit.
293	   After informing receivers about buffer resources to allocate for
294	   de-interleaving, the RTP sender is free to choose the interleaving
295	   pattern without propagating this information a priori to the
296	   receiver(s). Indeed the sender could dynamically adjust the
297	   interleaving pattern based on the Access Unit size, error rates,
298	   etc. The RTP receiver does not need to know the interleaving
299	   pattern used; it only needs to extract the index information of the
300	   Access Unit and insert the Access Unit into the appropriate
301	   sequence in the decoding or rendering queue. An example of
302	   interleaving is given below.

304	   Assume that an RTP packet contains 3 AUs, and that the AUs are
305	   numbered 0, 1, 2, 3, 4, etc. If an interleaving group length of 9 is
306	   chosen, then RTP packet(i) contains the following AU(n):
307	   RTP packet(0):  AU(0),  AU(3),  AU(6)
308	   RTP packet(1):  AU(1),  AU(4),  AU(7)
309	   RTP packet(2):  AU(2),  AU(5),  AU(8)
310	   RTP packet(3):  AU(9),  AU(12), AU(15)
311	   RTP packet(4):  AU(10), AU(13), AU(16)
312	   Etc.
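The pattern in this example can be reproduced with a short Python sketch. It assumes a simple group interleave in which the group length is a multiple of the number of AUs per packet; the function name and parameters are hypothetical, and a sender remains free to choose other patterns.

```python
from typing import List


def group_interleave(num_aus: int, aus_per_packet: int,
                     group_length: int) -> List[List[int]]:
    """Return the AU indices carried by each RTP packet, in send order."""
    if group_length % aus_per_packet != 0:
        raise ValueError("group length must be a multiple of AUs per packet")
    packets_per_group = group_length // aus_per_packet
    packets = []
    for group_start in range(0, num_aus, group_length):
        for p in range(packets_per_group):
            # Packet p takes every packets_per_group-th AU of the group.
            indices = [group_start + p + k * packets_per_group
                       for k in range(aus_per_packet)]
            indices = [n for n in indices if n < num_aus]
            if indices:
                packets.append(indices)
    return packets


def deinterleave(packets: List[List[int]]) -> List[int]:
    """Restore decoding order from the index carried with each AU."""
    return sorted(n for packet in packets for n in packet)
```

With 3 AUs per packet and a group length of 9, the first five packets carry AUs (0, 3, 6), (1, 4, 7), (2, 5, 8), (9, 12, 15) and (10, 13, 16), matching the listing above.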


316	2.6 Time stamp information

318	   The RTP time stamp MUST carry the sampling instant of the first AU
319	   (fragment) in the RTP packet. When multiple AUs are carried within
320	   an RTP packet, the time stamps of subsequent AUs can be calculated
321	   if the frame period of each AU is known. For audio and video this
322	   is possible if the frame rate is constant. However, in some cases
323	   it is not possible to make such a calculation, for example for
324	   variable frame rate video and for MPEG-4 BIFS streams carrying
325	   composition information. To support such cases, this payload format
326	   can be configured to carry a time stamp in the RTP payload for each
327	   contained Access Unit. A time stamp MAY be conveyed in the RTP
328	   payload only for non-first AUs in the RTP packet, and SHALL NOT be
329	   conveyed for the first AU (fragment), as the time stamp for the
330	   first AU in the RTP packet is carried by the RTP time stamp.
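For the constant frame rate case, the calculation described above can be sketched in Python as follows. The sketch assumes that the AUs in the packet are consecutive (no interleaving) and that the frame period is expressed in RTP clock ticks; the function and parameter names are hypothetical.

```python
from typing import List

RTP_TIMESTAMP_MODULUS = 2 ** 32  # RTP time stamps are 32-bit values


def au_timestamps(rtp_timestamp: int, num_aus: int,
                  frame_period_ticks: int) -> List[int]:
    """Derive a time stamp for each AU from the RTP time stamp.

    The first AU takes the RTP time stamp itself; each subsequent AU
    is one frame period later, wrapping around at 2**32.
    """
    return [(rtp_timestamp + i * frame_period_ticks) % RTP_TIMESTAMP_MODULUS
            for i in range(num_aus)]
```

For instance, with a hypothetical frame period of 1024 clock ticks, three AUs in a packet with RTP time stamp 1000 receive the time stamps 1000, 2024 and 3048.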

332	   MPEG-4 defines two types of time stamps, the composition time stamp
333	   (CTS) and the decoding time stamp (DTS). The CTS represents the
334	   sampling instant of an AU, and hence the CTS is equivalent to the
335	   RTP time stamp. The DTS may be used in MPEG-4 video streams that
336	   use bi-directional coding, i.e. when pictures are predicted in both
337	   forward and backward direction by using either a reference picture
338	   in the past, or a reference picture in the future. The DTS cannot
339	   be carried in the RTP header. In some cases the DTS can be derived
340	   from the RTP time stamp using frame rate information; this requires
341	   deep parsing in the video stream, which may be considered
342	   objectionable. But if the video frame rate is variable, the required
343	   information may not even be present in the video stream. For both
344	   reasons, the capability has been defined to optionally carry the
345	   DTS in the RTP payload for each contained Access Unit.

347	   To keep the coding of time stamps efficient, each time stamp
348	   contained in the RTP payload is coded differentially, the CTS from
349	   the RTP time stamp, and the DTS from the CTS.
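The differential coding can be pictured with the small Python sketch below. It is only an illustration: the sign convention (coded value minus its reference) is an assumption of this sketch, the field widths and time stamp wrap-around are ignored, and the function names are hypothetical.

```python
from typing import Tuple


def encode_deltas(rtp_timestamp: int, cts: int, dts: int) -> Tuple[int, int]:
    """Code the CTS relative to the RTP time stamp, the DTS relative to the CTS."""
    cts_delta = cts - rtp_timestamp   # assumed convention: value - reference
    dts_delta = dts - cts             # assumed convention: value - reference
    return cts_delta, dts_delta


def decode_deltas(rtp_timestamp: int, cts_delta: int,
                  dts_delta: int) -> Tuple[int, int]:
    """Recover the CTS and DTS from their deltas (inverse of encode_deltas)."""
    cts = rtp_timestamp + cts_delta
    dts = cts + dts_delta
    return cts, dts
```

Because both deltas are small relative to the full time stamps, they can be carried in far fewer bits than two complete 32-bit values.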

351	2.7 State indication of MPEG-4 system streams

353	   ISO/IEC 14496-1 defines states for MPEG-4 system streams. So as to
354	   convey state information when transporting MPEG-4 system streams,
355	   this payload format allows for the optional carriage in the RTP
356	   payload of the stream state for each contained Access Unit. Stream
357	   states are used to signal "crucial" AUs that carry information whose
358	   loss cannot be tolerated and are also useful when repeating AUs
359	   according to the carousel mechanism defined in ISO/IEC 14496-1.

361	2.8 Random access indication

363	   Random access to the content of MPEG-4 elementary streams may be
364	   possible at some but not all Access Units. To signal Access Units
365	   where random access is possible, a random access point flag can
369	   optionally be carried in the RTP payload for each contained Access
370	   Unit. Carriage of random access points is particularly useful for
371	   MPEG-4 system streams in combination with the stream state.

373	2.9 Carriage of auxiliary information

375	   This payload format defines a specific field to carry auxiliary
376	   data. The auxiliary data field is preceded by a field that specifies
377	   the length of the auxiliary data, so as to facilitate skipping of
378	   the data without parsing it. The coding of the auxiliary data is not
379	   defined in this document; instead the format, meaning and signaling
380	   of auxiliary information are expected to be specified in one or more
381	   future RFCs. Auxiliary information MUST NOT be transmitted until its
382	   format, meaning and signaling have been specified and its use has
383	   been signaled. Receivers that have knowledge of the auxiliary data
384	   MAY decode the auxiliary data, but receivers without knowledge of
385	   such data MUST skip the auxiliary data field.

387	2.10 MIME format parameters and configuring conditional fields

389	   To support the features described in the previous sections several
390	   fields are defined for carriage in the RTP payload. However, their
391	   use strongly depends on the type of MPEG-4 elementary stream that
392	   is carried. Sometimes a specific field is needed with a certain
393	   length, while in other cases such a field is not needed at all. To be
394	   efficient in either case, the fields to support these features are
395	   configurable by means of MIME format parameters. In general, a MIME
396	   format parameter defines the presence and length of the associated
397	   field. A length of zero indicates absence of the field. As a
398	   consequence, parsing of the payload requires knowledge of MIME
399	   format parameters. The MIME format parameters are conveyed to the
400	   receiver via SDP [6] messages, as specified in section 4.4.1, or
401	   through other means.
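A receiver might turn the signaled parameters into field lengths along the lines of the Python sketch below. The parameter names and values in the usage note are illustrative placeholders (the normative names are those registered in section 4.1), and treating an unsignaled length as zero is an assumption of this sketch.

```python
from typing import Dict


def parse_format_parameters(text: str) -> Dict[str, str]:
    """Parse a semicolon-separated list of name=value format parameters."""
    params = {}
    for item in text.split(";"):
        item = item.strip()
        if not item:
            continue
        name, _, value = item.partition("=")
        params[name.strip().lower()] = value.strip()
    return params


def field_length(params: Dict[str, str], name: str) -> int:
    """Return the configured length in bits; zero means the field is absent."""
    return int(params.get(name.lower(), "0"))
```

For example, parsing the hypothetical string "mode=generic; sizeLength=13; indexLength=0" yields a length of 13 for sizeLength and 0 for indexLength, so the receiver would expect a size field but no index field in each AU-header.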

403	2.11 Global structure of payload format

405	   The RTP payload following the RTP header contains three
406	   octet-aligned data sections, of which the first two MAY be empty.
407	   See figure 1.

409	          +---------+-----------+-----------+---------------+
410	          | RTP     | AU Header | Auxiliary | Access Unit   |
411	          | Header  | Section   | Section   | Data Section  |
412	          +---------+-----------+-----------+---------------+

414	                    <----------RTP Packet Payload----------->

416	   Figure 1: Data sections within an RTP packet

418	   The first data section is the AU (Access Unit) Header Section, which
419	   contains one or more AU-headers; however, each AU-header MAY be
420	   empty, in which case the entire AU Header Section is empty. The
424	   second section is the Auxiliary Section, containing auxiliary data;
425	   this section MAY also be configured empty. The third section is the
426	   Access Unit Data Section, containing either a single fragment of
427	   one Access Unit or one or more complete Access Units. The Access
428	   Unit Data Section MUST NOT be empty.

430	2.12 Modes to transport MPEG-4 streams

432	   While it is possible to build fully configurable receivers capable
433	   of receiving any MPEG-4 stream, this specification also allows for
434	   the design of simplified, but dedicated receivers, that are capable
435	   for example of receiving only one type of MPEG-4 stream. This
436	   is achieved by requiring that specific modes be defined for using
437	   this specification. Each mode may define constraints for transport
438	   of one or more types of MPEG-4 streams, for instance on the payload
439	   configuration.

441	   The applied mode MUST be signaled. Signaling the mode is
442	   particularly important for receivers that are only capable of
443	   decoding one or more specific modes. Such receivers need to
444	   determine whether the applied mode is supported, so as to avoid
445	   problems with processing of payloads that are beyond the
446	   capabilities of the receiver.

448	   In this document several modes are defined for transport of MPEG-4
449	   CELP and AAC streams, as well as a generic mode that can be used
450	   for any MPEG-4 stream. In the future, new RFCs may specify other
451	   modes of using this specification. However, each mode MUST be in
452	   full compliance with this specification (see section 3.3.7).

454	2.13 Alignment with RFC 3016

456	   This payload can be configured to be nearly identical to the
457	   payload format defined in RFC 3016 [5] for the MPEG-4 video
458	   configurations recommended in RFC 3016. Hence, receivers that
459	   comply with RFC 3016 can decode such an RTP payload, provided that
460	   additional packets containing video decoder configuration (VO,
461	   VOL, VOSH) are inserted in the stream, as required by RFC 3016.
462	   Conversely, receivers that comply with the specification in this
463	   document should be able to decode payloads, names and parameters
464	   defined for MPEG-4 video in RFC 3016. In this respect it is
465	   strongly RECOMMENDED to implement the ability to ignore "in band"
466	   video decoder configuration packets in the RFC 3016 payload.

468	   Note that the "out of band" availability of the video decoder
469	   configuration is optional in RFC 3016. To achieve maximum
470	   interoperability with the RTP payload format defined in this
471	   document, applications that use RFC 3016 to transport MPEG-4 video
472	   (part 2) are recommended to make the video decoder configuration
473	   available as a MIME parameter.

475	RFC xxxx        Transport of MPEG-4 Elementary Streams    December 2002

477	3. Payload Format

479	3.1 Usage of RTP Header Fields and RTCP

481	   Payload Type (PT): The assignment of an RTP payload type for this
482	   packet format is outside the scope of this document; it is
483	   specified by the RTP profile under which this payload format is
484	   used.

486	   Marker (M) bit: The M bit is set to 1 to indicate that the RTP
487	   packet payload contains either the final fragment of a fragmented
488	   Access Unit or one or more complete Access Units.

490	   Extension (X) bit: Defined by the RTP profile used.

492	   Sequence Number: The RTP sequence number SHOULD be generated by the
493	   sender in the usual manner with a constant random offset.

495	   Timestamp: Indicates the sampling instant of the first AU
496	   contained in the RTP payload. This sampling instant is equivalent
497	   to the CTS in the MPEG-4 time domain. When using SDP the clock rate
498	   of the RTP time stamp MUST be expressed using the "rtpmap"
499	   attribute. If an MPEG-4 audio stream is transported, the rate SHOULD
500	   be set to the same value as the sampling rate of the audio stream.
501	   If an MPEG-4 video stream is transported, it is RECOMMENDED to set
502	   the rate to 90 kHz.

504	   In all cases, the sender SHALL make sure that RTP time stamps
505	   are identical only if the RTP time stamp refers to fragments of the
506	   same Access Unit.

508	   According to RFC 1889 [2] (section 5.1), RTP time stamps are
509	   RECOMMENDED to start at a random value for security reasons. This
510	   is not an issue for synchronization of multiple RTP streams. When,
511	   however, streams from multiple sources are to be synchronized (for
512	   example one stream from local storage, another from an RTP streaming
513	   server), synchronization may become impossible if the receiver only
514	   knows the original time stamp relationships. In such cases,
515	   synchronization may require that the correct relationship between
516	   time stamps be provided by out-of-band means. The format of such
517	   information, as well as methods to convey it, are beyond the
518	   scope of this specification.

520	   SSRC: set as described in RFC 1889 [2].

522	   CC and CSRC fields are used as described in RFC 1889 [2].

524	   RTCP SHOULD be used as defined in RFC 1889 [2]. Note that time
525	   stamps in RTCP Sender Reports may be used to synchronize multiple
529	   MPEG-4 elementary streams and also to synchronize MPEG-4 streams
530	   with non-MPEG-4 streams, in case the delivery of these streams uses
531	   RTP.

533	3.2 RTP Payload Structure

535	3.2.1 The AU Header Section

537	   When present, the AU Header Section consists of the
538	   AU-headers-length field, followed by a number of AU-headers. See
539	   figure 2.

541	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+- .. -+-+-+-+-+-+-+-+-+-+
542	   |AU-headers-length|AU-header|AU-header|      |AU-header|padding|
543	   |                 |   (1)   |   (2)   |      |   (n)   | bits  |
544	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+- .. -+-+-+-+-+-+-+-+-+-+

546	   Figure 2: The AU Header Section

548	   The AU-headers are configured using MIME format parameters and MAY
549	   be empty. If the AU-header is configured empty, the
550	   AU-headers-length field SHALL NOT be present and consequently the
551	   AU Header Section is empty. If the AU-header is not configured
552	   empty, then the AU-headers-length is a two octet field that
553	   specifies the length in bits of the immediately following
554	   AU-headers, excluding the padding bits.

556	   Each AU-header is associated with a single Access Unit (fragment)
557	   contained in the Access Unit Data Section in the same RTP packet.
558	   For each contained Access Unit (fragment) there is exactly one
559	   AU-header. Within the AU Header Section, the AU-headers are
560	   bit-wise concatenated in the order in which the Access Units are
561	   contained in the Access Unit Data Section. Hence, the n-th
562	   AU-header refers to the n-th AU (fragment). If the concatenated
563	   AU-headers consume a non-integer number of octets, up to 7
564	   zero-padding bits MUST be inserted at the end in order to achieve
565	   octet-alignment of the AU Header Section.
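
The layout of the AU Header Section described above can be sketched in
Python as follows. This is a minimal illustration, not a normative
implementation; the function names are hypothetical, and each AU-header
is assumed to be supplied as a string of "0" and "1" characters.

```python
def build_au_header_section(au_headers):
    """Build the AU Header Section from AU-headers given as bit strings."""
    bits = "".join(au_headers)
    if not bits:
        # AU-header configured empty: no AU-headers-length field at all.
        return b""
    if len(bits) > 0xFFFF:
        raise ValueError("AU-headers do not fit the two-octet length field")
    # Up to 7 zero-padding bits achieve octet alignment.
    padding = -len(bits) % 8
    padded = bits + "0" * padding
    body = int(padded, 2).to_bytes(len(padded) // 8, "big")
    # AU-headers-length counts the header bits, excluding the padding.
    return len(bits).to_bytes(2, "big") + body


def split_au_header_section(payload):
    """Return (AU-header bits, remaining payload) for a non-empty section."""
    if len(payload) < 2:
        raise ValueError("payload too short for AU-headers-length")
    length_in_bits = int.from_bytes(payload[:2], "big")
    octets = (length_in_bits + 7) // 8
    body = payload[2:2 + octets]
    if len(body) < octets:
        raise ValueError("truncated AU Header Section")
    all_bits = format(int.from_bytes(body, "big"), "b").zfill(8 * octets)
    return all_bits[:length_in_bits], payload[2 + octets:]
```

For two 10-bit AU-headers the length field carries the value 20, and
four padding bits bring the AU-headers up to three octets.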

567	3.2.1.1 The AU-header

569	   Each AU-header may contain the fields given in figure 3. The length
570	   in bits of these fields, with the exception of the CTS-flag, the
571	   DTS-flag and the RAP-flag fields, is defined by MIME format
572	   parameters; see section 4.1. If a MIME format parameter has the
573	   default value of zero, then the associated field is not present.

575	   If present, the fields MUST occur in the mutual order given in
576	   figure 3. In the general case a receiver can only discover the size
577	   of an AU-header by parsing it since the presence of the CTS-delta
578	   and DTS-delta fields is signaled by the value of the CTS-flag and
579	   DTS-flag, respectively.

583	   +---------------------------------------+
584	   |     AU-size                           |
585	   +---------------------------------------+
586	   |     AU-Index / AU-Index-delta         |
587	   +---------------------------------------+
588	   |     CTS-flag                          |
589	   +---------------------------------------+
590	   |     CTS-delta                         |
591	   +---------------------------------------+
592	   |     DTS-flag                          |
593	   +---------------------------------------+
594	   |     DTS-delta                         |
595	   +---------------------------------------+
596	   |     RAP-flag                          |
597	   +---------------------------------------+
598	   |     Stream-state                      |
599	   +---------------------------------------+

601	   Figure 3: The fields in the AU-header. If used, the AU-Index field
602	             only occurs in the first AU-header within an AU Header
603	             Section; in any other AU-header the AU-Index-delta field
604	             occurs instead.

606	   AU-size: Indicates the size in octets of the associated Access Unit
607	         in the Access Unit Data Section in the same RTP packet. When
608	         the AU-size is associated with an AU fragment, the AU size
609	         indicates the size of the entire AU and not the size of the
610	         fragment. In this case, the size of the fragment is known
611	         from the size of the AU data section. This can be exploited
612	         to determine whether a packet contains an entire AU or a
613	         fragment, which is particularly useful after losing a packet
614	         carrying the last fragment of an AU.

616	   AU-Index: Indicates the serial number of the associated Access Unit
617	         (fragment). For each (in decoding order) consecutive AU or AU
618	         fragment, the serial number is incremented by 1. When
619	         present, the AU-Index field occurs in the first AU-header in
620	         the AU Header Section, but MUST NOT occur in any subsequent
621	         (non-first) AU-header in that Section. To encode the serial
622	         number in any such non-first AU-header, the AU-Index-delta
623	         field is used. If each AU-Index field is coded with the value
624	         0, the serial number of the AU (fragment) is not specified,
625	         and in that case receivers may ignore the AU-Index field.

627	   AU-Index-delta: The AU-Index-delta field is an unsigned integer
628	         that specifies the serial number of the associated AU as the
629	         difference with respect to the serial number of the previous
630	         Access Unit. Hence, for the n-th (n>1) AU the serial number
631	         is found from:
632	         AU-Index(n) = AU-Index(n-1) + AU-Index-delta(n) + 1
633	         If the AU-Index field is present in the first AU-header in
637	         the AU Header Section, then the AU-Index-delta field MUST be
638	         present in any subsequent (non-first) AU-header. When the
639	         AU-Index-delta is coded with the value 0, it indicates that
640	         the Access Units are consecutive in decoding order. An
641	         AU-Index-delta value larger than 0 signals that interleaving
642	         is applied.

644	   CTS-flag: Indicates whether the CTS-delta field is present.
645	         A value of 1 indicates that the field is present, a value
646	         of 0 that it is not present.
647	         The CTS-flag field MUST be present in each AU-header if the
648	         length of the CTS-delta field is signaled to be larger than
649	         zero. In that case, the CTS-flag field MUST have the value 0
650	         in the first AU-header and MAY have the value 1 in all
651	         non-first AU-headers. The CTS-flag field SHOULD be 0 for
652	         any non-first fragment of an Access Unit.

654	   CTS-delta: Encodes the CTS by specifying the value of CTS as a 2's
655	         complement offset (delta) from the time stamp in the RTP
656	         header of this RTP packet. The CTS MUST use the same clock
657	         rate as the time stamp in the RTP header.

659	   DTS-flag: Indicates whether the DTS-delta field is present. A value
660	         of 1 indicates that DTS-delta is present, a value of 0 that
661	         it is not present.
662	         The DTS-flag field MUST be present in each AU-header if the
663	         length of the DTS-delta field is signaled to be larger than
664	         zero. The DTS-flag field MUST have the same value for all
665	         fragments of an Access Unit.

667	   DTS-delta: Specifies the value of the DTS as a 2's complement
668	         offset (delta) from the CTS. The DTS MUST use the
669	         same clock rate as the time stamp in the RTP header. The
670	         DTS-delta field MUST have the same value for all fragments of
671	         an Access Unit.

673	   RAP-flag: Indicates when set to 1 that the associated Access Unit
674	         provides a random access point to the content of the stream.
675	         If an Access Unit is fragmented, the RAP flag, if present,
676	         MUST be set to 0 for each non-first fragment of the AU.

678	   Stream-state:  Specifies the state of the stream for an AU of an
679	         MPEG-4 system stream; each state is identified by a value of
680	         a modulo counter. In ISO/IEC 14496-1, MPEG-4 system streams
681	         use the AU_SequenceNumber to signal stream states. When the
682	         stream state changes, the value of stream-state MUST be
683	         incremented by one.

685	         Note: no relation is required between stream-states of
686	         different streams.
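
The field order of figure 3 and the flag-dependent presence of the
CTS-delta and DTS-delta fields can be illustrated with the following
Python sketch. It is not normative. The BitReader class and the function
names are hypothetical; the configuration keys reuse the parameter names
that appear in the examples of section 3.3, and "DTSDeltaLength" is
assumed here as the name of the parameter giving the DTS-delta length.

```python
class BitReader:
    """Reads unsigned big-endian bit fields from a byte string."""

    def __init__(self, data):
        self.value = int.from_bytes(data, "big")
        self.total = 8 * len(data)
        self.pos = 0

    def read(self, nbits):
        if self.pos + nbits > self.total:
            raise ValueError("AU-header runs past the end of the data")
        self.pos += nbits
        return (self.value >> (self.total - self.pos)) & ((1 << nbits) - 1)


def to_signed(value, nbits):
    """Interpret an nbits-wide unsigned value as 2's complement."""
    return value - (1 << nbits) if value >> (nbits - 1) else value


def parse_au_header(reader, cfg, first):
    """Parse one AU-header; cfg maps parameter names to bit lengths."""
    header = {}
    if cfg.get("SizeLength", 0):
        header["AU-size"] = reader.read(cfg["SizeLength"])
    # AU-Index in the first AU-header, AU-Index-delta in all others.
    key, name = (("IndexLength", "AU-Index") if first
                 else ("IndexDeltaLength", "AU-Index-delta"))
    if cfg.get(key, 0):
        header[name] = reader.read(cfg[key])
    for flag, delta, length_key in (
            ("CTS-flag", "CTS-delta", "CTSDeltaLength"),
            ("DTS-flag", "DTS-delta", "DTSDeltaLength")):  # name assumed
        nbits = cfg.get(length_key, 0)
        if nbits:
            header[flag] = reader.read(1)
            if header[flag]:
                header[delta] = to_signed(reader.read(nbits), nbits)
    if cfg.get("RandomAccessIndication", 0):
        header["RAP-flag"] = reader.read(1)
    if cfg.get("StreamStateIndication", 0):
        header["Stream-state"] = reader.read(cfg["StreamStateIndication"])
    return header
```

Because the size of an AU-header depends on the flag values, the reader
position after each call is the only reliable way to find the start of
the next AU-header.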

690	3.2.2 The Auxiliary Section

692	   The Auxiliary Section consists of the auxiliary-data-size field
693	   followed by the auxiliary-data field. Receivers MAY (but are not
694	   required to) parse the auxiliary-data field; to facilitate skipping
695	   of the auxiliary-data field by receivers, the auxiliary-data-size
696	   field indicates the length in bits of the auxiliary-data. If the
697	   concatenation of the auxiliary-data-size and the auxiliary-data
698	   fields consumes a non-integer number of octets, up to 7 zero padding
699	   bits MUST be inserted immediately after the auxiliary data in order
700	   to achieve octet-alignment. See figure 4.

702	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+- .. -+-+-+-+-+-+-+-+-+
703	   | auxiliary-data-size   | auxiliary-data       |padding bits |
704	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+- .. -+-+-+-+-+-+-+-+-+

706	   Figure 4: The fields in the Auxiliary Section

708	   The length in bits of the auxiliary-data-size field is configurable
709	   by a MIME format parameter; see section 4.1. The default length of
710	   zero indicates that the entire Auxiliary Section is absent.

712	   auxiliary-data-size: specifies the length in bits of the immediately
713	         following auxiliary-data field;

715	   auxiliary-data: the auxiliary-data field contains data of a format
716	         not defined by this specification.
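
A receiver that does not parse the auxiliary-data field only needs the
octet count of the Auxiliary Section. The Python sketch below shows one
way to skip the section; it is illustrative only. The function name and
the size_field_length argument (the configured length in bits of the
auxiliary-data-size field) are hypothetical, and the payload is assumed
to start at the first octet of the Auxiliary Section.

```python
def skip_auxiliary_section(payload, size_field_length):
    """Return the payload that follows the Auxiliary Section."""
    if size_field_length == 0:
        # Default length of zero: the Auxiliary Section is absent.
        return payload
    total_bits = 8 * len(payload)
    if size_field_length > total_bits:
        raise ValueError("payload too short for auxiliary-data-size")
    value = int.from_bytes(payload, "big")
    # auxiliary-data-size: length in bits of the auxiliary-data field.
    aux_data_bits = value >> (total_bits - size_field_length)
    # Size field plus data, rounded up to whole octets (zero padding).
    octets = (size_field_length + aux_data_bits + 7) // 8
    if octets > len(payload):
        raise ValueError("truncated Auxiliary Section")
    return payload[octets:]
```

With an 8-bit size field announcing 12 bits of auxiliary data, the
section occupies 20 bits plus 4 padding bits, that is, three octets.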

718	3.2.3 The Access Unit Data Section

720	   The Access Unit Data Section contains an integer number of complete
721	   Access Units or a single fragment of one AU. The Access Unit Data
722	   Section is never empty. If data of more than one Access Unit is
723	   present, then the AUs are concatenated into a contiguous string
724	   of octets. See figure 5. The AUs inside the Access Unit Data
725	   Section MUST be in decoding order, though not necessarily contiguous
726	   in the case of interleaving.

728	   The size and number of Access Units SHOULD be adjusted such that
729	   the resulting RTP packet is not larger than the path MTU. To handle
730	   larger packets, this payload format relies on lower layers for
731	   fragmentation, which may result in reduced performance.

735	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
736	   |AU(1)                                                          |
737	   +                                                               |
738	   |                                                               |
739	   |               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
740	   |               |AU(2)                                          |
741	   +-+-+-+-+-+-+-+-+                                               |
742	   |                                                               |
743	   |                               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
744	   |                               | AU(n)                         |
745	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
746	   |AU(n) continued|
747	   +-+-+-+-+-+-+-+-+

749	   Figure 5: Access Unit Data Section; each AU is octet-aligned.

751	   When multiple Access Units are carried, the size of each AU MUST be
752	   made available to the receiver. If the AU size is variable then the
753	   size of each AU MUST be indicated in the AU-size field of the
754	   corresponding AU-header. However, if the AU size is constant for a
755	   stream, this mechanism SHOULD NOT be used, but instead the fixed
756	   size SHOULD be signaled by the MIME format parameter
757	   "ConstantSize", see section 4.1.

759	   The absence of both AU-size in the AU-header and the ConstantSize
760	   MIME format parameter indicates carriage of a single AU (fragment),
761	   i.e. that a single Access Unit (fragment) is transported in each
762	   RTP packet for that stream.

764	3.2.3.1 Fragmentation

766	   A packet SHALL carry either one or more complete Access Units, or
767	   a single fragment of an Access Unit.  Fragments of the same Access
768	   Unit have the same time stamp but different RTP sequence numbers.
769	   The marker bit in the RTP header is 1 on the last fragment of an
770	   Access Unit, and 0 on all other fragments.

772	3.2.3.2 Interleaving

774	   Senders MAY interleave Access Units.
775	   Receivers MUST support interleaving, except if the receiver only
776	   supports modes in which no interleaving is allowed. When
777	   interleaving of Access Units is used it SHALL be implemented using
778	   the AU-Index and AU-Index-delta fields in the AU-header.

780	   Based on the RTP sequence number, the RTP time stamp, the AU-Index
781	   and the AU-Index-delta, a receiver can unambiguously reconstruct
782	   the original order even in case of out-of-order packets, packet
783	   loss or duplication. Note that for this purpose the AU-Index is
787	   redundant when the RTP time stamp and the AU-Index-delta values are
788	   sufficient for placing the AUs correctly in time. In such cases
789	   receivers MAY ignore the AU-Index value and senders MAY code the
790	   AU-Index field with the value 0, but only if they code each AU-Index
791	   field with that value. If the AU-Index is not redundant, senders
792	   SHOULD use a length of the AU-Index field so that this field is not
793	   coded with the value 0 in two subsequent RTP packets.

795	   When interleaving is applied, a de-interleave buffer is needed in
796	   receivers to put the Access Units in their correct logical
797	   consecutive decoding order. This requires the computation of the
798	   time stamp for each Access Unit. In case of a fixed time duration
799	   per Access Unit, the time stamp of the i-th access unit in an RTP
800	   packet with RTP time stamp T is calculated as follows:

802	   Timestamp[0] = T
803	   Timestamp[i, i > 0] = T + (Sum(for k=1 to i of (AU-Index-delta[k]
804	                         + 1))) * access-unit-duration

806	   When AU-Index-delta is always 0, this reduces to T + i * (access-
807	   unit-duration). This is the non-interleaved case, where the frames
808	   are consecutive in decoding order. Note that the AU-Index field
809	   (present for the first Access Unit) is not needed in this
810	   calculation. Hence in cases where the access-unit-duration has a
811	   fixed and known value, the AU-Index does not need to provide index
812	   information and can be coded with the value 0. See also the
813	   semantics of the AU-Index field in 3.2.1.1.
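
The time stamp calculation for fixed-duration Access Units can be
written out as the following Python sketch. It is illustrative only; the
function name is hypothetical, and the result is reduced modulo 2^32 on
the assumption that the caller wants values in the 32-bit RTP time stamp
range.

```python
def au_timestamps(rtp_timestamp, index_deltas, access_unit_duration):
    """Time stamps of all AUs in one RTP packet.

    index_deltas holds the AU-Index-delta values of the second and
    later AU-headers, in the order in which they occur in the packet.
    """
    timestamps = [rtp_timestamp]
    offset = 0
    for delta in index_deltas:
        # Each AU is (AU-Index-delta + 1) AU periods after the previous.
        offset += delta + 1
        timestamps.append(
            (rtp_timestamp + offset * access_unit_duration) % 2**32)
    return timestamps
```

For a packet that carries AU(0), AU(3) and AU(6) with T = 1000 and an
access-unit-duration of 1024, the AU-Index-delta values are 2 and 2 and
the computed time stamps are 1000, 4072 and 7144.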

815	   If the Access Units are not fixed duration, the AU-Index is not
816	   redundant, and MUST provide the index information required for
817	   re-ordering. The number of bits of the AU-Index field MUST be chosen
818	   so that valid index information is provided at the applied
819	   interleaving scheme, without causing problems due to roll-over of
820	   the AU-Index field. Note that the CTS-delta may be required to
821	   compute the correct time stamp for each AU.

823	3.2.3.3 Constraints for interleaving

825	   The size of the packets should be suitably chosen to be appropriate
826	   to both the path MTU and the capacity of the receiver's
827	   de-interleave buffer. The maximum packet size for a session SHOULD
828	   be chosen not to exceed the path MTU.

830	   To allow receivers to allocate sufficient resources for
831	   de-interleaving, senders MUST provide the information to receivers
832	   as specified in this section.

834	   AUs enter the decoder in decoding order. The de-interleave buffer
835	   is used to re-order a stream of interleaved AUs back into decoding
836	   order. When interleaving is applied, the decoding of "early" AUs
837	   has to be postponed until all AUs that precede in decoding order
841	   have been received. Therefore these "early" AUs are stored in the
842	   de-interleave buffer. As an example in figure 6 the interleaving
843	   pattern from section 2.5 is considered.

845	                             +--+--+--+--+--+--+--+--+--+--+--+-
846	   Interleaved AUs           | 0| 3| 6| 1| 4| 7| 2| 5| 8| 9|12|..
847	                             +--+--+--+--+--+--+--+--+--+--+--+-
848	   Storage of "early" AUs         3  3  3  3  3  3
849	                                     6  6  6  6  6  6
850	                                           4  4  4
851	                                              7  7  7
852	                                                            12 12

854	   Figure 6: Storage of "early" AUs in the de-interleave buffer per
855	             interleaved AU.

857	   AU(3) is to be delivered to the decoder after AU(0), AU(1) and
858	   AU(2); of these AUs, AU(2) arrives last and hence AU(3) needs to be
859	   stored until AU(2) is received. Similarly, AU(6) is to be stored
860	   until AU(5) is received, while AU(4) and AU(7) are to be stored until
861	   AU(2) and AU(5) are received, respectively. Note that the fullness
862	   of the de-interleave buffer varies in time. In figure 6, the
863	   de-interleave buffer contains at most 4 AUs, but often fewer.
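
The storage of "early" AUs shown in figure 6 can be reproduced with a
small simulation. The Python sketch below is illustrative only; the
function name is hypothetical, AUs are assumed to be numbered
consecutively from 0 in decoding order, and no packet loss is modeled.

```python
def early_au_storage(arrival_order):
    """Number of stored "early" AUs at the arrival of each AU."""
    stored = set()
    next_to_decode = 0
    occupancy = []
    for au in arrival_order:
        if au == next_to_decode:
            # Count the stored AUs before any of them are released.
            occupancy.append(len(stored))
            next_to_decode += 1
            # Release every stored AU that is now next in decoding order.
            while next_to_decode in stored:
                stored.remove(next_to_decode)
                next_to_decode += 1
        else:
            # An "early" AU: a preceding AU has not been received yet.
            stored.add(au)
            occupancy.append(len(stored))
    return occupancy
```

For the arrival order 0, 3, 6, 1, 4, 7, 2, 5, 8, 9, 12 this yields the
column heights of figure 6, with a maximum of 4 stored AUs.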

865	   So as to give a rough indication of the resources needed in the
866	   receiver for de-interleaving, the maximum displacement in time of
867	   an AU is defined. The maximum displacement in time of an AU is the
868	   maximum difference between the time stamp of any received AU and
869	   the time stamp of the earliest AU that is not yet received. In other
870	   words, when considering a sequence of interleaved AUs, then:

872	   Maximum displacement = max{TS(i) - TS(j)}, for any i and any j>i,

874	            where i and j indicate the index of the AU in the
875	                             interleaving pattern and
876	                     TS denotes the time stamp of the AU

878	   As an example in figure 7 the interleaving pattern from section 2.5
879	   is considered. For each AU in the pattern the earliest not yet
880	   received AU is indicated. A "-" indicates that all previous AUs
881	   are received. If the AU period is constant, the maximum displacement
882	   equals 5 AU periods, as found for AU(6) and AU(7).

884	                                 +--+--+--+--+--+--+--+--+--+--+--+-
885	   Interleaved AUs               | 0| 3| 6| 1| 4| 7| 2| 5| 8| 9|12|..
886	                                 +--+--+--+--+--+--+--+--+--+--+--+-

888	   Earliest not yet received AU    -  1  1  -  2  2  -  -  -  - 10

890	   Figure 7: The earliest not yet received AU for each AU in the
891	             interleaving pattern.
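
The maximum displacement defined above can be computed directly from the
time stamps of the AUs in transmission order. The Python sketch below is
illustrative only; the function name is hypothetical. It scans the
pattern backwards so that the smallest time stamp among the AUs still to
come is always known.

```python
def max_displacement(timestamps_in_transmission_order):
    """max{TS(i) - TS(j)} over all positions i and all later positions j."""
    best = 0
    min_later = None
    for ts in reversed(timestamps_in_transmission_order):
        if min_later is not None:
            best = max(best, ts - min_later)
            min_later = min(min_later, ts)
        else:
            min_later = ts
    return best
```

With the AU numbers of figure 7 used as time stamps in units of one AU
period, the pattern 0, 3, 6, 1, 4, 7, 2, 5, 8 gives 5, in agreement with
the value found for AU(6) and AU(7). Multiplying the AU numbers by the
AU duration in RTP clock ticks gives the value in time stamp units.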

895	   When interleaving, senders MUST signal the maximum displacement
896	   in time during the session via the MIME format parameter
897	   "maxDisplacement"; see section 4.1.

899	   An estimate of the size of the de-interleave buffer is found by
900	   multiplying the maximum displacement by the maximum bit rate:

902	   size(de-interleave buffer) = {(maxDisplacement) * Rate(max)} / (RTP
903	                                clock frequency),

905	   where Rate(max) is the maximum bit-rate of the transported stream.
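
As a worked illustration of this estimate, the Python sketch below
evaluates the formula with integer arithmetic. It is not normative. The
function and argument names are hypothetical, the maximum displacement
is assumed to be expressed in RTP clock ticks and Rate(max) in bits per
second, so that the result is in bits, and rounding up is a choice made
here rather than a requirement of the text.

```python
def estimate_deinterleave_buffer_bits(max_displacement_ticks,
                                      max_bit_rate,
                                      rtp_clock_hz):
    """Estimated de-interleave buffer size in bits, rounded up."""
    product = max_displacement_ticks * max_bit_rate
    # Ceiling division without floating point.
    return -(-product // rtp_clock_hz)
```

For example, a maximum displacement of 45000 ticks of a 90 kHz clock
(half a second) at a maximum bit rate of 128000 bit/s gives an estimate
of 64000 bits.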

907	   Note that receivers can derive Rate(max) from the MIME format
908	   parameters StreamType, Profile-level-id, and config.

910	   However, this calculation only estimates the size of the
911	   de-interleave buffer; the required size may be larger. If this
912	   calculation under-estimates the size of the de-interleave buffer,
913	   then senders, when interleaving, MUST signal a size of the
914	   de-interleave buffer that is large enough to contain all "early"
915	   AUs at any point in time during the session via the MIME format
916	   parameter "de-interleaveBufferSize"; see section 4.1.

918	   If the "de-interleaveBufferSize" parameter is present, then the
919	   applied buffer for de-interleaving in a receiver MUST have a size
920	   that is at least equal to the signaled size of the de-interleave
921	   buffer, else a size that is at least equal to the calculated size
922	   of the de-interleave buffer.

924	   No matter what interleaving scheme is used, the scheme must be
925	   analyzed to calculate the applicable maxDisplacement value, as well
926	   as the required size of the de-interleave buffer. Senders SHOULD
927	   signal values that are not larger than the strictly required
928	   values; if larger values are signaled, the receiver will buffer
929	   excessively.

931	   Note that for low bit-rate material, the applied interleaving
932	   may make packets shorter than the MTU size.

934	3.2.3.4 Crucial and non-crucial AUs with MPEG-4 System data

936	   Some Access Units with MPEG-4 system data, called "crucial" AUs,
937	   carry information whose loss cannot be tolerated, either in the
938	   presentation or in the decoder. At each crucial AU in an MPEG-4
939	   system stream, the stream state changes. The stream-state MAY
940	   remain constant at non-crucial AUs. In ISO/IEC 14496-1, MPEG-4
941	   system streams use the AU_SequenceNumber to signal stream states.

943	   Example: Given three AUs, AU1 = "Insertion of node X", AU2 = "Set
944	   position of node X", AU3 = "Set position of node X". AU1 is crucial,
945	   since if it is lost, AU2 cannot be executed. However, AU2 is not
946	   crucial, since AU3 can be executed even if AU2 is lost.

950	   When a crucial AU is (possibly) lost, the stream is corrupted. For
951	   example, when an AU is lost and the stream state has changed at the
952	   next received AU, then it is possible that the lost AU was crucial.
953	   Once corrupted, the stream remains corrupted until the next random
954	   access point. Note that loss of non-crucial AUs does not corrupt the
955	   stream. When a decoder starts receiving a stream, the decoder MUST
956	   consider the stream corrupted until an AU is received that provides
957	   a random access point.

959	   An AU that provides a random access point, as signaled by the
960	   RAP-flag, may be crucial or not. Non-crucial RAP AUs provide a
961	   "repeated" random access point for use by decoders that recently
962	   joined the stream or that need to re-start decoding after a stream
963	   corruption. Non-crucial RAP AUs MUST include all updates since the
964	   last crucial RAP AU.

966	   Upon receiving AUs, decoders are to react as follows:
967	   a) if the RAP-flag is set to 1 and the stream-state changes, then
968	      the AU is a crucial RAP AU, and the AU MUST be decoded.
969	   b) if the RAP-flag is set to 1 and the stream state does not change,
970	      then the AU is a non-crucial RAP AU, and the receiver SHOULD
971	      decode it if the stream is corrupted. Otherwise, the decoder MUST
972	      ignore the AU.
973	   c) if the RAP-flag is set to 0, then the AU MUST be decoded, unless
974	      the stream is corrupted, in which case the AU MUST be ignored.
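
The reactions a) to c), together with the corruption rules given earlier
in this section, can be sketched as a small state machine in Python. The
sketch is illustrative only. The class and method names are
hypothetical, the loss_before argument is assumed to be derived by the
caller from gaps in the RTP sequence numbers, and treating the very
first AU as a stream state change is an assumption made here.

```python
class SystemStreamReceiver:
    """Decides per AU whether a decoder should decode or ignore it."""

    def __init__(self):
        # A stream is considered corrupted until a random access point.
        self.corrupted = True
        self.last_state = None

    def on_au(self, rap_flag, stream_state, loss_before=False):
        """Return True if the AU is to be decoded, False if ignored."""
        changed = self.last_state is None or stream_state != self.last_state
        self.last_state = stream_state
        if rap_flag:
            if changed:
                # a) crucial RAP AU: decode it.
                self.corrupted = False
                return True
            if self.corrupted:
                # b) non-crucial RAP AU repairs a corrupted stream.
                self.corrupted = False
                return True
            # b) otherwise a non-crucial RAP AU is ignored.
            return False
        if loss_before and changed:
            # The lost AU may have been crucial: treat as corrupted.
            self.corrupted = True
        # c) decode unless the stream is corrupted.
        return not self.corrupted
```

A receiver that joins mid-stream therefore ignores every AU until the
first AU with the RAP-flag set to 1 arrives.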

978	3.3 Usage of this specification

980	3.3.1 General

982	   Usage of this specification requires definition of a mode. A mode
983	   defines how to use this specification, as deemed appropriate.
984	   Senders MUST signal the applied mode via the MIME format parameter
985	   "Mode", as specified in section 4.1. This specification defines a
986	   generic mode that can be used for any MPEG-4 stream, as well as
987	   specific modes for transport of MPEG-4 CELP and MPEG-4 AAC streams,
988	   defined in ISO/IEC 14496-3.

990	   When use of this payload format is signaled using SDP [6], an
991	   "rtpmap" attribute is part of that signaling.  The same requirements
992	   apply for the rtpmap attribute in any mode compliant with this
993	   specification. The general form of an rtpmap attribute is:
994	   a=rtpmap:<payload type> <encoding name>/<clock rate>[/<encoding
995	             parameters>]
996	   For audio streams, <encoding parameters> specifies the number of
997	   audio channels: 2 for stereo material (see RFC 2327) and 1 for
998	   mono. Provided no additional parameters are needed, this parameter
999	   may be omitted for mono material, hence its default value is 1.

1001	3.3.2 The generic mode

1003	   The generic mode can be used for any MPEG-4 stream. In this mode
1004	   no mode-specific constraints are applied; hence, in the generic
1005	   mode the full flexibility of this specification can be exploited.
1006	   The generic mode is signaled by mode=generic.

1008	   An example is given below for transport of a BIFS stream. In this
1009	   example carriage of multiple BIFS Access Units is allowed in one
1010	   RTP packet. The AU-header contains the AU-size field, the CTS-flag
1011	   and, if the CTS flag is set to 1, the CTS-delta field. The number
1012	   of bits of the AU-size and the CTS-delta fields is 10 and 16,
1013	   respectively. The AU-header also contains the RAP-flag and the
1014	   Stream-state of 4 bits. This results in an AU-header with a
1015	   total size of two or four octets per BIFS AU. The RTP time stamp
1016	   uses a 1 kHz clock. Note that the media type name is video,
1017	   because the BIFS stream is part of an audio-visual presentation. For
1018	   conventions on media type names see section 4.1.

1020	   In detail:

1022	   m=video 49230 RTP/AVP 96
1023	   a=rtpmap:96 mpeg4-generic/1000
1024	   a=fmtp:96 streamtype=3; profile-level-id=257; mode=generic;
1025	   ObjectType=2; config=BIFSConfiguration(); SizeLength=10;
1026	   CTSDeltaLength=16; RandomAccessIndication=1;
1027	   StreamStateIndication=4

1031	   Note: The a=fmtp line has been wrapped to fit the page; it comprises
1032	         a single line in the SDP file.
1033	   BIFSConfiguration() is the hexadecimal string as defined in ISO/IEC
1034	   14496-1; for the description of MIME parameters see section 4.1.
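
To show how a receiver might derive the AU-header size from the example
above, the following Python sketch parses the a=fmtp line and adds up
the configured field lengths. It is illustrative only. The function
names are hypothetical, parameter names are lower-cased as a convenience
chosen here, and only the fields used in this example (AU-size,
CTS-flag, CTS-delta, RAP-flag and Stream-state) are counted.

```python
def parse_fmtp(line):
    """Split an a=fmtp line into (payload type, parameter dict)."""
    prefix, _, params = line.partition(" ")
    if not prefix.startswith("a=fmtp:"):
        raise ValueError("not an a=fmtp line")
    payload_type = int(prefix[len("a=fmtp:"):])
    result = {}
    for item in params.split(";"):
        item = item.strip()
        if not item:
            continue
        name, _, value = item.partition("=")
        result[name.strip().lower()] = value.strip()
    return payload_type, result


def au_header_bits(params, cts_flag):
    """AU-header length in bits for the fields used in this example."""
    bits = int(params.get("sizelength", 0))
    cts_delta_length = int(params.get("ctsdeltalength", 0))
    if cts_delta_length:
        # One bit for the CTS-flag, plus CTS-delta if the flag is 1.
        bits += 1 + (cts_delta_length if cts_flag else 0)
    if int(params.get("randomaccessindication", 0)):
        bits += 1
    bits += int(params.get("streamstateindication", 0))
    return bits
```

For the BIFS example this gives 16 bits when the CTS-flag is 0 and 32
bits when it is 1, which matches the two or four octets stated above.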

1036	3.3.3 Constant bit-rate CELP

1038	   This mode is signaled by mode=CELP-cbr. In this mode one or more
1039	   complete CELP frames of fixed size can be transported in one RTP
1040	   packet; there is no support for interleaving. The RTP payload
1041	   consists of one or more concatenated CELP frames, each of the same
1042	   size. CELP frames MUST NOT be fragmented when using this mode. Both
1043	   the AU Header Section and the Auxiliary Section MUST be empty.

1045	   The MIME format parameter ConstantSize MUST be provided to specify
1046	   the length of each CELP frame.

1048	   For example:

1050	   m=audio 49230 RTP/AVP 96
1051	   a=rtpmap:96 mpeg4-generic/44100/2
1052	   a=fmtp:96 streamtype=5; profile-level-id=15; mode=CELP-cbr; config=
1053	   AudioSpecificConfig(); ConstantSize=xxx;

1055	   Note: The a=fmtp line has been wrapped to fit the page; it comprises
1056	         a single line in the SDP file.

1058	   AudioSpecificConfig() is the hexadecimal string as defined in
1059	   ISO/IEC 14496-3. AudioSpecificConfig() specifies that the audio
1060	   stream type is CELP. For the description of MIME parameters see
1061	   section 4.1.

1063	3.3.4 Variable bit-rate CELP

1065	   This mode is signaled by mode=CELP-vbr. With this mode one or more
1066	   complete CELP frames of variable size can be transported in one RTP
1067	   packet with optional interleaving. As CELP frames are very small,
1068	   while the largest possible AU-size in this mode is greater than the
1069	   maximum CELP frame size, there is no support for fragmentation of
1070	   CELP frames. Hence CELP frames MUST NOT be fragmented when using
1071	   this mode.

1073	   In this mode the RTP payload consists of the AU Header Section,
1074	   followed by one or more concatenated CELP frames. The Auxiliary
1075	   Section MUST be empty. For each CELP frame contained in the payload
1076	   there MUST be a one octet AU-header in the AU Header Section to
1077	   provide:
1078	   (a) the size of each CELP frame in the payload and
1079	   (b) index information for computing the sequence (and hence timing)
1080	       of each CELP frame.
1081	   Transport of CELP frames requires that the AU-size field is coded
1082	   with 6 bits. In this mode therefore 6 bits are allocated to the
1086	   AU-size field, and 2 bits to the AU-Index(-delta) field. Each
1087	   AU-Index field MUST be coded with the value 0. In the AU Header
1088	   Section, the concatenated AU-headers are preceded by the 16-bit
1089	   AU-headers-length field, as specified in section 3.2.1.
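
   The layout above can be sketched as a small parser. This is an
   illustrative sketch only, under two assumptions that are not
   stated in this section: that the AU-headers-length field gives the
   length of the AU-headers in bits, and that the AU-size occupies
   the six most significant bits of each one-octet AU-header. The
   function name is hypothetical.

```python
import struct


def parse_celp_vbr_payload(payload):
    """Return (index_field, frame) pairs from a CELP-vbr style payload.

    index_field is the raw 2-bit AU-Index or AU-Index-delta value.
    """
    if len(payload) < 2:
        raise ValueError("payload too short for AU-headers-length")
    # Assumption: 16-bit big-endian length of the AU-headers, in bits.
    (headers_bits,) = struct.unpack_from("!H", payload, 0)
    if headers_bits == 0 or headers_bits % 8 != 0:
        raise ValueError("expected a whole number of one-octet AU-headers")
    count = headers_bits // 8
    headers = payload[2:2 + count]
    if len(headers) != count:
        raise ValueError("truncated AU Header Section")

    frames = []
    offset = 2 + count
    for octet in headers:
        size = octet >> 2       # upper 6 bits: AU-size in octets
        index = octet & 0x03    # lower 2 bits: AU-Index(-delta)
        frame = payload[offset:offset + size]
        if len(frame) != size:
            raise ValueError("truncated frame")
        frames.append((index, frame))
        offset += size
    if offset != len(payload):
        raise ValueError("unexpected octets after the last frame")
    return frames
```

   Since the AU-size field has 6 bits, no frame larger than 63 octets
   can be described by such a header.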

1091	   In addition to the required MIME format parameters, the following
1092	   parameters MUST be present: SizeLength, IndexLength, and
1093	   IndexDeltaLength.
1094	   When interleaving is applied (AU-Index-delta coded with a value
1095	   larger than 0), the parameter maxDisplacement MUST also be present.

1097	   For example:

1099	   m=audio 49230 RTP/AVP 96
1100	   a=rtpmap:96 mpeg4-generic/44100/2
1101	   a=fmtp:96 streamtype=5; profile-level-id=15; mode=CELP-vbr; config=
1102	   AudioSpecificConfig(); SizeLength=6; IndexLength=2;
1103	   IndexDeltaLength=2

1105	   Note: The a=fmtp line has been wrapped to fit the page; it comprises
1106	         a single line in the SDP file.

1108	   AudioSpecificConfig() is the hexadecimal string as defined in
1109	   ISO/IEC 14496-3. AudioSpecificConfig() specifies that the audio
1110	   stream type is CELP. For the description of MIME parameters see
1111	   section 4.1.

1113	3.3.5 Low bit-rate AAC

1115	   This mode is signaled by mode=AAC-lbr. This mode supports transport
1116	   of one or more complete AAC frames of variable size. In this mode
1117	   the AAC frames are allowed to be interleaved and hence receivers
1118	   MUST support de-interleaving. The maximum size of an AAC frame in
1119	   this mode is 63 octets. AAC frames MUST NOT be fragmented when
1120	   using this mode.

1122	   The payload configuration in this mode is the same as in the
1123	   variable bit-rate CELP mode as defined in 3.3.4. The RTP payload
1124	   consists of the AU Header Section, followed by concatenated AAC
1125	   frames. The Auxiliary Section MUST be empty. For each AAC frame
1126	   contained in the payload the one octet AU-header MUST provide:
1127	   (a) the size of each AAC frame in the payload and
1128	   (b) index information for computing the sequence (and hence timing)
1129	       of each AAC frame.
1130	   In the AU-header, the AU-size MUST be coded with 6 bits and the
1131	   AU-Index(-delta) with 2 bits; the AU-Index field MUST have the
1132	   value 0 in each AU-header.
1133	   In the AU Header Section, the concatenated AU-headers MUST be
1134	   preceded by the 16-bit AU-headers-length field, as specified in
1135	   section 3.2.1.

1139	   In addition to the required MIME format parameters, the following
1140	   parameters MUST be present: SizeLength, IndexLength, and
1141	   IndexDeltaLength.
1142	   When interleaving is applied (AU-Index-delta coded with a value
1143	   larger than 0), the parameter maxDisplacement MUST also be present.

1145	   For example:

1147	   m=audio 49230 RTP/AVP 96
1148	   a=rtpmap:96 mpeg4-generic/44100/2
1149	   a=fmtp:96 streamtype=5; profile-level-id=15; mode=AAC-lbr; config=
1150	   AudioSpecificConfig(); SizeLength=6; IndexLength=2;
1151	   IndexDeltaLength=2

1153	   Note: The a=fmtp line has been wrapped to fit the page; it comprises
1154	         a single line in the SDP file.

1156	   AudioSpecificConfig() is the hexadecimal string as defined in ISO/IEC
1157	   14496-3. AudioSpecificConfig() specifies that the audio
1158	   stream type is AAC. For the description of MIME parameters see
1159	   section 4.1.

1161	3.3.6 High bit-rate AAC

1163	   This mode is signaled by mode=AAC-hbr. This mode supports transport
1164	   of variable size AAC frames. In one RTP packet either one or more
1165	   complete AAC frames are carried, or a single fragment of an AAC
1166	   frame. In this mode the AAC frames are allowed to be interleaved
1167	   and hence receivers MUST support de-interleaving. The maximum size
1168	   of an AAC frame in this mode is 8191 octets.

1170	   In this mode the RTP payload consists of the AU Header Section,
1171	   followed by either one AAC frame, several concatenated AAC frames
1172	   or one fragmented AAC frame. The Auxiliary Section MUST be empty.
1173	   For each AAC frame contained in the payload there MUST be an
1174	   AU-header in the AU Header Section to provide:
1175	   (a) the size of each AAC frame in the payload and
1176	   (b) index information for computing the sequence (and hence timing)
1177	       of each AAC frame.

1179	   To code the maximum size of an AAC frame requires 13 bits. Therefore
1180	   in this configuration 13 bits are allocated to the AU-size, and
1181	   3 bits to the AU-Index(-delta) field. Thus each AU-header has a size
1182	   of 2 octets. Each AU-Index field MUST be coded with the value 0. In
1183	   the AU Header Section, the concatenated AU-headers MUST be preceded
1184	   by the 16-bit AU-headers-length field, as specified in section 3.2.1.
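
   A sender-side sketch of this configuration follows, for the case
   of one or more complete AAC frames (fragments are not handled).
   It is illustrative only. It assumes that the AU-headers-length
   field counts bits and that the 13-bit AU-size precedes the 3-bit
   AU-Index(-delta) within each 2-octet AU-header; the names used
   are hypothetical.

```python
import struct

MAX_AAC_HBR_FRAME = (1 << 13) - 1   # 8191 octets fit in 13 bits


def build_aac_hbr_payload(frames, index_deltas=None):
    """Build an AAC-hbr payload carrying complete AAC frames.

    index_deltas holds the AU-Index-delta of every frame after the
    first; it defaults to all zeros (no interleaving).
    """
    if not frames:
        raise ValueError("at least one AAC frame is required")
    if index_deltas is None:
        index_deltas = [0] * (len(frames) - 1)
    if len(index_deltas) != len(frames) - 1:
        raise ValueError("one AU-Index-delta per non-first frame expected")

    # The AU-Index in the first AU-header is always coded as 0.
    fields = [0] + list(index_deltas)
    headers = b""
    for frame, field in zip(frames, fields):
        if len(frame) > MAX_AAC_HBR_FRAME:
            raise ValueError("AAC frame exceeds 8191 octets")
        if not 0 <= field <= 7:
            raise ValueError("index field does not fit in 3 bits")
        headers += struct.pack("!H", (len(frame) << 3) | field)

    # Assumption: AU-headers-length is the header length in bits.
    return struct.pack("!H", 8 * len(headers)) + headers + b"".join(frames)
```

   Two frames of 5 and 3 octets thus produce a 6-octet AU Header
   Section (2 octets of length plus two 2-octet AU-headers) followed
   by 8 octets of frame data.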

1186	   In addition to the required MIME format parameters, the following
1187	   parameters MUST be present: SizeLength, IndexLength, and
1188	   IndexDeltaLength.
1189	   When interleaving is applied (AU-Index-delta coded with a value
1190	   larger than 0), the parameter maxDisplacement MUST also be present.

1194	   For example:

1196	   m=audio 49230 RTP/AVP 96
1197	   a=rtpmap:96 mpeg4-generic/44100/2
1198	   a=fmtp:96 streamtype=5; profile-level-id=15; mode=AAC-hbr;
1199	   config=AudioSpecificConfig(); SizeLength=13; IndexLength=3;
1200	   IndexDeltaLength=3

1202	   Note: The a=fmtp line has been wrapped to fit the page; it comprises
1203	         a single line in the SDP file.

1205	   AudioSpecificConfig() is the hexadecimal string as defined in
1206	   ISO/IEC 14496-3. AudioSpecificConfig() specifies that the audio
1207	   stream type is AAC. For the description of MIME parameters see
1208	   section 4.1.

1210	3.3.7 Additional modes

1212	   This specification only defines the modes specified in sections
1213	   3.3.2 up to 3.3.6. Additional modes are expected to be defined in
1214	   future RFCs. Each additional mode MUST be in full compliance with
1215	   this specification.

1217	   Any new mode MUST be defined such that an implementation including
1218	   all the features of this specification can decode the payload format
1219	   corresponding to this new mode. For this reason a mode MUST NOT
1220	   specify new default values for MIME parameters. In particular, MIME
1221	   parameters that configure the RTP payload MUST be present (unless
1222	   they have the default value), even if their presence is redundant in
1223	   case the mode assigns a fixed value to a parameter. A mode may
1224	   define additionally that some MIME parameters are required instead
1225	   of optional, that some MIME parameters have fixed values (or
1226	   ranges), and that there are rules restricting the usage.

1230	4. IANA considerations

1232	   This section describes the MIME types and names associated with
1233	   this payload format. Section 4.1 registers the MIME types, as per
1234	   RFC 2048.

1236	   This format may require additional information about the mapping to
1237	   be made available to the receiver. This is done using parameters
1238	   also described in the next section.

1240	4.1 MIME type registration

1242	   MIME media type name: "video" or "audio" or "application"

1244	   "video" MUST be used for MPEG-4 Visual streams (ISO/IEC 14496-2)
1245	   or MPEG-4 Systems streams (ISO/IEC 14496-1) that convey information
1246	   needed for an audio/visual presentation.

1248	   "audio" MUST be used for MPEG-4 Audio streams (ISO/IEC 14496-3)
1249	   or MPEG-4 Systems streams that convey information needed for an
1250	   audio only presentation.

1252	   "application" MUST be used for MPEG-4 Systems streams (ISO/IEC
1253	   14496-1) that serve purposes other than audio/visual presentation,
1254	   e.g. in some cases when MPEG-J (Java) streams are transmitted.

1256	   Depending on the required payload configuration, MIME format
1257	   parameters need to be available to the receiver. This is done using
1258	   the parameters described in the next section. There are required
1259	   and optional parameters.

1261	   Optional parameters are of two types: general parameters and
1262	   configuration parameters. The configuration parameters are used to
1263	   configure the fields in the AU Header section and in the auxiliary
1264	   section. The absence of any configuration parameter is equivalent to
1265	   the associated field set to its default value, which is always zero.
1266	   The absence of all configuration parameters resolves into a default
1267	   "basic" configuration with an empty AU-header section and an empty
1268	   auxiliary section in each RTP packet.

1270	   MIME subtype name: mpeg4-generic

1272	   Required parameters:

1274	   MIME format parameters are not case dependent; however for clarity
1275	   both upper and lower case are used in the names of the parameters
1276	   described in this specification.

1278	      StreamType:
1279	      The integer value that indicates the type of MPEG-4 stream that
1280	      is carried; its coding corresponds to the values of the
1281	      streamType as defined in Table 9 (streamType Values) in ISO/IEC
1282	      14496-1.

1286	      Profile-level-id:
1287	      A decimal representation of the MPEG-4 Profile Level indication.
1288	      This parameter MUST be used in the capability exchange or
1289	      session set-up procedure to indicate the MPEG-4 Profile and Level
1290	      combination of which the relevant MPEG-4 media codec is
1291	      capable.
1292	      For MPEG-4 Audio streams, this parameter is the decimal value
1293	         from Table 5 (audioProfileLevelIndication Values) in ISO/IEC
1294	         14496-1, indicating which MPEG-4 Audio tool subsets are
1295	         required to decode the audio stream.
1296	      For MPEG-4 Visual streams, this parameter is the decimal value
1297	         from Table G-1 (FLC table for profile and level indication) of
1298	         ISO/IEC 14496-2, indicating which MPEG-4 Visual tool subsets
1299	         are required to decode the visual stream.
1300	      For BIFS streams, this parameter is the decimal value that is
1301	         obtained from (SPLI + 256*GPLI), where:
1302	         SPLI is the decimal value from Table 4 in ISO/IEC 14496-1 with
1303	            the applied sceneProfileLevelIndication;
1304	         GPLI is the decimal value from Table 7 in ISO/IEC 14496-1 with
1305	            the applied graphicsProfileLevelIndication.
1306	      For MPEG-J streams, this parameter is the decimal value from
1307	         table 13 (MPEGJProfileLevelIndication) in ISO/IEC 14496-1,
1308	         indicating the profile and level of the MPEG-J stream.
1309	      For OD streams, this parameter is the decimal value from table 3
1310	         (ODProfileLevelIndication) in ISO/IEC 14496-1, indicating the
1311	         profile and level of the OD stream.
1312	      For IPMP streams, this parameter has either the decimal value 0,
1313	         indicating an unspecified profile and level, or a value larger
1314	         than zero, indicating an MPEG-4 IPMP profile and level as
1315	         defined in a future MPEG-4 specification.
1316	      For Clock Reference streams and Object Content Info streams, this
1317	         parameter has the decimal value zero, indicating that profile
1318	         and level information is conveyed through the OD framework.
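
      The BIFS formula above can be illustrated with a short sketch.
      It is not normative; the function names are hypothetical, and
      it assumes that SPLI and GPLI are each 8-bit values, so that
      the combined value can be split again without ambiguity.

```python
def bifs_profile_level_id(spli, gpli):
    """Combine scene and graphics indications as SPLI + 256*GPLI."""
    # Assumption: both indications are 8-bit values (0..255).
    if not (0 <= spli <= 255 and 0 <= gpli <= 255):
        raise ValueError("SPLI and GPLI are assumed to be 8-bit values")
    return spli + 256 * gpli


def split_bifs_profile_level_id(value):
    """Recover (SPLI, GPLI) from a BIFS profile-level-id value."""
    if not 0 <= value <= 0xFFFF:
        raise ValueError("value out of range for two 8-bit indications")
    return value % 256, value // 256
```

      For example, SPLI=1 and GPLI=2 give profile-level-id=513.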

1320	      Config:
1321	      A hexadecimal representation of an octet string that expresses
1322	      the media payload configuration. Configuration data is mapped
1323	      onto the hexadecimal octet string on an MSB-first basis. The
1324	      first bit of the configuration data SHALL be located at the MSB
1325	      of the first octet. In the last octet, if necessary to achieve
1326	      octet-alignment, up to 7 zero-valued padding bits shall follow
1327	      the configuration data.
1328	      For MPEG-4 Audio streams, config is the audio object type
1329	         specific decoder configuration data AudioSpecificConfig() as
1330	         defined in ISO/IEC 14496-3. For Structured Audio, the
1331	         AudioSpecificConfig() may be conveyed by other means, not
1332	         defined by this specification. If the AudioSpecificConfig()
1333	         is conveyed by other means for Structured Audio, then the
1334	         config MUST be a quoted empty hexadecimal octet string, as
1335	         follows: config="".
1336	         Note that a future mode of using this RTP payload format for
1337	         Structured Audio may define such other means.
1341	      For MPEG-4 Visual streams, config is the MPEG-4 Visual
1342	         configuration information as defined in subclause 6.2.1 Start
1343	         codes of ISO/IEC 14496-2. The configuration information
1344	         indicated by this parameter SHALL be the same as the
1345	         configuration information in the corresponding MPEG-4 Visual
1346	         stream, except for first-half-vbv-occupancy and
1347	         latter-half-vbv-occupancy, if it exists, which may vary in
1348	         the repeated configuration information inside an MPEG-4
1349	         Visual stream (See 6.2.1 Start codes of ISO/IEC 14496-2).
1350	      For BIFS streams, this is the BIFSConfig() information as defined
1351	         in ISO/IEC 14496-1. For version 1, BIFSConfig is defined in
1352	         section 9.3.5.2, and for version 2 in section 9.3.5.3. The
1353	         MIME format parameter ObjectType signals the version of
1354	         BIFSConfig.
1355	      For IPMP streams, this is either a quoted empty hexadecimal octet
1356	         string, indicating the absence of any decoder configuration
1357	         information (config=""), or the IPMPConfiguration() as
1358	         defined in a future MPEG-4 IPMP specification.
1359	      For Object Content Info (OCI) streams, this is the
1360	         OCIDecoderConfiguration() information of the OCI stream, as
1361	         defined in section 8.4.2.4 in ISO/IEC 14496-1.
1362	      For OD streams, Clock Reference streams and MPEG-J streams, this
1363	         is a quoted empty hexadecimal octet string (config=""), as
1364	         no information on the decoder configuration is required.
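
      The MSB-first mapping and zero padding described for config
      can be sketched as follows. This is an illustration only: the
      function name is hypothetical, the configuration data is
      represented as a string of '0' and '1' characters for
      readability, and the use of upper-case hexadecimal digits is an
      arbitrary choice.

```python
def config_bits_to_hex(bits):
    """Map configuration bits onto a hexadecimal octet string.

    The first bit lands in the MSB of the first octet; up to 7
    zero-valued padding bits complete the last octet.
    """
    if any(b not in "01" for b in bits):
        raise ValueError("bits must contain only '0' and '1'")
    padded = bits + "0" * (-len(bits) % 8)
    return "".join("%02X" % int(padded[i:i + 8], 2)
                   for i in range(0, len(padded), 8))
```

      A 13-bit pattern 0001001000010 is padded with three zero bits
      and becomes the octet string 1210. An empty bit string maps to
      the empty octet string used in config="".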

1366	      Mode:
1367	      The mode in which this specification is used. The following modes
1368	      can be signaled:
1369	      mode=generic,
1370	      mode=CELP-cbr,
1371	      mode=CELP-vbr,
1372	      mode=AAC-lbr and
1373	      mode=AAC-hbr.
1374	      Other modes are expected to be defined in future RFCs. See also
1375	      sections 3.3.7 and 4.2 of RFC xxxx.

1377	   Optional general parameters:

1379	      ObjectType:
1380	      The decimal value from Table 8 in ISO/IEC 14496-1, indicating
1381	      the value of the objectTypeIndication of the transported stream.
1382	      For BIFS streams this parameter MUST be present to signal the
1383	      version of BIFSConfig(). Note that ObjectTypeIndication
1384	      may signal a non-MPEG-4 stream and that the RTP payload format
1385	      defined in this document may not be suitable to carry a stream
1386	      that is not defined by MPEG-4. ObjectType SHOULD NOT be set to
1387	      a value that signals a stream that cannot be carried by this
1388	      payload format.

1390	      ConstantSize:
1391	      The constant size in octets of each Access Unit for this stream.
1392	      The ConstantSize and the SizeLength parameters MUST NOT be
1393	      simultaneously present.

1397	      maxDisplacement:
1398	      The decimal representation of the maximum displacement in time
1399	      of an interleaved AU, as defined in section 3.2.3.3, expressed
1400	      in units of the RTP time stamp clock.
1401	      This parameter MUST be present when interleaving is applied.

1403	      de-interleaveBufferSize:
1404	      The decimal representation in number of octets of the size of
1405	      the de-interleave buffer, described in section 3.2.3.3.
1406	      When interleaving, this parameter MUST be present if the
1407	      calculation of the de-interleave buffer size given in 3.2.3.3
1408	      and based on maxDisplacement and rate(max) under-estimates the
1409	      size of the de-interleave buffer. If this calculation does not
1410	      under-estimate the size of the de-interleave buffer, then the
1411	      de-interleaveBufferSize parameter SHOULD NOT be present.

1413	   Optional configuration parameters:

1415	      SizeLength:
1416	      The number of bits on which the AU-size field is encoded in the
1417	      AU-header. The SizeLength and the ConstantSize parameters MUST
1418	      NOT be simultaneously present.

1420	      IndexLength:
1421	      The number of bits on which the AU-Index is encoded in the first
1422	      AU-header. The default value of zero indicates the absence of
1423	      the AU-Index and AU-Index-delta fields in each AU-header.

1425	      IndexDeltaLength:
1426	      The number of bits on which the AU-Index-delta field is encoded
1427	      in any non-first AU-header.

1429	      CTSDeltaLength:
1430	      The number of bits on which the CTS-delta field is encoded in
1431	      the AU-header.

1433	      DTSDeltaLength:
1434	      The number of bits on which the DTS-delta field is encoded in
1435	      the AU-header.

1437	      RandomAccessIndication:
1438	      A decimal value of zero or one, indicating whether the RAP-flag
1439	      is present in the AU-header. The decimal value of one indicates
1440	      presence of the RAP-flag, the default value zero its absence.

1442	      StreamStateIndication:
1443	      The number of bits on which the Stream-state field is encoded in
1444	      the AU-header. This parameter MAY be present when transporting
1445	      MPEG-4 system streams, and SHALL NOT be present for MPEG-4 audio
1446	      and MPEG-4 video streams.

1450	      AuxiliaryDataSizeLength:
1451	      The number of bits that is used to encode the auxiliary-data-size
1452	      field.

1454	   Applications MAY use more parameters, in addition to those defined
1455	   above. Each additional parameter MUST be registered with IANA, to
1456	   ensure that there is no clash of names. Each additional parameter
1457	   MUST be accompanied by a specification in the form of an RFC, MPEG
1458	   standard, or other permanent and readily available reference (the
1459	   "Specification Required" policy defined in RFC 2434). Receivers MUST
1460	   tolerate the presence of such additional parameters, but these
1461	   parameters SHALL NOT impact the decoding of receivers that comply with
1462	   this specification.
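
   A receiver-side check of some of the parameter rules in this
   section might look as follows. This is a partial, illustrative
   sketch with hypothetical names: it checks only that the four
   required parameters are present and that ConstantSize and
   SizeLength do not appear together, compares names without regard
   to case, and ignores unknown parameters.

```python
REQUIRED_PARAMETERS = ("streamtype", "profile-level-id", "config", "mode")


def check_parameters(params):
    """Return a list of problems found in a name -> value mapping."""
    # Parameter names are not case dependent, so compare in lower case.
    lowered = {name.lower(): value for name, value in params.items()}
    problems = []
    for name in REQUIRED_PARAMETERS:
        if name not in lowered:
            problems.append("missing required parameter " + name)
    if "constantsize" in lowered and "sizelength" in lowered:
        problems.append("ConstantSize and SizeLength are both present")
    # Unknown parameters are tolerated and deliberately not reported.
    return problems
```

   An empty result means only that these particular rules are met;
   mode-specific requirements would need further checks.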

1464	   Encoding considerations:
1465	   This MIME subtype is defined for RTP transport only. System
1466	   bitstreams MUST be generated according to MPEG-4 Systems
1467	   specifications (ISO/IEC 14496-1). Video bitstreams MUST be generated
1468	   according to MPEG-4 Visual specifications (ISO/IEC 14496-2). Audio
1469	   bitstreams MUST be generated according to MPEG-4 Audio
1470	   specifications (ISO/IEC 14496-3). The RTP packets MUST be packetized
1471	   according to the RTP payload format defined in RFC xxxx.

1473	   Security considerations:
1474	   As defined in section 5 of RFC xxxx.

1476	   Interoperability considerations:
1477	   MPEG-4 provides a large and rich set of tools for the coding of
1478	   visual objects.  For effective implementation of the standard,
1479	   subsets of the MPEG-4 tool sets have been provided for use in
1480	   specific applications. These subsets, called 'Profiles', limit the
1481	   size of the tool set a decoder is required to implement. In order to
1482	   restrict computational complexity, one or more 'Levels' are set for
1483	   each Profile. A Profile@Level combination allows:
1484	   . a codec builder to implement only the subset of the standard he
1485	     needs, while maintaining interworking with other MPEG-4 devices
1486	     that implement the same combination, and
1487	   . checking whether MPEG-4 devices comply with the standard
1488	     ('conformance testing').

1490	   A stream SHALL be compliant with the MPEG-4 Profile@Level specified
1491	   by the parameter "profile-level-id". Interoperability between a
1492	   sender and a receiver is achieved by specifying the parameter
1493	   "profile-level-id" in MIME content. In the capability exchange /
1494	   announcement procedure this parameter may mutually be set to the
1495	   same value.

1499	   Published specification:
1500	   The specifications for MPEG-4 streams are presented in ISO/IEC
1501	   14496-1, 14496-2, and 14496-3. The RTP payload format is described
1502	   in RFC xxxx.

1504	   Applications which use this media type:
1505	   Multimedia streaming and conferencing tools.

1507	   Additional information: none

1509	   Magic number(s): none

1511	   File extension(s):
1512	   None. A file format with the extension .mp4 has been defined for
1513	   MPEG-4 content but is not directly correlated with this MIME type
1514	   for which the sole purpose is RTP transport.

1516	   Macintosh File Type Code(s): none

1518	   Person & email address to contact for further information:
1519	   Authors of RFC xxxx, IETF Audio/Video Transport working group.

1521	   Intended usage: COMMON

1523	   Author/Change controller:
1524	   Authors of RFC xxxx, IETF Audio/Video Transport working group.

1526	4.2 Registration of mode definitions with IANA

1528	   This specification can be used in a number of modes. The mode of
1529	   operation is signaled using the "Mode" MIME parameter, with the
1530	   initial set of values specified in section 4.1. New modes may be
1531	   defined at any time, as described in section 3.3.7. These modes
1532	   MUST be registered with IANA, to ensure that there is no clash
1533	   of names.

1535	   A new mode registration MUST be accompanied by a specification in
1536	   the form of an RFC, MPEG standard, or other permanent and readily
1537	   available reference (the "Specification Required" policy defined
1538	   in RFC 2434).

1540	4.3 Concatenation of parameters

1542	   Multiple parameters SHOULD be expressed as a MIME media type string,
1543	   in the form of a semicolon-separated list of parameter=value pairs
1544	   (for parameter usage examples see sections 3.3.2 up to 3.3.6).

1548	4.4 Usage of SDP

1550	4.4.1 The a=fmtp keyword

1552	   It is assumed that one typical way to transport the above-described
1553	   parameters associated with this payload format is via an SDP message
1554	   [6], for example transported to the client in reply to an RTSP
1555	   DESCRIBE [8] or via SAP [7]. In that case the (a=fmtp) keyword MUST
1556	   be used as described in RFC 2327 [6], section 6, the syntax being
1557	   then:

1559	   a=fmtp:<format> <parameter name>=<value>[; <parameter name>=<value>]
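
   As an illustration of the syntax above, the following sketch
   parses an a=fmtp line into its format and parameters. It is not a
   complete SDP parser; the function name is hypothetical, parameter
   names are folded to lower case because they are not case
   dependent, and surrounding double quotes (as in config="") are
   removed from values.

```python
def parse_fmtp(line):
    """Parse 'a=fmtp:<format> name=value[; name=value]' into a tuple.

    Returns (format, params) where params maps lower-cased parameter
    names to their string values.
    """
    prefix = "a=fmtp:"
    if not line.startswith(prefix):
        raise ValueError("not an a=fmtp line")
    fmt, _, rest = line[len(prefix):].strip().partition(" ")
    params = {}
    for item in rest.split(";"):
        item = item.strip()
        if not item:
            continue    # tolerate a trailing ';'
        name, sep, value = item.partition("=")
        if not sep:
            raise ValueError("parameter without '=': " + item)
        params[name.strip().lower()] = value.strip().strip('"')
    return fmt, params
```

   Applied to an unwrapped AAC-hbr line such as those in section
   3.3.6, this yields the format "96" and entries such as
   sizelength="13" and mode="AAC-hbr".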

1561	5. Security Considerations

1563	   RTP packets using the payload format defined in this specification
1564	   are subject to the security considerations discussed in the RTP
1565	   specification [2]. This implies that confidentiality of the media
1566	   streams is achieved by encryption. Because the data compression used
1567	   with this payload format is applied end-to-end, encryption may be
1568	   performed on the compressed data so there is no conflict between the
1569	   two operations. The packet processing complexity of this payload
1570	   type (i.e. excluding media data processing) does not exhibit any
1571	   significant non-uniformity in the receiver side to cause a denial-
1572	   of-service threat.

1574	   However, it is possible to inject non-compliant MPEG streams (Audio,
1575	   Video, and Systems) to overload the receiver/decoder's buffers,
1576	   which might compromise the functionality of the receiver or even
1577	   crash it. This is especially true for end-to-end systems like MPEG
1578	   where the buffer models are precisely defined.

1580	   MPEG-4 Systems supports stream types including commands that are
1581	   executed on the terminal like OD commands, BIFS commands, etc. and
1582	   programmatic content like MPEG-J (Java(TM) Byte Code) and
1583	   ECMAScript. It is possible to use one or more of the above in a
1584	   manner non-compliant to MPEG to crash the receiver or make it
1585	   temporarily unavailable. Senders that transport MPEG-4 content
1586	   SHOULD ensure that such content is MPEG compliant, as defined in the
1587	   compliance part of ISO/IEC 14496 [1]. Receivers that support MPEG-4
1588	   content should prevent malfunctioning of the receiver in case of
1589	   non-MPEG-compliant content.

1591	   Authentication mechanisms can be used to validate the sender and
1592	   the data to prevent security problems due to non-compliant malignant
1593	   MPEG-4 streams.

1597	   In ISO/IEC 14496-1 a security model is defined for MPEG-4 Systems
1598	   streams carrying MPEG-J access units which comprise Java(TM) classes
1599	   and objects. MPEG-J defines a set of Java APIs and a secure
1600	   execution model. MPEG-J content can call this set of APIs and
1601	   Java(TM) methods from a set of Java packages supported in the
1602	   receiver within the defined security model. According to this
1603	   security model, downloaded byte code is forbidden to load libraries,
1604	   define native methods, start programs, read or write files, or read
1605	   system properties.
1606	   Receivers can implement intelligent filters to validate the buffer
1607	   requirements or parametric (OD, BIFS, etc.) or programmatic (MPEG-J,
1608	   ECMAScript) commands in the streams. However, this can increase the
1609	   complexity significantly.

1611	6. Acknowledgements

1613	   This document evolved through several revisions thanks to
1614	   contributions by people from the ISMA forum, from the IETF AVT
1615	   Working Group and from the 4-on-IP ad-hoc group within MPEG. The
1616	   authors wish to thank all involved people, and in particular Andrea
1617	   Basso, Stephen Casner, M. Reha Civanlar, Carsten Herpel, John
1618	   Lazaro, Zvi Lifshitz, Young-kwon Lim, Alex MacAulay, Bill May,
1619	   Colin Perkins, Dorairaj V and Stephan Wenger for their valuable
1620	   comments and support.

1622	7. References

1624	   [1] ISO/IEC International Standard 14496 (MPEG-4); "Information
1625	   technology - Coding of audio-visual objects", January 2000

1627	   [2] H. Schulzrinne, S. Casner, R. Frederick, V. Jacobson, "RTP: A
1628	   Transport Protocol for Real-Time Applications", RFC 1889, Internet
1629	   Engineering Task Force, January 1996.

1631	   [3] S. Bradner, "Key words for use in RFCs to Indicate Requirement
1632	   Levels", RFC 2119, March 1997.

1634	   [4] D. Hoffman, G. Fernando, V. Goyal, M. Civanlar, "RTP payload
1635	   format for MPEG1/MPEG2 Video", RFC 2250, January 1998.

1637	   [5] Y. Kikuchi, T. Nomura, S. Fukunaga, Y. Matsui, H. Kimata, "RTP
1638	   payload format for MPEG-4 Audio/Visual streams", RFC 3016,
1638	   November 2000.

1640	   [6] M. Handley, V. Jacobson, "SDP: Session Description Protocol",
1641	   RFC 2327, Internet Engineering Task Force, April 1998.

1643	   [7] M. Handley, C. Perkins, E. Whelan, "SAP: Session Announcement
1644	   Protocol", RFC 2974, Internet Engineering Task Force, October 2000.

1646	   [8] H. Schulzrinne, A. Rao, R. Lanphier, "RTSP: Real Time Streaming
1647	   Protocol", RFC 2326, Internet Engineering Task Force, April 1998.

1651	8. Authors' Addresses

1653	   Jan van der Meer
1654	   Philips Digital Networks
1655	   Cederlaan 4
1656	   5600 JB Eindhoven
1657	   Netherlands
1658	   Email: jan.vandermeer@philips.com

1660	   David Mackie
1661	   Apple Computer, Inc.
1662	   One Infinite Loop, MS:302-2LF
1663	   Cupertino  CA 95014
1664	   Email: dmackie@apple.com

1666	   Viswanathan Swaminathan
1667	   Sun Microsystems Inc.
1668	   901 San Antonio Road, M/S UMPK15-214
1669	   Palo Alto, CA 94303
1670	   Email: viswanathan.swaminathan@sun.com

1672	   David Singer
1673	   Apple Computer, Inc.
1674	   One Infinite Loop, MS:302-3MT
1675	   Cupertino  CA 95014
1676	   Email: singer@apple.com

1678	   Philippe Gentric
1679	   Philips Digital Networks, MP4Net
1680	   51 rue Carnot
1681	   92156 Suresnes
1682	   France
1683	   Email: philippe.gentric@philips.com

1687	   Full Copyright Statement

1689	   Copyright (C) The Internet Society (December 2002). All Rights
1690	   Reserved.

1692	   This document and translations of it may be copied and furnished to
1693	   others, and derivative works that comment on or otherwise explain
1694	   it or assist in its implementation may be prepared, copied,
1695	   published and distributed, in whole or in part, without restriction
1696	   of any kind, provided that the above copyright notice and this
1697	   paragraph are included on all such copies and derivative works.
1698	   However, this document itself may not be modified in any way, such
1699	   as by removing the copyright notice or references to the Internet
1700	   Society or other Internet organizations, except as needed for the
1701	   purpose of developing Internet standards in which case the
1702	   procedures for copyrights defined in the Internet Standards process
1703	   MUST be followed, or as required to translate it into languages
1704	   other than English.

1706	   The limited permissions granted above are perpetual and will
1707	   not be revoked by the Internet Society or its successors or
1708	   assigns.

1710	   This document and the information contained herein is provided on
1711	   an "AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET
1712	   ENGINEERING TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR
1713	   IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF
1714	   THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED
1715	   WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.

1719	APPENDIX: Usage of this payload format

1721	Appendix A. Interleave analysis

1723	A.1 Introduction

1725	   In this appendix interleaving issues are discussed. Some general
1726	   notes are provided on de-interleaving and error concealment, while
1727	   a number of interleaving patterns are examined, in particular
1728	   for determining the maximum displacement in time and the size of
1729	   the de-interleave buffer. In these examples, the maximum
1730	   displacement is cited in terms of an access unit count, for ease of
1731	   reading. In actual streams, it is signalled in units of the RTP
1732	   time stamp clock.

1734	A.2 De-interleaving and error concealment

1736	   This appendix does not describe any details on de-interleaving and
1737	   error concealment, as the control of the AU decoding and error
1738	   concealment process has little to do with interleaving. If the
1739	   next AU to be decoded is present and there is sufficient storage
1740	   available for the decoded AU, then decode it now. If not, wait.
1741	   When the decoding deadline is reached (i.e., the time when decoding
1742	   must begin in order to be completed by the time the AU is to be
1743	   presented), or if the decoder is some hardware that presents a
1744	   constant delay between initiation of decoding of an AU and
1745	   presentation of that AU, then decoding must begin at that deadline
1746	   time.

1748	   If the next AU to be decoded is not present when the decoding
1749	   deadline is reached, then that AU is lost so the receiver must take
1750	   whatever error concealment measures are deemed appropriate. The
1751	   playout delay may need to be adjusted at that point (especially if
1752	   other AUs have also missed their deadline recently).  Or, if it
1753	   was a momentary delay, and maintaining the latency is important,
1754	   then the receiver should minimize the glitch and continue processing
1755	   with the next AU.
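
   The decision rule described above can be sketched as follows. This
   is a minimal illustration only, not part of the payload format; the
   function name next_action and its parameters are hypothetical, and
   times are assumed to be plain numbers on a common clock.

```python
def next_action(au_present, storage_free, now, deadline):
    """Return 'decode', 'wait' or 'conceal' for the next AU in decoding order."""
    if au_present and storage_free:
        return "decode"      # decode as soon as the AU and storage are available
    if now < deadline:
        return "wait"        # the AU or the storage may still become available
    # Deadline reached: decoding must begin now if the AU is present;
    # otherwise the AU is treated as lost and concealment is applied.
    return "decode" if au_present else "conceal"


print(next_action(au_present=False, storage_free=True, now=10, deadline=10))
```

   How concealment is performed, and whether the playout delay is
   adjusted afterwards, is left to the receiver.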

1757	A.3 Simple Group interleave

1759	A.3.1 Introduction

1761	   An example of regular interleave is when packets are formed into
1762	   groups. If the 'stride' of the interleave (the distance between
1763	   interleaved AUs) is N, packet 0 could contain AU(0), AU(N), AU(2N),
1764	   and so on; packet 1 could contain AU(1), AU(1+N), AU(1+2N), and so
1765	   on. If there are M access units in a packet, then there are M*N
1766	   access units in the group.

1770	   An example with N=M=3 follows; note that this is the same example
1771	   as given in section 2.5:

1773	   Packet   Time stamp   Carried AUs      AU-Index, AU-Index-delta
1774	   P(0)     T[0]         0, 3, 6          0, 2, 2
1775	   P(1)     T[1]         1, 4, 7          0, 2, 2
1776	   P(2)     T[2]         2, 5, 8          0, 2, 2
1777	   P(3)     T[9]         9,12,15          0, 2, 2

1779	   In the above example the AU-Index is coded with the value 0, as
1780	   required for the modes defined in this document. The position of
1781	   the first AU of each packet within the group is defined by the RTP
1782	   time stamp, while the AU-Index-delta field indicates the position
1783	   of subsequent AUs relative to the first AU in the packet. All
1784	   AU-Index-delta fields are coded with the value N-1, equal to 2 in
1785	   this example. Hence the RTP time stamp and the AU-Index-delta are
1786	   used to reconstruct the original order. See also section 3.2.3.2.
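
   As an illustration, the group formation and the receiver-side
   reconstruction can be sketched as below. The function names and the
   dictionary layout are hypothetical; time stamps are represented by
   the number of the first AU in the packet rather than by RTP clock
   units, and the sketch assumes that each AU-Index-delta is the gap to
   the previous AU in the packet minus one, which is consistent with
   the value N-1 stated above.

```python
def group_interleave(n, m, group=0):
    """Packets of one group: stride n, m AUs per packet (n*m AUs in total)."""
    first = group * n * m
    packets = []
    for p in range(n):
        aus = [first + p + k * n for k in range(m)]
        # AU-Index is 0 for the first AU; each later AU carries an
        # AU-Index-delta equal to the gap to the previous AU minus one.
        fields = [0] + [b - a - 1 for a, b in zip(aus, aus[1:])]
        packets.append({"timestamp_au": aus[0], "aus": aus, "index_fields": fields})
    return packets


def carried_aus(timestamp_au, index_fields):
    """Receiver side: rebuild AU numbers from the time stamp and the deltas."""
    aus = [timestamp_au]
    for delta in index_fields[1:]:
        aus.append(aus[-1] + delta + 1)
    return aus


for packet in group_interleave(3, 3):
    print(packet["timestamp_au"], packet["aus"], packet["index_fields"])
```

   Calling group_interleave(3, 3, group=1) yields the packets of the
   next group, the first of which corresponds to P(3) in the table.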

1788	A.3.2 Determining the de-interleave buffer size

1790	   For the regular pattern as in this example, figure 6 in section
1791	   3.2.3.3 shows that the de-interleave buffer size is equal to 4 AU
1792	   sizes.
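
   The same figure can be checked with a short calculation. The sketch
   below is illustrative only; it assumes AUs of constant size, so that
   the buffer size can be counted in AUs, and it treats an AU as
   "early" when it was received before an AU with a lower number. The
   function name is hypothetical.

```python
def deinterleave_buffer_aus(receive_order):
    """Largest number of 'early' AUs held when any one AU arrives."""
    worst = 0
    for pos, au in enumerate(receive_order):
        # AUs already received that must be presented after this one.
        early = [x for x in receive_order[:pos] if x > au]
        worst = max(worst, len(early))
    return worst


# Receive order for the N=M=3 group interleave of this example.
order = [0, 3, 6, 1, 4, 7, 2, 5, 8]
print(deinterleave_buffer_aus(order))  # prints 4
```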

1794	A.3.3 Determining the maximum displacement

1796	   For the regular pattern as in this example, figure 7 in section 3.3
1797	   shows that the value of the maxDisplacement equals 5 AU periods.
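
   A minimal sketch of this calculation is given below, expressed in
   AU periods rather than in units of the RTP time stamp clock. It
   assumes that AUs are numbered from 0 and that the list holds the
   order of reception; the function name is hypothetical.

```python
def max_displacement_aus(receive_order):
    """Largest distance between a received AU and the earliest missing AU."""
    worst = 0
    received = set()
    for au in receive_order:
        received.add(au)
        # Earlier AUs that have not yet been received at this point.
        missing = [x for x in range(au) if x not in received]
        if missing:
            worst = max(worst, au - missing[0])
    return worst


# Receive order for the N=M=3 group interleave of this example.
order = [0, 3, 6, 1, 4, 7, 2, 5, 8]
print(max_displacement_aus(order))  # prints 5
```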

1799	A.4 More subtle group interleave

1801	A.4.1 Introduction

1803	   Another example of forming packets with group interleave is given
1804	   below. In this example the packets are formed such that the loss of
1805	   two subsequent RTP packets does not cause the loss of two subsequent
1806	   AUs. Note that in this example the RTP time stamps of packet 3 and
1807	   packet 4 are earlier than the RTP time stamps of packets 1 and 2,
1808	   respectively.

1810	   Packet   Time stamp   Carried AUs      AU-Index, AU-Index-delta
1811	   0        T[0]         0,  5            0, 4
1812	   1        T[2]         2,  7            0, 4
1813	   2        T[4]         4,  9            0, 4
1814	   3        T[1]         1,  6            0, 4
1815	   4        T[3]         3,  8            0, 4

1817	   5        T[10]       10, 15            0, 4
1818	   and so on ..

1822	   In this example the AU-Index is coded with the value 0, as required
1823	   for the modes defined in this document. To reconstruct the original
1824	   order, the RTP time stamp and the AU-Index-delta (coded with the
1825	   value 4) are used. See also section 3.2.3.2.
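
   The loss property claimed for this pattern can be checked with the
   following sketch. It is an illustration only; the function name is
   hypothetical, and packets are represented simply as lists of the AU
   numbers they carry, in transmission order.

```python
def loses_adjacent_aus(packets):
    """True if losing some pair of consecutive packets loses two consecutive AUs."""
    for first, second in zip(packets, packets[1:]):
        lost = sorted(first + second)
        if any(b - a == 1 for a, b in zip(lost, lost[1:])):
            return True
    return False


# Packets 0..5 of this example, in transmission order.
subtle = [[0, 5], [2, 7], [4, 9], [1, 6], [3, 8], [10, 15]]
# The simple group interleave with N=M=3, for comparison.
simple = [[0, 3, 6], [1, 4, 7], [2, 5, 8]]

print(loses_adjacent_aus(subtle))  # prints False
print(loses_adjacent_aus(simple))  # prints True
```

   In the simple group interleave, losing packets 0 and 1 removes both
   AU(0) and AU(1), which is what the pattern in this section avoids.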

1827	A.4.2 Determining the de-interleave buffer size

1829	   From figure 8 it can be determined that at most 5 "early" AUs
1830	   are to be stored. If the AUs are of constant size, then this value
1831	   equals 5 times the AU size.

1833	                              +--+--+--+--+--+--+--+--+--+--+
1834	   Interleaved AUs            | 0| 5| 2| 7| 4| 9| 1| 6| 3| 8|
1835	                              +--+--+--+--+--+--+--+--+--+--+
1836	                                -  -  5  -  5  -  2  7  4  9
1837	                                            7     4  9  5
1838	   Received "early" AUs                           5     6
1839	                                                  7     7
1840	                                                  9     9

1842	   Figure 8: Storage of "early" AUs in the de-interleave buffer per
1843	             interleaved AU.

1845	A.4.3 Determining the maximum displacement

1847	   From figure 9 it can be seen that the maxDisplacement has
1848	   a value of 8 AU periods.

1850	                                    +--+--+--+--+--+--+--+--+--+--+
1851	   Interleaved AUs                  | 0| 5| 2| 7| 4| 9| 1| 6| 3| 8|
1852	                                    +--+--+--+--+--+--+--+--+--+--+

1854	   Earliest not yet received AU       -  1  1  1  1  1  -  3  -  -

1856	   Figure 9: The earliest not yet received AU for each AU in the
1857	             interleaving pattern.

1859	A.5 Continuous interleave

1861	A.5.1 Introduction

1863	   In continuous interleave, once the scheme is 'primed', the number
1864	   of AUs in a packet exceeds the 'stride' (the distance between
1865	   them). This shortens the buffering needed, smooths the data-flow,
1866	   and gives slightly larger packets -- and thus lower overhead -- for
1867	   the same interleave. For example, here is a continuous interleave
1868	   also over a stride of 3 AUs, but with 4 AUs per packet, for a run
1869	   of 21 AUs (AU 0 through AU 20). This shows both how the scheme
1870	   'starts up' and how it finishes.

1874	   Packet   Time stamp   Carried AUs         AU-Index, AU-Index-delta
1875	   0        T[0]                      0      0
1876	   1        T[1]                  1   4      0  2
1877	   2        T[2]              2   5   8      0  2  2
1878	   3        T[3]          3   6   9  12      0  2  2  2
1879	   4        T[7]          7  10  13  16      0  2  2  2
1880	   5        T[11]        11  14  17  20      0  2  2  2
1881	   6        T[15]        15  18              0  2
1882	   7        T[19]        19                  0

1884	   Also in this example the AU-Index is coded with the value 0, as
1885	   required for the modes defined in this document. To reconstruct the
1886	   original order, the RTP time stamp and the AU-Index-delta (coded
1887	   with the value 2) are used. See also section 3.2.3.2.  Note that
1888	   this example has RTP time stamps in increasing order.
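
   One way to reproduce the table above is sketched below. This is an
   illustrative construction, not a procedure defined by this
   document: it assumes that packets start every per_packet AUs and
   that the first packet is placed so that start-up and finish are
   obtained by dropping AU numbers outside the run. The function and
   parameter names are hypothetical.

```python
def continuous_interleave(total_aus, stride=3, per_packet=4):
    """Carried AUs per packet for a run of AUs numbered 0..total_aus-1."""
    packets = []
    # Start early enough that AU 0 is the last slot of the first packet.
    start = -(per_packet - 1) * stride
    for base in range(start, total_aus, per_packet):
        aus = [base + k * stride for k in range(per_packet)]
        # Drop slots that fall before the start or after the end of the run.
        aus = [a for a in aus if 0 <= a < total_aus]
        if aus:
            packets.append(aus)
    return packets


# AU 0 through AU 20, as in the table above.
for number, aus in enumerate(continuous_interleave(21)):
    print(number, aus)
```

   This construction carries every AU exactly once only when stride
   and per_packet have no common factor, as is the case for 3 and 4.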

1890	A.5.2 Determining the de-interleave buffer size

1892	   For this example the de-interleave buffer size can be derived from
1893	   figure 10. The maximum number of "early" AUs is three. If the AUs
1894	   are of constant size, then this value equals 3 times the AU size.
1895	   Compared to the example in A.3, for constant size AUs the
1896	   de-interleave buffer size is reduced from 4 to 3 times the AU size,
1897	   while maintaining the same 'stride'.

1899	                        +--+--+--+--+--+--+--+--+--+--+--+--+--+--+-
1900	   Interleaved AUs      | 0| 1| 4| 2| 5| 8| 3| 6| 9|12| 7|10|13|16|
1901	                        +--+--+--+--+--+--+--+--+--+--+--+--+--+--+-
1902	                          -  -  -  4  -  -  4  8  -  -  8 12  -  -
1903	                                            5           9
1904	   Received "early" AUs                     8          12

1906	   Figure 10: Storage of "early" AUs in the de-interleave buffer per
1907	              interleaved AU.

1909	A.5.3 Determining the maximum displacement

1911	   For this example the maxDisplacement has a value of 5 AU periods.
1912	   See figure 11.

1914	                        +--+--+--+--+--+--+--+--+--+--+--+--+--+--+-
1915	   Interleaved AUs      | 0| 1| 4| 2| 5| 8| 3| 6| 9|12| 7|10|13|16|
1916	                        +--+--+--+--+--+--+--+--+--+--+--+--+--+--+-
1917	   Earliest not yet
1918	        received AU       -  -  2  -  3  3  -  -  7  7  -  - 11 11

1920	   Figure 11: The earliest not yet received AU for each AU in the
1921	              interleaving pattern.