Internet Engineering Task Force                         J. van der Meer
Internet Draft                                      Philips Electronics
                                                              D. Mackie
                                                         Apple Computer
                                                         V. Swaminathan
                                                  Sun Microsystems Inc.
                                                              D. Singer
                                                         Apple Computer
                                                             P. Gentric
                                                    Philips Electronics

                                                            August 2003
                                                  Expires February 2004

   Document: draft-ietf-avt-mpeg4-simple-08.txt

   RTP Payload Format for Transport of MPEG-4 Elementary Streams

Status of this Memo

   This document is an Internet-Draft and is in full conformance with
   all provisions of section 10 of RFC 2026.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups.  Note that
   other groups may also distribute working documents as
   Internet-Drafts.  Internet-Drafts are draft documents valid for a
   maximum of six months and may be updated, replaced, or obsoleted by
   other documents at any time.  It is inappropriate to use
   Internet-Drafts as reference material or to cite them other than as
   "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt
   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html.

   This specification is a product of the Audio/Video Transport working
   group within the Internet Engineering Task Force.  Comments are
   solicited and should be addressed to the working group's mailing
   list at avt@ietf.org and/or the authors.

   << Note for the RFC editor: xxxx should be replaced with the RFC
   number that will be assigned. >>

Abstract

   The MPEG Committee (ISO/IEC JTC1/SC29 WG11) is a working group in
   ISO that produced the MPEG-4 standard.  MPEG defines tools to
   compress content such as audio-visual information into elementary
   streams.  This specification defines a simple but generic RTP
   payload format for transport of any non-multiplexed MPEG-4
   elementary stream.

RFC xxxx        Transport of MPEG-4 Elementary Streams      August 2003

Table of Contents

   1.  Introduction
   2.  Carriage of MPEG-4 elementary streams over RTP
   2.1.  Signaling by MIME format parameters
   2.2.  MPEG Access Units
   2.3.  Concatenation of Access Units
   2.4.  Fragmentation of Access Units
   2.5.  Interleaving
   2.6.  Time stamp information
   2.7.  State indication of MPEG-4 system streams
   2.8.  Random access indication
   2.9.  Carriage of auxiliary information
   2.10. MIME format parameters and configuring conditional fields
   2.11. Global structure of payload format
   2.12. Modes to transport MPEG-4 streams
   2.13. Alignment with RFC 3016
   3.  Payload format
   3.1.  Usage of RTP header fields and RTCP
   3.2.  RTP payload structure
   3.2.1.  The AU Header Section
   3.2.1.1.  The AU-header
   3.2.2.  The Auxiliary Section
   3.2.3.  The Access Unit Data Section
   3.2.3.1.  Fragmentation
   3.2.3.2.  Interleaving
   3.2.3.3.  Constraints for interleaving
   3.2.3.4.  Crucial and non-crucial AUs with MPEG-4 System data
   3.3.  Usage of this specification
   3.3.1.  General
   3.3.2.  The generic mode
   3.3.3.  Constant bit rate CELP
   3.3.4.  Variable bit rate CELP
   3.3.5.  Low bit rate AAC
   3.3.6.  High bit rate AAC
   3.3.7.  Additional modes
   4.  IANA considerations
   4.1.  MIME type registration
   4.2.  Registration of mode definitions with IANA
   4.3.  Concatenation of parameters
   4.4.  Usage of SDP
   4.4.1.  The a=fmtp keyword
   5.  Security considerations
   6.  Acknowledgements
   7.  References
   7.1.  Normative references
   7.2.  Informative references
   8.  Author addresses

       APPENDIX: Usage of this payload format
       A. Examples of delay analysis with interleave
       A.1 Introduction
       A.2 De-interleaving and error concealment
       A.3 Simple group interleave
       A.3.1 Introduction
       A.3.2 Determining the de-interleave buffer size
       A.3.3 Determining the maximum displacement
       A.4 More subtle group interleave
       A.4.1 Introduction
       A.4.2 Determining the de-interleave buffer size
       A.4.3 Determining the maximum displacement
       A.5 Continuous interleave
       A.5.1 Introduction
       A.5.2 Determining the de-interleave buffer size
       A.5.3 Determining the maximum displacement

1. Introduction

   The MPEG Committee is Working Group 11 (WG11) in ISO/IEC JTC1 SC29
   that specified the MPEG-1, MPEG-2 and, more recently, the MPEG-4
   standards [1].  The MPEG-4 standard specifies compression of
   audio-visual data into, for example, an audio or video elementary
   stream.  In the MPEG-4 standard, these streams take the form of
   audio-visual objects that may be arranged into an audio-visual scene
   by means of a scene description.  Each MPEG-4 elementary stream
   consists of a sequence of Access Units; examples of an Access Unit
   (AU) are an audio frame and a video picture.

   This specification defines a general and configurable payload
   structure to transport MPEG-4 elementary streams, in particular
   MPEG-4 audio (including speech) streams, MPEG-4 video streams and
   also MPEG-4 systems streams, such as BIFS (BInary Format for
   Scenes), OCI (Object Content Information), OD (Object Descriptor)
   and IPMP (Intellectual Property Management and Protection) streams.
   The RTP payload defined in this document is simple to implement and
   reasonably efficient.  It allows for optional interleaving of Access
   Units (such as audio frames) to increase resilience to packet loss.

   Some types of MPEG-4 elementary streams include "crucial"
   information whose loss cannot be tolerated, but RTP does not provide
   reliable transmission, so receipt of that crucial information is not
   assured.  Section 3.2.3.4 specifies how stream state is conveyed so
   that the receiver can detect the loss of crucial information and
   cease decoding until the next random access point is received.
   Applications transmitting streams that include crucial information,
   such as OD commands, BIFS commands, or programmatic content such as
   MPEG-J (Java) and ECMAScript, should include random access points
   sufficiently often, depending upon the probability of loss, to
   reduce stream corruption to an acceptable level.  An example of such
   periodic repetition is the carousel mechanism defined by MPEG in
   ISO/IEC 14496-1.

   Such applications may also employ additional protocols or services
   to reduce the probability of loss.  At the RTP layer, these measures
   include payload formats and profiles for retransmission or forward
   error correction (such as in RFC 2733 [10]), which must be employed
   with due consideration to congestion control.  Another solution that
   may be appropriate for some applications is to carry RTP over TCP
   (such as in RFC 2326 [8], section 10.12).  At the network layer,
   resource allocation or preferential service may be available to
   reduce the probability of loss.  For a general description of methods
   to repair streaming media, see RFC 2354 [9].

   Though the RTP payload format defined in this document is capable
   of transporting any MPEG-4 stream, other, more specific formats
   may exist, such as RFC 3016 [12] for transport of MPEG-4 video
   (ISO/IEC 14496 [1] part 2).

   Configuration of the payload is provided to accommodate transport
   of any MPEG-4 stream at any possible bit rate.  However, for a
   specific MPEG-4 elementary stream, typically only very few
   configurations are needed.  To allow for the design of simplified
   but dedicated receivers, this specification requires that specific
   modes be defined for transport of MPEG-4 streams.  This document
   defines modes for MPEG-4 CELP and AAC streams, as well as a generic
   mode that can be used to transport any MPEG-4 stream.  In the
   future, new RFCs are expected to specify additional modes for
   transport of MPEG-4 streams.

   The RTP payload format defined in this document specifies carriage
   of system-related information that is often equivalent to the
   information that may be contained in the MPEG-4 Sync Layer (SL) as
   defined in MPEG-4 Systems [1].  This document does not prescribe how
   to transcode or map information from the SL to fields defined in
   the RTP payload format.  Such processing, if any, is left to the
   discretion of the application.  However, to anticipate the need for
   transport of any additional system-related information in the
   future, an auxiliary field can be configured that may carry any
   such data.

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in
   this document are to be interpreted as described in RFC 2119 [4].

2. Carriage of MPEG-4 elementary streams over RTP

2.1 Signaling by MIME format parameters

   With this payload format, a single MPEG-4 elementary stream can be
   transported.  Information on the type of MPEG-4 stream carried in
   the payload is conveyed by MIME format parameters, for example in
   an SDP [5] message or by other means (see section 4).  These MIME
   format parameters specify the configuration of the payload.  To
   allow for simplified and dedicated receivers, a MIME format
   parameter is available to signal a specific mode of using this
   payload.  A mode definition MAY include the type of MPEG-4
   elementary stream as well as the applied configuration, so as to
   avoid the need for receivers to parse all MIME format parameters.
   The applied mode MUST be signaled.

2.2 MPEG Access Units

   For carriage of compressed audio-visual data, MPEG defines Access
   Units.  An MPEG Access Unit (AU) is the smallest data entity to
   which timing information is attributed.  In the case of audio, an
   Access Unit may represent an audio frame; in the case of video, a
   picture.  MPEG Access Units are by definition octet-aligned.  If,
   for example, an audio frame is not octet-aligned, up to 7
   zero-padding bits MUST be inserted at the end of the frame to make
   the Access Unit octet-aligned, as required by the MPEG-4
   specification.  MPEG-4 decoders MUST be able to decode AUs in which
   such padding is applied.
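
   As a small illustration of this padding rule, the sketch below
   computes how many zero bits a frame of a given bit length needs and
   serializes the padded Access Unit.  It assumes the frame is held
   MSB-first in a Python integer; padding_bits and pad_frame are
   hypothetical helper names, not part of any MPEG-4 API.

```python
def padding_bits(frame_bits: int) -> int:
    """Number of zero bits to append so that a frame of frame_bits bits
    ends on an octet boundary (always between 0 and 7)."""
    if frame_bits < 0:
        raise ValueError("frame length must be non-negative")
    return -frame_bits % 8


def pad_frame(frame_bits: int, value: int) -> bytes:
    """Serialize a frame held MSB-first in the low frame_bits bits of
    value, appending zero padding bits to form an octet-aligned AU."""
    if value >> frame_bits:
        raise ValueError("value has more than frame_bits significant bits")
    pad = padding_bits(frame_bits)
    return (value << pad).to_bytes((frame_bits + pad) // 8, "big")
```

   For example, a 13-bit frame receives 3 padding bits and becomes a
   2-octet AU.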

   Consistent with the MPEG-4 specification, this document requires
   that each MPEG-4 part 2 video Access Unit include all the coded
   data of a picture, any video stream headers that may precede the
   coded picture data, and any video stream stuffing that may follow
   it, up to, but not including, the start code indicating the start
   of a new video stream or the next Access Unit.

2.3 Concatenation of Access Units

   Frequently it is possible to carry multiple Access Units in one RTP
   packet.  This is particularly useful for audio; for example, when
   AAC is used for encoding of a stereo signal at 64 kbit/s, AAC
   frames contain on average approximately 200 octets.  On a LAN with a
   1500-octet MTU, this would allow on average 7 complete AAC frames to
   be carried per RTP packet.

   Access Units may have a fixed size in octets, but a variable size
   is also possible.  To facilitate parsing in the case of multiple
   concatenated AUs in one RTP packet, the size of each AU is made
   known to the receiver.  When AUs of constant size are concatenated,
   this size is communicated "out of band" through a MIME format
   parameter.  When AUs of variable size are concatenated, the RTP
   payload carries "in band" an AU size field for each contained AU.

   In combination with the RTP payload length, the size information
   allows the RTP payload to be split by the receiver back into the
   individual AUs.
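
   A minimal receiver-side sketch of this splitting step follows.  It
   assumes the Access Unit Data Section has already been located in
   the payload and holds complete AUs, and that the sizes are known
   either as one constant (the "out of band" case) or as a list of
   per-AU sizes (the "in band" case).  The function name
   split_access_units is hypothetical; the size fields themselves are
   defined in section 3.2.1.1.

```python
from typing import List, Optional, Sequence


def split_access_units(data: bytes,
                       constant_size: Optional[int] = None,
                       sizes: Optional[Sequence[int]] = None) -> List[bytes]:
    """Split an AU Data Section holding complete AUs into individual AUs.

    Exactly one of constant_size (out-of-band, fixed AU size) or sizes
    (in-band, one size per AU) must be given.
    """
    if (constant_size is None) == (sizes is None):
        raise ValueError("give exactly one of constant_size or sizes")
    if constant_size is not None:
        if constant_size <= 0 or len(data) % constant_size:
            # The number of AUs in a packet must be integral.
            raise ValueError("payload is not a whole number of AUs")
        sizes = [constant_size] * (len(data) // constant_size)
    aus, offset = [], 0
    for size in sizes:
        if offset + size > len(data):
            raise ValueError("AU size exceeds remaining payload")
        aus.append(data[offset:offset + size])
        offset += size
    if offset != len(data):
        raise ValueError("trailing octets after the last AU")
    return aus
```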

   To simplify the implementation of RTP receivers, when multiple AUs
   are carried in an RTP packet, each AU MUST be complete, i.e., the
   number of AUs in an RTP packet MUST be integral.  In addition, an
   AU MUST NOT be repeated in other RTP packets; hence, repetition of
   an AU is only possible by using a duplicate RTP packet.

2.4 Fragmentation of Access Units

   MPEG allows for very large Access Units.  Since most IP networks
   have significantly smaller MTU sizes, this payload format allows
   for the fragmentation of an Access Unit over multiple RTP packets.
   Hence, when a packet is lost, only an AU fragment is lost, rather
   than the entire AU as would happen if the AU were carried in one
   large RTP packet that is fragmented at the IP level and any one of
   its IP fragments were lost.  To simplify the implementation of RTP
   receivers, an RTP packet SHALL either carry one or more complete
   Access Units or a single fragment of one AU, i.e., packets MUST NOT
   contain fragments of multiple Access Units.
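
   The packetization rule above can be sketched on the sender side as
   follows.  This is a simplified illustration, not a normative
   algorithm: it assumes a fixed per-packet budget max_payload (in
   octets available for AU data) and ignores AU header and
   interleaving overhead.  The function name packetize is
   hypothetical.  Each yielded marker flag follows the M bit semantics
   of section 3.1.

```python
from typing import Iterable, Iterator, List, Tuple


def packetize(aus: Iterable[bytes],
              max_payload: int) -> Iterator[Tuple[List[bytes], bool]]:
    """Group complete AUs into packets, or fragment a single oversized AU.

    Yields (pieces, marker): pieces are either complete AUs or exactly
    one fragment of one AU; marker is True when the packet carries
    complete AUs or the final fragment of an AU.
    """
    if max_payload <= 0:
        raise ValueError("max_payload must be positive")
    pending: List[bytes] = []
    used = 0
    for au in aus:
        if len(au) > max_payload:
            # Flush complete AUs first: a packet must not mix complete
            # AUs with a fragment, nor carry fragments of several AUs.
            if pending:
                yield pending, True
                pending, used = [], 0
            for start in range(0, len(au), max_payload):
                fragment = au[start:start + max_payload]
                yield [fragment], start + max_payload >= len(au)
            continue
        if used + len(au) > max_payload:
            yield pending, True
            pending, used = [], 0
        pending.append(au)
        used += len(au)
    if pending:
        yield pending, True
```

   Because an oversized AU is always flushed into its own run of
   fragment packets, no packet mixes complete AUs with a fragment.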

2.5 Interleaving

   When an RTP packet carries a contiguous sequence of Access Units,
   the loss of such a packet can result in a "decoding gap" for the
   user.  One method to alleviate this problem is to allow for the
   Access Units to be interleaved in the RTP packets.  For a modest
   cost in latency and implementation complexity, significant error
   resiliency to packet loss can be achieved.

   To support optional interleaving of Access Units, this payload
   format allows for index information to be sent for each Access Unit.
   After informing receivers about buffer resources to allocate for
   de-interleaving, the RTP sender is free to choose the interleaving
   pattern without propagating this information a priori to the
   receiver(s).  Indeed, the sender could dynamically adjust the
   interleaving pattern based on the Access Unit size, error rates,
   etc.  The RTP receiver does not need to know the interleaving
   pattern used; it only needs to extract the index information of
   each Access Unit and insert the Access Unit into the appropriate
   position in the decoding or rendering queue.

   For example, assume that each RTP packet contains 3 AUs and that
   the AUs are numbered 0, 1, 2, 3, 4, and so forth.  If an
   interleaving group length of 9 is chosen, then RTP packet(i)
   contains the following AU(n):

   RTP packet(0):  AU(0),  AU(3),  AU(6)
   RTP packet(1):  AU(1),  AU(4),  AU(7)
   RTP packet(2):  AU(2),  AU(5),  AU(8)
   RTP packet(3):  AU(9),  AU(12), AU(15)
   RTP packet(4):  AU(10), AU(13), AU(16)  Etc.
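
   The pattern in this example can be generated, and undone at the
   receiver, with the short sketch below.  It assumes that each AU's
   position in decoding order is available to the receiver as an
   absolute index, which simplifies the index fields defined in
   section 3.2.1.1.  The functions interleave and deinterleave and the
   parameters group_len and aus_per_packet are hypothetical names.

```python
from typing import Dict, Iterable, List, Tuple


def interleave(num_aus: int, group_len: int,
               aus_per_packet: int) -> List[List[int]]:
    """Return the AU indices carried by each RTP packet.

    Within each group of group_len consecutive AUs, packet p carries
    AUs p, p + stride, p + 2*stride, ... where stride is the number of
    packets per group.  Assumes group_len is a multiple of aus_per_packet.
    """
    if group_len % aus_per_packet:
        raise ValueError("group_len must be a multiple of aus_per_packet")
    stride = group_len // aus_per_packet  # packets per interleave group
    packets = []
    for base in range(0, num_aus, group_len):
        for p in range(stride):
            packet = [base + p + k * stride for k in range(aus_per_packet)]
            packets.append([n for n in packet if n < num_aus])
    return [pkt for pkt in packets if pkt]


def deinterleave(packets: Iterable[Iterable[Tuple[int, bytes]]]) -> List[bytes]:
    """Reorder (index, AU) pairs from received packets into decoding order."""
    queue: Dict[int, bytes] = {}
    for packet in packets:
        for index, au in packet:
            queue[index] = au
    return [queue[i] for i in sorted(queue)]
```

   Any AU missing from the reordered output corresponds to a lost
   packet; a real receiver would also need a buffer limit and a
   playout deadline, which this sketch omits.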

2.6 Time stamp information

   The RTP time stamp MUST carry the sampling instant of the first AU
   (fragment) in the RTP packet.  When multiple AUs are carried within
   an RTP packet, the time stamps of subsequent AUs can be calculated
   if the frame period of each AU is known.  For audio and video, this
   is possible if the frame rate is constant.  However, in some cases
   such a calculation is not possible, for example for variable frame
   rate video or for MPEG-4 BIFS streams carrying composition
   information.  To support such cases, this payload format can be
   configured to carry a time stamp in the RTP payload for each
   contained Access Unit.  A time stamp MAY be conveyed in the RTP
   payload only for non-first AUs in the RTP packet, and SHALL NOT be
   conveyed for the first AU (fragment), as the time stamp for the
   first AU in the RTP packet is carried by the RTP time stamp.
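
   For the constant frame rate case, the calculation reduces to adding
   multiples of the frame period to the RTP time stamp.  The sketch
   below assumes the frame period is expressed in RTP clock ticks (for
   instance, 1024 ticks for an audio frame of 1024 samples when the RTP
   clock runs at the sampling rate) and that each AU's offset from the
   first AU in decoding order is known; au_timestamps is a hypothetical
   name.

```python
from typing import List, Sequence

RTP_TS_MODULUS = 2 ** 32  # RTP time stamps are 32-bit values that wrap


def au_timestamps(rtp_ts: int, frame_period: int,
                  index_offsets: Sequence[int]) -> List[int]:
    """Time stamps of the AUs in one packet, for a constant frame period.

    index_offsets[k] is the position of AU k relative to the first AU
    in decoding order: 0, 1, 2, ... without interleaving, or e.g.
    0, 3, 6 with the interleaving pattern of section 2.5.
    """
    return [(rtp_ts + d * frame_period) % RTP_TS_MODULUS
            for d in index_offsets]
```

   For instance, with a frame period of 1024 ticks and the interleaved
   offsets 0, 3 and 6, the AUs of a packet with RTP time stamp 0 are
   stamped 0, 3072 and 6144.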

   MPEG-4 defines two types of time stamp: the composition time stamp
   (CTS) and the decoding time stamp (DTS).  The CTS represents the
   sampling instant of an AU, and hence the CTS is equivalent to the
   RTP time stamp.  The DTS may be used in MPEG-4 video streams that
   use bi-directional coding, i.e., when pictures are predicted in both
   forward and backward direction by using either a reference picture
   in the past or a reference picture in the future.  The DTS cannot
   be carried in the RTP header.  In some cases the DTS can be derived
   from the RTP time stamp using frame rate information; this requires
   deep parsing in the video stream, which may be considered
   objectionable.  Moreover, if the video frame rate is variable, the
   required information may not even be present in the video stream.
   For both reasons, the capability has been defined to optionally
   carry the DTS in the RTP payload for each contained Access Unit.

   To keep the coding of time stamps efficient, each time stamp
   contained in the RTP payload is coded differentially: the CTS
   relative to the RTP time stamp, and the DTS relative to the CTS.
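
   A decoding sketch for such differential time stamps follows.  It
   assumes that a delta is transmitted as an N-bit two's complement
   field and that the CTS equals the RTP time stamp plus the decoded
   delta, modulo 2^32.  The exact field lengths and sign conventions
   are defined in section 3.2.1.1, so treat that convention, and the
   helper names twos_complement and cts_from_delta, as hypothetical.
   The DTS would be recovered in the same way relative to the CTS,
   with the direction of the offset as specified there.

```python
RTP_TS_MODULUS = 2 ** 32  # the RTP time stamp field is 32 bits wide


def twos_complement(raw: int, bits: int) -> int:
    """Interpret the low `bits` bits of raw as a signed two's complement value."""
    if bits <= 0:
        raise ValueError("field length must be positive")
    raw &= (1 << bits) - 1
    return raw - (1 << bits) if raw >> (bits - 1) else raw


def cts_from_delta(rtp_ts: int, raw_delta: int, bits: int) -> int:
    """Recover an AU's CTS from the RTP time stamp and an N-bit delta field."""
    return (rtp_ts + twos_complement(raw_delta, bits)) % RTP_TS_MODULUS
```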

2.7 State indication of MPEG-4 system streams

   ISO/IEC 14496-1 defines states for MPEG-4 system streams.  To
   convey state information when transporting MPEG-4 system streams,
   this payload format allows for the optional carriage in the RTP
   payload of the stream state for each contained Access Unit.  Stream
   states are used to signal "crucial" AUs that carry information whose
   loss cannot be tolerated, and they are also useful when repeating
   AUs according to the carousel mechanism defined in ISO/IEC 14496-1.

2.8 Random access indication

   Random access to the content of MPEG-4 elementary streams may be
   possible at some but not all Access Units.  To signal Access Units
   where random access is possible, a random access point flag can
   optionally be carried in the RTP payload for each contained Access
   Unit.  Signaling of random access points is particularly useful for
   MPEG-4 system streams in combination with the stream state.

2.9 Carriage of auxiliary information

   This payload format defines a specific field to carry auxiliary
   data.  The auxiliary data field is preceded by a field that
   specifies the length of the auxiliary data, so as to facilitate
   skipping of the data without parsing it.  The coding of the
   auxiliary data is not defined in this document; instead, the
   format, meaning and signaling of auxiliary information are expected
   to be specified in one or more future RFCs.  Auxiliary information
   MUST NOT be transmitted until its format, meaning and signaling
   have been specified and its use has been signaled.  Receivers that
   have knowledge of the auxiliary data MAY decode the auxiliary data,
   but receivers without knowledge of such data MUST skip the
   auxiliary data field.
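
   A receiver that does not understand the auxiliary data only needs
   its length to skip it.  The sketch below makes placeholder
   assumptions for the definitions in section 3.2.2: the length field
   is size_length bits wide, is stored MSB-first, and counts the
   auxiliary data in bits, and the whole Auxiliary Section is padded to
   an octet boundary.  size_length and skip_auxiliary_section are
   hypothetical names.  The function returns the octet offset at which
   the Access Unit Data Section begins.

```python
def skip_auxiliary_section(payload: bytes, start: int, size_length: int) -> int:
    """Return the octet offset just past the Auxiliary Section.

    start is the octet offset where the Auxiliary Section begins.  A
    size_length of 0 means the section is not configured (empty).
    """
    if size_length == 0:
        return start
    length_octets = (size_length + 7) // 8
    if start + length_octets > len(payload):
        raise ValueError("truncated auxiliary size field")
    # Read enough whole octets to cover the size field, then keep the
    # leading size_length bits as the auxiliary data length in bits.
    chunk = int.from_bytes(payload[start:start + length_octets], "big")
    aux_bits = chunk >> (length_octets * 8 - size_length)
    total_bits = size_length + aux_bits
    end = start + (total_bits + 7) // 8  # section padded to whole octets
    if end > len(payload):
        raise ValueError("auxiliary data runs past end of payload")
    return end
```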

2.10 MIME format parameters and configuring conditional fields

   To support the features described in the previous sections, several
   fields are defined for carriage in the RTP payload.  However, their
   use strongly depends on the type of MPEG-4 elementary stream that
   is carried.  Sometimes a specific field is needed with a certain
   length, while in other cases such a field is not needed at all.  To
   be efficient in either case, the fields to support these features
   are configurable by means of MIME format parameters.  In general, a
   MIME format parameter defines the presence and length of the
   associated field.  A length of zero indicates absence of the field.
   As a consequence, parsing of the payload requires knowledge of the
   MIME format parameters.  The MIME format parameters are conveyed to
   the receiver via SDP [5] messages, as specified in section 4.4.1, or
   through other means.
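
   The zero-length convention lets a receiver drive its parser entirely
   from the configured lengths.  A minimal sketch follows.  It assumes
   MSB-first bit packing and a hypothetical mapping from field names to
   bit lengths, read in the order given and obtained from the MIME
   format parameters.  The field names used with it are placeholders,
   not the parameter names registered in section 4.1.

```python
from typing import Dict, Optional


class BitReader:
    """Read MSB-first bit fields from a byte string."""

    def __init__(self, data: bytes) -> None:
        self.data = data
        self.pos = 0  # current position in bits

    def read(self, nbits: int) -> int:
        if self.pos + nbits > len(self.data) * 8:
            raise ValueError("read past end of data")
        value = 0
        for _ in range(nbits):
            byte = self.data[self.pos // 8]
            bit = (byte >> (7 - self.pos % 8)) & 1
            value = (value << 1) | bit
            self.pos += 1
        return value


def read_configured_fields(reader: BitReader,
                           lengths: Dict[str, int]) -> Dict[str, Optional[int]]:
    """Read each configured field in order; length 0 means the field is absent."""
    return {name: (reader.read(n) if n > 0 else None)
            for name, n in lengths.items()}
```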

2.11 Global structure of payload format

   The RTP payload following the RTP header contains three
   octet-aligned data sections, of which the first two MAY be empty.
   See figure 1.

          +---------+-----------+-----------+---------------+
          | RTP     | AU Header | Auxiliary | Access Unit   |
          | Header  | Section   | Section   | Data Section  |
          +---------+-----------+-----------+---------------+

                    <----------RTP Packet Payload----------->

   Figure 1: Data sections within an RTP packet

   The first data section is the AU (Access Unit) Header Section, which
   contains one or more AU-headers; however, each AU-header MAY be
   empty, in which case the entire AU Header Section is empty.  The
   second section is the Auxiliary Section, containing auxiliary data;
   this section MAY also be configured empty.  The third section is the
   Access Unit Data Section, containing either a single fragment of
   one Access Unit or one or more complete Access Units.  The Access
   Unit Data Section MUST NOT be empty.

2.12 Modes to transport MPEG-4 streams

   While it is possible to build fully configurable receivers capable
   of receiving any MPEG-4 stream, this specification also allows for
   the design of simplified but dedicated receivers that are capable,
   for example, of receiving only one type of MPEG-4 stream.  This is
   achieved by requiring that specific modes be defined for using this
   specification.  Each mode may define constraints for transport of
   one or more types of MPEG-4 streams, for instance on the payload
   configuration.

   The applied mode MUST be signaled.  Signaling the mode is
   particularly important for receivers that are only capable of
   decoding one or more specific modes.  Such receivers need to
   determine whether the applied mode is supported, so as to avoid
   problems with processing of payloads that are beyond the
   capabilities of the receiver.
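
   A dedicated receiver's capability check can be as small as the
   sketch below.  It assumes that the mode arrives as a "mode=..."
   entry in a semicolon-separated SDP a=fmtp parameter list (section
   4.4.1 defines the actual syntax).  It treats parameter names as
   case-insensitive, and it uses "generic" only as an example mode
   name.  The function names and the supported set are hypothetical.

```python
from typing import Dict, FrozenSet


def parse_fmtp_params(fmtp_line: str) -> Dict[str, str]:
    """Parse 'a=fmtp:<fmt> name=value; name=value' into a dict.

    Parameter names are lower-cased (treated as case-insensitive);
    values are kept exactly as given.
    """
    _, _, params = fmtp_line.partition(" ")
    result = {}
    for item in params.split(";"):
        name, sep, value = item.strip().partition("=")
        if sep:
            result[name.strip().lower()] = value.strip()
    return result


def mode_supported(fmtp_line: str, supported: FrozenSet[str]) -> bool:
    """True if the signaled mode is one this receiver can handle."""
    mode = parse_fmtp_params(fmtp_line).get("mode")
    if mode is None:
        # The applied mode MUST be signaled; treat its absence as unsupported.
        return False
    return mode in supported
```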

   In this document, several modes are defined for transport of MPEG-4
   CELP and AAC streams, as well as a generic mode that can be used
   for any MPEG-4 stream.  In the future, new RFCs may specify other
   modes of using this specification.  However, each mode MUST be in
   full compliance with this specification (see section 3.3.7).

2.13 Alignment with RFC 3016

   This payload can be configured to be nearly identical to the
   payload format defined in RFC 3016 [12] for the MPEG-4 video
   configurations recommended in RFC 3016.  Hence, receivers that
   comply with RFC 3016 can decode such an RTP payload, provided that
   additional packets containing video decoder configuration (VO,
   VOL, VOSH) are inserted in the stream, as required by RFC 3016.
   Conversely, receivers that comply with the specification in this
   document SHOULD be able to decode payloads, names and parameters
   defined for MPEG-4 video in RFC 3016.  In this respect, it is
   strongly RECOMMENDED to implement the ability to ignore "in band"
   video decoder configuration packets in the RFC 3016 payload.

   Note that the "out of band" availability of the video decoder
   configuration is optional in RFC 3016.  To achieve maximum
   interoperability with the RTP payload format defined in this
   document, applications that use RFC 3016 to transport MPEG-4 video
   (part 2) are recommended to make the video decoder configuration
   available as a MIME parameter.

3. Payload Format

3.1 Usage of RTP Header Fields and RTCP

   Payload Type (PT): The assignment of an RTP payload type for this
   packet format is outside the scope of this document; it is
   specified by the RTP profile under which this payload format is
   used, or signaled dynamically out-of-band (e.g., using SDP).

   Marker (M) bit: The M bit is set to 1 to indicate that the RTP
   packet payload contains either the final fragment of a fragmented
   Access Unit or one or more complete Access Units.
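
   Combined with the sequence number, the M bit lets a receiver
   reassemble fragmented AUs.  The simplified sketch below assumes that
   packets arrive in order as (sequence_number, marker, data) tuples,
   where data is the Access Unit Data Section.  It applies a
   conservative policy that is not a requirement of this
   specification: after any sequence number gap, the AU being
   assembled is discarded.  The name reassemble is hypothetical.

```python
from typing import Iterable, Iterator, List, Optional, Tuple

SEQ_MODULUS = 2 ** 16  # RTP sequence numbers are 16 bits and wrap around


def reassemble(packets: Iterable[Tuple[int, bool, bytes]]) -> Iterator[bytes]:
    """Yield AU Data Sections whose final packet (M bit set) was received.

    Fragments are concatenated until a packet with the M bit set
    arrives.  If a sequence number gap occurs, the AU being assembled
    is dropped, since its first or middle fragments may be missing.
    """
    parts: List[bytes] = []
    damaged = False                # current AU may have lost fragments
    expected: Optional[int] = None  # next sequence number expected
    for seq, marker, data in packets:
        if expected is not None and seq != expected:
            damaged = True
        expected = (seq + 1) % SEQ_MODULUS
        parts.append(data)
        if marker:
            if not damaged:
                yield b"".join(parts)
            parts, damaged = [], False
```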

   Extension (X) bit: Defined by the RTP profile used.

   Sequence Number: The RTP sequence number SHOULD be generated by the
   sender in the usual manner with a constant random offset.

   Timestamp: Indicates the sampling instant of the first AU
   contained in the RTP payload.  This sampling instant is equivalent
   to the CTS in the MPEG-4 time domain.  When using SDP, the clock
   rate of the RTP time stamp MUST be expressed using the "rtpmap"
   attribute.  If an MPEG-4 audio stream is transported, the rate
   SHOULD be set to the same value as the sampling rate of the audio
   stream.  If an MPEG-4 video stream is transported, it is RECOMMENDED
   to set the rate to 90 kHz.

506	   In all cases, the sender SHALL make sure that RTP time stamps
507	   are identical only if the RTP time stamp refers to fragments of the
508	   same Access Unit.

510	   According to RFC 1889 [2] (section 5.1), RTP time stamps are
511	   RECOMMENDED to start at a random value for security reasons.  This
512	   is not an issue for synchronization of multiple RTP streams.  When,
513	   however, streams from multiple sources are to be synchronized (for
514	   example one stream from local storage, another from an RTP streaming
515	   server), synchronization may become impossible if the receiver only
516	   knows the original time stamp relationships.  In such cases,
517	   synchronization may require the correct relationship between time
518	   stamps to be provided by out-of-band means.  The
519	   format of such information as well as methods to convey such
520	   information are beyond the scope of this specification.

522	   SSRC: set as described in RFC 1889 [2].

524	   CC and CSRC fields are used as described in RFC 1889 [2].

526	   RTCP SHOULD be used as defined in RFC 1889 [2].  Note that time
527	   stamps in RTCP Sender Reports may be used to synchronize multiple
528	   MPEG-4 elementary streams and also to synchronize MPEG-4 streams
529	   with non-MPEG-4 streams, in case the delivery of these streams uses
530	   RTP.

534	3.2 RTP Payload Structure

536	3.2.1 The AU Header Section

538	   When present, the AU Header Section consists of the
539	   AU-headers-length field, followed by a number of AU-headers.  See
540	   figure 2.

542	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+- .. -+-+-+-+-+-+-+-+-+-+
543	   |AU-headers-length|AU-header|AU-header|      |AU-header|padding|
544	   |                 |   (1)   |   (2)   |      |   (n)   | bits  |
545	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+- .. -+-+-+-+-+-+-+-+-+-+

547	   Figure 2: The AU Header Section

549	   The AU-headers are configured using MIME format parameters and MAY
550	   be empty.  If the AU-header is configured empty, the
551	   AU-headers-length field SHALL NOT be present and consequently the
552	   AU Header Section is empty.  If the AU-header is not configured
553	   empty, then the AU-headers-length is a two octet field that
554	   specifies the length in bits of the immediately following
555	   AU-headers, excluding the padding bits.

557	   Each AU-header is associated with a single Access Unit (fragment)
558	   contained in the Access Unit Data Section in the same RTP packet.
559	   For each contained Access Unit (fragment) there is exactly one
560	   AU-header.  Within the AU Header Section, the AU-headers are
561	   bit-wise concatenated in the order in which the Access Units are
562	   contained in the Access Unit Data Section.  Hence, the n-th
563	   AU-header refers to the n-th AU (fragment).  If the concatenated
564	   AU-headers consume a non-integer number of octets, up to 7
565	   zero-padding bits MUST be inserted at the end in order to achieve
566	   octet-alignment of the AU Header Section.
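
   As an illustration of the length and padding rules above, the
   following Python sketch computes how many octets a non-empty AU
   Header Section occupies, given the value of the AU-headers-length
   field.  The function name is hypothetical and not defined by this
   specification.

```python
from typing import Tuple


def au_header_section_size(au_headers_length_bits: int) -> Tuple[int, int]:
    """Return (total_octets, padding_bits) for a non-empty AU Header Section.

    au_headers_length_bits is the value carried in the 16-bit
    AU-headers-length field: the length in bits of the concatenated
    AU-headers, excluding the padding bits.
    """
    if not 0 <= au_headers_length_bits <= 0xFFFF:
        raise ValueError("AU-headers-length is a two-octet field")
    # Up to 7 zero bits restore octet alignment.
    padding_bits = -au_headers_length_bits % 8
    header_octets = (au_headers_length_bits + padding_bits) // 8
    # Two extra octets for the AU-headers-length field itself.
    return 2 + header_octets, padding_bits
```

   For example, three 13-bit AU-headers (AU-headers-length = 39) need
   one padding bit, so the section occupies 2 + 5 = 7 octets.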

568	3.2.1.1 The AU-header

570	   Each AU-header may contain the fields given in figure 3.  The length
571	   in bits of the fields, with the exception of the CTS-flag, the
572	   DTS-flag and the RAP-flag fields, is defined by MIME format
573	   parameters; see section 4.1.  If a MIME format parameter has the
574	   default value of zero, then the associated field is not present.
575	   The number of bits for fields that are present and that represent
576	   the value of a parameter MUST be chosen large enough to correctly
577	   encode the largest value of that parameter during the session.

579	   If present, the fields MUST occur in the mutual order given in
580	   figure 3.  In the general case a receiver can only discover the size
581	   of an AU-header by parsing it, since the presence of the CTS-delta
582	   and DTS-delta fields is signaled by the value of the CTS-flag and
583	   DTS-flag, respectively.

587	   +---------------------------------------+
588	   |     AU-size                           |
589	   +---------------------------------------+
590	   |     AU-Index / AU-Index-delta         |
591	   +---------------------------------------+
592	   |     CTS-flag                          |
593	   +---------------------------------------+
594	   |     CTS-delta                         |
595	   +---------------------------------------+
596	   |     DTS-flag                          |
597	   +---------------------------------------+
598	   |     DTS-delta                         |
599	   +---------------------------------------+
600	   |     RAP-flag                          |
601	   +---------------------------------------+
602	   |     Stream-state                      |
603	   +---------------------------------------+

605	   Figure 3: The fields in the AU-header.  If used, the AU-Index field
606	             only occurs in the first AU-header within an AU Header
607	             Section; in any other AU-header the AU-Index-delta field
608	             occurs instead.

610	   AU-size: Indicates the size in octets of the associated Access Unit
611	         in the Access Unit Data Section in the same RTP packet.  When
612	         the AU-size is associated with an AU fragment, the AU size
613	         indicates the size of the entire AU and not the size of the
614	         fragment.  In this case, the size of the fragment is known
615	         from the size of the AU data section.  This can be exploited
616	         to determine whether a packet contains an entire AU or a
617	         fragment, which is particularly useful after losing a packet
618	         carrying the last fragment of an AU.

620	   AU-Index: Indicates the serial number of the associated Access Unit
621	         (fragment).  For each (in decoding order) consecutive AU or AU
622	         fragment, the serial number is incremented by 1.  When
623	         present, the AU-Index field occurs in the first AU-header in
624	         the AU Header Section, but MUST NOT occur in any subsequent
625	         (non-first) AU-header in that Section.  To encode the serial
626	         number in any such non-first AU-header, the AU-Index-delta
627	         field is used.

629	   AU-Index-delta: The AU-Index-delta field is an unsigned integer
630	         that specifies the serial number of the associated AU as the
631	         difference with respect to the serial number of the previous
632	         Access Unit.  Hence, for the n-th (n>1) AU the serial number
633	         is found from:
634	         AU-Index(n) = AU-Index(n-1) + AU-Index-delta(n) + 1
635	         If the AU-Index field is present in the first AU-header in
639	         the AU Header Section, then the AU-Index-delta field MUST be
640	         present in any subsequent (non-first) AU-header.  When the
641	         AU-Index-delta is coded with the value 0, it indicates that
642	         the Access Units are consecutive in decoding order.  An
643	         AU-Index-delta value larger than 0 signals that interleaving
644	         is applied.
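
   The AU-Index-delta recurrence can be sketched in Python as follows.
   This is a minimal illustration, assuming the AU-Index and
   AU-Index-delta values have already been extracted from the
   AU-headers; the function names are hypothetical, and roll-over of
   the fixed-width AU-Index field is not handled.

```python
from typing import List


def au_serial_numbers(au_index: int, au_index_deltas: List[int]) -> List[int]:
    """Serial numbers of the AUs (fragments) in one RTP packet.

    au_index: AU-Index from the first AU-header.
    au_index_deltas: AU-Index-delta from the 2nd..n-th AU-headers.
    Roll-over of the fixed-width AU-Index field is ignored here.
    """
    serials = [au_index]
    for delta in au_index_deltas:
        # AU-Index(n) = AU-Index(n-1) + AU-Index-delta(n) + 1
        serials.append(serials[-1] + delta + 1)
    return serials


def uses_interleaving(au_index_deltas: List[int]) -> bool:
    # Within this packet: a delta of 0 means consecutive AUs,
    # any larger value signals interleaving.
    return any(delta > 0 for delta in au_index_deltas)
```

   For a packet carrying AU(0), AU(3) and AU(6) of an interleaved
   stream, the AU-Index-delta values are 2 and 2.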

646	   CTS-flag: Indicates whether the CTS-delta field is present.
647	         A value of 1 indicates that the field is present, a value
648	         of 0 that it is not present.
649	         The CTS-flag field MUST be present in each AU-header if the
650	         length of the CTS-delta field is signaled to be larger than
651	         zero.  In that case, the CTS-flag field MUST have the value 0
652	         in the first AU-header and MAY have the value 1 in all
653	         non-first AU-headers.  The CTS-flag field SHOULD be 0 for
654	         any non-first fragment of an Access Unit.

656	   CTS-delta: Encodes the CTS by specifying the value of CTS as a 2's
657	         complement offset (delta) from the time stamp in the RTP
658	         header of this RTP packet.  The CTS MUST use the same clock
659	         rate as the time stamp in the RTP header.

661	   DTS-flag: Indicates whether the DTS-delta field is present.  A value
662	         of 1 indicates that DTS-delta is present, a value of 0 that
663	         it is not present.
664	         The DTS-flag field MUST be present in each AU-header if the
665	         length of the DTS-delta field is signaled to be larger than
666	         zero.  The DTS-flag field MUST have the same value for all
667	         fragments of an Access Unit.

669	   DTS-delta: Specifies the value of the DTS as a 2's complement
670	         offset (delta) from the CTS.  The DTS MUST use the
671	         same clock rate as the time stamp in the RTP header.  The
672	         DTS-delta field MUST have the same value for all fragments of
673	         an Access Unit.

675	   RAP-flag: When set to 1, indicates that the associated Access Unit
676	         provides a random access point to the content of the stream.
677	         If an Access Unit is fragmented, the RAP-flag, if present,
678	         MUST be set to 0 for each non-first fragment of the AU.

680	   Stream-state:  Specifies the state of the stream for an AU of an
681	         MPEG-4 system stream; each state is identified by a value of
682	         a modulo counter.  In ISO/IEC 14496-1, MPEG-4 system streams
683	         use the AU_SequenceNumber to signal stream states.  When the
684	         stream state changes, the value of stream-state MUST be
685	         incremented by one.

687	         Note: no relation is required between stream-states of
688	         different streams.
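
   Because the presence of CTS-delta and DTS-delta depends on the
   flags, a receiver walks the AU Header Section field by field.  The
   Python sketch below shows one way to do this, assuming the field
   lengths have already been taken from the MIME format parameters of
   section 4.1.  The class and attribute names are hypothetical (the
   DTSDeltaLength parameter name is assumed by analogy with
   CTSDeltaLength), and the sketch assumes a non-empty AU-header
   configuration, so that the AU-headers-length field is present.

```python
from dataclasses import dataclass
from typing import List, Optional


class BitReader:
    """MSB-first bit reader over a bytes object."""

    def __init__(self, data: bytes) -> None:
        self.data = data
        self.pos = 0  # current position in bits

    def read(self, nbits: int) -> int:
        value = 0
        for _ in range(nbits):
            byte = self.data[self.pos // 8]
            value = (value << 1) | ((byte >> (7 - self.pos % 8)) & 1)
            self.pos += 1
        return value

    def read_signed(self, nbits: int) -> int:
        # Two's complement, as used by CTS-delta and DTS-delta.
        value = self.read(nbits)
        if value & (1 << (nbits - 1)):
            value -= 1 << nbits
        return value


@dataclass
class AUHeaderConfig:
    # Hypothetical mirror of the MIME format parameters; 0/False = absent.
    size_length: int = 0                      # sizeLength
    index_length: int = 0                     # indexLength
    index_delta_length: int = 0               # indexDeltaLength
    cts_delta_length: int = 0                 # CTSDeltaLength
    dts_delta_length: int = 0                 # DTSDeltaLength (assumed name)
    random_access_indication: bool = False    # randomAccessIndication
    stream_state_length: int = 0              # streamStateIndication


@dataclass
class AUHeader:
    au_size: Optional[int] = None
    index: Optional[int] = None   # AU-Index (first) or AU-Index-delta
    cts_delta: Optional[int] = None
    dts_delta: Optional[int] = None
    rap: Optional[bool] = None
    stream_state: Optional[int] = None


def parse_au_header_section(section: bytes, cfg: AUHeaderConfig) -> List[AUHeader]:
    reader = BitReader(section)
    end = 16 + reader.read(16)    # AU-headers-length, in bits
    headers: List[AUHeader] = []
    while reader.pos < end:       # stops before the padding bits
        start = reader.pos
        h = AUHeader()
        if cfg.size_length:
            h.au_size = reader.read(cfg.size_length)
        # AU-Index only in the first AU-header, AU-Index-delta afterwards.
        index_bits = cfg.index_length if not headers else cfg.index_delta_length
        if index_bits:
            h.index = reader.read(index_bits)
        if cfg.cts_delta_length and reader.read(1):     # CTS-flag
            h.cts_delta = reader.read_signed(cfg.cts_delta_length)
        if cfg.dts_delta_length and reader.read(1):     # DTS-flag
            h.dts_delta = reader.read_signed(cfg.dts_delta_length)
        if cfg.random_access_indication:
            h.rap = bool(reader.read(1))                 # RAP-flag
        if cfg.stream_state_length:
            h.stream_state = reader.read(cfg.stream_state_length)
        if reader.pos == start or reader.pos > end:
            raise ValueError("AU-header configuration does not match section")
        headers.append(h)
    return headers
```

   The sketch reads the fields in the mutual order of figure 3 and
   relies on AU-headers-length to know where the AU-headers end.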

692	3.2.2 The Auxiliary Section

694	   The Auxiliary Section consists of the auxiliary-data-size field
695	   followed by the auxiliary-data field.  Receivers MAY (but are not
696	   required to) parse the auxiliary-data field; to facilitate skipping
697	   of the auxiliary-data field by receivers, the auxiliary-data-size
698	   field indicates the length in bits of the auxiliary-data.  If the
699	   concatenation of the auxiliary-data-size and the auxiliary-data
700	   fields consumes a non-integer number of octets, up to 7 zero padding
701	   bits MUST be inserted immediately after the auxiliary data in order
702	   to achieve octet-alignment.  See figure 4.

704	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+- .. -+-+-+-+-+-+-+-+-+
705	   | auxiliary-data-size   | auxiliary-data       |padding bits |
706	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+- .. -+-+-+-+-+-+-+-+-+

708	   Figure 4: The fields in the Auxiliary Section

710	   The length in bits of the auxiliary-data-size field is configurable
711	   by a MIME format parameter; see section 4.1.  The default length of
712	   zero indicates that the entire Auxiliary Section is absent.

714	   auxiliary-data-size: specifies the length in bits of the immediately
715	         following auxiliary-data field;

717	   auxiliary-data: the auxiliary-data field contains data of a format
718	         not defined by this specification.
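
   A receiver that does not interpret auxiliary data only needs
   auxiliary-data-size to skip the section.  The Python sketch below
   illustrates this under the assumption that the caller already knows
   where the Auxiliary Section starts (directly after the AU Header
   Section) and the configured length of auxiliary-data-size; the
   function name is hypothetical.

```python
def skip_auxiliary_section(payload: bytes, offset: int,
                           aux_size_length: int) -> int:
    """Return the octet offset just past the Auxiliary Section.

    offset: octet offset at which the Auxiliary Section starts.
    aux_size_length: configured length in bits of auxiliary-data-size;
    0 means the whole section is absent.
    """
    if aux_size_length == 0:
        return offset
    nbytes = (aux_size_length + 7) // 8
    if offset + nbytes > len(payload):
        raise ValueError("payload too short for auxiliary-data-size")
    # auxiliary-data-size occupies the first aux_size_length bits, MSB first.
    chunk = int.from_bytes(payload[offset:offset + nbytes], "big")
    aux_data_bits = chunk >> (nbytes * 8 - aux_size_length)
    total_bits = aux_size_length + aux_data_bits
    # Round up to cover the 0..7 zero padding bits after auxiliary-data.
    return offset + (total_bits + 7) // 8
```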

720	3.2.3 The Access Unit Data Section

722	   The Access Unit Data Section contains an integer number of complete
723	   Access Units or a single fragment of one AU.  The Access Unit Data
724	   Section is never empty.  If data of more than one Access Unit is
725	   present, then the AUs are concatenated into a contiguous string
726	   of octets.  See figure 5.  The AUs inside the Access Unit Data
727	   Section MUST be in decoding order, though not necessarily contiguous
728	   in the case of interleaving.

730	   The size and number of Access Units SHOULD be adjusted such that
731	   the resulting RTP packet is not larger than the path MTU.  To handle
732	   larger packets, this payload format relies on lower layers for
733	   fragmentation, which may result in reduced performance.

737	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
738	   |AU(1)                                                          |
739	   +                                                               |
740	   |                                                               |
741	   |               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
742	   |               |AU(2)                                          |
743	   +-+-+-+-+-+-+-+-+                                               |
744	   |                                                               |
745	   |                               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
746	   |                               | AU(n)                         |
747	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
748	   |AU(n) continued|
749	   |-+-+-+-+-+-+-+-+

751	   Figure 5: Access Unit Data Section; each AU is octet-aligned.

753	   When multiple Access Units are carried, the size of each AU MUST be
754	   made available to the receiver.  If the AU size is variable then the
755	   size of each AU MUST be indicated in the AU-size field of the
756	   corresponding AU-header.  However, if the AU size is constant for a
757	   stream, this mechanism SHOULD NOT be used, but instead the fixed
758	   size SHOULD be signaled by the MIME format parameter
759	   "constantSize", see section 4.1.

761	   The absence of both AU-size in the AU-header and the constantSize
762	   MIME format parameter indicates carriage of a single AU (fragment),
763	   i.e. that a single Access Unit (fragment) is transported in each
764	   RTP packet for that stream.
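
   The rules above for locating AU boundaries can be sketched in Python
   as follows.  This is an illustration under stated assumptions: the
   AU-size values come from already parsed AU-headers, the
   constantSize branch assumes unfragmented AUs, and the function
   names are hypothetical.

```python
from typing import List, Optional, Sequence


def split_access_units(data: bytes,
                       au_sizes: Optional[Sequence[int]] = None,
                       constant_size: Optional[int] = None) -> List[bytes]:
    """Split an Access Unit Data Section into AUs (or one AU fragment)."""
    if au_sizes is not None and len(au_sizes) > 1:
        # Several complete AUs: one AU-size per AU-header, in order.
        aus, pos = [], 0
        for size in au_sizes:
            aus.append(data[pos:pos + size])
            pos += size
        if pos != len(data):
            raise ValueError("AU-size values do not add up to the section length")
        return aus
    if constant_size is not None and len(data) > constant_size:
        # Several complete fixed-size AUs signaled via constantSize.
        if len(data) % constant_size:
            raise ValueError("section is not a multiple of constantSize")
        return [data[i:i + constant_size]
                for i in range(0, len(data), constant_size)]
    # A single AU or fragment (one AU-header, or neither mechanism used).
    return [data]


def is_fragment(au_size: int, data_len: int) -> bool:
    # AU-size gives the size of the whole AU, so a value larger than the
    # data section length means the packet carries only a fragment.
    return au_size > data_len
```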

766	3.2.3.1 Fragmentation

768	   A packet SHALL carry either one or more complete Access Units, or
769	   a single fragment of an Access Unit.  Fragments of the same Access
770	   Unit have the same time stamp but different RTP sequence numbers.
771	   The marker bit in the RTP header is 1 on the last fragment of an
772	   Access Unit, and 0 on all other fragments.
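
   A receiver can rebuild fragmented Access Units from the time stamp
   and the marker bit.  The Python sketch below is a simplified,
   hypothetical illustration: it assumes packets arrive already sorted
   by RTP sequence number and that each packet carries one AU or one
   AU fragment, and it discards any AU for which a sequence-number gap
   is seen, rather than using AU-size to tell complete AUs from
   fragments.

```python
from typing import Iterable, List, Optional, Tuple

Packet = Tuple[int, int, bool, bytes]   # (seq, timestamp, marker, AU data)


def reassemble(packets: Iterable[Packet]) -> List[Tuple[int, bytes]]:
    """Rebuild AUs from packets that each carry one AU or one AU fragment."""
    out: List[Tuple[int, bytes]] = []
    parts: List[bytes] = []
    cur_ts: Optional[int] = None
    complete = True
    prev_seq: Optional[int] = None
    for seq, ts, marker, data in packets:
        # 16-bit RTP sequence numbers wrap around.
        gap = prev_seq is not None and seq != (prev_seq + 1) & 0xFFFF
        prev_seq = seq
        if ts != cur_ts:
            # New time stamp: a new AU starts; an unfinished AU is dropped.
            parts, cur_ts, complete = [], ts, not gap
        elif gap:
            complete = False          # a fragment of this AU was lost
        parts.append(data)
        if marker:                    # M bit: last fragment or complete AU
            if complete:
                out.append((ts, b"".join(parts)))
            parts, cur_ts = [], None
    return out
```

   A more refined receiver could compare AU-size with the length of the
   Access Unit Data Section (section 3.2.1.1) to keep a complete AU
   that directly follows a loss.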

774	3.2.3.2 Interleaving

776	   Unless prohibited by the signaled mode, a sender MAY interleave
777	   Access Units.  Receivers that are capable of receiving modes that
778	   support interleaving, MUST be able to decode interleaved Access
779	   Units.

781	   When a sender interleaves Access Units, it needs to provide
782	   sufficient information to enable a receiver to unambiguously
783	   reconstruct the original order, even in case of out-of-order
784	   packets, packet loss or duplication.  The information that senders
788	   need to provide depends on whether or not the Access Units have a
789	   constant time duration.  Access Units have a constant time duration,
790	   if:

792	      TS(i+1) - TS(i) = constant, for any i, where

794	            i indicates the index of the AU in original order
795	            TS(i) denotes the time stamp of AU(i)

797	   The MIME parameter "constantDuration" SHOULD be used to signal that
798	   Access Units have a constant time duration, see section 4.1.

800	   If the "constantDuration" parameter is present, the receiver can
801	   reconstruct the original Access Unit timing based solely on the RTP
802	   timestamp and AU-Index-delta.  Accordingly, when transmitting Access
803	   Units of constant duration, the AU-Index, if present, MUST be set
804	   to the value 0.  Receivers of constant duration Access Units MUST
805	   use the RTP timestamp to determine the index of the first AU in the
806	   RTP packet.  The AU-Index-delta header and the signaled
807	   "constantDuration" are used to reconstruct AU timing.

809	   If the "constantDuration" parameter is not present, then Access
810	   Units are assumed to have a variable duration, unless the AU-Index
811	   is present and coded with the value 0 in each RTP packet.  When
812	   transmitting Access Units of variable duration, then the
813	   "constantDuration" parameter MUST NOT be present, and the
814	   transmitter MUST use the AU-Index to encode the index information
815	   required for re-ordering, and the receiver MUST use that value to
816	   determine the index of each AU in the RTP packet.  The number of
817	   bits of the AU-Index field MUST be chosen so that valid index
818	   information is provided for the applied interleaving scheme, without
819	   causing problems due to roll-over of the AU-Index field.  In
820	   addition, the CTS-delta MUST be coded in the AU header for each
821	   non-first AU in the RTP packet, so that receivers can place the AUs
822	   correctly in time.

824	   When interleaving is applied, a de-interleave buffer is needed in
825	   receivers to put the Access Units in their correct logical
826	   consecutive decoding order.  This requires the computation of the
827	   time stamp for each Access Unit.  In case of a constant time duration
828	   per Access Unit, the time stamp of the i-th access unit in an RTP
829	   packet with RTP time stamp T is calculated as follows:

831	   Timestamp[0] = T
832	   Timestamp[i, i > 0] = T +(Sum(for k=1 to i of (AU-Index-delta[k]
833	                         + 1))) * access-unit-duration

837	   When AU-Index-delta is always 0, this reduces to T + i * (access-
838	   unit-duration).  This is the non-interleaved case, where the frames
839	   are consecutive in decoding order.  Note that the AU-Index field
840	   (present for the first Access Unit) is indeed not needed in this
841	   calculation.
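
   The time stamp formula above translates directly into code.  The
   Python sketch below assumes the AU-Index-delta values have already
   been parsed and that access-unit-duration is expressed in RTP clock
   ticks; the function name is hypothetical.

```python
from typing import List


def au_timestamps(rtp_ts: int, au_index_deltas: List[int],
                  au_duration: int) -> List[int]:
    """Time stamps of the AUs in one packet, constant-duration case.

    rtp_ts: RTP time stamp T of the packet (time stamp of the first AU).
    au_index_deltas: AU-Index-delta of the 2nd..n-th AU-headers.
    au_duration: access-unit-duration in RTP clock ticks.
    """
    stamps = [rtp_ts]
    offset = 0
    for delta in au_index_deltas:
        offset += delta + 1   # running Sum(AU-Index-delta[k] + 1)
        # RTP time stamps are 32-bit values and wrap around.
        stamps.append((rtp_ts + offset * au_duration) & 0xFFFFFFFF)
    return stamps
```

   For a packet with T = 1000 carrying AU(0), AU(3) and AU(6)
   (AU-Index-delta 2 and 2) and a duration of 160 ticks, this yields
   1000, 1480 and 1960.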

843	3.2.3.3 Constraints for interleaving

845	   Packet sizes should be chosen to suit both the path MTU and the
846	   capacity of the receiver's
847	   de-interleave buffer.  The maximum packet size for a session SHOULD
848	   be chosen not to exceed the path MTU.

850	   To allow receivers to allocate sufficient resources for
851	   de-interleaving, senders MUST provide the information to receivers
852	   as specified in this section.

854	   AUs enter the decoder in decoding order.  The de-interleave buffer
855	   is used to re-order a stream of interleaved AUs back into decoding
856	   order.  When interleaving is applied, the decoding of "early" AUs
857	   has to be postponed until all AUs that precede in decoding order
858	   are present.  Therefore these "early" AUs are stored in the
859	   de-interleave buffer.  As an example, figure 6 considers the
860	   interleaving pattern from section 2.5.

862	                             +--+--+--+--+--+--+--+--+--+--+--+-
863	   Interleaved AUs           | 0| 3| 6| 1| 4| 7| 2| 5| 8| 9|12|..
864	                             +--+--+--+--+--+--+--+--+--+--+--+-
865	   Storage of "early" AUs         3  3  3  3  3  3
866	                                     6  6  6  6  6  6
867	                                           4  4  4
868	                                              7  7  7
869	                                                            12 12

871	   Figure 6: Storage of "early" AUs in the de-interleave buffer per
872	             interleaved AU.

874	   AU(3) is to be delivered to the decoder after AU(0), AU(1) and
875	   AU(2); of these AUs, AU(2) arrives last, and hence AU(3) needs to
876	   be stored until AU(2) is present in the pattern.  Similarly, AU(6)
877	   is to be stored until AU(5) is present, while AU(4) and AU(7) are
878	   to be stored until AU(2) and AU(5) are present, respectively.  Note
879	   that the fullness of the de-interleave buffer varies in time.  In
880	   figure 6, the de-interleave buffer contains at most 4, but often
881	   fewer, AUs.

883	   So as to give a rough indication of the resources needed in the
884	   receiver for de-interleaving, the maximum displacement in time of
885	   an AU is defined.  For any AU in the pattern it can be verified
889	   which AUs are not yet present.  The maximum displacement in time of
890	   an AU is the maximum difference between the time stamp of an AU in
891	   the pattern and the time stamp of the earliest AU that is not yet
892	   present.  In other words, when considering a sequence of interleaved
893	   AUs, then:

895	   Maximum displacement = max{TS(i) - TS(j)}, for any i and any j>i,

897	            where i and j indicate the index of the AU in the
898	                  interleaving pattern and TS denotes the time stamp
899	                  of the AU

901	   As an example, figure 7 considers the interleaving pattern from
902	   section 2.5.  For each AU in the pattern the earliest not yet
903	   present AU is indicated.  A "-" indicates that all previous AUs
904	   are present.  If the AU period is constant, the maximum displacement
905	   equals 5 AU periods, as found for AU(6) and AU(7).

907	                                 +--+--+--+--+--+--+--+--+--+--+--+-
908	   Interleaved AUs               | 0| 3| 6| 1| 4| 7| 2| 5| 8| 9|12|..
909	                                 +--+--+--+--+--+--+--+--+--+--+--+-

911	   Earliest not yet present AU     -  1  1  -  2  2  -  -  -  - 10

913	   Figure 7: The earliest not yet present AU for each AU in the
914	             interleaving pattern.

916	   When interleaving, senders MUST signal the maximum displacement
917	   in time during the session via the MIME format parameter
918	   "maxDisplacement"; see section 4.1.
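
   The definition of the maximum displacement can be evaluated in a
   single backward pass over the interleaving pattern by tracking the
   earliest time stamp among the AUs sent later.  In the Python sketch
   below, the function name is hypothetical, time stamps are assumed to
   be unwrapped, and 0 is returned for a non-interleaved sequence.

```python
from typing import Optional, Sequence


def max_displacement(timestamps: Sequence[int]) -> int:
    """max{TS(i) - TS(j)} over all i and j > i (0 if no AU is displaced).

    timestamps: TS of each AU in interleaving (transmission) order,
    assumed already unwrapped (no 32-bit roll-over).
    """
    best = 0
    earliest_later: Optional[int] = None   # min TS of AUs sent after i
    for ts in reversed(timestamps):
        if earliest_later is not None:
            best = max(best, ts - earliest_later)
            earliest_later = min(earliest_later, ts)
        else:
            earliest_later = ts
    return best
```

   Applied to the pattern of figure 7, assumed for illustration to
   continue with the same period (9, 12, 15, 10, 13, 16, 11, 14, 17),
   with an AU period of 160 ticks, the result is 800 ticks, i.e. the
   5 AU periods noted above.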

920	   An estimate of the size of the de-interleave buffer is found by
921	   multiplying the maximum displacement by the maximum bit rate:

923	   size(de-interleave buffer) = {(maxDisplacement) * Rate(max)} / (RTP
924	                                clock frequency),

926	   where Rate(max) is the maximum bit-rate of the transported stream.

928	   Note that receivers can derive Rate(max) from the MIME format
929	   parameters streamType, profile-level-id, and config.
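
   As a worked example of this estimate, the Python sketch below
   evaluates it with integer arithmetic.  It assumes maxDisplacement is
   expressed in RTP clock ticks and Rate(max) in bits per second, so
   that the formula yields bits; the rounding up to whole octets and
   the function name are additions for illustration.

```python
def estimate_deinterleave_buffer_octets(max_displacement_ticks: int,
                                        max_bitrate_bps: int,
                                        rtp_clock_hz: int) -> int:
    """(maxDisplacement * Rate(max)) / (RTP clock frequency), in octets.

    Exact integer arithmetic; the bit count is rounded up to whole octets.
    """
    numerator = max_displacement_ticks * max_bitrate_bps
    return -(-numerator // (rtp_clock_hz * 8))   # ceiling division
```

   For instance, with hypothetical values of maxDisplacement = 4800
   ticks at a 48 kHz RTP clock (0.1 s) and Rate(max) = 64 kb/s, the
   estimate is 6400 bits, i.e. 800 octets.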

931	   However, this calculation estimates the size of the de-interleave
932	   buffer and the actually required size may differ from the calculated
933	   value.  If this calculation under-estimates the size of the
934	   de-interleave buffer, then senders, when interleaving, MUST signal
935	   a size of the de-interleave buffer via the MIME format parameter
936	   "de-interleaveBufferSize"; see section 4.1.  If the calculation
940	   over-estimates the size of the de-interleave buffer, then senders,
941	   when interleaving, MAY signal a size of the de-interleave buffer
942	   via the MIME format parameter "de-interleaveBufferSize".

944	   The signaled size of the de-interleave buffer MUST be large enough
945	   to contain all "early" AUs at any point in time during the session,
946	   that is:

948	   minimum de-interleave buffer size = max [sum {if TS(i) > TS(j) then
949	                                       AU-size(i) else 0}] for any j
950	                                                    and any i<j, where

952	              i and j indicate the index of an AU in the interleaving
953	                      pattern,
954	              TS(i) denotes the time stamp of AU(i), and
955	              AU-size(i) denotes the size of AU(i) in number of octets.

957	   If the "de-interleaveBufferSize" parameter is present, then the
958	   applied buffer for de-interleaving in a receiver MUST have a size
959	   that is at least equal to the signaled size of the de-interleave
960	   buffer, else a size that is at least equal to the calculated size
961	   of the de-interleave buffer.
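
   The minimum de-interleave buffer size formula can be checked with a
   direct (quadratic) evaluation.  The Python sketch below is
   illustrative only; the function name and the example AU sizes are
   hypothetical.

```python
from typing import Sequence


def min_deinterleave_buffer(timestamps: Sequence[int],
                            sizes: Sequence[int]) -> int:
    """max over j of the summed AU-size of AUs i < j with TS(i) > TS(j).

    Both sequences are in interleaving (transmission) order; sizes in octets.
    """
    best = 0
    for j, ts_j in enumerate(timestamps):
        # "Early" AUs: sent before AU j but later in decoding order.
        early = sum(sizes[i] for i in range(j) if timestamps[i] > ts_j)
        best = max(best, early)
    return best
```

   With the interleaving pattern of figure 6 and every AU 100 octets
   long, the result is 400 octets, matching the four "early" AUs (3,
   6, 4 and 7) stored when AU(2) arrives.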

963	   No matter what interleaving scheme is used, the scheme must be
964	   analyzed to calculate the applicable maxDisplacement value, as well
965	   as the required size of the de-interleave buffer.  Senders SHOULD
966	   signal values that are not larger than the strictly required
967	   values; if larger values are signaled, the receiver will buffer
968	   excessively.

970	   Note that for low bit-rate material, the applied interleaving
971	   may make packets shorter than the MTU size.

973	3.2.3.4 Crucial and non-crucial AUs with MPEG-4 System data

975	   Some Access Units with MPEG-4 system data, called "crucial" AUs,
976	   carry information whose loss cannot be tolerated, either in the
977	   presentation or in the decoder.  At each crucial AU in an MPEG-4
978	   system stream, the stream state changes.  The stream-state MAY
979	   remain constant at non-crucial AUs.  In ISO/IEC 14496-1, MPEG-4
980	   system streams use the AU_SequenceNumber to signal stream states.

982	   Example: Given three AUs, AU1 = "Insertion of node X", AU2 = "Set
983	   position of node X", AU3 = "Set position of node X".  AU1 is crucial,
984	   since if it is lost, AU2 cannot be executed.  However, AU2 is not
985	   crucial, since AU3 can be executed even if AU2 is lost.

987	   When a crucial AU is (possibly) lost, the stream is corrupted.  For
988	   example, when an AU is lost and the stream state has changed at the
989	   next received AU, then it is possible that the lost AU was crucial.
990	   Once corrupted, the stream remains corrupted until the next random
991	   access point.  Note that loss of non-crucial AUs does not corrupt the
995	   stream.  When a decoder starts receiving a stream, the decoder MUST
996	   consider the stream corrupted until an AU is received that provides
997	   a random access point.

999	   An AU that provides a random access point, as signaled by the
1000	   RAP-flag, may be crucial or not.  Non-crucial RAP AUs provide a
1001	   "repeated" random access point for use by decoders that recently
1002	   joined the stream or that need to re-start decoding after a stream
1003	   corruption.  Non-crucial RAP AUs MUST include all updates since the
1004	   last crucial RAP AU.

1006	   Upon receiving AUs, decoders are to react as follows:
1007	   a) if the RAP-flag is set to 1 and the stream-state changes, then
1008	      the AU is a crucial RAP AU, and the AU MUST be decoded.
1009	   b) if the RAP-flag is set to 1 and the stream state does not change,
1010	      then the AU is a non-crucial RAP AU, and the receiver SHOULD
1011	      decode it if the stream is corrupted.  Otherwise, the decoder MUST
1012	      ignore the AU.
1013	   c) if the RAP-flag is set to 0, then the AU MUST be decoded, unless
1014	      the stream is corrupted, in which case the AU MUST be ignored.
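
   Rules a) to c), together with the corruption rules above, amount to
   a small state machine per MPEG-4 system stream.  The Python sketch
   below is one possible reading, assuming the receiver passes in the
   RAP-flag, the Stream-state value, and whether a packet loss was
   detected just before the AU; the class and method names are
   hypothetical, and the SHOULD in rule b) is treated as "decode".

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CrucialAUFilter:
    # A decoder that starts receiving a stream treats it as corrupted.
    corrupted: bool = True
    last_state: Optional[int] = None

    def should_decode(self, rap_flag: bool, stream_state: int,
                      loss_before: bool = False) -> bool:
        state_changed = stream_state != self.last_state
        self.last_state = stream_state
        if loss_before and state_changed:
            self.corrupted = True      # the lost AU may have been crucial
        if rap_flag:
            if state_changed or self.corrupted:
                # a) crucial RAP AU, or b) non-crucial RAP AU while corrupted
                self.corrupted = False
                return True
            return False               # b) non-crucial RAP AU, stream intact
        return not self.corrupted      # c) RAP-flag set to 0
```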

1018	3.3 Usage of this specification

1020	3.3.1 General

1022	   Usage of this specification requires definition of a mode.  A mode
1023	   defines how to use this specification, as deemed appropriate.
1024	   Senders MUST signal the applied mode via the MIME format parameter
1025	   "mode", as specified in section 4.1.  This specification defines a
1026	   generic mode that can be used for any MPEG-4 stream, as well as
1027	   specific modes for transport of MPEG-4 CELP and MPEG-4 AAC streams,
1028	   defined in ISO/IEC 14496-3.

1030	   When use of this payload format is signaled using SDP [5], an
1031	   "rtpmap" attribute is part of that signaling.  The same requirements
1032	   apply for the rtpmap attribute in any mode compliant to this
1033	   specification.  The general form of an rtpmap attribute is:
1034	   a=rtpmap:<payload type> <encoding name>/<clock rate>[/<encoding
1035	             parameters>]
1036	   For audio streams, <encoding parameters> specifies the number of
1037	   audio channels: 2 for stereo material (see RFC 2327 [5]) and 1 for
1038	   mono.  Provided no additional parameters are needed, this parameter
1039	   may be omitted for mono material, since its default value is 1.

1041	3.3.2 The generic mode

1043	   The generic mode can be used for any MPEG-4 stream.  In this mode
1044	   no mode-specific constraints are applied; hence, in the generic
1045	   mode the full flexibility of this specification can be exploited.
1046	   The generic mode is signaled by mode=generic.

1048	   An example is given below for transport of a BIFS-Anim stream.  In
1049	   this example carriage of multiple BIFS-Anim Access Units is allowed
1050	   in one RTP packet.  The AU-header contains the AU-size field, the
1051	   CTS-flag and, if the CTS flag is set to 1, the CTS-delta field.  The
1052	   number of bits of the AU-size and the CTS-delta fields is 10 and
1053	   16, respectively.  The AU-header also contains the RAP-flag and the
1054	   Stream-state of 4 bits.  This results in an AU-header with a
1055	   total size of two or four octets per BIFS-Anim AU.  The RTP time
1056	   stamp uses a 1 kHz clock.  Note that the media type name is video,
1057	   because the BIFS-Anim stream is part of an audio-visual
1058	   presentation.  For conventions on media type names see section 4.1.

1060	   In detail:
1061	   m=video 49230 RTP/AVP 96
1062	   a=rtpmap:96 mpeg4-generic/1000
1063	   a=fmtp:96 streamtype=3; profile-level-id=1807; mode=generic;
1064	   objectType=2; config=0842237F24001FB400094002C0; sizeLength=10;
1065	   CTSDeltaLength=16; randomAccessIndication=1;
1066	   streamStateIndication=4

1068	   Note: The a=fmtp line has been wrapped to fit the page; it comprises
1069	         a single line in the SDP file.

1073	   The hexadecimal value of the "config" parameter is the
1074	   BIFSConfiguration() as defined in ISO/IEC 14496-1.  The
1075	   BIFSConfiguration() specifies that the BIFS stream is a BIFS-Anim
1076	   stream.  For the description of MIME parameters see section 4.1.
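
   The AU-header sizes quoted for this BIFS-Anim example follow from
   the configured field lengths.  The Python sketch below, limited to
   the fields used in this example (no AU-Index or DTS fields), makes
   the arithmetic explicit; the function name is hypothetical.

```python
def au_header_bits(size_length: int, cts_delta_length: int,
                   random_access_indication: bool,
                   stream_state_length: int, cts_flag: bool) -> int:
    """Length in bits of one AU-header with only this example's fields."""
    bits = size_length                      # AU-size
    if cts_delta_length:
        bits += 1                           # CTS-flag
        if cts_flag:
            bits += cts_delta_length        # CTS-delta
    if random_access_indication:
        bits += 1                           # RAP-flag
    return bits + stream_state_length       # Stream-state
```

   With sizeLength=10, CTSDeltaLength=16, randomAccessIndication=1 and
   streamStateIndication=4, this gives 10 + 1 + 1 + 4 = 16 bits
   without CTS-delta and 32 bits with it, i.e. two or four octets.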

1078	3.3.3 Constant bit-rate CELP

1080	   This mode is signaled by mode=CELP-cbr.  In this mode one or more
1081	   complete CELP frames of fixed size can be transported in one RTP
1082	   packet; interleaving MUST NOT be used with this mode.  The RTP
1083	   payload consists of one or more concatenated CELP frames, each of
1084	   the same size.  CELP frames MUST NOT be fragmented when using this
1085	   mode.  Both the AU Header Section and the Auxiliary Section MUST be
1086	   empty.

1088	   The MIME format parameter constantSize MUST be provided to specify
1089	   the length of each CELP frame.

1091	   For example:

1093	   m=audio 49230 RTP/AVP 96
1094	   a=rtpmap:96 mpeg4-generic/16000/1
1095	   a=fmtp:96 streamtype=5; profile-level-id=14; mode=CELP-cbr; config=
1096	   440E00; constantSize=27; constantDuration=240

1098	   Note: The a=fmtp line has been wrapped to fit the page; it comprises
1099	         a single line in the SDP file.

1101	   The hexadecimal value of the "config" parameter is the
1102	   AudioSpecificConfig() as defined in ISO/IEC 14496-3.
1103	   AudioSpecificConfig() specifies a mono CELP stream with a sampling
1104	   rate of 16 kHz at a fixed bitrate of 14.4 kb/s and 6 sub-frames per
1105	   CELP frame.  For the description of MIME parameters see section 4.1.
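
   The parameters of this example are mutually consistent, as the
   following Python arithmetic shows.  It assumes constantDuration is
   expressed in RTP time stamp units (here a 16 kHz clock); the
   function names are hypothetical.

```python
def cbr_bitrate_bps(constant_size: int, constant_duration: int,
                    clock_hz: int) -> float:
    """Bit rate implied by constantSize (octets per frame) and
    constantDuration (RTP ticks per frame)."""
    return constant_size * 8 * clock_hz / constant_duration


def frame_duration_ms(constant_duration: int, clock_hz: int) -> float:
    """Duration of one frame in milliseconds."""
    return 1000 * constant_duration / clock_hz
```

   27 octets every 240 ticks, i.e. every 15 ms, amounts to 216 bits per
   15 ms, or 14.4 kb/s, which matches the fixed bitrate stated above.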

1107	3.3.4 Variable bit-rate CELP

1109	   This mode is signaled by mode=CELP-vbr.  With this mode one or more
1110	   complete CELP frames of variable size can be transported in one RTP
1111	   packet with OPTIONAL interleaving.  CELP frames are very small;
1112	   since the largest possible AU-size in this mode is greater than the
1113	   maximum CELP frame size, fragmentation of CELP frames is not
1114	   supported.  Hence CELP frames MUST NOT be fragmented when using
1115	   this mode.

1117	   In this mode the RTP payload consists of the AU Header Section,
1118	   followed by one or more concatenated CELP frames.  The Auxiliary
1119	   Section MUST be empty.  For each CELP frame contained in the payload
1120	   there MUST be a one octet AU-header in the AU Header Section to
1121	   provide:
1122	   (a) the size of each CELP frame in the payload and
1123	   (b) index information for computing the sequence (and hence timing)
1124	       of each CELP frame.

1126	RFC xxxx        Transport of MPEG-4 Elementary Streams      August 2003

1128	   Transport of CELP frames requires that the AU-size field is coded
1129	   with 6 bits.  In this mode therefore 6 bits are allocated to the
1130	   AU-size field, and 2 bits to the AU-Index(-delta) field.  Each
1131	   AU-Index field MUST be coded with the value 0.  In the AU Header
1132	   Section, the concatenated AU-headers are preceded by the 16-bit
1133	   AU-headers-length field, as specified in section 3.2.1.
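   The one-octet AU-header layout can be sketched in Python as follows.
   This is an illustration only: the helper name is hypothetical, and it
   assumes (per section 3.2.1, not reproduced here) that
   AU-headers-length gives the length of the AU-headers in bits, and
   that every AU after the first carries an AU-Index-delta, which is 0
   for consecutive, non-interleaved frames.

```python
from typing import Optional


def build_celp_vbr_au_header_section(
        frame_sizes: list[int],
        index_deltas: Optional[list[int]] = None) -> bytes:
    """Build the AU Header Section for mode=CELP-vbr (hypothetical helper).

    Layout: 16-bit AU-headers-length, then one octet per CELP frame made
    of a 6-bit AU-size and a 2-bit AU-Index (first frame, always 0) or
    AU-Index-delta (every later frame).
    """
    if not frame_sizes:
        raise ValueError("at least one CELP frame is required")
    if index_deltas is None:
        index_deltas = [0] * (len(frame_sizes) - 1)  # consecutive frames
    if len(index_deltas) != len(frame_sizes) - 1:
        raise ValueError("need one AU-Index-delta per non-first frame")
    headers = bytearray()
    for size, index in zip(frame_sizes, [0] + index_deltas):
        if not 0 <= size <= 0x3F:
            raise ValueError("frame size does not fit the 6-bit AU-size")
        if not 0 <= index <= 0x3:
            raise ValueError("index does not fit in 2 bits")
        headers.append((size << 2) | index)
    # Assumption: AU-headers-length counts the bits of the AU-headers.
    return (8 * len(headers)).to_bytes(2, "big") + bytes(headers)
```

   Since each AU-header is exactly one octet, no padding is needed to
   octet-align the AU Header Section in this mode.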

1135	   In addition to the required MIME format parameters, the following
1136	   parameters MUST be present: sizeLength, indexLength, and
1137	   indexDeltaLength.  CELP frames have fixed time duration per Access
1138	   Unit; when interleaving in this mode, the applicable duration MUST
1139	   be signaled by the MIME format parameter constantDuration.  In
1140	   addition, the parameter maxDisplacement MUST be present when
1141	   interleaving.
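   A sender could check its CELP-vbr parameter set against these rules
   before announcing it. The Python sketch below is a hypothetical
   helper, not part of this specification. It lower-cases parameter
   names because MIME format parameter names are not case dependent
   (section 4.1), checks the 6-bit and 2-bit field lengths that this
   mode uses, and also applies the section 4.1 rule that constantSize
   and sizeLength are never both present.

```python
# Field lengths fixed by this mode: 6-bit AU-size, 2-bit AU-Index(-delta).
CELP_VBR_FIXED = {"sizelength": 6, "indexlength": 2, "indexdeltalength": 2}


def check_celp_vbr_params(params: dict[str, str],
                          interleaved: bool) -> list[str]:
    """Return problems found in the fmtp parameters for mode=CELP-vbr."""
    # MIME format parameter names are not case dependent.
    p = {name.lower(): value.strip() for name, value in params.items()}
    problems = []
    for name, expected in CELP_VBR_FIXED.items():
        if name not in p:
            problems.append(f"missing {name}")
        elif p[name] != str(expected):
            problems.append(f"{name} should be {expected}")
    if "constantsize" in p and "sizelength" in p:
        problems.append("constantSize and sizeLength must not both be present")
    if interleaved:
        for name in ("constantduration", "maxdisplacement"):
            if name not in p:
                problems.append(f"missing {name} (required when interleaving)")
    return problems
```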

1143	   For example:

1145	   m=audio 49230 RTP/AVP 96
1146	   a=rtpmap:96 mpeg4-generic/16000/1
1147	   a=fmtp:96 streamtype=5; profile-level-id=14; mode=CELP-vbr; config=
1148	   440F20; sizeLength=6; indexLength=2; indexDeltaLength=2;
1149	   constantDuration=160; maxDisplacement=5

1151	   Note: The a=fmtp line has been wrapped to fit the page; it comprises
1152	         a single line in the SDP file.

1154	   The hexadecimal value of the "config" parameter is the
1155	   AudioSpecificConfig() as defined in ISO/IEC 14496-3.
1156	   AudioSpecificConfig() specifies a mono CELP stream with a sampling
1157	   rate of 16 kHz at a bitrate that varies between 13.9 and 16.2 kb/s
1158	   and with 4 sub-frames per CELP frame.  For the description of MIME
1159	   parameters see section 4.1.

1161	3.3.5 Low bit-rate AAC

1163	   This mode is signaled by mode=AAC-lbr.  This mode supports transport
1164	   of one or more complete AAC frames of variable size.  In this mode
1165	   the AAC frames are allowed to be interleaved and hence receivers
1166	   MUST support de-interleaving.  The maximum size of an AAC frame in
1167	   this mode is 63 octets.  AAC frames MUST NOT be fragmented when
1168	   using this mode.  Hence, when using this mode, encoders MUST ensure
1169	   that the size of each AAC frame is at most 63 octets.

1171	   The payload configuration in this mode is the same as in the
1172	   variable bit-rate CELP mode defined in section 3.3.4.  The RTP
1173	   payload consists of the AU Header Section, followed by concatenated
1174	   AAC frames.  The Auxiliary Section MUST be empty.  For each AAC frame
1175	   in the payload there MUST be a one-octet AU-header that provides:
1176	   (a) the size of each AAC frame in the payload and
1177	   (b) index information for computing the sequence (and hence timing)
1178	       of each AAC frame.
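   The index information lets a receiver place each frame in time even
   when frames are interleaved. The Python sketch below assumes the
   AU-header semantics of section 3.2.1 (not reproduced here): the RTP
   time stamp applies to the first AU, whose AU-Index is 0, and for each
   later AU, index(n) = index(n-1) + AU-Index-delta(n) + 1. Each AU is
   then placed constantDuration ticks per index step after the first.
   The helper name is hypothetical.

```python
def au_timestamps(rtp_timestamp: int, index_deltas: list[int],
                  constant_duration: int) -> list[int]:
    """Return the RTP-clock time stamp of every AU in one packet.

    ASSUMPTION (section 3.2.1 semantics): the RTP time stamp belongs to
    the first AU, whose AU-Index is 0, and each later AU n has
    index(n) = index(n-1) + AU-Index-delta(n) + 1.
    """
    stamps = [rtp_timestamp]
    index = 0
    for delta in index_deltas:
        index += delta + 1
        # RTP time stamps are 32-bit values that wrap around.
        stamps.append((rtp_timestamp + index * constant_duration) % 2**32)
    return stamps
```

   With the example constantDuration of 1024, deltas of 0 give
   consecutive frames 1024 ticks apart, while deltas of 2 give every
   third frame, the pattern produced when interleaving frames across
   three packets.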

1182	   In the AU Header Section, the concatenated AU-headers MUST be
1183	   preceded by the 16-bit AU-headers-length field, as specified in
1184	   section 3.2.1.

1186	   In addition to the required MIME format parameters, the following
1187	   parameters MUST be present: sizeLength, indexLength, and
1188	   indexDeltaLength.  AAC frames have fixed time duration per Access
1189	   Unit; when interleaving in this mode, the applicable duration MUST
1190	   be signaled by the MIME format parameter constantDuration.  In
1191	   addition, the parameter maxDisplacement MUST be present when
1192	   interleaving.

1194	   For example:

1196	   m=audio 49230 RTP/AVP 96
1197	   a=rtpmap:96 mpeg4-generic/22050/1
1198	   a=fmtp:96 streamtype=5; profile-level-id=14; mode=AAC-lbr; config=
1199	   1388; sizeLength=6; indexLength=2; indexDeltaLength=2;
1200	   constantDuration=1024; maxDisplacement=5

1202	   Note: The a=fmtp line has been wrapped to fit the page; it comprises
1203	         a single line in the SDP file.

1205	   The hexadecimal value of the "config" parameter is the
1206	   AudioSpecificConfig() as defined in ISO/IEC 14496-3.
1207	   AudioSpecificConfig() specifies a mono AAC stream with a sampling
1208	   rate of 22.05 kHz.  For the description of MIME parameters see
1209	   section 4.1.
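   The 63-octet limit bounds the bit rate this mode can carry. In the
   example, each frame covers 1024 samples of the 22.05 kHz clock (about
   46.4 ms), so at most 63 x 8 = 504 bits are sent per 46.4 ms, roughly
   10.85 kb/s. A short Python check of that arithmetic, with a
   hypothetical helper name:

```python
def max_bitrate(max_au_octets: int, constant_duration: int,
                clock_rate: int) -> float:
    """Upper bound on bit rate (b/s) when no AU exceeds max_au_octets.

    constant_duration is in RTP clock ticks per AU; clock_rate is the
    RTP clock rate from a=rtpmap.
    """
    return max_au_octets * 8 * clock_rate / constant_duration


# AAC-lbr example: 63-octet cap, 1024-sample frames, 22050 Hz RTP clock.
print(max_bitrate(63, 1024, 22050))  # prints 10852.734375
```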

1211	3.3.6 High bit-rate AAC

1213	   This mode is signaled by mode=AAC-hbr.  This mode supports transport
1214	   of variable size AAC frames.  In one RTP packet either one or more
1215	   complete AAC frames are carried, or a single fragment of an AAC
1216	   frame.  In this mode the AAC frames are allowed to be interleaved
1217	   and hence receivers MUST support de-interleaving.  The maximum size
1218	   of an AAC frame in this mode is 8191 octets.

1220	   In this mode the RTP payload consists of the AU Header Section,
1221	   followed by either one AAC frame, several concatenated AAC frames
1222	   or one fragmented AAC frame.  The Auxiliary Section MUST be empty.
1223	   For each AAC frame contained in the payload there MUST be an
1224	   AU-header in the AU Header Section to provide:
1225	   (a) the size of each AAC frame in the payload and
1226	   (b) index information for computing the sequence (and hence timing)
1227	       of each AAC frame.
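   The choice between packing complete frames and sending a single
   fragment can be sketched as a packetization plan. This Python helper
   is hypothetical and deliberately simplified: it assumes a fixed
   payload budget in octets (RTP header excluded), 2 octets for
   AU-headers-length, and one 2-octet AU-header per frame or fragment
   (the 13-bit AU-size plus 3-bit index layout of this mode). It only
   decides which slices of which frames go into each payload and does
   not build the headers.

```python
def plan_aac_hbr_packets(frame_sizes: list[int],
                         max_payload: int) -> list[list[tuple[int, int, int]]]:
    """Plan AAC-hbr RTP payloads (hypothetical helper, not from this spec).

    Each planned payload is a list of (frame_number, offset, length)
    slices: either one or more complete frames, or a single fragment of
    one frame. Assumed overhead per payload: 2 octets of
    AU-headers-length plus a 2-octet AU-header per frame or fragment.
    """
    if max_payload <= 4:
        raise ValueError("payload budget too small for any AAC data")
    packets, current, used = [], [], 2
    for n, size in enumerate(frame_sizes):
        if size > 8191:
            raise ValueError("AAC frame exceeds 8191 octets")
        if used + 2 + size <= max_payload:   # append as a complete frame
            current.append((n, 0, size))
            used += 2 + size
            continue
        if current:                          # close the current payload
            packets.append(current)
            current, used = [], 2
        if 4 + size <= max_payload:          # complete frame, new payload
            current.append((n, 0, size))
            used += 2 + size
        else:                                # too big: one fragment each
            room = max_payload - 4
            for offset in range(0, size, room):
                packets.append([(n, offset, min(room, size - offset))])
    if current:
        packets.append(current)
    return packets
```

   Closing the current payload before fragmenting keeps complete frames
   and fragments in separate packets, as this mode requires.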

1229	   Coding the maximum size of an AAC frame requires 13 bits.  Therefore
1230	   in this configuration 13 bits are allocated to the AU-size, and
1231	   3 bits to the AU-Index(-delta) field.  Thus each AU-header has a size
1232	   of 2 octets.  Each AU-Index field MUST be coded with the value 0.  In
1233	   the AU Header Section, the concatenated AU-headers MUST be preceded
1237	   by the 16-bit AU-headers-length field, as specified in
1238	   section 3.2.1.
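   A receiver-side sketch of splitting an AAC-hbr payload that carries
   complete frames follows, in Python. It is illustrative only: the
   function name is hypothetical, it assumes AU-headers-length counts
   bits (section 3.2.1), and it returns the raw 3-bit index field with
   each frame rather than resolving interleaving. Its bounds checks
   reflect the concern in section 5 that malformed packets must not
   overrun receiver buffers.

```python
def parse_aac_hbr_payload(payload: bytes) -> list[tuple[int, bytes]]:
    """Split an AAC-hbr payload carrying complete AAC frames (sketch).

    Assumes sizeLength=13 and indexLength=indexDeltaLength=3, so each
    AU-header is 16 bits. Returns (AU-Index or AU-Index-delta, frame)
    pairs. A payload holding one fragment would fail the size check.
    """
    if len(payload) < 2:
        raise ValueError("payload too short for AU-headers-length")
    headers_bits = int.from_bytes(payload[0:2], "big")
    if headers_bits % 16 != 0:
        raise ValueError("AU-headers-length is not a multiple of 16 bits")
    count = headers_bits // 16
    offset = 2 + 2 * count  # AAC data starts after the AU-headers
    if offset > len(payload):
        raise ValueError("AU Header Section runs past end of payload")
    frames = []
    for i in range(count):
        header = int.from_bytes(payload[2 + 2 * i:4 + 2 * i], "big")
        size, index = header >> 3, header & 0x7  # 13-bit size, 3-bit index
        if offset + size > len(payload):
            raise ValueError("AU-size exceeds remaining payload")
        frames.append((index, payload[offset:offset + size]))
        offset += size
    return frames
```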

1240	   In addition to the required MIME format parameters, the following
1241	   parameters MUST be present: sizeLength, indexLength, and
1242	   indexDeltaLength.  AAC frames have fixed time duration per Access
1243	   Unit; when interleaving in this mode, the applicable duration MUST
1244	   be signaled by the MIME format parameter constantDuration.  In
1245	   addition, the parameter maxDisplacement MUST be present when
1246	   interleaving.

1248	   For example:

1250	   m=audio 49230 RTP/AVP 96
1251	   a=rtpmap:96 mpeg4-generic/48000/6
1252	   a=fmtp:96 streamtype=5; profile-level-id=16; mode=AAC-hbr;
1253	   config=11B0; sizeLength=13; indexLength=3;
1254	   indexDeltaLength=3; constantDuration=1024

1256	   Note: The a=fmtp line has been wrapped to fit the page; it comprises
1257	         a single line in the SDP file.

1259	   The hexadecimal value of the "config" parameter is the
1260	   AudioSpecificConfig() as defined in ISO/IEC 14496-3.
1261	   AudioSpecificConfig() specifies a 5.1 channel AAC stream with a
1262	   sampling rate of 48 kHz.  For the description of MIME parameters see
1263	   section 4.1.

1265	3.3.7 Additional modes

1267	   This specification only defines the modes specified in sections
1268	   3.3.2 through 3.3.6.  Additional modes are expected to be defined in
1269	   future RFCs.  Each additional mode MUST be in full compliance with
1270	   this specification.

1272	   Any new mode MUST be defined such that an implementation including
1273	   all the features of this specification can decode the payload format
1274	   corresponding to this new mode.  For this reason a mode MUST NOT
1275	   specify new default values for MIME parameters.  In particular, MIME
1276	   parameters that configure the RTP payload MUST be present (unless
1277	   they have the default value), even if their presence is redundant
1278	   because the mode assigns a fixed value to the parameter.  A mode may
1279	   additionally define that some MIME parameters are required instead
1280	   of optional, that some MIME parameters have fixed values (or
1281	   ranges), and that there are rules restricting the usage.

1285	4. IANA considerations

1287	   This section describes the MIME types and names associated with
1288	   this payload format.  Section 4.1 registers the MIME types, as per
1289	   RFC 2048 [3].

1291	   This format may require additional information about the mapping to
1292	   be made available to the receiver.  This is done using parameters
1293	   also described in the next section.

1295	4.1 MIME type registration

1297	   MIME media type name: "video" or "audio" or "application"

1299	   "video" MUST be used for MPEG-4 Visual streams (ISO/IEC 14496-2)
1300	   or MPEG-4 Systems streams (ISO/IEC 14496-1) that convey information
1301	   needed for an audio/visual presentation.

1303	   "audio" MUST be used for MPEG-4 Audio streams (ISO/IEC 14496-3)
1304	   or MPEG-4 Systems streams that convey information needed for an
1305	   audio only presentation.

1307	   "application" MUST be used for MPEG-4 Systems streams (ISO/IEC
1308	   14496-1) that serve purposes other than audio/visual presentation,
1309	   e.g. in some cases when MPEG-J (Java) streams are transmitted.

1311	   Depending on the required payload configuration, MIME format
1312	   parameters need to be available to the receiver.  This is done using
1313	   the parameters described in the next section.  There are required
1314	   and optional parameters.

1316	   Optional parameters are of two types: general parameters and
1317	   configuration parameters.  The configuration parameters are used to
1318	   configure the fields in the AU Header section and in the auxiliary
1319	   section.  The absence of any configuration parameter is equivalent to
1320	   the associated field set to its default value, which is always zero.
1321	   The absence of all configuration parameters resolves into a default
1322	   "basic" configuration with an empty AU-header section and an empty
1323	   auxiliary section in each RTP packet.

1325	   MIME subtype name: mpeg4-generic

1327	   Required parameters:

1329	   MIME format parameter names are not case dependent; however, for
1330	   clarity, both upper and lower case are used in the names of the
1331	   parameters described in this specification.

1333	      streamType:
1334	      The integer value that indicates the type of MPEG-4 stream that
1335	      is carried; its coding corresponds to the values of the
1336	      streamType as defined in Table 9 (streamType Values) in ISO/IEC
1337	      14496-1.

1341	      profile-level-id:
1342	      A decimal representation of the MPEG-4 Profile Level indication.
1343	      This parameter MUST be used in the capability exchange or
1344	      session set-up procedure to indicate the MPEG-4 Profile and Level
1345	      combination of which the relevant MPEG-4 media codec is
1346	      capable.
1347	      For MPEG-4 Audio streams, this parameter is the decimal value
1348	         from Table 5 (audioProfileLevelIndication Values) in ISO/IEC
1349	         14496-1, indicating which MPEG-4 Audio tool subsets are
1350	         required to decode the audio stream.
1351	      For MPEG-4 Visual streams, this parameter is the decimal value
1352	         from Table G-1 (FLC table for profile and level indication) of
1353	         ISO/IEC 14496-2, indicating which MPEG-4 Visual tool subsets
1354	         are required to decode the visual stream.
1355	      For BIFS streams, this parameter is the decimal value that is
1356	         obtained from (SPLI + 256*GPLI), where:
1357	         SPLI is the decimal value from Table 4 in ISO/IEC 14496-1 with
1358	            the applied sceneProfileLevelIndication;
1359	         GPLI is the decimal value from Table 7 in ISO/IEC 14496-1 with
1360	            the applied graphicsProfileLevelIndication.
1361	      For MPEG-J streams, this parameter is the decimal value from
1362	         table 13 (MPEGJProfileLevelIndication) in ISO/IEC 14496-1,
1363	         indicating the profile and level of the MPEG-J stream.
1364	      For OD streams, this parameter is the decimal value from table 3
1365	         (ODProfileLevelIndication) in ISO/IEC 14496-1, indicating the
1366	         profile and level of the OD stream.
1367	      For IPMP streams, this parameter has either the decimal value 0,
1368	         indicating an unspecified profile and level, or a value larger
1369	         than zero, indicating an MPEG-4 IPMP profile and level as
1370	         defined in a future MPEG-4 specification.
1371	      For Clock Reference streams and Object Content Info streams, this
1372	         parameter has the decimal value zero, indicating that profile
1373	         and level information is conveyed through the OD framework.
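   For BIFS streams the mapping packs two 8-bit indications into one
   number, which the following Python sketch shows in both directions.
   The helper names are hypothetical. Because SPLI is below 256, the
   inverse split with divmod is unambiguous.

```python
def bifs_profile_level_id(spli: int, gpli: int) -> int:
    """profile-level-id for a BIFS stream: SPLI + 256 * GPLI."""
    # sceneProfileLevelIndication and graphicsProfileLevelIndication are
    # 8-bit fields in ISO/IEC 14496-1.
    if not (0 <= spli <= 255 and 0 <= gpli <= 255):
        raise ValueError("indications are 8-bit values")
    return spli + 256 * gpli


def split_bifs_profile_level_id(value: int) -> tuple[int, int]:
    """Inverse mapping: recover (SPLI, GPLI) from profile-level-id."""
    gpli, spli = divmod(value, 256)
    return spli, gpli
```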

1375	      config:
1376	      A hexadecimal representation of an octet string that expresses
1377	      the media payload configuration.  Configuration data is mapped
1378	      onto the hexadecimal octet string on an MSB-first basis.  The
1379	      first bit of the configuration data SHALL be located at the MSB
1380	      of the first octet.  In the last octet, if necessary to achieve
1381	      octet-alignment, up to 7 zero-valued padding bits shall follow
1382	      the configuration data.
1383	      For MPEG-4 Audio streams, config is the audio object type
1384	         specific decoder configuration data AudioSpecificConfig() as
1385	         defined in ISO/IEC 14496-3.  For Structured Audio, the
1386	         AudioSpecificConfig() may be conveyed by other means, not
1387	         defined by this specification.  If the AudioSpecificConfig()
1388	         is conveyed by other means for Structured Audio, then the
1389	         config MUST be a quoted empty hexadecimal octet string, as
1390	         follows: config="".
1391	         Note that a future mode of using this RTP payload format for
1392	         Structured Audio may define such other means.
1396	      For MPEG-4 Visual streams, config is the MPEG-4 Visual
1397	         configuration information as defined in subclause 6.2.1 Start
1398	         codes of ISO/IEC 14496-2.  The configuration information
1399	         indicated by this parameter SHALL be the same as the
1400	         configuration information in the corresponding MPEG-4 Visual
1401	         stream, except for first-half-vbv-occupancy and
1402	         latter-half-vbv-occupancy, if present, which may vary in
1403	         the repeated configuration information inside an MPEG-4
1404	         Visual stream (See 6.2.1 Start codes of ISO/IEC 14496-2).
1405	      For BIFS streams, this is the BIFSConfig() information as defined
1406	         in ISO/IEC 14496-1.  For version 1, BIFSConfig is defined in
1407	         section 9.3.5.2, and for version 2 in section 9.3.5.3.  The
1408	         MIME format parameter objectType signals the version of
1409	         BIFSConfig.
1410	      For IPMP streams, this is either a quoted empty hexadecimal octet
1411	         string, indicating the absence of any decoder configuration
1412	         information (config=""), or the IPMPConfiguration() as
1413	         defined in a future MPEG-4 IPMP specification.
1414	      For Object Content Info (OCI) streams, this is the
1415	         OCIDecoderConfiguration() information of the OCI stream, as
1416	         defined in section 8.4.2.4 in ISO/IEC 14496-1.
1417	      For OD streams, Clock Reference streams and MPEG-J streams, this
1418	         is a quoted empty hexadecimal octet string (config=""), as
1419	         no information on the decoder configuration is required.
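   Since config is written MSB first, the leading fields of an
   AudioSpecificConfig() can be read directly from the hexadecimal
   string. The Python sketch below is a partial, hypothetical parser: it
   reads only the 5-bit audioObjectType, the 4-bit
   samplingFrequencyIndex and the 4-bit channelConfiguration defined in
   ISO/IEC 14496-3, and it rejects escape and reserved values instead of
   decoding them.

```python
# samplingFrequencyIndex 0..11 from ISO/IEC 14496-3 (later values omitted).
SAMPLING_FREQUENCIES = [96000, 88200, 64000, 48000, 44100, 32000,
                        24000, 22050, 16000, 12000, 11025, 8000]


def parse_audio_specific_config(config_hex: str) -> tuple[int, int, int]:
    """Return (audioObjectType, sampling rate in Hz, channelConfiguration).

    Reads only the first 13 bits, MSB first. Escape values (object type
    31, frequency index 15) are not handled in this sketch.
    """
    data = bytes.fromhex(config_hex)
    if len(data) < 2:
        raise ValueError("config too short")
    bits = int.from_bytes(data[:2], "big")  # first 16 bits, MSB first
    object_type = bits >> 11                # bits 1-5
    freq_index = (bits >> 7) & 0xF          # bits 6-9
    channels = (bits >> 3) & 0xF            # bits 10-13
    if object_type == 31 or freq_index >= len(SAMPLING_FREQUENCIES):
        raise ValueError("escape or reserved value not supported here")
    return object_type, SAMPLING_FREQUENCIES[freq_index], channels
```

   Applied to the examples in section 3.3, it yields object type 8
   (CELP), 16000 Hz and 1 channel for config=440E00; object type 2 (AAC
   LC), 22050 Hz and 1 channel for config=1388; and object type 2,
   48000 Hz and channel configuration 6 (5.1) for config=11B0.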

1421	      mode:
1422	      The mode in which this specification is used.  The following modes
1423	      can be signaled:
1424	      mode=generic,
1425	      mode=CELP-cbr,
1426	      mode=CELP-vbr,
1427	      mode=AAC-lbr and
1428	      mode=AAC-hbr.
1429	      Other modes are expected to be defined in future RFCs.  See also
1430	      sections 3.3.7 and 4.2 of RFC xxxx.

1432	   Optional general parameters:

1434	      objectType:
1435	      The decimal value from Table 8 in ISO/IEC 14496-1, indicating
1436	      the value of the objectTypeIndication of the transported stream.
1437	      For BIFS streams this parameter MUST be present to signal the
1438	      version of BIFSConfig().  Note that objectTypeIndication
1439	      may signal a non-MPEG-4 stream and that the RTP payload format
1440	      defined in this document may not be suitable to carry a stream
1441	      that is not defined by MPEG-4.  The objectType parameter SHOULD
1442	      NOT be set to a value that signals a stream that cannot be
1443	      carried by this payload format.

1445	      constantSize:
1446	      The constant size in octets of each Access Unit for this stream.
1447	      The constantSize and the sizeLength parameters MUST NOT be
1448	      simultaneously present.

1452	      constantDuration:
1453	      The constant duration of each Access Unit for this stream,
1454	      measured with the same units as the RTP time stamp.

1456	      maxDisplacement:
1457	      The decimal representation of the maximum displacement in time
1458	      of an interleaved AU, as defined in section 3.2.3.3, expressed
1459	      in units of the RTP time stamp clock.
1460	      This parameter MUST be present when interleaving is applied.

1462	      de-interleaveBufferSize:
1463	      The decimal representation in number of octets of the size of
1464	      the de-interleave buffer, described in section 3.2.3.3.
1465	      When interleaving, this parameter MUST be present if the
1466	      calculation of the de-interleave buffer size given in 3.2.3.3
1467	      and based on maxDisplacement and rate(max) under-estimates the
1468	      size of the de-interleave buffer.  If this calculation does not
1469	      under-estimate the size of the de-interleave buffer, then the
1470	      de-interleaveBufferSize parameter SHOULD NOT be present.

1472	   Optional configuration parameters:

1474	      sizeLength:
1475	      The number of bits on which the AU-size field is encoded in the
1476	      AU-header.  The sizeLength and the constantSize parameters MUST
1477	      NOT be simultaneously present.

1479	      indexLength:
1480	      The number of bits on which the AU-Index is encoded in the first
1481	      AU-header.  The default value of zero indicates the absence of
1482	      the AU-Index field in each first AU-header.

1484	      indexDeltaLength:
1485	      The number of bits on which the AU-Index-delta field is encoded
1486	      in any non-first AU-header.  The default value of zero indicates
1487	      the absence of the AU-Index-delta field in each non-first
1488	      AU-header.

1490	      CTSDeltaLength:
1491	      The number of bits on which the CTS-delta field is encoded in
1492	      the AU-header.

1494	      DTSDeltaLength:
1495	      The number of bits on which the DTS-delta field is encoded in
1496	      the AU-header.

1498	      randomAccessIndication:
1499	      A decimal value of zero or one, indicating whether the RAP-flag
1500	      is present in the AU-header.  The decimal value of one indicates
1501	      presence of the RAP-flag; the default value zero indicates absence.

1505	      streamStateIndication:
1506	      The number of bits on which the Stream-state field is encoded in
1507	      the AU-header.  This parameter MAY be present when transporting
1508	      MPEG-4 Systems streams, and SHALL NOT be present for MPEG-4 Audio
1509	      and MPEG-4 Visual streams.

1511	      auxiliaryDataSizeLength:
1512	      The number of bits used to encode the auxiliary-data-size
1513	      field.

1515	   Applications MAY use more parameters, in addition to those defined
1516	   above.  Each additional parameter MUST be registered with IANA, to
1517	   ensure that there is no clash of names.  Each additional parameter
1518	   MUST be accompanied by a specification in the form of an RFC, MPEG
1519	   standard, or other permanent and readily available reference (the
1520	   "Specification Required" policy defined in RFC 2434 [6]).  Receivers
1521	   MUST tolerate the presence of such additional parameters, but these
1522	   parameters SHALL NOT impact the decoding of receivers that comply with
1523	   this specification.

1525	   Encoding considerations:
1526	   This MIME subtype is defined for RTP transport only.  System
1527	   bitstreams MUST be generated according to MPEG-4 Systems
1528	   specifications (ISO/IEC 14496-1).  Video bitstreams MUST be generated
1529	   according to MPEG-4 Visual specifications (ISO/IEC 14496-2).  Audio
1530	   bitstreams MUST be generated according to MPEG-4 Audio
1531	   specifications (ISO/IEC 14496-3).  The RTP packets MUST be packetized
1532	   according to the RTP payload format defined in RFC xxxx.

1534	   Security considerations:
1535	   As defined in section 5 of RFC xxxx.

1537	   Interoperability considerations:
1538	   MPEG-4 provides a large and rich set of tools for the coding of
1539	   visual objects.  For effective implementation of the standard,
1540	   subsets of the MPEG-4 tool sets have been provided for use in
1541	   specific applications.  These subsets, called 'Profiles', limit the
1542	   size of the tool set a decoder is required to implement.  In order to
1543	   restrict computational complexity, one or more 'Levels' are set for
1544	   each Profile.  A Profile@Level combination allows:
1545	   . a codec builder to implement only the subset of the standard he
1546	     needs, while maintaining interworking with other MPEG-4 devices
1547	     that implement the same combination, and
1548	   . checking whether MPEG-4 devices comply with the standard
1549	     ('conformance testing').

1551	   A stream SHALL be compliant with the MPEG-4 Profile@Level specified
1552	   by the parameter "profile-level-id".  Interoperability between a
1553	   sender and a receiver is achieved by specifying the parameter
1554	   "profile-level-id" in MIME content.  In the capability exchange or
1555	   announcement procedure, sender and receiver may set this parameter
1556	   to the same value.

1560	   Published specification:
1561	   The specifications for MPEG-4 streams are presented in ISO/IEC
1562	   14496-1, 14496-2, and 14496-3.  The RTP payload format is described
1563	   in RFC xxxx.

1565	   Applications which use this media type:
1566	   Multimedia streaming and conferencing tools.

1568	   Additional information: none

1570	   Magic number(s): none

1572	   File extension(s):
1573	   None.  A file format with the extension .mp4 has been defined for
1574	   MPEG-4 content but is not directly correlated with this MIME type
1575	   whose sole purpose is RTP transport.

1577	   Macintosh File Type Code(s): none

1579	   Person & email address to contact for further information:
1580	   Authors of RFC xxxx, IETF Audio/Video Transport working group.

1582	   Intended usage: COMMON

1584	   Author/Change controller:
1585	   Authors of RFC xxxx, IETF Audio/Video Transport working group.

1587	4.2 Registration of mode definitions with IANA

1589	   This specification can be used in a number of modes.  The mode of
1590	   operation is signaled using the "mode" MIME parameter, with the
1591	   initial set of values specified in section 4.1.  New modes may be
1592	   defined at any time, as described in section 3.3.7.  These modes
1593	   MUST be registered with IANA, to ensure that there is no clash
1594	   of names.

1596	   A new mode registration MUST be accompanied by a specification in
1597	   the form of an RFC, MPEG standard, or other permanent and readily
1598	   available reference (the "Specification Required" policy defined
1599	   in RFC 2434 [6]).

1601	4.3 Concatenation of parameters

1603	   Multiple parameters SHOULD be expressed as a MIME media type string,
1604	   in the form of a semicolon-separated list of parameter=value pairs
1605	   (for parameter usage examples see sections 3.3.2 through 3.3.6).

1609	4.4 Usage of SDP

1611	4.4.1 The a=fmtp keyword

1613	   It is assumed that one typical way to transport the above-described
1614	   parameters associated with this payload format is via an SDP message
1615	   [5], for example transported to the client in reply to an RTSP
1616	   DESCRIBE [8] or via SAP [11].  In that case the (a=fmtp) keyword
1617	   MUST be used as described in RFC 2327 [5], section 6, with the
1618	   following syntax:

1620	   a=fmtp:<format> <parameter name>=<value>[; <parameter name>=<value>]
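   A minimal Python sketch of reading such a line follows. It is an
   illustration under stated assumptions, not a complete SDP parser and
   not any library's API: it expects one space after the format number,
   splits on semicolons, folds parameter names to lower case (names are
   not case dependent, section 4.1) and keeps values verbatim, including
   quoted values such as config="". The function name is hypothetical.

```python
def parse_fmtp(line: str) -> tuple[int, dict[str, str]]:
    """Parse 'a=fmtp:<format> name=value; name=value' (sketch)."""
    prefix = "a=fmtp:"
    if not line.startswith(prefix):
        raise ValueError("not an a=fmtp line")
    fmt, _, rest = line[len(prefix):].partition(" ")
    params = {}
    for item in rest.split(";"):
        item = item.strip()
        if not item:
            continue  # tolerate a trailing ';'
        name, sep, value = item.partition("=")
        if not sep:
            raise ValueError(f"malformed parameter: {item!r}")
        # Names are case-insensitive; values are kept as written.
        params[name.strip().lower()] = value.strip()
    return int(fmt), params
```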

1622	5. Security Considerations

1624	   RTP packets using the payload format defined in this specification
1625	   are subject to the security considerations discussed in the RTP
1626	   specification [2].  This implies that confidentiality of the media
1627	   streams is achieved by encryption.  Because the data compression used
1628	   with this payload format is applied end-to-end, encryption may be
1629	   performed on the compressed data so there is no conflict between the
1630	   two operations.  The packet processing complexity of this payload
1631	   type (i.e. excluding media data processing) does not exhibit any
1632	   significant non-uniformity on the receiver side to cause a
1633	   denial-of-service threat.

1635	   However, it is possible to inject non-compliant MPEG streams (Audio,
1636	   Video, and Systems) to overload the receiver/decoder's buffers,
1637	   which might compromise the functionality of the receiver or even
1638	   crash it.  This is especially true for end-to-end systems like MPEG
1639	   where the buffer models are precisely defined.

1641	   MPEG-4 Systems supports stream types including commands that are
1642	   executed on the terminal like OD commands, BIFS commands, etc. and
1643	   programmatic content like MPEG-J (Java(TM) Byte Code) and MPEG-4
1644	   scripts.  It is possible to use one or more of the above in a
1645	   manner non-compliant to MPEG to crash the receiver or make it
1646	   temporarily unavailable.  Senders that transport MPEG-4 content
1647	   SHOULD ensure that such content is MPEG compliant, as defined in the
1648	   compliance part of ISO/IEC 14496 [1].  Receivers that support MPEG-4
1649	   content should prevent malfunctioning of the receiver in case of
1650	   non-MPEG-compliant content.

1652	   Authentication mechanisms can be used to validate the sender and
1653	   the data to prevent security problems due to non-compliant malicious
1654	   MPEG-4 streams.

1656	   In ISO/IEC 14496-1 a security model is defined for MPEG-4 Systems
1657	   streams carrying MPEG-J access units which comprise Java(TM) classes
1658	   and objects.  MPEG-J defines a set of Java APIs and a secure
1659	   execution model.  MPEG-J content can call this set of APIs and
1660	   Java(TM) methods from a set of Java packages supported in the
1664	   receiver within the defined security model.  According to this
1665	   security model, downloaded byte code is forbidden to load libraries,
1666	   define native methods, start programs, read or write files, or read
1667	   system properties.
1668	   Receivers can implement intelligent filters to validate the buffer
1669	   requirements or parametric (OD, BIFS, etc.) or programmatic (MPEG-J,
1670	   MPEG-4 scripts) commands in the streams.  However, this can increase
1671	   the complexity significantly.

1673	   Implementors of MPEG-4 streaming over RTP who also implement MPEG-4
1674	   scripts (subset of ECMAScript) MUST ensure that the action of such
1675	   scripts is limited solely to the domain of the single presentation
1676	   in which they reside (thus disallowing session-to-session
1677	   communication, access to local resources and storage, etc.).  Though
1678	   loading static network-located resources (such as media) into the
1679	   presentation should be permitted, network access by scripts MUST be
1680	   restricted to such (media) download.

1682	6. Acknowledgements

1684	   This document evolved through several revisions thanks to
1685	   contributions by people from the ISMA forum, from the IETF AVT
1686	   Working Group and from the 4-on-IP ad-hoc group within MPEG.  The
1687	   authors wish to thank all involved people, and in particular Andrea
1688	   Basso, Stephen Casner, M. Reha Civanlar, Carsten Herpel, John
1689	   Lazaro, Zvi Lifshitz, Young-kwon Lim, Alex MacAulay, Bill May,
1690	   Colin Perkins, Dorairaj V and Stephan Wenger for their valuable
1691	   comments and support.

1693	7. References

1695	7.1 Normative references

1697	   [1] ISO/IEC International Standard 14496 (MPEG-4); "Information
1698	   technology - Coding of audio-visual objects", January 2000

1700	   [2] H. Schulzrinne, S. Casner, R. Frederick, V. Jacobson, "RTP: A
1701	   Transport Protocol for Real-Time Applications", RFC 1889, Internet
1702	   Engineering Task Force, January 1996.

1704	   [3] N. Freed, J. Klensin, J. Postel, "Multipurpose Internet Mail
1705	   Extensions (MIME) Part Four: Registration Procedures", RFC 2048,
1706	   Internet Engineering Task Force, November 1996.

1708	   [4] S. Bradner, "Key words for use in RFCs to Indicate Requirement
1709	   Levels", RFC 2119, March 1997.

1711	   [5] M. Handley, V. Jacobson, "SDP: Session Description Protocol",
1712	   RFC 2327, Internet Engineering Task Force, April 1998.

1714	   [6] T. Narten, H. Alvestrand, "Guidelines for Writing an IANA
1715	   Considerations Section in RFCs", RFC 2434, October 1998.

1719	7.2 Informative references

1721	   [7] D. Hoffman, G. Fernando, V. Goyal, M. Civanlar, "RTP payload
1722	   format for MPEG1/MPEG2 Video", RFC 2250, January 1998.

1724	   [8] H. Schulzrinne, A. Rao, R. Lanphier, "RTSP: Real Time Streaming
1725	   Protocol", RFC 2326, Internet Engineering Task Force, April 1998.

1727	   [9] C. Perkins, O. Hodson, "Options for Repair of Streaming Media",
1728	   RFC 2354, Internet Engineering Task Force, June 1998.

1730	   [10] H. Schulzrinne, J. Rosenberg, "An RTP Payload Format for
1731	   Generic Forward Error Correction", RFC 2733, Internet Engineering
1732	   Task Force, December 1999.

1734	   [11] M. Handley, C. Perkins, E. Whelan, "SAP: Session Announcement
1735	   Protocol", RFC 2974, Internet Engineering Task Force, October 2000.

1737	   [12] Y. Kikuchi, T. Nomura, S. Fukunaga, Y. Matsui, H. Kimata, "RTP
1738	   payload format for MPEG-4 Audio/Visual streams", RFC 3016, Internet
1739	   Engineering Task Force, November 2000.

1741	8. Authors' Addresses

1743	   Jan van der Meer
1744	   Philips Electronics, MP4Net
1745	   Prof Holstlaan 4
1746	   Building WDB-1
1747	   5600 JZ Eindhoven
1748	   Netherlands
1749	   Email: jan.vandermeer@philips.com

1751	   David Mackie
1752	   Apple Computer, Inc.
1753	   One Infinite Loop, MS:302-2LF
1754	   Cupertino  CA 95014
1755	   Email: dmackie@apple.com

1757	   Viswanathan Swaminathan
1758	   Sun Microsystems Inc.
1759	   901 San Antonio Road, M/S UMPK15-214
1760	   Palo Alto, CA 94303
1761	   Email: viswanathan.swaminathan@sun.com

1763	   David Singer
1764	   Apple Computer, Inc.
1765	   One Infinite Loop, MS:302-3MT
1766	   Cupertino  CA 95014
1767	   Email: singer@apple.com

   Philippe Gentric
   Philips Electronics, MP4Net
   51 rue Carnot
   92156 Suresnes
   France
   Email: philippe.gentric@philips.com

   Full Copyright Statement

   Copyright (C) The Internet Society (August 2003).  All Rights
   Reserved.

   This document and translations of it may be copied and furnished to
   others, and derivative works that comment on or otherwise explain
   it or assist in its implementation may be prepared, copied,
   published and distributed, in whole or in part, without restriction
   of any kind, provided that the above copyright notice and this
   paragraph are included on all such copies and derivative works.
   However, this document itself may not be modified in any way, such
   as by removing the copyright notice or references to the Internet
   Society or other Internet organizations, except as needed for the
   purpose of developing Internet standards in which case the
   procedures for copyrights defined in the Internet Standards process
   MUST be followed, or as required to translate it into languages
   other than English.

   The limited permissions granted above are perpetual and will
   not be revoked by the Internet Society or its successors or
   assigns.

   This document and the information contained herein is provided on
   an "AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET
   ENGINEERING TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR
   IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF
   THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED
   WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.

APPENDIX: Usage of this payload format

Appendix A. Interleave analysis

A.1 Introduction

   This appendix discusses interleaving issues.  Some general notes are
   provided on de-interleaving and error concealment, and a number of
   interleaving patterns are examined, in particular to determine the
   maximum displacement in time and the size of the de-interleave
   buffer.  For ease of reading, the maximum displacement in these
   examples is expressed as a number of access units (AUs).  In actual
   streams, it is signaled in units of the RTP time stamp clock.
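
   As a rough illustration of that conversion, the sketch below turns
   a displacement counted in AU periods into RTP time stamp units.  It
   assumes every AU has the same duration, expressed in ticks of the
   RTP clock; the function name is hypothetical, and the value of 1024
   ticks per AU in the usage line is an arbitrary example, not
   something this payload format mandates.

```python
def au_periods_to_rtp_ticks(au_periods: int, au_duration_ticks: int) -> int:
    """Convert a displacement counted in AU periods into RTP ticks.

    Assumes a fixed AU duration, given in ticks of the RTP clock.
    """
    if au_duration_ticks <= 0:
        raise ValueError("AU duration must be positive")
    return au_periods * au_duration_ticks


# Hypothetical stream with 1024 RTP ticks per AU.
print(au_periods_to_rtp_ticks(5, 1024))  # 5120
```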

A.2 De-interleaving and error concealment

   This appendix does not describe de-interleaving and error
   concealment in detail, as the control of the AU decoding and error
   concealment process has little to do with interleaving.  In short:
   if the next AU to be decoded is present and there is sufficient
   storage available for the decoded AU, decode it now; if not, wait.
   When the decoding deadline is reached (i.e., the time at which
   decoding must begin in order to be completed by the time the AU is
   to be presented), decoding must begin.  The same holds if the
   decoder is hardware with a constant delay between the start of
   decoding of an AU and the presentation of that AU: decoding must
   then begin exactly at that deadline.

   If the next AU to be decoded is not present when the decoding
   deadline is reached, then that AU is lost, and the receiver must
   take whatever error concealment measures are deemed appropriate.
   The play-out delay may need to be adjusted at that point
   (especially if other AUs have also missed their deadlines
   recently).  Alternatively, if the delay was momentary and
   maintaining the latency is important, the receiver should minimize
   the glitch and continue processing with the next AU.
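
   The decision rule described above can be summarized as a small
   function.  This is a minimal sketch, assuming the receiver already
   knows, for the next AU in decoding order, whether it has arrived,
   whether storage for the decoded AU is free, and its decoding
   deadline.  The names Action and next_action are hypothetical, and
   the concealment strategy itself is left open, as in the text.

```python
from enum import Enum


class Action(Enum):
    DECODE = "decode"    # start decoding the next AU now
    WAIT = "wait"        # not ready yet and the deadline has not passed
    CONCEAL = "conceal"  # deadline reached but AU missing: treat as lost


def next_action(au_present: bool, storage_free: bool,
                now: float, deadline: float) -> Action:
    """Decide what to do with the next AU in decoding order."""
    if now >= deadline:
        # At the deadline decoding must begin; if the AU never
        # arrived, it is lost and error concealment takes over.
        return Action.DECODE if au_present else Action.CONCEAL
    if au_present and storage_free:
        return Action.DECODE
    return Action.WAIT
```

   A receiver would evaluate this on every packet arrival and timer
   tick; whether to also adjust the play-out delay after a CONCEAL
   result is a separate policy decision.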

A.3 Simple group interleave

A.3.1 Introduction

   A simple example of regular interleave is forming packets into
   groups.  If the 'stride' of the interleave (the distance between
   interleaved AUs) is N, packet 0 could contain AU(0), AU(N), AU(2N),
   and so on; packet 1 could contain AU(1), AU(1+N), AU(1+2N), and so
   on.  If there are M access units in a packet, then there are M*N
   access units in the group.
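
   The grouping rule above can be sketched as follows, with stride
   standing for N and per_packet for M.  The function name is
   hypothetical, and dropping AUs beyond total_aus in a final,
   incomplete group is an assumption made only to keep the example
   finite.  Each packet would carry the RTP time stamp of its first AU.

```python
from typing import List


def group_interleave(total_aus: int, stride: int,
                     per_packet: int) -> List[List[int]]:
    """Form packets by simple group interleave (stride N, M per packet)."""
    group_size = stride * per_packet  # M*N AUs per group
    packets = []
    for group_start in range(0, total_aus, group_size):
        for offset in range(stride):
            first = group_start + offset
            aus = [first + k * stride for k in range(per_packet)]
            # Assumption: drop AUs past the end of the stream.
            aus = [au for au in aus if au < total_aus]
            if aus:
                packets.append(aus)
    return packets


print(group_interleave(18, 3, 3))
# [[0, 3, 6], [1, 4, 7], [2, 5, 8], [9, 12, 15], [10, 13, 16], [11, 14, 17]]
```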

   An example with N=M=3 follows; note that this is the same example
   as given in section 2.5 and that a fixed time duration per Access
   Unit is assumed:

   Packet   Time stamp   Carried AUs      AU-Index, AU-Index-delta
   P(0)     T[0]         0, 3, 6          0, 2, 2
   P(1)     T[1]         1, 4, 7          0, 2, 2
   P(2)     T[2]         2, 5, 8          0, 2, 2
   P(3)     T[9]         9, 12, 15        0, 2, 2

   In this example the AU-Index is present in the first AU-header and
   coded with the value 0, as required for fixed duration AUs.  The
   position of the first AU of each packet within the group is defined
   by the RTP time stamp, while the AU-Index-delta field indicates the
   position of each subsequent AU relative to the preceding AU in the
   packet.  All AU-Index-delta fields are coded with the value N-1,
   equal to 2 in this example.  Hence the RTP time stamp and the
   AU-Index-delta are used to reconstruct the original order.  See
   also section 3.2.3.2.
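
   A receiver-side sketch of that reconstruction follows.  It assumes
   a fixed AU duration au_duration in RTP ticks and a hypothetical
   reference time stamp base_ts belonging to AU 0, and it applies the
   rule that an AU-Index-delta value is the distance to the preceding
   AU in the packet minus one, so a delta of 2 advances the AU number
   by 3.

```python
from typing import List


def au_indices(rtp_ts: int, base_ts: int, au_duration: int,
               deltas: List[int]) -> List[int]:
    """Recover the AU numbers carried in one packet.

    rtp_ts:      RTP time stamp of the packet (that of its first AU)
    base_ts:     hypothetical RTP time stamp of AU 0
    au_duration: fixed AU duration in RTP ticks
    deltas:      AU-Index-delta values of the 2nd, 3rd, ... AU-headers
    """
    # Modulo 2**32 handles RTP time stamp wrap-around.
    first = ((rtp_ts - base_ts) % (1 << 32)) // au_duration
    indices = [first]
    for delta in deltas:
        indices.append(indices[-1] + delta + 1)
    return indices


# Packet P(3) of the example: time stamp T[9], deltas 2 and 2.
print(au_indices(1000 + 9 * 1024, 1000, 1024, [2, 2]))  # [9, 12, 15]
```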

A.3.2 Determining the de-interleave buffer size

   For the regular pattern in this example, figure 6 in section
   3.2.3.3 shows that the de-interleave buffer stores at most 4 AUs.
   A de-interleaveBufferSize value may then be signaled that is at
   least equal to the total number of octets of any 4 "early" AUs that
   are stored at the same time.
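
   The peak number (or total size) of "early" AUs can also be found by
   simulation.  The sketch below assumes AUs leave the de-interleave
   buffer strictly in AU-number order as soon as they are in sequence,
   and that the transmission order of AU numbers is known; max_early
   is a hypothetical helper.  With unit sizes it returns 4 for the
   order of this example, and applied to the transmission orders of
   A.4 and A.5 it gives 5 and 3.

```python
from typing import Dict, List, Optional


def max_early(order: List[int],
              sizes: Optional[Dict[int, int]] = None) -> int:
    """Peak total size of "early" AUs held in the de-interleave buffer.

    order: AU numbers in transmission order
    sizes: octets per AU; None counts each AU as 1
    """
    next_needed = min(order)  # next AU in decoding order
    stored = set()
    peak = 0
    for au in order:
        stored.add(au)
        # AUs that are now in sequence leave the buffer for the decoder.
        while next_needed in stored:
            stored.remove(next_needed)
            next_needed += 1
        total = sum(1 if sizes is None else sizes[a] for a in stored)
        peak = max(peak, total)
    return peak


print(max_early([0, 3, 6, 1, 4, 7, 2, 5, 8]))  # 4
```

   Passing actual AU sizes in octets gives the buffer needed for that
   particular run of AUs; a sender would need the worst case over the
   whole stream.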

A.3.3 Determining the maximum displacement

   For the regular pattern in this example, figure 7 in section 3.3
   shows that the maximum displacement in time equals 5 AU periods.
   Hence the minimum maxDisplacement value that must be signaled is 5
   AU periods.  If each AU has the same size, this maxDisplacement
   value over-estimates the de-interleave buffer size by one AU.
   However, note that with variable AU sizes the total size of any 4
   "early" AUs that must be stored at the same time may exceed
   maxDisplacement times the maximum bitrate, in which case the
   de-interleaveBufferSize must be signaled.
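
   The closing condition can be written as a one-line check.  This
   sketch assumes maxDisplacement has already been converted to
   seconds and the maximum bitrate is given in bits per second; the
   function name and the numbers in the usage lines (0.1 s, i.e. five
   hypothetical 20 ms AUs, at 64 kbit/s) are illustrative only.

```python
def needs_buffer_size(early_octets: int, max_displacement_s: float,
                      max_bitrate_bps: float) -> bool:
    """True if maxDisplacement times the maximum bitrate is too small.

    early_octets: largest total size of "early" AUs stored at once
    """
    implied_octets = max_displacement_s * max_bitrate_bps / 8
    return early_octets > implied_octets


# Hypothetical: 0.1 s displacement at 64 kbit/s implies 800 octets.
print(needs_buffer_size(640, 0.1, 64000))  # False: four 160-octet AUs
print(needs_buffer_size(900, 0.1, 64000))  # True: signal the size
```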

A.4 More subtle group interleave

A.4.1 Introduction

   Another example of forming packets with group interleave is given
   below.  In this example the packets are formed such that the loss
   of two consecutive RTP packets does not cause the loss of two
   consecutive AUs.  Note that in this example the RTP time stamps of
   packets 3 and 4 are earlier than the RTP time stamps of packets 1
   and 2, respectively; a fixed time duration per Access Unit is
   assumed.

   Packet   Time stamp   Carried AUs      AU-Index, AU-Index-delta
   0        T[0]         0,  5            0, 4
   1        T[2]         2,  7            0, 4
   2        T[4]         4,  9            0, 4
   3        T[1]         1,  6            0, 4
   4        T[3]         3,  8            0, 4
   5        T[10]       10, 15            0, 4
   and so on ...

   In this example the AU-Index is present in the first AU-header and
   coded with the value 0, as required for AUs with a fixed duration.
   To reconstruct the original order, the RTP time stamp and the
   AU-Index-delta (coded with the value 4) are used.  See also
   section 3.2.3.2.
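
   The loss property claimed for this pattern can be checked
   mechanically.  The sketch below rebuilds the packets of the table
   (packet order 0, 2, 4, 1, 3 within each group of 10 AUs, two AUs
   per packet, five AUs apart) and verifies that losing any two
   consecutive packets never removes two consecutive AUs; for
   contrast, the pattern of A.3 fails the same test.  The helper names
   are hypothetical.

```python
from typing import List

FIRST_AU_ORDER = [0, 2, 4, 1, 3]  # first AU of each packet in a group


def subtle_group_packets(groups: int) -> List[List[int]]:
    """Packets of the A.4 example: AU k and AU k+5, 10 AUs per group."""
    packets = []
    for g in range(groups):
        base = 10 * g
        for first in FIRST_AU_ORDER:
            packets.append([base + first, base + first + 5])
    return packets


def survives_two_packet_loss(packets: List[List[int]]) -> bool:
    """True if no two consecutive packets carry two consecutive AUs."""
    for a, b in zip(packets, packets[1:]):
        lost = set(a + b)
        if any(au + 1 in lost for au in lost):
            return False
    return True


print(subtle_group_packets(1))
# [[0, 5], [2, 7], [4, 9], [1, 6], [3, 8]]
print(survives_two_packet_loss(subtle_group_packets(3)))  # True
print(survives_two_packet_loss([[0, 3, 6], [1, 4, 7], [2, 5, 8]]))  # False
```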

A.4.2 Determining the de-interleave buffer size

   From figure 8 it can be determined that at most 5 "early" AUs are
   to be stored.  If the AUs are of constant size, the required buffer
   size equals 5 times the AU size.  The minimum size of the
   de-interleave buffer equals the maximum total number of octets of
   the "early" AUs that are to be stored at the same time.  This gives
   the minimum value of the de-interleaveBufferSize that may be
   signaled.

                              +--+--+--+--+--+--+--+--+--+--+
   Interleaved AUs            | 0| 5| 2| 7| 4| 9| 1| 6| 3| 8|
                              +--+--+--+--+--+--+--+--+--+--+
                                -  -  5  -  5  -  2  7  4  9
                                            7     4  9  5
   "Early" AUs                                    5     6
                                                  7     7
                                                  9     9

   Figure 8: Storage of "early" AUs in the de-interleave buffer per
             interleaved AU.

A.4.3 Determining the maximum displacement

   From figure 9 it can be seen that the maximum displacement in time
   equals 8 AU periods: AU 9 is received while AU 1 is still missing,
   a displacement of 9 - 1 = 8 AU periods.  Hence the minimum
   maxDisplacement value to be signaled is 8 AU periods.

                                    +--+--+--+--+--+--+--+--+--+--+
   Interleaved AUs                  | 0| 5| 2| 7| 4| 9| 1| 6| 3| 8|
                                    +--+--+--+--+--+--+--+--+--+--+

   Earliest not yet present AU        -  1  1  1  1  1  -  3  -  -

   Figure 9: The earliest not yet present AU for each AU in the
             interleaving pattern.
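
   The "earliest not yet present AU" row, and from it the maximum
   displacement, can be computed from the transmission order alone.
   This is a sketch assuming that the displacement of an arriving AU
   is its AU number minus the earliest lower-numbered AU that has not
   yet arrived; the function names are hypothetical.  For this example
   it reproduces the row of figure 9 and yields 8; for the
   transmission orders of A.3 and A.5 it yields 5.

```python
from typing import List, Optional


def earliest_missing(order: List[int]) -> List[Optional[int]]:
    """For each arriving AU, the earliest lower-numbered AU not yet seen."""
    start = min(order)
    received = set()
    row = []
    for au in order:
        received.add(au)
        missing = next((i for i in range(start, au)
                        if i not in received), None)
        row.append(missing)
    return row


def max_displacement(order: List[int]) -> int:
    """Maximum displacement in AU periods (0 if no AU arrives early)."""
    return max((au - m for au, m in zip(order, earliest_missing(order))
                if m is not None), default=0)


order = [0, 5, 2, 7, 4, 9, 1, 6, 3, 8]
print(earliest_missing(order))
# [None, 1, 1, 1, 1, 1, None, 3, None, None]
print(max_displacement(order))  # 8
```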

   If each AU has the same size, this maxDisplacement value
   over-estimates the de-interleave buffer size by three AUs.
   However, with variable AU sizes the total size of any 5 "early" AUs
   stored at the same time may exceed maxDisplacement times the
   maximum bitrate, in which case de-interleaveBufferSize must be
   signaled.

A.5 Continuous interleave

A.5.1 Introduction

   In continuous interleave, once the scheme is 'primed', the number
   of AUs in a packet exceeds the 'stride' (the distance between
   them).  This shortens the buffering needed, smooths the data flow,
   and gives slightly larger packets (and thus lower overhead) for the
   same interleave.  For example, here is a continuous interleave,
   again over a stride of 3 AUs, but with 4 AUs per packet, for a run
   of 21 AUs (AU 0 to AU 20).  This shows both how the scheme 'starts
   up' and how it finishes.  Once again, the example assumes a fixed
   time duration per Access Unit.

   Packet   Time stamp   Carried AUs         AU-Index, AU-Index-delta
   0        T[0]                      0      0
   1        T[1]                  1   4      0  2
   2        T[2]              2   5   8      0  2  2
   3        T[3]          3   6   9  12      0  2  2  2
   4        T[7]          7  10  13  16      0  2  2  2
   5        T[11]        11  14  17  20      0  2  2  2
   6        T[15]        15  18              0  2
   7        T[19]        19                  0

   In this example the AU-Index is present in the first AU-header and
   coded with the value 0, as required for AUs with a fixed duration.
   To reconstruct the original order, the RTP time stamp and the
   AU-Index-delta (coded with the value 2) are used.  See also section
   3.2.3.2.  Note that in this example the RTP time stamps are in
   increasing order.
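
   The table can be regenerated from a simple per-AU rule.  The rule
   below is not stated in this document; it is one assumption that
   happens to reproduce the table.  AUs are split into 'stride'
   residue classes, class r first sends a partial run of r+1 AUs
   during start-up and then runs of per_packet AUs, and packet p
   carries class p mod stride.  The sketch only works when stride is
   not larger than per_packet, and the function name is hypothetical.

```python
from typing import Dict, List


def continuous_interleave(total_aus: int, stride: int,
                          per_packet: int) -> List[List[int]]:
    """One rule that reproduces the continuous-interleave table."""
    if stride > per_packet:
        raise ValueError("this sketch assumes stride <= per_packet")
    packets: Dict[int, List[int]] = {}
    for au in range(total_aus):
        r, j = au % stride, au // stride  # residue class, position in it
        # Class r starts with a partial run of r+1 AUs, then full runs.
        run = (j + per_packet - 1 - r) // per_packet
        packets.setdefault(r + stride * run, []).append(au)
    return [packets[p] for p in sorted(packets)]


for p, aus in enumerate(continuous_interleave(21, 3, 4)):
    print(p, aus)
# 0 [0]
# 1 [1, 4]
# 2 [2, 5, 8]
# 3 [3, 6, 9, 12]
# 4 [7, 10, 13, 16]
# 5 [11, 14, 17, 20]
# 6 [15, 18]
# 7 [19]
```

   Concatenating the packets gives the transmission order 0, 1, 4, 2,
   5, 8, 3, ... used in the interleave figures of this section, and
   each packet would carry the RTP time stamp of its first AU.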

A.5.2 Determining the de-interleave buffer size

   For this example the de-interleave buffer size can be derived from
   figure 10.  The maximum number of "early" AUs is three.  If the AUs
   are of constant size, the required buffer size equals 3 times the
   AU size.  Compared to the example in A.3, for constant size AUs the
   de-interleave buffer size is reduced from 4 to 3 times the AU size,
   while maintaining the same 'stride'.

                        +--+--+--+--+--+--+--+--+--+--+--+--+--+--+-
   Interleaved AUs      | 0| 1| 4| 2| 5| 8| 3| 6| 9|12| 7|10|13|16|
                        +--+--+--+--+--+--+--+--+--+--+--+--+--+--+-
                          -  -  -  4  -  -  4  8  -  -  8 12  -  -
                                            5           9
   "Early" AUs                              8          12

   Figure 10: Storage of "early" AUs in the de-interleave buffer per
              interleaved AU.

A.5.3 Determining the maximum displacement

   For this example the maximum displacement has a value of 5 AU
   periods (see figure 11); for instance, AU 8 is received while AU 3
   is still missing, a displacement of 8 - 3 = 5 AU periods.  Compared
   to the example in A.3, the maximum displacement does not decrease,
   although less de-interleave buffering is required.

                        +--+--+--+--+--+--+--+--+--+--+--+--+--+--+-
   Interleaved AUs      | 0| 1| 4| 2| 5| 8| 3| 6| 9|12| 7|10|13|16|
                        +--+--+--+--+--+--+--+--+--+--+--+--+--+--+-
   Earliest not yet
        present AU        -  -  2  -  3  3  -  -  7  7  -  - 11 11

   Figure 11: The earliest not yet present AU for each AU in the
              interleaving pattern.