idnits 2.17.1 

draft-ietf-avt-mpeg4-simple-03.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

  ** Looks like you're using RFC 2026 boilerplate.  This must be updated to
     follow RFC 3978/3979, as updated by RFC 4748.


  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

  ** The document seems to lack a 1id_guidelines paragraph about 6 months
     document validity -- however, there's a paragraph with a matching
     beginning. Boilerplate error?

  == No 'Intended status' indicated for this document; assuming Proposed
     Standard

  == It seems as if not all pages are separated by form feeds - found 0 form
     feeds but 32 pages


  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

  ** The document seems to lack an Authors' Addresses Section.

  ** The document seems to lack separate sections for Informative/Normative
     References.  All references will be assumed normative when checking for
     downward references.


  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == Using lowercase 'not' together with uppercase 'MUST', 'SHALL', 'SHOULD',
     or 'RECOMMENDED' is not an accepted usage according to RFC 2119.  Please
     use uppercase 'NOT' together with RFC 2119 keywords (if that is what you
     mean).
     
     Found 'SHALL not' in this paragraph:
     
     The AU-headers are configured using MIME format parameters and MAY
     be empty. If the AU-header is configured empty, the AU-headers-length
     field SHALL not be present and consequently the AU Header Section is
     empty. If the AU-header is not configured empty, then the
     AU-headers-length is a two octet field that specifies the length in bits
     of the immediately following AU-headers, excluding the padding bits.

  == Using lowercase 'not' together with uppercase 'MUST', 'SHALL', 'SHOULD',
     or 'RECOMMENDED' is not an accepted usage according to RFC 2119.  Please
     use uppercase 'NOT' together with RFC 2119 keywords (if that is what you
     mean).
     
     Found 'SHALL not' in this paragraph:
     
     Applications MAY use more parameters, in addition to those defined
     above. Receivers MUST tolerate the presence of such additional
     parameters, but these parameters SHALL not impact the decoding of
     receivers that comply to this specification.

  -- The document seems to lack a disclaimer for pre-RFC5378 work, but may
     have content which was first submitted before 10 November 2008.  If you
     have contacted all the original authors and they are all willing to grant
     the BCP78 rights to the IETF Trust, then this is fine, and you can ignore
     this comment.  If not, you may need to add the pre-RFC5378 disclaimer. 
     (See the Legal Provisions document at
     https://trustee.ietf.org/license-info for more information.)

  -- The document date (December 2002) is 7801 days in the past.  Is this
     intentional?


  Checking references for intended status: Proposed Standard
  ----------------------------------------------------------------------------

     (See RFCs 3967 and 4897 for information about using normative references
     to lower-maturity documents in RFCs)

  == Missing Reference: '0' is mentioned on line 1495, but not defined

  == Missing Reference: '9' is mentioned on line 1441, but not defined

  == Missing Reference: '20' is mentioned on line 1480, but not defined

  == Missing Reference: '7' is mentioned on line 1499, but not defined

  == Missing Reference: '11' is mentioned on line 1500, but not defined

  == Missing Reference: '15' is mentioned on line 1501, but not defined

  == Missing Reference: '19' is mentioned on line 1502, but not defined

  -- Possible downref: Non-RFC (?) normative reference: ref. '1'

  ** Obsolete normative reference: RFC 1889 (ref. '2') (Obsoleted by RFC 3550)

  ** Obsolete normative reference: RFC 3016 (ref. '5') (Obsoleted by RFC 6416)

  ** Obsolete normative reference: RFC 2327 (ref. '6') (Obsoleted by RFC 4566)


     Summary: 7 errors (**), 0 flaws (~~), 11 warnings (==), 3 comments (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.

--------------------------------------------------------------------------------

1	Internet Engineering Task Force                         J. van der Meer
2	Internet Draft                                      Philips Electronics
3	                                                              D. Mackie
4	                                                     Cisco Systems Inc.
5	                                                         V. Swaminathan
6	                                                  Sun Microsystems Inc.
7	                                                              D. Singer
8	                                                         Apple Computer
9	                                                             P. Gentric
10	                                                    Philips Electronics

12	                                                              June 2002
13	                                                  Expires December 2002

15	   Document: draft-ietf-avt-mpeg4-simple-03.txt

17	   Transport of MPEG-4 Elementary Streams

19	Status of this Memo

21	   This document is an Internet-Draft and is in full conformance with
22	   all provisions of Section 10 of RFC2026.

24	   Internet-Drafts are working documents of the Internet Engineering
25	   Task Force (IETF), its areas, and its working groups. Note that
26	   other groups may also distribute working documents as Internet-
27	   Drafts. Internet-Drafts are draft documents valid for a maximum of
28	   six months and may be updated, replaced, or obsoleted by other
29	   documents at any time. It is inappropriate to use Internet-Drafts
30	   as reference material or to cite them other than as "work in
31	   progress."

33	   The list of current Internet-Drafts can be accessed at
34	   http://www.ietf.org/ietf/1id-abstracts.txt
35	   The list of Internet-Draft Shadow Directories can be accessed at
36	   http://www.ietf.org/shadow.html.

38	   This specification is a product of the Audio/Video Transport working
39	   group within the Internet Engineering Task Force. Comments are
40	   solicited and should be addressed to the working group's mailing
41	   list at avt@ietf.org and/or the authors.

43	   << Note for the RFC editor: xxxx should be replaced with the RFC
44	   number that will be assigned. >>

46	Abstract

48	   The MPEG Committee (ISO/IEC JTC1/SC29 WG11) is a working group in
49	   ISO that produced the MPEG-4 standard. MPEG defines tools to
50	   compress content such as audio-visual information into elementary
51	   streams. This specification defines a simple, but generic RTP
52	   payload format for transport of any non-multiplexed MPEG-4
53	   elementary stream.

55	Table of Contents

57	   1.  Introduction . . . . . . . . . . . . . . . . . . . . . . . .   3
58	   2.  Carriage of MPEG-4 elementary streams over RTP . . . . . . .   4
59	   2.1.  Introduction . . . . . . . . . . . . . . . . . . . . . . .   4
60	   2.2.  MPEG Access Units  . . . . . . . . . . . . . . . . . . . .   4
61	   2.3.  Concatenation of Access Units  . . . . . . . . . . . . . .   4
62	   2.4.  Fragmentation of Access Units  . . . . . . . . . . . . . .   5
63	   2.5.  Interleaving . . . . . . . . . . . . . . . . . . . . . . .   5
64	   2.6.  Time stamp information . . . . . . . . . . . . . . . . . .   6
65	   2.7.  Random Access Indication . . . . . . . . . . . . . . . . .   6
66	   2.8.  State indication of MPEG-4 system streams  . . . . . . . .   6
67	   2.9.  Carriage of auxiliary information  . . . . . . . . . . . .   7
68	   2.10. MIME format parameters and configuring conditional field .   7
69	   2.11. Global structure of payload format . . . . . . . . . . . .   7
70	   2.12. Modes to transport MPEG-4 streams  . . . . . . . . . . . .   8
71	   2.13. Alignment with RFC 3016  . . . . . . . . . . . . . . . . .   8
72	   3.  Payload format . . . . . . . . . . . . . . . . . . . . . . .   9
73	   3.1.  Usage of RTP header fields and RTCP  . . . . . . . . . . .   9
74	   3.2.  RTP payload structure  . . . . . . . . . . . . . . . . . .  10
75	   3.2.1.  The AU Header Section  . . . . . . . . . . . . . . . . .  10
76	   3.2.1.1.  The AU-header  . . . . . . . . . . . . . . . . . . . .  10
77	   3.2.2.  The Auxiliary Section  . . . . . . . . . . . . . . . . .  13
78	   3.2.3.  The Access Unit Data Section . . . . . . . . . . . . . .  13
79	   3.2.3.1.  Fragmentation  . . . . . . . . . . . . . . . . . . . .  14
80	   3.2.3.2.  Interleaving . . . . . . . . . . . . . . . . . . . . .  14
81	   3.2.3.3.  Constraints for interleaving . . . . . . . . . . . . .  15
82	   3.3.  Usage of this specification  . . . . . . . . . . . . . . .  16
83	   3.3.1.  General  . . . . . . . . . . . . . . . . . . . . . . . .  16
84	   3.3.2.  The generic mode . . . . . . . . . . . . . . . . . . . .  16
85	   3.3.3.  Constant bit rate CELP . . . . . . . . . . . . . . . . .  17
86	   3.3.4.  Variable bit rate CELP . . . . . . . . . . . . . . . . .  18
87	   3.3.5.  Low bit rate AAC . . . . . . . . . . . . . . . . . . . .  19
88	   3.3.6.  High bit rate AAC  . . . . . . . . . . . . . . . . . . .  19
89	   3.3.7.  Additional modes . . . . . . . . . . . . . . . . . . . .  20
90	   4.  IANA considerations  . . . . . . . . . . . . . . . . . . . .  21
91	   4.1.  MIME type registration . . . . . . . . . . . . . . . . . .  21
92	   4.2.  Concatenation of parameters  . . . . . . . . . . . . . . .  26
93	   4.3.  Usage of SDP . . . . . . . . . . . . . . . . . . . . . . .  26
94	   4.3.1.  The a=fmtp keyword . . . . . . . . . . . . . . . . . . .  26
95	   5.  Security considerations  . . . . . . . . . . . . . . . . . .  27
96	   6.  Acknowledgements . . . . . . . . . . . . . . . . . . . . . .  28
97	   7.  References . . . . . . . . . . . . . . . . . . . . . . . . .  28
98	   8.  Author addresses . . . . . . . . . . . . . . . . . . . . . .  29
99	       APPENDIX: Usage of this payload format . . . . . . . . . . .  30
100	       A. Examples of delay analysis with interleave  . . . . . . .  30
101	       A.1 Group interleave . . . . . . . . . . . . . . . . . . . .  30
102	       A.2 Continuous interleave  . . . . . . . . . . . . . . . . .  31

104	1. Introduction

106	   The MPEG Committee is Working Group 11 (WG11) in ISO/IEC JTC1 SC29
107	   that specified the MPEG-1, MPEG-2 and, more recently, the MPEG-4
108	   standards [1]. The MPEG-4 standard specifies compression of
109	   audio-visual data into, for example, an audio or video elementary
110	   stream. In the MPEG-4 standard, these streams take the form of
111	   audiovisual objects that may be arranged into an audio-visual scene
112	   by means of a scene description. Each MPEG-4 elementary stream
113	   consists of a sequence of Access Units; examples of an Access Unit
114	   (AU) are an audio frame and a video picture.

116	   This specification defines a general and configurable payload
117	   structure to transport MPEG-4 elementary streams, in particular
118	   MPEG-4 audio (including speech) streams, MPEG-4 video streams and
119	   also MPEG-4 systems streams, such as BIFS (BInary Format for
120	   Scenes), OCI (Object Content Information), OD (Object Descriptor)
121	   and IPMP (Intellectual Property Management and Protection) streams.
122	   The RTP payload defined in this document is simple to implement and
123	   reasonably efficient. It allows for optional interleaving of Access
124	   Units (such as audio frames) to increase resilience to packet
125	   loss.

127	   Though the RTP payload format defined in this document is capable
128	   of transporting any MPEG-4 stream, more dedicated formats may exist,
129	   such as RFC 3016 for transport of MPEG-4 video (part 2).

131	   Configuration of the payload is provided to accommodate transport
132	   of any MPEG-4 stream at any possible bit rate. However, for a
133	   specific MPEG-4 elementary stream typically only very few
134	   configurations are needed. So as to allow for the design of
135	   simplified, but dedicated receivers, this specification requires
136	   that specific modes are defined for transport of MPEG-4 streams.
137	   This document defines modes for MPEG-4 CELP and AAC streams, as
138	   well as a generic mode that can be used to transport any MPEG-4
139	   stream. In the future new RFCs are expected to specify additional
140	   modes for transport of MPEG-4 streams.

142	   The RTP payload format defined in this document specifies carriage
143	   of system-related information that is often equivalent to the
144	   information that may be contained in the MPEG-4 SL. This
145	   document does not prescribe how to transcode or map information
146	   from the SL to fields defined in the RTP payload format. Such
147	   processing, if any, is left to the discretion of the application.
148	   However, to anticipate the need for transport of any additional
149	   system-related information in future, an auxiliary field can be
150	   configured that may carry any such data.

152	   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
153	   "SHOULD", "SHOULD NOT", "RECOMMENDED",  "MAY", and "OPTIONAL" in
154	   this document are to be interpreted as described in RFC 2119 [3].

156	2. Carriage of MPEG-4 elementary streams over RTP

158	2.1 Introduction

160	   With this payload format a single MPEG-4 elementary stream can be
161	   transported. Information on the type of MPEG-4 stream carried in
162	   the payload is conveyed by MIME format parameters, for example in
163	   an SDP [6] message or by other means. These MIME format parameters
164	   specify the configuration of the payload. To allow for simplified
165	   and dedicated receivers, a MIME format parameter is available
166	   to signal a specific mode of using this payload. A mode definition
167	   MAY include the type of MPEG-4 elementary stream as well as the
168	   applied configuration, so as to avoid the need in receivers
169	   to parse all MIME format parameters. The applied mode MUST be
170	   signalled.

172	2.2 MPEG Access Units

174	   For carriage of compressed audio-visual data MPEG defines Access
175	   Units. An MPEG Access Unit (AU) is the smallest data entity to
176	   which timing information is attributed. In case of audio an Access
177	   Unit may represent an audio frame and in case of video a picture.
178	   MPEG Access Units are by definition byte aligned. If for example an
179	   audio frame is not byte aligned, up to 7 zero-padding bits MUST be
180	   inserted at the end of the frame to achieve a byte-aligned Access
181	   Unit. MPEG-4 decoders MUST be able to decode AUs in which such
182	   padding is applied.

184	   Consistent with the MPEG-4 specification, this document requires
185	   that each MPEG-4 part 2 video Access Unit includes all the coded
186	   data of a picture, any video stream headers that may precede the
187	   coded picture data, and any video stream stuffing that may follow
188	   it, up to, but not including the startcode indicating the start of
189	   a new video stream or the next Access Unit.

191	2.3 Concatenation of Access Units

193	   Frequently it is possible to carry multiple Access Units in one RTP
194	   packet. This is particularly useful for audio; for example, when
195	   AAC is used for encoding of a stereo signal at 64 kbits/sec, AAC
196	   frames contain on average approximately 200 octets. On a LAN with a
197	   1500 octet MTU this would allow on average 7 complete AAC frames to
198	   be carried per RTP packet.

200	   Access Units may have a fixed size in octets, but a variable size
201	   is also possible. To facilitate parsing in case of multiple
202	   concatenated AUs in one RTP packet, the size of each AU is made
203	   known to the receiver. When concatenating in case of a constant AU
204	   size, this size is communicated "out of band" through a MIME format
205	   parameter. When concatenating in case of variable size AUs, the RTP
206	   payload carries "in band" an AU size field for each contained AU.
207	   In combination with the RTP payload length the size information
208	   allows the RTP payload to be split by the receiver back into the
209	   individual AUs.

211	   To simplify the implementation of RTP receivers, it is required
212	   that when multiple AUs are carried in an RTP packet, each AU MUST
213	   be complete, i.e. the number of AUs in an RTP packet MUST be
214	   integral.
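
   The splitting step described above can be sketched as follows. This
   is a minimal illustration, not part of the payload format: the
   function name and its parameters are hypothetical, and it assumes
   the caller has already obtained either the constant AU size (from
   the MIME format parameter) or the per-AU sizes (from the AU size
   fields carried in band).

```python
from typing import List, Optional, Sequence


def split_access_units(
    data: bytes,
    constant_size: Optional[int] = None,
    au_sizes: Optional[Sequence[int]] = None,
) -> List[bytes]:
    """Split a payload of complete, concatenated AUs into individual AUs.

    Exactly one of constant_size (out-of-band size) or au_sizes (in-band
    sizes, one per AU) must be given. Both names are hypothetical.
    """
    if (constant_size is None) == (au_sizes is None):
        raise ValueError("give exactly one of constant_size or au_sizes")
    if constant_size is not None:
        # With a constant size, the payload length determines the AU count.
        if constant_size <= 0 or len(data) % constant_size != 0:
            raise ValueError("payload is not an integral number of AUs")
        au_sizes = [constant_size] * (len(data) // constant_size)
    # The number of AUs must be integral, so the sizes must add up exactly.
    if sum(au_sizes) != len(data):
        raise ValueError("AU sizes do not add up to the payload length")
    units, offset = [], 0
    for size in au_sizes:
        units.append(data[offset:offset + size])
        offset += size
    return units
```

   The length check reflects the requirement that every AU in the
   packet is complete; a mismatch indicates a malformed payload or a
   wrong configuration.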

216	2.4 Fragmentation of Access Units

218	   MPEG allows for very large Access Units. Since most IP networks
219	   have significantly smaller MTU sizes, this payload format allows
220	   for the fragmentation of an Access Unit over multiple RTP packets
221	   so as to avoid IP layer fragmentation. To simplify the
222	   implementation of RTP receivers, an RTP packet SHALL either carry
223	   one or more complete Access Units or a single fragment of one
224	   Access Unit.
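
   The rule above (one or more complete Access Units, or a single
   fragment of one Access Unit, per packet) can be illustrated with a
   small sender-side sketch. The function name, the greedy grouping
   strategy and the max_payload budget are assumptions made for this
   example; the sketch also ignores the octets consumed by the RTP
   header and by any AU Header or Auxiliary Section.

```python
from typing import List, Tuple


def packetize(access_units: List[bytes], max_payload: int) -> List[Tuple[str, List[bytes]]]:
    """Group AUs into packets holding complete AUs or a single fragment.

    Returns (kind, chunks) pairs where kind is "complete" or "fragment".
    max_payload is a hypothetical per-packet budget in octets.
    """
    if max_payload <= 0:
        raise ValueError("max_payload must be positive")
    packets: List[Tuple[str, List[bytes]]] = []
    pending: List[bytes] = []
    pending_len = 0

    def flush() -> None:
        nonlocal pending, pending_len
        if pending:
            packets.append(("complete", pending))
            pending, pending_len = [], 0

    for au in access_units:
        if len(au) > max_payload:
            # Too large for one packet: emit one fragment per packet and
            # never mix a fragment with other AUs.
            flush()
            for start in range(0, len(au), max_payload):
                packets.append(("fragment", [au[start:start + max_payload]]))
        else:
            if pending_len + len(au) > max_payload:
                flush()
            pending.append(au)
            pending_len += len(au)
    flush()
    return packets
```

   For example, with a budget of 4 octets, the AUs b"aa", b"bb",
   b"cccccc" and b"d" yield one packet with two complete AUs, two
   fragment packets, and a final packet with one complete AU.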

226	2.5 Interleaving

228	   When an RTP packet carries a contiguous sequence of Access Units,
229	   the loss of such a packet can result in a "decoding gap" for the
230	   user. One method to alleviate this problem is to allow for the
231	   Access Units to be interleaved in the RTP packets. For a modest
232	   cost in latency and implementation complexity, significant error
233	   resiliency to packet loss can be achieved.

235	   To support optional interleaving of Access Units, this payload
236	   format allows for index information to be sent for each Access Unit.
237	   The RTP sender is free to choose the interleaving pattern without
238	   propagating this information to the receiver(s). Indeed the sender
239	   could dynamically adjust the interleaving pattern based on the
240	   Access Unit size, error rates, etc. The RTP receiver does not need
241	   to know the interleaving pattern used, it only needs to extract the
242	   index information of the Access Unit and insert the Access Unit
243	   into the appropriate sequence in the rendering queue. An example of
244	   interleaving is given below.

246	   Assume that an RTP packet contains 3 AUs, and that the AUs are
247	   numbered 1, 2, 3, 4, etc. If an interleaving group length of 9 is
248	   chosen, then RTP packet(i) contains the following AU(n):

250	   RTP packet(1):  AU(1),  AU(4),  AU(7)
251	   RTP packet(2):  AU(2),  AU(5),  AU(8)
252	   RTP packet(3):  AU(3),  AU(6),  AU(9)
253	   RTP packet(4):  AU(10), AU(13), AU(16)
254	   RTP packet(5):  AU(11), AU(14), AU(17)
255	   Etc.
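
   The pattern in this example can be reproduced with a short sketch.
   The function below is hypothetical and only illustrates this one
   regular pattern (group length divisible by the number of AUs per
   packet); as stated above, a sender is free to choose other
   patterns.

```python
from typing import List


def interleave(group_length: int, aus_per_packet: int, num_aus: int) -> List[List[int]]:
    """Return, per RTP packet, the 1-based AU numbers it carries."""
    if group_length % aus_per_packet != 0:
        raise ValueError("group length must be a multiple of AUs per packet")
    packets_per_group = group_length // aus_per_packet
    packets = []
    for group_start in range(1, num_aus + 1, group_length):
        for p in range(packets_per_group):
            # Within a group, consecutive AUs go to consecutive packets,
            # so one packet holds every packets_per_group-th AU.
            packet = [
                n
                for n in range(group_start + p, group_start + group_length, packets_per_group)
                if n <= num_aus
            ]
            if packet:
                packets.append(packet)
    return packets


for number, aus in enumerate(interleave(9, 3, 18), start=1):
    print(f"RTP packet({number}):", ", ".join(f"AU({n})" for n in aus))
```

   With this pattern, the loss of a single packet removes three
   non-adjacent AUs from a group instead of three consecutive ones.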

257	2.6 Time stamp information

259	   The RTP time stamp MUST carry the sampling instant of the first AU
260	   (fragment) in the RTP packet. When multiple AUs are carried within
261	   an RTP packet, the time stamps of subsequent AUs can be calculated
262	   if the frame period of each AU is known. For audio and video this
263	   is possible if the frame rate is constant. However, in some cases
264	   it is not possible to make such calculation, for example for
265	   variable frame rate video and for MPEG-4 BIFS streams carrying
266	   composition information. To support such cases, this payload format
267	   can be configured to carry a time stamp in the RTP payload for each
268	   contained Access Unit. A time stamp MAY be conveyed in the RTP
269	   payload only for non-first AUs in the RTP packet, and SHALL NOT be
270	   conveyed for the first AU (fragment), as the time stamp for the
271	   latter is carried by the RTP time stamp.
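
   For the constant frame rate case, the calculation can be sketched
   as follows. The function name is hypothetical, and the sketch
   assumes that the AUs in the packet are consecutive (no
   interleaving) and that the frame period is already expressed in
   ticks of the RTP clock. The modulo reflects the 32-bit RTP time
   stamp field.

```python
from typing import List

RTP_TIMESTAMP_MODULUS = 2 ** 32  # RTP time stamps are 32-bit values


def au_timestamps(rtp_timestamp: int, frame_period_ticks: int, au_count: int) -> List[int]:
    """Derive a time stamp for each consecutive AU in one RTP packet.

    The first AU uses the RTP time stamp itself; each following AU is
    one frame period later.
    """
    return [
        (rtp_timestamp + i * frame_period_ticks) % RTP_TIMESTAMP_MODULUS
        for i in range(au_count)
    ]


# Hypothetical audio example: frames of 1024 samples with the RTP clock
# rate equal to the sampling rate, so one frame is 1024 ticks.
print(au_timestamps(1000, 1024, 3))
```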

273	   MPEG-4 defines two types of time stamps, the composition time stamp
274	   (CTS) and the decoding time stamp (DTS). The CTS represents the
275	   sampling instant of an AU, and hence the CTS is equivalent to the
276	   RTP time stamp. The DTS may be used only in MPEG-4 video streams
277	   that use bi-directional coding, i.e. when pictures are predicted in
278	   both forward and backward direction by using either a reference
279	   picture in the past, or a reference picture in the future. The DTS
280	   cannot be carried in the RTP header. In some cases the DTS can be
281	   derived from the RTP time stamp using frame rate information; this
282	   requires deep parsing in the video stream, which may be considered
283	   objectionable. But if the video frame rate is variable, the required
284	   information may not even be present in the video stream. For both
285	   reasons, the capability has been defined to optionally carry the
286	   DTS in the RTP payload for each contained Access Unit.

288	   Since RTP time stamps may be re-stamped by RTP devices, each time
289	   stamp contained in the RTP payload is coded differentially, the CTS
290	   from the RTP time stamp, and the DTS from the CTS, so as to avoid
291	   extensive parsing by re-stamping devices.
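
   The benefit of differential coding can be shown with a small
   sketch. It assumes, for illustration only, that each delta is a
   signed offset that is added to its reference (the RTP time stamp
   for the CTS, the CTS for the DTS); the exact coding of the delta
   fields is not defined in this section. The function and parameter
   names are hypothetical.

```python
from typing import Optional, Tuple

RTP_TIMESTAMP_MODULUS = 2 ** 32  # RTP time stamps are 32-bit values


def absolute_times(
    rtp_timestamp: int,
    cts_delta: Optional[int] = None,
    dts_delta: Optional[int] = None,
) -> Tuple[int, Optional[int]]:
    """Recover CTS and DTS from the RTP time stamp and optional deltas.

    Assumption: deltas are signed offsets added to their reference.
    cts_delta is None for the first AU, whose CTS is the RTP time stamp.
    """
    cts = rtp_timestamp
    if cts_delta is not None:
        cts = (rtp_timestamp + cts_delta) % RTP_TIMESTAMP_MODULUS
    dts = None
    if dts_delta is not None:
        dts = (cts + dts_delta) % RTP_TIMESTAMP_MODULUS
    return cts, dts
```

   If a device re-stamps the RTP time stamp by adding an offset, the
   recovered CTS and DTS shift by the same offset while the deltas in
   the payload stay untouched, which is why no payload parsing is
   needed.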

293	2.7 Random access indication

295	   Random access to the content of MPEG-4 elementary streams may be
296	   possible at some but not all Access Units. To signal Access Units
297	   where random access is possible, a random access point flag can
298	   optionally be carried in the RTP payload for each contained Access
299	   Unit.

301	2.8 State indication of MPEG-4 system streams

303	   ISO/IEC 14496-1 defines states for MPEG-4 system streams. So as to
304	   convey state information when transporting MPEG-4 system streams,
305	   this payload format allows for the optional carriage in the RTP
306	   payload of the stream state for each contained Access Unit. The
307	   indication of stream states is particularly useful when repeating
308	   AUs according to the carousel mechanism defined in ISO/IEC 14496-1.

310	2.9 Carriage of auxiliary information

312	   This payload format defines a specific field to carry auxiliary
313	   data. The auxiliary data field is preceded by a field that specifies
314	   the length of the auxiliary data, so as to facilitate skipping of
315	   the data without parsing it. The coding of the auxiliary data is not
316	   defined in this document, but is left to the discretion of
317	   applications. Receivers that have knowledge of the auxiliary data
318	   MAY decode the auxiliary data, but receivers without knowledge of
319	   such data MUST skip the auxiliary data field.

321	2.10 MIME format parameters and configuring conditional fields

323	   To support the features described in the previous sections several
324	   fields are defined for carriage in the RTP payload. However, their
325	   use strongly depends on the type of MPEG-4 elementary stream that
326	   is carried. Sometimes a specific field is needed with a certain
327	   length, while in other cases such field is not needed at all. To be
328	   efficient in either case, the fields to support these features are
329	   configurable by means of MIME format parameters. In general, a MIME
330	   format parameter defines the presence and length of the associated
331	   field. A length of zero indicates absence of the field. As a
332	   consequence, parsing of the payload requires knowledge of MIME
333	   format parameters. The MIME format parameters are conveyed to the
334	   receiver via SDP [6] messages or through other means.
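
   The idea that a configured length of zero means "field absent" can
   be sketched with a minimal bit reader. The class and the field
   lengths used in the example are hypothetical, and the sketch
   assumes that fields are packed most significant bit first.

```python
from typing import Optional


class BitReader:
    """Read unsigned bit fields from a byte string, MSB first (assumed)."""

    def __init__(self, data: bytes) -> None:
        self._data = data
        self._pos = 0  # current position in bits

    def read(self, nbits: int) -> Optional[int]:
        """Return the next nbits as an integer, or None if nbits is 0.

        A length of zero models a field that is configured absent, so
        nothing is consumed from the payload.
        """
        if nbits == 0:
            return None
        if self._pos + nbits > len(self._data) * 8:
            raise ValueError("not enough data for this field")
        value = 0
        for _ in range(nbits):
            byte = self._data[self._pos // 8]
            bit = (byte >> (7 - self._pos % 8)) & 1
            value = (value << 1) | bit
            self._pos += 1
        return value


# Hypothetical configuration: a 13-bit field, an absent field, a 3-bit field.
reader = BitReader(bytes([0x01, 0x92]))
print(reader.read(13), reader.read(0), reader.read(3))
```

   Because absent fields consume no bits, a receiver cannot locate any
   later field without knowing the configured lengths.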

336	2.11 Global structure of payload format

338	   The RTP payload, following the RTP header, contains three
339	   byte-aligned data sections, of which the first two MAY be empty. See
340	   figure 1.

342	          +---------+-----------+-----------+---------------+
343	          | RTP     | AU Header | Auxiliary | Access Unit   |
344	          | Header  | Section   | Section   | Data Section  |
345	          +---------+-----------+-----------+---------------+

347	                    <----------RTP Packet Payload----------->

349	   Figure 1: Data sections within an RTP packet

351	   The first data section is the AU (Access Unit) Header Section, which
352	   contains one or more AU-headers; however, each AU-header MAY be
353	   empty, in which case the entire AU Header Section is empty. The
354	   second section is the Auxiliary Section, containing auxiliary data;
355	   this section MAY also be configured empty. The third section is the
356	   Access Unit Data Section, containing either a single fragment of
357	   one Access Unit or one or more complete Access Units. The Access
358	   Unit Data Section is never empty.

360	2.12 Modes to transport MPEG-4 streams

362	   While it is possible to build fully configurable receivers capable
363	   of receiving any MPEG-4 stream, this specification also allows for
364	   the design of simplified, but dedicated receivers, that are capable
365	   for example of receiving only one type of MPEG-4 stream. This
366	   is achieved by requiring that specific modes be defined for using
367	   this specification. Each mode may define constraints for transport
368	   of one or more types of MPEG-4 streams, for instance on the payload
369	   configuration.

371	   The applied mode MUST be signalled. Signalling the mode is
372	   particularly important for receivers that are only capable of
373	   decoding one or more specific modes. Such receivers need to
374	   determine whether the applied mode is supported, so as to avoid
375	   problems with processing of payloads that are beyond the
376	   capabilities of the receiver.

378	   In this document several modes are defined for transport of MPEG-4
379	   CELP and AAC streams, as well as a generic mode that can be used
380	   for any MPEG-4 stream. In future, new RFCs are expected to specify
381	   additional modes of using this specification. New modes can be
382	   defined as deemed appropriate, typically by specifications that are
383	   hierarchically higher than this payload format. However, each mode
384	   MUST be in full compliance with this specification.

386	2.13 Alignment with RFC 3016

388	   This payload can be configured to be nearly identical to the
389	   payload format defined in RFC 3016 [5] for the MPEG-4 video
390	   configurations recommended in RFC 3016. Hence, receivers that
391	   comply with RFC 3016 can decode such RTP payload, provided that
392	   additional packets containing video decoder configuration (VO,
393	   VOL, VOSH) are inserted in the stream, as required by RFC 3016.
394	   Conversely, receivers that comply with the specification in this
395	   document SHOULD be able to decode payloads, names and parameters
396	   defined for MPEG-4 video in RFC 3016. In this respect it is
397	   strongly recommended to implement the ability to ignore "in band"
398	   video decoder configuration packets in the RFC 3016 payload.

400	   Note that the "out of band" availability of the video decoder
401	   configuration is optional in RFC 3016. To achieve maximum
402	   interoperability with the RTP payload format defined in this
403	   document, applications that use RFC 3016 to transport MPEG-4 video
404	   (part 2) are recommended to make the video decoder configuration
405	   available as a MIME parameter.

407	3. Payload Format

409	3.1 Usage of RTP Header Fields and RTCP

411	   Payload Type (PT): The assignment of an RTP payload type for this
412	   RTP packet format is outside the scope of this document, and will
413	   not be specified here. It is expected that the RTP profile for a
414	   particular class of applications will assign a payload type for
415	   this encoding, or if that is not done, then a payload type in the
416	   dynamic range shall be chosen.

418	   Marker (M) bit: The M bit is set to 1 to indicate that the RTP
419	   packet payload includes the end of each Access Unit of which data
420	   is contained in this RTP packet. As the payload either carries one
421	   or more complete Access Units or a single fragment of an Access
422	   Unit, the M bit is always set to 1, except when the packet carries
423	   a single fragment of an Access Unit that is not the last one.

425	   Extension (X) bit: Defined by the RTP profile used.

427	   Sequence Number: The RTP sequence number SHOULD be generated by
428	   the sender with a constant random offset.

430	   Timestamp: Indicates the sampling instant of the first AU
431	   contained in the RTP payload. This sampling instant is equivalent
432	   to the CTS in the MPEG-4 time domain. When using SDP the clock rate
433	   of the RTP time stamp MUST be expressed using the "rtpmap"
434	   attribute. If an MPEG-4 audio stream is transported, the rate SHOULD
435	   be set to the same value as the sampling rate of the audio stream.
436	   If an MPEG-4 video stream is transported, it is RECOMMENDED to set
437	   the rate to 90 kHz.
438	   In all cases, the sender SHALL make sure that RTP time stamps
439	   are identical only if the RTP time stamp refers to fragments of the
440	   same Access Unit.
441	   According to RFC 1889 [2] (section 5.1), RTP time stamps are
442	   recommended to start at a random value for security reasons. This
443	   is not an issue for synchronization of multiple RTP streams.
444	   However, in applications where streams from multiple sources are to
445	   be synchronized (for example one stream from local storage, another
446	   from an RTP streaming server), synchronization may become impossible.
447	   To also enable synchronization in such cases, it may be necessary to
448	   provide the required relationship between time stamps for obtaining
449	   synchronization by out of band means. The format of such information
450	   as well as methods to convey such information are beyond the scope
451	   of this specification.

453	   SSRC: set as described in RFC 1889 [2].

455	   CC and CSRC fields are used as described in RFC 1889 [2].

457	   RTCP SHOULD be used as defined in RFC 1889 [2].
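
   The Marker (M) bit rule given above reduces to a small decision,
   sketched here with a hypothetical helper: the bit is 1 unless the
   packet carries a fragment that is not the last fragment of its
   Access Unit.

```python
def marker_bit(is_fragment: bool, is_last_fragment: bool = False) -> int:
    """Return the M bit for a packet.

    Packets with one or more complete AUs always end an AU, so M is 1.
    A fragment packet ends an AU only if it holds the last fragment.
    """
    if not is_fragment:
        return 1
    return 1 if is_last_fragment else 0


# An AU split over three packets, followed by a packet of complete AUs.
print([marker_bit(True), marker_bit(True), marker_bit(True, True), marker_bit(False)])
```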

459	3.2 RTP Payload Structure

461	3.2.1 The AU Header Section

463	   When present, the AU Header Section consists of the AU-headers-length
464	   field, followed by a number of AU-headers. See figure 2.

466	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+- .. -+-+-+-+-+-+-+-+-+-+
467	   |AU-headers-length|AU-header|AU-header|      |AU-header|padding|
468	   |                 |   (1)   |   (2)   |      |   (n)   | bits  |
469	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+- .. -+-+-+-+-+-+-+-+-+-+

471	   Figure 2: The AU Header Section

473	   The AU-headers are configured using MIME format parameters and MAY
474	   be empty. If the AU-header is configured empty, the
475	   AU-headers-length field SHALL NOT be present and consequently the
476	   AU Header Section is empty. If the AU-header is not configured
477	   empty, then the AU-headers-length is a two octet field that
478	   specifies the length in bits of the immediately following
479	   AU-headers, excluding the padding bits.

481	   Each AU-header is associated with a single Access Unit (fragment)
482	   contained in the Access Unit Data Section in the same RTP packet.
483	   For each contained Access Unit (fragment) there is exactly one
484	   AU-header. Within the AU Header Section, the AU-headers are
485	   bit-wise concatenated in the order in which the Access Units are
486	   contained in the Access Unit Data Section. Hence, the n-th
487	   AU-header refers to the n-th AU (fragment). If the concatenated
488	   AU-headers consume a non-integer number of octets, up to 7
489	   zero-padding bits MUST be inserted at the end in order to achieve
490	   byte-alignment of the AU Header Section.
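
   The size of the AU Header Section follows from the rules above: two
   octets for the AU-headers-length field, plus the AU-headers rounded
   up to a whole number of octets. The sketch below is illustrative;
   the function name and the headers_configured flag are hypothetical,
   and the two-octet field is assumed to be in network byte order.

```python
def au_header_section_octets(payload: bytes, headers_configured: bool) -> int:
    """Return how many octets of the payload the AU Header Section uses."""
    if not headers_configured:
        # AU-header configured empty: no AU-headers-length field at all.
        return 0
    if len(payload) < 2:
        raise ValueError("payload too short for AU-headers-length")
    # Length in bits of the AU-headers, excluding padding bits.
    length_bits = int.from_bytes(payload[:2], "big")
    # Up to 7 padding bits round the AU-headers up to an octet boundary.
    total = 2 + (length_bits + 7) // 8
    if total > len(payload):
        raise ValueError("AU-headers-length exceeds the payload")
    return total
```

   For instance, an AU-headers-length of 13 bits yields 2 + 2 = 4
   octets, of which the last 3 bits are padding.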

492	3.2.1.1 The AU-header

494	   The AU-header contains the fields given in figure 3. The length in
495	   bits of these fields, with the exception of the CTS-flag, the
496	   DTS-flag and the RAP-flag fields, is defined by MIME format
497	   parameters; see section 4.1. If a MIME format parameter has the
498	   default value of zero, then the associated field is not present.

500	   +---------------------------------------+
501	   |     AU-size                           |
502	   +---------------------------------------+
503	   |     AU-Index / AU-Index-delta         |
504	   +---------------------------------------+
505	   |     CTS-flag                          |
506	   +---------------------------------------+
507	   |     CTS-delta                         |
508	   +---------------------------------------+
509	   |     DTS-flag                          |
510	   +---------------------------------------+
511	   |     DTS-delta                         |
512	   +---------------------------------------+
513	   |     RAP-flag                          |
514	   +---------------------------------------+
515	   |     Stream-state                      |
516	   +---------------------------------------+

518	   Figure 3: The fields in the AU-header. If used, the AU-Index field
519	             only occurs in the first AU-header within an AU Header
520	             Section; in any other AU-header the AU-Index-delta field
521	             occurs instead.

523	   AU-size: Indicates the size in octets of the associated Access Unit
524	         in the Access Unit Data Section in the same RTP packet. When
525	         the AU-size is associated with an AU fragment, the AU size
526	         indicates the size of the entire AU and not the size of the
527	         fragment. This can be exploited to determine whether a packet
528	         contains an entire AU or a fragment, which is particularly
529	         useful after losing a packet carrying the last fragment of an
530	         AU.

532	   AU-Index: Indicates the serial number of the associated Access Unit
533	         (fragment). For each (in decoding order) consecutive AU or AU
534	         fragment, the serial number is incremented by 1. When
535	         present, the AU-Index field occurs in the first AU-header in
536	         the AU Header Section, but MUST NOT occur in any subsequent
537	         (non-first) AU-header in that Section. To encode the serial
538	         number in any such non-first AU-header, the AU-Index-delta
539	         field is used. If each AU-Index field is coded with the value
540	         0, the serial number of the AU (fragment) is not specified,
541	         and in that case receivers MAY ignore the AU-Index field.

543	   AU-Index-delta: The AU-Index-delta field is an unsigned integer
544	         that specifies the serial number of the associated AU as the
545	         difference with respect to the serial number of the previous
546	         Access Unit. Hence, for the n-th (n>1) AU the serial number
547	         is found from:
548	         AU-Index(n) = AU-Index(n-1) + AU-Index-delta(n) + 1
549	         If the AU-Index field is present in the first AU-header in
550	         the AU Header Section, then the AU-Index-delta field MUST be
551	         present in any subsequent (non-first) AU-header. When the
552	         AU-Index-delta is coded with the value 0, it indicates that
553	         the Access Units are consecutive in decoding order. An
554	         AU-Index-delta value larger than 0 signals that interleaving
555	         is applied.
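
   As an illustration of the AU-Index-delta formula, the following
   sketch recovers the serial numbers of all AUs in one packet. The
   function name and argument layout are hypothetical.

```python
def au_serial_numbers(au_index: int, index_deltas: list[int]) -> list[int]:
    """Serial numbers of the AUs in one packet.

    au_index     -- AU-Index from the first AU-header
    index_deltas -- AU-Index-delta values of the second and later AU-headers
    """
    serials = [au_index]
    for delta in index_deltas:
        # AU-Index(n) = AU-Index(n-1) + AU-Index-delta(n) + 1
        serials.append(serials[-1] + delta + 1)
    return serials
```

   A delta of 0 throughout gives consecutive serial numbers; any
   larger delta leaves gaps that are filled by AUs from other packets.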

557	   CTS-flag: Indicates whether the CTS-delta field is present.
558	         A value of 1 indicates that the field is present, a value
559	         of 0 that it is not present.
560	         The CTS-flag field MUST be present in each AU-header if the
561	         length of the CTS-delta field is signalled to be larger than
562	         zero. In that case, the CTS-flag field MUST have the value 0
563	         in the first AU-header and MAY have the value 1 in all
564	         non-first AU-headers. The CTS-flag field SHOULD be 0 for
565	         any non-first fragment of an Access Unit.

567	   CTS-delta: Encodes the CTS by specifying the value of CTS as a 2's
568	         complement offset (delta) from the time stamp in the RTP
569	         header of this RTP packet. The CTS MUST use the same clock
570	         rate as the time stamp in the RTP header.

572	   DTS-flag: Indicates whether the DTS-delta field is present. A value
573	         of 1 indicates that DTS-delta is present, a value of 0 that
574	         it is not present.
575	         The DTS-flag field MUST be present in each AU-header if the
576	         length of the DTS-delta field is signalled to be larger than
577	         zero. The DTS-flag field SHOULD be 0 for any non-first
578	         fragment of an Access Unit.

580	   DTS-delta: Specifies the value of the DTS as a 2's complement
581	         offset (delta) from the CTS. The DTS MUST use the
582	         same clock rate as the time stamp in the RTP header.

584	   RAP-flag: Indicates when set to 1 that the associated Access Unit
585	         provides a random access point to the content of the stream.
586	         If an Access Unit is fragmented, the RAP flag, if present,
587	         MUST be set to 0 for each non-first fragment of the AU.

589	   Stream-state: Specifies the state of the stream for the AU of an
590	         MPEG-4 system stream. For states of MPEG-4 system streams see
591	         ISO/IEC 14496-1. The stream state is set either to 0 or to 1.
592	         A change of the stream state value (either from 1 to 0 or from
593	         0 to 1) indicates another state of the stream. At an AU that
594	         provides a random access point, as signalled by the RAP-flag,
595	         a change in the stream state MUST occur, unless the AU is a
596	         repeated random access point. Hence, receivers MAY ignore AUs
597	         with the RAP-flag set to 1 if the stream state does not
598	         change. Receivers that don't ignore a repeated random access
599	         point SHOULD take care that such processing does not disrupt
600	         the decoding process.
601	         Note: no relation is required between stream-states of
602	         different streams.

604	   If present, the fields MUST occur in the mutual order given in
605	   figure 3. In the general case a receiver can only discover the size
606	   of an AU-header by parsing it since the presence of the CTS-delta
607	   and DTS-delta fields is signalled by the value of the CTS-flag and
608	   DTS-flag, respectively.
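
   The need to parse each AU-header in order to find its size can be
   seen in the sketch below. It is illustrative only: the BitReader
   class, the helper names and the configuration keys are
   hypothetical stand-ins for the MIME format parameters, and bits
   are assumed to be read most significant bit first.

```python
class BitReader:
    """Reads unsigned integers from a byte string, MSB first (assumed)."""

    def __init__(self, data: bytes) -> None:
        self.data = data
        self.pos = 0  # current position in bits

    def read(self, nbits: int) -> int:
        value = 0
        for _ in range(nbits):
            byte = self.data[self.pos // 8]
            value = (value << 1) | ((byte >> (7 - self.pos % 8)) & 1)
            self.pos += 1
        return value


def to_signed(value: int, nbits: int) -> int:
    """Interpret an nbits-wide unsigned value as 2's complement."""
    return value - (1 << nbits) if value >= 1 << (nbits - 1) else value


def parse_au_header(reader: BitReader, cfg: dict, first: bool) -> dict:
    """Parse one AU-header in the field order of figure 3.

    cfg maps hypothetical keys to field lengths in bits; a missing key
    or a length of zero means the field is absent.
    """
    header = {}
    if cfg.get("size_length", 0):
        header["au_size"] = reader.read(cfg["size_length"])
    index_key = "index_length" if first else "index_delta_length"
    if cfg.get(index_key, 0):
        name = "au_index" if first else "au_index_delta"
        header[name] = reader.read(cfg[index_key])
    for name, key in (("cts_delta", "cts_delta_length"),
                      ("dts_delta", "dts_delta_length")):
        nbits = cfg.get(key, 0)
        # The flag exists only if the delta length is non-zero; the
        # delta itself is present only when the flag is 1.
        if nbits and reader.read(1):
            header[name] = to_signed(reader.read(nbits), nbits)
    if cfg.get("rap_flag", 0):
        header["rap"] = reader.read(1)
    if cfg.get("stream_state_length", 0):
        header["stream_state"] = reader.read(cfg["stream_state_length"])
    return header
```

   After each call, reader.pos gives the bit position of the next
   AU-header, which is the only way to locate it when CTS-delta or
   DTS-delta are optional.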

610	3.2.2 The Auxiliary Section

612	   The Auxiliary Section consists of the auxiliary-data-size field
613	   followed by the auxiliary-data field. Receivers MAY (but are not
614	   required to) parse the auxiliary-data field; to facilitate skipping
615	   of the auxiliary-data field by receivers, the auxiliary-data-size
616	   field indicates the length in bits of the auxiliary-data. If the
617	   concatenation of the auxiliary-data-size and the auxiliary-data
618	   fields consumes a non-integer number of octets, up to 7 zero padding
619	   bits MUST be inserted immediately after the auxiliary data in order
620	   to achieve byte-alignment. See figure 4.

622	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+- .. -+-+-+-+-+-+-+-+-+
623	   | auxiliary-data-size   | auxiliary-data       |padding bits |
624	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+- .. -+-+-+-+-+-+-+-+-+

626	   Figure 4: The fields in the Auxiliary Section

628	   The length in bits of the auxiliary-data-size field is configurable
629	   by a MIME format parameter; see section 4.1. The default length of
630	   zero indicates that the entire Auxiliary Section is absent.

632	   auxiliary-data-size: specifies the length in bits of the immediately
633	         following auxiliary-data field;

635	   auxiliary-data: the auxiliary-data field contains data of a format
636	         not defined by this specification.
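
   A receiver that does not parse the auxiliary-data can skip the
   Auxiliary Section along the lines of the following sketch. The
   function name and its arguments are hypothetical.

```python
def auxiliary_section_octets(size_field_bits: int, auxiliary_data_bits: int) -> int:
    """Octets occupied by the Auxiliary Section, including padding.

    size_field_bits     -- configured length of auxiliary-data-size
                           (zero means the section is absent)
    auxiliary_data_bits -- value read from the auxiliary-data-size field
    """
    if size_field_bits == 0:
        return 0
    total_bits = size_field_bits + auxiliary_data_bits
    return (total_bits + 7) // 8  # round up to a whole octet
```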

638	3.2.3 The Access Unit Data Section

640	   The Access Unit Data Section contains an integer number of complete
641	   Access Units or a single fragment of one AU. The Access Unit Data
642	   Section is never empty. If data of more than one Access Unit is
643	   present, then the AUs are concatenated into a contiguous string
644	   of octets. See figure 5. The AUs inside the Access Unit Data
645	   Section MUST be in decoding order.

647	   The size and number of Access Units SHOULD be adjusted such that
648	   the resulting RTP packet is not larger than the path MTU. To handle
649	   larger packets, this payload format relies on lower layers for
650	   fragmentation, which may not be desirable.

652	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
653	   |AU(1)                                                          |
654	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-                            |
655	   |                                                               |
656	   |     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
657	   |               |AU(2)                                          |
658	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+                               |
659	   |                                                               |
660	   |                            -+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
661	   |                               | AU(n)                         |
662	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
663	   |               |
664	   |-+-+-+-+-+-+-+-+

666	   Figure 5: Access Unit Data Section; each AU is byte aligned.

668	   When multiple Access Units are carried, the size of each AU MUST be
669	   made available to the receiver. If the AU size is variable then the
670	   size of each AU MUST be indicated in the AU-size field of the
671	   corresponding AU-header. However, if the AU size is constant for a
672	   stream, this mechanism SHOULD NOT be used, but instead the fixed
673	   size SHOULD be signalled by the MIME format parameter
674	   "ConstantSize", see section 4.1.

676	   The absence of both AU-size in the AU-header and the ConstantSize
677	   MIME format parameter indicates carriage of a single AU (fragment),
678	   i.e. that a single Access Unit (fragment) is transported in each
679	   RTP packet for that stream.

681	3.2.3.1 Fragmentation

683	   A packet SHALL carry either one or more Access Units, or a single
684	   fragment of an Access Unit.  Fragments of the same Access Unit have
685	   the same time stamp but different RTP sequence numbers. The marker
686	   bit in the RTP header is 1 on the last fragment of an Access Unit,
687	   and 0 on all other fragments.
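
   The marker bit rule for fragments can be sketched as follows. The
   sketch covers only the fragmented case described above; the
   function name and the size limit argument are hypothetical, and
   the caller is assumed to assign the same RTP time stamp and
   increasing sequence numbers to the resulting packets.

```python
def fragment_access_unit(au: bytes, max_fragment_octets: int) -> list[tuple[bytes, bool]]:
    """Split one Access Unit into (fragment, marker) pairs.

    The marker is True only for the last fragment of the AU.
    """
    if max_fragment_octets <= 0:
        raise ValueError("max_fragment_octets must be positive")
    pieces = [au[i:i + max_fragment_octets]
              for i in range(0, len(au), max_fragment_octets)]
    return [(piece, i == len(pieces) - 1) for i, piece in enumerate(pieces)]
```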

689	3.2.3.2 Interleaving

691	   Access Units MAY be interleaved. Senders MAY perform interleaving.
692	   Receivers MUST support interleaving. When interleaving of Access
693	   Units is used it SHALL be implemented using the AU-Index and
694	   AU-Index-delta fields in the AU-header.

696	   Based on the RTP sequence number, the RTP time stamp, the AU-Index
697	   and the AU-Index-delta, a receiver can unambiguously reconstruct
698	   the original order even in case of out-of-order packets, packet
699	   loss or duplication. Note that for this purpose the AU-Index is
700	   redundant when the RTP time stamp and the AU-Index-delta values are
701	   sufficient for placing the AUs correctly in time. In such cases
702	   receivers MAY ignore the AU-Index value and senders MAY code the
703	   AU-Index field with the value 0, but only if they code each AU-Index
704	   field with that value.

706	   When interleaving is applied, a de-interleave buffer is needed in
707	   receivers to put the Access Units in their correct logical
708	   consecutive decoding order. This requires the computation of the
709	   time stamp for each Access Unit. In case of a fixed time duration
710	   per Access Unit, the time stamp of the i-th access unit in an RTP
711	   packet with RTP time stamp T is calculated as follows:

713	   Timestamp[0] = T
714	   Timestamp[i, i > 0] = T +(Sum(for k=1 to i of (AU-Index-delta[k]
715	                         + 1))) * access-unit-duration

717	   When AU-Index-delta is always 0, this reduces to T + i * (access-
718	   unit-duration). This is the non-interleaved case, where the frames
719	   are consecutive in decoding order. Note that the AU-Index field
720	   (present for the first Access Unit) is not needed in this
721	   calculation. Hence in cases where the Access-unit-duration has a
722	   fixed and known value, the AU-Index does not need to provide index
723	   information and can be coded with the value 0. See also the
724	   semantics of the AU-Index field in 3.2.1.1.
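
   The time stamp calculation above can be written out as the
   following sketch. The function name is hypothetical; the result
   is reduced modulo 2**32 because RTP time stamps are 32 bits wide.

```python
def au_timestamps(rtp_timestamp: int, index_deltas: list[int],
                  access_unit_duration: int) -> list[int]:
    """Time stamps of the AUs in one packet, for a fixed AU duration.

    index_deltas holds the AU-Index-delta of the second and later AUs;
    access_unit_duration is in units of the RTP clock.
    """
    timestamps = [rtp_timestamp]
    offset = 0
    for delta in index_deltas:
        offset += delta + 1  # running Sum(AU-Index-delta[k] + 1)
        timestamps.append((rtp_timestamp + offset * access_unit_duration) % 2**32)
    return timestamps
```

   With all deltas equal to 0 the result is T, T + d, T + 2d and so
   on, which is the non-interleaved case.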

726	   When an RTP packet arrives (after any reordering has been done),
727	   receivers may 'flush' all Access Units from the interleave buffer
728	   which have a time stamp strictly less than the time stamp of the
729	   arriving packet. Similarly the first Access Unit of every arriving
730	   packet can always be flushed (as no following packet can provide
731	   an earlier Access Unit), and any Access Units which are consecutive
732	   with it which have already been received. Access Units should also
733	   be flushed in time to be played; this can be important if there is
734	   loss before end-of-stream, before a silence interval, or before a
735	   large drop-out.

737	3.2.3.3 Constraints for interleaving

739	   The size of the packets should be suitably chosen to be appropriate
740	   to both the path MTU and the duration and capacity of the receiver's
741	   de-interleave buffer. The maximum packet size for a session should
742	   be chosen not to exceed the path MTU.

744	   In order to control receiver latency and mitigate the effects of
745	   loss, there are profile-based limits on the size of the packet.
746	   This is expressed as a duration: it is calculated from the duration
747	   of the Access Units contained within a packet. Note that this
748	   duration is NOT the difference between the time stamps of the first
749	   and last Access Unit in a packet.

751	   No matter what interleaving scheme is used, the scheme must be
752	   analyzed to calculate the minimum number of frames a receiver has
753	   to buffer in order to de-interleave.

755	   Three profiles are defined to constrain the latency when
756	   interleaving. The applied profile is signalled by the MIME format
757	   parameter "Profile", indicating the decimal number of the profile.
758	   The maximum de-interleave buffer required at the receiver can be
759	   determined if the maximum packet duration is known. The maximum
760	   packet duration in milliseconds for the three profiles shall not
761	   exceed:

763	   Profile 0 --  200 milliseconds
764	   Profile 1 --  500 milliseconds
765	   Profile 2 -- 1500 milliseconds

767	   When interleaving is applied, the applied profile MUST be signalled
768	   by the MIME format parameter "Profile"; see section 4.1.

770	   Note that for low bit-rate material, this duration limit may make
771	   packets shorter than the MTU size.
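
   A sender could check a candidate packet against the profile limit
   roughly as in the sketch below. The names are hypothetical, and
   the sketch assumes that every Access Unit in the packet has the
   same duration, so that the packet duration is the AU count times
   that duration.

```python
# Maximum packet duration per profile, in milliseconds (from the list above).
MAX_PACKET_DURATION_MS = {0: 200, 1: 500, 2: 1500}


def packet_within_profile(profile: int, au_count: int,
                          au_duration_ticks: int, clock_rate: int) -> bool:
    """True if au_count AUs of equal duration fit the profile limit.

    The duration is the sum of the AU durations, not the difference
    between the first and last time stamps.
    """
    duration_ms = au_count * au_duration_ticks * 1000 / clock_rate
    return duration_ms <= MAX_PACKET_DURATION_MS[profile]
```

   For instance, with an illustrative AU duration of 1024 ticks of a
   44100 Hz clock, eight AUs (about 186 ms) fit Profile 0, while nine
   AUs (about 209 ms) need Profile 1 or 2.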

773	3.3 Usage of this specification

775	3.3.1 General

777	   Usage of this specification requires definition of a mode. A mode
778	   defines how to use this specification, as deemed appropriate.
779	   Senders MUST signal the applied mode via the MIME format parameter
780	   "Mode". This specification defines a generic mode that can be used
781	   for any MPEG-4 stream, as well as specific modes for transport of
782	   MPEG-4 CELP and MPEG-4 AAC streams, defined in ISO/IEC 14496-3.

784	   In any mode compliant with this specification the same requirements
785	   apply for the rtpmap attributes. The general form of an rtpmap
786	   attribute is:
787	   a=rtpmap:<payload type> <encoding name>/<clock rate>[/<encoding
788	             parameters>]
789	   For audio streams, <encoding parameters> specifies the number of
790	   audio channels: 2 for stereo material (see RFC 2327) and 1 for
791	   mono. Provided no additional parameters are needed, this parameter
792	   may be omitted for mono material, hence its default value is 1.

794	3.3.2 The generic mode

796	   The generic mode can be used for any MPEG-4 stream. In this mode
797	   no mode-specific constraints are applied; hence, in the generic
798	   mode the full flexibility of this specification can be exploited.
799	   The generic mode is signalled by mode=generic.

801	   An example is given below for transport of a BIFS stream. In this
802	   example carriage of multiple BIFS Access Units is allowed in one
803	   RTP packet. The AU-header contains the AU-size field, the CTS-flag
804	   and, if the CTS flag is set to 1, the CTS-delta field. The number
805	   of bits of the AU-size and the CTS-delta fields is 14 and 15,
806	   respectively. The AU-header also contains the RAP-flag and the
807	   Stream-state, each of 1 bit. This results in an AU-header with a
808	   total size of two or four octets per BIFS AU. The RTP time stamp
809	   uses a 1 kHz clock. Note that the media type name is video,
810	   because the BIFS stream is part of an audiovisual presentation. For
811	   conventions on media type names see section 4.1.

813	   In detail:

815	   m=video 49230 RTP/AVP 96
816	   a=rtpmap:96 mpeg4-generic/1000
817	   a=fmtp:96 streamtype=3; profile-level-id=257; mode=generic;
818	   ObjectType=2; config=BIFSConfiguration(); SizeLength=15;
819	   CTSDeltaLength=16; RandomAccessIndication=1;
820	   StreamStateIndication=1

822	   Note that BIFSConfiguration() is defined in ISO/IEC 14496-1; for
823	   the description of MIME parameters see section 4.1.
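
   A receiver might extract the MIME format parameters from an
   a=fmtp line as in the following sketch. The function name is
   hypothetical, the fmtp value is assumed to have been joined onto a
   single line, and parameter names are folded to lower case because
   they are not case dependent (see section 4.1).

```python
def parse_fmtp(line: str) -> tuple[int, dict[str, str]]:
    """Return (payload type, parameters) from an 'a=fmtp:' line."""
    prefix, _, params = line.strip().partition(" ")
    if not prefix.startswith("a=fmtp:"):
        raise ValueError("not an a=fmtp line")
    payload_type = int(prefix[len("a=fmtp:"):])
    result = {}
    for item in params.split(";"):
        item = item.strip()
        if not item:
            continue  # tolerate a trailing ';'
        name, _, value = item.partition("=")
        result[name.strip().lower()] = value.strip()
    return payload_type, result
```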

825	3.3.3 Constant bit-rate CELP

827	   This mode is signalled by mode=CELP-cbr. In this mode one or more
828	   fixed size CELP frames can be transported in one RTP packet; there
829	   is no support for interleaving. The RTP payload consists of one or
830	   more concatenated CELP frames, each of the same size. Both the AU
831	   Header Section and the Auxiliary Section are empty.

833	   The MIME format parameter ConstantSize MUST be provided to specify
834	   the length of each CELP frame.

836	   For example:

838	   m=audio 49230 RTP/AVP 96
839	   a=rtpmap:96 mpeg4-generic/44100/2
840	   a=fmtp:96 streamtype=5; profile-level-id=15; mode=CELP-cbr; config=
841	   AudioSpecificConfig(); ConstantSize=xxx;

843	   The AudioSpecificConfig(), defined in ISO/IEC 14496-3, specifies
844	   that the audio stream type is CELP. For the description of MIME
845	   parameters see section 4.1.

847	3.3.4 Variable bit-rate CELP

849	   This mode is signalled by mode=CELP-vbr. With this mode one or
850	   more variable size CELP frames can be transported in one RTP packet
851	   with optional interleaving. As the largest possible frame size in
852	   this mode is greater than the maximum CELP frame size, there is no
853	   support for fragmentation of CELP frames.

855	   In this mode the RTP payload consists of the AU Header Section,
856	   followed by one or more concatenated CELP frames. The Auxiliary
857	   Section is empty. For each CELP frame contained in the payload
858	   there is a one octet AU-header in the AU Header Section to
859	   provide:
860	   (a) the size of each CELP frame in the payload and
861	   (b) index information for computing the sequence (and hence timing)
862	       of each CELP frame.
863	   Transport of CELP frames requires that the AU-size field is coded
864	   with 6 bits. In this mode therefore 6 bits are allocated to the
865	   AU-size field, and 2 bits to the AU-Index(-delta) field. Each
866	   AU-Index field MUST be coded with the value 0. In the AU Header
867	   Section, the concatenated AU-headers are preceded by the 16-bit
868	   AU-headers-length field, as specified in 3.2.1.

870	   In addition to the required MIME format parameters, the following
871	   parameters MUST be present: SizeLength, IndexLength, and
872	   IndexDeltaLength.
873	   When interleaving is applied (AU-Index-delta coded with a value
874	   larger than 0), the parameter Profile MUST also be present.

876	   For example:

878	   m=audio 49230 RTP/AVP 96
879	   a=rtpmap:96 mpeg4-generic/44100/2
880	   a=fmtp:96 streamtype=5; profile-level-id=15; mode=CELP-vbr; config=
881	   AudioSpecificConfig(); SizeLength=6; IndexLength=2;
882	   IndexDeltaLength=2; Profile=1

884	   The AudioSpecificConfig(), defined in ISO/IEC 14496-3,  specifies
885	   that the audio stream type is CELP. For the description of MIME
886	   parameters see section 4.1.
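
   The one octet AU-header of this mode (6 bits AU-size, 2 bits
   AU-Index or AU-Index-delta) could be built as in the sketch below.
   The function names are hypothetical, and network byte order is
   assumed for the 16-bit AU-headers-length field.

```python
def pack_au_header(au_size: int, index_value: int) -> int:
    """Pack a 6-bit AU-size and a 2-bit AU-Index(-delta) into one octet."""
    if not 0 <= au_size <= 63:
        raise ValueError("AU-size must fit in 6 bits")
    if not 0 <= index_value <= 3:
        raise ValueError("AU-Index(-delta) must fit in 2 bits")
    return (au_size << 2) | index_value


def build_au_header_section(frame_sizes: list[int], index_deltas: list[int]) -> bytes:
    """AU Header Section for the given frames.

    index_deltas holds the AU-Index-delta of the second and later frames;
    the AU-Index of the first frame is always coded as 0 in this mode.
    """
    if not frame_sizes or len(index_deltas) != len(frame_sizes) - 1:
        raise ValueError("need one delta per non-first frame")
    headers = [pack_au_header(frame_sizes[0], 0)]
    headers += [pack_au_header(size, delta)
                for size, delta in zip(frame_sizes[1:], index_deltas)]
    length_bits = 8 * len(headers)  # whole octets, so no padding bits
    return length_bits.to_bytes(2, "big") + bytes(headers)
```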

888	3.3.5 Low bit-rate AAC

890	   This mode is signalled by mode=AAC-lbr. This mode supports transport
891	   of one or more variable size AAC frames with optional support for
892	   interleaving and fragmenting. The maximum size of an AAC frame
893	   (fragment) in this mode is 63 octets.

895	   The payload configuration in this mode is the same as in the
896	   variable bit-rate CELP mode as defined in 3.3.4. The RTP payload
897	   consists of the AU Header Section, followed by concatenated AAC
898	   frames. The Auxiliary Section is empty. For each AAC frame contained
899	   in the payload the one octet AU-header provides:
900	   (a) the size of each AAC frame in the payload and
901	   (b) index information for computing the sequence (and hence timing)
902	       of each AAC frame.
903	   In the AU-header, the AU-size is coded with 6 bits and the
904	   AU-Index(-delta) with 2 bits; the AU-Index field MUST have the
905	   value 0 in each AU-header.
906	   In the AU Header Section, the concatenated AU-headers are preceded
907	   by the 16-bit AU-headers-length field, as specified in 3.2.1.

909	   In addition to the required MIME format parameters, the following
910	   parameters MUST be present: SizeLength, IndexLength, and
911	   IndexDeltaLength.
912	   When interleaving is applied (AU-Index-delta coded with a value
913	   larger than 0), the parameter Profile MUST also be present.

915	   For example:

917	   m=audio 49230 RTP/AVP 96
918	   a=rtpmap:96 mpeg4-generic/44100/2
919	   a=fmtp:96 streamtype=5; profile-level-id=15; mode=AAC-lbr; config=
920	   AudioSpecificConfig(); SizeLength=6; IndexLength=2;
921	   IndexDeltaLength=2; Profile=1

923	   The AudioSpecificConfig(), defined in ISO/IEC 14496-3, specifies
924	   that the audio stream type is AAC. For the description of MIME
925	   parameters see section 4.1.

927	3.3.6 High bit-rate AAC

929	   This mode is signalled by mode=AAC-hbr. This mode supports transport
930	   of one or more large variable size AAC frames in one RTP packet with
931	   optional support for interleaving and fragmenting. The maximum size
932	   of an AAC frame (fragment) in this mode is 8191 octets.

934	   In this mode the RTP payload consists of the AU Header Section,
935	   followed by one or more concatenated AAC frames. The Auxiliary
936	   Section is empty. For each AAC frame contained in the payload there
937	   is an AU-header in the AU Header Section to provide:
938	   (a) the size of each AAC frame in the payload and
939	   (b) index information for computing the sequence (and hence timing)
940	       of each AAC frame.

942	   To code the maximum size of an AAC frame requires 13 bits. Therefore
943	   in this configuration 13 bits are allocated to the AU-size, and
944	   3 bits to the AU-Index(-delta) field. Thus each AU-header has a size
945	   of 2 octets. Each AU-Index field MUST be coded with the value 0. In
946	   the AU Header Section, the concatenated AU-headers are preceded by
947	   the 16-bit AU-headers-length field, as specified in 3.2.1.

949	   In addition to the required MIME format parameters, the following
950	   parameters MUST be present: SizeLength, IndexLength, and
951	   IndexDeltaLength.
952	   When interleaving is applied (AU-Index-delta coded with a value
953	   larger than 0), the parameter Profile MUST also be present.

955	   For example:

957	   m=audio 49230 RTP/AVP 96
958	   a=rtpmap:96 mpeg4-generic/44100/2
959	   a=fmtp:96 streamtype=5; profile-level-id=15; mode=AAC-hbr;
960	   config=AudioSpecificConfig(); SizeLength=13; IndexLength=3;
961	   IndexDeltaLength=3; Profile=1

963	   The AudioSpecificConfig(), defined in ISO/IEC 14496-3, specifies
964	   that the audio stream type is AAC. For the description of MIME
965	   parameters see section 4.1.
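
   A receiver could split an AAC-hbr payload into frames along the
   lines of the following sketch. It handles only packets that carry
   complete AAC frames (for a fragment, the AU-size gives the size of
   the entire AU rather than of the fragment). The function name is
   hypothetical and network byte order is assumed.

```python
def parse_aac_hbr_payload(payload: bytes) -> list[tuple[int, bytes]]:
    """Return (AU-Index or AU-Index-delta, frame) pairs from one payload.

    Each AU-header is 2 octets: 13 bits AU-size, 3 bits AU-Index(-delta).
    """
    headers_bits = int.from_bytes(payload[:2], "big")
    if headers_bits == 0 or headers_bits % 16:
        raise ValueError("AU-headers-length must be a multiple of 16 bits")
    count = headers_bits // 16
    offset = 2 + 2 * count  # start of the Access Unit Data Section
    frames = []
    for i in range(count):
        word = int.from_bytes(payload[2 + 2 * i:4 + 2 * i], "big")
        size, index_value = word >> 3, word & 0x7
        if offset + size > len(payload):
            raise ValueError("AU-size exceeds payload (fragment or truncation)")
        frames.append((index_value, payload[offset:offset + size]))
        offset += size
    return frames
```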

967	3.3.7 Additional modes

969	   This specification only defines the modes specified in sections
970	   3.3.2 up to 3.3.6. Additional modes are expected to be defined in
971	   future RFCs. Each additional mode MUST be in full compliance with
972	   this specification.

974	   When defining a new mode care MUST be taken that an implementation
975	   of all features of this specification can decode the payload format
976	   corresponding to this new mode. For this reason a mode MUST NOT
977	   specify new default values for MIME parameters. In particular, MIME
978	   parameters that configure the RTP payload MUST be present (unless
979	   they have the default value), even if their presence is redundant in
980	   case the mode assigns a fixed value to a parameter. A mode may
981	   define additionally that some MIME parameters are required instead
982	   of optional, that some MIME parameters have fixed values (or
983	   ranges), and that there are rules restricting the usage.

985	4. IANA considerations

987	   This section describes the MIME types and names associated with
988	   this payload format. Section 4.1 registers the MIME types, as per
989	   RFC 2048.

991	   This format may require additional information about the mapping to
992	   be made available to the receiver. This is done using parameters
993	   also described in the next section.

995	4.1 MIME type registration

997	   MIME media type name: "video" or "audio" or "application"

999	   "video" MUST be used for MPEG-4 Visual streams (ISO/IEC 14496-2)
1000	   or MPEG-4 Systems streams (ISO/IEC 14496-1) that convey information
1001	   needed for an audio/visual presentation.

1003	   "audio" MUST be used for MPEG-4 Audio streams (ISO/IEC 14496-3)
1004	   or MPEG-4 Systems streams that convey information needed for an
1005	   audio only presentation.

1007	   "application" MUST be used for MPEG-4 Systems streams (ISO/IEC
1008	   14496-1) that serve purposes other than audio/visual presentation,
1009	   e.g. in some cases when MPEG-J streams are transmitted.

1011	   Depending on the required payload configuration, MIME format
1012	   parameters need to be available to the receiver. This is done using
1013	   the parameters described in the next section. There are required
1014	   and optional parameters.

1016	   Optional parameters are of two types: general parameters and
1017	   configuration parameters. The configuration parameters are used to
1018	   configure the fields in the AU Header section and in the auxiliary
1019	   section. The absence of any configuration parameter is equivalent to
1020	   the associated field set to its default value, which is always zero.
1021	   The absence of all configuration parameters resolves into a default
1022	   "basic" configuration with an empty AU-header section and an empty
1023	   auxiliary section in each RTP packet.

1025	   MIME subtype name: mpeg4-generic
1026	   Required parameters:

1028	   MIME format parameters are not case dependent; however for clarity
1029	   both upper and lower case are used in the names of the parameters
1030	   described in this specification.

1032	      StreamType:
1033	      The integer value that indicates the type of MPEG-4 stream that
1034	      is carried; its coding corresponds to the values of the
1035	      streamType as defined in Table 9 (streamType Values)
1036	      in ISO/IEC 14496-1. Note that the StreamType allows signalling of
1037	      an MPEG-7 stream; this RTP payload format is not designed to
1038	      carry an MPEG-7 stream, and may not be suitable for transport of
1039	      MPEG-7 streams.

1041	      Profile-level-id:
1042	      A decimal representation of the MPEG-4 Profile Level indication.
1043	      This parameter MUST be used in the capability exchange or
1044	      session set-up procedure to indicate the MPEG-4 Profile and Level
1045	      combination of which the relevant MPEG-4 media codec is
1046	      capable.
1047	      For MPEG-4 Audio streams, this parameter is the decimal value
1048	         from Table 5 (audioProfileLevelIndication Values) in ISO/IEC
1049	         14496-1, indicating which MPEG-4 Audio tool subsets are
1050	         required to decode the audio stream.
1051	      For MPEG-4 Visual streams, this parameter is the decimal value
1052	         from Table G-1 (FLC table for profile and level indication of
1053	         ISO/IEC 14496-2), indicating which MPEG-4 Visual tool subsets
1054	         are required to decode the visual stream.
1055	      For BIFS streams, this parameter is the decimal value that is
1056	         obtained from (SPLI + 256*GPLI), where:
1057	         SPLI is the decimal value from Table 4 in ISO/IEC 14496-1 with
1058	            the applied sceneProfileLevelIndication;
1059	         GPLI is the decimal value from Table 7 in ISO/IEC 14496-1 with
1060	            the applied graphicsProfileLevelIndication.
1061	      For MPEG-J streams, this parameter is the decimal value from
1062	         table 13 (MPEGJProfileLevelIndication) in ISO/IEC 14496-1,
1063	         indicating the profile and level of the MPEG-J stream.
1064	      For OD streams, this parameter is the decimal value from table 3
1065	         (ODProfileLevelIndication) in ISO/IEC 14496-1, indicating the
1066	         profile and level of the OD stream.
1067	      For IPMP streams, this parameter has either the decimal value 0,
1068	         indicating an unspecified profile and level, or a value larger
1069	         than zero, indicating an MPEG-4 IPMP profile and level as
1070	         defined in a future MPEG-4 specification.
1071	      For Clock Reference streams and Object Content Info streams, this
1072	         parameter has the decimal value zero, indicating that profile
1073	         and level information is conveyed through the OD framework.
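
      The BIFS calculation above can be illustrated as follows. The
      function names are hypothetical, and the split back into SPLI
      and GPLI assumes that the SPLI value fits in one octet.

```python
def bifs_profile_level_id(spli: int, gpli: int) -> int:
    """profile-level-id for a BIFS stream: SPLI + 256*GPLI."""
    return spli + 256 * gpli


def split_bifs_profile_level_id(value: int) -> tuple[int, int]:
    """Recover (SPLI, GPLI), assuming SPLI is in the range 0..255."""
    return value % 256, value // 256
```

      For example, SPLI = 1 and GPLI = 1 give the value 257.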

1075	      Config:
1076	      A hexadecimal representation of an octet string that expresses
1077	      the media payload configuration. Configuration data is mapped
1078	      onto the hexadecimal octet string on an MSB-first basis. The
1079	      first bit of the configuration data SHALL be located at the MSB
1080	      of the first octet. In the last octet, if necessary to achieve
1081	      byte alignment, up to 7 zero-valued padding bits shall follow
1082	      the configuration data.
1083	      For MPEG-4 Audio streams, config is the audio object type
1084	         specific decoder configuration data AudioSpecificConfig() as
1085	         defined in ISO/IEC 14496-3. For Structured Audio, the
1086	         AudioSpecificConfig() may be conveyed by other means, not
1087	         defined by this specification. If the AudioSpecificConfig()
1088	         is conveyed by other means for Structured Audio, then the
1089	         config MUST be a quoted empty hexadecimal octet string, as
1090	         follows: config="".
1091	         Note that a future mode of using this RTP payload format for
1092	         Structured Audio may define such other means.
1093	      For MPEG-4 Visual streams, config is the MPEG-4 Visual
1094	         configuration information as defined in subclause 6.2.1 Start
1095	         codes of ISO/IEC 14496-2. The configuration information
1096	         indicated by this parameter SHALL be the same as the
1097	         configuration information in the corresponding MPEG-4 Visual
1098	         stream, except for first-half-vbv-occupancy and
1099	         latter-half-vbv-occupancy, if it exists, which may vary in
1100	         the repeated configuration information inside an MPEG-4
1101	         Visual stream (See 6.2.1 Start codes of ISO/IEC 14496-2).
1102	      For BIFS streams, this is the BIFSConfig() information as defined
1103	         in ISO/IEC 14496-1. For version 1, BIFSConfig is defined in
1104	         section 9.3.5.2, and for version 2 in section 9.3.5.3. The
1105	         MIME format parameter ObjectType signals the version of
1106	         BIFSConfig.
1107	      For IPMP streams, this is either a quoted empty hexadecimal octet
1108	         string, indicating the absence of any decoder configuration
1109	         information (config=""), or the IPMPConfiguration() as
1110	         defined in a future MPEG-4 IPMP specification.
1111	      For Object Content Info (OCI) streams, this is the
1112	         OCIDecoderConfiguration() information of the OCI stream, as
1113	         defined in section 8.4.2.4 in ISO/IEC 14496-1.
1114	      For OD streams, Clock Reference streams and MPEG-J streams, this
1115	         is a quoted empty hexadecimal octet string (config=""), as
1116	         no information on the decoder configuration is required.
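
   The MSB-first mapping of configuration data onto a hexadecimal
   octet string can be illustrated with a minimal sketch. The function
   names and the example bit patterns are hypothetical; the bits do
   not represent any real decoder configuration.

```python
def bits_to_config_hex(bits):
    """Pack a string of '0'/'1' characters MSB-first into hex octets."""
    if any(b not in "01" for b in bits):
        raise ValueError("bits must contain only '0' and '1'")
    # Up to 7 zero-valued padding bits complete the last octet.
    padding = (-len(bits)) % 8
    padded = bits + "0" * padding
    octets = bytes(int(padded[i:i + 8], 2) for i in range(0, len(padded), 8))
    return octets.hex().upper()


def format_config(bits):
    """Render the config parameter; no data gives the quoted empty string."""
    return "config=" + (bits_to_config_hex(bits) if bits else '""')


print(format_config("000100100001"))  # 12 bits, padded with 4 zero bits
print(format_config(""))
```

   The first bit of the input ends up in the MSB of the first octet,
   and an empty input yields config="".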

1118	      Mode:
1119	      The mode in which this specification is used. The following modes
1120	      can be signalled:
1121	      mode=generic,
1122	      mode=CELP-cbr,
1123	      mode=CELP-vbr,
1124	      mode=AAC-lbr and
1125	      mode=AAC-hbr.
1126	      Other modes are expected to be defined in future RFCs. See also
1127	      section 3.3.7.

1129	   Optional general parameters:

1131	      ObjectType:
1132	      The decimal value from Table 8 in ISO/IEC 14496-1, indicating
1133	      the value of the objectTypeIndication of the transported stream.
1134	      For BIFS streams this parameter MUST be present to signal the
1135	      version of BIFSConfiguration(). Note that the ObjectType MAY
1136	      signal a non-MPEG-4 stream, and that the RTP payload format
1137	      defined in this document may not be suitable to carry a stream
1138	      that is not defined by MPEG-4.

1140	      ConstantSize:
1141	      The constant size in octets of each Access Unit for this stream.
1142	      Simultaneous presence of ConstantSize and the SizeLength
1143	      parameters is not permitted.

1145	      Profile:
1146	      The decimal representation of the applied profile to constrain
1147	      the latency when interleaving; see section 3.2.3.3. Absence of
1148	      this parameter signals that the profile is not specified.

1150	   Optional configuration parameters:

1152	      SizeLength:
1153	      The number of bits on which the AU-size field is encoded in the
1154	      AU-header. Simultaneous presence of SizeLength and the
1155	      ConstantSize parameter is not permitted.

1157	      IndexLength:
1158	      The number of bits on which the AU-Index is encoded in the first
1159	      AU-header. The default value of zero indicates the absence of
1160	      the AU-Index and AU-Index-delta fields in each AU-header.

1162	      IndexDeltaLength:
1163	      The number of bits on which the AU-Index-delta field is encoded
1164	      in any non-first AU-header.

1166	      CTSDeltaLength:
1167	      The number of bits on which the CTS-delta field is encoded in
1168	      the AU-header.

1170	      DTSDeltaLength:
1171	      The number of bits on which the DTS-delta field is encoded in
1172	      the AU-header.

1174	      RandomAccessIndication:
1175	      A decimal value of zero or one, indicating whether the RAP-flag
1176	      is present in the AU-header. The decimal value of one indicates
1177	      presence of the RAP-flag, the default value zero its absence.

1179	      StreamStateIndication:
1180	      A decimal value of zero or one, indicating whether the
1181	      Stream-state field is present in the AU-header. The decimal
1182	      value of one indicates presence of the Stream-state field, the
1183	      default value zero its absence.

1185	      AuxiliaryDataSizeLength:
1186	      The number of bits that is used to encode the auxiliary-data-size
1187	      field.

1189	   Applications MAY use more parameters, in addition to those defined
1190	   above. Receivers MUST tolerate the presence of such additional
1191	   parameters, but these parameters SHALL NOT impact the decoding of
1192	   receivers that comply with this specification.
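
   A receiver-side check of these rules might look like the sketch
   below. It covers only the parameters listed in this section; the
   function name and the case-insensitive name matching are
   assumptions of the sketch, not requirements of this specification.

```python
# Parameter names from the registration text above, lower-cased.
KNOWN = {
    "config", "mode", "objecttype", "constantsize", "profile",
    "sizelength", "indexlength", "indexdeltalength", "ctsdeltalength",
    "dtsdeltalength", "randomaccessindication", "streamstateindication",
    "auxiliarydatasizelength",
}
FLAGS = {"randomaccessindication", "streamstateindication"}


def check_parameters(params):
    """Return the recognised parameters; raise ValueError on a conflict."""
    # Assumption: names are matched case-insensitively.
    lowered = {name.lower(): value for name, value in params.items()}
    if "constantsize" in lowered and "sizelength" in lowered:
        raise ValueError("ConstantSize and SizeLength are mutually exclusive")
    for name in FLAGS:
        # An absent flag takes its default value of zero.
        if lowered.get(name, "0") not in ("0", "1"):
            raise ValueError(name + " must be 0 or 1")
    # Additional parameters are tolerated and ignored.
    return {n: v for n, v in lowered.items() if n in KNOWN}


print(check_parameters({"mode": "AAC-hbr", "SizeLength": "13", "x-vendor": "1"}))
```

   The unknown parameter "x-vendor" is dropped without an error, so it
   cannot influence decoding.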

1194	   Encoding considerations:
1195	   System bitstreams MUST be generated according to MPEG-4 Systems
1196	   specifications (ISO/IEC 14496-1). Video bitstreams MUST be generated
1197	   according to MPEG-4 Visual specifications (ISO/IEC 14496-2). Audio
1198	   bitstreams MUST be generated according to MPEG-4 Audio
1199	   specifications (ISO/IEC 14496-3). The RTP packets MUST be packetized
1200	   according to the RTP payload format defined in RFC xxxx.

1202	   Security considerations:
1203	   As defined in section 5 of RFC xxxx.

1205	   Interoperability considerations:
1206	   MPEG-4 provides a large and rich set of tools for the coding of
1207	   visual objects.  For effective implementation of the standard,
1208	   subsets of the MPEG-4 tool sets have been provided for use in
1209	   specific applications. These subsets, called 'Profiles', limit the
1210	   size of the tool set a decoder is required to implement. In order to
1211	   restrict computational complexity, one or more 'Levels' are set for
1212	   each Profile. A Profile@Level combination allows:
1213	   . a codec builder to implement only the subset of the standard he
1214	     needs, while maintaining interworking with other MPEG-4 devices
1215	     that implement the same combination, and
1216	   . checking whether MPEG-4 devices comply with the standard
1217	     ('conformance testing').

1219	   A stream SHALL be compliant with the MPEG-4 Profile@Level specified
1220	   by the parameter "profile-level-id". Interoperability between a
1221	   sender and a receiver is achieved by specifying the parameter
1222	   "profile-level-id" in MIME content. In the capability exchange /
1223	   announcement procedure this parameter may mutually be set to the
1224	   same value.

1226	   Published specification:
1227	   The specifications for MPEG-4 streams are presented in ISO/IEC
1228	   14496-1, 14496-2, and 14496-3. The RTP payload format is described
1229	   in RFC xxxx.

1231	   Applications which use this media type:
1232	   Multimedia streaming and conferencing tools, Internet messaging and
1233	   Email applications.

1235	   Additional information: none

1237	   Magic number(s): none

1239	   File extension(s):
1240	   None. A file format with the extension .mp4 has been defined for
1241	   MPEG-4 content but is not directly correlated with this MIME type
1242	   for which the sole purpose is RTP transport.

1244	   Macintosh File Type Code(s): none

1246	   Person & email address to contact for further information:
1247	   Authors of RFC xxxx, IETF Audio/Video Transport working group.

1249	   Intended usage: COMMON

1251	   Author/Change controller:
1252	   Authors of RFC xxxx, IETF Audio/Video Transport working group.

1254	4.2 Concatenation of parameters

1256	   Multiple parameters SHOULD be expressed as a MIME media type string,
1257	   in the form of a semicolon-separated list of parameter=value pairs
1258	   (for parameter usage examples see sections 3.3.2 through 3.3.6).

1260	4.3 Usage of SDP

1262	4.3.1 The a=fmtp keyword

1264	   One typical way to transport the above-described parameters
1265	   associated with this payload format is via an SDP message [6],
1266	   for example one transported to the client in reply to an RTSP
1267	   DESCRIBE or via SAP. In that case the (a=fmtp) keyword MUST be used
1268	   as described in RFC 2327 [6], section 6, with the following syntax:

1270	   a=fmtp:<format> <parameter name>=<value>[; <parameter name>=<value>]
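
   The syntax above can be parsed with a few lines of code. The
   following is a minimal sketch; the function name, the payload type
   96 and the parameter values in the example line are hypothetical.

```python
def parse_fmtp(line):
    """Split an a=fmtp line into (format, {parameter name: value})."""
    prefix = "a=fmtp:"
    if not line.startswith(prefix):
        raise ValueError("not an a=fmtp line")
    fmt, _, rest = line[len(prefix):].partition(" ")
    params = {}
    for item in rest.split(";"):
        item = item.strip()
        if not item:
            continue
        name, sep, value = item.partition("=")
        if not sep:
            raise ValueError("expected <parameter name>=<value>: " + item)
        # A quoted empty string such as config="" becomes "".
        params[name.strip()] = value.strip().strip('"')
    return fmt, params


fmt, params = parse_fmtp('a=fmtp:96 mode=generic; config=""; SizeLength=13')
print(fmt, params)
```

   The sketch does not handle quoted values that contain semicolons,
   which none of the parameters defined above require.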

1272	5. Security Considerations

1274	   RTP packets using the payload format defined in this specification
1275	   are subject to the security considerations discussed in the RTP
1276	   specification [2]. This implies that confidentiality of the media
1277	   streams is achieved by encryption. Because the data compression used
1278	   with this payload format is applied end-to-end, encryption may be
1279	   performed on the compressed data so there is no conflict between the
1280	   two operations. The packet processing complexity of this payload
1281	   type (i.e. excluding media data processing) does not exhibit any
1282	   significant non-uniformity in the receiver side to cause a denial-
1283	   of-service threat.

1285	   However, it is possible to inject non-compliant MPEG streams (Audio,
1286	   Video, and Systems) to overload the receiver/decoder's buffers,
1287	   which might compromise the functionality of the receiver or even
1288	   crash it. This is especially true for end-to-end systems like MPEG
1289	   where the buffer models are precisely defined.

1291	   MPEG-4 Systems supports stream types including commands that are
1292	   executed on the terminal like OD commands, BIFS commands, etc. and
1293	   programmatic content like MPEG-J (Java(TM) Byte Code) and
1294	   ECMAScript. It is possible to use one or more of the above in a
1295	   manner non-compliant with MPEG to crash the receiver or make it
1296	   temporarily unavailable.

1298	   Senders SHOULD ensure that packet loss does not cause severe
1299	   problems in application execution when the packet carries OD
1300	   commands, BIFS commands, or programmatic content such as MPEG-J and
1301	   ECMAScript. For example, the reliability can be improved by
1302	   re-transmission, or by using the carousel mechanism as defined by
1303	   MPEG in ISO/IEC 14496-1, while observing the general congestion
1304	   control principles. When such measures are deemed insufficient,
1305	   applications SHOULD, instead of this payload format, use
1306	   more reliable means to transport the information, for example by
1307	   applying an FEC scheme for RTP (such as in RFC 2733), or by using
1308	   RTP over TCP (such as in RFC 2326, section 10.12), while giving due
1309	   consideration to congestion control. For a general description of
1310	   methods to repair streaming media see RFC 2354.

1312	   Authentication mechanisms can be used to validate the sender and
1313	   the data to prevent security problems due to non-compliant malignant
1314	   MPEG-4 streams.

1316	   In ISO/IEC 14496-1 a security model is defined for MPEG-4 Systems
1317	   streams carrying MPEG-J access units which comprise Java(TM) classes
1318	   and objects. MPEG-J defines a set of Java APIs and a secure
1319	   execution model. MPEG-J content can call this set of APIs and
1320	   Java(TM) methods from a set of Java packages supported in the
1321	   receiver within the defined security model. According to this
1322	   security model, downloaded byte code is forbidden to load libraries,
1323	   define native methods, start programs, read or write files, or read
1324	   system properties.

1326	   Receivers can implement intelligent filters to validate the buffer
1327	   requirements or parametric (OD, BIFS, etc.) or programmatic (MPEG-J,
1328	   ECMAScript) commands in the streams. However, this can increase the
1329	   complexity significantly.

1331	6. Acknowledgements

1333	   This document evolved through several revisions thanks to
1334	   contributions by people from the ISMA forum, from the IETF AVT
1335	   Working Group and from the 4-on-IP ad-hoc group within MPEG. The
1336	   authors wish to thank all involved people, and in particular John
1337	   Lazzaro, Alex MacAulay, Bill May, Colin Perkins, Dorairaj V and
1338	   Stephan Wenger for their valuable comments and support.

1340	7. References

1342	   [1] ISO/IEC International Standard 14496 (MPEG-4); "Information
1343	   technology - Coding of audio-visual objects", January 2000

1345	   [2] Schulzrinne, Casner, Frederick, Jacobson, "RTP: A Transport
1346	   Protocol for Real-Time Applications", RFC 1889, Internet
1347	   Engineering Task Force, January 1996.

1349	   [3] S. Bradner, "Key words for use in RFCs to Indicate Requirement
1350	   Levels", RFC 2119, March 1997.

1352	   [4] D. Hoffman, G. Fernando, V. Goyal, M. Civanlar, "RTP payload
1353	   format for MPEG1/MPEG2 Video", RFC 2250, January 1998.

1355	   [5] Y. Kikuchi, T. Nomura, S. Fukunaga, Y. Matsui, H. Kimata, "RTP
1356	   payload format for MPEG-4 Audio/Visual streams", RFC 3016.

1358	   [6] Handley, Jacobson, "SDP: Session Description Protocol",
1359	   RFC 2327, Internet Engineering Task Force, April 1998.

1361	8. Authors' Addresses

1363	   Jan van der Meer
1364	   Philips Digital Networks
1365	   Cederlaan 4
1366	   5600 JB Eindhoven
1367	   Netherlands
1368	   Email : jan.vandermeer@philips.com

1370	   David Mackie
1371	   Cisco Systems Inc.
1372	   170 West Tasman Dr.
1373	   San Jose, CA 95134
1374	   Email: dmackie@cisco.com

1376	   Viswanathan Swaminathan
1377	   Sun Microsystems Inc.
1378	   901 San Antonio Road, M/S UMPK15-214
1379	   Palo Alto, CA 94303
1380	   Email: viswanathan.swaminathan@sun.com

1382	   David Singer
1383	   Apple Computer, Inc.
1384	   One Infinite Loop, MS:302-3MT
1385	   Cupertino  CA 95014
1386	   Email: singer@apple.com

1388	   Philippe Gentric
1389	   Philips Digital Networks, MP4Net
1390	   51 rue Carnot
1391	   92156 Suresnes
1392	   France
1393	   e-mail: philippe.gentric@philips.com

1395	   Full Copyright Statement

1397	   "Copyright (C) The Internet Society (date). All Rights Reserved.
1398	   This document and translations of it may be copied and furnished to
1399	   others, and derivative works that comment on or otherwise explain
1400	   it or assist in its implementation may be prepared, copied,
1401	   published and distributed, in whole or in part, without restriction
1402	   of any kind, provided that the above copyright notice and this
1403	   paragraph are included on all such copies and derivative works.
1404	   However, this document itself may not be modified in any way, such
1405	   as by removing the copyright notice or references to the Internet
1406	   Society or other Internet organizations, except as needed for the
1407	   purpose of developing Internet standards in which case the
1408	   procedures for copyrights defined in the Internet Standards process
1409	   MUST be followed, or as required to translate it into.

1411	APPENDIX: Usage of this payload format

1413	Appendix A. Examples of delay analysis with interleave

1415	A.1 Group interleave

1417	   An example of regular interleave is when packets are formed into
1418	   groups.  If the number of packets in a group is N, for example
1419	   packet 0 could contain frame 0, frame N, frame 2N, and so on;
1420	   packet 1 could contain frame 1, frame 1+N, 1+2N, and so on.  The
1421	   AU-Index field is used to document the sequence of the packet
1422	   within the group (or the first frame in the packet, which is the
1423	   same thing in this scheme), and all the AU-Index-delta fields
1424	   contain N-1.

1426	   Because each subsequent frame in the packet has a higher time stamp
1427	   than the preceding frame, receivers can tell when a new interleave
1428	   group is starting, by noting that the computed time stamp of the
1429	   first frame in a packet is later than any previously computed time
1430	   stamp. In that case the time stamps of all frames contained in the
1431	   packet are higher than any previously computed time stamp, and
1432	   hence interleaving with any previously received frame is not
1433	   possible. In conclusion, a new group has been started.

1435	   If the group size is 3, then packets can be formed as follows:

1437	   Packet   Time stamp   Frame Numbers       AU-Index, AU-Index-delta
1438	   0        T[0]         0, 3, 6             0, 2, 2
1439	   1        T[1]         1, 4, 7             0, 2, 2
1440	   2        T[2]         2, 5, 8             0, 2, 2
1441	   3        T[9]         9,12,15             0, 2, 2

1443	   In this case, the receiver would have to buffer at least 4 frames
1444	   from packets 0 and 1, and can flush all frames when packet 2
1445	   arrives. (Frame 0 can be flushed as packet 0 arrives, since it is
1446	   the earliest frame we hold, and likewise frame 1 from packet 1; we
1447	   are therefore holding 3,4,6,7 until packet 2 arrives).
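
   The group of size 3 and its buffering requirement can be reproduced
   with a short sketch. The function names are hypothetical, and the
   receiver model (emit every frame as soon as it is the next one in
   order, then count what is still held) is an assumption made for
   this illustration.

```python
def group_interleave(first_frame, group_size, frames_per_packet):
    """Form one group; returns a list of (frame numbers, index fields)."""
    packets = []
    for k in range(group_size):
        frames = [first_frame + k + j * group_size
                  for j in range(frames_per_packet)]
        # AU-Index is 0; every AU-Index-delta contains N-1.
        fields = [0] + [group_size - 1] * (frames_per_packet - 1)
        packets.append((frames, fields))
    return packets


def max_frames_held(packets):
    """Largest number of frames still held after flushing on each arrival."""
    held, next_frame, worst = set(), 0, 0
    for frames, _fields in packets:
        held.update(frames)
        while next_frame in held:      # flush everything now in order
            held.remove(next_frame)
            next_frame += 1
        worst = max(worst, len(held))
    return worst


group = group_interleave(0, 3, 3)
print(group[0])                # ([0, 3, 6], [0, 2, 2])
print(max_frames_held(group))  # 4
```

   The maximum of 4 is reached after packet 1, when frames 3, 4, 6 and
   7 are held.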

1449	   If there is loss, then the receiver may wait longer than is strictly
1450	   necessary before it emits frames.  For example, say packet 1 is lost
1451	   from the above example.  Packet 0 allows frame 0 to be emitted, and
1452	   then packet 2 arrives, allowing us to notice the loss of frame 1,
1453	   and emit frames 2 and 3. Then it is not until the arrival of packet 3
1454	   (which has a time-stamp beyond the times of all the frames seen so
1455	   far), that we can finish dealing with the loss, even though the
1456	   first group has, in fact, ended. (This is in contrast to schemes
1457	   which signal the group size explicitly;  if the receiver knows that
1458	   this is packet 3 of 3, then even if 2 of 3 is missing, it can
1459	   de-interleave this group without waiting for the next one to start).

1461	   In the above example the AU-Index is coded with the value 0, as
1462	   required for the modes defined in this document. To reconstruct the
1463	   original order, the RTP time stamp and the AU-Index-delta are used.
1464	   See also 3.2.3.2.
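
   As an illustration of this reconstruction, the sketch below derives
   the frame numbers of a packet from its first frame and its
   AU-Index-delta values. It assumes that the RTP time stamp has
   already been converted to the number of the first frame (T[n]
   identifies frame n) and that each delta of N-1 advances the frame
   number by N; the function names are hypothetical.

```python
def frame_numbers(first_frame, index_fields):
    """index_fields is [AU-Index, AU-Index-delta, AU-Index-delta, ...]."""
    numbers = [first_frame]
    for delta in index_fields[1:]:
        # A delta of N-1 means the next frame is N frames later.
        numbers.append(numbers[-1] + delta + 1)
    return numbers


def deinterleave(packets):
    """packets is a list of (first frame, index fields) in arrival order."""
    frames = []
    for first_frame, fields in packets:
        frames.extend(frame_numbers(first_frame, fields))
    return sorted(frames)


received = [(0, [0, 2, 2]), (1, [0, 2, 2]), (2, [0, 2, 2])]
print(deinterleave(received))  # [0, 1, 2, 3, 4, 5, 6, 7, 8]
```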

1466	   Another example of forming packets with group interleave is given
1467	   below. In this example the packets are formed such that the loss of
1468	   two subsequent RTP packets does not cause the loss of two subsequent
1469	   audio frames. Note that in this example the RTP time stamps of
1470	   packets 3 and 4 are earlier than the RTP time stamps of packets 1
1471	   and 2.

1473	   Packet   Time stamp   Frame Numbers       AU-Index, AU-Index-delta
1474	   0        T[0]         0,  5, 10, 15       0, 4, 4, 4
1475	   1        T[2]         2,  7, 12, 17       0, 4, 4, 4
1476	   2        T[4]         4,  9, 14, 19       0, 4, 4, 4
1477	   3        T[1]         1,  6, 11, 16       0, 4, 4, 4
1478	   4        T[3]         3,  8, 13, 18       0, 4, 4, 4

1480	   5        T[20]       20, 25, 30, 35       0, 4, 4, 4
1481	   and so on ..

1483	A.2 Continuous interleave

1485	   In continuous interleave, once the scheme is 'primed', the number of
1486	   frames in a packet exceeds the 'stride' (the distance between them).
1487	   This shortens the buffering needed, smooths the data-flow, and gives
1488	   slightly larger packets -- and thus lower overhead -- for the same
1489	   interleave.  For example, here is a continuous interleave also over
1490	   a stride of 3 frames, but with 4 frames per packet, for a run of 20
1491	   frames.  This shows both how the scheme 'starts up' and how it
1492	   finishes.

1494	   Packet   Time stamp   Frame Numbers       AU-Index, AU-Index-delta
1495	   0        T[0]                      0      0
1496	   1        T[1]                  1   4      0  2
1497	   2        T[2]              2   5   8      0  2  2
1498	   3        T[3]          3   6   9  12      0  2  2  2
1499	   4        T[7]          7  10  13  16      0  2  2  2
1500	   5        T[11]        11  14  17  20      0  2  2  2
1501	   6        T[15]        15  18              0  2
1502	   7        T[19]        19                  0

1504	   In this case, the receiver has to buffer only 3 frames, not 4. Say
1505	   we are waiting for packet 4.  We can flush frames 0, 1, 2, 3, 4, 5,
1506	   6;  we are holding therefore 8, 9, 12.   Packet 4 arrives, allowing
1507	   us to emit 7,8,9,10, and we are holding 12,13,16.  Each arriving
1508	   packet contains 4 frames, and allows 4 frames to be flushed.
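
   The packet table and the buffering figure of 3 frames can be
   reproduced with the sketch below. The construction (packets whose
   first frames are spaced by the number of frames per packet, with
   frame numbers outside the run dropped) is an assumption that happens
   to match the table; it is not a procedure defined by this document.
   It covers every frame exactly once only when the stride and the
   number of frames per packet have no common factor.

```python
def continuous_interleave(last_frame, stride, per_packet):
    """Return the frame numbers carried by each packet, in sending order."""
    packets = []
    start = -(per_packet - 1) * stride
    while start <= last_frame:
        frames = [start + j * stride for j in range(per_packet)]
        # Drop positions before the first or after the last frame.
        frames = [f for f in frames if 0 <= f <= last_frame]
        if frames:
            packets.append(frames)
        start += per_packet
    return packets


def max_frames_held(packets):
    """Largest number of frames still held after flushing on each arrival."""
    held, next_frame, worst = set(), 0, 0
    for frames in packets:
        held.update(frames)
        while next_frame in held:
            held.remove(next_frame)
            next_frame += 1
        worst = max(worst, len(held))
    return worst


packets = continuous_interleave(last_frame=20, stride=3, per_packet=4)
print(packets[3])                # [3, 6, 9, 12]
print(max_frames_held(packets))  # 3
```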

1510	   In the above example the AU-Index is coded with the value 0, as
1511	   required for the modes defined in this document. To reconstruct the
1512	   original order, the RTP time stamp and the AU-Index-delta are used.
1513	   See also 3.2.3.2.

1515	   If there is loss, again the receiver has to wait to emit the erasure
1516	   frames.  In this case, say packet 3 is lost.  We were holding frames
1517	   4, 5, and 8.  On the arrival of packet 4 (time stamp of frame 7),
1518	   we know that frame 3 was lost; we can emit frames 4 and 5, note that
1519	   6 must be lost, and emit 7, which is in the packet that arrived. Then
1520	   on the arrival of packet 5 (time-stamp 11) we can emit 8, indicate
1521	   loss of 9, and emit 10 and 11. Finally, the arrival of packet 6
1522	   (time-stamp 15) indicates that 12 must be lost;  we have now
1523	   detected all the lost frames.
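
   The loss scenario can be traced with a short sketch. It assumes
   that packets arrive in order of increasing time stamp, so that on
   the arrival of a packet whose first frame is t, every frame before
   t that is not held can be declared lost. The function name and the
   data layout are hypothetical.

```python
def detect_losses(packets):
    """Return (emitted frames, {first frame of packet: frames found lost})."""
    held, next_frame = set(), 0
    emitted, losses = [], {}
    for frames in packets:
        held.update(frames)
        first = frames[0]
        # Resolve every frame up to the first frame of this packet.
        while next_frame <= first:
            if next_frame in held:
                held.remove(next_frame)
                emitted.append(next_frame)
            else:
                losses.setdefault(first, []).append(next_frame)
            next_frame += 1
    return emitted, losses


table = [[0], [1, 4], [2, 5, 8], [3, 6, 9, 12],
         [7, 10, 13, 16], [11, 14, 17, 20], [15, 18], [19]]
received = [p for i, p in enumerate(table) if i != 3]  # packet 3 is lost
emitted, losses = detect_losses(received)
print(losses)  # {7: [3, 6], 11: [9], 15: [12]}
```

   Frames 3 and 6 are detected on the arrival of packet 4, frame 9 on
   the arrival of packet 5, and frame 12 on the arrival of packet 6.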