idnits 2.17.1 

draft-ietf-avt-mpeg4-simple-01.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

  ** Looks like you're using RFC 2026 boilerplate.  This must be updated to
     follow RFC 3978/3979, as updated by RFC 4748.


  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

  ** The document seems to lack a 1id_guidelines paragraph about 6 months
     document validity -- however, there's a paragraph with a matching
     beginning. Boilerplate error?

  ** The document is more than 15 pages and seems to lack a Table of Contents.

  == No 'Intended status' indicated for this document; assuming Proposed
     Standard

  == The page length should not exceed 58 lines per page, but there was 1
     longer page, the longest (page 1) being 1269 lines


  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

  ** The document seems to lack an Authors' Addresses Section.

  ** The document seems to lack separate sections for Informative/Normative
     References.  All references will be assumed normative when checking for
     downward references.

  ** There are 101 instances of too long lines in the document, the longest
     one being 6 characters in excess of 72.


  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == Line 741 has weird spacing: '...    for  stere...'

  == Using lowercase 'not' together with uppercase 'MUST', 'SHALL', 'SHOULD',
     or 'RECOMMENDED' is not an accepted usage according to RFC 2119.  Please
     use uppercase 'NOT' together with RFC 2119 keywords (if that is what you
     mean).
     
     Found 'SHALL not' in this paragraph:
     
     The AU-headers are configured using format parameters and MAY be
     empty. If the AU-header is configured empty, the AU-headers-length field
     SHALL not be present and consequently the AU Header Section is empty. If
     the AU-header is not configured empty, then the AU-headers-length is a
     two octet field that specifies the length in bits of the immediately
     following AU-headers.

  == Using lowercase 'not' together with uppercase 'MUST', 'SHALL', 'SHOULD',
     or 'RECOMMENDED' is not an accepted usage according to RFC 2119.  Please
     use uppercase 'NOT' together with RFC 2119 keywords (if that is what you
     mean).
     
     Found 'SHALL not' in this paragraph:
     
     Applications MAY use more parameters, in addition to those defined
     above. Receivers MUST tolerate the presence of such additional
     parameters, but these parameters SHALL not impact the decoding of
     receivers that comply to this specification.

  -- The document seems to lack a disclaimer for pre-RFC5378 work, but may
     have content which was first submitted before 10 November 2008.  If you
     have contacted all the original authors and they are all willing to grant
     the BCP78 rights to the IETF Trust, then this is fine, and you can ignore
     this comment.  If not, you may need to add the pre-RFC5378 disclaimer. 
     (See the Legal Provisions document at
     https://trustee.ietf.org/license-info for more information.)

  -- The document date (September 2002) is 7891 days in the past.  Is this
     intentional?


  Checking references for intended status: Proposed Standard
  ----------------------------------------------------------------------------

     (See RFCs 3967 and 4897 for information about using normative references
     to lower-maturity documents in RFCs)

  == Missing Reference: '0' is mentioned on line 1214, but not defined

  == Missing Reference: '9' is mentioned on line 1177, but not defined

  == Missing Reference: '11' is mentioned on line 1219, but not defined

  == Missing Reference: '15' is mentioned on line 1220, but not defined

  == Missing Reference: '19' is mentioned on line 1221, but not defined

  == Unused Reference: '4' is defined on line 1092, but no explicit reference
     was found in the text

  == Unused Reference: '6' is defined on line 1098, but no explicit reference
     was found in the text

  -- Possible downref: Non-RFC (?) normative reference: ref. '1'

  ** Obsolete normative reference: RFC 1889 (ref. '2') (Obsoleted by RFC 3550)

  ** Obsolete normative reference: RFC 3016 (ref. '5') (Obsoleted by RFC 6416)

  -- No information found for draft-gentric-avt-mpeg4-multiSL - is the name
     correct?

  -- Possible downref: Normative reference to a draft: ref. '6' 

  ** Obsolete normative reference: RFC 2327 (ref. '7') (Obsoleted by RFC 4566)


     Summary: 9 errors (**), 0 flaws (~~), 12 warnings (==), 5 comments (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.

--------------------------------------------------------------------------------

1	Internet Engineering Task Force                        J. van der Meer
2	Internet Draft                                     Philips Electronics
3	                                                             D. Mackie
4	                                                    Cisco Systems Inc.
5	                                                        V. Swaminathan
6	                                                 Sun Microsystems Inc.
7	                                                             D. Singer
8	                                                        Apple Computer

10	                                                             March 2002
11	                                                 Expires September 2002

13	   Document: draft-ietf-avt-mpeg4-simple-01.txt

15	   Use of "RFC XXXX" for MPEG-4 Elementary Streams with no SL layer

17	Status of this Memo

19	   This document is an Internet-Draft and is in full conformance with
20	   all provisions of Section 10 of RFC2026.

22	   Internet-Drafts are working documents of the Internet Engineering
23	   Task Force (IETF), its areas, and its working groups. Note that
24	   other groups may also distribute working documents as Internet-
25	   Drafts. Internet-Drafts are draft documents valid for a maximum of
26	   six months and may be updated, replaced, or obsoleted by other
27	   documents at any time. It is inappropriate to use Internet-Drafts
28	   as reference material or to cite them other than as "work in
29	   progress."

31	   The list of current Internet-Drafts can be accessed at
32	   http://www.ietf.org/ietf/1id-abstracts.txt
33	   The list of Internet-Draft Shadow Directories can be accessed at
34	   http://www.ietf.org/shadow.html.

36	   This specification is a product of the Audio/Video Transport working
37	   group within the Internet Engineering Task Force. Comments are
38	   solicited and should be addressed to the working group's mailing
39	   list at avt@ietf.org and/or the authors.

41	   <<
42	   Note for the RFC editor:
43	   XXXX should be replaced with the RFC number that will be assigned to
44	   the companion RFC, whose draft is draft-ietf-avt-mpeg4-multisl-**.txt.
45	   >>

47	   Abstract

49	   The MPEG Committee (ISO/IEC JTC1/SC29 WG11) is a working group in ISO
50	   that recently produced the MPEG-4 standard. MPEG defines tools to
51	   compress content such as audio-visual information into elementary
52	   streams. In RFC XXXX a generic RTP payload format is defined for
53	   transport of any non-multiplexed MPEG-4 elementary stream. To achieve
54	   the generic MPEG-4 functionality, RFC XXXX addresses detailed issues
55	   related to the MPEG-4 SL layer. However, many initial applications will
56	   not use the SL layer. To facilitate usage of RFC XXXX by such
57	   applications, this document describes how to use RFC XXXX when no SL
58	   layer is used.

60	1. Introduction

62	   The MPEG Committee is Working Group 11 (WG11) in ISO/IEC JTC1 SC29
63	   that specified the MPEG-1, MPEG-2 and, more recently, the MPEG-4
64	   standards [1]. The MPEG-4 standard specifies compression of
65	   audio-visual data into, for example, an audio or video elementary
66	   stream. In the MPEG-4 standard, these streams take the form of
67	   audio-visual objects that may be arranged into an audio-visual scene
68	   by means of a scene description. Each MPEG-4 elementary stream
69	   consists of a sequence of Access Units; in case of audio an Access
70	   Unit (AU) is an audio frame and in case of video a picture.

72	   The MPEG-4 system specification is a rather abstract specification in
73	   the sense that no transport format for MPEG-4 elementary streams is
74	   defined. Instead, a conceptual SL layer has been specified to store
75	   transport specific information such as time stamps and random access
76	   point information. When transporting an MPEG-4 elementary stream,
77	   transport information from the SL layer is typically mapped to the
78	   actual transport layer. Note however that the SL layer is conceptual
79	   and may not exist in practice.

81	   In RFC XXXX, a general payload format is defined for transport of a single
82	   MPEG-4 elementary stream over RTP. The RTP payload format specified
83	   in RFC XXXX allows for carriage of any information that may be contained in
84	   the MPEG-4 SL layer, either by mapping to the RTP header fields or by
85	   carriage in specific fields defined in the RTP payload. Consequently,
86	   the format defined in RFC XXXX is very generic and complete; for example,
87	   transcoding issues from and to the SL layer are described in detail.

89	   However, in many initial MPEG-4 applications the SL layer does not
90	   exist in practice. Such applications do not require any knowledge of
91	   the SL layer. While the use of RFC XXXX is highly desirable for all MPEG-4
92	   applications, understanding RFC XXXX may be difficult without knowledge of
93	   the MPEG-4 SL layer. Therefore, in this document the use of RFC XXXX is
94	   described without requiring knowledge of the SL layer to understand
95	   its functionality.

97	   Sophisticated features on interleaving of fragmented Access Units are
98	   defined in RFC XXXX. Because initial applications only need interleaving
99	   of complete (non-fragmented) Access Units, these more sophisticated
100	   features are not supported in this document. Hence, only a functional
101	   subset of RFC XXXX is supported.

103	   In RFC XXXX, a general and configurable payload structure is defined for
104	   transport of MPEG-4 streams. This allows for the design of receivers
105	   that can be configured to receive any MPEG-4 stream. Configuration of
106	   the payload is provided to accommodate transport of any MPEG-4 stream,
107	   but for a specific MPEG-4 elementary stream typically only very few
108	   configurations are needed. So as to allow for the design of simplified,
109	   but dedicated receivers, this specification requires that specific
110	   modes are defined for transport of MPEG-4 streams. In this document
111	   modes are defined only for transport of MPEG-4 CELP and AAC streams,
112	   but in the future new RFCs are expected to specify additional modes for
113	   transport of other MPEG-4 streams.

115	   In summary, this document:
116	   - is intended for applications that do not apply the SL layer;
117	   - describes how to use RFC XXXX without requiring knowledge of the
118	     SL layer;
119	   - defines a functional but true subset of RFC XXXX;
120	   - defines modes for using this specification to transport MPEG-4
121	     CELP and AAC streams.

123	   The use of RFC XXXX defined in this document is simple to implement
124	   and reasonably efficient. It allows for optional interleaving of
125	   Access Units (such as audio frames) to increase resilience to
126	   packet loss.

128	   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
129	   "SHOULD", "SHOULD NOT", "RECOMMENDED",  "MAY", and "OPTIONAL" in
130	   this document are to be interpreted as described in RFC 2119 [3].

132	2. Carriage of MPEG-4 elementary streams over RTP

134	2.1 Introduction

136	   With this payload format a single MPEG-4 elementary stream can be
137	   transported. Information on the type of MPEG-4 stream carried in the
138	   payload is conveyed by format parameters in an SDP [7] message or
139	   by other means. These format parameters specify the configuration
140	   of the payload. To simplify receivers, a format parameter is also
141	   available to signal a specific mode of using this payload. A mode
142	   definition MAY include the type of MPEG-4 elementary stream as well
143	   as the applied configuration, so as to avoid the need in receivers
144	   for parsing all format parameters.

146	2.2 MPEG Access Units

148	   For carriage of compressed audio-visual data MPEG defines Access
149	   Units. An MPEG Access Unit (AU) is the smallest data entity to which
150	   timing information can be attributed. In case of audio an Access
151	   Unit represents an audio frame and in case of video a picture. MPEG
152	   Access Units are by definition byte aligned. If for example an audio
153	   frame is not byte aligned, up to 7 zero-padding bits MUST be inserted
154	   at the end of the frame to achieve a byte-aligned Access Unit.
155	   Decoders MUST be able to decode AUs in which such padding is applied.

157	   Consistent with the MPEG-4 specification, this document requires that
158	   each MPEG-4 video Access Unit includes all the coded data of a
159	   picture, any video stream headers that may precede the coded picture
160	   data, and any video stream stuffing that may follow it, up to, but not
161	   including the startcode indicating the start of a new video stream or
162	   the next Access Unit.

164	2.3 Concatenation of Access Units

166	   Frequently it is possible to carry multiple Access Units in one RTP
167	   packet. This is particularly useful for audio; for example, when AAC
168	   is used for encoding of a stereo signal at 64 kbits/sec, AAC frames
169	   contain on average approximately 200 bytes. On a LAN with a 1500 octet
170	   MTU this would allow on average 7 complete AAC frames to be carried
171	   per RTP packet.
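
The estimate above can be reproduced with a short calculation. This is
only a sketch: the header sizes (20-octet IPv4 header, 8-octet UDP
header, 12-octet RTP header without CSRC entries) and the function name
are assumptions that do not come from this document, and any AU Header
Section overhead is ignored.

```python
def frames_per_packet(mtu=1500, avg_frame=200, ip_hdr=20, udp_hdr=8, rtp_hdr=12):
    """Return how many average-sized frames fit in one RTP packet.

    All header sizes are assumed values (IPv4 without options, UDP,
    RTP fixed header only); they are not taken from the specification.
    """
    payload_room = mtu - ip_hdr - udp_hdr - rtp_hdr
    return payload_room // avg_frame


print(frames_per_packet())  # 7
```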

173	   Access Units may have a fixed size in octets, but a variable size is
174	   also possible. To facilitate parsing in case of multiple concatenated
175	   AUs in one RTP packet, the size of each AU is made known to the
176	   receiver. When concatenating in case of a constant AU size, this size
177	   is communicated through a format parameter. When concatenating in case
178	   of variable size AUs, the RTP payload carries an AU size field for
179	   each contained AU. In combination with the RTP payload length the
180	   size information allows the RTP payload to be split by the receiver
181	   back into the individual AUs.
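
As an illustration of how a receiver might split the payload, the sketch
below handles both cases for a packet that carries complete Access
Units. The function and parameter names are hypothetical; the sizes are
assumed to come either from a format parameter (constant size) or from
the AU size fields already parsed from the packet.

```python
def split_access_units(au_data, sizes=None, constant_size=None):
    """Split the AU data of one RTP packet into complete Access Units.

    sizes: per-AU sizes in octets taken from the AU size fields, or None.
    constant_size: AU size in octets from a format parameter, used when
    no per-AU sizes are carried. Names are illustrative only.
    """
    if sizes is None:
        if not constant_size or len(au_data) % constant_size != 0:
            raise ValueError("payload is not a whole number of constant-size AUs")
        sizes = [constant_size] * (len(au_data) // constant_size)
    if sum(sizes) != len(au_data):
        raise ValueError("AU sizes do not add up to the payload length")
    units, offset = [], 0
    for size in sizes:
        units.append(au_data[offset:offset + size])
        offset += size
    return units
```

The length check only applies to packets with complete Access Units; a
packet carrying a single fragment would need separate handling.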

183	   To simplify the implementation of RFC XXXX defined in this document, it
184	   is required that when multiple AUs are carried in an RTP packet,
185	   each AU MUST be complete, i.e. the number of AUs in an RTP packet
186	   MUST be integral.

188	2.4 Fragmentation of Access Units

190	   MPEG allows for very large Access Units. Since most IP networks have
191	   significantly smaller MTUs, this payload format allows AUs to be
192	   fragmented over multiple RTP packets so as to avoid IP layer
193	   fragmentation. To simplify the implementation of RFC XXXX defined in this
194	   document, an RTP packet SHALL either carry one or more complete
195	   Access Units or a single fragment of one Access Unit.

197	2.5 Interleaving

199	   When an RTP packet carries a contiguous sequence of Access Units,
200	   the loss of such a packet can result in "decoding gaps" for the user.
201	   One method to alleviate this problem is to allow for the Access
202	   Units to be interleaved in the RTP packets. For a modest cost in
203	   latency and implementation complexity, significant error resiliency
204	   to packet loss can be achieved.

206	   To support optional interleaving of Access Units, this payload
207	   format allows for index information to be sent for each Access Unit.
208	   The RTP sender is free to choose the interleaving pattern without
209	   propagating this information to the receiver(s). Indeed the sender
210	   could dynamically adjust the interleaving pattern based on the
211	   Access Unit size, error rates, etc. The RTP receiver does not need
212	   to know the interleaving pattern used; it only needs to extract the
213	   index information of the Access Unit and insert the Access Unit into
214	   the appropriate sequence in the rendering queue. An example of
215	   interleaving is given below.

217	   Assume that an RTP packet contains 3 AUs, and that the AUs are
218	   numbered 1, 2, 3, 4, etc. If an interleaving group length of 9 is
219	   chosen, then RTP packet(i) contains the following AU(n):
220	   RTP packet(1):  AU(1),  AU(4),  AU(7)
221	   RTP packet(2):  AU(2),  AU(5),  AU(8)
222	   RTP packet(3):  AU(3),  AU(6),  AU(9)
223	   RTP packet(4):  AU(10), AU(13), AU(16)
224	   RTP packet(5):  AU(11), AU(14), AU(17)
225	   Etc.
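
The pattern in this example can be generated with a few lines of code.
The sketch below is one possible sender-side scheme, not the only one
permitted; the function names are hypothetical, and it assumes that the
interleaving group length is a multiple of the number of AUs per packet.

```python
def interleave(num_aus, aus_per_packet=3, group_len=9):
    """Return a list of RTP packets, each a list of 1-based AU numbers.

    Assumes group_len is a multiple of aus_per_packet.
    """
    stride = group_len // aus_per_packet
    packets = []
    for group_start in range(1, num_aus + 1, group_len):
        for offset in range(stride):
            packet = [group_start + offset + k * stride
                      for k in range(aus_per_packet)]
            packet = [n for n in packet if n <= num_aus]  # trim a short last group
            if packet:
                packets.append(packet)
    return packets


def deinterleave(packets):
    """Receiver side: restore rendering order from the AU numbers alone."""
    return sorted(n for packet in packets for n in packet)


packets = interleave(18)
print(packets[0], packets[3])  # [1, 4, 7] [10, 13, 16]
```

If the second packet of this sequence is lost, the receiver misses AU(2),
AU(5) and AU(8): three isolated gaps instead of one gap of three
consecutive Access Units.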

227	2.6 Time stamp information

229	   MPEG-4 defines two types of time stamps, the decoding time stamp DTS
230	   and the composition time stamp CTS. The RTP timestamp is equivalent
231	   to the composition time stamp.

233	   The RTP time stamp MUST carry the sampling instant of the first AU
234	   (fragment) in the RTP packet. When multiple AUs are carried within
235	   an RTP packet, the time stamps of subsequent AUs can be calculated
236	   if the frame period of each AU is known. For audio and video this
237	   is possible if the frame rate is constant. However, in some cases it
238	   is not possible to make such calculation, for example for variable
239	   frame rate video and for MPEG-4 BIFS streams carrying composition
240	   information. To support such cases, this payload format can be
241	   configured to carry a CTS in the RTP payload for each contained
242	   Access Unit. A CTS time stamp MAY be conveyed in the RTP payload
243	   only for non-first AUs in the RTP packet, and SHALL NOT be conveyed
244	   for the first AU (fragment), as the time stamp for the latter is
245	   carried by the RTP time stamp.
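
For the constant frame rate case, the calculation could look like the
sketch below. It assumes that the AUs in the packet are consecutive
(no interleaving), that the frame period is expressed in units of the
RTP clock, and that RTP time stamps wrap at 32 bits; the function name
and the example frame period of 1024 ticks are illustrative assumptions.

```python
def au_timestamps(rtp_timestamp, au_count, frame_period):
    """Time stamps (in RTP clock units) of consecutive AUs in one packet.

    The first AU takes the RTP time stamp itself; each following AU is
    one frame period later, modulo 2**32.
    """
    return [(rtp_timestamp + n * frame_period) % (1 << 32)
            for n in range(au_count)]


print(au_timestamps(48000, 3, 1024))  # [48000, 49024, 50048]
```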

247	   The DTS timestamp may be applied only in MPEG video streams that use
248	   bi-directional coding, i.e. when pictures may be predicted in both
249	   forward and backward direction by using either a reference picture in
250	   the past, or a reference picture in the future. The DTS cannot be
251	   carried in the RTP header. In some cases the DTS can be derived from
252	   the RTP time stamp using frame rate information; this requires deep
253	   parsing in the video stream, which may be considered objectionable.
254	   But if the video frame rate is variable, the required information
255	   may not even be present in the video stream. For both reasons, the
256	   capability has been defined to optionally carry a DTS in the RTP
257	   payload for each contained Access Unit.

259	   Since RTP time stamps may be re-stamped by RTP devices, each CTS
260	   and DTS contained in the RTP payload is coded differentially from the
261	   RTP time stamp, so as to avoid extensive parsing by re-stamping
262	   devices.

264	2.7 Carriage of auxiliary information

266	   This payload format defines a specific field to carry auxiliary data
267	   on the contained MPEG-4 stream, representing MPEG-4 system information.
268	   The auxiliary data corresponds to the RSLH field defined in RFC XXXX.
269	   Receivers MAY use the auxiliary data to decode the contained stream,
270	   but receivers that have no interest in such data MAY skip the
271	   auxiliary data field. To facilitate skipping of the data, and to avoid
272	   the need for parsing it, the auxiliary data field is preceded by a
273	   field that specifies the length of the auxiliary data.

275	2.8 Format parameters and the conditional presence and length of fields

277	   To support the features described in the previous sections several
278	   fields are defined for carriage in the RTP payload. However, their use
279	   strongly depends on the type of MPEG-4 elementary stream that is
280	   carried. Sometimes a specific field is needed with a certain length,
281	   while in other cases such a field is not needed at all. To be efficient
282	   in either case, the fields needed for these features are configurable
283	   by means of format parameters. In general, a format parameter defines
284	   the presence and length of associated fields. A length of zero
285	   indicates absence of the field. As a consequence, parsing of the
286	   payload requires knowledge of format parameters. The format
287	   parameters are conveyed to the receiver via SDP [7] messages or
288	   through other means.

290	2.9 Global structure of payload format

292	   The payload structure in RFC XXXX is described in terms derived from the
293	   SL layer. In this document exactly the same structure is described
294	   in more general terms, so as to improve the readability for people
295	   with no knowledge of the SL layer. So the payload structure described
296	   below corresponds on bit level exactly to the payload structure
297	   defined in RFC XXXX.

299	   The RTP payload following the RTP header contains three byte aligned
300	   data sections, of which the first two MAY be empty. See figure 1.

302	          +---------+-----------+-----------+---------------+
303	          | RTP     | AU Header | Auxiliary | Access Unit   |
304	          | Header  | Section   | Section   | Data Section  |
305	          +---------+-----------+-----------+---------------+

307	                    <----------RTP Packet Payload----------->

309	   Figure 1: Data sections within an RTP packet

311	   The first data section is the AU (Access Unit) Header Section, which
312	   contains one or more AU-headers; however, each AU-header MAY be empty,
313	   in which case the entire AU Header Section is empty. The second
314	   section is the Auxiliary Section, containing auxiliary data; also
315	   this section MAY be configured empty. The third section is the Access
316	   Unit Data Section, containing either a single fragment of one Access
317	   Unit or one or more complete Access Units. The Access Unit Data
318	   Section is never empty.

320	   When compared to the terms used in RFC XXXX, the AU Header Section
321	   exactly corresponds to the Payload Header Section, the Auxiliary
322	   Section to the RSLH Section, and the Access Unit Data Section to the
323	   Payload Section.

325	2.10 Modes to transport MPEG-4 streams

327	   While it is possible to build fully configurable receivers capable of
328	   receiving any MPEG-4 stream, this specification also allows for the
329	   design of simplified, but dedicated receivers that are capable, for
330	   example, of receiving only one type of MPEG-4 stream. This is achieved by
331	   requiring that specific modes be defined for using this specification.
332	   Each mode defines how to transport specific MPEG-4 streams, for example
333	   by defining suitable constraints or payload configurations. Modes can
334	   be defined as deemed appropriate. However, each mode MUST be in full
335	   compliance with this specification.

337	   The applied mode MUST be signalled. Signalling the mode is particularly
338	   important for receivers that are only capable of decoding a particular
339	   mode. Such receivers need to determine whether that particular mode is
340	   applied, so as to avoid problems with processing of payloads that are
341	   beyond the capabilities of the receiver.

343	   In this Internet-Draft, modes are defined only for transport of MPEG-4
344	   CELP and AAC streams. However, in the future new RFCs are expected to
345	   specify additional modes of using this specification for transport of
346	   other MPEG-4 streams.

348	2.11 Alignment with RFC XXXX and RFC 3016

350	   This document defines a subset of RFC XXXX. The main characteristic
351	   of this subset is that each RTP payload is only allowed to contain either
352	   a single fragment of one Access Unit or one or more complete Access Units.
353	   Obviously, RTP payloads that apply this subset in conformance with this
354	   document conform also to RFC XXXX. Receivers that comply with RFC XXXX
355	   are able to decode MPEG-4 streams carried in compliance with this
356	   document.

358	   Receivers designed to comply only with this document may not be able to
359	   decode an RTP payload that conforms to RFC XXXX but not to this document.
360	   Such receivers may also not be capable of exploiting some of the features
361	   of the SL layer supported in RFC XXXX, such as knowledge of AU-start,
362	   random access information and other information carried in the SL header,
363	   but not described in this document.

365	   Furthermore, this payload can be configured to be identical to the
366	   payload format defined in RFC 3016 [5] for the MPEG-4 video configurations
367	   recommended in RFC 3016. Hence, receivers that comply with RFC 3016
368	   can decode such RTP payload. Vice versa, receivers that comply with the
369	   specification in this document SHOULD be able to decode payloads, names
370	   and parameters defined for MPEG-4 video in RFC 3016.

372	   For interoperability reasons, applications that transport MPEG-4 video
373	   over RTP SHOULD use the payload format and associated names and
374	   parameters defined in RFC 3016 if the functionality provided by RFC 3016
375	   can meet the requirements of that application.

377	3 Payload Format

379	3.1 RTP Header Fields Usage

381	   Payload Type (PT): The assignment of an RTP payload type for this
382	   RTP packet format is outside the scope of this document, and will
383	   not be specified here. It is expected that the RTP profile for a
384	   particular class of applications will assign a payload type for this
385	   encoding, or if that is not done, then a payload type in the dynamic
386	   range shall be chosen.

388	   Marker (M) bit: The M bit is set to 1 to indicate that the RTP packet
389	   payload includes the end of each Access Unit of which data is
390	   contained in this RTP packet. As the payload either carries one or
391	   more complete Access Units or a single fragment of an Access Unit,
392	   the M bit is always set to 1, except when the packet carries a
393	   single fragment of an Access Unit that is not the last one.
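
The rule for the M bit reduces to a single condition, sketched below.
The function and argument names are hypothetical.

```python
def marker_bit(carries_fragment, is_last_fragment=False):
    """Return the M bit for a packet.

    A packet with one or more complete Access Units always gets M = 1.
    A packet with a single fragment gets M = 1 only for the last
    fragment of that Access Unit.
    """
    if not carries_fragment:
        return 1
    return 1 if is_last_fragment else 0
```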

395	   Extension (X) bit: Defined by the RTP profile used.

397	   Sequence Number: The RTP sequence number SHOULD be generated by the
398	   sender with a constant random offset.

400	   Timestamp: Indicates the sampling instant of the first AU contained
401	   in the RTP payload. This sampling instant is equivalent to the CTS
402	   in the MPEG-4 time domain. The clock rate of the RTP time stamp MUST
403	   be expressed as part of the RTPMAP. If an audio or video stream with
404	   a fixed frame rate is transported, the rate SHOULD be set to the same
405	   value as the sampling frequency of the audio or video frames (number
406	   of samples per second).
407	   In all cases, the sender SHALL make sure that RTP time stamps
408	   are identical only if the RTP time stamp refers to fragments of the
409	   same Access Unit.
410	   According to RFC 1889 [2] (section 5.1), RTP timestamps are
411	   recommended to start at a random value for security reasons. However,
412	   then a receiver is, in the general case, not able to reconstruct the
413	   original MPEG Time Stamps, which creates problems for applications
414	   where streams from multiple sources are to be synchronized. To enable
415	   synchronisation in such cases, for example between one stream from
416	   local storage and another from an RTP streaming server, the applied
417	   random offset MUST be provided out of band. Methods to convey the
418	   applied random offset value are beyond the scope of this
419	   specification.

421	   SSRC: set as described in RFC 1889 [2].

423	   CC and CSRC fields are used as described in RFC 1889 [2].

425	   RTCP SHOULD be used as defined in RFC 1889 [2].

427	3.2 RTP Payload Structure

429	   As already noted in section 2.9 of this document, this document uses
430	   more general names to describe exactly the same payload structure as
431	   defined in RFC XXXX. For mapping between section names in RFC XXXX and
432	   in this document see section 2.9.

434	3.2.1 The AU Header Section

436	   When present, the AU Header Section consists of the AU-headers-length
437	   field, followed by a number of AU-headers. See figure 2.

439	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+- .. -+-+-+-+-+-+-+-+-+-+
440	   |AU-headers-length|AU-header|AU-header|      |AU-header|padding|
441	   |                 |   (1)   |   (2)   |      |   (n)   | bits  |
442	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+- .. -+-+-+-+-+-+-+-+-+-+

444	   Figure 2: The AU Header Section

446	   The AU-headers are configured using format parameters and MAY be empty.
447	   If the AU-header is configured empty, the AU-headers-length field
448	   SHALL NOT be present and consequently the AU Header Section is empty.
449	   If the AU-header is not configured empty, then the AU-headers-length
450	   is a two octet field that specifies the length in bits of the
451	   immediately following AU-headers.

453	   Each AU-header is associated with a single Access Unit (fragment)
454	   contained in the Access Unit Data Section in the same RTP packet. For
455	   each contained Access Unit (fragment) there is exactly one AU-header.
456	   Within the AU Header Section, the AU-headers are bit-wise concatenated
457	   in the order in which the Access Units are contained in the Access
458	   Unit Data Section. Hence, the n-th AU-header refers to the n-th AU
459	   (fragment). If the concatenated AU-headers consume a non-integer
460	   number of octets, up to 7 zero-padding bits MUST be inserted at the end
461	   in order to achieve byte-alignment of the AU Header Section.
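
The layout of the AU Header Section can be illustrated with the sketch
below, which concatenates AU-headers bit-wise, prepends the two-octet
AU-headers-length and appends the zero-padding bits. The function name,
the representation of each AU-header as a list of (value, width) pairs,
and the field widths in the usage example are assumptions made for
illustration; writing the length field most significant octet first is
also assumed here.

```python
def build_au_header_section(au_headers):
    """Serialize an AU Header Section.

    au_headers: one entry per AU (fragment), each a list of
    (value, width_in_bits) pairs in field order. A width of zero means
    the field is absent.
    """
    bits = ""
    for header in au_headers:
        for value, width in header:
            if width == 0:
                continue
            if not 0 <= value < (1 << width):
                raise ValueError("value does not fit in field width")
            bits += format(value, "0{}b".format(width))
    if not bits:
        return b""  # AU-header configured empty: no AU-headers-length field
    length_in_bits = len(bits)
    if length_in_bits > 0xFFFF:
        raise ValueError("AU-headers too long for a two-octet length field")
    bits += "0" * (-length_in_bits % 8)  # up to 7 zero-padding bits
    body = int(bits, 2).to_bytes(len(bits) // 8, "big")
    return length_in_bits.to_bytes(2, "big") + body


# Two AU-headers of 13 + 2 bits each: 30 bits of headers, 2 padding bits.
section = build_au_header_section([[(200, 13), (0, 2)], [(180, 13), (0, 2)]])
print(len(section))  # 6
```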

463	3.2.1.1 The AU-header

465	   The AU-header contains the fields given in figure 3. The length in
466	   bits of these fields, with the exception of the CTS-flag and
467	   the DTS-flag fields, is defined by format parameters; see section 4.1.
468	   If a format parameter has the default value of zero, then the
469	   associated field is not present.

471	   +---------------------------------------+
472	   |     AU-size                           |
473	   +---------------------------------------+
474	   |     AU-Index / AU-Index-delta         |
475	   +---------------------------------------+
476	   |     CTS-flag                          |
477	   +---------------------------------------+
478	   |     CTS-delta                         |
479	   +---------------------------------------+
480	   |     DTS-flag                          |
481	   +---------------------------------------+
482	   |     DTS-delta                         |
483	   +---------------------------------------+

485	   Figure 3: The fields in the AU-header. If used, the AU-Index field
486	             only occurs in the first AU-header within an AU Header
487	             Section; in any other AU-header the AU-Index-delta field
488	             occurs instead.

490	   AU-size: indicates the size in octets of the associated Access Unit
491	         in the Access Unit Data Section in the same RTP packet. When the
492	         AU-size is associated with an AU fragment, the AU-size indicates
493	         the size of the entire AU and not the size of the fragment. This
494	         can be exploited to determine whether a packet contains an entire
495	         AU or a fragment, which is particularly useful after losing a
496	         packet carrying the last fragment of an AU.

498	   AU-Index: indicates the serial number of the associated Access Unit
499	         (fragment). For each (in time) consecutive AU or AU fragment,
500	         the serial number is incremented by 1. When present, the
501	         AU-Index field occurs in the first AU-header in the AU Header
502	         Section, but MUST NOT occur in any subsequent (non-first)
503	         AU-header in that Section. To encode the serial number in any
504	         such non-first AU-header, the AU-Index-delta field is used.
505	         When each AU-Index field is coded with the value 0, the serial
506	         number of the AU (fragment) is not specified and in that case
507	         receivers MAY ignore the AU-Index field.

509	   AU-Index-delta: The AU-Index-delta field is an unsigned integer
510	         that specifies the serial number of the associated AU as the
511	         difference with respect to the serial number of the previous
512	         Access Unit. Hence, for the n-th (n>1) AU the serial number is
513	         found from:
514	         AU-Index(n) = AU-Index(n-1) + AU-Index-delta(n) + 1
515	         If the AU-Index field is present in the first AU-header in
516	         the AU Header Section, then the AU-Index-delta field MUST be
517	         present in any subsequent (non-first) AU-header. When the
518	         AU-Index-delta is coded with the value 0, it indicates that
519	         the Access Units are consecutive in time. An AU-Index-delta
520	         value larger than 0 signals that interleaving is applied.

522	   CTS-flag: Indicates whether the CTS-delta field is present.
523	         A value of 1 indicates that the field is present, a value of 0
524	         that it is not present.
525	         The CTS-flag field MUST be present in each AU-header if the
526	         length of the CTS-delta field is signalled to be larger than
527	         zero. In that case, the CTS-flag field MUST have the value 0
528	         in the first AU-header and MAY have the value 1 in all non-first
529	         AU-headers. The CTS-flag field SHOULD be 0 for any non-first
530	         fragment of an Access Unit.

532	   CTS-delta: Encodes the CTS by specifying the value of CTS as a 2's
533	         complement offset (delta) from the timestamp in the RTP header
534	         of this RTP packet. The CTS MUST use the same clock rate as the
535	         time stamp in the RTP header.

537	   DTS-flag: Indicates whether the DTS-delta field is present. A value
538	         of 1 indicates that DTS-delta is present, a value of 0 that it
539	         is not present.
540	         The DTS-flag field MUST be present in each AU-header if the
541	         length of the DTS-delta field is signalled to be larger than
542	         zero. The DTS-flag field SHOULD be 0 for any non-first
543	         fragment of an Access Unit.

545	   DTS-delta: specifies the value of the DTS as a 2's complement offset
546	         (delta) from the CTS timestamp. The DTS MUST use the same clock
547	         rate as the time stamp in the RTP header.

549	   If present, the fields MUST occur in the mutual order given in
550	   figure 3. In the general case a receiver can only discover the size
551	   of an AU-header by parsing it since the presence of the CTS-delta
552	   and DTS-delta fields is signalled by the value of the CTS-flag and
553	   DTS-flag, respectively.
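
   The parsing rule above can be illustrated with a minimal sketch. It
   assumes that the caller already knows the total length in bits of the
   concatenated AU-headers and the field lengths from the format
   parameters. The names BitReader, AUHeader and parse_au_headers are
   hypothetical and are not defined by this specification.

```python
from dataclasses import dataclass
from typing import List, Optional


class BitReader:
    """Reads unsigned integers MSB-first from a byte string."""

    def __init__(self, data: bytes) -> None:
        self.data = data
        self.pos = 0  # current position, in bits

    def read(self, nbits: int) -> int:
        value = 0
        for _ in range(nbits):
            byte = self.data[self.pos // 8]
            value = (value << 1) | ((byte >> (7 - self.pos % 8)) & 1)
            self.pos += 1
        return value


def to_signed(value: int, nbits: int) -> int:
    """Interpret an nbits-wide unsigned value as 2's complement."""
    return value - (1 << nbits) if value >> (nbits - 1) else value


@dataclass
class AUHeader:
    size: Optional[int] = None
    index: Optional[int] = None        # AU-Index, first AU-header only
    index_delta: Optional[int] = None  # AU-Index-delta, non-first only
    cts_delta: Optional[int] = None
    dts_delta: Optional[int] = None


def parse_au_headers(data: bytes, total_bits: int, size_len: int = 0,
                     index_len: int = 0, index_delta_len: int = 0,
                     cts_len: int = 0, dts_len: int = 0) -> List[AUHeader]:
    """Parse concatenated AU-headers; a length of 0 means 'field absent'."""
    reader = BitReader(data)
    headers: List[AUHeader] = []
    while reader.pos < total_bits:
        start = reader.pos
        header = AUHeader()
        if size_len:
            header.size = reader.read(size_len)
        if not headers:
            if index_len:
                header.index = reader.read(index_len)
        elif index_delta_len:
            header.index_delta = reader.read(index_delta_len)
        # The flag is present whenever the delta length is non-zero;
        # the delta itself is present only when the flag is 1.
        if cts_len and reader.read(1):
            header.cts_delta = to_signed(reader.read(cts_len), cts_len)
        if dts_len and reader.read(1):
            header.dts_delta = to_signed(reader.read(dts_len), dts_len)
        if reader.pos == start:
            raise ValueError("all AU-header fields have length zero")
        headers.append(header)
    return headers
```

   Because the CTS-delta and DTS-delta fields are read only when their
   flag is 1, the size of each AU-header is known only after parsing it,
   as stated above.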

555	3.2.2 The Auxiliary Section

557	   The Auxiliary Section consists of the auxiliary-data-size field
558	   followed by the auxiliary-data field. Receivers MAY (but are not
559	   required to) parse the auxiliary-data field; to facilitate skipping
560	   of the auxiliary-data field by receivers, the auxiliary-data-size
561	   field indicates the length in bits of the auxiliary-data. If the
562	   concatenation of the auxiliary-data-size and the auxiliary-data
563	   fields consumes a non-integer number of octets, up to 7 zero padding
564	   bits MUST be inserted immediately after the auxiliary data in order
565	   to achieve byte-alignment. See figure 4.

567	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+- .. -+-+-+-+-+-+-+-+-+
568	   | auxiliary-data-size   | auxiliary-data       |padding bits |
569	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+- .. -+-+-+-+-+-+-+-+-+

571	   Figure 4: The fields in the Auxiliary Section

573	   The length in bits of the auxiliary-data-size field is configurable
574	   by a format parameter; see section 4.1. The default length of zero
575	   indicates that the entire Auxiliary Section is absent.

577	   auxiliary-data-size: specifies the length in bits of the immediately
578	         following auxiliary-data field.

580	   auxiliary-data: contains the Remaining SL headers (RSLHs) as defined
581	         in RFC XXXX.
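
   As an illustration of how a receiver could skip the Auxiliary Section
   without parsing it, the sketch below computes the number of octets the
   section occupies. It assumes the section starts on an octet boundary
   (the preceding AU Header Section is byte-aligned) and that size_length
   carries the value of the AuxiliaryDataSizeLength format parameter. The
   function name is hypothetical.

```python
def auxiliary_section_octets(payload: bytes, offset: int,
                             size_length: int) -> int:
    """Octets used by the Auxiliary Section starting at payload[offset]."""
    if size_length == 0:
        return 0  # default length of zero: the whole section is absent
    # Read enough whole octets to cover the auxiliary-data-size field.
    nbytes = (size_length + 7) // 8
    prefix = int.from_bytes(payload[offset:offset + nbytes], "big")
    aux_bits = prefix >> (nbytes * 8 - size_length)
    # Size field + auxiliary data, rounded up to include padding bits.
    return (size_length + aux_bits + 7) // 8
```

   For example, with an 8-bit auxiliary-data-size field carrying the
   value 12, the section spans 20 bits and therefore occupies 3 octets,
   the last 4 bits being padding.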

583	3.2.3 The Access Unit Data Section

585	   The Access Unit Data Section contains an integer number of complete
586	   Access Units or a single fragment of one AU. The Access Unit Data
587	   Section is never empty. If data of more than one Access Unit is
588	   contained, then the AUs are concatenated into a contiguous string of
589	   octets. See figure 5. The AUs inside the Access Unit Data Section
590	   MUST be in decoding order.

592	   The size and number of Access Units SHOULD be adjusted such that the
593	   resulting RTP packet is not larger than the path-MTU. To handle
594	   larger packets, this payload format relies on lower layers for
595	   fragmentation, which may not be desirable.

597	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
598	   |AU(1)                                                              |
599	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-                                |
600	   |                                                                   |
601	   |     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
602	   |               |AU(2)                                              |
603	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+                                   |
604	   |                                                                   |
605	   |                            -+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
606	   |                               | AU(n)                             |
607	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
608	   |               |
609	   |-+-+-+-+-+-+-+-+

611	   Figure 5: Access Unit Data Section; each AU is byte aligned.

613	   When multiple Access Units are carried, the size of each AU MUST be
614	   made available to the receiver. If the AU size is variable then the
615	   size of each AU MUST be indicated in the AU-size field of the
616	   corresponding AU-header. However, if the AU size is constant for a
617	   stream, this mechanism SHOULD NOT be used, but instead the fixed size
618	   SHOULD be signalled by the format parameter "ConstantSize", see
619	   section 4.1.

621	   The absence of both AU-size in the AU-header and the ConstantSize
622	   format parameter indicates carriage of a single AU (fragment), i.e.
623	   that a single Access Unit (fragment) is transported in each RTP
624	   packet for that stream.
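
   The three cases above (AU-size per AU-header, ConstantSize, or
   neither) can be sketched as follows. The sketch covers complete Access
   Units only; for a fragment the AU-size gives the size of the entire
   AU, so the consistency check below would not apply. The function and
   argument names are hypothetical.

```python
from typing import List, Optional


def split_access_units(data: bytes,
                       au_sizes: Optional[List[int]] = None,
                       constant_size: Optional[int] = None) -> List[bytes]:
    """Split an Access Unit Data Section into complete Access Units."""
    if au_sizes is not None and constant_size is not None:
        raise ValueError("AU-size and ConstantSize are mutually exclusive")
    if au_sizes is None and constant_size is None:
        return [data]  # neither present: a single AU (fragment)
    if constant_size is not None:
        if constant_size <= 0 or len(data) % constant_size:
            raise ValueError("payload is not a multiple of ConstantSize")
        au_sizes = [constant_size] * (len(data) // constant_size)
    if sum(au_sizes) != len(data):
        raise ValueError("AU sizes do not match the payload length")
    units, offset = [], 0
    for size in au_sizes:
        units.append(data[offset:offset + size])
        offset += size
    return units
```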

626	3.2.3.1 Fragmentation

628	   A packet SHALL carry either one or more Access Units, or a single
629	   fragment of an Access Unit.  Fragments of the same Access Unit have
630	   the same time stamp but differing RTP sequence numbers. The marker
631	   bit in the RTP header is 1 on the last fragment of an Access Unit,
632	   and 0 on all other fragments.

634	3.2.3.2 Interleaving

636	   Access Units MAY be interleaved. Senders MAY perform interleaving.
637	   Receivers MUST support interleaving.

639	   When interleaving of Access Units is used it SHALL be implemented
640	   using the AU-Index and AU-Index-delta fields in the AU-header.

642	   Based on the RTP sequence number, the RTP time stamp, the AU-Index and
643	   the AU-Index-delta, a receiver can unambiguously reconstruct the
644	   original order even in case of out-of-order packets, packet loss or
645	   duplication. Note that for this purpose the AU-Index is redundant when
646	   the RTP time stamp and the AU-Index-delta values are sufficient for
647	   placing the AUs correctly in time. In such cases receivers MAY ignore
648	   the AU-Index value and senders MAY code the AU-Index field with the
649	   value 0, but only if they code each AU-Index field with that value.

651	   When interleaving is applied, a de-interleave buffer is needed in
652	   receivers to put the Access Units in their correct logical consecutive
653	   order in time. This requires the computation of the time stamp for
654	   each Access Unit. In case of a fixed time duration per Access Unit,
655	   the time-stamp of each access unit i in an RTP packet with RTP
656	   time-stamp T is calculated as follows:

658	   Timestamp[0] = T
659	   Timestamp[i, i > 0] = T +(Sum(for k=1 to i of (AU-Index-delta[k]
660	                         + 1))) * access-unit-duration

662	   When AU-Index-delta is always 0, this reduces to
663	   T + i * access-unit-duration. This is the non-interleaved case, in
664	   which the frames are consecutive in time. The AU-Index field (present
665	   for the first Access Unit) is not needed in this calculation. Hence,
666	   when access-unit-duration has a fixed and known value, the AU-Index
667	   need not provide index information and can be coded with the value 0.
668	   See also the semantics of the AU-Index field in 3.2.1.1.
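
   The time stamp calculation above can be sketched as follows, assuming
   a fixed access-unit-duration expressed in RTP clock ticks. The
   function name is hypothetical, and wrap-around of the RTP time stamp
   is not handled in this sketch.

```python
from typing import List


def au_timestamps(rtp_timestamp: int, index_deltas: List[int],
                  au_duration: int) -> List[int]:
    """Time stamps of all AUs in one packet.

    index_deltas holds the AU-Index-delta values of the second and
    subsequent AU-headers, in order of appearance.
    """
    timestamps = [rtp_timestamp]  # Timestamp[0] = T
    offset = 0
    for delta in index_deltas:
        offset += delta + 1       # running sum of (AU-Index-delta[k] + 1)
        timestamps.append(rtp_timestamp + offset * au_duration)
    return timestamps
```

   With all AU-Index-delta values equal to 0 the result is T,
   T + duration, T + 2 * duration, and so on, which is the
   non-interleaved case.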

670	   When an RTP packet arrives (after any re-ordering has been done),
671	   receivers may 'flush' all Access Units from the interleave buffer
672	   which have a time-stamp strictly less than the time-stamp of the
673	   arriving packet. Similarly the first Access Unit of every arriving
674	   packet can always be flushed (as no following packet can provide an
675	   earlier Access Unit), and any Access Units which are consecutive with
676	   it which have already been received. Access Units should also be
677	   flushed in time to be played; this can be important if there is loss
678	   before end-of-stream, before a silence interval, or before a large
679	   drop-out.

681	3.2.3.3 Constraints for interleaving

683	   The size of the packets should be suitably chosen to be appropriate
684	   to both the path MTU and the duration and capacity of the receiver's
685	   de-interleave buffer. The maximum packet size for a session should be
686	   chosen not to exceed the path MTU.

688	   In order to control receiver latency and mitigate the effects of loss,
689	   there are profile-based limits on the size of the packet. This is
690	   expressed as a duration: it is calculated from the duration of the
691	   Access Units contained within a packet. It is NOT the difference in
692	   time-stamp between the first and last Access Unit in a packet.

694	   No matter what interleaving scheme is used, the scheme must be
695	   analyzed to calculate the minimum number of frames a receiver has to
696	   buffer in order to de-interleave.

698	   The maximum packet duration in milliseconds, and the maximum
699	   de-interleave buffer required at the receiver, for the two profiles,
700	   shall not exceed:

702	   RTP transport profile 0 -- 200 milliseconds
703	   RTP transport profile 1 -- 500 milliseconds

705	   When interleaving is applied, the applied RTP transport profile MUST
706	   be signalled by the profile parameter; see section 4.1.

708	   Note that for low bit-rate material, the duration limit may make
709	   packets shorter than the MTU size.

711	3.3 Usage of this specification

713	3.3.1 General

715	   Usage of this specification requires definition of a mode. A mode
716	   defines how to use this specification for transport of one or more
717	   types of MPEG-4 streams. Each mode may specify constraints and payload
718	   configurations as deemed appropriate.

720	   Senders MUST signal the mode that they use by the format parameter
721	   Mode. This document defines modes only for transport of MPEG-4 CELP
722	   and AAC streams, but more modes are expected to be defined in future
723	   RFCs.

725	3.3.2 Modes for MPEG-4 CELP and AAC streams

727	   Four modes are defined for transport of MPEG-4 CELP and AAC streams.
728	   In each of these modes, the same requirements apply for the rtpmap
729	   attributes. The general form of an rtpmap attribute is:
730	   a=rtpmap:<payload type> <encoding name>/<clock rate>[/<encoding
731	             parameters>]
732	   For audio streams, <encoding parameters> specifies the number of
733	   audio channels. This parameter may be omitted if the number of
734	   channels is one, provided no additional parameters are needed.
735	   In all four modes, the following attributes are REQUIRED:
736	   a) The encoding name
737	   b) The RTP clock rate MUST be expressed. It is RECOMMENDED that this
738	      be the sampling rate of the audio, to give sample-accurate timing.
739	      However, other rates MAY be used (e.g. 90 kHz).
740	   c) The number of audio channels MUST be specified, for example as 2
741	      for stereo material (see RFC 2327) and MAY be specified as 1 for
742	      mono material; 1 is the default.

744	3.3.3 Constant bit-rate CELP

746	   This mode is signalled by mode=CELP-cbr. In this mode one or more
747	   fixed size CELP frames can be transported in one RTP packet; there is
748	   no support for interleaving. The RTP payload consists of one or more
749	   concatenated CELP frames, each of the same size. Both the AU Header
750	   Section and the Auxiliary Section are empty.

752	   The format parameter ConstantSize MUST be provided to specify the
753	   length of each CELP frame.

755	   For an example see below.

757	   m=audio 49230 RTP/AVP 96
758	   a=rtpmap:96 mpeg4-generic/44100/2
759	   a=fmtp:96 streamtype=5; profile-level-id=15; mode=CELP-cbr; config=
760	   AudioSpecificConfig(); ConstantSize=xxx;

762	   The AudioSpecificConfig() specifies that the audio stream type is CELP.

764	3.3.4 Variable bit-rate CELP

766	   This mode is signalled by mode=CELP-vbr. In this mode one or more
767	   variable size CELP frames can be transported in one RTP packet with
768	   optional interleaving. As the largest possible frame size in this mode
769	   is greater than the maximum CELP frame size, there is no support for
770	   fragmentation of CELP frames.

772	   In this mode the RTP payload consists of the AU Header Section,
773	   followed by one or more concatenated CELP frames. The Auxiliary Section
774	   is empty. For each CELP frame contained in the payload there is a one
775	   octet AU-header in the AU Header Section to provide :
776	   (a) the size of each CELP frame in the payload and
777	   (b) index information for computing the sequence (and hence timing) of
778	       each CELP frame.
779	   Transport of CELP frames requires that the AU-size field is coded with
780	   6 bits. In this mode therefore 6 bits are allocated to the AU-size
781	   field, and 2 bits to the AU-Index(-delta) field. Each AU-Index field
782	   MUST be coded with the value 0. In the AU Header Section, the
783	   concatenated AU-headers are preceded by the 16-bit AU-headers-length
784	   field, as specified in 3.2.1.

786	   Next to the required format parameters, the following parameters MUST
787	   be present:
788	   SizeLength, IndexLength, and IndexDeltaLength.
789	   When interleaving is applied (AU-Index-delta coded with a value larger
790	   than 0), also the parameter Profile MUST be present.

792	   Example :

794	   m=audio 49230 RTP/AVP 96
795	   a=rtpmap:96 mpeg4-generic/44100/2
796	   a=fmtp:96 streamtype=5; profile-level-id=15; mode=CELP-vbr; config=
797	   AudioSpecificConfig(); SizeLength=6; IndexLength=2; IndexDeltaLength=2;
798	   Profile=1

800	   The AudioSpecificConfig() specifies that the audio stream type is CELP.

802	3.3.5 Low bit-rate AAC

804	   This mode is signalled by mode=AAC-lbr. This mode supports transport
805	   of one or more variable size AAC frames with optional support for
806	   interleaving and fragmenting. The maximum size of an AAC frame
807	   (fragment) in this mode is 63 octets.

809	   The payload configuration in this mode is the same as in the variable
810	   bit-rate CELP mode as defined in 3.3.4. The RTP payload consists of the
811	   AU Header Section, followed by concatenated AAC frames. The Auxiliary
812	   Section is empty. For each AAC frame contained in the payload the one
813	   octet AU-header provides :
814	   (a) the size of each AAC frame in the payload and
815	   (b) index information for computing the sequence (and hence timing) of
816	       each AAC frame.
817	   The AU-size is coded with 6 bits and the AU-Index(-delta) with 2
818	   bits; the AU-Index field MUST have the value 0 in each AU-header.
819	   In the AU Header Section, the concatenated AU-headers are preceded by
820	   the 16-bit AU-headers-length field, as specified in 3.2.1.

822	   Next to the required format parameters, the following parameters MUST
823	   be present:
824	   SizeLength, IndexLength, and IndexDeltaLength.
825	   When interleaving is applied (AU-Index-delta coded with a value larger
826	   than 0), also the parameter Profile MUST be present.

828	   Example :

830	   m=audio 49230 RTP/AVP 96
831	   a=rtpmap:96 mpeg4-generic/44100/2
832	   a=fmtp:96 streamtype=5; profile-level-id=15; mode=AAC-lbr; config=
833	   AudioSpecificConfig(); SizeLength=6; IndexLength=2; IndexDeltaLength=2;
834	   Profile=1

836	   The AudioSpecificConfig() specifies that the audio stream type is AAC.

838	3.3.6 High bit-rate AAC

840	   This mode is signalled by mode=AAC-hbr. This mode supports transport
841	   of one or more large variable size AAC frames in one RTP packet with
842	   optional support for interleaving and fragmenting. The maximum size of
843	   an AAC frame (fragment) in this mode is 8191 bytes.

845	   In this mode the RTP payload consists of the AU Header Section,
846	   followed by one or more concatenated AAC frames. The Auxiliary Section
847	   is empty. For each AAC frame contained in the payload there is an
848	   AU-header in the AU Header Section to provide :
849	   (a) the size of each AAC frame in the payload and
850	   (b) index information for computing the sequence (and hence timing) of
851	       each AAC frame.
852	   To code the maximum size of an AAC frame requires 13 bits. Therefore in
853	   this configuration 13 bits are allocated to the AU-size, and 3 bits
854	   to the AU-Index(-delta) field. Thus each AU-header has a size of 2
855	   octets. Each AU-Index field MUST be coded with the value 0. In the
856	   AU Header Section, the concatenated AU-headers are preceded by the
857	   16-bit AU-headers-length field, as specified in 3.2.1.

859	   Next to the required format parameters, the following parameters MUST
860	   be present:
861	   SizeLength, IndexLength, and IndexDeltaLength.
862	   When interleaving is applied (AU-Index-delta coded with a value larger
863	   than 0), also the parameter Profile MUST be present.

865	   Example :
866	   m=audio 49230 RTP/AVP 96
867	   a=rtpmap:96 mpeg4-generic/44100/2
868	   a=fmtp:96 streamtype=5; profile-level-id=15; mode=AAC-hbr; config=
869	   AudioSpecificConfig(); SizeLength=13; IndexLength=3; IndexDeltaLength=3;
870	   Profile=1

872	   The AudioSpecificConfig() specifies that the audio stream type is AAC.
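
   As an illustration of the 2-octet AU-header used in this mode, the
   sketch below packs a 13-bit AU-size and a 3-bit AU-Index(-delta) into
   two octets, most significant bit first. The function name is
   hypothetical, and the sketch does not produce the 16-bit
   AU-headers-length field that precedes the concatenated AU-headers.

```python
import struct


def pack_hbr_au_header(au_size: int, index_or_delta: int = 0) -> bytes:
    """Pack one AAC-hbr AU-header: 13 bits AU-size, 3 bits index/delta."""
    if not 0 <= au_size <= 0x1FFF:
        raise ValueError("AU-size must fit in 13 bits (at most 8191)")
    if not 0 <= index_or_delta <= 0x7:
        raise ValueError("AU-Index(-delta) must fit in 3 bits")
    # Network byte order; the AU-size occupies the 13 high-order bits.
    return struct.pack("!H", (au_size << 3) | index_or_delta)
```

   For an AAC frame of 100 octets with AU-Index 0, the header is the
   octet pair 0x03 0x20.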

874	4. IANA considerations

876	   This payload format uses the same MIME types and names as defined
877	   in RFC XXXX. However, some additional format parameters are defined.

879	   Depending on the required payload configuration, format parameters may
880	   need to be available to the receiver. This is done using the parameters
881	   described in the next section. The absence of any of these parameters
882	   is equivalent to the associated field set to its default value, which
883	   is always zero. The absence of any such parameters resolves into a
884	   default "basic" configuration.

886	   MIME subtype name: mpeg4-generic

888	   Required parameters:

890	      StreamType:

892	      The integer value that indicates the type of MPEG-4 stream that is
893	      carried; its coding corresponds to the values of the streamType as
894	      defined for the DecoderConfigDescriptor in ISO/IEC 14496-1.

896	      Profile-level-id:
897	      A decimal representation of the MPEG-4 Profile Level indication.
898	      This parameter MUST be used in the capability exchange or session
899	      set-up procedure to indicate the MPEG-4 Profile and Level
900	      combination of which the relevant MPEG-4 media codec is
901	      capable.
902	      For audio streams, this parameter is the decimal value from Table 5
903	      (audioProfileLevelIndicationValues) in ISO/IEC 14496-1, indicating
904	      which MPEG-4 Audio tool subsets are applied to encode the audio
905	      stream.
906	      For visual streams, this parameter is the decimal value from Table
907	      G-1 (FLC table for profile and level indication) of ISO/IEC 14496-2,
908	      indicating which MPEG-4 Visual tool subsets are applied to encode
909	      the visual stream.

911	      Config:
912	      A hexadecimal representation of an octet string that expresses the
913	      media payload configuration. Configuration data is mapped onto the
914	      octet string on an MSB-first basis. The first bit of the
915	      configuration data SHALL be located at the MSB of the first octet.
916	      In the last octet, if necessary to achieve byte alignment, up to
917	      7 zero-valued padding bits shall follow the configuration data.
918	      For audio streams, config is the audio object type specific decoder
919	      configuration data AudioSpecificConfig() as defined in ISO/IEC
920	      14496-3.
921	      For visual streams, config is the MPEG-4 Visual configuration
922	      information, as defined in subclause 6.2.1 Start codes of
923	      ISO/IEC 14496-2. The configuration information indicated by this
924	      parameter SHALL be the same as the configuration information in the
925	      corresponding MPEG-4 Visual stream, except for
926	      first-half-vbv-occupancy and latter-half-vbv-occupancy, if present,
927	      which may vary in the repeated configuration information inside an
928	      MPEG-4 Visual stream (see 6.2.1 Start codes of ISO/IEC 14496-2).
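
      The MSB-first mapping with zero-valued padding described for the
      Config parameter can be sketched as follows. The function name is
      hypothetical, the configuration data is given as a string of '0'
      and '1' characters for readability, and the use of lowercase
      hexadecimal digits is an assumption of this sketch.

```python
def config_to_hex(bits: str) -> str:
    """Map a bit string onto octets MSB-first and return it as hex."""
    bits = bits.replace(" ", "")
    if set(bits) - {"0", "1"}:
        raise ValueError("expected only '0' and '1' characters")
    if not bits:
        return ""
    # Append up to 7 zero-valued padding bits to reach an octet boundary.
    padded = bits + "0" * (-len(bits) % 8)
    return int(padded, 2).to_bytes(len(padded) // 8, "big").hex()
```

      For example, the 13-bit string 0001001000010 is padded with three
      zero bits and is expressed as 1210.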

930	   Optional parameters:

932	      Mode:
933	      The mode in which this specification is used. The following modes
934	      can be signalled :
935	      mode=CELP-cbr,
936	      mode=CELP-vbr,
937	      mode=AAC-lbr and
938	      mode=AAC-hbr.
939	      Other modes are expected to be defined in future RFCs. When defining
940	      a new mode care MUST be taken that an implementation of all features
941	      of this specification can decode the payload format corresponding to
942	      this new mode. For this reason a mode MUST NOT specify new default
943	      values for MIME parameters; in particular, MIME parameters MUST be
944	      present (unless they have the default value), even if they are
945	      redundant because the mode assigns fixed values. A mode may
946	      additionally define that some MIME parameters are required instead
947	      of optional, that some MIME parameters have fixed values (or
948	      ranges), and that there are rules restricting the usage.

950	      ConstantSize:
951	      The constant size in octets of each Access Unit for this stream.
952	      Simultaneous presence of ConstantSize and the SizeLength
953	      parameters is not permitted.

955	      SizeLength:
956	      The number of bits on which the AU-size field is encoded in the
957	      AU-header. Simultaneous presence of SizeLength and the ConstantSize
958	      parameter is not permitted.

960	      IndexLength:
961	      The number of bits on which the AU-Index is encoded in the first
962	      AU-header. The default value of zero indicates the absence of the
963	      AU-Index and AU-Index-delta fields in each AU-header.

965	      IndexDeltaLength:
966	      The number of bits on which the AU-Index-delta field is encoded in
967	      any non-first AU-header.

969	      CTSDeltaLength:
970	      The number of bits on which the CTS-delta field is encoded in the
971	      AU-header.

973	      DTSDeltaLength:
974	      The number of bits on which the DTS-delta field is encoded in the
975	      AU-header.

977	      AuxiliaryDataSizeLength:
978	      The number of bits that is used to encode the auxiliary-data-size
979	      field.

981	      Profile:
982	      The decimal representation of the RTP transport profile.

984	   Applications MAY use more parameters, in addition to those defined
985	   above. Receivers MUST tolerate the presence of such additional
986	   parameters, but these parameters SHALL NOT affect decoding by
987	   receivers that comply with this specification.

989	   Encoding considerations:
990	   System bitstreams MUST be generated according to MPEG-4 System
991	   specifications (ISO/IEC 14496-1). Video bitstreams MUST be generated
992	   according to MPEG-4 Visual specifications (ISO/IEC 14496-2). Audio
993	   bitstreams MUST be generated according to MPEG-4 Audio
994	   specifications (ISO/IEC 14496-3). The RTP packets MUST be packetized
995	   according to the RTP payload format defined in RFC <self-reference-to-
996	   this>.

998	   Security considerations:
999	   As in RFC <self-reference-to-this>.

1001	   Interoperability considerations:
1002	   MPEG-4 provides a large and rich set of tools for the coding of
1003	   visual objects.  For effective implementation of the standard,
1004	   subsets of the MPEG-4 tool sets have been provided for use in
1005	   specific applications. These subsets, called 'Profiles', limit the
1006	   size of the tool set a decoder is required to implement. In order to
1007	   restrict computational complexity, one or more 'Levels' are set for
1008	   each Profile. A Profile@Level combination allows:
1009	   . a codec builder to implement only the subset of the standard he
1010	     needs, while maintaining interworking with other MPEG-4 devices
1011	     included in the same combination, and
1012	   . checking whether MPEG-4 devices comply with the standard
1013	     ('conformance testing').
1014	   A stream SHALL be compliant with the MPEG-4 Profile@Level specified
1015	   by the parameter "profile-level-id". Interoperability between a
1016	   sender and a receiver may be achieved by specifying the parameter
1017	   "profile-level-id" in MIME content, or by arranging in the
1018	   capability exchange/announcement procedure to set this parameter
1019	   mutually to the same value.

1021	   Published specification:
1022	   The specifications for MPEG-4 streams are presented in ISO/IEC
1023	   14496-1, 14496-2, and 14496-3.  The RTP payload format is described
1024	   in RFC <self-reference-to-this>.

1026	   Applications which use this media type:
1027	   Multimedia streaming and conferencing tools, Internet messaging and
1028	   Email applications.

1030	   Additional information: none

1032	   Magic number(s): none

1034	   File extension(s):
1035	   None. A file format with the extension .mp4 has been defined for
1036	   MPEG-4 content but is not directly correlated with this MIME type,
1037	   whose sole purpose is RTP transport.

1039	   Macintosh File Type Code(s): none

1041	   Person & email address to contact for further information:
1042	   Authors of RFC <self-reference-to-this>.

1044	   Intended usage: COMMON

1046	   Author/Change controller:
1047	   Authors of RFC <self-reference-to-this>.

1049	4.2 Concatenation of parameters

1051	   Multiple parameters SHOULD be expressed as a MIME media type string,
1052	   in the form of a semicolon-separated list of parameter=value pairs
1053	   (for parameter usage examples see Appendix A).

1055	4.3 Usage of SDP

1057	4.3.1 The a=fmtp keyword

1059	   It is assumed that one typical way to transport the above-described
1060	   parameters associated with this payload format is via an SDP message
1061	   [7], for example transported to the client in reply to an RTSP
1062	   DESCRIBE or via SAP. In that case the (a=fmtp) keyword MUST be used as
1063	   described in RFC 2327 [7, section 6]. The syntax is then:

1065	   a=fmtp:<format> <parameter name>=<value>[; <parameter name>=<value>]
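
   A receiver could split such a line into its format and parameters as
   sketched below. The function name is hypothetical. Lowercasing the
   parameter names is an assumption of this sketch, made because the
   examples in this document write the same parameter both as
   "StreamType" and as "streamtype".

```python
from typing import Dict, Tuple


def parse_fmtp(line: str) -> Tuple[str, Dict[str, str]]:
    """Split an a=fmtp line into (format, {parameter name: value})."""
    prefix = "a=fmtp:"
    if not line.startswith(prefix):
        raise ValueError("not an a=fmtp line")
    fmt, _, rest = line[len(prefix):].partition(" ")
    params: Dict[str, str] = {}
    for item in rest.split(";"):
        item = item.strip()
        if not item:
            continue  # tolerate a trailing semicolon
        name, _, value = item.partition("=")
        params[name.strip().lower()] = value.strip()
    return fmt, params
```

   Parameters that are absent from the result take their default value
   of zero, as described in section 4.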

1067	5. Security Considerations

1069	   No additional security considerations apply beyond those discussed in
1070	   RFC 1889 and RFC XXXX.

1072	6. Acknowledgements

1074	   This document evolved through several revisions thanks to contributions
1075	   from people from the ISMA forum, from the IETF AVT working group and
1076	   the 4-on-IP ad-hoc group within MPEG. The authors wish to thank all
1077	   involved people, and in particular Colin Perkins, Stephan Wenger and
1078	   Dorairaj V for their valuable comments and support.

1080	7. References

1082	   [1] ISO/IEC International Standard 14496 (MPEG-4); "Information
1083	   technology - Coding of audio-visual objects", January 2000

1085	   [2] Schulzrinne, Casner, Frederick, Jacobson RTP: A Transport
1086	   Protocol for Real Time Applications  RFC 1889, Internet Engineering
1087	   Task Force, January 1996.

1089	   [3] S. Bradner, Key words for use in RFCs to Indicate Requirement
1090	   Levels, RFC 2119, March 1997.

1092	   [4] D. Hoffman, G. Fernando, V. Goyal, M. Civanlar, RTP payload
1093	   format for MPEG1/MPEG2 Video, RFC 2250, January 1998.

1095	   [5] Y. Kikuchi, T. Nomura, S. Fukunaga, Y. Matsui, H. Kimata, RTP
1096	   payload format for MPEG-4 Audio/Visual streams, RFC 3016.

1098	   [6] Avaro, Basso, Casner, Civanlar, Gentric, Herpel, Lim, Perkins,
1099	   van der Meer, RTP payload format for MPEG-4 streams, work in progress,
1100	   draft-gentric-avt-mpeg4-multiSL-01.txt, January 2001.

1102	   [7] Handley, Jacobson, SDP: Session Description Protocol, RFC 2327,
1103	   Internet Engineering Task Force, April 1998.

1105	8. Authors' Addresses

1107	   Jan van der Meer
1108	   Philips Digital Networks
1109	   Cederlaan 4
1110	   5600 JB Eindhoven
1111	   Netherlands
1112	   Email : jan.vandermeer@philips.com

1114	   David Mackie
1115	   Cisco Systems Inc.
1116	   170 West Tasman Dr.
1117	   San Jose, CA 95034
1118	   Email: dmackie@cisco.com

1120	   Viswanathan Swaminathan
1121	   Sun Microsystems Inc.
1122	   901 San Antonio Road, M/S UMPK15-214
1123	   Palo Alto, CA 94303
1124	   Email: viswanathan.swaminathan@sun.com

1126	   David Singer
1127	   Apple Computer, Inc.
1128	   One Infinite Loop, MS:302-3MT
1129	   Cupertino  CA 95014
1130	   Email: singer@apple.com

1132	   Full Copyright Statement

1134	   "Copyright (C) The Internet Society (date). All Rights Reserved. This
1135	   document and translations of it may be copied and furnished to others,
1136	   and derivative works that comment on or otherwise explain it or assist
1137	   in its implementation may be prepared, copied, published and
1138	   distributed, in whole or in part, without restriction of any kind,
1139	   provided that the above copyright notice and this paragraph are
1140	   included on all such copies and derivative works. However, this
1141	   document itself may not be modified in any way, such as by removing
1142	   the copyright notice or references to the Internet Society or other
1143	   Internet organizations, except as needed for the purpose of developing
1144	   Internet standards in which case the procedures for copyrights defined
1145	   in the Internet Standards process MUST be followed, or as required to
1146	   translate it into.

1148	APPENDIX: Usage of this payload format

1150	Appendix A. Examples

1152	A.1 Examples of delay analysis with interleave

1154	A.1.1 Group interleave

1156	   An example of regular interleave is when packets are formed into
1157	   groups.  If the number of packets in a group is N, packet 0 contains
1158	   frame 0, frame N, frame 2N, and so on; packet 1 contains frame 1,
1159	   frame 1+N, frame 1+2N, and so on.  The AU-Index field is used to
1160	   document the sequence of the packet within the group (or the first
1161	   frame in the packet, which is the same thing in this scheme), and
1162	   all the AU-Index-delta fields contain N-1.

1164	   Receivers can tell that a new interleave group is starting by noting
1165	   that the computed time-stamp of the first frame in a packet is later
1166	   than any previously computed time-stamp.  This works because no
1167	   following packet can contain an earlier RTP time-stamp (RTP rules),
1168	   and the second and subsequent frames in a packet have larger
1169	   time-stamps (the frames in a packet are also in time order).

1171	   If the group size is 3, then packets are formed as follows:

1173	   Packet   Time-stamp   Frame Numbers       AU-Index, AU-Index-delta
1174	   0        T[0]         0, 3, 6             0, 2, 2
1175	   1        T[1]         1, 4, 7             0, 2, 2
1176	   2        T[2]         2, 5, 8             0, 2, 2
1177	   3        T[9]         9, 12, 15           0, 2, 2
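
   The packet layout in the table above can be reproduced with a short
   sketch.  It is illustrative only: the function name and the
   assumption that every packet carries the same number of frames are
   not part of this specification.

```python
def group_interleave(num_groups, group_size, per_packet):
    """Return one (first_frame, frames, index_fields) tuple per packet.

    Hypothetical helper: a group of `group_size` packets is assumed to
    cover group_size * per_packet consecutive frames.
    """
    packets = []
    frames_per_group = group_size * per_packet
    for group in range(num_groups):
        base = group * frames_per_group
        for k in range(group_size):
            # Packet k of the group takes frames k, k+N, k+2N, ...
            frames = [base + k + j * group_size for j in range(per_packet)]
            # AU-Index is 0; every AU-Index-delta is N-1.
            fields = [0] + [group_size - 1] * (per_packet - 1)
            packets.append((frames[0], frames, fields))
    return packets


table = group_interleave(num_groups=2, group_size=3, per_packet=3)
for number, (first, frames, fields) in enumerate(table[:4]):
    print(number, f"T[{first}]", frames, fields)
```

   The first element of each tuple is the frame whose time-stamp the
   packet carries, so packet 3 is listed under T[9].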

1179	   In this case, the receiver has to buffer at least 4 frames from
1180	   packets 0 and 1, and can flush all frames when packet 2 arrives.
1181	   (Frame 0 can be flushed as packet 0 arrives, since it is the earliest
1182	   frame we hold, and likewise frame 1 from packet 1; we are therefore
1183	   holding frames 3, 4, 6 and 7 until packet 2 arrives.)
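
   The buffer requirement can be checked with a small simulation.  This
   is a sketch under stated assumptions (no loss, no reordering, frame
   numbers starting at 0); the function name is hypothetical.

```python
def peak_buffered(packets):
    """Largest number of frames held back after any packet arrival.

    `packets` is a list of frame-number lists in arrival order.
    """
    held = set()
    next_frame = 0          # next frame due for output
    peak = 0
    for frames in packets:
        held.update(frames)
        # Flush every frame that is now contiguous with the output.
        while next_frame in held:
            held.remove(next_frame)
            next_frame += 1
        peak = max(peak, len(held))
    return peak


group = [[0, 3, 6], [1, 4, 7], [2, 5, 8]]
print(peak_buffered(group))
```

   For the group of three packets above, the peak is reached after
   packet 1 arrives, when frames 3, 4, 6 and 7 are held.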

1185	   If there is loss, then the receiver may wait longer than is strictly
1186	   necessary before it emits frames.  For example, say packet 1 is lost
1187	   from the above example.  Packet 0 allows frame 0 to be emitted.  Then
1188	   packet 2 arrives, allowing us to notice the loss of frame 1 and to
1189	   emit frames 2 and 3.  Not until the arrival of packet 3 (which has a
1190	   time-stamp beyond the times of all the frames seen so far) can we
1191	   finish dealing with the loss, even though the first group has, in
1192	   fact, ended.  (This is in contrast to schemes that signal the group
1193	   size explicitly: if the receiver knows that this is packet 3 of 3,
1194	   then even if packet 2 of 3 is missing, it can de-interleave this
1195	   group without waiting for the next one to start.)
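
   The loss handling described above can be sketched as follows.  The
   sketch assumes that packets are not reordered and that frame numbers
   have already been derived from the time-stamps; the function name
   and the event tuples are hypothetical.

```python
def deinterleave(packets):
    """Return, per arriving packet, a list of ("emit" | "lost", frame).

    `packets` holds frame-number lists in arrival order.
    """
    held = set()
    next_frame = 0                      # next frame due for output
    events = []
    for frames in packets:
        first = frames[0]
        held.update(frames)
        out = []
        # No later packet can carry a time-stamp earlier than `first`,
        # so every frame before it is either held or lost for good.
        while next_frame < first:
            if next_frame in held:
                held.remove(next_frame)
                out.append(("emit", next_frame))
            else:
                out.append(("lost", next_frame))
            next_frame += 1
        # Frames contiguous with the output can be emitted at once.
        while next_frame in held:
            held.remove(next_frame)
            out.append(("emit", next_frame))
            next_frame += 1
        events.append(out)
    return events


# Group size 3 with packet 1 (frames 1, 4, 7) lost in transit.
arrived = [[0, 3, 6], [2, 5, 8], [9, 12, 15]]
events = deinterleave(arrived)
for out in events:
    print(out)
```

   The losses of frames 4 and 7 are reported only in the last entry,
   that is, when the packet starting the next group arrives.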

1197	   In the above example the AU-Index is coded with the value 0, as
1198	   required for the modes defined in this document. To reconstruct the
1199	   original order, the RTP time-stamp and the AU-Index-delta are used.
1200	   See also 3.2.3.2.
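
   As an illustration of this reconstruction, the sketch below derives
   the frame numbers of one packet.  It assumes a constant AU duration
   in RTP clock ticks and a stream whose frame 0 has time-stamp 0;
   neither assumption, nor the function name, nor the value 1024, comes
   from this document.

```python
def au_numbers(rtp_timestamp, au_duration, index_fields):
    """Frame numbers of the AUs carried in one packet.

    index_fields[0] is the AU-Index (0 in this document); the
    remaining entries are AU-Index-delta values.
    """
    number = rtp_timestamp // au_duration   # first AU, from time-stamp
    numbers = [number]
    for delta in index_fields[1:]:
        # A delta of N-1 means the next AU lies N frames further on.
        number += delta + 1
        numbers.append(number)
    return numbers


# Packet 3 of the group example, with an arbitrary duration of 1024.
print(au_numbers(9 * 1024, 1024, [0, 2, 2]))
```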

1202	A.1.2 Continuous interleave

1204	   In continuous interleave, once the scheme is 'primed', the number of
1205	   frames in a packet exceeds the 'stride' (the distance between them).
1206	   This shortens the buffering needed, smooths the data-flow, and gives
1207	   slightly larger packets, and thus lower overhead, for the same
1208	   interleave.  For example, here is a continuous interleave also over a
1209	   stride of 3 frames, but with 4 frames per packet, for a run of 21
1210	   frames (numbered 0 to 20).  This shows both how the scheme 'starts
1211	   up' and how it finishes.

1213	   Packet   Time-stamp   Frame Numbers       AU-Index, AU-Index-delta
1214	   0        T[0]                      0      0
1215	   1        T[1]                  1   4      0  2
1216	   2        T[2]              2   5   8      0  2  2
1217	   3        T[3]          3   6   9  12      0  2  2  2
1218	   4        T[7]          7  10  13  16      0  2  2  2
1219	   5        T[11]        11  14  17  20      0  2  2  2
1220	   6        T[15]        15  18              0  2
1221	   7        T[19]        19                  0
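
   The table above can be generated by the sketch below.  The rule it
   uses (the last slot of packet k is frame k times the number of
   frames per packet, and slots outside the run are dropped) is
   inferred from the table, not stated by this document, and the
   function name is hypothetical.

```python
def continuous_interleave(num_frames, stride, per_packet):
    """Return one (first_frame, frames, index_fields) tuple per packet.

    This covers every frame exactly once when stride and per_packet
    share no common factor (an assumption of this sketch).
    """
    packets = []
    k = 0
    # Stop once even the earliest slot of packet k lies past the run.
    while k * per_packet - (per_packet - 1) * stride < num_frames:
        last = k * per_packet
        slots = [last - j * stride for j in reversed(range(per_packet))]
        frames = [n for n in slots if 0 <= n < num_frames]
        if frames:
            # AU-Index is 0; each AU-Index-delta is stride - 1.
            fields = [0] + [stride - 1] * (len(frames) - 1)
            packets.append((frames[0], frames, fields))
        k += 1
    return packets


layout = continuous_interleave(num_frames=21, stride=3, per_packet=4)
for number, (first, frames, fields) in enumerate(layout):
    print(number, f"T[{first}]", frames, fields)
```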

1223	   In this case, the receiver has to buffer only 3 frames, not 4.  Say
1224	   we are waiting for packet 4.  We have flushed frames 0 to 6 and are
1225	   therefore holding 8, 9, 12.  Packet 4 arrives, allowing us to emit
1226	   7, 8, 9, 10, and we are then holding 12, 13, 16.  Once primed, each
1227	   arriving packet contains 4 frames and allows 4 frames to be flushed.
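
   A per-packet trace makes this flushing behaviour visible.  The
   sketch assumes no loss and no reordering; the function name is
   hypothetical and the packet contents are copied from the table.

```python
def flush_trace(packets):
    """Per packet: (frames emitted, frames still held afterwards)."""
    held = set()
    next_frame = 0          # next frame due for output
    trace = []
    for frames in packets:
        held.update(frames)
        emitted = []
        # Emit everything that is now contiguous with the output.
        while next_frame in held:
            held.remove(next_frame)
            emitted.append(next_frame)
            next_frame += 1
        trace.append((emitted, sorted(held)))
    return trace


packets = [[0], [1, 4], [2, 5, 8], [3, 6, 9, 12], [7, 10, 13, 16],
           [11, 14, 17, 20], [15, 18], [19]]
trace = flush_trace(packets)
for number, (emitted, held) in enumerate(trace):
    print(number, emitted, held)
```

   In this trace, no more than 3 frames are held after any arrival.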

1229	   In the above example the AU-Index is coded with the value 0, as
1230	   required for the modes defined in this document. To reconstruct the
1231	   original order, the RTP time-stamp and the AU-Index-delta are used.
1232	   See also 3.2.3.2.

1234	   If there is loss, again the receiver has to wait to emit the erasure
1235	   frames.  In this case, say packet 3 is lost.  We were holding frames
1236	   4, 5, and 8.  On the arrival of packet 4 (time-stamp of frame 7), we
1237	   know that frame 3 was lost; we can emit frames 4 and 5, note that 6
1238	   must also be lost, and emit 7, which is in the packet that arrived.
1239	   Then, on the arrival of packet 5 (time-stamp 11), we can emit 8,
1240	   indicate loss of 9, and emit 10 and 11.  Finally, the arrival of
1241	   packet 6 (time-stamp 15) indicates that 12 must be lost; we have now
1242	   detected all the lost frames.
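
   The point at which each loss becomes detectable can be sketched as
   follows.  The sketch only reports losses, not emission; it assumes
   packets are not reordered, and the function name is hypothetical.

```python
def detect_losses(packets):
    """Return (first_frame, newly_detected_losses) per arriving packet.

    `packets` holds frame-number lists in arrival order.
    """
    received = set()
    horizon = 0             # frames below this are already accounted for
    report = []
    for frames in packets:
        first = frames[0]
        # No later packet can start before `first`, so any frame below
        # it that has not been received by now is lost.
        lost = [n for n in range(horizon, first) if n not in received]
        received.update(frames)
        horizon = max(horizon, first)
        report.append((first, lost))
    return report


# Continuous interleave with packet 3 (frames 3, 6, 9, 12) lost.
arrived = [[0], [1, 4], [2, 5, 8], [7, 10, 13, 16],
           [11, 14, 17, 20], [15, 18], [19]]
report = detect_losses(arrived)
for first, lost in report:
    print(f"T[{first}]", lost)
```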