idnits 2.17.1 

draft-ietf-avt-rfc3016bis-02.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

  ** The abstract seems to contain references ([RFC3640]), which it
     shouldn't.  Please replace those with straight textual mentions of the
     documents in question.

  -- The draft header indicates that this document obsoletes RFC3016, but the
     abstract doesn't seem to mention this, which it should.


  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == The copyright year in the IETF Trust and authors Copyright Line does not
     match the current year

  -- The document date (January 11, 2011) is 4853 days in the past.  Is this
     intentional?


  Checking references for intended status: Proposed Standard
  ----------------------------------------------------------------------------

     (See RFCs 3967 and 4897 for information about using normative references
     to lower-maturity documents in RFCs)

  -- Possible downref: Non-RFC (?) normative reference: ref. '14496-2'

  -- Possible downref: Non-RFC (?) normative reference: ref. '14496-3'

  -- Possible downref: Non-RFC (?) normative reference: ref. '23003-1'

  ** Obsolete normative reference: RFC 3016 (Obsoleted by RFC 6416)

  ** Obsolete normative reference: RFC 4288 (Obsoleted by RFC 6838)

  ** Obsolete normative reference: RFC 4566 (Obsoleted by RFC 8866)

  -- Obsolete informational reference (is this intentional?): RFC 5246
     (Obsoleted by RFC 8446)


     Summary: 4 errors (**), 0 flaws (~~), 1 warning (==), 6 comments (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.

--------------------------------------------------------------------------------


2	AVT                                                           M. Schmidt
3	Internet-Draft                                        Dolby Laboratories
4	Obsoletes: 3016 (if approved)                                 F. de Bont
5	Intended status: Standards Track                     Philips Electronics
6	Expires: July 15, 2011                                         S. Doehla
7	                                                          Fraunhofer IIS
8	                                                            Jaehwan. Kim
9	                                                     LG Electronics Inc.
10	                                                        January 11, 2011

12	           RTP Payload Format for MPEG-4 Audio/Visual Streams
13	                    draft-ietf-avt-rfc3016bis-02.txt

15	Abstract

17	   This document describes Real-Time Transport Protocol (RTP) payload
18	   formats for carrying each of MPEG-4 Audio and MPEG-4 Visual
19	   bitstreams without using MPEG-4 Systems.  For the purpose of directly
20	   mapping MPEG-4 Audio/Visual bitstreams onto RTP packets, it provides
21	   specifications for the use of RTP header fields and also specifies
22	   fragmentation rules.  It also provides specifications for Media Type
23	   registration and the use of Session Description Protocol (SDP).  The
24	   audio payload format described in this document has some limitations;
25	   for new system designs, [RFC3640] is preferred.

27	Status of this Memo

29	   This Internet-Draft is submitted in full conformance with the
30	   provisions of BCP 78 and BCP 79.

32	   Internet-Drafts are working documents of the Internet Engineering
33	   Task Force (IETF).  Note that other groups may also distribute
34	   working documents as Internet-Drafts.  The list of current Internet-
35	   Drafts is at http://datatracker.ietf.org/drafts/current/.

37	   Internet-Drafts are draft documents valid for a maximum of six months
38	   and may be updated, replaced, or obsoleted by other documents at any
39	   time.  It is inappropriate to use Internet-Drafts as reference
40	   material or to cite them other than as "work in progress."

42	   This Internet-Draft will expire on July 15, 2011.

44	Copyright Notice

46	   Copyright (c) 2011 IETF Trust and the persons identified as the
47	   document authors.  All rights reserved.

49	   This document is subject to BCP 78 and the IETF Trust's Legal
50	   Provisions Relating to IETF Documents
51	   (http://trustee.ietf.org/license-info) in effect on the date of
52	   publication of this document.  Please review these documents
53	   carefully, as they describe your rights and restrictions with respect
54	   to this document.  Code Components extracted from this document must
55	   include Simplified BSD License text as described in Section 4.e of
56	   the Trust Legal Provisions and are provided without warranty as
57	   described in the Simplified BSD License.

59	Table of Contents

61	   1.  Introduction
62	     1.1.  MPEG-4 Visual RTP Payload Format
63	     1.2.  MPEG-4 Audio RTP Payload Format
64	     1.3.  Interoperability with RFC 3016
65	   2.  Definitions and Abbreviations
66	   3.  LATM Restrictions for RTP Packetization of MPEG-4 Audio
67	       Bitstreams
68	   4.  RTP Packetization of MPEG-4 Visual Bitstreams
69	     4.1.  Use of RTP Header Fields for MPEG-4 Visual
70	     4.2.  Fragmentation of MPEG-4 Visual Bitstream
71	     4.3.  Examples of Packetized MPEG-4 Visual Bitstream
72	   5.  RTP Packetization of MPEG-4 Audio Bitstreams
73	     5.1.  RTP Packet Format
74	     5.2.  Use of RTP Header Fields for MPEG-4 Audio
75	     5.3.  Fragmentation of MPEG-4 Audio Bitstream
76	   6.  Media Type Registration for MPEG-4 Audio/Visual Streams
77	     6.1.  Media Type Registration for MPEG-4 Visual
78	     6.2.  Mapping to SDP for MPEG-4 Visual
79	       6.2.1.  Declarative SDP Usage for MPEG-4 Visual
80	     6.3.  Media Type Registration for MPEG-4 Audio
81	     6.4.  Mapping to SDP for MPEG-4 Audio
82	       6.4.1.  Declarative SDP Usage for MPEG-4 Audio
83	         6.4.1.1.  Example: In-band Configuration
84	         6.4.1.2.  Example: 6kb/s CELP
85	         6.4.1.3.  Example: 64 kb/s AAC LC Stereo
86	         6.4.1.4.  Example: Use of the SBR-enabled Parameter
87	         6.4.1.5.  Example: Hierarchical Signaling of SBR
88	         6.4.1.6.  Example: HE AAC v2 Signaling
89	         6.4.1.7.  Example: Hierarchical Signaling of PS
90	         6.4.1.8.  Example: MPEG Surround
91	         6.4.1.9.  Example: MPEG Surround with Extended SDP
92	                   Parameters
93	         6.4.1.10. Example: MPEG Surround with Single Layer
94	                   Configuration
95	   7.  IANA Considerations
96	   8.  Acknowledgements
97	   9.  Security Considerations
98	   10. Differences to RFC 3016
99	   11. References
100	     11.1. Normative References
101	     11.2. Informative References
102	   Authors' Addresses

104	1.  Introduction

106	   The RTP payload formats described in this document specify how MPEG-4
107	   Audio [14496-3] and MPEG-4 Visual streams [14496-2] are to be
108	   fragmented and mapped directly onto RTP packets.

110	   These RTP payload formats enable transport of MPEG-4 Audio/Visual
111	   streams without using the synchronization and stream management
112	   functionality of MPEG-4 Systems [14496-1].  Such RTP payload formats
113	   will be used in systems that have intrinsic stream management
114	   functionality and thus require no such functionality from MPEG-4
115	   Systems.  H.323 terminals are an example of such systems, where
116	   MPEG-4 Audio/Visual streams are not managed by MPEG-4 Systems Object
117	   Descriptors but by H.245.  The streams are directly mapped onto RTP
118	   packets without using the MPEG-4 Systems Sync Layer.  Other examples
119	   are SIP and RTSP where Media Type and SDP are used.  Media Type and
120	   SDP usages of the RTP payload formats described in this document are
121	   defined to directly specify the attribute of Audio/Visual streams
122	   (e.g., media type, packetization format and codec configuration)
123	   without using MPEG-4 Systems.  The obvious benefit is that these
124	   MPEG-4 Audio/Visual RTP payload formats can be handled in a unified
125	   way together with those formats defined for non-MPEG-4 codecs.  The
126	   disadvantage is that interoperability with environments using MPEG-4
127	   Systems may be difficult; hence, other payload formats may be better
128	   suited to those applications.

130	   The semantics of RTP headers in such cases need to be clearly
131	   defined, including the association with MPEG-4 Audio/Visual data
132	   elements.  In addition, it is beneficial to define the fragmentation
133	   rules of RTP packets for MPEG-4 Video streams so as to enhance error
134	   resiliency by utilizing the error resiliency tools provided inside
135	   the MPEG-4 Video stream.

137	1.1.  MPEG-4 Visual RTP Payload Format

139	   MPEG-4 Visual is a visual coding standard with many new features:
140	   high coding efficiency; high error resiliency; multiple, arbitrary
141	   shape object-based coding; etc. [14496-2].  It covers a wide range of
142	   bitrates, from scores of kbps to several Mbps.  It also covers a wide
143	   variety of networks, ranging from those guaranteed to be almost
144	   error-free to mobile networks with high error rates.

146	   With respect to the fragmentation rules for an MPEG-4 Visual
147	   bitstream defined in this document, since MPEG-4 Visual is used for a
148	   wide variety of networks, it is desirable not to apply too much
149	   restriction on fragmentation, and a fragmentation rule such as "a
150	   single video packet shall always be mapped on a single RTP packet"
151	   may be inappropriate.  On the other hand, careless, media-unaware
152	   fragmentation may cause degradation in error resiliency and bandwidth
153	   efficiency.  The fragmentation rules described in this document are
154	   flexible but manage to define the minimum rules for preventing
155	   meaningless fragmentation while utilizing the error resiliency
156	   functionalities of MPEG-4 Visual.

158	   The fragmentation rule "Different VOPs SHOULD be fragmented into
159	   different RTP packets" is made so that the RTP timestamp uniquely
160	   indicates the VOP time framing.  On the other hand, MPEG-4 video may
161	   generate VOPs of very small size, in cases with an empty VOP
162	   (vop_coded=0) containing only VOP header or an arbitrary shaped VOP
163	   with a small number of coding blocks.  To reduce the overhead for
164	   such cases, the fragmentation rule permits concatenating multiple
165	   VOPs in an RTP packet.  (See fragmentation rule (4) in Section 4.2
166	   and marker bit and timestamp in Section 4.1.)

168	   While the additional media specific RTP header defined for such video
169	   coding tools as H.261 or MPEG-1/2 is effective in helping to recover
170	   picture headers corrupted by packet losses, MPEG-4 Visual already has
171	   error resiliency functionalities for recovering corrupt headers, and
172	   these can be used on RTP/IP networks as well as on other networks
173	   (H.223/mobile, MPEG-2/TS, etc.).  Therefore, no extra RTP header
174	   fields are defined in this MPEG-4 Visual RTP payload format.

176	1.2.  MPEG-4 Audio RTP Payload Format

178	   MPEG-4 Audio is an audio standard that integrates many different
179	   types of audio coding tools.  Low-overhead MPEG-4 Audio Transport
180	   Multiplex (LATM) manages the sequences of audio data with relatively
181	   small overhead.  In audio-only applications, then, it is desirable
182	   for LATM-based MPEG-4 Audio bitstreams to be directly mapped onto RTP
183	   packets without using MPEG-4 Systems.

185	   For MPEG-4 Audio coding tools, as is true for other audio coders, if
186	   the payload is a single audio frame, packet loss will not impair the
187	   decodability of adjacent packets.  Therefore, the additional media
188	   specific header for recovering errors will not be required for MPEG-4
189	   Audio.  Existing RTP protection mechanisms, such as Generic Forward
190	   Error Correction [RFC5109] and Redundant Audio Data [RFC2198], MAY be
191	   applied to improve error resiliency.

193	1.3.  Interoperability with RFC 3016

195	   Although, strictly speaking, systems that support MPEG-4 Audio as
196	   specified in [RFC3016] will be incompatible with systems supporting
197	   this document, existing systems already comply with the specification
198	   in 3GPP PSS service [3GPP] and therefore no incompatibility issues
199	   are foreseen.

201	2.  Definitions and Abbreviations

203	   This document makes use of terms specified in [14496-2], [14496-3],
204	   and [23003-1].  In addition, the following terms are used in this
205	   document and have specific meaning within the context of this
206	   document.

208	   Core codec sampling rate:

210	      Audio codec sampling rate.  When SBR (Spectral Band Replication)
211	      is used, typically twice this value will be regarded as the
212	      definitive sampling rate (i.e., the decoder's output sampling
213	      rate).

215	      Note: The exception is downsampled SBR mode in which the SBR
216	      sampling rate equals the core codec sampling rate.

218	   Core codec channel configuration:

220	      Audio codec channel configuration.  When PS (Parametric Stereo) is
221	      used, the core codec channel configuration indicates one channel
222	      (i.e., mono) whereas the definitive channel configuration is two
223	      channels (i.e., stereo).  When MPEG Surround is used, the
224	      definitive channel configuration depends on the output of the MPEG
225	      Surround decoder.

227	   SBR sampling rate:

229	      When SBR is used, the SBR sampling rate is typically twice the
230	      core codec sampling rate, with the exception of downsampled SBR
231	      mode, where the SBR sampling rate and core codec sampling rate
232	      are identical.
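
   The relationship between the core codec parameters and the definitive
   (output) parameters defined above can be sketched as follows.  This
   is only an illustration: the function and parameter names are
   hypothetical and are not defined by this document, and MPEG Surround
   is deliberately left out because its output configuration depends on
   the MPEG Surround decoder.

```python
def definitive_sampling_rate(core_rate_hz: int, sbr_used: bool,
                             downsampled_sbr: bool = False) -> int:
    """Return the decoder output sampling rate in Hz (hypothetical helper).

    With SBR the output rate is typically twice the core codec rate;
    in downsampled SBR mode it equals the core codec rate.
    """
    if core_rate_hz <= 0:
        raise ValueError("sampling rate must be positive")
    if sbr_used and not downsampled_sbr:
        return 2 * core_rate_hz
    return core_rate_hz


def definitive_channel_count(core_channels: int, ps_used: bool) -> int:
    """Return the output channel count (hypothetical helper).

    With Parametric Stereo the core codec signals one channel (mono)
    while the definitive configuration is two channels (stereo).
    """
    if ps_used:
        if core_channels != 1:
            raise ValueError("PS expects a mono core codec configuration")
        return 2
    return core_channels
```

   For instance, a 22050 Hz core codec with SBR yields a 44100 Hz output
   rate, while the same core codec in downsampled SBR mode stays at
   22050 Hz.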

234	   Abbreviations:

236	      AAC: Advanced Audio Coding

238	      ASC: AudioSpecificConfig

240	      HE AAC: High Efficiency AAC

242	      LATM: Low-overhead MPEG-4 Audio Transport Multiplex

244	      PS: Parametric Stereo

246	      SBR: Spectral Band Replication
247	      VOP: Video Object Plane

249	   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
250	   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
251	   document are to be interpreted as described in [RFC2119].

253	3.  LATM Restrictions for RTP Packetization of MPEG-4 Audio Bitstreams

255	   While LATM has several multiplexing features as follows:

257	   o  Carrying configuration information with audio data,

259	   o  Concatenation of multiple audio frames in one audio stream,

261	   o  Multiplexing multiple objects (programs),

263	   o  Multiplexing scalable layers,

265	   in RTP transmission there is no need for the last two features.
266	   Therefore, these two features MUST NOT be used in applications based
267	   on RTP packetization specified by this document.  Since LATM has been
268	   developed only for natural audio coding tools, i.e., not for
269	   synthesis tools, it seems difficult to transmit Structured Audio (SA)
270	   data and Text to Speech Interface (TTSI) data by LATM.  Therefore, SA
271	   data and TTSI data MUST NOT be transported by the RTP packetization
272	   in this document.

274	   For transmission of scalable streams, audio data of each layer SHOULD
275	   be packetized onto different RTP streams allowing for the different
276	   layers to be treated differently at the IP level, for example via
277	   some means of differentiated service.  On the other hand, all
278	   configuration data of the scalable streams are contained in one LATM
279	   configuration data "StreamMuxConfig" and every scalable layer shares
280	   the StreamMuxConfig.  The mapping between each layer and its
281	   configuration data is achieved by LATM header information attached to
282	   the audio data.  In order to indicate the dependency information of
283	   the scalable streams, the signaling mechanism as specified in
284	   [RFC5583] SHOULD be used (see Section 5.2).

286	4.  RTP Packetization of MPEG-4 Visual Bitstreams

288	   This section specifies RTP packetization rules for MPEG-4 Visual
289	   content.  An MPEG-4 Visual bitstream is mapped directly onto RTP
290	   packets without the addition of extra header fields or any removal of
291	   Visual syntax elements.  The Combined Configuration/Elementary stream
292	   mode MUST be used so that configuration information will be carried
293	   to the same RTP port as the elementary stream (see 6.2.1 "Start
294	   codes" of [14496-2]).  The configuration information MAY additionally
295	   be specified by some out-of-band means.  If needed by systems using
296	   Media Type parameters and SDP parameters (e.g., SIP and RTSP), the
297	   optional parameter "config" MUST be used to specify the configuration
298	   information (see Section 6.1 and Section 6.2).

300	   When the short video header mode is used, the RTP payload format for
301	   H.263 SHOULD be used (the format defined in [RFC4629] is RECOMMENDED,
302	   but the [RFC4628] format MAY be used for compatibility with older
303	   implementations).

305	0                   1                   2                   3
306	0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
307	+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
308	|V=2|P|X|  CC   |M|     PT      |       sequence number         | RTP
309	+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
310	|                           timestamp                           | Header
311	+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
312	|           synchronization source (SSRC) identifier            |
313	+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+
314	|            contributing source (CSRC) identifiers             |
315	|                             ....                              |
316	+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+
317	|                                                               | RTP
318	|       MPEG-4 Visual stream (byte aligned)                     | Pay-
319	|                                                               | load
320	|                               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
321	|                               :...OPTIONAL RTP padding        |
322	+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

324	     Figure 1 - An RTP packet for MPEG-4 Visual stream
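
   As an illustration of the layout in Figure 1, the following sketch
   splits a received packet into the fixed RTP header fields and the
   MPEG-4 Visual payload.  It follows the generic RTP header layout of
   [RFC3550]; the names RtpPacket and parse_rtp are hypothetical, and a
   real receiver would need further validation.

```python
import struct
from typing import NamedTuple, Tuple


class RtpPacket(NamedTuple):
    marker: bool
    payload_type: int
    sequence_number: int
    timestamp: int
    ssrc: int
    csrcs: Tuple[int, ...]
    payload: bytes  # MPEG-4 Visual stream bytes, RTP padding removed


def parse_rtp(packet: bytes) -> RtpPacket:
    """Parse one RTP packet laid out as in Figure 1 (illustrative only)."""
    if len(packet) < 12:
        raise ValueError("shorter than the 12-byte fixed RTP header")
    b0, b1, seq, ts, ssrc = struct.unpack("!BBHII", packet[:12])
    if b0 >> 6 != 2:
        raise ValueError("unsupported RTP version")
    has_padding = bool(b0 & 0x20)
    has_extension = bool(b0 & 0x10)
    csrc_count = b0 & 0x0F
    offset = 12 + 4 * csrc_count
    if len(packet) < offset:
        raise ValueError("truncated CSRC list")
    csrcs = struct.unpack("!%dI" % csrc_count, packet[12:offset])
    if has_extension:
        # Header extension: 16-bit profile value, 16-bit length in words.
        if len(packet) < offset + 4:
            raise ValueError("truncated header extension")
        (ext_words,) = struct.unpack("!H", packet[offset + 2:offset + 4])
        offset += 4 + 4 * ext_words
        if len(packet) < offset:
            raise ValueError("truncated header extension")
    end = len(packet)
    if has_padding:
        # The last octet gives the number of padding octets, itself included.
        if end <= offset:
            raise ValueError("padding flag set on empty payload")
        pad = packet[-1]
        if pad == 0 or pad > end - offset:
            raise ValueError("invalid padding length")
        end -= pad
    return RtpPacket(bool(b1 & 0x80), b1 & 0x7F, seq, ts, ssrc,
                     csrcs, packet[offset:end])
```

   Because no extra payload header is defined, the returned payload can
   be passed to the MPEG-4 Visual decoder as is.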

326	4.1.  Use of RTP Header Fields for MPEG-4 Visual

328	   Payload Type (PT): The assignment of an RTP payload type for this
329	   packet format is outside the scope of this document, and will not be
330	   specified here.  It is expected that the RTP profile for a particular
331	   class of applications will assign a payload type for this encoding,
332	   or if that is not done then a payload type in the dynamic range SHALL
333	   be chosen by means of an out-of-band signaling protocol (e.g., H.245,
334	   SIP, etc.).

336	   Extension (X) bit: Defined by the RTP profile used.

338	   Sequence Number: Incremented by one for each RTP data packet sent,
339	   starting, for security reasons, with a random initial value.

341	   Marker (M) bit: The marker bit is set to one to indicate the last RTP
342	   packet (or only RTP packet) of a VOP.  When multiple VOPs are carried
343	   in the same RTP packet, the marker bit is set to one.

345	   Timestamp: The timestamp indicates the sampling instant of the VOP
346	   contained in the RTP packet.  A constant offset, which is random, is
347	   added for security reasons.

349	   o  When multiple VOPs are carried in the same RTP packet, the
350	      timestamp indicates the earliest of the VOP times within the VOPs
351	      carried in the RTP packet.  Timestamp information of the rest of
352	      the VOPs is derived from the timestamp fields in the VOP header
353	      (modulo_time_base and vop_time_increment).

355	   o  If the RTP packet contains only configuration information and/or
356	      Group_of_VideoObjectPlane() fields, the timestamp of the next VOP
357	      in the coding order is used.

359	   o  If the RTP packet contains only visual_object_sequence_end_code
360	      information, the timestamp of the immediately preceding VOP in the
361	      coding order is used.

363	   The resolution of the timestamp is set to its default value of 90 kHz,
364	   unless specified by an out-of-band means (e.g., SDP parameter or
365	   Media Type parameter as defined in Section 6).
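
   The timestamp rules above can be sketched as follows, assuming the
   sender knows each VOP's presentation time in seconds.  The helper
   names are hypothetical; the 32-bit wraparound follows the RTP
   timestamp field width shown in Figure 1.

```python
import secrets
from typing import Iterable

DEFAULT_CLOCK_HZ = 90_000  # default timestamp resolution (90 kHz)


def new_random_offset() -> int:
    """Pick the constant random offset once per RTP stream."""
    return secrets.randbits(32)


def rtp_timestamp(vop_time_s: float, offset: int,
                  clock_hz: int = DEFAULT_CLOCK_HZ) -> int:
    """Map a VOP time in seconds to a 32-bit RTP timestamp."""
    ticks = round(vop_time_s * clock_hz)
    return (ticks + offset) % 2**32


def packet_timestamp(vop_times_s: Iterable[float], offset: int,
                     clock_hz: int = DEFAULT_CLOCK_HZ) -> int:
    """Timestamp for a packet carrying several VOPs: the earliest VOP time."""
    return rtp_timestamp(min(vop_times_s), offset, clock_hz)
```

   A packet that carries only configuration information would reuse the
   value computed for the next VOP in coding order, and a packet that
   carries only visual_object_sequence_end_code would reuse the value of
   the immediately preceding VOP.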

367	   Other header fields are used as described in [RFC3550].

369	4.2.  Fragmentation of MPEG-4 Visual Bitstream

371	   A fragmented MPEG-4 Visual bitstream is mapped directly onto the RTP
372	   payload without any addition of extra header fields or any removal of
373	   Visual syntax elements.  The Combined Configuration/Elementary
374	   streams mode is used.  The following rules apply for the
375	   fragmentation.

377	   In the following, header means one of the following:

379	   o  Configuration information (Visual Object Sequence Header, Visual
380	      Object Header and Video Object Layer Header)

382	   o  visual_object_sequence_end_code

384	   o  The header of the entry point function for an elementary stream
385	      (Group_of_VideoObjectPlane() or the header of VideoObjectPlane(),
386	      video_plane_with_short_header(), MeshObject() or FaceObject())

388	   o  The video packet header (video_packet_header() excluding
389	      next_resync_marker())

391	   o  The header of gob_layer()

393	   o  See 6.2.1 "Start codes" of [14496-2] for the definition of the
394	      configuration information and the entry point functions.

396	   (1) Configuration information and Group_of_VideoObjectPlane() fields
397	   SHALL be placed at the beginning of the RTP payload (just after the
398	   RTP header) or just after the header of the syntactically upper layer
399	   function.

401	   (2) If one or more headers exist in the RTP payload, the RTP payload
402	   SHALL begin with the header of the syntactically highest function.
403	   Note: The visual_object_sequence_end_code is regarded as the lowest
404	   function.

406	   (3) A header SHALL NOT be split into a plurality of RTP packets.

408	   (4) Different VOPs SHOULD be fragmented into different RTP packets so
409	   that one RTP packet consists of the data bytes associated with a
410	   unique VOP time instance (that is indicated in the timestamp field in
411	   the RTP packet header), with the exception that multiple consecutive
412	   VOPs MAY be carried within one RTP packet in the decoding order if
413	   the size of the VOPs is small.

415	   Note: When multiple VOPs are carried in one RTP payload, the
416	   timestamp of the VOPs after the first one may be calculated by the
417	   decoder.  This operation is necessary only for RTP packets in which
418	   the marker bit is equal to one and the beginning of the RTP payload
419	   corresponds to a start code.  (See timestamp and marker bit in
420	   Section 4.1.)

422	   (5) It is RECOMMENDED that a single video packet be sent as a single
423	   RTP packet.  The size of a video packet SHOULD be adjusted in such a
424	   way that the resulting RTP packet is not larger than the path-MTU.
425	   If the video packet is disabled by the coder configuration (by
426	   setting resync_marker_disable in the VOL header to 1), or in coding
427	   tools where the video packet is not supported, a VOP MAY be split at
428	   arbitrary byte-positions.

430	   The video packet starts with the VOP header or the video packet
431	   header, followed by motion_shape_texture(), and ends with
432	   next_resync_marker() or next_start_code().
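
   A minimal sender-side sketch of rules (3) and (5) follows, assuming
   the encoder already delivers each VOP either as a list of video
   packets or, when video packets are disabled, as one byte string with
   a known header length.  All names are hypothetical, and the marker
   bit is set as described in Section 4.1.

```python
from typing import List, Tuple

Payload = Tuple[bytes, bool]  # (RTP payload, marker bit)


def packetize_video_packets(video_packets: List[bytes],
                            max_payload: int) -> List[Payload]:
    """Rule (5): send each video packet of one VOP as one RTP packet."""
    if not video_packets:
        raise ValueError("a VOP needs at least one video packet")
    for vp in video_packets:
        if len(vp) > max_payload:
            # The encoder is expected to size video packets to the path MTU.
            raise ValueError("video packet exceeds the maximum payload size")
    last = len(video_packets) - 1
    return [(vp, i == last) for i, vp in enumerate(video_packets)]


def split_vop(vop: bytes, header_len: int, max_payload: int) -> List[Payload]:
    """Split a VOP at fixed byte positions (resync_marker_disable set to 1).

    The first fragment always contains the whole VOP header, so that
    rule (3) is respected.
    """
    if not 0 < header_len <= len(vop):
        raise ValueError("invalid header length")
    if header_len > max_payload:
        raise ValueError("rule (3): a header must not be split")
    chunks = [vop[i:i + max_payload] for i in range(0, len(vop), max_payload)]
    last = len(chunks) - 1
    return [(chunk, i == last) for i, chunk in enumerate(chunks)]
```

   In both helpers only the final payload of the VOP carries the marker
   bit, which lets a receiver detect the end of the VOP without parsing
   the bitstream.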

434	4.3.  Examples of Packetized MPEG-4 Visual Bitstream

436	   Figure 2 shows examples of RTP packets generated based on the
437	   criteria described in Section 4.2.

439	   (a) is an example of the first RTP packet or the random access point
440	   of an MPEG-4 Visual bitstream containing the configuration
441	   information.  According to criterion (1), the Visual Object Sequence
442	   Header (VS header) is placed at the beginning of the RTP payload,
443	   preceding the Visual Object Header and the Video Object Layer
444	   Header (VO header, VOL header).  Since the fragmentation rule defined
445	   in Section 4.2 guarantees that the configuration information,
446	   starting with visual_object_sequence_start_code, is always placed at
447	   the beginning of the RTP payload, RTP receivers can detect the random
448	   access point by checking if the first 32-bit field of the RTP payload
449	   is visual_object_sequence_start_code.

451	   (b) is another example of the RTP packet containing the configuration
452	   information.  It differs from example (a) in that the RTP packet also
453	   contains a VOP header and a Video Packet in the VOP following the
454	   configuration information.  Since the length of the configuration
455	   information is relatively short (typically scores of bytes) and an
456	   RTP packet containing only the configuration information may thus
457	   increase the overhead, the configuration information and the
458	   immediately following VOP can be packetized into a single RTP packet.

460	   (c) is an example of an RTP packet that contains
461	   Group_of_VideoObjectPlane() (GOV).  Following criterion (1), the GOV is
462	   placed at the beginning of the RTP payload.  It would be a waste of
463	   RTP/IP header overhead to generate an RTP packet containing only a
464	   GOV whose length is 7 bytes.  Therefore, (a part of) the following
465	   VOP can be placed in the same RTP packet as shown in (c).

467	   (d) is an example of the case where one video packet is packetized
468	   into one RTP packet.  When the packet-loss rate of the underlying
469	   network is high, this kind of packetization is recommended.  Even
470	   when the RTP packet containing the VOP header is discarded by a
471	   packet loss, the other RTP packets can be decoded by using the
472	   HEC (Header Extension Code) information in the video packet header.
473	   No extra RTP header field is necessary.

475	   (e) is an example of the case where more than one video packet is
476	   packetized into one RTP packet.  This kind of packetization is
477	   effective to save the overhead of RTP/IP headers when the bit-rate of
478	   the underlying network is low.  However, it will decrease the packet-
479	   loss resiliency because multiple video packets are discarded by a
480	   single RTP packet loss.  The optimal number of video packets in an
481	   RTP packet and the length of the RTP packet can be determined
482	   considering the packet-loss rate and the bit-rate of the underlying
483	   network.

485	   (f) is an example of the case when the video packet is disabled by
486	   setting resync_marker_disable in the VOL header to 1.  In this case,
487	   a VOP may be split into a plurality of RTP packets at arbitrary byte-
488	   positions.  For example, it is possible to split a VOP into fixed-
489	   length packets.  This kind of coder configuration and RTP packet
490	   fragmentation may be used when the underlying network is guaranteed
491	   to be error-free.
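
   The random access point check mentioned for example (a) can be
   sketched as below.  The numeric value of
   visual_object_sequence_start_code is not given in this document; the
   value used here (0x000001B0) is an assumption taken from the start
   code table of [14496-2] and should be verified against that
   specification.  The function name is hypothetical.

```python
# Assumed value of visual_object_sequence_start_code; verify against [14496-2].
VISUAL_OBJECT_SEQUENCE_START_CODE = bytes.fromhex("000001b0")


def is_random_access_point(rtp_payload: bytes) -> bool:
    """Return True if the payload begins with the configuration information.

    Only the first 32-bit field of the payload is inspected, which is
    sufficient because the configuration information is always placed
    at the beginning of the RTP payload.
    """
    return rtp_payload[:4] == VISUAL_OBJECT_SEQUENCE_START_CODE
```

   A receiver joining a stream could discard payloads until this check
   succeeds and then start decoding.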

493	   Figure 3 shows examples of RTP packets prohibited by the criteria of
494	   Section 4.2.

496	   Fragmentation of a header into multiple RTP packets, as in (a), will
497	   not only increase the overhead of RTP/IP headers but also decrease
498	   the error resiliency.  Therefore, it is prohibited by criterion
499	   (3).

501	   When concatenating more than one video packet into an RTP packet, a
502	   VOP header or video_packet_header() is not allowed to be placed in
503	   the middle of the RTP payload.  The packetization as in (b) is not
504	   allowed by criterion (2) for reasons of error resiliency.
505	   Comparing this example with Figure 2(d), although two video packets
506	   are mapped onto two RTP packets in both cases, the packet-loss
507	   resiliency is not identical.  Namely, if the second RTP packet is
508	   lost, both video packets 1 and 2 are lost in the case of Figure 3(b)
509	   whereas only video packet 2 is lost in the case of Figure 2(d).
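
   The comparison between Figure 2(d) and Figure 3(b) can be reproduced
   with a small model.  It assumes that a video packet becomes
   undecodable as soon as any RTP packet carrying part of it is lost;
   the layout lists and the function name are hypothetical.

```python
from typing import List, Set

# Each entry lists the video packets that have bytes in that RTP packet.
FIGURE_2D: List[Set[int]] = [{1}, {2}]     # one video packet per RTP packet
FIGURE_3B: List[Set[int]] = [{1}, {1, 2}]  # VP(1) straddles both RTP packets


def undecodable_video_packets(layout: List[Set[int]],
                              lost_rtp_indexes: Set[int]) -> Set[int]:
    """Return the video packets affected by the lost RTP packets."""
    damaged: Set[int] = set()
    for index, carried in enumerate(layout):
        if index in lost_rtp_indexes:
            damaged |= carried
    return damaged
```

   Losing the second RTP packet (index 1) affects only video packet 2
   for the Figure 2(d) layout but both video packets for the Figure 3(b)
   layout.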

511	    +------+------+------+------+
512	(a) | RTP  |  VS  |  VO  | VOL  |
513	    |header|header|header|header|
514	    +------+------+------+------+

516	    +------+------+------+------+------+------------+
517	(b) | RTP  |  VS  |  VO  | VOL  | VOP  |Video Packet|
518	    |header|header|header|header|header|            |
519	    +------+------+------+------+------+------------+

521	    +------+-----+------------------+
522	(c) | RTP  | GOV |Video Object Plane|
523	    |header|     |                  |
524	    +------+-----+------------------+

526	    +------+------+------------+  +------+------+------------+
527	(d) | RTP  | VOP  |Video Packet|  | RTP  |  VP  |Video Packet|
528	    |header|header|    (1)     |  |header|header|    (2)     |
529	    +------+------+------------+  +------+------+------------+

531	    +------+------+------------+------+------------+------+------------+
532	(e) | RTP  |  VP  |Video Packet|  VP  |Video Packet|  VP  |Video Packet|
533	    |header|header|     (1)    |header|    (2)     |header|    (3)     |
534	    +------+------+------------+------+------------+------+------------+

536	    +------+------+------------+  +------+------------+
537	(f) | RTP  | VOP  |VOP fragment|  | RTP  |VOP fragment|
538	    |header|header|    (1)     |  |header|    (2)     |
539	    +------+------+------------+  +------+------------+

541	     Figure 2 - Examples of RTP packetized MPEG-4 Visual bitstream

543	    +------+-------------+  +------+------------+------------+
544	(a) | RTP  |First half of|  | RTP  |Last half of|Video Packet|
545	    |header|  VP header  |  |header|  VP header |            |
546	    +------+-------------+  +------+------------+------------+

548	    +------+------+----------+  +------+---------+------+------------+
549	(b) | RTP  | VOP  |First half|  | RTP  |Last half|  VP  |Video Packet|
550	    |header|header| of VP(1) |  |header| of VP(1)|header|    (2)     |
551	    +------+------+----------+  +------+---------+------+------------+

553	   Figure 3 - Examples of prohibited RTP packetization for MPEG-4 Visual
554	   bitstream

556	5.  RTP Packetization of MPEG-4 Audio Bitstreams

558	   This section specifies RTP packetization rules for MPEG-4 Audio
559	   bitstreams.  MPEG-4 Audio streams MUST be formatted LATM (Low-
560	   overhead MPEG-4 Audio Transport Multiplex) [14496-3] streams, and the
561	   LATM-based streams are then mapped onto RTP packets as described in
562	   the sections below.

564	5.1.  RTP Packet Format

566	   LATM-based streams consist of a sequence of audioMuxElements that
567	   include one or more PayloadMux elements which carry the audio frames.
568	   A complete audioMuxElement or a part of one SHALL be mapped directly
569	   onto an RTP payload without any removal of audioMuxElement syntax
570	   elements (see Figure 4).  The first byte of each audioMuxElement
571	   SHALL be located at the first payload location in an RTP packet.

573	0                   1                   2                   3
574	0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
575	+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
576	|V=2|P|X|  CC   |M|     PT      |       sequence number         |RTP
577	+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
578	|                           timestamp                           |Header
579	+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
580	|           synchronization source (SSRC) identifier            |
581	+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+
582	|            contributing source (CSRC) identifiers             |
583	|                             ....                              |
584	+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+
585	|                                                               |RTP
586	:                 audioMuxElement (byte aligned)                :Payload
587	|                                                               |
588	|                               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
589	|                               :...OPTIONAL RTP padding        |
590	+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

592	             Figure 4 - An RTP packet for MPEG-4 Audio

594	   In order to decode the audioMuxElement, the following
595	   muxConfigPresent information is required to be indicated by out-of-
596	   band means.  When SDP is utilized for this indication, the Media Type
597	   parameter "cpresent" corresponds to the muxConfigPresent information
598	   (see Section 6.3).  The following restrictions apply:

600	   o  In the out-of-band configuration case the number of PayloadMux
601	      elements contained in each audioMuxElement can only be set once.
602	      If more than one PayloadMux element is contained in each
603	      audioMuxElement, special care is required to ensure that the last
604	      RTP packet remains decodable.

606	   o  To construct the audioMuxElement in the in-band configuration
607	      case, non-octet-aligned configuration data precedes the one or
608	      more PayloadMux elements.  Since the generation of RTP payloads
609	      with non-octet-aligned data is not possible with RTP hint tracks,
610	      as defined by the MP4 file format [14496-12] [14496-14], this
611	      document does not support RTP hint tracks for the in-band
612	      configuration case.

614	   muxConfigPresent: If this value is set to 1 (in-band mode), the
615	   audioMuxElement SHALL include an indication bit "useSameStreamMux"
616	   and MAY include the configuration information for audio compression
617	   "StreamMuxConfig".  The useSameStreamMux bit indicates whether the
618	   StreamMuxConfig element in the previous frame is applied in the
619	   current frame.  If the useSameStreamMux bit indicates that the
620	   StreamMuxConfig from the previous frame applies, but that frame
621	   has been lost, the current frame may not be decodable.  Therefore, in
622	   case of in-band mode, the StreamMuxConfig element SHOULD be
623	   transmitted repeatedly depending on the network condition.  On the
624	   other hand, if muxConfigPresent is set to 0 (out-of-band mode), the
625	   StreamMuxConfig element is required to be transmitted by out-of-band
626	   means.  With SDP, the Media Type parameter "config" is
627	   utilized (see Section 6.3).

629	5.2.  Use of RTP Header Fields for MPEG-4 Audio

631	   Payload Type (PT): The assignment of an RTP payload type for this new
632	   packet format is outside the scope of this document, and will only be
633	   restricted here.  It is expected that the RTP profile for a
634	   particular class of applications will assign a payload type for this
635	   encoding, or if that is not done then a payload type in the dynamic
636	   range shall be chosen by means of an out-of-band signaling protocol
637	   (e.g., H.245, SIP, etc).  In the dynamic assignment of RTP payload
638	   types for scalable streams, the server SHALL assign a different value
639	   to each layer.  The dependency relationships between the enhancement
640	   layer and the base layer MUST be signaled as specified in [RFC5583].
641	   An example of the use of such signaling for scalable audio streams
642	   can be found in [RFC5691].

644	   Marker (M) bit: The marker bit indicates audioMuxElement boundaries.
645	   It is set to one to indicate that the RTP packet contains a complete
646	   audioMuxElement or the last fragment of an audioMuxElement.

648	   Timestamp: The timestamp indicates the sampling instant of the first
649	   audio frame contained in the RTP packet.  Timestamps are RECOMMENDED
650	   to start at a random value for security reasons.

652	   Unless specified by an out-of-band means, the resolution of the
653	   timestamp is set to its default value of 90 kHz.

655	   Sequence Number: Incremented by one for each RTP packet sent,
656	   starting, for security reasons, with a random value.

658	   Other header fields are used as described in [RFC3550].
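
   The header field usage above can be illustrated with a minimal
   sketch, assuming a fixed 12-byte RTP header with no padding, no
   extension, and no CSRC entries.  The function names are hypothetical
   and are not part of this specification.

```python
import secrets
import struct

RTP_VERSION = 2


def pack_rtp_header(payload_type, marker, seq, timestamp, ssrc):
    """Pack a fixed 12-byte RTP header (P=0, X=0, CC=0 assumed)."""
    if not 0 <= payload_type <= 127:
        raise ValueError("payload type must fit in 7 bits")
    byte0 = RTP_VERSION << 6                      # V=2, P=0, X=0, CC=0
    byte1 = (int(bool(marker)) << 7) | payload_type
    return struct.pack("!BBHII", byte0, byte1,
                       seq & 0xFFFF,
                       timestamp & 0xFFFFFFFF,
                       ssrc & 0xFFFFFFFF)


def random_initial_state():
    """Return a random (sequence number, timestamp) starting pair."""
    return secrets.randbelow(1 << 16), secrets.randbelow(1 << 32)
```

   A sender would set marker to True only on a packet that carries a
   complete audioMuxElement or its last fragment.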

660	5.3.  Fragmentation of MPEG-4 Audio Bitstream

662	   It is RECOMMENDED to put one audioMuxElement in each RTP packet.  If
663	   the size of an audioMuxElement can be kept small enough that the size
664	   of the RTP packet containing it does not exceed the size of the path-
665	   MTU, no fragmentation is needed.  If it cannot, the audioMuxElement
666	   SHALL be fragmented and spread across multiple packets.
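
   A minimal sketch of this rule follows.  It assumes that splitting at
   arbitrary byte boundaries is acceptable and that max_payload (a
   hypothetical parameter) already accounts for RTP, UDP, and IP header
   overhead.  Each returned pair holds one RTP payload and the marker
   bit value described in Section 5.2.

```python
def fragment_audio_mux_element(element: bytes, max_payload: int):
    """Split one audioMuxElement into (payload, marker) pairs.

    The marker is True only for the packet carrying the complete
    element or its last fragment.
    """
    if max_payload <= 0:
        raise ValueError("max_payload must be positive")
    if not element:
        raise ValueError("audioMuxElement must not be empty")
    chunks = [element[i:i + max_payload]
              for i in range(0, len(element), max_payload)]
    last = len(chunks) - 1
    return [(chunk, index == last) for index, chunk in enumerate(chunks)]
```

   An element that fits within max_payload yields a single pair with
   the marker set, which matches the recommended one-element-per-packet
   case.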

668	6.  Media Type Registration for MPEG-4 Audio/Visual Streams

670	   The following sections describe the Media Type registrations for
671	   MPEG-4 Audio/Visual streams, which are registered in accordance with
672	   [RFC4855] and use the template of [RFC4288].  Media Type
673	   registration and SDP usage for the MPEG-4 Visual stream are described
674	   in Section 6.1 and Section 6.2, respectively, while Media Type
675	   registration and SDP usage for MPEG-4 Audio stream are described in
676	   Section 6.3 and Section 6.4, respectively.

678	6.1.  Media Type Registration for MPEG-4 Visual

680	   The receiver MUST ignore any unspecified parameter, to ensure that
681	   additional parameters can be added in any future revision of this
682	   specification.

684	   Type name: video

686	   Subtype name: MP4V-ES

688	   Required parameters: none

690	   Optional parameters:

692	      rate: This parameter is used only for RTP transport.  It indicates
693	      the resolution of the timestamp field in the RTP header.  If this
694	      parameter is not specified, its default value of 90000 (90kHz) is
695	      used.

697	      profile-level-id: A decimal representation of MPEG-4 Visual
698	      Profile and Level indication value (profile_and_level_indication)
699	      defined in Table G-1 of [14496-2].  This parameter MAY be used in
700	      the capability exchange or session setup procedure to indicate the
701	      MPEG-4 Visual Profile and Level combination of which the MPEG-4
702	      Visual codec is capable.  If this parameter is not specified by
703	      the procedure, its default value of 1 (Simple Profile/Level 1) is
704	      used.

706	      config: This parameter SHALL be used to indicate the configuration
707	      of the corresponding MPEG-4 Visual bitstream.  It SHALL NOT be
708	      used to indicate the codec capability in the capability exchange
709	      procedure.  It is a hexadecimal representation of an octet string
710	      that expresses the MPEG-4 Visual configuration information, as
711	      defined in subclause 6.2.1 Start codes of [14496-2].  The
712	      configuration information is mapped onto the octet string on an
713	      MSB-first basis.  The first bit of the configuration information
714	      SHALL be located at the MSB of the first octet.  The configuration
715	      information indicated by this parameter SHALL be the same as the
716	      configuration information in the corresponding MPEG-4 Visual
717	      stream, except for first_half_vbv_occupancy and
718	      latter_half_vbv_occupancy, if present, which may vary in the
719	      repeated configuration information inside an MPEG-4 Visual stream
720	      (See 6.2.1 Start codes of [14496-2]).

722	   Published specification:

724	      The specifications for MPEG-4 Visual streams are presented in
725	      [14496-2].  The RTP payload format is described in this document.

727	   Encoding considerations:

729	      Video bitstreams MUST be generated according to MPEG-4 Visual
730	      specifications [14496-2].  A video bitstream is binary data and
731	      MUST be encoded for non-binary transport (for Email, the Base64
732	      encoding is sufficient).  This type is also defined for transfer
733	      via RTP.  The RTP packets MUST be packetized according to the
734	      MPEG-4 Visual RTP payload format defined in this document.

736	   Security considerations:

738	      See Section 9 of this document.

740	   Interoperability considerations:

742	      MPEG-4 Visual provides a large and rich set of tools for the
743	      coding of visual objects.  For effective implementation of the
744	      standard, subsets of the MPEG-4 Visual tool sets have been
745	      provided for use in specific applications.  These subsets, called
746	      'Profiles', limit the size of the tool set a decoder is required
747	      to implement.  In order to restrict computational complexity, one
748	      or more Levels are set for each Profile.  A Profile@Level
749	      combination allows:

751	      *  a codec builder to implement only the subset of the standard he
752	         needs, while maintaining interworking with other MPEG-4 devices
753	         included in the same combination, and

755	      *  checking whether MPEG-4 devices comply with the standard
756	         ('conformance testing').

758	      The visual stream SHALL be compliant with the MPEG-4 Visual
759	      Profile@Level specified by the parameter "profile-level-id".
760	      Interoperability between a sender and a receiver may be achieved
761	      by specifying the parameter "profile-level-id", or by arranging a
762	      capability exchange/announcement procedure for this parameter.

764	   Applications which use this Media Type:

766	      Audio and visual streaming and conferencing tools

768	   Additional information: none

770	   Person and email address to contact for further information:

772	      See Authors' Address section at the end of this document.

774	   Intended usage: COMMON

776	   Author:

778	      See Authors' Address section at the end of this document.

780	   Change controller:

782	      IETF Audio/Video Transport working group delegated from the IESG.

784	6.2.  Mapping to SDP for MPEG-4 Visual

786	   The Media Type video/MP4V-ES string is mapped to fields in the
787	   Session Description Protocol (SDP) [RFC4566], as follows:

789	   o  The Media Type (video) goes in SDP "m=" as the media name.

791	   o  The Media subtype (MP4V-ES) goes in SDP "a=rtpmap" as the encoding
792	      name.

794	   o  The optional parameter "rate" goes in "a=rtpmap" as the clock
795	      rate.

797	   o  The optional parameters "profile-level-id" and "config" go in the
798	      "a=fmtp" line to indicate the coder capability and configuration,
799	      respectively.  These parameters are expressed as a string, in the
800	      form of a semicolon-separated list of parameter=value pairs.

802	      Example usages for the profile-level-id parameter are:
803	      1  : MPEG-4 Visual Simple Profile/Level 1
804	      34 : MPEG-4 Visual Core Profile/Level 2
805	      145: MPEG-4 Visual Advanced Real Time Simple Profile/Level 1

807	6.2.1.  Declarative SDP Usage for MPEG-4 Visual

809	   The following are some examples of media representation in SDP:

811	Simple Profile/Level 1, rate=90000(90kHz), "profile-level-id" and
812	"config" are present in "a=fmtp" line:
813	  m=video 49170/2 RTP/AVP 98
814	  a=rtpmap:98 MP4V-ES/90000
815	  a=fmtp:98 profile-level-id=1;config=000001B001000001B50900000100000001
816	     20008440FA282C2090A21F

818	Core Profile/Level 2, rate=90000(90kHz), "profile-level-id" is present
819	in "a=fmtp" line:
820	  m=video 49170/2 RTP/AVP 98
821	  a=rtpmap:98 MP4V-ES/90000
822	  a=fmtp:98 profile-level-id=34

824	Advanced Real Time Simple Profile/Level 1, rate=90000(90kHz),
825	"profile-level-id" is present in "a=fmtp" line:
826	  m=video 49170/2 RTP/AVP 98
827	  a=rtpmap:98 MP4V-ES/90000
828	  a=fmtp:98 profile-level-id=145
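
   The relationship between "config" and "profile-level-id" in the
   first example can be checked with a short sketch.  It assumes, based
   on a reading of [14496-2], that the visual object sequence start
   code 0x000001B0 is immediately followed by the 8-bit
   profile_and_level_indication.  The function name is hypothetical.

```python
VOS_START_CODE = bytes.fromhex("000001B0")


def profile_level_from_config(config_hex: str) -> int:
    """Return the byte that follows the first 0x000001B0 start code."""
    data = bytes.fromhex(config_hex)
    pos = data.find(VOS_START_CODE)
    if pos < 0 or pos + len(VOS_START_CODE) >= len(data):
        raise ValueError("no profile_and_level_indication found")
    return data[pos + len(VOS_START_CODE)]


# The wrapped "config" value from the first example, joined into one string.
EXAMPLE_CONFIG = ("000001B001000001B50900000100000001"
                  "20008440FA282C2090A21F")
```

   For EXAMPLE_CONFIG the function returns 1, which agrees with
   profile-level-id=1 on the same "a=fmtp" line.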

830	6.3.  Media Type Registration for MPEG-4 Audio

832	   The receiver MUST ignore any unspecified parameter, to ensure that
833	   additional parameters can be added in any future revision of this
834	   specification.

836	   Type name: audio

838	   Subtype name: MP4A-LATM

840	   Required parameters:

842	      rate: the rate parameter indicates the RTP time stamp clock rate.
843	      The default value is 90000.  Other rates MAY be indicated only if
844	      they are set to the same value as the audio sampling rate (number
845	      of samples per second).

847	      In the presence of SBR, the sampling rates for the core en-/
848	      decoder and the SBR tool are different in most cases.  This
849	      parameter SHALL therefore NOT be considered as the definitive
850	      sampling rate.  If this parameter is used, the server must
851	      follow the rules below:

853	      *  When the presence of SBR is not explicitly signaled by the
854	         optional SDP parameters such as object parameter, profile-
855	         level-id or config string, this parameter SHALL be set to the
856	         core codec sampling rate.

858	      *  When the presence of SBR is explicitly signaled by the optional
859	         SDP parameters such as object parameter, profile-level-id or
860	         config string this parameter SHALL be set to the SBR sampling
861	         rate.

863	      NOTE: The optional parameter SBR-enabled in SDP a=fmtp is useful
864	      for implicit HE AAC / HE AAC v2 signaling.  But the SBR-enabled
865	      parameter can also be used in the case of explicit HE AAC / HE AAC
866	      v2 signaling.  Therefore, its existence itself is not the criteria
867	      to determine whether HE AAC / HE AAC v2 signaling is explicit or
868	      not.

870	   Optional parameters:

872	      profile-level-id: a decimal representation of MPEG-4 Audio Profile
873	      Level indication value defined in [14496-3].  This parameter
874	      indicates which MPEG-4 Audio tool subsets the decoder is capable
875	      of using.  If this parameter is not specified in the capability
876	      exchange or session setup procedure, its default value of 30
877	      (Natural Audio Profile/Level 1) is used.

879	      MPS-profile-level-id: a decimal representation of the MPEG
880	      Surround Profile Level indication as defined in [14496-3].  This
881	      parameter indicates the MPEG Surround profile and level that a
882	      decoder must support in order to decode the stream.

884	      object: a decimal representation of the MPEG-4 Audio Object Type
885	      value defined in [14496-3].  This parameter specifies the tool to
886	      be used by the decoder.  It can be used to limit the capability
887	      within the specified "profile-level-id".

889	      bitrate: the data rate for the audio bit stream.

891	      cpresent: a boolean parameter that indicates whether audio payload
892	      configuration data has been multiplexed into an RTP payload (see
893	      Section 5.1).  A 0 indicates the configuration data has not been
894	      multiplexed into an RTP payload, in which case the "config"
895	      parameter MUST be present; a 1 indicates that it has.  The default
896	      if the parameter is omitted is 1.  If this parameter is set to 1
897	      and the "config" parameter is present, the multiplexed
898	      configuration data and the value of the "config" parameter SHALL
899	      be consistent.

901	      config: a hexadecimal representation of an octet string that
902	      expresses the audio payload configuration data "StreamMuxConfig",
903	      as defined in [14496-3].  Configuration data is mapped onto the
904	      octet string on an MSB-first basis.  The first bit of the
905	      configuration data SHALL be located at the MSB of the first octet.
906	      In the last octet, zero-padding bits, if necessary, SHALL follow
907	      the configuration data.  Senders MUST set the StreamMuxConfig
908	      elements taraBufferFullness and latmBufferFullness to their
909	      largest respective value, indicating that buffer fullness measures
910	      are not used in SDP.  Receivers MUST ignore the value of these two
911	      elements contained in the config parameter.
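
      The MSB-first mapping and the zero padding of the last octet can
      be sketched as follows.  The helper name and the (value, width)
      field list are illustrative assumptions; the field order used in
      the usage note reflects a reading of the StreamMuxConfig syntax in
      [14496-3] for audioMuxVersion=0 and is not a complete encoder.

```python
def pack_msb_first(fields):
    """Pack (value, bit_width) pairs MSB first and return uppercase hex.

    Zero bits are appended so that the result ends on an octet boundary.
    """
    acc = 0
    nbits = 0
    for value, width in fields:
        if width <= 0 or not 0 <= value < (1 << width):
            raise ValueError("value does not fit in the given width")
        acc = (acc << width) | value
        nbits += width
    pad = (-nbits) % 8                 # zero-padding bits in the last octet
    acc <<= pad
    return acc.to_bytes((nbits + pad) // 8, "big").hex().upper()
```

      Packing audioMuxVersion=0, allStreamsSameTimeFraming=1,
      numSubFrames=0, numProgram=0, numLayer=0, audioObjectType=2,
      samplingFrequencyIndex=6, and channelConfiguration=2 with widths
      1, 1, 6, 4, 3, 5, 4, and 4 gives "40002620", which matches the
      leading octets of the AAC LC example in Section 6.4.1.3.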

913	      MPS-asc: a hexadecimal representation of an octet string that
914	      expresses audio payload configuration data "AudioSpecificConfig",
915	      as defined in [14496-3].  If this parameter is not present the
916	      relevant signaling is performed by other means (e.g. in-band or
917	      contained in the config string).

919	      The same mapping rules as for the config parameter apply.

921	      ptime: duration of each packet in milliseconds.

923	      SBR-enabled: a boolean parameter which indicates whether SBR-data
924	      can be expected in the RTP-payload of a stream.  This parameter is
925	      relevant for an SBR-capable decoder if the presence of SBR cannot
926	      be detected from an out-of-band decoder configuration (e.g.
927	      contained in the config string).

929	      If this parameter is set to 0, a decoder MAY expect that SBR is
930	      not used.  If this parameter is set to 1, a decoder can upsample
931	      the audio data with the SBR tool, regardless of whether SBR data is
932	      present in the stream or not.

934	      If the presence of SBR cannot be detected from out-of-band
935	      configuration and the SBR-enabled parameter is not present, the
936	      parameter defaults to 1 for an SBR-capable decoder.  If the
937	      resulting output sampling rate or the computational complexity is
938	      not supported, the SBR tool can be disabled or run in downsampled
939	      mode.

941	      The timestamp resolution at RTP layer is determined by the rate
942	      parameter.

944	   Published specification:

946	      Encoding specifications are provided in [14496-3].  The RTP
947	      payload format specification is described in this document.

949	   Encoding considerations:

951	      This type is only defined for transfer via RTP.

953	   Security considerations:

955	      See Section 9 of this document.

957	   Interoperability considerations:

959	      MPEG-4 Audio provides a large and rich set of tools for the coding
960	      of audio objects.  For effective implementation of the standard,
961	      subsets of the MPEG-4 Audio tool sets similar to those used in
962	      MPEG-4 Visual have been provided (see Section 6.1).

964	      The audio stream SHALL be compliant with the MPEG-4 Audio Profile@
965	      Level specified by the parameters "profile-level-id" and "MPS-
966	      profile-level-id".  Interoperability between a sender and a
967	      receiver may be achieved by specifying the parameters "profile-
968	      level-id" and "MPS-profile-level-id", or by arranging in the
969	      capability exchange procedure to set this parameter mutually to
970	      the same value.  Furthermore, the "object" parameter can be used
971	      to limit the capability within the specified Profile@Level in
972	      capability exchange.

974	   Applications which use this media type:

976	      Audio and video streaming and conferencing tools.

978	   Additional information: none

980	   Person and email address to contact for further information:

982	      See Authors' Address section at the end of this document.

984	   Intended usage: COMMON
985	   Author:

987	      See Authors' Address section at the end of this document.

989	   Change controller:

991	      IETF Audio/Video Transport working group delegated from the IESG.

993	6.4.  Mapping to SDP for MPEG-4 Audio

995	   The Media Type audio/MP4A-LATM string is mapped to fields in the
996	   Session Description Protocol (SDP) [RFC4566], as follows:

998	   o  The Media Type (audio) goes in SDP "m=" as the media name.

1000	   o  The Media subtype (MP4A-LATM) goes in SDP "a=rtpmap" as the
1001	      encoding name.

1003	   o  The required parameter "rate" goes in "a=rtpmap" as the clock
1004	      rate.

1006	   o  The optional parameter "ptime" goes in SDP "a=ptime" attribute.

1008	   o  The optional parameters "profile-level-id", "MPS-profile-level-id"
1009	      and "object" go in the "a=fmtp" line to indicate the coder
1010	      capability.

1012	      The following are some examples of the profile-level-id value:
1013	      1 : Main Audio Profile Level 1
1014	      9 : Speech Audio Profile Level 1
1015	      15: High Quality Audio Profile Level 2
1016	      30: Natural Audio Profile Level 1
1017	      44: High Efficiency AAC Profile Level 2
1018	      48: High Efficiency AAC v2 Profile Level 2
1019	      55: Baseline MPEG Surround Profile (see ISO/IEC 23003-1) Level 3

1021	      The optional payload-format-specific parameters "bitrate",
1022	      "cpresent", "config", "MPS-asc" and "SBR-enabled" also go in the
1023	      "a=fmtp" line.  These parameters are expressed as a string, in the
1024	      form of a semicolon-separated list of parameter=value pairs.
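
      A receiver-side sketch of this mapping is given below.  It
      assumes that parameter names are compared exactly as written and
      that the wrapped "a=fmtp" line has already been joined into a
      single line.  The function names are hypothetical.  The defaults
      applied are those stated in Section 6.3.

```python
def parse_fmtp(line: str):
    """Split an "a=fmtp:" line into (payload type, parameter dict)."""
    prefix = "a=fmtp:"
    if not line.startswith(prefix):
        raise ValueError("not an a=fmtp line")
    fmt, _, rest = line[len(prefix):].partition(" ")
    params = {}
    for item in rest.split(";"):
        item = item.strip()
        if not item:                   # tolerate a trailing semicolon
            continue
        name, sep, value = item.partition("=")
        if not sep:
            raise ValueError("expected parameter=value: " + item)
        params[name.strip()] = value.strip()
    return int(fmt), params


def apply_audio_defaults(params):
    """Apply the MP4A-LATM defaults and check the cpresent/config rule."""
    out = dict(params)
    out.setdefault("profile-level-id", "30")
    out.setdefault("cpresent", "1")
    if out["cpresent"] == "0" and "config" not in out:
        raise ValueError('cpresent=0 requires the "config" parameter')
    return out
```

      Unknown parameters are kept in the dictionary and can simply be
      ignored by the caller, in line with the rule that receivers ignore
      unspecified parameters.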

1026	6.4.1.  Declarative SDP Usage for MPEG-4 Audio

1028	   The following sections contain some examples of the media
1029	   representation in SDP.

1031	   Note that the a=fmtp line in some of the examples has been wrapped to
1032	   fit the page; it would comprise a single line in the SDP file.

1034	6.4.1.1.  Example: In-band Configuration

1036	   In this example the audio configuration data appears in the RTP
1037	   payload exclusively (i.e., the MPEG-4 audio configuration is known
1038	   when a StreamMuxConfig element appears within the RTP payload).

1040	      m=audio 49230 RTP/AVP 96
1041	      a=rtpmap:96 MP4A-LATM/90000
1042	      a=fmtp:96 object=2; cpresent=1

1044	   The "clock rate" is set to 90kHz.  This is the default value and the
1045	   real audio sampling rate is known when the audio configuration data
1046	   is received.

1048	6.4.1.2.  Example: 6kb/s CELP

1050	   6 kb/s CELP bitstreams (with an audio sampling rate of 8 kHz)

1052	     m=audio 49230 RTP/AVP 96
1053	     a=rtpmap:96 MP4A-LATM/8000
1054	     a=fmtp:96 profile-level-id=9; object=8; cpresent=0;
1055	       config=40008B18388380
1056	     a=ptime:20

1058	   In this example audio configuration data is not multiplexed into the
1059	   RTP payload and is described only in SDP.  Furthermore, the "clock
1060	   rate" is set to the audio sampling rate.

1062	6.4.1.3.  Example: 64 kb/s AAC LC Stereo

1064	   64 kb/s AAC LC stereo bitstream (with an audio sampling rate of 24
1065	   kHz)

1067	     m=audio 49230 RTP/AVP 96
1068	     a=rtpmap:96 MP4A-LATM/24000/2
1069	     a=fmtp:96 profile-level-id=1; bitrate=64000; cpresent=0;
1070	       object=2; config=400026203fc0

1072	   In this example audio configuration data is not multiplexed into the
1073	   RTP payload and is described only in SDP.  Furthermore, the "clock
1074	   rate" is set to the audio sampling rate.

1076	   In this example, the presence of SBR cannot be determined from the SDP
1077	   parameter set.  The clock rate represents the core codec sampling
1078	   rate.  An SBR-enabled decoder can use the SBR tool to upsample the
1079	   audio data if complexity and resulting output sampling rate permit.

1081	6.4.1.4.  Example: Use of the SBR-enabled Parameter

1083	   These two examples are identical to the example above with the
1084	   exception of the SBR-enabled parameter.  The presence of SBR is not
1085	   signaled by the SDP parameters object, profile-level-id and config,
1086	   but instead the SBR-enabled parameter is present.  The rate parameter
1087	   and the StreamMuxConfig contain the core codec sampling rate.

1089	   Example with "SBR-enabled=0", definitive and core codec sampling rate
1090	   24kHz:

1092	     m=audio 49230 RTP/AVP 96
1093	     a=rtpmap:96 MP4A-LATM/24000/2
1094	     a=fmtp:96 profile-level-id=1; bitrate=64000; cpresent=0;
1095	       SBR-enabled=0; config=400026203fc0

1097	   Example with "SBR-enabled=1", core codec sampling rate 24kHz,
1098	   definitive and SBR sampling rate 48kHz:

1100	     m=audio 49230 RTP/AVP 96
1101	     a=rtpmap:96 MP4A-LATM/24000/2
1102	     a=fmtp:96 profile-level-id=1; bitrate=64000; cpresent=0;
1103	       SBR-enabled=1; config=400026203fc0

1105	   In this example, the clock rate is still 24000 and this information
1106	   is used for RTP timestamp calculation.  The value of 24000 is used to
1107	   support old AAC decoders.  It allows a decoder that supports only AAC
1108	   to process the HE AAC coded data, although only the plain AAC part is
1109	   decoded.  An HE AAC decoder is able to generate output data with the
1110	   SBR sampling rate.

1112	6.4.1.5.  Example: Hierarchical Signaling of SBR

1114	   When the presence of SBR is explicitly signaled by the SDP parameters
1115	   object, profile-level-id or the config string as in the example
1116	   below, the StreamMuxConfig contains both the core codec sampling rate
1117	   and the SBR sampling rate.

1119	     m=audio 49230 RTP/AVP 96
1120	     a=rtpmap:96 MP4A-LATM/48000/2
1121	     a=fmtp:96 profile-level-id=44; bitrate=64000; cpresent=0;
1122	       config=40005623101fe0; SBR-enabled=1

1124	   This config string uses the explicit signaling mode 2.A (hierarchical
1125	   signaling; see [14496-3]).  This means that the AOT (Audio Object Type)
1126	   is SBR(5) and SFI(Sampling Frequency Index) is 6(24000 Hz) which
1127	   refers to the underlying core codec sampling frequency.  CC(Channel
1128	   Configuration) is stereo(2), and the ESFI(Extension Sampling
1129	   Frequency Index)=3 (48000) is referring to the sampling frequency of
1130	   the extension tool(SBR).
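
   The field values quoted above can be reproduced from the config
   string with a small bit reader.  This is a sketch under stated
   assumptions: it handles only audioMuxVersion=0, a single program and
   layer, and no escape values, and it follows a reading of the
   StreamMuxConfig and AudioSpecificConfig syntax in [14496-3].  The
   names used are illustrative.

```python
SAMPLING_FREQUENCIES = (96000, 88200, 64000, 48000, 44100, 32000, 24000,
                        22050, 16000, 12000, 11025, 8000, 7350)


class BitReader:
    """Read unsigned integers MSB first from a byte string."""

    def __init__(self, data: bytes):
        self.data = data
        self.pos = 0

    def read(self, nbits: int) -> int:
        if self.pos + nbits > len(self.data) * 8:
            raise ValueError("config string too short")
        value = 0
        for _ in range(nbits):
            byte = self.data[self.pos >> 3]
            value = (value << 1) | ((byte >> (7 - (self.pos & 7))) & 1)
            self.pos += 1
        return value


def decode_config_prefix(config_hex: str) -> dict:
    """Decode the leading StreamMuxConfig fields of a "config" value."""
    r = BitReader(bytes.fromhex(config_hex))
    if r.read(1) != 0:
        raise ValueError("only audioMuxVersion=0 is handled in this sketch")
    out = {"allStreamsSameTimeFraming": r.read(1),
           "numSubFrames": r.read(6),
           "numProgram": r.read(4),
           "numLayer": r.read(3)}
    aot = r.read(5)
    sfi = r.read(4)
    cc = r.read(4)
    if aot == 31 or sfi >= len(SAMPLING_FREQUENCIES):
        raise ValueError("escape or reserved values are not handled")
    out.update(audioObjectType=aot, samplingFrequencyIndex=sfi,
               samplingFrequency=SAMPLING_FREQUENCIES[sfi],
               channelConfiguration=cc)
    if aot == 5:                       # hierarchical SBR signaling
        esfi = r.read(4)
        if esfi >= len(SAMPLING_FREQUENCIES):
            raise ValueError("escape or reserved values are not handled")
        out["extensionSamplingFrequencyIndex"] = esfi
        out["extensionSamplingFrequency"] = SAMPLING_FREQUENCIES[esfi]
    return out
```

   Applied to "40005623101fe0", the sketch yields AOT 5, SFI 6 (24000
   Hz), channel configuration 2, and ESFI 3 (48000 Hz), in agreement
   with the description above.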

1132	6.4.1.6.  Example: HE AAC v2 Signaling

1134	   HE AAC v2 decoders are required to always produce a stereo signal
1135	   from a mono signal.  Hence, there is no parameter necessary to signal
1136	   the presence of PS.

1138	   Example with "SBR-enabled=1" and 1 channel signaled in the a=rtpmap
1139	   line and within the config parameter.  Core codec sampling rate is
1140	   24kHz, definitive and SBR sampling rate is 48kHz.  Core codec channel
1141	   configuration is mono, PS channel configuration is stereo.

1143	     m=audio 49230 RTP/AVP 110
1144	     a=rtpmap:110 MP4A-LATM/24000/1
1145	     a=fmtp:110 profile-level-id=15; object=2; cpresent=0;
1146	       config=400026103fc0; SBR-enabled=1

1148	6.4.1.7.  Example: Hierarchical Signaling of PS

1150	   Example: 48 kHz stereo audio input:

1152	     m=audio 49230 RTP/AVP 110
1153	     a=rtpmap:110 MP4A-LATM/48000/2
1154	     a=fmtp:110 profile-level-id=48; cpresent=0; config=4001d613101fe0

1156	   The config parameter indicates explicit hierarchical signaling of PS
1157	   and SBR.  This configuration method is not supported by legacy AAC and
1158	   HE AAC decoders, and these are therefore unable to decode the
1159	   coded data.

1161	6.4.1.8.  Example: MPEG Surround

1163	   The following examples show how MPEG Surround configuration data can
1164	   be signaled using SDP.  The configuration is carried within the
1165	   config string in the first example by using two different layers.
1166	   The general parameters in this example are: AudioMuxVersion=1;
1167	   allStreamsSameTimeFraming=1; numSubFrames=0; numProgram=0;
1168	   numLayer=1.  The first layer describes the HE AAC payload and signals
1169	   the following parameters: ascLen=25; audioObjectType=2 (AAC LC);
1170	   extensionAudioObjectType=5 (SBR); samplingFrequencyIndex=6 (24kHz);
1171	   extensionSamplingFrequencyIndex=3 (48kHz); channelConfiguration=2
1172	   (2.0 channels).  The second layer describes the MPEG surround payload
1173	   and specifies the following parameters: ascLen=110;
1174	   AudioObjectType=30 (MPEG Surround); samplingFrequencyIndex=3 (48kHz);
1175	   channelConfiguration=6 (5.1 channels); sacPayloadEmbedding=1;
1176	   SpatialSpecificConfig=(48 kHz; 32 slots; 525 tree; ResCoding=1;
1177	   ResBands=[7,7,7,7]).

1179	   In this example the signaling is carried by using two different LATM
1180	   layers.  The MPEG surround payload is carried together with the AAC
1181	   payload in a single layer as indicated by the sacPayloadEmbedding
1182	   Flag.

1184	     m=audio 49230 RTP/AVP 96
1185	     a=rtpmap:96 MP4A-LATM/48000
1186	     a=fmtp:96 profile-level-id=1; bitrate=64000; cpresent=0;
1187	       SBR-enabled=1;
1188	       config=8FF8004192B11880FF0DDE3699F2408C00536C02313CF3CE0FF0

1190	6.4.1.9.  Example: MPEG Surround with Extended SDP Parameters

1192	   The following example is an extension of the configuration given
1193	   above by the MPEG Surround specific parameters.  The MPS-profile-level-id
1194	   parameter specifies the MPEG Surround Baseline Profile at Level 3
1195	   (PLI55) and the MPS-asc string contains the hexadecimal
1196	   representation of the MPEG Surround ASC [audioObjectType=30 (MPEG
1197	   Surround); samplingFrequencyIndex=0x3 (48kHz); channelConfiguration=6
1198	   (5.1 channels); sacPayloadEmbedding=1; SpatialSpecificConfig=(48 kHz;
1199	   32 slots; 525 tree; ResCoding=1; ResBands=[0,13,13,13])].

1201	     m=audio 49230 RTP/AVP 96
1202	     a=rtpmap:96 MP4A-LATM/48000
1203	     a=fmtp:96 profile-level-id=44; bitrate=64000; cpresent=0;
1204	       config=40005623101fe0; MPS-profile-level-id=55;
1205	       MPS-asc=F1B4CF920442029B501185B6DA00;

1207	6.4.1.10.  Example: MPEG Surround with Single Layer Configuration

1209	   The following example shows how MPEG Surround configuration data can
1210	   be signaled using the SDP config parameter.  The configuration is
1211	   carried within the config string using a single layer.  The general
1212	   parameters in this example are: AudioMuxVersion=1;
1213	   allStreamsSameTimeFraming=1; numSubFrames=0; numProgram=0;
1214	   numLayer=0.  The single layer describes the combination of HE AAC and
1215	   MPEG Surround payload and signals the following parameters:
1216	   ascLen=101; audioObjectType=2 (AAC LC); extensionAudioObjectType=5
1217	   (SBR); samplingFrequencyIndex=7 (22.05kHz);
1218	   extensionSamplingFrequencyIndex=4 (44.1kHz); channelConfiguration=2
1219	   (2.0 channels).  A backward compatible extension according to
1220	   [14496-3/Amd.1] signals the presence of MPEG surround payload data
1221	   and specifies the following parameters: SpatialSpecificConfig=(44.1
1222	   kHz; 32 slots; 525 tree; ResCoding=0).

1224	   In this example the signaling is carried by using a single LATM
1225	   layer.  The MPEG surround payload is carried together with the HE AAC
1226	   payload in a single layer.

1228	     m=audio 49230 RTP/AVP 96
1229	     a=rtpmap:96 MP4A-LATM/44100
1230	     a=fmtp:96 profile-level-id=44; bitrate=64000; cpresent=0;
1231	       SBR-enabled=1; config=8FF8000652B920876A83A1F440884053620FF0;
1232	       MPS-profile-level-id=55
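
   As a non-normative illustration, the general parameters of the config
   string above can be recovered with a short Python sketch.  The
   BitReader class and the latm_get_value and parse_mux_header functions
   are hypothetical names introduced only for this example.  The field
   order and widths are an assumption taken from the StreamMuxConfig
   syntax in [14496-3], and the sketch covers only audioMuxVersion=1
   with one program and one layer.  It stops after ascLen and does not
   parse the AudioSpecificConfig that follows.

```python
class BitReader:
    """Reads MSB-first bit fields from a byte string (hypothetical helper)."""

    def __init__(self, data: bytes) -> None:
        self.data = data
        self.pos = 0  # current read position, counted in bits

    def read(self, nbits: int) -> int:
        value = 0
        for _ in range(nbits):
            byte = self.data[self.pos // 8]
            bit = (byte >> (7 - self.pos % 8)) & 1
            value = (value << 1) | bit
            self.pos += 1
        return value


def latm_get_value(reader: BitReader) -> int:
    """Assumed LatmGetValue(): 2-bit (byte count - 1), then the value bytes."""
    bytes_for_value = reader.read(2)
    value = 0
    for _ in range(bytes_for_value + 1):
        value = (value << 8) | reader.read(8)
    return value


def parse_mux_header(config_hex: str) -> dict:
    """Decode the StreamMuxConfig fields up to and including ascLen."""
    reader = BitReader(bytes.fromhex(config_hex))
    fields = {"audioMuxVersion": reader.read(1)}
    if fields["audioMuxVersion"] != 1:
        raise ValueError("sketch only covers audioMuxVersion=1")
    fields["audioMuxVersionA"] = reader.read(1)
    if fields["audioMuxVersionA"] != 0:
        raise ValueError("sketch only covers audioMuxVersionA=0")
    fields["taraBufferFullness"] = latm_get_value(reader)
    fields["allStreamsSameTimeFraming"] = reader.read(1)
    fields["numSubFrames"] = reader.read(6)
    fields["numProgram"] = reader.read(4)
    fields["numLayer"] = reader.read(3)
    if fields["numProgram"] != 0 or fields["numLayer"] != 0:
        raise ValueError("sketch only covers one program with one layer")
    # For the first layer of the first program no useSameConfig bit is
    # sent, so the length of the AudioSpecificConfig follows directly.
    fields["ascLen"] = latm_get_value(reader)
    return fields


fields = parse_mux_header("8FF8000652B920876A83A1F440884053620FF0")
print(fields)
```

   Under these assumptions the sketch yields audioMuxVersion=1,
   allStreamsSameTimeFraming=1, numSubFrames=0, numProgram=0, numLayer=0
   and ascLen=101, which matches the general parameters listed above.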

1234	7.  IANA Considerations

1236	   This document updates the media subtypes "MP4A-LATM" and "MP4V-ES"
1237	   from RFC 3016.  The new registrations are in Section 6.1 and
1238	   Section 6.3 of this document.

1240	8.  Acknowledgements

1242	   The authors would like to thank Yoshihiro Kikuchi, Yoshinori Matsui,
1243	   Toshiyuki Nomura, Shigeru Fukunaga and Hideaki Kimata for their work
1244	   on RFC 3016, and Ali Begen, Keith Drage, Roni Even and Qin Wu for
1245	   their valuable input and comments on this document.

1247	9.  Security Considerations

1249	   RTP packets using the payload format defined in this specification
1250	   are subject to the security considerations discussed in the RTP
1251	   specification [RFC3550], and in any applicable RTP profile.  The main
1252	   security considerations for the RTP packet carrying the RTP payload
1253	   format defined within this document are confidentiality, integrity,
1254	   and source authenticity.  Confidentiality is achieved by encryption
1255	   of the RTP payload, and integrity of the RTP packets is achieved
1256	   through a suitable cryptographic integrity protection mechanism.  A
1257	   cryptographic system may also allow the authentication of the source
1258	   of the payload.  A suitable security mechanism for this RTP payload
1259	   format should provide confidentiality, integrity protection, and at
1260	   least source authentication capable of determining whether or not an
1261	   RTP packet is from a member of the RTP session.

1263	   Note that most MPEG-4 codecs define an extension mechanism to
1264	   transmit extra data within a stream that is gracefully skipped by
1265	   decoders that do not support this extra data.  This covert channel
1266	   may be used to transmit unwanted data in an otherwise valid stream.
1267	   The appropriate mechanism to provide security to RTP and payloads
1268	   following this document may vary.  It depends on the application, the
1269	   transport, and the signaling protocol employed.  Therefore, a single
1270	   mechanism is not sufficient, although if suitable, the usage of the
1271	   Secure Real-time Transport Protocol (SRTP) [RFC3711] is recommended.
1272	   Other mechanisms that may be used are IPsec [RFC4301] and Transport
1273	   Layer Security (TLS) [RFC5246] (e.g., for RTP over TCP), but other
1274	   alternatives may also exist.

1276	   This RTP payload format and its media decoder do not exhibit any
1277	   significant non-uniformity in the receiver-side computational
1278	   complexity for packet processing, and thus are unlikely to pose a
1279	   denial-of-service threat due to the receipt of pathological data.
1280	   The complete MPEG-4 system allows for transport of a wide range of
1281	   content, including Java applets (MPEG-J) and scripts.  Since this
1282	   payload format is restricted to audio and video streams, it is not
1283	   possible to transport such active content in this format.

1285	10.  Differences to RFC 3016

1287	   The RTP payload format for MPEG-4 Audio as specified in RFC 3016 is
1288	   used by the 3GPP PSS service [3GPP].  However, there are some
1289	   misalignments between RFC 3016 and the 3GPP PSS specification that
1290	   are addressed by this update:

1292	   o  The audio payload format (LATM) referenced in this document is
1293	      binary compatible with the format used in [3GPP].

1295	   o  The audio signaling format (StreamMuxConfig) referenced in this
1296	      document is binary compatible with the format used in [3GPP].

1298	   o  The audio parameter "SBR-enabled", which is used by 3GPP
1299	      implementations [3GPP], is now defined in this document.

1301	   o  The rate parameter is defined unambiguously in this document for
1302	      the case where SBR (Spectral Band Replication) is present.

1304	   o  The number of audio channels parameter is defined unambiguously in
1305	      this document for the case where PS (Parametric Stereo) is present.

1307	   Furthermore, some comments have been addressed and signaling support
1308	   for MPEG Surround [23003-1] has been added.

1310	11.  References

1312	11.1.  Normative References

1314	   [14496-2]  MPEG, "ISO/IEC International Standard 14496-2 - Coding of
1315	              audio-visual objects, Part 2: Visual", 2003.

1317	   [14496-3]  MPEG, "ISO/IEC International Standard 14496-3 - Coding of
1318	              audio-visual objects, Part 3: Audio", 2009.

1320	   [14496-3/Amd.1]
1321	              MPEG, "ISO/IEC International Standard 14496-3 - Coding of
1322	              audio-visual objects, Part 3: Audio, Amendment 1: HD-AAC
1323	              profile and MPEG Surround signaling", 2009.

1325	   [23003-1]  MPEG, "ISO/IEC International Standard 23003-1 - MPEG
1326	              Surround (MPEG-D)", 2007.

1328	   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
1329	              Requirement Levels", BCP 14, RFC 2119, March 1997.

1331	   [RFC3016]  Kikuchi, Y., Nomura, T., Fukunaga, S., Matsui, Y., and H.
1332	              Kimata, "RTP Payload Format for MPEG-4 Audio/Visual
1333	              Streams", RFC 3016, November 2000.

1335	   [RFC3550]  Schulzrinne, H., Casner, S., Frederick, R., and V.
1336	              Jacobson, "RTP: A Transport Protocol for Real-Time
1337	              Applications", STD 64, RFC 3550, July 2003.

1339	   [RFC4288]  Freed, N. and J. Klensin, "Media Type Specifications and
1340	              Registration Procedures", BCP 13, RFC 4288, December 2005.

1342	   [RFC4566]  Handley, M., Jacobson, V., and C. Perkins, "SDP: Session
1343	              Description Protocol", RFC 4566, July 2006.

1345	   [RFC4629]  Ott, H., Bormann, C., Sullivan, G., Wenger, S., and R.
1346	              Even, "RTP Payload Format for ITU-T Rec. H.263 Video",
1347	              RFC 4629, January 2007.

1349	   [RFC4855]  Casner, S., "Media Type Registration of RTP Payload
1350	              Formats", RFC 4855, February 2007.

1352	   [RFC5583]  Schierl, T. and S. Wenger, "Signaling Media Decoding
1353	              Dependency in the Session Description Protocol (SDP)",
1354	              RFC 5583, July 2009.

1356	11.2.  Informative References

1358	   [14496-1]  MPEG, "ISO/IEC International Standard 14496-1 - Coding of
1359	              audio-visual objects, Part 1: Systems", 2004.

1361	   [14496-12]
1362	              MPEG, "ISO/IEC International Standard 14496-12 - Coding of
1363	              audio-visual objects, Part 12 ISO base media file format".

1365	   [14496-14]
1366	              MPEG, "ISO/IEC International Standard 14496-14 - Coding of
1367	              audio-visual objects, Part 14: MP4 file format".

1369	   [3GPP]     3GPP, "3rd Generation Partnership Project; Technical
1370	              Specification Group Services and System Aspects;
1371	              Transparent end-to-end Packet-switched Streaming Service
1372	              (PSS); Protocols and codecs (Release 9)", 3GPP TS 26.234
1373	              V9.5.0, December 2010.

1375	   [RFC2198]  Perkins, C., Kouvelas, I., Hodson, O., Hardman, V.,
1376	              Handley, M., Bolot, J., Vega-Garcia, A., and S. Fosse-
1377	              Parisis, "RTP Payload for Redundant Audio Data", RFC 2198,
1378	              September 1997.

1380	   [RFC3640]  van der Meer, J., Mackie, D., Swaminathan, V., Singer, D.,
1381	              and P. Gentric, "RTP Payload Format for Transport of
1382	              MPEG-4 Elementary Streams", RFC 3640, November 2003.

1384	   [RFC3711]  Baugher, M., McGrew, D., Naslund, M., Carrara, E., and K.
1385	              Norrman, "The Secure Real-time Transport Protocol (SRTP)",
1386	              RFC 3711, March 2004.

1388	   [RFC4301]  Kent, S. and K. Seo, "Security Architecture for the
1389	              Internet Protocol", RFC 4301, December 2005.

1391	   [RFC4628]  Even, R., "RTP Payload Format for H.263 Moving RFC 2190 to
1392	              Historic Status", RFC 4628, January 2007.

1394	   [RFC5109]  Li, A., "RTP Payload Format for Generic Forward Error
1395	              Correction", RFC 5109, December 2007.

1397	   [RFC5246]  Dierks, T. and E. Rescorla, "The Transport Layer Security
1398	              (TLS) Protocol Version 1.2", RFC 5246, August 2008.

1400	   [RFC5691]  de Bont, F., Doehla, S., Schmidt, M., and R.
1401	              Sperschneider, "RTP Payload Format for Elementary Streams
1402	              with MPEG Surround Multi-Channel Audio", RFC 5691,
1403	              October 2009.

1405	Authors' Addresses

1407	   Malte Schmidt
1408	   Dolby Laboratories
1409	   Deutschherrnstr. 15-19
1410	   90537 Nuernberg,
1411	   DE

1413	   Phone: +49 911 928 91 42
1414	   Email: malte.schmidt@dolby.com

1416	   Frans de Bont
1417	   Philips Electronics
1418	   High Tech Campus 5
1419	   5656 AE Eindhoven,
1420	   NL

1422	   Phone: +31 40 2740234
1423	   Email: frans.de.bont@philips.com

1425	   Stefan Doehla
1426	   Fraunhofer IIS
1427	   Am Wolfmantel 33
1428	   91058 Erlangen,
1429	   DE

1431	   Phone: +49 9131 776 6042
1432	   Email: stefan.doehla@iis.fraunhofer.de

1434	   Jaehwan Kim
1435	   LG Electronics Inc.
1436	   221, Yangjae-dong, Seocho-gu
1437	   Seoul 137-130,
1438	   Korea

1440	   Phone: +82 10 6225 0619
1441	   Email: kjh1905m@naver.com