AVT                                                           M. Schmidt
Internet-Draft                                        Dolby Laboratories
Obsoletes: 3016 (if approved)                                 F. de Bont
Intended status: Standards Track                     Philips Electronics
Expires: July 15, 2011                                         S. Doehla
                                                          Fraunhofer IIS
                                                             Jaehwan Kim
                                                     LG Electronics Inc.
                                                        January 11, 2011


           RTP Payload Format for MPEG-4 Audio/Visual Streams
                    draft-ietf-avt-rfc3016bis-02.txt

Abstract

   This document describes Real-Time Transport Protocol (RTP) payload
   formats for carrying each of MPEG-4 Audio and MPEG-4 Visual
   bitstreams without using MPEG-4 Systems. For the purpose of directly
   mapping MPEG-4 Audio/Visual bitstreams onto RTP packets, it provides
   specifications for the use of RTP header fields and also specifies
   fragmentation rules. It also provides specifications for Media Type
   registration and the use of Session Description Protocol (SDP). The
   audio payload format described in this document has some
   limitations; for new system designs, [RFC3640] is preferred.

Status of this Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF). Note that other groups may also distribute
   working documents as Internet-Drafts. The list of current Internet-
   Drafts is at http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time. It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on July 15, 2011.

Copyright Notice

   Copyright (c) 2011 IETF Trust and the persons identified as the
   document authors. All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document. Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document. Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1.  Introduction
     1.1.  MPEG-4 Visual RTP Payload Format
     1.2.  MPEG-4 Audio RTP Payload Format
     1.3.  Interoperability with RFC 3016
   2.  Definitions and Abbreviations
   3.  LATM Restrictions for RTP Packetization of MPEG-4 Audio
       Bitstreams
   4.  RTP Packetization of MPEG-4 Visual Bitstreams
     4.1.  Use of RTP Header Fields for MPEG-4 Visual
     4.2.  Fragmentation of MPEG-4 Visual Bitstream
     4.3.  Examples of Packetized MPEG-4 Visual Bitstream
   5.  RTP Packetization of MPEG-4 Audio Bitstreams
     5.1.  RTP Packet Format
     5.2.  Use of RTP Header Fields for MPEG-4 Audio
     5.3.  Fragmentation of MPEG-4 Audio Bitstream
   6.  Media Type Registration for MPEG-4 Audio/Visual Streams
     6.1.  Media Type Registration for MPEG-4 Visual
     6.2.  Mapping to SDP for MPEG-4 Visual
       6.2.1.  Declarative SDP Usage for MPEG-4 Visual
     6.3.  Media Type Registration for MPEG-4 Audio
     6.4.  Mapping to SDP for MPEG-4 Audio
       6.4.1.  Declarative SDP Usage for MPEG-4 Audio
         6.4.1.1.  Example: In-band Configuration
         6.4.1.2.  Example: 6kb/s CELP
         6.4.1.3.  Example: 64 kb/s AAC LC Stereo
         6.4.1.4.  Example: Use of the SBR-enabled Parameter
         6.4.1.5.  Example: Hierarchical Signaling of SBR
         6.4.1.6.  Example: HE AAC v2 Signaling
         6.4.1.7.  Example: Hierarchical Signaling of PS
         6.4.1.8.  Example: MPEG Surround
         6.4.1.9.  Example: MPEG Surround with Extended SDP
                   Parameters
         6.4.1.10. Example: MPEG Surround with Single Layer
                   Configuration
   7.  IANA Considerations
   8.  Acknowledgements
   9.  Security Considerations
   10. Differences to RFC 3016
   11. References
     11.1.  Normative References
     11.2.  Informative References
   Authors' Addresses

1.  Introduction

   The RTP payload formats described in this document specify how MPEG-4
   Audio [14496-3] and MPEG-4 Visual streams [14496-2] are to be
   fragmented and mapped directly onto RTP packets.

   These RTP payload formats enable transport of MPEG-4 Audio/Visual
   streams without using the synchronization and stream management
   functionality of MPEG-4 Systems [14496-1]. Such RTP payload formats
   will be used in systems that have intrinsic stream management
   functionality and thus require no such functionality from MPEG-4
   Systems. H.323 terminals are an example of such systems, where
   MPEG-4 Audio/Visual streams are not managed by MPEG-4 Systems Object
   Descriptors but by H.245. The streams are directly mapped onto RTP
   packets without using the MPEG-4 Systems Sync Layer. Other examples
   are SIP and RTSP, where Media Type and SDP are used. Media Type and
   SDP usages of the RTP payload formats described in this document are
   defined to directly specify the attributes of Audio/Visual streams
   (e.g., media type, packetization format and codec configuration)
   without using MPEG-4 Systems. The obvious benefit is that these
   MPEG-4 Audio/Visual RTP payload formats can be handled in a unified
   way together with those formats defined for non-MPEG-4 codecs. The
   disadvantage is that interoperability with environments using MPEG-4
   Systems may be difficult; hence, other payload formats may be better
   suited to those applications.

   The semantics of RTP headers in such cases need to be clearly
   defined, including the association with MPEG-4 Audio/Visual data
   elements. In addition, it is beneficial to define the fragmentation
   rules of RTP packets for MPEG-4 Video streams so as to enhance error
   resiliency by utilizing the error resiliency tools provided inside
   the MPEG-4 Video stream.

1.1.  MPEG-4 Visual RTP Payload Format

   MPEG-4 Visual is a visual coding standard with many new features:
   high coding efficiency; high error resiliency; multiple, arbitrary
   shape object-based coding; etc. [14496-2]. It covers a wide range of
   bitrates, from scores of Kbps to several Mbps. It also covers a wide
   variety of networks, ranging from those guaranteed to be almost
   error-free to mobile networks with high error rates.

   With respect to the fragmentation rules for an MPEG-4 Visual
   bitstream defined in this document, since MPEG-4 Visual is used for a
   wide variety of networks, it is desirable not to apply too much
   restriction on fragmentation, and a fragmentation rule such as "a
   single video packet shall always be mapped on a single RTP packet"
   may be inappropriate. On the other hand, careless, media-unaware
   fragmentation may cause degradation in error resiliency and bandwidth
   efficiency. The fragmentation rules described in this document are
   flexible but manage to define the minimum rules for preventing
   meaningless fragmentation while utilizing the error resiliency
   functionalities of MPEG-4 Visual.

   The fragmentation rule "Different VOPs SHOULD be fragmented into
   different RTP packets" is made so that the RTP timestamp uniquely
   indicates the VOP time framing. On the other hand, MPEG-4 video may
   generate VOPs of very small size, in cases with an empty VOP
   (vop_coded=0) containing only a VOP header or an arbitrarily shaped
   VOP with a small number of coding blocks. To reduce the overhead for
   such cases, the fragmentation rule permits concatenating multiple
   VOPs in an RTP packet. (See fragmentation rule (4) in Section 4.2
   and marker bit and timestamp in Section 4.1.)

   While the additional media specific RTP header defined for such video
   coding tools as H.261 or MPEG-1/2 is effective in helping to recover
   picture headers corrupted by packet losses, MPEG-4 Visual already has
   error resiliency functionalities for recovering corrupt headers, and
   these can be used on RTP/IP networks as well as on other networks
   (H.223/mobile, MPEG-2/TS, etc.). Therefore, no extra RTP header
   fields are defined in this MPEG-4 Visual RTP payload format.

1.2.  MPEG-4 Audio RTP Payload Format

   MPEG-4 Audio is an audio standard that integrates many different
   types of audio coding tools. Low-overhead MPEG-4 Audio Transport
   Multiplex (LATM) manages the sequences of audio data with relatively
   small overhead. In audio-only applications, then, it is desirable
   for LATM-based MPEG-4 Audio bitstreams to be directly mapped onto RTP
   packets without using MPEG-4 Systems.

   For MPEG-4 Audio coding tools, as is true for other audio coders, if
   the payload is a single audio frame, packet loss will not impair the
   decodability of adjacent packets. Therefore, the additional media
   specific header for recovering errors will not be required for MPEG-4
   Audio. Existing RTP protection mechanisms, such as Generic Forward
   Error Correction [RFC5109] and Redundant Audio Data [RFC2198], MAY be
   applied to improve error resiliency.

1.3.  Interoperability with RFC 3016

   Although, strictly speaking, systems that support MPEG-4 Audio as
   specified in [RFC3016] will be incompatible with systems supporting
   this document, existing systems already comply with the specification
   in 3GPP PSS service [3GPP] and therefore no incompatibility issues
   are foreseen.

2.  Definitions and Abbreviations

   This document makes use of terms specified in [14496-2], [14496-3],
   and [23003-1]. In addition, the following terms are used in this
   document and have specific meaning within the context of this
   document.

   Core codec sampling rate:

      Audio codec sampling rate. When SBR (Spectral Band Replication)
      is used, typically twice this value will be regarded as the
      definitive sampling rate (i.e., the decoder's output sampling
      rate).

      Note: The exception is downsampled SBR mode, in which the SBR
      sampling rate equals the core codec sampling rate.

   Core codec channel configuration:

      Audio codec channel configuration. When PS (Parametric Stereo) is
      used, the core codec channel configuration indicates one channel
      (i.e., mono) whereas the definitive channel configuration is two
      channels (i.e., stereo). When MPEG Surround is used, the
      definitive channel configuration depends on the output of the MPEG
      Surround decoder.

   SBR sampling rate:

      When SBR is used, the sampling rate is typically twice the core
      codec sampling rate, with the exception of downsampled SBR mode,
      where the SBR sampling rate and core codec sampling rate are
      identical.

   Abbreviations:

      AAC: Advanced Audio Coding

      ASC: AudioSpecificConfig

      HE AAC: High Efficiency AAC

      LATM: Low-overhead MPEG-4 Audio Transport Multiplex

      PS: Parametric Stereo

      SBR: Spectral Band Replication

      VOP: Video Object Plane

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in [RFC2119].

3.  LATM Restrictions for RTP Packetization of MPEG-4 Audio Bitstreams

   LATM has several multiplexing features:

   o  Carrying configuration information with audio data,

   o  Concatenation of multiple audio frames in one audio stream,

   o  Multiplexing multiple objects (programs),

   o  Multiplexing scalable layers.

   In RTP transmission there is no need for the last two features.
   Therefore, these two features MUST NOT be used in applications based
   on RTP packetization specified by this document. Since LATM has been
   developed for only natural audio coding tools, i.e., not for
   synthesis tools, it seems difficult to transmit Structured Audio (SA)
   data and Text to Speech Interface (TTSI) data by LATM. Therefore, SA
   data and TTSI data MUST NOT be transported by the RTP packetization
   in this document.

   For transmission of scalable streams, audio data of each layer SHOULD
   be packetized onto different RTP streams, allowing for the different
   layers to be treated differently at the IP level, for example via
   some means of differentiated service. On the other hand, all
   configuration data of the scalable streams are contained in one LATM
   configuration data "StreamMuxConfig" and every scalable layer shares
   the StreamMuxConfig. The mapping between each layer and its
   configuration data is achieved by LATM header information attached to
   the audio data. In order to indicate the dependency information of
   the scalable streams, the signaling mechanism as specified in
   [RFC5583] SHOULD be used (see Section 5.2).

4.  RTP Packetization of MPEG-4 Visual Bitstreams

   This section specifies RTP packetization rules for MPEG-4 Visual
   content. An MPEG-4 Visual bitstream is mapped directly onto RTP
   packets without the addition of extra header fields or any removal of
   Visual syntax elements. The Combined Configuration/Elementary stream
   mode MUST be used so that configuration information will be carried
   to the same RTP port as the elementary stream (see 6.2.1 "Start
   codes" of [14496-2]). The configuration information MAY additionally
   be specified by some out-of-band means. If needed by systems using
   Media Type parameters and SDP parameters (e.g., SIP and RTSP), the
   optional parameter "config" MUST be used to specify the configuration
   information (see Section 6.1 and Section 6.2).

   When the short video header mode is used, the RTP payload format for
   H.263 SHOULD be used (the format defined in [RFC4629] is RECOMMENDED,
   but the [RFC4628] format MAY be used for compatibility with older
   implementations).
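
   As an illustration of the choice between the two payload formats, a
   sender can inspect the first bytes of the encoder output. The
   following is a minimal sketch, not part of this specification. It
   assumes that a short video header stream begins with the 22-bit
   short_video_start_marker ('0000 0000 0000 0000 1000 00') defined in
   [14496-2]; the function names and the returned labels are
   hypothetical.

```python
def uses_short_video_header(bitstream: bytes) -> bool:
    """Return True if the stream starts with short_video_start_marker.

    The marker is 22 bits long: 16 zero bits followed by '100000'.
    The low 2 bits of the third byte belong to the next field, so
    they are masked out before comparing.
    """
    if len(bitstream) < 3:
        return False
    return (
        bitstream[0] == 0x00
        and bitstream[1] == 0x00
        and (bitstream[2] & 0xFC) == 0x80
    )


def select_payload_format(bitstream: bytes) -> str:
    """Pick a payload format label (labels are illustrative only)."""
    if uses_short_video_header(bitstream):
        return "H.263 (RFC 4629)"
    return "MP4V-ES"
```

   A stream that starts with a regular start code prefix (0x000001) is
   not matched, because the third byte is then 0x01.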

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |V=2|P|X|  CC   |M|     PT      |       sequence number         | RTP
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                           timestamp                           | Header
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |           synchronization source (SSRC) identifier            |
   +=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+
   |            contributing source (CSRC) identifiers             |
   |                             ....                              |
   +=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+
   |                                                               | RTP
   |       MPEG-4 Visual stream (byte aligned)                     | Pay-
   |                                                               | load
   |                               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                               :...OPTIONAL RTP padding        |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

          Figure 1 - An RTP packet for MPEG-4 Visual stream

4.1.  Use of RTP Header Fields for MPEG-4 Visual

   Payload Type (PT): The assignment of an RTP payload type for this
   packet format is outside the scope of this document, and will not be
   specified here. It is expected that the RTP profile for a particular
   class of applications will assign a payload type for this encoding,
   or if that is not done then a payload type in the dynamic range SHALL
   be chosen by means of an out-of-band signaling protocol (e.g., H.245,
   SIP, etc.).

   Extension (X) bit: Defined by the RTP profile used.

   Sequence Number: Incremented by one for each RTP data packet sent,
   starting, for security reasons, with a random initial value.

   Marker (M) bit: The marker bit is set to one to indicate the last RTP
   packet (or only RTP packet) of a VOP. When multiple VOPs are carried
   in the same RTP packet, the marker bit is set to one.

   Timestamp: The timestamp indicates the sampling instant of the VOP
   contained in the RTP packet. A constant offset, which is random, is
   added for security reasons.

   o  When multiple VOPs are carried in the same RTP packet, the
      timestamp indicates the earliest of the VOP times within the VOPs
      carried in the RTP packet. Timestamp information of the rest of
      the VOPs is derived from the timestamp fields in the VOP header
      (modulo_time_base and vop_time_increment).

   o  If the RTP packet contains only configuration information and/or
      Group_of_VideoObjectPlane() fields, the timestamp of the next VOP
      in the coding order is used.

   o  If the RTP packet contains only visual_object_sequence_end_code
      information, the timestamp of the immediately preceding VOP in the
      coding order is used.

   The resolution of the timestamp is set to its default value of
   90 kHz, unless specified by an out-of-band means (e.g., SDP parameter
   or Media Type parameter as defined in Section 6).

   Other header fields are used as described in [RFC3550].

4.2.  Fragmentation of MPEG-4 Visual Bitstream

   A fragmented MPEG-4 Visual bitstream is mapped directly onto the RTP
   payload without any addition of extra header fields or any removal of
   Visual syntax elements. The Combined Configuration/Elementary
   streams mode is used. The following rules apply for the
   fragmentation.

   In the rules below, "header" means one of the following:

   o  Configuration information (Visual Object Sequence Header, Visual
      Object Header and Video Object Layer Header)

   o  visual_object_sequence_end_code

   o  The header of the entry point function for an elementary stream
      (Group_of_VideoObjectPlane() or the header of VideoObjectPlane(),
      video_plane_with_short_header(), MeshObject() or FaceObject())

   o  The video packet header (video_packet_header() excluding
      next_resync_marker())

   o  The header of gob_layer()

   See 6.2.1 "Start codes" of [14496-2] for the definition of the
   configuration information and the entry point functions.
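
   Several of the header types listed above begin with a byte-aligned
   start code, so a packetizer or receiver can recognize them from the
   first four bytes of a payload. The following is a minimal sketch,
   not part of this specification. It assumes the start code values of
   6.2.1 "Start codes" of [14496-2]; the function names are
   hypothetical. Video packet headers (resync markers) and gob_layer()
   headers do not use the 0x000001 prefix and are not covered here.

```python
# Byte-aligned start codes consist of the prefix 0x000001 plus one byte.
START_CODE_PREFIX = b"\x00\x00\x01"

# Start code values assumed from 6.2.1 "Start codes" of [14496-2].
NAMED_START_CODES = {
    0xB0: "visual_object_sequence_start_code",
    0xB1: "visual_object_sequence_end_code",
    0xB3: "group_of_vop_start_code",
    0xB5: "visual_object_start_code",
    0xB6: "vop_start_code",
}


def classify_start_code(payload: bytes) -> str:
    """Name the start code at the beginning of an RTP payload."""
    if len(payload) < 4 or payload[:3] != START_CODE_PREFIX:
        return "no start code"
    code = payload[3]
    if code <= 0x1F:
        return "video_object_start_code"
    if code <= 0x2F:
        return "video_object_layer_start_code"
    return NAMED_START_CODES.get(code, "other start code")


def starts_with_configuration(payload: bytes) -> bool:
    """True if the payload begins with a Visual Object Sequence Header."""
    return classify_start_code(payload) == "visual_object_sequence_start_code"
```

   For example, a payload beginning with the bytes 00 00 01 B6 starts
   with a VOP header, while one beginning with 00 00 01 B0 starts with
   configuration information.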

   (1) Configuration information and Group_of_VideoObjectPlane() fields
   SHALL be placed at the beginning of the RTP payload (just after the
   RTP header) or just after the header of the syntactically upper layer
   function.

   (2) If one or more headers exist in the RTP payload, the RTP payload
   SHALL begin with the header of the syntactically highest function.
   Note: The visual_object_sequence_end_code is regarded as the lowest
   function.

   (3) A header SHALL NOT be split into a plurality of RTP packets.

   (4) Different VOPs SHOULD be fragmented into different RTP packets so
   that one RTP packet consists of the data bytes associated with a
   unique VOP time instance (that is indicated in the timestamp field in
   the RTP packet header), with the exception that multiple consecutive
   VOPs MAY be carried within one RTP packet in the decoding order if
   the size of the VOPs is small.

   Note: When multiple VOPs are carried in one RTP payload, the
   timestamp of the VOPs after the first one may be calculated by the
   decoder. This operation is necessary only for RTP packets in which
   the marker bit equals one and the beginning of the RTP payload
   corresponds to a start code. (See timestamp and marker bit in
   Section 4.1.)

   (5) It is RECOMMENDED that a single video packet is sent as a single
   RTP packet. The size of a video packet SHOULD be adjusted in such a
   way that the resulting RTP packet is not larger than the path-MTU.
   If the video packet is disabled by the coder configuration (by
   setting resync_marker_disable in the VOL header to 1), or in coding
   tools where the video packet is not supported, a VOP MAY be split at
   arbitrary byte-positions.

   The video packet starts with the VOP header or the video packet
   header, followed by motion_shape_texture(), and ends with
   next_resync_marker() or next_start_code().

4.3.  Examples of Packetized MPEG-4 Visual Bitstream

   Figure 2 shows examples of RTP packets generated based on the
   criteria described in Section 4.2.

   (a) is an example of the first RTP packet or the random access point
   of an MPEG-4 Visual bitstream containing the configuration
   information. According to criterion (1), the Visual Object Sequence
   Header (VS header) is placed at the beginning of the RTP payload,
   preceding the Visual Object Header and the Video Object Layer
   Header (VO header, VOL header). Since the fragmentation rule defined
   in Section 4.2 guarantees that the configuration information,
   starting with visual_object_sequence_start_code, is always placed at
   the beginning of the RTP payload, RTP receivers can detect the random
   access point by checking if the first 32-bit field of the RTP payload
   is visual_object_sequence_start_code.

   (b) is another example of the RTP packet containing the configuration
   information. It differs from example (a) in that the RTP packet also
   contains a VOP header and a Video Packet in the VOP following the
   configuration information. Since the length of the configuration
   information is relatively short (typically scores of bytes) and an
   RTP packet containing only the configuration information may thus
   increase the overhead, the configuration information and the
   immediately following VOP can be packetized into a single RTP packet.

   (c) is an example of an RTP packet that contains
   Group_of_VideoObjectPlane (GOV). Following criterion (1), the GOV is
   placed at the beginning of the RTP payload. It would be a waste of
   RTP/IP header overhead to generate an RTP packet containing only a
   GOV whose length is 7 bytes. Therefore, (a part of) the following
   VOP can be placed in the same RTP packet as shown in (c).

   (d) is an example of the case where one video packet is packetized
   into one RTP packet.
   When the packet-loss rate of the underlying
   network is high, this kind of packetization is recommended. Even
   when the RTP packet containing the VOP header is discarded by a
   packet loss, the other RTP packets can be decoded by using the
   HEC (Header Extension Code) information in the video packet header.
   No extra RTP header field is necessary.

   (e) is an example of the case where more than one video packet is
   packetized into one RTP packet. This kind of packetization is
   effective to save the overhead of RTP/IP headers when the bit-rate of
   the underlying network is low. However, it will decrease the packet-
   loss resiliency because multiple video packets are discarded by a
   single RTP packet loss. The optimal number of video packets in an
   RTP packet and the length of the RTP packet can be determined
   considering the packet-loss rate and the bit-rate of the underlying
   network.

   (f) is an example of the case when the video packet is disabled by
   setting resync_marker_disable in the VOL header to 1. In this case,
   a VOP may be split into a plurality of RTP packets at arbitrary byte-
   positions. For example, it is possible to split a VOP into fixed-
   length packets. This kind of coder configuration and RTP packet
   fragmentation may be used when the underlying network is guaranteed
   to be error-free.

   Figure 3 shows examples of RTP packets prohibited by the criteria of
   Section 4.2.

   Fragmentation of a header into multiple RTP packets, as in (a), will
   not only increase the overhead of RTP/IP headers but also decrease
   the error resiliency. Therefore, it is prohibited by criterion (3).

   When concatenating more than one video packet into an RTP packet, a
   VOP header or video_packet_header() is not allowed to be placed in
   the middle of the RTP payload. The packetization as in (b) is not
   allowed by criterion (2), for reasons of error resiliency.
   Comparing this example with Figure 2(d), although two video packets
   are mapped onto two RTP packets in both cases, the packet-loss
   resiliency is not identical. Namely, if the second RTP packet is
   lost, both video packets 1 and 2 are lost in the case of Figure 3(b)
   whereas only video packet 2 is lost in the case of Figure 2(d).

       +------+------+------+------+
   (a) | RTP  |  VS  |  VO  | VOL  |
       |header|header|header|header|
       +------+------+------+------+

       +------+------+------+------+------+------------+
   (b) | RTP  |  VS  |  VO  | VOL  | VOP  |Video Packet|
       |header|header|header|header|header|            |
       +------+------+------+------+------+------------+

       +------+-----+------------------+
   (c) | RTP  | GOV |Video Object Plane|
       |header|     |                  |
       +------+-----+------------------+

       +------+------+------------+  +------+------+------------+
   (d) | RTP  | VOP  |Video Packet|  | RTP  |  VP  |Video Packet|
       |header|header|    (1)     |  |header|header|    (2)     |
       +------+------+------------+  +------+------+------------+

       +------+------+------------+------+------------+------+------------+
   (e) | RTP  |  VP  |Video Packet|  VP  |Video Packet|  VP  |Video Packet|
       |header|header|    (1)     |header|    (2)     |header|    (3)     |
       +------+------+------------+------+------------+------+------------+

       +------+------+------------+  +------+------------+
   (f) | RTP  | VOP  |VOP fragment|  | RTP  |VOP fragment|
       |header|header|    (1)     |  |header|    (2)     |
       +------+------+------------+  +------+------------+

      Figure 2 - Examples of RTP packetized MPEG-4 Visual bitstream

       +------+-------------+  +------+------------+------------+
   (a) | RTP  |First half of|  | RTP  |Last half of|Video Packet|
       |header|  VP header  |  |header| VP header  |            |
       +------+-------------+  +------+------------+------------+

       +------+------+----------+  +------+---------+------+------------+
   (b) | RTP  | VOP  |First half|  | RTP  |Last half|  VP  |Video Packet|
       |header|header| of VP(1) |  |header| of VP(1)|header|    (2)     |
       +------+------+----------+  +------+---------+------+------------+

   Figure 3 - Examples of prohibited RTP packetization for MPEG-4 Visual
                                bitstream

5.  RTP Packetization of MPEG-4 Audio Bitstreams

   This section specifies RTP packetization rules for MPEG-4 Audio
   bitstreams. MPEG-4 Audio streams MUST be formatted as LATM
   (Low-overhead MPEG-4 Audio Transport Multiplex) [14496-3] streams,
   and the LATM-based streams are then mapped onto RTP packets as
   described in the sections below.

5.1.  RTP Packet Format

   LATM-based streams consist of a sequence of audioMuxElements that
   include one or more PayloadMux elements which carry the audio frames.
   A complete audioMuxElement or a part of one SHALL be mapped directly
   onto an RTP payload without any removal of audioMuxElement syntax
   elements (see Figure 4). The first byte of each audioMuxElement
   SHALL be located at the first payload location in an RTP packet.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |V=2|P|X|  CC   |M|     PT      |       sequence number         |RTP
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                           timestamp                           |Header
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |           synchronization source (SSRC) identifier            |
   +=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+
   |            contributing source (CSRC) identifiers             |
   |                             ....                              |
   +=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+
   |                                                               |RTP
   :                audioMuxElement (byte aligned)                 :Payload
   |                                                               |
   |                               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                               :...OPTIONAL RTP padding        |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

              Figure 4 - An RTP packet for MPEG-4 Audio

   In order to decode the audioMuxElement, the following
   muxConfigPresent information is required to be indicated by out-of-
   band means. When SDP is utilized for this indication, the Media Type
   parameter "cpresent" corresponds to the muxConfigPresent information
   (see Section 6.3). The following restrictions apply:

   o  In the out-of-band configuration case the number of PayloadMux
      elements contained in each audioMuxElement can only be set once.
      If more than one PayloadMux element is contained in each
      audioMuxElement, special care is required to ensure that the last
      RTP packet remains decodable.

   o  To construct the audioMuxElement in the in-band configuration
      case, non-octet-aligned configuration data precedes the one or
      more PayloadMux elements. Since the generation of RTP payloads
      with non-octet-aligned data is not possible with RTP hint tracks,
      as defined by the MP4 file format [14496-12] [14496-14], this
      document does not support RTP hint tracks for the in-band
      configuration case.

   muxConfigPresent: If this value is set to 1 (in-band mode), the
   audioMuxElement SHALL include an indication bit "useSameStreamMux"
   and MAY include the configuration information for audio compression
   "StreamMuxConfig". The useSameStreamMux bit indicates whether the
   StreamMuxConfig element in the previous frame is applied in the
   current frame. If the useSameStreamMux bit indicates to use the
   StreamMuxConfig from the previous frame, but the previous frame
   has been lost, the current frame may not be decodable.
Therefore, in
622     case of in-band mode, the StreamMuxConfig element SHOULD be
623     transmitted repeatedly depending on the network condition. On the
624     other hand, if muxConfigPresent is set to 0 (out-of-band mode), the
625     StreamMuxConfig element is required to be transmitted by out-of-
626     band means. In case of SDP, the Media Type parameter "config" is
627     utilized (see Section 6.3).

629  5.2.  Use of RTP Header Fields for MPEG-4 Audio

631     Payload Type (PT): The assignment of an RTP payload type for this new
632     packet format is outside the scope of this document, and will only be
633     restricted here. It is expected that the RTP profile for a
634     particular class of applications will assign a payload type for this
635     encoding, or if that is not done then a payload type in the dynamic
636     range shall be chosen by means of an out-of-band signaling protocol
637     (e.g., H.245 or SIP). In the dynamic assignment of RTP payload
638     types for scalable streams, the server SHALL assign a different value
639     to each layer. The dependency relationships between the enhancement
640     layer and the base layer MUST be signaled as specified in [RFC5583].
641     An example of the use of such signaling for scalable audio streams
642     can be found in [RFC5691].

644     Marker (M) bit: The marker bit indicates audioMuxElement boundaries.
645     It is set to one to indicate that the RTP packet contains a complete
646     audioMuxElement or the last fragment of an audioMuxElement.

648     Timestamp: The timestamp indicates the sampling instant of the first
649     audio frame contained in the RTP packet. Timestamps are RECOMMENDED
650     to start at a random value for security reasons.

652     Unless specified by out-of-band means, the resolution of the
653     timestamp is set to its default value of 90 kHz.

655     Sequence Number: Incremented by one for each RTP packet sent,
656     starting, for security reasons, with a random value.

658     Other header fields are used as described in [RFC3550].

660  5.3.  Fragmentation of MPEG-4 Audio Bitstream

662     It is RECOMMENDED to put one audioMuxElement in each RTP packet. If
663     the size of an audioMuxElement can be kept small enough that the size
664     of the RTP packet containing it does not exceed the size of the path-
665     MTU, no fragmentation is needed. If it cannot, the audioMuxElement
666     SHALL be fragmented and spread across multiple packets.

668  6.  Media Type Registration for MPEG-4 Audio/Visual Streams

670     The following sections describe the Media Type registrations for
671     MPEG-4 Audio/Visual streams, which are registered in accordance with
672     [RFC4855] and use the template of [RFC4288]. Media Type
673     registration and SDP usage for the MPEG-4 Visual stream are described
674     in Section 6.1 and Section 6.2, respectively, while Media Type
675     registration and SDP usage for the MPEG-4 Audio stream are described
676     in Section 6.3 and Section 6.4, respectively.

678  6.1.  Media Type Registration for MPEG-4 Visual

680     The receiver MUST ignore any unspecified parameter, to ensure that
681     additional parameters can be added in any future revision of this
682     specification.

684     Type name: video

686     Subtype name: MP4V-ES

688     Required parameters: none

690     Optional parameters:

692        rate: This parameter is used only for RTP transport. It indicates
693        the resolution of the timestamp field in the RTP header. If this
694        parameter is not specified, its default value of 90000 (90 kHz) is
695        used.

697        profile-level-id: A decimal representation of the MPEG-4 Visual
698        Profile and Level indication value (profile_and_level_indication)
699        defined in Table G-1 of [14496-2]. This parameter MAY be used in
700        the capability exchange or session setup procedure to indicate the
701        MPEG-4 Visual Profile and Level combination of which the MPEG-4
702        Visual codec is capable. If this parameter is not specified by
703        the procedure, its default value of 1 (Simple Profile/Level 1) is
704        used.
706        config: This parameter SHALL be used to indicate the configuration
707        of the corresponding MPEG-4 Visual bitstream. It SHALL NOT be
708        used to indicate the codec capability in the capability exchange
709        procedure. It is a hexadecimal representation of an octet string
710        that expresses the MPEG-4 Visual configuration information, as
711        defined in subclause 6.2.1 Start codes of [14496-2]. The
712        configuration information is mapped onto the octet string on an
713        MSB-first basis. The first bit of the configuration information
714        SHALL be located at the MSB of the first octet. The configuration
715        information indicated by this parameter SHALL be the same as the
716        configuration information in the corresponding MPEG-4 Visual
717        stream, except for first_half_vbv_occupancy and
718        latter_half_vbv_occupancy, if they exist, which may vary in the
719        repeated configuration information inside an MPEG-4 Visual stream
720        (see 6.2.1 Start codes of [14496-2]).

722     Published specification:

724        The specifications for MPEG-4 Visual streams are presented in
725        [14496-2]. The RTP payload format is described in this document.

727     Encoding considerations:

729        Video bitstreams MUST be generated according to MPEG-4 Visual
730        specifications [14496-2]. A video bitstream is binary data and
731        MUST be encoded for non-binary transport (for email, the Base64
732        encoding is sufficient). This type is also defined for transfer
733        via RTP. The RTP packets MUST be packetized according to the
734        MPEG-4 Visual RTP payload format defined in this document.

736     Security considerations:

738        See Section 9 of this document.

740     Interoperability considerations:

742        MPEG-4 Visual provides a large and rich set of tools for the
743        coding of visual objects. For effective implementation of the
744        standard, subsets of the MPEG-4 Visual tool sets have been
745        provided for use in specific applications.
These subsets, called
746        'Profiles', limit the size of the tool set a decoder is required
747        to implement. In order to restrict computational complexity, one
748        or more Levels are set for each Profile. A Profile@Level
749        combination allows:

751        *  a codec builder to implement only the subset of the standard
752           that is needed, while maintaining interworking with other MPEG-4
753           devices included in the same combination, and

755        *  checking whether MPEG-4 devices comply with the standard
756           ('conformance testing').

758        The visual stream SHALL be compliant with the MPEG-4 Visual
759        Profile@Level specified by the parameter "profile-level-id".
760        Interoperability between a sender and a receiver may be achieved
761        by specifying the parameter "profile-level-id", or by arranging a
762        capability exchange/announcement procedure for this parameter.

764     Applications which use this Media Type:

766        Audio and visual streaming and conferencing tools

768     Additional information: none

770     Person and email address to contact for further information:

772        See Authors' Addresses section at the end of this document.

774     Intended usage: COMMON

776     Author:

778        See Authors' Addresses section at the end of this document.

780     Change controller:

782        IETF Audio/Video Transport working group delegated from the IESG.

784  6.2.  Mapping to SDP for MPEG-4 Visual

786     The Media Type video/MP4V-ES string is mapped to fields in the
787     Session Description Protocol (SDP) [RFC4566], as follows:

789     o  The Media Type (video) goes in SDP "m=" as the media name.

791     o  The Media subtype (MP4V-ES) goes in SDP "a=rtpmap" as the encoding
792        name.

794     o  The optional parameter "rate" goes in "a=rtpmap" as the clock
795        rate.

797     o  The optional parameters "profile-level-id" and "config" go in the
798        "a=fmtp" line to indicate the coder capability and configuration,
799        respectively. These parameters are expressed as a string, in the
800        form of a semicolon-separated list of parameter=value pairs.
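
   As an informal illustration of the "a=fmtp" mapping described in the
   list above, the following Python sketch splits such a line into its
   payload type and its parameter=value pairs. It is not part of the
   payload format. The helper names parse_fmtp and
   visual_profile_level_id are hypothetical and chosen only for this
   example, the input is assumed to be a single unwrapped line, and the
   fallback value of 1 is the profile-level-id default given in
   Section 6.1.

```python
# Illustrative sketch only: parse_fmtp and visual_profile_level_id are
# hypothetical helper names, not defined by this document.

def parse_fmtp(line):
    """Split an SDP "a=fmtp" line into (payload_type, parameters)."""
    prefix = "a=fmtp:"
    if not line.startswith(prefix):
        raise ValueError("not an a=fmtp line")
    # The payload type is separated from the parameter list by a space.
    payload_type, _, param_text = line[len(prefix):].partition(" ")
    params = {}
    # Parameters form a semicolon-separated list of parameter=value pairs.
    for item in param_text.split(";"):
        item = item.strip()
        if not item:
            continue  # tolerate a trailing semicolon
        name, _, value = item.partition("=")
        params[name.strip()] = value.strip()
    return int(payload_type), params


def visual_profile_level_id(params):
    """Return profile-level-id, falling back to the default of 1
    (Simple Profile/Level 1) given in Section 6.1."""
    return int(params.get("profile-level-id", "1"))


# Unwrapped form of an a=fmtp line carrying both optional parameters.
pt, params = parse_fmtp(
    "a=fmtp:98 profile-level-id=1;"
    "config=000001B001000001B5090000010000000120008440FA282C2090A21F"
)
# "config" is a hexadecimal representation of an octet string.
config_octets = bytes.fromhex(params["config"])
print(pt, visual_profile_level_id(params), len(config_octets))
```

   The sketch keeps every value as a string and converts only where the
   text defines a representation (decimal for profile-level-id,
   hexadecimal for config), so parameters it does not know about are
   carried along untouched and can simply be ignored by the caller.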
802        Example usages for the profile-level-id parameter are:
803        1  : MPEG-4 Visual Simple Profile/Level 1
804        34 : MPEG-4 Visual Core Profile/Level 2
805        145: MPEG-4 Visual Advanced Real Time Simple Profile/Level 1

807  6.2.1.  Declarative SDP Usage for MPEG-4 Visual

809     The following are some examples of media representation in SDP:

811     Simple Profile/Level 1, rate=90000 (90 kHz), "profile-level-id" and
812     "config" are present in the "a=fmtp" line:
813       m=video 49170/2 RTP/AVP 98
814       a=rtpmap:98 MP4V-ES/90000
815       a=fmtp:98 profile-level-id=1;config=000001B001000001B50900000100000001
816          20008440FA282C2090A21F

818     Core Profile/Level 2, rate=90000 (90 kHz), "profile-level-id" is
819     present in the "a=fmtp" line:
820       m=video 49170/2 RTP/AVP 98
821       a=rtpmap:98 MP4V-ES/90000
822       a=fmtp:98 profile-level-id=34

824     Advanced Real Time Simple Profile/Level 1, rate=90000 (90 kHz),
825     "profile-level-id" is present in the "a=fmtp" line:
826       m=video 49170/2 RTP/AVP 98
827       a=rtpmap:98 MP4V-ES/90000
828       a=fmtp:98 profile-level-id=145

830  6.3.  Media Type Registration for MPEG-4 Audio

832     The receiver MUST ignore any unspecified parameter, to ensure that
833     additional parameters can be added in any future revision of this
834     specification.

836     Type name: audio

838     Subtype name: MP4A-LATM

840     Required parameters:

842        rate: the rate parameter indicates the RTP timestamp clock rate.
843        The default value is 90000. Other rates MAY be indicated only if
844        they are set to the same value as the audio sampling rate (number
845        of samples per second).

847        In the presence of SBR, the sampling rates for the core
848        encoder/decoder and the SBR tool are different in most cases.
849        This parameter therefore SHALL NOT be considered as the definitive
850        sampling rate.
If this parameter is used, the server must
851        follow the rules below:

853        *  When the presence of SBR is not explicitly signaled by the
854           optional SDP parameters such as the object parameter,
855           profile-level-id or config string, this parameter SHALL be set
856           to the core codec sampling rate.

858        *  When the presence of SBR is explicitly signaled by the optional
859           SDP parameters such as the object parameter, profile-level-id or
860           config string, this parameter SHALL be set to the SBR sampling
861           rate.

863        NOTE: The optional parameter SBR-enabled in SDP a=fmtp is useful
864        for implicit HE AAC / HE AAC v2 signaling. But the SBR-enabled
865        parameter can also be used in the case of explicit HE AAC / HE AAC
866        v2 signaling. Therefore, its presence alone is not the criterion
867        for determining whether HE AAC / HE AAC v2 signaling is explicit
868        or not.

870     Optional parameters:

872        profile-level-id: a decimal representation of the MPEG-4 Audio
873        Profile Level indication value defined in [14496-3]. This parameter
874        indicates which MPEG-4 Audio tool subsets the decoder is capable
875        of using. If this parameter is not specified in the capability
876        exchange or session setup procedure, its default value of 30
877        (Natural Audio Profile/Level 1) is used.

879        MPS-profile-level-id: a decimal representation of the MPEG
880        Surround Profile Level indication as defined in [14496-3]. This
881        parameter indicates the MPEG Surround profile and level that the
882        decoder needs to support in order to decode the stream.

884        object: a decimal representation of the MPEG-4 Audio Object Type
885        value defined in [14496-3]. This parameter specifies the tool to
886        be used by the decoder. It can be used to limit the capability
887        within the specified "profile-level-id".

889        bitrate: the data rate for the audio bit stream.

891        cpresent: a boolean parameter that indicates whether audio payload
892        configuration data has been multiplexed into an RTP payload (see
893        Section 5.1).
A 0 indicates that the configuration data has not been
894        multiplexed into an RTP payload, in which case the "config"
895        parameter MUST be present; a 1 indicates that it has. The default
896        if the parameter is omitted is 1. If this parameter is set to 1
897        and the "config" parameter is present, the multiplexed
898        configuration data and the value of the "config" parameter SHALL
899        be consistent.

901        config: a hexadecimal representation of an octet string that
902        expresses the audio payload configuration data "StreamMuxConfig",
903        as defined in [14496-3]. Configuration data is mapped onto the
904        octet string on an MSB-first basis. The first bit of the
905        configuration data SHALL be located at the MSB of the first octet.
906        In the last octet, zero-padding bits, if necessary, SHALL follow
907        the configuration data. Senders MUST set the StreamMuxConfig
908        elements taraBufferFullness and latmBufferFullness to their
909        largest respective value, indicating that buffer fullness measures
910        are not used in SDP. Receivers MUST ignore the value of these two
911        elements contained in the config parameter.

913        MPS-asc: a hexadecimal representation of an octet string that
914        expresses audio payload configuration data "AudioSpecificConfig",
915        as defined in [14496-3]. If this parameter is not present, the
916        relevant signaling is performed by other means (e.g., in-band or
917        contained in the config string).

919        The same mapping rules as for the config parameter apply.

921        ptime: duration of each packet in milliseconds.

923        SBR-enabled: a boolean parameter which indicates whether SBR data
924        can be expected in the RTP payload of a stream. This parameter is
925        relevant for an SBR-capable decoder if the presence of SBR cannot
926        be detected from an out-of-band decoder configuration (e.g.,
927        contained in the config string).

929        If this parameter is set to 0, a decoder MAY expect that SBR is
930        not used. If this parameter is set to 1, a decoder can upsample
931        the audio data with the SBR tool, regardless of whether SBR data
932        is present in the stream or not.

934        If the presence of SBR cannot be detected from out-of-band
935        configuration and the SBR-enabled parameter is not present, the
936        parameter defaults to 1 for an SBR-capable decoder. If the
937        resulting output sampling rate or the computational complexity is
938        not supported, the SBR tool can be disabled or run in downsampled
939        mode.

941        The timestamp resolution at the RTP layer is determined by the
942        rate parameter.

944     Published specification:

946        Encoding specifications are provided in [14496-3]. The RTP
947        payload format specification is described in this document.

949     Encoding considerations:

951        This type is only defined for transfer via RTP.

953     Security considerations:

955        See Section 9 of this document.

957     Interoperability considerations:

959        MPEG-4 Audio provides a large and rich set of tools for the coding
960        of audio objects. For effective implementation of the standard,
961        subsets of the MPEG-4 Audio tool sets similar to those used in
962        MPEG-4 Visual have been provided (see Section 6.1).

964        The audio stream SHALL be compliant with the MPEG-4 Audio
965        Profile@Level specified by the parameters "profile-level-id" and
966        "MPS-profile-level-id". Interoperability between a sender and a
967        receiver may be achieved by specifying the parameters
968        "profile-level-id" and "MPS-profile-level-id", or by arranging in
969        the capability exchange procedure to set these parameters mutually
970        to the same values. Furthermore, the "object" parameter can be
971        used to limit the capability within the specified Profile@Level in
972        capability exchange.

974     Applications which use this media type:

976        Audio and video streaming and conferencing tools.
978     Additional information: none

980     Person and email address to contact for further information:

982        See Authors' Addresses section at the end of this document.

984     Intended usage: COMMON

985     Author:

987        See Authors' Addresses section at the end of this document.

989     Change controller:

991        IETF Audio/Video Transport working group delegated from the IESG.

993  6.4.  Mapping to SDP for MPEG-4 Audio

995     The Media Type audio/MP4A-LATM string is mapped to fields in the
996     Session Description Protocol (SDP) [RFC4566], as follows:

998     o  The Media Type (audio) goes in SDP "m=" as the media name.

1000    o  The Media subtype (MP4A-LATM) goes in SDP "a=rtpmap" as the
1001       encoding name.

1003    o  The required parameter "rate" goes in "a=rtpmap" as the clock
1004       rate.

1006    o  The optional parameter "ptime" goes in the SDP "a=ptime" attribute.

1008    o  The optional parameters "profile-level-id", "MPS-profile-level-id"
1009       and "object" go in the "a=fmtp" line to indicate the coder
1010       capability.

1012       The following are some examples of the profile-level-id value:
1013       1 : Main Audio Profile Level 1
1014       9 : Speech Audio Profile Level 1
1015       15: High Quality Audio Profile Level 2
1016       30: Natural Audio Profile Level 1
1017       44: High Efficiency AAC Profile Level 2
1018       48: High Efficiency AAC v2 Profile Level 2
1019       55: Baseline MPEG Surround Profile (see ISO/IEC 23003-1) Level 3

1021    The optional payload-format-specific parameters "bitrate",
1022    "cpresent", "config", "MPS-asc" and "SBR-enabled" also go in the
1023    "a=fmtp" line. These parameters are expressed as a string, in the
1024    form of a semicolon-separated list of parameter=value pairs.

1026 6.4.1.  Declarative SDP Usage for MPEG-4 Audio

1028    The following sections contain some examples of the media
1029    representation in SDP.

1031    Note that the a=fmtp line in some of the examples has been wrapped to
1032    fit the page; each would comprise a single line in the SDP file.

1034 6.4.1.1.  Example: In-band Configuration

1036    In this example the audio configuration data appears in the RTP
1037    payload exclusively (i.e., the MPEG-4 audio configuration is known
1038    when a StreamMuxConfig element appears within the RTP payload).

1040    m=audio 49230 RTP/AVP 96
1041    a=rtpmap:96 MP4A-LATM/90000
1042    a=fmtp:96 object=2; cpresent=1

1044    The "clock rate" is set to 90 kHz. This is the default value and the
1045    real audio sampling rate is known when the audio configuration data
1046    is received.

1048 6.4.1.2.  Example: 6 kb/s CELP

1050    6 kb/s CELP bitstreams (with an audio sampling rate of 8 kHz)

1052    m=audio 49230 RTP/AVP 96
1053    a=rtpmap:96 MP4A-LATM/8000
1054    a=fmtp:96 profile-level-id=9; object=8; cpresent=0;
1055      config=40008B18388380
1056    a=ptime:20

1058    In this example audio configuration data is not multiplexed into the
1059    RTP payload and is described only in SDP. Furthermore, the "clock
1060    rate" is set to the audio sampling rate.

1062 6.4.1.3.  Example: 64 kb/s AAC LC Stereo

1064    64 kb/s AAC LC stereo bitstream (with an audio sampling rate of 24
1065    kHz)

1067    m=audio 49230 RTP/AVP 96
1068    a=rtpmap:96 MP4A-LATM/24000/2
1069    a=fmtp:96 profile-level-id=1; bitrate=64000; cpresent=0;
1070      object=2; config=400026203fc0

1072    In this example audio configuration data is not multiplexed into the
1073    RTP payload and is described only in SDP. Furthermore, the "clock
1074    rate" is set to the audio sampling rate.

1076    In this example, the presence of SBR cannot be determined by the SDP
1077    parameter set. The clock rate represents the core codec sampling
1078    rate. An SBR-enabled decoder can use the SBR tool to upsample the
1079    audio data if complexity and resulting output sampling rate permit.

1081 6.4.1.4.  Example: Use of the SBR-enabled Parameter

1083    These two examples are identical to the example above with the
1084    exception of the SBR-enabled parameter.
The presence of SBR is not
1085    signaled by the SDP parameters object, profile-level-id and config,
1086    but instead the SBR-enabled parameter is present. The rate parameter
1087    and the StreamMuxConfig contain the core codec sampling rate.

1089    Example with "SBR-enabled=0", definitive and core codec sampling rate
1090    24 kHz:

1092    m=audio 49230 RTP/AVP 96
1093    a=rtpmap:96 MP4A-LATM/24000/2
1094    a=fmtp:96 profile-level-id=1; bitrate=64000; cpresent=0;
1095      SBR-enabled=0; config=400026203fc0

1097    Example with "SBR-enabled=1", core codec sampling rate 24 kHz,
1098    definitive and SBR sampling rate 48 kHz:

1100    m=audio 49230 RTP/AVP 96
1101    a=rtpmap:96 MP4A-LATM/24000/2
1102    a=fmtp:96 profile-level-id=1; bitrate=64000; cpresent=0;
1103      SBR-enabled=1; config=400026203fc0

1105    In this example, the clock rate is still 24000 and this information
1106    is used for RTP timestamp calculation. The value of 24000 is used to
1107    support old AAC decoders. It lets a decoder that supports only AAC
1108    process the HE AAC coded data, although only the plain AAC part is
1109    decoded. An HE AAC decoder is able to generate output data with the
1110    SBR sampling rate.

1112 6.4.1.5.  Example: Hierarchical Signaling of SBR

1114    When the presence of SBR is explicitly signaled by the SDP parameters
1115    object, profile-level-id or the config string as in the example
1116    below, the StreamMuxConfig contains both the core codec sampling rate
1117    and the SBR sampling rate.

1119    m=audio 49230 RTP/AVP 96
1120    a=rtpmap:96 MP4A-LATM/48000/2
1121    a=fmtp:96 profile-level-id=44; bitrate=64000; cpresent=0;
1122      config=40005623101fe0; SBR-enabled=1

1124    This config string uses the explicit signaling mode 2.A (hierarchical
1125    signaling; see [14496-3]). This means that the AOT (Audio Object Type)
1126    is SBR (5) and the SFI (Sampling Frequency Index) is 6 (24000 Hz),
1127    which refers to the underlying core codec sampling frequency.
The CC (Channel
1128    Configuration) is stereo (2), and the ESFI (Extension Sampling
1129    Frequency Index) is 3 (48000 Hz), which refers to the sampling
1130    frequency of the extension tool (SBR).

1132 6.4.1.6.  Example: HE AAC v2 Signaling

1134    HE AAC v2 decoders are required to always produce a stereo signal
1135    from a mono signal. Hence, no parameter is necessary to signal
1136    the presence of PS.

1138    Example with "SBR-enabled=1" and 1 channel signaled in the a=rtpmap
1139    line and within the config parameter. Core codec sampling rate is
1140    24 kHz, definitive and SBR sampling rate is 48 kHz. Core codec channel
1141    configuration is mono, PS channel configuration is stereo.

1143    m=audio 49230 RTP/AVP 110
1144    a=rtpmap:110 MP4A-LATM/24000/1
1145    a=fmtp:110 profile-level-id=15; object=2; cpresent=0;
1146      config=400026103fc0; SBR-enabled=1

1148 6.4.1.7.  Example: Hierarchical Signaling of PS

1150    Example: 48 kHz stereo audio input:

1152    m=audio 49230 RTP/AVP 110
1153    a=rtpmap:110 MP4A-LATM/48000/2
1154    a=fmtp:110 profile-level-id=48; cpresent=0; config=4001d613101fe0

1156    The config parameter indicates explicit hierarchical signaling of PS
1157    and SBR. This configuration method is not supported by legacy AAC and
1158    HE AAC decoders, and these are therefore unable to decode the
1159    coded data.

1161 6.4.1.8.  Example: MPEG Surround

1163    The following examples show how MPEG Surround configuration data can
1164    be signaled using SDP. The configuration is carried within the
1165    config string in the first example by using two different layers.
1166    The general parameters in this example are: AudioMuxVersion=1;
1167    allStreamsSameTimeFraming=1; numSubFrames=0; numProgram=0;
1168    numLayer=1. The first layer describes the HE AAC payload and signals
1169    the following parameters: ascLen=25; audioObjectType=2 (AAC LC);
1170    extensionAudioObjectType=5 (SBR); samplingFrequencyIndex=6 (24 kHz);
1171    extensionSamplingFrequencyIndex=3 (48 kHz); channelConfiguration=2
1172    (2.0 channels).
The second layer describes the MPEG surround payload 1173 and specifies the following parameters: ascLen=110; 1174 AudioObjectType=30 (MPEG Surround); samplingFrequencyIndex=3 (48kHz); 1175 channelConfiguration=6 (5.1 channels); sacPayloadEmbedding=1; 1176 SpatialSpecificConfig=(48 kHz; 32 slots; 525 tree; ResCoding=1; 1177 ResBands=[7,7,7,7]). 1179 In this example the signaling is carried by using two different LATM 1180 layers. The MPEG surround payload is carried together with the AAC 1181 payload in a single layer as indicated by the sacPayloadEmbedding 1182 Flag. 1184 m=audio 49230 RTP/AVP 96 1185 a=rtpmap:96 MP4A-LATM/48000 1186 a=fmtp:96 profile-level-id=1; bitrate=64000; cpresent=0; 1187 SBR-enabled=1; 1188 config=8FF8004192B11880FF0DDE3699F2408C00536C02313CF3CE0FF0 1190 6.4.1.9. Example: MPEG Surround with Extended SDP Parameters 1192 The following example is an extension of the configuration given 1193 above by the MPEG Surround specific parameters. The MPS-asc 1194 parameter specifies the MPEG Surround Baseline Profile at Level 3 1195 (PLI55) and the MPS-asc string contains the hexadecimal 1196 representation of the MPEG Surround ASC [audioObjectType=30 (MPEG 1197 Surround); samplingFrequencyIndex=0x3 (48kHz); channelConfiguration=6 1198 (5.1 channels); sacPayloadEmbedding=1; SpatialSpecificConfig=(48 kHz; 1199 32 slots; 525 tree; ResCoding=1; ResBands=[0,13,13,13])]. 1201 m=audio 49230 RTP/AVP 96 1202 a=rtpmap:96 MP4A-LATM/48000 1203 a=fmtp:96 profile-level-id=44; bitrate=64000; cpresent=0; 1204 config=40005623101fe0; MPS-profile-level-id=55; 1205 MPS-asc=F1B4CF920442029B501185B6DA00; 1207 6.4.1.10. Example: MPEG Surround with Single Layer Configuration 1209 The following example shows how MPEG Surround configuration data can 1210 be signaled using the SDP config parameter. The configuration is 1211 carried within the config string using a single layer. 
The general 1212 parameters in this example are: AudioMuxVersion=1; 1213 allStreamsSameTimeFraming=1; numSubFrames=0; numProgram=0; 1214 numLayer=0. The single layer describes the combination of HE AAC and 1215 MPEG Surround payload and signals the following parameters: 1216 ascLen=101; audioObjectType=2 (AAC LC); extensionAudioObjectType=5 1217 (SBR); samplingFrequencyIndex=7 (22.05kHz); 1218 extensionSamplingFrequencyIndex=7 (44.1kHz); channelConfiguration=2 1219 (2.0 channels). A backward compatible extension according to 1220 [14496-3/Amd.1] signals the presence of MPEG surround payload data 1221 and specifies the following parameters: SpatialSpecificConfig=(44.1 1222 kHz; 32 slots; 525 tree; ResCoding=0). 1224 In this example the signaling is carried by using a single LATM 1225 layer. The MPEG surround payload is carried together with the HE AAC 1226 payload in a single layer. 1228 m=audio 49230 RTP/AVP 96 1229 a=rtpmap:96 MP4A-LATM/44100 1230 a=fmtp:96 profile-level-id=44; bitrate=64000; cpresent=0; 1231 SBR-enabled=1; config=8FF8000652B920876A83A1F440884053620FF0; 1232 MPS-profile-level-id=55 1234 7. IANA Considerations 1236 This document updates the media subtypes "MP4A-LATM" and "MP4V-ES" 1237 from RFC 3016. The new registrations are in Section 6.1 and 1238 Section 6.3 of this document. 1240 8. Acknowledgements 1242 The authors would like to thank Yoshihiro Kikuchi, Yoshinori Matsui, 1243 Toshiyuki Nomura, Shigeru Fukunaga and Hideaki Kimata for their work 1244 on RFC 3016, and Ali Begen, Keith Drage, Roni Even and Qin Wu for 1245 their valuable input and comments on this document. 1247 9. Security Considerations 1249 RTP packets using the payload format defined in this specification 1250 are subject to the security considerations discussed in the RTP 1251 specification [RFC3550], and in any applicable RTP profile. 
The main 1252 security considerations for the RTP packet carrying the RTP payload 1253 format defined within this document are confidentiality, integrity, 1254 and source authenticity. Confidentiality is achieved by encryption 1255 of the RTP payload, and integrity of the RTP packets through a 1256 suitable cryptographic integrity protection mechanism. A 1257 cryptographic system may also allow the authentication of the source 1258 of the payload. A suitable security mechanism for this RTP payload 1259 format should provide confidentiality, integrity protection, and at 1260 least source authentication capable of determining whether or not an 1261 RTP packet is from a member of the RTP session. 1263 Note that most MPEG-4 codecs define an extension mechanism to 1264 transmit extra data within a stream that is gracefully skipped by 1265 decoders that do not support this extra data. This covert channel 1266 may be used to transmit unwanted data in an otherwise valid stream. 1267 The appropriate mechanism to provide security to RTP and payloads 1268 following this may vary. It is dependent on the application, the 1269 transport, and the signaling protocol employed. Therefore, a single 1270 mechanism is not sufficient, although if suitable, the usage of the 1271 Secure Real-time Transport Protocol (SRTP) [RFC3711] is recommended. 1272 Other mechanisms that may be used are IPsec [RFC4301] and Transport 1273 Layer Security (TLS) [RFC5246] (e.g., for RTP over TCP), but other 1274 alternatives may also exist. 1276 This RTP payload format and its media decoder do not exhibit any 1277 significant non-uniformity in the receiver-side computational 1278 complexity for packet processing, and thus are unlikely to pose a 1279 denial-of-service threat due to the receipt of pathological data. 1280 The complete MPEG-4 system allows for transport of a wide range of 1281 content, including Java applets (MPEG-J) and scripts. 
Since this 1282 payload format is restricted to audio and video streams, it is not 1283 possible to transport such active content in this format. 1285 10. Differences to RFC 3016 1287 The RTP payload format for MPEG-4 Audio as specified in RFC 3016 is 1288 used by the 3GPP PSS service [3GPP]. However, there are some 1289 misalignments between RFC 3016 and the 3GPP PSS specification that 1290 are addressed by this update: 1292 o The audio payload format (LATM) referenced in this document is 1293 binary compatible to the format used in [3GPP]. 1295 o The audio signaling format (StreamMuxConfig) referenced in this 1296 document is binary compatible to the format used in [3GPP]. 1298 o The use of an audio parameter "SBR-enabled" is now defined in this 1299 document, which is used by 3GPP implementations [3GPP]. 1301 o The rate parameter is defined unambiguously in this document for 1302 the case of presence of SBR (Spectral Band Replication) 1304 o The number of audio channels parameter is defined unambiguously in 1305 this document for the case of presence of PS (Parametric Stereo) 1307 Furthermore some comments have been addressed and signaling support 1308 for MPEG surround [23003-1] was added. 1310 11. References 1312 11.1. Normative References 1314 [14496-2] MPEG, "ISO/IEC International Standard 14496-2 - Coding of 1315 audio-visual objects, Part 2: Visual", 2003. 1317 [14496-3] MPEG, "ISO/IEC International Standard 14496-3 - Coding of 1318 audio-visual objects, Part 3 Audio", 2009. 1320 [14496-3/Amd.1] 1321 MPEG, "ISO/IEC International Standard 14496-3 - Coding of 1322 audio-visual objects, Part 3: Audio, Amendment 1: HD-AAC 1323 profile and MPEG Surround signaling", 2009. 1325 [23003-1] MPEG, "ISO/IEC International Standard 23003-1 - MPEG 1326 Surround (MPEG D)", 2007. 1328 [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate 1329 Requirement Levels", BCP 14, RFC 2119, March 1997. 1331 [RFC3016] Kikuchi, Y., Nomura, T., Fukunaga, S., Matsui, Y., and H. 
1332               Kimata, "RTP Payload Format for MPEG-4 Audio/Visual
1333               Streams", RFC 3016, November 2000.

1335    [RFC3550]  Schulzrinne, H., Casner, S., Frederick, R., and V.
1336               Jacobson, "RTP: A Transport Protocol for Real-Time
1337               Applications", STD 64, RFC 3550, July 2003.

1339    [RFC4288]  Freed, N. and J. Klensin, "Media Type Specifications and
1340               Registration Procedures", BCP 13, RFC 4288, December 2005.

1342    [RFC4566]  Handley, M., Jacobson, V., and C. Perkins, "SDP: Session
1343               Description Protocol", RFC 4566, July 2006.

1345    [RFC4629]  Ott, H., Bormann, C., Sullivan, G., Wenger, S., and R.
1346               Even, "RTP Payload Format for ITU-T Rec", RFC 4629,
1347               January 2007.

1349    [RFC4855]  Casner, S., "Media Type Registration of RTP Payload
1350               Formats", RFC 4855, February 2007.

1352    [RFC5583]  Schierl, T. and S. Wenger, "Signaling Media Decoding
1353               Dependency in the Session Description Protocol (SDP)",
1354               RFC 5583, July 2009.

1356 11.2.  Informative References

1358    [14496-1]  MPEG, "ISO/IEC International Standard 14496-1 - Coding of
1359               audio-visual objects, Part 1: Systems", 2004.

1361    [14496-12]
1362               MPEG, "ISO/IEC International Standard 14496-12 - Coding of
1363               audio-visual objects, Part 12: ISO base media file format".

1365    [14496-14]
1366               MPEG, "ISO/IEC International Standard 14496-14 - Coding of
1367               audio-visual objects, Part 14: MP4 file format".

1369    [3GPP]     3GPP, "3rd Generation Partnership Project; Technical
1370               Specification Group Services and System Aspects;
1371               Transparent end-to-end Packet-switched Streaming Service
1372               (PSS); Protocols and codecs (Release 9)", 3GPP TS 26.234
1373               V9.5.0, December 2010.

1375    [RFC2198]  Perkins, C., Kouvelas, I., Hodson, O., Hardman, V.,
1376               Handley, M., Bolot, J., Vega-Garcia, A., and S. Fosse-
1377               Parisis, "RTP Payload for Redundant Audio Data", RFC 2198,
1378               September 1997.

1380    [RFC3640]  van der Meer, J., Mackie, D., Swaminathan, V., Singer, D.,
1381               and P.
Gentric, "RTP Payload Format for Transport of 1382 MPEG-4 Elementary Streams", RFC 3640, November 2003. 1384 [RFC3711] Baugher, M., McGrew, D., Naslund, M., Carrara, E., and K. 1385 Norrman, "The Secure Real-time Transport Protocol (SRTP)", 1386 RFC 3711, March 2004. 1388 [RFC4301] Kent, S. and K. Seo, "Security Architecture for the 1389 Internet Protocol", RFC 4301, December 2005. 1391 [RFC4628] Even, R., "RTP Payload Format for H.263 Moving RFC 2190 to 1392 Historic Status", RFC 4628, January 2007. 1394 [RFC5109] Li, A., "RTP Payload Format for Generic Forward Error 1395 Correction", RFC 5109, December 2007. 1397 [RFC5246] Dierks, T. and E. Rescorla, "The Transport Layer Security 1398 (TLS) Protocol Version 1.2", RFC 5246, August 2008. 1400 [RFC5691] de Bont, F., Doehla, S., Schmidt, M., and R. 1401 Sperschneider, "RTP Payload Format for Elementary Streams 1402 with MPEG Surround Multi-Channel Audio", RFC 5691, 1403 October 2009. 1405 Authors' Addresses 1407 Malte Schmidt 1408 Dolby Laboratories 1409 Deutschherrnstr. 15-19 1410 90537 Nuernberg, 1411 DE 1413 Phone: +49 911 928 91 42 1414 Email: malte.schmidt@dolby.com 1416 Frans de Bont 1417 Philips Electronics 1418 High Tech Campus 5 1419 5656 AE Eindhoven, 1420 NL 1422 Phone: +31 40 2740234 1423 Email: frans.de.bont@philips.com 1425 Stefan Doehla 1426 Fraunhofer IIS 1427 Am Wolfmantel 33 1428 91058 Erlangen, 1429 DE 1431 Phone: +49 9131 776 6042 1432 Email: stefan.doehla@iis.fraunhofer.de 1434 Jaehwan Kim 1435 LG Electronics Inc. 1436 221, Yangjae-dong, Seocho-gu 1437 Seoul 137-130, 1438 Korea 1440 Phone: +82 10 6225 0619 1441 Email: kjh1905m@naver.com