idnits 2.17.1

draft-schmidt-avt-rfc3016bis-03.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

  ** You're using the IETF Trust Provisions' Section 6.b License Notice from
     12 Sep 2009 rather than the newer Notice from 28 Dec 2009.  (See
     https://trustee.ietf.org/license-info/)

  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

  -- The draft header indicates that this document obsoletes RFC3016, but the
     abstract doesn't seem to mention this, which it should.

  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == The copyright year in the IETF Trust and authors Copyright Line does not
     match the current year

  -- The document seems to lack a disclaimer for pre-RFC5378 work, but may
     have content which was first submitted before 10 November 2008.  If you
     have contacted all the original authors and they are all willing to grant
     the BCP78 rights to the IETF Trust, then this is fine, and you can ignore
     this comment.  If not, you may need to add the pre-RFC5378 disclaimer.
     (See the Legal Provisions document at
     https://trustee.ietf.org/license-info for more information.)

  -- The document date (December 21, 2009) is 5212 days in the past.  Is this
     intentional?

  Checking references for intended status: Proposed Standard
  ----------------------------------------------------------------------------

     (See RFCs 3967 and 4897 for information about using normative references
     to lower-maturity documents in RFCs)

  -- Possible downref: Non-RFC (?) normative reference: ref. '14496-2'

  -- Possible downref: Non-RFC (?) normative reference: ref. '14496-3'

  -- Possible downref: Non-RFC (?) normative reference: ref. '23003-1'

  ** Obsolete normative reference: RFC 3016 (Obsoleted by RFC 6416)

  ** Obsolete normative reference: RFC 4566 (Obsoleted by RFC 8866)

     Summary: 3 errors (**), 0 flaws (~~), 1 warning (==), 6 comments (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.

--------------------------------------------------------------------------------

AVT                                                           M. Schmidt
Internet-Draft                                        Dolby Laboratories
Obsoletes: 3016 (if approved)                                 F. de Bont
Intended status: Standards Track                     Philips Electronics
Expires: June 24, 2010                                         S. Doehla
                                                          Fraunhofer IIS
                                                            Jaehwan. Kim
                                                   Vidiator (Korea) Inc.
                                                       December 21, 2009

           RTP Payload Format for MPEG-4 Audio/Visual Streams
                  draft-schmidt-avt-rfc3016bis-03.txt

Abstract

   This document describes Real-Time Transport Protocol (RTP) payload
   formats for carrying each of MPEG-4 Audio and MPEG-4 Visual
   bitstreams without using MPEG-4 Systems.  For the purpose of directly
   mapping MPEG-4 Audio/Visual bitstreams onto RTP packets, it provides
   specifications for the use of RTP header fields and also specifies
   fragmentation rules.  It also provides specifications for Media Type
   registration and the use of Session Description Protocol (SDP).

   Comments are solicited and should be addressed to the working group's
   mailing list at avt@ietf.org and/or the author(s).

Status of this Memo

   This Internet-Draft is submitted to IETF in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups.  Note that
   other groups may also distribute working documents as Internet-
   Drafts.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.
   It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt.

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html.

   This Internet-Draft will expire on June 24, 2010.

Copyright Notice

   Copyright (c) 2009 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.  Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the BSD License.

Table of Contents

   1.  Introduction
     1.1.  MPEG-4 Visual RTP payload format
     1.2.  MPEG-4 Audio RTP payload format
     1.3.  Differences to RFC 3016
   2.  Definitions and Abbreviations
   3.  RTP Packetization of MPEG-4 Visual bitstream
     3.1.  Use of RTP header fields for MPEG-4 Visual
     3.2.  Fragmentation of MPEG-4 Visual bitstream
     3.3.  Examples of packetized MPEG-4 Visual bitstream
   4.  RTP Packetization of MPEG-4 Audio bitstream
     4.1.  RTP Packet Format
     4.2.  Use of RTP Header Fields for MPEG-4 Audio
     4.3.  Fragmentation of MPEG-4 Audio bitstream
   5.  Media Type registration for MPEG-4 Audio/Visual streams
     5.1.  Media Type registration for MPEG-4 Visual
     5.2.  SDP usage of MPEG-4 Visual
     5.3.  Media Type registration of MPEG-4 Audio
     5.4.  SDP usage of MPEG-4 Audio
       5.4.1.  Example: In-band configuration
       5.4.2.  Example: 6kb/s CELP
       5.4.3.  Example: 64 kb/s AAC LC stereo
       5.4.4.  Example: Use of the SBR-enabled parameter
       5.4.5.  Example: Hierarchical Signaling of SBR
       5.4.6.  Example: HE AAC v2 Signaling
       5.4.7.  Example: Hierarchical Signaling of PS
       5.4.8.  Example: MPEG Surround
       5.4.9.  Example: MPEG Surround with extended SDP parameters
   6.  IANA Considerations
     6.1.  Media Type Registration
     6.2.  Usage of SDP
   7.  Security Considerations
   8.  References
     8.1.  Normative References
     8.2.  Informative References
   Authors' Addresses

1.  Introduction

   The RTP payload formats described in this document specify how MPEG-4
   Audio [14496-3] and MPEG-4 Visual streams [14496-2] [14496-2/Amd.1]
   are to be fragmented and mapped directly onto RTP packets.

   These RTP payload formats enable transport of MPEG-4 Audio/Visual
   streams without using the synchronization and stream management
   functionality of MPEG-4 Systems [14496-1].
   Such RTP payload formats
   will be used in systems that have intrinsic stream management
   functionality and thus require no such functionality from MPEG-4
   Systems.  H.323 terminals are an example of such systems, where
   MPEG-4 Audio/Visual streams are not managed by MPEG-4 Systems Object
   Descriptors but by H.245.  The streams are directly mapped onto RTP
   packets without using the MPEG-4 Systems Sync Layer.  Other examples
   are SIP and RTSP, where Media Type and SDP are used.  Media Type and
   SDP usages of the RTP payload formats described in this document are
   defined to directly specify the attributes of Audio/Visual streams
   (e.g., media type, packetization format and codec configuration)
   without using MPEG-4 Systems.  The obvious benefit is that these
   MPEG-4 Audio/Visual RTP payload formats can be handled in a unified
   way together with those formats defined for non-MPEG-4 codecs.  The
   disadvantage is that interoperability with environments using MPEG-4
   Systems may be difficult; hence, other payload formats may be better
   suited to those applications.

   The semantics of RTP headers in such cases need to be clearly
   defined, including the association with MPEG-4 Audio/Visual data
   elements.  In addition, it is beneficial to define the fragmentation
   rules of RTP packets for MPEG-4 Video streams so as to enhance error
   resiliency by utilizing the error resilience tools provided inside
   the MPEG-4 Video stream.

1.1.  MPEG-4 Visual RTP payload format

   MPEG-4 Visual is a visual coding standard with many new features:
   high coding efficiency; high error resiliency; multiple, arbitrary
   shape object-based coding; etc. [14496-2].  It covers a wide range of
   bitrates, from scores of kbps to several Mbps.  It also covers a wide
   variety of networks, ranging from those guaranteed to be almost
   error-free to mobile networks with high error rates.

   With respect to the fragmentation rules for an MPEG-4 Visual
   bitstream defined in this document, since MPEG-4 Visual is used for a
   wide variety of networks, it is desirable not to apply too much
   restriction on fragmentation, and a fragmentation rule such as "a
   single video packet shall always be mapped on a single RTP packet"
   may be inappropriate.  On the other hand, careless, media-unaware
   fragmentation may cause degradation in error resiliency and bandwidth
   efficiency.  The fragmentation rules described in this document are
   flexible but manage to define the minimum rules for preventing
   meaningless fragmentation while utilizing the error resilience
   functionalities of MPEG-4 Visual.

   The fragmentation rule recommends not mapping more than one VOP into
   an RTP packet so that the RTP timestamp uniquely indicates the VOP
   time framing.  On the other hand, MPEG-4 video may generate VOPs of
   very small size, for example an empty VOP (vop_coded=0) containing
   only a VOP header, or an arbitrarily shaped VOP with a small number
   of coding blocks.  To reduce the overhead for such cases, the
   fragmentation rule permits concatenating multiple VOPs in an RTP
   packet.  (See fragmentation rule (4) in section 3.2 and marker bit
   and timestamp in section 3.1.)

   While the additional media specific RTP header defined for such video
   coding tools as H.261 or MPEG-1/2 is effective in helping to recover
   picture headers corrupted by packet losses, MPEG-4 Visual already has
   error resilience functionalities for recovering corrupt headers, and
   these can be used on RTP/IP networks as well as on other networks
   (H.223/mobile, MPEG-2/TS, etc.).  Therefore, no extra RTP header
   fields are defined in this MPEG-4 Visual RTP payload format.

1.2.  MPEG-4 Audio RTP payload format

   MPEG-4 Audio is an audio standard that integrates many different
   types of audio coding tools.
   Low-overhead MPEG-4 Audio Transport
   Multiplex (LATM) manages the sequences of audio data with relatively
   small overhead.  In audio-only applications, then, it is desirable
   for LATM-based MPEG-4 Audio bitstreams to be directly mapped onto RTP
   packets without using MPEG-4 Systems.

   LATM has several multiplexing features:

   o  Carrying configuration information with audio data,

   o  Concatenation of multiple audio frames in one audio stream,

   o  Multiplexing multiple objects (programs),

   o  Multiplexing scalable layers.

   In RTP transmission there is no need for the last two features.
   Therefore, these two features MUST NOT be used in applications based
   on RTP packetization specified by this document.  Since LATM has been
   developed for only natural audio coding tools, i.e., not for
   synthesis tools, it seems difficult to transmit Structured Audio (SA)
   data and Text to Speech Interface (TTSI) data by LATM.  Therefore, SA
   data and TTSI data MUST NOT be transported by the RTP packetization
   in this document.

   For transmission of scalable streams, audio data of each layer SHOULD
   be packetized onto different RTP streams, allowing the different
   layers to be treated differently at the IP level, for example via
   some means of differentiated service.  On the other hand, all
   configuration data of the scalable streams are contained in one LATM
   configuration data "StreamMuxConfig" and every scalable layer shares
   the StreamMuxConfig.  The mapping between each layer and its
   configuration data is achieved by LATM header information attached to
   the audio data.  In order to indicate the dependency information of
   the scalable streams, the signaling mechanism as specified in
   [RFC5583] SHOULD be used (see section 4.2).

   For MPEG-4 Audio coding tools, as is true for other audio coders, if
   the payload is a single audio frame, packet loss will not impair the
   decodability of adjacent packets.  Therefore, an additional media
   specific header for recovering errors is not required for MPEG-4
   Audio.  Existing RTP protection mechanisms, such as Generic Forward
   Error Correction (RFC 5109 [RFC5109]) and Redundant Audio Data (RFC
   2198 [RFC2198]), MAY be applied to improve error resiliency.

1.3.  Differences to RFC 3016

   The RTP payload format for MPEG-4 Audio as specified in RFC 3016 is
   used by the 3GPP PSS service [3GPP].  However, there are some
   misalignments between RFC 3016 and the 3GPP PSS specification that
   are addressed by this update:

   o  The audio payload format (LATM) referenced in RFC 3016 is binary
      incompatible with the format used in 3GPP.

   o  The audio signaling format (StreamMuxConfig) referenced in RFC
      3016 is binary incompatible with the format used in 3GPP.

   o  The audio parameter "SBR-enabled" is not defined within RFC 3016
      but is used by 3GPP.

   o  The rate parameter specification is ambiguous in the presence of
      SBR (Spectral Band Replication).

   o  The specification of the number-of-audio-channels parameter is
      ambiguous in the presence of PS (Parametric Stereo).

   Furthermore, some comments have been addressed and signaling support
   for MPEG Surround [23003-1] was added.  It should be noted that the
   audio payload format described here has some known limitations.  For
   new system designs, RFC 3640 [RFC3640] is recommended.

2.  Definitions and Abbreviations

   This memo makes use of terms specified in [14496-2], [14496-3], and
   [23003-1].  In addition, the following terms are used in this
   document and have specific meaning within the context of this
   document.

   Core codec sampling rate:

      Audio codec sampling rate.
      When SBR (Spectral Band Replication)
      is used, typically twice this value will be regarded as
      the definitive sampling rate (i.e., the decoder's output sampling
      rate).

      Note: The exception is downsampled SBR mode, in which the SBR
      sampling rate equals the core codec sampling rate.

   Core codec channel configuration:

      Audio codec channel configuration.  When PS (Parametric Stereo) is
      used, the core codec channel configuration indicates one channel
      (i.e., mono) whereas the definitive channel configuration is two
      channels (i.e., stereo).  When MPEG Surround is used, the
      definitive channel configuration depends on the output of the MPEG
      Surround decoder.

   SBR sampling rate:

      When SBR is used, the sampling rate is typically twice the core
      codec sampling rate, with the exception of downsampled SBR mode,
      where the SBR sampling rate and core codec sampling rate are
      identical.

   Abbreviations:

      AAC: Advanced Audio Coding

      ASC: AudioSpecificConfig

      HE AAC: High Efficiency AAC

      LATM: Low-overhead MPEG-4 Audio Transport Multiplex

      PS: Parametric Stereo

      SBR: Spectral Band Replication

      VOP: Video Object Plane

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in [RFC2119].

3.  RTP Packetization of MPEG-4 Visual bitstream

   This section specifies RTP packetization rules for MPEG-4 Visual
   content.  An MPEG-4 Visual bitstream is mapped directly onto RTP
   packets without the addition of extra header fields or any removal of
   Visual syntax elements.  The Combined Configuration/Elementary stream
   mode MUST be used so that configuration information will be carried
   to the same RTP port as the elementary stream.
   (See 6.2.1 "Start codes" of ISO/IEC 14496-2 [14496-2]
   [14496-2/Cor.1] [14496-2/Amd.1].)
   The configuration information MAY additionally be specified by some
   out-of-band means.  If needed for an H.323 terminal, H.245 codepoint
   "decoderConfigurationInformation" MUST be used for this purpose.  If
   needed by systems using Media Type parameters and SDP parameters,
   e.g., SIP and RTSP, the optional parameter "config" MUST be used to
   specify the configuration information (see 5.1 and 5.2).

   When the short video header mode is used, the RTP payload format for
   H.263 SHOULD be used (the format defined in RFC 4629 [RFC4629] is
   RECOMMENDED, but the RFC 4628 [RFC4628] format MAY be used for
   compatibility with older implementations).

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |V=2|P|X|  CC   |M|     PT      |       sequence number         | RTP
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                           timestamp                           | Header
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |           synchronization source (SSRC) identifier            |
   +=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+
   |            contributing source (CSRC) identifiers             |
   |                             ....                              |
   +=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+
   |                                                               | RTP
   |       MPEG-4 Visual stream (byte aligned)                     | Pay-
   |                                                               | load
   |                               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                               :...OPTIONAL RTP padding        |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

         Figure 1 - An RTP packet for MPEG-4 Visual stream

3.1.  Use of RTP header fields for MPEG-4 Visual

   Payload Type (PT): The assignment of an RTP payload type for this
   packet format is outside the scope of this document, and will not be
   specified here.
   It is expected that the RTP profile for a particular
   class of applications will assign a payload type for this encoding,
   or if that is not done then a payload type in the dynamic range SHALL
   be chosen by means of an out-of-band signaling protocol (e.g., H.245
   or SIP).

   Extension (X) bit: Defined by the RTP profile used.

   Sequence Number: Incremented by one for each RTP data packet sent,
   starting, for security reasons, with a random initial value.

   Marker (M) bit: The marker bit is set to one to indicate the last RTP
   packet (or only RTP packet) of a VOP.  When multiple VOPs are carried
   in the same RTP packet, the marker bit is set to one.

   Timestamp: The timestamp indicates the sampling instant of the VOP
   contained in the RTP packet.  A constant offset, which is random, is
   added for security reasons.

   o  When multiple VOPs are carried in the same RTP packet, the
      timestamp indicates the earliest of the VOP times within the VOPs
      carried in the RTP packet.  Timestamp information for the rest of
      the VOPs is derived from the timestamp fields in the VOP header
      (modulo_time_base and vop_time_increment).

   o  If the RTP packet contains only configuration information and/or
      Group_of_VideoObjectPlane() fields, the timestamp of the next VOP
      in the coding order is used.

   o  If the RTP packet contains only visual_object_sequence_end_code
      information, the timestamp of the immediately preceding VOP in the
      coding order is used.

   The resolution of the timestamp is set to its default value of
   90 kHz, unless specified by an out-of-band means (e.g., SDP parameter
   or Media Type parameter as defined in section 5).

   Other header fields are used as described in RFC 3550 [RFC3550].

3.2.  Fragmentation of MPEG-4 Visual bitstream

   A fragmented MPEG-4 Visual bitstream is mapped directly onto the RTP
   payload without any addition of extra header fields or any removal of
   Visual syntax elements.  The Combined Configuration/Elementary
   streams mode is used.  The following rules apply for the
   fragmentation.

   In the following, header means one of the following:

   o  Configuration information (Visual Object Sequence Header, Visual
      Object Header and Video Object Layer Header)

   o  visual_object_sequence_end_code

   o  The header of the entry point function for an elementary stream
      (Group_of_VideoObjectPlane() or the header of VideoObjectPlane(),
      video_plane_with_short_header(), MeshObject() or FaceObject())

   o  The video packet header (video_packet_header() excluding
      next_resync_marker())

   o  The header of gob_layer()

   See 6.2.1 "Start codes" of ISO/IEC 14496-2 [14496-2]
   [14496-2/Cor.1] [14496-2/Amd.1] for the definition of the
   configuration information and the entry point functions.

   (1) Configuration information and Group_of_VideoObjectPlane() fields
   SHALL be placed at the beginning of the RTP payload (just after the
   RTP header) or just after the header of the syntactically upper layer
   function.

   (2) If one or more headers exist in the RTP payload, the RTP payload
   SHALL begin with the header of the syntactically highest function.
   Note: The visual_object_sequence_end_code is regarded as the lowest
   function.

   (3) A header SHALL NOT be split across multiple RTP packets.

   (4) Different VOPs SHOULD be fragmented into different RTP packets so
   that one RTP packet consists of the data bytes associated with a
   unique VOP time instant (that is indicated in the timestamp field in
   the RTP packet header), with the exception that multiple consecutive
   VOPs MAY be carried within one RTP packet in the decoding order if
   the size of the VOPs is small.

      Note: When multiple VOPs are carried in one RTP payload, the
      timestamp of the VOPs after the first one may be calculated by the
      decoder.  This operation is necessary only for RTP packets in
      which the marker bit equals one and the beginning of the RTP
      payload corresponds to a start code.  (See timestamp and marker
      bit in section 3.1.)

   (5) It is RECOMMENDED that a single video packet be sent as a single
   RTP packet.  The size of a video packet SHOULD be adjusted in such a
   way that the resulting RTP packet is not larger than the path-MTU.
   Note: Rule (5) does not apply when the video packet is disabled by
   the coder configuration (by setting resync_marker_disable in the VOL
   header to 1), or in coding tools where the video packet is not
   supported.  In this case, a VOP MAY be split at arbitrary byte
   positions.

   The video packet starts with the VOP header or the video packet
   header, followed by motion_shape_texture(), and ends with
   next_resync_marker() or next_start_code().

3.3.  Examples of packetized MPEG-4 Visual bitstream

   Figure 2 shows examples of RTP packets generated based on the
   criteria described in 3.2.

   (a) is an example of the first RTP packet or the random access point
   of an MPEG-4 Visual bitstream containing the configuration
   information.  According to criterion (1), the Visual Object Sequence
   Header (VS header) is placed at the beginning of the RTP payload,
   preceding the Visual Object Header and the Video Object Layer
   Header (VO header, VOL header).  Since the fragmentation rule defined
   in 3.2 guarantees that the configuration information, starting with
   visual_object_sequence_start_code, is always placed at the beginning
   of the RTP payload, RTP receivers can detect the random access point
   by checking if the first 32-bit field of the RTP payload is
   visual_object_sequence_start_code.
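
   The check described in example (a) can be sketched as follows.  This
   is a minimal illustration, not part of the payload format: the
   function name is hypothetical, and the start code value 0x000001B0
   is taken from the start code table of ISO/IEC 14496-2 rather than
   from this document.

```python
# visual_object_sequence_start_code as listed in ISO/IEC 14496-2
# (assumption: value quoted from that standard, not from this draft).
VISUAL_OBJECT_SEQUENCE_START_CODE = b"\x00\x00\x01\xb0"


def is_random_access_point(rtp_payload: bytes) -> bool:
    """Return True if the payload begins with the VS start code.

    rtp_payload is the RTP payload only, i.e. the bytes that follow
    the RTP header (and any CSRC list or header extension).
    """
    return rtp_payload[:4] == VISUAL_OBJECT_SEQUENCE_START_CODE
```

   A receiver that joins mid-stream could, under these assumptions,
   discard payloads until this function returns True and start
   decoding from that packet.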

   (b) is another example of the RTP packet containing the configuration
   information.  It differs from example (a) in that the RTP packet also
   contains a video packet in the VOP following the configuration
   information.  Since the length of the configuration information is
   relatively short (typically scores of bytes) and an RTP packet
   containing only the configuration information may thus increase the
   overhead, the configuration information and the immediately following
   GOV and/or (a part of) VOP can be packetized into a single RTP packet
   as in this example.

   (c) is an example of an RTP packet that contains
   Group_of_VideoObjectPlane (GOV).  Following criterion (1), the GOV is
   placed at the beginning of the RTP payload.  It would be a waste of
   RTP/IP header overhead to generate an RTP packet containing only a
   GOV whose length is 7 bytes.  Therefore, (a part of) the following
   VOP can be placed in the same RTP packet as shown in (c).

   (d) is an example of the case where one video packet is packetized
   into one RTP packet.  When the packet-loss rate of the underlying
   network is high, this kind of packetization is recommended.  Even
   when the RTP packet containing the VOP header is discarded by a
   packet loss, the other RTP packets can be decoded by using the
   HEC (Header Extension Code) information in the video packet header.
   No extra RTP header field is necessary.

   (e) is an example of the case where more than one video packet is
   packetized into one RTP packet.  This kind of packetization is
   effective to save the overhead of RTP/IP headers when the bit-rate of
   the underlying network is low.  However, it will decrease the packet-
   loss resiliency because multiple video packets are discarded by a
   single RTP packet loss.
   The optimal number of video packets in an
   RTP packet and the length of the RTP packet can be determined
   considering the packet-loss rate and the bit-rate of the underlying
   network.

   (f) is an example of the case when the video packet is disabled by
   setting resync_marker_disable in the VOL header to 1.  In this case,
   a VOP may be split into multiple RTP packets at arbitrary byte
   positions.  For example, it is possible to split a VOP into fixed-
   length packets.  This kind of coder configuration and RTP packet
   fragmentation may be used when the underlying network is guaranteed
   to be error-free.  On the other hand, it is not recommended in
   error-prone environments since it provides only poor packet loss
   resiliency.

   Figure 3 shows examples of RTP packets prohibited by the criteria of
   3.2.

   Fragmentation of a header into multiple RTP packets, as in (a), will
   not only increase the overhead of RTP/IP headers but also decrease
   the error resiliency.  Therefore, it is prohibited by criterion (3).

   When concatenating more than one video packet into an RTP packet, a
   VOP header or video_packet_header() shall not be placed in the middle
   of the RTP payload.  The packetization as in (b) is not allowed by
   criterion (2) for reasons of error resiliency.  Comparing
   this example with Figure 2(d), although two video packets are mapped
   onto two RTP packets in both cases, the packet-loss resiliency is not
   identical.  Namely, if the second RTP packet is lost, both video
   packets 1 and 2 are lost in the case of Figure 3(b) whereas only
   video packet 2 is lost in the case of Figure 2(d).
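
   The fixed-length splitting mentioned in example (f), combined with
   the marker bit rule of section 3.1, can be sketched as follows.  This
   is a hedged illustration only: the function name and the
   max_payload parameter are hypothetical, the sketch applies only when
   video packets are disabled (resync_marker_disable set to 1), and it
   assumes max_payload is larger than the VOP header so that the header
   is not split, as required by rule (3).

```python
from typing import List, Tuple


def fragment_vop(vop: bytes, max_payload: int) -> List[Tuple[bytes, bool]]:
    """Split one VOP into RTP payloads of at most max_payload bytes.

    Returns (payload, marker) pairs.  The marker is True only for the
    last fragment, matching "the last RTP packet (or only RTP packet)
    of a VOP".  All fragments of one VOP would carry the same RTP
    timestamp; header construction is left out of this sketch.
    """
    if max_payload <= 0:
        raise ValueError("max_payload must be positive")
    fragments = []
    for offset in range(0, len(vop), max_payload):
        chunk = vop[offset:offset + max_payload]
        is_last = offset + max_payload >= len(vop)
        fragments.append((chunk, is_last))
    return fragments
```

   In practice max_payload would be derived from the path-MTU minus the
   IP, UDP, and RTP header sizes; that calculation is outside this
   sketch.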

       +------+------+------+------+
   (a) | RTP  |  VS  |  VO  | VOL  |
       |header|header|header|header|
       +------+------+------+------+

       +------+------+------+------+------+------------+
   (b) | RTP  |  VS  |  VO  | VOL  | VOP  |Video Packet|
       |header|header|header|header|header|            |
       +------+------+------+------+------+------------+

       +------+-----+------------------+
   (c) | RTP  | GOV |Video Object Plane|
       |header|     |                  |
       +------+-----+------------------+

       +------+------+------------+  +------+------+------------+
   (d) | RTP  | VOP  |Video Packet|  | RTP  |  VP  |Video Packet|
       |header|header|    (1)     |  |header|header|    (2)     |
       +------+------+------------+  +------+------+------------+

       +------+------+------------+------+------------+------+------------+
   (e) | RTP  |  VP  |Video Packet|  VP  |Video Packet|  VP  |Video Packet|
       |header|header|    (1)     |header|    (2)     |header|    (3)     |
       +------+------+------------+------+------------+------+------------+

       +------+------+------------+  +------+------------+
   (f) | RTP  | VOP  |VOP fragment|  | RTP  |VOP fragment|
       |header|header|    (1)     |  |header|    (2)     | ___
       +------+------+------------+  +------+------------+

      Figure 2 - Examples of RTP packetized MPEG-4 Visual bitstream

       +------+-------------+  +------+------------+------------+
   (a) | RTP  |First half of|  | RTP  |Last half of|Video Packet|
       |header|  VP header  |  |header| VP header  |            |
       +------+-------------+  +------+------------+------------+

       +------+------+----------+  +------+---------+------+------------+
   (b) | RTP  | VOP  |First half|  | RTP  |Last half|  VP  |Video Packet|
       |header|header| of VP(1) |  |header| of VP(1)|header|    (2)     |
       +------+------+----------+  +------+---------+------+------------+

   Figure 3 - Examples of prohibited RTP packetization for MPEG-4 Visual
                                bitstream

4.  RTP Packetization of MPEG-4 Audio bitstream

   This section specifies RTP packetization rules for MPEG-4 Audio
   bitstreams.
   MPEG-4 Audio streams MUST be formatted as LATM (Low-overhead
   MPEG-4 Audio Transport Multiplex) [14496-3] streams, and the
   LATM-based streams are then mapped onto RTP packets as described in
   the sections below.

4.1.  RTP Packet Format

   LATM-based streams consist of a sequence of audioMuxElements that
   include one or more PayloadMux elements which carry the audio frames.
   A complete audioMuxElement or a part of one SHALL be mapped directly
   onto an RTP payload without any removal of audioMuxElement syntax
   elements (see Figure 4).  The first byte of each audioMuxElement
   SHALL be located at the first payload location in an RTP packet.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |V=2|P|X|  CC   |M|     PT      |       sequence number         |RTP
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                           timestamp                           |Header
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |           synchronization source (SSRC) identifier            |
   +=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+
   |            contributing source (CSRC) identifiers             |
   |                             ....                              |
   +=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+
   |                                                               |RTP
   :                audioMuxElement (byte aligned)                 :Payload
   |                                                               |
   |                               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                               :...OPTIONAL RTP padding        |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

              Figure 4 - An RTP packet for MPEG-4 Audio

   In order to decode the audioMuxElement, the following
   muxConfigPresent information is required to be indicated by out-of-
   band means.  When SDP is utilized for this indication, the Media Type
   parameter "cpresent" corresponds to the muxConfigPresent information
   (see section 5.3).
The following restrictions apply: 624 o In the out-of-band configuration case the number of PayloadMux 625 elements contained in each audioMuxElement can only be set once. 626 If values greater than one PayloadMux Element are used, special 627 care is required to ensure that the last RTP packet remains 628 decodable. 630 o In the in-band configuration case the audio frames are in general 631 not byte aligned. Hinting RTP payload from MP4 file format 632 [14496-12] [14496-14] is therefore not possible. 634 muxConfigPresent: If this value is set to 1 (in-band mode), the 635 audioMuxElement SHALL include an indication bit "useSameStreamMux" 636 and MAY include the configuration information for audio compression 637 "StreamMuxConfig". The useSameStreamMux bit indicates whether the 638 StreamMuxConfig element in the previous frame is applied in the 639 current frame. If the useSameStreamMux bit indicates to use the 640 StreamMuxConfig from the previous frame, but if the previous frame 641 has been lost, the current frame may not be decodable. Therefore, in 642 case of in-band mode, the StreamMuxConfig element SHOULD be 643 transmitted repeatedly depending on the network condition. On the 644 other hand, if muxConfigPresent is set to 0 (out-band mode), the 645 StreamMuxConfig element is required to be transmitted by an out-of- 646 band means. In case of SDP, Media Type parameter "config" is 647 utilized (see section 5.3). 649 4.2. Use of RTP Header Fields for MPEG-4 Audio 651 Payload Type (PT): The assignment of an RTP payload type for this new 652 packet format is outside the scope of this document, and will not be 653 specified here. It is expected that the RTP profile for a particular 654 class of applications will assign a payload type for this encoding, 655 or if that is not done then a payload type in the dynamic range shall 656 be chosen by means of an out-of-band signaling protocol (e.g., H.245, 657 SIP, etc). 
In the dynamic assignment of RTP payload types for 658 scalable streams, a different value SHOULD be assigned to each layer. 659 The dependency relationships between the enhancement layer and the base 660 layer SHOULD be signaled as specified in [RFC5583]. An example of 661 the use of such signaling for scalable audio streams can be found in 662 [RFC5691]. 664 Marker (M) bit: The marker bit indicates audioMuxElement boundaries. 665 It is set to one to indicate that the RTP packet contains a complete 666 audioMuxElement or the last fragment of an audioMuxElement. 668 Timestamp: The timestamp indicates the sampling instant of the first 669 audio frame contained in the RTP packet. Timestamps are recommended 670 to start at a random value for security reasons. 672 Unless specified by out-of-band means, the resolution of the 673 timestamp is set to its default value of 90 kHz. 675 Sequence Number: Incremented by one for each RTP packet sent, 676 starting, for security reasons, with a random value. 678 Other header fields are used as described in RFC 3550 [RFC3550]. 680 4.3. Fragmentation of MPEG-4 Audio bitstream 682 It is RECOMMENDED to put one audioMuxElement in each RTP packet. If 683 the size of an audioMuxElement can be kept small enough that the size 684 of the RTP packet containing it does not exceed the size of the path- 685 MTU, this will be no problem. If it cannot, the audioMuxElement MAY 686 be fragmented and spread across multiple packets. 688 5. Media Type registration for MPEG-4 Audio/Visual streams 690 The following sections describe the Media Type registrations for 691 MPEG-4 Audio/Visual streams. Media Type registration and SDP usage 692 for the MPEG-4 Visual stream are described in Sections 5.1 and 5.2, 693 respectively, while Media Type registration and SDP usage for the MPEG-4 694 Audio stream are described in Sections 5.3 and 5.4, respectively. 696 5.1.
Media Type registration for MPEG-4 Visual 698 Media type name: video 700 Media subtype name: MP4V-ES 702 Required parameters: none 704 Optional parameters: 706 rate: This parameter is used only for RTP transport. It indicates 707 the resolution of the timestamp field in the RTP header. If this 708 parameter is not specified, its default value of 90000 (90kHz) is 709 used. 711 profile-level-id: A decimal representation of MPEG-4 Visual 712 Profile and Level indication value (profile_and_level_indication) 713 defined in Table G-1 of ISO/IEC 14496-2 [14496-2] [14496-2/Amd.1]. 714 This parameter MAY be used in the capability exchange or session 715 setup procedure to indicate MPEG-4 Visual Profile and Level 716 combination of which the MPEG-4 Visual codec is capable. If this 717 parameter is not specified by the procedure, its default value of 718 1 (Simple Profile/Level 1) is used. 720 config: This parameter SHALL be used to indicate the configuration 721 of the corresponding MPEG-4 Visual bitstream. It SHALL NOT be 722 used to indicate the codec capability in the capability exchange 723 procedure. It is a hexadecimal representation of an octet string 724 that expresses the MPEG-4 Visual configuration information, as 725 defined in subclause 6.2.1 Start codes of ISO/IEC14496-2 [14496-2] 726 [14496-2/Amd.1] [14496-2/Cor.1]. The configuration information is 727 mapped onto the octet string in an MSB-first basis. The first bit 728 of the configuration information SHALL be located at the MSB of 729 the first octet. The configuration information indicated by this 730 parameter SHALL be the same as the configuration information in 731 the corresponding MPEG-4 Visual stream, except for 732 first_half_vbv_occupancy and latter_half_vbv_occupancy, if exist, 733 which may vary in the repeated configuration information inside an 734 MPEG-4 Visual stream (See 6.2.1 Start codes of ISO/IEC14496-2). 
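As a minimal sketch of how a receiver might handle the "config" parameter described above, the following Python fragment converts the hexadecimal string into octets (MSB first) and lists the start codes found in it. The function name and the layout of the returned values are illustrative assumptions and are not part of this specification. The sample value is the config string from the first SDP example in Section 5.2. Reading the octet that follows the 0xB0 start code as profile_and_level_indication is an assumption based on the visual object sequence header syntax of ISO/IEC 14496-2.

```python
START_CODE_PREFIX = b"\x00\x00\x01"
VOS_START_CODE = 0xB0  # assumed: visual_object_sequence_start_code (ISO/IEC 14496-2)


def parse_visual_config(config_hex):
    """Return (octets, start_codes, profile_level) for an MP4V-ES config string.

    start_codes is a list of (offset, start code value) tuples.
    profile_level is None if the string does not begin with the 0xB0 start code.
    """
    octets = bytes.fromhex(config_hex)  # two hex digits per octet, MSB first
    start_codes = []
    pos = octets.find(START_CODE_PREFIX)
    while pos != -1 and pos + 3 < len(octets):
        start_codes.append((pos, octets[pos + 3]))
        pos = octets.find(START_CODE_PREFIX, pos + 3)
    profile_level = None
    if start_codes and start_codes[0] == (0, VOS_START_CODE) and len(octets) > 4:
        profile_level = octets[4]
    return octets, start_codes, profile_level


# Config value from the Simple Profile/Level 1 SDP example (wrapped as in the text).
SAMPLE = ("000001B001000001B50900000100000001"
          "20008440FA282C2090A21F")
octets, start_codes, profile_level = parse_visual_config(SAMPLE)
```

For the sample value, the octet following the first start code is 1, which agrees with the "profile-level-id=1" carried in the same "a=fmtp" line.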
736 Example usages for these parameters are: 738 * MPEG-4 Visual Simple Profile/Level 1: Content-type: video/ 739 mp4v-es; profile-level-id=1 741 * MPEG-4 Visual Core Profile/Level 2: Content-type: video/ 742 mp4v-es; profile-level-id=34 744 * MPEG-4 Visual Advanced Real Time Simple Profile/Level 1: 745 Content-type: video/mp4v-es; profile-level-id=145 747 Published specification: 749 The specifications for MPEG-4 Visual streams are presented in ISO/ 750 IEC 14496-2 [14496-2] [14496-2/Amd.1] [14496-2/Cor.1]. The RTP 751 payload format is described in RFC 3016. 753 Encoding considerations: 755 Video bitstreams MUST be generated according to MPEG-4 Visual 756 specifications (ISO/IEC 14496-2). A video bitstream is binary 757 data and MUST be encoded for non-binary transport (for Email, the 758 Base64 encoding is sufficient). This type is also defined for 759 transfer via RTP. The RTP packets MUST be packetized according to 760 the MPEG-4 Visual RTP payload format defined in RFC 3016. 762 Security considerations: 764 See section 7 of RFC 3016. 766 Interoperability considerations: 768 MPEG-4 Visual provides a large and rich set of tools for the 769 coding of visual objects. For effective implementation of the 770 standard, subsets of the MPEG-4 Visual tool sets have been 771 provided for use in specific applications. These subsets, called 772 'Profiles', limit the size of the tool set a decoder is required 773 to implement. In order to restrict computational complexity, one 774 or more Levels are set for each Profile. A Profile@Level 775 combination allows: 777 * a codec builder to implement only the subset of the standard that 778 is needed, while maintaining interworking with other MPEG-4 devices 779 included in the same combination, and 781 * checking whether MPEG-4 devices comply with the standard 782 ('conformance testing'). 784 The visual stream SHALL be compliant with the MPEG-4 Visual 785 Profile@Level specified by the parameter "profile-level-id".
786 Interoperability between a sender and a receiver may be achieved 787 by specifying the parameter "profile-level-id", or by arranging in 788 the capability exchange/announcement procedure to set this 789 parameter mutually to the same value. 791 Applications which use this Media Type: 793 Audio and visual streaming and conferencing tools 795 Additional information: none 797 Person and email address to contact for further information: 799 See Authors' Addresses section at the end of this document. 801 Intended usage: COMMON 803 Author/Change controller: 805 See Authors' Addresses section at the end of this document. 807 5.2. SDP usage of MPEG-4 Visual 809 The Media Type video/MP4V-ES string is mapped to fields in the 810 Session Description Protocol (SDP) [RFC4566], as follows: 812 o The Media Type (video) goes in SDP "m=" as the media name. 814 o The Media subtype (MP4V-ES) goes in SDP "a=rtpmap" as the encoding 815 name. 817 o The optional parameter "rate" goes in "a=rtpmap" as the clock 818 rate. 820 o The optional parameters "profile-level-id" and "config" go in the 821 "a=fmtp" line to indicate the coder capability and configuration, 822 respectively. These parameters are expressed as a string, in the 823 form of a semicolon-separated list of parameter=value pairs.
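The mapping of the parameters onto the "a=fmtp" line can be illustrated with a short Python sketch that splits such a line into its semicolon-separated parameter=value pairs. The function names are hypothetical, and the parser is deliberately simplified: it assumes a single space after the payload type and does not implement the full SDP grammar of [RFC4566]. The default of 1 for "profile-level-id" is the one stated in Section 5.1.

```python
def parse_fmtp(line):
    """Split an "a=fmtp" line into (payload_type, {parameter: value})."""
    prefix = "a=fmtp:"
    if not line.startswith(prefix):
        raise ValueError("not an a=fmtp line")
    payload_type, _, param_text = line[len(prefix):].partition(" ")
    params = {}
    for item in param_text.split(";"):
        name, sep, value = item.strip().partition("=")
        if sep:  # skip empty items, e.g. after a trailing semicolon
            params[name.strip()] = value.strip()
    return int(payload_type), params


def visual_profile_level(params):
    """Apply the Section 5.1 default of 1 (Simple Profile/Level 1)."""
    return int(params.get("profile-level-id", "1"))


pt, params = parse_fmtp("a=fmtp:98 profile-level-id=34")
```

Parameter values are kept as strings so that hexadecimal values such as "config" are not misread as decimal numbers; the caller converts each value according to its definition.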
825 The following are some examples of media representation in SDP: 827 Simple Profile/Level 1, rate=90000(90kHz), "profile-level-id" and 828 "config" are present in "a=fmtp" line: 829 m=video 49170/2 RTP/AVP 98 830 a=rtpmap:98 MP4V-ES/90000 831 a=fmtp:98 profile-level-id=1;config=000001B001000001B50900000100000001 832 20008440FA282C2090A21F 834 Core Profile/Level 2, rate=90000(90kHz), "profile-level-id" is present 835 in "a=fmtp" line: 836 m=video 49170/2 RTP/AVP 98 837 a=rtpmap:98 MP4V-ES/90000 838 a=fmtp:98 profile-level-id=34 840 Advanced Real Time Simple Profile/Level 1, rate=90000(90kHz), 841 "profile-level-id" is present in "a=fmtp" line: 842 m=video 49170/2 RTP/AVP 98 843 a=rtpmap:98 MP4V-ES/90000 844 a=fmtp:98 profile-level-id=145 846 5.3. Media Type registration of MPEG-4 Audio 848 Media type name: audio 850 Media subtype name: MP4A-LATM 852 Required parameters: 854 rate: the rate parameter indicates the RTP time stamp clock rate. 855 The default value is 90000. Other rates MAY be specified only if 856 they are set to the same value as the audio sampling rate (number 857 of samples per second). 859 In the presence of SBR, the sampling rates for the core en-/ 860 decoder and the SBR tool are different in most cases. This 861 parameter shall therefore not be considered as the definitive 862 sampling rate. If this parameter is used, the server must 863 follow the rules below: 865 * When the presence of SBR is not explicitly signaled by the 866 optional SDP parameters such as object parameter, profile- 867 level-id or config string, this parameter shall be set to the 868 core codec sampling rate. 870 * When the presence of SBR is explicitly signaled by the optional 871 SDP parameters such as object parameter, profile-level-id or 872 config string, this parameter shall be set to the SBR sampling 873 rate. 875 NOTE: The optional parameter SBR-enabled in SDP a=fmtp is useful 876 for implicit HE AAC / HE AAC v2 signaling.
However, the SBR-enabled 877 parameter can also be used in the case of explicit HE AAC / HE AAC 878 v2 signaling. Therefore, its mere presence is not the criterion 879 for determining whether HE AAC / HE AAC v2 signaling is explicit or 880 not. 882 Optional parameters: 884 profile-level-id: a decimal representation of MPEG-4 Audio Profile 885 Level indication value defined in ISO/IEC 14496-3 [14496-3]. This 886 parameter indicates which MPEG-4 Audio tool subsets the decoder is 887 capable of using. If this parameter is not specified in the 888 capability exchange or session setup procedure, its default value 889 of 30 (Natural Audio Profile/Level 1) is used. 891 The following are some examples of this value: 892 1 : Main Audio Profile Level 1 893 9 : Speech Audio Profile Level 1 894 15: High Quality Audio Profile Level 2 895 30: Natural Audio Profile Level 1 896 44: High Efficiency AAC Profile Level 2 897 48: High Efficiency AAC v2 Profile Level 2 898 55: Baseline MPEG Surround Profile (see ISO/IEC 23003-1) Level 3 900 MPS-profile-level-id: a decimal representation of the MPEG 901 Surround Profile Level indication as defined in ISO/IEC 14496-3 902 [14496-3]. This parameter indicates the MPEG Surround profile and 903 level that the decoder must support in order to decode the 904 stream. 906 object: a decimal representation of the MPEG-4 Audio Object Type 907 value defined in ISO/IEC 14496-3 [14496-3]. This parameter 908 specifies the tool to be used by the coder. It can be used to 909 limit the capability within the specified "profile-level-id". 911 bitrate: the data rate for the audio bit stream. 913 cpresent: a boolean parameter that indicates whether audio payload 914 configuration data has been multiplexed into an RTP payload (see 915 section 4.1). A 0 indicates the configuration data has not been 916 multiplexed into an RTP payload; a 1 indicates that it has. The 917 default if the parameter is omitted is 1.
919 config: a hexadecimal representation of an octet string that 920 expresses the audio payload configuration data "StreamMuxConfig", 921 as defined in ISO/IEC 14496-3 [14496-3]. Configuration data is 922 mapped onto the octet string in an MSB-first basis. The first bit 923 of the configuration data SHALL be located at the MSB of the first 924 octet. In the last octet, zero-padding bits, if necessary, SHALL 925 follow the configuration data. Senders MUST set the 926 StreamMuxConfig elements taraBufferFullness and latmBufferFullness 927 to their largest respective value, indicating that buffer fullness 928 measures are not used in SDP. Receivers MUST ignore the value of 929 these two elements contained in the config parameter. 931 MPS-asc: a hexadecimal representation of an octet string that 932 expresses audio payload configuration data "AudioSpecificConfig", 933 as defined in ISO/IEC 14496-3 [14496-3]. If this parameter is not 934 present the relevant signaling is performed by other means (e.g. 935 in-band or contained in the config string). 937 The same mapping rules as for the config parameter apply. 939 ptime: RECOMMENDED duration of each packet in milliseconds. 941 SBR-enabled: a boolean parameter which indicates whether SBR-data 942 can be expected in the RTP-payload of a stream. This parameter is 943 relevant for an SBR-capable decoder if the presence of SBR can not 944 be detected from an out-of-band decoder configuration (e.g. 945 contained in the config string). 947 If this parameter is set to 0, a decoder SHALL expect that SBR is 948 not used. If this parameter is set to 1, a decoder SHOULD 949 upsample the audio data with the SBR tool, regardless whether SBR 950 data is present in the stream or not. 952 If the presence of SBR can not be detected from out-of-band 953 configuration and the SBR-enabled parameter is not present, the 954 parameter defaults to 1 for an SBR-capable decoder. 
If the 955 resulting output sampling rate or the computational complexity is 956 not supported, the SBR tool may be disabled or run in downsampled 957 mode. 959 The timestamp resolution at the RTP layer is determined by the rate 960 parameter. 962 Published specification: 964 Payload format specifications are described in this document. 965 Encoding specifications are provided in ISO/IEC 14496-3 [14496-3]. 967 Encoding considerations: 969 This type is only defined for transfer via RTP. 971 Security considerations: 973 See Section 7 of RFC 3016. 975 Interoperability considerations: 977 MPEG-4 Audio provides a large and rich set of tools for the coding 978 of audio objects. For effective implementation of the standard, 979 subsets of the MPEG-4 Audio tool sets similar to those used in 980 MPEG-4 Visual have been provided (see section 5.1). 982 The audio stream SHALL be compliant with the MPEG-4 Audio Profile@ 983 Level specified by the parameters "profile-level-id" and "MPS- 984 profile-level-id". Interoperability between a sender and a 985 receiver may be achieved by specifying the parameters "profile- 986 level-id" and "MPS-profile-level-id", or by arranging in the 987 capability exchange procedure to set these parameters mutually to 988 the same value. Furthermore, the "object" parameter can be used 989 to limit the capability within the specified Profile@Level in 990 capability exchange. 992 Applications which use this media type: 994 Audio and video streaming and conferencing tools. 996 Additional information: none 998 Person and email address to contact for further information: 1000 See Authors' Addresses section at the end of this document. 1002 Intended usage: COMMON 1004 Author/Change controller: 1006 See Authors' Addresses section at the end of this document. 1008 5.4.
SDP usage of MPEG-4 Audio 1010 The Media Type audio/MP4A-LATM string is mapped to fields in the 1011 Session Description Protocol (SDP) [RFC4566], as follows: 1013 o The Media Type (audio) goes in SDP "m=" as the media name. 1015 o The Media subtype (MP4A-LATM) goes in SDP "a=rtpmap" as the 1016 encoding name. 1018 o The required parameter "rate" goes in "a=rtpmap" as the clock 1019 rate. 1021 o The optional parameter "ptime" goes in the SDP "a=ptime" attribute. 1023 o The optional parameters "profile-level-id" and 1024 "MPS-profile-level-id" go in the "a=fmtp" line to indicate the 1025 coder capability. The "object" parameter goes in the "a=fmtp" 1026 attribute. The payload-format-specific parameters "bitrate", 1027 "cpresent", "config", "MPS-asc" and "SBR-enabled" go in the 1028 "a=fmtp" line. These parameters are expressed as a string, in the 1029 form of a semicolon-separated list of parameter=value pairs. 1031 The following sections contain some examples of the media 1032 representation in SDP. 1034 Note that the a=fmtp line in some of the examples has been wrapped to 1035 fit the page; it would comprise a single line in the SDP file. 1037 5.4.1. Example: In-band configuration 1039 In this example the audio configuration data appears in the RTP 1040 payload exclusively (i.e., the MPEG-4 audio configuration is known 1041 when a StreamMuxConfig element appears within the RTP payload). 1043 m=audio 49230 RTP/AVP 96 1044 a=rtpmap:96 MP4A-LATM/90000 1045 a=fmtp:96 object=2; cpresent=1 1047 The "clock rate" is set to 90kHz. This is the default value and the 1048 real audio sampling rate is known when the audio configuration data 1049 is received. 1051 5.4.2.
Example: 6kb/s CELP 1053 6 kb/s CELP bitstreams (with an audio sampling rate of 8 kHz) 1054 m=audio 49230 RTP/AVP 96 1055 a=rtpmap:96 MP4A-LATM/8000 1056 a=fmtp:96 profile-level-id=9; object=8; cpresent=0; 1057 config=40008B18388380 1058 a=ptime:20 1060 In this example audio configuration data is not multiplexed into the 1061 RTP payload and is described only in SDP. Furthermore, the "clock 1062 rate" is set to the audio sampling rate. 1064 5.4.3. Example: 64 kb/s AAC LC stereo 1066 64 kb/s AAC LC stereo bitstream (with an audio sampling rate of 24 1067 kHz) 1069 m=audio 49230 RTP/AVP 96 1070 a=rtpmap:96 MP4A-LATM/24000/2 1071 a=fmtp:96 profile-level-id=1; bitrate=64000; cpresent=0; 1072 object=2; config=400026203fc0 1074 In this example audio configuration data is not multiplexed into the 1075 RTP payload and is described only in SDP. Furthermore, the "clock 1076 rate" is set to the audio sampling rate. 1078 In this example, the presence of SBR can not be determined by the SDP 1079 parameter set. The clock rate represents the core codec sampling 1080 rate. An SBR enabled decoder SHOULD use the SBR tool to upsample the 1081 audio data if complexity and resulting output sampling rate permits. 1083 5.4.4. Example: Use of the SBR-enabled parameter 1085 These two examples are identical to the example above with the 1086 exception of the SBR-enabled parameter. The presence of SBR is not 1087 signaled by the SDP parameters object, profile-level-id and config, 1088 but instead the SBR-enabled parameter is present. The rate parameter 1089 and the StreamMuxConfig contain the core codec sampling rate. 
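The decoder behavior that Section 5.3 attaches to the SBR-enabled parameter can be sketched in Python as follows. This is an illustration under stated assumptions rather than a normative algorithm: the function and argument names are hypothetical, the SBR output rate is assumed to be twice the core codec sampling rate (consistent with the 24 kHz / 48 kHz figures used in the examples of this section), and falling back to the core rate stands in for "disabled or run in downsampled mode".

```python
def decoder_output_rate(core_rate, sbr_enabled=None, sbr_capable=True,
                        max_output_rate=None):
    """Return the output sampling rate a decoder would use (hypothetical helper).

    core_rate:       core codec sampling rate from "rate" / StreamMuxConfig
    sbr_enabled:     value of the SBR-enabled parameter, or None if absent
    sbr_capable:     whether the decoder implements the SBR tool
    max_output_rate: highest output rate the decoder supports, or None
    """
    if not sbr_capable:
        return core_rate               # plain AAC decoder ignores SBR
    if sbr_enabled is None:
        sbr_enabled = 1                # default for an SBR-capable decoder
    if sbr_enabled == 0:
        return core_rate               # decoder SHALL expect no SBR
    sbr_rate = 2 * core_rate           # assumption: SBR doubles the core rate
    if max_output_rate is not None and sbr_rate > max_output_rate:
        return core_rate               # SBR disabled or downsampled mode
    return sbr_rate                    # decoder SHOULD upsample with SBR


rate_sbr_off = decoder_output_rate(24000, sbr_enabled=0)
rate_sbr_on = decoder_output_rate(24000, sbr_enabled=1)
```

In both cases the RTP timestamp clock remains the value given in "a=rtpmap"; only the decoder output rate differs.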
1091 Example with "SBR-enabled=0", definitive and core codec sampling rate 1092 24kHz: 1094 m=audio 49230 RTP/AVP 96 1095 a=rtpmap:96 MP4A-LATM/24000/2 1096 a=fmtp:96 profile-level-id=1; bitrate=64000; cpresent=0; 1097 SBR-enabled=0; config=400026203fc0 1099 Example with "SBR-enabled=1", core codec sampling rate 24kHz, 1100 definitive and SBR sampling rate 48kHz: 1102 m=audio 49230 RTP/AVP 96 1103 a=rtpmap:96 MP4A-LATM/24000/2 1104 a=fmtp:96 profile-level-id=1; bitrate=64000; cpresent=0; 1105 SBR-enabled=1; config=400026203fc0 1107 In this example, the clock rate is still 24000 and this information 1108 should be used for RTP timestamp calculation. The value of 24000 is 1109 used to support old AAC decoders. This lets a decoder that supports 1110 only AAC process the HE AAC coded data, although only plain AAC is 1111 decoded. An HE AAC decoder is able to generate output data with the 1112 SBR sampling rate. 1114 5.4.5. Example: Hierarchical Signaling of SBR 1116 When the presence of SBR is explicitly signaled by the SDP parameters 1117 object, profile-level-id or the config string as in the example 1118 below, the StreamMuxConfig contains both the core codec sampling rate 1119 and the SBR sampling rate. 1121 m=audio 49230 RTP/AVP 96 1122 a=rtpmap:96 MP4A-LATM/48000/2 1123 a=fmtp:96 profile-level-id=44; bitrate=64000; cpresent=0; 1124 config=40005623101fe0; SBR-enabled=1 1126 This config string uses the explicit signaling mode 2.A (hierarchical 1127 signaling; See ISO/IEC 14496-3 [14496-3]). This means that the 1128 AOT(Audio Object Type) is SBR(5) and SFI(Sampling Frequency Index) is 1129 6(24000 Hz) which refers to the underlying core codec sampling 1130 frequency. CC(Channel Configuration) is stereo(2), and the 1131 ESFI(Extension Sampling Frequency Index)=3 (48000) refers to 1132 the sampling frequency of the extension tool(SBR). 1134 5.4.6.
Example: HE AAC v2 Signaling 1136 HE AAC v2 decoders are required to always produce a stereo signal 1137 from a mono signal. Hence, no parameter is necessary to signal 1138 the presence of PS. 1140 Example with "SBR-enabled=1" and 1 channel signaled in the a=rtpmap 1141 line and within the config parameter. Core codec sampling rate is 1142 24kHz, definitive and SBR sampling rate is 48kHz. Core codec channel 1143 configuration is mono, PS channel configuration is stereo. 1145 m=audio 49230 RTP/AVP 110 1146 a=rtpmap:110 MP4A-LATM/24000/1 1147 a=fmtp:110 profile-level-id=15; object=2; cpresent=0; 1148 config=400026103fc0; SBR-enabled=1 1150 5.4.7. Example: Hierarchical Signaling of PS 1152 Example: 48 kHz stereo audio input: 1154 m=audio 49230 RTP/AVP 110 1155 a=rtpmap:110 MP4A-LATM/48000/2 1156 a=fmtp:110 profile-level-id=48; cpresent=0; config=4001d613101fe0 1158 The config parameter indicates explicit hierarchical signaling of PS 1159 and SBR. This configuration method is not supported by legacy AAC and 1160 HE AAC decoders and these are therefore unable to decode the 1161 coded data. 1163 5.4.8. Example: MPEG Surround 1165 The following examples show how MPEG Surround configuration data can 1166 be signaled using SDP. The configuration is carried within the 1167 config string in the first example by using two different layers. 1168 The general parameters in this example are: AudioMuxVersion=1; 1169 allStreamsSameTimeFraming=1; numSubFrames=0; numProgram=0; 1170 numLayer=1. The first layer describes the HE AAC payload and signals 1171 the following parameters: ascLen=25; audioObjectType=2 (AAC LC); 1172 extensionAudioObjectType=5 (SBR); samplingFrequencyIndex=6 (24kHz); 1173 extensionSamplingFrequencyIndex=3 (48kHz); channelConfiguration=2 1174 (2.0 channels).
The second layer describes the MPEG surround payload 1175 and specifies the following parameters: ascLen=110; 1176 AudioObjectType=30 (MPEG Surround); samplingFrequencyIndex=3 (48kHz); 1177 channelConfiguration=6 (5.1 channels); sacPayloadEmbedding=1; 1178 SpatialSpecificConfig=(48 kHz; 32 slots; 525 tree; ResCoding=1; 1179 ResBands=[7,7,7,7]). 1181 In this example the signaling is carried by using two different LATM 1182 layers. The MPEG surround payload is carried together with the AAC 1183 payload in a single layer as indicated by the sacPayloadEmbedding 1184 Flag. 1186 m=audio 49230 RTP/AVP 96 1187 a=rtpmap:96 MP4A-LATM/48000 1188 a=fmtp:96 profile-level-id=1; bitrate=64000; cpresent=0; 1189 SBR-enabled=1; 1190 config=9FF8005192B11880FF2DDE3699F2408C00536C02313CF3CE0FF0 1192 5.4.9. Example: MPEG Surround with extended SDP parameters 1194 The following example is an extension of the configuration given 1195 above by the MPEG Surround specific parameters. The MPS-asc 1196 parameter specifies the MPEG Surround Baseline Profile at Level 3 1197 (PLI55) and the MPS-asc string contains the hexadecimal 1198 representation of the MPEG Surround ASC [audioObjectType=30 (MPEG 1199 Surround); samplingFrequencyIndex=0x3 (48kHz); channelConfiguration=6 1200 (5.1 channels); sacPayloadEmbedding=1; SpatialSpecificConfig=(48 kHz; 1201 32 slots; 525 tree; ResCoding=1; ResBands=[0,13,13,13])]. 1203 m=audio 49230 RTP/AVP 96 1204 a=rtpmap:96 MP4A-LATM/48000 1205 a=fmtp:96 profile-level-id=44; bitrate=64000; cpresent=0; 1206 config=40005623101fe0; MPS-profile-level-id=55; 1207 MPS-asc=F1B4CF920442029B501185B6DA00; 1209 6. IANA Considerations 1211 This memo defines additional optional format parameters to the Media 1212 Type "audio" and its subtype "MP4A-LATM", as defined in RFC 3016 1213 [RFC3016]. The Media Type parameters are defined in sections 5.1 and 1214 5.3. 1216 6.1. 
Media Type Registration 1218 This memo defines the following additional optional parameters which 1219 SHOULD be used if SBR or MPEG Surround data is present inside the 1220 payload of an AAC elementary stream. 1222 MPS-profile-level-id: a decimal representation of the MPEG 1223 Surround Profile Level indication as defined in ISO/IEC 14496-3 1224 [14496-3]. This parameter indicates the MPEG Surround profile and 1225 level that the decoder must support in order to decode the 1226 stream. 1228 MPS-asc: a hexadecimal representation of an octet string that 1229 expresses audio payload configuration data "AudioSpecificConfig", 1230 as defined in ISO/IEC 14496-3 [14496-3]. If this parameter is not 1231 present, the relevant signaling is performed by other means (e.g. 1232 in-band or contained in the config string). 1234 SBR-enabled: a boolean parameter which indicates whether SBR-data 1235 can be expected in the RTP-payload of a stream. This parameter is 1236 relevant for an SBR-capable decoder if the presence of SBR cannot 1237 be detected from an out-of-band decoder configuration (e.g. 1238 contained in the config string). 1240 6.2. Usage of SDP 1242 It is assumed that the Media Type parameters are conveyed via an SDP 1243 message as specified in RFC 3016 [RFC3016], sections 5.2 and 5.4. 1245 7. Security Considerations 1247 RTP packets using the payload format defined in this specification 1248 are subject to the security considerations discussed in the RTP 1249 specification [RFC3550]. This implies that confidentiality of the 1250 media streams is achieved by encryption. Because the data 1251 compression used with this payload format is applied end-to-end, 1252 encryption may be performed on the compressed data so there is no 1253 conflict between the two operations. 1255 The complete MPEG-4 system allows for transport of a wide range of 1256 content, including Java applets (MPEG-J) and scripts.
Since this 1257 payload format is restricted to audio and video streams, it is not 1258 possible to transport such active content in this format. 1260 Most MPEG-4 codecs define an extension mechanism to transmit extra 1261 data within a stream that is gracefully skipped by decoders that do 1262 not support this extra data. This covert channel may be used to 1263 transmit unwanted data in an otherwise valid stream and it is hence 1264 recommended to use SRTP [RFC3711] for stream encryption, 1265 authentication, and integrity check. 1267 8. References 1269 8.1. Normative References 1271 [14496-2] MPEG, "ISO/IEC International Standard 14496-2 - Coding of 1272 audio-visual objects, Part 2: Visual", 1999. 1274 [14496-2/Amd.1] 1275 MPEG, "ISO/IEC International Standard 14496-2 - Coding of 1276 audio-visual objects, Part 2: Visual, Amendment 1: Visual 1277 extensions", 2000. 1279 [14496-2/Cor.1] 1280 MPEG, "ISO/IEC International Standard 14496-2 - Coding of 1281 audio-visual objects, Part 2: Visual, Technical 1282 corrigendum 1", 2000. 1284 [14496-3] MPEG, "ISO/IEC International Standard 14496-3 - Coding of 1285 audio-visual objects, Part 3 Audio", 2009. 1287 [23003-1] MPEG, "ISO/IEC International Standard 23003-1 - MPEG 1288 Surround (MPEG D)", 2007. 1290 [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate 1291 Requirement Levels", BCP 14, RFC 2119, March 1997. 1293 [RFC3016] Kikuchi, Y., Nomura, T., Fukunaga, S., Matsui, Y., and H. 1294 Kimata, "RTP Payload Format for MPEG-4 Audio/Visual 1295 Streams", RFC 3016, November 2000. 1297 [RFC3550] Schulzrinne, H., Casner, S., Frederick, R., and V. 1298 Jacobson, "RTP: A Transport Protocol for Real-Time 1299 Applications", STD 64, RFC 3550, July 2003. 1301 [RFC4566] Handley, M., Jacobson, V., and C. Perkins, "SDP: Session 1302 Description Protocol", RFC 4566, July 2006. 1304 [RFC4629] Ott, H., Bormann, C., Sullivan, G., Wenger, S., and R. 1305 Even, "RTP Payload Format for ITU-T Rec", RFC 4629, 1306 January 2007. 
1308 [RFC5583] Schierl, T. and S. Wenger, "Signaling Media Decoding 1309 Dependency in the Session Description Protocol (SDP)", 1310 RFC 5583, July 2009. 1312 8.2. Informative References 1314 [14496-1] MPEG, "ISO/IEC International Standard 14496-1 - Coding of 1315 audio-visual objects, Part 1 Systems", 2004. 1317 [14496-12] 1318 MPEG, "ISO/IEC International Standard 14496-12 - Coding of 1319 audio-visual objects, Part 12 ISO base media file format". 1321 [14496-14] 1322 MPEG, "ISO/IEC International Standard 14496-14 - Coding of 1323 audio-visual objects, Part 14 MP4 file format". 1325 [3GPP] 3GPP, "3rd Generation Partnership Project; Technical 1326 Specification Group Services and System Aspects; 1327 Transparent end-to-end Packet-switched Streaming Service 1328 (PSS); Protocols and codecs (Release 8)", 3GPP TS 26.234 1329 V8.0.0, September 2008. 1331 [RFC2198] Perkins, C., Kouvelas, I., Hodson, O., Hardman, V., 1332 Handley, M., Bolot, J., Vega-Garcia, A., and S. Fosse- 1333 Parisis, "RTP Payload for Redundant Audio Data", RFC 2198, 1334 September 1997. 1336 [RFC3640] van der Meer, J., Mackie, D., Swaminathan, V., Singer, D., 1337 and P. Gentric, "RTP Payload Format for Transport of 1338 MPEG-4 Elementary Streams", RFC 3640, November 2003. 1340 [RFC3711] Baugher, M., McGrew, D., Naslund, M., Carrara, E., and K. 1342 Norrman, "The Secure Real-time Transport Protocol (SRTP)", 1343 RFC 3711, March 2004. 1345 [RFC4628] Even, R., "RTP Payload Format for H.263 Moving RFC 2190 to 1346 Historic Status", RFC 4628, January 2007. 1348 [RFC5109] Li, A., "RTP Payload Format for Generic Forward Error 1349 Correction", RFC 5109, December 2007. 1351 [RFC5691] de Bont, F., Doehla, S., Schmidt, M., and R. 1352 Sperschneider, "RTP Payload Format for Elementary Streams 1353 with MPEG Surround Multi-Channel Audio", RFC 5691, 1354 October 2009. 1356 Authors' Addresses 1358 Malte Schmidt 1359 Dolby Laboratories 1360 Deutschherrnstr.
15-19 1361 90537 Nuernberg, 1362 DE 1364 Phone: +49 911 928 91 42 1365 Email: malte.schmidt@dolby.com 1367 Frans de Bont 1368 Philips Electronics 1369 High Tech Campus 5 1370 5656 AE Eindhoven, 1371 NL 1373 Phone: +31 40 2740234 1374 Email: frans.de.bont@philips.com 1376 Stefan Doehla 1377 Fraunhofer IIS 1378 Am Wolfmantel 33 1379 91058 Erlangen, 1380 DE 1382 Phone: +49 9131 776 6042 1383 Email: stefan.doehla@iis.fraunhofer.de 1384 Jaehwan Kim 1385 Vidiator (Korea) Inc. 1386 7th Fl. AnnJay BLDG 718-2 YeokSam-Dong, KangNam-Gu 1387 135-920, Seoul, 1388 Korea 1390 Phone: +82 70 7012 2540 1391 Email: jaehwan@vidiator.com, kjh1905m@naver.com