idnits 2.17.1 draft-ietf-avt-rtp-mpeg4-es-05.txt: Checking boilerplate required by RFC 5378 and the IETF Trust (see https://trustee.ietf.org/license-info): ---------------------------------------------------------------------------- ** Looks like you're using RFC 2026 boilerplate. This must be updated to follow RFC 3978/3979, as updated by RFC 4748. Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt: ---------------------------------------------------------------------------- ** Missing expiration date. The document expiration date should appear on the first and last page. ** The document seems to lack a 1id_guidelines paragraph about 6 months document validity -- however, there's a paragraph with a matching beginning. Boilerplate error? ** The document seems to lack a 1id_guidelines paragraph about the list of current Internet-Drafts. ** The document seems to lack a 1id_guidelines paragraph about the list of Shadow Directories. ** The document is more than 15 pages and seems to lack a Table of Contents. == No 'Intended status' indicated for this document; assuming Proposed Standard Checking nits according to https://www.ietf.org/id-info/checklist : ---------------------------------------------------------------------------- ** The document seems to lack an IANA Considerations section. (See Section 2.2 of https://www.ietf.org/id-info/checklist for how to handle the case when there are no actions for IANA.) ** The document seems to lack separate sections for Informative/Normative References. All references will be assumed normative when checking for downward references. ** There are 230 instances of too long lines in the document, the longest one being 4 characters in excess of 72. Miscellaneous warnings: ---------------------------------------------------------------------------- -- The document seems to lack a disclaimer for pre-RFC5378 work, but may have content which was first submitted before 10 November 2008. 
If you have contacted all the original authors and they are all willing to grant the BCP78 rights to the IETF Trust, then this is fine, and you can ignore this comment. If not, you may need to add the pre-RFC5378 disclaimer. (See the Legal Provisions document at https://trustee.ietf.org/license-info for more information.) -- The document date (October 11, 2000) is 8597 days in the past. Is this intentional? Checking references for intended status: Proposed Standard ---------------------------------------------------------------------------- (See RFCs 3967 and 4897 for information about using normative references to lower-maturity documents in RFCs) -- Missing reference section? '1' on line 14 looks like a reference -- Missing reference section? '3' on line 691 looks like a reference -- Missing reference section? '5' on line 691 looks like a reference -- Missing reference section? '2' on line 560 looks like a reference -- Missing reference section? '4' on line 560 looks like a reference -- Missing reference section? '6' on line 44 looks like a reference -- Missing reference section? '7' on line 151 looks like a reference -- Missing reference section? '9' on line 560 looks like a reference -- Missing reference section? '8' on line 786 looks like a reference -- Missing reference section? '10' on line 662 looks like a reference Summary: 9 errors (**), 0 flaws (~~), 1 warning (==), 12 comments (--). Run idnits with the --verbose option for more detailed information about the items above. 
-------------------------------------------------------------------------------- 1 Internet Engineering Task Force Yoshihiro Kikuchi - Toshiba 2 Internet Draft Toshiyuki Nomura - NEC 3 Document: draft-ietf-avt-rtp-mpeg4-es-05.txt Shigeru Fukunaga - Oki 4 Yoshinori Matsui - Matsushita 5 Hideaki Kimata - NTT 6 October 11, 2000 8 RTP payload format for MPEG-4 Audio/Visual streams 10 Status of this Memo 12 This document is an Internet-Draft and is in full conformance with all 13 provisions of Section 10 of RFC2026 [1]. 15 Internet-Drafts are working documents of the Internet Engineering Task 16 Force (IETF), its areas, and its working groups. Note that other groups 17 may also distribute working documents as Internet-Drafts. Internet-Drafts 18 are draft documents valid for a maximum of six months and may be updated, 19 replaced, or obsoleted by other documents at any time. It is 20 inappropriate to use Internet- Drafts as reference material or to cite 21 them other than as "work in progress." 22 The list of current Internet-Drafts can be accessed at 23 http://www.ietf.org/ietf/1id-abstracts.txt 24 The list of Internet-Draft Shadow Directories can be accessed at 25 http://www.ietf.org/shadow.html. 27 Abstract 29 This document describes RTP payload formats for carrying each of MPEG-4 30 Audio and MPEG-4 Visual bitstreams without using MPEG-4 Systems. For the 31 purpose of directly mapping MPEG-4 Audio/Visual bitstreams onto RTP 32 packets, it provides specifications for the use of RTP header fields and 33 also specifies fragmentation rules. It also provides specifications for 34 MIME type registrations and the use of SDP. 36 1. Introduction 38 The RTP payload formats described in this document specify how MPEG-4 39 Audio [3][5] and MPEG-4 Visual streams [2][4] are to be fragmented and 40 mapped directly onto RTP packets. 
42 These RTP payload formats enable transport of MPEG-4 Audio/Visual streams 43 without using the synchronization and stream management functionality of 44 MPEG-4 Systems [6]. Such RTP payload formats will be used in systems that 45 have intrinsic stream management functionality and thus require no such 46 functionality from MPEG-4 Systems. H.323 terminals are an example of such 47 systems, where MPEG-4 Audio/Visual streams are not managed by MPEG-4 48 Systems Object Descriptors but by H.245. The streams are directly mapped 49 onto RTP packets without using the MPEG-4 Systems Sync Layer. Other examples 50 are SIP and RTSP, where MIME and SDP are used. MIME types and SDP usages 51 of the RTP payload formats described in this document are defined to 52 directly specify the attributes of Audio/Visual streams (e.g. media type, 53 packetization format and codec configuration) without using MPEG-4 54 Systems. The obvious benefit is that these MPEG-4 Audio/Visual RTP 55 payload formats can be handled in a unified way together with those 56 formats defined for non-MPEG-4 codecs. The disadvantage is that 57 interoperability with environments using MPEG-4 Systems may be difficult; 58 other payload formats may be better suited to those applications. 60 The semantics of RTP headers in such cases need to be clearly defined, 61 including the association with MPEG-4 Audio/Visual data elements. In 62 addition, it is beneficial to define the fragmentation rules of RTP 63 packets for MPEG-4 Video streams so as to enhance error resiliency by 64 utilizing the error resilience tools provided inside the MPEG-4 Video 65 stream. 67 1.1 MPEG-4 Visual RTP payload format 69 MPEG-4 Visual is a visual coding standard with many new features: high 70 coding efficiency; high error resiliency; multiple, arbitrary shape 71 object-based coding; etc. [2]. It covers a wide range of bitrates from 72 scores of Kbps to several Mbps.
It also covers a wide variety of 73 networks, ranging from those guaranteed to be almost error-free to mobile 74 networks with high error rates. 76 With respect to the fragmentation rules for an MPEG-4 visual bitstream 77 defined in this document, since MPEG-4 Visual is used for a wide variety 78 of networks, it is desirable not to apply too much restriction on 79 fragmentation, and a fragmentation rule such as "a single video packet 80 shall always be mapped on a single RTP packet" may be inappropriate. On 81 the other hand, careless, media-unaware fragmentation may cause 82 degradation in error resiliency and bandwidth efficiency. The 83 fragmentation rules described in this document are flexible but manage to 84 define the minimum rules for preventing meaningless fragmentation while 85 utilizing the error resilience functionalities of MPEG-4 Visual. 87 The fragmentation rule recommends against mapping more than one VOP in an RTP 88 packet so that the RTP timestamp uniquely indicates the VOP time framing. 89 On the other hand, MPEG-4 video may generate VOPs of very small size, in 90 cases with an empty VOP (vop_coded=0) containing only a VOP header or an 91 arbitrarily shaped VOP with a small number of coding blocks. To reduce the 92 overhead for such cases, the fragmentation rule permits concatenating 93 multiple VOPs in an RTP packet. (See fragmentation rule (4) in section 94 3.2 and marker bit and timestamp in section 3.1.) 96 While the additional media-specific RTP header defined for such video 97 coding tools as H.261 or MPEG-1/2 is effective in helping to recover 98 picture headers corrupted by packet losses, MPEG-4 Visual already has 99 error resilience functionalities for recovering corrupt headers, and 100 these can be used on RTP/IP networks as well as on other networks 101 (H.223/mobile, MPEG-2/TS, etc.). Therefore, no extra RTP header fields 102 are defined in this MPEG-4 Visual RTP payload format.
104 1.2 MPEG-4 Audio RTP payload format 106 MPEG-4 Audio is a new kind of audio standard that integrates many 107 different types of audio coding tools. Low-overhead MPEG-4 Audio 108 Transport Multiplex (LATM) manages the sequences of audio data with 109 relatively small overhead. In audio-only applications, then, it is 110 desirable for LATM-based MPEG-4 Audio bitstreams to be directly mapped 111 onto the RTP packets without using MPEG-4 Systems. 113 LATM has several multiplexing features, as follows: 114 - Carrying configuration information with audio data, 115 - Concatenation of multiple audio frames in one audio stream, 116 - Multiplexing multiple objects (programs), 117 - Multiplexing scalable layers. 118 In RTP transmission there is no need for the last two features. 119 Therefore, these two features MUST NOT be used in applications based on 120 RTP packetization specified by this document. Since LATM has been 121 developed for only natural audio coding tools, i.e. not for synthesis 122 tools, it seems difficult to transmit Structured Audio (SA) data and Text 123 to Speech Interface (TTSI) data by LATM. Therefore, SA data and TTSI data 124 MUST NOT be transported by the RTP packetization in this document. 126 For transmission of scalable streams, audio data of each layer should be 127 packetized onto different RTP packets, allowing for the different layers 128 to be treated differently at the IP level, for example via some means of 129 differentiated service. On the other hand, all configuration data of the 130 scalable streams are contained in one LATM configuration data element, 131 "StreamMuxConfig", and every scalable layer shares the StreamMuxConfig. 132 The mapping between each layer and its configuration data is achieved by 133 LATM header information attached to the audio data.
In order to indicate 134 the dependency information of the scalable streams, a restriction is 135 applied to the dynamic assignment rule of payload type (PT) values (see 136 section 4.2). 138 For MPEG-4 Audio coding tools, as is true for other audio coders, if the 139 payload is a single audio frame, packet loss will not impair the 140 decodability of adjacent packets. Therefore, the additional media 141 specific header for recovering errors will not be required for MPEG-4 142 Audio. Existing RTP protection mechanisms, such as Generic Forward Error 143 Correction (RFC 2733) and Redundant Audio Data (RFC 2198), MAY be applied 144 to improve error resiliency. 146 2. Conventions used in this document 148 The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", 149 "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this 150 document are to be interpreted as described in RFC-2119 [7]. 152 3. RTP Packetization of MPEG-4 Visual bitstream 154 This section specifies RTP packetization rules for MPEG-4 Visual content. 155 An MPEG-4 Visual bitstream is mapped directly onto RTP packets without 156 the addition of extra header fields or any removal of Visual syntax 157 elements. The Combined Configuration/Elementary stream mode MUST be used 158 so that configuration information will be carried to the same RTP port as 159 the elementary stream. (see 6.2.1 "Start codes" of ISO/IEC 14496-2 160 [2][9][4]) The configuration information MAY additionally be specified by 161 some out-of-band means. If needed for an H.323 terminal, H.245 codepoint 162 "decoderConfigurationInformation" MUST be used for this purpose. If 163 needed by systems using MIME content type and SDP parameters, e.g. SIP 164 and RTSP, the optional parameter "config" MUST be used to specify the 165 configuration information. 
(see 5.1 and 5.2) 167 When the short video header mode is used, the RTP payload format for 168 H.263 SHOULD be used (the format defined in RFC 2429 is RECOMMENDED, but 169 the RFC 2190 format MAY be used for compatibility with older 170 implementations).

172  0                   1                   2                   3
173  0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
174 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
175 |V=2|P|X|  CC   |M|     PT      |       sequence number         | RTP
176 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
177 |                           timestamp                           | Header
178 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
179 |           synchronization source (SSRC) identifier            |
180 +=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+
181 |            contributing source (CSRC) identifiers             |
182 |                             ....                              |
183 +=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+
184 |                                                               | RTP
185 |       MPEG-4 Visual stream (byte aligned)                     | Payload
186 |                                                               |
187 |                               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
188 |                               :...OPTIONAL RTP padding        |
189 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

191 Figure 1 - An RTP packet for MPEG-4 Visual stream

193 3.1 Use of RTP header fields for MPEG-4 Visual 195 Payload Type (PT): The assignment of an RTP payload type for this new 196 packet format is outside the scope of this document, and will not be 197 specified here. It is expected that the RTP profile for a particular 198 class of applications will assign a payload type for this encoding, or if 199 that is not done then a payload type in the dynamic range shall be chosen 200 by means of an out of band signaling protocol (e.g. H.245, SIP, etc). 202 Extension (X) bit: Defined by the RTP profile used. 204 Sequence Number: Incremented by one for each RTP data packet sent, 205 starting, for security reasons, with a random initial value. 207 Marker (M) bit: The marker bit is set to one to indicate the last RTP 208 packet (or only RTP packet) of a VOP.
When multiple VOPs are carried in 209 the same RTP packet, the marker bit is set to 1. 211 Timestamp: The timestamp indicates the sampling instance of the VOP 212 contained in the RTP packet. A constant offset, which is random, is added 213 for security reasons. 214 - When multiple VOPs are carried in the same RTP packet, the timestamp 215 indicates the earliest of the VOP times within the VOPs carried in the 216 RTP packet. Timestamp information for the rest of the VOPs is derived 217 from the timestamp fields in the VOP header (modulo_time_base and 218 vop_time_increment). 219 - If the RTP packet contains only configuration information and/or 220 Group_of_VideoObjectPlane() fields, the timestamp of the next VOP in 221 the coding order is used. 222 - If the RTP packet contains only visual_object_sequence_end_code 223 information, the timestamp of the immediately preceding VOP in the 224 coding order is used. 226 The resolution of the timestamp is set to its default value of 90 kHz, 227 unless otherwise specified by out-of-band means (e.g. SDP parameter or MIME 228 parameter as defined in section 5). 230 Other header fields are used as described in RFC 1889 [8]. 232 3.2 Fragmentation of MPEG-4 Visual bitstream 234 A fragmented MPEG-4 Visual bitstream is mapped directly onto the RTP 235 payload without any addition of extra header fields or any removal of 236 Visual syntax elements. The Combined Configuration/Elementary streams 237 mode is used. The following rules apply to the fragmentation.
239 In the following, "header" means one of the following: 240 - Configuration information (Visual Object Sequence Header, Visual Object 241 Header and Video Object Layer Header) 242 - visual_object_sequence_end_code 243 - The header of the entry point function for an elementary stream 244 (Group_of_VideoObjectPlane() or the header of VideoObjectPlane(), 245 video_plane_with_short_header(), MeshObject() or FaceObject()) 246 - The video packet header (video_packet_header() excluding 247 next_resync_marker()) 248 - The header of gob_layer() 249 See 6.2.1 "Start codes" of ISO/IEC 14496-2 [2][9][4] for the definition of 250 the configuration information and the entry point functions. 252 (1) Configuration information and Group_of_VideoObjectPlane() fields 253 SHALL be placed at the beginning of the RTP payload (just after the RTP 254 header) or just after the header of the syntactically upper layer 255 function. 257 (2) If one or more headers exist in the RTP payload, the RTP payload 258 SHALL begin with the header of the syntactically highest function. 259 Note: The visual_object_sequence_end_code is regarded as the lowest 260 function. 262 (3) A header SHALL NOT be split across multiple RTP packets. 264 (4) Different VOPs SHOULD be fragmented into different RTP packets so 265 that one RTP packet consists of the data bytes associated with a unique 266 VOP time instance (that is indicated in the timestamp field in the RTP 267 packet header), with the exception that multiple consecutive VOPs MAY be 268 carried within one RTP packet in the decoding order if the size of the 269 VOPs is small. 270 Note: When multiple VOPs are carried in one RTP payload, the timestamp of 271 the VOPs after the first one may be calculated by the decoder. This 272 operation is necessary only for RTP packets in which the marker bit 273 equals one and the beginning of the RTP payload corresponds to a start 274 code.
(See timestamp and marker bit in section 3.1) 276 (5) It is RECOMMENDED that a single video packet is sent as a single RTP 277 packet. The size of a video packet SHOULD be adjusted in such a way that 278 the resulting RTP packet is not larger than the path-MTU. 279 Note: Rule (5) does not apply when the video packet is disabled by the 280 coder configuration (by setting resync_marker_disable in the VOL header 281 to 1), or in coding tools where the video packet is not supported. In 282 this case, a VOP MAY be split at arbitrary byte-positions. 284 The video packet starts with the VOP header or the video packet header, 285 followed by motion_shape_texture(), and ends with next_resync_marker() or 286 next_start_code(). 288 3.3 Examples of packetized MPEG-4 Visual bitstream 290 Figure 2 shows examples of RTP packets generated based on the criteria 291 described in 3.2 293 (a) is an example of the first RTP packet or the random access point of 294 an MPEG-4 visual bitstream containing the configuration information. 295 According to criterion (1), the Visual Object Sequence Header(VS header) 296 is placed at the beginning of the RTP payload, preceding the Visual 297 Object Header and the Video Object Layer Header(VO header, VOL header). 298 Since the fragmentation rule defined in 3.2 guarantees that the 299 configuration information, starting with 300 visual_object_sequence_start_code, is always placed at the beginning of 301 the RTP payload, RTP receivers can detect the random access point by 302 checking if the first 32-bit field of the RTP payload is 303 visual_object_sequence_start_code. 305 (b) is another example of the RTP packet containing the configuration 306 information. It differs from example (a) in that the RTP packet also 307 contains a video packet in the VOP following the configuration 308 information. 
Since the length of the configuration information is 309 relatively short (typically scores of bytes) and an RTP packet containing 310 only the configuration information may thus increase the overhead, the 311 configuration information and the immediately following GOV and/or (a 312 part of) VOP can be packetized into a single RTP packet as in this 313 example. 315 (c) is an example of an RTP packet that contains 316 Group_of_VideoObjectPlane(GOV). Following criterion (1), the GOV is 317 placed at the beginning of the RTP payload. It would be a waste of RTP/IP 318 header overhead to generate an RTP packet containing only a GOV whose 319 length is 7 bytes. Therefore, (a part of) the following VOP can be placed 320 in the same RTP packet as shown in (c). 322 (d) is an example of the case where one video packet is packetized into 323 one RTP packet. When the packet-loss rate of the underlying network is 324 high, this kind of packetization is recommended. Even when the RTP packet 325 containing the VOP header is discarded by a packet loss, the other RTP 326 packets can be decoded by using the HEC(Header Extension Code) 327 information in the video packet header. No extra RTP header field is 328 necessary. 330 (e) is an example of the case where more than one video packet is 331 packetized into one RTP packet. This kind of packetization is effective 332 to save the overhead of RTP/IP headers when the bit-rate of the 333 underlying network is low. However, it will decrease the packet-loss 334 resiliency because multiple video packets are discarded by a single RTP 335 packet loss. The optimal number of video packets in an RTP packet and the 336 length of the RTP packet can be determined considering the packet-loss 337 rate and the bit-rate of the underlying network. 339 (f) is an example of the case when the video packet is disabled by 340 setting resync_marker_disable in the VOL header to 1. 
In this case, a VOP 341 may be split across multiple RTP packets at arbitrary byte-positions. 342 For example, it is possible to split a VOP into fixed-length packets. 343 This kind of coder configuration and RTP packet fragmentation may be used 344 when the underlying network is guaranteed to be error-free. On the other 345 hand, it is not recommended in an error-prone environment since it 346 provides only poor packet loss resiliency. 348 Figure 3 shows examples of RTP packets prohibited by the criteria of 3.2. 350 Fragmentation of a header into multiple RTP packets, as in (a), will not 351 only increase the overhead of RTP/IP headers but also decrease the error 352 resiliency. Therefore, it is prohibited by criterion (3). 354 When concatenating more than one video packet into an RTP packet, the VOP 355 header or video_packet_header() shall not be placed in the middle of the 356 RTP payload. The packetization as in (b) is not allowed by criterion (2) 357 for reasons of error resiliency. Comparing this example with 358 Figure 2(d), although two video packets are mapped onto two RTP packets 359 in both cases, the packet-loss resiliency is not identical. Namely, if 360 the second RTP packet is lost, both video packets 1 and 2 are lost in the 361 case of Figure 3(b) whereas only video packet 2 is lost in the case of 362 Figure 2(d).
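
The random access check described for Figure 2(a) can be sketched in a few lines of Python. This is a minimal illustration and not part of the payload format: the function name is hypothetical, and the constant 0x000001B0 is assumed here as the value of visual_object_sequence_start_code from ISO/IEC 14496-2 (the "config" example in section 5.2 begins with the same four octets).

```python
import struct

# Assumed value of visual_object_sequence_start_code (ISO/IEC 14496-2).
VISUAL_OBJECT_SEQUENCE_START_CODE = 0x000001B0


def is_random_access_point(payload: bytes) -> bool:
    """Return True if the first 32-bit field of the RTP payload is the
    visual_object_sequence_start_code (hypothetical helper)."""
    if len(payload) < 4:
        return False
    # The payload is a byte-aligned bitstream, so read big-endian.
    (first_word,) = struct.unpack(">I", payload[:4])
    return first_word == VISUAL_OBJECT_SEQUENCE_START_CODE
```

A receiver could call is_random_access_point() on each payload after stripping the RTP header; the check only works because the fragmentation rules place the configuration information at the start of the payload.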
364     +------+------+------+------+
365 (a) | RTP  |  VS  |  VO  | VOL  |
366     |header|header|header|header|
367     +------+------+------+------+

369     +------+------+------+------+------------+
370 (b) | RTP  |  VS  |  VO  | VOL  |Video Packet|
371     |header|header|header|header|            |
372     +------+------+------+------+------------+

374     +------+-----+------------------+
375 (c) | RTP  | GOV |Video Object Plane|
376     |header|     |                  |
377     +------+-----+------------------+

379     +------+------+------------+  +------+------+------------+
380 (d) | RTP  | VOP  |Video Packet|  | RTP  |  VP  |Video Packet|
381     |header|header|    (1)     |  |header|header|    (2)     |
382     +------+------+------------+  +------+------+------------+

384     +------+------+------------+------+------------+------+------------+
385 (e) | RTP  |  VP  |Video Packet|  VP  |Video Packet|  VP  |Video Packet|
386     |header|header|    (1)     |header|    (2)     |header|    (3)     |
387     +------+------+------------+------+------------+------+------------+

389     +------+------+------------+  +------+------------+
390 (f) | RTP  | VOP  |VOP fragment|  | RTP  |VOP fragment|
391     |header|header|    (1)     |  |header|    (2)     |
392     +------+------+------------+  +------+------------+

394 Figure 2 - Examples of RTP packetized MPEG-4 Visual bitstream

396     +------+-------------+  +------+------------+------------+
397 (a) | RTP  |First half of|  | RTP  |Last half of|Video Packet|
398     |header|  VP header  |  |header| VP header  |            |
399     +------+-------------+  +------+------------+------------+

401     +------+------+----------+  +------+---------+------+------------+
402 (b) | RTP  | VOP  |First half|  | RTP  |Last half|  VP  |Video Packet|
403     |header|header| of VP(1) |  |header| of VP(1)|header|    (2)     |
404     +------+------+----------+  +------+---------+------+------------+

406 Figure 3 - Examples of prohibited RTP packetization for MPEG-4 Visual 407 bitstream

409 4. RTP Packetization of MPEG-4 Audio bitstream 411 This section specifies RTP packetization rules for MPEG-4 Audio 412 bitstreams.
MPEG-4 Audio streams MUST be formatted by the LATM (Low-overhead 413 MPEG-4 Audio Transport Multiplex) tool [5], and the LATM-based streams are 414 then mapped onto RTP packets as described in the three sections below. 416 4.1 RTP Packet Format 418 LATM-based streams consist of a sequence of audioMuxElements that include 419 one or more audio frames. A complete audioMuxElement or a part of one 420 SHALL be mapped directly onto an RTP payload without any removal of 421 audioMuxElement syntax elements (see Figure 4). The first byte of each 422 audioMuxElement SHALL be located at the first payload location in an RTP 423 packet.

425  0                   1                   2                   3
426  0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
427 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
428 |V=2|P|X|  CC   |M|     PT      |       sequence number         |RTP
429 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
430 |                           timestamp                           |Header
431 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
432 |           synchronization source (SSRC) identifier            |
433 +=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+
434 |            contributing source (CSRC) identifiers             |
435 |                             ....                              |
436 +=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+
437 |                                                               |RTP
438 :                audioMuxElement (byte aligned)                 :Payload
439 |                                                               |
440 |                               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
441 |                               :...OPTIONAL RTP padding        |
442 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
443 Figure 4 - An RTP packet for MPEG-4 Audio

445 In order to decode the audioMuxElement, the following muxConfigPresent 446 information is required to be indicated by an out-of-band means. When SDP 447 is utilized for this indication, MIME parameter "cpresent" corresponds to 448 the muxConfigPresent information (see section 5.3).
450 muxConfigPresent: If this value is set to 1 (in-band mode), the 451 audioMuxElement SHALL include an indication bit "useSameStreamMux" and 452 MAY include the configuration information for audio compression 453 "StreamMuxConfig". The useSameStreamMux bit indicates whether the 454 StreamMuxConfig element in the previous frame is applied in the current 455 frame. If the useSameStreamMux bit indicates that the StreamMuxConfig 456 from the previous frame is to be used, but the previous frame has been lost, the 457 current frame may not be decodable. Therefore, in case of in-band mode, 458 the StreamMuxConfig element SHOULD be transmitted repeatedly depending on 459 the network condition. On the other hand, if muxConfigPresent is set to 0 460 (out-of-band mode), the StreamMuxConfig element is required to be 461 transmitted by an out-of-band means. In case of SDP, MIME parameter 462 "config" is utilized (see section 5.3). 464 4.2 Use of RTP Header Fields for MPEG-4 Audio 466 Payload Type (PT): The assignment of an RTP payload type for this new 467 packet format is outside the scope of this document, and will not be 468 specified here. It is expected that the RTP profile for a particular 469 class of applications will assign a payload type for this encoding, or if 470 that is not done then a payload type in the dynamic range shall be chosen 471 by means of an out of band signaling protocol (e.g. H.245, SIP, etc). In 472 the dynamic assignment of RTP payload types for scalable streams, a 473 different value should be assigned to each layer. The assigned values 474 should be in order of enhancement layer dependency, where the base layer has 475 the smallest value. 477 Marker (M) bit: The marker bit indicates audioMuxElement boundaries. It 478 is set to one to indicate that the RTP packet contains a complete 479 audioMuxElement or the last fragment of an audioMuxElement.
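
The marker bit rule above, together with the fragmentation allowance in section 4.3, can be sketched as follows. This is an illustration under stated assumptions rather than a normative procedure: the function name and the max_payload parameter are hypothetical, and how max_payload is derived from the path-MTU (RTP/UDP/IP header overhead) is left to the caller.

```python
from typing import List, Tuple


def fragment_audio_mux_element(element: bytes,
                               max_payload: int) -> List[Tuple[bytes, bool]]:
    """Split one audioMuxElement into (payload, marker) pairs.

    The marker flag is True only for the payload that carries the
    complete element or its last fragment (hypothetical helper).
    """
    if max_payload <= 0:
        raise ValueError("max_payload must be positive")
    if not element:
        raise ValueError("element must not be empty")
    # Each chunk starts a new RTP payload, so the first byte of the
    # element always sits at the first payload location.
    chunks = [element[i:i + max_payload]
              for i in range(0, len(element), max_payload)]
    last = len(chunks) - 1
    return [(chunk, index == last) for index, chunk in enumerate(chunks)]
```

An element that fits within max_payload yields a single pair with the marker set, which matches the RECOMMENDED case of one audioMuxElement per RTP packet.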
481 Timestamp: The timestamp indicates the sampling instance of the first 482 audio frame contained in the RTP packet. Timestamps are recommended to 483 start at a random value for security reasons. 485 Unless specified by an out-of-band means, the resolution of the timestamp 486 is set to its default value of 90 kHz. 488 Sequence Number: Incremented by one for each RTP packet sent, starting, 489 for security reasons, with a random value. 491 Other header fields are used as described in RFC 1889 [8]. 493 4.3 Fragmentation of MPEG-4 Audio bitstream 495 It is RECOMMENDED to put one audioMuxElement in each RTP packet. If the 496 size of an audioMuxElement can be kept small enough that the size of the 497 RTP packet containing it does not exceed the size of the path-MTU, this 498 will be no problem. If it cannot, the audioMuxElement MAY be fragmented 499 and spread across multiple packets. 501 5. MIME type registration for MPEG-4 Audio/Visual streams 503 The following sections describe the MIME type registrations for MPEG-4 504 Audio/Visual streams. MIME type registration and SDP usage for the MPEG-4 505 Visual stream are described in Sections 5.1 and 5.2, respectively, while 506 MIME type registration and SDP usage for MPEG-4 Audio stream are 507 described in Sections 5.3 and 5.4, respectively. 509 (In the following sections, the RFC number "XXXX" represents the RFC 510 number, which should be assigned for this document.) 512 5.1 MIME type registration for MPEG-4 Visual 514 MIME media type name: video 516 MIME subtype name: MP4V-ES 518 Required parameters: none 520 Optional parameters: 521 rate: This parameter is used only for RTP transport. It indicates the 522 resolution of the timestamp field in the RTP header. If this parameter 523 is not specified, its default value of 90000 (90KHz) is used. 
525 profile-level-id: A decimal representation of MPEG-4 Visual Profile 526 Level indication value (profile_and_level_indication) defined in Table 527 G-1 of ISO/IEC 14496-2 [2][4]. This parameter MAY be used in the 528 capability exchange or session setup procedure to indicate the MPEG-4 529 Visual Profile and Level combination of which the MPEG-4 Visual codec 530 is capable. If this parameter is not specified by the procedure, its 531 default value of 1 (Simple Profile/Level 1) is used. 533 config: This parameter SHALL be used to indicate the configuration of 534 the corresponding MPEG-4 visual bitstream. It SHALL NOT be used to 535 indicate the codec capability in the capability exchange procedure. It 536 is a hexadecimal representation of an octet string that expresses the 537 MPEG-4 Visual configuration information, as defined in subclause 6.2.1 538 Start codes of ISO/IEC 14496-2 [2][4][9]. The configuration information 539 is mapped onto the octet string on an MSB-first basis. The first bit 540 of the configuration information SHALL be located at the MSB of the 541 first octet. The configuration information indicated by this parameter 542 SHALL be the same as the configuration information in the 543 corresponding MPEG-4 Visual stream, except for 544 first_half_vbv_occupancy and latter_half_vbv_occupancy, if present, 545 which may vary in the repeated configuration information inside an 546 MPEG-4 Visual stream (See 6.2.1 Start codes of ISO/IEC 14496-2). 548 Example usages for these parameters are: 549 - MPEG-4 Visual Simple Profile/Level 1: 550 Content-type: video/mp4v-es; profile-level-id=1 552 - MPEG-4 Visual Core Profile/Level 2: 553 Content-type: video/mp4v-es; profile-level-id=34 555 - MPEG-4 Visual Advanced Real Time Simple Profile/Level 1: 556 Content-type: video/mp4v-es; profile-level-id=145 558 Published specification: 559 The specifications for MPEG-4 Visual streams are presented in ISO/IEC 560 14496-2 [2][4][9].
   The RTP payload format is described in RFCXXXX.

Encoding considerations:
   Video bitstreams must be generated according to MPEG-4 Visual
   specifications (ISO/IEC 14496-2). A video bitstream is binary data and
   must be encoded for non-binary transport (for Email, the Base64
   encoding is sufficient). This type is also defined for transfer via
   RTP. The RTP packets MUST be packetized according to the MPEG-4 Visual
   RTP payload format defined in RFCXXXX.

Security considerations:
   See section 6 of RFCXXXX.

Interoperability considerations:
   MPEG-4 Visual provides a large and rich set of tools for the coding of
   visual objects. For effective implementation of the standard, subsets
   of the MPEG-4 Visual tool sets have been provided for use in specific
   applications. These subsets, called 'Profiles', limit the size of the
   tool set a decoder is required to implement. In order to restrict
   computational complexity, one or more Levels are set for each Profile.
   A Profile@Level combination allows:

   o a codec builder to implement only the subset of the standard that is
     needed, while maintaining interworking with other MPEG-4 devices
     included in the same combination, and

   o checking whether MPEG-4 devices comply with the standard
     ('conformance testing').

   The visual stream SHALL be compliant with the MPEG-4 Visual
   Profile@Level specified by the parameter "profile-level-id".
   Interoperability between a sender and a receiver may be achieved by
   specifying the parameter "profile-level-id" in MIME content, or by
   arranging in the capability exchange/announcement procedure to set
   this parameter mutually to the same value.

Applications which use this media type:
   Audio and visual streaming and conferencing tools, Internet messaging
   and Email applications.

Additional information: none

Person & email address to contact for further information:
   The authors of RFCXXXX.
   (See section 8)

Intended usage: COMMON

Author/Change controller:
   The authors of RFCXXXX. (See section 8)

5.2 SDP usage of MPEG-4 Visual

The MIME media type video/MP4V-ES string is mapped to fields in the
Session Description Protocol (SDP), RFC 2327, as follows:

o The MIME type (video) goes in SDP "m=" as the media name.

o The MIME subtype (MP4V-ES) goes in SDP "a=rtpmap" as the encoding
  name.

o The optional parameter "rate" goes in "a=rtpmap" as the clock rate.

o The optional parameters "profile-level-id" and "config" go in the
  "a=fmtp" line to indicate the coder capability and configuration,
  respectively. These parameters are expressed as a MIME media type
  string, in the form of a semicolon-separated list of parameter=value
  pairs.

The following are some examples of media representation in SDP:

Simple Profile/Level 1, rate=90000 (90 kHz), "profile-level-id" and
"config" are present in the "a=fmtp" line:
   m=video 49170/2 RTP/AVP 98
   a=rtpmap:98 MP4V-ES/90000
   a=fmtp:98 profile-level-id=1;config=000001B001000001B509000001000000012
   0008440FA282C2090A21F

Core Profile/Level 2, rate=90000 (90 kHz), "profile-level-id" is present
in the "a=fmtp" line:
   m=video 49170/2 RTP/AVP 98
   a=rtpmap:98 MP4V-ES/90000
   a=fmtp:98 profile-level-id=34

Advanced Real Time Simple Profile/Level 1, rate=25 (25 Hz),
"profile-level-id" is present in the "a=fmtp" line:
   m=video 49170/2 RTP/AVP 98
   a=rtpmap:98 MP4V-ES/25
   a=fmtp:98 profile-level-id=145

5.3 MIME type registration of MPEG-4 Audio

MIME media type name: audio

MIME subtype name: MP4A-LATM

Required parameters:
   rate: the rate parameter indicates the RTP timestamp clock rate. The
   default value is 90000. Other rates MAY be specified only if they are
   set to the same value as the audio sampling rate (number of samples
   per second).

Optional parameters:
   profile-level-id: a decimal representation of the MPEG-4 Audio Profile
   and Level indication value defined in ISO/IEC 14496-1 [10]. This
   parameter indicates which MPEG-4 Audio tool subsets the decoder is
   capable of using. If this parameter is not specified in the capability
   exchange or session setup procedure, its default value of 30 (Natural
   Audio Profile/Level 1) is used.

   object: a decimal representation of the MPEG-4 Audio Object Type value
   defined in ISO/IEC 14496-3 [5]. This parameter specifies the tool to
   be used by the coder. It can be used to limit the capability within
   the specified "profile-level-id".

   bitrate: the data rate for the audio bit stream.

   cpresent: this parameter indicates whether audio payload configuration
   data has been multiplexed into an RTP payload (see section 4.1). If
   not specified, the default value is 1.

   config: a hexadecimal representation of an octet string that expresses
   the audio payload configuration data "StreamMuxConfig", as defined in
   ISO/IEC 14496-3 [5] (see section 4.1). Configuration data is mapped
   onto the octet string on an MSB-first basis. The first bit of the
   configuration data SHALL be located at the MSB of the first octet. In
   the last octet, zero-padding bits, if necessary, shall follow the
   configuration data.

   ptime: RECOMMENDED duration of each packet in milliseconds.

Published specification:
   Payload format specifications are described in this document. Encoding
   specifications are provided in ISO/IEC 14496-3 [3][5].

Encoding considerations:
   This type is only defined for transfer via RTP.

Security considerations:
   See Section 6 of RFCXXXX.

Interoperability considerations:
   MPEG-4 Audio provides a large and rich set of tools for the coding of
   audio objects.
   For effective implementation of the standard, subsets of
   the MPEG-4 Audio tool sets similar to those used in MPEG-4 Visual have
   been provided (see section 5.1).

   The audio stream SHALL be compliant with the MPEG-4 Audio
   Profile@Level specified by the parameter "profile-level-id".
   Interoperability between a sender and a receiver may be achieved by
   specifying the parameter "profile-level-id" in MIME content, or by
   arranging in the capability exchange procedure to set this parameter
   mutually to the same value. Furthermore, the "object" parameter can be
   used to limit the capability within the specified Profile@Level in
   capability exchange.

Applications which use this media type:
   Audio and video streaming and conferencing tools.

Additional information: none

Person & email address to contact for further information:
   See Section 8 of RFCXXXX.

Intended usage: COMMON

Author/Change controller:
   See Section 8 of RFCXXXX.

5.4 SDP usage of MPEG-4 Audio

The MIME media type audio/MP4A-LATM string is mapped to fields in the
Session Description Protocol (SDP), RFC 2327, as follows:

o The MIME type (audio) goes in SDP "m=" as the media name.

o The MIME subtype (MP4A-LATM) goes in SDP "a=rtpmap" as the encoding
  name.

o The required parameter "rate" goes in "a=rtpmap" as the clock rate.

o The optional parameter "ptime" goes in the SDP "a=ptime" attribute.

o The optional parameter "profile-level-id" goes in the "a=fmtp" line to
  indicate the coder capability. The "object" parameter goes in the
  "a=fmtp" attribute. The payload-format-specific parameters "bitrate",
  "cpresent" and "config" go in the "a=fmtp" line. These parameters are
  expressed as a MIME media type string, in the form of a
  semicolon-separated list of parameter=value pairs.
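
As an informal illustration of the mapping listed above (not part of
this specification), the following Python sketch builds SDP lines from
audio/MP4A-LATM parameters and splits an "a=fmtp" line back into
parameter=value pairs. The function names mp4a_latm_sdp and parse_fmtp,
the fixed ordering of the fmtp parameters, and the use of "_" in place
of "-" in keyword names are assumptions made for this example only.

```python
def mp4a_latm_sdp(port, payload_type, rate=90000, ptime=None, **fmtp):
    """Map audio/MP4A-LATM MIME parameters onto SDP lines (hypothetical helper)."""
    lines = [
        # MIME type "audio" becomes the media name in "m=".
        f"m=audio {port} RTP/AVP {payload_type}",
        # MIME subtype is the encoding name; "rate" is the clock rate.
        f"a=rtpmap:{payload_type} MP4A-LATM/{rate}",
    ]
    # Parameters carried in "a=fmtp". The ordering here is an assumption;
    # keyword names use '_' where the parameter name uses '-'.
    order = ["profile_level_id", "object", "bitrate", "cpresent", "config"]
    pairs = [f"{name.replace('_', '-')}={fmtp[name]}"
             for name in order if name in fmtp]
    if pairs:
        lines.append(f"a=fmtp:{payload_type} " + ";".join(pairs))
    # "ptime" has its own SDP attribute rather than an fmtp entry.
    if ptime is not None:
        lines.append(f"a=ptime:{ptime}")
    return lines


def parse_fmtp(line):
    """Split an "a=fmtp" line into (payload_type, {parameter: value})."""
    head, _, rest = line.partition(" ")
    payload_type = int(head.split(":", 1)[1])
    params = {}
    for item in rest.split(";"):
        item = item.strip()          # tolerate "a=1; b=2" as well as "a=1;b=2"
        if item:
            name, _, value = item.partition("=")
            params[name] = value
    return payload_type, params


if __name__ == "__main__":
    for sdp_line in mp4a_latm_sdp(49230, 96, rate=8000, ptime=20,
                                  profile_level_id=9, object=8,
                                  cpresent=0, config="9128B1071070"):
        print(sdp_line)
    pt, params = parse_fmtp("a=fmtp:96 object=8; cpresent=0; config=9128B1071070")
    # The "config" value is a hexadecimal octet string, MSB first.
    print(pt, bytes.fromhex(params["config"]))
```

The sketch keeps all values as strings when parsing, since interpreting
"config" further requires the StreamMuxConfig syntax of ISO/IEC 14496-3,
which is outside the scope of this example.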

The following are some examples of the media representation in SDP:

For 6 kb/s CELP bitstreams (with an audio sampling rate of 8 kHz):
   m=audio 49230 RTP/AVP 96
   a=rtpmap:96 MP4A-LATM/8000
   a=fmtp:96 profile-level-id=9;object=8;cpresent=0;config=9128B1071070
   a=ptime:20

For 64 kb/s AAC LC stereo bitstreams (with an audio sampling rate of 24
kHz):
   m=audio 49230 RTP/AVP 96
   a=rtpmap:96 MP4A-LATM/24000
   a=fmtp:96 profile-level-id=1; bitrate=64000; cpresent=0;
   config=9122620000

In the above two examples, audio configuration data is not multiplexed
into the RTP payload and is described only in SDP. Furthermore, the
"clock rate" is set to the audio sampling rate.

If the clock rate has been set to its default value and it is necessary
to obtain the audio sampling rate, this can be done by parsing the
"config" parameter (see the following example).

   m=audio 49230 RTP/AVP 96
   a=rtpmap:96 MP4A-LATM/90000
   a=fmtp:96 object=8; cpresent=0; config=9128B1071070

The following example shows that the audio configuration data appears in
the RTP payload.

   m=audio 49230 RTP/AVP 96
   a=rtpmap:96 MP4A-LATM/90000
   a=fmtp:96 object=2; cpresent=1

6. Security Considerations

RTP packets using the payload format defined in this specification are
subject to the security considerations discussed in the RTP specification
[8]. This implies that confidentiality of the media streams is achieved
by encryption. Because the data compression used with this payload format
is applied end-to-end, encryption may be performed on the compressed data
so there is no conflict between the two operations.

The complete MPEG-4 system allows for transport of a wide range of
content, including Java applets (MPEG-J) and scripts. Since this payload
format is restricted to audio and video streams, it is not possible to
transport such active content in this format.

7.
References

1  Bradner, S., "The Internet Standards Process -- Revision 3", BCP 9,
   RFC 2026, October 1996.

2  ISO/IEC 14496-2:1999, "Information technology - Coding of audio-visual
   objects - Part2: Visual", December 1999.

3  ISO/IEC 14496-3:1999, "Information technology - Coding of audio-visual
   objects - Part3: Audio", December 1999.

4  ISO/IEC 14496-2:1999/FDAM1:2000, December 1999.

5  ISO/IEC 14496-3:1999/FDAM1:2000, December 1999.

6  ISO/IEC 14496-1:1999, "Information technology - Coding of audio-visual
   objects - Part1: Systems", December 1999.

7  Bradner, S., "Key words for use in RFCs to Indicate Requirement
   Levels", BCP 14, RFC 2119, March 1997.

8  H. Schulzrinne, S. Casner, R. Frederick, V. Jacobson, "RTP: A
   Transport Protocol for Real-Time Applications", RFC 1889, Internet
   Engineering Task Force, January 1996.

9  ISO/IEC 14496-2:1999/COR1:2000, "Information technology - Coding of
   audio-visual objects - Part2: Visual, Technical corrigendum 1", August
   2000.

10 ISO/IEC 14496-1:1999/FDAM1:2000, December 1999.

8. Authors' Addresses

Yoshihiro Kikuchi
Toshiba Corporation
1, Komukai Toshiba-cho, Saiwai-ku, Kawasaki, 212-8582, Japan
Email: yoshihiro.kikuchi@toshiba.co.jp

Yoshinori Matsui
Matsushita Electric Industrial Co., LTD.
1006, Kadoma, Kadoma-shi, Osaka, Japan
Email: matsui@drl.mei.co.jp

Toshiyuki Nomura
NEC Corporation
4-1-1, Miyazaki, Miyamae-ku, Kawasaki, Japan
Email: t-nomura@ccm.cl.nec.co.jp

Shigeru Fukunaga
Oki Electric Industry Co., Ltd.
1-2-27 Shiromi, Chuo-ku, Osaka 540-6025, Japan
Email: fukunaga444@oki.co.jp

Hideaki Kimata
Nippon Telegraph and Telephone Corporation
1-1, Hikari-no-oka, Yokosuka-shi, Kanagawa, Japan
Email: kimata@nttvdt.hil.ntt.co.jp

Full Copyright Statement

"Copyright (C) The Internet Society (date). All Rights Reserved.

This document and translations of it may be copied and furnished to
others, and derivative works that comment on or otherwise explain it
or assist in its implementation may be prepared, copied, published
and distributed, in whole or in part, without restriction of any
kind, provided that the above copyright notice and this paragraph
are included on all such copies and derivative works. However, this
document itself may not be modified in any way, such as by removing
the copyright notice or references to the Internet Society or other
Internet organizations, except as needed for the purpose of
developing Internet standards in which case the procedures for
copyrights defined in the Internet Standards process must be
followed, or as required to translate it into languages other than
English.

The limited permissions granted above are perpetual and will not be
revoked by the Internet Society or its successors or assigns.