idnits 2.17.1 draft-ietf-avt-rtp-mpeg4-es-02.txt: Checking boilerplate required by RFC 5378 and the IETF Trust (see https://trustee.ietf.org/license-info): ---------------------------------------------------------------------------- ** Looks like you're using RFC 2026 boilerplate. This must be updated to follow RFC 3978/3979, as updated by RFC 4748. Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt: ---------------------------------------------------------------------------- ** Missing expiration date. The document expiration date should appear on the first and last page. ** The document seems to lack a 1id_guidelines paragraph about 6 months document validity -- however, there's a paragraph with a matching beginning. Boilerplate error? ** The document seems to lack a 1id_guidelines paragraph about the list of current Internet-Drafts. ** The document seems to lack a 1id_guidelines paragraph about the list of Shadow Directories. ** The document is more than 15 pages and seems to lack a Table of Contents. == No 'Intended status' indicated for this document; assuming Proposed Standard Checking nits according to https://www.ietf.org/id-info/checklist : ---------------------------------------------------------------------------- ** The document seems to lack an IANA Considerations section. (See Section 2.2 of https://www.ietf.org/id-info/checklist for how to handle the case when there are no actions for IANA.) ** The document seems to lack separate sections for Informative/Normative References. All references will be assumed normative when checking for downward references. ** There are 215 instances of too long lines in the document, the longest one being 4 characters in excess of 72. ** The abstract seems to contain references ([2], [3]), which it shouldn't. Please replace those with straight textual mentions of the documents in question. 
Miscellaneous warnings: ---------------------------------------------------------------------------- == Line 48 has weird spacing: '...riptors but b...' == Line 745 has weird spacing: '...cessing to ca...' -- The document seems to lack a disclaimer for pre-RFC5378 work, but may have content which was first submitted before 10 November 2008. If you have contacted all the original authors and they are all willing to grant the BCP78 rights to the IETF Trust, then this is fine, and you can ignore this comment. If not, you may need to add the pre-RFC5378 disclaimer. (See the Legal Provisions document at https://trustee.ietf.org/license-info for more information.) -- The document date (July 6, 2000) is 8694 days in the past. Is this intentional? Checking references for intended status: Proposed Standard ---------------------------------------------------------------------------- (See RFCs 3967 and 4897 for information about using normative references to lower-maturity documents in RFCs) -- Missing reference section? '1' on line 14 looks like a reference -- Missing reference section? '2' on line 513 looks like a reference -- Missing reference section? '3' on line 643 looks like a reference -- Missing reference section? '6' on line 44 looks like a reference -- Missing reference section? '7' on line 120 looks like a reference -- Missing reference section? '9' on line 513 looks like a reference -- Missing reference section? '4' on line 513 looks like a reference -- Missing reference section? '8' on line 739 looks like a reference -- Missing reference section? '5' on line 643 looks like a reference -- Missing reference section? '11' on line 614 looks like a reference Summary: 10 errors (**), 0 flaws (~~), 3 warnings (==), 12 comments (--). Run idnits with the --verbose option for more detailed information about the items above. 
-------------------------------------------------------------------------------- 1 Internet Engineering Task Force Yoshihiro Kikuchi - Toshiba 2 Internet Draft Toshiyuki Nomura - NEC 3 Document: draft-ietf-avt-rtp-mpeg4-es-02.txt Shigeru Fukunaga - Oki 4 Yoshinori Matsui - Matsushita 5 Hideaki Kimata - NTT 6 July 6, 2000 8 RTP payload format for MPEG-4 Audio/Visual streams 10 Status of this Memo 12 This document is an Internet-Draft and is in full conformance with all 13 provisions of Section 10 of RFC2026 [1]. 15 Internet-Drafts are working documents of the Internet Engineering Task 16 Force (IETF), its areas, and its working groups. Note that other groups 17 may also distribute working documents as Internet-Drafts. Internet-Drafts 18 are draft documents valid for a maximum of six months and may be updated, 19 replaced, or obsoleted by other documents at any time. It is 20 inappropriate to use Internet-Drafts as reference material or to cite 21 them other than as "work in progress." 22 The list of current Internet-Drafts can be accessed at 23 http://www.ietf.org/ietf/1id-abstracts.txt 24 The list of Internet-Draft Shadow Directories can be accessed at 25 http://www.ietf.org/shadow.html. 27 Abstract 29 This document describes RTP payload formats for carrying MPEG-4 Audio 30 and Visual bitstreams [2][3]. For the purpose of directly mapping MPEG-4 31 Audio/Visual bitstreams onto RTP packets, it provides specifications for 32 the use of RTP header fields and also specifies fragmentation rules. It 33 also provides specifications for MIME type registrations and the use of 34 SDP. 36 1. Introduction 38 The RTP payload formats described in this Internet-Draft specify how 39 MPEG-4 Audio and Visual streams are to be fragmented and mapped 40 directly onto RTP packets. 42 These RTP payload formats enable MPEG-4 Audio/Visual streams to be carried 43 without using the synchronization and stream management functionality of 44 MPEG-4 Systems [6].
Such RTP payload formats would be used within systems 45 where their own stream management functionality is provided and thus such 46 functionality in MPEG-4 Systems is not necessary. H.323 terminals are an 47 example of such systems. MPEG-4 Audio/Visual streams are not managed by 48 MPEG-4 Systems Object Descriptors but by H.245. The streams are directly 49 mapped onto RTP packets without using the synchronization functionality 50 of MPEG-4 Systems. Other examples are SIP and RTSP, where attributes of the 51 video stream (e.g. media type, packetization format and configuration) are 52 specified in MIME and SDP parameters. 54 The semantics of RTP headers in such cases need to be clearly defined, 55 including the association with MPEG-4 Audio/Visual data elements. In 56 addition, it would be beneficial to define the fragmentation rules of RTP 57 packets for MPEG-4 Video streams so as to enhance error resiliency by 58 utilizing the error resilience tools provided inside the MPEG-4 Video 59 stream. These issues, however, have yet to be addressed by other RTP 60 payload format specifications. 62 1.1 MPEG-4 Visual RTP payload format 64 MPEG-4 Visual is a visual coding standard with many new features: high 65 coding efficiency; high error resiliency; multiple, arbitrary shape 66 object-based coding; etc. [2]. It covers a wide range of bitrates, from 67 scores of Kbps to several Mbps. It also covers a wide variety of 68 networks, ranging from those guaranteed to be almost error-free to mobile 69 networks with high error rates. 71 With respect to the fragmentation rules for an MPEG-4 Visual bitstream 72 defined in this document, since MPEG-4 Visual is used for a wide variety 73 of networks, it is desirable not to apply too much restriction on 74 fragmentation, and a fragmentation rule such as "a single video packet 75 shall always be mapped on a single RTP packet" may be inappropriate.
On 76 the other hand, careless, media-unaware fragmentation may cause 77 degradation in error resiliency and bandwidth efficiency. The 78 fragmentation rules described in this document are flexible but manage to 79 define the minimum rules for preventing meaningless fragmentation and for 80 utilizing the error resilience of MPEG-4 Visual. 82 While the additional media-specific RTP header defined for such video 83 coding tools as H.261 or MPEG-1/2 is effective in helping to recover 84 picture headers corrupted by packet losses, in MPEG-4 Visual there are 85 already error resilience functionalities for recovering corrupt headers, 86 and these can be used on RTP/IP networks, as well as on other networks 88 (H.223/mobile, MPEG-2/TS, etc.). That is why no extra RTP header fields 89 are defined in the MPEG-4 Visual RTP payload format proposed here. 91 1.2 MPEG-4 Audio RTP payload format 93 MPEG-4 Audio is a new kind of audio standard that integrates many 94 different types of audio coding tools. It also supports a mechanism for 95 representing synthesized sounds. Low-overhead MPEG-4 Audio Transport 96 Multiplex (LATM) manages the sequences of audio data with relatively 97 small overhead. In audio-only applications, then, it is desirable for 98 LATM-based MPEG-4 Audio bitstreams to be directly mapped onto the RTP 99 packets without using MPEG-4 Systems. 101 For MPEG-4 Audio coding tools except synthesis tools, as is true for 102 other audio coders, if the payload of a packet is a single audio frame, 103 packet loss will not impair the decodability of adjacent packets. On the 104 other hand, MPEG-4 Audio synthesis tools may be sensitive to errors. For 105 example, an SA_access_unit in the payload may set a global value to a new 106 value, which is then referenced throughout the audio content to make a 107 macro change in the performance. In this case, an error in the payload 108 influences all audio data produced after the error.
In order to enhance 109 error resiliency, the element of SA_access_unit that makes the above 110 macro change should be transmitted repeatedly across several 111 SA_access_units. The number of repetitions will depend on the network 112 conditions. Therefore, an additional media-specific header for recovering 113 errors will not be required for MPEG-4 Audio. 115 2. Conventions used in this document 117 The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", 118 "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this 119 document are to be interpreted as described in RFC-2119 [7]. 121 3. RTP Packetization of MPEG-4 Visual bitstream 123 This section specifies RTP packetization rules for MPEG-4 Visual content. 124 An MPEG-4 Visual bitstream is mapped directly onto the RTP payload 125 without any addition of extra header fields or any removal of Visual 126 syntax elements. The Combined Configuration/Elementary stream mode is 127 used so that configuration information will be carried on the same RTP 128 port as the elementary stream (see 6.2.1 "Start codes" of ISO/IEC 129 14496-2 [2][9][4]). The configuration information MAY additionally be specified 130 by some out-of-band means; in H.323 terminals, H.245 codepoint 131 "decoderConfigurationInformation" MAY be used for this purpose; in 132 systems using MIME content type and SDP parameters, e.g. SIP and RTSP, 133 the optional parameter "config" MAY be used to specify the configuration 134 information (see 5.1 and 5.2). 136 When the short video header mode is used, the RTP payload format used MAY 137 be that specified for H.263 in the relevant RFCs or in other relevant 138 standards.
140   0                   1                   2                   3
141   0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
142  +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
143  |V=2|P|X|  CC   |M|     PT      |       sequence number         | RTP
144  +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
145  |                           timestamp                           | Header
146  +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
147  |           synchronization source (SSRC) identifier            |
148  +=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+
149  |            contributing source (CSRC) identifiers             |
150  |                             ....                              |
151  +=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+
152  |                                                               | RTP
153  |       MPEG-4 Visual stream (byte aligned)                     | Payload
154  |                                                               |
155  |                               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
156  |                               :...OPTIONAL RTP padding        |
157  +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
159 Figure 1 - An RTP packet for MPEG-4 Visual stream
161 3.1 Use of RTP header fields for MPEG-4 Visual 163 Payload Type (PT): Payload type is to be specifically assigned as the 164 MPEG-4 Visual RTP payload format. If this assignment is to be carried out 165 dynamically, it can be performed by such out-of-band means as H.245, SDP, 166 etc. 168 Extension (X) bit: Defined by the RTP profile used. 170 Sequence Number: Incremented by one for each RTP data packet sent, 171 starting, for security reasons, with a random initial value. 173 Marker (M) bit: The marker bit is set to one to indicate the last RTP 174 packet (or only RTP packet) of a VOP. 176 Timestamp: The timestamp indicates the composition time, or the 177 presentation time in a no-compositor decoder. A constant offset, which is 178 random, is added for security reasons. For a video object plane, it is 179 defined as vop_time_increment (in units of 180 1/vop_time_increment_resolution seconds) plus the cumulative number of 181 whole seconds specified by module_time_base and, if present, time_code of 182 Group_of_VideoObjectPlane() fields.
In the case of interlaced video, a 183 VOP will consist of lines from two fields, and the timestamp will 184 indicate the composition time of the first field. If the RTP packet 185 contains only configuration information and/or 186 Group_of_VideoObjectPlane() fields, the composition time of the next VOP 187 in the coding order is used. If the RTP packet contains only 188 visual_object_sequence_end_code information, the composition time of the 189 immediately preceding VOP in the coding order is used. 191 The resolution of the timestamp is set to its default value of 90KHz, 192 unless specified by an out-of-band means (e.g. SDP parameter or MIME 193 parameter as defined in section 5). 195 SSRC, CC and CSRC fields are used as described in RFC 1889 [8]. 197 3.2 Fragmentation of MPEG-4 Visual bitstream 199 A fragmented MPEG-4 Visual bitstream is mapped directly onto the RTP 200 payload without any addition of extra header fields or any removal of 201 Visual syntax elements. The Combined Configuration/Elementary streams 202 mode is used. The following rules apply for the fragmentation. 204 (1) Configuration information and Group_of_VideoObjectPlane() fields 205 SHALL be placed at the beginning of the RTP payload (just after the RTP 206 header) or just after the header of the syntactically upper layer 207 function. 209 (2) If one or more headers exist in the RTP payload, the RTP payload 210 SHALL begin with the header of the syntactically highest function. 211 Note: The visual_object_sequence_end_code is regarded as the lowest 212 function. 214 (3) A header SHALL NOT be split into a plurality of RTP packets. 216 (4) Two or more VOPs SHALL be fragmented into different RTP packets so 217 that one RTP packet consists of the data bytes associated with a unique 218 presentation time (that is indicated in the timestamp field in the RTP 219 packet header). 221 (5) A single video packet SHOULD NOT be split into a plurality of RTP 222 packets. 
The size of a video packet SHOULD be adjusted in such a way that 223 the resulting RTP packet is not larger than the path-MTU. A video packet 224 MAY be split into a plurality of RTP packets when the size of the video 225 packet is large. 226 (Rule (5) does not apply to the enhancement layer of the scalable streams 227 where the video packet is not supported.) 229 Here, header means: 230 - Configuration information (Visual Object Sequence Header, Visual Object 231 Header and Video Object Layer Header) 232 - visual_object_sequence_end_code 233 - The header of the entry point function for an elementary stream 234 (Group_of_VideoObjectPlane() or the header of VideoObjectPlane(), 235 video_plane_with_short_header(), MeshObject() or FaceObject()) 236 - The video packet header (video_packet_header() excluding 237 next_resync_marker()) 238 - The header of gob_layer() 239 See 6.2.1 "Start codes" of ISO/IEC 14496-2 [2][9][4] for the definition of 240 the configuration information and the entry point functions. 242 The video packet starts with the VOP header or the video packet header, 243 followed by motion_shape_texture(), and ends with next_resync_marker() or 244 next_start_code(). 246 3.3 Examples of packetized MPEG-4 Visual bitstream 248 Considering the fact that MPEG-4 Visual covers a wide variety of networks 249 ranging from scores of Kbps to several Mbps, and from those guaranteed to 250 be almost error-free to mobile networks with high error rates, it is 251 desirable not to apply too much restriction on fragmentation. On the 252 other hand, careless, media-unaware fragmentation will cause degradation 253 in error resiliency and bandwidth efficiency. The fragmentation criteria 254 described in 3.2 are flexible but define the minimum rules needed to prevent 255 meaningless fragmentation.
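As a non-normative illustration, the sender side of criteria (4) and (5) in 3.2 can be sketched in Python as follows. The sketch assumes that the encoder already delivers each VOP as a list of complete video packets (byte strings), that every header is shorter than the payload limit so that criterion (3) is not violated by the fallback split, and that the marker bit is set as described in 3.1. The names RtpPayload, packetize_vop and max_payload are hypothetical and are not defined by this document.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class RtpPayload:
    data: bytes    # payload bytes of one RTP packet
    marker: bool   # True on the last (or only) RTP packet of the VOP


def packetize_vop(video_packets: List[bytes], max_payload: int) -> List[RtpPayload]:
    """Group the video packets of ONE VOP into RTP payloads.

    Only data of a single VOP is passed in, so no payload ever mixes
    two VOPs (criterion 4). Whole video packets are kept together
    whenever they fit (criterion 5).
    """
    chunks: List[bytes] = []
    current = b""
    for vp in video_packets:
        # Flush the pending payload if this video packet would not fit.
        if current and len(current) + len(vp) > max_payload:
            chunks.append(current)
            current = b""
        if len(vp) > max_payload:
            # Fallback allowed by criterion (5): an oversized video
            # packet MAY be split. Assumption: its header is shorter
            # than max_payload, so the header itself is never split.
            for i in range(0, len(vp), max_payload):
                chunks.append(vp[i:i + max_payload])
        else:
            current += vp
    if current:
        chunks.append(current)
    last = len(chunks) - 1
    return [RtpPayload(c, i == last) for i, c in enumerate(chunks)]
```

With a payload limit derived from the path-MTU, small video packets are concatenated as in Figure 2(e), while a limit close to the video packet size yields the one-to-one mapping of Figure 2(d).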
257 Figure 2 shows examples of RTP packets generated based on the criteria 258 described in 3.2 260 (a) is an example of the first RTP packet or the random access point of 261 an MPEG-4 visual bitstream containing the configuration information. 262 According to criterion (1), the Visual Object Sequence Header(VS header) 263 is placed at the beginning of the RTP payload, preceding the Visual 264 Object Header and the Video Object Layer Header(VO header, VOL header). 265 Since the fragmentation rule defined in 3.2 guarantees that the 266 configuration information, starting with 267 visual_object_sequence_start_code, is always placed at the beginning of 268 the RTP payload, RTP receivers can detect the random access point by 269 checking if the first 32-bit field of the RTP payload is 270 visual_object_sequence_start_code. 272 (b) is another example of the RTP packet containing the configuration 273 information. It differs from example (a) in that the RTP packet also 274 contains a video packet in the VOP following the configuration 275 information. Since the length of the configuration information is 276 relatively short (typically scores of bytes) and an RTP packet containing 277 only the configuration information may thus increase the overhead, the 278 configuration information and the immediately following GOV and/or (a 279 part of) VOP can be effectively packetized into a single RTP packet as in 280 this example. 282 (c) is an example of the RTP packet that contains 283 Group_of_VideoObjectPlane(GOV). Following criterion (1), the GOV is 284 placed at the beginning of the RTP payload. It would be a waste of RTP/IP 285 header overhead to generate an RTP packet containing only a GOV whose 286 length is 7 bytes. Therefore, (a part of) the following VOP can be placed 287 in the same RTP packet as shown in (c). 289 (d) is an example of the case where one video packet is packetized into 290 one RTP packet. 
When the packet-loss rate of the underlying network is 291 high, this kind of packetization is recommended. It is recommended to set 292 resync_marker_disable to 0 in the VOL header to enable the adjustment of 293 the video packet size. Even when the RTP packet containing the VOP header 294 is discarded by a packet loss, the other RTP packets can be decoded by 295 using the HEC (Header Extension Code) information in the video packet 296 header. No extra RTP header field is necessary. 298 (e) is an example of the case where more than one video packet is 299 packetized into one RTP packet. This kind of packetization is effective 300 in saving the overhead of RTP/IP headers when the bit-rate of the 301 underlying network is low. However, it will decrease the packet-loss 302 resiliency because multiple video packets are discarded by a single RTP 303 packet loss. The optimal number of video packets in an RTP packet and the 304 length of the RTP packet can be determined considering the packet-loss 305 rate and the bit-rate of the underlying network. 307 Figure 3 shows examples of RTP packets prohibited by the criteria of 3.2. 309 Fragmentation of a header into multiple RTP packets, as in (a), will not 310 only increase the overhead of RTP/IP headers but also decrease the error 311 resiliency. Therefore, it is prohibited by criterion (3). 313 When concatenating more than one video packet into an RTP packet, the VOP 314 header or video_packet_header() shall not be placed in the middle of the 315 RTP payload. The packetization as in (b) is not allowed by criterion (2) 316 for reasons of error resiliency. Comparing this example with 317 Figure 2(d), although two video packets are mapped onto two RTP packets 318 in both cases, the packet-loss resiliency is not identical. Namely, if 319 the second RTP packet is lost, both video packets 1 and 2 are lost in the 320 case of Figure 3(b) whereas only video packet 2 is lost in the case of 321 Figure 2(d).
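As a non-normative illustration, a receiver could apply two simple byte-level checks to an incoming RTP payload: whether it is a random access point (the payload begins with visual_object_sequence_start_code) and whether it carries more than one VOP header, which criterion (4) of 3.2 does not allow. The start code values below are taken from ISO/IEC 14496-2 and are not stated in this document; the function names are hypothetical.

```python
# Start code values as defined in ISO/IEC 14496-2 (assumption: not
# listed in this document).
VISUAL_OBJECT_SEQUENCE_START_CODE = b"\x00\x00\x01\xb0"
VOP_START_CODE = b"\x00\x00\x01\xb6"


def is_random_access_point(payload: bytes) -> bool:
    """True if the first 32 bits are visual_object_sequence_start_code."""
    return payload[:4] == VISUAL_OBJECT_SEQUENCE_START_CODE


def count_vop_headers(payload: bytes) -> int:
    """Number of VOP start codes found in the byte-aligned payload."""
    return payload.count(VOP_START_CODE)


def carries_multiple_vops(payload: bytes) -> bool:
    """True for a payload that holds more than one VOP header."""
    return count_vop_headers(payload) > 1
```

The sketch only inspects start codes; it does not detect video packet headers, so it cannot by itself verify criteria (2) or (3).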
323 An RTP packet containing more than one VOP, as in (c), is not allowed.

325      +------+------+------+------+
326  (a) | RTP  |  VS  |  VO  | VOL  |
327      |header|header|header|header|
328      +------+------+------+------+

330      +------+------+------+------+------------+
331  (b) | RTP  |  VS  |  VO  | VOL  |Video Packet|
332      |header|header|header|header|            |
333      +------+------+------+------+------------+

335      +------+-----+------------------+
336  (c) | RTP  | GOV |Video Object Plane|
337      |header|     |                  |
338      +------+-----+------------------+

340      +------+------+------------+  +------+------+------------+
341  (d) | RTP  | VOP  |Video Packet|  | RTP  |  VP  |Video Packet|
342      |header|header|    (1)     |  |header|header|    (2)     |
343      +------+------+------------+  +------+------+------------+

345      +------+------+------------+------+------------+------+------------+
346  (e) | RTP  |  VP  |Video Packet|  VP  |Video Packet|  VP  |Video Packet|
347      |header|header|    (1)     |header|    (2)     |header|    (3)     |
348      +------+------+------------+------+------------+------+------------+

350 Figure 2 - Examples of RTP packetized MPEG-4 Visual bitstream

352      +------+-------------+  +------+------------+------------+
353  (a) | RTP  |First half of|  | RTP  |Last half of|Video Packet|
354      |header|  VP header  |  |header| VP header  |            |
355      +------+-------------+  +------+------------+------------+

357      +------+------+----------+  +------+---------+------+------------+
358  (b) | RTP  | VOP  |First half|  | RTP  |Last half|  VP  |Video Packet|
359      |header|header| of VP(1) |  |header| of VP(1)|header|    (2)     |
360      +------+------+----------+  +------+---------+------+------------+

362      +------+------+------------------+------+------------------+
363  (c) | RTP  | VOP  |Video Object Plane| VOP  |Video Object Plane|
364      |header|header|       (1)        |header|       (2)        |
365      +------+------+------------------+------+------------------+

367 Figure 3 - Examples of prohibited RTP packetization for MPEG-4 Visual 368 bitstream
370 4. RTP Packetization of MPEG-4 Audio bitstream 372 This section specifies RTP packetization rules for MPEG-4 Audio 373 bitstreams. MPEG-4 Audio streams are formatted by the LATM (Low-overhead 374 MPEG-4 Audio Transport Multiplex) tool [5], and the LATM-based streams are 375 then mapped onto RTP packets as described in the three sections below. 377 4.1 RTP Packet Format 379 LATM-based streams consist of a sequence of audioMuxElements that include 380 one or more audio frames. A complete audioMuxElement or a part of one 381 SHALL be mapped directly onto an RTP payload without any removal of 382 audioMuxElement syntax elements (see Figure 4). The first byte of each 383 audioMuxElement SHALL be located at the first payload location in an RTP 384 packet.

386   0                   1                   2                   3
387   0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
388  +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
389  |V=2|P|X|  CC   |M|     PT      |       sequence number         |RTP
390  +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
391  |                           timestamp                           |Header
392  +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
393  |           synchronization source (SSRC) identifier            |
394  +=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+
395  |            contributing source (CSRC) identifiers             |
396  |                             ....                              |
397  +=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+
398  |                                                               |RTP
399  :                audioMuxElement (byte aligned)                 :Payload
400  |                                                               |
401  |                               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
402  |                               :...OPTIONAL RTP padding        |
403  +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
404 Figure 4 - An RTP packet for MPEG-4 Audio
406 In order to decode the audioMuxElement, the following muxConfigPresent 407 information is required to be indicated by an out-of-band means. 409 muxConfigPresent: If this value is set to 1, the audioMuxElement SHALL 410 include an indication bit "useSameStreamMux" and MAY include the 411 configuration information for audio compression "StreamMuxConfig".
The 412 useSameStreamMux bit indicates whether the StreamMuxConfig element in the 413 previous frame is applied in the current frame. 415 4.2 Use of RTP Header Fields for MPEG-4 Audio 417 Payload Type (PT): Payload type is to be specifically assigned as the 418 MPEG-4 Audio RTP payload format. If this assignment is to be carried out 419 dynamically, it can be performed by such out-of-band means as H.245, SDP, 420 etc. 422 Marker (M) bit: The marker bit indicates audioMuxElement boundaries. It 423 is set to one to indicate that the RTP packet contains a complete 424 audioMuxElement or the last fragment of an audioMuxElement. 426 Timestamp: The timestamp indicates composition time, or presentation time 427 in a no-compositor decoder. Timestamps are recommended to start at a 428 random value for security reasons. 430 Unless specified by an out-of-band means, the resolution of the timestamp 431 is set to its default value of 90 kHz. 433 Sequence Number: Incremented by one for each RTP packet sent, starting, 434 for security reasons, with a random value. 436 SSRC, CC and CSRC fields are used as described in RFC 1889 [8]. 438 4.3 Fragmentation of MPEG-4 Audio bitstream 440 It is desirable to put one audioMuxElement in each RTP packet. If the 441 size of an audioMuxElement can be kept small enough that the size of the 442 RTP packet containing it does not exceed the size of the path-MTU, this 443 will be no problem. If it cannot, the audioMuxElement MAY be fragmented 444 and spread across multiple packets, following the rules below: 446 (1) "payloadMux", which consists of payload elements, MAY be fragmented 447 across several RTP packets, so that each of those RTP packets will 448 contain one or more payload elements. Individual payload elements 449 themselves SHOULD NOT be fragmented. 451 (2) If the audioMuxElement includes StreamMuxConfig, StreamMuxConfig 452 SHALL be included in the RTP packet that contains the first payload 453 element.
455 5. MIME type registration for MPEG-4 Audio/Visual streams 457 The following sections describe the MIME type registrations for MPEG-4 458 Audio/Visual streams. MIME type registration and SDP usage for the MPEG-4 459 Visual stream are described in Sections 5.1 and 5.2, respectively, while 460 MIME type registration and SDP usage for the MPEG-4 Audio stream are 461 described in Sections 5.3 and 5.4, respectively. 463 (In the following sections, the RFC number "XXXX" represents the RFC 464 number to be assigned to this Internet-Draft.) 466 5.1 MIME type registration for MPEG-4 Visual 468 MIME media type name: video 469 MIME subtype name: MP4V 471 Required parameters: none 473 Optional parameters: 474 rate: This parameter is used only for RTP transport. It indicates the 475 resolution of the timestamp field in the RTP header. If this parameter 476 is not specified, its default value of 90000 (90KHz) is used. 478 profile-level-id: A decimal representation of MPEG-4 Visual Profile 479 Level indication value (profile_and_level_indication) defined in Table 480 G-1 of ISO/IEC 14496-2 [2][4]. 482 config: A hexadecimal representation of an octet string that expresses 483 the MPEG-4 Visual configuration information, as defined in subclause 484 6.2.1 "Start codes" of ISO/IEC 14496-2 [2][4][9]. The configuration 485 information is mapped onto the octet string on an MSB-first basis. The 486 first bit of the configuration information SHALL be located at the MSB 487 of the first octet. The configuration information indicated by this 488 parameter SHALL be the same as the configuration information in the 489 corresponding MPEG-4 Visual stream, except for 490 first_half_vbv_occupancy and latter_half_vbv_occupancy, if they exist, 491 which may vary in the repeated configuration information inside an 492 MPEG-4 Visual stream (see 6.2.1 "Start codes" of ISO/IEC 14496-2).
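As a non-normative illustration of the "config" parameter, the following Python sketch converts the hexadecimal string back into octets, checks that it begins with visual_object_sequence_start_code, and reads the profile_and_level_indication octet that follows it. The start code value (0x000001B0) and the position of profile_and_level_indication are taken from ISO/IEC 14496-2 and are assumptions as far as this document is concerned; the function name is hypothetical.

```python
# visual_object_sequence_start_code as defined in ISO/IEC 14496-2
# (assumption: the value is not stated in this document).
VISUAL_OBJECT_SEQUENCE_START_CODE = bytes.fromhex("000001B0")


def profile_level_from_config(config_hex: str) -> int:
    """Return profile_and_level_indication carried in a "config" value.

    The hexadecimal string maps onto octets MSB first, so the first two
    hex digits form the first octet of the configuration information.
    """
    octets = bytes.fromhex(config_hex)
    if len(octets) < 5 or not octets.startswith(VISUAL_OBJECT_SEQUENCE_START_CODE):
        raise ValueError("config does not start with a Visual Object Sequence Header")
    # The octet right after the 32-bit start code.
    return octets[4]
```

A receiver could compare the returned value with the "profile-level-id" parameter to detect inconsistent session descriptions.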
494 The parameter "profile-level-id" MAY be used in the capability 495 exchange procedure to indicate the MPEG-4 Visual Profile and Level 496 combination of which the MPEG-4 Visual codec is capable. The parameter 497 "config" MAY be used to indicate the configuration of the 498 corresponding MPEG-4 Visual bitstream, but SHALL NOT be used to 499 indicate the codec capability in the capability exchange procedure. 501 Example usages for these parameters are: 502 - MPEG-4 Visual Simple Profile/Level 1: 503 Content-type: video/mp4v; profile-level-id=1 505 - MPEG-4 Visual Core Profile/Level 2: 506 Content-type: video/mp4v; profile-level-id=34 508 - MPEG-4 Visual Advanced Real Time Simple Profile/Level 1: 509 Content-type: video/mp4v; profile-level-id=145 511 Published specification: 512 The specifications for MPEG-4 Visual streams are presented in ISO/IEC 513 14496-2 [2][4][9]. The RTP payload format is described in RFCXXXX. 515 Encoding considerations: 516 Video bitstreams must be generated according to MPEG-4 Visual 517 specifications (ISO/IEC 14496-2). A video bitstream is binary data and 518 must be encoded for non-binary transport (for Email, the Base64 519 encoding is sufficient). This type is also defined for transfer via 520 RTP. The RTP packets MUST be packetized according to the MPEG-4 Visual 521 RTP payload format defined in RFCXXXX. 523 Security considerations: 524 See section 6 of RFCXXXX. 526 Interoperability considerations: 527 MPEG-4 Visual provides a large and rich set of tools for the coding of 528 visual objects. For effective implementation of the standard, subsets 529 of the MPEG-4 Visual tool sets have been provided for use in specific 530 applications. These subsets, called 'Profiles', limit the size of the 531 tool set a decoder is required to implement. In order to restrict 532 computational complexity, one or more Levels are set for each Profile.
533 A Profile@Level combination allows: 535 o a codec builder to implement only the subset of the standard that is 536 needed, while maintaining interworking with other MPEG-4 devices 537 included in the same combination, and 539 o checking whether MPEG-4 devices comply with the standard 540 ('conformance testing'). 542 The visual stream SHALL be compliant with the MPEG-4 Visual 543 Profile@Level specified by the parameter "profile-level-id". 544 Interoperability between a sender and a receiver may be achieved by 545 specifying the parameter "profile-level-id" in MIME content, or by 546 arranging in the capability exchange procedure to set this parameter 547 mutually to the same value. 549 Applications which use this media type: 550 Audio and visual streaming and conferencing tools, Internet messaging 551 and Email applications. 553 Additional information: none 555 Person & email address to contact for further information: 556 The authors of RFCXXXX. (See section 8) 558 Intended usage: COMMON 560 Author/Change controller: 561 The authors of RFCXXXX. (See section 8) 563 5.2 SDP usage of MPEG-4 Visual 565 The MIME media type video/MP4V string is mapped to fields in the Session 566 Description Protocol (SDP), RFC 2327, as follows: 568 o The MIME type (video) goes in SDP "m=" as the media name. 570 o The MIME subtype (MP4V) goes in SDP "a=rtpmap" as the encoding name. 572 o The optional parameter "rate" goes in "a=rtpmap" as the clock rate. 574 o The optional parameters "profile-level-id" and "config" MAY go in the 575 "a=fmtp" line to indicate the coder capability and configuration, 576 respectively. These parameters are expressed as a MIME media type string, 577 in the form of a semicolon-separated list of parameter=value pairs.
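As a non-normative illustration, the mapping above can be sketched in Python as follows. The function name mp4v_sdp_lines and its arguments are hypothetical; the port is passed as a string so that a form such as "49170/2" can be used, and the "; " separator between parameters is an assumption based on the semicolon-separated list described above.

```python
from typing import List, Optional


def mp4v_sdp_lines(port: str, payload_type: int, rate: int = 90000,
                   profile_level_id: Optional[int] = None,
                   config: Optional[str] = None) -> List[str]:
    """Build the SDP lines for one video/MP4V stream."""
    lines = [
        # MIME type "video" becomes the media name of the "m=" line.
        f"m=video {port} RTP/AVP {payload_type}",
        # MIME subtype and "rate" go into "a=rtpmap".
        f"a=rtpmap:{payload_type} MP4V/{rate}",
    ]
    params = []
    if profile_level_id is not None:
        params.append(f"profile-level-id={profile_level_id}")
    if config is not None:
        params.append(f"config={config}")
    if params:
        # Optional parameters go into a single "a=fmtp" line.
        lines.append(f"a=fmtp:{payload_type} " + "; ".join(params))
    return lines
```

For instance, mp4v_sdp_lines("49170/2", 98, profile_level_id=34) yields an "m=" line, an "a=rtpmap" line with clock rate 90000, and an "a=fmtp" line carrying only "profile-level-id".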
579   The following are some examples of media representation in SDP:

581   Simple Profile/Level 1, rate=90000 (90 kHz), "profile-level-id" and
582   "config" are present in "a=fmtp" line:
583       m=video 49170/2 RTP/AVP 98
584       a=rtpmap:98 MP4V/90000
585       a=fmtp:98 profile-level-id=1;
586       config=000001B001000001B5090000010000000120008440FA282C2090A21F

588   Core Profile/Level 2, rate=90000 (90 kHz), "profile-level-id" is present in
589   "a=fmtp" line:
590       m=video 49170/2 RTP/AVP 98
591       a=rtpmap:98 MP4V/90000
592       a=fmtp:98 profile-level-id=34

594   Advanced Real Time Simple Profile/Level 1, rate=25 (25 Hz), "profile-level-
595   id" is present in "a=fmtp" line:
596       m=video 49170/2 RTP/AVP 98
597       a=rtpmap:98 MP4V/25
598       a=fmtp:98 profile-level-id=145

600   5.3 MIME type registration of MPEG-4 Audio

602   MIME media type name: audio

604   MIME subtype name: MP4A

606   Required parameters:
607     rate: the rate parameter indicates the RTP time stamp clock rate. The
608     default value is 90000. Other rates CAN be specified only if they are
609     set to the same value as the audio sampling rate (number of samples
610     per second).

612   Optional parameters:
613     profile-level-id: a decimal representation of MPEG-4 Audio Profile
614     Level indication value defined in ISO/IEC 14496-1 [11]. This parameter
615     indicates which MPEG-4 Audio tool subsets the decoder is capable of
616     using.

618     object: a decimal representation of the MPEG-4 Audio Object Type value
619     defined in ISO/IEC 14496-3 [5]. This parameter specifies the tool to
620     be used by the coder. It CAN be used to limit the capability within
621     the specified "profile-level-id".

623     bitrate: the data rate for the audio bit stream.

625     cpresent: this parameter indicates whether audio payload configuration
626     data has been multiplexed into an RTP payload (See section 4.1 in this
627     document).
629     config: a hexadecimal representation of an octet string that expresses
630     the audio payload configuration data "StreamMuxConfig", as defined in
631     ISO/IEC 14496-3 [5]. Configuration data is mapped onto the octet
632     string on an MSB-first basis. The first bit of the configuration data
633     SHALL be located at the MSB of the first octet. In the last octet,
634     zero-padding bits, if necessary, shall follow the configuration data.
635     If the configuration data is quite large, it is RECOMMENDED that it
636     be carried in-band in the RTP payload (with cpresent set to 1)
637     rather than in this parameter.

639     ptime: RECOMMENDED duration of each packet in milliseconds.

641   Published specification:
642     Payload format specifications are described in this document. Encoding
643     specifications are provided in ISO/IEC 14496-3 [3][5].

645   Encoding considerations:
646     This type is only defined for transfer via RTP.

648   Security considerations:
649     See Section 6 of RFCXXXX.

651   Interoperability considerations:
652     MPEG-4 Audio provides a large and rich set of tools for the coding of
653     audio objects. For effective implementation of the standard, subsets of
654     the MPEG-4 Audio tool sets similar to those used in MPEG-4 Visual have
655     been provided (see section 5.1).

657     The audio stream SHALL be compliant with the MPEG-4 Audio
658     Profile@Level specified by the parameter "profile-level-id".
659     Interoperability between a sender and a receiver may be achieved by
660     specifying the parameter "profile-level-id" in MIME content, or by
661     arranging in the capability exchange procedure to set this parameter
662     mutually to the same value. Furthermore, the "object" parameter can be
663     used to limit the capability within the specified Profile@Level in
664     capability exchange.

666   Applications which use this media type:
667     Audio and video streaming and conferencing tools.
669   Additional information: none

671   Person & email address to contact for further information:
672     See Section 8 of RFCXXXX.

674   Intended usage: COMMON

676   Author/Change controller:
677     See Section 8 of RFCXXXX.

679   5.4 SDP usage of MPEG-4 Audio

681   The MIME media type audio/MP4A string is mapped to fields in the Session
682   Description Protocol (SDP), RFC 2327, as follows:

684   o The MIME type (audio) goes in SDP "m=" as the media name.

686   o The MIME subtype (MP4A) goes in SDP "a=rtpmap" as the encoding name.

688   o The required parameter "rate" goes in "a=rtpmap" as the clock rate.

690   o The optional parameter "ptime" goes in SDP "a=ptime" attribute.

692   o The optional parameter "profile-level-id" goes in the "a=fmtp" line to
693     indicate the coder capability. The "object" parameter goes in the
694     "a=fmtp" attribute. The payload-format-specific parameters "bitrate",
695     "cpresent" and "config" go in the "a=fmtp" line. If the string after
696     "config=" is quite large, the config data should not be transmitted
697     in SDP but should instead be transmitted in-band. These
698     parameters are expressed as a MIME media type string, in the form of a
699     semicolon-separated list of parameter=value pairs.

701   The following are some examples of the media representation in SDP:

703   For 6 kb/s CELP bitstreams (with an audio sampling rate of 8 kHz),
704       m=audio 49230 RTP/AVP 96
705       a=rtpmap:96 MP4A/8000
706       a=fmtp:96 profile-level-id=9;object=8;cpresent=0;config=9128B1071070
707       a=ptime:20

709   For 64 kb/s AAC LC stereo bitstreams (with an audio sampling rate of 24
710   kHz),
711       m=audio 49230 RTP/AVP 96
712       a=rtpmap:96 MP4A/24000
713       a=fmtp:96 profile-level-id=1; bitrate=64000; cpresent=0;
714       config=9122620000

716   In the above two examples, audio configuration data is not multiplexed
717   into the RTP payload and is described only in SDP. Furthermore, the
718   "clock rate" is set to the audio sampling rate.
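
As a rough illustration of how a receiver might read these "a=fmtp" lines, the following Python sketch splits the semicolon-separated parameter list and converts the hexadecimal "config" string into octets. The function name parse_fmtp is hypothetical, and the sketch deliberately does not interpret the StreamMuxConfig fields defined in ISO/IEC 14496-3; it only shows the parameter syntax and the MSB-first octet mapping.

```python
def parse_fmtp(line):
    """Split an "a=fmtp:" line into (payload_type, {name: value}).

    Illustrative helper only; values are returned as strings.
    """
    prefix = "a=fmtp:"
    if not line.startswith(prefix):
        raise ValueError("not an a=fmtp line")
    payload_type, _, rest = line[len(prefix):].partition(" ")
    params = {}
    for item in rest.split(";"):
        item = item.strip()
        if not item:
            continue  # tolerate a trailing semicolon
        name, _, value = item.partition("=")
        params[name.strip()] = value.strip()
    return int(payload_type), params


# The 6 kb/s CELP example from this section.
pt, params = parse_fmtp(
    "a=fmtp:96 profile-level-id=9;object=8;cpresent=0;config=9128B1071070"
)

# "config" is a hexadecimal octet string; two hex digits form one octet.
config_octets = bytes.fromhex(params["config"])

# The first bit of the configuration data is the MSB of the first octet.
first_bit = (config_octets[0] >> 7) & 1

# cpresent=0 means the configuration is carried only in SDP.
config_in_band = params.get("cpresent") == "1"
```

A receiver that finds cpresent=1 and no "config" parameter would instead expect the configuration data inside the RTP payload.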
720   If the clock rate has been set to its default value and it is necessary
721   to obtain the audio sampling rate, this can be done by parsing the
722   "config" parameter (see the following example).

724       m=audio 49230 RTP/AVP 96
725       a=rtpmap:96 MP4A/90000
726       a=fmtp:96 object=8; cpresent=0; config=9128B1071070

728   The following example shows that the audio configuration data appears in
729   the RTP payload.

731       m=audio 49230 RTP/AVP 96
732       a=rtpmap:96 MP4A/90000
733       a=fmtp:96 object=13; cpresent=1

735   6. Security Considerations

737   RTP packets using the payload format defined in this specification are
738   subject to the security considerations discussed in the RTP specification
739   [8]. This implies that confidentiality of the media streams is achieved
740   by encryption. Because the data compression used with this payload format
741   is applied end-to-end, encryption may be performed on the compressed data
742   so there is no conflict between the two operations.

744   This payload type does not exhibit any significant non-uniformity in the
745   receiver-side computational complexity of packet processing that would
746   cause a potential denial-of-service threat.

748   7. References

750   1 Bradner, S., "The Internet Standards Process -- Revision 3", BCP 9,
751     RFC 2026, October 1996.

753   2 ISO/IEC 14496-2:1999, "Information technology - Coding of audio-visual
754     objects - Part 2: Visual", December 1999.

756   3 ISO/IEC 14496-3:1999, "Information technology - Coding of audio-visual
757     objects - Part 3: Audio", December 1999.

759   4 ISO/IEC 14496-2:1999/FDAM1:2000, December 1999.

761   5 ISO/IEC 14496-3:1999/FDAM1:2000, December 1999.

763   6 ISO/IEC 14496-1:1999, "Information technology - Coding of audio-visual
764     objects - Part 1: Systems", December 1999.

766   7 Bradner, S., "Key words for use in RFCs to Indicate Requirement
767     Levels", BCP 14, RFC 2119, March 1997.

769   8 H. Schulzrinne, S. Casner, R. Frederick, V.
Jacobson, "RTP: A Transport
770     Protocol for Real-Time Applications", RFC 1889, Internet Engineering
771     Task Force, January 1996.

773   9 ISO/IEC 14496-2/COR1, "Information technology - Coding of audio-visual
774     objects - Part 2: Visual, Technical corrigendum 1", March 2000.

776   8. Authors' Addresses

778   Yoshihiro Kikuchi
779   Toshiba Corporation
780   1, Komukai Toshiba-cho, Saiwai-ku, Kawasaki, 212-8582, Japan
781   Email: yoshihiro.kikuchi@toshiba.co.jp

783   Yoshinori Matsui
784   Matsushita Electric Industrial Co., LTD.
785   1006, Kadoma, Kadoma-shi, Osaka, Japan
786   Email: matsui@drl.mei.co.jp

788   Toshiyuki Nomura
789   NEC Corporation
790   4-1-1, Miyazaki, Miyamae-ku, Kawasaki, Japan
791   Email: t-nomura@ccm.cl.nec.co.jp

793   Shigeru Fukunaga
794   Oki Electric Industry Co., Ltd.
795   1-2-27 Shiromi, Chuo-ku, Osaka 540-6025 Japan
796   Email: fukunaga444@oki.co.jp

798   Hideaki Kimata
799   Nippon Telegraph and Telephone Corporation
800   1-1, Hikari-no-oka, Yokosuka-shi, Kanagawa, Japan
801   Email: kimata@nttvdt.hil.ntt.co.jp

803   Full Copyright Statement

805   "Copyright (C) The Internet Society (date). All Rights Reserved.

807   This document and translations of it may be copied and furnished to
808   others, and derivative works that comment on or otherwise explain it
809   or assist in its implementation may be prepared, copied, published
810   and distributed, in whole or in part, without restriction of any
811   kind, provided that the above copyright notice and this paragraph
812   are included on all such copies and derivative works. However, this
813   document itself may not be modified in any way, such as by removing
814   the copyright notice or references to the Internet Society or other
815   Internet organizations, except as needed for the purpose of
816   developing Internet standards in which case the procedures for
817   copyrights defined in the Internet Standards process must be
818   followed, or as required to translate it into languages other than
819   English.
821   The limited permissions granted above are perpetual and will not be
822   revoked by the Internet Society or its successors or assigns.