Internet Engineering Task Force              Yoshihiro Kikuchi - Toshiba
Internet Draft                                  Toshiyuki Nomura - NEC
Document: draft-ietf-avt-rtp-mpeg4-es-03.txt    Shigeru Fukunaga - Oki
                                           Yoshinori Matsui - Matsushita
                                                   Hideaki Kimata - NTT
Aug 21, 2000

          RTP payload format for MPEG-4 Audio/Visual streams

Status of this Memo

This document is an Internet-Draft and is in full conformance with all
provisions of Section 10 of RFC2026 [1].

Internet-Drafts are working documents of the Internet Engineering Task
Force (IETF), its areas, and its working groups. Note that other groups
may also distribute working documents as Internet-Drafts.
Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference material
or to cite them other than as "work in progress."

The list of current Internet-Drafts can be accessed at
http://www.ietf.org/ietf/1id-abstracts.txt

The list of Internet-Draft Shadow Directories can be accessed at
http://www.ietf.org/shadow.html.

Abstract

This document describes RTP payload formats for carrying MPEG-4 Audio
and Visual bitstreams without using MPEG-4 Systems. For the purpose of
directly mapping MPEG-4 Audio/Visual bitstreams onto RTP packets, it
specifies the use of RTP header fields and defines fragmentation rules.
It also specifies MIME type registrations and the use of SDP.

1. Introduction

The RTP payload formats described in this document specify how MPEG-4
Audio and Visual streams [2][3][4][5] are to be fragmented and mapped
directly onto RTP packets.
These RTP payload formats enable MPEG-4 Audio/Visual streams to be
carried without using the synchronization and stream management
functionality of MPEG-4 Systems [6]. Such payload formats would be used
within systems that provide their own stream management functionality,
so that the corresponding functionality of MPEG-4 Systems is not
necessary. H.323 terminals are an example of such systems: there,
MPEG-4 Audio/Visual streams are managed not by MPEG-4 Systems Object
Descriptors but by H.245, and the streams are mapped directly onto RTP
packets without using the synchronization functionality of MPEG-4
Systems. Other examples are SIP and RTSP, where MIME and SDP are used.
The MIME types and SDP usage of the RTP payload formats described in
this document are defined so as to specify the attributes of
Audio/Visual streams (e.g. media type, packetization format and codec
configuration) directly, without using MPEG-4 Systems.

The semantics of the RTP header fields in such cases need to be clearly
defined, including their association with MPEG-4 Audio/Visual data
elements. In addition, it would be beneficial to define fragmentation
rules for RTP packets carrying MPEG-4 Video streams, so as to enhance
error resiliency by exploiting the error resilience tools provided
inside the MPEG-4 Video stream. These issues, however, have not yet
been addressed by other RTP payload format specifications.

1.1 MPEG-4 Visual RTP payload format

MPEG-4 Visual is a visual coding standard with many new features, such
as high coding efficiency, high error resiliency, and object-based
coding of multiple, arbitrarily shaped objects [2]. It covers a wide
range of bitrates, from scores of kbps to several Mbps, and a wide
variety of networks, ranging from those guaranteed to be almost
error-free to mobile networks with high error rates.
Because MPEG-4 Visual is used over such a wide variety of networks, the
fragmentation rules for MPEG-4 Visual bitstreams defined in this
document should not be overly restrictive; a rule such as "a single
video packet shall always be mapped onto a single RTP packet" may be
inappropriate. On the other hand, careless, media-unaware fragmentation
may degrade error resiliency and bandwidth efficiency. The
fragmentation rules described in this document are therefore flexible,
but define the minimum set of rules needed to prevent meaningless
fragmentation and to exploit the error resilience of MPEG-4 Visual.

The additional media-specific RTP headers defined for video coding
standards such as H.261 and MPEG-1/2 are effective in helping to
recover picture headers corrupted by packet losses. MPEG-4 Visual,
however, already provides error resilience functionality for
recovering corrupted headers, and it can be used on RTP/IP networks as
well as on other networks (H.223/mobile, MPEG-2/TS, etc.). For this
reason, no extra RTP header fields are defined in the MPEG-4 Visual RTP
payload format proposed here.

1.2 MPEG-4 Audio RTP payload format

MPEG-4 Audio is a new kind of audio standard that integrates many
different types of audio coding tools. It also supports a mechanism for
representing synthesized sounds. The Low-overhead MPEG-4 Audio
Transport Multiplex (LATM) manages sequences of audio data with
relatively small overhead. In audio-only applications, it is therefore
desirable for LATM-based MPEG-4 Audio bitstreams to be mapped directly
onto RTP packets without using MPEG-4 Systems.

For MPEG-4 Audio coding tools other than the synthesis tools, as for
other audio coders, if the payload of a packet is a single audio frame,
packet loss will not impair the decodability of adjacent packets.
On the other hand, MPEG-4 Audio synthesis tools may be sensitive to
errors. For example, an SA_access_unit in the payload may set a global
value to a new value, which is then referenced throughout the audio
content to make a macro change in the performance. In this case, an
error in the payload influences all audio data produced after the
error. To enhance error resiliency, the element of the SA_access_unit
that makes such a macro change should be transmitted repeatedly in
several SA_access_units. The number of repetitions depends on the
network conditions. Because this repetition provides the necessary
robustness, no additional media-specific header for error recovery is
required for MPEG-4 Audio.

2. Conventions used in this document

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
document are to be interpreted as described in RFC-2119 [7].

3. RTP Packetization of MPEG-4 Visual bitstream

This section specifies RTP packetization rules for MPEG-4 Visual
content. An MPEG-4 Visual bitstream is mapped directly onto the RTP
payload without adding any extra header fields or removing any Visual
syntax elements. The Combined Configuration/Elementary stream mode is
used, so that configuration information is carried on the same RTP
port as the elementary stream. (See subclause 6.2.1 "Start codes" of
ISO/IEC 14496-2 [2][9][4].) The configuration information MAY
additionally be specified by out-of-band means: in H.323 terminals, the
H.245 codepoint "decoderConfigurationInformation" MAY be used for this
purpose; in systems using MIME content types and SDP parameters, e.g.
SIP and RTSP, the optional parameter "config" MAY be used to specify
the configuration information.
(See Sections 5.1 and 5.2.)

When the short video header mode is used, the RTP payload format
specified for H.263 in the relevant RFCs or in other relevant standards
MAY be used.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |V=2|P|X|  CC   |M|     PT      |       sequence number         | RTP
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                           timestamp                           | Header
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |           synchronization source (SSRC) identifier            |
   +=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+
   |            contributing source (CSRC) identifiers             |
   |                             ....                              |
   +=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+
   |                                                               | RTP
   |              MPEG-4 Visual stream (byte aligned)              | Payload
   |                                                               |
   |                               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                               :...OPTIONAL RTP padding        |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

          Figure 1 - An RTP packet for MPEG-4 Visual stream

3.1 Use of RTP header fields for MPEG-4 Visual

Payload Type (PT): The payload type is assigned specifically to the
MPEG-4 Visual RTP payload format. If this assignment is made
dynamically, it can be carried out by out-of-band means such as H.245
or SDP.

Extension (X) bit: Defined by the RTP profile used.

Sequence Number: Incremented by one for each RTP data packet sent,
starting, for security reasons, with a random initial value.

Marker (M) bit: The marker bit is set to one to indicate the last RTP
packet (or the only RTP packet) of a VOP. When multiple VOPs are
carried in the same RTP packet, the marker bit is set to one.

Timestamp: The timestamp indicates the composition time, or the
presentation time in a decoder without a compositor. A random constant
offset is added for security reasons.
The detailed definition of the timestamp is as follows:

- For a video object plane, it is defined as vop_time_increment (in
  units of 1/vop_time_increment_resolution seconds) plus the cumulative
  number of whole seconds specified by modulo_time_base and, if
  present, the time_code of Group_of_VideoObjectPlane() fields.
- In the case of interlaced video, a VOP consists of lines from two
  fields, and the timestamp indicates the composition time of the first
  field.
- When multiple VOPs are carried in the same RTP packet, the timestamp
  indicates the earliest of the composition times of the VOPs carried
  in the RTP packet.
- If the RTP packet contains only configuration information and/or
  Group_of_VideoObjectPlane() fields, the composition time of the next
  VOP in coding order is used.
- If the RTP packet contains only visual_object_sequence_end_code
  information, the composition time of the immediately preceding VOP
  in coding order is used.

The resolution of the timestamp is set to its default value of 90 kHz,
unless specified by out-of-band means (e.g. an SDP or MIME parameter as
defined in Section 5).

SSRC, CC and CSRC fields are used as described in RFC 1889 [8].

3.2 Fragmentation of MPEG-4 Visual bitstream

A fragmented MPEG-4 Visual bitstream is mapped directly onto the RTP
payload without adding any extra header fields or removing any Visual
syntax elements. The Combined Configuration/Elementary stream mode is
used. The following rules apply to fragmentation.

(1) Configuration information and Group_of_VideoObjectPlane() fields
SHALL be placed at the beginning of the RTP payload (just after the RTP
header) or just after the header of the syntactically upper-layer
function.

(2) If one or more headers exist in the RTP payload, the RTP payload
SHALL begin with the header of the syntactically highest function.
    Note: The visual_object_sequence_end_code is regarded as the lowest
    function.

(3) A header SHALL NOT be split across multiple RTP packets.

(4) Two or more VOPs SHOULD be carried in different RTP packets, so
that each RTP packet consists of data bytes associated with a unique
presentation time (as indicated in the timestamp field of the RTP
packet header), with the exception that multiple VOPs MAY be carried
in one RTP packet if the VOPs are small.

(5) A single video packet SHOULD NOT be split across multiple RTP
packets. The size of a video packet SHOULD be adjusted so that the
resulting RTP packet is not larger than the path-MTU. A video packet
MAY be split across multiple RTP packets when the video packet is
large.

Note: Rule (5) does not apply when video packets are disabled by the
coder configuration (by setting resync_marker_disable in the VOL header
to 1), or with coding tools that do not support video packets. In this
case, a VOP MAY be split at arbitrary byte positions.

In these rules, "header" means:
- Configuration information (Visual Object Sequence Header, Visual
  Object Header and Video Object Layer Header)
- visual_object_sequence_end_code
- The header of the entry point function for an elementary stream
  (Group_of_VideoObjectPlane() or the header of VideoObjectPlane(),
  video_plane_with_short_header(), MeshObject() or FaceObject())
- The video packet header (video_packet_header() excluding
  next_resync_marker())
- The header of gob_layer()

See subclause 6.2.1 "Start codes" of ISO/IEC 14496-2 [2][9][4] for the
definition of the configuration information and the entry point
functions.

The video packet starts with the VOP header or the video packet header,
followed by motion_shape_texture(), and ends with next_resync_marker()
or next_start_code().
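The rules above, together with the marker bit and timestamp definitions
of Section 3.1, can be sketched as a simple sender-side packetizer. This
is a minimal illustration, not a complete implementation. It assumes a
byte-aligned stream that uses start codes (no short video header mode),
keeps each VOP together with its preceding configuration/GOV headers in
one payload (rules (1), (2) and (4)), does not split oversized VOPs at
video packet boundaries (rule (5)), and expects the caller to supply
each VOP's whole seconds and vop_time_increment in coding order, since
parsing the VOP header is omitted. All function and class names are
hypothetical.

```python
from dataclasses import dataclass

START_CODE_PREFIX = b"\x00\x00\x01"
# Final start-code byte values from ISO/IEC 14496-2, 6.2.1 "Start codes".
VOS_END = 0xB1    # visual_object_sequence_end_code
VOP_START = 0xB6  # vop_start_code


@dataclass
class VisualPayload:
    data: bytes
    marker: bool    # M bit: set on the packet that ends a VOP
    timestamp: int  # 32-bit RTP timestamp


def split_start_code_units(stream: bytes) -> list:
    """Split a byte-aligned stream into units that each begin with 0x000001.
    Any bytes before the first start code are ignored in this sketch."""
    starts = []
    i = stream.find(START_CODE_PREFIX)
    while i != -1:
        starts.append(i)
        i = stream.find(START_CODE_PREFIX, i + 3)
    return [stream[s:e] for s, e in zip(starts, starts[1:] + [len(stream)])]


def vop_timestamp(whole_seconds: int, vop_time_increment: int,
                  resolution: int, offset: int, clock: int = 90000) -> int:
    """Composition time on the RTP clock plus the random offset (3.1).
    Fractional ticks are rounded down; that choice is an assumption."""
    ticks = whole_seconds * clock + (vop_time_increment * clock) // resolution
    return (offset + ticks) % 2**32


def packetize_visual(stream: bytes, vop_times: list, resolution: int,
                     offset: int, max_payload: int = 1400) -> list:
    payloads = []
    pending = b""  # configuration / GOV headers waiting for the next VOP
    vop_index = 0
    for unit in split_start_code_units(stream):
        if len(unit) < 4:
            raise ValueError("truncated start code")
        code = unit[3]
        if code == VOP_START:
            # Rules (1)/(2): the headers open the payload, the VOP follows.
            data = pending + unit
            pending = b""
            if len(data) > max_payload:
                raise ValueError("VOP too large; splitting at video packet "
                                 "boundaries (rule (5)) is not shown here")
            secs, inc = vop_times[vop_index]
            vop_index += 1
            ts = vop_timestamp(secs, inc, resolution, offset)
            payloads.append(VisualPayload(data, True, ts))
        elif code == VOS_END and payloads and not pending:
            last = payloads[-1]
            if len(last.data) + len(unit) <= max_payload:
                # The end code is the lowest function, so it may trail a VOP.
                last.data += unit
            else:
                # End-code-only packet: preceding VOP's time (3.1); M=0 assumed.
                payloads.append(VisualPayload(unit, False, last.timestamp))
        else:
            pending += unit  # VOS/VO/VOL headers, user data, GOV
    if pending:
        raise ValueError("headers at end of stream with no following VOP")
    return payloads
```

With a stream consisting of VS, VO and VOL headers followed by two VOPs,
the sketch yields two payloads: the configuration information travels
in front of the first VOP, and each payload has its marker bit set
because it ends a VOP.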
3.3 Examples of packetized MPEG-4 Visual bitstream

Because MPEG-4 Visual covers a wide variety of networks, ranging from
scores of kbps to several Mbps and from those guaranteed to be almost
error-free to mobile networks with high error rates, fragmentation
should not be overly restricted. On the other hand, careless,
media-unaware fragmentation will degrade error resiliency and bandwidth
efficiency. The fragmentation criteria described in Section 3.2 are
flexible, but define the minimum rules needed to prevent meaningless
fragmentation.

Figure 2 shows examples of RTP packets generated according to the
criteria described in Section 3.2.

(a) is an example of the first RTP packet, or of a random access point,
of an MPEG-4 Visual bitstream containing the configuration information.
According to criterion (1), the Visual Object Sequence Header (VS
header) is placed at the beginning of the RTP payload, preceding the
Visual Object Header and the Video Object Layer Header (VO header, VOL
header). Since the fragmentation rules defined in Section 3.2 guarantee
that the configuration information, starting with
visual_object_sequence_start_code, is always placed at the beginning of
the RTP payload, RTP receivers can detect a random access point by
checking whether the first 32-bit field of the RTP payload is
visual_object_sequence_start_code.

(b) is another example of an RTP packet containing the configuration
information. It differs from example (a) in that the RTP packet also
contains a video packet of the VOP following the configuration
information.
Since the configuration information is relatively short (typically
scores of bytes), an RTP packet containing only the configuration
information may increase the overhead; the configuration information
and the immediately following GOV and/or (part of the) VOP can
therefore be packetized efficiently into a single RTP packet, as in
this example.

(c) is an example of an RTP packet that contains a
Group_of_VideoObjectPlane (GOV). Following criterion (1), the GOV is
placed at the beginning of the RTP payload. Generating an RTP packet
containing only a 7-byte GOV would waste RTP/IP header overhead.
Therefore, (part of) the following VOP can be placed in the same RTP
packet, as shown in (c).

(d) is an example in which one video packet is packetized into one RTP
packet. This kind of packetization is recommended when the packet-loss
rate of the underlying network is high. It is recommended to set
resync_marker_disable to 0 in the VOL header to enable adjustment of
the video packet size. Even when the RTP packet containing the VOP
header is lost, the other RTP packets can be decoded by using the HEC
(Header Extension Code) information in the video packet header. No
extra RTP header field is necessary.

(e) is an example in which more than one video packet is packetized
into one RTP packet. This kind of packetization is effective in saving
RTP/IP header overhead when the bitrate of the underlying network is
low. However, it decreases packet-loss resiliency, because a single RTP
packet loss discards multiple video packets. The optimal number of
video packets per RTP packet, and the length of the RTP packet, can be
determined by considering the packet-loss rate and the bitrate of the
underlying network.
(f) is an example in which video packets are disabled by setting
resync_marker_disable in the VOL header to 1. In this case, a VOP may
be split into multiple RTP packets at arbitrary byte positions; for
example, a VOP can be split into fixed-length packets. This kind of
coder configuration and RTP packet fragmentation may be used when the
underlying network is guaranteed to be error-free. It is not
recommended in error-prone environments, since it provides only poor
packet-loss resiliency.

Figure 3 shows examples of RTP packets prohibited by the criteria of
Section 3.2.

Fragmentation of a header across multiple RTP packets, as in (a), not
only increases RTP/IP header overhead but also decreases error
resiliency. It is therefore prohibited by criterion (3).

When more than one video packet is concatenated into an RTP packet, the
VOP header or video_packet_header() shall not be placed in the middle
of the RTP payload. The packetization in (b) is therefore not allowed
by criterion (2), for reasons of error resiliency. Compare this example
with Figure 2(d): although two video packets are mapped onto two RTP
packets in both cases, their packet-loss resiliency differs. If the
second RTP packet is lost, both video packets 1 and 2 are lost in the
case of Figure 3(b), whereas only video packet 2 is lost in the case of
Figure 2(d).
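Example (a) above notes that a receiver can detect a random access point
by checking whether the first 32 bits of the RTP payload equal
visual_object_sequence_start_code (0x000001B0). A minimal receiver-side
sketch, assuming complete packets laid out as in Figure 1 and RFC 1889
(fixed header, CSRC list, optional header extension, optional padding);
the function names are hypothetical.

```python
import struct

VOS_START_CODE = 0x000001B0  # visual_object_sequence_start_code


def rtp_payload(packet: bytes) -> bytes:
    """Strip the fixed header, CSRC list, header extension and padding."""
    if len(packet) < 12 or packet[0] >> 6 != 2:
        raise ValueError("not an RTP version 2 packet")
    has_padding = bool(packet[0] & 0x20)
    has_extension = bool(packet[0] & 0x10)
    start = 12 + 4 * (packet[0] & 0x0F)  # skip CC CSRC identifiers
    if has_extension:
        if len(packet) < start + 4:
            raise ValueError("truncated header extension")
        # 16 profile-defined bits, then the extension length in 32-bit words
        (ext_words,) = struct.unpack_from("!H", packet, start + 2)
        start += 4 + 4 * ext_words
    # With P set, the last octet holds the number of padding octets.
    end = len(packet) - (packet[-1] if has_padding else 0)
    if end < start:
        raise ValueError("malformed RTP packet")
    return packet[start:end]


def is_random_access_point(packet: bytes) -> bool:
    """True if the payload starts with visual_object_sequence_start_code."""
    payload = rtp_payload(packet)
    return (len(payload) >= 4
            and struct.unpack_from("!I", payload)[0] == VOS_START_CODE)
```

Because Section 3.2 guarantees that configuration information always
opens the payload, this check needs no bitstream parsing beyond the
first four payload octets.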
    +------+------+------+------+
(a) | RTP  |  VS  |  VO  | VOL  |
    |header|header|header|header|
    +------+------+------+------+

    +------+------+------+------+------------+
(b) | RTP  |  VS  |  VO  | VOL  |Video Packet|
    |header|header|header|header|            |
    +------+------+------+------+------------+

    +------+-----+------------------+
(c) | RTP  | GOV |Video Object Plane|
    |header|     |                  |
    +------+-----+------------------+

    +------+------+------------+  +------+------+------------+
(d) | RTP  | VOP  |Video Packet|  | RTP  |  VP  |Video Packet|
    |header|header|    (1)     |  |header|header|    (2)     |
    +------+------+------------+  +------+------+------------+

    +------+------+------------+------+------------+------+------------+
(e) | RTP  |  VP  |Video Packet|  VP  |Video Packet|  VP  |Video Packet|
    |header|header|    (1)     |header|    (2)     |header|    (3)     |
    +------+------+------------+------+------------+------+------------+

    +------+------+------------+  +------+------------+
(f) | RTP  | VOP  |VOP fragment|  | RTP  |VOP fragment|
    |header|header|    (1)     |  |header|    (2)     |
    +------+------+------------+  +------+------------+

     Figure 2 - Examples of RTP packetized MPEG-4 Visual bitstream

    +------+-------------+  +------+------------+------------+
(a) | RTP  |First half of|  | RTP  |Last half of|Video Packet|
    |header|  VP header  |  |header| VP header  |            |
    +------+-------------+  +------+------------+------------+

    +------+------+----------+  +------+---------+------+------------+
(b) | RTP  | VOP  |First half|  | RTP  |Last half|  VP  |Video Packet|
    |header|header| of VP(1) |  |header| of VP(1)|header|    (2)     |
    +------+------+----------+  +------+---------+------+------------+

  Figure 3 - Examples of prohibited RTP packetization for MPEG-4 Visual
                               bitstream

4. RTP Packetization of MPEG-4 Audio bitstream

This section specifies RTP packetization rules for MPEG-4 Audio
bitstreams.
MPEG-4 Audio streams are formatted by the LATM (Low-overhead MPEG-4
Audio Transport Multiplex) tool [5], and the LATM-based streams are
then mapped onto RTP packets as described in the three subsections
below.

4.1 RTP Packet Format

LATM-based streams consist of a sequence of audioMuxElements, each of
which includes one or more audio frames. A complete audioMuxElement, or
a part of one, SHALL be mapped directly onto an RTP payload without
removing any audioMuxElement syntax elements (see Figure 4). The first
byte of each audioMuxElement SHALL be located at the first payload
location in an RTP packet.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |V=2|P|X|  CC   |M|     PT      |       sequence number         |RTP
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                           timestamp                           |Header
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |           synchronization source (SSRC) identifier            |
   +=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+
   |            contributing source (CSRC) identifiers             |
   |                             ....                              |
   +=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+
   |                                                               |RTP
   :                audioMuxElement (byte aligned)                 :Payload
   |                                                               |
   |                               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                               :...OPTIONAL RTP padding        |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

               Figure 4 - An RTP packet for MPEG-4 Audio

In order to decode the audioMuxElement, the following muxConfigPresent
information must be indicated by out-of-band means.

muxConfigPresent: If this value is set to 1, the audioMuxElement SHALL
include an indication bit "useSameStreamMux" and MAY include the
configuration information for audio compression, "StreamMuxConfig".
The useSameStreamMux bit indicates whether the StreamMuxConfig element
of the previous frame applies to the current frame.
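A minimal sender-side sketch of this mapping, combined with the marker
bit and fragmentation rules given in Sections 4.2 and 4.3. Each call
handles one byte-aligned audioMuxElement, so every element starts at
the first payload byte of a new packet, and only the packet carrying
the complete element or its last fragment has M set. Assumptions not
taken from this document: the function names are hypothetical, all
fragments of one element reuse the same timestamp, and the caller
chooses the split offsets (Section 4.3 asks that they fall between
payload elements rather than inside one).

```python
import struct


def rtp_header(pt: int, marker: bool, seq: int, timestamp: int,
               ssrc: int) -> bytes:
    """12-byte RTP fixed header with V=2 and P, X, CC all zero (Figure 4)."""
    return struct.pack("!BBHII", 0x80, (0x80 if marker else 0) | (pt & 0x7F),
                       seq & 0xFFFF, timestamp & 0xFFFFFFFF,
                       ssrc & 0xFFFFFFFF)


def packetize_audio_mux_element(element: bytes, split_points: list, pt: int,
                                seq: int, timestamp: int, ssrc: int) -> list:
    """Map one audioMuxElement onto one or more RTP packets.

    split_points are byte offsets at which a new packet starts; an empty
    list yields a single packet carrying the complete element.
    """
    cuts = sorted(set(split_points))
    if any(not 0 < c < len(element) for c in cuts):
        raise ValueError("split points must lie strictly inside the element")
    bounds = [0, *cuts, len(element)]
    packets = []
    for i in range(len(bounds) - 1):
        is_last = i == len(bounds) - 2  # M bit: complete element or last part
        header = rtp_header(pt, is_last, seq + i, timestamp, ssrc)
        packets.append(header + element[bounds[i]:bounds[i + 1]])
    return packets
```

The sequence number is masked to 16 bits, so it wraps from 65535 to 0
as RTP requires; the next element would start at seq + len(packets).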
4.2 Use of RTP Header Fields for MPEG-4 Audio

Payload Type (PT): The payload type is assigned specifically to the
MPEG-4 Audio RTP payload format. If this assignment is made
dynamically, it can be carried out by out-of-band means such as H.245
or SDP.

Marker (M) bit: The marker bit indicates audioMuxElement boundaries. It
is set to one to indicate that the RTP packet contains a complete
audioMuxElement or the last fragment of an audioMuxElement.

Timestamp: The timestamp indicates the composition time, or the
presentation time in a decoder without a compositor. Timestamps are
recommended to start at a random value for security reasons.

Unless specified by out-of-band means, the resolution of the timestamp
is set to its default value of 90 kHz.

Sequence Number: Incremented by one for each RTP packet sent, starting,
for security reasons, with a random value.

SSRC, CC and CSRC fields are used as described in RFC 1889 [8].

4.3 Fragmentation of MPEG-4 Audio bitstream

It is desirable to put one audioMuxElement in each RTP packet. This
poses no problem if the audioMuxElement can be kept small enough that
the RTP packet containing it does not exceed the path-MTU. If it
cannot, the audioMuxElement MAY be fragmented and spread across
multiple packets according to the following rules:

(1) "payloadMux", which consists of payload elements, MAY be fragmented
across several RTP packets, so that each of those RTP packets contains
one or more payload elements. Individual payload elements themselves
SHOULD NOT be fragmented.

(2) If the audioMuxElement includes StreamMuxConfig, StreamMuxConfig
SHALL be included in the RTP packet that contains the first payload
element.

5. MIME type registration for MPEG-4 Audio/Visual streams

The following sections describe the MIME type registrations for MPEG-4
Audio/Visual streams.
MIME type registration and SDP usage for MPEG-4 Visual
streams are described in Sections 5.1 and 5.2, respectively, while MIME
type registration and SDP usage for MPEG-4 Audio streams are described
in Sections 5.3 and 5.4, respectively.

(In the following sections, "XXXX" stands for the RFC number to be
assigned to this document.)

5.1 MIME type registration for MPEG-4 Visual

MIME media type name: video

MIME subtype name: MP4V

Required parameters: none

Optional parameters:
   rate: This parameter is used only for RTP transport. It indicates the
   resolution of the timestamp field in the RTP header. If this
   parameter is not specified, its default value of 90000 (90 kHz) is
   used.

   profile-level-id: A decimal representation of the MPEG-4 Visual
   Profile and Level indication value (profile_and_level_indication)
   defined in Table G-1 of ISO/IEC 14496-2 [2][4].

   config: A hexadecimal representation of an octet string that
   expresses the MPEG-4 Visual configuration information, as defined in
   subclause 6.2.1 "Start codes" of ISO/IEC 14496-2 [2][4][9]. The
   configuration information is mapped onto the octet string MSB-first:
   the first bit of the configuration information SHALL be located at
   the MSB of the first octet. The configuration information indicated
   by this parameter SHALL be the same as the configuration information
   in the corresponding MPEG-4 Visual stream, except for
   first_half_vbv_occupancy and latter_half_vbv_occupancy, if present,
   which may vary in the repeated configuration information inside an
   MPEG-4 Visual stream (see subclause 6.2.1 "Start codes" of ISO/IEC
   14496-2).

   The parameter "profile-level-id" MAY be used in the capability
   exchange/announcement procedure to indicate the MPEG-4 Visual
   Profile and Level combination of which the MPEG-4 Visual codec is
   capable.
   The parameter "config" MAY be used to indicate the configuration of
   the corresponding MPEG-4 Visual bitstream, but SHALL NOT be used to
   indicate the codec capability in the capability exchange procedure.

   Example usages for these parameters are:
   - MPEG-4 Visual Simple Profile/Level 1:
       Content-type: video/mp4v; profile-level-id=1

   - MPEG-4 Visual Core Profile/Level 2:
       Content-type: video/mp4v; profile-level-id=34

   - MPEG-4 Visual Advanced Real Time Simple Profile/Level 1:
       Content-type: video/mp4v; profile-level-id=145

Published specification:
   The specifications for MPEG-4 Visual streams are presented in ISO/IEC
   14496-2 [2][4][9]. The RTP payload format is described in RFCXXXX.

Encoding considerations:
   Video bitstreams must be generated according to the MPEG-4 Visual
   specification (ISO/IEC 14496-2). A video bitstream is binary data and
   must be encoded for non-binary transport (for email, Base64 encoding
   is sufficient). This type is also defined for transfer via RTP. The
   RTP packets MUST be packetized according to the MPEG-4 Visual RTP
   payload format defined in RFCXXXX.

Security considerations:
   See Section 6 of RFCXXXX.

Interoperability considerations:
   MPEG-4 Visual provides a large and rich set of tools for the coding
   of visual objects. For effective implementation of the standard,
   subsets of the MPEG-4 Visual tool sets have been provided for use in
   specific applications. These subsets, called 'Profiles', limit the
   size of the tool set a decoder is required to implement. In order to
   restrict computational complexity, one or more Levels are set for
   each Profile.
   A Profile@Level combination allows:

   o  a codec builder to implement only the subset of the standard that
      is needed, while maintaining interworking with other MPEG-4
      devices included in the same combination, and

   o  checking whether MPEG-4 devices comply with the standard
      ('conformance testing').

   The visual stream SHALL be compliant with the MPEG-4 Visual
   Profile@Level specified by the parameter "profile-level-id".
   Interoperability between a sender and a receiver may be achieved by
   specifying the parameter "profile-level-id" in the MIME content, or
   by arranging, in the capability exchange/announcement procedure, for
   this parameter to be set mutually to the same value.

Applications which use this media type:
   Audio and visual streaming and conferencing tools, Internet messaging
   and email applications.

Additional information: none

Person & email address to contact for further information:
   The authors of RFCXXXX. (See Section 8.)

Intended usage: COMMON

Author/Change controller:
   The authors of RFCXXXX. (See Section 8.)

5.2 SDP usage of MPEG-4 Visual

The MIME media type video/MP4V string is mapped to fields in the
Session Description Protocol (SDP), RFC 2327, as follows:

o  The MIME type (video) goes in SDP "m=" as the media name.

o  The MIME subtype (MP4V) goes in SDP "a=rtpmap" as the encoding name.

o  The optional parameter "rate" goes in "a=rtpmap" as the clock rate.

o  The optional parameters "profile-level-id" and "config" MAY go in
   the "a=fmtp" line to indicate the coder capability and
   configuration, respectively. These parameters are expressed as a
   MIME media type string, in the form of a semicolon-separated list of
   parameter=value pairs.
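The mapping above can be sketched as a small helper that emits the SDP
media lines for an MP4V stream. It is a hypothetical illustration: the
function name and its parameters are assumptions, and the "; "
separator between fmtp parameters follows the examples in this section.
The config octet string is written as upper-case hexadecimal, matching
the example value. In an actual SDP description each attribute,
including a long a=fmtp line, occupies a single CRLF-terminated line.

```python
def mp4v_sdp(port: int, pt: int, rate: int = 90000, profile_level_id=None,
             config: bytes = None, num_ports: int = 1) -> str:
    """Build the m=, a=rtpmap and optional a=fmtp lines for video/MP4V."""
    port_field = f"{port}/{num_ports}" if num_ports > 1 else str(port)
    lines = [f"m=video {port_field} RTP/AVP {pt}",  # MIME type -> media name
             f"a=rtpmap:{pt} MP4V/{rate}"]          # subtype and clock rate
    params = []
    if profile_level_id is not None:
        params.append(f"profile-level-id={profile_level_id}")
    if config is not None:
        # Octet string, MSB-first, as hexadecimal digits.
        params.append(f"config={config.hex().upper()}")
    if params:
        lines.append(f"a=fmtp:{pt} " + "; ".join(params))
    return "\r\n".join(lines)
```

For instance, mp4v_sdp(49170, 98, profile_level_id=34, num_ports=2)
reproduces the Core Profile/Level 2 example, with its three lines
separated by CRLF.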
   The following are some examples of media representation in SDP:

   Simple Profile/Level 1, rate=90000 (90 kHz), "profile-level-id" and
   "config" are present in the "a=fmtp" line:
      m=video 49170/2 RTP/AVP 98
      a=rtpmap:98 MP4V/90000
      a=fmtp:98 profile-level-id=1;
         config=000001B001000001B5090000010000000120008440FA282C2090A21F

   Core Profile/Level 2, rate=90000 (90 kHz), "profile-level-id" is
   present in the "a=fmtp" line:
      m=video 49170/2 RTP/AVP 98
      a=rtpmap:98 MP4V/90000
      a=fmtp:98 profile-level-id=34

   Advanced Real Time Simple Profile/Level 1, rate=25 (25 Hz),
   "profile-level-id" is present in the "a=fmtp" line:
      m=video 49170/2 RTP/AVP 98
      a=rtpmap:98 MP4V/25
      a=fmtp:98 profile-level-id=145

5.3 MIME type registration of MPEG-4 Audio

   MIME media type name: audio

   MIME subtype name: MP4A

   Required parameters:
      rate: the rate parameter indicates the RTP timestamp clock rate.
      The default value is 90000. Other rates MAY be specified only if
      they are set to the same value as the audio sampling rate (number
      of samples per second).

   Optional parameters:
      profile-level-id: a decimal representation of the MPEG-4 Audio
      Profile Level indication value defined in ISO/IEC 14496-1 [6].
      This parameter indicates which MPEG-4 Audio tool subsets the
      decoder is capable of using.

      object: a decimal representation of the MPEG-4 Audio Object Type
      value defined in ISO/IEC 14496-3 [5]. This parameter specifies the
      tool to be used by the coder. It MAY be used to limit the
      capability within the specified "profile-level-id".

      bitrate: the data rate for the audio bit stream, in bits per
      second.

      cpresent: this parameter indicates whether (1) or not (0) audio
      payload configuration data has been multiplexed into an RTP
      payload (see Section 4.1 of this document).

      config: a hexadecimal representation of an octet string that
      expresses the audio payload configuration data "StreamMuxConfig",
      as defined in ISO/IEC 14496-3 [5]. Configuration data is mapped
      onto the octet string on an MSB-first basis: the first bit of the
      configuration data SHALL be located at the MSB of the first octet.
      In the last octet, zero-padding bits, if necessary, shall follow
      the configuration data. If the configuration data is large, it is
      RECOMMENDED that it be conveyed in-band (cpresent set to 1).

      ptime: RECOMMENDED duration of each packet in milliseconds.

   Published specification:
      Payload format specifications are described in this document.
      Encoding specifications are provided in ISO/IEC 14496-3 [3][5].

   Encoding considerations:
      This type is only defined for transfer via RTP.

   Security considerations:
      See Section 6 of RFCXXXX.

   Interoperability considerations:
      MPEG-4 Audio provides a large and rich set of tools for the
      coding of audio objects. For effective implementation of the
      standard, subsets of the MPEG-4 Audio tool sets, similar to those
      used in MPEG-4 Visual, have been provided (see Section 5.1).

      The audio stream SHALL be compliant with the MPEG-4 Audio
      Profile@Level specified by the parameter "profile-level-id".
      Interoperability between a sender and a receiver may be achieved
      by specifying the parameter "profile-level-id" in MIME content, or
      by agreeing on a common value for this parameter in the capability
      exchange procedure. Furthermore, the "object" parameter can be
      used to limit the capability within the specified Profile@Level in
      capability exchange.

   Applications which use this media type:
      Audio and video streaming and conferencing tools.

   Additional information: none

   Person & email address to contact for further information:
      See Section 8 of RFCXXXX.

   Intended usage: COMMON

   Author/Change controller:
      See Section 8 of RFCXXXX.

5.4 SDP usage of MPEG-4 Audio

   The MIME media type audio/MP4A string is mapped to fields in the
   Session Description Protocol (SDP), RFC 2327, as follows:

   o  The MIME type (audio) goes in SDP "m=" as the media name.

   o  The MIME subtype (MP4A) goes in SDP "a=rtpmap" as the encoding
      name.

   o  The required parameter "rate" goes in "a=rtpmap" as the clock
      rate.

   o  The optional parameter "ptime" goes in the SDP "a=ptime"
      attribute.

   o  The optional parameter "profile-level-id" goes in the "a=fmtp"
      line to indicate the coder capability. The "object" parameter and
      the payload-format-specific parameters "bitrate", "cpresent" and
      "config" also go in the "a=fmtp" line. If the string after
      "config=" is large, the configuration data should not be
      transmitted in SDP but should instead be transmitted in-band.
      These parameters are expressed as a MIME media type string, in the
      form of a semicolon-separated list of parameter=value pairs.

   The following are some examples of the media representation in SDP:

   For 6 kb/s CELP bitstreams (with an audio sampling rate of 8 kHz):
      m=audio 49230 RTP/AVP 96
      a=rtpmap:96 MP4A/8000
      a=fmtp:96 profile-level-id=9;object=8;cpresent=0;
         config=9128B1071070
      a=ptime:20

   For 64 kb/s AAC LC stereo bitstreams (with an audio sampling rate of
   24 kHz):
      m=audio 49230 RTP/AVP 96
      a=rtpmap:96 MP4A/24000
      a=fmtp:96 profile-level-id=1; bitrate=64000; cpresent=0;
         config=9122620000

   In the above two examples, audio configuration data is not
   multiplexed into the RTP payload and is described only in SDP.
   Furthermore, the "clock rate" is set to the audio sampling rate.

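
   In both audio examples above, the sampling rate is also carried
   inside "config". The Python sketch below reads it back. It is a
   sketch under stated assumptions, not a StreamMuxConfig parser: the
   bit offsets (a 5-bit audio object type at bit 11, a 4-bit sampling
   frequency index at bit 16 and a 4-bit channel configuration at
   bit 20, counted from the MSB of the first octet) were inferred only
   from the two example strings "9128B1071070" and "9122620000", and may
   not hold for other StreamMuxConfig layouts. The index-to-rate table
   is the samplingFrequencyIndex table of ISO/IEC 14496-3, restricted to
   indices 0 through 11. The names read_bits and parse_audio_config are
   hypothetical.

```python
# Hypothetical field offsets inferred from the two example "config"
# strings in this section; they are NOT taken from the StreamMuxConfig
# syntax and a real parser must follow ISO/IEC 14496-3.
OBJECT_TYPE_OFFSET = 11   # 5-bit audio object type
SF_INDEX_OFFSET = 16      # 4-bit samplingFrequencyIndex
CHANNELS_OFFSET = 20      # 4-bit channel configuration

# samplingFrequencyIndex values 0x0 to 0xB (ISO/IEC 14496-3)
SAMPLING_RATES = [96000, 88200, 64000, 48000, 44100, 32000,
                  24000, 22050, 16000, 12000, 11025, 8000]


def read_bits(config_hex: str, offset: int, width: int) -> int:
    """Read `width` bits starting `offset` bits after the MSB of octet 0."""
    data = bytes.fromhex(config_hex)
    total_bits = len(data) * 8
    if offset + width > total_bits:
        raise ValueError("config is too short")
    # Treat the octet string as one big-endian integer (MSB-first mapping)
    value = int.from_bytes(data, "big")
    shift = total_bits - offset - width
    return (value >> shift) & ((1 << width) - 1)


def parse_audio_config(config_hex: str) -> dict:
    """Extract object type, sampling rate and channels (assumed offsets)."""
    sf_index = read_bits(config_hex, SF_INDEX_OFFSET, 4)
    if sf_index >= len(SAMPLING_RATES):
        raise ValueError(f"unsupported samplingFrequencyIndex {sf_index}")
    return {
        "object": read_bits(config_hex, OBJECT_TYPE_OFFSET, 5),
        "sampling_rate": SAMPLING_RATES[sf_index],
        "channels": read_bits(config_hex, CHANNELS_OFFSET, 4),
    }
```

   Applied to the two examples, the sketch yields object type 8 at
   8000 Hz with one channel for the CELP string, and object type 2
   (AAC LC) at 24000 Hz with two channels for the AAC string, which is
   consistent with the "object" parameter and the sampling rates stated
   above.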
   If the clock rate has been set to its default value and it is
   necessary to obtain the audio sampling rate, this can be done by
   parsing the "config" parameter (see the following example).

      m=audio 49230 RTP/AVP 96
      a=rtpmap:96 MP4A/90000
      a=fmtp:96 object=8; cpresent=0; config=9128B1071070

   The following example shows a stream in which the audio
   configuration data is carried in the RTP payload.

      m=audio 49230 RTP/AVP 96
      a=rtpmap:96 MP4A/90000
      a=fmtp:96 object=13; cpresent=1

6. Security Considerations

   RTP packets using the payload format defined in this specification
   are subject to the security considerations discussed in the RTP
   specification [8]. This implies that confidentiality of the media
   streams is achieved by encryption. Because the data compression
   used with this payload format is applied end-to-end, encryption may
   be performed on the compressed data, so there is no conflict between
   the two operations.

   The complete MPEG-4 system allows for transport of a wide range of
   content, including Java applets (MPEG-J) and scripts. Since this
   payload format is restricted to audio and video streams, it is not
   possible to transport such active content in this format.

7. References

   [1] Bradner, S., "The Internet Standards Process -- Revision 3",
       BCP 9, RFC 2026, October 1996.

   [2] ISO/IEC 14496-2:1999, "Information technology - Coding of
       audio-visual objects - Part 2: Visual", December 1999.

   [3] ISO/IEC 14496-3:1999, "Information technology - Coding of
       audio-visual objects - Part 3: Audio", December 1999.

   [4] ISO/IEC 14496-2:1999/FDAM1:2000, December 1999.

   [5] ISO/IEC 14496-3:1999/FDAM1:2000, December 1999.

   [6] ISO/IEC 14496-1:1999, "Information technology - Coding of
       audio-visual objects - Part 1: Systems", December 1999.

   [7] Bradner, S., "Key words for use in RFCs to Indicate Requirement
       Levels", BCP 14, RFC 2119, March 1997.

   [8] H. Schulzrinne, S. Casner, R.
       Frederick, V. Jacobson, "RTP: A Transport Protocol for Real-Time
       Applications", RFC 1889, Internet Engineering Task Force, January
       1996.

   [9] ISO/IEC 14496-2/COR1, "Information technology - Coding of
       audio-visual objects - Part 2: Visual, Technical corrigendum 1",
       March 2000.

8. Authors' Addresses

   Yoshihiro Kikuchi
   Toshiba Corporation
   1, Komukai Toshiba-cho, Saiwai-ku, Kawasaki, 212-8582, Japan
   Email: yoshihiro.kikuchi@toshiba.co.jp

   Yoshinori Matsui
   Matsushita Electric Industrial Co., Ltd.
   1006, Kadoma, Kadoma-shi, Osaka, Japan
   Email: matsui@drl.mei.co.jp

   Toshiyuki Nomura
   NEC Corporation
   4-1-1, Miyazaki, Miyamae-ku, Kawasaki, Japan
   Email: t-nomura@ccm.cl.nec.co.jp

   Shigeru Fukunaga
   Oki Electric Industry Co., Ltd.
   1-2-27 Shiromi, Chuo-ku, Osaka 540-6025, Japan
   Email: fukunaga444@oki.co.jp

   Hideaki Kimata
   Nippon Telegraph and Telephone Corporation
   1-1, Hikari-no-oka, Yokosuka-shi, Kanagawa, Japan
   Email: kimata@nttvdt.hil.ntt.co.jp

Full Copyright Statement

   Copyright (C) The Internet Society (date). All Rights Reserved.

   This document and translations of it may be copied and furnished to
   others, and derivative works that comment on or otherwise explain it
   or assist in its implementation may be prepared, copied, published
   and distributed, in whole or in part, without restriction of any
   kind, provided that the above copyright notice and this paragraph
   are included on all such copies and derivative works. However, this
   document itself may not be modified in any way, such as by removing
   the copyright notice or references to the Internet Society or other
   Internet organizations, except as needed for the purpose of
   developing Internet standards in which case the procedures for
   copyrights defined in the Internet Standards process must be
   followed, or as required to translate it into languages other than
   English.

   The limited permissions granted above are perpetual and will not be
   revoked by the Internet Society or its successors or assigns.