idnits 2.17.1 draft-ietf-avt-rtp-mpeg4-es-01.txt: Checking boilerplate required by RFC 5378 and the IETF Trust (see https://trustee.ietf.org/license-info): ---------------------------------------------------------------------------- ** Looks like you're using RFC 2026 boilerplate. This must be updated to follow RFC 3978/3979, as updated by RFC 4748. Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt: ---------------------------------------------------------------------------- ** Missing expiration date. The document expiration date should appear on the first and last page. ** The document seems to lack a 1id_guidelines paragraph about 6 months document validity -- however, there's a paragraph with a matching beginning. Boilerplate error? ** The document seems to lack a 1id_guidelines paragraph about the list of current Internet-Drafts. ** The document seems to lack a 1id_guidelines paragraph about the list of Shadow Directories. ** The document is more than 15 pages and seems to lack a Table of Contents. == No 'Intended status' indicated for this document; assuming Proposed Standard Checking nits according to https://www.ietf.org/id-info/checklist : ---------------------------------------------------------------------------- ** The document seems to lack an IANA Considerations section. (See Section 2.2 of https://www.ietf.org/id-info/checklist for how to handle the case when there are no actions for IANA.) ** The document seems to lack separate sections for Informative/Normative References. All references will be assumed normative when checking for downward references. ** There are 267 instances of too long lines in the document, the longest one being 27 characters in excess of 72. ** The abstract seems to contain references ([2], [3], [4]), which it shouldn't. Please replace those with straight textual mentions of the documents in question. 
Miscellaneous warnings: ---------------------------------------------------------------------------- == Line 879 has weird spacing: '...cessing to ca...' -- The document seems to lack a disclaimer for pre-RFC5378 work, but may have content which was first submitted before 10 November 2008. If you have contacted all the original authors and they are all willing to grant the BCP78 rights to the IETF Trust, then this is fine, and you can ignore this comment. If not, you may need to add the pre-RFC5378 disclaimer. (See the Legal Provisions document at https://trustee.ietf.org/license-info for more information.) -- The document date (May 31, 2000) is 8731 days in the past. Is this intentional? Checking references for intended status: Proposed Standard ---------------------------------------------------------------------------- (See RFCs 3967 and 4897 for information about using normative references to lower-maturity documents in RFCs) -- Missing reference section? '1' on line 14 looks like a reference -- Missing reference section? '2' on line 666 looks like a reference -- Missing reference section? '3' on line 785 looks like a reference -- Missing reference section? '4' on line 666 looks like a reference -- Missing reference section? '6' on line 50 looks like a reference -- Missing reference section? '7' on line 121 looks like a reference -- Missing reference section? '9' on line 666 looks like a reference -- Missing reference section? '8' on line 873 looks like a reference -- Missing reference section? '5' on line 785 looks like a reference -- Missing reference section? '11' on line 760 looks like a reference Summary: 10 errors (**), 0 flaws (~~), 2 warnings (==), 12 comments (--). Run idnits with the --verbose option for more detailed information about the items above. 
-------------------------------------------------------------------------------- 1 Internet Engineering Task Force Yoshihiro Kikuchi - Toshiba 2 Internet Draft Toshiyuki Nomura - NEC 3 Document: draft-ietf-avt-rtp-mpeg4-es-01.txt Shigeru Fukunaga - Oki 4 Yoshinori Matsui - Matsushita 5 Hideaki Kimata - NTT 6 May 31, 2000 8 RTP payload format for MPEG-4 Audio/Visual streams 10 Status of this Memo 12 This document is an Internet-Draft and is in full conformance with all 13 provisions of Section 10 of RFC2026 [1]. 15 Internet-Drafts are working documents of the Internet Engineering Task 16 Force (IETF), its areas, and its working groups. Note that other groups 17 may also distribute working documents as Internet-Drafts. Internet-Drafts 18 are draft documents valid for a maximum of six months and may be updated, 19 replaced, or obsoleted by other documents at any time. It is 20 inappropriate to use Internet- Drafts as reference material or to cite 21 them other than as "work in progress." 22 The list of current Internet-Drafts can be accessed at 23 http://www.ietf.org/ietf/1id-abstracts.txt 24 The list of Internet-Draft Shadow Directories can be accessed at 25 http://www.ietf.org/shadow.html. 27 Abstract 29 This document describes RTP payload formats for the carriage of MPEG-4 30 Audio and Visual streams[2][3], and an RTCP format for MPEG-4 upstream 31 messages functionalities[4]. In this specification, MPEG-4 Audio/Visual 32 bitstreams are directly mapped into RTP packets. The RTP header fields 33 usage and the fragmentation rule for MPEG-4 Visual and Audio bitstreams 34 are specified. It also specifies an RTCP packet usage to carry the MPEG-4 35 upstream messages. In addition, MIME type registrations and SDP usages 36 for the MPEG-4 Audio and Visual streams are defined in this document. 38 1. Introduction 40 1.1 Why MPEG-4 Audio/Visual RTP format needed? 
42 The RTP payload formats described in this Internet-Draft specify how 43 MPEG-4 Audio and Visual streams are fragmented and mapped directly 44 onto RTP packets. 46 H.323 terminals could be an example where such RTP payload formats are 47 used. MPEG-4 Audio/Visual streams are not managed by Object Descriptors 48 of MPEG-4 Systems[6] but by H.245. The streams are directly mapped onto 49 RTP packets without using the synchronization functionality of MPEG-4 50 Systems [6]. 52 The semantics of RTP headers in such cases need to be clearly defined, 53 including the association with the MPEG-4 Audio/Visual data elements. In 54 addition, it would be beneficial to define the fragmentation rule of RTP 55 packets for MPEG-4 Video streams so as to enhance error resiliency by 56 utilizing the error resilience tools provided inside the MPEG-4 Video 57 stream. However, these items are not covered by other RTP payload format 58 proposals. 60 1.2 MPEG-4 Visual RTP payload format 62 MPEG-4 Visual is a visual coding standard with many new functionalities: 63 high coding efficiency, high error resiliency, multiple arbitrary shaped 64 object based coding, etc. [2]. It covers a wide range of bitrates from 65 several Kbps to many Mbps. Owing to its error resilience functionalities, it also covers a wide variety of networks, 66 ranging from networks guaranteed to be almost error-free to mobile networks with 67 high error rates. 69 A fragmentation rule for an MPEG-4 Visual bitstream into RTP packets is 70 defined in this document. Since MPEG-4 Visual is used for a wide variety 71 of networks, it is desirable not to apply too much restriction to the 72 fragmentation. A fragmentation rule like "a single video packet shall 73 always be mapped on a single RTP packet" may be inappropriate. On the 74 other hand, a careless, media-unaware fragmentation may cause degradation 75 of the error resiliency and the bandwidth efficiency.
The fragmentation 76 rule described in this document is flexible, defining only the minimum 77 rules and guidelines needed to prevent meaningless fragmentation and to 78 utilize the error resilience functionality of MPEG-4 Visual. 80 For video coding media such as H.261 or MPEG-1/2, the additional media 81 specific RTP header works effectively for recovering, e.g., a picture 82 header corrupted by packet losses. However, there are error resilience 83 functionalities inside MPEG-4 Visual to recover corrupt headers. These 84 functionalities can commonly be used on RTP/IP networks as well as other 85 networks (H.223/mobile, MPEG-2/TS, etc.). Therefore, no extra RTP header 86 fields are defined in the MPEG-4 Visual RTP payload format. 88 1.3 Consideration on the MPEG-4 Audio RTP payload format 90 MPEG-4 Audio is a new kind of audio standard that integrates many 91 different types of audio coding tools. It also supports a mechanism 92 representing synthesized sounds. Low-overhead MPEG-4 Audio Transport 93 Multiplex (LATM) manages the sequence of audio data compressed or 94 represented by MPEG-4 Audio tools with relatively small 95 overhead. In audio-only applications, it is therefore desirable to map the LATM-based MPEG-4 Audio 96 bitstreams directly into RTP 97 packets without using MPEG-4 Systems. 99 Furthermore, if the payload of a packet is a single audio frame, a packet 100 loss does not impair the decodability of adjacent packets. Therefore, a 101 payload specific header for MPEG-4 Audio is not required, just as none is required 102 for other audio coders. 104 1.4 MPEG-4 Audio/Visual upstream messaging on RTCP packets 106 Some particular tools of MPEG-4 Audio/Visual support upstream messaging 107 functionalities. These messages are extremely Audio/Visual specific, 108 since coders directly use these messages for controlling coding 109 parameters.
From the point of view of controlling parameters, these 110 messages should be transmitted without delay. Therefore, these messages 111 are directly mapped onto some kind of low delay RTCP packets. The use of 112 this type of RTCP packets is limited to the case when the MPEG-4 upstream 113 functionalities in some particular profiles are used (e.g. MPEG-4 Visual 114 Advanced Real Time Simple Profile, NEWPRED tool). 116 2. Conventions used in this document 118 The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", 119 "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this 120 document are to be interpreted as described in RFC-2119 [7]. 122 3. RTP Packetization of MPEG-4 Visual bitstream 124 This section specifies the RTP packetization rule for MPEG-4 Visual 125 content. An MPEG-4 Visual bitstream is mapped directly onto the RTP 126 payload without any addition of extra header fields or removal of any 127 Visual syntax elements. The Combined Configuration/Elementary streams 128 mode is used so that the configuration information is carried in the same 129 RTP port as the elementary stream. (see 6.2.1 "Start codes" of ISO/IEC 130 14496-2 [2][9][4]) 131 When the short video header mode is used, RTP payload format for H.263 132 specified in the relevant RFCs or other standards MAY be used. 134 0 1 2 3 135 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 136 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 137 |V=2|P|X| CC |M| PT | sequence number | RTP 138 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 139 | timestamp | Header 140 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 141 | synchronization source (SSRC) identifier | 142 +=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+ 143 | contributing source (CSRC) identifiers | 144 | .... 
| 145 +=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+ 146 | | RTP 147 | MPEG-4 Visual stream (byte aligned) | Payload 148 | | 149 | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 150 | :...OPTIONAL RTP padding | 151 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 153 Figure 1 - An RTP packet for MPEG-4 Visual stream 155 3.1 RTP header fields usage for MPEG-4 Visual 157 Payload Type (PT): A distinct payload type should be assigned to specify 158 the MPEG-4 Visual RTP payload format. If the dynamic payload type assignment 159 is used, it is specified by some out-of-band means (e.g. H.245, SDP, 160 etc.) that the MPEG-4 Visual payload format is used for the corresponding 161 RTP packet. 163 Extension (X) bit: Defined by the RTP profile used. 165 Sequence Number: Incremented by one for each RTP data packet sent. It 166 starts with a random initial value for security reasons. 168 Marker (M) bit: The marker bit is set to one to indicate the last RTP 169 packet (or only RTP packet) of a VOP. 171 Timestamp: The timestamp indicates the composition time, or the 172 presentation time in a no-compositor decoder, with a constant random 173 offset added for security reasons. For a video object plane, it is defined by 174 vop_time_increment (in units of 1/vop_time_increment_resolution seconds) 175 plus the cumulative number of whole seconds specified by modulo_time_base 176 and time_code of Group_of_VideoObjectPlane() if present. In the case of 177 interlaced video, a VOP consists of lines from two fields and the 178 timestamp indicates the composition time of the first field. If the RTP 179 packet contains only configuration information and/or 180 Group_of_VideoObjectPlane(), the composition time of the subsequent VOP 181 in the coding order is used. If the RTP packet contains only 182 visual_object_sequence_end_code, the composition time of the immediately 183 preceding VOP in the coding order is used. 185 Unless specified by an out-of-band means (e.g.
SDP parameter or MIME 186 parameter as defined in section 6), the resolution of the timestamp is 187 set to its default (90 kHz). 189 SSRC, CC and CSRC fields are used as described in RFC 1889 [8]. 191 3.2 Fragmentation of MPEG-4 Visual bitstream 193 A fragmented MPEG-4 Visual bitstream is mapped directly onto the RTP 194 payload without any addition of extra header fields or removal of any 195 Visual syntax elements. The Combined Configuration/Elementary streams 196 mode is used. The following rules apply to the fragmentation. 198 (1) The configuration information and Group_of_VideoObjectPlane() SHALL 199 be placed at the beginning of the RTP payload (just after the RTP header) 200 or just after the header of the syntactically upper layer function. 202 (2) If one or more headers exist in the RTP payload, the RTP payload 203 SHALL begin with the header of the syntactically highest function. 204 Note: The visual_object_sequence_end_code is regarded as the lowest 205 function. 207 (3) A header SHALL NOT be split across multiple RTP packets. 209 (4) Two or more VOPs SHALL be fragmented into different RTP packets so 210 that one RTP packet consists of the data bytes associated with a unique 211 presentation time (the one indicated in the timestamp field of the RTP 212 packet header). 214 (5) A single video packet SHOULD NOT be split across multiple RTP 215 packets. The size of a video packet SHOULD be adjusted such that the 216 resulting RTP packet is not larger than the path-MTU. A video packet MAY 217 be split across multiple RTP packets when the size of the video 218 packet is large.
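As an illustration of rules (4) and (5), the following is a minimal sketch, not a normative procedure, of how a sender might group the video packets of one VOP into RTP payloads. The function name group_video_packets and the parameter max_payload (the path-MTU minus the RTP/UDP/IP header sizes) are hypothetical. The sketch assumes that each input item is one complete video packet, that all items belong to the same VOP, and that any header is shorter than max_payload so that rule (3) is not violated when an oversized video packet has to be split.

```python
from typing import List, Tuple


def group_video_packets(video_packets: List[bytes],
                        max_payload: int) -> List[Tuple[bytes, bool]]:
    """Group the video packets of ONE VOP into RTP payloads.

    Returns (payload, marker) pairs. The marker is True only for the
    last RTP packet of the VOP, as described for the M bit in 3.1.
    """
    if max_payload <= 0:
        raise ValueError("max_payload must be positive")

    payloads: List[bytes] = []
    current = b""
    for vp in video_packets:
        # Flush the pending payload if this video packet does not fit.
        if current and len(current) + len(vp) > max_payload:
            payloads.append(current)
            current = b""
        if len(vp) > max_payload:
            # Rule (5): an oversized video packet MAY be split. Its
            # fragments are sent alone, so no header ever follows a
            # partial video packet inside one payload.
            for start in range(0, len(vp), max_payload):
                payloads.append(vp[start:start + max_payload])
        else:
            # Whole video packets are concatenated while they fit.
            current += vp
    if current:
        payloads.append(current)

    last = len(payloads) - 1
    return [(p, i == last) for i, p in enumerate(payloads)]
```

Calling the function once per VOP keeps each RTP packet tied to a single presentation time, which is what rule (4) asks for. Whether to concatenate video packets at all is a design choice that depends on the loss rate and bit-rate of the network.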
220 Here, header means: 221 - Configuration information (Visual Object Sequence Header, Visual Object 222 Header and Video Object Layer Header) 223 - visual_object_sequence_end_code 224 - The header of the entry point function for an elementary stream 225 (Group_of_VideoObjectPlane() or the header of VideoObjectPlane(), 226 video_plane_with_short_header(), MeshObject() or FaceObject()) 228 - The video packet header (video_packet_header() excluding 229 next_resync_marker()) 230 - The header of gob_layer() 231 See 6.2.1 "Start codes" of ISO/IEC 14496-2[2][9][4] for the definition of 232 the configuration information and the entry point functions. 234 The video packet starts with the VOP header or the video packet header, 235 followed by motion_shape_texture(), and ends with next_resync_marker() or 236 next_start_code(). 238 3.3 Examples of packetized MPEG-4 Visual bitstream 240 Considering that MPEG-4 Visual is used on a wide variety of networks from 241 several Kbps to many Mbps, from guaranteed networks which are almost 242 error-free to mobile networks with high error rates, it is desirable not 243 to apply too much restriction to the fragmentation. On the other hand, a 244 careless, media-unaware fragmentation will cause degradation of the error 245 resiliency and the bandwidth efficiency. The fragmentation criteria 246 described in 3.2 are flexible, defining only the minimum rules needed to prevent 247 meaningless fragmentation. 249 For video coding media such as H.261 or MPEG-1/2, the additional media 250 specific RTP header works effectively for recovering, e.g., a picture 251 header corrupted by packet losses. However, there is an error resilience 252 functionality inside MPEG-4 Visual to recover corrupt headers. This 253 functionality can commonly be used on RTP/IP networks as well as other 254 networks (H.223/mobile, MPEG-2/TS, etc.). Therefore, there is no strong 255 reason to define MPEG-4 Visual specific extra RTP header fields.
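Because the payload carries the MPEG-4 Visual syntax unmodified, a receiver can tell which header begins a payload from the leading start code alone. The following is a minimal sketch of such a check. The function name leading_header is hypothetical, and the start code values are quoted from memory of the "Start codes" table of ISO/IEC 14496-2, so they should be verified against that table before use. Payloads that begin with a video packet header (which starts with a resync marker, not a byte-aligned start code) or with continuation data yield None.

```python
from typing import Optional

START_CODE_PREFIX = b"\x00\x00\x01"

# Assumed start code values (last byte of the 32-bit start code);
# check them against ISO/IEC 14496-2 before relying on them.
NAMED_START_CODES = {
    0xB0: "visual_object_sequence_start_code",
    0xB1: "visual_object_sequence_end_code",
    0xB3: "group_of_vop_start_code",
    0xB5: "visual_object_start_code",
    0xB6: "vop_start_code",
}


def leading_header(payload: bytes) -> Optional[str]:
    """Name the start code at the beginning of an RTP payload, if any."""
    if len(payload) < 4 or payload[:3] != START_CODE_PREFIX:
        return None
    code = payload[3]
    if code <= 0x1F:
        return "video_object_start_code"
    if 0x20 <= code <= 0x2F:
        return "video_object_layer_start_code"
    return NAMED_START_CODES.get(code)
```

Under rule (2) the first header in a payload is also the syntactically highest one, so a single look at the first four bytes is enough for this classification.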
257 Figure 2 shows examples of RTP packets generated based on the criteria 258 described in 3.2. 260 (a) is an example of the first RTP packet or the random access point of 261 an MPEG-4 Visual bitstream. This RTP packet contains the configuration 262 information. According to the criterion (1), the Visual Object Sequence 263 Header (VS header) is placed at the beginning of the RTP payload, and the 264 Visual Object Header and the Video Object Layer Header (VO header, VOL 265 header) follow it. Since the fragmentation rule defined in 3.2 guarantees 266 that the configuration information, starting with 267 visual_object_sequence_start_code, is always placed at the beginning of 268 the RTP payload, RTP receivers can detect the random access point by 269 checking if the first 32-bit field of the RTP payload is 270 visual_object_sequence_start_code. 272 (b) is another example of the RTP packet containing the configuration 273 information. The difference from the example (a) is that this RTP packet 274 also contains a video packet in the VOP following the configuration 275 information. Since the length of the configuration information is 276 relatively short (typically several tens of bytes), an RTP packet containing 277 only the configuration information may increase the overhead. Therefore, 278 the configuration information and the immediately following GOV and/or (a 279 part of) VOP can be packetized into a single RTP packet like this 280 example. 282 (c) is an example of an RTP packet that contains 283 Group_of_VideoObjectPlane (GOV). Following the criterion (1), the GOV is 284 placed at the beginning of the RTP payload. It is a waste of RTP/IP 285 header overhead to generate an RTP packet containing only a GOV whose 286 length is 7 bytes. Therefore, (a part of) the following VOP can be placed 287 in the same RTP packet as shown in (c). 289 (d) is an example of the case where one video packet is packetized into 290 one RTP packet.
When the packet-loss rate of the underlying network is 291 high, this kind of packetization is recommended. It is strongly 292 recommended to set resync_marker_disable to 0 in the VOL header to enable 293 adjustment of the video packet size. Even when the RTP packet containing 294 the VOP header is discarded by a packet loss, the other RTP packets can 295 be decoded by using the HEC (Header Extension Code) information in the 296 video packet header. No extra RTP header field is necessary. 298 (e) is an example of the case where more than one video packet is 299 packetized into one RTP packet. This kind of packetization is effective 300 to save the overhead of RTP/IP headers if the bit-rate of the underlying 301 network is low. However, it will decrease the packet-loss resiliency 302 because multiple video packets are discarded by a single RTP packet loss. 303 The adequate number of video packets in an RTP packet and the RTP packet 304 length depend on the packet-loss rate and the bit-rate of the underlying 305 network. 307 Figure 3 shows examples of RTP packets prohibited by the criteria of 3.2. 309 Fragmentation of a header into multiple RTP packets, like (a), will not 310 only increase the overhead of RTP/IP headers but also decrease the error 311 resiliency. Therefore, it is prohibited by the criterion (3). 313 When concatenating more than one video packet into an RTP packet, VOP 314 header or video_packet_header() shall not be placed in the middle of the 315 RTP payload. The packetization like (b) is not allowed by the criterion 316 (2). The reason is error resiliency. Comparing this example with 317 Figure 2(d), two video packets are mapped onto two RTP packets in both 318 cases. However, the packet-loss resiliency differs. 319 When the second RTP packet is lost, both video packets 1 and 2 are lost 320 in the case of Figure 3(b) whereas only video packet 2 is lost in the 321 case of Figure 2(d).
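The loss comparison above can be worked through with a small sketch. It assumes, as a simplification, that a video packet becomes undecodable as soon as any RTP packet carrying part of it is lost. The function name damaged_video_packets and the two layout constants are hypothetical. Each layout lists, per RTP packet, the numbers of the video packets that have bytes in it.

```python
from typing import Iterable, List, Set


def damaged_video_packets(layout: List[Set[int]],
                          lost: Iterable[int]) -> List[int]:
    """Return the video packets hit by the loss of the given RTP packets.

    layout[i] is the set of video packet numbers that have at least one
    byte in RTP packet i; lost holds the indices of the lost RTP packets.
    """
    damaged: Set[int] = set()
    for index in lost:
        damaged |= layout[index]
    return sorted(damaged)


# One video packet per RTP packet (the permitted layout).
ONE_PER_PACKET = [{1}, {2}]

# Layout of Figure 3(b): the second RTP packet carries the last half of
# video packet 1 followed by video packet 2.
SPLIT_THEN_CONCATENATED = [{1}, {1, 2}]
```

Losing the second RTP packet (index 1) damages only video packet 2 in the first layout but both video packets in the second, which is the reason the second layout is prohibited.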
323 An RTP packet containing more than one VOPs, like (c), is not allowed. 325 +------+------+------+------+ 326 (a) | RTP | VS | VO | VOL | 327 |header|header|header|header| 328 +------+------+------+------+ 330 +------+------+------+------+------------+ 331 (b) | RTP | VS | VO | VOL |Video Packet| 332 |header|header|header|header| | 333 +------+------+------+------+------------+ 335 +------+-----+------------------+ 336 (c) | RTP | GOV |Video Object Plane| 337 |header| | | 338 +------+-----+------------------+ 340 +------+------+------------+ +------+------+------------+ 341 (d) | RTP | VOP |Video Packet| | RTP | VP |Video Packet| 342 |header|header| (1) | |header|header| (2) | 343 +------+------+------------+ +------+------+------------+ 345 +------+------+------------+------+------------+------+------------+ 346 (e) | RTP | VP |Video Packet| VP |Video Packet| VP |Video Packet| 347 |header|header| (1) |header| (2) |header| (3) | 348 +------+------+------------+------+------------+------+------------+ 350 Figure 2 - Examples of RTP packetized MPEG-4 Visual bitstream 352 +------+-------------+ +------+------------+------------+ 353 (a) | RTP |First half of| | RTP |Last half of|Video Packet| 354 |header| VP header | |header| VP header | | 355 +------+-------------+ +------+------------+------------+ 357 +------+------+----------+ +------+---------+------+------------+ 358 (b) | RTP | VOP |First half| | RTP |Last half| VP |Video Packet| 359 |header|header| of VP(1) | |header| of VP(1)|header| (2) | 360 +------+------+----------+ +------+---------+------+------------+ 362 +------+------+------------------+------+------------------+ 363 (c) | RTP | VOP |Video Object Plane| VOP |Video Object Plane| 364 |header|header| (1) |header| (2) | 365 +------+------+------------------+------+------------------+ 367 Figure 3 - Examples of prohibited RTP packetization for MPEG-4 Visual 368 bitstream 370 4. 
RTP Packetization of MPEG-4 Audio bitstream 372 When tools defined in MPEG-4 Systems are not used MPEG-4 Audio stream is 373 formatted by LATM (Low-overhead MPEG-4 Audio Transport Multiplex) 374 format[5], and then mapped onto RTP packets as described the subsequent 375 section. 377 4.1 RTP Packet Format 379 The LATM consists of the sequence of audioMuxElements that include one or 380 more audio frames. A complete audioMuxElement or the part of 381 audioMuxElements SHALL be mapped directly onto the RTP payload without 382 removal of any audioMuxElement syntax elements as shown in Figure 4. The 383 first byte of each audioMuxElement SHALL be located at the first payload 384 location of an RTP packet. 386 0 1 2 3 387 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 388 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 389 |V=2|P|X| CC |M| PT | sequence number |RTP 390 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 391 | timestamp |Header 392 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 393 | synchronization source (SSRC) identifier | 394 +=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+ 395 | contributing source (CSRC) identifiers | 396 | .... | 397 +=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+ 398 | |RTP 399 : audioMuxElement (byte aligned) :Payload 400 | | 401 | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 402 | :...OPTIONAL RTP padding | 403 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 404 Figure 4 - An RTP packet for MPEG-4 Audio 406 It is required for the audioMuxElement to indicate the following 407 muxConfigPresent information by an out-of-band means. 409 muxConfigPresent: If this information is set to 1, the audioMuxElement 410 SHALL include an indication bit "useSameStreamMux" and MAY include the 411 configuration information for audio compression "StreamMuxConfig". 
The 412 useSameStreamMux bit indicates whether the StreamMuxConfig element in the 413 previous frame is applied in the current frame. 415 4.2 RTP Header Fields Usage 417 Payload Type (PT): Distinct payload type should be assigned to specify 418 MPEG-4 Audio RTP payload format. If the dynamic payload type assignment 419 is used, it is specified by some out-of-band means (e.g. H.245, SDP, 420 etc.) that the MPEG-4 Audio payload format is used for the corresponding 421 RTP packet. 423 Marker (M) bit: The marker bit indicates audioMuxElement boundaries. This 424 bit is set to one to mark the RTP packet contains a complete 425 audioMuxElement or the last fragment of an audioMuxElement. 427 Timestamp: The timestamp indicates the composition time, or the 428 presentation time in a no-compositor decoder. Timestamps are recommended 429 to start at a random value for security reasons. 431 Unless specified by an out-of-band means, the resolution of the timestamp 432 is set to its default (90 kHz). 434 Sequence Number: Increment by one for each RTP packet sent. It starts 435 with a random value for security reasons. 437 SSRC, CC and CSRC fields are used as described in RFC 1889 [8]. 439 4.3 Fragmentation of MPEG-4 Audio bitstream 441 It is desirable to put one audioMuxElement per RTP packet. The size of an 442 audioMuxElement is tried to be adjusted such that the resulting RTP 443 packet is not larger than the path-MTU. If this is not possible, the 444 audioMuxElement MAY be fragmented across several packets based on the 445 following rules. 447 (1) "payloadMux" which consists of payload elements MAY be fragmented 448 into several RTP packets so that one RTP packet consists of one or more 449 payload elements. A payload element SHOULD NOT be fragmented. 451 (2) If the audioMuxElement includes StreamMuxConfig, StreamMuxConfig 452 SHALL be included into the RTP packet containing the first payload 453 element. 455 5. 
RTCP Packetization of MPEG-4 upstream messages 457 This section specifies the usage of particular RTCP packets to carry the 458 upstream messages generated using the MPEG-4 Audio/Visual upstream 459 messaging functionalities. In the current specification, NEWPRED in the 460 MPEG-4 Visual Advanced Real Time Simple (ARTS) Profile[4] is the only tool 461 that uses this RTCP payload specification. This particular RTCP packet 462 SHALL ONLY be used when it is indicated by some out-of-band means that 463 the corresponding MPEG-4 Visual codec is compliant with the ARTS profile 464 and it is indicated in the configuration information of the MPEG-4 Visual 465 bitstream that the NEWPRED tool is enabled (newpred_enable is set to 1). 467 5.1. Abstract of NEWPRED in the ARTS profile 468 NEWPRED in the ARTS profile is an error resilience tool using the 469 upstream messages from the decoder to the encoder. As inter-frame 470 coding is used in the MPEG-4 Visual standard, the image degradation caused by 471 packet loss propagates to the following several frames. In order to 472 prevent this temporal error propagation, the reference frames of the 473 inter-frame coding are switched according to the upstream messages in 474 NEWPRED. Because correctly decoded frames are used as reference frames, the 475 error propagation is stopped. 477 As neither re-transmission nor intra refresh is used, the coding 478 efficiency can be kept high. NEWPRED can also achieve faster 479 error recovery than intra refresh. 481 There are two types of upstream messages: acknowledged message (NP_ACK) 482 and non-acknowledged message (NP_NACK). NP_ACK and/or NP_NACK messages 483 are transmitted on the particular RTCP packets in NEWPRED. The 484 method of selecting reference frames depends on which kind of 485 message is used. 487 5.2.
Particular RTCP packets keep low delay 489 The real-time Audio/Visual transmission is more sensitive to delay and 490 does not require full reliability. For Audio/Visual applications it is 491 more effective to send the MPEG-4 upstream message packets as soon as 492 possible, i.e. as soon as a loss is detected, without adding any random 493 delays. 495 5.3. Congestion control 497 In the cases of the demand type of intra refresh or the re-transmission, 498 the amount of bits during the congestion is larger than that in the error 499 free terms. Therefore they may cause some another congestion. While in 500 the NEWPRED, as the intra-frame coding is not used, the increased amount 501 of bits is much lower than that of the intra refresh or the re- 502 transmission even in the case of packet loss. Therefore NEWPRED causes 503 less additional burden for the congestion. 505 The amount of the upstream messages is dependent on the strategy of the 506 selecting methods of reference frames of the encoder and that of the 507 sending upstream messages of the decoder. In order to avoid congestion, 508 the amount of upstream message packets should be small. In the NEWPRED, 509 the decoder can control the amount of them by not sending some upstream 510 messages; For example, in the case that the NP_NACK messages are mainly 511 used to select the reference frames in the encoder, the decoder may not 512 send the NP_ACK messages even if it receives downstream data. On the 513 other hand, in the case that the NP_ACK messages are mainly used in the 514 encoder, the decoder may not send the NP_NACK messages. The amount of the 515 upstream messages is at most 5% (normally about 1%) of the visual 516 downstream data. 518 Especially the amount of NP_ACK messages is decreased in the case of 519 packet loss. Therefore the NP_ACK message has no additional burden for 520 the congestion. 
On the other hand, NP_NACK messages corresponding to the 521 lost packets are usually sent after the congestion, because the decoder 522 detects the packet loss after the next downstream packet reaches. 523 Therefore the NP_NACK message has less additional burden for the 524 congestion, too. 526 And to reduce the number of particular RTCP packets, multiple upstream 527 messages can be concatenated in the payload of one particular RTCP 528 packet. In this case, it is desirable to send these concatenated 529 messages as soon as possible. 531 The particular RTCP transmission interval is according to the interval of 532 the decoding the visual downstream data. Both the receiving interval of 533 the visual RTP packet and the decoding time for each packet data have 534 some jitter for themselves. Therefore the particular RTCP transmission 535 interval has some jitter for itself. It is effective for the congestion 536 control, and there is no need to add any random delays. This means that 537 the size of sending jitter is enough to avoid another congestion only in 538 case of the unicast. 540 5.4. Limiting to Unicast 542 The NEWPRED can work in multicast only in the case that the number of 543 decoders is small. However in order to avoid the additional congestion, 544 the NEWPRED over RTP/RTCP SHALL NOT be used in multicast. 546 5.5. Relations with SR and RR 547 The particular low delay RTCP packets for the MPEG-4 upstream messages 548 SHALL be treated as the completely different kind of packets from the 549 normal RTCP packets; such as SR, RR and so on. 551 For example, if the particular RTCP packets would be included in the 552 calculation of RTCP sending interval, the RR packets should be generated 553 in the timing of the particular low delay RTCP packets. In this case, 554 the interval of the RR packets would be smaller than 5 seconds, and the 555 number of the normal RTCP packets is much increased. It is bad for the 556 congestion. 

Therefore all particular RTCP packets SHALL be ignored when analyzing
the information in the sender and receiver reports (SR and RR); only
normal RTCP packets are used.

Multiple particular RTCP packets can be concatenated without any
intervening separators to form a compound RTCP packet. A normal
compound RTCP packet SHOULD start with an SR or RR packet. A compound
particular RTCP packet, however, SHALL contain only particular RTCP
packets; normal RTCP packets SHALL NOT be included in it.

5.6. MPEG-4 Visual upstream message packets definition

 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|V=2|P|   UMT   | PT=RTCP_MP4U  |            length             |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                             SSRC                              |
+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+
|        MPEG-4 Upstream Messages Payload (byte aligned)        |
|                               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                               :            padding            |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

version (V): 2 bits
  Identifies the version of RTP, which is the same in RTCP packets as
  in RTP data packets.

padding (P): 1 bit
  If the padding bit is set, this RTCP packet contains some additional
  padding octets at the end which are not part of the control
  information. The last octet of the padding is a count of how many
  padding octets should be ignored. When several upstream messages are
  mapped onto one RTCP packet, padding should only be required on the
  last individual message.

upstream message type (UMT): 5 bits
  Identifies the type of the MPEG-4 upstream messages.
    0: forbidden
    1: MPEG-4 Visual NEWPRED in the ARTS Profile
    2-31: reserved
  In this Internet-Draft, only NEWPRED in the ARTS profile is currently
  assigned a UMT value. Other MPEG-4 Audio/Visual applications that use
  upstream messages may be assigned values in the future.

packet type (PT): 8 bits
  The value of the packet type (PT) identifier is the constant
  RTCP_MP4U (TBD).

SSRC: 32 bits
  The synchronization source identifier for the sender of this packet.

MPEG-4 Upstream Message Payload: variable
  The syntax and semantics of the MPEG-4 upstream messages are defined
  in ISO/IEC 14496-2/3 [4][5]. All messages are byte aligned. Normally
  one message is mapped onto one RTCP packet, but several messages
  with the same UMT may be mapped consecutively onto one RTCP packet.
  One message SHALL NOT be fragmented across different RTCP packets.

6. MIME type registration for MPEG-4 Audio/Visual streams

The following sections describe the MIME type registrations for the
MPEG-4 Audio/Visual streams. MIME type registration and SDP usage for
the MPEG-4 Visual stream are described in sections 6.1 and 6.2,
respectively. MIME type registration and SDP usage for the MPEG-4 Audio
stream are described in sections 6.3 and 6.4, respectively.

(In the following sections, "XXXX" stands for the RFC number to be
assigned to this Internet-Draft.)

6.1 MIME type registration for MPEG-4 Visual

MIME media type name: video

MIME subtype name: MP4V

Required parameters: none

Optional parameters:
  rate: This parameter is used only for RTP transport. It indicates the
  resolution of the timestamp field in the RTP header. If this
  parameter is not specified, the default value of 90000 (90 kHz) is
  used.

  profile-level-id: A decimal representation of the MPEG-4 Visual
  Profile Level indication value (profile_and_level_indication) defined
  in Table G-1 of ISO/IEC 14496-2 [2][4].

  mpeg4-newpred-upstream-message: A boolean value indicating whether
  the receiver is capable of sending the NEWPRED upstream messages of
  MPEG-4 video. The upstream messages are delivered in the particular
  RTCP packets described in section 5. This optional parameter exists
  if and only if the "profile-level-id" is 145, 146, 147 or 148
  (Advanced Real Time Simple Profile/Level 1, 2, 3 or 4).

  Example usages of these parameters are shown below:
  - MPEG-4 Visual Core Profile/Level 2:
      Content-type: video/mp4v; profile-level-id=34

  - MPEG-4 Visual Advanced Real Time Simple Profile/Level 1, upstream
    message is used:
      Content-type: video/mp4v; profile-level-id=145;
        mpeg4-newpred-upstream-message=1

Published specification:
  The specification of the MPEG-4 Visual stream is presented in ISO/IEC
  14496-2 [2][4][9]. The RTP payload format is described in RFCXXXX.

Encoding considerations:
  A video bitstream must be generated according to the MPEG-4 Visual
  specification (ISO/IEC 14496-2). The video bitstream is binary data
  and must be encoded for non-binary transport; the Base64 encoding is
  suitable for email. This type is also defined for transfer via RTP.
  The RTP packets must be packetized according to the MPEG-4 Visual RTP
  payload format defined in RFCXXXX.

Security considerations:
  See section 7 of RFCXXXX.

Interoperability considerations:
  MPEG-4 Visual provides a large and rich set of tools for the coding
  of visual objects. To allow effective implementations of the
  standard, subsets of the MPEG-4 Visual tool sets have been identified
  that can be used for specific applications. These subsets, called
  'Profiles', limit the tool set a decoder has to implement.
  For each of these Profiles, one or more Levels have been set,
  restricting the computational complexity. A Profile@Level combination
  allows:

  o a codec builder to implement only the subset of the standard he
    needs, while maintaining interworking with other MPEG-4 devices
    built to the same combination, and

  o checking whether MPEG-4 devices comply with the standard
    ('conformance testing').

  The visual stream SHALL be compliant with the MPEG-4 Visual
  Profile@Level specified by the parameter "profile-level-id".
  Interoperability between a sender and a receiver may be achieved by
  specifying the parameter "profile-level-id" in MIME content, or by
  exchanging this parameter in the capability exchange procedure.

Applications which use this media type:
  Audio and visual streaming and conferencing tools, Internet messaging
  and e-mail applications.

Additional information: none

Person & email address to contact for further information:
  The authors of RFCXXXX. (See section 9)

Intended usage: COMMON

Author/Change controller:
  The authors of RFCXXXX. (See section 9)

6.2 SDP usage of MPEG-4 Visual

The MIME media type video/MP4V string is mapped to fields in the
Session Description Protocol (SDP), RFC 2327, as follows:

o The MIME type (video) goes in SDP "m=" as the media name.

o The MIME subtype (MP4V) goes in SDP "a=rtpmap" as the encoding name.

o The optional parameter "rate" goes in "a=rtpmap" as the clock rate.

o The optional parameter "profile-level-id" MAY go in the "a=fmtp"
  line. The optional parameter "mpeg4-newpred-upstream-message" MAY go
  in the "a=fmtp" line if and only if the "profile-level-id" is 145,
  146, 147 or 148 (Advanced Real Time Simple Profile/Level 1, 2, 3 or
  4). These parameters are expressed as a MIME media type string, in
  the form of a semicolon-separated list of parameter=value pairs.
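
As an illustration of the mapping above, the following is a minimal
sketch (not part of this specification) of how a receiver might split
an "a=fmtp" value into parameter=value pairs and check the rule that
"mpeg4-newpred-upstream-message" accompanies only profile-level-id
values 145 to 148. The function and constant names are hypothetical
and chosen for this example only.

```python
# Hypothetical helper names; only the parameter names and the
# profile-level-id values 145-148 come from the text above.
ARTS_PROFILE_LEVEL_IDS = {145, 146, 147, 148}


def parse_fmtp(line):
    """Split an SDP "a=fmtp:<pt> <params>" line into (pt, dict)."""
    prefix = "a=fmtp:"
    if not line.startswith(prefix):
        raise ValueError("not an a=fmtp line")
    payload_type, _, param_text = line[len(prefix):].partition(" ")
    params = {}
    # Parameters are a semicolon-separated list of parameter=value pairs.
    for item in param_text.split(";"):
        item = item.strip()
        if not item:
            continue
        name, sep, value = item.partition("=")
        if not sep:
            raise ValueError("parameter without a value: " + item)
        params[name.strip()] = value.strip()
    return int(payload_type), params


def newpred_param_is_valid(params):
    """Check that the NEWPRED parameter appears only with ARTS levels."""
    if "mpeg4-newpred-upstream-message" not in params:
        return True
    try:
        level = int(params.get("profile-level-id", ""))
    except ValueError:
        return False
    return level in ARTS_PROFILE_LEVEL_IDS


pt, params = parse_fmtp(
    "a=fmtp:98 profile-level-id=145; mpeg4-newpred-upstream-message=1")
print(pt, params, newpred_param_is_valid(params))
```

The sketch keeps all values as strings and converts only
"profile-level-id" to an integer for the check; a real SDP parser
would also need to handle the "m=" and "a=rtpmap" lines.
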

The following are some examples of the media representation in SDP:

Simple Profile/Level 1, rate=90000 (90 kHz), "profile-level-id" is
present in the "a=fmtp" line:
  m=video 49170/2 RTP/AVP 98
  a=rtpmap:98 MP4V/90000
  a=fmtp:98 profile-level-id=1

Advanced Real Time Simple Profile/Level 1, rate=25 (25 Hz),
"profile-level-id" and "mpeg4-newpred-upstream-message" are present in
the "a=fmtp" line:
  m=video 49170/2 RTP/AVP 98
  a=rtpmap:98 MP4V/25
  a=fmtp:98 profile-level-id=145; mpeg4-newpred-upstream-message=1

6.3 MIME type registration of MPEG-4 Audio

MIME media type name: audio

MIME subtype name: MP4A

Required parameters:
  rate: This parameter indicates the RTP timestamp clock rate. The
  default value is 90000. Other rates CAN be specified only if they are
  set to the same value as the audio sampling rate (number of samples
  per second).

Optional parameters:
  profile-level-id: A decimal representation of the MPEG-4 Audio
  Profile Level indication value defined in ISO/IEC 14496-1 [11]. This
  parameter indicates the capability of subsets in MPEG-4 Audio tools.

  object: A decimal representation of the MPEG-4 Audio Object Type
  value defined in ISO/IEC 14496-3 [5]. This parameter specifies the
  tool to be used by the coder. It CAN be used to limit the capability
  within the specified "profile-level-id".

  bitrate: The data rate of the audio bitstream.

  cpresent: This parameter indicates whether audio payload
  configuration data is multiplexed into the RTP payload (see section
  4.1 in this document).

  config: A hexadecimal representation of an octet string indicating
  the audio payload configuration data "StreamMuxConfig" defined in
  ISO/IEC 14496-3 [5]. The configuration data is mapped into the octet
  string on an MSB-first basis. The first bit of the configuration data
  shall be located at the MSB of the first octet.
  In the last octet, zero-padding bits, if necessary, shall follow the
  configuration data.

  ptime: RECOMMENDED duration of each packet in milliseconds.

Published specification:
  The payload format specification is described in this document. The
  specification of the encoding is provided in ISO/IEC 14496-3 [3][5].

Encoding considerations:
  This type is only defined for transfer via RTP [RFC YYYY,
  draft-ietf-avt-rtp-new].

Security considerations:
  See section 7 of RFCXXXX.

Interoperability considerations:
  MPEG-4 Audio provides a large and rich set of tools for the coding of
  audio objects. To allow effective implementations of the standard,
  subsets of the MPEG-4 Audio tool sets have been identified, similar
  to MPEG-4 Visual (see section 6.1).

  The audio stream SHALL be compliant with the MPEG-4 Audio
  Profile@Level specified by the parameter "profile-level-id".
  Interoperability between a sender and a receiver may be achieved by
  specifying the parameter "profile-level-id" in MIME content, or by
  exchanging this parameter in the capability exchange procedure.
  Furthermore, the "object" parameter can be used to limit the
  capability within the specified Profile@Level in capability exchange.

Applications which use this media type:
  Audio and video streaming and conferencing tools.

Additional information: none

Person & email address to contact for further information:
  See section 9 of RFCXXXX.

Intended usage: COMMON

Author/Change controller:
  See section 9 of RFCXXXX.

6.4 SDP usage of MPEG-4 Audio

The MIME media type audio/MP4A string is mapped to fields in the
Session Description Protocol (SDP), RFC 2327, as follows:

o The MIME type (audio) goes in SDP "m=" as the media name.

o The MIME subtype (MP4A) goes in SDP "a=rtpmap" as the encoding name.

o The required parameter "rate" goes in "a=rtpmap" as the clock rate.

o The optional parameter "ptime" goes in the SDP "a=ptime" attribute.

o The optional parameter "profile-level-id" goes in the "a=fmtp" line
  to indicate the coder capability. The "object" parameter also goes in
  the "a=fmtp" line, as do the payload-format-specific parameters
  "bitrate", "cpresent" and "config". These parameters are expressed as
  a MIME media type string, in the form of a semicolon-separated list
  of parameter=value pairs.

The following are some examples of the media representation in SDP:

For a 6 kb/s CELP bitstream (audio sampling rate of 8 kHz):
  m=audio 49230 RTP/AVP 96
  a=rtpmap:96 MP4A/8000
  a=fmtp:96 profile-level-id=9;object=8;cpresent=0;config=9128B1071070
  a=ptime:20

For a 64 kb/s AAC LC stereo bitstream (audio sampling rate of 24 kHz):
  m=audio 49230 RTP/AVP 96
  a=rtpmap:96 MP4A/24000
  a=fmtp:96 profile-level-id=1; bitrate=64000; cpresent=0;
    config=9122620000

In the above two examples, the audio configuration data is not
multiplexed into the RTP payload and is described only in SDP.
Furthermore, the clock rate is set to the audio sampling rate. If the
clock rate is set to its default, the audio sampling rate can be
obtained by parsing the "config" parameter.

In the following example, the audio configuration data appears in the
RTP payload. The value specified in the "config" parameter is used as
an initial value to set up the coding parameters.

  m=audio 49230 RTP/AVP 96
  a=rtpmap:96 MP4A/90000
  a=fmtp:96 cpresent=1; config=9128B1071070

7. Security Considerations

RTP packets using the payload format defined in this specification are
subject to the security considerations discussed in the RTP
specification [8]. This implies that confidentiality of the media
streams is achieved by encryption.
Because the data compression used with this payload format is applied
end-to-end, encryption may be performed on the compressed data, so
there is no conflict between the two operations.

This payload type does not exhibit any significant non-uniformity in
receiver-side computational complexity for packet processing that could
cause a potential denial-of-service threat.

8. References

1  Bradner, S., "The Internet Standards Process -- Revision 3", BCP 9,
   RFC 2026, October 1996.

2  ISO/IEC 14496-2:1999, "Information technology - Coding of
   audio-visual objects - Part 2: Visual", December 1999.

3  ISO/IEC 14496-3:1999, "Information technology - Coding of
   audio-visual objects - Part 3: Audio", December 1999.

4  ISO/IEC 14496-2:1999/FDAM1:2000, December 1999.

5  ISO/IEC 14496-3:1999/FDAM1:2000, December 1999.

6  ISO/IEC 14496-1:1999, "Information technology - Coding of
   audio-visual objects - Part 1: Systems", December 1999.

7  Bradner, S., "Key words for use in RFCs to Indicate Requirement
   Levels", BCP 14, RFC 2119, March 1997.

8  Schulzrinne, H., Casner, S., Frederick, R., and V. Jacobson, "RTP: A
   Transport Protocol for Real-Time Applications", RFC 1889, January
   1996.

9  ISO/IEC 14496-2/COR1, "Information technology - Coding of
   audio-visual objects - Part 2: Visual, Technical corrigendum 1",
   March 2000.

9. Authors' Addresses

Yoshihiro Kikuchi
Toshiba Corporation
1, Komukai Toshiba-cho, Saiwai-ku, Kawasaki, 212-8582, Japan
Email: yoshihiro.kikuchi@toshiba.co.jp

Yoshinori Matsui
Matsushita Electric Industrial Co., Ltd.
1006, Kadoma, Kadoma-shi, Osaka, Japan
Email: matsui@drl.mei.co.jp

Toshiyuki Nomura
NEC Corporation
4-1-1, Miyazaki, Miyamae-ku, Kawasaki, Japan
Email: t-nomura@ccm.cl.nec.co.jp

Shigeru Fukunaga
Oki Electric Industry Co., Ltd.
1-2-27 Shiromi, Chuo-ku, Osaka 540-6025, Japan
Email: fukunaga444@oki.co.jp

Hideaki Kimata
Nippon Telegraph and Telephone Corporation
1-1, Hikari-no-oka, Yokosuka-shi, Kanagawa, Japan
Email: kimata@nttvdt.hil.ntt.co.jp

Full Copyright Statement

"Copyright (C) The Internet Society (date). All Rights Reserved.

This document and translations of it may be copied and furnished to
others, and derivative works that comment on or otherwise explain it
or assist in its implementation may be prepared, copied, published
and distributed, in whole or in part, without restriction of any
kind, provided that the above copyright notice and this paragraph
are included on all such copies and derivative works. However, this
document itself may not be modified in any way, such as by removing
the copyright notice or references to the Internet Society or other
Internet organizations, except as needed for the purpose of
developing Internet standards in which case the procedures for
copyrights defined in the Internet Standards process must be
followed, or as required to translate it into languages other than
English.

The limited permissions granted above are perpetual and will not be
revoked by the Internet Society or its successors or assigns.