Internet Engineering Task Force                          J. van der Meer
Internet Draft                                       Philips Electronics
                                                               D. Mackie
                                                          Apple Computer
                                                          V. Swaminathan
                                                   Sun Microsystems Inc.
                                                               D. Singer
                                                          Apple Computer
                                                              P. Gentric
                                                     Philips Electronics

                                                             August 2003
                                                   Expires February 2004

Document: draft-ietf-avt-mpeg4-simple-08.txt

      RTP Payload Format for Transport of MPEG-4 Elementary Streams

Status of this Memo

   This document is an Internet-Draft and is in full conformance with
   all provisions of section 10 of RFC 2026.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups. Note that
   other groups may also distribute working documents as
   Internet-Drafts. Internet-Drafts are draft documents valid for a
   maximum of six months and may be updated, replaced, or obsoleted by
   other documents at any time. It is inappropriate to use
   Internet-Drafts as reference material or to cite them other than as
   "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt
   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html.

   This specification is a product of the Audio/Video Transport working
   group within the Internet Engineering Task Force. Comments are
   solicited and should be addressed to the working group's mailing
   list at avt@ietf.org and/or the authors.

   << Note for the RFC editor: xxxx should be replaced with the RFC
   number that will be assigned. >>

Abstract

   The MPEG Committee (ISO/IEC JTC1/SC29 WG11) is a working group in
   ISO that produced the MPEG-4 standard. MPEG defines tools to
   compress content such as audio-visual information into elementary
   streams. This specification defines a simple, but generic RTP
   payload format for transport of any non-multiplexed MPEG-4
   elementary stream.

RFC xxxx       Transport of MPEG-4 Elementary Streams       August 2003

Table of Contents

   1. Introduction
   2. Carriage of MPEG-4 elementary streams over RTP
     2.1. Signaling by MIME format parameters
     2.2. MPEG Access Units
     2.3. Concatenation of Access Units
     2.4. Fragmentation of Access Units
     2.5. Interleaving
     2.6. Time stamp information
     2.7. State indication of MPEG-4 system streams
     2.8. Random Access Indication
     2.9. Carriage of auxiliary information
     2.10. MIME format parameters and configuring conditional fields
     2.11. Global structure of payload format
     2.12. Modes to transport MPEG-4 streams
     2.13. Alignment with RFC 3016
   3. Payload format
     3.1. Usage of RTP header fields and RTCP
     3.2. RTP payload structure
       3.2.1. The AU Header Section
         3.2.1.1. The AU-header
       3.2.2. The Auxiliary Section
       3.2.3. The Access Unit Data Section
         3.2.3.1. Fragmentation
         3.2.3.2. Interleaving
         3.2.3.3. Constraints for interleaving
         3.2.3.4. Crucial and non-crucial AUs with MPEG-4 System data
     3.3. Usage of this specification
       3.3.1. General
       3.3.2. The generic mode
       3.3.3. Constant bit rate CELP
       3.3.4. Variable bit rate CELP
       3.3.5. Low bit rate AAC
       3.3.6. High bit rate AAC
       3.3.7. Additional modes
   4. IANA considerations
     4.1. MIME type registration
     4.2. Registration of mode definitions with IANA
     4.3. Concatenation of parameters
     4.4. Usage of SDP
       4.4.1. The a=fmtp keyword
   5. Security considerations
   6. Acknowledgements
   7. References
     7.1 Normative references
     7.2 Informative references
   8. Author addresses

   APPENDIX: Usage of this payload format
   A. Examples of delay analysis with interleave
     A.1 Introduction
     A.2 De-interleaving and error concealment
     A.3 Simple Group interleave
       A.3.1 Introduction
       A.3.2 Determining the de-interleave buffer size
       A.3.3 Determining the maximum displacement
     A.4 More subtle group interleave
       A.4.1 Introduction
       A.4.2 Determining the de-interleave buffer size
       A.4.3 Determining the maximum displacement
     A.5 Continuous interleave
       A.5.1 Introduction
       A.5.2 Determining the de-interleave buffer size
       A.5.3 Determining the maximum displacement

1. Introduction

   The MPEG Committee is Working Group 11 (WG11) in ISO/IEC JTC1 SC29
   that specified the MPEG-1, MPEG-2 and, more recently, the MPEG-4
   standards [1]. The MPEG-4 standard specifies compression of
   audio-visual data into, for example, an audio or video elementary
   stream. In the MPEG-4 standard, these streams take the form of
   audio-visual objects that may be arranged into an audio-visual scene
   by means of a scene description. Each MPEG-4 elementary stream
   consists of a sequence of Access Units; examples of an Access Unit
   (AU) are an audio frame and a video picture.

   This specification defines a general and configurable payload
   structure to transport MPEG-4 elementary streams, in particular
   MPEG-4 audio (including speech) streams, MPEG-4 video streams and
   also MPEG-4 systems streams, such as BIFS (BInary Format for
   Scenes), OCI (Object Content Information), OD (Object Descriptor)
   and IPMP (Intellectual Property Management and Protection) streams.
   The RTP payload defined in this document is simple to implement and
   reasonably efficient. It allows for optional interleaving of Access
   Units (such as audio frames) to increase error resiliency under
   packet loss.

   Some types of MPEG-4 elementary streams include "crucial"
   information whose loss cannot be tolerated, but RTP does not provide
   reliable transmission so receipt of that crucial information is not
   assured. Section 3.2.3.4 specifies how stream state is conveyed so
   that the receiver can detect the loss of crucial information and
   cease decoding until the next random access point is received.
   Applications transmitting streams that include crucial information,
   such as OD commands, BIFS commands, or programmatic content such as
   MPEG-J (Java) and ECMAScript, should include random access points
   sufficiently often, depending upon the probability of loss, to
   reduce stream corruption to an acceptable level.
   An example is the carousel mechanism as defined by MPEG in ISO/IEC
   14496-1.

   Such applications may also employ additional protocols or services
   to reduce the probability of loss. At the RTP layer, these measures
   include payload formats and profiles for retransmission or forward
   error correction (such as in RFC 2733 [10]), which must be employed
   with due consideration to congestion control. Another solution that
   may be appropriate for some applications is to carry RTP over TCP
   (such as in RFC 2326 [8], section 10.12). At the network layer,
   resource allocation or preferential service may be available to
   reduce the probability of loss. For a general description of methods
   to repair streaming media see RFC 2354 [9].

   Though the RTP payload format defined in this document is capable
   of transporting any MPEG-4 stream, other, more specific, formats
   may exist, such as RFC 3016 [12] for transport of MPEG-4 video
   (ISO/IEC 14496 [1] part 2).

   Configuration of the payload is provided to accommodate transport
   of any MPEG-4 stream at any possible bit rate. However, for a
   specific MPEG-4 elementary stream typically only very few
   configurations are needed. So as to allow for the design of
   simplified, but dedicated receivers, this specification requires
   that specific modes be defined for transport of MPEG-4 streams.
   This document defines modes for MPEG-4 CELP and AAC streams, as
   well as a generic mode that can be used to transport any MPEG-4
   stream. In the future, new RFCs are expected to specify additional
   modes for transport of MPEG-4 streams.

   The RTP payload format defined in this document specifies carriage
   of system-related information that is often equivalent to the
   information that may be contained in the MPEG-4 Sync Layer (SL) as
   defined in MPEG-4 Systems [1].
   This document does not prescribe how to transcode or map information
   from the SL to fields defined in the RTP payload format. Such
   processing, if any, is left to the discretion of the application.
   However, to anticipate the need for transport of any additional
   system-related information in the future, an auxiliary field can be
   configured that may carry any such data.

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in
   this document are to be interpreted as described in RFC 2119 [4].

2. Carriage of MPEG-4 elementary streams over RTP

2.1 Signaling by MIME format parameters

   With this payload format a single MPEG-4 elementary stream can be
   transported. Information on the type of MPEG-4 stream carried in
   the payload is conveyed by MIME format parameters, for example in
   an SDP [5] message or by other means (see section 4). These MIME
   format parameters specify the configuration of the payload. To
   allow for simplified and dedicated receivers, a MIME format
   parameter is available to signal a specific mode of using this
   payload. A mode definition MAY include the type of MPEG-4
   elementary stream as well as the applied configuration, so as to
   avoid the need for receivers to parse all MIME format parameters.
   The applied mode MUST be signaled.

2.2 MPEG Access Units

   For carriage of compressed audio-visual data MPEG defines Access
   Units. An MPEG Access Unit (AU) is the smallest data entity to
   which timing information is attributed. In case of audio an Access
   Unit may represent an audio frame and in case of video a picture.
   MPEG Access Units are by definition octet-aligned.
   If, for example, an audio frame is not octet-aligned, up to 7
   zero-padding bits MUST be inserted at the end of the frame to
   achieve the octet-aligned Access Units, as required by the MPEG-4
   specification. MPEG-4 decoders MUST be able to decode AUs in which
   such padding is applied.

   Consistent with the MPEG-4 specification, this document requires
   that each MPEG-4 part 2 video Access Unit includes all the coded
   data of a picture, any video stream headers that may precede the
   coded picture data, and any video stream stuffing that may follow
   it, up to, but not including the startcode indicating the start of
   a new video stream or the next Access Unit.

2.3 Concatenation of Access Units

   Frequently it is possible to carry multiple Access Units in one RTP
   packet. This is particularly useful for audio; for example, when
   AAC is used for encoding of a stereo signal at 64 kbits/sec, AAC
   frames contain on average approximately 200 octets. On a LAN with a
   1500 octet MTU this would allow on average 7 complete AAC frames to
   be carried per RTP packet.

   Access Units may have a fixed size in octets, but a variable size
   is also possible. To facilitate parsing in case of multiple
   concatenated AUs in one RTP packet, the size of each AU is made
   known to the receiver. When concatenating in case of a constant AU
   size, this size is communicated "out of band" through a MIME format
   parameter. When concatenating in case of variable size AUs, the RTP
   payload carries "in band" an AU size field for each contained AU.
   In combination with the RTP payload length the size information
   allows the RTP payload to be split by the receiver back into the
   individual AUs.

   To simplify the implementation of RTP receivers, it is required
   that when multiple AUs are carried in an RTP packet, each AU MUST
   be complete, i.e.
   the number of AUs in an RTP packet MUST be integral. In addition,
   an AU MUST NOT be repeated in other RTP packets; hence repetition
   of an AU is only possible by using a duplicate RTP packet.

2.4 Fragmentation of Access Units

   MPEG allows for very large Access Units. Since most IP networks
   have significantly smaller MTU sizes, this payload format allows
   for the fragmentation of an Access Unit over multiple RTP packets.
   Hence when an IP packet is lost after IP-level fragmentation, only an
   AU fragment may get lost instead of the entire AU. To simplify the
   implementation of RTP receivers, an RTP packet SHALL either carry
   one or more complete Access Units or a single fragment of one AU,
   i.e. packets MUST NOT contain fragments of multiple Access Units.

2.5 Interleaving

   When an RTP packet carries a contiguous sequence of Access Units,
   the loss of such a packet can result in a "decoding gap" for the
   user. One method to alleviate this problem is to allow for the
   Access Units to be interleaved in the RTP packets. For a modest
   cost in latency and implementation complexity, significant error
   resiliency to packet loss can be achieved.

   To support optional interleaving of Access Units, this payload
   format allows for index information to be sent for each Access Unit.
   After informing receivers about buffer resources to allocate for
   de-interleaving, the RTP sender is free to choose the interleaving
   pattern without propagating this information a priori to the
   receiver(s). Indeed the sender could dynamically adjust the
   interleaving pattern based on the Access Unit size, error rates,
   etc. The RTP receiver does not need to know the interleaving
   pattern used; it only needs to extract the index information of the
   Access Unit and insert the Access Unit into the appropriate
   sequence in the decoding or rendering queue. An example of
   interleaving is given below.

   For example, if we assume that an RTP packet contains 3 AUs, and
   that the AUs are numbered 0, 1, 2, 3, 4, and so forth, and if an
   interleaving group length of 9 is chosen, then RTP packet(i)
   contains the following AU(n):

      RTP packet(0): AU(0), AU(3), AU(6)
      RTP packet(1): AU(1), AU(4), AU(7)
      RTP packet(2): AU(2), AU(5), AU(8)
      RTP packet(3): AU(9), AU(12), AU(15)
      RTP packet(4): AU(10), AU(13), AU(16)
      Etc.

2.6 Time stamp information

   The RTP time stamp MUST carry the sampling instant of the first AU
   (fragment) in the RTP packet. When multiple AUs are carried within
   an RTP packet, the time stamps of subsequent AUs can be calculated
   if the frame period of each AU is known. For audio and video this
   is possible if the frame rate is constant. However, in some cases
   such a calculation is not possible, for example for variable frame
   rate video, or for MPEG-4 BIFS streams carrying composition
   information. To support such cases, this payload format can be
   configured to carry a time stamp in the RTP payload for each
   contained Access Unit. A time stamp MAY be conveyed in the RTP
   payload only for non-first AUs in the RTP packet, and SHALL NOT be
   conveyed for the first AU (fragment), as the time stamp for the
   first AU in the RTP packet is carried by the RTP time stamp.

   MPEG-4 defines two types of time stamp: the composition time stamp
   (CTS) and the decoding time stamp (DTS). The CTS represents the
   sampling instant of an AU, and hence the CTS is equivalent to the
   RTP time stamp. The DTS may be used in MPEG-4 video streams that
   use bi-directional coding, i.e. when pictures are predicted in both
   forward and backward direction by using either a reference picture
   in the past, or a reference picture in the future. The DTS cannot
   be carried in the RTP header.
   In some cases the DTS can be derived from the RTP time stamp using
   frame rate information; this requires deep parsing in the video
   stream, which may be considered objectionable. But if the video
   frame rate is variable, the required information may not even be
   present in the video stream. For both reasons, the capability has
   been defined to optionally carry the DTS in the RTP payload for
   each contained Access Unit.

   To keep the coding of time stamps efficient, each time stamp
   contained in the RTP payload is coded differentially, the CTS from
   the RTP time stamp, and the DTS from the CTS.

2.7 State indication of MPEG-4 system streams

   ISO/IEC 14496-1 defines states for MPEG-4 system streams. So as to
   convey state information when transporting MPEG-4 system streams,
   this payload format allows for the optional carriage in the RTP
   payload of the stream state for each contained Access Unit. Stream
   states are used to signal "crucial" AUs that carry information whose
   loss cannot be tolerated and are also useful when repeating AUs
   according to the carousel mechanism defined in ISO/IEC 14496-1.

2.8 Random access indication

   Random access to the content of MPEG-4 elementary streams may be
   possible at some but not all Access Units. To signal Access Units
   where random access is possible, a random access point flag can
   optionally be carried in the RTP payload for each contained Access
   Unit. Carriage of random access points is particularly useful for
   MPEG-4 system streams in combination with the stream state.

2.9 Carriage of auxiliary information

   This payload format defines a specific field to carry auxiliary
   data. The auxiliary data field is preceded by a field that specifies
   the length of the auxiliary data, so as to facilitate skipping of
   the data without parsing it.
   The coding of the auxiliary data is not defined in this document;
   instead the format, meaning and signaling of auxiliary information
   is expected to be specified in one or more future RFCs. Auxiliary
   information MUST NOT be transmitted until its format, meaning and
   signaling have been specified and its use has been signaled.
   Receivers that have knowledge of the auxiliary data MAY decode the
   auxiliary data, but receivers without knowledge of such data MUST
   skip the auxiliary data field.

2.10 MIME format parameters and configuring conditional fields

   To support the features described in the previous sections several
   fields are defined for carriage in the RTP payload. However, their
   use strongly depends on the type of MPEG-4 elementary stream that
   is carried. Sometimes a specific field is needed with a certain
   length, while in other cases such a field is not needed at all. To
   be efficient in either case, the fields to support these features
   are configurable by means of MIME format parameters. In general, a
   MIME format parameter defines the presence and length of the
   associated field. A length of zero indicates absence of the field.
   As a consequence, parsing of the payload requires knowledge of MIME
   format parameters. The MIME format parameters are conveyed to the
   receiver via SDP [5] messages, as specified in section 4.4.1, or
   through other means.

2.11 Global structure of payload format

   The RTP payload following the RTP header contains three
   octet-aligned data sections, of which the first two MAY be empty.
   See figure 1.

      +---------+-----------+-----------+---------------+
      | RTP     | AU Header | Auxiliary | Access Unit   |
      | Header  | Section   | Section   | Data Section  |
      +---------+-----------+-----------+---------------+

                <----------RTP Packet Payload----------->

            Figure 1: Data sections within an RTP packet

   The first data section is the AU (Access Unit) Header Section, which
   contains one or more AU-headers; however, each AU-header MAY be
   empty, in which case the entire AU Header Section is empty. The
   second section is the Auxiliary Section, containing auxiliary data;
   this section MAY also be configured empty. The third section is the
   Access Unit Data Section, containing either a single fragment of
   one Access Unit or one or more complete Access Units. The Access
   Unit Data Section MUST NOT be empty.

2.12 Modes to transport MPEG-4 streams

   While it is possible to build fully configurable receivers capable
   of receiving any MPEG-4 stream, this specification also allows for
   the design of simplified, but dedicated receivers, that are capable
   for example of receiving only one type of MPEG-4 stream. This
   is achieved by requiring that specific modes be defined for using
   this specification. Each mode may define constraints for transport
   of one or more types of MPEG-4 streams, for instance on the payload
   configuration.

   The applied mode MUST be signaled. Signaling the mode is
   particularly important for receivers that are only capable of
   decoding one or more specific modes. Such receivers need to
   determine whether the applied mode is supported, so as to avoid
   problems with processing of payloads that are beyond the
   capabilities of the receiver.

   In this document several modes are defined for transport of MPEG-4
   CELP and AAC streams, as well as a generic mode that can be used
   for any MPEG-4 stream.
   In the future, new RFCs may specify other modes of using this
   specification. However, each mode MUST be in full compliance with
   this specification (see section 3.3.7).

2.13 Alignment with RFC 3016

   This payload can be configured to be nearly identical to the
   payload format defined in RFC 3016 [12] for the MPEG-4 video
   configurations recommended in RFC 3016. Hence, receivers that
   comply with RFC 3016 can decode such an RTP payload, provided that
   additional packets containing video decoder configuration (VO,
   VOL, VOSH) are inserted in the stream, as required by RFC 3016.
   Conversely, receivers that comply with the specification in this
   document SHOULD be able to decode payloads, names and parameters
   defined for MPEG-4 video in RFC 3016. In this respect it is
   strongly RECOMMENDED to implement the ability to ignore "in band"
   video decoder configuration packets in the RFC 3016 payload.

   Note that the "out of band" availability of the video decoder
   configuration is optional in RFC 3016. To achieve maximum
   interoperability with the RTP payload format defined in this
   document, applications that use RFC 3016 to transport MPEG-4 video
   (part 2) are recommended to make the video decoder configuration
   available as a MIME parameter.

3. Payload Format

3.1 Usage of RTP Header Fields and RTCP

   Payload Type (PT): The assignment of an RTP payload type for this
   packet format is outside the scope of this document; it is
   specified by the RTP profile under which this payload format is
   used, or signaled dynamically out-of-band (e.g. using SDP).

   Marker (M) bit: The M bit is set to 1 to indicate that the RTP
   packet payload contains either the final fragment of a fragmented
   Access Unit or one or more complete Access Units.

   Extension (X) bit: Defined by the RTP profile used.

   Sequence Number: The RTP sequence number SHOULD be generated by the
   sender in the usual manner with a constant random offset.

   Timestamp: Indicates the sampling instant of the first AU
   contained in the RTP payload. This sampling instant is equivalent
   to the CTS in the MPEG-4 time domain. When using SDP the clock rate
   of the RTP time stamp MUST be expressed using the "rtpmap"
   attribute. If an MPEG-4 audio stream is transported, the rate SHOULD
   be set to the same value as the sampling rate of the audio stream.
   If an MPEG-4 video stream is transported, it is RECOMMENDED to set
   the rate to 90 kHz.

   In all cases, the sender SHALL make sure that RTP time stamps
   are identical only if the RTP time stamp refers to fragments of the
   same Access Unit.

   According to RFC 1889 [2] (section 5.1), RTP time stamps are
   RECOMMENDED to start at a random value for security reasons. This
   is not an issue for synchronization of multiple RTP streams. When,
   however, streams from multiple sources are to be synchronized (for
   example one stream from local storage, another from an RTP streaming
   server), synchronization may become impossible if the receiver only
   knows the original time stamp relationships. Synchronization in such
   cases may require that the correct relationship between time stamps
   be provided by out-of-band means. The format of such information as
   well as methods to convey such information are beyond the scope of
   this specification.

   SSRC: set as described in RFC 1889 [2].

   CC and CSRC fields are used as described in RFC 1889 [2].

   RTCP SHOULD be used as defined in RFC 1889 [2]. Note that time
   stamps in RTCP Sender Reports may be used to synchronize multiple
   MPEG-4 elementary streams and also to synchronize MPEG-4 streams
   with non-MPEG-4 streams, in case the delivery of these streams uses
   RTP.
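
   The Marker bit and time stamp rules above, together with the
   concatenation and fragmentation rules of sections 2.3 and 2.4, can
   be illustrated with a minimal sender-side sketch. It is not part of
   this specification. The names `Packet`, `packetize` and
   `max_payload` are hypothetical, and the sketch assumes, for
   simplicity, that the AU Header Section and Auxiliary Section
   contribute no octets to the payload budget.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Packet:
    timestamp: int      # RTP time stamp: sampling instant of the first AU (fragment)
    marker: bool        # M bit
    au_ids: List[int]   # indices of the AUs carried (one entry when fragmenting)
    data_len: int       # octets of Access Unit data in this packet
    is_fragment: bool   # True if the packet carries a single AU fragment


def packetize(au_sizes: List[int], au_timestamps: List[int],
              max_payload: int) -> List[Packet]:
    """Group AUs into packets (hypothetical helper, header overhead ignored).

    Each packet carries either one or more complete AUs or a single
    fragment of one AU, never fragments of several AUs.
    """
    packets: List[Packet] = []
    i = 0
    while i < len(au_sizes):
        size = au_sizes[i]
        if size > max_payload:
            # The AU does not fit in one packet: fragment it. All fragments
            # share the time stamp of the AU; only the final one sets M=1.
            offset = 0
            while offset < size:
                chunk = min(max_payload, size - offset)
                offset += chunk
                packets.append(Packet(au_timestamps[i], offset == size,
                                      [i], chunk, True))
            i += 1
            continue
        # Concatenate as many complete AUs as fit; M=1 for complete AUs.
        ids: List[int] = []
        total = 0
        while i < len(au_sizes) and au_sizes[i] <= max_payload - total:
            ids.append(i)
            total += au_sizes[i]
            i += 1
        packets.append(Packet(au_timestamps[ids[0]], True, ids, total, False))
    return packets


if __name__ == "__main__":
    # Two small AUs, one large AU that must be fragmented, one small AU.
    for p in packetize([200, 200, 3000, 100], [0, 1024, 2048, 3072], 1400):
        print(p)
```

   In this sketch, packets that share a time stamp always carry
   fragments of the same Access Unit, and the M bit is 0 only on
   non-final fragments. A real sender would also have to account for
   the AU Header Section when deciding how many AUs fit in a packet.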

3.2 RTP Payload Structure

3.2.1 The AU Header Section

   When present, the AU Header Section consists of the
   AU-headers-length field, followed by a number of AU-headers. See
   figure 2.

      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+- .. -+-+-+-+-+-+-+-+-+-+
      |AU-headers-length|AU-header|AU-header|      |AU-header|padding|
      |                 |   (1)   |   (2)   |      |   (n)   | bits  |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+- .. -+-+-+-+-+-+-+-+-+-+

                   Figure 2: The AU Header Section

   The AU-headers are configured using MIME format parameters and MAY
   be empty. If the AU-header is configured empty, the
   AU-headers-length field SHALL NOT be present and consequently the
   AU Header Section is empty. If the AU-header is not configured
   empty, then the AU-headers-length is a two octet field that
   specifies the length in bits of the immediately following
   AU-headers, excluding the padding bits.

   Each AU-header is associated with a single Access Unit (fragment)
   contained in the Access Unit Data Section in the same RTP packet.
   For each contained Access Unit (fragment) there is exactly one
   AU-header. Within the AU Header Section, the AU-headers are
   bit-wise concatenated in the order in which the Access Units are
   contained in the Access Unit Data Section. Hence, the n-th
   AU-header refers to the n-th AU (fragment). If the concatenated
   AU-headers consume a non-integer number of octets, up to 7
   zero-padding bits MUST be inserted at the end in order to achieve
   octet-alignment of the AU Header Section.

3.2.1.1 The AU-header

   Each AU-header may contain the fields given in figure 3. The length
   in bits of the fields, with the exception of the CTS-flag, the
   DTS-flag and the RAP-flag fields, is defined by MIME format
   parameters; see section 4.1. If a MIME format parameter has the
   default value of zero, then the associated field is not present.
   The number of bits for fields that are present and that represent
   the value of a parameter MUST be chosen large enough to correctly
   encode the largest value of that parameter during the session.

   If present, the fields MUST occur in the order given in figure 3.
   In the general case a receiver can only discover the size of an
   AU-header by parsing it, since the presence of the CTS-delta and
   DTS-delta fields is signaled by the value of the CTS-flag and
   DTS-flag, respectively.

      +---------------------------------------+
      |                AU-size                |
      +---------------------------------------+
      |       AU-Index / AU-Index-delta       |
      +---------------------------------------+
      |               CTS-flag                |
      +---------------------------------------+
      |               CTS-delta               |
      +---------------------------------------+
      |               DTS-flag                |
      +---------------------------------------+
      |               DTS-delta               |
      +---------------------------------------+
      |               RAP-flag                |
      +---------------------------------------+
      |             Stream-state              |
      +---------------------------------------+

   Figure 3: The fields in the AU-header. If used, the AU-Index field
             only occurs in the first AU-header within an AU Header
             Section; in any other AU-header the AU-Index-delta field
             occurs instead.

   AU-size: Indicates the size in octets of the associated Access Unit
         in the Access Unit Data Section in the same RTP packet. When
         the AU-size is associated with an AU fragment, the AU size
         indicates the size of the entire AU and not the size of the
         fragment. In this case, the size of the fragment is known
         from the size of the AU data section. This can be exploited
         to determine whether a packet contains an entire AU or a
         fragment, which is particularly useful after losing a packet
         carrying the last fragment of an AU.
   AU-Index: Indicates the serial number of the associated Access Unit
         (fragment). For each (in decoding order) consecutive AU or AU
         fragment, the serial number is incremented by 1. When
         present, the AU-Index field occurs in the first AU-header in
         the AU Header Section, but MUST NOT occur in any subsequent
         (non-first) AU-header in that Section. To encode the serial
         number in any such non-first AU-header, the AU-Index-delta
         field is used.

   AU-Index-delta: The AU-Index-delta field is an unsigned integer
         that specifies the serial number of the associated AU as the
         difference with respect to the serial number of the previous
         Access Unit. Hence, for the n-th (n>1) AU the serial number
         is found from:
            AU-Index(n) = AU-Index(n-1) + AU-Index-delta(n) + 1
         If the AU-Index field is present in the first AU-header in
         the AU Header Section, then the AU-Index-delta field MUST be
         present in any subsequent (non-first) AU-header. When the
         AU-Index-delta is coded with the value 0, it indicates that
         the Access Units are consecutive in decoding order. An
         AU-Index-delta value larger than 0 signals that interleaving
         is applied.

   CTS-flag: Indicates whether the CTS-delta field is present.
         A value of 1 indicates that the field is present, a value
         of 0 that it is not present.
         The CTS-flag field MUST be present in each AU-header if the
         length of the CTS-delta field is signaled to be larger than
         zero. In that case, the CTS-flag field MUST have the value 0
         in the first AU-header and MAY have the value 1 in all
         non-first AU-headers. The CTS-flag field SHOULD be 0 for
         any non-first fragment of an Access Unit.

   CTS-delta: Encodes the CTS by specifying the value of CTS as a 2's
         complement offset (delta) from the time stamp in the RTP
         header of this RTP packet.
         The CTS MUST use the same clock rate as the time stamp in
         the RTP header.

   DTS-flag: Indicates whether the DTS-delta field is present. A value
         of 1 indicates that DTS-delta is present, a value of 0 that
         it is not present.
         The DTS-flag field MUST be present in each AU-header if the
         length of the DTS-delta field is signaled to be larger than
         zero. The DTS-flag field MUST have the same value for all
         fragments of an Access Unit.

   DTS-delta: Specifies the value of the DTS as a 2's complement
         offset (delta) from the CTS. The DTS MUST use the
         same clock rate as the time stamp in the RTP header. The
         DTS-delta field MUST have the same value for all fragments of
         an Access Unit.

   RAP-flag: Indicates when set to 1 that the associated Access Unit
         provides a random access point to the content of the stream.
         If an Access Unit is fragmented, the RAP flag, if present,
         MUST be set to 0 for each non-first fragment of the AU.

   Stream-state: Specifies the state of the stream for an AU of an
         MPEG-4 system stream; each state is identified by a value of
         a modulo counter. In ISO/IEC 14496-1, MPEG-4 system streams
         use the AU_SequenceNumber to signal stream states. When the
         stream state changes, the value of stream-state MUST be
         incremented by one.

         Note: no relation is required between stream-states of
         different streams.

3.2.2 The Auxiliary Section

   The Auxiliary Section consists of the auxiliary-data-size field
   followed by the auxiliary-data field. Receivers MAY (but are not
   required to) parse the auxiliary-data field; to facilitate skipping
   of the auxiliary-data field by receivers, the auxiliary-data-size
   field indicates the length in bits of the auxiliary-data.
   If the concatenation of the auxiliary-data-size and the
   auxiliary-data fields consumes a non-integer number of octets, up
   to 7 zero padding bits MUST be inserted immediately after the
   auxiliary data in order to achieve octet-alignment. See figure 4.

   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+- .. -+-+-+-+-+-+-+-+-+
   | auxiliary-data-size   | auxiliary-data       |padding bits |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+- .. -+-+-+-+-+-+-+-+-+

            Figure 4: The fields in the Auxiliary Section

   The length in bits of the auxiliary-data-size field is configurable
   by a MIME format parameter; see section 4.1. The default length of
   zero indicates that the entire Auxiliary Section is absent.

   auxiliary-data-size: specifies the length in bits of the immediately
         following auxiliary-data field;

   auxiliary-data: the auxiliary-data field contains data of a format
         not defined by this specification.

3.2.3 The Access Unit Data Section

   The Access Unit Data Section contains an integer number of complete
   Access Units or a single fragment of one AU. The Access Unit Data
   Section is never empty. If data of more than one Access Unit is
   present, then the AUs are concatenated into a contiguous string
   of octets. See figure 5. The AUs inside the Access Unit Data
   Section MUST be in decoding order, though not necessarily contiguous
   in the case of interleaving.

   The size and number of Access Units SHOULD be adjusted such that
   the resulting RTP packet is not larger than the path MTU. To handle
   larger packets, this payload format relies on lower layers for
   fragmentation, which may result in reduced performance.
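   As an illustration of the AU Header Section layout described in
   sections 3.2.1 and 3.2.1.1, the following Python sketch parses the
   AU-headers-length field and the bit-wise concatenated AU-headers.
   It is not part of this specification. The names BitReader,
   AuHeader and parse_au_header_section are hypothetical, the
   configuration is assumed to be a dictionary keyed by MIME format
   parameter name, "DTSDeltaLength" is an assumed name for the
   parameter that sets the DTS-delta length, and
   randomAccessIndication is assumed to signal a one-bit RAP-flag.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

# Parameter names assumed to configure the AU-header fields.
LENGTH_KEYS = ("sizeLength", "indexLength", "indexDeltaLength",
               "CTSDeltaLength", "DTSDeltaLength",
               "randomAccessIndication", "streamStateIndication")


@dataclass
class AuHeader:
    size: Optional[int] = None
    index: Optional[int] = None      # AU-Index (first header) or AU-Index-delta
    cts_delta: Optional[int] = None
    dts_delta: Optional[int] = None
    rap: Optional[int] = None
    stream_state: Optional[int] = None


class BitReader:
    """Reads MSB-first bit fields from an octet string."""

    def __init__(self, data: bytes) -> None:
        self.data = data
        self.pos = 0  # current position, in bits

    def read(self, nbits: int) -> int:
        value = 0
        for _ in range(nbits):
            octet = self.data[self.pos // 8]
            value = (value << 1) | ((octet >> (7 - self.pos % 8)) & 1)
            self.pos += 1
        return value

    def read_signed(self, nbits: int) -> int:
        # Interpret the field as a 2's complement number.
        raw = self.read(nbits)
        return raw - (1 << nbits) if raw >> (nbits - 1) else raw


def parse_au_header_section(payload: bytes,
                            cfg: Dict[str, int]) -> Tuple[List[AuHeader], int]:
    """Return the AU-headers and the octets used by the AU Header Section."""
    if not any(cfg.get(key, 0) for key in LENGTH_KEYS):
        return [], 0                 # AU-header configured empty: no section
    reader = BitReader(payload)
    end = 16 + reader.read(16)       # AU-headers-length counts bits, no padding
    headers: List[AuHeader] = []
    while reader.pos < end:
        first = not headers
        hdr = AuHeader()
        if cfg.get("sizeLength", 0):
            hdr.size = reader.read(cfg["sizeLength"])
        index_bits = cfg.get("indexLength" if first else "indexDeltaLength", 0)
        if index_bits:
            hdr.index = reader.read(index_bits)
        if cfg.get("CTSDeltaLength", 0) and reader.read(1):    # CTS-flag
            hdr.cts_delta = reader.read_signed(cfg["CTSDeltaLength"])
        if cfg.get("DTSDeltaLength", 0) and reader.read(1):    # DTS-flag
            hdr.dts_delta = reader.read_signed(cfg["DTSDeltaLength"])
        if cfg.get("randomAccessIndication", 0):
            hdr.rap = reader.read(1)
        if cfg.get("streamStateIndication", 0):
            hdr.stream_state = reader.read(cfg["streamStateIndication"])
        headers.append(hdr)
    return headers, (end + 7) // 8   # round up over the zero-padding bits
```

   The sketch does not validate malformed input; a production parser
   would also check that the signaled length matches the octets that
   are actually available.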
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |AU(1)                                                          |
   +                                                               |
   |                                                               |
   |               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |               |AU(2)                                          |
   +-+-+-+-+-+-+-+-+                                               |
   |                                                               |
   |                               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                               | AU(n)                         |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |AU(n) continued|
   |-+-+-+-+-+-+-+-+

   Figure 5: Access Unit Data Section; each AU is octet-aligned.

   When multiple Access Units are carried, the size of each AU MUST be
   made available to the receiver. If the AU size is variable then the
   size of each AU MUST be indicated in the AU-size field of the
   corresponding AU-header. However, if the AU size is constant for a
   stream, this mechanism SHOULD NOT be used, but instead the fixed
   size SHOULD be signaled by the MIME format parameter
   "constantSize", see section 4.1.

   The absence of both AU-size in the AU-header and the constantSize
   MIME format parameter indicates carriage of a single AU (fragment),
   i.e. that a single Access Unit (fragment) is transported in each
   RTP packet for that stream.

3.2.3.1 Fragmentation

   A packet SHALL carry either one or more complete Access Units, or
   a single fragment of an Access Unit. Fragments of the same Access
   Unit have the same time stamp but different RTP sequence numbers.
   The marker bit in the RTP header is 1 on the last fragment of an
   Access Unit, and 0 on all other fragments.

3.2.3.2 Interleaving

   Unless prohibited by the signaled mode, a sender MAY interleave
   Access Units. Receivers that are capable of receiving modes that
   support interleaving MUST be able to decode interleaved Access
   Units.
   When a sender interleaves Access Units, it needs to provide
   sufficient information to enable a receiver to unambiguously
   reconstruct the original order, even in case of out-of-order
   packets, packet loss or duplication. The information that senders
   need to provide depends on whether or not the Access Units have a
   constant time duration. Access Units have a constant time duration,
   if:

      TS(i+1) - TS(i) = constant, for any i, where

         i indicates the index of the AU in original order
         TS(i) denotes the time stamp of AU(i)

   The MIME parameter "constantDuration" SHOULD be used to signal that
   Access Units have a constant time duration, see section 4.1.

   If the "constantDuration" parameter is present, the receiver can
   reconstruct the original Access Unit timing based solely on the RTP
   timestamp and AU-Index-delta. Accordingly, when transmitting Access
   Units of constant duration, the AU-Index, if present, MUST be set
   to the value 0. Receivers of constant duration Access Units MUST
   use the RTP timestamp to determine the index of the first AU in the
   RTP packet. The AU-Index-delta header and the signaled
   "constantDuration" are used to reconstruct AU timing.

   If the "constantDuration" parameter is not present, then Access
   Units are assumed to have a variable duration, unless the AU-Index
   is present and coded with the value 0 in each RTP packet. When
   transmitting Access Units of variable duration, then the
   "constantDuration" parameter MUST NOT be present, and the
   transmitter MUST use the AU-Index to encode the index information
   required for re-ordering, and the receiver MUST use that value to
   determine the index of each AU in the RTP packet.
   The number of bits of the AU-Index field MUST be chosen so that
   valid index information is provided for the applied interleaving
   scheme, without causing problems due to roll-over of the AU-Index
   field. In addition, the CTS-delta MUST be coded in the AU header
   for each non-first AU in the RTP packet, so that receivers can
   place the AUs correctly in time.

   When interleaving is applied, a de-interleave buffer is needed in
   receivers to put the Access Units in their correct logical
   consecutive decoding order. This requires the computation of the
   time stamp for each Access Unit. In case of a constant time duration
   per Access Unit, the time stamp of the i-th access unit in an RTP
   packet with RTP time stamp T is calculated as follows:

      Timestamp[0] = T
      Timestamp[i, i > 0] = T +(Sum(for k=1 to i of (AU-Index-delta[k]
                            + 1))) * access-unit-duration

   When AU-Index-delta is always 0, this reduces to
   T + i * (access-unit-duration). This is the non-interleaved case,
   where the frames are consecutive in decoding order. Note that the
   AU-Index field (present for the first Access Unit) is indeed not
   needed in this calculation.

3.2.3.3 Constraints for interleaving

   The size of the packets should be suitably chosen to be appropriate
   to both the path MTU and the capacity of the receiver's
   de-interleave buffer. The maximum packet size for a session SHOULD
   be chosen not to exceed the path MTU.

   To allow receivers to allocate sufficient resources for
   de-interleaving, senders MUST provide the information to receivers
   as specified in this section.

   AUs enter the decoder in decoding order. The de-interleave buffer
   is used to re-order a stream of interleaved AUs back into decoding
   order.
   When interleaving is applied, the decoding of "early" AUs
   has to be postponed until all AUs that precede in decoding order
   are present. Therefore these "early" AUs are stored in the
   de-interleave buffer. As an example, figure 6 considers the
   interleaving pattern from section 2.5.

                           +--+--+--+--+--+--+--+--+--+--+--+-
   Interleaved AUs         | 0| 3| 6| 1| 4| 7| 2| 5| 8| 9|12|..
                           +--+--+--+--+--+--+--+--+--+--+--+-
   Storage of "early" AUs       3  3  3  3  3  3
                                   6  6  6  6  6  6
                                         4  4  4
                                            7  7  7
                                                          12 12

   Figure 6: Storage of "early" AUs in the de-interleave buffer per
             interleaved AU.

   AU(3) is to be delivered to the decoder after AU(0), AU(1) and
   AU(2); of these AUs, AU(2) arrives last and hence AU(3) needs to be
   stored until AU(2) is present in the pattern. Similarly, AU(6) is
   to be stored until AU(5) is present, while AU(4) and AU(7) are to
   be stored until AU(2) and AU(5) are present, respectively. Note
   that the fullness of the de-interleave buffer varies in time. In
   figure 6, the de-interleave buffer contains at most 4 AUs, but
   often fewer.

   So as to give a rough indication of the resources needed in the
   receiver for de-interleaving, the maximum displacement in time of
   an AU is defined. For any AU in the pattern it can be verified
   which AUs are not yet present. The maximum displacement in time of
   an AU is the maximum difference between the time stamp of an AU in
   the pattern and the time stamp of the earliest AU that is not yet
   present. In other words, when considering a sequence of interleaved
   AUs, then:

      Maximum displacement = max{TS(i) - TS(j)}, for any i and any j>i,

         where i and j indicate the index of the AU in the
         interleaving pattern and TS denotes the time stamp
         of the AU

   As an example, figure 7 considers the interleaving pattern from
   section 2.5.
   For each AU in the pattern the earliest not yet
   present AU is indicated. A "-" indicates that all previous AUs
   are present. If the AU period is constant, the maximum displacement
   equals 5 AU periods, as found for AU(6) and AU(7).

                               +--+--+--+--+--+--+--+--+--+--+--+-
   Interleaved AUs             | 0| 3| 6| 1| 4| 7| 2| 5| 8| 9|12|..
                               +--+--+--+--+--+--+--+--+--+--+--+-

   Earliest not yet present AU   -  1  1  -  2  2  -  -  -  - 10

   Figure 7: The earliest not yet present AU for each AU in the
             interleaving pattern.

   When interleaving, senders MUST signal the maximum displacement
   in time during the session via the MIME format parameter
   "maxDisplacement"; see section 4.1.

   An estimate of the size of the de-interleave buffer is found by
   multiplying the maximum displacement by the maximum bit rate:

      size(de-interleave buffer) = {(maxDisplacement) * Rate(max)} /
                                   (RTP clock frequency),

   where Rate(max) is the maximum bit-rate of the transported stream.

   Note that receivers can derive Rate(max) from the MIME format
   parameters streamType, profile-level-id, and config.

   However, this calculation only estimates the size of the
   de-interleave buffer, and the size actually required may differ
   from the calculated value. If this calculation under-estimates the
   size of the de-interleave buffer, then senders, when interleaving,
   MUST signal a size of the de-interleave buffer via the MIME format
   parameter "de-interleaveBufferSize"; see section 4.1. If the
   calculation over-estimates the size of the de-interleave buffer,
   then senders, when interleaving, MAY signal a size of the
   de-interleave buffer via the MIME format parameter
   "de-interleaveBufferSize".
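   The two calculations above can be sketched in Python as follows.
   This is an illustration only, not part of this specification. The
   function names are hypothetical, the interleaving pattern is
   assumed to be given as a list of AU indices in transmission order,
   and a constant AU duration is assumed so that TS(i) is the AU index
   times that duration.

```python
from typing import Sequence


def max_displacement(pattern: Sequence[int], au_duration: int) -> int:
    """max{TS(i) - TS(j)} over all positions i < j of the sent pattern."""
    worst = 0
    for pos, early in enumerate(pattern):
        for late in pattern[pos + 1:]:
            # "early" was sent before "late"; the difference is positive
            # only if "early" is later in decoding order.
            worst = max(worst, (early - late) * au_duration)
    return worst


def estimate_buffer_bits(max_disp_ticks: int, max_bitrate_bps: int,
                         rtp_clock_hz: int) -> float:
    """Estimate: maxDisplacement * Rate(max) / (RTP clock frequency)."""
    return max_disp_ticks * max_bitrate_bps / rtp_clock_hz


# The pattern of figures 6 and 7, with an AU duration of one period.
figure_pattern = [0, 3, 6, 1, 4, 7, 2, 5, 8, 9, 12]
displacement = max_displacement(figure_pattern, 1)   # 5 AU periods
```

   Because Rate(max) is a bit rate, the estimate is in bits; it would
   be divided by 8 before being compared with a value expressed in
   octets, such as "de-interleaveBufferSize".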
   The signaled size of the de-interleave buffer MUST be large enough
   to contain all "early" AUs at any point in time during the session,
   that is:

      minimum de-interleave buffer size = max [sum {if TS(i) > TS(j) then
                                          AU-size(i) else 0}] for any j
                                          and any i<j

      a=rtpmap:<payload type> <encoding name>/<clock rate>[/<encoding
      parameters>]

   For audio streams, <encoding parameters> specifies the number of
   audio channels: 2 for stereo material (see RFC 2327 [5]) and 1 for
   mono. Provided no additional parameters are needed, this parameter
   may be omitted for mono material, hence its default value is 1.

3.3.2 The generic mode

   The generic mode can be used for any MPEG-4 stream. In this mode
   no mode-specific constraints are applied; hence, in the generic
   mode the full flexibility of this specification can be exploited.
   The generic mode is signaled by mode=generic.

   An example is given below for transport of a BIFS-Anim stream. In
   this example carriage of multiple BIFS-Anim Access Units is allowed
   in one RTP packet. The AU-header contains the AU-size field, the
   CTS-flag and, if the CTS flag is set to 1, the CTS-delta field. The
   number of bits of the AU-size and the CTS-delta fields is 10 and
   16, respectively. The AU-header also contains the RAP-flag and the
   Stream-state of 4 bits. This results in an AU-header with a
   total size of two or four octets per BIFS-Anim AU. The RTP time
   stamp uses a 1 kHz clock. Note that the media type name is video,
   because the BIFS-Anim stream is part of an audio-visual
   presentation. For conventions on media type names see section 4.1.
   In detail:

      m=video 49230 RTP/AVP 96
      a=rtpmap:96 mpeg4-generic/1000
      a=fmtp:96 streamtype=3; profile-level-id=1807; mode=generic;
      objectType=2; config=0842237F24001FB400094002C0; sizeLength=10;
      CTSDeltaLength=16; randomAccessIndication=1;
      streamStateIndication=4

   Note: The a=fmtp line has been wrapped to fit the page; it comprises
   a single line in the SDP file.

   The hexadecimal value of the "config" parameter is the
   BIFSConfiguration() as defined in ISO/IEC 14496-1. The
   BIFSConfiguration() specifies that the BIFS stream is a BIFS-Anim
   stream. For the description of MIME parameters see section 4.1.

3.3.3 Constant bit-rate CELP

   This mode is signaled by mode=CELP-cbr. In this mode one or more
   complete CELP frames of fixed size can be transported in one RTP
   packet; interleaving MUST NOT be used with this mode. The RTP
   payload consists of one or more concatenated CELP frames, each of
   the same size. CELP frames MUST NOT be fragmented when using this
   mode. Both the AU Header Section and the Auxiliary Section MUST be
   empty.

   The MIME format parameter constantSize MUST be provided to specify
   the length of each CELP frame.

   For example:

      m=audio 49230 RTP/AVP 96
      a=rtpmap:96 mpeg4-generic/16000/1
      a=fmtp:96 streamtype=5; profile-level-id=14; mode=CELP-cbr; config=
      440E00; constantSize=27; constantDuration=240

   Note: The a=fmtp line has been wrapped to fit the page; it comprises
   a single line in the SDP file.

   The hexadecimal value of the "config" parameter is the
   AudioSpecificConfig() as defined in ISO/IEC 14496-3.
   AudioSpecificConfig() specifies a mono CELP stream with a sampling
   rate of 16 kHz at a fixed bitrate of 14.4 kb/s and 6 sub-frames per
   CELP frame. For the description of MIME parameters see section 4.1.
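   As an illustration of the CELP-cbr payload layout, the following
   Python sketch splits a payload into frames of constantSize octets
   and assigns each frame a time stamp. It is not part of this
   specification. The function name is hypothetical, and the time
   stamps assume the non-interleaved case of section 3.2.3.2, in which
   the i-th frame has time stamp T + i * constantDuration. Time stamps
   are reduced modulo 2^32 because the RTP time stamp is a 32-bit
   field.

```python
from typing import List, Tuple


def split_celp_cbr(payload: bytes, rtp_timestamp: int, constant_size: int,
                   constant_duration: int) -> List[Tuple[int, bytes]]:
    """Split a CELP-cbr payload into (time stamp, frame) pairs."""
    if constant_size <= 0:
        raise ValueError("constantSize must be positive")
    if not payload or len(payload) % constant_size:
        # The payload holds one or more complete frames of equal size.
        raise ValueError("payload is not a whole number of frames")
    frames = []
    for i, offset in enumerate(range(0, len(payload), constant_size)):
        timestamp = (rtp_timestamp + i * constant_duration) % (1 << 32)
        frames.append((timestamp, payload[offset:offset + constant_size]))
    return frames
```

   With the parameters of the example above (constantSize=27,
   constantDuration=240), a payload of 81 octets yields three frames
   whose time stamps are 240 ticks apart.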
3.3.4 Variable bit-rate CELP

   This mode is signaled by mode=CELP-vbr. With this mode one or more
   complete CELP frames of variable size can be transported in one RTP
   packet with OPTIONAL interleaving. As CELP frames are very small,
   while the largest possible AU-size in this mode is greater than the
   maximum CELP frame size, there is no support for fragmentation of
   CELP frames. Hence CELP frames MUST NOT be fragmented when using
   this mode.

   In this mode the RTP payload consists of the AU Header Section,
   followed by one or more concatenated CELP frames. The Auxiliary
   Section MUST be empty. For each CELP frame contained in the payload
   there MUST be a one octet AU-header in the AU Header Section to
   provide:
   (a) the size of each CELP frame in the payload and
   (b) index information for computing the sequence (and hence timing)
       of each CELP frame.

   Transport of CELP frames requires that the AU-size field is coded
   with 6 bits. In this mode therefore 6 bits are allocated to the
   AU-size field, and 2 bits to the AU-Index(-delta) field. Each
   AU-Index field MUST be coded with the value 0. In the AU Header
   Section, the concatenated AU-headers are preceded by the 16-bit
   AU-headers-length field, as specified in section 3.2.1.

   In addition to the required MIME format parameters, the following
   parameters MUST be present: sizeLength, indexLength, and
   indexDeltaLength. CELP frames have fixed time duration per Access
   Unit; when interleaving in this mode, the applicable duration MUST
   be signaled by the MIME format parameter constantDuration. In
   addition, the parameter maxDisplacement MUST be present when
   interleaving.
   For example:

      m=audio 49230 RTP/AVP 96
      a=rtpmap:96 mpeg4-generic/16000/1
      a=fmtp:96 streamtype=5; profile-level-id=14; mode=CELP-vbr; config=
      440F20; sizeLength=6; indexLength=2; indexDeltaLength=2;
      constantDuration=160; maxDisplacement=5

   Note: The a=fmtp line has been wrapped to fit the page; it comprises
   a single line in the SDP file.

   The hexadecimal value of the "config" parameter is the
   AudioSpecificConfig() as defined in ISO/IEC 14496-3.
   AudioSpecificConfig() specifies a mono CELP stream with a sampling
   rate of 16 kHz at a bitrate that varies between 13.9 and 16.2 kb/s
   and with 4 sub-frames per CELP frame. For the description of MIME
   parameters see section 4.1.

3.3.5 Low bit-rate AAC

   This mode is signaled by mode=AAC-lbr. This mode supports transport
   of one or more complete AAC frames of variable size. In this mode
   the AAC frames are allowed to be interleaved and hence receivers
   MUST support de-interleaving. The maximum size of an AAC frame in
   this mode is 63 octets. AAC frames MUST NOT be fragmented when
   using this mode. Hence, when using this mode, encoders MUST ensure
   that the size of each AAC frame is at most 63 octets.

   The payload configuration in this mode is the same as in the
   variable bit-rate CELP mode as defined in 3.3.4. The RTP payload
   consists of the AU Header Section, followed by concatenated AAC
   frames. The Auxiliary Section MUST be empty. For each AAC frame
   contained in the payload the one octet AU-header MUST provide:
   (a) the size of each AAC frame in the payload and
   (b) index information for computing the sequence (and hence timing)
       of each AAC frame.

   In the AU Header Section, the concatenated AU-headers MUST be
   preceded by the 16-bit AU-headers-length field, as specified in
   section 3.2.1.
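   The one octet AU-header shared by the CELP-vbr and AAC-lbr modes
   can be illustrated with the following Python sketch, which is not
   part of this specification. It reads the 16-bit AU-headers-length,
   splits each AU-header into a 6-bit AU-size and a 2-bit
   AU-Index(-delta), and derives the time stamp of each frame with the
   constant-duration formula of section 3.2.3.2. The function name is
   hypothetical, and time stamps are reduced modulo 2^32 because the
   RTP time stamp is a 32-bit field.

```python
from typing import List, Tuple


def parse_one_octet_header_mode(payload: bytes, rtp_timestamp: int,
                                constant_duration: int
                                ) -> List[Tuple[int, bytes]]:
    """Return (time stamp, frame) pairs for a CELP-vbr or AAC-lbr payload."""
    headers_bits = int.from_bytes(payload[:2], "big")  # AU-headers-length
    if len(payload) < 2 or headers_bits == 0 or headers_bits % 8:
        raise ValueError("expected a whole number of one octet AU-headers")
    count = headers_bits // 8
    offset = 2 + count               # the frames follow the AU Header Section
    timestamp = rtp_timestamp
    frames = []
    for k in range(count):
        octet = payload[2 + k]
        size = octet >> 2            # upper 6 bits: AU-size in octets
        index_field = octet & 0x03   # lower 2 bits: AU-Index(-delta)
        if k == 0:
            if index_field != 0:
                raise ValueError("AU-Index MUST be coded with the value 0")
        else:
            # Timestamp[i] = T + Sum(AU-Index-delta[k] + 1) * duration
            timestamp += (index_field + 1) * constant_duration
            timestamp %= 1 << 32
        frames.append((timestamp, payload[offset:offset + size]))
        offset += size
    if offset != len(payload):
        raise ValueError("AU sizes do not match the payload length")
    return frames
```

   For instance, with constantDuration=160, a second AU-header that
   carries an AU-Index-delta of 2 places its frame 3 * 160 = 480 ticks
   after the first frame in the packet.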
   In addition to the required MIME format parameters, the following
   parameters MUST be present: sizeLength, indexLength, and
   indexDeltaLength. AAC frames have fixed time duration per Access
   Unit; when interleaving in this mode, the applicable duration MUST
   be signaled by the MIME format parameter constantDuration. In
   addition, the parameter maxDisplacement MUST be present when
   interleaving.

   For example:

      m=audio 49230 RTP/AVP 96
      a=rtpmap:96 mpeg4-generic/22050/1
      a=fmtp:96 streamtype=5; profile-level-id=14; mode=AAC-lbr; config=
      1388; sizeLength=6; indexLength=2; indexDeltaLength=2;
      constantDuration=1024; maxDisplacement=5

   Note: The a=fmtp line has been wrapped to fit the page; it comprises
   a single line in the SDP file.

   The hexadecimal value of the "config" parameter is the
   AudioSpecificConfig() as defined in ISO/IEC 14496-3.
   AudioSpecificConfig() specifies a mono AAC stream with a sampling
   rate of 22.05 kHz. For the description of MIME parameters see
   section 4.1.

3.3.6 High bit-rate AAC

   This mode is signaled by mode=AAC-hbr. This mode supports transport
   of variable size AAC frames. In one RTP packet either one or more
   complete AAC frames are carried, or a single fragment of an AAC
   frame. In this mode the AAC frames are allowed to be interleaved
   and hence receivers MUST support de-interleaving. The maximum size
   of an AAC frame in this mode is 8191 octets.

   In this mode the RTP payload consists of the AU Header Section,
   followed by either one AAC frame, several concatenated AAC frames
   or one fragmented AAC frame. The Auxiliary Section MUST be empty.
   For each AAC frame contained in the payload there MUST be an
   AU-header in the AU Header Section to provide:
   (a) the size of each AAC frame in the payload and
   (b) index information for computing the sequence (and hence timing)
       of each AAC frame.
   Coding the maximum size of an AAC frame requires 13 bits. Therefore
   in this configuration 13 bits are allocated to the AU-size, and
   3 bits to the AU-Index(-delta) field. Thus each AU-header has a size
   of 2 octets. Each AU-Index field MUST be coded with the value 0. In
   the AU Header Section, the concatenated AU-headers MUST be preceded
   by the 16-bit AU-headers-length field, as specified in
   section 3.2.1.

   In addition to the required MIME format parameters, the following
   parameters MUST be present: sizeLength, indexLength, and
   indexDeltaLength. AAC frames have fixed time duration per Access
   Unit; when interleaving in this mode, the applicable duration MUST
   be signaled by the MIME format parameter constantDuration. In
   addition, the parameter maxDisplacement MUST be present when
   interleaving.

   For example:

      m=audio 49230 RTP/AVP 96
      a=rtpmap:96 mpeg4-generic/48000/6
      a=fmtp:96 streamtype=5; profile-level-id=16; mode=AAC-hbr;
      config=11B0; sizeLength=13; indexLength=3;
      indexDeltaLength=3; constantDuration=1024

   Note: The a=fmtp line has been wrapped to fit the page; it comprises
   a single line in the SDP file.

   The hexadecimal value of the "config" parameter is the
   AudioSpecificConfig() as defined in ISO/IEC 14496-3.
   AudioSpecificConfig() specifies a 5.1 channel AAC stream with a
   sampling rate of 48 kHz. For the description of MIME parameters see
   section 4.1.

3.3.7 Additional modes

   This specification only defines the modes specified in sections
   3.3.2 up to 3.3.6. Additional modes are expected to be defined in
   future RFCs. Each additional mode MUST be in full compliance with
   this specification.
   Any new mode MUST be defined such that an implementation including
   all the features of this specification can decode the payload format
   corresponding to this new mode. For this reason a mode MUST NOT
   specify new default values for MIME parameters. In particular, MIME
   parameters that configure the RTP payload MUST be present (unless
   they have the default value), even if their presence is redundant
   because the mode assigns a fixed value to a parameter. A mode may
   additionally define that some MIME parameters are required instead
   of optional, that some MIME parameters have fixed values (or
   ranges), and that there are rules restricting the usage.

4. IANA considerations

   This section describes the MIME types and names associated with
   this payload format. Section 4.1 registers the MIME types, as per
   RFC 2048 [3].

   This format may require additional information about the mapping to
   be made available to the receiver. This is done using parameters
   also described in the next section.

4.1 MIME type registration

   MIME media type name: "video" or "audio" or "application"

   "video" MUST be used for MPEG-4 Visual streams (ISO/IEC 14496-2)
   or MPEG-4 Systems streams (ISO/IEC 14496-1) that convey information
   needed for an audio/visual presentation.

   "audio" MUST be used for MPEG-4 Audio streams (ISO/IEC 14496-3)
   or MPEG-4 Systems streams that convey information needed for an
   audio only presentation.

   "application" MUST be used for MPEG-4 Systems streams (ISO/IEC
   14496-1) that serve purposes other than audio/visual presentation,
   e.g. in some cases when MPEG-J (Java) streams are transmitted.

   Depending on the required payload configuration, MIME format
   parameters need to be available to the receiver. This is done using
   the parameters described in the next section.
   There are required and optional parameters.

   Optional parameters are of two types: general parameters and
   configuration parameters. The configuration parameters are used to
   configure the fields in the AU Header section and in the auxiliary
   section. The absence of any configuration parameter is equivalent to
   the associated field set to its default value, which is always zero.
   The absence of all configuration parameters resolves into a default
   "basic" configuration with an empty AU-header section and an empty
   auxiliary section in each RTP packet.

   MIME subtype name: mpeg4-generic

   Required parameters:

   MIME format parameters are not case dependent; however for clarity
   both upper and lower case are used in the names of the parameters
   described in this specification.

   streamType:
      The integer value that indicates the type of MPEG-4 stream that
      is carried; its coding corresponds to the values of the
      streamType as defined in Table 9 (streamType Values) in ISO/IEC
      14496-1.

   profile-level-id:
      A decimal representation of the MPEG-4 Profile Level indication.
      This parameter MUST be used in the capability exchange or
      session set-up procedure to indicate the MPEG-4 Profile and Level
      combination of which the relevant MPEG-4 media codec is capable.
      For MPEG-4 Audio streams, this parameter is the decimal value
         from Table 5 (audioProfileLevelIndication Values) in ISO/IEC
         14496-1, indicating which MPEG-4 Audio tool subsets are
         required to decode the audio stream.
      For MPEG-4 Visual streams, this parameter is the decimal value
         from Table G-1 (FLC table for profile and level indication) of
         ISO/IEC 14496-2, indicating which MPEG-4 Visual tool subsets
         are required to decode the visual stream.
      For BIFS streams, this parameter is the decimal value that is
         obtained from (SPLI + 256*GPLI), where:
         SPLI is the decimal value from Table 4 in ISO/IEC 14496-1 with
         the applied sceneProfileLevelIndication;
         GPLI is the decimal value from Table 7 in ISO/IEC 14496-1 with
         the applied graphicsProfileLevelIndication.
      For MPEG-J streams, this parameter is the decimal value from
         table 13 (MPEGJProfileLevelIndication) in ISO/IEC 14496-1,
         indicating the profile and level of the MPEG-J stream.
      For OD streams, this parameter is the decimal value from table 3
         (ODProfileLevelIndication) in ISO/IEC 14496-1, indicating the
         profile and level of the OD stream.
      For IPMP streams, this parameter has either the decimal value 0,
         indicating an unspecified profile and level, or a value larger
         than zero, indicating an MPEG-4 IPMP profile and level as
         defined in a future MPEG-4 specification.
      For Clock Reference streams and Object Content Info streams, this
         parameter has the decimal value zero, indicating that profile
         and level information is conveyed through the OD framework.

   config:
      A hexadecimal representation of an octet string that expresses
      the media payload configuration. Configuration data is mapped
      onto the hexadecimal octet string on an MSB-first basis. The
      first bit of the configuration data SHALL be located at the MSB
      of the first octet. In the last octet, if necessary to achieve
      octet-alignment, up to 7 zero-valued padding bits shall follow
      the configuration data.
      For MPEG-4 Audio streams, config is the audio object type
         specific decoder configuration data AudioSpecificConfig() as
         defined in ISO/IEC 14496-3. For Structured Audio, the
         AudioSpecificConfig() may be conveyed by other means, not
         defined by this specification.
         If the AudioSpecificConfig() is conveyed by other means for
         Structured Audio, then the config MUST be a quoted empty
         hexadecimal octet string, as follows: config="".
         Note that a future mode of using this RTP payload format for
         Structured Audio may define such other means.
      For MPEG-4 Visual streams, config is the MPEG-4 Visual
         configuration information as defined in subclause 6.2.1 Start
         codes of ISO/IEC 14496-2. The configuration information
         indicated by this parameter SHALL be the same as the
         configuration information in the corresponding MPEG-4 Visual
         stream, except for first-half-vbv-occupancy and
         latter-half-vbv-occupancy, if they exist, which may vary in
         the repeated configuration information inside an MPEG-4
         Visual stream (see 6.2.1 Start codes of ISO/IEC 14496-2).
      For BIFS streams, this is the BIFSConfig() information as defined
         in ISO/IEC 14496-1. For version 1, BIFSConfig is defined in
         section 9.3.5.2, and for version 2 in section 9.3.5.3. The
         MIME format parameter objectType signals the version of
         BIFSConfig.
      For IPMP streams, this is either a quoted empty hexadecimal octet
         string, indicating the absence of any decoder configuration
         information (config=""), or the IPMPConfiguration() as
         defined in a future MPEG-4 IPMP specification.
      For Object Content Info (OCI) streams, this is the
         OCIDecoderConfiguration() information of the OCI stream, as
         defined in section 8.4.2.4 in ISO/IEC 14496-1.
      For OD streams, Clock Reference streams and MPEG-J streams, this
         is a quoted empty hexadecimal octet string (config=""), as
         no information on the decoder configuration is required.

   mode:
      The mode in which this specification is used.
      The following modes can be signaled:
         mode=generic,
         mode=CELP-cbr,
         mode=CELP-vbr,
         mode=AAC-lbr and
         mode=AAC-hbr.
      Other modes are expected to be defined in future RFCs. See also
      sections 3.3.7 and 4.2 of RFC xxxx.

   Optional general parameters:

   objectType:
      The decimal value from Table 8 in ISO/IEC 14496-1, indicating
      the value of the objectTypeIndication of the transported stream.
      For BIFS streams this parameter MUST be present to signal the
      version of BIFSConfiguration(). Note that objectTypeIndication
      may signal a non-MPEG-4 stream and that the RTP payload format
      defined in this document may not be suitable to carry a stream
      that is not defined by MPEG-4. The objectType parameter SHOULD
      NOT be set to a value that signals a stream that cannot be
      carried by this payload format.

   constantSize:
      The constant size in octets of each Access Unit for this stream.
      The constantSize and the sizeLength parameters MUST NOT be
      simultaneously present.

   constantDuration:
      The constant duration of each Access Unit for this stream,
      measured with the same units as the RTP time stamp.

   maxDisplacement:
      The decimal representation of the maximum displacement in time
      of an interleaved AU, as defined in section 3.2.3.3, expressed
      in units of the RTP time stamp clock.
      This parameter MUST be present when interleaving is applied.

   de-interleaveBufferSize:
      The decimal representation in number of octets of the size of
      the de-interleave buffer, described in section 3.2.3.3.
      When interleaving, this parameter MUST be present if the
      calculation of the de-interleave buffer size given in 3.2.3.3
      and based on maxDisplacement and rate(max) under-estimates the
      size of the de-interleave buffer.
      If this calculation does not under-estimate the size of the
      de-interleave buffer, then the de-interleaveBufferSize parameter
      SHOULD NOT be present.

   Optional configuration parameters:

   sizeLength:
      The number of bits on which the AU-size field is encoded in the
      AU-header. The sizeLength and the constantSize parameters MUST
      NOT be simultaneously present.

   indexLength:
      The number of bits on which the AU-Index is encoded in the first
      AU-header. The default value of zero indicates the absence of
      the AU-Index field in each first AU-header.

   indexDeltaLength:
      The number of bits on which the AU-Index-delta field is encoded
      in any non-first AU-header. The default value of zero indicates
      the absence of the AU-Index-delta field in each non-first
      AU-header.

   CTSDeltaLength:
      The number of bits on which the CTS-delta field is encoded in
      the AU-header.

   DTSDeltaLength:
      The number of bits on which the DTS-delta field is encoded in
      the AU-header.

   randomAccessIndication:
      A decimal value of zero or one, indicating whether the RAP-flag
      is present in the AU-header. The decimal value of one indicates
      presence of the RAP-flag, the default value zero its absence.

   streamStateIndication:
      The number of bits on which the Stream-state field is encoded in
      the AU-header. This parameter MAY be present when transporting
      MPEG-4 system streams, and SHALL NOT be present for MPEG-4 audio
      and MPEG-4 video streams.

   auxiliaryDataSizeLength:
      The number of bits that is used to encode the auxiliary-data-size
      field.

   Applications MAY use more parameters, in addition to those defined
   above. Each additional parameter MUST be registered with IANA, to
   ensure that there is no clash of names.
   Each additional parameter MUST be accompanied by a specification in
   the form of an RFC, MPEG standard, or other permanent and readily
   available reference (the "Specification Required" policy defined in
   RFC 2434 [6]). Receivers MUST tolerate the presence of such
   additional parameters, but these parameters SHALL NOT impact the
   decoding of receivers that comply with this specification.

   Encoding considerations:
   This MIME subtype is defined for RTP transport only. System
   bitstreams MUST be generated according to MPEG-4 Systems
   specifications (ISO/IEC 14496-1). Video bitstreams MUST be generated
   according to MPEG-4 Visual specifications (ISO/IEC 14496-2). Audio
   bitstreams MUST be generated according to MPEG-4 Audio
   specifications (ISO/IEC 14496-3). The RTP packets MUST be packetized
   according to the RTP payload format defined in RFC xxxx.

   Security considerations:
   As defined in section 5 of RFC xxxx.

   Interoperability considerations:
   MPEG-4 provides a large and rich set of tools for the coding of
   visual objects. For effective implementation of the standard,
   subsets of the MPEG-4 tool sets have been provided for use in
   specific applications. These subsets, called 'Profiles', limit the
   size of the tool set a decoder is required to implement. In order to
   restrict computational complexity, one or more 'Levels' are set for
   each Profile. A Profile@Level combination allows:
   . a codec builder to implement only the subset of the standard that
     is needed, while maintaining interworking with other MPEG-4
     devices that implement the same combination, and
   . checking whether MPEG-4 devices comply with the standard
     ('conformance testing').

   A stream SHALL be compliant with the MPEG-4 Profile@Level specified
   by the parameter "profile-level-id".
   Interoperability between a sender and a receiver is achieved by
   specifying the parameter "profile-level-id" in MIME content. In the
   capability exchange / announcement procedure this parameter may
   mutually be set to the same value.

   Published specification:
   The specifications for MPEG-4 streams are presented in ISO/IEC
   14496-1, 14496-2, and 14496-3. The RTP payload format is described
   in RFC xxxx.

   Applications which use this media type:
   Multimedia streaming and conferencing tools.

   Additional information: none

   Magic number(s): none

   File extension(s):
   None. A file format with the extension .mp4 has been defined for
   MPEG-4 content but is not directly correlated with this MIME type
   for which the sole purpose is RTP transport.

   Macintosh File Type Code(s): none

   Person & email address to contact for further information:
   Authors of RFC xxxx, IETF Audio/Video Transport working group.

   Intended usage: COMMON

   Author/Change controller:
   Authors of RFC xxxx, IETF Audio/Video Transport working group.

4.2 Registration of mode definitions with IANA

   This specification can be used in a number of modes. The mode of
   operation is signaled using the "mode" MIME parameter, with the
   initial set of values specified in section 4.1. New modes may be
   defined at any time, as described in section 3.3.7. These modes
   MUST be registered with IANA, to ensure that there is no clash
   of names.

   A new mode registration MUST be accompanied by a specification in
   the form of an RFC, MPEG standard, or other permanent and readily
   available reference (the "Specification Required" policy defined
   in RFC 2434 [6]).
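
   As an illustration only, the following Python sketch shows how a
   receiver might read a semicolon-separated parameter list and check it
   against a few of the rules in section 4.1: the four required
   parameters, the rule that constantSize and sizeLength are not
   simultaneously present, and whether the mode is one of the initially
   defined values. The function names and the sample values in the
   usage note are hypothetical and are not defined by this
   specification.

```python
# Hypothetical helper names; only the parameter and mode names
# are taken from section 4.1.
REQUIRED = ("streamtype", "profile-level-id", "config", "mode")
INITIAL_MODES = {"generic", "CELP-cbr", "CELP-vbr", "AAC-lbr", "AAC-hbr"}


def parse_params(text):
    """Split 'name=value; name=value' into a dict keyed by lower-cased name."""
    params = {}
    for item in text.split(";"):
        item = item.strip()
        if not item:
            continue
        name, _, value = item.partition("=")
        # Parameter names are not case dependent, so normalise them.
        params[name.strip().lower()] = value.strip().strip('"')
    return params


def check_params(params):
    """Return a list of problems found; an empty list means none were found."""
    problems = [f"missing required parameter: {name}"
                for name in REQUIRED if name not in params]
    if "constantsize" in params and "sizelength" in params:
        problems.append("constantSize and sizeLength are both present")
    return problems


def uses_initial_mode(params):
    """True if the mode is one of the five values listed in section 4.1."""
    return params.get("mode") in INITIAL_MODES
```

   For example, check_params(parse_params("streamType=5;
   profile-level-id=15; mode=AAC-hbr; config=1210; sizeLength=13"))
   reports no problems for these made-up values. A mode outside the
   initial set is not treated as an error here, since further modes may
   be registered later.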
4.3 Concatenation of parameters

   Multiple parameters SHOULD be expressed as a MIME media type string,
   in the form of a semicolon-separated list of parameter=value pairs
   (for parameter usage examples see sections 3.3.2 up to 3.3.6).

4.4 Usage of SDP

4.4.1 The a=fmtp keyword

   It is assumed that one typical way to transport the above-described
   parameters associated with this payload format is via an SDP message
   [5], for example transported to the client in reply to an RTSP
   DESCRIBE [8] or via SAP [11]. In that case the (a=fmtp) keyword
   MUST be used as described in RFC 2327 [5], section 6, the syntax
   being then:

   a=fmtp:<format> <parameter name>=<value>[; <parameter name>=<value>]

5. Security Considerations

   RTP packets using the payload format defined in this specification
   are subject to the security considerations discussed in the RTP
   specification [2]. This implies that confidentiality of the media
   streams is achieved by encryption. Because the data compression used
   with this payload format is applied end-to-end, encryption may be
   performed on the compressed data so there is no conflict between the
   two operations. The packet processing complexity of this payload
   type (i.e. excluding media data processing) does not exhibit any
   significant non-uniformity in the receiver side to cause a
   denial-of-service threat.

   However, it is possible to inject non-compliant MPEG streams (Audio,
   Video, and Systems) to overload the receiver/decoder's buffers,
   which might compromise the functionality of the receiver or even
   crash it. This is especially true for end-to-end systems like MPEG
   where the buffer models are precisely defined.

   MPEG-4 Systems supports stream types including commands that are
   executed on the terminal like OD commands, BIFS commands, etc.
   and programmatic content like MPEG-J (Java(TM) Byte Code) and MPEG-4
   scripts. It is possible to use one or more of the above in a
   manner non-compliant to MPEG to crash the receiver or make it
   temporarily unavailable. Senders that transport MPEG-4 content
   SHOULD ensure that such content is MPEG compliant, as defined in the
   compliance part of ISO/IEC 14496 [1]. Receivers that support MPEG-4
   content should prevent malfunctioning of the receiver in case of
   non-MPEG-compliant content.

   Authentication mechanisms can be used to validate the sender and
   the data to prevent security problems due to non-compliant malignant
   MPEG-4 streams.

   In ISO/IEC 14496-1 a security model is defined for MPEG-4 Systems
   streams carrying MPEG-J access units which comprise Java(TM) classes
   and objects. MPEG-J defines a set of Java APIs and a secure
   execution model. MPEG-J content can call this set of APIs and
   Java(TM) methods from a set of Java packages supported in the
   receiver within the defined security model. According to this
   security model, downloaded byte code is forbidden to load libraries,
   define native methods, start programs, read or write files, or read
   system properties.
   Receivers can implement intelligent filters to validate the buffer
   requirements or parametric (OD, BIFS, etc.) or programmatic (MPEG-J,
   MPEG-4 scripts) commands in the streams. However, this can increase
   the complexity significantly.

   Implementors of MPEG-4 streaming over RTP who also implement MPEG-4
   scripts (subset of ECMAScript) MUST ensure that the action of such
   scripts is limited solely to the domain of the single presentation
   in which they reside (thus disallowing session to session
   communication, access to local resources and storage, etc).
   Though loading static network-located resources (such as media) into
   the presentation should be permitted, network access by scripts MUST
   be restricted to such (media) download.

6. Acknowledgements

   This document evolved through several revisions thanks to
   contributions by people from the ISMA forum, from the IETF AVT
   Working Group and from the 4-on-IP ad-hoc group within MPEG. The
   authors wish to thank all involved people, and in particular Andrea
   Basso, Stephen Casner, M. Reha Civanlar, Carsten Herpel, John
   Lazaro, Zvi Lifshitz, Young-kwon Lim, Alex MacAulay, Bill May,
   Colin Perkins, Dorairaj V and Stephan Wenger for their valuable
   comments and support.

7. References

7.1 Normative references

   [1] ISO/IEC International Standard 14496 (MPEG-4); "Information
   technology - Coding of audio-visual objects", January 2000.

   [2] H. Schulzrinne, S. Casner, R. Frederick, V. Jacobson, "RTP: A
   Transport Protocol for Real-Time Applications", RFC 1889, Internet
   Engineering Task Force, January 1996.

   [3] N. Freed, J. Klensin, J. Postel, "Multipurpose Internet Mail
   Extensions (MIME) Part Four: Registration Procedures", RFC 2048,
   Internet Engineering Task Force, November 1996.

   [4] S. Bradner, "Key words for use in RFCs to Indicate Requirement
   Levels", RFC 2119, March 1997.

   [5] M. Handley, V. Jacobson, "SDP: Session Description Protocol",
   RFC 2327, Internet Engineering Task Force, April 1998.

   [6] T. Narten, H. Alvestrand, "Guidelines for Writing an IANA
   Considerations Section in RFCs", RFC 2434, October 1998.

7.2 Informative references

   [7] D. Hoffman, G. Fernando, V. Goyal, M. Civanlar, "RTP payload
   format for MPEG1/MPEG2 Video", RFC 2250, January 1998.

   [8] H. Schulzrinne, A. Rao, R.
   Lanphier, "Real Time Streaming Protocol (RTSP)", RFC 2326, Internet
   Engineering Task Force, April 1998.

   [9] C. Perkins, O. Hodson, "Options for Repair of Streaming Media",
   RFC 2354, Internet Engineering Task Force, June 1998.

   [10] H. Schulzrinne, J. Rosenberg, "An RTP Payload Format for
   Generic Forward Error Correction", RFC 2733, Internet Engineering
   Task Force, December 1999.

   [11] M. Handley, C. Perkins, E. Whelan, "SAP: Session Announcement
   Protocol", RFC 2974, Internet Engineering Task Force, October 2000.

   [12] Y. Kikuchi, T. Nomura, S. Fukunaga, Y. Matsui, H. Kimata, "RTP
   payload format for MPEG-4 Audio/Visual streams", RFC 3016, Internet
   Engineering Task Force, November 2000.

8. Authors' Addresses

   Jan van der Meer
   Philips Electronics, MP4Net
   Prof Holstlaan 4
   Building WDB-1
   5600 JZ Eindhoven
   Netherlands
   Email: jan.vandermeer@philips.com

   David Mackie
   Apple Computer, Inc.
   One Infinite Loop, MS:302-2LF
   Cupertino CA 95014
   Email: dmackie@apple.com

   Viswanathan Swaminathan
   Sun Microsystems Inc.
   901 San Antonio Road, M/S UMPK15-214
   Palo Alto, CA 94303
   Email: viswanathan.swaminathan@sun.com

   David Singer
   Apple Computer, Inc.
   One Infinite Loop, MS:302-3MT
   Cupertino CA 95014
   Email: singer@apple.com

   Philippe Gentric
   Philips Electronics, MP4Net
   51 rue Carnot
   92156 Suresnes
   France
   Email: philippe.gentric@philips.com

Full Copyright Statement

   Copyright (C) The Internet Society (August 2003). All Rights
   Reserved.
   This document and translations of it may be copied and furnished to
   others, and derivative works that comment on or otherwise explain
   it or assist in its implementation may be prepared, copied,
   published and distributed, in whole or in part, without restriction
   of any kind, provided that the above copyright notice and this
   paragraph are included on all such copies and derivative works.
   However, this document itself may not be modified in any way, such
   as by removing the copyright notice or references to the Internet
   Society or other Internet organizations, except as needed for the
   purpose of developing Internet standards in which case the
   procedures for copyrights defined in the Internet Standards process
   MUST be followed, or as required to translate it into languages
   other than English.

   The limited permissions granted above are perpetual and will
   not be revoked by the Internet Society or its successors or
   assigns.

   This document and the information contained herein is provided on
   an "AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET
   ENGINEERING TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR
   IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF
   THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED
   WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.

APPENDIX: Usage of this payload format

Appendix A. Interleave analysis

A.1 Introduction

   In this appendix interleaving issues are discussed. Some general
   notes are provided on de-interleaving and error concealment, while
   a number of interleaving patterns are examined, in particular
   for determining the maximum displacement in time and the size of
   the de-interleave buffer.
   In these examples, the maximum displacement is cited in terms of an
   access unit count, for ease of reading. In actual streams, it is
   signaled in units of the RTP time stamp clock.

A.2 De-interleaving and error concealment

   This appendix does not describe any details on de-interleaving and
   error concealment, as the control of the AU decoding and error
   concealment process has little to do with interleaving. If the
   next AU to be decoded is present and there is sufficient storage
   available for the decoded AU, then decode it now. If not, wait.
   When the decoding deadline is reached (i.e., the time when decoding
   must begin in order to be completed by the time the AU is to be
   presented), or if the decoder is some hardware that presents a
   constant delay between initiation of decoding of an AU and
   presentation of that AU, then decoding must begin at that deadline
   time.

   If the next AU to be decoded is not present when the decoding
   deadline is reached, then that AU is lost so the receiver must take
   whatever error concealment measures are deemed appropriate. The
   play-out delay may need to be adjusted at that point (especially if
   other AUs have also missed their deadline recently). Or, if it was
   a momentary delay, and maintaining the latency is important, then
   the receiver should minimize the glitch and continue processing
   with the next AU.

A.3 Simple Group interleave

A.3.1 Introduction

   An example of regular interleave is when packets are formed into
   groups. If the 'stride' of the interleave (the distance between
   interleaved AUs) is N, packet 0 could contain AU(0), AU(N), AU(2N),
   and so on; packet 1 could contain AU(1), AU(1+N), AU(1+2N), and so
   on. If there are M access units in a packet, then there are M*N
   access units in the group.
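
   The grouping rule just described can be sketched in a few lines of
   Python. This is an illustration under the assumption of fixed-size
   groups of M*N access units numbered from zero; the function name and
   its arguments are hypothetical and are not part of this
   specification.

```python
def group_interleave(stride, per_packet, groups=1):
    """Return the AU numbers carried by each packet of a simple group interleave.

    stride     -- N, the distance between interleaved AUs in a packet
    per_packet -- M, the number of access units in each packet
    groups     -- how many consecutive groups of M*N AUs to lay out
    """
    packets = []
    group_size = stride * per_packet
    for g in range(groups):
        base = g * group_size            # first AU of this group
        for k in range(stride):          # packet k within the group
            packets.append([base + k + j * stride for j in range(per_packet)])
    return packets
```

   With stride=3 and per_packet=3, the first group is laid out as the
   packets [0, 3, 6], [1, 4, 7] and [2, 5, 8], and the second group
   starts with [9, 12, 15].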
   An example with N=M=3 follows; note that this is the same example
   as given in section 2.5 and that a fixed time duration per Access
   Unit is assumed:

   Packet   Time stamp   Carried AUs   AU-Index, AU-Index-delta
   P(0)     T[0]         0, 3, 6       0, 2, 2
   P(1)     T[1]         1, 4, 7       0, 2, 2
   P(2)     T[2]         2, 5, 8       0, 2, 2
   P(3)     T[9]         9,12,15       0, 2, 2

   In this example the AU-Index is present in the first AU-header and
   coded with the value 0, as required for fixed duration AUs. The
   position of the first AU of each packet within the group is defined
   by the RTP time stamp, while the AU-Index-delta field indicates the
   position of subsequent AUs relative to the first AU in the packet.
   All AU-Index-delta fields are coded with the value N-1, equal to 2
   in this example. Hence the RTP time stamp and the AU-Index-delta are
   used to reconstruct the original order. See also section 3.2.3.2.

A.3.2 Determining the de-interleave buffer size

   For the regular pattern as in this example, figure 6 in section
   3.2.3.3 shows that the de-interleave buffer stores at most 4 AUs. A
   de-interleaveBufferSize value may be signaled that is at least
   equal to the total number of octets of any 4 "early" AUs that are
   stored at the same time.

A.3.3 Determining the maximum displacement

   For the regular pattern as in this example, figure 7 in section 3.3
   shows that the maximum displacement in time equals 5 AU periods.
   Hence the minimum maxDisplacement value that must be signaled is 5
   AU periods. In case each AU has the same size, this maxDisplacement
   value over-estimates the de-interleave buffer size by one AU.
   However, note that in case of variable AU sizes the total size of
   any 4 "early" AUs that must be stored at the same time may exceed
   maxDisplacement times the maximum bitrate, in which case the
   de-interleaveBufferSize must be signaled.

A.4 More subtle group interleave

A.4.1 Introduction

   Another example of forming packets with group interleave is given
   below. In this example the packets are formed such that the loss of
   two subsequent RTP packets does not cause the loss of two subsequent
   AUs. Note that in this example the RTP time stamps of packet 3 and
   packet 4 are earlier than the RTP time stamps of packets 1 and 2,
   respectively; a fixed time duration per Access Unit is assumed.

   Packet   Time stamp   Carried AUs   AU-Index, AU-Index-delta
   0        T[0]          0,  5        0, 4
   1        T[2]          2,  7        0, 4
   2        T[4]          4,  9        0, 4
   3        T[1]          1,  6        0, 4
   4        T[3]          3,  8        0, 4
   5        T[10]        10, 15        0, 4
   and so on ..

   In this example the AU-Index is present in the first AU-header and
   coded with the value 0, as required for AUs with a fixed duration.
   To reconstruct the original order, the RTP time stamp and the
   AU-Index-delta (coded with the value 4) are used. See also
   section 3.2.3.2.

A.4.2 Determining the de-interleave buffer size

   From figure 8 it can be determined that at most 5 "early" AUs
   are to be stored. If the AUs are of constant size, then this value
   equals 5 times the AU size. The minimum size of the de-interleave
   buffer equals the maximum total number of octets of the "early" AUs
   that are to be stored at the same time. This gives the minimum
   value of the de-interleaveBufferSize that may be signaled.
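
   The count of "early" AUs for this pattern can be reproduced with a
   short Python sketch. It assumes that, for each AU in the interleaved
   order, the "early" AUs are those received before it that come later
   in the original order, which is how the entries of figure 8 are read
   here. The function names are hypothetical, and AUs are identified
   simply by their position in the original order.

```python
def early_aus(order):
    """For each AU in arrival order, list the earlier-received AUs that
    follow it in the original order (the "early" AUs)."""
    result = []
    for pos, au in enumerate(order):
        result.append(sorted(x for x in order[:pos] if x > au))
    return result


def max_early_count(order):
    """Largest number of "early" AUs held at the same time."""
    return max(len(early) for early in early_aus(order))


# Interleaved order of the example in A.4.1 (packets 0 to 4).
subtle = [0, 5, 2, 7, 4, 9, 1, 6, 3, 8]
assert max_early_count(subtle) == 5
```

   For constant-size AUs the minimum buffer size in octets is this count
   times the AU size; for variable sizes the octet totals of each list
   returned by early_aus() would have to be compared instead.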
                    +--+--+--+--+--+--+--+--+--+--+
   Interleaved AUs  | 0| 5| 2| 7| 4| 9| 1| 6| 3| 8|
                    +--+--+--+--+--+--+--+--+--+--+
                      -  -  5  -  5  -  2  7  4  9
                                  7     4  9  5
   "Early" AUs                          5     6
                                        7     7
                                        9     9

   Figure 8: Storage of "early" AUs in the de-interleave buffer per
   interleaved AU.

A.4.3 Determining the maximum displacement

   From figure 9 it can be seen that the maximum displacement in time
   equals 8 AU periods. Hence the minimum maxDisplacement value to be
   signaled is 8 AU periods.

                              +--+--+--+--+--+--+--+--+--+--+
   Interleaved AUs            | 0| 5| 2| 7| 4| 9| 1| 6| 3| 8|
                              +--+--+--+--+--+--+--+--+--+--+

   Earliest not yet present AU  -  1  1  1  1  1  -  3  -  -

   Figure 9: The earliest not yet present AU for each AU in the
   interleaving pattern.

   In case each AU has the same size, the found maxDisplacement value
   over-estimates the de-interleave buffer size by three AUs.
   However, in case of variable AU sizes the total size of any 5
   "early" AUs stored at the same time may exceed maxDisplacement
   times the maximum bitrate, in which case de-interleaveBufferSize
   must be signaled.

A.5 Continuous interleave

A.5.1 Introduction

   In continuous interleave, once the scheme is 'primed', the number
   of AUs in a packet exceeds the 'stride' (the distance between
   them). This shortens the buffering needed, smooths the data-flow,
   and gives slightly larger packets -- and thus lower overhead -- for
   the same interleave. For example, here is a continuous interleave
   also over a stride of 3 AUs, but with 4 AUs per packet, for a run
   of 20 AUs. This shows both how the scheme 'starts up' and how it
   finishes. Once again, the example assumes fixed time duration per
   Access Unit.
   Packet   Time-stamp   Carried AUs    AU-Index, AU-Index-delta
   0        T[0]          0             0
   1        T[1]          1  4          0  2
   2        T[2]          2  5  8       0  2  2
   3        T[3]          3  6  9 12    0  2  2  2
   4        T[7]          7 10 13 16    0  2  2  2
   5        T[11]        11 14 17 20    0  2  2  2
   6        T[15]        15 18          0  2
   7        T[19]        19             0

   In this example the AU-Index is present in the first AU-header and
   coded with the value 0, as required for AUs with a fixed duration.
   To reconstruct the original order, the RTP time stamp and the
   AU-Index-delta (coded with the value 2) are used. See also 3.2.3.2.
   Note that this example has RTP time-stamps in increasing order.

A.5.2 Determining the de-interleave buffer size

   For this example the de-interleave buffer size can be derived from
   figure 10. The maximum number of "early" AUs is three. If the AUs
   are of constant size, then this value equals 3 times the AU size.
   Compared to the example in A.3, for constant size AUs the
   de-interleave buffer size is reduced from 4 to 3 times the AU size,
   while maintaining the same 'stride'.

                    +--+--+--+--+--+--+--+--+--+--+--+--+--+--+-
   Interleaved AUs  | 0| 1| 4| 2| 5| 8| 3| 6| 9|12| 7|10|13|16|
                    +--+--+--+--+--+--+--+--+--+--+--+--+--+--+-
                      -  -  -  4  -  -  4  8  -  -  8 12  -  -
                                        5           9
   "Early" AUs                          8          12

   Figure 10: Storage of "early" AUs in the de-interleave buffer per
   interleaved AU.

A.5.3 Determining the maximum displacement

   For this example the maximum displacement has a value of 5 AU
   periods. See figure 11. Compared to the example in A.3, the maximum
   displacement does not decrease, though in fact less de-interleave
   buffering is required.
                    +--+--+--+--+--+--+--+--+--+--+--+--+--+--+-
   Interleaved AUs  | 0| 1| 4| 2| 5| 8| 3| 6| 9|12| 7|10|13|16|
                    +--+--+--+--+--+--+--+--+--+--+--+--+--+--+-
   Earliest not yet
   present AU         -  -  2  -  3  3  -  -  7  7  -  - 11 11

   Figure 11: The earliest not yet present AU for each AU in the
   interleaving pattern.
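
   The "earliest not yet present AU" bookkeeping used in figures 9 and
   11 can be sketched in Python as follows. The sketch assumes fixed
   duration AUs numbered from zero in their original order, so that the
   displacement is counted in AU periods rather than in units of the
   RTP time stamp clock. The function names are hypothetical.

```python
def earliest_missing(order):
    """For each AU in arrival order, return the earliest AU that precedes it
    in the original order and has not yet arrived, or None if there is none."""
    seen = set()
    result = []
    for au in order:
        missing = [i for i in range(au) if i not in seen]
        result.append(missing[0] if missing else None)
        seen.add(au)
    return result


def max_displacement(order):
    """Maximum displacement in AU periods over the given arrival order."""
    gaps = [au - first for au, first in zip(order, earliest_missing(order))
            if first is not None]
    return max(gaps, default=0)


subtle = [0, 5, 2, 7, 4, 9, 1, 6, 3, 8]                          # A.4
continuous = [0, 1, 4, 2, 5, 8, 3, 6, 9, 12, 7, 10, 13, 16]      # A.5
assert max_displacement(subtle) == 8
assert max_displacement(continuous) == 5
```

   To obtain a value for the maxDisplacement parameter, the result
   would still have to be multiplied by the AU duration expressed in
   RTP time stamp units.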