idnits 2.17.1

draft-ietf-avt-mpeg4-simple-05.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

  ** Looks like you're using RFC 2026 boilerplate. This must be updated to
     follow RFC 3978/3979, as updated by RFC 4748.

  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

  ** The document seems to lack a 1id_guidelines paragraph about 6 months
     document validity -- however, there's a paragraph with a matching
     beginning. Boilerplate error?

  == No 'Intended status' indicated for this document; assuming Proposed
     Standard

  == It seems as if not all pages are separated by form feeds - found 0 form
     feeds but 39 pages

  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

  ** The document seems to lack separate sections for Informative/Normative
     References. All references will be assumed normative when checking for
     downward references.

  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == The "Author's Address" (or "Authors' Addresses") section title is
     misspelled.

  == Using lowercase 'not' together with uppercase 'MUST', 'SHALL', 'SHOULD',
     or 'RECOMMENDED' is not an accepted usage according to RFC 2119. Please
     use uppercase 'NOT' together with RFC 2119 keywords (if that is what you
     mean).

     Found 'MUST not' in this paragraph:

     This mode is signaled by mode=CELP-cbr. In this mode one or more
     complete CELP frames of fixed size can be transported in one RTP
     packet; there is no support for interleaving. The RTP payload consists
     of one or more concatenated CELP frames, each of the same size. CELP
     frames MUST not be fragmented when using this mode. Both the AU Header
     Section and the Auxiliary Section MUST be empty.

  == Using lowercase 'not' together with uppercase 'MUST', 'SHALL', 'SHOULD',
     or 'RECOMMENDED' is not an accepted usage according to RFC 2119. Please
     use uppercase 'NOT' together with RFC 2119 keywords (if that is what you
     mean).

     Found 'MUST not' in this paragraph:

     This mode is signaled by mode=CELP-vbr. With this mode one or more
     complete CELP frames of variable size can be transported in one RTP
     packet with optional interleaving. As CELP frames are very small, while
     the largest possible AU-size in this mode is greater than the maximum
     CELP frame size, there is no support for fragmentation of CELP frames.
     Hence CELP frames MUST not be fragmented when using this mode.

  == Using lowercase 'not' together with uppercase 'MUST', 'SHALL', 'SHOULD',
     or 'RECOMMENDED' is not an accepted usage according to RFC 2119. Please
     use uppercase 'NOT' together with RFC 2119 keywords (if that is what you
     mean).

     Found 'MUST not' in this paragraph:

     This mode is signaled by mode=AAC-lbr. This mode supports transport of
     one or more complete AAC frames of variable size. In this mode the AAC
     frames are allowed to be interleaved and hence receivers MUST support
     de-interleaving. The maximum size of an AAC frame in this mode is 63
     octets. CELP frames MUST not be fragmented when using this mode.

  -- The document seems to lack a disclaimer for pre-RFC5378 work, but may
     have content which was first submitted before 10 November 2008. If you
     have contacted all the original authors and they are all willing to
     grant the BCP78 rights to the IETF Trust, then this is fine, and you can
     ignore this comment. If not, you may need to add the pre-RFC5378
     disclaimer. (See the Legal Provisions document at
     https://trustee.ietf.org/license-info for more information.)

  -- The document date (June 2003) is 7620 days in the past. Is this
     intentional?

  Checking references for intended status: Proposed Standard
  ----------------------------------------------------------------------------

     (See RFCs 3967 and 4897 for information about using normative references
     to lower-maturity documents in RFCs)

  == Missing Reference: '0' is mentioned on line 1875, but not defined

  == Missing Reference: '9' is mentioned on line 1777, but not defined

  == Missing Reference: '10' is mentioned on line 1817, but not defined

  == Missing Reference: '11' is mentioned on line 1880, but not defined

  == Missing Reference: '15' is mentioned on line 1881, but not defined

  == Missing Reference: '19' is mentioned on line 1882, but not defined

  -- Possible downref: Non-RFC (?) normative reference: ref. '1'

  ** Obsolete normative reference: RFC 1889 (ref. '2') (Obsoleted by RFC 3550)

  ** Obsolete normative reference: RFC 3016 (ref. '5') (Obsoleted by RFC 6416)

  ** Obsolete normative reference: RFC 2327 (ref. '6') (Obsoleted by RFC 4566)

  ** Downref: Normative reference to an Experimental RFC: RFC 2974 (ref. '7')

  ** Obsolete normative reference: RFC 2326 (ref. '8') (Obsoleted by RFC 7826)

     Summary: 8 errors (**), 0 flaws (~~), 12 warnings (==), 3 comments (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.

--------------------------------------------------------------------------------

1    Internet Engineering Task Force      J. van der Meer
2    Internet Draft                       Philips Electronics
3                                         D. Mackie
4                                         Apple Computer
5                                         V. Swaminathan
6                                         Sun Microsystems Inc.
7                                         D. Singer
8                                         Apple Computer
9                                         P. Gentric
10                                        Philips Electronics

12   December 2002
13   Expires June 2003

15   Document: draft-ietf-avt-mpeg4-simple-05.txt

17   Transport of MPEG-4 Elementary Streams

19   Status of this Memo

21   This document is an Internet-Draft and is in full conformance with
22   all provisions of section 10 of RFC2026.

24   Internet-Drafts are working documents of the Internet Engineering
25   Task Force (IETF), its areas, and its working groups.
     Note that
26   other groups may also distribute working documents as Internet-
27   Drafts. Internet-Drafts are draft documents valid for a maximum of
28   six months and may be updated, replaced, or obsoleted by other
29   documents at any time. It is inappropriate to use Internet-Drafts
30   as reference material or to cite them other than as "work in
31   progress."

33   The list of current Internet-Drafts can be accessed at
34   http://www.ietf.org/ietf/1id-abstracts.txt
35   The list of Internet-Draft Shadow Directories can be accessed at
36   http://www.ietf.org/shadow.html.

38   This specification is a product of the Audio/Video Transport working
39   group within the Internet Engineering Task Force. Comments are
40   solicited and should be addressed to the working group's mailing
41   list at avt@ietf.org and/or the authors.

43   << Note for the RFC editor: xxxx should be replaced with the RFC
44   number that will be assigned. >>

46   Abstract

48   The MPEG Committee (ISO/IEC JTC1/SC29 WG11) is a working group in
49   ISO that produced the MPEG-4 standard. MPEG defines tools to
50   compress content such as audio-visual information into elementary
51   streams. This specification defines a simple, but generic RTP
52   payload format for transport of any non-multiplexed MPEG-4
53   elementary stream.

55   RFC xxxx    Transport of MPEG-4 Elementary Streams    December 2002

57   Table of Contents

59   1. Introduction
60   2. Carriage of MPEG-4 elementary streams over RTP
61   2.1. Introduction
62   2.2. MPEG Access Units
63   2.3. Concatenation of Access Units
64   2.4. Fragmentation of Access Units
65   2.5. Interleaving
66   2.6. Time stamp information
67   2.7. State indication of MPEG-4 system streams
68   2.8. Random Access Indication
69   2.9. Carriage of auxiliary information
70   2.10. MIME format parameters and configuring conditional fields
71   2.11. Global structure of payload format
72   2.12. Modes to transport MPEG-4 streams
73   2.13. Alignment with RFC 3016
74   3. Payload format
75   3.1. Usage of RTP header fields and RTCP
76   3.2. RTP payload structure
77   3.2.1. The AU Header Section
78   3.2.1.1. The AU-header
79   3.2.2. The Auxiliary Section
80   3.2.3. The Access Unit Data Section
81   3.2.3.1. Fragmentation
82   3.2.3.2. Interleaving
83   3.2.3.3. Constraints for interleaving
84   3.3. Usage of this specification
85   3.3.1. General
86   3.3.2. The generic mode
87   3.3.3. Constant bit rate CELP
88   3.3.4. Variable bit rate CELP
89   3.3.5. Low bit rate AAC
90   3.3.6. High bit rate AAC
91   3.3.7. Additional modes
92   4. IANA considerations
93   4.1. MIME type registration
94   4.2. Registration of mode definitions with IANA
95   4.3. Concatenation of parameters
96   4.4. Usage of SDP
97   4.4.1. The a=fmtp keyword
98   5. Security considerations
99   6. Acknowledgements
100  7. References
101  8. Author addresses

103  APPENDIX: Usage of this payload format
104  A. Examples of delay analysis with interleave
105  A.1 Introduction
106  A.2 De-interleaving and error concealment
110  A.3 Simple Group interleave
111  A.3.1 Introduction
112  A.3.2 Determining the de-interleave buffer size
113  A.3.3 Determining the maximum displacement
114  A.4 More subtle group interleave
115  A.4.1 Introduction
116  A.4.2 Determining the de-interleave buffer size
117  A.4.3 Determining the maximum displacement
118  A.5 Continuous interleave
119  A.5.1 Introduction
120  A.5.2 Determining the de-interleave buffer size
121  A.5.3 Determining the maximum displacement

125  1. Introduction

127  The MPEG Committee is Working Group 11 (WG11) in ISO/IEC JTC1 SC29
128  that specified the MPEG-1, MPEG-2 and, more recently, the MPEG-4
129  standards [1]. The MPEG-4 standard specifies compression of
130  audio-visual data into, for example, an audio or video elementary
131  stream.
     In the MPEG-4 standard, these streams take the form of
132  audio-visual objects that may be arranged into an audio-visual scene
133  by means of a scene description. Each MPEG-4 elementary stream
134  consists of a sequence of Access Units; examples of an Access Unit
135  (AU) are an audio frame and a video picture.

137  This specification defines a general and configurable payload
138  structure to transport MPEG-4 elementary streams, in particular
139  MPEG-4 audio (including speech) streams, MPEG-4 video streams and
140  also MPEG-4 systems streams, such as BIFS (BInary Format for
141  Scenes), OCI (Object Content Information), OD (Object Descriptor)
142  and IPMP (Intellectual Property Management and Protection) streams.
143  The RTP payload defined in this document is simple to implement and
144  reasonably efficient. It allows for optional interleaving of Access
145  Units (such as audio frames) to increase error resilience under
146  packet loss.

148  Some types of MPEG-4 elementary streams include "crucial"
149  information whose loss cannot be tolerated, but RTP does not provide
150  reliable transmission, so receipt of that crucial information is not
151  assured. Section 3.2.3.4 specifies how stream state is conveyed so
152  that the receiver can detect the loss of crucial information and
153  cease decoding until the next random access point is received.
154  Applications transmitting streams that include crucial information,
155  such as OD commands, BIFS commands, or programmatic content such as
156  MPEG-J (Java) and ECMAScript, should include random access points
157  sufficiently often, depending upon the probability of loss, to
158  reduce stream corruption to an acceptable level. An example is the
159  carousel mechanism as defined by MPEG in ISO/IEC 14496-1.

161  Such applications may also employ additional protocols or services
162  to reduce the probability of loss.
     At the RTP layer, these measures
163  include payload formats and profiles for retransmission or forward
164  error correction (such as in RFC 2733), which must be employed with
165  due consideration to congestion control. Another solution that may
166  be appropriate for some applications is to carry RTP over TCP (such
167  as in RFC 2326, section 10.12). At the network layer, resource
168  allocation or preferential service may be available to reduce the
169  probability of loss. For a general description of methods to repair
170  streaming media, see RFC 2354.

174  Though the RTP payload format defined in this document is capable
175  of transporting any MPEG-4 stream, other, more specific, formats
176  may exist, such as RFC 3016 for transport of MPEG-4 video (part 2).

178  Configuration of the payload is provided to accommodate transport
179  of any MPEG-4 stream at any possible bit rate. However, for a
180  specific MPEG-4 elementary stream typically only very few
181  configurations are needed. So as to allow for the design of
182  simplified, but dedicated receivers, this specification requires
183  that specific modes be defined for transport of MPEG-4 streams.
184  This document defines modes for MPEG-4 CELP and AAC streams, as
185  well as a generic mode that can be used to transport any MPEG-4
186  stream. In the future, new RFCs are expected to specify additional
187  modes for transport of MPEG-4 streams.

189  The RTP payload format defined in this document specifies carriage
190  of system-related information that is often equivalent to the
191  information that may be contained in the MPEG-4 Sync Layer (SL) as
192  defined in MPEG-4 Systems [1]. This document does not prescribe how
193  to transcode or map information from the SL to fields defined in
194  the RTP payload format. Such processing, if any, is left to the
195  discretion of the application.
     However, to anticipate the need for
196  transport of any additional system-related information in the future,
197  an auxiliary field can be configured that may carry any such data.

199  The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
200  "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in
201  this document are to be interpreted as described in RFC 2119 [3].

205  2. Carriage of MPEG-4 elementary streams over RTP

207  2.1 Introduction

209  With this payload format a single MPEG-4 elementary stream can be
210  transported. Information on the type of MPEG-4 stream carried in
211  the payload is conveyed by MIME format parameters, for example in
212  an SDP [6] message or by other means (see section 4). These MIME
213  format parameters specify the configuration of the payload. To
214  allow for simplified and dedicated receivers, a MIME format
215  parameter is available to signal a specific mode of using this
216  payload. A mode definition MAY include the type of MPEG-4
217  elementary stream as well as the applied configuration, so as to
218  avoid the need in receivers to parse all MIME format parameters.
219  The applied mode MUST be signaled.

221  2.2 MPEG Access Units

223  For carriage of compressed audio-visual data MPEG defines Access
224  Units. An MPEG Access Unit (AU) is the smallest data entity to
225  which timing information is attributed. In the case of audio, an
226  Access Unit may represent an audio frame; in the case of video, a
227  picture. MPEG Access Units are by definition octet-aligned. If for
228  example an audio frame is not octet-aligned, up to 7 zero-padding
229  bits MUST be inserted at the end of the frame to achieve
230  octet-aligned Access Units, as required by the MPEG-4 specification.
231  MPEG-4 decoders MUST be able to decode AUs in which such padding is
232  applied.
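
The padding rule above can be illustrated with a short sketch. This is
not part of the specification: the function names and the choice to
represent a frame as a string of '0'/'1' characters are assumptions
made purely for the example.

```python
def padding_bits(bit_length: int) -> int:
    """Number of zero-padding bits (0..7) needed to reach an octet boundary."""
    return (8 - bit_length % 8) % 8


def pad_frame(bits: str) -> bytes:
    """Append zero-padding bits to a frame given as a '0'/'1' string
    (hypothetical representation) and return the octet-aligned Access Unit."""
    padded = bits + "0" * padding_bits(len(bits))
    if not padded:
        return b""
    # Interpret the padded bit string as a big-endian integer of whole octets.
    return int(padded, 2).to_bytes(len(padded) // 8, "big")
```

A frame of 5 bits therefore receives 3 padding bits, while a frame whose
length is already a multiple of 8 bits is left unchanged.
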

234  Consistent with the MPEG-4 specification, this document requires
235  that each MPEG-4 part 2 video Access Unit includes all the coded
236  data of a picture, any video stream headers that may precede the
237  coded picture data, and any video stream stuffing that may follow
238  it, up to, but not including the startcode indicating the start of
239  a new video stream or the next Access Unit.

241  2.3 Concatenation of Access Units

243  Frequently it is possible to carry multiple Access Units in one RTP
244  packet. This is particularly useful for audio; for example, when
245  AAC is used for encoding of a stereo signal at 64 kbits/sec, AAC
246  frames contain on average approximately 200 octets. On a LAN with a
247  1500 octet MTU this would allow on average 7 complete AAC frames to
248  be carried per RTP packet.

250  Access Units may have a fixed size in octets, but a variable size
251  is also possible. To facilitate parsing in case of multiple
252  concatenated AUs in one RTP packet, the size of each AU is made
253  known to the receiver. When concatenating in case of a constant AU
254  size, this size is communicated "out of band" through a MIME format
255  parameter. When concatenating in case of variable size AUs, the RTP
256  payload carries "in band" an AU size field for each contained AU.
260  In combination with the RTP payload length, the size information
261  allows the RTP payload to be split by the receiver back into the
262  individual AUs.

264  To simplify the implementation of RTP receivers, it is required
265  that when multiple AUs are carried in an RTP packet, each AU MUST
266  be complete, i.e. the number of AUs in an RTP packet MUST be
267  integral. In addition, an AU MUST NOT be repeated in other RTP
268  packets; hence repetition of an AU is only possible by using a
269  duplicate RTP packet.

271  2.4 Fragmentation of Access Units

273  MPEG allows for very large Access Units.
     Since most IP networks
274  have significantly smaller MTU sizes, this payload format allows
275  for the fragmentation of an Access Unit over multiple RTP packets
276  so as to avoid IP layer fragmentation. To simplify the
277  implementation of RTP receivers, an RTP packet SHALL either carry
278  one or more complete Access Units or a single fragment of one
279  Access Unit (i.e. packets MUST NOT contain fragments of multiple
280  Access Units).

282  2.5 Interleaving

284  When an RTP packet carries a contiguous sequence of Access Units,
285  the loss of such a packet can result in a "decoding gap" for the
286  user. One method to alleviate this problem is to allow for the
287  Access Units to be interleaved in the RTP packets. For a modest
288  cost in latency and implementation complexity, significant error
289  resiliency to packet loss can be achieved.

291  To support optional interleaving of Access Units, this payload
292  format allows for index information to be sent for each Access Unit.
293  After informing receivers about buffer resources to allocate for
294  de-interleaving, the RTP sender is free to choose the interleaving
295  pattern without propagating this information a priori to the
296  receiver(s). Indeed the sender could dynamically adjust the
297  interleaving pattern based on the Access Unit size, error rates,
298  etc. The RTP receiver does not need to know the interleaving
299  pattern used; it only needs to extract the index information of the
300  Access Unit and insert the Access Unit into the appropriate
301  sequence in the decoding or rendering queue. An example of
302  interleaving is given below.

304  Assume that an RTP packet contains 3 AUs, and that the AUs are
305  numbered 0, 1, 2, 3, 4, etc.
     If an interleaving group length of 9 is
306  chosen, then RTP packet(i) contains the following AU(n):
307     RTP packet(0): AU(0), AU(3), AU(6)
308     RTP packet(1): AU(1), AU(4), AU(7)
309     RTP packet(2): AU(2), AU(5), AU(8)
310     RTP packet(3): AU(9), AU(12), AU(15)
311     RTP packet(4): AU(10), AU(13), AU(16)
312     Etc.

316  2.6 Time stamp information

318  The RTP time stamp MUST carry the sampling instant of the first AU
319  (fragment) in the RTP packet. When multiple AUs are carried within
320  an RTP packet, the time stamps of subsequent AUs can be calculated
321  if the frame period of each AU is known. For audio and video this
322  is possible if the frame rate is constant. However, in some cases
323  it is not possible to make such a calculation, for example for
324  variable frame rate video and for MPEG-4 BIFS streams carrying
325  composition information. To support such cases, this payload format
326  can be configured to carry a time stamp in the RTP payload for each
327  contained Access Unit. A time stamp MAY be conveyed in the RTP
328  payload only for non-first AUs in the RTP packet, and SHALL NOT be
329  conveyed for the first AU (fragment), as the time stamp for the
330  first AU in the RTP packet is carried by the RTP time stamp.

332  MPEG-4 defines two types of time stamps, the composition time stamp
333  (CTS) and the decoding time stamp (DTS). The CTS represents the
334  sampling instant of an AU, and hence the CTS is equivalent to the
335  RTP time stamp. The DTS may be used in MPEG-4 video streams that
336  use bi-directional coding, i.e. when pictures are predicted in both
337  forward and backward direction by using either a reference picture
338  in the past, or a reference picture in the future. The DTS cannot
339  be carried in the RTP header.
     In some cases the DTS can be derived
340  from the RTP time stamp using frame rate information; this requires
341  deep parsing in the video stream, which may be considered
342  objectionable. Moreover, if the video frame rate is variable, the
343  required information may not even be present in the video stream.
344  For both reasons, the capability has been defined to optionally
345  carry the DTS in the RTP payload for each contained Access Unit.

347  To keep the coding of time stamps efficient, each time stamp
348  contained in the RTP payload is coded differentially, the CTS from
349  the RTP time stamp, and the DTS from the CTS.

351  2.7 State indication of MPEG-4 system streams

353  ISO/IEC 14496-1 defines states for MPEG-4 system streams. So as to
354  convey state information when transporting MPEG-4 system streams,
355  this payload format allows for the optional carriage in the RTP
356  payload of the stream state for each contained Access Unit. Stream
357  states are used to signal "crucial" AUs that carry information whose
358  loss cannot be tolerated and are also useful when repeating AUs
359  according to the carousel mechanism defined in ISO/IEC 14496-1.

361  2.8 Random access indication

363  Random access to the content of MPEG-4 elementary streams may be
364  possible at some but not all Access Units. To signal Access Units
365  where random access is possible, a random access point flag can
369  optionally be carried in the RTP payload for each contained Access
370  Unit. Carriage of random access points is particularly useful for
371  MPEG-4 system streams in combination with the stream state.

373  2.9 Carriage of auxiliary information

375  This payload format defines a specific field to carry auxiliary
376  data. The auxiliary data field is preceded by a field that specifies
377  the length of the auxiliary data, so as to facilitate skipping of
378  the data without parsing it.
     The coding of the auxiliary data is not
379  defined in this document; instead the format, meaning and signaling
380  of auxiliary information are expected to be specified in one or more
381  future RFCs. Auxiliary information MUST NOT be transmitted until its
382  format, meaning and signaling have been specified and its use has
383  been signaled. Receivers that have knowledge of the auxiliary data
384  MAY decode the auxiliary data, but receivers without knowledge of
385  such data MUST skip the auxiliary data field.

387  2.10 MIME format parameters and configuring conditional fields

389  To support the features described in the previous sections several
390  fields are defined for carriage in the RTP payload. However, their
391  use strongly depends on the type of MPEG-4 elementary stream that
392  is carried. Sometimes a specific field is needed with a certain
393  length, while in other cases such a field is not needed at all. To
394  be efficient in either case, the fields to support these features
395  are configurable by means of MIME format parameters. In general, a
396  MIME format parameter defines the presence and length of the
397  associated field. A length of zero indicates absence of the field.
398  As a consequence, parsing of the payload requires knowledge of MIME
399  format parameters. The MIME format parameters are conveyed to the
400  receiver via SDP [6] messages, as specified in section 4.4.1, or
401  through other means.

403  2.11 Global structure of payload format

405  The RTP payload, following the RTP header, contains three
406  octet-aligned data sections, of which the first two MAY be empty.
407  See figure 1.

409  +---------+-----------+-----------+---------------+
410  | RTP     | AU Header | Auxiliary | Access Unit   |
411  | Header  | Section   | Section   | Data Section  |
412  +---------+-----------+-----------+---------------+

414            <----------RTP Packet Payload----------->

416  Figure 1: Data sections within an RTP packet

418  The first data section is the AU (Access Unit) Header Section, which
419  contains one or more AU-headers; however, each AU-header MAY be
420  empty, in which case the entire AU Header Section is empty. The
424  second section is the Auxiliary Section, containing auxiliary data;
425  this section MAY also be configured empty. The third section is the
426  Access Unit Data Section, containing either a single fragment of
427  one Access Unit or one or more complete Access Units. The Access
428  Unit Data Section MUST NOT be empty.

430  2.12 Modes to transport MPEG-4 streams

432  While it is possible to build fully configurable receivers capable
433  of receiving any MPEG-4 stream, this specification also allows for
434  the design of simplified, but dedicated receivers, that are capable
435  for example of receiving only one type of MPEG-4 stream. This
436  is achieved by requiring that specific modes be defined for using
437  this specification. Each mode may define constraints for transport
438  of one or more types of MPEG-4 streams, for instance on the payload
439  configuration.

441  The applied mode MUST be signaled. Signaling the mode is
442  particularly important for receivers that are only capable of
443  decoding one or more specific modes. Such receivers need to
444  determine whether the applied mode is supported, so as to avoid
445  problems with processing of payloads that are beyond the
446  capabilities of the receiver.

448  In this document several modes are defined for transport of MPEG-4
449  CELP and AAC streams, as well as a generic mode that can be used
450  for any MPEG-4 stream.
     In the future, new RFCs may specify other
451  modes of using this specification. However, each mode MUST be in
452  full compliance with this specification (see section 3.3.7).

454  2.13 Alignment with RFC 3016

456  This payload can be configured to be nearly identical to the
457  payload format defined in RFC 3016 [5] for the MPEG-4 video
458  configurations recommended in RFC 3016. Hence, receivers that
459  comply with RFC 3016 can decode such RTP payload, provided that
460  additional packets containing video decoder configuration (VO,
461  VOL, VOSH) are inserted in the stream, as required by RFC 3016.
462  Conversely, receivers that comply with the specification in this
463  document should be able to decode payloads, names and parameters
464  defined for MPEG-4 video in RFC 3016. In this respect it is
465  strongly RECOMMENDED to implement the ability to ignore "in band"
466  video decoder configuration packets in the RFC 3016 payload.

468  Note that the "out of band" availability of the video decoder
469  configuration is optional in RFC 3016. To achieve maximum
470  interoperability with the RTP payload format defined in this
471  document, applications that use RFC 3016 to transport MPEG-4 video
472  (part 2) are recommended to make the video decoder configuration
473  available as a MIME parameter.

477  3. Payload Format

479  3.1 Usage of RTP Header Fields and RTCP

481  Payload Type (PT): The assignment of an RTP payload type for this
482  packet format is outside the scope of this document; it is
483  specified by the RTP profile under which this payload format is
484  used.

486  Marker (M) bit: The M bit is set to 1 to indicate that the RTP
487  packet payload contains either the final fragment of a fragmented
488  Access Unit or one or more complete Access Units.

490  Extension (X) bit: Defined by the RTP profile used.

492  Sequence Number: The RTP sequence number SHOULD be generated by the
493  sender in the usual manner with a constant random offset.

495  Timestamp: Indicates the sampling instant of the first AU
496  contained in the RTP payload. This sampling instant is equivalent
497  to the CTS in the MPEG-4 time domain. When using SDP the clock rate
498  of the RTP time stamp MUST be expressed using the "rtpmap"
499  attribute. If an MPEG-4 audio stream is transported, the rate SHOULD
500  be set to the same value as the sampling rate of the audio stream.
501  If an MPEG-4 video stream is transported, it is RECOMMENDED to set
502  the rate to 90 kHz.

504  In all cases, the sender SHALL make sure that RTP time stamps
505  are identical only if the RTP time stamp refers to fragments of the
506  same Access Unit.

508  According to RFC 1889 [2] (section 5.1), RTP time stamps are
509  RECOMMENDED to start at a random value for security reasons. This
510  is not an issue for synchronization of multiple RTP streams. When,
511  however, streams from multiple sources are to be synchronized (for
512  example one stream from local storage, another from an RTP streaming
513  server), synchronization may become impossible if the receiver only
514  knows the original time stamp relationships. Synchronization in such
515  cases may require that the correct relationship between time stamps
516  be provided by out of band means. The
517  format of such information as well as methods to convey such
518  information are beyond the scope of this specification.

520  SSRC: set as described in RFC 1889 [2].

522  CC and CSRC fields are used as described in RFC 1889 [2].

524  RTCP SHOULD be used as defined in RFC 1889 [2].
Note that time stamps in RTCP Sender Reports may be used to
synchronize multiple
MPEG-4 elementary streams and also to synchronize MPEG-4 streams
with non-MPEG-4 streams, in case the delivery of these streams uses
RTP.

3.2 RTP Payload Structure

3.2.1 The AU Header Section

When present, the AU Header Section consists of the
AU-headers-length field, followed by a number of AU-headers. See
figure 2.

+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+- .. -+-+-+-+-+-+-+-+-+-+
|AU-headers-length|AU-header|AU-header|      |AU-header|padding|
|                 |   (1)   |   (2)   |      |   (n)   | bits  |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+- .. -+-+-+-+-+-+-+-+-+-+

Figure 2: The AU Header Section

The AU-headers are configured using MIME format parameters and MAY
be empty. If the AU-header is configured empty, the
AU-headers-length field SHALL NOT be present and consequently the
AU Header Section is empty. If the AU-header is not configured
empty, then the AU-headers-length is a two octet field that
specifies the length in bits of the immediately following
AU-headers, excluding the padding bits.

Each AU-header is associated with a single Access Unit (fragment)
contained in the Access Unit Data Section in the same RTP packet.
For each contained Access Unit (fragment) there is exactly one
AU-header. Within the AU Header Section, the AU-headers are
bit-wise concatenated in the order in which the Access Units are
contained in the Access Unit Data Section. Hence, the n-th
AU-header refers to the n-th AU (fragment). If the concatenated
AU-headers consume a non-integer number of octets, up to 7
zero-padding bits MUST be inserted at the end in order to achieve
octet-alignment of the AU Header Section.

3.2.1.1 The AU-header

Each AU-header may contain the fields given in figure 3.
The length in bits of these fields, with the exception of the
CTS-flag, the DTS-flag and the RAP-flag fields, is defined by MIME
format parameters; see section 4.1. If a MIME format parameter has
the default value of zero, then the associated field is not present.

If present, the fields MUST occur in the mutual order given in
figure 3. In the general case a receiver can only discover the size
of an AU-header by parsing it since the presence of the CTS-delta
and DTS-delta fields is signaled by the value of the CTS-flag and
DTS-flag, respectively.

+---------------------------------------+
|     AU-size                           |
+---------------------------------------+
|     AU-Index / AU-Index-delta         |
+---------------------------------------+
|     CTS-flag                          |
+---------------------------------------+
|     CTS-delta                         |
+---------------------------------------+
|     DTS-flag                          |
+---------------------------------------+
|     DTS-delta                         |
+---------------------------------------+
|     RAP-flag                          |
+---------------------------------------+
|     Stream-state                      |
+---------------------------------------+

Figure 3: The fields in the AU-header. If used, the AU-Index field
          only occurs in the first AU-header within an AU Header
          Section; in any other AU-header the AU-Index-delta field
          occurs instead.

AU-size: Indicates the size in octets of the associated Access Unit
      in the Access Unit Data Section in the same RTP packet. When
      the AU-size is associated with an AU fragment, the AU size
      indicates the size of the entire AU and not the size of the
      fragment. In this case, the size of the fragment is known
      from the size of the AU data section. This can be exploited
      to determine whether a packet contains an entire AU or a
      fragment, which is particularly useful after losing a packet
      carrying the last fragment of an AU.

AU-Index: Indicates the serial number of the associated Access Unit
      (fragment). For each (in decoding order) consecutive AU or AU
      fragment, the serial number is incremented by 1. When
      present, the AU-Index field occurs in the first AU-header in
      the AU Header Section, but MUST NOT occur in any subsequent
      (non-first) AU-header in that Section. To encode the serial
      number in any such non-first AU-header, the AU-Index-delta
      field is used. If each AU-Index field is coded with the value
      0, the serial number of the AU (fragment) is not specified,
      and in that case receivers may ignore the AU-Index field.

AU-Index-delta: The AU-Index-delta field is an unsigned integer
      that specifies the serial number of the associated AU as the
      difference with respect to the serial number of the previous
      Access Unit. Hence, for the n-th (n>1) AU the serial number
      is found from:
         AU-Index(n) = AU-Index(n-1) + AU-Index-delta(n) + 1
      If the AU-Index field is present in the first AU-header in
      the AU Header Section, then the AU-Index-delta field MUST be
      present in any subsequent (non-first) AU-header. When the
      AU-Index-delta is coded with the value 0, it indicates that
      the Access Units are consecutive in decoding order. An
      AU-Index-delta value larger than 0 signals that interleaving
      is applied.

CTS-flag: Indicates whether the CTS-delta field is present.
      A value of 1 indicates that the field is present, a value
      of 0 that it is not present.
      The CTS-flag field MUST be present in each AU-header if the
      length of the CTS-delta field is signaled to be larger than
      zero. In that case, the CTS-flag field MUST have the value 0
      in the first AU-header and MAY have the value 1 in all
      non-first AU-headers. The CTS-flag field SHOULD be 0 for
      any non-first fragment of an Access Unit.

CTS-delta: Encodes the CTS by specifying the value of CTS as a 2's
      complement offset (delta) from the time stamp in the RTP
      header of this RTP packet. The CTS MUST use the same clock
      rate as the time stamp in the RTP header.

DTS-flag: Indicates whether the DTS-delta field is present. A value
      of 1 indicates that DTS-delta is present, a value of 0 that
      it is not present.
      The DTS-flag field MUST be present in each AU-header if the
      length of the DTS-delta field is signaled to be larger than
      zero. The DTS-flag field MUST have the same value for all
      fragments of an Access Unit.

DTS-delta: Specifies the value of the DTS as a 2's complement
      offset (delta) from the CTS. The DTS MUST use the
      same clock rate as the time stamp in the RTP header. The
      DTS-delta field MUST have the same value for all fragments of
      an Access Unit.

RAP-flag: Indicates when set to 1 that the associated Access Unit
      provides a random access point to the content of the stream.
      If an Access Unit is fragmented, the RAP flag, if present,
      MUST be set to 0 for each non-first fragment of the AU.

Stream-state: Specifies the state of the stream for an AU of an
      MPEG-4 system stream; each state is identified by a value of
      a modulo counter. In ISO/IEC 14496-1, MPEG-4 system streams
      use the AU_SequenceNumber to signal stream states. When the
      stream state changes, the value of stream-state MUST be
      incremented by one.

      Note: no relation is required between stream-states of
      different streams.

3.2.2 The Auxiliary Section

The Auxiliary Section consists of the auxiliary-data-size field
followed by the auxiliary-data field.
Receivers MAY (but are not
required to) parse the auxiliary-data field; to facilitate skipping
of the auxiliary-data field by receivers, the auxiliary-data-size
field indicates the length in bits of the auxiliary-data. If the
concatenation of the auxiliary-data-size and the auxiliary-data
fields consumes a non-integer number of octets, up to 7 zero padding
bits MUST be inserted immediately after the auxiliary data in order
to achieve octet-alignment. See figure 4.

+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+- .. -+-+-+-+-+-+-+-+-+
| auxiliary-data-size   | auxiliary-data       |padding bits |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+- .. -+-+-+-+-+-+-+-+-+

Figure 4: The fields in the Auxiliary Section

The length in bits of the auxiliary-data-size field is configurable
by a MIME format parameter; see section 4.1. The default length of
zero indicates that the entire Auxiliary Section is absent.

auxiliary-data-size: specifies the length in bits of the immediately
      following auxiliary-data field;

auxiliary-data: the auxiliary-data field contains data of a format
      not defined by this specification.

3.2.3 The Access Unit Data Section

The Access Unit Data Section contains an integer number of complete
Access Units or a single fragment of one AU. The Access Unit Data
Section is never empty. If data of more than one Access Unit is
present, then the AUs are concatenated into a contiguous string
of octets. See figure 5. The AUs inside the Access Unit Data
Section MUST be in decoding order, though not necessarily contiguous
in the case of interleaving.

The size and number of Access Units SHOULD be adjusted such that
the resulting RTP packet is not larger than the path MTU. To handle
larger packets, this payload format relies on lower layers for
fragmentation, which may result in reduced performance.

+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|AU(1)                                                          |
+                                                               |
|                                                               |
|               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|               |AU(2)                                          |
+-+-+-+-+-+-+-+-+                                               |
|                                                               |
|                               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                               | AU(n)                         |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|AU(n) continued|
+-+-+-+-+-+-+-+-+

Figure 5: Access Unit Data Section; each AU is octet-aligned.

When multiple Access Units are carried, the size of each AU MUST be
made available to the receiver. If the AU size is variable then the
size of each AU MUST be indicated in the AU-size field of the
corresponding AU-header. However, if the AU size is constant for a
stream, this mechanism SHOULD NOT be used, but instead the fixed
size SHOULD be signaled by the MIME format parameter
"ConstantSize", see section 4.1.

The absence of both AU-size in the AU-header and the ConstantSize
MIME format parameter indicates carriage of a single AU (fragment),
i.e. that a single Access Unit (fragment) is transported in each
RTP packet for that stream.

3.2.3.1 Fragmentation

A packet SHALL carry either one or more complete Access Units, or
a single fragment of an Access Unit. Fragments of the same Access
Unit have the same time stamp but different RTP sequence numbers.
The marker bit in the RTP header is 1 on the last fragment of an
Access Unit, and 0 on all other fragments.

3.2.3.2 Interleaving

Access Units MAY be interleaved. Senders MAY perform interleaving.
Receivers MUST support interleaving, except if the receiver only
supports modes in which no interleaving is allowed. When
interleaving of Access Units is used it SHALL be implemented using
the AU-Index and AU-Index-delta fields in the AU-header.

Based on the RTP sequence number, the RTP time stamp, the AU-Index
and the AU-Index-delta, a receiver can unambiguously reconstruct
the original order even in case of out-of-order packets, packet
loss or duplication. Note that for this purpose the AU-Index is
redundant when the RTP time stamp and the AU-Index-delta values are
sufficient for placing the AUs correctly in time. In such cases
receivers MAY ignore the AU-Index value and senders MAY code the
AU-Index field with the value 0, but only if they code each AU-Index
field with that value. If the AU-Index is not redundant, senders
SHOULD use a length of the AU-Index field so that this field is not
coded with the value 0 in two subsequent RTP packets.

When interleaving is applied, a de-interleave buffer is needed in
receivers to put the Access Units in their correct logical
consecutive decoding order. This requires the computation of the
time stamp for each Access Unit. In case of a fixed time duration
per Access Unit, the time stamp of the i-th access unit in an RTP
packet with RTP time stamp T is calculated as follows:

   Timestamp[0] = T
   Timestamp[i, i > 0] = T + (Sum(for k=1 to i of (AU-Index-delta[k]
                         + 1))) * access-unit-duration

When AU-Index-delta is always 0, this reduces to
T + i * (access-unit-duration). This is the non-interleaved case,
where the frames
are consecutive in decoding order. Note that the AU-Index field
(present for the first Access Unit) is not needed in this
calculation. Hence in cases where the access-unit-duration has a
fixed and known value, the AU-Index does not need to provide index
information and can be coded with the value 0. See also the
semantics of the AU-Index field in 3.2.1.1.
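
As an illustration of the calculation above, the following Python
sketch derives the time stamp of each AU in one RTP packet. The
function name au_timestamps and its arguments are hypothetical and
not defined by this specification; a fixed access-unit-duration
expressed in RTP clock ticks is assumed, and wrap-around of the
32-bit RTP time stamp is ignored.

```python
def au_timestamps(rtp_timestamp, index_deltas, au_duration):
    """Return the time stamp of every AU carried in one RTP packet.

    rtp_timestamp -- T, the time stamp in the RTP header
    index_deltas  -- AU-Index-delta values of the 2nd, 3rd, ... AU
                     (the first AU carries AU-Index, which is not needed)
    au_duration   -- fixed duration of one AU in RTP clock ticks (assumed)
    """
    timestamps = [rtp_timestamp]          # Timestamp[0] = T
    offset = 0
    for delta in index_deltas:
        offset += delta + 1               # Sum of (AU-Index-delta[k] + 1)
        timestamps.append(rtp_timestamp + offset * au_duration)
    return timestamps
```

For a packet with RTP time stamp 0 that carries AU(0), AU(3) and
AU(6) with 1024 ticks per AU, the AU-Index-delta values are 2 and 2,
and the sketch yields the time stamps 0, 3072 and 6144.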

If the Access Units do not have a fixed duration, the AU-Index is
not redundant, and MUST provide the index information required for
re-ordering. The number of bits of the AU-Index field MUST be chosen
so that valid index information is provided for the applied
interleaving scheme, without causing problems due to roll-over of
the AU-Index field. Note that the CTS-delta may be required to
compute the correct time stamp for each AU.

3.2.3.3 Constraints for interleaving

The size of the packets should be suitably chosen to be appropriate
to both the path MTU and the capacity of the receiver's
de-interleave buffer. The maximum packet size for a session SHOULD
be chosen not to exceed the path MTU.

To allow receivers to allocate sufficient resources for
de-interleaving, senders MUST provide the information to receivers
as specified in this section.

AUs enter the decoder in decoding order. The de-interleave buffer
is used to re-order a stream of interleaved AUs back into decoding
order. When interleaving is applied, the decoding of "early" AUs
has to be postponed until all AUs that precede in decoding order
have been received. Therefore these "early" AUs are stored in the
de-interleave buffer. As an example in figure 6 the interleaving
pattern from section 2.5 is considered.

                          +--+--+--+--+--+--+--+--+--+--+--+-
Interleaved AUs           | 0| 3| 6| 1| 4| 7| 2| 5| 8| 9|12|..
                          +--+--+--+--+--+--+--+--+--+--+--+-
Storage of "early" AUs         3  3  3  3  3  3
                                  6  6  6  6  6  6
                                        4  4  4
                                           7  7  7
                                                         12 12

Figure 6: Storage of "early" AUs in the de-interleave buffer per
          interleaved AU.

AU(3) is to be delivered to the decoder after AU(0), AU(1) and
AU(2); of these AUs, AU(2) is the latest and hence AU(3) needs to be
stored until AU(2) is received.
Similarly, AU(6) is to be stored until
AU(5) is received, while AU(4) and AU(7) are to be stored until
AU(2) and AU(5) are received, respectively. Note that the fullness
of the de-interleave buffer varies in time. In figure 6, the
de-interleave buffer contains at most 4 AUs, but often fewer.

So as to give a rough indication of the resources needed in the
receiver for de-interleaving, the maximum displacement in time of
an AU is defined. The maximum displacement in time of an AU is the
maximum difference between the time stamp of any received AU and
the time stamp of the earliest AU that is not yet received. In other
words, when considering a sequence of interleaved AUs, then:

   Maximum displacement = max{TS(i) - TS(j)}, for any i and any j>i,

   where i and j indicate the index of the AU in the
         interleaving pattern and
         TS denotes the time stamp of the AU

As an example in figure 7 the interleaving pattern from section 2.5
is considered. For each AU in the pattern the earliest not yet
received AU is indicated. A "-" indicates that all previous AUs
are received. If the AU period is constant, the maximum displacement
equals 5 AU periods, as found for AU(6) and AU(7).

                             +--+--+--+--+--+--+--+--+--+--+--+-
Interleaved AUs              | 0| 3| 6| 1| 4| 7| 2| 5| 8| 9|12|..
                             +--+--+--+--+--+--+--+--+--+--+--+-

Earliest not yet received AU   -  1  1  -  2  2  -  -  -  - 10

Figure 7: The earliest not yet received AU for each AU in the
          interleaving pattern.

When interleaving, senders MUST signal the maximum displacement
in time during the session via the MIME format parameter
"maxDisplacement"; see section 4.1.
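
The analysis of figure 7 can be reproduced with a short Python
sketch. The function names and the representation of the
interleaving pattern as a list of AU serial numbers in transmission
order are assumptions made for illustration only; a constant AU
period is assumed and AU numbering is taken to start at 0.

```python
def max_displacement(order, au_period=1):
    """max{TS(i) - TS(j)} over all transmission positions i < j.

    order     -- AU serial numbers in transmission order
    au_period -- constant AU duration in RTP clock ticks (assumed)
    """
    worst = 0
    for i, early in enumerate(order):
        for late in order[i + 1:]:
            worst = max(worst, (early - late) * au_period)
    return worst


def earliest_missing(order):
    """For each received AU, the earliest AU not yet received.

    None plays the role of "-" in figure 7 (all previous AUs received).
    """
    received = set()
    result = []
    for au in order:
        received.add(au)
        missing = [n for n in range(au) if n not in received]
        result.append(missing[0] if missing else None)
    return result
```

With the pattern 0, 3, 6, 1, 4, 7, 2, 5, 8 the first function
returns 5 AU periods. With an assumed period of 1024 ticks per AU
this gives a displacement of 5120 ticks.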

An estimate of the size of the de-interleave buffer is found by
multiplying the maximum displacement by the maximum bit rate:

   size(de-interleave buffer) = {(maxDisplacement) * Rate(max)} /
                                (RTP clock frequency),

where Rate(max) is the maximum bit-rate of the transported stream.

Note that receivers can derive Rate(max) from the MIME format
parameters StreamType, Profile-level-id, and config.

However, this calculation only estimates the size of the
de-interleave buffer, and the required size may be larger than
calculated. If this
calculation under-estimates the size of the de-interleave buffer,
then senders, when interleaving, MUST signal a size of the
de-interleave buffer that is large enough to contain all "early"
AUs at any point in time during the session via the MIME format
parameter "de-interleaveBufferSize"; see section 4.1.

If the "de-interleaveBufferSize" parameter is present, then the
applied buffer for de-interleaving in a receiver MUST have a size
that is at least equal to the signaled size of the de-interleave
buffer, else a size that is at least equal to the calculated size
of the de-interleave buffer.

No matter what interleaving scheme is used, the scheme must be
analyzed to calculate the applicable maxDisplacement value, as well
as the required size of the de-interleave buffer. Senders SHOULD
signal values that are not larger than the strictly required
values; if larger values are signaled, the receiver will buffer
excessively.

Note that for low bit-rate material, the applied interleaving
may make packets shorter than the MTU size.

3.2.3.5 Crucial and non-crucial AUs with MPEG-4 System data

Some Access Units with MPEG-4 system data, called "crucial" AUs,
carry information whose loss cannot be tolerated, either in the
presentation or in the decoder. At each crucial AU in an MPEG-4
system stream, the stream state changes.
The stream-state MAY
remain constant at non-crucial AUs. In ISO/IEC 14496-1, MPEG-4
system streams use the AU_SequenceNumber to signal stream states.

Example: Given three AUs, AU1 = "Insertion of node X", AU2 = "Set
position of node X", AU3 = "Set position of node X". AU1 is crucial,
since if it is lost, AU2 cannot be executed. However, AU2 is not
crucial, since AU3 can be executed even if AU2 is lost.

When a crucial AU is (possibly) lost, the stream is corrupted. For
example, when an AU is lost and the stream state has changed at the
next received AU, then it is possible that the lost AU was crucial.
Once corrupted, the stream remains corrupted until the next random
access point. Note that loss of non-crucial AUs does not corrupt the
stream. When a decoder starts receiving a stream, the decoder MUST
consider the stream corrupted until an AU is received that provides
a random access point.

An AU that provides a random access point, as signaled by the
RAP-flag, may be crucial or not. Non-crucial RAP AUs provide a
"repeated" random access point for use by decoders that recently
joined the stream or that need to re-start decoding after a stream
corruption. Non-crucial RAP AUs MUST include all updates since the
last crucial RAP AU.

Upon receiving AUs, decoders are to react as follows:
a) if the RAP-flag is set to 1 and the stream-state changes, then
   the AU is a crucial RAP AU, and the AU MUST be decoded.
b) if the RAP-flag is set to 1 and the stream state does not change,
   then the AU is a non-crucial RAP AU, and the receiver SHOULD
   decode it if the stream is corrupted. Otherwise, the decoder MUST
   ignore the AU.
c) if the RAP-flag is set to 0, then the AU MUST be decoded, unless
   the stream is corrupted, in which case the AU MUST be ignored.
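
The rules a) to c) above can be sketched as a small state machine in
Python. The class StreamStateTracker, its method on_au and the
loss_before argument are hypothetical names introduced for
illustration; how a receiver detects a lost AU (for example from gaps
in RTP sequence numbers) is outside this sketch, and the SHOULD in
rule b) is treated here as "decode".

```python
class StreamStateTracker:
    """Decides per AU whether a decoder is to decode or ignore it."""

    def __init__(self):
        # A decoder considers the stream corrupted until a RAP arrives.
        self.corrupted = True
        self.last_state = None

    def on_au(self, rap_flag, stream_state, loss_before=False):
        """Return True if the AU is to be decoded, False if ignored.

        loss_before -- True if one or more AUs were lost since the
                       previously received AU (assumed to be known).
        """
        changed = stream_state != self.last_state
        self.last_state = stream_state
        if loss_before and changed:
            # The lost AU may have been crucial: stream is corrupted.
            self.corrupted = True
        if rap_flag:
            if changed or self.corrupted:
                # a) crucial RAP AU, or b) non-crucial RAP AU that
                # repairs a corrupted stream.
                self.corrupted = False
                return True
            return False  # b) non-crucial RAP AU, stream is intact
        # c) decode unless the stream is corrupted
        return not self.corrupted
```

A receiver would create one tracker per stream and call on_au for
every received AU, using the RAP-flag and Stream-state fields from
the AU-header.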

3.3 Usage of this specification

3.3.1 General

Usage of this specification requires definition of a mode. A mode
defines how to use this specification, as deemed appropriate.
Senders MUST signal the applied mode via the MIME format parameter
"Mode", as specified in section 4.1. This specification defines a
generic mode that can be used for any MPEG-4 stream, as well as
specific modes for transport of MPEG-4 CELP and MPEG-4 AAC streams,
defined in ISO/IEC 14496-3.

When use of this payload format is signaled using SDP [6], an
"rtpmap" attribute is part of that signaling. The same requirements
apply for the rtpmap attribute in any mode compliant with this
specification. The general form of an rtpmap attribute is:

   a=rtpmap:<payload type> <encoding name>/<clock rate>[/<encoding
   parameters>]

For audio streams, <encoding parameters> specifies the number of
audio channels: 2 for stereo material (see RFC 2327) and 1 for
mono. Provided no additional parameters are needed, this parameter
may be omitted for mono material, hence its default value is 1.

3.3.2 The generic mode

The generic mode can be used for any MPEG-4 stream. In this mode
no mode-specific constraints are applied; hence, in the generic
mode the full flexibility of this specification can be exploited.
The generic mode is signaled by mode=generic.

An example is given below for transport of a BIFS stream. In this
example carriage of multiple BIFS Access Units is allowed in one
RTP packet. The AU-header contains the AU-size field, the CTS-flag
and, if the CTS flag is set to 1, the CTS-delta field. The number
of bits of the AU-size and the CTS-delta fields is 10 and 16,
respectively. The AU-header also contains the RAP-flag and the
Stream-state of 4 bits. This results in an AU-header with a
total size of two or four octets per BIFS AU. The RTP time stamp
uses a 1 kHz clock.
Note that the media type name is video,
because the BIFS stream is part of an audio-visual presentation. For
conventions on media type names see section 4.1.

In detail:

   m=video 49230 RTP/AVP 96
   a=rtpmap:96 mpeg4-generic/1000
   a=fmtp:96 streamtype=3; profile-level-id=257; mode=generic;
   ObjectType=2; config=BIFSConfiguration(); SizeLength=10;
   CTSDeltaLength=16; RandomAccessIndication=1;
   StreamStateIndication=4

Note: The a=fmtp line has been wrapped to fit the page; it comprises
a single line in the SDP file.
BIFSConfiguration() is the hexadecimal string as defined in ISO/IEC
14496-1; for the description of MIME parameters see section 4.1.

3.3.3 Constant bit-rate CELP

This mode is signaled by mode=CELP-cbr. In this mode one or more
complete CELP frames of fixed size can be transported in one RTP
packet; there is no support for interleaving. The RTP payload
consists of one or more concatenated CELP frames, each of the same
size. CELP frames MUST NOT be fragmented when using this mode. Both
the AU Header Section and the Auxiliary Section MUST be empty.

The MIME format parameter ConstantSize MUST be provided to specify
the length of each CELP frame.

For example:

   m=audio 49230 RTP/AVP 96
   a=rtpmap:96 mpeg4-generic/44100/2
   a=fmtp:96 streamtype=5; profile-level-id=15; mode=CELP-cbr; config=
   AudioSpecificConfig(); ConstantSize=xxx;

Note: The a=fmtp line has been wrapped to fit the page; it comprises
a single line in the SDP file.

AudioSpecificConfig() is the hexadecimal string as defined in
ISO/IEC 14496-3. AudioSpecificConfig() specifies that the audio
stream type is CELP. For the description of MIME parameters see
section 4.1.

3.3.4 Variable bit-rate CELP

This mode is signaled by mode=CELP-vbr.
With this mode one or more
complete CELP frames of variable size can be transported in one RTP
packet with optional interleaving. As CELP frames are very small,
while the largest possible AU-size in this mode is greater than the
maximum CELP frame size, there is no support for fragmentation of
CELP frames. Hence CELP frames MUST NOT be fragmented when using
this mode.

In this mode the RTP payload consists of the AU Header Section,
followed by one or more concatenated CELP frames. The Auxiliary
Section MUST be empty. For each CELP frame contained in the payload
there MUST be a one octet AU-header in the AU Header Section to
provide:
(a) the size of each CELP frame in the payload and
(b) index information for computing the sequence (and hence timing)
    of each CELP frame.
Transport of CELP frames requires that the AU-size field is coded
with 6 bits. In this mode therefore 6 bits are allocated to the
AU-size field, and 2 bits to the AU-Index(-delta) field. Each
AU-Index field MUST be coded with the value 0. In the AU Header
Section, the concatenated AU-headers are preceded by the 16-bit
AU-headers-length field, as specified in section 3.2.1.

In addition to the required MIME format parameters, the following
parameters MUST be present: SizeLength, IndexLength, and
IndexDeltaLength.
When interleaving is applied (AU-Index-delta coded with a value
larger than 0), the parameter InterleaveDelay MUST also be present.

For example:

   m=audio 49230 RTP/AVP 96
   a=rtpmap:96 mpeg4-generic/44100/2
   a=fmtp:96 streamtype=5; profile-level-id=15; mode=CELP-vbr; config=
   AudioSpecificConfig(); SizeLength=6; IndexLength=2;
   IndexDeltaLength=2

Note: The a=fmtp line has been wrapped to fit the page; it comprises
a single line in the SDP file.

AudioSpecificConfig() is the hexadecimal string as defined in
ISO/IEC 14496-3. AudioSpecificConfig() specifies that the audio
stream type is CELP. For the description of MIME parameters see
section 4.1.

3.3.5 Low bit-rate AAC

This mode is signaled by mode=AAC-lbr. This mode supports transport
of one or more complete AAC frames of variable size. In this mode
the AAC frames are allowed to be interleaved and hence receivers
MUST support de-interleaving. The maximum size of an AAC frame in
this mode is 63 octets. AAC frames MUST NOT be fragmented when
using this mode.

The payload configuration in this mode is the same as in the
variable bit-rate CELP mode as defined in 3.3.4. The RTP payload
consists of the AU Header Section, followed by concatenated AAC
frames. The Auxiliary Section MUST be empty. For each AAC frame
contained in the payload the one octet AU-header MUST provide:
(a) the size of each AAC frame in the payload and
(b) index information for computing the sequence (and hence timing)
    of each AAC frame.
In the AU-header, the AU-size MUST be coded with 6 bits and the
AU-Index(-delta) with 2 bits; the AU-Index field MUST have the
value 0 in each AU-header.
In the AU Header Section, the concatenated AU-headers MUST be
preceded by the 16-bit AU-headers-length field, as specified in
section 3.2.1.

In addition to the required MIME format parameters, the following
parameters MUST be present: SizeLength, IndexLength, and
IndexDeltaLength.
When interleaving is applied (AU-Index-delta coded with a value
larger than 0), the parameter InterleaveDelay MUST also be present.

For example:

   m=audio 49230 RTP/AVP 96
   a=rtpmap:96 mpeg4-generic/44100/2
   a=fmtp:96 streamtype=5; profile-level-id=15; mode=AAC-lbr; config=
   AudioSpecificConfig(); SizeLength=6; IndexLength=2;
   IndexDeltaLength=2

Note: The a=fmtp line has been wrapped to fit the page; it comprises
a single line in the SDP file.

AudioSpecificConfig() is the hexadecimal string as defined in ISO/IEC
14496-3. AudioSpecificConfig() specifies that the audio
stream type is AAC. For the description of MIME parameters see
section 4.1.

3.3.6 High bit-rate AAC

This mode is signaled by mode=AAC-hbr. This mode supports transport
of variable size AAC frames. In one RTP packet either one or more
complete AAC frames are carried, or a single fragment of an AAC
frame. In this mode the AAC frames are allowed to be interleaved
and hence receivers MUST support de-interleaving. The maximum size
of an AAC frame in this mode is 8191 octets.

In this mode the RTP payload consists of the AU Header Section,
followed by either one AAC frame, several concatenated AAC frames
or one fragmented AAC frame. The Auxiliary Section MUST be empty.
For each AAC frame contained in the payload there MUST be an
AU-header in the AU Header Section to provide:
(a) the size of each AAC frame in the payload and
(b) index information for computing the sequence (and hence timing)
    of each AAC frame.

Coding the maximum size of an AAC frame requires 13 bits. Therefore
in this configuration 13 bits are allocated to the AU-size, and
3 bits to the AU-Index(-delta) field. Thus each AU-header has a size
of 2 octets. Each AU-Index field MUST be coded with the value 0. In
the AU Header Section, the concatenated AU-headers MUST be preceded
by the 16-bit AU-headers-length field, as specified in section 3.2.1.
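
The payload layout of this mode can be illustrated with the Python
sketch below. The function names are hypothetical, and the sketch
assumes that fields are packed most significant bit first, with the
13-bit AU-size preceding the 3-bit AU-Index(-delta) as in figure 3.
It is not a complete depacketizer: RTP header handling, fragment
reassembly and de-interleaving are left out.

```python
def parse_aac_hbr_payload(payload):
    """Split an AAC-hbr RTP payload into AU-headers and AU data.

    Returns (headers, data), where headers is a list of
    (au_size, au_index_or_delta) tuples, one per AU-header.
    """
    if len(payload) < 2:
        raise ValueError("payload too short for AU-headers-length")
    # AU-headers-length: 16-bit field, length of the AU-headers in bits.
    headers_bits = int.from_bytes(payload[:2], "big")
    if headers_bits % 16 != 0:
        raise ValueError("AU-headers are 16 bits each in this mode")
    count = headers_bits // 16
    data_start = 2 + 2 * count
    if len(payload) < data_start:
        raise ValueError("payload too short for the signaled AU-headers")
    headers = []
    for k in range(count):
        word = int.from_bytes(payload[2 + 2 * k:4 + 2 * k], "big")
        headers.append((word >> 3, word & 0x7))  # 13-bit size, 3-bit index
    return headers, payload[data_start:]


def split_access_units(headers, data):
    """Return (list_of_au_bytes, is_fragment) for one payload."""
    sizes = [size for size, _ in headers]
    if len(sizes) == 1 and sizes[0] > len(data):
        # AU-size is the size of the entire AU, so this is a fragment.
        return [data], True
    if sum(sizes) != len(data):
        raise ValueError("AU sizes do not match the AU data section")
    aus, pos = [], 0
    for size in sizes:
        aus.append(data[pos:pos + size])
        pos += size
    return aus, False
```

Since each AU-header occupies exactly two octets in this mode, the
AU Header Section never needs padding bits, which keeps the parsing
step simple.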

   In addition to the required MIME format parameters, the following
   parameters MUST be present: SizeLength, IndexLength, and
   IndexDeltaLength. When interleaving is applied (AU-Index-delta coded
   with a value larger than 0), the parameter maxDisplacement MUST also
   be present.

RFC xxxx Transport of MPEG-4 Elementary Streams December 2002

   For example:

   m=audio 49230 RTP/AVP 96
   a=rtpmap:96 mpeg4-generic/44100/2
   a=fmtp:96 streamtype=5; profile-level-id=15; mode=AAC-hbr;
   config=AudioSpecificConfig(); SizeLength=13; IndexLength=3;
   IndexDeltaLength=3

   Note: The a=fmtp line has been wrapped to fit the page; it comprises
   a single line in the SDP file.

   AudioSpecificConfig() is the hexadecimal string as defined in
   ISO/IEC 14496-3. AudioSpecificConfig() specifies that the audio
   stream type is AAC. For the description of MIME parameters see
   section 4.1.

3.3.7 Additional modes

   This specification only defines the modes specified in sections
   3.3.2 through 3.3.6. Additional modes are expected to be defined in
   future RFCs. Each additional mode MUST be in full compliance with
   this specification.

   Any new mode MUST be defined such that an implementation including
   all the features of this specification can decode the payload format
   corresponding to this new mode. For this reason a mode MUST NOT
   specify new default values for MIME parameters. In particular, MIME
   parameters that configure the RTP payload MUST be present (unless
   they have the default value), even if their presence is redundant
   because the mode assigns a fixed value to a parameter. A mode may
   additionally define that some MIME parameters are required instead
   of optional, that some MIME parameters have fixed values (or
   ranges), and that there are rules restricting the usage.

4. IANA considerations

   This section describes the MIME types and names associated with
   this payload format. Section 4.1 registers the MIME types, as per
   RFC 2048.

   This format may require additional information about the mapping to
   be made available to the receiver. This is done using parameters
   also described in the next section.

4.1 MIME type registration

   MIME media type name: "video" or "audio" or "application"

   "video" MUST be used for MPEG-4 Visual streams (ISO/IEC 14496-2)
   or MPEG-4 Systems streams (ISO/IEC 14496-1) that convey information
   needed for an audio/visual presentation.

   "audio" MUST be used for MPEG-4 Audio streams (ISO/IEC 14496-3)
   or MPEG-4 Systems streams that convey information needed for an
   audio-only presentation.

   "application" MUST be used for MPEG-4 Systems streams (ISO/IEC
   14496-1) that serve purposes other than audio/visual presentation,
   e.g. in some cases when MPEG-J (Java) streams are transmitted.

   Depending on the required payload configuration, MIME format
   parameters need to be available to the receiver. This is done using
   the parameters described in the next section. There are required
   and optional parameters.

   Optional parameters are of two types: general parameters and
   configuration parameters. The configuration parameters are used to
   configure the fields in the AU Header section and in the auxiliary
   section. The absence of any configuration parameter is equivalent to
   the associated field set to its default value, which is always zero.
   The absence of all configuration parameters resolves into a default
   "basic" configuration with an empty AU-header section and an empty
   auxiliary section in each RTP packet.
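
   The defaulting rule above can be illustrated with a short sketch.
   It assumes the parameters arrive as the semicolon-separated
   parameter=value string of section 4.3, compares parameter names
   case-insensitively, and uses the configuration parameter names
   registered in this section. The function names are hypothetical and
   the sketch performs no validation beyond integer conversion.

```python
# Names of the optional configuration parameters, lower-cased because
# MIME format parameter names are not case dependent.
CONFIG_PARAMS = (
    "sizelength", "indexlength", "indexdeltalength",
    "ctsdeltalength", "dtsdeltalength", "randomaccessindication",
    "streamstateindication", "auxiliarydatasizelength",
)


def parse_fmtp_params(text):
    """Split 'name=value; name=value' into a dict keyed by lower-case name."""
    params = {}
    for item in text.split(";"):
        item = item.strip()
        if not item:
            continue
        name, _, value = item.partition("=")
        params[name.strip().lower()] = value.strip()
    return params


def payload_configuration(params):
    """Apply the default of zero to every absent configuration parameter."""
    return {name: int(params.get(name, 0)) for name in CONFIG_PARAMS}


def is_basic_configuration(config):
    """True when the AU-header and auxiliary sections are both empty."""
    return all(value == 0 for value in config.values())


if __name__ == "__main__":
    hbr = parse_fmtp_params(
        "streamtype=5; profile-level-id=15; mode=AAC-hbr; "
        "SizeLength=13; IndexLength=3; IndexDeltaLength=3"
    )
    print(payload_configuration(hbr))
    print(is_basic_configuration(payload_configuration(hbr)))
```

   A receiver that applies the defaults in this way handles a stream
   with no configuration parameters and a stream that sets each of
   them to zero identically.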

   MIME subtype name: mpeg4-generic

   Required parameters:

   MIME format parameters are not case dependent; however for clarity
   both upper and lower case are used in the names of the parameters
   described in this specification.

   StreamType:
      The integer value that indicates the type of MPEG-4 stream that
      is carried; its coding corresponds to the values of the
      streamType as defined in Table 9 (streamType Values) in ISO/IEC
      14496-1.

   Profile-level-id:
      A decimal representation of the MPEG-4 Profile Level indication.
      This parameter MUST be used in the capability exchange or
      session set-up procedure to indicate the MPEG-4 Profile and Level
      combination of which the relevant MPEG-4 media codec is capable.
      For MPEG-4 Audio streams, this parameter is the decimal value
         from Table 5 (audioProfileLevelIndication Values) in ISO/IEC
         14496-1, indicating which MPEG-4 Audio tool subsets are
         required to decode the audio stream.
      For MPEG-4 Visual streams, this parameter is the decimal value
         from Table G-1 (FLC table for profile and level indication) of
         ISO/IEC 14496-2, indicating which MPEG-4 Visual tool subsets
         are required to decode the visual stream.
      For BIFS streams, this parameter is the decimal value that is
         obtained from (SPLI + 256*GPLI), where:
         SPLI is the decimal value from Table 4 in ISO/IEC 14496-1 with
            the applied sceneProfileLevelIndication;
         GPLI is the decimal value from Table 7 in ISO/IEC 14496-1 with
            the applied graphicsProfileLevelIndication.
      For MPEG-J streams, this parameter is the decimal value from
         table 13 (MPEGJProfileLevelIndication) in ISO/IEC 14496-1,
         indicating the profile and level of the MPEG-J stream.
      For OD streams, this parameter is the decimal value from table 3
         (ODProfileLevelIndication) in ISO/IEC 14496-1, indicating the
         profile and level of the OD stream.
      For IPMP streams, this parameter has either the decimal value 0,
         indicating an unspecified profile and level, or a value larger
         than zero, indicating an MPEG-4 IPMP profile and level as
         defined in a future MPEG-4 specification.
      For Clock Reference streams and Object Content Info streams, this
         parameter has the decimal value zero, indicating that profile
         and level information is conveyed through the OD framework.

   Config:
      A hexadecimal representation of an octet string that expresses
      the media payload configuration. Configuration data is mapped
      onto the hexadecimal octet string on an MSB-first basis. The
      first bit of the configuration data SHALL be located at the MSB
      of the first octet. In the last octet, if necessary to achieve
      octet-alignment, up to 7 zero-valued padding bits shall follow
      the configuration data.
      For MPEG-4 Audio streams, config is the audio object type
         specific decoder configuration data AudioSpecificConfig() as
         defined in ISO/IEC 14496-3. For Structured Audio, the
         AudioSpecificConfig() may be conveyed by other means, not
         defined by this specification. If the AudioSpecificConfig()
         is conveyed by other means for Structured Audio, then the
         config MUST be a quoted empty hexadecimal octet string, as
         follows: config="".
         Note that a future mode of using this RTP payload format for
         Structured Audio may define such other means.
      For MPEG-4 Visual streams, config is the MPEG-4 Visual
         configuration information as defined in subclause 6.2.1 Start
         codes of ISO/IEC 14496-2.
         The configuration information indicated by this parameter
         SHALL be the same as the configuration information in the
         corresponding MPEG-4 Visual stream, except for
         first-half-vbv-occupancy and latter-half-vbv-occupancy, if
         they exist, which may vary in the repeated configuration
         information inside an MPEG-4 Visual stream (see 6.2.1 Start
         codes of ISO/IEC 14496-2).
      For BIFS streams, this is the BIFSConfig() information as defined
         in ISO/IEC 14496-1. For version 1, BIFSConfig is defined in
         section 9.3.5.2, and for version 2 in section 9.3.5.3. The
         MIME format parameter ObjectType signals the version of
         BIFSConfig.
      For IPMP streams, this is either a quoted empty hexadecimal octet
         string, indicating the absence of any decoder configuration
         information (config=""), or the IPMPConfiguration() as
         defined in a future MPEG-4 IPMP specification.
      For Object Content Info (OCI) streams, this is the
         OCIDecoderConfiguration() information of the OCI stream, as
         defined in section 8.4.2.4 in ISO/IEC 14496-1.
      For OD streams, Clock Reference streams and MPEG-J streams, this
         is a quoted empty hexadecimal octet string (config=""), as
         no information on the decoder configuration is required.

   Mode:
      The mode in which this specification is used. The following modes
      can be signaled:
         mode=generic,
         mode=CELP-cbr,
         mode=CELP-vbr,
         mode=AAC-lbr and
         mode=AAC-hbr.
      Other modes are expected to be defined in future RFCs. See also
      sections 3.3.7 and 4.2 of RFC xxxx.

   Optional general parameters:

   ObjectType:
      The decimal value from Table 8 in ISO/IEC 14496-1, indicating
      the value of the objectTypeIndication of the transported stream.
      For BIFS streams this parameter MUST be present to signal the
      version of BIFSConfiguration().
      Note that ObjectTypeIndication may signal a non-MPEG-4 stream
      and that the RTP payload format defined in this document may not
      be suitable to carry a stream that is not defined by MPEG-4.
      ObjectType SHOULD NOT be set to a value that signals a stream
      that cannot be carried by this payload format.

   ConstantSize:
      The constant size in octets of each Access Unit for this stream.
      The ConstantSize and the SizeLength parameters MUST NOT be
      simultaneously present.

   maxDisplacement:
      The decimal representation of the maximum displacement in time
      of an interleaved AU, as defined in section 3.2.3.3, expressed
      in units of the RTP time stamp clock.
      This parameter MUST be present when interleaving is applied.

   de-interleaveBufferSize:
      The decimal representation in number of octets of the size of
      the de-interleave buffer, described in section 3.2.3.3.
      When interleaving, this parameter MUST be present if the
      calculation of the de-interleave buffer size given in 3.2.3.3
      and based on maxDisplacement and rate(max) under-estimates the
      size of the de-interleave buffer. If this calculation does not
      under-estimate the size of the de-interleave buffer, then the
      de-interleaveBufferSize parameter SHOULD NOT be present.

   Optional configuration parameters:

   SizeLength:
      The number of bits on which the AU-size field is encoded in the
      AU-header. The SizeLength and the ConstantSize parameters MUST
      NOT be simultaneously present.

   IndexLength:
      The number of bits on which the AU-Index is encoded in the first
      AU-header. The default value of zero indicates the absence of
      the AU-Index and AU-Index-delta fields in each AU-header.

   IndexDeltaLength:
      The number of bits on which the AU-Index-delta field is encoded
      in any non-first AU-header.

   CTSDeltaLength:
      The number of bits on which the CTS-delta field is encoded in
      the AU-header.

   DTSDeltaLength:
      The number of bits on which the DTS-delta field is encoded in
      the AU-header.

   RandomAccessIndication:
      A decimal value of zero or one, indicating whether the RAP-flag
      is present in the AU-header. The decimal value of one indicates
      presence of the RAP-flag, the default value zero its absence.

   StreamStateIndication:
      The number of bits on which the Stream-state field is encoded in
      the AU-header. This parameter MAY be present when transporting
      MPEG-4 system streams, and SHALL NOT be present for MPEG-4 audio
      and MPEG-4 video streams.

   AuxiliaryDataSizeLength:
      The number of bits used to encode the auxiliary-data-size field.

   Applications MAY use more parameters, in addition to those defined
   above. Each additional parameter MUST be registered with IANA, to
   ensure that there is no clash of names. Each additional parameter
   MUST be accompanied by a specification in the form of an RFC, MPEG
   standard, or other permanent and readily available reference (the
   "Specification Required" policy defined in RFC 2434). Receivers MUST
   tolerate the presence of such additional parameters, but these
   parameters SHALL NOT impact the decoding of receivers that comply
   with this specification.

   Encoding considerations:
   This MIME subtype is defined for RTP transport only. System
   bitstreams MUST be generated according to MPEG-4 Systems
   specifications (ISO/IEC 14496-1). Video bitstreams MUST be generated
   according to MPEG-4 Visual specifications (ISO/IEC 14496-2). Audio
   bitstreams MUST be generated according to MPEG-4 Audio
   specifications (ISO/IEC 14496-3). The RTP packets MUST be packetized
   according to the RTP payload format defined in RFC xxxx.

   Security considerations:
   As defined in section 5 of RFC xxxx.

   Interoperability considerations:
   MPEG-4 provides a large and rich set of tools for the coding of
   visual objects. For effective implementation of the standard,
   subsets of the MPEG-4 tool sets have been provided for use in
   specific applications. These subsets, called 'Profiles', limit the
   size of the tool set a decoder is required to implement. In order to
   restrict computational complexity, one or more 'Levels' are set for
   each Profile. A Profile@Level combination allows:
   . a codec builder to implement only the subset of the standard he
     needs, while maintaining interworking with other MPEG-4 devices
     that implement the same combination, and
   . checking whether MPEG-4 devices comply with the standard
     ('conformance testing').

   A stream SHALL be compliant with the MPEG-4 Profile@Level specified
   by the parameter "profile-level-id". Interoperability between a
   sender and a receiver is achieved by specifying the parameter
   "profile-level-id" in MIME content. In the capability exchange /
   announcement procedure this parameter may mutually be set to the
   same value.

   Published specification:
   The specifications for MPEG-4 streams are presented in ISO/IEC
   14496-1, 14496-2, and 14496-3. The RTP payload format is described
   in RFC xxxx.

   Applications which use this media type:
   Multimedia streaming and conferencing tools.

   Additional information: none

   Magic number(s): none

   File extension(s):
   None. A file format with the extension .mp4 has been defined for
   MPEG-4 content but is not directly correlated with this MIME type
   for which the sole purpose is RTP transport.

   Macintosh File Type Code(s): none

   Person & email address to contact for further information:
   Authors of RFC xxxx, IETF Audio/Video Transport working group.

   Intended usage: COMMON

   Author/Change controller:
   Authors of RFC xxxx, IETF Audio/Video Transport working group.

4.2 Registration of mode definitions with IANA

   This specification can be used in a number of modes. The mode of
   operation is signaled using the "Mode" MIME parameter, with the
   initial set of values specified in section 4.1. New modes may be
   defined at any time, as described in section 3.3.7. These modes
   MUST be registered with IANA, to ensure that there is no clash
   of names.

   A new mode registration MUST be accompanied by a specification in
   the form of an RFC, MPEG standard, or other permanent and readily
   available reference (the "Specification Required" policy defined
   in RFC 2434).

4.3 Concatenation of parameters

   Multiple parameters SHOULD be expressed as a MIME media type string,
   in the form of a semicolon-separated list of parameter=value pairs
   (for parameter usage examples see sections 3.3.2 through 3.3.6).

4.4 Usage of SDP

4.4.1 The a=fmtp keyword

   It is assumed that one typical way to transport the above-described
   parameters associated with this payload format is via an SDP message
   [6], for example transported to the client in reply to an RTSP
   DESCRIBE [8] or via SAP [7]. In that case the (a=fmtp) keyword MUST
   be used as described in RFC 2327 [6], section 6, the syntax being
   then:

   a=fmtp:<format> <parameter name>=<value>[; <parameter name>=<value>]

5. Security Considerations

   RTP packets using the payload format defined in this specification
   are subject to the security considerations discussed in the RTP
   specification [2]. This implies that confidentiality of the media
   streams is achieved by encryption.
   Because the data compression used with this payload format is
   applied end-to-end, encryption may be performed on the compressed
   data so there is no conflict between the two operations. The packet
   processing complexity of this payload type (i.e. excluding media
   data processing) does not exhibit any significant non-uniformity in
   the receiver side to cause a denial-of-service threat.

   However, it is possible to inject non-compliant MPEG streams (Audio,
   Video, and Systems) to overload the receiver/decoder's buffers,
   which might compromise the functionality of the receiver or even
   crash it. This is especially true for end-to-end systems like MPEG
   where the buffer models are precisely defined.

   MPEG-4 Systems supports stream types including commands that are
   executed on the terminal like OD commands, BIFS commands, etc. and
   programmatic content like MPEG-J (Java(TM) Byte Code) and
   ECMAScript. It is possible to use one or more of the above in a
   manner non-compliant to MPEG to crash the receiver or make it
   temporarily unavailable. Senders that transport MPEG-4 content
   SHOULD ensure that such content is MPEG compliant, as defined in the
   compliance part of ISO/IEC 14496 [1]. Receivers that support MPEG-4
   content should prevent malfunctioning of the receiver in case of
   non-MPEG-compliant content.

   Authentication mechanisms can be used to validate the sender and
   the data to prevent security problems due to non-compliant malignant
   MPEG-4 streams.

   In ISO/IEC 14496-1 a security model is defined for MPEG-4 Systems
   streams carrying MPEG-J access units which comprise Java(TM) classes
   and objects. MPEG-J defines a set of Java APIs and a secure
   execution model.
   MPEG-J content can call this set of APIs and Java(TM) methods from
   a set of Java packages supported in the receiver within the defined
   security model. According to this security model, downloaded byte
   code is forbidden to load libraries, define native methods, start
   programs, read or write files, or read system properties.
   Receivers can implement intelligent filters to validate the buffer
   requirements or parametric (OD, BIFS, etc.) or programmatic (MPEG-J,
   ECMAScript) commands in the streams. However, this can increase the
   complexity significantly.

6. Acknowledgements

   This document evolved through several revisions thanks to
   contributions by people from the ISMA forum, from the IETF AVT
   Working Group and from the 4-on-IP ad-hoc group within MPEG. The
   authors wish to thank all involved people, and in particular Andrea
   Basso, Stephen Casner, M. Reha Civanlar, Carsten Herpel, John
   Lazaro, Zvi Lifshitz, Young-kwon Lim, Alex MacAulay, Bill May,
   Colin Perkins, Dorairaj V and Stephan Wenger for their valuable
   comments and support.

7. References

   [1] ISO/IEC International Standard 14496 (MPEG-4); "Information
       technology - Coding of audio-visual objects", January 2000.

   [2] H. Schulzrinne, S. Casner, R. Frederick, V. Jacobson, "RTP: A
       Transport Protocol for Real-Time Applications", RFC 1889,
       Internet Engineering Task Force, January 1996.

   [3] S. Bradner, "Key words for use in RFCs to Indicate Requirement
       Levels", RFC 2119, March 1997.

   [4] D. Hoffman, G. Fernando, V. Goyal, M. Civanlar, "RTP payload
       format for MPEG1/MPEG2 Video", RFC 2250, January 1998.

   [5] Y. Kikuchi, T. Nomura, S. Fukunaga, Y. Matsui, H. Kimata, "RTP
       payload format for MPEG-4 Audio/Visual streams", RFC 3016.

   [6] M. Handley, V. Jacobson, "SDP: Session Description Protocol",
       RFC 2327, Internet Engineering Task Force, April 1998.

   [7] M. Handley, C. Perkins, E. Whelan, "SAP: Session Announcement
       Protocol", RFC 2974, Internet Engineering Task Force, October
       2000.

   [8] H. Schulzrinne, A. Rao, R. Lanphier, "Real Time Streaming
       Protocol (RTSP)", RFC 2326, Internet Engineering Task Force,
       April 1998.

8. Authors' Addresses

   Jan van der Meer
   Philips Digital Networks
   Cederlaan 4
   5600 JB Eindhoven
   Netherlands
   Email: jan.vandermeer@philips.com

   David Mackie
   Apple Computer, Inc.
   One Infinite Loop, MS:302-2LF
   Cupertino CA 95014
   Email: dmackie@apple.com

   Viswanathan Swaminathan
   Sun Microsystems Inc.
   901 San Antonio Road, M/S UMPK15-214
   Palo Alto, CA 94303
   Email: viswanathan.swaminathan@sun.com

   David Singer
   Apple Computer, Inc.
   One Infinite Loop, MS:302-3MT
   Cupertino CA 95014
   Email: singer@apple.com

   Philippe Gentric
   Philips Digital Networks, MP4Net
   51 rue Carnot
   92156 Suresnes
   France
   Email: philippe.gentric@philips.com

Full Copyright Statement

   Copyright (C) The Internet Society (December 2002). All Rights
   Reserved.

   This document and translations of it may be copied and furnished to
   others, and derivative works that comment on or otherwise explain
   it or assist in its implementation may be prepared, copied,
   published and distributed, in whole or in part, without restriction
   of any kind, provided that the above copyright notice and this
   paragraph are included on all such copies and derivative works.
   However, this document itself may not be modified in any way, such
   as by removing the copyright notice or references to the Internet
   Society or other Internet organizations, except as needed for the
   purpose of developing Internet standards in which case the
   procedures for copyrights defined in the Internet Standards process
   MUST be followed, or as required to translate it into languages
   other than English.

   The limited permissions granted above are perpetual and will
   not be revoked by the Internet Society or its successors or
   assigns.

   This document and the information contained herein is provided on
   an "AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET
   ENGINEERING TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR
   IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF
   THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED
   WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.

APPENDIX: Usage of this payload format

Appendix A. Interleave analysis

A.1 Introduction

   In this appendix interleaving issues are discussed. Some general
   notes are provided on de-interleaving and error concealment, while
   a number of interleaving patterns are examined, in particular
   for determining the maximum displacement in time and the size of
   the de-interleave buffer. In these examples, the maximum
   displacement is cited in terms of an access unit count, for ease of
   reading. In actual streams, it is signalled in units of the RTP
   time stamp clock.

A.2 De-interleaving and error concealment

   This appendix does not describe any details on de-interleaving and
   error concealment, as the control of the AU decoding and error
   concealment process has little to do with interleaving.
   If the next AU to be decoded is present and there is sufficient
   storage available for the decoded AU, then decode it now. If not,
   wait. When the decoding deadline is reached (i.e., the time when
   decoding must begin in order to be completed by the time the AU is
   to be presented), or if the decoder is some hardware that presents a
   constant delay between initiation of decoding of an AU and
   presentation of that AU, then decoding must begin at that deadline
   time.

   If the next AU to be decoded is not present when the decoding
   deadline is reached, then that AU is lost, so the receiver must take
   whatever error concealment measures are deemed appropriate. The
   playout delay may need to be adjusted at that point (especially if
   other AUs have also missed their deadline recently). Or, if it
   was a momentary delay, and maintaining the latency is important,
   then the receiver should minimize the glitch and continue processing
   with the next AU.

A.3 Simple Group interleave

A.3.1 Introduction

   An example of regular interleave is when packets are formed into
   groups. If the 'stride' of the interleave (the distance between
   interleaved AUs) is N, packet 0 could contain AU(0), AU(N), AU(2N),
   and so on; packet 1 could contain AU(1), AU(1+N), AU(1+2N), and so
   on. If there are M access units in a packet, then there are M*N
   access units in the group.

   An example with N=M=3 follows; note that this is the same example
   as given in section 2.5:

   Packet   Time stamp   Carried AUs   AU-Index, AU-Index-delta
    P(0)      T[0]        0,  3,  6          0, 2, 2
    P(1)      T[1]        1,  4,  7          0, 2, 2
    P(2)      T[2]        2,  5,  8          0, 2, 2
    P(3)      T[9]        9, 12, 15          0, 2, 2

   In the above example the AU-Index is coded with the value 0, as
   required for the modes defined in this document.
   The position of the first AU of each packet within the group is
   defined by the RTP time stamp, while the AU-Index-delta field
   indicates the position of subsequent AUs relative to the first AU in
   the packet. All AU-Index-delta fields are coded with the value N-1,
   equal to 2 in this example. Hence the RTP time stamp and the
   AU-Index-delta are used to reconstruct the original order. See also
   section 3.2.3.2.

A.3.2 Determining the de-interleave buffer size

   For the regular pattern as in this example, figure 6 in section
   3.2.3.3 shows that the de-interleave buffer size is equal to 4 AU
   sizes.

A.3.3 Determining the maximum displacement

   For the regular pattern as in this example, figure 7 in section 3.3
   shows that the value of the maxDisplacement equals 5 AU periods.

A.4 More subtle group interleave

A.4.1 Introduction

   Another example of forming packets with group interleave is given
   below. In this example the packets are formed such that the loss of
   two subsequent RTP packets does not cause the loss of two subsequent
   AUs. Note that in this example the RTP time stamps of packet 3 and
   packet 4 are earlier than the RTP time stamps of packets 1 and 2,
   respectively.

   Packet   Time stamp   Carried AUs   AU-Index, AU-Index-delta
     0        T[0]         0,  5            0, 4
     1        T[2]         2,  7            0, 4
     2        T[4]         4,  9            0, 4
     3        T[1]         1,  6            0, 4
     4        T[3]         3,  8            0, 4

     5        T[10]       10, 15            0, 4
   and so on ..

   In this example the AU-Index is coded with the value 0, as required
   for the modes defined in this document. To reconstruct the original
   order, the RTP time stamp and the AU-Index-delta are used. The
   AU-Index-delta is coded with the value 4: the stride between the AUs
   in a packet is 5, and, as in A.3.1, the coded value is the stride
   minus one. See also section 3.2.3.2.

A.4.2 Determining the de-interleave buffer size

   From figure 8 it can be determined that at most 5 "early" AUs
   are to be stored.
   If the AUs are of constant size, then the de-interleave buffer size
   equals 5 times the AU size.

                                  +--+--+--+--+--+--+--+--+--+--+
   Interleaved AUs                | 0| 5| 2| 7| 4| 9| 1| 6| 3| 8|
                                  +--+--+--+--+--+--+--+--+--+--+
                                    -  -  5  -  5  -  2  7  4  9
                                                7     4  9  5
   Received "early" AUs                               5     6
                                                      7     7
                                                      9     9

   Figure 8: Storage of "early" AUs in the de-interleave buffer per
             interleaved AU.

A.4.3 Determining the maximum displacement

   From figure 9 it can be seen that maxDisplacement has a value of
   8 AU periods.

                                  +--+--+--+--+--+--+--+--+--+--+
   Interleaved AUs                | 0| 5| 2| 7| 4| 9| 1| 6| 3| 8|
                                  +--+--+--+--+--+--+--+--+--+--+

   Earliest not yet received AU     -  1  1  1  1  1  -  3  -  -

   Figure 9: The earliest not yet received AU for each AU in the
             interleaving pattern.

A.5 Continuous interleave

A.5.1 Introduction

   In continuous interleave, once the scheme is 'primed', the number
   of AUs in a packet exceeds the 'stride' (the distance between
   them). This reduces the buffering needed, smooths the data-flow,
   and gives slightly larger packets -- and thus lower overhead -- for
   the same interleave. For example, here is a continuous interleave
   also over a stride of 3 AUs, but with 4 AUs per packet, for a run
   of 21 AUs (numbered 0 to 20). This shows both how the scheme
   'starts up' and how it finishes.

   Packet   Time stamp   Carried AUs     AU-Index, AU-Index-delta
     0        T[0]        0                 0
     1        T[1]        1  4              0  2
     2        T[2]        2  5  8           0  2  2
     3        T[3]        3  6  9 12        0  2  2  2
     4        T[7]        7 10 13 16        0  2  2  2
     5        T[11]      11 14 17 20        0  2  2  2
     6        T[15]      15 18              0  2
     7        T[19]      19                 0

   Also in this example the AU-Index is coded with the value 0, as
   required for the modes defined in this document. To reconstruct the
   original order, the RTP time stamp and the AU-Index-delta (coded
   with the value 2) are used. See also 3.2.3.2.
   Note that this example has RTP time stamps in increasing order.

A.5.2 Determining the de-interleave buffer size

   For this example the de-interleave buffer size can be derived from
   figure 10. The maximum number of "early" AUs is three. If the AUs
   are of constant size, then this value equals 3 times the AU size.
   Compared to the example in A.3, for constant size AUs the
   de-interleave buffer size is reduced from 4 to 3 times the AU size,
   while maintaining the same 'stride'.

                        +--+--+--+--+--+--+--+--+--+--+--+--+--+--+-
   Interleaved AUs      | 0| 1| 4| 2| 5| 8| 3| 6| 9|12| 7|10|13|16|
                        +--+--+--+--+--+--+--+--+--+--+--+--+--+--+-
                          -  -  -  4  -  -  4  8  -  -  8 12  -  -
                                            5           9
   Received "early" AUs                     8          12

   Figure 10: Storage of "early" AUs in the de-interleave buffer per
              interleaved AU.

A.5.3 Determining the maximum displacement

   For this example the maxDisplacement has a value of 5 AU periods.
   See figure 11.

                        +--+--+--+--+--+--+--+--+--+--+--+--+--+--+-
   Interleaved AUs      | 0| 1| 4| 2| 5| 8| 3| 6| 9|12| 7|10|13|16|
                        +--+--+--+--+--+--+--+--+--+--+--+--+--+--+-
   Earliest not yet
   received AU            -  -  2  -  3  3  -  -  7  7  -  - 11 11

   Figure 11: The earliest not yet received AU for each AU in the
              interleaving pattern.
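
   The counts read off the figures in A.3 to A.5 can be reproduced
   with a short sketch. It assumes AUs are numbered consecutively in
   presentation order and all have the same duration, so results are
   in AU counts and AU periods; a real stream expresses
   maxDisplacement in RTP time stamp units and the buffer size in
   octets. The function names are hypothetical, and the code mirrors
   the counting method of the figures rather than the normative
   definitions in section 3.2.3.3.

```python
def group_interleave(stride, per_packet):
    """Transmission order of one group in simple group interleave (A.3)."""
    order = []
    for packet in range(stride):
        order.extend(packet + stride * k for k in range(per_packet))
    return order


def max_early_aus(order):
    """Largest number of already received AUs that are later in time
    than the AU arriving at that moment (figures 8 and 10)."""
    return max(
        sum(1 for earlier in order[:pos] if earlier > au)
        for pos, au in enumerate(order)
    )


def max_displacement(order):
    """Largest distance, in AU periods, between an arriving AU and the
    earliest AU not yet received (figures 9 and 11)."""
    received = set()
    worst = 0
    for au in order:
        received.add(au)
        missing = [a for a in order if a < au and a not in received]
        if missing:
            worst = max(worst, au - min(missing))
    return worst


if __name__ == "__main__":
    simple = group_interleave(3, 3)                       # A.3, N=M=3
    subtle = [0, 5, 2, 7, 4, 9, 1, 6, 3, 8]               # A.4
    continuous = [0, 1, 4, 2, 5, 8, 3, 6, 9, 12, 7, 10,   # A.5
                  13, 16, 11, 14, 17, 20, 15, 18, 19]
    for name, order in (("A.3", simple), ("A.4", subtle),
                        ("A.5", continuous)):
        print(name, max_early_aus(order), max_displacement(order))
```

   For the three patterns this yields 4 and 5 (A.3), 5 and 8 (A.4),
   and 3 and 5 (A.5), matching the values stated in the text.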