Internet Engineering Task Force                          J. van der Meer
Internet Draft                                       Philips Electronics
                                                               D. Mackie
                                                          Apple Computer
                                                          V. Swaminathan
                                                   Sun Microsystems Inc.
                                                               D. Singer
                                                          Apple Computer
                                                              P. Gentric
                                                     Philips Electronics

                                                           December 2002
                                                       Expires June 2003

Document: draft-ietf-avt-mpeg4-simple-06.txt

               Transport of MPEG-4 Elementary Streams

Status of this Memo

   This document is an Internet-Draft and is in full conformance with
   all provisions of section 10 of RFC 2026.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups. Note that
   other groups may also distribute working documents as
   Internet-Drafts. Internet-Drafts are draft documents valid for a
   maximum of six months and may be updated, replaced, or obsoleted by
   other documents at any time. It is inappropriate to use
   Internet-Drafts as reference material or to cite them other than as
   "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt
   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html.

   This specification is a product of the Audio/Video Transport working
   group within the Internet Engineering Task Force. Comments are
   solicited and should be addressed to the working group's mailing
   list at avt@ietf.org and/or the authors.

   << Note for the RFC editor: xxxx should be replaced with the RFC
   number that will be assigned. >>

Abstract

   The MPEG Committee (ISO/IEC JTC1/SC29 WG11) is a working group in
   ISO that produced the MPEG-4 standard. MPEG defines tools to
   compress content such as audio-visual information into elementary
   streams. This specification defines a simple, but generic RTP
   payload format for transport of any non-multiplexed MPEG-4
   elementary stream.

RFC xxxx      Transport of MPEG-4 Elementary Streams       December 2002

Table of Contents

   1. Introduction
   2. Carriage of MPEG-4 elementary streams over RTP
   2.1. Introduction
   2.2. MPEG Access Units
   2.3. Concatenation of Access Units
   2.4. Fragmentation of Access Units
   2.5. Interleaving
   2.6. Time stamp information
   2.7. State indication of MPEG-4 system streams
   2.8. Random Access Indication
   2.9. Carriage of auxiliary information
   2.10. MIME format parameters and configuring conditional fields
   2.11. Global structure of payload format
   2.12. Modes to transport MPEG-4 streams
   2.13. Alignment with RFC 3016
   3. Payload format
   3.1. Usage of RTP header fields and RTCP
   3.2. RTP payload structure
   3.2.1. The AU Header Section
   3.2.1.1. The AU-header
   3.2.2. The Auxiliary Section
   3.2.3. The Access Unit Data Section
   3.2.3.1. Fragmentation
   3.2.3.2. Interleaving
   3.2.3.3. Constraints for interleaving
   3.2.3.4. Crucial and non-crucial AUs with MPEG-4 System data
   3.3. Usage of this specification
   3.3.1. General
   3.3.2. The generic mode
   3.3.3. Constant bit rate CELP
   3.3.4. Variable bit rate CELP
   3.3.5. Low bit rate AAC
   3.3.6. High bit rate AAC
   3.3.7. Additional modes
   4. IANA considerations
   4.1. MIME type registration
   4.2. Registration of mode definitions with IANA
   4.3. Concatenation of parameters
   4.4. Usage of SDP
   4.4.1. The a=fmtp keyword
   5. Security considerations
   6. Acknowledgements
   7. References
   7.1 Normative references
   7.2 Informative references
   8. Author addresses

   APPENDIX: Usage of this payload format
   A. Examples of delay analysis with interleave
   A.1 Introduction
   A.2 De-interleaving and error concealment
   A.3 Simple Group interleave
   A.3.1 Introduction
   A.3.2 Determining the de-interleave buffer size
   A.3.3 Determining the maximum displacement
   A.4 More subtle group interleave
   A.4.1 Introduction
   A.4.2 Determining the de-interleave buffer size
   A.4.3 Determining the maximum displacement
   A.5 Continuous interleave
   A.5.1 Introduction
   A.5.2 Determining the de-interleave buffer size
   A.5.3 Determining the maximum displacement

1. Introduction

   The MPEG Committee is Working Group 11 (WG11) in ISO/IEC JTC1 SC29
   that specified the MPEG-1, MPEG-2 and, more recently, the MPEG-4
   standards [1]. The MPEG-4 standard specifies compression of
   audio-visual data into, for example, an audio or video elementary
   stream. In the MPEG-4 standard, these streams take the form of
   audio-visual objects that may be arranged into an audio-visual scene
   by means of a scene description. Each MPEG-4 elementary stream
   consists of a sequence of Access Units; examples of an Access Unit
   (AU) are an audio frame and a video picture.

   This specification defines a general and configurable payload
   structure to transport MPEG-4 elementary streams, in particular
   MPEG-4 audio (including speech) streams, MPEG-4 video streams and
   also MPEG-4 systems streams, such as BIFS (BInary Format for
   Scenes), OCI (Object Content Information), OD (Object Descriptor)
   and IPMP (Intellectual Property Management and Protection) streams.
   The RTP payload defined in this document is simple to implement and
   reasonably efficient. It allows for optional interleaving of Access
   Units (such as audio frames) to increase error resiliency to packet
   loss.

   Some types of MPEG-4 elementary streams include "crucial"
   information whose loss cannot be tolerated, but RTP does not provide
   reliable transmission so receipt of that crucial information is not
   assured. Section 3.2.3.4 specifies how stream state is conveyed so
   that the receiver can detect the loss of crucial information and
   cease decoding until the next random access point is received.
   Applications transmitting streams that include crucial information,
   such as OD commands, BIFS commands, or programmatic content such as
   MPEG-J (Java) and ECMAScript, should include random access points
   sufficiently often, depending upon the probability of loss, to
   reduce stream corruption to an acceptable level.
   An example is the carousel mechanism as defined by MPEG in
   ISO/IEC 14496-1.

   Such applications may also employ additional protocols or services
   to reduce the probability of loss. At the RTP layer, these measures
   include payload formats and profiles for retransmission or forward
   error correction (such as in RFC 2733 [10]), which must be employed
   with due consideration to congestion control. Another solution that
   may be appropriate for some applications is to carry RTP over TCP
   (such as in RFC 2326 [8], section 10.12). At the network layer,
   resource allocation or preferential service may be available to
   reduce the probability of loss. For a general description of methods
   to repair streaming media see RFC 2354 [9].

   Though the RTP payload format defined in this document is capable
   of transporting any MPEG-4 stream, other, more specific, formats
   may exist, such as RFC 3016 [12] for transport of MPEG-4 video
   (ISO/IEC 14496 [1] part 2).

   Configuration of the payload is provided to accommodate transport
   of any MPEG-4 stream at any possible bit rate. However, for a
   specific MPEG-4 elementary stream typically only very few
   configurations are needed. So as to allow for the design of
   simplified, but dedicated receivers, this specification requires
   that specific modes are defined for transport of MPEG-4 streams.
   This document defines modes for MPEG-4 CELP and AAC streams, as
   well as a generic mode that can be used to transport any MPEG-4
   stream. In the future new RFCs are expected to specify additional
   modes for transport of MPEG-4 streams.

   The RTP payload format defined in this document specifies carriage
   of system-related information that is often equivalent to the
   information that may be contained in the MPEG-4 Sync Layer (SL) as
   defined in MPEG-4 Systems [1].
   This document does not prescribe how to transcode or map
   information from the SL to fields defined in the RTP payload
   format. Such processing, if any, is left to the discretion of the
   application. However, to anticipate the need for transport of any
   additional system-related information in future, an auxiliary field
   can be configured that may carry any such data.

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in
   this document are to be interpreted as described in RFC 2119 [4].

2. Carriage of MPEG-4 elementary streams over RTP

2.1 Introduction

   With this payload format a single MPEG-4 elementary stream can be
   transported. Information on the type of MPEG-4 stream carried in
   the payload is conveyed by MIME format parameters, for example in
   an SDP [5] message or by other means (see section 4). These MIME
   format parameters specify the configuration of the payload. To
   allow for simplified and dedicated receivers, a MIME format
   parameter is available to signal a specific mode of using this
   payload. A mode definition MAY include the type of MPEG-4
   elementary stream as well as the applied configuration, so as to
   avoid the need in receivers to parse all MIME format parameters.
   The applied mode MUST be signaled.

2.2 MPEG Access Units

   For carriage of compressed audio-visual data MPEG defines Access
   Units. An MPEG Access Unit (AU) is the smallest data entity to
   which timing information is attributed. In case of audio an Access
   Unit may represent an audio frame and in case of video a picture.
   MPEG Access Units are by definition octet-aligned.
   If, for example, an audio frame is not octet-aligned, up to 7
   zero-padding bits MUST be inserted at the end of the frame to
   achieve the octet-aligned Access Units, as required by the MPEG-4
   specification. MPEG-4 decoders MUST be able to decode AUs in which
   such padding is applied.

   Consistent with the MPEG-4 specification, this document requires
   that each MPEG-4 part 2 video Access Unit includes all the coded
   data of a picture, any video stream headers that may precede the
   coded picture data, and any video stream stuffing that may follow
   it, up to, but not including the startcode indicating the start of
   a new video stream or the next Access Unit.

2.3 Concatenation of Access Units

   Frequently it is possible to carry multiple Access Units in one RTP
   packet. This is particularly useful for audio; for example, when
   AAC is used for encoding of a stereo signal at 64 kbits/sec, AAC
   frames contain on average approximately 200 octets. On a LAN with a
   1500 octet MTU this would allow on average 7 complete AAC frames to
   be carried per RTP packet.

   Access Units may have a fixed size in octets, but a variable size
   is also possible. To facilitate parsing in case of multiple
   concatenated AUs in one RTP packet, the size of each AU is made
   known to the receiver. When concatenating in case of a constant AU
   size, this size is communicated "out of band" through a MIME format
   parameter. When concatenating in case of variable size AUs, the RTP
   payload carries "in band" an AU size field for each contained AU.
   In combination with the RTP payload length the size information
   allows the RTP payload to be split by the receiver back into the
   individual AUs.

   To simplify the implementation of RTP receivers, it is required
   that when multiple AUs are carried in an RTP packet, each AU MUST
   be complete, i.e.
   the number of AUs in an RTP packet MUST be integral. In addition,
   an AU MUST NOT be repeated in other RTP packets; hence repetition
   of an AU is only possible by using a duplicate RTP packet.

2.4 Fragmentation of Access Units

   MPEG allows for very large Access Units. Since most IP networks
   have significantly smaller MTU sizes, this payload format allows
   for the fragmentation of an Access Unit over multiple RTP packets
   so as to avoid IP layer fragmentation. To simplify the
   implementation of RTP receivers, an RTP packet SHALL either carry
   one or more complete Access Units or a single fragment of one
   Access Unit (i.e. packets MUST NOT contain fragments of multiple
   Access Units).

2.5 Interleaving

   When an RTP packet carries a contiguous sequence of Access Units,
   the loss of such a packet can result in a "decoding gap" for the
   user. One method to alleviate this problem is to allow for the
   Access Units to be interleaved in the RTP packets. For a modest
   cost in latency and implementation complexity, significant error
   resiliency to packet loss can be achieved.

   To support optional interleaving of Access Units, this payload
   format allows for index information to be sent for each Access Unit.
   After informing receivers about buffer resources to allocate for
   de-interleaving, the RTP sender is free to choose the interleaving
   pattern without propagating this information a priori to the
   receiver(s). Indeed the sender could dynamically adjust the
   interleaving pattern based on the Access Unit size, error rates,
   etc. The RTP receiver does not need to know the interleaving
   pattern used; it only needs to extract the index information of the
   Access Unit and insert the Access Unit into the appropriate
   sequence in the decoding or rendering queue. An example of
   interleaving is given below.

   Assume that an RTP packet contains 3 AUs, and that the AUs are
   numbered 0, 1, 2, 3, 4, etc. If an interleaving group length of 9 is
   chosen, then RTP packet(i) contains the following AU(n):
   RTP packet(0): AU(0),  AU(3),  AU(6)
   RTP packet(1): AU(1),  AU(4),  AU(7)
   RTP packet(2): AU(2),  AU(5),  AU(8)
   RTP packet(3): AU(9),  AU(12), AU(15)
   RTP packet(4): AU(10), AU(13), AU(16)
   Etc.

2.6 Time stamp information

   The RTP time stamp MUST carry the sampling instant of the first AU
   (fragment) in the RTP packet. When multiple AUs are carried within
   an RTP packet, the time stamps of subsequent AUs can be calculated
   if the frame period of each AU is known. For audio and video this
   is possible if the frame rate is constant. However, in some cases
   it is not possible to make such calculation, for example for
   variable frame rate video and for MPEG-4 BIFS streams carrying
   composition information. To support such cases, this payload format
   can be configured to carry a time stamp in the RTP payload for each
   contained Access Unit. A time stamp MAY be conveyed in the RTP
   payload only for non-first AUs in the RTP packet, and SHALL NOT be
   conveyed for the first AU (fragment), as the time stamp for the
   first AU in the RTP packet is carried by the RTP time stamp.

   MPEG-4 defines two types of time stamps, the composition time stamp
   (CTS) and the decoding time stamp (DTS). The CTS represents the
   sampling instant of an AU, and hence the CTS is equivalent to the
   RTP time stamp. The DTS may be used in MPEG-4 video streams that
   use bi-directional coding, i.e. when pictures are predicted in both
   forward and backward direction by using either a reference picture
   in the past, or a reference picture in the future. The DTS cannot
   be carried in the RTP header.
   In some cases the DTS can be derived from the RTP time stamp using
   frame rate information; this requires deep parsing in the video
   stream, which may be considered objectionable. But if the video
   frame rate is variable, the required information may not even be
   present in the video stream. For both reasons, the capability has
   been defined to optionally carry the DTS in the RTP payload for
   each contained Access Unit.

   To keep the coding of time stamps efficient, each time stamp
   contained in the RTP payload is coded differentially, the CTS from
   the RTP time stamp, and the DTS from the CTS.

2.7 State indication of MPEG-4 system streams

   ISO/IEC 14496-1 defines states for MPEG-4 system streams. So as to
   convey state information when transporting MPEG-4 system streams,
   this payload format allows for the optional carriage in the RTP
   payload of the stream state for each contained Access Unit. Stream
   states are used to signal "crucial" AUs that carry information whose
   loss cannot be tolerated and are also useful when repeating AUs
   according to the carousel mechanism defined in ISO/IEC 14496-1.

2.8 Random access indication

   Random access to the content of MPEG-4 elementary streams may be
   possible at some but not all Access Units. To signal Access Units
   where random access is possible, a random access point flag can
   optionally be carried in the RTP payload for each contained Access
   Unit. Carriage of random access points is particularly useful for
   MPEG-4 system streams in combination with the stream state.

2.9 Carriage of auxiliary information

   This payload format defines a specific field to carry auxiliary
   data. The auxiliary data field is preceded by a field that specifies
   the length of the auxiliary data, so as to facilitate skipping of
   the data without parsing it.
   The coding of the auxiliary data is not defined in this document;
   instead the format, meaning and signaling of auxiliary information
   are expected to be specified in one or more future RFCs. Auxiliary
   information MUST NOT be transmitted until its format, meaning and
   signaling have been specified and its use has been signaled.
   Receivers that have knowledge of the auxiliary data MAY decode the
   auxiliary data, but receivers without knowledge of such data MUST
   skip the auxiliary data field.

2.10 MIME format parameters and configuring conditional fields

   To support the features described in the previous sections several
   fields are defined for carriage in the RTP payload. However, their
   use strongly depends on the type of MPEG-4 elementary stream that
   is carried. Sometimes a specific field is needed with a certain
   length, while in other cases such a field is not needed at all. To
   be efficient in either case, the fields to support these features
   are configurable by means of MIME format parameters. In general, a
   MIME format parameter defines the presence and length of the
   associated field. A length of zero indicates absence of the field.
   As a consequence, parsing of the payload requires knowledge of MIME
   format parameters. The MIME format parameters are conveyed to the
   receiver via SDP [5] messages, as specified in section 4.4.1, or
   through other means.

2.11 Global structure of payload format

   The RTP payload, following the RTP header, contains three
   octet-aligned data sections, of which the first two MAY be empty.
   See figure 1.

         +---------+-----------+-----------+---------------+
         | RTP     | AU Header | Auxiliary | Access Unit   |
         | Header  | Section   | Section   | Data Section  |
         +---------+-----------+-----------+---------------+

                   <----------RTP Packet Payload----------->

            Figure 1: Data sections within an RTP packet

   The first data section is the AU (Access Unit) Header Section,
   which contains one or more AU-headers; however, each AU-header MAY
   be empty, in which case the entire AU Header Section is empty. The
   second section is the Auxiliary Section, containing auxiliary data;
   this section MAY also be configured empty. The third section is the
   Access Unit Data Section, containing either a single fragment of
   one Access Unit or one or more complete Access Units. The Access
   Unit Data Section MUST NOT be empty.

2.12 Modes to transport MPEG-4 streams

   While it is possible to build fully configurable receivers capable
   of receiving any MPEG-4 stream, this specification also allows for
   the design of simplified, but dedicated receivers, that are capable
   for example of receiving only one type of MPEG-4 stream. This
   is achieved by requiring that specific modes be defined for using
   this specification. Each mode may define constraints for transport
   of one or more types of MPEG-4 streams, for instance on the payload
   configuration.

   The applied mode MUST be signaled. Signaling the mode is
   particularly important for receivers that are only capable of
   decoding one or more specific modes. Such receivers need to
   determine whether the applied mode is supported, so as to avoid
   problems with processing of payloads that are beyond the
   capabilities of the receiver.

   In this document several modes are defined for transport of MPEG-4
   CELP and AAC streams, as well as a generic mode that can be used
   for any MPEG-4 stream.
   In the future, new RFCs may specify other modes of using this
   specification. However, each mode MUST be in full compliance with
   this specification (see section 3.3.7).

2.13 Alignment with RFC 3016

   This payload can be configured to be nearly identical to the
   payload format defined in RFC 3016 [12] for the MPEG-4 video
   configurations recommended in RFC 3016. Hence, receivers that
   comply with RFC 3016 can decode such RTP payload, provided that
   additional packets containing video decoder configuration (VO,
   VOL, VOSH) are inserted in the stream, as required by RFC 3016.
   Conversely, receivers that comply with the specification in this
   document should be able to decode payloads, names and parameters
   defined for MPEG-4 video in RFC 3016. In this respect it is
   strongly RECOMMENDED to implement the ability to ignore "in band"
   video decoder configuration packets in the RFC 3016 payload.

   Note that the "out of band" availability of the video decoder
   configuration is optional in RFC 3016. To achieve maximum
   interoperability with the RTP payload format defined in this
   document, applications that use RFC 3016 to transport MPEG-4 video
   (part 2) are recommended to make the video decoder configuration
   available as a MIME parameter.

3. Payload Format

3.1 Usage of RTP Header Fields and RTCP

   Payload Type (PT): The assignment of an RTP payload type for this
   packet format is outside the scope of this document; it is
   specified by the RTP profile under which this payload format is
   used.

   Marker (M) bit: The M bit is set to 1 to indicate that the RTP
   packet payload contains either the final fragment of a fragmented
   Access Unit or one or more complete Access Units.

   Extension (X) bit: Defined by the RTP profile used.

   Sequence Number: The RTP sequence number SHOULD be generated by the
   sender in the usual manner with a constant random offset.

   Timestamp: Indicates the sampling instant of the first AU
   contained in the RTP payload. This sampling instant is equivalent
   to the CTS in the MPEG-4 time domain. When using SDP the clock rate
   of the RTP time stamp MUST be expressed using the "rtpmap"
   attribute. If an MPEG-4 audio stream is transported, the rate SHOULD
   be set to the same value as the sampling rate of the audio stream.
   If an MPEG-4 video stream is transported, it is RECOMMENDED to set
   the rate to 90 kHz.

   In all cases, the sender SHALL make sure that RTP time stamps
   are identical only if the RTP time stamp refers to fragments of the
   same Access Unit.

   According to RFC 1889 [2] (section 5.1), RTP time stamps are
   RECOMMENDED to start at a random value for security reasons. This
   is not an issue for synchronization of multiple RTP streams. When,
   however, streams from multiple sources are to be synchronized (for
   example one stream from local storage, another from an RTP streaming
   server), synchronization may become impossible if the receiver only
   knows the original time stamp relationships. In such cases,
   synchronization may require that the correct relationship between
   time stamps be provided by out-of-band means. The format of such
   information as well as methods to convey such information are
   beyond the scope of this specification.

   SSRC: set as described in RFC 1889 [2].

   CC and CSRC fields are used as described in RFC 1889 [2].

   RTCP SHOULD be used as defined in RFC 1889 [2]. Note that time
   stamps in RTCP Sender Reports may be used to synchronize multiple
   MPEG-4 elementary streams and also to synchronize MPEG-4 streams
   with non-MPEG-4 streams, in case the delivery of these streams uses
   RTP.
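
   As an illustration of the Timestamp usage described in this
   section, the following Python sketch derives the time stamps of
   several complete AUs carried in one RTP packet when the frame
   period is constant and no interleaving is used. The function names
   and the example figures (1024 clock ticks per audio frame, a 48000
   Hz clock) are assumptions chosen for illustration only; they are
   not defined by this document. The modulus reflects the 32-bit size
   of the RTP time stamp field.

```python
RTP_TS_MODULUS = 2 ** 32  # the RTP time stamp is a 32-bit field


def au_timestamps(rtp_timestamp: int, au_count: int, ticks_per_au: int) -> list:
    """Time stamps of consecutive AUs in one packet, constant frame period.

    rtp_timestamp is the time stamp of the first AU, taken from the RTP
    header. ticks_per_au is the frame period in RTP clock ticks
    (hypothetical parameter; it must be known to the receiver).
    """
    return [(rtp_timestamp + i * ticks_per_au) % RTP_TS_MODULUS
            for i in range(au_count)]


def ticks_to_seconds(ticks: int, clock_rate: int) -> float:
    """Convert RTP clock ticks to seconds for the signaled clock rate."""
    return ticks / clock_rate


if __name__ == "__main__":
    # Illustrative audio case: clock rate equals the sampling rate.
    print(au_timestamps(1000, 3, 1024))
    print(ticks_to_seconds(1024, 48000))
```

   When the frame period is not constant, such a calculation is not
   possible and the per-AU time stamp fields of the payload are needed
   instead.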

3.2 RTP Payload Structure

3.2.1 The AU Header Section

When present, the AU Header Section consists of the
AU-headers-length field, followed by a number of AU-headers. See
figure 2.

   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+- .. -+-+-+-+-+-+-+-+-+-+
   |AU-headers-length|AU-header|AU-header|      |AU-header|padding|
   |                 |   (1)   |   (2)   |      |   (n)   | bits  |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+- .. -+-+-+-+-+-+-+-+-+-+

Figure 2: The AU Header Section

The AU-headers are configured using MIME format parameters and MAY
be empty. If the AU-header is configured empty, the
AU-headers-length field SHALL NOT be present and consequently the
AU Header Section is empty. If the AU-header is not configured
empty, then the AU-headers-length is a two octet field that
specifies the length in bits of the immediately following
AU-headers, excluding the padding bits.

Each AU-header is associated with a single Access Unit (fragment)
contained in the Access Unit Data Section in the same RTP packet.
For each contained Access Unit (fragment) there is exactly one
AU-header. Within the AU Header Section, the AU-headers are bit-wise
concatenated in the order in which the Access Units are contained in
the Access Unit Data Section. Hence, the n-th AU-header refers to
the n-th AU (fragment). If the concatenated AU-headers consume a
non-integer number of octets, up to 7 zero-padding bits MUST be
inserted at the end in order to achieve octet-alignment of the AU
Header Section.

3.2.1.1 The AU-header

Each AU-header may contain the fields given in figure 3. The length
in bits of these fields, with the exception of the CTS-flag, the
DTS-flag and the RAP-flag fields, is defined by MIME format
parameters; see section 4.1. If a MIME format parameter has the
default value of zero, then the associated field is not present.
The number of bits for fields that are present and that represent
the value of a parameter MUST be chosen large enough to correctly
encode the largest value of that parameter during the session.

If present, the fields MUST occur in the mutual order given in
figure 3. In the general case a receiver can only discover the size
of an AU-header by parsing it since the presence of the CTS-delta
and DTS-delta fields is signaled by the value of the CTS-flag and
DTS-flag, respectively.

   +---------------------------------------+
   |     AU-size                           |
   +---------------------------------------+
   |     AU-Index / AU-Index-delta         |
   +---------------------------------------+
   |     CTS-flag                          |
   +---------------------------------------+
   |     CTS-delta                         |
   +---------------------------------------+
   |     DTS-flag                          |
   +---------------------------------------+
   |     DTS-delta                         |
   +---------------------------------------+
   |     RAP-flag                          |
   +---------------------------------------+
   |     Stream-state                      |
   +---------------------------------------+

Figure 3: The fields in the AU-header. If used, the AU-Index field
          only occurs in the first AU-header within an AU Header
          Section; in any other AU-header the AU-Index-delta field
          occurs instead.

AU-size: Indicates the size in octets of the associated Access Unit
      in the Access Unit Data Section in the same RTP packet. When
      the AU-size is associated with an AU fragment, the AU size
      indicates the size of the entire AU and not the size of the
      fragment. In this case, the size of the fragment is known
      from the size of the AU data section. This can be exploited
      to determine whether a packet contains an entire AU or a
      fragment, which is particularly useful after losing a packet
      carrying the last fragment of an AU.

AU-Index: Indicates the serial number of the associated Access Unit
      (fragment). For each (in decoding order) consecutive AU or AU
      fragment, the serial number is incremented by 1. When
      present, the AU-Index field occurs in the first AU-header in
      the AU Header Section, but MUST NOT occur in any subsequent
      (non-first) AU-header in that Section. To encode the serial
      number in any such non-first AU-header, the AU-Index-delta
      field is used.

AU-Index-delta: The AU-Index-delta field is an unsigned integer
      that specifies the serial number of the associated AU as the
      difference with respect to the serial number of the previous
      Access Unit. Hence, for the n-th (n>1) AU the serial number
      is found from:

         AU-Index(n) = AU-Index(n-1) + AU-Index-delta(n) + 1

      If the AU-Index field is present in the first AU-header in
      the AU Header Section, then the AU-Index-delta field MUST be
      present in any subsequent (non-first) AU-header. When the
      AU-Index-delta is coded with the value 0, it indicates that
      the Access Units are consecutive in decoding order. An
      AU-Index-delta value larger than 0 signals that interleaving
      is applied.

CTS-flag: Indicates whether the CTS-delta field is present.
      A value of 1 indicates that the field is present, a value
      of 0 that it is not present.
      The CTS-flag field MUST be present in each AU-header if the
      length of the CTS-delta field is signaled to be larger than
      zero. In that case, the CTS-flag field MUST have the value 0
      in the first AU-header and MAY have the value 1 in all
      non-first AU-headers. The CTS-flag field SHOULD be 0 for
      any non-first fragment of an Access Unit.

CTS-delta: Encodes the CTS by specifying the value of CTS as a 2's
      complement offset (delta) from the time stamp in the RTP
      header of this RTP packet.
      The CTS MUST use the same clock rate as the time stamp in
      the RTP header.

DTS-flag: Indicates whether the DTS-delta field is present. A value
      of 1 indicates that DTS-delta is present, a value of 0 that
      it is not present.
      The DTS-flag field MUST be present in each AU-header if the
      length of the DTS-delta field is signaled to be larger than
      zero. The DTS-flag field MUST have the same value for all
      fragments of an Access Unit.

DTS-delta: Specifies the value of the DTS as a 2's complement
      offset (delta) from the CTS. The DTS MUST use the same clock
      rate as the time stamp in the RTP header. The DTS-delta field
      MUST have the same value for all fragments of an Access Unit.

RAP-flag: Indicates when set to 1 that the associated Access Unit
      provides a random access point to the content of the stream.
      If an Access Unit is fragmented, the RAP flag, if present,
      MUST be set to 0 for each non-first fragment of the AU.

Stream-state: Specifies the state of the stream for an AU of an
      MPEG-4 system stream; each state is identified by a value of
      a modulo counter. In ISO/IEC 14496-1, MPEG-4 system streams
      use the AU_SequenceNumber to signal stream states. When the
      stream state changes, the value of stream-state MUST be
      incremented by one.

      Note: no relation is required between stream-states of
      different streams.

3.2.2 The Auxiliary Section

The Auxiliary Section consists of the auxiliary-data-size field
followed by the auxiliary-data field. Receivers MAY (but are not
required to) parse the auxiliary-data field; to facilitate skipping
of the auxiliary-data field by receivers, the auxiliary-data-size
field indicates the length in bits of the auxiliary-data.
If the concatenation of the auxiliary-data-size and the
auxiliary-data fields consumes a non-integer number of octets, up to
7 zero padding bits MUST be inserted immediately after the auxiliary
data in order to achieve octet-alignment. See figure 4.

   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+- .. -+-+-+-+-+-+-+-+-+
   | auxiliary-data-size   | auxiliary-data       |padding bits |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+- .. -+-+-+-+-+-+-+-+-+

Figure 4: The fields in the Auxiliary Section

The length in bits of the auxiliary-data-size field is configurable
by a MIME format parameter; see section 4.1. The default length of
zero indicates that the entire Auxiliary Section is absent.

auxiliary-data-size: specifies the length in bits of the immediately
      following auxiliary-data field;

auxiliary-data: the auxiliary-data field contains data of a format
      not defined by this specification.

3.2.3 The Access Unit Data Section

The Access Unit Data Section contains an integer number of complete
Access Units or a single fragment of one AU. The Access Unit Data
Section is never empty. If data of more than one Access Unit is
present, then the AUs are concatenated into a contiguous string of
octets. See figure 5. The AUs inside the Access Unit Data Section
MUST be in decoding order, though not necessarily contiguous in the
case of interleaving.

The size and number of Access Units SHOULD be adjusted such that the
resulting RTP packet is not larger than the path MTU. To handle
larger packets, this payload format relies on lower layers for
fragmentation, which may result in reduced performance.

   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |AU(1)                                                          |
   +                                                               |
   |                                                               |
   |               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |               |AU(2)                                          |
   +-+-+-+-+-+-+-+-+                                               |
   |                                                               |
   |                               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                               | AU(n)                         |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |AU(n) continued|
   |-+-+-+-+-+-+-+-+

Figure 5: Access Unit Data Section; each AU is octet-aligned.

When multiple Access Units are carried, the size of each AU MUST be
made available to the receiver. If the AU size is variable then the
size of each AU MUST be indicated in the AU-size field of the
corresponding AU-header. However, if the AU size is constant for a
stream, this mechanism SHOULD NOT be used, but instead the fixed
size SHOULD be signaled by the MIME format parameter "ConstantSize",
see section 4.1.

The absence of both AU-size in the AU-header and the ConstantSize
MIME format parameter indicates carriage of a single AU (fragment),
i.e. that a single Access Unit (fragment) is transported in each RTP
packet for that stream.

3.2.3.1 Fragmentation

A packet SHALL carry either one or more complete Access Units, or a
single fragment of an Access Unit. Fragments of the same Access Unit
have the same time stamp but different RTP sequence numbers. The
marker bit in the RTP header is 1 on the last fragment of an Access
Unit, and 0 on all other fragments.

3.2.3.2 Interleaving

Access Units MAY be interleaved. Senders MAY perform interleaving.
Receivers MUST support interleaving, except if the receiver only
supports modes in which no interleaving is allowed. When Access
Units are interleaved, it SHALL be implemented using the AU-Index
and the AU-Index-delta fields in the AU-header.
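
The serial-number rule from section 3.2.1.1, AU-Index(n) =
AU-Index(n-1) + AU-Index-delta(n) + 1, can be illustrated with a
minimal Python sketch. The function name and argument layout are
hypothetical and not part of this specification, and roll-over of
the AU-Index field is deliberately ignored.

```python
def au_serial_numbers(first_index, index_deltas):
    """Return the serial number of every AU in one RTP packet.

    first_index  -- AU-Index value from the first AU-header
    index_deltas -- AU-Index-delta values from the 2nd..n-th AU-headers

    Illustrative helper only; field roll-over is not handled.
    """
    serials = [first_index]
    for delta in index_deltas:
        # AU-Index(n) = AU-Index(n-1) + AU-Index-delta(n) + 1
        serials.append(serials[-1] + delta + 1)
    return serials


if __name__ == "__main__":
    # A hypothetical packet carrying AU(0), AU(3) and AU(6):
    print(au_serial_numbers(0, [2, 2]))
    # Non-interleaved case: every delta is 0.
    print(au_serial_numbers(5, [0, 0]))
```

A delta of 0 yields consecutive serial numbers, while any larger
delta skips AUs that are carried in other packets, which is how
interleaving shows up in the AU-headers.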

When a sender interleaves Access Units, then the transmitter needs
to provide sufficient information to enable a receiver to
unambiguously reconstruct the original order, even in case of
out-of-order packets, packet loss or duplication. The information
that senders need to provide depends on whether or not the Access
Units have a constant time duration. Access Units have a constant
time duration, if:

   TS(i+1) - TS(i) = constant, for any i, where

   i indicates the index of the AU in original order
   TS(i) denotes the time stamp of AU(i)

If Access Units have a constant time duration then a receiver can
unambiguously reconstruct the original order based on the RTP time
stamp, the AU-Index and the AU-Index-delta. Note that for this
purpose the AU-Index is redundant, as the RTP time stamp and the
AU-Index-delta values are sufficient for placing the AUs correctly
in time. The RTP time stamp usually provides better robustness to
large bursts of packet losses, and is therefore to be preferred.
In order to unambiguously determine the index of each AU in the most
convenient way when the AUs have a constant time duration, the value
of the time duration SHOULD be signaled by the MIME format parameter
"constantDuration", see section 4.1.

If the "constantDuration" parameter is present, then the transmitter
MUST encode the AU-Index, if present, with the value 0 and the
receiver MUST use the RTP time stamp to determine the index of the
first AU in the RTP packet.

If the "constantDuration" parameter is not present, then Access
Units are assumed to have a variable duration. In this case, the
AU-Index is not redundant, and MUST provide the index information
required for re-ordering, and the receiver MUST use that value to
determine the index of the first AU in the RTP packet.
The number of bits of the AU-Index field MUST be chosen so that
valid index information is provided at the applied interleaving
scheme, without causing problems due to roll-over of the AU-Index
field. For variable duration AUs, index information is needed to
reconstruct the original order and to identify missing AUs, but to
place the AUs correctly in time, for each AU the time stamp is
needed. Therefore, if the "constantDuration" parameter is not
present, then the CTS-delta MUST be coded in the AU header for each
non-first AU in the RTP packet.

When interleaving is applied, a de-interleave buffer is needed in
receivers to put the Access Units in their correct logical
consecutive decoding order. This requires the computation of the
time stamp for each Access Unit. In case of a constant time duration
per Access Unit, the time stamp of the i-th access unit in an RTP
packet with RTP time stamp T is calculated as follows:

   Timestamp[0] = T
   Timestamp[i, i > 0] = T +(Sum(for k=1 to i of (AU-Index-delta[k]
                         + 1))) * access-unit-duration

When AU-Index-delta is always 0, this reduces to
T + i * (access-unit-duration). This is the non-interleaved case,
where the frames are consecutive in decoding order. Note that the
AU-Index field (present for the first Access Unit) is indeed not
needed in this calculation.

3.2.3.3 Constraints for interleaving

The size of the packets should be suitably chosen to be appropriate
to both the path MTU and the capacity of the receiver's
de-interleave buffer. The maximum packet size for a session SHOULD
be chosen not to exceed the path MTU.

To allow receivers to allocate sufficient resources for
de-interleaving, senders MUST provide the information to receivers
as specified in this section.

AUs enter the decoder in decoding order.
The de-interleave buffer is used to re-order a stream of interleaved
AUs back into decoding order. When interleaving is applied, the
decoding of "early" AUs has to be postponed until all AUs that
precede in decoding order are present. Therefore these "early" AUs
are stored in the de-interleave buffer. As an example in figure 6
the interleaving pattern from section 2.5 is considered.

                           +--+--+--+--+--+--+--+--+--+--+--+-
   Interleaved AUs         | 0| 3| 6| 1| 4| 7| 2| 5| 8| 9|12|..
                           +--+--+--+--+--+--+--+--+--+--+--+-
   Storage of "early" AUs       3  3  3  3  3  3
                                   6  6  6  6  6  6
                                         4  4  4
                                            7  7  7
                                                          12 12

Figure 6: Storage of "early" AUs in the de-interleave buffer per
          interleaved AU.

AU(3) is to be delivered to the decoder after AU(0), AU(1) and
AU(2); of these AUs, AU(2) arrives last and hence AU(3) needs to be
stored until AU(2) is present in the pattern. Similarly, AU(6) is to
be stored until AU(5) is present, while AU(4) and AU(7) are to be
stored until AU(2) and AU(5) are present, respectively. Note that
the fullness of the de-interleave buffer varies in time. In
figure 6, the de-interleave buffer contains at most 4 AUs, but often
fewer.

So as to give a rough indication of the resources needed in the
receiver for de-interleaving, the maximum displacement in time of an
AU is defined. For any AU in the pattern it can be verified which
AUs are not yet present. The maximum displacement in time of an AU
is the maximum difference between the time stamp of an AU in the
pattern and the time stamp of the earliest AU that is not yet
present.
In other words, when considering a sequence of interleaved AUs,
then:

   Maximum displacement = max{TS(i) - TS(j)}, for any i and any j>i,

   where i and j indicate the index of the AU in the interleaving
   pattern and TS denotes the time stamp of the AU

As an example in figure 7 the interleaving pattern from section 2.5
is considered. For each AU in the pattern the earliest not yet
present AU is indicated. A "-" indicates that all previous AUs are
present. If the AU period is constant, the maximum displacement
equals 5 AU periods, as found for AU(6) and AU(7).

                               +--+--+--+--+--+--+--+--+--+--+--+-
   Interleaved AUs             | 0| 3| 6| 1| 4| 7| 2| 5| 8| 9|12|..
                               +--+--+--+--+--+--+--+--+--+--+--+-
   Earliest not yet present AU   -  1  1  -  2  2  -  -  -  - 10

Figure 7: The earliest not yet present AU for each AU in the
          interleaving pattern.

When interleaving, senders MUST signal the maximum displacement in
time during the session via the MIME format parameter
"maxDisplacement"; see section 4.1.

An estimate of the size of the de-interleave buffer is found by
multiplying the maximum displacement by the maximum bit rate:

   size(de-interleave buffer) = {(maxDisplacement) * Rate(max)} /
                                (RTP clock frequency),

where Rate(max) is the maximum bit-rate of the transported stream.

Note that receivers can derive Rate(max) from the MIME format
parameters StreamType, Profile-level-id, and config.

However, this calculation only estimates the size of the
de-interleave buffer, and the size actually required may differ from
the calculated value. If this calculation under-estimates the size
of the de-interleave buffer, then senders, when interleaving, MUST
signal a size of the de-interleave buffer via the MIME format
parameter "de-interleaveBufferSize"; see section 4.1.
If the calculation 945 RFC xxxx Transport of MPEG-4 Elementary Streams December 2002 947 over-estimates the size of the de-interleave buffer, then senders, 948 when interleaving, MAY signal a size of the de-interleave buffer 949 via the MIME format parameter "de-interleaveBufferSize". 951 The signaled size of the de-interleave buffer MUST be large enough 952 to contain all "early" AUs at any point in time during the session, 953 that is: 955 minimum de-interleave buffer size = max [sum {if TS(i) > TS(j) then 956 AU-size(i) else 0}] for any j 957 and any i /[/] 1043 For audio streams, specifies the number of 1044 audio channels: 2 for stereo material (see RFC 2327 [5]) and 1 for 1045 mono. Provided no additional parameters are needed, this parameter 1046 may be omitted for mono material, hence its default value is 1. 1048 3.3.2 The generic mode 1050 The generic mode can be used for any MPEG-4 stream. In this mode 1051 no mode-specific constraints are applied; hence, in the generic 1052 mode the full flexibility of this specification can be exploited. 1053 The generic mode is signaled by mode=generic. 1055 An example is given below for transport of a BIFS stream. In this 1056 example carriage of multiple BIFS Access Units is allowed in one 1057 RTP packet. The AU-header contains the AU-size field, the CTS-flag 1058 and, if the CTS flag is set to 1, the CTS-delta field. The number 1059 of bits of the AU-size and the CTS-delta fields is 10 and 16, 1060 respectively. The AU-header also contains the RAP-flag and the 1061 Stream-state of 4 bits. This results in an AU-header with a 1062 total size of two or four octets per BIFS AU. The RTP time stamp 1063 uses a 1 kHz clock. Note that the media type name is video, 1064 because the BIFS stream is part of an audio-visual presentation. For 1065 conventions on media type names see section 4.1. 
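
The "two or four octets" figure quoted for this BIFS configuration
can be checked with a little arithmetic. The Python sketch below is
illustrative only; the function and argument names are hypothetical,
and it covers just the fields used in this example (AU-size,
CTS-flag, CTS-delta, RAP-flag and Stream-state).

```python
def bifs_au_header_bits(size_length, cts_delta_length,
                        rap_indication, stream_state_length,
                        cts_flag):
    """Length in bits of one AU-header for the fields used in the
    BIFS example. Hypothetical helper, not part of the payload format.
    """
    bits = size_length                 # AU-size
    if cts_delta_length > 0:
        bits += 1                      # CTS-flag is always one bit
        if cts_flag:
            bits += cts_delta_length   # CTS-delta only when flag is 1
    if rap_indication:
        bits += 1                      # RAP-flag is one bit
    bits += stream_state_length        # Stream-state
    return bits


if __name__ == "__main__":
    for flag in (0, 1):
        bits = bifs_au_header_bits(10, 16, True, 4, flag)
        print("CTS-flag =", flag, "->", bits, "bits =", bits // 8, "octets")
```

With the CTS-flag at 0 the header holds 10 + 1 + 1 + 4 = 16 bits,
and with the CTS-flag at 1 the 16-bit CTS-delta brings it to 32 bits.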

In detail:

   m=video 49230 RTP/AVP 96
   a=rtpmap:96 mpeg4-generic/1000
   a=fmtp:96 streamtype=3; profile-level-id=257; mode=generic;
   ObjectType=2; config=BIFSConfiguration(); SizeLength=10;
   CTSDeltaLength=16; RandomAccessIndication=1;
   StreamStateIndication=4

Note: The a=fmtp line has been wrapped to fit the page; it comprises
a single line in the SDP file.

BIFSConfiguration() is the hexadecimal string as defined in ISO/IEC
14496-1; for the description of MIME parameters see section 4.1.

3.3.3 Constant bit-rate CELP

This mode is signaled by mode=CELP-cbr. In this mode one or more
complete CELP frames of fixed size can be transported in one RTP
packet; there is no support for interleaving. The RTP payload
consists of one or more concatenated CELP frames, each of the same
size. CELP frames MUST NOT be fragmented when using this mode. Both
the AU Header Section and the Auxiliary Section MUST be empty.

The MIME format parameter ConstantSize MUST be provided to specify
the length of each CELP frame.

For example:

   m=audio 49230 RTP/AVP 96
   a=rtpmap:96 mpeg4-generic/44100/2
   a=fmtp:96 streamtype=5; profile-level-id=15; mode=CELP-cbr; config=
   AudioSpecificConfig(); ConstantSize=xxx;

Note: The a=fmtp line has been wrapped to fit the page; it comprises
a single line in the SDP file.

AudioSpecificConfig() is the hexadecimal string as defined in
ISO/IEC 14496-3. AudioSpecificConfig() specifies that the audio
stream type is CELP. For the description of MIME parameters see
section 4.1.

3.3.4 Variable bit-rate CELP

This mode is signaled by mode=CELP-vbr. With this mode one or more
complete CELP frames of variable size can be transported in one RTP
packet with optional interleaving.
As CELP frames are very small, while the largest possible AU-size in
this mode is greater than the maximum CELP frame size, there is no
support for fragmentation of CELP frames. Hence CELP frames MUST NOT
be fragmented when using this mode.

In this mode the RTP payload consists of the AU Header Section,
followed by one or more concatenated CELP frames. The Auxiliary
Section MUST be empty. For each CELP frame contained in the payload
there MUST be a one octet AU-header in the AU Header Section to
provide:
(a) the size of each CELP frame in the payload and
(b) index information for computing the sequence (and hence timing)
    of each CELP frame.

Transport of CELP frames requires that the AU-size field is coded
with 6 bits. In this mode therefore 6 bits are allocated to the
AU-size field, and 2 bits to the AU-Index(-delta) field. Each
AU-Index field MUST be coded with the value 0. In the AU Header
Section, the concatenated AU-headers are preceded by the 16-bit
AU-headers-length field, as specified in section 3.2.1.

In addition to the required MIME format parameters, the following
parameters MUST be present: SizeLength, IndexLength, and
IndexDeltaLength. CELP frames have fixed time duration per Access
Unit; when interleaving in this mode, the applicable duration MUST
be signaled by the MIME format parameter constantDuration. In
addition, the parameter maxDisplacement MUST be present when
interleaving.
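
As a rough illustration of how a sender might derive the value to
signal in maxDisplacement, the Python sketch below applies the
definition from section 3.2.3.3, max{TS(i) - TS(j)} for any i and
any j>i in transmission order. The function name is hypothetical,
the input is assumed to be the list of AU time stamps in the order
the AUs are sent, and RTP time stamp wrap-around is not handled.

```python
def max_displacement(timestamps_in_send_order):
    """Largest TS(i) - TS(j) over all i < j in transmission order.

    Illustrative helper only: assumes plain integer time stamps
    with no 32-bit wrap-around.
    """
    best = 0
    earliest_later = None  # smallest time stamp seen to the right
    for ts in reversed(timestamps_in_send_order):
        if earliest_later is not None:
            best = max(best, ts - earliest_later)
        if earliest_later is None or ts < earliest_later:
            earliest_later = ts
    return best


if __name__ == "__main__":
    duration = 1024  # assumed constantDuration in RTP clock ticks
    pattern = [0, 3, 6, 1, 4, 7, 2, 5, 8]  # pattern of figures 6 and 7
    print(max_displacement([n * duration for n in pattern]))
```

For the interleaving pattern of figures 6 and 7 this yields 5 AU
periods, in line with the value stated for AU(6) and AU(7).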

For example:

   m=audio 49230 RTP/AVP 96
   a=rtpmap:96 mpeg4-generic/8000/1
   a=fmtp:96 streamtype=5; profile-level-id=15; mode=CELP-vbr; config=
   AudioSpecificConfig(); SizeLength=6; IndexLength=2;
   IndexDeltaLength=2; constantDuration=xxx; maxDisplacement=yyy

Note: The a=fmtp line has been wrapped to fit the page; it comprises
a single line in the SDP file.

AudioSpecificConfig() is the hexadecimal string as defined in
ISO/IEC 14496-3. AudioSpecificConfig() specifies that the audio
stream type is CELP. For the description of MIME parameters see
section 4.1.

3.3.5 Low bit-rate AAC

This mode is signaled by mode=AAC-lbr. This mode supports transport
of one or more complete AAC frames of variable size. In this mode
the AAC frames are allowed to be interleaved and hence receivers
MUST support de-interleaving. The maximum size of an AAC frame in
this mode is 63 octets. AAC frames MUST NOT be fragmented when using
this mode.

The payload configuration in this mode is the same as in the
variable bit-rate CELP mode as defined in 3.3.4. The RTP payload
consists of the AU Header Section, followed by concatenated AAC
frames. The Auxiliary Section MUST be empty. For each AAC frame
contained in the payload the one octet AU-header MUST provide:
(a) the size of each AAC frame in the payload and
(b) index information for computing the sequence (and hence timing)
    of each AAC frame.
In the AU-header, the AU-size MUST be coded with 6 bits and the
AU-Index(-delta) with 2 bits; the AU-Index field MUST have the
value 0 in each AU-header.

In the AU Header Section, the concatenated AU-headers MUST be
preceded by the 16-bit AU-headers-length field, as specified in
section 3.2.1.
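
A receiver-side reading of this layout can be sketched in Python as
follows. This is a minimal illustration, not a reference parser: the
function name and return shape are hypothetical, the
AU-headers-length field and the header bits are assumed to be in
network order (most significant bit first), and only the AU-size
and AU-Index(-delta) fields are handled.

```python
def parse_au_header_section(payload, size_length=6,
                            index_length=2, index_delta_length=2):
    """Parse AU-headers holding only AU-size and AU-Index(-delta).

    Returns (headers, section_octets): headers is a list of
    (au_size, index relative to the packet); section_octets is the
    length of the AU Header Section including padding.
    Assumes MSB-first bit order; illustrative sketch only.
    """
    if len(payload) < 2:
        raise ValueError("payload too short for AU-headers-length")
    total_bits = int.from_bytes(payload[0:2], "big")
    section_octets = 2 + (total_bits + 7) // 8
    if len(payload) < section_octets:
        raise ValueError("truncated AU Header Section")
    bits = int.from_bytes(payload[2:section_octets], "big")
    bit_count = (section_octets - 2) * 8
    pos = 0

    def read(n):
        nonlocal pos
        shift = bit_count - pos - n
        pos += n
        return (bits >> shift) & ((1 << n) - 1)

    headers = []
    while pos < total_bits:
        idx_bits = index_delta_length if headers else index_length
        if total_bits - pos < size_length + idx_bits:
            raise ValueError("AU-headers-length does not fit the headers")
        size = read(size_length)
        if not headers:
            index = read(index_length)                 # AU-Index
        else:
            index = headers[-1][1] + read(index_delta_length) + 1
        headers.append((size, index))
    return headers, section_octets


if __name__ == "__main__":
    # Two one-octet headers: (size 10, AU-Index 0), (size 20, delta 2)
    section = bytes([0x00, 0x10, 0x28, 0x52])
    print(parse_au_header_section(section + bytes(30)))
```

The Access Unit Data Section starts at offset section_octets, and
the sizes in the returned list delimit the concatenated frames. The
defaults match the 6-bit and 2-bit allocation of this mode and of
the variable bit-rate CELP mode.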

In addition to the required MIME format parameters, the following
parameters MUST be present: SizeLength, IndexLength, and
IndexDeltaLength. AAC frames have fixed time duration per Access
Unit; when interleaving in this mode, the applicable duration MUST
be signaled by the MIME format parameter constantDuration. In
addition, the parameter maxDisplacement MUST be present when
interleaving.

For example:

   m=audio 49230 RTP/AVP 96
   a=rtpmap:96 mpeg4-generic/44100/2
   a=fmtp:96 streamtype=5; profile-level-id=15; mode=AAC-lbr; config=
   AudioSpecificConfig(); SizeLength=6; IndexLength=2;
   IndexDeltaLength=2; constantDuration=xxx; maxDisplacement=yyy

Note: The a=fmtp line has been wrapped to fit the page; it comprises
a single line in the SDP file.

AudioSpecificConfig() is the hexadecimal string as defined in
ISO/IEC 14496-3. AudioSpecificConfig() specifies that the audio
stream type is AAC. For the description of MIME parameters see
section 4.1.

3.3.6 High bit-rate AAC

This mode is signaled by mode=AAC-hbr. This mode supports transport
of variable size AAC frames. In one RTP packet either one or more
complete AAC frames are carried, or a single fragment of an AAC
frame. In this mode the AAC frames are allowed to be interleaved and
hence receivers MUST support de-interleaving. The maximum size of an
AAC frame in this mode is 8191 octets.

In this mode the RTP payload consists of the AU Header Section,
followed by either one AAC frame, several concatenated AAC frames or
one fragmented AAC frame. The Auxiliary Section MUST be empty. For
each AAC frame contained in the payload there MUST be an AU-header
in the AU Header Section to provide:
(a) the size of each AAC frame in the payload and
(b) index information for computing the sequence (and hence timing)
    of each AAC frame.
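
How index information yields timing when constantDuration is
signaled can be sketched with the formula of section 3.2.3.2,
Timestamp[i] = T + Sum(k=1..i)(AU-Index-delta[k] + 1) *
access-unit-duration. The Python function below is a hypothetical
illustration; it reduces results modulo 2^32 because the RTP time
stamp field is 32 bits wide.

```python
def au_timestamps(rtp_timestamp, index_deltas, constant_duration):
    """Time stamp of each AU in a packet with constant AU duration.

    rtp_timestamp     -- time stamp T from the RTP header (first AU)
    index_deltas      -- AU-Index-delta of the 2nd..n-th AU-headers
    constant_duration -- AU duration in RTP clock ticks

    Illustrative helper only.
    """
    timestamps = [rtp_timestamp]
    steps = 0
    for delta in index_deltas:
        steps += delta + 1  # running Sum of (AU-Index-delta[k] + 1)
        ts = (rtp_timestamp + steps * constant_duration) % 2**32
        timestamps.append(ts)
    return timestamps


if __name__ == "__main__":
    # Interleaved packet with deltas of 2 and an assumed duration of 1024
    print(au_timestamps(1000, [2, 2], 1024))
    # Non-interleaved packet: T + i * duration
    print(au_timestamps(0, [0, 0], 1024))
```

The AU-Index of the first AU does not enter the calculation, which
is why these modes can require it to be coded as 0.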

Coding the maximum size of an AAC frame requires 13 bits. Therefore
in this configuration 13 bits are allocated to the AU-size, and
3 bits to the AU-Index(-delta) field. Thus each AU-header has a size
of 2 octets. Each AU-Index field MUST be coded with the value 0. In
the AU Header Section, the concatenated AU-headers MUST be preceded
by the 16-bit AU-headers-length field, as specified in
section 3.2.1.

In addition to the required MIME format parameters, the following
parameters MUST be present: SizeLength, IndexLength, and
IndexDeltaLength. AAC frames have fixed time duration per Access
Unit; when interleaving in this mode, the applicable duration MUST
be signaled by the MIME format parameter constantDuration. In
addition, the parameter maxDisplacement MUST be present when
interleaving.

For example:

   m=audio 49230 RTP/AVP 96
   a=rtpmap:96 mpeg4-generic/44100/2
   a=fmtp:96 streamtype=5; profile-level-id=15; mode=AAC-hbr;
   config=AudioSpecificConfig(); SizeLength=13; IndexLength=3;
   IndexDeltaLength=3; constantDuration=xxx; maxDisplacement=yyy

Note: The a=fmtp line has been wrapped to fit the page; it comprises
a single line in the SDP file.

AudioSpecificConfig() is the hexadecimal string as defined in
ISO/IEC 14496-3. AudioSpecificConfig() specifies that the audio
stream type is AAC. For the description of MIME parameters see
section 4.1.

3.3.7 Additional modes

This specification only defines the modes specified in sections
3.3.2 up to 3.3.6. Additional modes are expected to be defined in
future RFCs. Each additional mode MUST be in full compliance with
this specification.

Any new mode MUST be defined such that an implementation including
all the features of this specification can decode the payload format
corresponding to this new mode.
For this reason a mode MUST NOT specify new default values for MIME
parameters. In particular, MIME parameters that configure the RTP
payload MUST be present (unless they have the default value), even
if their presence is redundant in case the mode assigns a fixed
value to a parameter. A mode may define additionally that some MIME
parameters are required instead of optional, that some MIME
parameters have fixed values (or ranges), and that there are rules
restricting the usage.

4. IANA considerations

This section describes the MIME types and names associated with this
payload format. Section 4.1 registers the MIME types, as per
RFC 2048 [3].

This format may require additional information about the mapping to
be made available to the receiver. This is done using parameters
also described in the next section.

4.1 MIME type registration

MIME media type name: "video" or "audio" or "application"

"video" MUST be used for MPEG-4 Visual streams (ISO/IEC 14496-2) or
MPEG-4 Systems streams (ISO/IEC 14496-1) that convey information
needed for an audio/visual presentation.

"audio" MUST be used for MPEG-4 Audio streams (ISO/IEC 14496-3) or
MPEG-4 Systems streams that convey information needed for an audio
only presentation.

"application" MUST be used for MPEG-4 Systems streams (ISO/IEC
14496-1) that serve purposes other than audio/visual presentation,
e.g. in some cases when MPEG-J (Java) streams are transmitted.

Depending on the required payload configuration, MIME format
parameters need to be available to the receiver. This is done using
the parameters described in the next section. There are required and
optional parameters.

Optional parameters are of two types: general parameters and
configuration parameters.
   The configuration parameters are used to
   configure the fields in the AU Header section and in the auxiliary
   section. The absence of any configuration parameter is equivalent to
   the associated field set to its default value, which is always zero.
   The absence of all configuration parameters resolves into a default
   "basic" configuration with an empty AU-header section and an empty
   auxiliary section in each RTP packet.

   MIME subtype name: mpeg4-generic

   Required parameters:

   MIME format parameters are not case dependent; however for clarity
   both upper and lower case are used in the names of the parameters
   described in this specification.

   StreamType:
      The integer value that indicates the type of MPEG-4 stream that
      is carried; its coding corresponds to the values of the
      streamType as defined in Table 9 (streamType Values) in ISO/IEC
      14496-1.

   Profile-level-id:
      A decimal representation of the MPEG-4 Profile Level indication.
      This parameter MUST be used in the capability exchange or
      session set-up procedure to indicate the MPEG-4 Profile and Level
      combination of which the relevant MPEG-4 media codec is capable.
      For MPEG-4 Audio streams, this parameter is the decimal value
         from Table 5 (audioProfileLevelIndication Values) in ISO/IEC
         14496-1, indicating which MPEG-4 Audio tool subsets are
         required to decode the audio stream.
      For MPEG-4 Visual streams, this parameter is the decimal value
         from Table G-1 (FLC table for profile and level indication) of
         ISO/IEC 14496-2, indicating which MPEG-4 Visual tool subsets
         are required to decode the visual stream.
      For BIFS streams, this parameter is the decimal value that is
         obtained from (SPLI + 256*GPLI), where:
         SPLI is the decimal value from Table 4 in ISO/IEC 14496-1 with
         the applied sceneProfileLevelIndication;
         GPLI is the decimal value from Table 7 in ISO/IEC 14496-1 with
         the applied graphicsProfileLevelIndication.
      For MPEG-J streams, this parameter is the decimal value from
         table 13 (MPEGJProfileLevelIndication) in ISO/IEC 14496-1,
         indicating the profile and level of the MPEG-J stream.
      For OD streams, this parameter is the decimal value from table 3
         (ODProfileLevelIndication) in ISO/IEC 14496-1, indicating the
         profile and level of the OD stream.
      For IPMP streams, this parameter has either the decimal value 0,
         indicating an unspecified profile and level, or a value larger
         than zero, indicating an MPEG-4 IPMP profile and level as
         defined in a future MPEG-4 specification.
      For Clock Reference streams and Object Content Info streams, this
         parameter has the decimal value zero, indicating that profile
         and level information is conveyed through the OD framework.

   Config:
      A hexadecimal representation of an octet string that expresses
      the media payload configuration. Configuration data is mapped
      onto the hexadecimal octet string on an MSB-first basis. The
      first bit of the configuration data SHALL be located at the MSB
      of the first octet. In the last octet, if necessary to achieve
      octet-alignment, up to 7 zero-valued padding bits shall follow
      the configuration data.
      For MPEG-4 Audio streams, config is the audio object type
         specific decoder configuration data AudioSpecificConfig() as
         defined in ISO/IEC 14496-3. For Structured Audio, the
         AudioSpecificConfig() may be conveyed by other means, not
         defined by this specification.
         If the AudioSpecificConfig()
         is conveyed by other means for Structured Audio, then the
         config MUST be a quoted empty hexadecimal octet string, as
         follows: config="".
         Note that a future mode of using this RTP payload format for
         Structured Audio may define such other means.
      For MPEG-4 Visual streams, config is the MPEG-4 Visual
         configuration information as defined in subclause 6.2.1 Start
         codes of ISO/IEC 14496-2. The configuration information
         indicated by this parameter SHALL be the same as the
         configuration information in the corresponding MPEG-4 Visual
         stream, except for first-half-vbv-occupancy and
         latter-half-vbv-occupancy, if it exists, which may vary in
         the repeated configuration information inside an MPEG-4
         Visual stream (See 6.2.1 Start codes of ISO/IEC 14496-2).
      For BIFS streams, this is the BIFSConfig() information as defined
         in ISO/IEC 14496-1. For version 1, BIFSConfig is defined in
         section 9.3.5.2, and for version 2 in section 9.3.5.3. The
         MIME format parameter ObjectType signals the version of
         BIFSConfig.
      For IPMP streams, this is either a quoted empty hexadecimal octet
         string, indicating the absence of any decoder configuration
         information (config=""), or the IPMPConfiguration() as
         defined in a future MPEG-4 IPMP specification.
      For Object Content Info (OCI) streams, this is the
         OCIDecoderConfiguration() information of the OCI stream, as
         defined in section 8.4.2.4 in ISO/IEC 14496-1.
      For OD streams, Clock Reference streams and MPEG-J streams, this
         is a quoted empty hexadecimal octet string (config=""), as
         no information on the decoder configuration is required.

   Mode:
      The mode in which this specification is used.
      The following modes
      can be signaled:
         mode=generic,
         mode=CELP-cbr,
         mode=CELP-vbr,
         mode=AAC-lbr and
         mode=AAC-hbr.
      Other modes are expected to be defined in future RFCs. See also
      sections 3.3.7 and 4.2 of RFC xxxx.

   Optional general parameters:

   ObjectType:
      The decimal value from Table 8 in ISO/IEC 14496-1, indicating
      the value of the objectTypeIndication of the transported stream.
      For BIFS streams this parameter MUST be present to signal the
      version of BIFSConfiguration(). Note that ObjectTypeIndication
      may signal a non-MPEG-4 stream and that the RTP payload format
      defined in this document may not be suitable to carry a stream
      that is not defined by MPEG-4. ObjectType SHOULD NOT be set to
      a value that signals a stream that cannot be carried by this
      payload format.

   ConstantSize:
      The constant size in octets of each Access Unit for this stream.
      The ConstantSize and the SizeLength parameters MUST NOT be
      simultaneously present.

   ConstantDuration:
      The constant duration of each Access Unit for this stream,
      measured with the same units as the RTP time stamp.

   maxDisplacement:
      The decimal representation of the maximum displacement in time
      of an interleaved AU, as defined in section 3.2.3.3, expressed
      in units of the RTP time stamp clock.
      This parameter MUST be present when interleaving is applied.

   de-interleaveBufferSize:
      The decimal representation in number of octets of the size of
      the de-interleave buffer, described in section 3.2.3.3.
      When interleaving, this parameter MUST be present if the
      calculation of the de-interleave buffer size given in 3.2.3.3
      and based on maxDisplacement and rate(max) under-estimates the
      size of the de-interleave buffer.
      If this calculation does not
      under-estimate the size of the de-interleave buffer, then the
      de-interleaveBufferSize parameter SHOULD NOT be present.

   Optional configuration parameters:

   SizeLength:
      The number of bits on which the AU-size field is encoded in the
      AU-header. The SizeLength and the ConstantSize parameters MUST
      NOT be simultaneously present.

   IndexLength:
      The number of bits on which the AU-Index is encoded in the first
      AU-header. The default value of zero indicates the absence of
      the AU-Index field in each first AU-header.

   IndexDeltaLength:
      The number of bits on which the AU-Index-delta field is encoded
      in any non-first AU-header. The default value of zero indicates
      the absence of the AU-Index-delta field in each non-first
      AU-header.

   CTSDeltaLength:
      The number of bits on which the CTS-delta field is encoded in
      the AU-header.

   DTSDeltaLength:
      The number of bits on which the DTS-delta field is encoded in
      the AU-header.

   RandomAccessIndication:
      A decimal value of zero or one, indicating whether the RAP-flag
      is present in the AU-header. The decimal value of one indicates
      presence of the RAP-flag, the default value zero its absence.

   StreamStateIndication:
      The number of bits on which the Stream-state field is encoded in
      the AU-header. This parameter MAY be present when transporting
      MPEG-4 system streams, and SHALL NOT be present for MPEG-4 audio
      and MPEG-4 video streams.

   AuxiliaryDataSizeLength:
      The number of bits that is used to encode the auxiliary-data-size
      field.

   Applications MAY use more parameters, in addition to those defined
   above. Each additional parameter MUST be registered with IANA, to
   ensure that there is no clash of names.
   Each additional parameter
   MUST be accompanied by a specification in the form of an RFC, MPEG
   standard, or other permanent and readily available reference (the
   "Specification Required" policy defined in RFC 2434 [6]). Receivers
   MUST tolerate the presence of such additional parameters, but these
   parameters SHALL NOT impact the decoding of receivers that comply
   with this specification.

   Encoding considerations:
   This MIME subtype is defined for RTP transport only. System
   bitstreams MUST be generated according to MPEG-4 Systems
   specifications (ISO/IEC 14496-1). Video bitstreams MUST be generated
   according to MPEG-4 Visual specifications (ISO/IEC 14496-2). Audio
   bitstreams MUST be generated according to MPEG-4 Audio
   specifications (ISO/IEC 14496-3). The RTP packets MUST be packetized
   according to the RTP payload format defined in RFC xxxx.

   Security considerations:
   As defined in section 5 of RFC xxxx.

   Interoperability considerations:
   MPEG-4 provides a large and rich set of tools for the coding of
   visual objects. For effective implementation of the standard,
   subsets of the MPEG-4 tool sets have been provided for use in
   specific applications. These subsets, called 'Profiles', limit the
   size of the tool set a decoder is required to implement. In order to
   restrict computational complexity, one or more 'Levels' are set for
   each Profile. A Profile@Level combination allows:
   . a codec builder to implement only the subset of the standard that
     is needed, while maintaining interworking with other MPEG-4
     devices that implement the same combination, and
   . checking whether MPEG-4 devices comply with the standard
     ('conformance testing').

   A stream SHALL be compliant with the MPEG-4 Profile@Level specified
   by the parameter "profile-level-id".
   Interoperability between a
   sender and a receiver is achieved by specifying the parameter
   "profile-level-id" in MIME content. In the capability exchange /
   announcement procedure this parameter may mutually be set to the
   same value.

   Published specification:
   The specifications for MPEG-4 streams are presented in ISO/IEC
   14496-1, 14496-2, and 14496-3. The RTP payload format is described
   in RFC xxxx.

   Applications which use this media type:
   Multimedia streaming and conferencing tools.

   Additional information: none

   Magic number(s): none

   File extension(s):
   None. A file format with the extension .mp4 has been defined for
   MPEG-4 content but is not directly correlated with this MIME type
   for which the sole purpose is RTP transport.

   Macintosh File Type Code(s): none

   Person & email address to contact for further information:
   Authors of RFC xxxx, IETF Audio/Video Transport working group.

   Intended usage: COMMON

   Author/Change controller:
   Authors of RFC xxxx, IETF Audio/Video Transport working group.

4.2 Registration of mode definitions with IANA

   This specification can be used in a number of modes. The mode of
   operation is signaled using the "Mode" MIME parameter, with the
   initial set of values specified in section 4.1. New modes may be
   defined at any time, as described in section 3.3.7. These modes
   MUST be registered with IANA, to ensure that there is no clash
   of names.

   A new mode registration MUST be accompanied by a specification in
   the form of an RFC, MPEG standard, or other permanent and readily
   available reference (the "Specification Required" policy defined
   in RFC 2434 [6]).
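
   As an informal illustration of the parameter rules in section 4.1,
   the following Python sketch shows how a receiver might read a
   semicolon-separated parameter string, apply the zero defaults for
   absent configuration parameters, and reject the combination of
   ConstantSize and SizeLength. The function names and the dictionary
   layout are assumptions made for this example only; they are not
   defined by this specification.

```python
# Illustrative sketch only; helper names are hypothetical.
# Configuration parameters default to zero when absent (section 4.1).
CONFIG_DEFAULTS = {
    "sizelength": 0,
    "indexlength": 0,
    "indexdeltalength": 0,
    "ctsdeltalength": 0,
    "dtsdeltalength": 0,
    "randomaccessindication": 0,
    "streamstateindication": 0,
    "auxiliarydatasizelength": 0,
}


def parse_params(text):
    """Split 'name=value; name=value' into a dict.

    Parameter names are not case dependent, so they are lower-cased.
    """
    params = {}
    for item in text.split(";"):
        item = item.strip()
        if not item:
            continue
        name, _, value = item.partition("=")
        params[name.strip().lower()] = value.strip()
    return params


def configuration(params):
    """Return the configuration parameters with defaults applied."""
    if "constantsize" in params and "sizelength" in params:
        raise ValueError("ConstantSize and SizeLength MUST NOT both be present")
    config = dict(CONFIG_DEFAULTS)
    for name in config:
        if name in params:
            config[name] = int(params[name])
    return config


def size_and_index_bits(config):
    """Bits used by the AU-size and AU-Index fields of a first AU-header."""
    return config["sizelength"] + config["indexlength"]


example = parse_params(
    "streamtype=5; profile-level-id=15; mode=AAC-hbr; "
    "SizeLength=13; IndexLength=3; IndexDeltaLength=3"
)
example_config = configuration(example)
```

   For the AAC-hbr parameters used above, size_and_index_bits() gives
   13 + 3 = 16 bits, matching the 2-octet AU-header described for that
   mode.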

4.3 Concatenation of parameters

   Multiple parameters SHOULD be expressed as a MIME media type string,
   in the form of a semicolon-separated list of parameter=value pairs
   (for parameter usage examples see sections 3.3.2 up to 3.3.6).

4.4 Usage of SDP

4.4.1 The a=fmtp keyword

   It is assumed that one typical way to transport the above-described
   parameters associated with this payload format is via an SDP message
   [5], for example transported to the client in reply to an RTSP
   DESCRIBE [8] or via SAP [11]. In that case the (a=fmtp) keyword
   MUST be used as described in RFC 2327 [5], section 6, the syntax
   being then:

   a=fmtp:<format> <parameter name>=<value>[; <parameter name>=<value>]

5. Security Considerations

   RTP packets using the payload format defined in this specification
   are subject to the security considerations discussed in the RTP
   specification [2]. This implies that confidentiality of the media
   streams is achieved by encryption. Because the data compression used
   with this payload format is applied end-to-end, encryption may be
   performed on the compressed data so there is no conflict between the
   two operations. The packet processing complexity of this payload
   type (i.e. excluding media data processing) does not exhibit any
   significant non-uniformity in the receiver side to cause a
   denial-of-service threat.

   However, it is possible to inject non-compliant MPEG streams (Audio,
   Video, and Systems) to overload the receiver/decoder's buffers,
   which might compromise the functionality of the receiver or even
   crash it. This is especially true for end-to-end systems like MPEG
   where the buffer models are precisely defined.

   MPEG-4 Systems supports stream types including commands that are
   executed on the terminal like OD commands, BIFS commands, etc.
   and
   programmatic content like MPEG-J (Java(TM) Byte Code) and
   ECMAScript. It is possible to use one or more of the above in a
   manner non-compliant to MPEG to crash the receiver or make it
   temporarily unavailable. Senders that transport MPEG-4 content
   SHOULD ensure that such content is MPEG compliant, as defined in the
   compliance part of ISO/IEC 14496 [1]. Receivers that support MPEG-4
   content should prevent malfunctioning of the receiver in case of
   non-MPEG-compliant content.

   Authentication mechanisms can be used to validate the sender and
   the data to prevent security problems due to non-compliant malignant
   MPEG-4 streams.

   In ISO/IEC 14496-1 a security model is defined for MPEG-4 Systems
   streams carrying MPEG-J access units which comprise Java(TM) classes
   and objects. MPEG-J defines a set of Java APIs and a secure
   execution model. MPEG-J content can call this set of APIs and
   Java(TM) methods from a set of Java packages supported in the
   receiver within the defined security model. According to this
   security model, downloaded byte code is forbidden to load libraries,
   define native methods, start programs, read or write files, or read
   system properties.

   Receivers can implement intelligent filters to validate the buffer
   requirements or parametric (OD, BIFS, etc.) or programmatic (MPEG-J,
   ECMAScript) commands in the streams. However, this can increase the
   complexity significantly.

6. Acknowledgements

   This document evolved through several revisions thanks to
   contributions by people from the ISMA forum, from the IETF AVT
   Working Group and from the 4-on-IP ad-hoc group within MPEG. The
   authors wish to thank all involved people, and in particular Andrea
   Basso, Stephen Casner, M.
   Reha Civanlar, Carsten Herpel, John
   Lazaro, Zvi Lifshitz, Young-kwon Lim, Alex MacAulay, Bill May,
   Colin Perkins, Dorairaj V and Stephan Wenger for their valuable
   comments and support.

7. References

7.1 Normative references

   [1] ISO/IEC International Standard 14496 (MPEG-4); "Information
   technology - Coding of audio-visual objects", January 2000.

   [2] H. Schulzrinne, S. Casner, R. Frederick, V. Jacobson, "RTP: A
   Transport Protocol for Real-Time Applications", RFC 1889, Internet
   Engineering Task Force, January 1996.

   [3] N. Freed, J. Klensin, J. Postel, "Multipurpose Internet Mail
   Extensions (MIME) Part Four: Registration Procedures", RFC 2048,
   Internet Engineering Task Force, November 1996.

   [4] S. Bradner, "Key words for use in RFCs to Indicate Requirement
   Levels", RFC 2119, March 1997.

   [5] M. Handley, V. Jacobson, "SDP: Session Description Protocol",
   RFC 2327, Internet Engineering Task Force, April 1998.

   [6] T. Narten, H. Alvestrand, "Guidelines for Writing an IANA
   Considerations Section in RFCs", RFC 2434, October 1998.

7.2 Informative references

   [7] D. Hoffman, G. Fernando, V. Goyal, M. Civanlar, "RTP payload
   format for MPEG1/MPEG2 Video", RFC 2250, January 1998.

   [8] H. Schulzrinne, A. Rao, R. Lanphier, "Real Time Streaming
   Protocol (RTSP)", RFC 2326, Internet Engineering Task Force,
   April 1998.

   [9] C. Perkins, O. Hudson, "Options for Repair of Streaming Media",
   RFC 2354, Internet Engineering Task Force, June 1998.

   [10] H. Schulzrinne, J. Rosenberg, "An RTP Payload Format for
   Generic Forward Error Correction", RFC 2733, Internet Engineering
   Task Force, December 1999.

   [11] M. Handley, C. Perkins, E. Whelan, "SAP: Session Announcement
   Protocol", RFC 2974, Internet Engineering Task Force, October 2000.

   [12] Y. Kikuchi, T. Nomura, S. Fukunaga, Y.
   Matsui, H. Kimata, "RTP
   payload format for MPEG-4 Audio/Visual streams", RFC 3016, Internet
   Engineering Task Force, November 2000.

8. Authors' Addresses

   Jan van der Meer
   Philips Digital Networks
   Cederlaan 4
   5600 JB Eindhoven
   Netherlands
   Email: jan.vandermeer@philips.com

   David Mackie
   Apple Computer, Inc.
   One Infinite Loop, MS:302-2LF
   Cupertino CA 95014
   Email: dmackie@apple.com

   Viswanathan Swaminathan
   Sun Microsystems Inc.
   901 San Antonio Road, M/S UMPK15-214
   Palo Alto, CA 94303
   Email: viswanathan.swaminathan@sun.com

   David Singer
   Apple Computer, Inc.
   One Infinite Loop, MS:302-3MT
   Cupertino CA 95014
   Email: singer@apple.com

   Philippe Gentric
   Philips Digital Networks, MP4Net
   51 rue Carnot
   92156 Suresnes
   France
   Email: philippe.gentric@philips.com

Full Copyright Statement

   Copyright (C) The Internet Society (December 2002). All Rights
   Reserved.

   This document and translations of it may be copied and furnished to
   others, and derivative works that comment on or otherwise explain
   it or assist in its implementation may be prepared, copied,
   published and distributed, in whole or in part, without restriction
   of any kind, provided that the above copyright notice and this
   paragraph are included on all such copies and derivative works.
   However, this document itself may not be modified in any way, such
   as by removing the copyright notice or references to the Internet
   Society or other Internet organizations, except as needed for the
   purpose of developing Internet standards in which case the
   procedures for copyrights defined in the Internet Standards process
   MUST be followed, or as required to translate it into languages
   other than English.

   The limited permissions granted above are perpetual and will
   not be revoked by the Internet Society or its successors or
   assigns.

   This document and the information contained herein is provided on
   an "AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET
   ENGINEERING TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR
   IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF
   THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED
   WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.

APPENDIX: Usage of this payload format

Appendix A. Interleave analysis

A.1 Introduction

   In this appendix interleaving issues are discussed. Some general
   notes are provided on de-interleaving and error concealment, while
   a number of interleaving patterns are examined, in particular
   for determining the maximum displacement in time and the size of
   the de-interleave buffer. In these examples, the maximum
   displacement is cited in terms of an access unit count, for ease of
   reading. In actual streams, it is signalled in units of the RTP
   time stamp clock.

A.2 De-interleaving and error concealment

   This appendix does not describe any details on de-interleaving and
   error concealment, as the control of the AU decoding and error
   concealment process has little to do with interleaving. If the
   next AU to be decoded is present and there is sufficient storage
   available for the decoded AU, then decode it now. If not, wait.
   When the decoding deadline is reached (i.e., the time when decoding
   must begin in order to be completed by the time the AU is to be
   presented), or if the decoder is some hardware that presents a
   constant delay between initiation of decoding of an AU and
   presentation of that AU, then decoding must begin at that deadline
   time.

   If the next AU to be decoded is not present when the decoding
   deadline is reached, then that AU is lost so the receiver must take
   whatever error concealment measures it deems appropriate. The
   play-out delay may need to be adjusted at that point (especially if
   other AUs have also missed their deadline recently). Or, if it was
   a momentary delay, and maintaining the latency is important, then
   the receiver should minimize the glitch and continue processing
   with the next AU.

A.3 Simple Group interleave

A.3.1 Introduction

   An example of regular interleave is when packets are formed into
   groups. If the 'stride' of the interleave (the distance between
   interleaved AUs) is N, packet 0 could contain AU(0), AU(N), AU(2N),
   and so on; packet 1 could contain AU(1), AU(1+N), AU(1+2N), and so
   on. If there are M access units in a packet, then there are M*N
   access units in the group.

   An example with N=M=3 follows; note that this is the same example
   as given in section 2.5 and that a fixed time duration per Access
   Unit is assumed:

   Packet   Time stamp   Carried AUs   AU-Index, AU-Index-delta
   P(0)     T[0]         0, 3, 6       0, 2, 2
   P(1)     T[1]         1, 4, 7       0, 2, 2
   P(2)     T[2]         2, 5, 8       0, 2, 2
   P(3)     T[9]         9,12,15       0, 2, 2

   In this example the AU-Index is present in the first AU-header and
   coded with the value 0, as required for fixed duration AUs. The
   position of the first AU of each packet within the group is defined
   by the RTP time stamp, while the AU-Index-delta field indicates the
   position of subsequent AUs relative to the first AU in the packet.
   All AU-Index-delta fields are coded with the value N-1, equal to 2
   in this example. Hence the RTP time stamp and the AU-Index-delta are
   used to reconstruct the original order. See also section 3.2.3.2.
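
   The packet formation described above can be sketched in a few lines
   of Python. This is an informal illustration under the assumptions
   of the example (fixed AU duration, N packets per group, M AUs per
   packet); the function name and the returned dictionary keys are
   hypothetical and are not part of this specification.

```python
def group_interleave(first_au, stride, per_packet):
    """Form one group of a simple group interleave.

    first_au   -- number of the first AU in the group (assumed input)
    stride     -- N, the distance between interleaved AUs
    per_packet -- M, the number of AUs carried in each packet
    Returns one dict per packet: the carried AUs and the values coded
    in the AU-Index field (first AU) and AU-Index-delta fields (rest).
    """
    packets = []
    for p in range(stride):
        aus = [first_au + p + k * stride for k in range(per_packet)]
        # AU-Index is 0; every AU-Index-delta is N-1.
        index_fields = [0] + [stride - 1] * (per_packet - 1)
        packets.append({"aus": aus, "index_fields": index_fields})
    return packets


first_group = group_interleave(0, 3, 3)
second_group = group_interleave(9, 3, 3)
```

   With N=M=3, first_group reproduces packets P(0) to P(2) of the
   table above, and the first packet of second_group carries AUs 9,
   12 and 15 as in P(3).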

A.3.2 Determining the de-interleave buffer size

   For the regular pattern as in this example, figure 6 in section
   3.2.3.3 shows that the de-interleave buffer stores at most 4 AUs. A
   de-interleaveBufferSize value may be signaled that is at least
   equal to the total number of octets of any 4 "early" AUs that are
   stored at the same time.

A.3.3 Determining the maximum displacement

   For the regular pattern as in this example, figure 7 in section 3.3
   shows that the maximum displacement in time equals 5 AU periods.
   Hence the minimum maxDisplacement value that must be signaled is 5
   AU periods. In case each AU has the same size, this maxDisplacement
   value over-estimates the de-interleave buffer size by one AU.
   However, note that in case of variable AU sizes the total size of
   any 4 "early" AUs that must be stored at the same time may exceed
   maxDisplacement times the maximum bitrate, in which case the
   de-interleaveBufferSize must be signaled.

A.4 More subtle group interleave

A.4.1 Introduction

   Another example of forming packets with group interleave is given
   below. In this example the packets are formed such that the loss of
   two subsequent RTP packets does not cause the loss of two subsequent
   AUs. Note that in this example the RTP time stamps of packet 3 and
   packet 4 are earlier than the RTP time stamps of packets 1 and 2,
   respectively; a fixed time duration per Access Unit is assumed.

   Packet   Time stamp   Carried AUs   AU-Index, AU-Index-delta
   0        T[0]          0,  5        0, 4
   1        T[2]          2,  7        0, 4
   2        T[4]          4,  9        0, 4
   3        T[1]          1,  6        0, 4
   4        T[3]          3,  8        0, 4
   5        T[10]        10, 15        0, 4
   and so on ..

   In this example the AU-Index is present in the first AU-header and
   coded with the value 0, as required for AUs with a fixed duration.
   To reconstruct the original order, the RTP time stamp and the
   AU-Index-delta (coded with the value 4) are used. See also
   section 3.2.3.2.

A.4.2 Determining the de-interleave buffer size

   From figure 8 it can be determined that at most 5 "early" AUs
   are to be stored. If the AUs are of constant size, then this value
   equals 5 times the AU size. The minimum size of the de-interleave
   buffer equals the maximum total number of octets of the "early" AUs
   that are to be stored at the same time. This gives the minimum
   value of the de-interleaveBufferSize that may be signaled.

                      +--+--+--+--+--+--+--+--+--+--+
   Interleaved AUs    | 0| 5| 2| 7| 4| 9| 1| 6| 3| 8|
                      +--+--+--+--+--+--+--+--+--+--+
                        -  -  5  -  5  -  2  7  4  9
                                    7     4  9  5
   "Early" AUs                            5     6
                                          7     7
                                          9     9

   Figure 8: Storage of "early" AUs in the de-interleave buffer per
             interleaved AU.

A.4.3 Determining the maximum displacement

   From figure 9 it can be seen that the maximum displacement in time
   equals 8 AU periods. Hence the minimum maxDisplacement value to be
   signaled is 8 AU periods.

                               +--+--+--+--+--+--+--+--+--+--+
   Interleaved AUs             | 0| 5| 2| 7| 4| 9| 1| 6| 3| 8|
                               +--+--+--+--+--+--+--+--+--+--+

   Earliest not yet present AU   -  1  1  1  1  1  -  3  -  -

   Figure 9: The earliest not yet present AU for each AU in the
             interleaving pattern.

   In case each AU has the same size, the found maxDisplacement value
   over-estimates the de-interleave buffer size by three AUs.
   However, in case of variable AU sizes the total size of any 5
   "early" AUs stored at the same time may exceed maxDisplacement
   times the maximum bitrate, in which case de-interleaveBufferSize
   must be signaled.
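
   The two quantities derived in this section can also be computed
   mechanically from the order in which AUs arrive. The Python sketch
   below is an informal illustration, not a normative algorithm: it
   assumes AUs are numbered consecutively from 0, that no packets are
   lost, and it counts in AUs and AU periods rather than in octets or
   RTP time stamp units. The function name is hypothetical.

```python
def interleave_metrics(arrival_order):
    """Return (max number of "early" AUs, max displacement in AU periods).

    For each arriving AU, the "early" AUs are the already received AUs
    with a higher number; the displacement is the distance to the
    earliest AU that is not yet present, if that AU precedes the
    arriving one.
    """
    received = set()
    max_early = 0
    max_displacement = 0
    for au in arrival_order:
        early = [x for x in received if x > au]
        max_early = max(max_early, len(early))
        # Earlier AUs that have not arrived yet (numbering from 0 assumed).
        missing = [x for x in range(au) if x not in received]
        if missing:
            max_displacement = max(max_displacement, au - missing[0])
        received.add(au)
    return max_early, max_displacement


subtle_group = interleave_metrics([0, 5, 2, 7, 4, 9, 1, 6, 3, 8])
simple_group = interleave_metrics([0, 3, 6, 1, 4, 7, 2, 5, 8])
```

   For the pattern of this section the sketch yields 5 "early" AUs and
   a maximum displacement of 8 AU periods; for the simple group
   interleave with N=M=3 it yields 4 and 5, respectively.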

A.5 Continuous interleave

A.5.1 Introduction

   In continuous interleave, once the scheme is 'primed', the number
   of AUs in a packet exceeds the 'stride' (the distance between
   them). This shortens the buffering needed, smooths the data-flow,
   and gives slightly larger packets -- and thus lower overhead -- for
   the same interleave. For example, here is a continuous interleave
   also over a stride of 3 AUs, but with 4 AUs per packet, for a run
   of 20 AUs. This shows both how the scheme 'starts up' and how it
   finishes. Once again, the example assumes fixed time duration per
   Access Unit.

   Packet   Time-stamp   Carried AUs     AU-Index, AU-Index-delta
   0        T[0]          0              0
   1        T[1]          1  4           0  2
   2        T[2]          2  5  8        0  2  2
   3        T[3]          3  6  9 12     0  2  2  2
   4        T[7]          7 10 13 16     0  2  2  2
   5        T[11]        11 14 17 20     0  2  2  2
   6        T[15]        15 18           0  2
   7        T[19]        19              0

   In this example the AU-Index is present in the first AU-header and
   coded with the value 0, as required for AUs with a fixed duration.
   To reconstruct the original order, the RTP time stamp and the
   AU-Index-delta (coded with the value 2) are used. See also 3.2.3.2.
   Note that this example has RTP time-stamps in increasing order.

A.5.2 Determining the de-interleave buffer size

   For this example the de-interleave buffer size can be derived from
   figure 10. The maximum number of "early" AUs is three. If the AUs
   are of constant size, then this value equals 3 times the AU size.
   Compared to the example in A.3, for constant size AUs the
   de-interleave buffer size is reduced from 4 to 3 times the AU size,
   while maintaining the same 'stride'.

                      +--+--+--+--+--+--+--+--+--+--+--+--+--+--+-
   Interleaved AUs    | 0| 1| 4| 2| 5| 8| 3| 6| 9|12| 7|10|13|16|
                      +--+--+--+--+--+--+--+--+--+--+--+--+--+--+-
                        -  -  -  4  -  -  4  8  -  -  8 12  -  -
                                          5           9
   "Early" AUs                            8          12

   Figure 10: Storage of "early" AUs in the de-interleave buffer per
              interleaved AU.

A.5.3 Determining the maximum displacement

   For this example the maximum displacement has a value of 5 AU
   periods. See figure 11. Compared to the example in A.3, the maximum
   displacement does not decrease, though in fact less de-interleave
   buffering is required.

                      +--+--+--+--+--+--+--+--+--+--+--+--+--+--+-
   Interleaved AUs    | 0| 1| 4| 2| 5| 8| 3| 6| 9|12| 7|10|13|16|
                      +--+--+--+--+--+--+--+--+--+--+--+--+--+--+-
   Earliest not yet
   present AU           -  -  2  -  3  3  -  -  7  7  -  - 11 11

   Figure 11: The earliest not yet present AU for each AU in the
              interleaving pattern.
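
   The packet table of A.5.1 can be reproduced with a short Python
   sketch. The construction used here (packets whose first AU advances
   by the number of AUs per packet, with AU numbers outside the run
   dropped during start-up and finish) is one reading of that table,
   not a procedure defined by this specification. It has been checked
   only for the example's values, a stride of 3 with 4 AUs per packet;
   the function name is hypothetical.

```python
def continuous_interleave(last_au, stride, per_packet):
    """List the AUs carried by each packet of a continuous interleave.

    last_au    -- number of the last AU in the run (AUs start at 0)
    stride     -- distance between interleaved AUs in a packet
    per_packet -- number of AUs per packet once the scheme is primed
    """
    packets = []
    # Start early enough that AU 0 is the last slot of the first packet.
    base = -stride * (per_packet - 1)
    while base <= last_au:
        slots = [base + k * stride for k in range(per_packet)]
        # Slots before the start or after the end of the run stay empty.
        aus = [a for a in slots if 0 <= a <= last_au]
        if aus:
            packets.append(aus)
        base += per_packet
    return packets


example_packets = continuous_interleave(20, 3, 4)
```

   For the values of the example, example_packets lists the same
   carried AUs as packets 0 to 7 in the table of A.5.1, and
   concatenating the first five packets gives the interleaved AU order
   shown in figures 10 and 11.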