idnits 2.17.1 draft-ietf-avt-mpeg4-simple-04.txt: Checking boilerplate required by RFC 5378 and the IETF Trust (see https://trustee.ietf.org/license-info): ---------------------------------------------------------------------------- ** Looks like you're using RFC 2026 boilerplate. This must be updated to follow RFC 3978/3979, as updated by RFC 4748. Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt: ---------------------------------------------------------------------------- ** The document seems to lack a 1id_guidelines paragraph about 6 months document validity -- however, there's a paragraph with a matching beginning. Boilerplate error? == No 'Intended status' indicated for this document; assuming Proposed Standard == It seems as if not all pages are separated by form feeds - found 0 form feeds but 33 pages Checking nits according to https://www.ietf.org/id-info/checklist : ---------------------------------------------------------------------------- ** The document seems to lack an Authors' Addresses Section. ** The document seems to lack separate sections for Informative/Normative References. All references will be assumed normative when checking for downward references. Miscellaneous warnings: ---------------------------------------------------------------------------- -- The document seems to lack a disclaimer for pre-RFC5378 work, but may have content which was first submitted before 10 November 2008. If you have contacted all the original authors and they are all willing to grant the BCP78 rights to the IETF Trust, then this is fine, and you can ignore this comment. If not, you may need to add the pre-RFC5378 disclaimer. (See the Legal Provisions document at https://trustee.ietf.org/license-info for more information.) -- The document date (January 2003) is 7771 days in the past. Is this intentional? 
Checking references for intended status: Proposed Standard ---------------------------------------------------------------------------- (See RFCs 3967 and 4897 for information about using normative references to lower-maturity documents in RFCs) == Missing Reference: '0' is mentioned on line 1579, but not defined == Missing Reference: '9' is mentioned on line 1525, but not defined == Missing Reference: '20' is mentioned on line 1564, but not defined == Missing Reference: '7' is mentioned on line 1583, but not defined == Missing Reference: '11' is mentioned on line 1584, but not defined == Missing Reference: '15' is mentioned on line 1585, but not defined == Missing Reference: '19' is mentioned on line 1586, but not defined -- Possible downref: Non-RFC (?) normative reference: ref. '1' ** Obsolete normative reference: RFC 1889 (ref. '2') (Obsoleted by RFC 3550) ** Obsolete normative reference: RFC 3016 (ref. '5') (Obsoleted by RFC 6416) ** Obsolete normative reference: RFC 2327 (ref. '6') (Obsoleted by RFC 4566) Summary: 7 errors (**), 0 flaws (~~), 9 warnings (==), 3 comments (--). Run idnits with the --verbose option for more detailed information about the items above. -------------------------------------------------------------------------------- 1 Internet Engineering Task Force J. van der Meer 2 Internet Draft Philips Electronics 3 D. Mackie 4 Cisco Systems Inc. 5 V. Swaminathan 6 Sun Microsystems Inc. 7 D. Singer 8 Apple Computer 9 P. Gentric 10 Philips Electronics 12 July 2002 13 Expires January 2003 15 Document: draft-ietf-avt-mpeg4-simple-04.txt 17 Transport of MPEG-4 Elementary Streams 19 Status of this Memo 21 This document is an Internet-Draft and is in full conformance with 22 all provisions of Section 10 of RFC2026. 24 Internet-Drafts are working documents of the Internet Engineering 25 Task Force (IETF), its areas, and its working groups. Note that 26 other groups may also distribute working documents as Internet- 27 Drafts. 
Internet-Drafts are draft documents valid for a maximum of 28 six months and may be updated, replaced, or obsoleted by other 29 documents at any time. It is inappropriate to use Internet- Drafts 30 as reference material or to cite them other than as "work in 31 progress." 33 The list of current Internet-Drafts can be accessed at 34 http://www.ietf.org/ietf/1id-abstracts.txt 35 The list of Internet-Draft Shadow Directories can be accessed at 36 http://www.ietf.org/shadow.html. 38 This specification is a product of the Audio/Video Transport working 39 group within the Internet Engineering Task Force. Comments are 40 solicited and should be addressed to the working group's mailing 41 list at avt@ietf.org and/or the authors. 43 << Note for the RFC editor: xxxx should be replaced with the RFC 44 number that will be assigned. >> 46 Abstract 48 The MPEG Committee (ISO/IEC JTC1/SC29 WG11) is a working group in 49 ISO that produced the MPEG-4 standard. MPEG defines tools to 50 compress content such as audio-visual information into elementary 51 streams. This specification defines a simple, but generic RTP 52 payload format for transport of any non-multiplexed MPEG-4 53 elementary stream. 55 Table of Contents 57 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 3 58 2. Carriage of MPEG-4 elementary streams over RTP . . . . . . . 4 59 2.1. Introduction . . . . . . . . . . . . . . . . . . . . . . . 4 60 2.2. MPEG Access Units . . . . . . . . . . . . . . . . . . . . 4 61 2.3. Concatenation of Access Units . . . . . . . . . . . . . . 4 62 2.4. Fragmentation of Access Units . . . . . . . . . . . . . . 5 63 2.5. Interleaving . . . . . . . . . . . . . . . . . . . . . . . 5 64 2.6. Time stamp information . . . . . . . . . . . . . . . . . . 6 65 2.7. State indication of MPEG-4 system streams . . . . . . . . 6 66 2.8. Random Access Indication . . . . . . . . . . . . . . . . . 6 67 2.9. Carriage of auxiliary information . . . . . . . . . . . . 7 68 2.10. 
MIME format parameters and configuring conditional field . 7 69 2.11. Global structure of payload format . . . . . . . . . . . . 7 70 2.12. Modes to transport MPEG-4 streams . . . . . . . . . . . . 8 71 2.13. Alignment with RFC 3016 . . . . . . . . . . . . . . . . . 8 72 3. Payload format . . . . . . . . . . . . . . . . . . . . . . . 9 73 3.1. Usage of RTP header fields and RTCP . . . . . . . . . . . 9 74 3.2. RTP payload structure . . . . . . . . . . . . . . . . . . 10 75 3.2.1. The AU Header Section . . . . . . . . . . . . . . . . . 10 76 3.2.1.1. The AU-header . . . . . . . . . . . . . . . . . . . . 10 77 3.2.2. The Auxiliary Section . . . . . . . . . . . . . . . . . 12 78 3.2.3. The Access Unit Data Section . . . . . . . . . . . . . . 13 79 3.2.3.1. Fragmentation . . . . . . . . . . . . . . . . . . . . 14 80 3.2.3.2. Interleaving . . . . . . . . . . . . . . . . . . . . . 14 81 3.2.3.3. Constraints for interleaving . . . . . . . . . . . . . 15 82 3.2.3.4. Crucial and non-crucial AUs with MPEG-4 System data . 16 83 3.3. Usage of this specification . . . . . . . . . . . . . . . 17 84 3.3.1. General . . . . . . . . . . . . . . . . . . . . . . . . 17 85 3.3.2. The generic mode . . . . . . . . . . . . . . . . . . . . 17 86 3.3.3. Constant bit rate CELP . . . . . . . . . . . . . . . . . 18 87 3.3.4. Variable bit rate CELP . . . . . . . . . . . . . . . . . 18 88 3.3.5. Low bit rate AAC . . . . . . . . . . . . . . . . . . . . 19 89 3.3.6. High bit rate AAC . . . . . . . . . . . . . . . . . . . 20 90 3.3.7. Additional modes . . . . . . . . . . . . . . . . . . . . 21 91 4. IANA considerations . . . . . . . . . . . . . . . . . . . . 22 92 4.1. MIME type registration . . . . . . . . . . . . . . . . . . 22 93 4.2. Registration of mode definitions with IANA . . . . . . . . 27 94 4.3. Concatenation of parameters . . . . . . . . . . . . . . . 27 95 4.4. Usage of SDP . . . . . . . . . . . . . . . . . . . . . . . 28 96 4.4.1. The a=fmtp keyword . . . . . . . . . . . . . . . . . . 
. 28 97 5. Security considerations . . . . . . . . . . . . . . . . . . 28 98 6. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . 29 99 7. References . . . . . . . . . . . . . . . . . . . . . . . . . 29 100 8. Author addresses . . . . . . . . . . . . . . . . . . . . . . 30 101 APPENDIX: Usage of this payload format . . . . . . . . . . . 31 102 A. Examples of delay analysis with interleave . . . . . . . 31 103 A.1 Group interleave . . . . . . . . . . . . . . . . . . . . 31 104 A.2 Continuous interleave . . . . . . . . . . . . . . . . . 32 106 1. Introduction 108 The MPEG Committee is Working Group 11 (WG11) in ISO/IEC JTC1 SC29 109 that specified the MPEG-1, MPEG-2 and, more recently, the MPEG-4 110 standards [1]. The MPEG-4 standard specifies compression of 111 audio-visual data into for example an audio or video elementary 112 stream. In the MPEG-4 standard, these streams take the form of 113 audiovisual objects that may be arranged into an audio-visual scene 114 by means of a scene description. Each MPEG-4 elementary stream 115 consists of a sequence of Access Units; examples of an Access Unit 116 (AU) are an audio frame and a video picture. 118 This specification defines a general and configurable payload 119 structure to transport MPEG-4 elementary streams, in particular 120 MPEG-4 audio (including speech) streams, MPEG-4 video streams and 121 also MPEG-4 systems streams, such as BIFS (BInary Format for 122 Scenes), OCI (Object Content Information), OD (Object Descriptor) 123 and IPMP (Intellectual Property Management and Protection) streams. 124 The RTP payload defined in this document is simple to implement and 125 reasonably efficient. It allows for optional interleaving of Access 126 Units (such as audio frames) to increase error resiliency in packet 127 loss. 
129 Though the RTP payload format defined in this document is capable 130 of transporting any MPEG-4 stream, other, more specific, formats 131 may exist, such as RFC 3016 for transport of MPEG-4 video (part 2). 133 Configuration of the payload is provided to accommodate transport 134 of any MPEG-4 stream at any possible bit rate. However, for a 135 specific MPEG-4 elementary stream typically only very few 136 configurations are needed. So as to allow for the design of 137 simplified, but dedicated receivers, this specification requires 138 that specific modes are defined for transport of MPEG-4 streams. 139 This document defines modes for MPEG-4 CELP and AAC streams, as 140 well as a generic mode that can be used to transport any MPEG-4 141 stream. In the future new RFCs are expected to specify additional 142 modes for transport of MPEG-4 streams. 144 The RTP payload format defined in this document specifies carriage 145 of system-related information that is often equivalent to the 146 information that may be contained in the MPEG-4 SL. This 147 document does not prescribe how to transcode or map information 148 from the SL to fields defined in the RTP payload format. Such 149 processing, if any, is left to the discretion of the application. 150 However, to anticipate the need for transport of any additional 151 system-related information in future, an auxiliary field can be 152 configured that may carry any such data. 154 The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", 155 "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in 156 this document are to be interpreted as described in RFC 2119 [3]. 158 2. Carriage of MPEG-4 elementary streams over RTP 160 2.1 Introduction 162 With this payload format a single MPEG-4 elementary stream can be 163 transported. Information on the type of MPEG-4 stream carried in 164 the payload is conveyed by MIME format parameters, for example in 165 an SDP [6] message or by other means (see section 4). 
These MIME 166 format parameters specify the configuration of the payload. To 167 allow for simplified and dedicated receivers, a MIME format 168 parameter is available to signal a specific mode of using this 169 payload. A mode definition MAY include the type of MPEG-4 170 elementary stream as well as the applied configuration, so as to 171 avoid the need for receivers to parse all MIME format parameters. 172 The applied mode MUST be signaled. 174 2.2 MPEG Access Units 176 For carriage of compressed audio-visual data MPEG defines Access 177 Units. An MPEG Access Unit (AU) is the smallest data entity to 178 which timing information is attributed. In case of audio an Access 179 Unit may represent an audio frame and in case of video a picture. 180 MPEG Access Units are by definition octet aligned. If for example 181 an audio frame is not octet aligned, up to 7 zero-padding bits MUST 182 be inserted at the end of the frame to achieve the octet-aligned 183 Access Units, as required by the MPEG-4 specification. MPEG-4 184 decoders MUST be able to decode AUs in which such padding is 185 applied. 187 Consistent with the MPEG-4 specification, this document requires 188 that each MPEG-4 part 2 video Access Unit includes all the coded 189 data of a picture, any video stream headers that may precede the 190 coded picture data, and any video stream stuffing that may follow 191 it, up to, but not including the startcode indicating the start of 192 a new video stream or the next Access Unit. 194 2.3 Concatenation of Access Units 196 Frequently it is possible to carry multiple Access Units in one RTP 197 packet. This is particularly useful for audio; for example, when 198 AAC is used for encoding of a stereo signal at 64 kbits/sec, AAC 199 frames contain on average approximately 200 octets. On a LAN with a 200 1500 octet MTU this would allow on average 7 complete AAC frames to 201 be carried per RTP packet. 
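As a rough check of this estimate, the following Python sketch counts how many complete AUs of a constant size fit within a given MTU. The 40 octets of IPv4, UDP and RTP header overhead, the 16-bit AU-header and the function names are assumptions made for illustration only. The accounting for the AU Header Section (a two-octet AU-headers-length field followed by bit-wise concatenated AU-headers, padded to an octet boundary) follows section 3.2.1.

```python
def au_header_section_octets(n_aus, au_header_bits):
    """Octets used by the AU Header Section for n_aus AU-headers.

    An empty AU-header (0 bits) means the whole section is absent.
    Otherwise: 2 octets of AU-headers-length plus the concatenated
    AU-headers, rounded up to a whole number of octets.
    """
    if au_header_bits == 0:
        return 0
    return 2 + (n_aus * au_header_bits + 7) // 8


def max_aus_per_packet(mtu, au_size, au_header_bits=16, lower_overhead=40):
    """Largest number of complete AUs of au_size octets that fit in one packet.

    lower_overhead is an assumed 20 (IPv4) + 8 (UDP) + 12 (RTP) octets.
    au_header_bits=16 is an assumed per-AU header length.
    """
    n = 0
    while True:
        candidate = n + 1
        total = (lower_overhead
                 + au_header_section_octets(candidate, au_header_bits)
                 + candidate * au_size)
        if total > mtu:
            return n
        n = candidate
```

Under these assumptions, max_aus_per_packet(1500, 200) evaluates to 7, which agrees with the figure quoted in the text. The result is the same when the AU-header is configured empty.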
203 Access Units may have a fixed size in octets, but a variable size 204 is also possible. To facilitate parsing in case of multiple 205 concatenated AUs in one RTP packet, the size of each AU is made 206 known to the receiver. When concatenating in case of a constant AU 207 size, this size is communicated "out of band" through a MIME format 208 parameter. When concatenating in case of variable size AUs, the RTP 209 payload carries "in band" an AU size field for each contained AU. 210 In combination with the RTP payload length the size information 211 allows the RTP payload to be split by the receiver back into the 212 individual AUs. 214 To simplify the implementation of RTP receivers, it is required 215 that when multiple AUs are carried in an RTP packet, each AU MUST 216 be complete, i.e. the number of AUs in an RTP packet MUST be 217 integral. 219 2.4 Fragmentation of Access Units 221 MPEG allows for very large Access Units. Since most IP networks 222 have significantly smaller MTU sizes, this payload format allows 223 for the fragmentation of an Access Unit over multiple RTP packets 224 so as to avoid IP layer fragmentation. To simplify the 225 implementation of RTP receivers, an RTP packet SHALL either carry 226 one or more complete Access Units or a single fragment of one 227 Access Unit (i.e. packets MUST NOT contain fragments of multiple 228 Access Units). 230 2.5 Interleaving 232 When an RTP packet carries a contiguous sequence of Access Units, 233 the loss of such a packet can result in a "decoding gap" for the 234 user. One method to alleviate this problem is to allow for the 235 Access Units to be interleaved in the RTP packets. For a modest 236 cost in latency and implementation complexity, significant error 237 resiliency to packet loss can be achieved. 239 To support optional interleaving of Access Units, this payload 240 format allows for index information to be sent for each Access Unit. 
241 The RTP sender is free to choose the interleaving pattern without 242 propagating this information to the receiver(s). Indeed the sender 243 could dynamically adjust the interleaving pattern based on the 244 Access Unit size, error rates, etc. The RTP receiver does not need 245 to know the interleaving pattern used; it only needs to extract the 246 index information of the Access Unit and insert the Access Unit 247 into the appropriate sequence in the rendering queue. An example of 248 interleaving is given below. 250 Assume that an RTP packet contains 3 AUs, and that the AUs are 251 numbered 1, 2, 3, 4, etc. If an interleaving group length of 9 is 252 chosen, then RTP packet(i) contains the following AU(n): 254 RTP packet(1): AU(1), AU(4), AU(7) 255 RTP packet(2): AU(2), AU(5), AU(8) 256 RTP packet(3): AU(3), AU(6), AU(9) 257 RTP packet(4): AU(10), AU(13), AU(16) 258 RTP packet(5): AU(11), AU(14), AU(17) 259 Etc. 261 2.6 Time stamp information 263 The RTP time stamp MUST carry the sampling instance of the first AU 264 (fragment) in the RTP packet. When multiple AUs are carried within 265 an RTP packet, the time stamps of subsequent AUs can be calculated 266 if the frame period of each AU is known. For audio and video this 267 is possible if the frame rate is constant. However, in some cases 268 it is not possible to make such a calculation, for example for 269 variable frame rate video and for MPEG-4 BIFS streams carrying 270 composition information. To support such cases, this payload format 271 can be configured to carry a time stamp in the RTP payload for each 272 contained Access Unit. A time stamp MAY be conveyed in the RTP 273 payload only for non-first AUs in the RTP packet, and SHALL NOT be 274 conveyed for the first AU (fragment), as the time stamp for the 275 first AU in the RTP packet is carried by the RTP time stamp. 277 MPEG-4 defines two types of time stamps, the composition time stamp 278 (CTS) and the decoding time stamp (DTS). 
The CTS represents the 279 sampling instance of an AU, and hence the CTS is equivalent to the 280 RTP time stamp. The DTS may be used only in MPEG-4 video streams 281 that use bi-directional coding, i.e. when pictures are predicted in 282 both forward and backward direction by using either a reference 283 picture in the past, or a reference picture in the future. The DTS 284 cannot be carried in the RTP header. In some cases the DTS can be 285 derived from the RTP time stamp using frame rate information; this 286 requires deep parsing in the video stream, which may be considered 287 objectionable. But if the video frame rate is variable, the required 288 information may not even be present in the video stream. For both 289 reasons, the capability has been defined to optionally carry the 290 DTS in the RTP payload for each contained Access Unit. 292 Since RTP time stamps may be re-stamped by RTP devices, each time 293 stamp contained in the RTP payload is coded differentially, the CTS 294 from the RTP time stamp, and the DTS from the CTS, so as to avoid 295 extensive parsing by re-stamping devices. 297 2.7 State indication of MPEG-4 system streams 299 ISO/IEC 14496-1 defines states for MPEG-4 system streams. So as to 300 convey state information when transporting MPEG-4 system streams, 301 this payload format allows for the optional carriage in the RTP 302 payload of the stream state for each contained Access Unit. Stream 303 states are used to signal "crucial" AUs that carry information whose 304 loss cannot be tolerated and are also useful when repeating AUs 305 according to the carousel mechanism defined in ISO/IEC 14496-1. 307 2.8 Random access indication 309 Random access to the content of MPEG-4 elementary streams may be 310 possible at some but not all Access Units. To signal Access Units 311 where random access is possible, a random access point flag can 312 optionally be carried in the RTP payload for each contained Access 313 Unit. 
Carriage of random access points is particularly useful for 314 MPEG-4 system streams in combination with the stream state. 316 2.9 Carriage of auxiliary information. 318 This payload format defines a specific field to carry auxiliary 319 data. The auxiliary data field is preceded by a field that specifies 320 the length of the auxiliary data, so as to facilitate skipping of 321 the data without parsing it. The coding of the auxiliary data is not 322 defined in this document, but is left to the discretion of 323 applications. Receivers that have knowledge of the auxiliary data 324 MAY decode the auxiliary data, but receivers without knowledge of 325 such data MUST skip the auxiliary data field. 327 2.10 MIME format parameters and configuring conditional fields 329 To support the features described in the previous sections several 330 fields are defined for carriage in the RTP payload. However, their 331 use strongly depends on the type of MPEG-4 elementary stream that 332 is carried. Sometimes a specific field is needed with a certain 333 length, while in other cases such field is not needed at all. To be 334 efficient in either case, the fields to support these features are 335 configurable by means of MIME format parameters. In general, a MIME 336 format parameter defines the presence and length of the associated 337 field. A length of zero indicates absence of the field. As a 338 consequence, parsing of the payload requires knowledge of MIME 339 format parameters. The MIME format parameters are conveyed to the 340 receiver via SDP [6] messages or through other means. 342 2.11 Global structure of payload format 344 The RTP payload following the RTP header, contains three octet 345 aligned data sections, of which the first two MAY be empty. See 346 figure 1. 
348 +---------+-----------+-----------+---------------+ 349 | RTP | AU Header | Auxiliary | Access Unit | 350 | Header | Section | Section | Data Section | 351 +---------+-----------+-----------+---------------+ 353 <----------RTP Packet Payload-----------> 355 Figure 1: Data sections within an RTP packet 357 The first data section is the AU (Access Unit) Header Section, that 358 contains one or more AU-headers; however, each AU-header MAY be 359 empty, in which case the entire AU Header Section is empty. The 360 second section is the Auxiliary Section, containing auxiliary data; 361 this section MAY also be configured empty. The third section is the 362 Access Unit Data Section, containing either a single fragment of 363 one Access Unit or one or more complete Access Units. The Access 364 Unit Data Section MUST NOT be empty. 366 2.12 Modes to transport MPEG-4 streams 368 While it is possible to build fully configurable receivers capable 369 of receiving any MPEG-4 stream, this specification also allows for 370 the design of simplified, but dedicated receivers, that are capable 371 for example of receiving only one type of MPEG-4 stream. This 372 is achieved by requiring that specific modes be defined for using 373 this specification. Each mode may define constraints for transport 374 of one or more type of MPEG-4 streams, for instance on the payload 375 configuration. 377 The applied mode MUST be signaled. Signaling the mode is 378 particularly important for receivers that are only capable of 379 decoding one or more specific modes. Such receivers need to 380 determine whether the applied mode is supported, so as to avoid 381 problems with processing of payloads that are beyond the 382 capabilities of the receiver. 384 In this document several modes are defined for transport of MPEG-4 385 CELP and AAC streams, as well as a generic mode that can be used 386 for any MPEG-4 stream. 
In future, new RFCs are expected to specify 387 additional modes of using this specification. New modes can be 388 defined as deemed appropriate, typically by specifications that are 389 hierarchically higher than this payload format. However, each mode 390 MUST be in full compliance with this specification. 392 2.13 Alignment with RFC 3016 394 This payload can be configured to be nearly identical to the 395 payload format defined in RFC 3016 [5] for the MPEG-4 video 396 configurations recommended in RFC 3016. Hence, receivers that 397 comply with RFC 3016 can decode such RTP payload, providing that 398 additional packets containing video decoder configuration (VO, 399 VOL, VOSH) are inserted in the stream, as required by RFC 3016. 400 Conversely, receivers that comply with the specification in this 401 document SHOULD be able to decode payloads, names and parameters 402 defined for MPEG-4 video in RFC 3016. In this respect it is 403 strongly recommended to implement the ability to ignore "in band" 404 video decoder configuration packets in the RFC 3016 payload. 406 Note the "out of band" availability of the video decoder 407 configuration is optional in RFC 3016. To achieve maximum 408 interoperability with the RTP payload format defined in this 409 document, applications that use RFC 3016 to transport MPEG-4 video 410 (part 2) are RECOMMENDED to make the video decoder configuration 411 available as a MIME parameter. 413 3. Payload Format 415 3.1 Usage of RTP Header Fields and RTCP 417 Payload Type (PT): The assignment of an RTP payload type for this 418 RTP packet format is outside the scope of this document, and will 419 not be specified here. It is expected that the RTP profile for a 420 particular class of applications will assign a payload type for 421 this encoding, or if that is not done, then a payload type in the 422 dynamic range shall be chosen. 
424 Marker (M) bit: The M bit is set to 1 to indicate that the RTP 425 packet payload includes the end of each Access Unit whose data 426 is contained in this RTP packet. As the payload either carries one 427 or more complete Access Units or a single fragment of an Access 428 Unit, the M bit is usually set to 1, except when the packet carries 429 a single fragment of an Access Unit that is not the last one. 431 Extension (X) bit: Defined by the RTP profile used. 433 Sequence Number: The RTP sequence number SHOULD be generated by the 434 sender in the usual manner with a constant random offset. 436 Timestamp: Indicates the sampling instance of the first AU 437 contained in the RTP payload. This sampling instance is equivalent 438 to the CTS in the MPEG-4 time domain. When using SDP the clock rate 439 of the RTP time stamp MUST be expressed using the "rtpmap" 440 attribute. If an MPEG-4 audio stream is transported, the rate SHOULD 441 be set to the same value as the sampling rate of the audio stream. 442 If an MPEG-4 video stream is transported, it is RECOMMENDED to set 443 the rate to 90 kHz. 445 In all cases, the sender SHALL make sure that RTP time stamps 446 are identical only if they refer to fragments of the 447 same Access Unit. 449 According to RFC 1889 [2] (section 5.1), RTP time stamps are 450 RECOMMENDED to start at a random value for security reasons. This 451 is not an issue for synchronization of multiple RTP streams. When, 452 however, streams from multiple sources are to be synchronized (for 453 example one stream from local storage, another from an RTP streaming 454 server), synchronization may become impossible if the receiver only 455 knows the original time stamp relationships. In such 456 cases, synchronization may require that the correct relationship between time 457 stamps be provided by out-of-band means. 
The 458 format of such information as well as methods to convey such 459 information are beyond the scope of this specification. 461 SSRC: set as described in RFC 1889 [2]. 463 CC and CSRC fields are used as described in RFC 1889 [2]. 465 RTCP SHOULD be used as defined in RFC 1889 [2]. 467 3.2 RTP Payload Structure 469 3.2.1 The AU Header Section 471 When present, the AU Header Section consists of the AU-headers-length 472 field, followed by a number of AU-headers. See figure 2. 474 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+- .. -+-+-+-+-+-+-+-+-+-+ 475 |AU-headers-length|AU-header|AU-header| |AU-header|padding| 476 | | (1) | (2) | | (n) | bits | 477 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+- .. -+-+-+-+-+-+-+-+-+-+ 479 Figure 2: The AU Header Section 481 The AU-headers are configured using MIME format parameters and MAY 482 be empty. If the AU-header is configured empty, the 483 AU-headers-length field SHALL NOT be present and consequently the 484 AU Header Section is empty. If the AU-header is not configured 485 empty, then the AU-headers-length is a two octet field that 486 specifies the length in bits of the immediately following 487 AU-headers, excluding the padding bits. 489 Each AU-header is associated with a single Access Unit (fragment) 490 contained in the Access Unit Data Section in the same RTP packet. 491 For each contained Access Unit (fragment) there is exactly one 492 AU-header. Within the AU Header Section, the AU-headers are 493 bit-wise concatenated in the order in which the Access Units are 494 contained in the Access Unit Data Section. Hence, the n-th 495 AU-header refers to the n-th AU (fragment). If the concatenated 496 AU-headers consume a non-integer number of octets, up to 7 497 zero-padding bits MUST be inserted at the end in order to achieve 498 octet-alignment of the AU Header Section. 500 3.2.1.1 The AU-header 502 Each AU-header may contain the fields given in figure 3. 
The length 503 in bits of the above fields with the exception of the CTS-flag, the 504 DTS-flag and the RAP-flag fields is defined by MIME format 505 parameters; see section 4.1. If a MIME format parameter has the 506 default value of zero, then the associated field is not present. 508 If present, the fields MUST occur in the mutual order given in 509 figure 3. In the general case a receiver can only discover the size 510 of an AU-header by parsing it since the presence of the CTS-delta 511 and DTS-delta fields is signaled by the value of the CTS-flag and 512 DTS-flag, respectively. 514 +---------------------------------------+ 515 | AU-size | 516 +---------------------------------------+ 517 | AU-Index / AU-Index-delta | 518 +---------------------------------------+ 519 | CTS-flag | 520 +---------------------------------------+ 521 | CTS-delta | 522 +---------------------------------------+ 523 | DTS-flag | 524 +---------------------------------------+ 525 | DTS-delta | 526 +---------------------------------------+ 527 | RAP-flag | 528 +---------------------------------------+ 529 | Stream-state | 530 +---------------------------------------+ 532 Figure 3: The fields in the AU-header. If used, the AU-Index field 533 only occurs in the first AU-header within an AU Header 534 Section; in any other AU-header the AU-Index-delta field 535 occurs instead. 537 AU-size: Indicates the size in octets of the associated Access Unit 538 in the Access Unit Data Section in the same RTP packet. When 539 the AU-size is associated with an AU fragment, the AU size 540 indicates the size of the entire AU and not the size of the 541 fragment. This can be exploited to determine whether a packet 542 contains an entire AU or a fragment, which is particularly 543 useful after losing a packet carrying the last fragment of an 544 AU. 546 AU-Index: Indicates the serial number of the associated Access Unit 547 (fragment). 
For each (in decoding order) consecutive AU or AU 548 fragment, the serial number is incremented with 1. When 549 present, the AU-Index field occurs in the first AU-header in 550 the AU Header Section, but MUST NOT occur in any subsequent 551 (non-first) AU-header in that Section. To encode the serial 552 number in any such non-first AU-header, the AU-Index-delta 553 field is used. If each AU-Index field is coded with the value 554 0, the serial number of the AU (fragment) is not specified, 555 and in that case receivers MAY ignore the AU-Index field. 557 AU-Index-delta: The AU-Index-delta field is an unsigned integer 558 that specifies the serial number of the associated AU as the 559 difference with respect to the serial number of the previous 560 Access Unit. Hence, for the n-th (n>1) AU the serial number 561 is found from: 562 AU-Index(n) = AU-Index(n-1) + AU-Index-delta(n) + 1 563 If the AU-Index field is present in the first AU-header in 564 the AU Header Section, then the AU-Index-delta field MUST be 565 present in any subsequent (non-first) AU-header. When the 566 AU-Index-delta is coded with the value 0, it indicates that 567 the Access Units are consecutive in decoding order. An 568 AU-Index-delta value larger than 0 signals that interleaving 569 is applied. 571 CTS-flag: Indicates whether the CTS-delta field is present. 572 A value of 1 indicates that the field is present, a value 573 of 0 that it is not present. 574 The CTS-flag field MUST be present in each AU-header if the 575 length of the CTS-delta field is signaled to be larger than 576 zero. In that case, the CTS-flag field MUST have the value 0 577 in the first AU-header and MAY have the value 1 in all 578 non-first AU-headers. The CTS-flag field SHOULD be 0 for 579 any non-first fragment of an Access Unit. 581 CTS-delta: Encodes the CTS by specifying the value of CTS as a 2's 582 complement offset (delta) from the time stamp in the RTP 583 header of this RTP packet. 
The CTS MUST use the same clock 584 rate as the time stamp in the RTP header. 586 DTS-flag: Indicates whether the DTS-delta field is present. A value 587 of 1 indicates that DTS-delta is present, a value of 0 that 588 it is not present. 589 The DTS-flag field MUST be present in each AU-header if the 590 length of the DTS-delta field is signaled to be larger than 591 zero. The DTS-flag field SHOULD be 0 for any non-first 592 fragment of an Access Unit. 594 DTS-delta: Specifies the value of the DTS as a 2's complement 595 offset (delta) from the CTS. The DTS MUST use the 596 same clock rate as the time stamp in the RTP header. 598 RAP-flag: Indicates, when set to 1, that the associated Access Unit 599 provides a random access point to the content of the stream. 600 If an Access Unit is fragmented, the RAP flag, if present, 601 MUST be set to 0 for each non-first fragment of the AU. 603 Stream-state: Specifies the state of the stream for an AU of an 604 MPEG-4 system stream; each state is identified by a value of 605 a modulo counter. In ISO/IEC 14496-1, MPEG-4 system streams 606 use the AU_SequenceNumber to signal stream states. When the 607 stream state changes, the value of stream-state MUST be 608 incremented by one. 610 Note: no relation is required between stream-states of 611 different streams. 613 3.2.2 The Auxiliary Section 615 The Auxiliary Section consists of the auxiliary-data-size field 616 followed by the auxiliary-data field. Receivers MAY (but are not 617 required to) parse the auxiliary-data field; to facilitate skipping 618 of the auxiliary-data field by receivers, the auxiliary-data-size 619 field indicates the length in bits of the auxiliary-data. If the 620 concatenation of the auxiliary-data-size and the auxiliary-data 621 fields consumes a non-integer number of octets, up to 7 zero padding 622 bits MUST be inserted immediately after the auxiliary data in order 623 to achieve octet-alignment. See figure 4.
625 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+- .. -+-+-+-+-+-+-+-+-+ 626 | auxiliary-data-size | auxiliary-data |padding bits | 627 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+- .. -+-+-+-+-+-+-+-+-+ 629 Figure 4: The fields in the Auxiliary Section 631 The length in bits of the auxiliary-data-size field is configurable 632 by a MIME format parameter; see section 4.1. The default length of 633 zero indicates that the entire Auxiliary Section is absent. 635 auxiliary-data-size: specifies the length in bits of the immediately 636 following auxiliary-data field; 638 auxiliary-data: the auxiliary-data field contains data of a format 639 not defined by this specification. 641 3.2.3 The Access Unit Data Section 643 The Access Unit Data Section contains an integer number of complete 644 Access Units or a single fragment of one AU. The Access Unit Data 645 Section is never empty. If data of more than one Access Unit is 646 present, then the AUs are concatenated into a contiguous string 647 of octets. See figure 5. The AUs inside the Access Unit Data 648 Section MUST be in decoding order. 650 The size and number of Access Units SHOULD be adjusted such that 651 the resulting RTP packet is not larger than the path MTU. To handle 652 larger packets, this payload format relies on lower layers for 653 fragmentation, which may not be desirable. 655 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 656 |AU(1) | 657 + | 658 | | 659 | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 660 | |AU(2) | 661 +-+-+-+-+-+-+-+-+ | 662 | | 663 | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 664 | | AU(n) | 665 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 666 |AU(n) continued| 667 |-+-+-+-+-+-+-+-+ 669 Figure 5: Access Unit Data Section; each AU is octet aligned. 671 When multiple Access Units are carried, the size of each AU MUST be 672 made available to the receiver. 
If the AU size is variable then the 673 size of each AU MUST be indicated in the AU-size field of the 674 corresponding AU-header. However, if the AU size is constant for a 675 stream, this mechanism SHOULD NOT be used, but instead the fixed 676 size SHOULD be signaled by the MIME format parameter 677 "ConstantSize", see section 4.1. 679 The absence of both AU-size in the AU-header and the ConstantSize 680 MIME format parameter indicates carriage of a single AU (fragment), 681 i.e. that a single Access Unit (fragment) is transported in each 682 RTP packet for that stream. 684 3.2.3.1 Fragmentation 686 A packet SHALL carry either one or more Access Units, or a single 687 fragment of an Access Unit. Fragments of the same Access Unit have 688 the same time stamp but different RTP sequence numbers. The marker 689 bit in the RTP header is 1 on the last fragment of an Access Unit, 690 and 0 on all other fragments. 692 3.2.3.2 Interleaving 694 Access Units MAY be interleaved. Senders MAY perform interleaving. 695 Receivers MUST support interleaving. When interleaving of Access 696 Units is used it SHALL be implemented using the AU-Index and 697 AU-Index-delta fields in the AU-header. 699 Based on the RTP sequence number, the RTP time stamp, the AU-Index 700 and the AU-Index-delta, a receiver can unambiguously reconstruct 701 the original order even in case of out-of-order packets, packet 702 loss or duplication. Note that for this purpose the AU-Index is 703 redundant when the RTP time stamp and the AU-Index-delta values are 704 sufficient for placing the AUs correctly in time. In such cases 705 receivers MAY ignore the AU-Index value and senders MAY code the 706 AU-Index field with the value 0, but only if they code each AU-Index 707 field with that value. 709 When interleaving is applied, a de-interleave buffer is needed in 710 receivers to put the Access Units in their correct logical 711 consecutive decoding order. 
This requires the computation of the 712 time stamp for each Access Unit. In case of a fixed time duration 713 per Access Unit, the time stamp of the i-th access unit in an RTP 714 packet with RTP time stamp T is calculated as follows: 716 Timestamp[0] = T 717 Timestamp[i, i > 0] = T +(Sum(for k=1 to i of (AU-Index-delta[k] 718 + 1))) * access-unit-duration 720 When AU-Index-delta is always 0, this reduces to T + i * (access- 721 unit-duration). This is the non-interleaved case, where the frames 722 are consecutive in decoding order. Note that the AU-Index field 723 (present for the first Access Unit) is not needed in this 724 calculation. Hence in cases where the access-unit-duration has a 725 fixed and known value, the AU-Index does not need to provide index 726 information and can be coded with the value 0. See also the 727 semantics of the AU-Index field in 3.2.1.1. 729 If the Access Units are not of fixed duration, the AU-Index is not 730 redundant, and MUST provide the index information required for 731 re-ordering. The number of bits of the AU-Index field MUST be chosen 732 so that valid index information is provided for the applied 733 interleaving scheme, without causing problems due to roll-over of 734 the AU-Index field. Note that the CTS-delta may be required to 735 compute the correct time stamp for each AU. 737 When an RTP packet arrives (after any reordering has been done), 738 receivers may 'flush' all Access Units from the interleave buffer 739 if the time stamp of each Access Unit in the interleave buffer is 740 strictly less than the time stamp of the arriving packet. Access 741 Units should also be flushed in time to be played; this can be 742 important if there is loss before end-of-stream, before a silence 743 interval, or before a large drop-out.
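The time stamp calculation for fixed-duration Access Units given in section 3.2.3.2 can be sketched in Python as follows. This is an illustration only, not part of the payload format: the function name au_timestamps and its argument names are hypothetical, the duration is assumed to be expressed in RTP clock ticks, and the result is reduced modulo 2^32 because the RTP time stamp is a 32-bit field.

```python
def au_timestamps(rtp_timestamp, index_deltas, au_duration):
    """Return the time stamp of each AU in one RTP packet.

    rtp_timestamp: T, the RTP time stamp (it applies to the first AU).
    index_deltas:  AU-Index-delta values of the second and later
                   AU-headers, in packet order.
    au_duration:   fixed duration of one AU, in RTP clock ticks.
    """
    timestamps = [rtp_timestamp]
    offset = 0  # running Sum of (AU-Index-delta[k] + 1)
    for delta in index_deltas:
        offset += delta + 1
        # RTP time stamps are 32 bits wide, so wrap around at 2**32.
        timestamps.append((rtp_timestamp + offset * au_duration) % 2**32)
    return timestamps


# Non-interleaved case: every AU-Index-delta is 0, giving T + i * duration.
print(au_timestamps(1000, [0, 0], 1024))   # [1000, 2024, 3048]
# Interleaved case: each AU is three AU durations after the previous one.
print(au_timestamps(0, [2, 2], 1024))      # [0, 3072, 6144]
```

A receiver would typically use the resulting values as sort keys when placing Access Units in its de-interleave buffer.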
745 3.2.3.3 Constraints for interleaving 747 The size of the packets should be suitably chosen to be appropriate 748 to both the path MTU and the duration and capacity of the receiver's 749 de-interleave buffer. The maximum packet size for a session SHOULD 750 be chosen not to exceed the path MTU. 752 In order to control receiver latency and mitigate the effects of 753 loss, there are profile-based limits on the size of the packet. 754 This is expressed as a duration: it is calculated from the duration 755 of the Access Units contained within a packet. Note that this 756 duration is NOT the difference between the time stamps of the first 757 and last Access Unit in a packet. 759 No matter what interleaving scheme is used, the scheme must be 760 analyzed to calculate the minimum number of frames a receiver has 761 to buffer in order to de-interleave. 763 Three profiles are defined to constrain the latency when 764 interleaving. The applied profile is signaled by the MIME format 765 parameter "Profile", indicating the decimal number of the profile. 766 The maximum de-interleave buffer required at the receiver can be 767 determined if the maximum packet duration is known. The maximum 768 packet duration in milliseconds for the three profiles, SHALL NOT 769 exceed: 771 Profile 0 -- 200 milliseconds 772 Profile 1 -- 500 milliseconds 773 Profile 2 -- 1500 milliseconds 774 When interleaving is applied, the applied profile MUST be signaled 775 by the MIME format parameter "Profile"; see section 4.1. 777 Note that for low bit-rate material, this duration limit may make 778 packets shorter than the MTU size. 780 3.2.3.4. Crucial and non-crucial AUs with MPEG-4 System data 782 Some Access Units with MPEG-4 system data, called "crucial" AUs, 783 carry information whose loss cannot be tolerated, either in the 784 presentation or in the decoder. At each crucial AU in an MPEG-4 785 system stream, the stream state changes. 
The stream-state MAY 786 remain constant at non-crucial AUs. In ISO/IEC 14496-1, MPEG-4 787 system streams use the AU_SequenceNumber to signal stream states. 789 Example: Given three AUs, AU1 = "Insertion of node X", AU2 = "Set 790 position of node X", AU3 = "Set position of node X". AU1 is crucial, 791 since if it is lost, AU2 cannot be executed. However, AU2 is not 792 crucial, since AU3 can be executed even if AU2 is lost. 794 When a crucial AU is (possibly) lost, the stream is corrupted. For 795 example, when an AU is lost and the stream state has changed at the 796 next received AU, then it is possible that the lost AU was crucial. 797 Once corrupted, the stream remains corrupted until the next random 798 access point. Note that loss of non-crucial AUs does not corrupt the 799 stream. When a decoder starts receiving a stream, the decoder MUST 800 consider the stream corrupted until an AU is received that provides 801 a random access point. 803 An AU that provides a random access point, as signaled by the 804 RAP-flag, may be crucial or not. Non-crucial RAP AUs provide a 805 "repeated" random access point for use by decoders that recently 806 joined the stream or that need to re-start decoding after a stream 807 corruption. Non-crucial RAP AUs MUST include all updates since the 808 last crucial RAP AU. 810 Upon receiving AUs, decoders are to react as follows: 811 a) if the RAP-flag is set to 1 and the stream-state changes, then 812 the AU is a crucial RAP AU, and the AU MUST be decoded. 813 b) if the RAP-flag is set to 1 and the stream state does not change, 814 then the AU is a non-crucial RAP AU, and the receiver SHOULD 815 decode it if the stream is corrupted. Otherwise, the decoder MUST 816 ignore the AU. 817 c) if the RAP-flag is set to 0, then the AU MUST be decoded, unless 818 the stream is corrupted, in which case the AU MUST be ignored. 820 3.3 Usage of this specification 822 3.3.1 General 824 Usage of this specification requires definition of a mode. 
A mode 825 defines how to use this specification, as deemed appropriate. 826 Senders MUST signal the applied mode via the MIME format parameter 827 "Mode". This specification defines a generic mode that can be used 828 for any MPEG-4 stream, as well as specific modes for transport of 829 MPEG-4 CELP and MPEG-4 AAC streams, defined in ISO/IEC 14496-3. 831 In any mode compliant with this specification the same requirements 832 apply for the rtpmap attributes. The general form of an rtpmap 833 attribute is: 834 a=rtpmap:<payload type> <encoding name>/<clock rate>[/<encoding parameters>] 836 For audio streams, <encoding parameters> specifies the number of 837 audio channels: 2 for stereo material (see RFC 2327) and 1 for 838 mono. Provided no additional parameters are needed, this parameter 839 may be omitted for mono material, hence its default value is 1. 841 3.3.2 The generic mode 843 The generic mode can be used for any MPEG-4 stream. In this mode 844 no mode-specific constraints are applied; hence, in the generic 845 mode the full flexibility of this specification can be exploited. 846 The generic mode is signaled by mode=generic. 848 An example is given below for transport of a BIFS stream. In this 849 example carriage of multiple BIFS Access Units is allowed in one 850 RTP packet. The AU-header contains the AU-size field, the CTS-flag 851 and, if the CTS flag is set to 1, the CTS-delta field. The number 852 of bits of the AU-size and the CTS-delta fields is 10 and 16, 853 respectively. The AU-header also contains the RAP-flag and the 854 Stream-state of 4 bits. This results in an AU-header with a 855 total size of two or four octets per BIFS AU. The RTP time stamp 856 uses a 1 kHz clock. Note that the media type name is video, 857 because the BIFS stream is part of an audiovisual presentation. For 858 conventions on media type names see section 4.1.
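The figure of two or four octets per AU-header can be checked by adding up the configured field lengths. The Python sketch below is illustrative only: the function au_header_bits and its keyword arguments are hypothetical names that mirror the MIME format parameters of section 4.1, and each flag is counted as one bit.

```python
def au_header_bits(size_length=0, index_length=0, cts_delta_length=0,
                   dts_delta_length=0, random_access_indication=0,
                   stream_state_indication=0, cts_flag=0, dts_flag=0):
    """Return the length in bits of one AU-header (hypothetical helper)."""
    bits = size_length + index_length
    if cts_delta_length > 0:
        # The one-bit CTS-flag is present; CTS-delta follows only if it is 1.
        bits += 1 + (cts_delta_length if cts_flag else 0)
    if dts_delta_length > 0:
        # Same rule for the DTS-flag and the DTS-delta field.
        bits += 1 + (dts_delta_length if dts_flag else 0)
    bits += 1 if random_access_indication else 0   # RAP-flag
    bits += stream_state_indication                # Stream-state
    return bits


# Field lengths of the BIFS example in this section.
bifs = dict(size_length=10, cts_delta_length=16,
            random_access_indication=1, stream_state_indication=4)
print(au_header_bits(**bifs, cts_flag=0) // 8)   # 2 octets without CTS-delta
print(au_header_bits(**bifs, cts_flag=1) // 8)   # 4 octets with CTS-delta
```

Because the CTS-flag can differ per AU-header, a receiver cannot precompute one header size for the whole stream in this configuration; it has to parse each header.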
860 In detail: 862 m=video 49230 RTP/AVP 96 863 a=rtpmap:96 mpeg4-generic/1000 864 a=fmtp:96 streamtype=3; profile-level-id=257; mode=generic; 865 ObjectType=2; config=BIFSConfiguration(); SizeLength=10; 866 CTSDeltaLength=16; RandomAccessIndication=1; 867 StreamStateIndication=4 869 Note: The a=fmtp line has been wrapped to fit the page, it comprises 870 a single line in the SDP file. 871 BIFSConfiguration() is the hexadecimal string as defined in ISO/IEC 872 14496-1; for the description of MIME parameters see section 4.1. 874 3.3.3 Constant bit-rate CELP 876 This mode is signaled by mode=CELP-cbr. In this mode one or more 877 fixed size CELP frames can be transported in one RTP packet; there 878 is no support for interleaving. The RTP payload consists of one or 879 more concatenated CELP frames, each of the same size. Both the AU 880 Header Section and the Auxiliary Section MUST be empty. 882 The MIME format parameter ConstantSize MUST be provided to specify 883 the length of each CELP frame. 885 For example: 887 m=audio 49230 RTP/AVP 96 888 a=rtpmap:96 mpeg4-generic/44100/2 889 a=fmtp:96 streamtype=5; profile-level-id=15; mode=CELP-cbr; config= 890 AudioSpecificConfig(); ConstantSize=xxx; 892 Note: The a=fmtp line has been wrapped to fit the page, it comprises 893 a single line in the SDP file. 895 AudioSpecificConfig() is the hexadecimal string as defined in 896 ISO/IEC 14496-3. AudioSpecificConfig() specifies that the audio 897 stream type is CELP. For the description of MIME parameters see 898 section 4.1. 900 3.3.4 Variable bit-rate CELP 902 This mode is signaled by mode=CELP-vbr. With this mode one or 903 more variable size CELP frames can be transported in one RTP packet 904 with optional interleaving. As the largest possible frame size in 905 this mode is greater than the maximum CELP frame size, there is no 906 support for fragmentation of CELP frames.
908 In this mode the RTP payload consists of the AU Header Section, 909 followed by one or more concatenated CELP frames. The Auxiliary 910 Section MUST be empty. For each CELP frame contained in the payload 911 there MUST be a one octet AU-header in the AU Header Section to 912 provide: 913 (a) the size of each CELP frame in the payload and 914 (b) index information for computing the sequence (and hence timing) 915 of each CELP frame. 916 Transport of CELP frames requires that the AU-size field is coded 917 with 6 bits. In this mode therefore 6 bits are allocated to the 918 AU-size field, and 2 bits to the AU-Index(-delta) field. Each 919 AU-Index field MUST be coded with the value 0. In the AU Header 920 Section, the concatenated AU-headers are preceded by the 16-bit 921 AU-headers-length field, as specified in section 3.2.1. 923 In addition to the required MIME format parameters, the following 924 parameters MUST be present: SizeLength, IndexLength, and 925 IndexDeltaLength. 926 When interleaving is applied (AU-Index-delta coded with a value 927 larger than 0), the parameter Profile MUST also be present. 929 For example: 931 m=audio 49230 RTP/AVP 96 932 a=rtpmap:96 mpeg4-generic/44100/2 933 a=fmtp:96 streamtype=5; profile-level-id=15; mode=CELP-vbr; config= 934 AudioSpecificConfig(); SizeLength=6; IndexLength=2; 935 IndexDeltaLength=2; Profile=1 937 Note: The a=fmtp line has been wrapped to fit the page, it comprises 938 a single line in the SDP file. 940 AudioSpecificConfig() is the hexadecimal string as defined in 941 ISO/IEC 14496-3. AudioSpecificConfig() specifies that the audio 942 stream type is CELP. For the description of MIME parameters see 943 section 4.1. 945 3.3.5 Low bit-rate AAC 947 This mode is signaled by mode=AAC-lbr. This mode supports transport 948 of one or more variable size AAC frames with optional support for 949 interleaving and fragmenting. The maximum size of an AAC frame 950 (fragment) in this mode is 63 octets.
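The 63-octet limit is the largest value that fits in a 6-bit AU-size field (2^6 - 1), which this mode shares with the variable bit-rate CELP mode of section 3.3.4. As a rough sender-side sketch, the Python function below assembles a payload from complete frames using one-octet AU-headers. It is not a normative procedure: the name build_lbr_payload is hypothetical, fragments are not handled, and it assumes that fields are packed most significant bit first and that the AU-headers-length field counts bits, as specified in section 3.2.1.

```python
import struct

SIZE_BITS = 6    # SizeLength=6
INDEX_BITS = 2   # IndexLength=2 and IndexDeltaLength=2


def build_lbr_payload(frames, index_deltas=None):
    """Build an RTP payload with one-octet AU-headers (hypothetical helper).

    frames:       list of complete frames (bytes), in packet order.
    index_deltas: AU-Index-delta for the second and later frames;
                  all zero (no interleaving) when omitted.
    """
    if not frames:
        raise ValueError("the Access Unit Data Section is never empty")
    if index_deltas is None:
        index_deltas = [0] * (len(frames) - 1)
    if len(index_deltas) != len(frames) - 1:
        raise ValueError("need one AU-Index-delta per non-first frame")
    # The first AU-header carries AU-Index, coded as 0 in this mode;
    # every later AU-header carries AU-Index-delta instead.
    index_fields = [0] + list(index_deltas)
    headers = bytearray()
    for frame, index in zip(frames, index_fields):
        if len(frame) >= 1 << SIZE_BITS:
            raise ValueError("frame larger than 63 octets")
        if not 0 <= index < 1 << INDEX_BITS:
            raise ValueError("index value does not fit in 2 bits")
        headers.append((len(frame) << INDEX_BITS) | index)
    # 16-bit AU-headers-length in network byte order, counted in bits.
    return struct.pack("!H", 8 * len(headers)) + bytes(headers) + b"".join(frames)


print(build_lbr_payload([b"abc", b"de"]).hex())   # 00100c086162636465
```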
952 The payload configuration in this mode is the same as in the 953 variable bit-rate CELP mode as defined in 3.3.4. The RTP payload 954 consists of the AU Header Section, followed by concatenated AAC 955 frames. The Auxiliary Section MUST be empty. For each AAC frame 956 contained in the payload the one octet AU-header MUST provide: 957 (a) the size of each AAC frame in the payload and 958 (b) index information for computing the sequence (and hence timing) 959 of each AAC frame. 960 In the AU-header, the AU-size MUST be coded with 6 bits and the 961 AU-Index(-delta) with 2 bits; the AU-Index field MUST have the 962 value 0 in each AU-header. 963 In the AU-header Section, the concatenated AU-headers MUST be 964 preceded by the 16-bit AU-headers-length field, as specified in 965 section 3.2.1. 967 In addition to the required MIME format parameters, the following 968 parameters MUST be present: SizeLength, IndexLength, and 969 IndexDeltaLength. 970 When interleaving is applied (AU-Index-delta coded with a value 971 larger than 0), also the parameter Profile MUST be present. 973 For example: 975 m=audio 49230 RTP/AVP 96 976 a=rtpmap:96 mpeg4-generic/44100/2 977 a=fmtp:96 streamtype=5; profile-level-id=15; mode=AAC-lbr; config= 978 AudioSpecificConfig(); SizeLength=6; IndexLength=2; 979 IndexDeltaLength=2; Profile=1 981 Note: The a=fmtp line has been wrapped to fit the page, it comprises 982 a single line in the SDP file. 984 AudioSpecificConfig() is the hexadecimal string as defined in ISO/IEC 985 14496-3. AudioSpecificConfig() specifies that the audio 986 stream type is AAC. For the description of MIME parameters see 987 section 4.1. 989 3.3.6 High bit-rate AAC 991 This mode is signaled by mode=AAC-hbr. This mode supports transport 992 of one or more large variable size AAC frames in one RTP packet with 993 optional support for interleaving and fragmenting. The maximum size 994 of an AAC frame (fragment) in this mode is 8191 octets. 
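The 8191-octet limit is the largest value that fits in the 13-bit AU-size field used by this mode (2^13 - 1). The Python sketch below shows how a receiver might split such a payload, given the 13-bit AU-size, the 3-bit AU-Index(-delta) and the 16-bit AU-headers-length field of this mode. It is illustrative only: the name parse_aac_hbr_payload is hypothetical, and the code assumes that fields are packed most significant bit first and that AU-headers-length counts bits, as specified in section 3.2.1.

```python
import struct


def parse_aac_hbr_payload(payload):
    """Split a payload into (au_size, index_or_delta, data) tuples.

    Hypothetical helper. For a fragment, au_size is the size of the
    entire AU, so the returned data can be shorter than au_size.
    """
    (headers_length_bits,) = struct.unpack_from("!H", payload, 0)
    if headers_length_bits == 0 or headers_length_bits % 16:
        raise ValueError("expected a whole number of 2-octet AU-headers")
    count = headers_length_bits // 16
    offset = 2 + 2 * count          # start of the Access Unit Data Section
    if len(payload) < offset:
        raise ValueError("payload shorter than its AU Header Section")
    access_units = []
    for i in range(count):
        (header,) = struct.unpack_from("!H", payload, 2 + 2 * i)
        au_size = header >> 3       # upper 13 bits: AU-size
        index = header & 0x7        # lower 3 bits: AU-Index(-delta)
        data = payload[offset:offset + au_size]
        access_units.append((au_size, index, data))
        offset += len(data)
    return access_units


# Two AUs of 3 and 2 octets: AU-headers-length is 32 bits.
packet = b"\x00\x20" + b"\x00\x18" + b"\x00\x10" + b"abc" + b"de"
print(parse_aac_hbr_payload(packet))   # [(3, 0, b'abc'), (2, 0, b'de')]
```

Comparing au_size with the length of the returned data is one way to detect that a packet carries a fragment instead of a complete AU.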
996 In this mode the RTP payload consists of the AU Header Section, 997 followed by one or more concatenated AAC frames. The Auxiliary 998 Section MUST be empty. For each AAC frame contained in the payload 999 there MUST be an AU-header in the AU Header Section to provide: 1000 (a) the size of each AAC frame in the payload and 1001 (b) index information for computing the sequence (and hence timing) 1002 of each AAC frame. 1004 To code the maximum size of an AAC frame requires 13 bits. Therefore 1005 in this configuration 13 bits are allocated to the AU-size, and 1006 3 bits to the AU-Index(-delta) field. Thus each AU-header has a size 1007 of 2 octets. Each AU-Index field MUST be coded with the value 0. In 1008 the AU Header Section, the concatenated AU-headers MUST be preceded 1009 by the 16-bit AU-headers-length field, as specified in section 3.2.1. 1011 In addition to the required MIME format parameters, the following 1012 parameters MUST be present: SizeLength, IndexLength, and 1013 IndexDeltaLength. 1014 When interleaving is applied (AU-Index-delta coded with a value 1015 larger than 0), also the parameter Profile MUST be present. 1017 For example: 1019 m=audio 49230 RTP/AVP 96 1020 a=rtpmap:96 mpeg4-generic/44100/2 1021 a=fmtp:96 streamtype=5; profile-level-id=15; mode=AAC-hbr; 1022 config=AudioSpecificConfig(); SizeLength=13; IndexLength=3; 1023 IndexDeltaLength=3; Profile=1 1024 Note: The a=fmtp line has been wrapped to fit the page, it comprises 1025 a single line in the SDP file. 1027 AudioSpecificConfig() is the hexadecimal string as defined in 1028 ISO/IEC 14496-3. AudioSpecificConfig() specifies that the audio 1029 stream type is AAC. For the description of MIME parameters see 1030 section 4.1. 1032 3.3.7 Additional modes 1034 This specification only defines the modes specified in sections 1035 3.3.2 up to 3.3.6. Additional modes are expected to be defined in 1036 future RFCs. 
Each additional mode MUST be in full compliance with 1037 this specification. 1039 When defining a new mode care MUST be taken that an implementation 1040 of all features of this specification can decode the payload format 1041 corresponding to this new mode. For this reason a mode MUST NOT 1042 specify new default values for MIME parameters. In particular, MIME 1043 parameters that configure the RTP payload MUST be present (unless 1044 they have the default value), even if their presence is redundant in 1045 case the mode assigns a fixed value to a parameter. A mode may 1046 define additionally that some MIME parameters are required instead 1047 of optional, that some MIME parameters have fixed values (or 1048 ranges), and that there are rules restricting the usage. 1050 4. IANA considerations 1052 This section describes the MIME types and names associated with 1053 this payload format. Section 4.1 registers the MIME types, as per 1054 RFC 2048. 1056 This format may require additional information about the mapping to 1057 be made available to the receiver. This is done using parameters 1058 also described in the next section. 1060 4.1 MIME type registration 1062 MIME media type name: "video" or "audio" or "application" 1064 "video" MUST be used for MPEG-4 Visual streams (ISO/IEC 14496-2) 1065 or MPEG-4 Systems streams (ISO/IEC 14496-1) that convey information 1066 needed for an audio/visual presentation. 1068 "audio" MUST be used for MPEG-4 Audio streams (ISO/IEC 14496-3) 1069 or MPEG-4 Systems streams that convey information needed for an 1070 audio only presentation. 1072 "application" MUST be used for MPEG-4 Systems streams (ISO/IEC 1073 14496-1) that serve purposes other than audio/visual presentation, 1074 e.g. in some cases when MPEG-J streams are transmitted. 1076 Depending on the required payload configuration, MIME format 1077 parameters need to be available to the receiver. This is done using 1078 the parameters described in the next section.
There are required 1079 and optional parameters. 1081 Optional parameters are of two types: general parameters and 1082 configuration parameters. The configuration parameters are used to 1083 configure the fields in the AU Header section and in the auxiliary 1084 section. The absence of any configuration parameter is equivalent to 1085 the associated field set to its default value, which is always zero. 1086 The absence of all configuration parameters resolves into a default 1087 "basic" configuration with an empty AU-header section and an empty 1088 auxiliary section in each RTP packet. 1090 MIME subtype name: mpeg4-generic 1091 Required parameters: 1093 MIME format parameters are not case dependent; however for clarity 1094 both upper and lower case are used in the names of the parameters 1095 described in this specification. 1097 StreamType: 1098 The integer value that indicates the type of MPEG-4 stream that 1099 is carried; its coding corresponds to the values of the 1100 streamType as defined in Table 9 (objectTypeIndication Values) 1101 in ISO/IEC 14496-1. Note that the StreamType allows signaling of 1102 an MPEG-7 stream; this RTP payload format is not designed to 1103 carry an MPEG-7 stream, and may not be suitable for transport of 1104 MPEG-7 streams. 1106 Profile-level-id: 1107 A decimal representation of the MPEG-4 Profile Level indication. 1108 This parameter MUST be used in the capability exchange or 1109 session set-up procedure to indicate the MPEG-4 Profile and Level 1110 combination that the relevant MPEG-4 media codec is capable 1111 of. 1112 For MPEG-4 Audio streams, this parameter is the decimal value 1113 from Table 5 (audioProfileLevelIndication Values) in ISO/IEC 1114 14496-1, indicating which MPEG-4 Audio tool subsets are 1115 required to decode the audio stream.
1116 For MPEG-4 Visual streams, this parameter is the decimal value 1117 from Table G-1 (FLC table for profile and level indication of 1118 ISO/IEC 14496-2), indicating which MPEG-4 Visual tool subsets 1119 are required to decode the visual stream. 1120 For BIFS streams, this parameter is the decimal value that is 1121 obtained from (SPLI + 256*GPLI), where: 1122 SPLI is the decimal value from Table 4 in ISO/IEC 14496-1 with 1123 the applied sceneProfileLevelIndication; 1124 GPLI is the decimal value from Table 7 in ISO/IEC 14496-1 with 1125 the applied graphicsProfileLevelIndication. 1126 For MPEG-J streams, this parameter is the decimal value from 1127 table 13 (MPEGJProfileLevelIndication) in ISO/IEC 14496-1, 1128 indicating the profile and level of the MPEG-J stream. 1129 For OD streams, this parameter is the decimal value from table 3 1130 (ODProfileLevelIndication) in ISO/IEC 14496-1, indicating the 1131 profile and level of the OD stream. 1132 For IPMP streams, this parameter has either the decimal value 0, 1133 indicating an unspecified profile and level, or a value larger 1134 than zero, indicating an MPEG-4 IPMP profile and level as 1135 defined in a future MPEG-4 specification. 1136 For Clock Reference streams and Object Content Info streams, this 1137 parameter has the decimal value zero, indicating that profile 1138 and level information is conveyed through the OD framework. 1140 Config: 1141 A hexadecimal representation of an octet string that expresses 1142 the media payload configuration. Configuration data is mapped 1143 onto the hexadecimal octet string in an MSB-first basis. The 1144 first bit of the configuration data SHALL be located at the MSB 1145 of the first octet. In the last octet, if necessary to achieve 1146 octet alignment, up to 7 zero-valued padding bits shall follow 1147 the configuration data. 
1148 For MPEG-4 Audio streams, config is the audio object type 1149 specific decoder configuration data AudioSpecificConfig() as 1150 defined in ISO/IEC 14496-3. For Structured Audio, the 1151 AudioSpecificConfig() may be conveyed by other means, not 1152 defined by this specification. If the AudioSpecificConfig() 1153 is conveyed by other means for Structured Audio, then the 1154 config MUST be a quoted empty hexadecimal octet string, as 1155 follows: config="". 1156 Note that a future mode of using this RTP payload format for 1157 Structured Audio may define such other means. 1158 For MPEG-4 Visual streams, config is the MPEG-4 Visual 1159 configuration information as defined in subclause 6.2.1 Start 1160 codes of ISO/IEC 14496-2. The configuration information 1161 indicated by this parameter SHALL be the same as the 1162 configuration information in the corresponding MPEG-4 Visual 1163 stream, except for first-half-vbv-occupancy and 1164 latter-half-vbv-occupancy, if it exists, which may vary in 1165 the repeated configuration information inside an MPEG-4 1166 Visual stream (See 6.2.1 Start codes of ISO/IEC 14496-2). 1167 For BIFS streams, this is the BIFSConfig() information as defined 1168 in ISO/IEC 14496-1. For version 1, BIFSConfig is defined in 1169 section 9.3.5.2, and for version 2 in section 9.3.5.3. The 1170 MIME format parameter ObjectType signals the version of 1171 BIFSConfig. 1172 For IPMP streams, this is either a quoted empty hexadecimal octet 1173 string, indicating the absence of any decoder configuration 1174 information (config=""), or the IPMPConfiguration() as 1175 defined in a future MPEG-4 IPMP specification. 1176 For Object Content Info (OCI) streams, this is the 1177 OCIDecoderConfiguration() information of the OCI stream, as 1178 defined in section 8.4.2.4 in ISO/IEC 14496-1.
1179 For OD streams, Clock Reference streams and MPEG-J streams, this 1180 is a quoted empty hexadecimal octet string (config=""), as 1181 no information on the decoder configuration is required. 1183 Mode: 1184 The mode in which this specification is used. The following modes 1185 can be signaled: 1186 mode=generic, 1187 mode=CELP-cbr, 1188 mode=CELP-vbr, 1189 mode=AAC-lbr and 1190 mode=AAC-hbr. 1191 Other modes are expected to be defined in future RFCs. See also 1192 section 3.3.7 and 4.2 of RFCxxxx. 1194 Optional general parameters: 1196 ObjectType: 1197 The decimal value from Table 8 in ISO/IEC 14496-1, indicating 1198 the value of the objectTypeIndication of the transported stream. 1199 For BIFS streams this parameter MUST be present to signal the 1200 version of BIFSConfiguration(). Note that the ObjectType MAY 1201 signal a non-MPEG-4 stream, and that the RTP payload format 1202 defined in this document may not be suitable to carry a stream 1203 that is not defined by MPEG-4. 1205 ConstantSize: 1206 The constant size in octets of each Access Unit for this stream. 1207 Simultaneous presence of ConstantSize and the SizeLength 1208 parameters is not permitted. 1210 Profile: 1211 The decimal representation of the applied profile to constrain 1212 the latency when interleaving; see section 3.2.3.3. Absence of 1213 this parameter signals that the profile is not specified. This 1214 parameter MUST be present when interleaving is applied. 1216 Optional configuration parameters: 1218 SizeLength: 1219 The number of bits on which the AU-size field is encoded in the 1220 AU-header. Simultaneous presence of SizeLength and the 1221 ConstantSize parameter is not permitted. 1223 IndexLength: 1224 The number of bits on which the AU-Index is encoded in the first 1225 AU-header. The default value of zero indicates the absence of 1226 the AU-Index and AU-Index-delta fields in each AU-header. 
1228 IndexDeltaLength: 1229 The number of bits on which the AU-Index-delta field is encoded 1230 in any non-first AU-header. 1232 CTSDeltaLength: 1233 The number of bits on which the CTS-delta field is encoded in 1234 the AU-header. 1236 DTSDeltaLength: 1237 The number of bits on which the DTS-delta field is encoded in 1238 the AU-header. 1240 RandomAccessIndication: 1241 A decimal value of zero or one, indicating whether the RAP-flag 1242 is present in the AU-header. The decimal value of one indicates 1243 presence of the RAP-flag, the default value zero its absence. 1245 StreamStateIndication: 1246 The number of bits on which the Stream-state field is encoded in 1247 the AU-header. This parameter MAY be present when transporting 1248 MPEG-4 system streams, and SHALL NOT be present for MPEG-4 audio and 1249 MPEG-4 video streams. 1251 AuxiliaryDataSizeLength: 1252 The number of bits that is used to encode the auxiliary-data-size 1253 field. 1255 Applications MAY use more parameters, in addition to those defined 1256 above. Each additional parameter MUST be registered with IANA, to 1257 ensure that there is no clash of names. Each additional parameter 1258 MUST be accompanied by a specification in the form of an RFC, MPEG 1259 standard, or other permanent and readily available reference (the 1260 "Specification Required" policy defined in RFC 2434). Receivers MUST 1261 tolerate the presence of such additional parameters, but these 1262 parameters SHALL NOT impact the decoding of receivers that comply with 1263 this specification. 1265 Encoding considerations: 1266 System bitstreams MUST be generated according to MPEG-4 Systems 1267 specifications (ISO/IEC 14496-1). Video bitstreams MUST be generated 1268 according to MPEG-4 Visual specifications (ISO/IEC 14496-2). Audio 1269 bitstreams MUST be generated according to MPEG-4 Audio 1270 specifications (ISO/IEC 14496-3). The RTP packets MUST be packetized 1271 according to the RTP payload format defined in RFC xxxx.

Security considerations:
   As defined in section 5 of RFC xxxx.

Interoperability considerations:
   MPEG-4 provides a large and rich set of tools for the coding of
   visual objects. For effective implementation of the standard,
   subsets of the MPEG-4 tool sets have been provided for use in
   specific applications. These subsets, called 'Profiles', limit the
   size of the tool set a decoder is required to implement. In order to
   restrict computational complexity, one or more 'Levels' are set for
   each Profile. A Profile@Level combination allows:
   . a codec builder to implement only the subset of the standard he
     needs, while maintaining interworking with other MPEG-4 devices
     that implement the same combination, and
   . checking whether MPEG-4 devices comply with the standard
     ('conformance testing').

   A stream SHALL be compliant with the MPEG-4 Profile@Level specified
   by the parameter "profile-level-id". Interoperability between a
   sender and a receiver is achieved by specifying the parameter
   "profile-level-id" in MIME content. In the capability exchange /
   announcement procedure this parameter may mutually be set to the
   same value.

Published specification:
   The specifications for MPEG-4 streams are presented in ISO/IEC
   14496-1, 14496-2, and 14496-3. The RTP payload format is described
   in RFC xxxx.

Applications which use this media type:
   Multimedia streaming and conferencing tools, Internet messaging and
   Email applications.

Additional information: none

Magic number(s): none

File extension(s):
   None. A file format with the extension .mp4 has been defined for
   MPEG-4 content but is not directly correlated with this MIME type
   for which the sole purpose is RTP transport.

Macintosh File Type Code(s): none

Person & email address to contact for further information:
   Authors of RFC xxxx, IETF Audio/Video Transport working group.

Intended usage: COMMON

Author/Change controller:
   Authors of RFC xxxx, IETF Audio/Video Transport working group.

4.2 Registration of mode definitions with IANA

This specification can be used in a number of modes. The mode of
operation is signalled using the "Mode" MIME parameter, with the
initial set of values specified in Section 4.1. New modes may be
defined at any time, as described in Section 3.3.7. These modes
MUST be registered with IANA, to ensure that there is no clash
of names.

A new mode registration MUST be accompanied by a specification in
the form of an RFC, MPEG standard, or other permanent and readily
available reference (the "Specification Required" policy defined
in RFC 2434).

4.3 Concatenation of parameters

Multiple parameters SHOULD be expressed as a MIME media type string,
in the form of a semicolon-separated list of parameter=value pairs
(for parameter usage examples see sections 3.3.2 through 3.3.6).

4.4 Usage of SDP

4.4.1 The a=fmtp keyword

It is assumed that one typical way to transport the above-described
parameters associated with this payload format is via an SDP message
[6], for example transported to the client in reply to an RTSP
DESCRIBE or via SAP. In that case the (a=fmtp) keyword MUST be used
as described in RFC 2327 [6], section 6, the syntax then being:

   a=fmtp:<format> <parameter name>=<value>[; <parameter name>=<value>]

5. Security Considerations

RTP packets using the payload format defined in this specification
are subject to the security considerations discussed in the RTP
specification [2]. This implies that confidentiality of the media
streams is achieved by encryption.
Because the data compression used
with this payload format is applied end-to-end, encryption may be
performed on the compressed data so there is no conflict between the
two operations. The packet processing complexity of this payload
type (i.e. excluding media data processing) does not exhibit any
significant non-uniformity on the receiver side that could cause a
denial-of-service threat.

However, it is possible to inject non-compliant MPEG streams (Audio,
Video, and Systems) to overload the receiver/decoder's buffers,
which might compromise the functionality of the receiver or even
crash it. This is especially true for end-to-end systems like MPEG
where the buffer models are precisely defined.

MPEG-4 Systems supports stream types including commands that are
executed on the terminal like OD commands, BIFS commands, etc. and
programmatic content like MPEG-J (Java(TM) Byte Code) and
ECMAScript. It is possible to use one or more of the above in a
manner non-compliant with MPEG to crash the receiver or make it
temporarily unavailable.

Senders SHOULD ensure that packet loss does not cause severe
problems in application execution when the packet carries OD
commands, BIFS commands, or programmatic content such as MPEG-J and
ECMAScript. For example, the reliability can be improved by
re-transmission, or by using the carousel mechanism as defined by
MPEG in ISO/IEC 14496-1, while observing the general congestion
control principles. When such measures are deemed insufficient,
applications SHOULD use more reliable means than this payload format
alone to transport the information, for example by
applying an FEC scheme for RTP (such as in RFC 2733), or by using
RTP over TCP (such as in RFC 2326, section 10.12), while giving due
consideration to congestion control.
For a general description of
methods to repair streaming media see RFC 2354.

Authentication mechanisms can be used to validate the sender and
the data to prevent security problems due to non-compliant malignant
MPEG-4 streams.

In ISO/IEC 14496-1 a security model is defined for MPEG-4 Systems
streams carrying MPEG-J access units which comprise Java(TM) classes
and objects. MPEG-J defines a set of Java APIs and a secure
execution model. MPEG-J content can call this set of APIs and
Java(TM) methods from a set of Java packages supported in the
receiver within the defined security model. According to this
security model, downloaded byte code is forbidden to load libraries,
define native methods, start programs, read or write files, or read
system properties.

Receivers can implement intelligent filters to validate the buffer
requirements or parametric (OD, BIFS, etc.) or programmatic (MPEG-J,
ECMAScript) commands in the streams. However, this can increase the
complexity significantly.

6. Acknowledgements

This document evolved through several revisions thanks to
contributions by people from the ISMA forum, from the IETF AVT
Working Group and from the 4-on-IP ad-hoc group within MPEG. The
authors wish to thank all involved people, and in particular John
Lazzaro, Alex MacAulay, Bill May, Colin Perkins, Dorairaj V and
Stephan Wenger for their valuable comments and support.

7. References

[1] ISO/IEC International Standard 14496 (MPEG-4); "Information
    technology - Coding of audio-visual objects", January 2000.

[2] Schulzrinne, Casner, Frederick, Jacobson, "RTP: A Transport
    Protocol for Real-Time Applications", RFC 1889, Internet
    Engineering Task Force, January 1996.

[3] S. Bradner, "Key words for use in RFCs to Indicate Requirement
    Levels", RFC 2119, March 1997.

[4] D. Hoffman, G. Fernando, V. Goyal, M.
    Civanlar, "RTP payload
    format for MPEG1/MPEG2 Video", RFC 2250, January 1998.

[5] Y. Kikuchi, T. Nomura, S. Fukunaga, Y. Matsui, H. Kimata, "RTP
    payload format for MPEG-4 Audio/Visual streams", RFC 3016.

[6] Handley, Jacobson, "SDP: Session Description Protocol",
    RFC 2327, Internet Engineering Task Force, April 1998.

8. Authors' Addresses

Jan van der Meer
Philips Digital Networks
Cederlaan 4
5600 JB Eindhoven
Netherlands
Email: jan.vandermeer@philips.com

David Mackie
Cisco Systems Inc.
170 West Tasman Dr.
San Jose, CA 95134
Email: dmackie@cisco.com

Viswanathan Swaminathan
Sun Microsystems Inc.
901 San Antonio Road, M/S UMPK15-214
Palo Alto, CA 94303
Email: viswanathan.swaminathan@sun.com

David Singer
Apple Computer, Inc.
One Infinite Loop, MS:302-3MT
Cupertino CA 95014
Email: singer@apple.com

Philippe Gentric
Philips Digital Networks, MP4Net
51 rue Carnot
92156 Suresnes
France
Email: philippe.gentric@philips.com

Full Copyright Statement

Copyright (C) The Internet Society (date). All Rights Reserved.
This document and translations of it may be copied and furnished to
others, and derivative works that comment on or otherwise explain
it or assist in its implementation may be prepared, copied,
published and distributed, in whole or in part, without restriction
of any kind, provided that the above copyright notice and this
paragraph are included on all such copies and derivative works.
However, this document itself may not be modified in any way, such
as by removing the copyright notice or references to the Internet
Society or other Internet organizations, except as needed for the
purpose of developing Internet standards in which case the
procedures for copyrights defined in the Internet Standards process
MUST be followed, or as required to translate it into.

APPENDIX: Usage of this payload format

Appendix A. Examples of delay analysis with interleave

A.1 Group interleave

An example of regular interleave is when packets are formed into
groups. If the number of packets in a group is N, for example
packet 0 could contain frame 0, frame N, frame 2N, and so on;
packet 1 could contain frame 1, frame 1+N, 1+2N, and so on. The
AU-Index field is used to document the sequence of the packet
within the group (or the first frame in the packet, which is the
same thing in this scheme), and all the AU-Index-delta fields
contain N-1.

Because each subsequent frame in the packet has a higher time stamp
than the preceding frame, receivers can tell when a new interleave
group is starting, by noting that the computed time stamp of the
first frame in a packet is later than any previously computed time
stamp. In that case the time stamps of all frames contained in the
packet are higher than any previously computed time stamp, and
hence interleaving with any previously received frame is not
possible. In conclusion, a new group has been started.

If the group size is 3, then packets can be formed as follows:

   Packet   Time stamp   Frame Numbers   AU-Index, AU-Index-delta
     0        T[0]        0, 3, 6        0, 2, 2
     1        T[1]        1, 4, 7        0, 2, 2
     2        T[2]        2, 5, 8        0, 2, 2
     3        T[9]        9,12,15        0, 2, 2

In this case, the receiver would have to buffer 4 frames at least
from packets 0 and 1, and can flush all frames when packet 2
arrives.
(Frame 0 can be flushed as packet 0 arrives, since it is
the earliest frame we hold, and likewise frame 1 from packet 1; we
are therefore holding 3,4,6,7 until packet 2 arrives).

If there is loss, then the receiver may wait longer than is strictly
necessary before it emits frames. For example, say packet 1 is lost
from the above example. Packet 0 allows frame 0 to be emitted, and
then packet 2 arrives, allowing us to notice the loss of frame 1,
and emit frames 2 and 3. Then it is not until the arrival of packet 3
(which has a time-stamp beyond the times of all the frames seen so
far), that we can finish dealing with the loss, even though the
first group has, in fact, ended. (This is in contrast to schemes
which signal the group size explicitly; if the receiver knows that
this is packet 3 of 3, then even if 2 of 3 is missing, it can
de-interleave this group without waiting for the next one to start).

In the above example the AU-Index is coded with the value 0, as
required for the modes defined in this document. To reconstruct the
original order, the RTP time stamp and the AU-Index-delta are used.
See also section 3.2.3.2.

Another example of forming packets with group interleave is given
below. In this example the packets are formed such that the loss of
two subsequent RTP packets does not cause the loss of two subsequent
audio frames. Note that in this example the RTP time stamps of
packets 3 and 4 are earlier than the RTP time stamps of packets 1
and 2, respectively.

   Packet   Time stamp   Frame Numbers     AU-Index, AU-Index-delta
     0        T[0]        0,  5, 10, 15    0, 4, 4, 4
     1        T[2]        2,  7, 12, 17    0, 4, 4, 4
     2        T[4]        4,  9, 14, 19    0, 4, 4, 4
     3        T[1]        1,  6, 11, 16    0, 4, 4, 4
     4        T[3]        3,  8, 13, 18    0, 4, 4, 4

     5        T[20]      20, 25, 30, 35    0, 4, 4, 4
   and so on ..
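
The packet formation for the group-size-3 example in this section
can be sketched informally in Python as shown below. The function
names and the dictionary layout are hypothetical. The sketch assumes
that the RTP time stamp identifies the first frame in the packet and
that each AU-Index-delta counts the frames skipped, so that the next
frame number equals the previous one plus the delta plus one; this
reading is consistent with the N-1 values described above.

```python
# Informal sketch of group interleave; names are illustrative only.

def group_interleave(num_groups, group_size, frames_per_packet):
    """Form packets so that packet p of a group carries frames p, p+N, p+2N, ...

    N is group_size, the number of packets in one interleave group.
    """
    packets = []
    frames_per_group = group_size * frames_per_packet
    for group in range(num_groups):
        base = group * frames_per_group
        for p in range(group_size):
            first = base + p
            frames = [first + k * group_size for k in range(frames_per_packet)]
            # AU-Index is coded as 0; every AU-Index-delta carries N-1.
            index_fields = [0] + [group_size - 1] * (frames_per_packet - 1)
            packets.append({"first_frame": first,
                            "frames": frames,
                            "index_fields": index_fields})
    return packets


def frames_from_packet(first_frame, index_fields):
    """Recover the frame numbers from the first frame and the delta fields.

    Assumption: next frame = previous frame + AU-Index-delta + 1.
    """
    frames = [first_frame]
    for delta in index_fields[1:]:
        frames.append(frames[-1] + delta + 1)
    return frames


for number, packet in enumerate(group_interleave(2, 3, 3)[:4]):
    print(number, packet["first_frame"], packet["frames"], packet["index_fields"])
```

With two groups of size 3 and 3 frames per packet, the first four
packets produced by this sketch carry the same frame numbers as
packets 0 to 3 in the group-size-3 table.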

A.2 Continuous interleave

In continuous interleave, once the scheme is 'primed', the number of
frames in a packet exceeds the 'stride' (the distance between them).
This shortens the buffering needed, smooths the data-flow, and gives
slightly larger packets -- and thus lower overhead -- for the same
interleave. For example, here is a continuous interleave also over
a stride of 3 frames, but with 4 frames per packet, for a run of 20
frames. This shows both how the scheme 'starts up' and how it
finishes.

   Packet   Time-stamp   Frame Numbers    AU-Index, AU-Index-delta
     0        T[0]        0               0
     1        T[1]        1  4            0  2
     2        T[2]        2  5  8         0  2  2
     3        T[3]        3  6  9  12     0  2  2  2
     4        T[7]        7  10 13 16     0  2  2  2
     5        T[11]       11 14 17 20     0  2  2  2
     6        T[15]       15 18           0  2
     7        T[19]       19              0

In this case, the receiver has to buffer only 3 frames, not 4. Say
we are waiting for packet 4. We can flush frames 0, 1, 2, 3, 4, 5,
6; we are therefore holding 8, 9, 12. Packet 4 arrives, allowing
us to emit 7,8,9,10, and we are holding 12,13,16. Each arriving
packet contains 4 frames, and allows 4 frames to be flushed.

In the above example the AU-Index is coded with the value 0, as
required for the modes defined in this document. To reconstruct the
original order, the RTP time stamp and the AU-Index-delta are used.
See also section 3.2.3.2.

If there is loss, again the receiver has to wait to emit the erasure
frames. In this case, say packet 3 is lost. We were holding frames
4, 5, and 8. On the arrival of packet 4 (time-stamp of frame 7),
we know that frame 3 was lost; we can emit frames 4 and 5, conclude
that frame 6 must be lost, and emit frame 7, which is in the packet
that arrived. Then on the arrival of packet 5 (time-stamp 11) we can
emit 8, indicate loss of 9, and emit 10 and 11. Finally, the arrival
of packet 6 (time-stamp 15) indicates that 12 must be lost; we have
now detected all the lost frames.
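
The buffering figures quoted for the two schemes (4 frames for the
group interleave with group size 3, 3 frames for the continuous
interleave) can be checked with the informal Python sketch below.
The packet contents are copied from the two tables; the function
name is hypothetical, and the sketch assumes loss-free, in-order
packet arrival with every frame emitted as soon as all earlier
frames have been emitted.

```python
# Informal sketch: peak number of frames a receiver holds back.
# Assumes no packet loss and in-order arrival of packets.

def peak_frames_held(packets):
    """Return (peak number of frames held after flushing, emitted frame order)."""
    held = set()
    emitted = []
    next_frame = 0
    peak = 0
    for frames in packets:
        held.update(frames)
        # Flush every frame that is now contiguous with what was emitted.
        while next_frame in held:
            held.remove(next_frame)
            emitted.append(next_frame)
            next_frame += 1
        peak = max(peak, len(held))
    return peak, emitted


# Frame numbers per packet, copied from the tables in A.1 and A.2.
GROUP = [[0, 3, 6], [1, 4, 7], [2, 5, 8], [9, 12, 15]]
CONTINUOUS = [[0], [1, 4], [2, 5, 8], [3, 6, 9, 12], [7, 10, 13, 16],
              [11, 14, 17, 20], [15, 18], [19]]

print(peak_frames_held(GROUP)[0])       # peak for group interleave
print(peak_frames_held(CONTINUOUS)[0])  # peak for continuous interleave
```

Handling of lost packets, as described in the last paragraph above,
is deliberately left out of this sketch.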