idnits 2.17.1 draft-ietf-avt-mpeg4-simple-01.txt: Checking boilerplate required by RFC 5378 and the IETF Trust (see https://trustee.ietf.org/license-info): ---------------------------------------------------------------------------- ** Looks like you're using RFC 2026 boilerplate. This must be updated to follow RFC 3978/3979, as updated by RFC 4748. Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt: ---------------------------------------------------------------------------- ** The document seems to lack a 1id_guidelines paragraph about 6 months document validity -- however, there's a paragraph with a matching beginning. Boilerplate error? ** The document is more than 15 pages and seems to lack a Table of Contents. == No 'Intended status' indicated for this document; assuming Proposed Standard == The page length should not exceed 58 lines per page, but there was 1 longer page, the longest (page 1) being 1269 lines Checking nits according to https://www.ietf.org/id-info/checklist : ---------------------------------------------------------------------------- ** The document seems to lack an Authors' Addresses Section. ** The document seems to lack separate sections for Informative/Normative References. All references will be assumed normative when checking for downward references. ** There are 101 instances of too long lines in the document, the longest one being 6 characters in excess of 72. Miscellaneous warnings: ---------------------------------------------------------------------------- == Line 741 has weird spacing: '... for stere...' == Using lowercase 'not' together with uppercase 'MUST', 'SHALL', 'SHOULD', or 'RECOMMENDED' is not an accepted usage according to RFC 2119. Please use uppercase 'NOT' together with RFC 2119 keywords (if that is what you mean). Found 'SHALL not' in this paragraph: The AU-headers are configured using format parameters and MAY be empty. 
If the AU-header is configured empty, the AU-headers-length field SHALL not be present and consequently the AU Header Section is empty. If the AU-header is not configured empty, then the AU-headers-length is a two octet field that specifies the length in bits of the immediately following AU-headers. == Using lowercase 'not' together with uppercase 'MUST', 'SHALL', 'SHOULD', or 'RECOMMENDED' is not an accepted usage according to RFC 2119. Please use uppercase 'NOT' together with RFC 2119 keywords (if that is what you mean). Found 'SHALL not' in this paragraph: Applications MAY use more parameters, in addition to those defined above. Receivers MUST tolerate the presence of such additional parameters, but these parameters SHALL not impact the decoding of receivers that comply to this specification. -- The document seems to lack a disclaimer for pre-RFC5378 work, but may have content which was first submitted before 10 November 2008. If you have contacted all the original authors and they are all willing to grant the BCP78 rights to the IETF Trust, then this is fine, and you can ignore this comment. If not, you may need to add the pre-RFC5378 disclaimer. (See the Legal Provisions document at https://trustee.ietf.org/license-info for more information.) -- The document date (September 2002) is 7856 days in the past. Is this intentional? 
Checking references for intended status: Proposed Standard ---------------------------------------------------------------------------- (See RFCs 3967 and 4897 for information about using normative references to lower-maturity documents in RFCs) == Missing Reference: '0' is mentioned on line 1214, but not defined == Missing Reference: '9' is mentioned on line 1177, but not defined == Missing Reference: '11' is mentioned on line 1219, but not defined == Missing Reference: '15' is mentioned on line 1220, but not defined == Missing Reference: '19' is mentioned on line 1221, but not defined == Unused Reference: '4' is defined on line 1092, but no explicit reference was found in the text == Unused Reference: '6' is defined on line 1098, but no explicit reference was found in the text -- Possible downref: Non-RFC (?) normative reference: ref. '1' ** Obsolete normative reference: RFC 1889 (ref. '2') (Obsoleted by RFC 3550) ** Obsolete normative reference: RFC 3016 (ref. '5') (Obsoleted by RFC 6416) -- No information found for draft-gentric-avt-mpeg4-multiSL - is the name correct? -- Possible downref: Normative reference to a draft: ref. '6' ** Obsolete normative reference: RFC 2327 (ref. '7') (Obsoleted by RFC 4566) Summary: 9 errors (**), 0 flaws (~~), 12 warnings (==), 5 comments (--). Run idnits with the --verbose option for more detailed information about the items above. -------------------------------------------------------------------------------- 1 Internet Engineering Task Force J. van der Meer 2 Internet Draft Philips Electronics 3 D. Mackie 4 Cisco Systems Inc. 5 V. Swaminathan 6 Sun Microsystems Inc. 7 D. Singer 8 Apple Computer 10 March 2002 11 Expires September 2002 13 Document: draft-ietf-avt-mpeg4-simple-01.txt 15 Use of "RFC XXXX" for MPEG-4 Elementary Streams with no SL layer 17 Status of this Memo 19 This document is an Internet-Draft and is in full conformance with 20 all provisions of Section 10 of RFC2026. 
22 Internet-Drafts are working documents of the Internet Engineering 23 Task Force (IETF), its areas, and its working groups. Note that 24 other groups may also distribute working documents as Internet- 25 Drafts. Internet-Drafts are draft documents valid for a maximum of 26 six months and may be updated, replaced, or obsoleted by other 27 documents at any time. It is inappropriate to use Internet-Drafts 28 as reference material or to cite them other than as "work in 29 progress." 31 The list of current Internet-Drafts can be accessed at 32 http://www.ietf.org/ietf/1id-abstracts.txt 33 The list of Internet-Draft Shadow Directories can be accessed at 34 http://www.ietf.org/shadow.html. 36 This specification is a product of the Audio/Video Transport working 37 group within the Internet Engineering Task Force. Comments are 38 solicited and should be addressed to the working group's mailing 39 list at avt@ietf.org and/or the authors. 41 << 42 Note for the RFC editor: 43 XXXX should be replaced with the RFC number that will be assigned to 44 the companion RFC, whose draft is: draft-ietf-avt-mpeg4-multisl-**.txt. 45 >> 47 Abstract 49 The MPEG Committee (ISO/IEC JTC1/SC29 WG11) is a working group in ISO 50 that recently produced the MPEG-4 standard. MPEG defines tools to 51 compress content such as audio-visual information into elementary 52 streams. In RFC XXXX a generic RTP payload format is defined for 53 transport of any non-multiplexed MPEG-4 elementary stream. To achieve 54 the generic MPEG-4 functionality, RFC XXXX addresses detailed issues 55 related to the MPEG-4 SL layer. However, many initial applications will 56 not use the SL layer. To facilitate usage of RFC XXXX by such 57 applications, this document describes how to use RFC XXXX when no SL 58 layer is used. 60 1. Introduction 62 The MPEG Committee is Working Group 11 (WG11) in ISO/IEC JTC1 SC29 63 that specified the MPEG-1, MPEG-2 and, more recently, the MPEG-4 64 standards [1].
The MPEG-4 standard specifies compression of 65 audio-visual data into for example an audio or video elementary 66 stream. In the MPEG-4 standard, these streams take the form of 67 audiovisual objects that may be arranged into an audio-visual scene 68 by means of a scene description. Each MPEG-4 elementary stream 69 consists of a sequence of Access Units; in case of audio an Access 70 Unit (AU) is an audio frame and in case of video a picture. 72 The MPEG-4 system specification is a rather abstract specification in 73 the sense that no transport format for MPEG-4 elementary streams is 74 defined. Instead, a conceptual SL layer has been specified to store 75 transport specific information such as time stamps and random access 76 point information. When transporting an MPEG-4 elementary stream, 77 transport information from the SL layer is typically mapped to the 78 actual transport layer. Note however that the SL layer is conceptual 79 and may not exist in practice. 81 In RFC XXXX, a general payload format is defined for transport of a single 82 MPEG-4 elementary stream over RTP. The RTP payload format specified 83 in RFC XXXX allows for carriage of any information that may be contained in 84 the MPEG-4 SL layer, either by mapping to the RTP header fields or by 85 carriage in specific fields defined in the RTP payload. Consequently, 86 the format defined in RFC XXXX is very generic and complete; for example, 87 transcoding issues from and to the SL layer are described in detail. 89 However, in many initial MPEG-4 applications the SL layer does not 90 exist in practice. Such applications do not require any knowledge of 91 the SL layer. While the use of RFC XXXX is highly desirable for all MPEG-4 92 applications, to understand RFC XXXX may be difficult without knowledge of 93 the MPEG-4 SL layer. Therefore in this document the use of RFC XXXX is 94 described without requiring knowledge of the SL layer to understand 95 its functionality. 
97 Sophisticated features on interleaving of fragmented Access Units are 98 defined in RFC XXXX. Because initial applications only need interleaving 99 of complete (non-fragmented) Access Units, these more sophisticated 100 features are not supported in this document. Hence, only a functional 101 subset of RFC XXXX is supported. 103 In RFC XXXX, a general and configurable payload structure is defined for 104 transport of MPEG-4 streams. This allows for the design of receivers 105 that can be configured to receive any MPEG-4 stream. Configuration of 106 the payload is provided to accommodate transport of any MPEG-4 stream, 107 but for a specific MPEG-4 elementary stream typically only very few 108 configurations are needed. So as to allow for the design of simplified, 109 but dedicated receivers, this specification requires that specific 110 modes are defined for transport of MPEG-4 streams. This document 111 defines modes only for transport of MPEG-4 CELP and AAC streams, 112 but future RFCs are expected to specify additional modes for 113 transport of other MPEG-4 streams. 115 In summary, this document: 116 - is intended for applications that do not apply the SL layer; 117 - describes how to use RFC XXXX without requiring knowledge of the 118 SL layer; 119 - defines a functional but true subset of RFC XXXX; 120 - defines modes for using this specification for transport of MPEG-4 121 CELP and AAC streams. 123 The use of RFC XXXX defined in this document is simple to implement 124 and reasonably efficient. It allows for optional interleaving of 125 Access Units (such as audio frames) to increase error resilience to 126 packet loss. 128 The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", 129 "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in 130 this document are to be interpreted as described in RFC 2119 [3]. 132 2.
Carriage of MPEG-4 elementary streams over RTP 134 2.1 Introduction 136 With this payload format a single MPEG-4 elementary stream can be 137 transported. Information on the type of MPEG-4 stream carried in the 138 payload is conveyed by format parameters in an SDP [7] message or 139 by other means. These format parameters specify the configuration 140 of the payload. To simplify receivers, also a format parameter is 141 available to signal a specific mode of using this payload. A mode 142 definition MAY include the type of MPEG-4 elementary stream as well 143 as the applied configuration, so as to avoid the need in receivers 144 for parsing all format parameters. 146 2.2 MPEG Access Units 148 For carriage of compressed audio-visual data MPEG defines Access 149 Units. An MPEG Access Unit (AU) is the smallest data entity to which 150 timing information can be attributed. In case of audio an Access 151 Unit represents an audio frame and in case of video a picture. MPEG 152 Access Units are by definition byte aligned. If for example an audio 153 frame is not byte aligned, up to 7 zero-padding bits MUST be inserted 154 at the end of the frame to achieve a byte-aligned Access Unit. 155 Decoders MUST be able to decode AUs in which such padding is applied. 157 Consistent with the MPEG-4 specification, this document requires that 158 each MPEG-4 video Access Unit includes all the coded data of a 159 picture, any video stream headers that may precede the coded picture 160 data, and any video stream stuffing that may follow it, up to, but not 161 including the startcode indicating the start of a new video stream or 162 the next Access Unit. 164 2.3 Concatenation of Access Units 166 Frequently it is possible to carry multiple Access Units in one RTP 167 packet. This is particularly useful for audio; for example, when AAC 168 is used for encoding of a stereo signal at 64 kbits/sec, AAC frames 169 contain on average approximately 200 bytes. 
On a LAN with a 1500 octet 170 MTU this would allow on average 7 complete AAC frames to be carried 171 per RTP packet. 173 Access Units may have a fixed size in octets, but a variable size is 174 also possible. To facilitate parsing in case of multiple concatenated 175 AUs in one RTP packet, the size of each AU is made known to the 176 receiver. When concatenating in case of a constant AU size, this size 177 is communicated through a format parameter. When concatenating in case 178 of variable size AUs, the RTP payload carries an AU size field for 179 each contained AU. In combination with the RTP payload length the 180 size information allows the RTP payload to be split by the receiver 181 back into the individual AUs. 183 To simplify the implementation of RFC XXXX defined in this document, 184 when multiple AUs are carried in an RTP packet, 185 each AU MUST be complete, i.e. the number of AUs in an RTP packet 186 MUST be integral. 188 2.4 Fragmentation of Access Units 190 MPEG allows for very large Access Units. Since most IP networks have 191 significantly smaller MTUs, this payload format allows AUs to be 192 fragmented over multiple RTP packets so as to avoid IP layer 193 fragmentation. To simplify the implementation of RFC XXXX defined in this 194 document, an RTP packet SHALL either carry one or more complete 195 Access Units or a single fragment of one Access Unit. 197 2.5 Interleaving 199 When an RTP packet carries a contiguous sequence of Access Units, 200 the loss of such a packet can result in "decoding gaps" for the user. 201 One method to alleviate this problem is to allow for the Access 202 Units to be interleaved in the RTP packets. For a modest cost in 203 latency and implementation complexity, significant error resiliency 204 to packet loss can be achieved. 206 To support optional interleaving of Access Units, this payload 207 format allows for index information to be sent for each Access Unit.
208 The RTP sender is free to choose the interleaving pattern without 209 propagating this information to the receiver(s). Indeed the sender 210 could dynamically adjust the interleaving pattern based on the 211 Access Unit size, error rates, etc. The RTP receiver does not need 212 to know the interleaving pattern used; it only needs to extract the 213 index information of the Access Unit and insert the Access Unit into 214 the appropriate sequence in the rendering queue. An example of 215 interleaving is given below. 217 Assume that an RTP packet contains 3 AUs, and that the AUs are 218 numbered 1, 2, 3, 4, etc. If an interleaving group length of 9 is 219 chosen, then RTP packet(i) contains the following AU(n): 220 RTP packet(1): AU(1), AU(4), AU(7) 221 RTP packet(2): AU(2), AU(5), AU(8) 222 RTP packet(3): AU(3), AU(6), AU(9) 223 RTP packet(4): AU(10), AU(13), AU(16) 224 RTP packet(5): AU(11), AU(14), AU(17) 225 Etc. 227 2.6 Time stamp information 229 MPEG-4 defines two types of time stamps, the decoding time stamp DTS 230 and the composition time stamp CTS. The RTP timestamp is equivalent 231 to the composition time stamp. 233 The RTP time stamp MUST carry the sampling instance of the first AU 234 (fragment) in the RTP packet. When multiple AUs are carried within 235 an RTP packet, the time stamps of subsequent AUs can be calculated 236 if the frame period of each AU is known. For audio and video this 237 is possible if the frame rate is constant. However, in some cases it 238 is not possible to make such a calculation, for example for variable 239 frame rate video and for MPEG-4 BIFS streams carrying composition 240 information. To support such cases, this payload format can be 241 configured to carry a CTS in the RTP payload for each contained 242 Access Unit.
A CTS time stamp MAY be conveyed in the RTP payload 243 only for non-first AUs in the RTP packet, and SHALL NOT be conveyed 244 for the first AU (fragment), as the time stamp for the latter is 245 carried by the RTP time stamp. 247 The DTS timestamp may be applied only in MPEG video streams that use 248 bi-directional coding, i.e. when pictures may be predicted in both 249 forward and backward direction by using either a reference picture in 250 the past, or a reference picture in the future. The DTS cannot be 251 carried in the RTP header. In some cases the DTS can be derived from 252 the RTP time stamp using frame rate information; this requires deep 253 parsing in the video stream, which may be considered objectionable. 254 But if the video frame rate is variable, the required information 255 may not even be present in the video stream. For both reasons, the 256 capability has been defined to optionally carry a DTS in the RTP 257 payload for each contained Access Unit. 259 Since RTP time stamps may be re-stamped by RTP devices, each CTS 260 and DTS contained in the RTP payload is coded differentially from the 261 RTP time stamp, so as to avoid extensive parsing by re-stamping 262 devices. 264 2.7 Carriage of auxiliary information 266 This payload format defines a specific field to carry auxiliary data 267 on the contained MPEG-4 stream, representing MPEG-4 system information. 268 The auxiliary data corresponds to the RSLH field defined in RFC XXXX. 269 Receivers MAY use the auxiliary data to decode the contained stream, 270 but receivers that have no interest in such data MAY skip the 271 auxiliary data field. To facilitate skipping of the data, and to avoid 272 the need for parsing it, the auxiliary data field is preceded by a 273 field that specifies the length of the auxiliary data.
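The interleaving example of section 2.5 can be reproduced with a short
sketch. It is illustrative only: the function names are hypothetical,
and the regular stride pattern is just one pattern a sender might
choose, since this document leaves the pattern to the sender. The
sketch assumes that the group length is a multiple of the number of AUs
per packet.

```python
def interleave(num_aus, aus_per_packet, group_len):
    """Return a list of packets, each a list of 1-based AU numbers.

    Assumption of this sketch: group_len is a multiple of aus_per_packet.
    """
    if group_len % aus_per_packet != 0:
        raise ValueError("group_len must be a multiple of aus_per_packet")
    stride = group_len // aus_per_packet  # packets per interleaving group
    packets = []
    for group_start in range(1, num_aus + 1, group_len):
        for offset in range(stride):
            first = group_start + offset
            packet = [n for n in range(first, group_start + group_len, stride)
                      if n <= num_aus]
            if packet:
                packets.append(packet)
    return packets


def index_deltas(packet):
    """AU-Index-delta values for the non-first AUs of one packet.

    From AU-Index(n) = AU-Index(n-1) + AU-Index-delta(n) + 1.
    """
    return [cur - prev - 1 for prev, cur in zip(packet, packet[1:])]


def deinterleave(packets):
    """Receiver side: restore AU order using only the per-AU index."""
    return sorted(n for packet in packets for n in packet)


if __name__ == "__main__":
    for i, packet in enumerate(interleave(18, 3, 9), start=1):
        print("RTP packet(%d):" % i, packet, "deltas:", index_deltas(packet))
```

The receiver side needs no knowledge of the pattern: sorting on the
recovered index is sufficient, which is the property section 2.5 relies
on.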
275 2.8 Format parameters and the conditional presence and length of fields 277 To support the features described in the previous sections several 278 fields are defined for carriage in the RTP payload. However, their use 279 strongly depends on the type of MPEG-4 elementary stream that is 280 carried. Sometimes a specific field is needed with a certain length, 281 while in other cases such field is not needed at all. To be efficient 282 in either case, the fields needed for these features are configurable 283 by means of format parameters. In general, a format parameter defines 284 the presence and length of associated fields. A length of zero 285 indicates absence of the field. As a consequence, parsing of the 286 payload requires knowledge of format parameters. The format 287 parameters are conveyed to the receiver via SDP [7] messages or 288 through other means. 290 2.9 Global structure of payload format 292 The payload structure in RFC XXXX is described in terms derived from the 293 SL layer. In this document exactly the same structure is described 294 in more general terms, so as to improve the readability for people 295 with no knowledge of the SL layer. So the payload structure described 296 below corresponds on bit level exactly to the payload structure 297 defined in RFC XXXX. 299 The RTP payload following the RTP header, contains three byte aligned 300 data sections, of which the first two MAY be empty. See figure 1. 302 +---------+-----------+-----------+---------------+ 303 | RTP | AU Header | Auxiliary | Access Unit | 304 | Header | Section | Section | Data Section | 305 +---------+-----------+-----------+---------------+ 307 <----------RTP Packet Payload-----------> 309 Figure 1: Data sections within an RTP packet 311 The first data section is the AU (Access Unit) Header Section, that 312 contains one or more AU-headers; however, each AU-header MAY be empty, 313 in which case the entire AU Header Section is empty. 
The second 314 section is the Auxiliary Section, containing auxiliary data; also 315 this section MAY be configured empty. The third section is the Access 316 Unit Data Section, containing either a single fragment of one Access 317 Unit or one or more complete Access Units. The Access Unit Data 318 Section is never empty. 320 When compared to the terms used in RFC XXXX, the AU Header Section 321 exactly corresponds to the Payload Header Section, the Auxiliary 322 Section to the RSLH Section, and the Access Unit Data Section to the 323 Payload Section. 325 2.10 Modes to transport MPEG-4 streams 327 While it is possible to build fully configurable receivers capable of 328 receiving any MPEG-4 stream, this specification also allows for the 329 design of simplified, but dedicated receivers, that are capable for 330 example to receive only one type of MPEG-4 stream. This is achieved by 331 requiring that specific modes be defined for using this specification. 332 Each mode defines how to transport specific MPEG-4 streams, for example 333 by defining suitable constraints or payload configurations. Modes can 334 be defined as deemed appropriate. However, each mode MUST be in full 335 compliance with this specification. 337 The applied mode MUST be signalled. Signalling the mode is particularly 338 important for receivers that are only capable of decoding a particular 339 mode. Such receivers need to determine whether that particular mode is 340 applied, so as to avoid problems with processing of payloads that are 341 beyond the capabilities of the receiver. 343 In this internet draft only modes are defined for transport of MPEG-4 344 CELP and AAC streams. However, in future new RFCs are expected to 345 specify additional modes of using this specification for transport of 346 other MPEG-4 streams. 348 2.11 Alignment with RFC XXXX and RFC 3016 350 This document defines a subset of the RFC XXXX. 
The main characteristic 351 of this subset is that each RTP payload is only allowed to contain either 352 a single fragment of one Access Unit or one or more complete Access Units. 353 Obviously, RTP payloads that apply this subset in conformance with this 354 document conform also to RFC XXXX. Receivers that comply with RFC XXXX 355 are able to decode MPEG-4 streams carried in compliance with this 356 document. 358 Receivers designed to comply only with this document may not be able to 359 decode an RTP payload that conforms to RFC XXXX but not to this document. 360 Such receivers may also not be capable of exploiting some of the features 361 of the SL layer supported in RFC XXXX, such as knowledge of AU-start, 362 random access information and other information carried in the SL header, 363 but not described in this document. 365 Furthermore, this payload can be configured to be identical to the 366 payload format defined in RFC 3016 [5] for the MPEG-4 video configurations 367 recommended in RFC 3016. Hence, receivers that comply with RFC 3016 368 can decode such an RTP payload. Vice versa, receivers that comply with the 369 specification in this document SHOULD be able to decode payloads, names 370 and parameters defined for MPEG-4 video in RFC 3016. 372 For interoperability reasons, applications that transport MPEG-4 video 373 over RTP SHOULD use the payload format and associated names and 374 parameters defined in RFC 3016 if the functionality provided by RFC 3016 375 can meet the requirements of that application. 377 3 Payload Format 379 3.1 RTP Header Fields Usage 381 Payload Type (PT): The assignment of an RTP payload type for this 382 RTP packet format is outside the scope of this document, and will 383 not be specified here. It is expected that the RTP profile for a 384 particular class of applications will assign a payload type for this 385 encoding, or if that is not done, then a payload type in the dynamic 386 range shall be chosen.
388 Marker (M) bit: The M bit is set to 1 to indicate that the RTP packet 389 payload includes the end of each Access Unit of which data is 390 contained in this RTP packet. As the payload either carries one or 391 more complete Access Units or a single fragment of an Access Unit, 392 the M bit is always set to 1, except when the packet carries a 393 single fragment of an Access Unit that is not the last one. 395 Extension (X) bit: Defined by the RTP profile used. 397 Sequence Number: The RTP sequence number SHOULD be generated by the 398 sender with a constant random offset. 400 Timestamp: Indicates the sampling instance of the first AU contained 401 in the RTP payload. This sampling instance is equivalent to the CTS 402 in the MPEG-4 time domain. The clock rate of the RTP time stamp MUST 403 be expressed as part of the RTPMAP. If an audio or video stream with 404 a fixed frame rate is transported, the rate SHOULD be set to the same 405 value as the sampling frequency of the audio or video frames (number 406 of samples per second). 407 In all cases, the sender SHALL make sure that RTP time stamps 408 are identical only if the RTP time stamp refers to fragments of the 409 same Access Unit. 410 According to RFC 1889 [2] (section 5.1), RTP timestamps are 411 recommended to start at a random value for security reasons. However, 412 then a receiver is, in the general case, not able to reconstruct the 413 original MPEG Time Stamps, which creates problems for applications 414 where streams from multiple sources are to be synchronized. To enable 415 synchronisation in such cases, for example between one stream from 416 local storage and another from an RTP streaming server, the applied 417 random offset MUST be provided out of band. Methods to convey the 418 applied random offset value are beyond the scope of this 419 specification. 421 SSRC: set as described in RFC 1889 [2]. 423 CC and CSRC fields are used as described in RFC 1889 [2].
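The Marker bit and Timestamp rules above can be illustrated for the
fragmentation case with a minimal sketch. The function name and the
max_payload parameter are hypothetical, and the sketch ignores the AU
Header Section and Auxiliary Section that a real payload may carry in
front of the Access Unit data.

```python
def fragment_au(au, max_payload, timestamp):
    """Split one Access Unit into (timestamp, marker, data) tuples.

    Every fragment carries the same RTP time stamp; the M bit is 1 only
    on the packet that holds the end of the Access Unit. An AU that
    fits in one packet yields a single tuple with M = 1.
    """
    if max_payload <= 0:
        raise ValueError("max_payload must be positive")
    if not au:
        raise ValueError("an Access Unit cannot be empty")
    chunks = [au[i:i + max_payload] for i in range(0, len(au), max_payload)]
    last = len(chunks) - 1
    return [(timestamp, 1 if i == last else 0, chunk)
            for i, chunk in enumerate(chunks)]


if __name__ == "__main__":
    for ts, marker, data in fragment_au(bytes(2500), 1000, 90000):
        print("timestamp=%d M=%d octets=%d" % (ts, marker, len(data)))
```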
425 RTCP SHOULD be used as defined in RFC 1889 [2]. 427 3.2 RTP Payload Structure 429 As already noted in section 2.9 of this document, this document uses 430 more general names to describe exactly the same payload structure as 431 defined in RFC XXXX. For mapping between section names in RFC XXXX and 432 in this document see section 2.9. 434 3.2.1 The AU Header Section 436 When present, the AU Header Section consists of the AU-headers-length 437 field, followed by a number of AU-headers. See figure 2. 439 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+- .. -+-+-+-+-+-+-+-+-+-+ 440 |AU-headers-length|AU-header|AU-header| |AU-header|padding| 441 | | (1) | (2) | | (n) | bits | 442 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+- .. -+-+-+-+-+-+-+-+-+-+ 444 Figure 2: The AU Header Section 446 The AU-headers are configured using format parameters and MAY be empty. 447 If the AU-header is configured empty, the AU-headers-length field 448 SHALL NOT be present and consequently the AU Header Section is empty. 449 If the AU-header is not configured empty, then the AU-headers-length 450 is a two-octet field that specifies the length in bits of the 451 immediately following AU-headers. 453 Each AU-header is associated with a single Access Unit (fragment) 454 contained in the Access Unit Data Section in the same RTP packet. For 455 each contained Access Unit (fragment) there is exactly one AU-header. 456 Within the AU Header Section, the AU-headers are bit-wise concatenated 457 in the order in which the Access Units are contained in the Access 458 Unit Data Section. Hence, the n-th AU-header refers to the n-th AU 459 (fragment). If the concatenated AU-headers consume a non-integer 460 number of octets, up to 7 zero-padding bits MUST be inserted at the end 461 in order to achieve byte-alignment of the AU Header Section. 463 3.2.1.1 The AU-header 465 The AU-header contains the fields given in figure 3.
The length in 466 bits of the above fields with the exception of the CTS-flag and 467 the DTS-flag fields is defined by format parameters; see section 4.1. 468 If a format parameter has the default value of zero, then the 469 associated field is not present. 471 +---------------------------------------+ 472 | AU-size | 473 +---------------------------------------+ 474 | AU-Index / AU-Index-delta | 475 +---------------------------------------+ 476 | CTS-flag | 477 +---------------------------------------+ 478 | CTS-delta | 479 +---------------------------------------+ 480 | DTS-flag | 481 +---------------------------------------+ 482 | DTS-delta | 483 +---------------------------------------+ 485 Figure 3: The fields in the AU-header. If used, the AU-Index field 486 only occurs in the first AU-header within an AU Header 487 Section; in any other AU-header the AU-Index-delta field 488 occurs instead. 490 AU-size: indicates the size in octets of the associated Access Unit 491 in the Access Unit Data Section in the same RTP packet. When the 492 AU-size is associated to an AU fragment, the AU size indicates 493 the size of the entire AU and not the size of the fragment. This 494 can be exploited to determine whether a packet contains an entire 495 AU or a fragment, which is particularly useful after losing a 496 packet carrying the last fragment of an AU. 498 AU-Index: indicates the serial number of the associated Access Unit 499 (fragment). For each (in time) consecutive AU or AU fragment, 500 the serial number is incremented with 1. When present, the 501 AU-Index field occurs in the first AU-header in the AU Header 502 Section, but MUST NOT occur in any subsequent (non-first) 503 AU-header in that Section. To encode the serial number in any 504 such non-first AU-header, the AU-Index-delta field is used. 
505 When each AU-Index field is coded with the value 0, the serial 506 number of the AU (fragment) is not specified and in that case 507 receivers MAY ignore the AU-Index field. 509 AU-Index-delta: The AU-Index-delta field is an unsigned integer 510 that specifies the serial number of the associated AU as the 511 difference with respect to the serial number of the previous 512 Access Unit. Hence, for the n-th (n>1) AU the serial number is 513 found from: 514 AU-Index(n) = AU-Index(n-1) + AU-Index-delta(n) + 1 515 If the AU-Index field is present in the first AU-header in 516 the AU Header Section, then the AU-Index-delta field MUST be 517 present in any subsequent (non-first) AU-header. When the 518 AU-Index-delta is coded with the value 0, it indicates that 519 the Access Units are consecutive in time. An AU-Index-delta 520 value larger than 0 signals that interleaving is applied. 522 CTS-flag: Indicates whether the CTS-delta field is present. 523 A value of 1 indicates that the field is present, a value of 0 524 that it is not present. 525 The CTS-flag field MUST be present in each AU-header if the 526 length of the CTS-delta field is signalled to be larger than 527 zero. In that case, the CTS-flag field MUST have the value 0 528 in the first AU-header and MAY have the value 1 in all non-first 529 AU-headers. The CTS-flag field SHOULD be 0 for any non-first 530 fragment of an Access Unit. 532 CTS-delta: Encodes the CTS by specifying the value of CTS as a 2's 533 complement offset (delta) from the timestamp in the RTP header 534 of this RTP packet. The CTS MUST use the same clock rate as the 535 time stamp in the RTP header. 537 DTS-flag: Indicates whether the DTS-delta field is present. A value 538 of 1 indicates that DTS-delta is present, a value of 0 that it 539 is not present. 540 The DTS-flag field MUST be present in each AU-header if the 541 length of the DTS-delta field is signalled to be larger than 542 zero. 
   The DTS-flag field SHOULD be 0 for any non-first fragment of an
   Access Unit.

DTS-delta: specifies the value of the DTS as a 2's complement offset
   (delta) from the CTS timestamp. The DTS MUST use the same clock
   rate as the time stamp in the RTP header.

If present, the fields MUST occur in the mutual order given in
figure 3. In the general case a receiver can only discover the size
of an AU-header by parsing it, since the presence of the CTS-delta
and DTS-delta fields is signalled by the value of the CTS-flag and
DTS-flag, respectively.

3.2.2 The Auxiliary Section

The Auxiliary Section consists of the auxiliary-data-size field
followed by the auxiliary-data field. Receivers MAY (but are not
required to) parse the auxiliary-data field; to facilitate skipping
of the auxiliary-data field by receivers, the auxiliary-data-size
field indicates the length in bits of the auxiliary-data. If the
concatenation of the auxiliary-data-size and the auxiliary-data
fields consumes a non-integer number of octets, up to 7 zero padding
bits MUST be inserted immediately after the auxiliary data in order
to achieve byte-alignment. See figure 4.

   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+- .. -+-+-+-+-+-+-+-+-+
   | auxiliary-data-size   | auxiliary-data       |padding bits |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+- .. -+-+-+-+-+-+-+-+-+

Figure 4: The fields in the Auxiliary Section

The length in bits of the auxiliary-data-size field is configurable
by a format parameter; see section 4.1. The default length of zero
indicates that the entire Auxiliary Section is absent.

auxiliary-data-size: specifies the length in bits of the immediately
   following auxiliary-data field.

auxiliary-data: contains the Remaining SL headers (RSLHs) as defined
   in RFC XXXX.
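
As an illustration of how a receiver might skip the Auxiliary Section,
the following Python sketch computes the number of octets the section
occupies, including any padding bits. It is not part of this
specification. The function and argument names are hypothetical, and
the sketch assumes that fields are read MSB-first (network bit order)
and that the section starts on an octet boundary.

```python
def read_bits(data: bytes, bit_offset: int, n_bits: int) -> int:
    """Read n_bits as an unsigned integer, MSB-first (assumed bit order)."""
    value = 0
    for pos in range(bit_offset, bit_offset + n_bits):
        value = (value << 1) | ((data[pos // 8] >> (7 - pos % 8)) & 1)
    return value


def auxiliary_section_octets(section: bytes, size_length: int) -> int:
    """Return the number of octets occupied by the Auxiliary Section.

    size_length is the configured length in bits of the
    auxiliary-data-size field (hypothetical argument name); the default
    of zero means that the whole section is absent.
    """
    if size_length == 0:
        return 0
    aux_bits = read_bits(section, 0, size_length)  # auxiliary-data-size
    total_bits = size_length + aux_bits
    # Round up to a whole octet: this accounts for up to 7 padding bits.
    return (total_bits + 7) // 8


# An 8-bit auxiliary-data-size of 10 bits: 8 + 10 = 18 bits, so 3 octets.
skip = auxiliary_section_octets(bytes([10, 0xFF, 0xC0]), size_length=8)
```

A receiver that does not parse the auxiliary data would advance its
read position by the returned number of octets to reach the Access
Unit Data Section.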

3.2.3 The Access Unit Data Section

The Access Unit Data Section contains an integer number of complete
Access Units or a single fragment of one AU. The Access Unit Data
Section is never empty. If data of more than one Access Unit is
contained, then the AUs are concatenated into a contiguous string of
octets. See figure 5. The AUs inside the Access Unit Data Section
MUST be in decoding order.

The size and number of Access Units SHOULD be adjusted such that the
resulting RTP packet is not larger than the path-MTU. To handle
larger packets, this payload format relies on lower layers for
fragmentation, which may not be desirable.

   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |AU(1)                                                          |
   +                                                               |
   |                                                               |
   |               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |               |AU(2)                                          |
   +-+-+-+-+-+-+-+-+                                               |
   |                                                               |
   |                               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                               | AU(n)                         |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |               |
   |-+-+-+-+-+-+-+-+

Figure 5: Access Unit Data Section; each AU is byte aligned.

When multiple Access Units are carried, the size of each AU MUST be
made available to the receiver. If the AU size is variable, then the
size of each AU MUST be indicated in the AU-size field of the
corresponding AU-header. However, if the AU size is constant for a
stream, this mechanism SHOULD NOT be used; instead the fixed size
SHOULD be signalled by the format parameter "ConstantSize", see
section 4.1.

The absence of both AU-size in the AU-header and the ConstantSize
format parameter indicates carriage of a single AU (fragment), i.e.
that a single Access Unit (fragment) is transported in each RTP
packet for that stream.

3.2.3.1 Fragmentation

A packet SHALL carry either one or more Access Units, or a single
fragment of an Access Unit.
Fragments of the same Access Unit have the same time stamp but
differing RTP sequence numbers. The marker bit in the RTP header is 1
on the last fragment of an Access Unit, and 0 on all other fragments.

3.2.3.2 Interleaving

Access Units MAY be interleaved. Senders MAY perform interleaving.
Receivers MUST support interleaving.

When interleaving of Access Units is used, it SHALL be implemented
using the AU-Index and AU-Index-delta fields in the AU-header.

Based on the RTP sequence number, the RTP time stamp, the AU-Index and
the AU-Index-delta, a receiver can unambiguously reconstruct the
original order even in case of out-of-order packets, packet loss or
duplication. Note that for this purpose the AU-Index is redundant when
the RTP time stamp and the AU-Index-delta values are sufficient for
placing the AUs correctly in time. In such cases receivers MAY ignore
the AU-Index value and senders MAY code the AU-Index field with the
value 0, but only if they code each AU-Index field with that value.

When interleaving is applied, a de-interleave buffer is needed in
receivers to put the Access Units in their correct logical consecutive
order in time. This requires the computation of the time stamp for
each Access Unit. In case of a fixed time duration per Access Unit,
the time-stamp of each Access Unit i in an RTP packet with RTP
time-stamp T is calculated as follows:

   Timestamp[0] = T
   Timestamp[i, i > 0] = T + (Sum(for k=1 to i of
                         (AU-Index-delta[k] + 1)))
                         * access-unit-duration

When AU-Index-delta is always 0, this reduces to
T + i * access-unit-duration. This is the non-interleaved case, in
which the frames are consecutive in time. Note that the AU-Index field
(present for the first Access Unit) is not needed in this calculation.
Hence in cases where the access-unit-duration has a fixed and known
value, the AU-Index does not need to provide index information and can
be coded with the value 0. See also the semantics of the AU-Index
field in 3.2.1.1.

When an RTP packet arrives (after any re-ordering has been done),
receivers may 'flush' all Access Units from the interleave buffer
which have a time-stamp strictly less than the time-stamp of the
arriving packet. Similarly, the first Access Unit of every arriving
packet can always be flushed (as no following packet can provide an
earlier Access Unit), together with any Access Units that are
consecutive with it and have already been received. Access Units
should also be flushed in time to be played; this can be important if
there is loss before end-of-stream, before a silence interval, or
before a large drop-out.

3.2.3.3 Constraints for interleaving

The size of the packets should be suitably chosen to be appropriate
to both the path MTU and the duration and capacity of the receiver's
de-interleave buffer. The maximum packet size for a session should be
chosen not to exceed the path MTU.

In order to control receiver latency and mitigate the effects of loss,
there are profile-based limits on the size of the packet. This is
expressed as a duration: it is calculated from the duration of the
Access Units contained within a packet. It is NOT the difference in
time-stamp between the first and last Access Unit in a packet.

No matter what interleaving scheme is used, the scheme must be
analyzed to calculate the minimum number of frames a receiver has to
buffer in order to de-interleave.
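
As an illustration of the time-stamp formula in 3.2.3.2 and of the
packet duration described above, the following Python sketch computes
both for one packet. It is not part of this specification. The
function names are hypothetical, and the access unit duration of 1024
RTP clock ticks is an assumed value chosen for the example.

```python
def au_timestamps(rtp_timestamp, index_deltas, au_duration):
    """Time-stamps of the Access Units in one packet.

    index_deltas holds AU-Index-delta[k] for the non-first AU-headers;
    au_duration is the fixed AU duration in RTP clock ticks.
    """
    stamps = [rtp_timestamp]
    offset = 0
    for delta in index_deltas:
        offset += delta + 1  # running sum of (AU-Index-delta[k] + 1)
        stamps.append(rtp_timestamp + offset * au_duration)
    return stamps


def packet_duration(stamps, au_duration):
    """Duration of the AUs contained in the packet, in clock ticks."""
    return len(stamps) * au_duration


def timestamp_span(stamps):
    """Difference in time-stamp between the first and last AU."""
    return stamps[-1] - stamps[0]


# Three AUs with AU-Index-delta 2 (frames 0, 3 and 6), assumed duration.
stamps = au_timestamps(0, [2, 2], au_duration=1024)
duration = packet_duration(stamps, 1024)
span = timestamp_span(stamps)
```

With these inputs the packet duration is 3072 ticks (three Access
Units), whereas the time-stamp span is 6144 ticks. Only the former is
the quantity that is subject to the profile-based limits.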

The maximum packet duration in milliseconds, and the maximum
de-interleave buffer required at the receiver, for the two profiles,
shall not exceed:

   RTP transport profile 0 -- 200 milliseconds
   RTP transport profile 1 -- 500 milliseconds

When interleaving is applied, the applied RTP transport profile MUST
be signalled by the profile parameter; see section 4.1.

Note that for low bit-rate material, the duration limit may make
packets shorter than the MTU size.

3.3 Usage of this specification

3.3.1 General

Usage of this specification requires definition of a mode. A mode
defines how to use this specification for transport of one or more
types of MPEG-4 streams. Each mode may specify constraints and payload
configurations as deemed appropriate.

Senders MUST signal the mode that they use by the format parameter
Mode. This document defines modes only for transport of MPEG-4 CELP
and AAC streams, but more modes are expected to be defined in future
RFCs.

3.3.2 Modes for MPEG-4 CELP and AAC streams

Four modes are defined for transport of MPEG-4 CELP and AAC streams.
In each of these modes, the same requirements apply for the rtpmap
attributes. The general form of an rtpmap attribute is:

   a=rtpmap:<payload type> <encoding name>/<clock rate>[/<encoding
   parameters>]

For audio streams, <encoding parameters> specifies the number of
audio channels. This parameter may be omitted if the number of
channels is one, provided no additional parameters are needed.
In all four modes, the following attributes are REQUIRED:
a) The encoding name
b) The RTP clock rate MUST be expressed. It is RECOMMENDED that this
   be the sampling rate of the audio, to give sample-accurate timing.
   However, other rates MAY be used (e.g. 90 kHz).
c) The number of audio channels MUST be specified, for example as 2
   for stereo material (see RFC 2327) and MAY be specified as 1 for
   mono material; 1 is the default.

3.3.3 Constant bit-rate CELP

This mode is signalled by mode=CELP-cbr. In this mode one or more
fixed size CELP frames can be transported in one RTP packet; there is
no support for interleaving. The RTP payload consists of one or more
concatenated CELP frames, each of the same size. Both the AU Header
Section and the Auxiliary Section are empty.

The format parameter ConstantSize MUST be provided to specify the
length of each CELP frame.

For an example see below.

   m=audio 49230 RTP/AVP 96
   a=rtpmap:96 mpeg4-generic/44100/2
   a=fmtp:96 streamtype=5; profile-level-id=15; mode=CELP-cbr; config=
   AudioSpecificConfig(); ConstantSize=xxx;

The AudioSpecificConfig() specifies that the audio stream type is CELP.

3.3.4 Variable bit-rate CELP

This mode is signalled by mode=CELP-vbr. In this mode one or more
variable size CELP frames can be transported in one RTP packet, with
optional interleaving. As the largest possible frame size in this mode
is greater than the maximum CELP frame size, there is no support for
fragmentation of CELP frames.

In this mode the RTP payload consists of the AU Header Section,
followed by one or more concatenated CELP frames. The Auxiliary Section
is empty. For each CELP frame contained in the payload there is a one
octet AU-header in the AU Header Section to provide:
(a) the size of each CELP frame in the payload and
(b) index information for computing the sequence (and hence timing) of
    each CELP frame.
Transport of CELP frames requires that the AU-size field is coded with
6 bits. In this mode therefore 6 bits are allocated to the AU-size
field, and 2 bits to the AU-Index(-delta) field. Each AU-Index field
MUST be coded with the value 0. In the AU Header Section, the
concatenated AU-headers are preceded by the 16-bit AU-headers-length
field, as specified in 3.2.1.
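
As an illustration of the payload layout described above, the following
Python sketch parses the AU Header Section of a CELP-vbr payload and
splits the remainder into frames. It is not part of this
specification. The function names are hypothetical, and the sketch
assumes MSB-first bit order and a payload that contains no CTS or DTS
fields and no Auxiliary Section, as is the case in this mode.

```python
def read_bits(data: bytes, bit_offset: int, n_bits: int) -> int:
    """Read n_bits as an unsigned integer, MSB-first (assumed bit order)."""
    value = 0
    for pos in range(bit_offset, bit_offset + n_bits):
        value = (value << 1) | ((data[pos // 8] >> (7 - pos % 8)) & 1)
    return value


def parse_au_header_section(payload, size_length=6, index_length=2,
                            index_delta_length=2):
    """Return ([(AU-size, AU-Index or AU-Index-delta), ...], octets used)."""
    headers_bits = int.from_bytes(payload[:2], "big")  # AU-headers-length
    pos, end = 16, 16 + headers_bits
    headers = []
    while pos < end:
        # The first AU-header carries AU-Index, the others AU-Index-delta.
        index_bits = index_length if not headers else index_delta_length
        if pos + size_length + index_bits > end:
            raise ValueError("AU-headers-length does not match header size")
        au_size = read_bits(payload, pos, size_length)
        index = read_bits(payload, pos + size_length, index_bits)
        headers.append((au_size, index))
        pos += size_length + index_bits
    return headers, 2 + (headers_bits + 7) // 8


def split_frames(payload, headers, data_offset):
    """Cut the Access Unit Data Section into frames using the AU-sizes."""
    frames = []
    for au_size, _ in headers:
        frames.append(payload[data_offset:data_offset + au_size])
        data_offset += au_size
    return frames


# Two one-octet AU-headers (sizes 3 and 2) followed by 5 octets of data.
payload = bytes([0x00, 0x10, 0x0C, 0x08]) + b"abcde"
headers, offset = parse_au_header_section(payload)
frames = split_frames(payload, headers, offset)
```

In this example the AU-headers-length field has the value 16 (two
headers of 8 bits each), so the frame data starts at octet 4 of the
payload.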

In addition to the required format parameters, the following
parameters MUST be present:
SizeLength, IndexLength, and IndexDeltaLength.
When interleaving is applied (AU-Index-delta coded with a value larger
than 0), the parameter Profile MUST also be present.

Example :

   m=audio 49230 RTP/AVP 96
   a=rtpmap:96 mpeg4-generic/44100/2
   a=fmtp:96 streamtype=5; profile-level-id=15; mode=CELP-vbr; config=
   AudioSpecificConfig(); SizeLength=6; IndexLength=2;
   IndexDeltaLength=2; Profile=1

The AudioSpecificConfig() specifies that the audio stream type is CELP.

3.3.5 Low bit-rate AAC

This mode is signalled by mode=AAC-lbr. This mode supports transport
of one or more variable size AAC frames with optional support for
interleaving and fragmenting. The maximum size of an AAC frame
(fragment) in this mode is 63 octets.

The payload configuration in this mode is the same as in the variable
bit-rate CELP mode as defined in 3.3.4. The RTP payload consists of the
AU Header Section, followed by concatenated AAC frames. The Auxiliary
Section is empty. For each AAC frame contained in the payload the one
octet AU-header provides:
(a) the size of each AAC frame in the payload and
(b) index information for computing the sequence (and hence timing) of
    each AAC frame.
In the AU-header, the AU-size is coded with 6 bits and the
AU-Index(-delta) with 2 bits; the AU-Index field MUST have the value 0
in each AU-header. In the AU Header Section, the concatenated
AU-headers are preceded by the 16-bit AU-headers-length field, as
specified in 3.2.1.

In addition to the required format parameters, the following
parameters MUST be present:
SizeLength, IndexLength, and IndexDeltaLength.
When interleaving is applied (AU-Index-delta coded with a value larger
than 0), the parameter Profile MUST also be present.

Example :

   m=audio 49230 RTP/AVP 96
   a=rtpmap:96 mpeg4-generic/44100/2
   a=fmtp:96 streamtype=5; profile-level-id=15; mode=AAC-lbr; config=
   AudioSpecificConfig(); SizeLength=6; IndexLength=2;
   IndexDeltaLength=2; Profile=1

The AudioSpecificConfig() specifies that the audio stream type is AAC.

3.3.6 High bit-rate AAC

This mode is signalled by mode=AAC-hbr. This mode supports transport
of one or more large variable size AAC frames in one RTP packet with
optional support for interleaving and fragmenting. The maximum size of
an AAC frame (fragment) in this mode is 8191 octets.

In this mode the RTP payload consists of the AU Header Section,
followed by one or more concatenated AAC frames. The Auxiliary Section
is empty. For each AAC frame contained in the payload there is an
AU-header in the AU Header Section to provide:
(a) the size of each AAC frame in the payload and
(b) index information for computing the sequence (and hence timing) of
    each AAC frame.
Coding the maximum size of an AAC frame requires 13 bits. Therefore in
this configuration 13 bits are allocated to the AU-size, and 3 bits
to the AU-Index(-delta) field. Thus each AU-header has a size of 2
octets. Each AU-Index field MUST be coded with the value 0. In the
AU Header Section, the concatenated AU-headers are preceded by the
16-bit AU-headers-length field, as specified in 3.2.1.

In addition to the required format parameters, the following
parameters MUST be present:
SizeLength, IndexLength, and IndexDeltaLength.
When interleaving is applied (AU-Index-delta coded with a value larger
than 0), the parameter Profile MUST also be present.
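
As an illustration of the two-octet AU-header described above for the
high bit-rate AAC mode, the following Python sketch builds the AU
Header Section on the sender side. It is not part of this
specification. The function name is hypothetical, and the sketch
assumes MSB-first bit order with the AU-size in the upper 13 bits of
each header.

```python
def build_hbr_header_section(frame_sizes, index_deltas=None):
    """Build the AU Header Section for a list of AAC frame sizes.

    index_deltas holds AU-Index-delta for the non-first AU-headers and
    defaults to all zeros (no interleaving). AU-Index is coded as 0.
    """
    if index_deltas is None:
        index_deltas = [0] * max(len(frame_sizes) - 1, 0)
    if frame_sizes and len(index_deltas) != len(frame_sizes) - 1:
        raise ValueError("need one AU-Index-delta per non-first AU-header")
    index_fields = [0] + list(index_deltas)  # first field is AU-Index = 0
    # AU-headers-length: the length in bits of all AU-headers together.
    section = bytearray((16 * len(frame_sizes)).to_bytes(2, "big"))
    for size, index in zip(frame_sizes, index_fields):
        if not 0 <= size <= 8191:
            raise ValueError("AU-size does not fit in 13 bits")
        if not 0 <= index <= 7:
            raise ValueError("AU-Index(-delta) does not fit in 3 bits")
        section += ((size << 3) | index).to_bytes(2, "big")
    return bytes(section)


# Two consecutive frames of 1000 and 200 octets.
section = build_hbr_header_section([1000, 200])
```

The resulting section is 6 octets long: the AU-headers-length field
with the value 32, followed by two AU-headers of 16 bits each.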

Example :

   m=audio 49230 RTP/AVP 96
   a=rtpmap:96 mpeg4-generic/44100/2
   a=fmtp:96 streamtype=5; profile-level-id=15; mode=AAC-hbr; config=
   AudioSpecificConfig(); SizeLength=13; IndexLength=3;
   IndexDeltaLength=3; Profile=1

The AudioSpecificConfig() specifies that the audio stream type is AAC.

4. IANA considerations

This payload format uses the same MIME types and names as defined in
RFC XXXX. However, some additional format parameters are defined.

Depending on the required payload configuration, format parameters may
need to be available to the receiver. This is done using the parameters
described in the next section. The absence of any of these parameters
is equivalent to the associated field set to its default value, which
is always zero. The absence of any such parameters resolves into a
default "basic" configuration.

MIME subtype name: mpeg4-generic

Required parameters:

StreamType:
   The integer value that indicates the type of MPEG-4 stream that is
   carried; its coding corresponds to the values of the streamType as
   defined for the DecoderConfigDescriptor in ISO/IEC 14496-1.

Profile-level-id:
   A decimal representation of the MPEG-4 Profile Level indication.
   This parameter MUST be used in the capability exchange or session
   set-up procedure to indicate the MPEG-4 Profile and Level
   combination of which the relevant MPEG-4 media codec is capable.
   For audio streams, this parameter is the decimal value from Table 5
   (audioProfileLevelIndicationValues) in ISO/IEC 14496-1, indicating
   which MPEG-4 Audio tool subsets are applied to encode the audio
   stream.
   For visual streams, this parameter is the decimal value from Table
   G-1 (FLC table for profile and level indication) of ISO/IEC
   14496-2, indicating which MPEG-4 Visual tool subsets are applied to
   encode the visual stream.

Config:
   A hexadecimal representation of an octet string that expresses the
   media payload configuration. Configuration data is mapped onto the
   octet string on an MSB-first basis. The first bit of the
   configuration data SHALL be located at the MSB of the first octet.
   In the last octet, if necessary to achieve byte alignment, up to
   7 zero-valued padding bits shall follow the configuration data.
   For audio streams, config is the audio object type specific decoder
   configuration data AudioSpecificConfig() as defined in ISO/IEC
   14496-3.
   For visual streams, config is the MPEG-4 Visual configuration
   information, as defined in subclause 6.2.1 Start codes of
   ISO/IEC 14496-2. The configuration information indicated by this
   parameter SHALL be the same as the configuration information in the
   corresponding MPEG-4 Visual stream, except for
   first-half-vbv-occupancy and latter-half-vbv-occupancy, if they
   exist, which may vary in the repeated configuration information
   inside an MPEG-4 Visual stream (see 6.2.1 Start codes of
   ISO/IEC 14496-2).

Optional parameters:

Mode:
   The mode in which this specification is used. The following modes
   can be signalled:
      mode=CELP-cbr,
      mode=CELP-vbr,
      mode=AAC-lbr and
      mode=AAC-hbr.
   Other modes are expected to be defined in future RFCs. When defining
   a new mode, care MUST be taken that an implementation of all
   features of this specification can decode the payload format
   corresponding to this new mode. For this reason a mode MUST NOT
   specify new default values for MIME parameters; in particular, MIME
   parameters MUST be present (unless they have the default value),
   even if this is redundant because the mode assigns fixed values. A
   mode may additionally define that some MIME parameters are required
   instead of optional, that some MIME parameters have fixed values (or
   ranges), and that there are rules restricting the usage.

ConstantSize:
   The constant size in octets of each Access Unit for this stream.
   Simultaneous presence of ConstantSize and the SizeLength
   parameters is not permitted.

SizeLength:
   The number of bits on which the AU-size field is encoded in the
   AU-header. Simultaneous presence of SizeLength and the ConstantSize
   parameter is not permitted.

IndexLength:
   The number of bits on which the AU-Index is encoded in the first
   AU-header. The default value of zero indicates the absence of the
   AU-Index and AU-Index-delta fields in each AU-header.

IndexDeltaLength:
   The number of bits on which the AU-Index-delta field is encoded in
   any non-first AU-header.

CTSDeltaLength:
   The number of bits on which the CTS-delta field is encoded in the
   AU-header.

DTSDeltaLength:
   The number of bits on which the DTS-delta field is encoded in the
   AU-header.

AuxiliaryDataSizeLength:
   The number of bits that is used to encode the auxiliary-data-size
   field.

Profile:
   The decimal representation of the RTP transport profile.

Applications MAY use more parameters, in addition to those defined
above. Receivers MUST tolerate the presence of such additional
parameters, but these parameters SHALL NOT impact the decoding of
receivers that comply with this specification.

Encoding considerations:
   System bitstreams MUST be generated according to MPEG-4 System
   specifications (ISO/IEC 14496-1). Video bitstreams MUST be generated
   according to MPEG-4 Visual specifications (ISO/IEC 14496-2). Audio
   bitstreams MUST be generated according to MPEG-4 Audio
   specifications (ISO/IEC 14496-3). The RTP packets MUST be packetized
   according to the RTP payload format defined in RFC .

Security considerations:
   As in RFC .

Interoperability considerations:
   MPEG-4 provides a large and rich set of tools for the coding of
   visual objects.
   For effective implementation of the standard, subsets of the MPEG-4
   tool sets have been provided for use in specific applications.
   These subsets, called 'Profiles', limit the size of the tool set a
   decoder is required to implement. In order to restrict
   computational complexity, one or more 'Levels' are set for each
   Profile. A Profile@Level combination allows:
   . a codec builder to implement only the subset of the standard that
     is needed, while maintaining interworking with other MPEG-4
     devices included in the same combination, and
   . checking whether MPEG-4 devices comply with the standard
     ('conformance testing').
   A stream SHALL be compliant with the MPEG-4 Profile@Level specified
   by the parameter "profile-level-id". Interoperability between a
   sender and a receiver may be achieved by specifying the parameter
   "profile-level-id" in MIME content, or by arranging in the
   capability exchange/announcement procedure to set this parameter
   mutually to the same value.

Published specification:
   The specifications for MPEG-4 streams are presented in ISO/IEC
   14496-1, 14496-2, and 14496-3. The RTP payload format is described
   in RFC .

Applications which use this media type:
   Multimedia streaming and conferencing tools, Internet messaging and
   Email applications.

Additional information: none

Magic number(s): none

File extension(s):
   None. A file format with the extension .mp4 has been defined for
   MPEG-4 content but is not directly correlated with this MIME type,
   whose sole purpose is RTP transport.

Macintosh File Type Code(s): none

Person & email address to contact for further information:
   Authors of RFC .

Intended usage: COMMON

Author/Change controller:
   Authors of RFC .
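
As an illustration of the parameter rules in this section (absent
parameters default to zero, and ConstantSize and SizeLength are
mutually exclusive), the following Python sketch reads a
semicolon-separated list of parameter=value pairs. It is not part of
this specification. The function name is hypothetical, and matching
parameter names without regard to case is an assumption made here
because the examples in 3.3 and the definitions above differ in
capitalisation.

```python
LENGTH_PARAMETERS = (
    "sizelength", "indexlength", "indexdeltalength",
    "ctsdeltalength", "dtsdeltalength", "auxiliarydatasizelength",
)


def parse_format_parameters(parameter_string):
    """Parse 'name=value; name=value' into a payload configuration."""
    raw = {}
    for item in parameter_string.split(";"):
        item = item.strip()
        if not item:
            continue  # tolerate a trailing semicolon
        name, _, value = item.partition("=")
        raw[name.strip().lower()] = value.strip()
    if "constantsize" in raw and "sizelength" in raw:
        raise ValueError("ConstantSize and SizeLength must not both occur")
    # An absent length parameter is equivalent to the default of zero.
    config = {name: int(raw.get(name, 0)) for name in LENGTH_PARAMETERS}
    # Zero is used here to mean that ConstantSize is absent (assumption).
    config["constantsize"] = int(raw.get("constantsize", 0))
    config["mode"] = raw.get("mode")
    return config


config = parse_format_parameters(
    "streamtype=5; profile-level-id=15; mode=AAC-hbr; "
    "SizeLength=13; IndexLength=3; IndexDeltaLength=3; Profile=1"
)
```

Parameters that the sketch does not know, such as Profile here, are
simply ignored, in line with the requirement that receivers tolerate
additional parameters.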

4.2 Concatenation of parameters

Multiple parameters SHOULD be expressed as a MIME media type string,
in the form of a semicolon-separated list of parameter=value pairs
(for parameter usage examples see Appendix A).

4.3 Usage of SDP

4.3.1 The a=fmtp keyword

It is assumed that one typical way to transport the above-described
parameters associated with this payload format is via an SDP message
[7], for example transported to the client in reply to an RTSP
DESCRIBE or via SAP. In that case the (a=fmtp) keyword MUST be used as
described in RFC 2327 [7, section 6]. The syntax is then:

   a=fmtp:<format> <parameter name>=<value>[; <parameter name>=<value>]

5. Security Considerations

No additional security considerations apply beyond those discussed in
RFC 1889 and RFC XXXX.

6. Acknowledgements

This document evolved through several revisions thanks to contributions
from people from the ISMA forum, from the IETF AVT working group and
the 4-on-IP ad-hoc group within MPEG. The authors wish to thank all
involved people, and in particular Colin Perkins, Stephan Wenger and
Dorairaj V for their valuable comments and support.

7. References

[1] ISO/IEC International Standard 14496 (MPEG-4), "Information
    technology - Coding of audio-visual objects", January 2000.

[2] Schulzrinne, Casner, Frederick, Jacobson, "RTP: A Transport
    Protocol for Real Time Applications", RFC 1889, Internet
    Engineering Task Force, January 1996.

[3] S. Bradner, "Key words for use in RFCs to Indicate Requirement
    Levels", RFC 2119, March 1997.

[4] D. Hoffman, G. Fernando, V. Goyal, M. Civanlar, "RTP payload
    format for MPEG1/MPEG2 Video", RFC 2250, January 1998.

[5] Y. Kikuchi, T. Nomura, S. Fukunaga, Y. Matsui, H. Kimata, "RTP
    payload format for MPEG-4 Audio/Visual streams", RFC 3016.

[6] Avaro, Basso, Casner, Civanlar, Gentric, Herpel, Lim, Perkins,
    van der Meer, "RTP payload format for MPEG-4 streams", work in
    progress, draft-gentric-avt-mpeg4-multiSL-01.txt, January 2001.

[7] Handley, Jacobson, "SDP: Session Description Protocol", RFC 2327,
    Internet Engineering Task Force, April 1998.

8. Authors' Addresses

Jan van der Meer
Philips Digital Networks
Cederlaan 4
5600 JB Eindhoven
Netherlands
Email: jan.vandermeer@philips.com

David Mackie
Cisco Systems Inc.
170 West Tasman Dr.
San Jose, CA 95034
Email: dmackie@cisco.com

Viswanathan Swaminathan
Sun Microsystems Inc.
901 San Antonio Road, M/S UMPK15-214
Palo Alto, CA 94303
Email: viswanathan.swaminathan@sun.com

David Singer
Apple Computer, Inc.
One Infinite Loop, MS:302-3MT
Cupertino CA 95014
Email: singer@apple.com

Full Copyright Statement

"Copyright (C) The Internet Society (date). All Rights Reserved. This
document and translations of it may be copied and furnished to others,
and derivative works that comment on or otherwise explain it or assist
in its implementation may be prepared, copied, published and
distributed, in whole or in part, without restriction of any kind,
provided that the above copyright notice and this paragraph are
included on all such copies and derivative works. However, this
document itself may not be modified in any way, such as by removing
the copyright notice or references to the Internet Society or other
Internet organizations, except as needed for the purpose of developing
Internet standards in which case the procedures for copyrights defined
in the Internet Standards process MUST be followed, or as required to
translate it into.

APPENDIX: Usage of this payload format

Appendix A. Examples

A.1 Examples of delay analysis with interleave

A.1.1 Group interleave

An example of regular interleave is when packets are formed into
groups. If the number of packets in a group is N, packet 0 contains
frame 0, frame N, frame 2N, and so on; packet 1 contains frame 1,
frame 1+N, frame 1+2N, and so on. The AU-Index field is used to
document the sequence of the packet within the group (or the first
frame in the packet, which is the same thing in this scheme), and all
the AU-Index-delta fields contain N-1.

Receivers can tell when a new interleave group is starting, by noting
that the computed time-stamp of the first frame in a packet is later
than any previously computed time-stamp. This is because no
following packet can contain an earlier RTP timestamp (RTP rules),
and the second and subsequent frames in a packet have larger
time-stamps (the frames in a packet are also in time-order).

If the group size is 3, then packets are formed as follows:

   Packet  Time-stamp  Frame Numbers  AU-Index, AU-Index-delta
   0       T[0]        0, 3, 6        0, 2, 2
   1       T[1]        1, 4, 7        0, 2, 2
   2       T[2]        2, 5, 8        0, 2, 2
   3       T[9]        9,12,15        0, 2, 2

In this case, the receiver would have to buffer at least 4 frames
from packets 0 and 1, and can flush all frames when packet 2 arrives.
(Frame 0 can be flushed as packet 0 arrives, since it is the earliest
frame we hold, and likewise frame 1 from packet 1; we are therefore
holding 3,4,6,7 until packet 2 arrives).

If there is loss, then the receiver may wait longer than is strictly
necessary before it emits frames. For example, say packet 1 is lost
from the above example. Packet 0 allows frame 0 to be emitted, and
then packet 2 arrives, allowing us to notice the loss of frame 1, and
emit frames 2 and 3.
Then it is not until the arrival of packet 3 (which has a time-stamp
beyond the times of all the frames seen so far), that we can finish
dealing with the loss, even though the first group has, in fact,
ended. (This is in contrast to schemes which signal the group size
explicitly; if the receiver knows that this is packet 3 of 3, then
even if 2 of 3 is missing, it can de-interleave this group without
waiting for the next one to start).

In the above example the AU-Index is coded with the value 0, as
required for the modes defined in this document. To reconstruct the
original order, the RTP time stamp and the AU-Index-delta are used.
See also 3.2.3.2.

A.1.2 Continuous interleave

In continuous interleave, once the scheme is 'primed', the number of
frames in a packet exceeds the 'stride' (the distance between them).
This shortens the buffering needed, smooths the data-flow, and gives
slightly larger packets -- and thus lower overhead -- for the same
interleave. For example, here is a continuous interleave also over a
stride of 3 frames, but with 4 frames per packet, for a run of 20
frames. This shows both how the scheme 'starts up' and how it
finishes.

   Packet  Time-stamp  Frame Numbers  AU-Index, AU-Index-delta
   0       T[0]                 0              0
   1       T[1]              1  4           0  2
   2       T[2]           2  5  8        0  2  2
   3       T[3]        3  6  9 12     0  2  2  2
   4       T[7]        7 10 13 16     0  2  2  2
   5       T[11]      11 14 17 20     0  2  2  2
   6       T[15]      15 18           0  2
   7       T[19]      19              0

In this case, the receiver has to buffer only 3 frames, not 4. Say
we are waiting for packet 4. We can flush frames 0, 1, 2, 3, 4, 5,
6; we are therefore holding 8, 9, 12. Packet 4 arrives, allowing
us to emit 7,8,9,10, and we are holding 12,13,16. Each arriving
packet contains 4 frames, and allows 4 frames to be flushed.

In the above example the AU-Index is coded with the value 0, as
required for the modes defined in this document.
To reconstruct the original order, the RTP time stamp and the
AU-Index-delta are used. See also 3.2.3.2.

If there is loss, again the receiver has to wait to emit the erasure
frames. In this case, say packet 3 is lost. We were holding frames
4, 5, and 8. On the arrival of packet 4 (time-stamp of frame 7), we
now know frame 3 was lost, we can emit frames 4 and 5, and we know 6
must be lost, and emit 7, which is in the packet that arrived. Then on
the arrival of packet 5 (time-stamp 11) we can emit 8, indicate loss
of 9, and emit 10 and 11. Finally, the arrival of packet 6
(time-stamp 15) indicates that 12 must be lost; we have now detected
all the lost frames.
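
As an illustration of the delay analysis in A.1, the following Python
sketch replays both interleave examples without loss and counts how
many frames a receiver holds after applying the flush rules of
3.2.3.2. It is not part of this specification. The function names are
hypothetical, and frame numbers stand in for time-stamps on the
assumption of a fixed duration per frame.

```python
def frames_in_packet(first_frame, index_deltas):
    """Frame numbers in a packet, derived from the AU-Index-delta values."""
    frames = [first_frame]
    for delta in index_deltas:
        frames.append(frames[-1] + delta + 1)
    return frames


def max_frames_held(packets):
    """Largest number of frames held after flushing, over all arrivals.

    packets is a list of frame-number lists in arrival order (no loss).
    """
    held = set()
    peak = 0
    for frames in packets:
        held.update(frames)
        first = frames[0]
        # Flush every frame earlier than the first frame of this packet.
        held = {frame for frame in held if frame >= first}
        # Flush the first frame and any frames consecutive with it.
        next_frame = first
        while next_frame in held:
            held.remove(next_frame)
            next_frame += 1
        peak = max(peak, len(held))
    return peak


# Group interleave of A.1.1: group size 3, AU-Index-delta 2.
group = [frames_in_packet(first, [2, 2]) for first in (0, 1, 2, 9)]

# Continuous interleave of A.1.2: stride 3, up to 4 frames per packet.
continuous = [
    frames_in_packet(0, []), frames_in_packet(1, [2]),
    frames_in_packet(2, [2, 2]), frames_in_packet(3, [2, 2, 2]),
    frames_in_packet(7, [2, 2, 2]), frames_in_packet(11, [2, 2, 2]),
    frames_in_packet(15, [2]), frames_in_packet(19, []),
]

group_peak = max_frames_held(group)
continuous_peak = max_frames_held(continuous)
```

Under these assumptions the sketch yields 4 held frames for the group
interleave and 3 for the continuous interleave, in agreement with the
analysis in A.1.1 and A.1.2.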