   Internet Engineering Task Force                  Basso-AT&T
   Internet Draft                                   Civanlar-AT&T
                                                    Gentric-Philips
                                                    Herpel-Thomson
                                                    Lifshitz-Optibase
                                                    Lim-mp4cast
                                                    Perkins-ISI
                                                    Van Der Meer-Philips
                                                    February 2002
                                                    Expires August 2002
   Document: draft-ietf-avt-mpeg4-multisl-04.txt

                 RTP Payload Format for MPEG-4 Streams

   Status of this Memo

   This document is an Internet-Draft and is in full conformance with
   all provisions of Section 10 of RFC2026.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups. Note that
   other groups may also distribute working documents as
   Internet-Drafts. Internet-Drafts are draft documents valid for a
   maximum of six months and may be updated, replaced, or obsoleted by
   other documents at any time. It is inappropriate to use
   Internet-Drafts as reference material or to cite them other than as
   "work in progress."

   This specification is a product of the Audio/Video Transport working
   group within the Internet Engineering Task Force and ISO/IEC MPEG-4
   ad hoc group on MPEG-4 over Internet. Comments are solicited and
   should be addressed to the working group's mailing list at
   avt@ietf.org and/or the authors.

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt
   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html.

   <<
   Note for the RFC editor:
   XXXX should be replaced with this RFC number and YYYY replaced by
   the number given to the companion RFC, whose draft is
   draft-ietf-avt-mpeg4-simple-**.txt.
   This document also contains a MIME type registration form that is
   intended to be taken as-is and therefore makes reference to this
   document, using the temporary placeholder: XXXX.
   >>

   Gentric et al.            Expires August 2002                    1
   RTP Payload Format for MPEG-4 Streams             February 2002

   Abstract

   This document describes a payload format for transporting MPEG-4
   encoded data using RTP. MPEG-4 is a recent standard from ISO/IEC for
   the coding of natural and synthetic audio-visual data. Several
   services provided by RTP are beneficial for MPEG-4 encoded data
   transport over the Internet. Additionally, the use of RTP makes it
   possible to synchronize MPEG-4 data with other real-time data types.

   Table of Contents

   1. Introduction
   1.1 Overview of MPEG-4 End-System Architecture
   1.2 The simplified MPEG-4 terminal model
   1.3 The complete MPEG-4 terminal model
   1.3.1 The Sync Layer and DMIF
   2. Analysis of the carriage of MPEG-4 over IP
   2.1 The Sync Layer point of view
   2.2 The Elementary Stream point of view
   2.3 How the two views reconcile
   2.4 Rationale for features
   2.5 Relation with RFC 3016
   3. Payload format
   3.1 RTP Header Fields Usage
   3.2 RTP payload structure
   3.3 Payload Header Section structure
   3.3.1 Payload Header structure
   3.3.2 Fields of a Payload Header
   3.4 RSLHSection structure
   3.4.1 RSLH structure
   3.4.2 Removal of fields
   3.4.3 Mapping of OCR
   3.4.4 Degradation Priority
   3.5 Payload Section structure
   3.6 Interleaving
   3.6.1 Time stamp based interleaving (TSBI)
   3.6.2 Index based interleaving (IBI)
   3.6.3 SL streams that should not be interleaved
   3.7 Fragmentation Rules
   4. Types and names
   4.1 MIME type registration
   4.2 Concatenation of parameters
   4.3 Usage of SDP
   4.3.1 The a=fmtp keyword
   4.3.2 SDP example
   5. IANA considerations
   6. Other issues
   6.1 SL-packetized stream reconstruction
   6.2 Handling of scene description streams
   6.3 Overlap with RFC 3016
   6.4 Multiplexing
   7. Security considerations
   8. Acknowledgements
   9. References
   10. Authors' addresses
   APPENDIX: Examples of usage
   Appendix.1 RFC 3016 compatible MPEG-4 Video (no SL)
   Appendix.2 MPEG-4 Video with SL
   Appendix.3 Low delay MPEG-4 Audio (no SL)
   Appendix.4 Media delivery MPEG-4 Audio (no SL)
   Appendix.5 AAC with interleaving (no SL)
   Appendix.6 AAC with Index-based interleaving and SL

   1. Introduction

   MPEG-4 is a recent standard from ISO/IEC for the coding of natural
   and synthetic audio-visual data in the form of audiovisual objects
   that are arranged into an audiovisual scene by means of a scene
   description [1][2][3][4]. This draft specifies an RTP [5] payload
   format for transporting MPEG-4 encoded data streams. It supplements
   RFC 3016 in the respect that it can transport all MPEG-4 stream
   types while being compatible with RFC 3016 for the transport of
   MPEG-4 video.

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in
   this document are to be interpreted as described in RFC 2119 [6].

   The benefits of using RTP for MPEG-4 data stream transport include:

   i. Ability to synchronize MPEG-4 streams with other RTP payloads;
   one example is the transport and synchronization of MPEG-4 video
   associated with AMR audio in mobile networks.

   ii. Monitoring MPEG-4 delivery performance through RTCP.

   iii. Combining MPEG-4 and other real-time data streams received from
   multiple end-systems into a set of consolidated streams through RTP
   mixers.

   iv. Converting data types, etc. through the use of RTP translators.

   1.1 Overview of MPEG-4 End-System Architecture

   Two types of terminals can use this specification. One case is a
   complete MPEG-4 terminal, i.e. a terminal implementing the MPEG-4
   system [1] specification and possibly also MPEG-4 video [2] and
   audio [3]. Another possibility is a terminal implementing only a
   part of this set of MPEG-4 specifications; one example is a terminal
   using MPEG-4 video [2] but not MPEG-4 systems, as in RFC 3016.

   This document is structured so as to be understandable from both
   points of view (with or without MPEG-4 systems). The target is also
   that services deployed for one type of terminal can be adapted for
   the other type with only a minor change in the session description
   because the media formats are the same. Another key assumption is
   that the properties of streams of various types (video, audio, scene
   description) can be described with the same Elementary Stream model
   so that this same payload format can transport any MPEG-4 stream.

   1.2 The simplified MPEG-4 terminal model

   In the simplified MPEG-4 model MPEG-4 systems [1] is not used.
   However the concept of Elementary Stream remains, by MPEG
   definition: "A consecutive flow of mono-media data from a single
   source entity to a single destination entity on the compression
   layer". Indeed both MPEG-4 video [2] and MPEG-4 audio [3] documents
   describe how respectively video and audio bit streams are fragmented
   into pieces that are called Access Units, again by MPEG definition:
   "An individually accessible portion of data within an Elementary
   Stream. An access unit is the smallest data entity to which timing
   information can be attributed". Each Access Unit has by this
   definition a number of media independent basic properties:
   . Composition time stamp (CTS)
   . Framing
   . Possibly decoding time stamp (DTS)

   Furthermore both the video [2] and audio [3] specifications also
   define how Access Units (AU) shall themselves be fragmented, since
   in the spirit of Application Level Framing AUs should be fragmented
   in such a way that decoders can process the packets arriving
   immediately after a packet loss. In this case the signaling of
   Access Unit fragment boundaries is also required.

   In order to be understandable from this point of view this payload
   format is described in terms of Access Units (AU) and Access Unit
   fragments. This specification does not make reference to media
   specific properties (with a few exceptions). Indeed it is the
   purpose of this specification to provide RTP transport for all media
   types in MPEG-4 in a generic fashion.

   In this mode of operation the RTP framework is used for transport of
   timing and synchronization and protocols such as H.323, SIP, RTSP,
   etc. can be used for control.

   1.3 The complete MPEG-4 terminal model

   Fig. 1 below shows the layered architecture of a terminal that
   implements the complete MPEG-4 systems model. The Compression Layer
   processes individual audio-visual media streams.
   The MPEG-4 compression schemes are defined in the ISO/IEC
   specifications 14496-2 [2] and 14496-3 [3]. The compression schemes
   in MPEG-4 achieve efficient encoding over a bandwidth ranging from a
   few kbps to many Mbps. The audio-visual content compressed by this
   layer is organized into Elementary Streams (ESs).

   The MPEG-4 standard specifies MPEG-4 compliant streams. Within the
   constraint of this compliance the compression layer is unaware of a
   specific delivery technology, but it can be made to react to the
   characteristics of a particular delivery layer such as the path-MTU
   or loss characteristics. Also, some compressors can be designed to
   be delivery specific for implementation efficiency. In such cases
   the compressor may work in a non-optimal fashion with delivery
   technologies that are different than the one it is specifically
   designed to operate with.

   The hierarchical relations, location and properties of ESs in a
   presentation are described by a dynamic set of Object Descriptors
   (ODs). Each OD groups one or more ES Descriptors referring to a
   single content item (audio-visual object). Hence, multiple
   alternative or hierarchical representations of each content item are
   possible.

   ODs are themselves conveyed through one or more ESs. A complete set
   of ODs can be seen as an MPEG-4 resource or session description at a
   stream level. The resource description may itself be hierarchical,
   i.e. an ES conveying an OD may describe other ESs conveying other
   ODs.

   The session description is accompanied by a dynamic scene
   description, Binary Format for Scene (BIFS), again conveyed through
   one or more ESs. At this level, content is identified in terms of
   audio-visual objects. The spatio-temporal location of each object is
   defined by BIFS.
   The audio-visual content of those objects that are synthetic and
   static is also described by BIFS. Natural and animated synthetic
   objects may refer to an OD that points to one or more ESs that
   carry the coded representation of the object or its animation data.

   media aware       +-------------------------------------------+
   delivery unaware  |             COMPRESSION LAYER             |
   14496-2 Visual    | streams from as low as Kbps to multi-Mbps |
   14496-3 Audio     +-------------------------------------------+

                                                      Elementary
                                                      Stream
   ===================================================Interface

                                                      (ESI)
                     +-------------------------------------------+
   media and         |                SYNC LAYER                 |
   delivery unaware  | manages elementary streams, their synch-  |
   14496-1 Systems   | ronization and hierarchical relations     |
                     +-------------------------------------------+

                                                      DMIF
                                                      Application
   ===================================================Interface

                                                      (DAI)
                     +-------------------------------------------+
   delivery aware    |              DELIVERY LAYER               |
   media unaware     |provides transparent access to and delivery|
   14496-6 DMIF      |    of content irrespective of delivery    |
                     |                technology                 |
                     +-------------------------------------------+

   Figure 1: Conceptual MPEG-4 terminal architecture

   By conveying the session (or resource) description as well as the
   scene (or content composition) description through their own ESs, it
   becomes possible to change portions of the content composition and
   the number and properties of media streams that carry the
   audio-visual content separately and dynamically at well known
   instants in time.

   One or more initial Scene Description streams and the corresponding
   OD stream are pointed to by an initial object descriptor (IOD).
   In this context the IOD needs to be made available to the receivers
   through some out-of-band means that are out of scope of this payload
   specification. However in the context of transport on IP networks it
   is defined in a separate document [9].

   The Compression Layer organizes the ESs in Access Units (AU), the
   smallest elements that can be attributed individual timestamps. The
   Access Unit concept defines the boundary between media specific
   processing and delivery specific processing. That is to say,
   transport should not depend on the nature of the media data but only
   on AU properties.

   1.3.1 The Sync Layer and DMIF

   The Sync Layer (SL), which primarily provides the synchronization
   between streams, defines a homogeneous encapsulation of ESs carrying
   media or control data (ODs, BIFS). Integer or fractional AUs are
   then encapsulated in SL packets.

   All consecutive data from one stream is called an SL-packetized
   stream. The interface between the compression layer and the SL is
   called the Elementary Stream Interface (ESI). The ESI is
   informative, i.e. it is extremely useful in order to define concepts
   and mechanisms but does not have to be implemented.

   The Delivery Layer in MPEG-4 consists of the Delivery Multimedia
   Integration Framework defined in ISO/IEC 14496-6 [4]. This layer is
   media unaware but delivery technology aware. It provides transparent
   access to and delivery of content irrespective of the technologies
   used. The interface between the SL and DMIF is called the DMIF
   Application Interface (DAI). It offers content location independent
   procedures for establishing MPEG-4 sessions and access to transport
   channels. This payload format can be used as an instance of the
   MPEG-4 Delivery Layer but is otherwise not tied to DMIF.

   The ESs from the encoders are fed into the SL with indications of AU
   boundaries, random access points, desired composition time and the
   current time. The Sync Layer fragments the ESs into SL packets, each
   containing a header that encodes information conveyed through the
   ESI. If the AU is larger than an SL packet, subsequent packets
   containing remaining parts of the AU are generated with subset
   headers until the complete AU is packetized. One SL packet describes
   an Access Unit or a fragment thereof: the SL packet header contains
   extended timing and framing information; the SL packet payload
   contains the bit stream frame (AU) or fragment. For the complete
   list of features of the Sync Layer refer to the MPEG-4 systems
   specification [1]. The syntax of the Sync Layer is configurable and
   can be adapted to the needs of the stream to be transported. This
   includes the possibility to select the presence or absence of
   individual syntax elements as well as configuration of their length
   in bits. The configuration for each individual stream is conveyed in
   an SLConfigDescriptor, which is an integral part of the ES
   Descriptor for this stream. The MPEG-4 SLConfigDescriptor, being
   configuration information, is not carried by the media stream itself
   but is rather transported via an ObjectDescriptor Stream encoded
   using the MPEG-4 Object Description framework. This can be done in a
   separate stream using this payload format (see section 6.2 for
   details). The SLConfigDescriptor MAY also be transported by other
   means (for example as a MIME parameter, see section 4.1).

   It is important to note that this draft could just as well have been
   written entirely in terms of SL packets instead of Access Units and
   Access Unit fragments. However this could have created confusion for
   implementers who only need basic properties and do not want to cope
   with the additional complexity of the Sync Layer. Instead this
   specification refers to the Sync Layer only when needed.

   2. Analysis of the carriage of MPEG-4 over IP

   As explained above, when transporting MPEG-4 audio and video,
   applications may or may not require the use of MPEG-4 systems. To
   achieve the highest level of interoperability between all MPEG-4
   applications, it is desirable that (a) in both cases the same MPEG-4
   transport format can be used and that (b) receivers that have no
   MPEG-4 systems knowledge can easily skip the MPEG-4 systems specific
   information, if any.

   An example of an application not requiring MPEG-4 systems is
   audio/video streaming from a single source. Examples of applications
   that would benefit from MPEG-4 systems features are:
   . Audio/video streaming mixing RTP and non-RTP sources (e.g. local
   storage in the .mp4 interchange format)
   . Rich multimedia applications including 2D, 2.5D or 3D interactive
   scenes with multiple graphical/audio/video objects and/or a
   composition variable in time and/or according to a server-push
   and/or server-pull model.
   . Applications involving Digital Rights Management for some or all
   parts/streams in the content
   . Applications involving the use of advanced meta-data and the
   associated content management features as provided by the MPEG suite
   of relevant standards (MPEG-7 and MPEG-21).

   2.1 The Sync Layer point of view

   RTP is well suited to transporting MPEG-4 audio and MPEG-4 video,
   but when using MPEG-4 systems a problem arises from the fact that
   both RTP and MPEG-4 systems contain a synchronization layer.
   In particular, the RTP header duplicates some of the information
   provided in SL packet headers such as the composition timestamps
   (CTS) and Access Unit boundaries.

   To avoid unnecessary overhead and potential interoperability risks
   when transporting MPEG-4 systems, it is desirable to remove the
   redundancy between the SL packet header and the RTP packet header.
   To be independent of the use of MPEG-4 systems, synchronization can
   rely on the parameters provided in the RTP header. Another desired
   property is to have compatibility with RFC 3016 for MPEG-4 video
   transport.

   This is achieved in the following fashion (also depicted in figure
   5): When SL headers are used, the redundant fields are removed from
   the SL header. The remaining information from the SL header, if
   any, is contained inside the RTP packet payload, together with the
   SL packet payload. Some of this information is also useful for
   transport over RTP when MPEG-4 systems is not used. For that
   reason this information is split into "general useful information"
   and "MPEG-4 systems only information". The "general useful
   information", hereinafter called Payload Header, is carried by a
   number of fields configurable using parameters defined in section
   4.1; all receivers MUST parse these fields. The "MPEG-4 systems only
   information", if any, is contained in an auxiliary header,
   hereinafter called Remaining SL Packet Header (RSLH), also
   configured using parameters (see section 4.1) and preceded by a
   length field, so that non-MPEG-4-system devices MAY skip this
   information.
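
   As an illustration of the split described above, the following
   Python sketch shows how a receiver might separate the three parts
   of an RTP payload and skip the RSLH when it has no MPEG-4 systems
   support. It is a simplified sketch, not the normative layout: the
   function and field names are hypothetical, the Payload Header is
   assumed to have a fixed byte length known from out-of-band
   configuration, and the RSLH length field is assumed to be a single
   byte counting whole bytes. The actual field sizes are configured
   through the parameters of section 4.1.

```python
from typing import NamedTuple, Optional


class SplitPayload(NamedTuple):
    """Hypothetical container for the three parts of an RTP payload."""
    payload_header: bytes      # "general useful information"
    rslh: Optional[bytes]      # "MPEG-4 systems only information", or None
    media: bytes               # AU or AU fragment (SL packet payload)


def split_rtp_payload(payload: bytes, payload_header_len: int,
                      has_rslh: bool, parse_rslh: bool = False) -> SplitPayload:
    """Split an RTP payload into Payload Header, RSLH and media data.

    Assumptions (not taken from the specification): byte-aligned fields,
    a Payload Header of payload_header_len bytes, and an RSLH preceded
    by a one-byte length field.
    """
    if payload_header_len > len(payload):
        raise ValueError("payload shorter than configured Payload Header")
    header = payload[:payload_header_len]
    offset = payload_header_len

    rslh = None
    if has_rslh:
        if offset >= len(payload):
            raise ValueError("missing RSLH length field")
        rslh_len = payload[offset]
        offset += 1
        if offset + rslh_len > len(payload):
            raise ValueError("RSLH length exceeds payload")
        if parse_rslh:
            # Only a systems-aware receiver keeps the RSLH bytes.
            rslh = payload[offset:offset + rslh_len]
        # A receiver without MPEG-4 systems simply jumps over the RSLH.
        offset += rslh_len

    return SplitPayload(header, rslh, payload[offset:])
```

   A receiver without MPEG-4 systems would call the function with
   parse_rslh left at False and use only the payload_header and media
   fields; the length prefix is what lets it step over the RSLH
   without understanding its contents.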

                                              +------------+
                        extended framing and  | AU or AU   |
                         timing information   | fragment   |
                                              +------------+
                                  |                  |
                                  |                  |
                                  |                  |
                                  |                  |
                                  V                  V

                             <----------SL Packet-------->

                             +---------------------------+
                             | SL Packet   | SL Packet   |
                             |  Header     |  Payload    |
                             +---------------------------+
                                  |                  |
                                  |                  |
         +-------------+----------+---+              |
         |             |              |              |
         V             V              V              V
   +-----------+ +-----------+ +-------------+ +-----------+
   |RTP Packet | |  Payload  | | Remaining SL| | SL Packet |
   |  Header   | |  Header   | |   Header    | |  Payload  |
   +-----------+ +-----------+ +-------------+ +-----------+

                 <----RTP Packet Payload------------------->

   Figure 5: Mapping of ES into SL, then SL Packet into RTP packet

   2.2 The Elementary Stream point of view

   Another way to see the mapping of Elementary Streams (i.e. Access
   Units or AU fragments) into RTP packets is depicted in Figure 6. In
   this view the "basic" timing and fragmentation information listed in
   section 1.2 is obtained directly at the codec interfaces and mapped
   into the RTP header or the RTP Payload Header.

   For example this RTP payload format has been designed so that it is
   by default configured to be identical to RFC 3016 for the
   recommended MPEG-4 video configurations, specifically in this case
   the Payload Header is empty. Hence receivers that comply with this
   payload specification can decode such RTP payload without knowledge
   about the Sync Layer (see the relevant examples in Appendix). In a
   similar fashion but with non-empty Payload Headers, MPEG-4 audio
   (see Appendix 3 and 4 for examples) can be transported without
   explicit use of the Sync Layer.

                              +------------+
          basic framing and   | AU or AU   |
          timing information  | fragment   |
                              +------------+
                       |             |
                       |             |
         +-------------+             |
         |             |             |
         V             V             V
   +-----------+ +-----------+ +-----------+
   |RTP Packet | |  Payload  | |           |
   |  Header   | |  Header   | |  Payload  |
   +-----------+ +-----------+ +-----------+

                 <----RTP Packet Payload--->

   Figure 6: Direct mapping of Elementary Streams into RTP packet

   2.3 How the two views reconcile

   A simple concept unifies these apparently antagonistic points of
   view: a terminal that does not implement the Sync Layer can skip
   (ignore) the Remaining SL Header, if present.

   There are also cases when an Elementary Stream is such that SL
   packets are reduced to the media (compressed) data (empty headers)
   and in that case implementations do not actually need to be aware of
   the Sync Layer at all. In these cases it is logically equivalent to
   say that the Sync Layer is not implemented or to say that the SL
   packet headers are completely empty (or fully map into the RTP
   headers). The Sync Layer can then be seen as a purely conceptual
   construction that does not have to be implemented at all. Examples
   are video transported as in RFC 3016 (see below) and some audio
   modes (see Appendix).

   The above described MPEG-4 systems model also deals with session
   setup through Object Descriptors. In cases where the complete MPEG-4
   systems framework is not used, a replacement for this key
   functionality is required. In fact for simple (audio/video) systems
   only the knowledge of the decoder configuration is needed; we will
   see how this specification defines options so that decoder
   configuration can also be signaled without MPEG-4 systems.

   In conclusion this payload format is intended to be capable of
   transporting data formatted according to the Sync Layer
   specification but is also useful without the Sync Layer, or when the
   Sync Layer is invisible, which is equivalent to not using it.

   2.4 Rationale for features

   This payload format has a number of uncommon features that are best
   understood by first considering their rationale:

   . Genericity: The payload structure does not depend on the nature of
   the stream (audio, video, scene, etc). In this respect the apparent
   complexity of this specification should be compared to the
   complexity of the only alternative solution, which would have been
   the specification and implementation of many different RTP payload
   formats.
   . Variable geometry: this payload format is highly configurable,
   i.e. the structure of the RTP payload depends on MIME parameters;
   actually all the Payload Header components are optional and most of
   them have a configurable size. This is aligned with the Sync Layer
   definition and allows optimal efficiency in terms of payload size
   per packet.
   . Two packing styles (single and multiple): the rationale for
   transporting a single AU or AU fragment per RTP packet is
   simplicity; it is also the packing style for backward compatibility
   with RFC 3016. The rationale for transporting multiple AUs per RTP
   packet is efficiency, at the cost of sensitivity to losses.
   . Two interleaving methods: the rationale for interleaving is to
   enable various error concealment strategies in case of packet losses
   when packing several AUs or AU fragments per RTP packet. The need
   for two interleaving methods arises from the fact that the default
   one, based on time stamps, is the most efficient but does not work
   for all configurations. Another method, based on indexes, is
   therefore required.
   . The rationale for transporting multiple interleaved AU fragments
   per RTP packet is to benefit from advanced error resiliency
   properties of bit streams (such as MPEG-4 audio version 2).

   2.5 Relation with RFC 3016

   The following set of figures displays the relationship between the
   MPEG-4 RTP payload formats; there are 4 MPEG-4-related RTP payload
   formats. The FlexMux is a separate issue [11] and need not be
   discussed here, apart from the fact that it shares with this work
   the MPEG-4 Sync Layer as the interface into the MPEG-4 domain. RFC
   3016 describes transport of MPEG-4 video and LATM (for speech and
   audio codecs). This specification defines transport of any MPEG-4
   type of data, with or without the Sync Layer. RFC YYYY describes a
   subset of the configurations that this specification can handle.

   Figure 2 displays the situation for video; note that this
   specification is compatible with RFC 3016. Figure 3 displays the
   situation for audio; note the presence of the LATM multiplex, which
   makes RFC 3016 audio transport incompatible with this specification.
   Figure 4 displays the situation for other MPEG-4 streams, including
   BIFS, ODS, IPMP, etc.
595 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
596 |                                                               |
597 |                         MPEG-4 Video                          |
598 |                                                               | I
599 |+++++++++++++++++++++++|                                       | S
600 |                       |                                       | O
601 |      Sync Layer       |                                       | /
602 |                       |                                       | M
603 |+++++++++++++++++++++++|                                       | P
604 |            |          |                                       | E
605 |  FlexMux   |          |                                       | G
606 |            |          |    <- same RTP packet structure ->    |
607 |++++++++++++|          +++++++++++++++++++++++++++|++++++++++++|***
608 |            |                        |            |            |
609 |  FlexMux   |        RFC XXXX        |  RFC YYYY  |  RFC 3016  | I
610 |    RTP     |   MPEG-4 generic RTP   |            |    for     | E
611 |  payload   |        payload          +++++++++++++   Video    | T
612 |            |                                     |            | F
613 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

615 Figure 2: Relationship of MPEG-4 RTP payload formats for the
616 transport of video

618 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
619 |                                                               |
620 |                         MPEG-4 Audio                          |
621 |                                                               | I
622 |+++++++++++++++++++++++|                                       | S
623 |                       |                                       | O
624 |      Sync Layer       |                                       | /
625 |                       |                                       | M
626 |+++++++++++++++++++++++|                          +++++++++++++| P
627 |            |          |                          |            | E
628 |  FlexMux   |          |                          |    LATM    | G
629 |            |          |                          |            |
630 |++++++++++++|          +++++++++++++++++++++++++++|++++++++++++|***
631 |            |                        |            |            |
632 |  FlexMux   |        RFC XXXX        |  RFC YYYY  |  RFC 3016  | I
633 |    RTP     |   MPEG-4 generic RTP   |            |    for     | E
634 |  payload   |        payload          +++++++++++++   Audio    | T
635 |            |                                     |            | F
636 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

638 Figure 3: Relationship of MPEG-4 RTP payload formats for the
639 transport of audio
644 ++++++++++++++++++++++++++++++++++++++++++++++++++++
645 |                                                  |
646 |                  MPEG-4 system                   |
647 |                                                  | I
648 |+++++++++++++++++++++++|                          | S
649 |                       |                          | O
650 |      Sync Layer       |                          | /
651 |                       |                          | M
652 |+++++++++++++++++++++++|                          | P
653 |            |          |                          | E
654 |  FlexMux   |          |                          | G
655 |            |          |                          |
656 |++++++++++++|          +++++++++++++++++++++++++++|***
657 |            |                        |            |
658 |  FlexMux   |        RFC XXXX        |  RFC YYYY  | I
659 |    RTP     |   MPEG-4 generic RTP   |            | E
660 |  payload   |        payload          ++++++++++++| T
661 |            |                                     | F
662 ++++++++++++++++++++++++++++++++++++++++++++++++++++

664 Figure 4: Relationship of MPEG-4 RTP payload formats for the
665 transport of MPEG-4 system streams (including BIFS, ODS, IPMP).

667 3. Payload Format

669 One or more Access Units or Access Unit fragments (see section 3.7
670 for fragmentation rules) are mapped into each RTP packet. Some
671 information attached to these AUs or AU fragments is mapped onto the
672 RTP header (see section 3.1); the rest forms an additional payload
673 header. The resulting RTP payload is described in section 3.2. It is
674 composed of 3 parts (see figure 5):
675 . a Payload Header section (optional)
676 . an RSLH (Remaining SL Header) section (optional)
677 . a Payload Section.
678 These are described respectively in sections 3.3, 3.4 and 3.5 of this
679 memo.

681 When transporting SL streams, SL Packet Headers are transformed into
682 Remaining SL Headers (RSLH), with some fields extracted to be mapped
683 in the RTP header and others extracted to be mapped in the
684 corresponding Payload Header. The AU or AU fragment data (SL packet
685 payload), i.e. the Elementary Stream codec data, is unchanged.

687 When transporting Elementary Streams there is no RSLH section.

689 This payload format has two packing styles. The "Single" packing
690 style is a packing style where a single AU or AU fragment is
691 transported per RTP packet.
The "Multiple" packing style is a
692 packing style where more than one AU or AU fragment may be
693 transported per RTP packet. The default packing style is the
694 "Single" packing style.

699 In the "Multiple" packing style, AUs or AU fragments MUST be in
700 decoding order inside one RTP packet. Decoding order is defined by
701 the relevant codec specification. Note that decoding order and
702 presentation order may be different, typically for video streams
703 containing B frames (see [2]). According to the MPEG-4 system model
704 the decoding order may be quantified using decoding time stamps
705 (DTS).

707 RTP packets SHOULD be sent in decoding order. In case of
708 interleaving the first AU or AU fragment of each RTP packet is used
709 as reference, as in the following examples of RTP packets containing
710 interleaved SL packets.
711 This sequence is correct: [0,2,4][1,3,5]
712 This sequence is correct: [0,3,6][1,2][4,5]
713 This sequence is correct: [0,3,6][1,4][2,5]
714 This sequence is prohibited: [0,4,2][1,5,3]
715 This sequence is prohibited: [1,3,5][0,2,4]
716 This sequence is prohibited: [0,3,6][2,5][1,4]

718 In the "Multiple" packing style the Payload Header and RSLH contain
719 fields with relative values. These fields MUST have sufficient bits to
720 encode the difference, i.e. senders MUST make sure that no field
721 rolls over inside one RTP packet. This may limit the number
722 of SL packets inside one RTP packet and, when interleaving, may
723 limit the interleaving period as detailed in section 3.6.

725 The size and/or number of the payload(s) SHOULD be adjusted such
726 that the resulting RTP packet is not larger than the path-MTU. To
727 handle larger packets, this payload format relies on lower layers
728 for fragmentation, which may not be desirable.
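The ordering rules illustrated by the six example sequences above can be sketched as a small check. This is an illustrative sketch only: it assumes each AU or AU fragment is identified by an integer that increases with decoding order, and the function name is hypothetical, not part of this payload format.

```python
from typing import Optional, Sequence


def is_valid_packet_sequence(packets: Sequence[Sequence[int]]) -> bool:
    """Check the ordering rules on a list of RTP packets.

    Each packet is a list of AU identifiers; a larger identifier is
    assumed to mean "later in decoding order".
    """
    previous_first: Optional[int] = None
    for packet in packets:
        if not packet:
            return False  # an RTP packet carries at least one AU or fragment
        # Inside one RTP packet, AUs MUST be in decoding order.
        if any(later <= earlier for earlier, later in zip(packet, packet[1:])):
            return False
        # Across packets, the first AU of each packet is the reference.
        if previous_first is not None and packet[0] <= previous_first:
            return False
        previous_first = packet[0]
    return True
```

Applied to the examples, the three "correct" sequences pass and the three "prohibited" ones fail, for instance `is_valid_packet_sequence([[0, 3, 6], [2, 5], [1, 4]])` is false because the third packet starts before the second.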
730 3.1 RTP Header Fields Usage

732 Payload Type (PT):
733    The assignment of an RTP payload type for this new packet
734    format is outside the scope of this document, and will not be
735    specified here. It is expected that the RTP profile for a
736    particular class of applications will assign a payload type for
737    this encoding, or if that is not done then a payload type in
738    the dynamic range shall be chosen.

740 Marker (M) bit:
741    The M bit is set to 1 when all AU fragments in the RTP packet
742    are Access Unit ends.

744    Specifically the M bit is set to 0 when the RTP packet contains
745    one or more AU fragments that are not Access Unit ends, and the
746    M bit is set to 1 for RTP packets that contain any of:
747    . A single complete Access Unit
748    . The last fragment of an Access Unit
749    . Several complete Access Units
750    . Several last fragments of Access Units
755    . A mix of complete Access Units and last fragments of Access
756      Units

758    Therefore for streams where all SL packets are complete Access
759    Units the M bit is 1 for all RTP packets. Note also that in
760    terms of Sync Layer this means that the M bit is related to the
761    accessUnitEndFlag.

763 Extension (X) bit:
764    Defined by the RTP profile used.

766 Sequence Number:
767    The RTP sequence number should be generated by the sender with
768    a constant random offset.

770 Timestamp:
771    Set to a value corresponding to the compositionTimeStamp (CTS)
772    of the first AU or AU fragment in the RTP packet. This mapping
773    is established as follows:

775    If CTS is shorter than 32 bits, the RTP timestamp is
776    generated by extending it to 32 bits using the number of
777    wraparounds. If CTS is longer than 32 bits, the RTP
778    timestamp uses its 32 least significant bits.
   When using the Sync Layer the
779    length of the time stamp (timeStampLength) is available from
780    the SL configuration data and shall be used by receivers to
781    reconstruct CTS with the original bit length. It is RECOMMENDED
782    to use timeStampLength=32.

784    When an RTP packet starts with a non-initial AU fragment, the
785    timestamp of the initial fragment SHALL be used.

787    For SL streams where CTS is never present the RTP packetizer
788    SHOULD convey a reading of a local clock at the time the RTP
789    packet is created.

791    Note that since, according to RFC 1889 [5, Section 5.1],
792    timestamps are recommended to start at a random value, a
793    receiver is not in the general case able to reconstruct the
794    original MPEG-4 Time Stamps (CTS, DTS, OCR). This is not an
795    issue for synchronization of multiple RTP streams. However,
796    applications where streams from multiple sources are to be
797    synchronized (for example one stream from local storage,
798    another from an RTP streaming server) may have to transport out
799    of band the random offset used to map CTS into RTP timestamp,
800    which is not in the scope of this specification.
801    Note also that since RTP devices may re-stamp the stream, all
802    time stamps inside the RTP payload (CTS and DTS in the
803    Payload Header, OCR in RSLH) MUST be expressed as a difference from
804    the RTP time stamp. Since this subtraction may lead to negative
805    values, the offset MUST be encoded as a two's complement signed
806    integer in network octet order. Note that these offsets (deltas)
807    typically require far fewer bits to be encoded than the
812    original length. Nevertheless senders MUST make sure that these
813    fields have enough bits to encode these differences.

815    When startCompositionTimeStamp is signaled in the
816    SLConfigDescriptor the RTP time stamps MUST start with this
817    value.
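The timestamp mapping and the signed-offset encoding described above can be sketched as follows. This is a minimal illustration under stated assumptions: the function names are hypothetical, the random offset is modelled as a plain integer added to the CTS, and wraparound of the 32-bit RTP timestamp between the two operands of a delta is deliberately ignored.

```python
def rtp_timestamp_from_cts(cts: int, random_offset: int = 0) -> int:
    """Keep the 32 least significant bits of (CTS + offset).

    The offset parameter is an assumption standing in for the random
    initial value recommended by RFC 1889.
    """
    return (cts + random_offset) & 0xFFFFFFFF


def encode_delta(value: int, rtp_timestamp: int, bits: int) -> int:
    """Encode (value - rtp_timestamp) as a two's complement field."""
    delta = value - rtp_timestamp
    lowest, highest = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    if not lowest <= delta <= highest:
        # The sender chose too few bits: the field would roll over.
        raise ValueError("delta %d does not fit in %d bits" % (delta, bits))
    return delta & ((1 << bits) - 1)


def decode_delta(field: int, bits: int) -> int:
    """Recover the signed offset from a two's complement field."""
    if field & (1 << (bits - 1)):
        return field - (1 << bits)
    return field
```

For example, a time stamp of 990 carried in a packet whose RTP timestamp is 1000 gives a delta of -10, stored in an 8-bit field as 246; the receiver adds the decoded delta back to the RTP timestamp it actually received, so re-stamping by an intermediate device does not break the relation.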
819 SSRC, CC and CSRC fields are used as described in RFC 1889 [5].

821 RTCP SHOULD be used as defined in RFC 1889 [5].

823 3.2 RTP payload structure

825 The packet payload structure consists of 3 octet-aligned sections.

827 The first section is the Payload Header Section and contains Payload
828 Headers. Each Payload Header contains basic fragmentation and timing
829 information (relative to the RTP timestamp) for one AU or AU
830 fragment. The Payload Header structure is described in 3.3. In the
831 "Single" packing style this section is empty by default.

833 The second section is the RSLH Section and contains Remaining SL
834 Headers (RSLH). The RSLH structure is described in 3.4. By default
835 this section is empty.

837 The last section (Payload Section) contains the AU or AU fragment
838 codec bit stream fragments and is described in section 3.5. This
839 section is never empty.

841 The Nth Payload Header in the Payload Header Section, the Nth RSLH
842 in the RSLH Section and the Nth AU or AU fragment payload in the
843 Payload Section correspond to the Nth AU or AU fragment transported
844 by the RTP packet.
849  0                   1                   2                   3
850  0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
851 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
852 |V=2|P|X|  CC   |M|     PT      |       sequence number         |
853 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
854 |                           timestamp                           |
855 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
856 |           synchronization source (SSRC) identifier            |
857 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
858 :            contributing source (CSRC) identifiers             :
859 +=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+
860 |                                                               |
861 |            Payload Header Section (octet aligned)             |
862 |                                                               |
863 |                               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
864 |                               |                               |
865 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+                               |
866 |                                                               |
867 |                 RSLH Section (octet aligned)                  |
868 |                                                               |
869 |               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
870 |               |                                               |
871 +-+-+-+-+-+-+-+-+                                               |
872 |                                                               |
873 |                Payload Section (octet aligned)                |
874 |                                                               |
875 |                               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
876 |                               :...OPTIONAL RTP padding        |
877 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

879 Figure 5: RTP packet for MPEG-4

881 3.3 Payload Header Section structure

883 If the Payload Header Section consumes a non-integer number of
884 octets, up to 7 zero-valued padding bits MUST be inserted at the end
885 in order to achieve octet-alignment.

887 In the "Single" packing style the Payload Header Section consists of
888 a single Payload Header.

890  0                   1                   2                   3
891  0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
892 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
893 | Payload Header (x bits)           : padding bits|
894 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

896 Figure 6: Payload Header Section structure in "Single" packing style

898 In the "Multiple" packing style the Payload Header Section consists
899 of a 2-octet field giving the size in bits (in network octet order)
904 of the following block of bit-wise concatenated Payload Headers. This
905 size excludes the padding bits, if any.

907 This size field is absent in the "Single" packing style not because
908 it is not needed (which would be a minor gain) but for compatibility
909 with RFC 3016.

911 This size field is also absent when the value would always be zero
912 because the Payload Header is always empty, which happens when a
913 constant payload size is signaled using ConstantSize (see below).

915  0                   1                   2                   3
916  0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
917 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
918 |  Payload Header section size  |                               |
919 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+                               |
920 |         as many bit-wise concatenated Payload Headers         |
921 |           as AU or AU fragments in this RTP packet            |
922 |                                 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
923 |                                 : padding bits|
924 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

926 Figure 7: Payload Header Section structure in "Multiple" packing
927 style

929 3.3.1 Payload Header structure

931 The Payload Header content depends on parameters (as described in
932 section 4.1); by default it is empty for the "Single" packing style
933 and, in the "Multiple" packing style, contains at least the
934 PayloadSize field, except when ConstantSize is signaled.

936 When all options are used, the Payload Header structure and its
937 relationship with the related parameters are given in table 1.
939 +===========================+=================================+
940 | Fields of Payload Header  | Number of bits (parameters)     |
941 +===========================+=================================+
942 | PayloadSize               | SizeLength                      |
943 +---------------------------+---------------------------------+
944 | Index                     | IndexLength                     |
945 +---------------------------+---------------------------------+
946 | IndexDelta                | IndexDeltaLength                |
947 +---------------------------+---------------------------------+
948 | CTSFlag                   | 1 If (CTSDeltaLength > 0)       |
949 +---------------------------+---------------------------------+
950 | CTSDelta                  | CTSDeltaLength If (CTSFlag==1)  |
951 +---------------------------+---------------------------------+
952 | DTSFlag                   | 1 If (DTSDeltaLength > 0)       |
953 +---------------------------+---------------------------------+
954 | DTSDelta                  | DTSDeltaLength If (DTSFlag==1)  |
955 +---------------------------+---------------------------------+

960 Table 1: Payload Header fields and parameters giving the sizes

962 In the general case a receiver can only discover the size of a
963 Payload Header by parsing it since, for example, the presence of
964 CTSDelta is signaled by the value of CTSFlag.

966 3.3.2 Fields of a Payload Header

968 PayloadSize:
969    Indicates the size in octets of the associated Payload, which
970    can be found in the Payload Section of the RTP packet. The
971    length in bits of this field is signaled by the SizeLength
972    parameter (see section 4.1).

974    There is one exception. In the "Multiple" packing style,
975    when an RTP packet contains only one AU or AU fragment, the
976    PayloadSize field SHALL contain the size of the entire
977    corresponding AU.
   There are two reasons: firstly, the size of
978    the fragment is not needed when there is only one fragment in
979    the RTP packet; secondly, this is useful in order to detect if a
980    full Access Unit has been received after the loss of a packet
981    carrying an M bit set to 1.

983 Index, IndexDelta:
984    Encodes the serial number of the associated AU or AU fragment.
985    IndexDelta is useful for interleaving (see section 3.6). When
986    transporting an SL stream, Index and IndexDelta SHALL be used to
987    encode the packetSequenceNumber field of the SL Packet Header,
988    if present.

990    Index is optional and -if present- appears in the first Payload
991    Header of an RTP packet.

993    The length in bits of the Index field is defined by the
994    IndexLength parameter (see section 4.1).

996    IndexDelta is optional and -if present- appears for subsequent
997    (non-first) Payload Headers of an RTP packet.

999    The length in bits of the IndexDelta field is defined by the
1000    IndexDeltaLength parameter (see section 4.1).

1002    Both Index and IndexDelta MUST be incremented so that two
1003    consecutive AUs or AU fragments are distinguishable. One
1004    exception for Index is described in 3.6.1.

1006    If the parameter IndexDeltaLength is defined, non-first AUs or
1007    AU fragments inside an RTP packet have their serial number
1008    encoded as a difference (thus the name IndexDelta). IndexDelta
1009    MUST have sufficient bits to encode this difference. This
1010    difference is relative to the previous AU or AU fragment in the
1011    RTP packet according to (with i>=0):
1012       Serial number(0) = Index(0)
1017       Serial number(i+1) = Serial number(i) + IndexDelta(i+1) + 1

1019    If the parameter IndexDeltaLength is not defined the default
1020    value is zero and then the IndexDelta field is not present for
1021    non-first AU or AU fragments.
   Nevertheless receivers SHALL then
1022    apply the above formula with IndexDelta equal to zero. In other
1023    words, by default the serial number is incremented by 1 for each
1024    AU or AU fragment in the RTP packet.

1026 CTSFlag (1 bit):
1027    Indicates whether the CTSDelta field is present.
1028    A value of 1 indicates that the CTSDelta field is present, a
1029    value of 0 that it is not present.

1031    If CTSDeltaLength is not zero, CTSFlag is present in all
1032    Payload Headers regardless of whether the AU fragment is an
1033    Access Unit start or not.

1035 CTSDelta (CTSDeltaLength bits):
1036    Specifies the value of the CTS as a two's complement offset (delta)
1037    from the timestamp in the RTP header of the RTP packet. The
1038    length in bits of each CTSDelta field is specified by the
1039    CTSDeltaLength parameter (see section 4.1). CTSDelta MUST have
1040    sufficient bits to encode this difference.

1042    The CTSDelta field is present if CTSFlag is 1.

1044    For the first Payload Header of each RTP packet CTSFlag is
1045    always 0, since the composition time stamp of the first AU or
1046    AU fragment in the RTP packet is mapped to the RTP time stamp.
1047    When using the Sync Layer the sender MUST remove the
1048    compositionTimeStamp from the RSLH.

1050    Senders MUST stop adding to an RTP packet before CTSDelta
1051    would roll over, since a roll-over would prevent the receiver from
1052    reconstructing the correct CTS. This can result in sub-optimal
1053    RTP packets (smaller than the MTU) depending on the MTU, the AU
1054    or AU fragment sizes and CTSDeltaLength.

1056 DTSFlag (1 bit):
1057    Indicates whether the DTSDelta field is present. A value of 1
1058    indicates that DTSDelta is present, a value of 0 that it is not
1059    present.

1061    If DTSDeltaLength is not zero, DTSFlag is present in all
1062    Payload Headers regardless of whether the AU fragment is an
1063    Access Unit start or not.
   When transporting SL streams the
1064    receiver needs this flag in order to reconstruct the
1065    decodingTimeStampFlag of SL Packet Headers.

1067 DTSDelta (DTSDeltaLength bits):
1072    Encodes (compositionTimeStamp - decodingTimeStamp) for the same
1073    AU or AU fragment (always positive). The length in bits of each
1074    DTSDelta field is specified by the DTSDeltaLength parameter
1075    (see section 4.1).

1077    Senders MUST make sure that DTSDeltaLength is large enough to
1078    encode the difference between CTS and DTS (otherwise the DTS
1079    computed by the receiver would be incorrect).

1081    The DTSDelta field appears when DTSFlag is 1. The sender MUST
1082    always remove the decodingTimeStamp from the RSLH.

1084    If DTSDelta is zero, i.e. if decodingTimeStamp equals
1085    compositionTimeStamp, then DTSFlag MUST be set to 0 and no
1086    DTSDelta field SHALL be present.

1088 3.4 RSLHSection structure

1090 This section is present only when using the Sync Layer, and then only
1091 when the rules in the previous section have left remaining fields.

1093 This section first consists of a field (RSLHSectionSize) giving the
1094 size in bits of the following block of bit-wise concatenated RSLHs
1095 (this size does not include padding bits).

1097 If the section consumes a non-integer number of octets, up to 7 zero
1098 padding bits MUST be inserted at the end in order to achieve
1099 octet-alignment.
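Looking back at the Payload Header fields of section 3.3.2 (Table 1), the parsing order can be sketched with a small bit reader. This is a non-normative sketch: `BitReader`, `PayloadHeader` and `parse_payload_header` are hypothetical names, the length arguments stand for the parameters of section 4.1, and a length of zero is taken to mean "field absent".

```python
from dataclasses import dataclass
from typing import Optional


class BitReader:
    """Reads unsigned fields MSB first from a byte string."""

    def __init__(self, data: bytes) -> None:
        self.data = data
        self.pos = 0  # position in bits

    def read(self, nbits: int) -> int:
        value = 0
        for _ in range(nbits):
            byte = self.data[self.pos // 8]
            value = (value << 1) | ((byte >> (7 - self.pos % 8)) & 1)
            self.pos += 1
        return value


@dataclass
class PayloadHeader:
    payload_size: Optional[int]
    index: Optional[int]      # Index (first header) or IndexDelta (others)
    cts_delta: Optional[int]  # signed, relative to the RTP timestamp
    dts_delta: Optional[int]  # CTS - DTS, unsigned


def parse_payload_header(reader: BitReader, first: bool, size_length: int,
                         index_length: int, index_delta_length: int,
                         cts_delta_length: int,
                         dts_delta_length: int) -> PayloadHeader:
    size = reader.read(size_length) if size_length else None
    bits = index_length if first else index_delta_length
    index = reader.read(bits) if bits else None
    cts_delta = None
    if cts_delta_length and reader.read(1):  # CTSFlag
        raw = reader.read(cts_delta_length)
        sign = 1 << (cts_delta_length - 1)
        cts_delta = raw - (1 << cts_delta_length) if raw & sign else raw
    dts_delta = None
    if dts_delta_length and reader.read(1):  # DTSFlag
        dts_delta = reader.read(dts_delta_length)
    return PayloadHeader(size, index, cts_delta, dts_delta)
```

Because the presence of CTSDelta and DTSDelta depends on flag bits, the size of each header is only known once it has been parsed, which is why the reader keeps a running bit position rather than jumping by a fixed stride.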
1101  0                   1                   2                   3
1102  0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
1103 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1104 | RSLHSectionSize (RSLHSectionSizeLength bits)| RSLH (variable|
1105 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+               |
1106 | number of bits)                                             |
1107 |                                                             |
1108 |         +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1109 |         | RSLH (variable number of bits)                    |
1110 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1111 |                             etc                             |
1112 |             as many bit-wise concatenated RSLHs             |
1113 |              as SL Packets in this RTP packet               |
1114 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1115 | RSLH (variable number of bits)                              |
1116 |                                               +-+-+-+-+-+-+-+
1117 |                                               : padding bits|
1118 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

1120 Figure 8: RSLHSection structure

1122 The length in bits of the RSLHSectionSize field is
1123 RSLHSectionSizeLength and is specified with a default value of zero
1128 indicating that the whole RSLHSection is absent. Note that for
1129 compatibility with RFC 3016 we need to be able to make the
1130 RSLHSection disappear completely, including the RSLHSectionSize
1131 field. This is the reason why there is such a variable length with a
1132 zero default value indicating the absence of the RSLHSectionSize
1133 field.
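How a receiver might locate the end of the RSLHSection without parsing the RSLHs themselves can be sketched as below. This is only an illustration under an assumption that the text does not spell out: the octet-alignment padding is taken to apply to the whole section, i.e. to the RSLHSectionSize field plus the block of RSLHs. The function name is hypothetical.

```python
def rslh_section_octets(payload: bytes, offset: int,
                        rslh_section_size_length: int) -> int:
    """Return the number of octets occupied by the RSLHSection.

    `offset` is the octet position where the section would start.
    A zero RSLHSectionSizeLength means the section is absent.
    """
    if rslh_section_size_length == 0:
        return 0
    field_octets = (rslh_section_size_length + 7) // 8
    chunk = payload[offset:offset + field_octets]
    if len(chunk) < field_octets:
        raise ValueError("payload too short for RSLHSectionSize")
    # The size field is the most significant bits of these octets.
    raw = int.from_bytes(chunk, "big")
    size_in_bits = raw >> (field_octets * 8 - rslh_section_size_length)
    # Assumption: padding aligns the size field plus the RSLH block.
    total_bits = rslh_section_size_length + size_in_bits
    return (total_bits + 7) // 8
```

With a 6-bit size field announcing 13 bits of RSLHs, the section holds 19 bits and therefore occupies 3 octets, so the Payload Section would start 3 octets after the section start.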
1135 +=================================+===============================+
1136 | Fields of RSLHSection           | Number of bits                |
1137 +=================================+===============================+
1138 | RSLHSectionSize                 | RSLHSectionSizeLength         |
1139 +---------------------------------+-------------------------------+
1140 | all bit-wise concatenated RSLHs | RSLHSectionSize               |
1141 +---------------------------------+-------------------------------+

1143 Table 2: Sizes in bits inside RSLHSection

1145 Parsing of the bit-wise concatenated RSLHs requires MPEG-4 system
1146 awareness; specifically it requires an understanding of the MPEG-4
1147 Sync Layer (SL) syntax and of the modifications to this syntax
1148 described in the next section.

1150 However, thanks to the RSLHSectionSize field, non-MPEG-4-system
1151 receivers can skip this part by rounding up RSLHSectionSize/8 to the
1152 next integer number of octets. This means that receivers not implementing
1153 the Sync Layer can process streams containing Sync Layer specific
1154 items by simply ignoring the parts they would not be able to parse.

1156 3.4.1 RSLH structure

1158 RSLH is present only when using the Sync Layer, and then only when the
1159 rules in the previous section have left remaining fields.

1161 A Remaining SL Packet Header (RSLH) is what remains of an SL header
1162 after modifications for mapping into this payload format.

1164 The following modifications of the SL Packet Header MUST be applied.
1165 The other fields of the SL Packet Header MUST remain unchanged but
1166 are bit-shifted to fill in the gaps left by the operations specified
1167 below.

1169 3.4.2 Removal of fields

1171 The following SL Packet Header fields -if present- are removed since
1172 they are mapped either in the RTP header or in the corresponding
1173 Payload Header:
1174 . compositionTimeStampFlag
1175 . compositionTimeStamp
1176 . decodingTimeStampFlag
1177 . decodingTimeStamp
1178 . packetSequenceNumber
1179 .
  accessUnitEndFlag (in "Single" packing style only)

1184 The accessUnitEndFlag, when present for a given stream, MUST be
1185 removed from every RSLH when using the "Single" packing style since
1186 it has the same meaning as the Marker bit (and for compatibility
1187 with RFC 3016). However when using the "Multiple" packing style,
1188 accessUnitEndFlag MUST NOT be removed since it is useful to signal
1189 individual AU ends.

1191 3.4.3 Mapping of OCR

1193 Furthermore if the SL Packet header contains an OCR, then this field
1194 is encoded in the RSLH as a two's complement difference (delta) exactly
1195 like a compositionTimeStamp or a decodingTimeStamp in the
1196 Payload Header. The length in bits of this difference is indicated by
1197 the OCRDeltaLength parameter (see section 4.1).

1199 With this payload format OCRs MUST have the same clock frequency as
1200 Time Stamps.

1202 If compositionTimeStamp is not present for an SL packet that has OCR
1203 then the OCR SHALL be encoded as a difference from the RTP time stamp.

1205 3.4.4 Degradation Priority

1207 For streams that use the optional degradationPriority field in the
1208 SL Packet Headers, only SL packets with the same degradation
1209 priority SHALL be transported by one RTP packet so that components
1210 may dispatch the RTP packets according to appropriate QoS or
1211 protection schemes. Furthermore only the first RSLH of one RTP
1212 packet SHALL contain the degradationPriority field since it would
1213 otherwise be redundant.

1215 3.5 Payload Section structure

1217 The Payload Section contains the concatenated AU or AU fragment
1218 Payloads. By definition AU or AU fragment Payloads are octet
1219 aligned.

1221 For efficiency SL packets do not carry their own payload size. This
1222 is not an issue for RTP packets that contain a single SL Packet.
1223 However in the "Multiple" packing style the size of each AU or AU
1224 fragment payload MUST be available to the receiver.

1226 If the AU or AU fragment payload size is constant for a stream, the
1227 size information SHOULD NOT be transported in the RTP packet.
1228 However in that case it MUST be signaled using the ConstantSize
1229 parameter (see section 4.1).

1231 If the AU or AU fragment payload size is variable then the size of
1232 each AU or AU fragment payload MUST be indicated in the
1233 corresponding Payload Header. In order to do so the Payload Header
1234 MUST contain a PayloadSize field. The number of bits on which this
1235 PayloadSize field is encoded MUST be indicated using the SizeLength
1236 parameter (see section 4.1).

1241 The absence of both ConstantSize and SizeLength indicates the
1242 "Single" packing style, i.e. that a single AU or AU fragment is
1243 transported in each RTP packet for that stream.

1245  0                   1                   2                   3
1246  0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
1247 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1248 | AU or AU fragment (variable number of octets)               |
1249 |                                                             |
1250 |                                                             |
1251 |                             +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1252 |                             | AU or AU fragment             |
1253 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+                               |
1254 |                                                             |
1255 | (variable number of octets)                                 |
1256 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1257 |                             etc                             |
1258 |     as many octet-wise concatenated AU or AU fragments      |
1259 |              as required to finish RTP packet               |
1260 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

1262 Figure 9: Payload Section structure

1264 3.6 Interleaving

1266 SL Packets MAY be interleaved. Senders MAY perform interleaving.
1267 Receivers MUST support interleaving. Additional specifications MAY
1268 restrict this support by explicit signaling (see for example
1269 RFC YYYY).
1271 Note for Sync Layer implementers: the AUSequenceNumber field of the
1272 SL Header MUST NOT be used for interleaving since firstly it may
1273 collide with the Scene Description Carousel usage described in
1274 section 6.2 and secondly it is not visible to receivers that do not
1275 implement the Sync Layer and would skip the RSLH section
1276 transporting AUSequenceNumber.

1278 When interleaving of AU or AU fragments is used it SHALL be
1279 implemented using the IndexDelta fields of the Payload Header.
1280 Senders MUST NOT make RTP packets for which IndexDelta rolls over.
1281 Therefore depending on the interleaving scheme (if any), the MTU and
1282 the AU or AU fragment sizes, senders wishing to make optimally sized
1283 RTP packets (i.e. close to the MTU) will need to set
1284 IndexDeltaLength to a sufficiently large value.

1286 Senders SHOULD use non-zero values of IndexDeltaLength only for
1287 streams that exhibit interleaving, so that this can be interpreted
1288 by receivers as an indication that interleaving may be present.

1290 There are, based on this, two ways for a receiver to implement
1291 de-interleaving:
1296 . Time-Stamp-Based-Interleaving (TSBI, see section 3.6.1) uses
1297   IndexDelta and timestamps.
1298 . Index-Based-Interleaving (IBI, see section 3.6.2) uses IndexDelta
1299   and Index.

1301 This is signaled using MIME parameters as in the following table.
1302 Note that the need for two methods arises from two facts: firstly
1303 the time stamp based method is more economical and in basic cases
1304 (no multiple AU fragments, CTS always defined) simpler to implement.
1305 Secondly, unfortunately this method does not always work, as
1306 explained below.
1308 ==================================================================
1309 |                | IndexDeltaLength = 0 | IndexDeltaLength != 0  |
1310 ------------------------------------------------------------------
1311 | IndexLength=0  | no interleaving      | TSBI                   |
1312 ------------------------------------------------------------------
1313 | IndexLength!=0 | no interleaving,     | Index=0   | Index!=0   |
1314 |                | SL.packetSeqNum      |-------------------------
1315 |                | transport            | TSBI      | IBI        |
1316 ==================================================================

1318 3.6.1 Time stamp based interleaving (TSBI)

1320 The conjunction of RTP time stamp, IndexDelta and CTS may allow a
1321 receiver to unambiguously re-order AUs or AU fragments based on
1322 their time stamps (CTS).

1324 This is possible and efficient for streams where only complete
1325 Access Units are transported and receivers can always compute the
1326 time stamp of each Access Unit.

1328 In case of Access Units of constant duration (e.g. audio streams)
1329 the explicit presence of CTS in the Payload Header is not even
1330 required. Indeed, we then have (i being the index of one AU in one
1331 RTP packet):
1332    CTS(0) = RTP-TS
1333    for (i >= 1): CTS(i) = CTS(i-1) + (IndexDelta(i)+1)*AU_duration

1335 AU_duration, when constant, can be either signaled in SLConfig or be
1336 deduced from the decoder configuration (see the "Config" MIME
1337 parameter).

1339 Senders MUST use either IndexLength=0 or set all Index values in all
1340 packets to zero so that receivers can detect this as an indication
1341 that de-interleaving SHOULD be performed using time stamps.

1343 When using the Sync Layer and when interleaving, senders MUST use for
1344 SL.timeStampLength values large enough to prevent the CTS from
1345 rolling over more often than a packet loss burst length.
1346 Pre-existing SL streams that do not comply with this requirement cannot
1351 be interleaved using this payload format (or should use IBI as in
1352 3.6.2).

1354 3.6.2 Index based interleaving (IBI)

1356 The timestamp-based interleaving algorithm described in the previous
1357 section does not work when a CTS cannot always be computed for all
1358 AUs or AU fragments (for example after a packet loss); this happens:
1359 . If the AU duration is not constant (SL durationFlag = 0) and
1360   CTS is not signaled (SL useTimeStampsFlag = 0).
1361 . When interleaving AU fragments.

1363 When interleaving, senders of such streams MUST use the index-based
1364 technique described in this section.

1366 The conjunction of RTP sequence number, Index and IndexDelta can
1367 produce a quasi-unique identifier for each AU or AU fragment so that
1368 a receiver can unambiguously reconstruct the original order even in
1369 case of out-of-order packets, packet loss or duplication (see the
1370 pseudo code in 3.3.2 and 6.1). Specifically the RTP sequence number
1371 is used to re-order packets and inside one RTP packet we have:
1372    Serial number(0) = Index(0)
1373    Serial number(i+1) = Serial number(i) + IndexDelta(i+1) + 1 (i>=0)

1375 This requires, however, that IndexLength is not too small. For that
1376 reason senders interleaving in this fashion MUST use for
1377 IndexLength values large enough to prevent Index from rolling over
1378 more often than a typical loss burst length. Pre-existing SL streams
1379 that do not comply with this requirement (specifically if
1380 SL.packetSeqNumLength is too small) cannot be interleaved using this
1381 payload format (or should use TSBI).

1383 Receivers SHOULD interpret non-zero values in the Index field as an
1384 indication that de-interleaving can be performed using Index and
1385 IndexDelta but cannot be performed using timestamps.
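The two recurrences used for de-interleaving (CTS reconstruction in 3.6.1, serial numbers in 3.6.2) can be written out as a short sketch. The function names are hypothetical, the lists hold the IndexDelta values of the non-first Payload Headers of one RTP packet, and roll-over of Index or of the time stamps is not modelled.

```python
from typing import List


def serial_numbers(index: int, index_deltas: List[int]) -> List[int]:
    """Serial number(0) = Index(0);
    Serial number(i+1) = Serial number(i) + IndexDelta(i+1) + 1."""
    serials = [index]
    for delta in index_deltas:
        serials.append(serials[-1] + delta + 1)
    return serials


def constant_duration_cts(rtp_timestamp: int, index_deltas: List[int],
                          au_duration: int) -> List[int]:
    """CTS(0) = RTP-TS;
    CTS(i) = CTS(i-1) + (IndexDelta(i) + 1) * AU_duration."""
    cts = [rtp_timestamp]
    for delta in index_deltas:
        cts.append(cts[-1] + (delta + 1) * au_duration)
    return cts
```

For a packet carrying Index 0 followed by two IndexDelta values of 1, the serial numbers are 0, 2 and 4, which matches the first packet of the example sequence [0,2,4][1,3,5] in section 3. When IndexDelta is absent, passing zeros reproduces the default increment of 1 per AU or AU fragment.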
1387 3.6.3 SL streams that should not be interleaved 1389 SL streams for which both SL.timeStampLength and 1390 SL.packetSeqNumLength are too small SHOULD NOT be interleaved with 1391 this payload format, the reason being that small values would cause 1392 a receiver to drop a large part of the stream in case of packet 1393 loss. The actual minimal length depends on network loss properties 1394 and on the expected quality of service. 1396 3.7 Fragmentation Rules 1398 MPEG-4 Access Units are the default fragments for MPEG-4 bitstreams 1399 and SHOULD be mapped directly into RTP packets of this format with 1400 two exceptions: 1401 - Access Units larger than the MTU 1402 - When using interleaving for better packet loss resilience. 1404 Gentric et al. Expires March 2002 26 1405 RTP Payload Format for MPEG-4 Streams February 2002 1407 This section gives rules to apply when performing Access Unit 1408 fragmentation. Let us first explain the context before describing 1409 the rules. 1411 For error resilience purposes some MPEG-4 codecs define optional 1412 syntax of Access Units fragments that are independently decodable. 1413 Examples are Video Packets for video and Error Sensitivity 1414 Categories (ESC) for audio. This always corresponds to specific 1415 bitstream syntax, which is signaled in the DecoderSpecificInfo 1416 inside the DecoderConfig in SLConfig, and/or using the corresponding 1417 parameters as described in section 4.1. 1418 Thanks to that, decoders are aware whether encoders are operating in 1419 such a mode or not (however since this codec configuration is an 1420 opaque data block this is not explicitly signaled by this payload 1421 format). 1423 If not operating in such a mode it is obvious that the decoder has 1424 to skip packets after a loss until an Access Unit start is received. 
1425 Similarly decoder implementations that do not implement robust 1426 decoding of Access Units fragments have to discard all packets after 1427 a packet loss until an Access Unit start is received. In the same 1428 way decoder implementations that do not implement re-synchronization 1429 at any Access Units start have to discard all packets after a packet 1430 loss until a Random Access Point Access Unit is received. These are 1431 all obvious things that a good implementation would do. 1433 However serious problems would arise for decoder implementations 1434 that try to restart decoding after a packet loss if independently 1435 decodable fragments are signaled (in the decoder configuration) but 1436 the fragments actually received are not independently decodable 1437 because the RTP sender has made RTP packets on different boundaries 1438 than the fragments provided by the encoder (so this issue applies to 1439 the interface between the encoder and the RTP sender and to the RTP 1440 sender component itself). Indeed the decoder has in general no way 1441 to detect such a faulty fragment (except for MPEG-4 video). 1443 For this reason the following rules must be applied: 1445 In the spirit of ALF this payload format should transport either 1446 complete Access Units or fragments of Access Units that are 1447 independently decodable. Specifically when a given codec has an 1448 independently decodable Access Unit fragments optional syntax this 1449 option SHOULD be used. 1451 Independently decodable Access Units fragments SHOULD NOT be split 1452 across several RTP packets. 1454 An MPEG-4 audio stream encoded using the ESC syntax MUST NOT split 1455 one ESC across 2 RTP packets. 1457 When using MPEG-4 Video Packets since all Video Packets start with a 1458 specific resynchronization marker that can be unambiguously detected 1459 this rule is not needed. However it is strongly RECOMMENDED to 1461 Gentric et al. 
Expires March 2002 27 1462 RTP Payload Format for MPEG-4 Streams February 2002 1464 always adapt the Video Packet size to fit the MTU. In any case a 1465 video AU or AU fragment start MUST always be aligned with either: 1466 . a VOP start. 1467 . a Video Packet start. 1468 . or a GOV followed by the first (or only) Video Packet of the 1469 following VOP. 1471 4. Types and Names 1473 This section describes the MIME types and names associated with this 1474 payload format. Section 4.1 registers the MIME types, as per RFC 1475 2048. 1477 This format may require additional information about the mapping to 1478 be made available to the receiver. This is done using parameters 1479 described in the next section. The absence of any of these fields is 1480 equivalent to a field set to the default value, which is always zero 1481 for numerical parameters. The absence of any such parameters 1482 resolves into a default "basic" configuration compatible with 1483 RFC3016 for MPEG-4 video. 1485 In the MPEG-4 framework the SL stream configuration information is 1486 carried using the Object Descriptor. For compatibility with 1487 receivers that do not implement the full MPEG-4 system specification 1488 this information MAY also be signaled using parameters described 1489 here. When such information is present both in an Object Descriptor 1490 and as a parameter of this payload format it MUST be exactly the 1491 same. 1493 For transport of MPEG-4 audio and video without the use of MPEG-4 1494 systems, as well as to support non-MPEG-4 system receivers, it is 1495 also possible to transport information on the profile and level of 1496 the stream and on the decoder configuration. This is also described 1497 in the next section. 1499 Finally this MIME type also defines a mode parameter and a profile 1500 parameter that are intended for derivations of this payload format. 1501 One such derivation is described in the companion RFC YYYY. 
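
The defaulting rule stated above (an absent numerical parameter is equivalent to zero) can be sketched as follows. This is a minimal sketch under stated assumptions: the function names parse_fmtp_params and packing_style are hypothetical, the parameter names are those registered in section 4.1, the semicolon-separated name=value syntax is the one of section 4.2, and parameter names are matched exactly as written. Parameters whose default is not zero (such as profile-level-id) are left out on purpose.

```python
# Numerical parameters registered in section 4.1; an absent parameter
# is treated as if it were present with the value zero.
NUMERIC_PARAMS = (
    "DTSDeltaLength", "CTSDeltaLength", "OCRDeltaLength",
    "SizeLength", "ConstantSize", "IndexLength",
    "IndexDeltaLength", "RSLHSectionSizeLength",
)


def parse_fmtp_params(param_string):
    """Parse 'name=value;name=value' into a dict with zero defaults."""
    params = {name: 0 for name in NUMERIC_PARAMS}
    for item in param_string.split(";"):
        item = item.strip()
        if not item:
            continue  # tolerate a trailing semicolon
        name, _, value = item.partition("=")
        name, value = name.strip(), value.strip()
        # Known numerical parameters become integers; anything else
        # (e.g. StreamType, config) is kept as the raw string.
        params[name] = int(value) if name in NUMERIC_PARAMS else value
    return params


def packing_style(params):
    """Return 'Single' or 'Multiple' from SizeLength / ConstantSize."""
    if params["SizeLength"] and params["ConstantSize"]:
        raise ValueError("SizeLength and ConstantSize are mutually exclusive")
    if params["SizeLength"] or params["ConstantSize"]:
        return "Multiple"
    return "Single"
```

With the parameter string "StreamType=5;SizeLength=12;" this yields SizeLength=12, every other numerical parameter at zero, and the "Multiple" packing style; a string without SizeLength or ConstantSize yields the "Single" packing style.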
1503 4.1 MIME type registration 1505 MIME media type name: "video" or "audio" or "application" 1507 "video" MUST be used for MPEG-4 Visual streams (i.e. video as 1508 defined in ISO/IEC 14496-2 (Streamtype = 4) and/or graphics as 1509 defined in ISO/IEC 14496-1 (Streamtype = 3)) or MPEG-4 Systems 1510 streams that convey information needed for an audio/visual 1511 presentation. 1513 "audio" MUST be used for MPEG-4 Audio streams (ISO/IEC 14496-3) 1514 (Streamtype = 5)) or MPEG-4 Systems streams that convey information 1515 needed for an audio only presentation. 1517 Gentric et al. Expires March 2002 28 1518 RTP Payload Format for MPEG-4 Streams February 2002 1520 "application" MUST be used for MPEG-4 Systems streams (ISO/IEC14496- 1521 1 (all other StreamType values)) that serve other purposes than 1522 audio/visual presentation, e.g. in some cases when MPEG-J streams 1523 are transmitted. 1525 MIME subtype name: mpeg4-generic 1527 Required parameters: none 1529 Optional parameters: 1531 mode: 1532 The mode in which this specification is used. This 1533 specification itself defines only the default mode 1534 (Mode=default). When the mode parameter is not present the 1535 default mode SHALL be assumed. In the default mode all 1536 parameters are OPTIONAL and as defined here. Other modes may be 1537 defined as needed in other RFCs. A mode MUST be a subset of 1538 this specification. Specifically when defining a mode care MUST 1539 be taken that an implementation of this specification can 1540 decode the payload format corresponding to this new mode. For 1541 this reason a mode MUST NOT specify new default values for MIME 1542 parameters and MIME parameters MUST be present (unless they 1543 have the default value) even if it is redundant in case the 1544 mode assigns fixed values. 
A mode may define additionally that 1545 some MIME parameters are required instead of optional, that 1546 some MIME parameters have fixed values (or ranges), and that 1547 there are rules restricting the usage (for example RFCYYYY 1548 forbids the carriage of multiple AU fragments in the same RTP 1549 packet and -logically- uses only TSBI interleaving). 1551 profile: 1552 The meaning of this parameter may be defined by a mode. This is 1553 meant to be used in order to define sub-configurations of a 1554 given mode, for example the maximum delay (and therefore the 1555 size of buffers) induced by the usage of interleaving. 1556 Implementations of this specification can ignore this 1557 parameter. 1559 DTSDeltaLength: 1560 The number of bits on which the DTSDelta field is encoded in 1561 each Payload Header. The default value is zero and indicates 1562 the absence of DTSFlag and DTSDelta in the Payload Header (the 1563 stream does not transport decodingTimeStamps). A value larger 1564 than zero indicates that there is a DTSFlag in each Payload 1565 Header. Since decodingTimeStamp, if present, must be encoded as 1566 a difference to the RTP time stamp, the DTSDeltaLength 1567 parameter MUST be present in order to transport 1568 decodingTimeStamps with this payload format. 1570 CTSDeltaLength: 1571 The number of bits on which the CTSDelta field is encoded. The 1572 default value is zero and indicates the absence of the CTSFlag 1574 Gentric et al. Expires March 2002 29 1575 RTP Payload Format for MPEG-4 Streams February 2002 1577 and CTSDelta fields in Payload Header. Non-zero values MUST NOT 1578 be signaled in the "Single" packing style. Since 1579 compositionTimeStamps, if present, must be encoded as a 1580 difference to the RTP time stamp, the CTSDeltaLength parameter 1581 MUST be present in order to transport compositionTimeStamps 1582 using this payload format (in the "Multiple" packing style). 
1583 However CTSDeltaLength SHOULD be set to zero (or not signaled) 1584 for streams that have a constant Access Unit duration (which 1585 can be explicitly signaled using the DurationFlag and 1586 AccessUnitDuration field of SLConfigDescriptor). 1588 OCRDeltaLength: 1589 The number of bits on which the OCRDelta field is encoded in 1590 RSLH. The default value is zero and indicates the absence of 1591 OCR for this stream. Since objectClockReference -if present- 1592 must be encoded as a difference to the RTP time stamp, the 1593 OCRDeltaLength parameter MUST be present in order to transport 1594 objectClockReferences with this payload format. 1596 SizeLength: 1597 The number of bits on which the PayloadSize field of a Payload 1598 Header is encoded. The default value is zero and indicates the 1599 "Single" packing style (unless ConstantSize is present). 1600 Simultaneous presence of this parameter and ConstantSize is 1601 illegal. Either the SizeLength or ConstantSize parameter MUST 1602 be present in order to signal the "Multiple" packing style of 1603 this payload format. 1605 ConstantSize: 1606 The constant size in octets of each AU or AU fragment Payload 1607 for this stream. The default value is zero and indicates 1608 variable AU or AU fragment Payload size (or the "Single" 1609 packing style if SizeLength is absent). Simultaneous presence 1610 of this parameter and SizeLength is illegal. Either the 1611 SizeLength or ConstantSize parameter MUST be present in order 1612 to signal the "Multiple" packing style of this payload format. 1613 When ConstantSize is present the PayloadSize field of the 1614 Payload Header in the RTP packets MUST NOT be present. 1616 IndexLength: 1617 The number of bits on which the Index is encoded in the first 1618 Payload Header of a RTP packet. The default value is zero and 1619 indicates the absence of Index and IndexDelta for all Payload 1620 Headers. 
Since SL.packetSequenceNumber -if present- must be 1621 mapped in the Payload Header, the IndexLength parameter MUST be 1622 present in order to transport SL.packetSequenceNumber with this 1623 payload format. 1625 IndexDeltaLength: 1626 The number of bits on which the IndexDelta are encoded in any 1627 non-first Payload Header. The default value is zero and 1628 indicates that the serial number MUST be incremented by one for 1629 each AU or AU fragment in the RTP packet (see section 3.5). A 1631 Gentric et al. Expires March 2002 30 1632 RTP Payload Format for MPEG-4 Streams February 2002 1634 non-zero IndexDeltaLength parameter MUST be present when using 1635 interleaving with this payload format. 1637 RSLHSectionSizeLength: 1638 The number of bits that is used to encode the RSLHSectionSize 1639 field. The default value is zero and indicates the absence of 1640 the whole RSLHSection for all RTP packets of this stream. 1642 SLConfigDescriptor: 1643 A base-64 encoding of the SLConfigDescriptor. This SHALL be the 1644 original SLConfigDescriptor and it SHALL be the same as the one 1645 transported by the OD framework, if any. 1647 profile-level-id: 1648 A decimal representation of the MPEG-4 Profile Level indication 1649 value. For audio this parameter indicates which MPEG-4 Audio 1650 tool subsets are applied to encode the audio stream and is 1651 defined in ISO/IEC 14496-1 [1]. For video this parameter 1652 indicates which MPEG-4 Visual tool subsets are applied to 1653 encode the video stream and is defined in Table G-1 of ISO/IEC 1654 14496-2 [2]. This parameter MAY be used in the capability 1655 exchange or session setup procedure to indicate MPEG-4 Profile 1656 and Level combination of which the relevant MPEG-4 media codec 1657 is capable. 
If this parameter is not specified its default 1658 value is 1 (Simple Profile/Level 1) for video (for 1659 compatibility with RFC 3016) and otherwise 254 (0xFE being 1660 defined in ISO/IEC 14496-1 [1] as being the generic default 1661 value). 1663 config: 1664 A hexadecimal representation of an octet string that expresses 1665 the media payload configuration. Configuration data is mapped 1666 onto the octet string in an MSB-first basis. The first bit of 1667 the configuration data SHALL be located at the MSB of the first 1668 octet. In the last octet, zero-valued padding bits, if 1669 necessary, shall follow the configuration data. For audio 1670 streams, config is the audio object type specific decoder 1671 configuration data AudioSpecificConfig() as defined in ISO/IEC 1672 14496-3 [3]. For video this expresses the MPEG-4 Visual 1673 configuration information, as defined in subclause 6.2.1 Start 1674 codes of ISO/IEC14496-2 [2] and the configuration information 1675 indicated by this parameter SHALL be the same as the 1676 configuration information in the corresponding MPEG-4 Visual 1677 stream, except for first-half-vbv-occupancy and latter-half- 1678 vbv-occupancy, if it exists, which may vary in the repeated 1679 configuration information inside an MPEG-4 Visual stream (See 1680 6.2.1 Start codes of ISO/IEC14496-2). 1682 StreamType: 1683 The integer value that indicates the type of MPEG-4 stream that 1684 is carried; its coding corresponds to the values of the 1685 streamType as defined for the DecoderConfigDescriptor in 1686 ISO/IEC 14496-1. 1688 Gentric et al. Expires March 2002 31 1689 RTP Payload Format for MPEG-4 Streams February 2002 1691 Encoding considerations: 1692 System bitstreams MUST be generated according to MPEG-4 System 1693 specifications (ISO/IEC 14496-1). Video bitstreams MUST be 1694 generated according to MPEG-4 Visual specifications (ISO/IEC 1695 14496-2). 
   Audio bitstreams MUST be generated according to MPEG-4 Audio
   specifications (ISO/IEC 14496-3). If the Sync Layer is used, SL
   streams MUST be generated according to MPEG-4 Sync Layer
   specifications (ISO/IEC 14496-1 section 10); in that case the
   SLConfigDescriptor is required in order to read the RSLH parts of
   this format. These bitstreams are binary data and MUST be encoded
   for non-binary transport (for Email, the Base64 encoding is
   sufficient). This type is also defined for transfer via RTP. The
   RTP packets MUST be packetized according to the RTP payload format
   defined in RFC XXXX.

Security considerations:
   As in RFC XXXX.

Interoperability considerations:
   MPEG-4 provides a large and rich set of tools for the coding of
   visual objects. For effective implementation of the standard,
   subsets of the MPEG-4 tool sets have been provided for use in
   specific applications. These subsets, called "Profiles", limit
   the size of the tool set a decoder is required to implement. In
   order to restrict computational complexity, one or more
   "Levels" are set for each Profile. A Profile@Level combination
   allows:
   . A codec builder to implement only the subset of the standard
     he needs, while maintaining interoperability with other MPEG-4
     devices included in the same combination, and
   . Checking whether MPEG-4 devices comply with the standard
     ('conformance testing').

   A stream SHALL be compliant with the MPEG-4 Profile@Level
   specified by the parameter "profile-level-id". Interoperability
   between a sender and a receiver may be achieved by specifying
   the parameter "profile-level-id" in MIME content, or by
   arranging in the capability exchange/announcement procedure to
   set this parameter mutually to the same value.

Published specification:
   The specifications for MPEG-4 streams are presented in ISO/IEC
   14496-1, 14496-2, and 14496-3.
   The RTP payload format is described in RFC XXXX.

Applications that use this media type:
   Multimedia streaming and conferencing tools.

Additional information: none

Magic number(s): none

File extension(s):
   None. A file format with the extension .mp4 has been defined
   for MPEG-4 content but is not directly correlated with this
   MIME type, whose sole purpose is RTP transport.

Macintosh File Type Code(s): none

Person & email address to contact for further information:
   Authors of RFC XXXX.

Intended usage: COMMON

Author/Change controller:
   Authors of RFC XXXX, IETF Audio/Video Transport working group.

4.2 Concatenation of parameters

Multiple parameters SHOULD be expressed as a MIME media type string,
in the form of a semicolon-separated list of parameter=value pairs
(see examples below).

4.3 Usage of SDP

4.3.1 The a=fmtp keyword

It is assumed that one typical way to transport the above-described
parameters associated with this payload format is via an SDP [10]
message, for example transported to the client in reply to an RTSP
[13] DESCRIBE message or via SAP [14]. In that case the (a=fmtp)
keyword MUST be used as described in RFC 2327 [10, section 6]. The
syntax is then:

   a=fmtp:<format> <parameter name>=<value>

4.3.2 SDP example

The following is an example of SDP syntax for the description of a
session containing one MPEG-4 video stream, one MPEG-4 audio stream
and three MPEG-4 system streams, the first one being BIFS, the second
one an OD stream and the third one IPMP. All are transported using
this format and the AVP profile [12].
Note the usage of some MIME
parameters: all streams declare their StreamType; the video stream
uses DTS with DTSDelta encoded on 4 bits; the audio stream uses the
"Multiple" packing style with 12 bits to describe the size of each
AU or AU fragment payload. See the Appendix for more examples.

   o= ....
   i= ....
   c=IN IP4 123.234.71.112
   m=video 1034 RTP/AVP 97
   a=rtpmap:97 mpeg4-generic
   a=fmtp:97 StreamType=4;DTSDeltaLength=4
   m=audio 1810 RTP/AVP 98
   a=rtpmap:98 mpeg4-generic
   a=fmtp:98 StreamType=5;SizeLength=12;
   m=application 1234 RTP/AVP 99
   a=rtpmap:99 mpeg4-generic
   a=fmtp:99 StreamType=3
   m=application 1236 RTP/AVP 100
   a=rtpmap:100 mpeg4-generic
   a=fmtp:100 StreamType=1
   m=application 1238 RTP/AVP 101
   a=rtpmap:101 mpeg4-generic
   a=fmtp:101 StreamType=7

5. IANA Considerations

One new MIME subtype is to be registered, see Section 4.1.

6. Other issues

6.1 SL-packetized stream reconstruction

The purpose of this section is to document how a receiver can
reconstruct a valid SL-packetized stream. This reconstruction is
performed by reversing the payload structure rules (section 3). We
explicitly describe here the most complex transformations.

In the following let (i) be the index of SL packets inside one RTP
packet (starting at zero for each RTP packet), let SLPacketHeader.x
denote field x of the reconstructed SL packet header, let
PayloadHeader.x denote field x of the received PayloadHeader, etc.

SLPacketHeader.packetSequenceNumber is restored from
PayloadHeader.Index and PayloadHeader.IndexDelta using:

if ( IndexLength == 0 ) { // or is absent
   if ( SLConfig.packetSeqNumLength == 0 ) {
      // this stream does not have SL packet sequence number
   }
   else {
      // illegal, normally the sender MUST map
      // SLPacketHeader.packetSequenceNumber in PayloadHeader
      // and set a relevant IndexLength value;
      // otherwise it is unfortunately impossible for the receiver
      // to reconstruct the correct sequence
   }
}
else { // IndexLength is not zero
   if ( SLConfig.packetSeqNumLength == 0 ) {
      // the original SL stream does not have SL packet
      // sequence numbers, typically the sender inserted them
      // in order to implement interleaving at the RTP level;
      // they must be ignored for SL stream reconstruction
   }
   else {
      if (i == 0) { // first SL packet in RTP packet
         SLPacketHeader.packetSequenceNumber(0) =
            PayloadHeader.Index(0);
      }
      else { // remaining SL packets (i >= 1)
         SLPacketHeader.packetSequenceNumber(i) =
            SLPacketHeader.packetSequenceNumber(i-1)
            + PayloadHeader.IndexDelta(i)
            + 1;
      }
   }
}

All time stamps (CTS, DTS, OCR), when present, are restored from the
delta values. Time stamp flags (CTSFlag, DTSFlag) in PayloadHeader
are used to reconstruct respectively the compositionTimeStampFlag
and decodingTimeStampFlag of SLPacketHeader.
The function
corrected(x) for the RTP time stamp transformation is the mapping
from 32 bits to SLConfig.timeStampLength, which may be smaller or
larger than 32 bits:

if ( timeStampLength < 32 ) { // short SL time stamps
   corrected(x) = LSB(x); // only the timeStampLength LSBits of x
}
else if ( timeStampLength > 32 ) { // long SL time stamps
   // m starts at 0 and is kept from one RTP time stamp to the next
   if ( x(i) < x(i-1) ) { // 32 bits RTPTS roll over has occurred
      m += 2^32;
   }
   corrected(x) = x + m;
}
else if ( timeStampLength == 32 ) { // recommended value
   corrected(x) = x; // direct mapping
}

if ( CTSDeltaLength == 0 ) { // or CTSDeltaLength is absent
   // CTS is not transported for this RTP stream
   if (i == 0) { // first SL packet in RTP packet
      if ( SLConfig.useTimeStamps == 1 ) {
         if ( SLPacketHeader.accessUnitStartFlag == 1 ) {
            SLPacketHeader.compositionTimeStampFlag(0) = 1;
            SLPacketHeader.compositionTimeStamp(0) =
               corrected(RTP TimeStamp);
         }
         else {
            // ignore
         }
      }
      else {
         // empty
      }
   }
   else { // non-first SL packets in RTP packet
      if ( SLConfig.useTimeStamps == 1 ) {
         if ( SLPacketHeader.accessUnitStartFlag == 1 ) {
            SLPacketHeader.compositionTimeStampFlag(i) = 0;
         }
         else {
            // ignore
         }
      }
      else {
         // empty
      }
   }
}
else { // CTSDeltaLength is not zero
   // CTS is transported for this stream
   if ( SLConfig.useTimeStamps == 1 ) {
      if ( SLPacketHeader.accessUnitStartFlag == 1 ) {
         SLPacketHeader.compositionTimeStampFlag(i) =
            PayloadHeader.CTSFlag(i);
         SLPacketHeader.compositionTimeStamp(i) =
            corrected(RTP TimeStamp) +
            PayloadHeader.CTSDelta(i);
      }
      else {
         // ignore CTSFlag (which must be zero)
      }
   }
   else {
      // this is strange and sub-optimal at best
      // a receiver should ignore this
   }
}

if ( DTSDeltaLength == 0 ) { // or DTSDeltaLength is absent
   // DTS is not transported for this stream
   if ( SLConfig.useTimeStamps == 1 ) {
      if ( SLPacketHeader.accessUnitStartFlag == 1 ) {
         SLPacketHeader.decodingTimeStampFlag(i) = 0;
      }
      else {
         // ignore
      }
   }
   else {
      // empty
   }
}
else {
   // DTS is transported for this stream
   if ( SLConfig.useTimeStamps == 1 ) {
      if ( SLPacketHeader.accessUnitStartFlag == 1 ) {
         SLPacketHeader.decodingTimeStampFlag(i) =
            PayloadHeader.DTSFlag(i);
         SLPacketHeader.decodingTimeStamp(i) =
            SLPacketHeader.compositionTimeStamp(i)
            - PayloadHeader.DTSDelta(i); // DTS <= CTS always
      }
      else {
         // ignore DTSFlag (which must be zero)
      }
   }
   else {
      // this is strange and sub-optimal at best
      // a receiver should ignore this
   }
}

if ( OCRDeltaLength == 0 ) { // or OCRDeltaLength is absent
   // the RTP stream does not transport any OCR
   if ( SLConfig.OCRLength == 0 ) {
      // this stream does not have any OCR
   }
   else {
      // illegal, normally the sender MUST detect
      // OCRs, replace them with OCRDelta and set
      // a relevant OCRDeltaLength value
   }
}
else {
   if ( SLConfig.OCRLength == 0 ) {
      // this is strange and sub-optimal at best
      // a receiver should ignore this
   }
   else {
      SLPacketHeader.OCRflag(i) = RSLH.OCRFlag(i);
      if ( SLPacketHeader.OCRflag(i) == 1 ) {
         SLPacketHeader.objectClockReference(i) =
            corrected(RTP TimeStamp) + RSLH.OCRDelta(i);
      }
   }
}

In the "Single" packing style the AccessUnitEndFlag, if needed, is
restored from the M bit, as follows:

if ( SLConfig.useAccessUnitEndFlag == 0 ) {
   // this SL stream does not signal access unit ends
}
else {
   SLPacketHeader.AccessUnitEndFlag = M bit;
}

In the "Multiple" packing style the AccessUnitEndFlag is untouched
in RSLH.

The other SL packet header fields SHALL remain as found in RSLH.

It is obvious that in the general case the reconstruction of the
original SL packetized stream requires SL-awareness.
However this
payload format allows in all cases a receiver that does not know
about the SL syntax to reconstruct the semantics of Elementary
Streams for the following very useful features:
- Packet order (decoding order)
- Access Unit boundaries (using the M bit)
- Access Unit fragments (fragment boundaries using PayloadSize)
- Composition Time Stamps, according to:
   compositionTimeStamp(i) = RTP TimeStamp + CTSDelta(i);
- Decoding Time Stamps, according to:
   decodingTimeStamp(i) = compositionTimeStamp(i) - DTSDelta(i);
- Packet serial number, according to:
   if (i == 0) { // first SL packet in RTP packet
      packet serial number(0) = Index(0);
   }
   else { // remaining SL packets (i >= 1)
      packet serial number(i) = packet serial number(i-1)
                                + IndexDelta(i) + 1;
   }

6.2 Handling of scene description streams

MPEG-4 introduces new stream types as described in section 1, namely
Object Descriptors and BIFS. In the following both OD and BIFS are
discussed on the same basis, i.e. as "scene description".

Considering scene description as a "stream-able" type of content is
a rather new concept and for that reason some specific comments are
needed.

Typically scene descriptions are encoded in such a way that
information loss would in the general case cripple the presentation
beyond any hope of repair by the receiver. This is acceptable for a
number of multimedia applications where the scene is first made
available via reliable channels to the client and then played. This
payload format is not primarily intended for this type of
application, for which download of MPEG-4 interchange (.mp4) files
would be typical. However this payload format can also be used.
It 2067 is then RECOMMENDED however that the RTP packets should be 2068 transported using TCP (for example inside RTSP as described in [13, 2069 section 10.12]) or any other reliable protocol. 2071 On the other hand MPEG-4 has introduced the possibility to 2072 dynamically change the scene description by sending animation 2073 information (changes in parameters) and structural change 2074 information (updates). Since this information has to be sent in a 2075 timely fashion MPEG-4 has defined a number of techniques in order to 2076 encode the scene description in a manner that makes it behave 2077 similarly to other temporal encoding schemes such as audio and 2078 video. This payload format is intended for this usage. 2080 Gentric et al. Expires March 2002 38 2081 RTP Payload Format for MPEG-4 Streams February 2002 2083 Note that in many cases the application will consist of first the 2084 reliable transmission of a static initial scene followed by the 2085 streaming of animations and updates. For this reason the usage of 2086 this payload format is attractive since it offers a unique solution. 2088 Senders must be aware that suitable schemes should be used when 2089 scene description streams transport sensitive configuration 2090 information. For example in case the RTP packet transporting an OD- 2091 update command would be lost, the corresponding media stream would 2092 not be accessible by the receiver. 2094 Redundancy is a possibility and may either be added by tools 2095 hierarchically higher than this payload format, e.g. by packet based 2096 FEC, re-transmission, or similar tools. In such a case, the general 2097 congestion control principles have to be observed. 2099 Since BIFS and OD streams may be modified during the session with 2100 update commands, there is a need to send both update commands and 2101 full BIFS/OD refresh. 
For that reason MPEG-4 defines Random Access 2102 Points (RAP) for scene description streams (OD and BIFS) where by 2103 definition a decoder can restart decoding i.e. receives a "full 2104 update" of the scene. This mechanism is called Scene and Object 2105 Description Carousel. The AU Sequence Number field of SL Packet 2106 Header is used to support this behavior at the Sync Layer. When two 2107 access units are sent consecutively with the same AU Sequence 2108 Number, the second one is assumed to be a semantic repetition of the 2109 first. If a receiver starts to listen in the middle of a session or 2110 has detected losses, it can ignore all received AUs until such a 2111 RAP. The periodicity of transmission of these RAPs should be 2112 chosen/adjusted depending on the application and the network it is 2113 deployed on; i.e. exactly like Intra-coded frames for video, it is 2114 the responsibility of the sender to make sure the periodicity of 2115 RAPs is suitable. 2117 6.3 Overlap with RFC 3016 2119 This payload format has been designed to have a (large) overlap with 2120 RFC 3016 [7]. The conditions for this overlap are: 2122 Conditions for RFC 3016: 2123 C1. MPEG-4 video elementary streams only 2124 C2. There MUST be a single VOP or Video Packet per RTP packet (which 2125 is only recommended in RFC 3016) 2126 C3. The decoder configuration MUST be signaled out-of-band either 2127 using the Config mime parameter or using the OD framework 2129 Conditions for this payload format: 2130 C4. No MIME parameters defined (or all set to zero), i.e. "Single" 2131 packing style with empty Payload Header and empty RSLH. 2132 C5. Receivers MUST be ready to accept (and ignore) video 2133 configuration headers (e.g. VOSH, VO and VOL) and visual-object- 2134 sequence-end-code transported in-band. 2136 Gentric et al. 

Under conditions C2 and C4 the MPEG-4 video RTP packet structures
are identical. Since C4 and C5 MUST be supported by implementations
of this specification, RTP streams produced according to RFC 3016
are backward compatible with this specification when RFC 3016 is
used under conditions C1, C2 and C3. Technically the most stringent
condition is C2, but it is also a condition that makes sense for
many reasons, whatever the application.

Furthermore the MIME parameters have been aligned; specifically, the
parameters "config" and "profile-level-id" have the same name and
meaning in RFC 3016 and in this memo.

The remaining difference is therefore the MIME subtype name. It
would be desirable that specifications built upon this memo, and
enforcing the above minor usage restrictions of RFC 3016 in order to
provide a backward compatible solution, specify that receivers can
interpret the MIME subtype name "MP4V-ES" as being equivalent to
MIME type "video" with subtype name "mpeg4-generic" and vice versa.

In short, this payload format is backward compatible with RFC 3016
for video used in the recommended fashion.

6.4 Multiplexing

An advanced MPEG-4 session may involve a large number of objects,
possibly as many as a few hundred; transporting each ES as an
individual RTP stream may therefore not always be practical.
Allocating and controlling hundreds of destination addresses for
each MPEG-4 session may pose insurmountable session administration
problems. The input/output processing overhead at the end-points
will also be extremely high. Additionally, low delay transmission of
low bitrate data streams, e.g. facial animation parameters, results
in extremely high header overheads.

To solve these problems, MPEG-4 data transport requires a
multiplexing scheme that allows selective bundling of several ESs.
This is beyond the scope of the payload format defined here.

The MPEG-4 Flexmux multiplexing scheme may be used for this purpose,
and a specific RTP payload format is being developed [11].

Another approach may be to develop a generic RTP multiplexing scheme
usable for MPEG-4 data. The multiplexing scheme reported in [8] may
be a candidate for this approach.

For MPEG-4 applications, the multiplexing technique needs to address
the following requirements:

i. The ESs multiplexed in one stream can change frequently during a
session. Consequently, the coding type, individual packet size and
temporal relationships between the multiplexed data units must be
handled dynamically.

ii. The multiplexing scheme should have a mechanism to determine the
ES identifier (ES_ID) for each of the multiplexed packets. ES_ID is
not a part of the SL header.

iii. In general, an SL packet does not contain information about its
size. The multiplexing scheme should be able to delineate the
multiplexed packets, whose lengths may vary from a few octets to
close to the path-MTU.

7. Security Considerations

RTP packets using the payload format defined in this specification
are subject to the security considerations discussed in the RTP
specification [5]. This implies that confidentiality of the media
streams is achieved by encryption. Because the data compression used
with this payload format is applied end-to-end, encryption may be
performed on the compressed data, so there is no conflict between
the two operations. The packet processing complexity of this payload
type (i.e.
excluding media data processing) does not exhibit any significant
non-uniformity on the receiver side that would cause a
denial-of-service threat.

However, it is possible to inject non-compliant MPEG streams (Audio,
Video, and Systems) to overload the receiver/decoder's buffers,
which might compromise the functionality of the receiver or even
crash it. This is especially true for end-to-end systems like MPEG
where the buffer models are precisely defined.

MPEG-4 Systems supports stream types including commands that are
executed on the terminal, like OD commands, BIFS commands, etc., and
programmatic content like MPEG-J (Java(TM) Byte Code) and
ECMAScript. It is possible to use one or more of the above in a
manner non-compliant to MPEG to crash the receiver or make it
temporarily unavailable.

Authentication mechanisms can be used to validate the sender and the
data, to prevent security problems due to non-compliant malignant
MPEG-4 streams.

A security model is defined for MPEG-4 Systems streams carrying
MPEG-J access units, which comprise Java(TM) classes and objects.
MPEG-J defines a set of Java APIs and a secure execution model.
MPEG-J content can call this set of APIs and Java(TM) methods from a
set of Java packages supported in the receiver within the defined
security model. According to this security model, downloaded byte
code is forbidden to load libraries, define native methods, start
programs, read or write files, or read system properties.

Receivers can implement intelligent filters to validate the buffer
requirements or parametric (OD, BIFS, etc.) or programmatic (MPEG-J,
ECMAScript) commands in the streams. However, this can increase the
complexity significantly.

8. Acknowledgements

This document evolved across several years through many revisions,
thanks to contributions from a large number of people, since it is
based on work within the IETF AVT working group and various ISO MPEG
working groups, especially the 4-on-IP ad-hoc group. The authors
wish to thank Olivier Avaro, Stephen Casner, Guido Fransceschini,
Art Howarth, Dave Mackie, Dave Singer, and Stephan Wenger for their
valuable comments and support. Attentive readers and early
implementers also found flaws and bugs; thank you all.

9. References

[1] ISO/IEC 14496-1:2001 MPEG-4 Systems.

[2] ISO/IEC 14496-2:2001 MPEG-4 Visual.

[3] ISO/IEC 14496-3:2001 MPEG-4 Audio.

[4] ISO/IEC 14496-6:2001 Delivery Multimedia Integration Framework.

[5] H. Schulzrinne, S. Casner, R. Frederick, V. Jacobson, RTP: A
Transport Protocol for Real Time Applications, RFC 1889, Internet
Engineering Task Force, January 1996.

[6] S. Bradner, Key words for use in RFCs to Indicate Requirement
Levels, RFC 2119, Internet Engineering Task Force, March 1997.

[7] Y. Kikuchi, T. Nomura, S. Fukunaga, Y. Matsui, H. Kimata, RTP
payload format for MPEG-4 Audio/Visual streams, RFC 3016, Internet
Engineering Task Force.

[8] B. Thompson, T. Koren, D. Wing, Tunneling multiplexed Compressed
RTP ("TCRTP"), work in progress, draft-ietf-avt-tcrtp-04.txt, July
2001.

[9] D. Singer, Y. Lim, A Framework for the delivery of MPEG-4 over
IP-based Protocols, work in progress, draft-singer-mpeg4-ip-02.txt,
May 2001.

[10] M. Handley, V. Jacobson, SDP: Session Description Protocol, RFC
2327, Internet Engineering Task Force, April 1998.

[11] C. Roux et al., RTP Payload Format for MPEG-4 FlexMultiplexed
Streams, work in progress, draft-curet-avt-rtp-mpeg4-flexmux-00.txt,
February 2001.

[12] H.
Schulzrinne, RTP Profile for Audio and Video Conferences with
Minimal Control, RFC 1890, Internet Engineering Task Force, January
1996.

[13] H. Schulzrinne, A. Rao, R. Lanphier, Real Time Streaming
Protocol, RFC 2326, Internet Engineering Task Force, April 1998.

[14] M. Handley, C. Perkins, E. Whelan, Session Announcement
Protocol, RFC 2974, Internet Engineering Task Force, October 2000.

10. Authors' Addresses

Andrea Basso
AT&T Labs Research
200 Laurel Avenue
Middletown, NJ 07748
USA
e-mail: basso@research.att.com

M. Reha Civanlar
AT&T Labs - Research
200 Laurel Ave. South, A5 4D04
Middletown, NJ 07748
USA
e-mail: civanlar@research.att.com

Philippe Gentric
Philips Digital Networks, MP4Net
51 rue Carnot
92156 Suresnes
France
e-mail: philippe.gentric@philips.com

Carsten Herpel
THOMSON multimedia
Karl-Wiechert-Allee 74
30625 Hannover
Germany
e-mail: herpelc@thmulti.com

Zvi Lifshitz
Optibase Ltd.
7 Shenkar St.
Herzliya 46120
Israel
e-mail: zvil@optibase.com

Young-Kwon Lim
net&tv Co., Ltd.
5th Floor Himart Building
1007-46 Sadang-Dong Dongjak-Gu,
Seoul, 156-090,
Korea
e-mail: young@netntv.co.kr

Colin Perkins
USC Information Sciences Institute
3811 N. Fairfax Drive suite 200
Arlington, VA 22203
USA
e-mail: csp@isi.edu

Jan van der Meer
Philips Digital Networks
Building WDB-1
Prof Holstlaan 4
5656 AA Eindhoven
Netherlands
e-mail: jan.vandermeer@philips.com

APPENDIX: Examples of usage

This section describes a number of examples of how this payload
format can be used either with or without the Sync Layer. In all
examples the Sync Layer syntax is given (which shows how it may
become invisible in cases 1, 3, 4 and 5).

A C++-like syntax called SDL (Syntactic Description Language)
defined in [1, section 14] is used to economically describe MPEG-4
system data structures.

These examples assume that the (a=fmtp) SDP syntax is used to convey
the MIME parameters of the payload format.

Appendix.1 RFC 3016 compatible MPEG-4 Video (no SL)

This is an example of a video stream compatible with RFC 3016.

SLConfigDescriptor

In this example the SLConfigDescriptor is:

class SLConfigDescriptor extends BaseDescriptor : bit(8)
tag=SLConfigDescrTag {
  bit(8) predefined;
  if (predefined==0) {
    bit(1) useAccessUnitStartFlag; = 0
    bit(1) useAccessUnitEndFlag; = 1
    bit(1) useRandomAccessPointFlag; = 0
    bit(1) hasRandomAccessUnitsOnlyFlag; = 0
    bit(1) usePaddingFlag; = 0
    bit(1) useTimeStampsFlag; = 0
    bit(1) useIdleFlag; = 0
    bit(1) durationFlag; = 0
    bit(32) timeStampResolution; = 0
    bit(32) OCRResolution; = 0
    bit(8) timeStampLength; = 0
    bit(8) OCRLength; = 0
    bit(8) AU_Length; = 0
    bit(8) instantBitrateLength; = 0
    bit(4) degradationPriorityLength; = 0
    bit(5) AU_seqNumLength; = 0
    bit(5) packetSeqNumLength; = 0
    bit(2) reserved=0b11;
  }
  if (durationFlag) {
    bit(32) timeScale; // NOT USED
    bit(16) accessUnitDuration; // NOT USED
    bit(16) compositionUnitDuration; // NOT USED
  }
  if (!useTimeStampsFlag) {
    bit(timeStampLength) startDecodingTimeStamp; = 0
    bit(timeStampLength) startCompositionTimeStamp; = 0
  }
}

SL Packet Header structure

With this configuration we have the following SL packet header
structure:

aligned(8) class SL_PacketHeader (SLConfigDescriptor SL) {
  if (SL.useAccessUnitEndFlag) {
    bit(1) accessUnitEndFlag; // 1 bit
  }
}

In this case this payload produces RTP packets that are exactly
conformant to RFC 3016 and the SL is reduced to a purely logical
construction that neither sender nor receiver need to implement.

Parameters

This configuration is the default one; no parameters are required.

RTP packet structure

Note that accessUnitEndFlag is mapped to the RTP header M bit.

+=========================================+=============+
| Field                                   | size        |
+=========================================+=============+
| RTP header                              | -           |
+-----------------------------------------+-------------+
| Access Unit or AU fragment              | 1400 octets |
+-----------------------------------------+-------------+

Overhead

In this example we have an RTP overhead of 40 octets for 1400 octets
of payload i.e. 3 % overhead.
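
The overhead figures quoted in these examples can be reproduced with
a few lines of Python. This is only an illustrative sketch: the
function name is hypothetical, and the reading of the 40 octets as
IPv4 (20) + UDP (8) + RTP fixed header (12) is an assumption, since
the text only states the total.

```python
# Illustrative helper (hypothetical name): header overhead relative to payload.
# Assumption: the 40 octets quoted in the text are IPv4 (20) + UDP (8) +
# RTP fixed header (12); the document itself only gives the total.
IP_UDP_RTP_HEADER_OCTETS = 20 + 8 + 12


def overhead_percent(header_octets: int, payload_octets: int) -> float:
    """Return the header octets as a percentage of the payload octets."""
    if payload_octets <= 0:
        raise ValueError("payload_octets must be positive")
    return 100.0 * header_octets / payload_octets


# Figures used in this appendix: 1400 octets of payload here, and the
# 5-octet Access Units of the low delay audio example.
video_overhead = overhead_percent(IP_UDP_RTP_HEADER_OCTETS, 1400)
audio_overhead = overhead_percent(IP_UDP_RTP_HEADER_OCTETS, 5)
```

With these inputs video_overhead is a little under 3 and
audio_overhead is 800, which matches the rounded values given in the
text.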

Appendix.2 MPEG-4 Video with SL

Let us consider the case of a 30 frames per second MPEG-4 video
stream whose bit rate is high enough that Access Units have to be
split into several SL packets (typically above 300 kb/s).

Let us also assume that the video codec generates in that case Video
Packets suitable to fit in one SL packet, i.e. that the video codec
is MTU aware and the MTU is 1500 octets. We assume furthermore that
this stream contains B frames and that decodingTimeStamps are
present.

SLConfigDescriptor

In this example the SLConfigDescriptor is:

class SLConfigDescriptor extends BaseDescriptor : bit(8)
tag=SLConfigDescrTag {
  bit(8) predefined;
  if (predefined==0) {
    bit(1) useAccessUnitStartFlag; = 1
    bit(1) useAccessUnitEndFlag; = 0
    bit(1) useRandomAccessPointFlag; = 1
    bit(1) hasRandomAccessUnitsOnlyFlag; = 0
    bit(1) usePaddingFlag; = 0
    bit(1) useTimeStampsFlag; = 1
    bit(1) useIdleFlag; = 0
    bit(1) durationFlag; = 0
    bit(32) timeStampResolution; = 30
    bit(32) OCRResolution; = 0
    bit(8) timeStampLength; = 32
    bit(8) OCRLength; = 0
    bit(8) AU_Length; = 0
    bit(8) instantBitrateLength; = 0
    bit(4) degradationPriorityLength; = 0
    bit(5) AU_seqNumLength; = 0
    bit(5) packetSeqNumLength; = 0
    bit(2) reserved=0b11;
  }
  if (durationFlag) {
    bit(32) timeScale; // NOT USED
    bit(16) accessUnitDuration; // NOT USED
    bit(16) compositionUnitDuration; // NOT USED
  }
  if (!useTimeStampsFlag) {
    bit(timeStampLength) startDecodingTimeStamp; // NOT USED
    bit(timeStampLength) startCompositionTimeStamp; // NOT USED
  }
}

The useRandomAccessPointFlag is set so that the
randomAccessPointFlag can indicate that the corresponding SL packet
contains a GOV and the first Video Packet of an Intra coded frame.

SL Packet Header structure

With this configuration we have the following SL packet header
structure:

aligned(8) class SL_PacketHeader (SLConfigDescriptor SL) {
  bit(1) accessUnitStartFlag; // 1 bit
  if (accessUnitStartFlag) {
    bit(1) randomAccessPointFlag; // 1 bit
    bit(1) decodingTimeStampFlag; // 1 bit
    bit(1) compositionTimeStampFlag; // 1 bit
    if (decodingTimeStampFlag) {
      bit(SL.timeStampLength) decodingTimeStamp;
    }
    if (compositionTimeStampFlag) {
      bit(SL.timeStampLength) compositionTimeStamp;
    }
  }
}

Parameters

decodingTimeStamps are encoded on 32 bits, which is much more than
is needed to code a delta. Therefore the sender will use
DTSDeltaLength to signal that only 7 bits are used for the coding of
the relative DTS in the RTP packet.

The RSLHSectionSize cannot exceed 4 (bits), which is encoded on 3
bits and signaled by RSLHSectionSizeLength.
The resulting concatenated fmtp line is:

a=fmtp: DTSDeltaLength=7;RSLHSectionSizeLength=3

RTP packet structure

Two cases can occur; for packets that transport first fragments of
Access Units we have:

+=========================================+=============+
| Field                                   | size        |
+=========================================+=============+
| RTP header                              | -           |
+-----------------------------------------+-------------+
| DTSFlag = (1)                           | 1 bit       |
+-----------------------------------------+-------------+
| DTSDelta                                | 7 bits      |
+-----------------------------------------+-------------+
| bits to octet alignment                 | 0 bits      |
+-----------------------------------------+-------------+
| RSLHSectionSize = (100)                 | 3 bits      |
+-----------------------------------------+-------------+
| accessUnitStartFlag = (1)               | 1 bit       |
+-----------------------------------------+-------------+
| randomAccessPointFlag                   | 1 bit       |
+-----------------------------------------+-------------+
| decodingTimeStampFlag                   | 1 bit       |
+-----------------------------------------+-------------+
| compositionTimeStampFlag                | 1 bit       |
+-----------------------------------------+-------------+
| bits to octet alignment = (0)           | 1 bit       |
+-----------------------------------------+-------------+
| SL packet payload                       | N octets    |
+-----------------------------------------+-------------+

For packets that transport non-first fragments of Access Units we
have:

+=========================================+=============+
| Field                                   | size        |
+=========================================+=============+
| RTP header                              | -           |
+-----------------------------------------+-------------+
| DTSFlag = (0)                           | 1 bit       |
+-----------------------------------------+-------------+
| bits to octet alignment = (0000000)     | 7 bits      |
+-----------------------------------------+-------------+
| RSLHSectionSize = (001)                 | 3 bits      |
+-----------------------------------------+-------------+
| accessUnitStartFlag = (0)               | 1 bit       |
+-----------------------------------------+-------------+
| bits to octet alignment = (0000)        | 4 bits      |
+-----------------------------------------+-------------+
| SL packet payload                       | N octets    |
+-----------------------------------------+-------------+

Overhead estimation

In this example we have an RTP overhead of 40 + 2 octets for 1400
octets of payload i.e. 3 % overhead.

Appendix.3 Low delay MPEG-4 Audio (no SL)

This example is for a low delay audio service. For this reason a
single Access Unit is transported in each RTP packet (in terms of
Sync Layer, each SL packet contains a complete Access Unit).

SLConfigDescriptor

Since CTS=DTS and Access Unit duration is constant, signaling of
MPEG-4 time stamps is not needed (the durationFlag of SLConfig is
set).
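
The reason no time stamps need to be signaled can be made concrete
with a small sketch: with a constant Access Unit duration, the time
stamp of every Access Unit follows from its position in the stream.
The values below are the ones used in the SLConfigDescriptor of this
example (timeScale = 1000, accessUnitDuration = 10, start time stamps
0); the function name and the zero-based Access Unit index are
illustrative assumptions.

```python
from fractions import Fraction

# Values taken from the SLConfigDescriptor of this example.
TIME_SCALE = 1000           # ticks per second
ACCESS_UNIT_DURATION = 10   # ticks per Access Unit (10 ms here)
START_TIME_STAMP = 0        # startDecodingTimeStamp = startCompositionTimeStamp


def au_time_stamp_ticks(au_index: int) -> int:
    """Time stamp of the Access Unit at zero-based position au_index.

    Since CTS = DTS in this example, the same value serves for both.
    """
    if au_index < 0:
        raise ValueError("au_index must not be negative")
    return START_TIME_STAMP + au_index * ACCESS_UNIT_DURATION


def au_time_stamp_seconds(au_index: int) -> Fraction:
    """Same time stamp expressed in seconds, kept exact as a fraction."""
    return Fraction(au_time_stamp_ticks(au_index), TIME_SCALE)
```

For instance, the Access Unit at index 100 gets a time stamp of 1000
ticks, i.e. exactly one second after the start.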

We also assume here an audio Object Type for which all Access Units
are Random Access Points, which is signaled using the
hasRandomAccessUnitsOnlyFlag in the SLConfigDescriptor.

We assume furthermore a mode where the Access Unit size is constant
and equal to 5 octets (which is signaled with AU_Length).

In this example the SLConfigDescriptor is:

class SLConfigDescriptor extends BaseDescriptor : bit(8)
tag=SLConfigDescrTag {
  bit(8) predefined;
  if (predefined==0) {
    bit(1) useAccessUnitStartFlag; = 0
    bit(1) useAccessUnitEndFlag; = 0
    bit(1) useRandomAccessPointFlag; = 0
    bit(1) hasRandomAccessUnitsOnlyFlag; = 1
    bit(1) usePaddingFlag; = 0
    bit(1) useTimeStampsFlag; = 0
    bit(1) useIdleFlag; = 0
    bit(1) durationFlag; = 1 // signals constant AU duration
    bit(32) timeStampResolution; = 0
    bit(32) OCRResolution; = 0
    bit(8) timeStampLength; = 0
    bit(8) OCRLength; = 0
    bit(8) AU_Length; = 5
    bit(8) instantBitrateLength; = 0
    bit(4) degradationPriorityLength; = 0
    bit(5) AU_seqNumLength; = 0
    bit(5) packetSeqNumLength; = 0
    bit(2) reserved=0b11;
  }
  if (durationFlag) {
    bit(32) timeScale; = 1000 // for milliseconds
    bit(16) accessUnitDuration; = 10 // ms
    bit(16) compositionUnitDuration; = 10 // ms
  }
  if (!useTimeStampsFlag) {
    bit(timeStampLength) startDecodingTimeStamp; = 0
    bit(timeStampLength) startCompositionTimeStamp; = 0
  }
}

SL packet header

With this configuration the SL packet header is empty. The Sync
Layer is reduced to a purely logical construction that neither
sender nor receiver need to implement.

Parameters

No parameters are required.

RTP packet structure

Note that the RTP header M bit must be set to 1.

+=========================================+=============+
| Field                                   | size        |
+=========================================+=============+
| RTP header                              | -           |
+-----------------------------------------+-------------+
| Access Unit                             | 5 octets    |
+-----------------------------------------+-------------+

Overhead estimation

The overhead is extremely large, i.e. 800 %, since 40 octets of
headers are required to transport 5 octets of data. Note however
that RTP header compression would work well since time stamp
increments are constant.

Appendix.4 Media delivery MPEG-4 Audio (no SL)

This example is for a media delivery service where delay is not an
issue but efficiency is. In this case several Access Units are
transported in each RTP packet.

SLConfigDescriptor

Similar to previous example.

SL packet header

With this configuration the SL packet header is empty. The Sync
Layer is reduced to a purely logical construction that neither
sender nor receiver need to implement.

Parameters

The absence of RSLHSectionSizeLength indicates that the RSLHSection
is empty.

The size of SL Packets (which are all complete Access Units in this
case) is constant and is indicated with:

a=fmtp: ConstantSize=5

This also indicates to the receiver that the "Multiple" packing
style will be used. The 2-octet field that would give the size of
the Payload Header Section is omitted, since in this case this field
always contains zero (the Payload Header Section is always empty due
to the absence of any other MIME parameter).

RTP packet structure

Note that the RTP header M bit is always set to 1, which indicates
to the receiver that only complete Access Units are transported.

+=========================================+=============+
| Field                                   | size        |
+=========================================+=============+
| RTP header                              | -           |
+-----------------------------------------+-------------+
| Access Unit data                        | 5 octets    |
+-----------------------------------------+-------------+
| Access Unit data                        | 5 octets    |
+-----------------------------------------+-------------+
| etc, until MTU is reached                             |
+-----------------------------------------+-------------+
| Access Unit data                        | 5 octets    |
+-----------------------------------------+-------------+

Overhead estimation

The overhead is 3 % i.e. minimal.

Appendix.5 AAC with interleaving (no SL)

Let us consider AAC at 128 kb/s where each Access Unit is on average
320 octets. Interleaving is applied using a continuous interleaving
scheme (see table below) where 4 Access Units are used to construct
each RTP packet in order to match an MTU of 1500 octets.

IndexDelta is constant and equal to 2 (since +1 is automatically
added); it is encoded on 2 bits.

As explained in section 3.8 this is a time stamp based interleaving
(TSBI) scheme (IndexLength=0). Receivers know that each payload is a
complete Access Unit because all RTP packets have the M bit set to
1. Therefore, since Access Unit duration is constant, Access Unit
time stamps can be computed from RTP time stamps and IndexDelta
values; this can be used for de-interleaving even in case of losses.

Note that it is also possible to use IndexLength=2 so as to maintain
an octet alignment in the Payload Header portions; in this case
however the value of these two bits MUST be zero as stated in 3.8.1.
This solution is used in the companion RFC YYYY.

+-------------------------------------------------------+
| RTP packet | RTP Timestamp | AUs         | IndexDelta |
+-------------------------------------------------------+
| 1          | CTS(AU1)      | 1           | -          |
+-------------------------------------------------------+
| 2          | CTS(AU2)      | 2,5         | -,2        |
+-------------------------------------------------------+
| 3          | CTS(AU3)      | 3,6,9       | -,2,2      |
+-------------------------------------------------------+
| 4          | CTS(AU4)      | 4,7,10,13   | -,2,2,2    |
+-------------------------------------------------------+
| 5          | CTS(AU8)      | 8,11,14,17  | -,2,2,2    |
+-------------------------------------------------------+
| 6          | CTS(AU12)     | 12,15,18,21 | -,2,2,2    |
+-------------------------------------------------------+
| 7          | CTS(AU16)     | 16,19,22,25 | -,2,2,2    |
+-------------------------------------------------------+
| 8          | CTS(AU20)     | 20,23,26,29 | -,2,2,2    |
+-------------------------------------------------------+
| 9          | CTS(AU24)     | 24,27,30,33 | -,2,2,2    |
+-------------------------------------------------------+
| 10         | CTS(AU28)     | 28,31,34,37 | -,2,2,2    |
+-------------------------------------------------------+
| etc                                                   |
+-------------------------------------------------------+

SLConfigDescriptor

Similar to previous example.

SL Packet Header

Similar to previous example (empty).
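
The interleaving table of this example can be regenerated
programmatically, which may help when checking a sender or a
de-interleaver. The sketch below is an assumption-laden
illustration: the closed-form rule (the last Access Unit of RTP
packet p is 4(p-1)+1, and earlier ones are 3 apart) is inferred from
the rows of the table for these specific parameters only, and all
names are hypothetical.

```python
AUS_PER_PACKET = 4   # Access Units per RTP packet once the scheme is running
INDEX_DELTA = 2      # value carried in the packet; the receiver adds +1


def packet_access_units(packet_number: int) -> list[int]:
    """Access Unit numbers (1-based) carried by RTP packet packet_number.

    Rule inferred from the table: the last AU of packet p is
    AUS_PER_PACKET * (p - 1) + 1 and AUs inside a packet are
    INDEX_DELTA + 1 apart. AU numbers below 1 do not exist, which
    yields the shorter packets 1 to 3 at the start of the stream.
    """
    if packet_number < 1:
        raise ValueError("packet numbers start at 1")
    stride = INDEX_DELTA + 1
    last = AUS_PER_PACKET * (packet_number - 1) + 1
    candidates = [last - stride * j for j in range(AUS_PER_PACKET - 1, -1, -1)]
    return [au for au in candidates if au >= 1]


def index_deltas(access_units: list[int]) -> list[int]:
    """IndexDelta values for every AU of a packet except the first."""
    return [b - a - 1 for a, b in zip(access_units, access_units[1:])]


def rtp_timestamp_au(packet_number: int) -> int:
    """AU whose CTS gives the RTP time stamp: the first AU of the packet."""
    return packet_access_units(packet_number)[0]
```

For packet 5, for example, packet_access_units returns the Access
Units 8, 11, 14 and 17, with three IndexDelta values of 2 and an RTP
time stamp equal to CTS(AU8), as in the table.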

Parameters

The resulting concatenated fmtp line is:

a=fmtp: SizeLength=9; IndexDeltaLength=2;

RTP packet structure

+=========================================+=============+
| Field                                   | size        |
+=========================================+=============+
| RTP header                              | -           |
+-----------------------------------------+-------------+
Payload Header Section
+=========================================+=============+
| PayloadHeaderSection size = (42)        | 2 octets    |
+-----------------------------------------+-------------+
| PayloadSize                             | 9 bits      |
+-----------------------------------------+-------------+
| PayloadSize                             | 9 bits      |
+-----------------------------------------+-------------+
| IndexDelta                              | 2 bits      |
+-----------------------------------------+-------------+
| PayloadSize                             | 9 bits      |
+-----------------------------------------+-------------+
| IndexDelta                              | 2 bits      |
+-----------------------------------------+-------------+
| PayloadSize                             | 9 bits      |
+-----------------------------------------+-------------+
| IndexDelta                              | 2 bits      |
+-----------------------------------------+-------------+
| bits to octet alignment = (000000)      | 6 bits      |
+-----------------------------------------+-------------+
Payload Section
+=========================================+=============+
| AAC Access Unit                         | x octets    |
+-----------------------------------------+-------------+
| AAC Access Unit                         | x octets    |
+-----------------------------------------+-------------+
| AAC Access Unit                         | x octets    |
+-----------------------------------------+-------------+
| AAC Access Unit                         | x octets    |
+-----------------------------------------+-------------+

Overhead estimation

The PayloadHeaderSection is 8 octets; in this example we therefore
have an RTP overhead of 40 + 8 octets for 1400 octets (approx) of
payload i.e. around 4 % overhead.

Appendix.6 AAC with Index-based interleaving and SL

Let us consider AAC around 130 kb/s where each Access Unit is split
into 4 SL packets corresponding to Error Sensitivity Categories
(ESC) of maximum 90 octets, for which interleaving is very useful in
terms of error resilience. We thus use an interleaving scheme where
15 SL Packets (extracted from 15 consecutive Access Units) are used
to construct each RTP packet in order to match an MTU of 1500
octets. Note that since ESC fragments are not octet aligned we also
use the paddingFlag and paddingBits features of the Sync Layer. The
interleaving sequence is 4 RTP packets and 350 ms long, which is too
long for conferencing but perfectly OK for Internet radio.

Since the sequence contains 60 SL packets, IndexLength is set to 16
bits so as to provide a safe margin in case of long loss bursts.
This will also indicate to the receiver that this is an
Index-Based-Interleaving scheme (and indeed CTS cannot be computed
for SL packets that are not AU starts, so TSBI would not work).

2 bits are enough for IndexDelta, which is constant and equal to 3
(since +1 is automatically added).
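
The figures quoted above (60 SL packets per sequence, IndexDelta
equal to 3, a sequence of roughly 350 ms) can be cross-checked with
the following sketch. It assumes, for illustration only, that SL
packets are numbered from 0 in stream order within one interleaving
sequence (4 per Access Unit, one per ESC) and that RTP packet k
gathers the k-th SL packet of each of the 15 Access Units; the
23.22 ms Access Unit duration is the one given in the
SLConfigDescriptor of this example. All names are hypothetical.

```python
SL_PACKETS_PER_AU = 4     # one SL packet per Error Sensitivity Category
AUS_PER_SEQUENCE = 15     # consecutive Access Units in one sequence
AU_DURATION_MS = 23.22    # accessUnitDuration from this example's descriptor


def rtp_packet_indices(k: int) -> list[int]:
    """Index values of the 15 SL packets carried by RTP packet k (0..3).

    Assumes SL packets are numbered 0..59 in stream order within the
    interleaving sequence.
    """
    if not 0 <= k < SL_PACKETS_PER_AU:
        raise ValueError("k must be in the range 0..3")
    return [au * SL_PACKETS_PER_AU + k for au in range(AUS_PER_SEQUENCE)]


def index_deltas(indices: list[int]) -> list[int]:
    """IndexDelta values as transmitted, i.e. index difference minus 1."""
    return [b - a - 1 for a, b in zip(indices, indices[1:])]


def sequence_duration_ms() -> float:
    """Duration covered by one interleaving sequence."""
    return AUS_PER_SEQUENCE * AU_DURATION_MS
```

Under these assumptions every IndexDelta equals 3, the four RTP
packets together carry the indices 0 to 59 exactly once, and the
sequence spans about 348 ms, consistent with the 350 ms quoted above.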

Note that the 4th RTP packet in each sequence has its M bit set to 1
since it contains 15 SL packets transporting the end of 15
consecutive Access Units.

With this scheme a sender can, for example upon reception of RTCP
reports indicating high loss rates, choose to duplicate for each
interleaving sequence the first RTP packet, which contains the most
useful data in terms of ESC, or apply other error protection
techniques, with due care to congestion issues.

In this example we will also show several other SL features (OCR, AU
boundary flags, padding, as detailed below).

One feature demonstrated by this example is the degradation
priority. We assume degradation priority can take 4 different
values, mapped to Error Sensitivity Categories, and is encoded on 2
bits. This interleaving scheme makes sure that only SL packets of
identical degradation priorities are grouped in the same RTP packet
(3.6.3) and that only the first RSLH of each RTP packet transports
the degradation priority. We also assume that for each last SL
packet of each RTP packet the server inserts an OCR.
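
The section sizes that result from these choices (149 bits for the
Payload Header Section and 135 bits, i.e. 10000111, for the
RSLHSection) can be tallied as in the sketch below. It is only an
illustration under stated assumptions: the field lengths are those of
the fmtp line and RTP packet structure of this example, every SL
packet except the last is assumed to carry paddingBits, and only the
last one is assumed to carry an OCRDelta. Function and parameter
names are hypothetical.

```python
SL_PACKETS_PER_RTP = 15


def payload_header_section_bits(size_len=7, index_len=16, index_delta_len=2,
                                n=SL_PACKETS_PER_RTP):
    """First entry carries PayloadSize + Index, the others PayloadSize + IndexDelta."""
    return (size_len + index_len) + (n - 1) * (size_len + index_delta_len)


def rslh_section_bits(deg_prio_len=2, ocr_delta_len=16, padding_bits_len=3,
                      n=SL_PACKETS_PER_RTP):
    """Sum of the RSLH sizes for one RTP packet of this example."""
    flags = 4  # accessUnitStartFlag, accessUnitEndFlag, OCRFlag, paddingFlag
    # First RSLH: padding present, DegPrioflag = 1 followed by the priority.
    first = flags + padding_bits_len + 1 + deg_prio_len
    # Middle RSLHs: padding present, DegPrioflag = 0.
    middle = flags + padding_bits_len + 1
    # Last RSLH: OCRFlag = 1 with OCRDelta, no padding, DegPrioflag = 0.
    last = flags + ocr_delta_len + 1
    return first + (n - 2) * middle + last


def octets_after_alignment(bits: int) -> int:
    """Number of octets once the section is padded to an octet boundary."""
    return (bits + 7) // 8
```

With the default values payload_header_section_bits() gives 149 and
rslh_section_bits() gives 135, and the Payload Header Section
occupies 19 octets once the 3 alignment bits are added.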

SLConfigDescriptor

In this example the SLConfigDescriptor is:

class SLConfigDescriptor extends BaseDescriptor : bit(8)
tag=SLConfigDescrTag {
  bit(8) predefined;
  if (predefined==0) {
    bit(1) useAccessUnitStartFlag; = 1
    bit(1) useAccessUnitEndFlag; = 1
    bit(1) useRandomAccessPointFlag; = 0
    bit(1) hasRandomAccessUnitsOnlyFlag; = 1
    bit(1) usePaddingFlag; = 1 // we need to signal padding bits
    bit(1) useTimeStampsFlag; = 0
    bit(1) useIdleFlag; = 0
    bit(1) durationFlag; = 1
    bit(32) timeStampResolution; = 0
    bit(32) OCRResolution; = 30
    bit(8) timeStampLength; = 0
    bit(8) OCRLength; = 32
    bit(8) AU_Length; = 0
    bit(8) instantBitrateLength; = 0
    bit(4) degradationPriorityLength; = 2
    bit(5) AU_seqNumLength; = 0
    bit(5) packetSeqNumLength; = 6
    bit(2) reserved=0b11;
  }
  if (durationFlag) {
    bit(32) timeScale; = 1000 // milliseconds
    bit(16) accessUnitDuration; = 23.22 // ms
    bit(16) compositionUnitDuration; = 23.22 // ms
  }
  if (!useTimeStampsFlag) {
    bit(timeStampLength) startDecodingTimeStamp; = 0
    bit(timeStampLength) startCompositionTimeStamp; = 0
  }
}

SL Packet Header structure

With this configuration we have the following SL packet header
structure:

aligned(8) class SL_PacketHeader (SLConfigDescriptor SL) {
  bit(1) accessUnitStartFlag;
  bit(1) accessUnitEndFlag;
  bit(1) OCRflag;
  bit(1) paddingFlag;
Expires March 2002 54 2976 RTP Payload Format for MPEG-4 Streams February 2002 2978 if (paddingFlag) bit(3) paddingBits; 2979 bit(SL.packetSeqNumLength) packetSequenceNumber; 2980 bit(1) DegPrioflag; 2981 if (DegPrioflag) { 2982 bit(SL.degradationPriorityLength) degradationPriority;} 2983 if (OCRflag) { 2984 bit(SL.OCRLength) objectClockReference;} 2985 } 2986 } 2988 Parameters 2989 The resulting concatenated fmtp line is: 2990 a=fmtp: SizeLength=7; RSLHSectionSizeLength=8; 2991 IndexLength=16; IndexDeltaLength=2; OCRDeltaLength=16 2993 RTP packet structure 2994 +=========================================+=============+ 2995 | Field | size | 2996 +=========================================+=============+ 2997 | RTP header | - | 2998 +-----------------------------------------+-------------+ 2999 Payload Header Section 3000 +=========================================+=============+ 3001 | Payload Header Section size = 149 bits | 2 octets | 3002 +-----------------------------------------+-------------+ 3003 | PayloadSize | 7 bits | 3004 +-----------------------------------------+-------------+ 3005 | Index | 16 bits | 3006 +-----------------------------------------+-------------+ 3007 | PayloadSize | 7 bits | 3008 +-----------------------------------------+-------------+ 3009 | IndexDelta = (11) | 2 bits | 3010 +-----------------------------------------+-------------+ 3011 | etc + 12 times 9 bits | 3012 +-----------------------------------------+-------------+ 3013 | PayloadSize | 7 bits | 3014 +-----------------------------------------+-------------+ 3015 | IndexDelta = (11) | 2 bits | 3016 +-----------------------------------------+-------------+ 3017 | bits to octet alignment = (000) | 3 bits | 3018 +-----------------------------------------+-------------+ 3019 RSLHSection 3020 +=========================================+=============+ 3021 | RSLHSectionSize = (10000111) | 8 bits | 3022 +-----------------------------------------+-------------+ 3023 | 
accessUnitStartFlag | 1 bit | 3024 +-----------------------------------------+-------------+ 3025 | accessUnitEndFlag | 1 bit | 3026 +-----------------------------------------+-------------+ 3027 | OCRFlag = (0) | 1 bit | 3028 +-----------------------------------------+-------------+ 3029 | paddingFlag = (1) | 1 bit | 3030 +-----------------------------------------+-------------+ 3032 Gentric et al. Expires March 2002 55 3033 RTP Payload Format for MPEG-4 Streams February 2002 3035 | paddingBits | 3 bits | 3036 +-----------------------------------------+-------------+ 3037 | DegPrioflag = (1) | 1 bit | 3038 +-----------------------------------------+-------------+ 3039 | degradationPriority | 2 bits | 3040 +-----------------------------------------+-------------+ 3041 | accessUnitStartFlag | 1 bit | 3042 +-----------------------------------------+-------------+ 3043 | accessUnitEndFlag | 1 bit | 3044 +-----------------------------------------+-------------+ 3045 | OCRFlag = (0) | 1 bit | 3046 +-----------------------------------------+-------------+ 3047 | paddingFlag = (1) | 1 bit | 3048 +-----------------------------------------+-------------+ 3049 | paddingBits | 3 bits | 3050 +-----------------------------------------+-------------+ 3051 | DegPrioflag = (0) | 1 bit | 3052 +-----------------------------------------+-------------+ 3053 | etc + 12 times 8 bits | 3054 +-----------------------------------------+-------------+ 3055 | accessUnitStartFlag | 1 bit | 3056 +-----------------------------------------+-------------+ 3057 | accessUnitEndFlag | 1 bit | 3058 +-----------------------------------------+-------------+ 3059 | OCRFlag = (1) | 1 bit | 3060 +-----------------------------------------+-------------+ 3061 | OCRDelta | 16 bits | 3062 +-----------------------------------------+-------------+ 3063 | paddingFlag = (0) | 1 bit | 3064 +-----------------------------------------+-------------+ 3065 | DegPrioflag = (0) | 1 bit | 3066 
+-----------------------------------------+-------------+ 3067 | bits to octet alignment = (000) | 3 bits | 3068 +-----------------------------------------+-------------+ 3069 Payload Section 3070 +=========================================+=============+ 3071 | SL packet payload |max 90 octets| 3072 +-----------------------------------------+-------------+ 3073 | etc + 13 SL packets | 3074 +-----------------------------------------+-------------+ 3075 | SL packet payload |max 90 octets| 3076 +-----------------------------------------+-------------+ 3078 Note that in the above table the last SL packet in the RTP packet 3079 has a payload that is octet-aligned (at the end). When this happens 3080 paddingFlag is set to zero and the paddingBits field is omitted. 3082 Overhead estimation 3084 The PayloadHeaderSection is 19 octets, the RSLHSection is 16 octets; 3085 in this example we have therefore a RTP overhead of 40 + 35 octets 3086 for 1350 octets of payload i.e. around 6 % overhead. 3088 Gentric et al. Expires March 2002 56
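   The bit counts given in the table and the overhead figure can be
   cross-checked with a short calculation. The following Python sketch
   is illustrative only: the function and constant names are
   hypothetical, the field widths are taken from the fmtp line and the
   table, and the section sizes in octets are taken as stated in the
   text rather than derived.

```python
# Field widths in bits, from the fmtp line and the RTP packet table.
SIZE_LENGTH = 7
INDEX_LENGTH = 16
INDEX_DELTA_LENGTH = 2
OCR_DELTA_LENGTH = 16
DEG_PRIO_LENGTH = 2
PADDING_BITS_LENGTH = 3
SL_PACKETS_PER_RTP = 15


def payload_header_bits(n=SL_PACKETS_PER_RTP):
    """Bits of PayloadSize/Index/IndexDelta fields for n SL packets."""
    first = SIZE_LENGTH + INDEX_LENGTH
    others = (n - 1) * (SIZE_LENGTH + INDEX_DELTA_LENGTH)
    return first + others


def rslh_bits(n=SL_PACKETS_PER_RTP):
    """Bits of the n reduced SL headers listed in the table."""
    flags = 4  # accessUnitStartFlag, accessUnitEndFlag, OCRFlag, paddingFlag
    # First RSLH: padding present, DegPrioflag = 1 plus the priority value.
    first = flags + PADDING_BITS_LENGTH + 1 + DEG_PRIO_LENGTH
    # Middle RSLHs: padding present, DegPrioflag = 0.
    middle = flags + PADDING_BITS_LENGTH + 1
    # Last RSLH: OCRDelta present, no padding, DegPrioflag = 0.
    last = flags + OCR_DELTA_LENGTH + 1
    return first + (n - 2) * middle + last


def overhead_ratio(header_octets, section_octets, payload_octets):
    """Overhead as a fraction of the payload size."""
    return (header_octets + section_octets) / payload_octets


if __name__ == "__main__":
    print(payload_header_bits())            # Payload Header Section size
    print(format(rslh_bits(), "08b"))       # RSLHSectionSize as 8 bits
    print(overhead_ratio(40, 19 + 16, 15 * 90))
```

   The two bit counts agree with the values shown in the table (149
   bits and RSLHSectionSize = 10000111), and the ratio of 75 octets to
   1350 octets is consistent with the figure of around 6%.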