Internet Engineering Task Force                 Basso-AT&T
Internet Draft                                  Civanlar-AT&T
                                                Gentric-Philips
                                                Herpel-Thomson
                                                Lifshitz-Optibase
                                                Lim-mp4cast
                                                Perkins-ISI
                                                Van Der Meer-Philips
November 2001
Expires May 2002
Document: draft-ietf-avt-mpeg4-multisl-03.txt

                 RTP Payload Format for MPEG-4 Streams

Status of this Memo

   This document is an Internet-Draft and is in full conformance with all provisions of Section 10 of RFC2026.

   Internet-Drafts are working documents of the Internet Engineering Task Force (IETF), its areas, and its working groups. Note that other groups may also distribute working documents as Internet-Drafts. Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."

   This specification is a product of the Audio/Video Transport working group within the Internet Engineering Task Force and ISO/IEC MPEG-4 ad hoc group on MPEG-4 over Internet. Comments are solicited and should be addressed to the working group's mailing list at avt@ietf.org and/or the authors.

   The list of current Internet-Drafts can be accessed at http://www.ietf.org/ietf/1id-abstracts.txt
   The list of Internet-Draft Shadow Directories can be accessed at http://www.ietf.org/shadow.html.

   This document contains a MIME type registration form that is intended to be taken as-is and therefore makes reference to this document, using the temporary placeholder: .

Abstract

   This document describes a payload format for transporting MPEG-4 encoded data using RTP. MPEG-4 is a recent standard from ISO/IEC for the coding of natural and synthetic audio-visual data.
   Several services provided by RTP are beneficial for MPEG-4 encoded data transport over the Internet. Additionally, the use of RTP makes it possible to synchronize MPEG-4 data with other real-time data types.

Gentric et al.                 Expires March 2002                      1
RTP Payload Format for MPEG-4 Streams                     September 2001

1. Introduction

   MPEG-4 is a recent standard from ISO/IEC for the coding of natural and synthetic audio-visual data in the form of audiovisual objects that are arranged into an audiovisual scene by means of a scene description [1][2][3][4]. This draft specifies an RTP [5] payload format for transporting MPEG-4 encoded data streams.

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC 2119 [6].

   The benefits of using RTP for MPEG-4 data stream transport include:

   i.   Ability to synchronize MPEG-4 streams with other RTP payloads

   ii.  Monitoring MPEG-4 delivery performance through RTCP

   iii. Combining MPEG-4 and other real-time data streams received from multiple end-systems into a set of consolidated streams through RTP mixers

   iv.  Converting data types, etc. through the use of RTP translators.

1.1 Overview of MPEG-4 End-System Architecture

   Two types of terminals can use this specification. The first is a complete MPEG-4 terminal, i.e., a terminal implementing the MPEG-4 systems [1] specification and possibly also MPEG-4 video [2] and audio [3]. The second is a terminal implementing only part of this set of MPEG-4 specifications; for example, a terminal using MPEG-4 video [2] but not MPEG-4 systems, as in RFC 3016.

   This document is structured so as to be understandable from both points of view (with or without MPEG-4 systems).
   A further goal is that services deployed for one type of terminal can be adapted to the other type with only minor changes to the session description, because the recorded streams are the same. Another key assumption is that the properties of streams of various types (video, audio, scene description) can be described with the same Elementary Stream model, so that the same payload format can transport any MPEG-4 stream.

1.1.1 The simplified MPEG-4 model

   In the simplified MPEG-4 model, MPEG-4 systems [1] is not used. However, the concept of Elementary Stream remains: MPEG-4 video [2] and MPEG-4 audio [3] describe how video and audio bit streams, respectively, are fragmented into pieces called Access Units. By definition, each Access Unit has a number of media-independent basic properties:
      . composition time stamp
      . framing
      . possibly decoding time stamp

   Furthermore, both the video [2] and audio [3] specifications define how Access Units (AUs) are themselves to be fragmented, since in the spirit of Application Level Framing AUs SHOULD be fragmented in a way that lets decoders process the packets immediately after a packet loss. In this case, signaling of Access Unit fragment boundaries is also required.

   To be understandable from this point of view, this payload format is described in terms of Access Units (AUs) and Access Unit fragments, without reference to media-specific properties (with a few exceptions).

1.1.2 The complete MPEG-4 model

   Figure 1 below shows the layered architecture of a terminal that implements the complete MPEG-4 systems model. The Compression Layer processes individual audio-visual media streams. The MPEG-4 compression schemes are defined in the ISO/IEC specifications 14496-2 [2] and 14496-3 [3].
   The compression schemes in MPEG-4 achieve efficient encoding at bit rates ranging from a few kbps to many Mbps. The audio-visual content compressed by this layer is organized into Elementary Streams (ESs).

   The MPEG-4 standard specifies MPEG-4 compliant streams. Within the constraint of this compliance, the compression layer is unaware of any specific delivery technology, but it can be made to react to the characteristics of a particular delivery layer, such as the path-MTU or loss characteristics. Also, some compressors can be designed to be delivery specific for implementation efficiency. In such cases the compressor may work in a non-optimal fashion with delivery technologies other than the one it was specifically designed for.

   The hierarchical relations, location and properties of ESs in a presentation are described by a dynamic set of Object Descriptors (ODs). Each OD groups one or more ES Descriptors referring to a single content item (audio-visual object). Hence, multiple alternative or hierarchical representations of each content item are possible.

   ODs are themselves conveyed through one or more ESs. A complete set of ODs can be seen as an MPEG-4 resource or session description at the stream level. The resource description may itself be hierarchical, i.e., an ES conveying an OD may describe other ESs conveying other ODs.

   The session description is accompanied by a dynamic scene description, the Binary Format for Scenes (BIFS), again conveyed through one or more ESs. At this level, content is identified in terms of audio-visual objects. The spatio-temporal location of each object is defined by BIFS. The audio-visual content of synthetic and static objects is also described by BIFS. Natural and
   animated synthetic objects may refer to an OD that points to one or more ESs that carry the coded representation of the object or its animation data.

   media aware       +-----------------------------------------+
   delivery unaware  |            COMPRESSION LAYER            |
   14496-2 Visual    |streams from as low as Kbps to multi-Mbps|
   14496-3 Audio     +-----------------------------------------+

                                                    Elementary
                                                      Stream
   ================================================ Interface
                                                      (ESI)

                     +-------------------------------------------+
   media and         |                SYNC LAYER                 |
   delivery unaware  | manages elementary streams, their synch-  |
   14496-1 Systems   | ronization and hierarchical relations     |
                     +-------------------------------------------+

                                                      DMIF
                                                   Application
   ================================================ Interface
                                                      (DAI)

                     +-------------------------------------------+
   delivery aware    |              DELIVERY LAYER               |
   media unaware     |provides transparent access to and delivery|
   14496-6 DMIF      |    of content irrespective of delivery    |
                     |                technology                 |
                     +-------------------------------------------+

               Figure 1: Conceptual MPEG-4 terminal architecture

   By conveying the session (or resource) description and the scene (or content composition) description through their own ESs, it becomes possible to change, separately and dynamically at well-known instants in time, portions of the content composition as well as the number and properties of the media streams that carry the audio-visual content.

   One or more initial Scene Description streams and the corresponding OD stream are pointed to by an initial object descriptor (IOD). In this context, the IOD needs to be made available to the receivers through some out-of-band means that is out of the scope of this payload specification.
   However, in the context of transport over IP networks, such a means is defined in a separate document [9].

   The Compression Layer organizes the ESs into Access Units (AUs), the smallest elements that can be attributed individual timestamps. The Access Unit concept defines the boundary between media-specific processing and delivery-specific processing; that is to say, transport should not depend on the nature of the media data but only on AU properties.

1.1.3 The Sync Layer

   The Sync Layer (SL), whose primary role is to synchronize streams, defines a homogeneous encapsulation of ESs carrying media or control data (ODs, BIFS). Complete AUs or AU fragments are then encapsulated in SL packets.

   All consecutive data from one stream is called an SL-packetized stream. The interface between the compression layer and the SL is called the Elementary Stream Interface (ESI). The ESI is informative, i.e., it is useful for defining concepts and mechanisms but does not have to be implemented.

   The Delivery Layer in MPEG-4 consists of the Delivery Multimedia Integration Framework (DMIF) defined in ISO/IEC 14496-6 [4]. This layer is media unaware but delivery technology aware. It provides transparent access to and delivery of content irrespective of the technologies used. The interface between the SL and DMIF is called the DMIF Application Interface (DAI). It offers content-location-independent procedures for establishing MPEG-4 sessions and accessing transport channels. This payload format can be used as an instance of the MPEG-4 Delivery Layer but is otherwise not tied to DMIF.

   The ESs from the encoders are fed into the SL with indications of AU boundaries, random access points, the desired composition time and the current time.
   The Sync Layer fragments the ESs into SL packets, each containing a header that encodes the information conveyed through the ESI. If an AU is larger than an SL packet, subsequent packets containing the remaining parts of the AU are generated with subset headers until the complete AU is packetized. One SL packet carries an Access Unit or a fragment thereof: the SL packet header contains extended timing and framing information, and the SL packet payload contains the bit stream frame (AU) or fragment. For the complete list of features of the Sync Layer, refer to the MPEG-4 systems specification [1]. The syntax of the Sync Layer is configurable and can be adapted to the needs of the stream to be transported. This includes the possibility of selecting the presence or absence of individual syntax elements, as well as configuring their length in bits. The configuration for each individual stream is conveyed in an SLConfigDescriptor, which is an integral part of the ES Descriptor for this stream. The MPEG-4 SLConfigDescriptor, being configuration information, is not carried by the media stream itself but is rather transported via an ObjectDescriptor Stream encoded using the MPEG-4 Object Description framework. This can be done in a separate stream using this payload format (see section 5.2 for details). The SLConfigDescriptor MAY also be transported by other means (for example as a parameter, see section 4.1).

   It is important to note that this draft could just as well have been written entirely in terms of SL packets instead of Access Units and Access Unit fragments. However, this could have created confusion for implementers who only need basic properties and do not want to cope with the additional complexity of the Sync Layer.
   Instead, this specification refers to the Sync Layer only when needed.

1.1.4 Where the two models meet

   In basic cases, the SL packets of an Elementary Stream are reduced to the media (compressed) data, with empty headers, and implementations then do not need to be aware of the Sync Layer at all. In these cases, saying that the Sync Layer is not implemented is logically equivalent to saying that the SL packet headers are completely empty (or fully map into the RTP headers). The Sync Layer can then be seen as a purely conceptual construction that does not have to be implemented at all.

   The MPEG-4 systems model described above also deals with session setup through Object Descriptors. In cases where the complete MPEG-4 systems framework is not used, a replacement for this key functionality is required. In fact, for simple (audio/video) systems only knowledge of the decoder configuration is needed; this specification defines options so that the decoder configuration can also be signaled without MPEG-4 systems.

   In conclusion, this payload format is intended to be capable of transporting data formatted according to the Sync Layer specification, but it is also useful without the Sync Layer, or when the Sync Layer is invisible, which is equivalent to not using it.

2. Analysis of the carriage of MPEG-4 over IP

   As explained above, applications transporting MPEG-4 audio and video may or may not require the use of MPEG-4 systems. To achieve the highest level of interoperability between all MPEG-4 applications, it is desirable that (a) the same MPEG-4 transport format can be used in both cases and (b) receivers that have no MPEG-4 systems knowledge can easily skip the MPEG-4 systems specific information, if any.
2.1 The Sync Layer point of view

   RTP is well suited to transporting MPEG-4 audio and MPEG-4 video, but when MPEG-4 systems is used, a problem arises from the fact that both RTP and MPEG-4 systems contain a synchronization layer. In particular, the RTP header duplicates some of the information provided in SL packet headers, such as the composition timestamps (CTSs) and Access Unit boundaries.

   To avoid unnecessary overhead and potential interoperability risks when transporting MPEG-4 systems, it is desirable to remove the redundancy between the SL packet header and the RTP packet header.

   To be independent of the use of MPEG-4 systems, synchronization can rely on the parameters provided in the RTP header. Another desired property is compatibility with RFC 3016 for MPEG-4 video transport.

   When SL headers are used, the redundant fields are removed from the SL header, producing "reduced SL headers". The remaining information from the SL header, if any, is contained inside the RTP packet payload, together with the SL packet payload. The combination of RTP packet headers and reduced SL packet headers can be used to logically map the RTP packets to complete SL packets.

   Some of the information contained in the reduced SL headers is also useful for transport over RTP when MPEG-4 systems is not used. For that reason, the information in the "reduced" SL headers is split into "generally useful information" and "MPEG-4 systems only information".

   The "generally useful information", hereinafter called the Payload Header, is carried by a number of fields configurable using parameters defined in section 4.1; all receivers MUST parse these fields.
   The "MPEG-4 systems only information", if any, is contained in an auxiliary header, hereinafter called the Remaining SL Packet Header (RSLH), which is also configured using parameters (see section 4.1) and is preceded by a length field, so that non-MPEG-4-systems devices MAY skip this information.

   This is depicted in Figure 2a.

                                              +------------+
                        extended framing and  |  AU or AU  |
                        timing information    |  fragment  |
                                              +------------+
                                      |              |
                                      |              |
                                      |              |
                                      |              |
                                      V              V

                               <----------SL Packet-------->

                               +---------------------------+
                               | SL Packet   | SL Packet   |
                               | Header      | Payload     |
                               +---------------------------+
                                      |              |
                                      |              |
         +-------------+----------+---+              |
         |             |          |                  |
         V             V          V                  V
   +-----------+ +-----------+ +-------------+ +-----------+
   |RTP Packet | | Payload   | | Remaining SL| | SL Packet |
   | Header    | | Header    | | Header      | | Payload   |
   +-----------+ +-----------+ +-------------+ +-----------+

                 <----RTP Packet Payload------------------->

         Figure 2a: Mapping of ES into SL, then SL Packet into RTP packet

2.2 The Elementary Stream point of view

   Another way to see the mapping of Elementary Streams (i.e., Access Units or AU fragments) into RTP packets is depicted in Figure 2b. In this view, the "basic" timing and fragmentation information listed in section 1.1.1 is obtained directly at the codec interfaces and mapped into the RTP header or the RTP Payload Header.

   For example, this RTP payload format has been designed so that, by default, it is configured identically to RFC 3016 for the recommended MPEG-4 video configurations; in this case the Payload Header is empty. Hence, receivers that comply with this payload specification can decode such RTP payloads without knowledge of the Sync Layer (see the example in Appendix 1).
   In a similar fashion, but with non-empty Payload Headers, MPEG-4 audio (see Appendices 3 and 4 for examples) can be transported without explicit use of the Sync Layer.

                              +------------+
         basic framing and    |  AU or AU  |
         timing information   |  fragment  |
                              +------------+
                       |             |
                       |             |
         +-------------+             |
         |             |             |
         V             V             V
   +-----------+ +-----------+ +-----------+
   |RTP Packet | | Payload   | |           |
   | Header    | | Header    | | Payload   |
   +-----------+ +-----------+ +-----------+

                 <----RTP Packet Payload--->

         Figure 2b: Direct mapping of Elementary Streams into RTP packet

2.3 How the two views reconcile

   A simple concept unifies these apparently antagonistic points of view: a "no-SL" terminal can skip (ignore) the Remaining SL Header, if present.

3. Payload Format

   The RTP Payload corresponds to an integer number of Access Units or Access Unit fragments.

   The RTP payload is composed of 3 sections:
      . a Payload Header Section
      . an RSLH Section
      . a Payload Section.

   The AU and AU fragment boundaries and timing information are transported in the Payload Header.

   When transporting SL streams, SL Packet Headers are transformed into Remaining SL Headers (RSLHs): some fields are extracted and mapped into the RTP header, and others are extracted and mapped into the corresponding Payload Header.

   The AU or AU fragment data (SL packet payload), i.e., the Elementary Stream codec data, is unchanged.

   This payload format has two modes. In the "Single" mode, a single AU or AU fragment is transported per RTP packet. In the "Multiple" mode, possibly more than one AU or AU fragment is transported per RTP packet. The default mode is the "Single" mode.

   In the "Multiple" mode, the AUs or AU fragments inside one RTP packet MUST be in decoding order.
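   The "Multiple" mode ordering constraints (the within-packet decoding order just stated, and the packet-ordering rule for interleaving that this section gives next) can be checked mechanically. A minimal Python sketch, assuming each RTP packet is modelled as a list of decoding-order indices, one per AU or AU fragment; the function name ordering_problem and this representation are hypothetical and not part of this specification:

```python
from typing import Optional, Sequence


def ordering_problem(packets: Sequence[Sequence[int]]) -> Optional[str]:
    """Return None if the packets follow the ordering rules, else a reason.

    Each packet lists the decoding-order indices of the AUs or AU fragments
    it carries, in transmission order (hypothetical representation).
    """
    for packet in packets:
        if not packet:
            raise ValueError("an RTP packet carries at least one AU or AU fragment")
        # MUST: inside one RTP packet, AUs or AU fragments are in decoding order.
        if any(a >= b for a, b in zip(packet, packet[1:])):
            return "not in decoding order inside packet %s" % list(packet)
    # SHOULD: packets are sent in the decoding order of their first AU or fragment.
    firsts = [packet[0] for packet in packets]
    if any(a >= b for a, b in zip(firsts, firsts[1:])):
        return "packets not sent in decoding order of their first AU or fragment"
    return None


# The six example sequences used in this section:
examples = [
    [[0, 2, 4], [1, 3, 5]],
    [[0, 3, 6], [1, 2], [4, 5]],
    [[0, 3, 6], [1, 4], [2, 5]],
    [[0, 4, 2], [1, 5, 3]],
    [[1, 3, 5], [0, 2, 4]],
    [[0, 3, 6], [2, 5], [1, 4]],
]
for seq in examples:
    print(seq, ordering_problem(seq) or "correct")
```

   Returning a reason rather than a plain boolean keeps the MUST violation (order inside a packet) distinguishable from the SHOULD violation (order of packets).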
   Decoding order is defined by the relevant codec specification. Decoding order may differ from presentation order, for example for video streams containing B frames. In the MPEG-4 systems model, this order is quantified using decoding time stamps (DTS).

   RTP packets SHOULD be sent in decoding order. In case of interleaving, the first AU or AU fragment of each RTP packet is used as the reference, as in the following examples of RTP packets containing interleaved SL packets (each number is the position of an SL packet in decoding order, and each bracket is one RTP packet):

      This sequence is correct:     [0,2,4][1,3,5]
      This sequence is correct:     [0,3,6][1,2][4,5]
      This sequence is correct:     [0,3,6][1,4][2,5]
      This sequence is prohibited:  [0,4,2][1,5,3]
      This sequence is prohibited:  [1,3,5][0,2,4]
      This sequence is prohibited:  [0,3,6][2,5][1,4]

   In the "Multiple" mode, senders MUST make sure that no field rolls over (wraps around) inside one RTP packet. This may limit the number of SL packets inside one RTP packet and, when interleaving, may limit the interleaving period, as detailed below.

   The size and/or number of the payloads SHOULD be adjusted such that the resulting RTP packet is not larger than the path-MTU. To handle larger packets, this payload format relies on lower layers for fragmentation, which may not be desirable.

3.1 RTP Header Fields Usage

   Payload Type (PT): The assignment of an RTP payload type for this new packet format is outside the scope of this document and will not be specified here. It is expected that the RTP profile for a particular class of applications will assign a payload type for this encoding; if that is not done, a payload type in the dynamic range shall be chosen.

   Marker (M) bit: The M bit is set to 1 when all AUs or AU fragments in the RTP packet are Access Unit ends.
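   The M bit rule reduces to a single predicate over the AUs or AU fragments carried in the packet. A minimal Python sketch, assuming a hypothetical AuPiece record whose ends_au flag marks a complete AU or the last fragment of an AU; starts_au is included only to tell complete AUs apart from last fragments and does not affect the M bit:

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class AuPiece:
    """An AU or AU fragment carried in an RTP packet (hypothetical type)."""
    starts_au: bool  # True for a complete AU or the first fragment of an AU
    ends_au: bool    # True for a complete AU or the last fragment of an AU


def marker_bit(pieces: Sequence[AuPiece]) -> int:
    """Return the RTP M bit: 1 only if every piece ends an Access Unit."""
    if not pieces:
        raise ValueError("an RTP packet carries at least one AU or AU fragment")
    return 1 if all(piece.ends_au for piece in pieces) else 0


complete = AuPiece(starts_au=True, ends_au=True)
first_fragment = AuPiece(starts_au=True, ends_au=False)
last_fragment = AuPiece(starts_au=False, ends_au=True)

print(marker_bit([complete, last_fragment]))   # 1: complete AU plus a last fragment
print(marker_bit([complete, first_fragment]))  # 0: an AU continues in a later packet
```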
   Specifically, the M bit is set to 0 when the RTP packet contains one or more AU fragments that are not Access Unit ends, and the M bit is set to 1 for RTP packets that contain any of the following:
      . a single complete Access Unit
      . the last fragment of an Access Unit
      . several complete Access Units
      . several last fragments of Access Units
      . a mix of complete Access Units and last fragments of Access Units

   Therefore, for streams where all SL packets are complete Access Units, the M bit is 1 for all RTP packets. Note also that, in Sync Layer terms, this means that the M bit is related to the accessUnitEndFlag.

   Extension (X) bit: Defined by the RTP profile used.

   Sequence Number: The RTP sequence number should be generated by the sender with a constant random offset.

   Timestamp: Set to the value in the compositionTimeStamp field of the first AU or AU fragment in the RTP packet, if present.

   If compositionTimeStamp is shorter than 32 bits, the RTP timestamp is generated by extending it to 32 bits. If compositionTimeStamp is longer than 32 bits, the RTP timestamp uses its 32 least significant bits. When using the Sync Layer, the length of the timestamp (timeStampLength) is available from the SL configuration data and shall be used by receivers to reconstruct compositionTimeStamps with the original bit length. In all other cases, it is RECOMMENDED to use timeStampLength=32.

   If compositionTimeStamp is not present in the current SL packet but was present in a previous AU or AU fragment, the reason is that the same Access Unit has been fragmented; therefore the same timestamp value MUST be used as the RTP timestamp.
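   The following Python sketch shows one way a sender might derive 32-bit RTP timestamps from compositionTimeStamp values of timeStampLength bits, and how a receiver could recover a CTS of 32 bits or less. The class and function names are hypothetical. This section does not say how a shorter compositionTimeStamp is extended to 32 bits, so the unwrapping strategy below (choose the extension closest to the previous value, which tolerates CTS moving backwards, e.g. with B frames) is an assumption. Passing the same CTS again, as for further fragments of the same Access Unit, yields the same RTP timestamp.

```python
from typing import Optional


class RtpTimestampMapper:
    """Sender-side mapping of compositionTimeStamp (CTS) values of
    timeStampLength bits onto 32-bit RTP timestamps (hypothetical helper)."""

    def __init__(self, ts_length: int) -> None:
        self.ts_length = ts_length
        self.modulus = 1 << ts_length
        self.last_ext: Optional[int] = None  # last extended CTS (short CTS only)

    def to_rtp(self, cts: int) -> int:
        if self.ts_length >= 32:
            # CTS of 32 bits or more: the RTP timestamp uses its 32 LSBs.
            return cts & 0xFFFFFFFF
        # CTS shorter than 32 bits: extend it by picking the value nearest to
        # the previous extended CTS (assumed strategy, not mandated here).
        if self.last_ext is None:
            ext = cts
        else:
            ext = self.last_ext - self.last_ext % self.modulus + cts
            if ext - self.last_ext > self.modulus // 2:
                ext -= self.modulus
            elif self.last_ext - ext > self.modulus // 2:
                ext += self.modulus
        self.last_ext = ext
        return ext & 0xFFFFFFFF


def cts_from_rtp(rtp_ts: int, ts_length: int) -> int:
    """Receiver side, for ts_length <= 32: restore the original CTS bit length."""
    return rtp_ts & ((1 << ts_length) - 1)


mapper = RtpTimestampMapper(ts_length=16)
# 16-bit CTS values in decoding order: 4 follows a wrap and is repeated for a
# second fragment of the same AU; 65533 is an earlier CTS (e.g. a B frame).
print([mapper.to_rtp(cts) for cts in (65530, 65535, 4, 4, 65533)])
# [65530, 65535, 65540, 65540, 65533]
print(cts_from_rtp(65540, ts_length=16))  # 4
```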
   If compositionTimeStamp is never present in SL packets for this stream, the RTP packetizer SHOULD convey a reading of a local clock at the time the RTP packet is created.

   In all cases, the sender SHALL make sure that RTP time stamps are identical only for RTP packets transporting fragments of the same Access Unit.

   According to RFC 1889 [5, Section 5.1], timestamps are recommended to start at a random value for security reasons. However, a receiver is then, in the general case, unable to reconstruct the original MPEG-4 time stamps (CTS, DTS, OCR), which can be of use for applications where streams from multiple sources are to be synchronized (for example, one stream from local storage and another from a streaming server). Therefore, the usage of such a random offset SHOULD be avoided.

   Note that since RTP devices may re-stamp the stream, all time stamps inside the RTP payload (CTS and DTS in the Payload Header, OCR in the RSLH) MUST be expressed as a difference from the RTP time stamp. Since this subtraction may lead to negative values, the offset MUST be encoded as a two's complement signed integer in network octet order. Note also that these offsets (deltas) typically require far fewer bits to encode than the original time stamps, which is a further justification for this approach.

   When startCompositionTimeStamp is signaled in the SLConfigDescriptor, the RTP time stamps MUST start with this value.

   SSRC, CC and CSRC fields are used as described in RFC 1889 [5].

   RTCP SHOULD be used as defined in RFC 1889 [5].

3.2 RTP payload structure

   The packet payload structure consists of 3 octet-aligned sections.

   The first section is the Payload Header Section and contains Payload Headers. Each Payload Header contains basic fragmentation and timing information for one AU or AU fragment.
   The Payload Header structure is described in section 3.4. In the "Single" mode, this section is empty by default.

   The second section is the RSLH Section and contains Remaining SL Headers (RSLHs). The RSLH structure is described in section 3.5. By default, this section is empty.

   The last section (Payload Section) contains the codec bit stream data of the AUs or AU fragments. This section is never empty.

   The Nth Payload Header in the Payload Header Section, the Nth RSLH in the RSLH Section and the Nth AU or AU fragment payload in the Payload Section correspond to the Nth AU or AU fragment transported by the RTP packet.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |V=2|P|X|  CC   |M|     PT      |       sequence number         |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                           timestamp                           |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |           synchronization source (SSRC) identifier            |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   :            contributing source (CSRC) identifiers             :
   +=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+
   |                                                               |
   |            Payload Header Section (octet aligned)             |
   |                                                               |
   |                               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                               |                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+                               |
   |                                                               |
   |                 RSLH Section (octet aligned)                  |
   |                                                               |
   |               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |               |                                               |
   +-+-+-+-+-+-+-+-+                                               |
   |                                                               |
   |                Payload Section (octet aligned)                |
   |                                                               |
   |                               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                               :...OPTIONAL RTP padding        |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

                   Figure 3: An RTP packet for MPEG-4

3.3 Payload Header Section structure

   If the Payload Header Section consumes a non-integer number of octets, up to 7
zero-valued padding bits MUST be inserted at the end 651 in order to achieve octet-alignment. This size excludes the padding 652 bits, if any. 654 In the "Single" mode the Payload Header Section consists of a single 655 Payload Header. 657 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 658 | Payload Header (x bits ) : padding bits| 659 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 661 Figure 4: Payload Header Section structure in "Single" mode 663 Gentric et al. Expires March 2002 12 664 RTP Payload Format for MPEG-4 Streams September 2001 666 In the "Multiple" mode the Payload Header section consist of a 2 667 octets field giving the size in bits (in network octet order) of the 668 following block of bit-wise concatenated PayloadHeaders. 670 This size field is absent in the "Single" mode not because it is not 671 needed (which would be a minor gain) but for compatibility with RFC 672 3016. 674 This size field is also absent when the value would always be zero 675 because the Payload Header is always empty, which may happen when a 676 constant payload size in signaled using ConstantSize (see below). 678 0 1 2 3 679 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 680 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 681 | Payload Header section size in bits | | 682 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+--+-+-+ | 683 | as many bit-wise concatenated Payload Headers | 684 | as AU or AU fragments in this RTP packet | 685 | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 686 | : padding bits| 687 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 689 Figure 5: Payload Header Section structure in "Multiple" mode 691 3.4 Payload Header structure 693 The Payload Header content depends on parameters (as described in 694 section 4.1); by default it is empty for the "Single" mode and, 695 except when ConstantSize is signaled, contains at least the 696 PayloadSize field in the "Multiple" mode. 
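The following Python sketch illustrates how a receiver might parse a "Multiple"-mode Payload Header Section using the fields of Table 1 and the rules of 3.4.1: serial numbers are rebuilt from Index and IndexDelta, CTSDelta is read as a two's complement offset from the RTP time stamp, and DTS is taken as CTS minus DTSDelta. It is a non-normative sketch under several assumptions: the MIME parameters have already been parsed into a dictionary keyed by parameter name, SizeLength is signaled (so the 2-octet section size field is present), RTP time stamp arithmetic wraps modulo 2^32, and DTS is assumed equal to CTS when no DTSDelta is present. The names BitReader, PayloadHeader and parse_payload_headers are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

TS_MASK = 0xFFFFFFFF  # RTP time stamps are 32-bit and wrap around


class BitReader:
    """Reads MSB-first unsigned bit fields from a bytes object."""

    def __init__(self, data: bytes, bit_pos: int = 0) -> None:
        self.data = data
        self.pos = bit_pos

    def read(self, nbits: int) -> int:
        value = 0
        for _ in range(nbits):
            byte = self.data[self.pos >> 3]
            value = (value << 1) | ((byte >> (7 - (self.pos & 7))) & 1)
            self.pos += 1
        return value


def to_signed(value: int, nbits: int) -> int:
    """Interpret an nbits-wide field as a two's complement integer."""
    if nbits > 0 and value & (1 << (nbits - 1)):
        return value - (1 << nbits)
    return value


@dataclass
class PayloadHeader:
    size: int             # PayloadSize, in octets
    serial: int           # serial number rebuilt from Index / IndexDelta
    cts: Optional[int]    # None when the CTS cannot be derived
    dts: Optional[int]


def parse_payload_headers(payload: bytes, rtp_ts: int,
                          params: dict) -> Tuple[List[PayloadHeader], int]:
    """Parse a "Multiple"-mode Payload Header Section.

    Returns the headers and the octet offset at which the next section
    begins.
    """
    size_len = params.get("SizeLength", 0)
    idx_len = params.get("IndexLength", 0)
    idx_delta_len = params.get("IndexDeltaLength", 0)
    cts_len = params.get("CTSDeltaLength", 0)
    dts_len = params.get("DTSDeltaLength", 0)

    section_bits = int.from_bytes(payload[0:2], "big")  # excludes padding
    end = 16 + section_bits
    reader = BitReader(payload, 16)
    headers: List[PayloadHeader] = []
    serial = 0
    while reader.pos < end:
        first = not headers
        size = reader.read(size_len)
        if first:
            serial = reader.read(idx_len)  # Index; 0 if IndexLength == 0
        else:
            # Serial(i+1) = Serial(i) + IndexDelta(i+1) + 1
            serial += reader.read(idx_delta_len) + 1
        # The first AU or AU fragment is mapped to the RTP time stamp.
        cts = rtp_ts if first else None
        if cts_len > 0 and reader.read(1):          # CTSFlag
            delta = to_signed(reader.read(cts_len), cts_len)
            cts = (rtp_ts + delta) & TS_MASK
        dts = cts  # DTSFlag 0 or absent: DTS assumed equal to CTS
        if dts_len > 0 and reader.read(1):          # DTSFlag
            dts_delta = reader.read(dts_len)        # CTS - DTS
            dts = None if cts is None else (cts - dts_delta) & TS_MASK
        headers.append(PayloadHeader(size, serial, cts, dts))
    return headers, (end + 7) // 8  # skip the padding bits
```

The returned offset points at the start of the RSLH Section or, when RSLHSectionSizeLength is zero, of the Payload Section.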
When all options are used, the Payload Header fields and the parameters giving their sizes are as listed in Table 1.

   +===========================+=================================+
   | Fields of Payload Header  | Number of bits (parameters)     |
   +===========================+=================================+
   | PayloadSize               | SizeLength                      |
   +---------------------------+---------------------------------+
   | Index                     | IndexLength                     |
   +---------------------------+---------------------------------+
   | IndexDelta                | IndexDeltaLength                |
   +---------------------------+---------------------------------+
   | CTSFlag                   | 1 if (CTSDeltaLength > 0)       |
   +---------------------------+---------------------------------+
   | CTSDelta                  | CTSDeltaLength if (CTSFlag == 1)|
   +---------------------------+---------------------------------+
   | DTSFlag                   | 1 if (DTSDeltaLength > 0)       |
   +---------------------------+---------------------------------+
   | DTSDelta                  | DTSDeltaLength if (DTSFlag == 1)|
   +---------------------------+---------------------------------+

      Table 1: Payload Header fields and parameters giving the sizes

In the general case, a receiver can only discover the size of a Payload Header by parsing it since, for example, the presence of CTSDelta is signaled by the value of CTSFlag.

3.4.1 Fields of Payload Header

PayloadSize: Indicates the size in octets of the associated payload, which can be found in the Payload Section of the RTP packet. The length in bits of this field is signaled by the SizeLength parameter (see section 4.1).

There is one exception: when, in the "Multiple" mode, the RTP packet contains only one AU or AU fragment, the PayloadSize field SHALL contain the size of the entire corresponding Access Unit.
There are two reasons for this: firstly, the size of the fragment is not needed when there is only one fragment in the RTP packet; secondly, it allows a receiver to detect whether a complete Access Unit has been received after the loss of a packet carrying an M bit set to 1.

Index, IndexDelta: encode the serial number of the associated AU or AU fragment. IndexDelta is useful for interleaving (see section 3.8). When transporting an SL stream, Index and IndexDelta SHALL be used to encode the packetSequenceNumber field of the SL Packet Header.

Index is optional and, if present, appears in the first Payload Header of an RTP packet.

The length in bits of the Index field is defined by the IndexLength parameter (see section 4.1).

IndexDelta is optional and, if present, appears in the subsequent (non-first) Payload Headers of an RTP packet.

The length in bits of the IndexDelta field is defined by the IndexDeltaLength parameter (see section 4.1).

Both Index and IndexDelta MUST be incremented so that two consecutive AUs or AU fragments are distinguishable. One exception for Index is described in 3.8.1.

If the IndexDeltaLength parameter is defined, non-first AUs or AU fragments inside an RTP packet have their serial number encoded as a difference (hence the name IndexDelta). This difference is relative to the previous AU or AU fragment in the RTP packet, according to (with i >= 0):

   Serial number(0)   = Index(0)
   Serial number(i+1) = Serial number(i) + IndexDelta(i+1) + 1

If the IndexDeltaLength parameter is not defined, its default value is zero and the IndexDelta field is then absent from non-first Payload Headers. Receivers SHALL nevertheless apply the above formula with IndexDelta equal to zero.
In other words, by default the serial number is incremented by 1 for each AU or AU fragment in the RTP packet.

CTSFlag (1 bit): Indicates whether the CTSDelta field is present. A value of 1 indicates that the CTSDelta field is present, a value of 0 that it is not present.

If CTSDeltaLength is not zero, CTSFlag is present in all Payload Headers, regardless of whether the AU fragment is an Access Unit start or not.

CTSDelta (CTSDeltaLength bits): Specifies the value of the CTS as a two's complement offset (delta) from the time stamp in the RTP header of the RTP packet. The length in bits of each CTSDelta field is specified by the CTSDeltaLength parameter (see section 4.1).

The CTSDelta field is present if CTSFlag is 1.

For the first Payload Header of each RTP packet, CTSFlag is always 0, since the composition time stamp of the first AU or AU fragment in the RTP packet is mapped to the RTP time stamp. When using the Sync Layer, the sender MUST remove the compositionTimeStamp from the RSLH.

Senders MUST finish assembling an RTP packet rather than add an AU or AU fragment whose CTSDelta would roll over (i.e. would not fit in CTSDeltaLength bits), since this would prevent the receiver from reconstructing the correct CTS. This can result in suboptimal RTP packets (smaller than the MTU), depending on the MTU, the AU or AU fragment sizes and CTSDeltaLength.

DTSFlag (1 bit): Indicates whether the DTSDelta field is present. A value of 1 indicates that DTSDelta is present, a value of 0 that it is not present.

If DTSDeltaLength is not zero, DTSFlag is present in all Payload Headers, regardless of whether the AU fragment is an Access Unit start or not. When transporting SL streams, the receiver needs this flag in order to reconstruct the decodingTimeStampFlag of SL Packet Headers.

DTSDelta (DTSDeltaLength bits): encodes (compositionTimeStamp - decodingTimeStamp) for the same AU or AU fragment (always positive).
The length in bits of each DTSDelta field is specified by the DTSDeltaLength parameter (see section 4.1).

Senders MUST make sure that DTSDeltaLength is large enough to encode the difference between CTS and DTS (otherwise the DTS computed by the receiver would be incorrect).

The DTSDelta field is present when DTSFlag is 1. The sender MUST always remove the decodingTimeStamp from the RSLH.

If DTSDelta is zero, i.e. if decodingTimeStamp equals compositionTimeStamp, then DTSFlag MUST be set to 0 and no DTSDelta field SHALL be present.

3.5 RSLHSection structure

This section is present only when the Sync Layer is used, and then only when SL Packet Header fields remain after the mapping rules of this payload format (see 3.4.1 and 3.6) have been applied.

This section starts with a field (RSLHSectionSize) giving the size in bits of the following block of bit-wise concatenated RSLHs (this size does not include padding bits).

If the section consumes a non-integer number of octets, up to 7 zero-valued padding bits MUST be inserted at the end in order to achieve octet alignment.
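Because RSLHSectionSize counts only the RSLH bits, a receiver that does not implement the Sync Layer still has to add the length of the size field itself and round up over the padding bits to find where the section ends. A minimal sketch, assuming the RSLH Section starts on an octet boundary (it follows the octet-aligned Payload Header Section) and that the padding rule applies to the section as a whole, size field included; the function name skip_rslh_section is hypothetical.

```python
def skip_rslh_section(payload: bytes, offset: int, size_field_len: int) -> int:
    """Return the octet offset just past the RSLH Section.

    offset:         octet index at which the RSLH Section starts
    size_field_len: value of the RSLHSectionSizeLength parameter
    """
    if size_field_len == 0:
        return offset  # default: the whole RSLHSection is absent
    nbytes = (size_field_len + 7) // 8
    if offset + nbytes > len(payload):
        raise ValueError("truncated RSLHSectionSize field")
    # Read RSLHSectionSize, MSB first, from the leading size_field_len bits.
    raw = int.from_bytes(payload[offset:offset + nbytes], "big")
    rslh_bits = raw >> (nbytes * 8 - size_field_len)
    # RSLHSectionSize excludes padding; round the whole section up to octets.
    end = offset + (size_field_len + rslh_bits + 7) // 8
    if end > len(payload):
        raise ValueError("RSLH Section extends past the end of the packet")
    return end
```

With a 16-bit size field carrying the value 12, for instance, the section occupies 2 + 2 octets (12 RSLH bits plus 4 padding bits).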
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | RSLHSectionSize (RSLHSectionSizeLength bits)|                 |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+                 |
   |                RSLH (variable number of bits)                 |
   |                       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                       |     RSLH (variable number of bits)    |
   +-+-+-+-+-+-+-+-+-+-+-+-+                                       |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                             etc.                              |
   |              as many bit-wise concatenated RSLHs              |
   |               as SL Packets in this RTP packet                |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                RSLH (variable number of bits)                 |
   |                                               +-+-+-+-+-+-+-+-+
   |                                               : padding bits  |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

                    Figure 7: RSLHSection structure

The length in bits of the RSLHSectionSize field is given by the RSLHSectionSizeLength parameter, whose default value of zero indicates that the whole RSLHSection is absent. For compatibility with RFC 3016 it must be possible to omit the RSLHSection completely, including the RSLHSectionSize field; this is why the length of this field is a parameter whose zero default value indicates the absence of the field.

   +=================================+===============================+
   | Fields of RSLHSection           | Number of bits                |
   +=================================+===============================+
   | RSLHSectionSize                 | RSLHSectionSizeLength         |
   +---------------------------------+-------------------------------+
   | all bit-wise concatenated RSLHs | RSLHSectionSize               |
   +---------------------------------+-------------------------------+

              Table 2: Sizes in bits inside RSLHSection

Parsing the bit-wise concatenated RSLHs requires MPEG-4 Systems awareness; specifically, it requires understanding the MPEG-4 Sync Layer (SL) syntax and the modifications to this syntax described in the next section.

However, thanks to the RSLHSectionSize field, receivers that do not implement MPEG-4 Systems can skip this section: they skip the RSLHSectionSize field and the RSLHSectionSize bits that follow it, rounded up to the next octet boundary. This means that receivers not implementing the Sync Layer can process streams containing Sync Layer specific items by simply ignoring the parts they would not be able to parse.

3.6 RSLH structure

An RSLH is present only when the Sync Layer is used, and then only when SL Packet Header fields remain after the mapping rules of this payload format have been applied.

A Remaining SL Packet Header (RSLH) is what remains of an SL Packet Header after the modifications required for mapping it into this payload format.

The following modifications of the SL Packet Header MUST be applied. The other fields of the SL Packet Header MUST remain unchanged but are bit-shifted to fill the gaps left by the operations specified below.

3.6.1 Removal of fields

The following SL Packet Header fields, if present, are removed since they are mapped either into the RTP header or into the corresponding Payload Header:

   . compositionTimeStampFlag
   . compositionTimeStamp
   . decodingTimeStampFlag
   . decodingTimeStamp
   . packetSequenceNumber
   . AccessUnitEndFlag (in "Single" mode only)

The AccessUnitEndFlag, when present for a given stream, MUST be removed from every RSLH when using the "Single" mode, since it has the same meaning as the Marker bit (and for compatibility with RFC 3016). However, when using the "Multiple" mode, AccessUnitEndFlag MUST NOT be removed, since it is useful for signaling individual AU ends.

3.6.2 Mapping of OCR

Furthermore, if the SL Packet Header contains an OCR, then this field is encoded in the RSLH as a two's complement difference (delta), exactly like a compositionTimeStamp or a decodingTimeStamp in the Payload Header. The length in bits of this difference is indicated by the OCRDeltaLength parameter (see section 4.1).

With this payload format, OCRs MUST have the same clock frequency as time stamps.

If compositionTimeStamp is not present for an SL packet that has an OCR, then the OCR SHALL be encoded as a difference to the RTP time stamp.

3.6.3 Degradation Priority

For streams that use the optional degradationPriority field in the SL Packet Headers, only SL packets with the same degradation priority SHALL be transported in one RTP packet, so that components can dispatch the RTP packets according to appropriate QoS or protection schemes. Furthermore, only the first RSLH of an RTP packet SHALL contain the degradationPriority field, since it would otherwise be redundant.

3.7 Payload Section structure

The Payload Section contains the concatenated AU or AU fragment payloads. By definition, AU or AU fragment payloads are octet aligned.

For efficiency, SL packets do not carry their own payload size. This is not an issue for RTP packets that contain a single SL packet. However, in the "Multiple" mode the size of each AU or AU fragment payload MUST be available to the receiver.
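The following sketch shows one way a receiver might split the Payload Section once the sizes are available, covering the "Single" mode, the ConstantSize parameter and per-AU PayloadSize fields (see section 4.1), including the exception of 3.4.1 where a lone AU or AU fragment carries the size of the whole Access Unit. It assumes any RTP padding has already been removed; the function name split_payload_section is hypothetical.

```python
from typing import List, Optional


def split_payload_section(data: bytes, constant_size: int = 0,
                          sizes: Optional[List[int]] = None) -> List[bytes]:
    """Split a Payload Section into AU or AU fragment payloads.

    constant_size: ConstantSize parameter (0 when absent)
    sizes:         PayloadSize fields in Payload Header order
                   (None when SizeLength is absent)
    """
    if constant_size and sizes is not None:
        raise ValueError("ConstantSize and SizeLength are mutually exclusive")
    if constant_size:
        if len(data) % constant_size:
            raise ValueError("section is not a multiple of ConstantSize")
        return [data[i:i + constant_size]
                for i in range(0, len(data), constant_size)]
    if sizes is None:
        return [data]  # "Single" mode: one AU or AU fragment per packet
    if len(sizes) == 1:
        # 3.4.1 exception: PayloadSize holds the size of the entire Access
        # Unit, so the (possibly partial) payload is the rest of the packet.
        if len(data) > sizes[0]:
            raise ValueError("payload larger than its Access Unit")
        return [data]
    chunks, pos = [], 0
    for size in sizes:
        if pos + size > len(data):
            raise ValueError("PayloadSize exceeds the Payload Section")
        chunks.append(data[pos:pos + size])
        pos += size
    if pos != len(data):
        raise ValueError("unexpected trailing octets in Payload Section")
    return chunks
```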
If the AU or AU fragment payload size is constant for a stream, the size information SHOULD NOT be transported in the RTP packet. However, in that case it MUST be signaled using the ConstantSize parameter (see section 4.1).

If the AU or AU fragment payload size is variable, then the size of each AU or AU fragment payload MUST be indicated in the corresponding Payload Header. To do so, the Payload Header MUST contain a PayloadSize field. The number of bits on which this PayloadSize field is encoded MUST be indicated using the SizeLength parameter (see section 4.1).

The absence of both ConstantSize and SizeLength indicates the "Single" mode, i.e. that a single AU or AU fragment is transported in each RTP packet for that stream.

   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |         AU or AU fragment (variable number of octets)         |
   |                                                               |
   |                               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                               |       AU or AU fragment       |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+                               |
   |                  (variable number of octets)                  |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                             etc.                              |
   |      as many octet-wise concatenated AUs or AU fragments      |
   |             as required to finish the RTP packet              |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

                  Figure 8: Payload Section structure

3.8 Interleaving

SL packets MAY be interleaved. Senders MAY perform interleaving. Receivers MUST support interleaving.

Note for Sync Layer implementers: the AUSequenceNumber field of the SL Packet Header MUST NOT be used for interleaving since, firstly, it may collide with the Scene Description Carousel usage described in section 5.2 and, secondly, it is not visible to receivers that do not implement the Sync Layer and therefore skip the RSLH Section that transports AUSequenceNumber.
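Whichever of the two methods described in 3.8.1 and 3.8.2 applies, de-interleaving ultimately means putting received AUs or AU fragments back in increasing order of a key that wraps around, such as a serial number (for IBI) or a 32-bit CTS (for TSBI). A minimal sketch of such reordering using serial-number arithmetic follows: keys are compared by their signed distance from a reference key, which is only meaningful while all buffered keys lie within half the key space of each other. The function name, the use of the first received item as the reference and the assumption that keys wrap modulo 2^bits are illustrative, not taken from this specification.

```python
from typing import List, Tuple


def deinterleave(items: List[Tuple[int, bytes]], bits: int) -> List[bytes]:
    """Return payloads sorted by a key that wraps modulo 2**bits.

    items holds (key, payload) pairs in arrival order.
    """
    if not items:
        return []
    modulus = 1 << bits
    half = modulus >> 1
    reference = items[0][0]

    def distance(key: int) -> int:
        # Signed distance from the reference, in the range [-half, half).
        return (key - reference + half) % modulus - half

    ordered = sorted(items, key=lambda item: distance(item[0]))
    return [payload for _, payload in ordered]
```

A real receiver also has to decide when to release buffered items, for example after a delay bounded by the interleaving depth (which a mode may constrain through the profile parameter); that policy is outside this sketch.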
When AUs or AU fragments are interleaved, interleaving SHALL be implemented using the IndexDelta fields of the Payload Header. Senders MUST NOT build RTP packets for which IndexDelta rolls over (i.e. does not fit in IndexDeltaLength bits). Therefore, depending on the interleaving scheme (if any), the MTU and the AU or AU fragment sizes, senders wishing to build optimally sized RTP packets (i.e. close to the MTU) will need to set IndexDeltaLength to a sufficiently large value.

Senders SHALL use non-zero values of IndexDeltaLength only for streams that exhibit interleaving, so that receivers can interpret a non-zero value as an indication that interleaving may be present.

Based on this, there are two ways for a receiver to implement de-interleaving: using either Index or time stamps. The method is signaled using MIME parameters as shown in the following table, where TSBI and IBI stand respectively for Time-Stamp-Based Interleaving (see section 3.8.1) and Index-Based Interleaving (see section 3.8.2). Two methods are needed because the time-stamp-based method is more economical and, in basic cases (no AU fragmentation, CTS always defined), simpler to implement, but unfortunately it does not always work, as explained below.

   +================+======================+========================+
   |                | IndexDeltaLength = 0 | IndexDeltaLength != 0  |
   +================+======================+========================+
   | IndexLength=0  | no interleaving      | TSBI                   |
   +----------------+----------------------+------------+-----------+
   | IndexLength!=0 | no interleaving,     | Index=0    | Index!=0  |
   |                | SL.packetSeqNum      +------------+-----------+
   |                | transport            | TSBI       | IBI       |
   +----------------+----------------------+------------+-----------+

3.8.1 Time stamp based interleaving (TSBI)

The combination of RTP time stamp, IndexDelta and CTS may allow a receiver to unambiguously reorder AUs or AU fragments based on their time stamps (CTS).

This is possible and efficient for streams where only complete Access Units are transported and receivers can always compute the time stamp of each Access Unit.

For Access Units of constant duration (e.g. audio streams), the CTS does not even need to be present explicitly in the Payload Header, since then (with i being the index of an AU within one RTP packet):

   CTS(0) = RTP-TS
   CTS(i) = CTS(i-1) + (IndexDelta(i) + 1) * AU-duration,   for i >= 1

AU-duration, when constant, can either be signaled in SLConfig or be deduced from the decoder configuration (see the config MIME parameter).

Senders MUST either use IndexLength=0 or set all Index values in all packets to zero, so that receivers can detect this as an indication that de-interleaving SHOULD be performed using time stamps.

When the Sync Layer is used and interleaving is performed, senders MUST use SL.timeStampLength values large enough to prevent the CTS from rolling over within the duration of a packet loss burst. Pre-existing SL streams that do not comply with this requirement cannot be interleaved using this method (but may be interleaved using the method of 3.8.2).

3.8.2 Index based interleaving (IBI)

The time-stamp-based interleaving algorithm described in 3.8.1 does not work when a CTS cannot be computed for every AU or AU fragment (for example after a packet loss). This happens:

   . if the AU duration is not constant (SL durationFlag = 0) and CTS is not signaled (SL useTimeStampsFlag = 0);
   . when interleaving AU fragments.

When interleaving, senders of such streams MUST use the index-based technique described in this section.

The combination of RTP sequence number, Index and IndexDelta can produce a quasi-unique identifier for each AU or AU fragment, so that a receiver can unambiguously reconstruct the original order even in case of out-of-order packets, packet loss or duplication (see the pseudo code in 3.4.1 and 5.1).

This requires, however, that IndexLength is not too small. For that reason, senders interleaving in this fashion MUST use IndexLength values large enough to prevent Index from rolling over within a typical loss burst. Pre-existing SL streams that do not comply with this requirement (specifically if SL.packetSeqNumLength is too small) cannot be interleaved using this method (but may be interleaved using the method of 3.8.1).

Receivers can interpret non-zero values in the Index field as an indication that de-interleaving can be performed using Index and IndexDelta and cannot be performed using time stamps.

3.8.3 SL streams that cannot be interleaved

SL streams for which both SL.timeStampLength and SL.packetSeqNumLength are too small cannot be interleaved with this payload format, since such small values would typically cause a receiver to drop a large part of the stream in case of packet loss. The actual minimum value depends on the network loss properties and on the expected quality of service.

3.9 Fragmentation Rules

This section specifies rules for senders in order to prevent media decoding difficulties at the receiver end.
MPEG-4 Access Units are the default fragments for MPEG-4 bitstreams and SHOULD be mapped directly into RTP packets of this format, with two exceptions:

   - Access Units larger than the MTU;
   - when interleaving is used for better packet loss resilience.

This section gives the rules to apply when performing Access Unit fragmentation. The context is explained first, before the rules are described.

Some MPEG-4 codecs define an optional syntax for Access Unit sub-entities (fragments) that are independently decodable, for error resilience purposes. Examples are Video Packets for video and Error Sensitivity Categories (ESC) for audio. This always corresponds to a specific bitstream syntax, which is signaled in the DecoderSpecificInfo inside the DecoderConfigDescriptor and/or using the corresponding parameters as described in section 4.1. Thanks to this, decoders are aware whether encoders are operating in such a mode or not (however, since this codec configuration is an opaque data block, this is not explicitly signaled by this payload format).

If the encoder is not operating in such a mode, the decoder obviously has to skip packets after a loss until an Access Unit start is received. Similarly, decoder implementations that do not implement robust decoding of Access Unit fragments have to discard all packets after a packet loss until an Access Unit start is received. In the same way, decoder implementations that do not implement re-synchronization at any Access Unit start have to discard all packets after a packet loss until a Random Access Point Access Unit is received. These are all obvious things that a good implementation would do.
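The discard behaviour described above can be sketched as a small receiver-side filter. In this sketch, a packet is taken to start a new Access Unit only when the packet immediately preceding it in RTP sequence-number order was received and carried the M bit (which, in the "Single" mode, has the same meaning as AccessUnitEndFlag, see 3.6.1). The class name LossResync is hypothetical, packets are assumed to arrive already in sequence-number order, and a real receiver might resynchronize sooner using other information (for instance the PayloadSize exception of 3.4.1).

```python
class LossResync:
    """Drops RTP packets after a loss until an Access Unit start is seen."""

    def __init__(self) -> None:
        self.expected_seq = None   # next RTP sequence number expected
        self.prev_ended_au = True  # assume the stream starts on an AU
        self.synced = True

    def accept(self, seq: int, marker: bool) -> bool:
        """Return True if this packet can be passed to the decoder."""
        if self.expected_seq is not None and seq != self.expected_seq:
            # Sequence number gap: one or more packets were lost, so it is
            # unknown whether this packet starts an Access Unit.
            self.synced = False
            self.prev_ended_au = False
        if not self.synced and self.prev_ended_au:
            self.synced = True     # this packet starts a new Access Unit
        self.expected_seq = (seq + 1) & 0xFFFF  # 16-bit, wraps around
        self.prev_ended_au = marker
        return self.synced
```

This policy is deliberately conservative: after a loss it also drops a packet that happens to hold a complete Access Unit, because nothing in that packet alone proves the preceding Access Unit ended.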
However, serious problems would arise for decoder implementations that try to restart decoding after a packet loss if independently decodable fragments are signaled (in the decoder configuration) but the fragments actually received are not independently decodable, because the RTP sender has built RTP packets on boundaries different from those of the fragments provided by the encoder: the decoder has in general no way to detect such a faulty fragment. This issue therefore concerns both the interface between the encoder and the RTP sender and the RTP sender component itself.

For this reason the following rules must be applied:

In the spirit of Application Level Framing (ALF), this payload format should transport either complete Access Units or fragments of Access Units that are independently decodable. Specifically, when a codec has an optional syntax for independently decodable Access Unit fragments, this option SHOULD be used.

Independently decodable Access Unit fragments MUST NOT be split across several RTP packets.

For example, when an MPEG-4 audio stream is encoded using the ESC syntax, an ESC MUST NOT be split across two RTP packets.

This rule is relaxed for MPEG-4 Video Packets, for two reasons: firstly, Video Packets can be much larger than a typical MTU and, secondly, all Video Packets start with a specific resynchronization marker that can be unambiguously detected. Therefore, for video streams using the Video Packet syntax, Video Packets MAY be split across several SL packets, although it is strongly RECOMMENDED to always adapt the Video Packet size to fit the MTU. However, a Video Packet start MUST always be aligned with an AU fragment start, except when a GOV is present, in which case the GOV and the first Video Packet of the following VOP MUST be included in the same SL packet.

4. Types and Names

This section describes the MIME types and names associated with this payload format. Section 4.1 registers the MIME types, as per RFC 2048.

This format may require additional information about the mapping to be made available to the receiver. This is done using the parameters described in the next section. Omitting any of these parameters is equivalent to setting it to its default value, which is zero unless otherwise specified in the next section. The absence of all such parameters resolves into a default "basic" configuration compatible with RFC 3016 for MPEG-4 video.

In the MPEG-4 framework, the SL stream configuration information is carried using the Object Descriptor. For compatibility with receivers that do not implement the full MPEG-4 Systems specification, this information MAY also be signaled using the parameters described here. When such information is present both in an Object Descriptor and as a parameter of this payload format, it MUST be exactly the same.

For transport of MPEG-4 audio and video without the use of MPEG-4 Systems, as well as to support receivers that do not implement MPEG-4 Systems, it is also possible to transport information on the profile and level of the stream and on the decoder configuration. This is also described in the next section.

Finally, this MIME type also defines a mode parameter and a profile parameter that are intended for future derivations of this payload format.

4.1 MIME type registration

MIME media type name: "video" or "audio" or "application"

"video" SHOULD be used for MPEG-4 Visual streams (i.e. video as defined in ISO/IEC 14496-2 [2] and/or graphics as defined in ISO/IEC 14496-1 [1]) or MPEG-4 Systems streams that convey information needed for an audio/visual presentation.
"audio" SHOULD be used for MPEG-4 Audio streams (ISO/IEC 14496-3) or MPEG-4 Systems streams that convey information needed for an audio-only presentation.

"application" SHOULD be used for MPEG-4 Systems streams (ISO/IEC 14496-1) that serve purposes other than audio/visual presentation, e.g. in some cases when MPEG-J streams are transmitted.

MIME subtype name: mpeg4-generic

Required parameters: none

Optional parameters:

mode:
The mode in which this specification is used. This specification itself defines only the default mode (Mode=default). When the mode parameter is not present, the default mode SHALL be assumed. In the default mode all parameters are optional and as defined here. Other modes may be defined as needed in other RFCs. A mode MUST be a subset of this specification. Specifically, when defining a mode, care MUST be taken that an implementation of this specification can decode the payload format corresponding to this new mode. For this reason, a mode MUST NOT specify new default values for MIME parameters, and MIME parameters MUST be present (unless they have their default value) even when this is redundant because the mode assigns them fixed values. A mode may additionally define that some MIME parameters are required instead of optional, that some MIME parameters have fixed values (or ranges), and that there are rules restricting their usage (for example forbidding the carriage of multiple AU fragments in the same RTP packet).

profile:
The meaning of this parameter may be defined by a mode. It is meant to be used to define sub-configurations of a given mode, for example the maximum delay (and therefore the size of buffers) induced by interleaving.
Implementations of this specification can ignore this parameter.

DTSDeltaLength:
The number of bits on which the DTSDelta field is encoded in each Payload Header. The default value is zero and indicates the absence of DTSFlag and DTSDelta in the Payload Header (the stream does not transport decodingTimeStamps). A value larger than zero indicates that there is a DTSFlag in each Payload Header. Since decodingTimeStamp, if present, must be encoded in the Payload Header as a difference from the compositionTimeStamp (see 3.4.1), the DTSDeltaLength parameter MUST be present in order to transport decodingTimeStamps with this payload format.

CTSDeltaLength:
The number of bits on which the CTSDelta field is encoded. The default value is zero and indicates the absence of the CTSFlag and CTSDelta fields in the Payload Header. Non-zero values MUST NOT be signaled in the "Single" mode. Since compositionTimeStamps, if present, must be encoded as a difference to the RTP time stamp, the CTSDeltaLength parameter MUST be present in order to transport compositionTimeStamps using this payload format (in the "Multiple" mode). However, CTSDeltaLength SHOULD be set to zero (or not signaled) for streams that have a constant Access Unit duration (which can be explicitly signaled using the durationFlag and accessUnitDuration fields of the SLConfigDescriptor).

OCRDeltaLength:
The number of bits on which the OCRDelta field is encoded in the RSLH. The default value is zero and indicates the absence of OCR for this stream. Since objectClockReference, if present, must be encoded as a difference to the RTP time stamp, the OCRDeltaLength parameter MUST be present in order to transport objectClockReferences with this payload format.

SizeLength:
The number of bits on which the PayloadSize field of a Payload Header is encoded.
The default value is zero and indicates the "Single" mode (unless ConstantSize is present). Simultaneous presence of this parameter and ConstantSize is illegal. Either the SizeLength or the ConstantSize parameter MUST be present in order to signal the "Multiple" mode of this payload format.

ConstantSize:
The constant size in octets of each AU or AU fragment payload for this stream. The default value is zero and indicates a variable AU or AU fragment payload size (or the "Single" mode if SizeLength is absent). Simultaneous presence of this parameter and SizeLength is illegal. Either the SizeLength or the ConstantSize parameter MUST be present in order to signal the "Multiple" mode of this payload format. When ConstantSize is present, the PayloadSize field of the Payload Header MUST NOT be present in the RTP packets.

IndexLength:
The number of bits on which the Index is encoded in the first Payload Header of an RTP packet. The default value is zero and indicates the absence of Index and IndexDelta for all Payload Headers. Since SL.packetSequenceNumber, if present, must be mapped into the Payload Header, the IndexLength parameter MUST be present in order to transport SL.packetSequenceNumber with this payload format.

IndexDeltaLength:
The number of bits on which IndexDelta is encoded in any non-first Payload Header. The default value is zero and indicates that the serial number MUST be incremented by one for each AU or AU fragment in the RTP packet (see section 3.4.1). The IndexDeltaLength parameter MUST be present when using interleaving with this payload format.

RSLHSectionSizeLength:
The number of bits used to encode the RSLHSectionSize field.
      The default value is zero and indicates the absence of the whole
      RSLHSection for all RTP packets of this stream.

   SLConfigDescriptor:
      A base-64 encoding of the SLConfigDescriptor. This SHALL be the
      original SLConfigDescriptor, and it SHALL be the same as the one
      transported by the OD framework, if any.

   profile-level-id:
      A decimal representation of the MPEG-4 Profile Level indication
      value. For audio, this parameter indicates which MPEG-4 Audio tool
      subsets are applied to encode the audio stream and is defined in
      ISO/IEC 14496-1 [1]. For video, this parameter indicates which
      MPEG-4 Visual tool subsets are applied to encode the video stream
      and is defined in Table G-1 of ISO/IEC 14496-2 [2]. This parameter
      MAY be used in the capability exchange or session setup procedure
      to indicate the MPEG-4 Profile and Level combination of which the
      relevant MPEG-4 media codec is capable. If this parameter is not
      specified, its default value is 1 (Simple Profile/Level 1) for
      video (for compatibility with RFC 3016) and 0xFE (254 in decimal)
      otherwise, which ISO/IEC 14496-1 [1] defines as the generic default
      value.

   config:
      A hexadecimal representation of an octet string that expresses the
      media payload configuration. Configuration data is mapped onto the
      octet string on an MSB-first basis. The first bit of the
      configuration data SHALL be located at the MSB of the first octet.
      In the last octet, zero-valued padding bits, if necessary, shall
      follow the configuration data. For audio streams, config is the
      audio object type specific decoder configuration data
      AudioSpecificConfig() as defined in ISO/IEC 14496-3 [3].
      For video, this expresses the MPEG-4 Visual configuration
      information, as defined in subclause 6.2.1 (Start codes) of ISO/IEC
      14496-2 [2]. The configuration information indicated by this
      parameter SHALL be the same as the configuration information in the
      corresponding MPEG-4 Visual stream, except for
      first-half-vbv-occupancy and latter-half-vbv-occupancy, if they
      exist, which may vary in the repeated configuration information
      inside an MPEG-4 Visual stream (see subclause 6.2.1 of ISO/IEC
      14496-2).

   StreamType:
      The integer value that indicates the type of MPEG-4 stream that is
      carried; its coding corresponds to the values of streamType as
      defined for the DecoderConfigDescriptor in ISO/IEC 14496-1 [1].

   Encoding considerations:
      System bitstreams MUST be generated according to the MPEG-4 Systems
      specification (ISO/IEC 14496-1). Video bitstreams MUST be generated
      according to the MPEG-4 Visual specification (ISO/IEC 14496-2).
      Audio bitstreams MUST be generated according to the MPEG-4 Audio
      specification (ISO/IEC 14496-3). If the Sync Layer is used, SL
      streams MUST be generated according to the MPEG-4 Sync Layer
      specification (ISO/IEC 14496-1, section 10); in that case, the
      SLConfigDescriptor is required in order to read the RSLH parts of
      this format. These bitstreams are binary data and MUST be encoded
      for non-binary transport (for Email, the Base64 encoding is
      sufficient). This type is also defined for transfer via RTP. The
      RTP packets MUST be packetized according to the RTP payload format
      defined in RFC .

   Security considerations:
      As in RFC .

   Interoperability considerations:
      MPEG-4 provides a large and rich set of tools for the coding of
      visual objects. For effective implementation of the standard,
      subsets of the MPEG-4 tool sets have been provided for use in
      specific applications.
      These subsets, called 'Profiles', limit the size of the tool set a
      decoder is required to implement. In order to restrict
      computational complexity, one or more 'Levels' are set for each
      Profile. A Profile@Level combination allows:
      . a codec builder to implement only the subset of the standard it
        needs, while maintaining interoperability with other MPEG-4
        devices included in the same combination, and
      . checking whether MPEG-4 devices comply with the standard
        ('conformance testing').
      A stream SHALL be compliant with the MPEG-4 Profile@Level specified
      by the parameter "profile-level-id". Interoperability between a
      sender and a receiver may be achieved by specifying the parameter
      "profile-level-id" in MIME content, or by using the capability
      exchange/announcement procedure to set this parameter to the same
      value at both ends.

   Published specification:
      The specifications for MPEG-4 streams are presented in ISO/IEC
      14496-1, 14496-2, and 14496-3. The RTP payload format is described
      in RFC .

   Applications that use this media type:
      Multimedia streaming and conferencing tools, Internet messaging and
      Email applications.

   Additional information: none

   Magic number(s): none

   File extension(s):
      None. A file format with the extension .mp4 has been defined for
      MPEG-4 content, but it is not directly correlated with this MIME
      type, whose sole purpose is RTP transport.

   Macintosh File Type Code(s): none

   Person & email address to contact for further information:
      Authors of RFC .

   Intended usage: COMMON

   Author/Change controller:
      Authors of RFC .
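   Before it can parse any Payload Header, a receiver has to turn the
   parameters above into a small configuration record. The Python sketch
   below shows one possible way to do this. It assumes that the
   parameters arrive as the semicolon-separated parameter=value list
   described in section 4.2, for example the text of an a=fmtp line
   after the payload type. The function name parse_fmtp, the dictionary
   layout and the choice to recognize video by StreamType 4 alone (the
   SDP media type could serve equally well) are illustrative assumptions
   and are not defined by this specification. The sketch applies only
   the defaults and consistency rules stated above.

```python
# Hypothetical helper (not defined by this specification): parse the
# semicolon-separated parameter list and apply the defaults and the
# consistency rules of the MIME registration above.

# Numeric parameters (bit lengths, or octets for ConstantSize);
# each one defaults to zero when absent.
NUMERIC_PARAMS = (
    "DTSDeltaLength", "CTSDeltaLength", "OCRDeltaLength", "SizeLength",
    "ConstantSize", "IndexLength", "IndexDeltaLength",
    "RSLHSectionSizeLength",
)

VISUAL_STREAM_TYPE = 4  # streamType of visual streams (ISO/IEC 14496-1)


def parse_fmtp(params):
    """Turn 'name=value; name=value' into a configuration dict (sketch)."""
    raw = {}
    for item in params.split(";"):
        item = item.strip()
        if not item:
            continue
        name, sep, value = item.partition("=")
        if not sep:
            raise ValueError(f"malformed parameter: {item!r}")
        raw[name.strip()] = value.strip()

    cfg = {name: int(raw.get(name, "0")) for name in NUMERIC_PARAMS}

    # SizeLength and ConstantSize are mutually exclusive; a non-zero
    # value of either one selects the "Multiple" mode.
    if "SizeLength" in raw and "ConstantSize" in raw:
        raise ValueError("SizeLength and ConstantSize both present")
    multiple = cfg["SizeLength"] != 0 or cfg["ConstantSize"] != 0
    cfg["mode"] = "Multiple" if multiple else "Single"

    # A non-zero CTSDeltaLength MUST NOT be signaled in "Single" mode.
    if not multiple and cfg["CTSDeltaLength"] != 0:
        raise ValueError('non-zero CTSDeltaLength in "Single" mode')

    cfg["StreamType"] = int(raw["StreamType"]) if "StreamType" in raw else None

    # profile-level-id defaults to 1 for video and 0xFE (254) otherwise.
    # Assumption: "video" is recognized here by StreamType alone.
    default_pl = 1 if cfg["StreamType"] == VISUAL_STREAM_TYPE else 0xFE
    cfg["profile-level-id"] = int(raw.get("profile-level-id", default_pl))

    # config is a hexadecimal representation of an octet string.
    cfg["config"] = bytes.fromhex(raw.get("config", ""))
    return cfg


if __name__ == "__main__":
    audio = parse_fmtp("StreamType=5; SizeLength=12; profile-level-id=1; "
                       "config=7866E7E6EF")
    print(audio["mode"], audio["SizeLength"], audio["config"].hex())
```

   With the audio parameters of the SDP example in section 4.3.2, this
   prints "Multiple 12 7866e7e6ef". The sketch rejects inconsistent
   parameter sets instead of guessing, because a receiver cannot parse
   the Payload Headers correctly under a wrong mode.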
4.2 Concatenation of parameters

   Multiple parameters SHOULD be expressed as a MIME media type string,
   in the form of a semicolon-separated list of parameter=value pairs
   (see examples below).

4.3 Usage of SDP

4.3.1 The a=fmtp keyword

   It is assumed that one typical way to transport the above-described
   parameters associated with this payload format is via an SDP [10]
   message, for example transported to the client in reply to an RTSP
   [13] DESCRIBE message or via SAP [14]. In that case, the (a=fmtp)
   keyword MUST be used as described in RFC 2327 [10, section 6], with
   the following syntax:

      a=fmtp:<format> <parameter name>=<value>

4.3.2 SDP example

   The following is an example of SDP syntax for the description of a
   session containing one MPEG-4 video stream, one MPEG-4 audio stream
   and three MPEG-4 system streams: the first one BIFS, the second one
   OD and the third one IPMP. All are transported using this format and
   the AVP profile [12]. Note the usage of some MIME parameters: all
   streams signal their StreamType; the video stream uses DTS with
   DTSDelta encoded on 4 bits; the audio stream uses the "Multiple" mode
   with 12 bits to describe the size of each AU or AU fragment payload.
   See the Appendix for more examples.

      o= ....
      i= ....
      c=IN IP4 123.234.71.112

      m=video 1034 RTP/AVP 97
      a=fmtp:97 StreamType=4;DTSDeltaLength=4
      a=rtpmap:97 mpeg4-generic

      m=audio 1810 RTP/AVP 98
      a=fmtp:98 StreamType=5; SizeLength=12; profile-level-id=1;
       config=7866E7E6EF
      a=rtpmap:98 mpeg4-generic

      m=application 1234 RTP/AVP 99
      a=rtpmap:99 mpeg4-generic
      a=fmtp:99 StreamType=3

      m=application 1236 RTP/AVP 99
      a=rtpmap:99 mpeg4-generic
      a=fmtp:99 StreamType=1

      m=application 1238 RTP/AVP 99
      a=rtpmap:99 mpeg4-generic
      a=fmtp:99 StreamType=7

5. Other issues

5.1 SL packetized stream reconstruction

   The purpose of this section is to document how a receiver can
   reconstruct a valid SL packetized stream. Since this format directly
   transports SL packets, this reconstruction is performed by reversing
   the payload structure rules (section 3). The most complex
   transformations are described explicitly below.

   In the following, let (i) be the index of SL packets inside one RTP
   packet (starting at zero for each RTP packet), let SLPacketHeader.x
   denote field x of the reconstructed SL packet header, let
   PayloadHeader.x denote field x of the received Payload Header, etc.

   SLPacketHeader.packetSequenceNumber is restored from
   PayloadHeader.Index and PayloadHeader.IndexDelta using:

      if ( IndexLength == 0 ) {          // or IndexLength is absent
        if ( SLConfig.packetSeqNumLength == 0 ) {
          // this stream does not have SL packet sequence numbers
        }
        else {
          // illegal: the sender MUST map
          // SLPacketHeader.packetSequenceNumber into the Payload
          // Header and set a relevant IndexLength value;
          // otherwise the receiver cannot reconstruct
          // the correct sequence
        }
      }
      else {                             // IndexLength is not zero
        if ( SLConfig.packetSeqNumLength == 0 ) {
          // the original SL stream does not have SL packet
          // sequence numbers; typically the sender inserted them
          // in order to implement interleaving at the RTP level;
          // they must be ignored for SL stream reconstruction
        }
        else {
          if ( i == 0 ) {                // first SL packet in RTP packet
            SLPacketHeader.packetSequenceNumber(0) =
                PayloadHeader.Index(0);
          }
          else {                         // remaining SL packets
            SLPacketHeader.packetSequenceNumber(i) =
                SLPacketHeader.packetSequenceNumber(i-1)
                + PayloadHeader.IndexDelta(i)
                + 1;
          }
        }
      }
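   The sequence number rule above can be sketched in Python as follows.
   The sketch assumes that the Index value of the first Payload Header
   and the IndexDelta values of the remaining Payload Headers of one RTP
   packet have already been extracted; when IndexDeltaLength is zero,
   the caller passes zeros. Wrapping the result modulo
   2^packetSeqNumLength is an assumption about how the SL counter rolls
   over and is not stated in this section. The function name is
   hypothetical.

```python
def restore_sequence_numbers(index, index_deltas, seq_num_length):
    """Restore SL packetSequenceNumber values for one RTP packet (sketch).

    index          -- PayloadHeader.Index of the first SL packet
    index_deltas   -- PayloadHeader.IndexDelta of each remaining SL packet
                      (zeros when IndexDeltaLength is zero or absent)
    seq_num_length -- SLConfig.packetSeqNumLength, in bits
    """
    modulus = 1 << seq_num_length  # assumed modulo wrap of the SL counter
    numbers = [index % modulus]    # packetSequenceNumber(0) = Index(0)
    for delta in index_deltas:
        # packetSequenceNumber(i) =
        #     packetSequenceNumber(i-1) + IndexDelta(i) + 1
        numbers.append((numbers[-1] + delta + 1) % modulus)
    return numbers


if __name__ == "__main__":
    # An IndexDelta of 2 means two SL packets were interleaved elsewhere.
    print(restore_sequence_numbers(10, [0, 0, 2], 8))
```

   For the call shown, the restored numbers are 10, 11, 12 and 15. A
   non-zero IndexDelta therefore leaves a gap that the interleaving
   sender fills with SL packets carried in other RTP packets.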
   All time stamps (CTS, DTS, OCR), when present, are restored from the
   delta values. The time stamp flags (CTSFlag, DTSFlag) in the Payload
   Header are used to reconstruct the compositionTimeStampFlag and the
   decodingTimeStampFlag of SLPacketHeader, respectively. The function
   corrected(x) for the RTP time stamp transformation is the mapping
   from 32 bits to SLConfig.timeStampLength bits, which may be smaller
   or larger than 32:

      if ( timeStampLength < 32 ) {      // short SL time stamps
        corrected(x) = LSB(x);           // only the timeStampLength
                                         // LSBits of x
      }
      else if ( timeStampLength > 32 ) { // long SL time stamps
        // m starts at 0 and is kept across RTP packets
        if ( x(i) < x(i-1) ) {           // 32-bit RTP time stamp
          m += 2^32;                     // roll-over has occurred
        }
        corrected(x) = x + m;
      }
      else {                             // timeStampLength == 32,
        corrected(x) = x;                // the recommended value:
      }                                  // direct mapping

      if ( CTSDeltaLength == 0 ) {       // or CTSDeltaLength is absent
        // CTS is not transported for this RTP stream
        if ( i == 0 ) {                  // first SL packet in RTP packet
          if ( SLConfig.useTimeStamps == 1 ) {
            if ( SLPacketHeader.accessUnitStartFlag == 1 ) {
              SLPacketHeader.compositionTimeStampFlag(0) = 1;
              SLPacketHeader.compositionTimeStamp(0) =
                  corrected(RTP TimeStamp);
            }
            else {
              // ignore
            }
          }
          else {
            // empty
          }
        }
        else {                           // non-first SL packets
          if ( SLConfig.useTimeStamps == 1 ) {
            if ( SLPacketHeader.accessUnitStartFlag == 1 ) {
              SLPacketHeader.compositionTimeStampFlag(i) = 0;
            }
            else {
              // ignore
            }
          }
          else {
            // empty
          }
        }
      }
      else {                             // CTSDeltaLength is not zero
        // CTS is transported for this stream
        if ( SLConfig.useTimeStamps == 1 ) {
          if ( SLPacketHeader.accessUnitStartFlag == 1 ) {
            SLPacketHeader.compositionTimeStampFlag(i) =
                PayloadHeader.CTSFlag(i);
            SLPacketHeader.compositionTimeStamp(i) =
                corrected(RTP TimeStamp) +
                PayloadHeader.CTSDelta(i);
          }
          else {
            // ignore CTSFlag (which must be zero)
          }
        }
        else {
          // this is strange and sub-optimal at best;
          // a receiver should ignore this
        }
      }

      if ( DTSDeltaLength == 0 ) {       // or DTSDeltaLength is absent
        // DTS is not transported for this stream
        if ( SLConfig.useTimeStamps == 1 ) {
          if ( SLPacketHeader.accessUnitStartFlag == 1 ) {
            SLPacketHeader.decodingTimeStampFlag(i) = 0;
          }
          else {
            // ignore
          }
        }
        else {
          // empty
        }
      }
      else {
        // DTS is transported for this stream
        if ( SLConfig.useTimeStamps == 1 ) {
          if ( SLPacketHeader.accessUnitStartFlag == 1 ) {
            SLPacketHeader.decodingTimeStampFlag(i) =
                PayloadHeader.DTSFlag(i);
            SLPacketHeader.decodingTimeStamp(i) =
                SLPacketHeader.compositionTimeStamp(i)
                - PayloadHeader.DTSDelta(i);   // DTS <= CTS always
          }
          else {
            // ignore DTSFlag (which must be zero)
          }
        }
        else {
          // this is strange and sub-optimal at best;
          // a receiver should ignore this
        }
      }

      if ( OCRDeltaLength == 0 ) {       // or OCRDeltaLength is absent
        // the RTP stream does not transport any OCR
        if ( SLConfig.OCRLength == 0 ) {
          // this stream does not have any OCR
        }
        else {
          // illegal: the sender MUST detect OCRs,
          // replace them with OCRDelta and set
          // a relevant OCRDeltaLength value
        }
      }
      else {
        if ( SLConfig.OCRLength == 0 ) {
          // this is strange and sub-optimal at best;
          // a receiver should ignore this
        }
        else {
          SLPacketHeader.OCRflag(i) = RSLH.OCRFlag(i);
          if ( SLPacketHeader.OCRflag(i) == 1 ) {
            SLPacketHeader.objectClockReference(i) =
                corrected(RTP TimeStamp) + RSLH.OCRDelta(i);
          }
        }
      }

   In the "Single" mode, the accessUnitEndFlag, if needed, is restored
   from the M bit as follows:

      if ( SLConfig.useAccessUnitEndFlag == 0 ) {
        // this SL stream does not signal access unit ends
      }
      else {
        SLPacketHeader.accessUnitEndFlag = M bit;
      }

   In the "Multiple" mode, the accessUnitEndFlag is left untouched in
   the RSLH.

   The other SL packet header fields SHALL remain as found in the RSLH.

   In the general case, the reconstruction of the original SL packetized
   stream requires SL awareness. However, in all cases this payload
   format allows a receiver that does not know the SL syntax to
   reconstruct the semantics of Elementary Streams for the following
   very useful features:
   - Packet order (decoding order)
   - Access Unit boundaries (using the M bit)
   - Access Unit fragments (fragment boundaries using PayloadSize)
   - Composition Time Stamps, according to:
        compositionTimeStamp(i) = RTP TimeStamp + CTSDelta(i);
   - Decoding Time Stamps, according to:
        decodingTimeStamp(i) = compositionTimeStamp(i) - DTSDelta(i);
   - Packet serial number, according to:
        if ( i == 0 ) {        // first SL packet in RTP packet
          packet serial number(0) = Index(0);
        }
        else {                 // remaining SL packets
          packet serial number(i) = packet serial number(i-1)
                                    + IndexDelta(i) + 1;
        }

5.2 Handling of scene description streams

   MPEG-4 introduces new stream types, as described in section 1,
   namely Object Descriptors and BIFS. In the following, both OD and
   BIFS are discussed on the same basis, i.e., as "scene description".
   Considering scene description as a "streamable" type of content is a
   rather new concept, and for that reason some specific comments are
   needed.

   Typically, scene descriptions are encoded in such a way that
   information loss would, in the general case, cripple the
   presentation beyond any hope of repair by the receiver. Still, this
   is well suited for a number of multimedia applications where the
   scene is first made available to the client via reliable channels and
   then played. This payload format is not intended for this type of
   application, for which download of MPEG-4 interchange (.mp4) files is
   typical. However, this payload format can also be used in that case;
   it is then RECOMMENDED that the RTP packets be transported using TCP
   (for example, interleaved inside RTSP as described in [13, section
   10.12]) or any other reliable protocol.

   On the other hand, MPEG-4 has introduced the possibility of
   dynamically changing the scene description by sending animation
   information (changes in parameters) and structural change information
   (updates). Since this information has to be sent in a timely fashion,
   MPEG-4 has defined a number of techniques to encode the scene
   description in a manner that makes it behave similarly to other
   temporal encoding schemes such as audio and video. This payload
   format is intended for this usage.

   Note that in many cases the application will first transmit a static
   initial scene reliably and then stream animations and updates. For
   this reason, this payload format is attractive, since it offers a
   single solution for both phases.

   Senders must be aware that suitable schemes should be used when scene
   description streams transport sensitive configuration information.
   For example, if the RTP packet transporting an OD-update command is
   lost, the corresponding media stream would not be accessible by the
   receiver.

   Redundancy may be added by tools hierarchically above this payload
   format, e.g., packet-based FEC, retransmission, or similar tools. In
   such a case, the general congestion control principles have to be
   observed.

   Since BIFS and OD streams may be modified during the session with
   update commands, there is a need to send both update commands and
   full BIFS/OD refreshes. For that reason, MPEG-4 defines Random Access
   Points (RAP) for scene description streams (OD and BIFS) at which, by
   definition, a decoder can restart decoding, i.e., it receives a "full
   update" of the scene. This mechanism is called the Scene and Object
   Description Carousel. The AU Sequence Number field of the SL packet
   header is used to support this behavior at the Sync Layer. When two
   access units are sent consecutively with the same AU Sequence Number,
   the second one is assumed to be a semantic repetition of the first.
   If a receiver starts to listen in the middle of a session or has
   detected losses, it can skip all received Access Units until such a
   RAP. The periodicity of transmission of these RAPs should be chosen
   and adjusted depending on the application and the network it is
   deployed on; i.e., exactly as for Intra-coded frames in video, it is
   the responsibility of the sender to make sure the periodicity of RAPs
   is suitable.

5.3 Multiplexing

   An advanced MPEG-4 session may involve a large number of objects,
   possibly as many as a few hundred; transporting each ES as an
   individual RTP stream may not always be practical.
   Allocating and controlling hundreds of destination addresses for each
   MPEG-4 session may pose insurmountable session administration
   problems. The input/output processing overhead at the end-points
   would also be extremely high. Additionally, low-delay transmission of
   low-bitrate data streams, e.g., facial animation parameters, results
   in extremely high header overhead.

   To solve these problems, MPEG-4 data transport requires a
   multiplexing scheme that allows selective bundling of several ESs.
   This is beyond the scope of the payload format defined here.

   MPEG-4's FlexMux multiplexing scheme may be used for this purpose,
   and a specific RTP payload format is being developed for it [11].

   Another approach may be to develop a generic RTP multiplexing scheme
   usable for MPEG-4 data. The multiplexing scheme reported in [8] may be
   a candidate for this approach.

   For MPEG-4 applications, the multiplexing technique needs to address
   the following requirements:

   i. The ESs multiplexed in one stream can change frequently during a
   session. Consequently, the coding type, individual packet size and
   temporal relationships between the multiplexed data units must be
   handled dynamically.

   ii. The multiplexing scheme should have a mechanism to determine the
   ES identifier (ES_ID) for each of the multiplexed packets, since
   ES_ID is not part of the SL header.

   iii. In general, an SL packet does not contain information about its
   size. The multiplexing scheme should be able to delineate the
   multiplexed packets, whose lengths may vary from a few octets to
   close to the path MTU.

5.5 Overlap with RFC 3016

   This payload format has been designed to have a (large) overlap with
   RFC 3016 [7]. The conditions for this overlap are:

   Conditions for RFC 3016:
   i. MPEG-4 video elementary streams only.
   ii. There MUST be a single VOP or Video Packet per RTP packet (this
   is only recommended in RFC 3016).
   iii. The decoder configuration MUST be signaled out-of-band, either
   using the config MIME parameter or using the OD framework.

   Conditions for this payload format:
   i. No structural parameters defined (or all set to zero), i.e., the
   "Single" mode with an empty Payload Header and an empty RSLH.
   ii. Receivers MUST be ready to accept (and ignore) video
   configuration headers (e.g., VOSH, VO and VOL) and the
   visual-object-sequence-end-code transported in-band.

6. Security Considerations

   RTP packets using the payload format defined in this specification
   are subject to the security considerations discussed in the RTP
   specification [5]. This implies that confidentiality of the media
   streams is achieved by encryption. Because the data compression used
   with this payload format is applied end-to-end, encryption may be
   performed on the compressed data, so there is no conflict between the
   two operations. The packet processing complexity of this payload
   type (i.e., excluding media data processing) does not exhibit any
   significant non-uniformity on the receiver side that could cause a
   denial-of-service threat.

   However, it is possible to inject non-compliant MPEG streams (Audio,
   Video, and Systems) that overload the receiver/decoder's buffers,
   which might compromise the functionality of the receiver or even
   crash it. This is especially true for end-to-end systems like MPEG,
   where the buffer models are precisely defined.

   MPEG-4 Systems supports stream types including commands that are
   executed on the terminal, like OD commands, BIFS commands, etc., and
   programmatic content like MPEG-J (Java(TM) Byte Code) and ECMAScript.
   It is possible to use one or more of the above in a manner
   non-compliant with MPEG to crash the receiver or make it temporarily
   unavailable.

   Authentication mechanisms can be used to validate the sender and the
   data in order to prevent security problems due to non-compliant,
   malicious MPEG-4 streams.

   A security model is defined for MPEG-4 Systems streams carrying
   MPEG-J access units, which comprise Java(TM) classes and objects.
   MPEG-J defines a set of Java APIs and a secure execution model. MPEG-J
   content can call this set of APIs and Java(TM) methods from a set of
   Java packages supported in the receiver within the defined security
   model. According to this security model, downloaded byte code is
   forbidden to load libraries, define native methods, start programs,
   read or write files, or read system properties.

   Receivers can implement intelligent filters to validate the buffer
   requirements or parametric (OD, BIFS, etc.) or programmatic (MPEG-J,
   ECMAScript) commands in the streams. However, this can increase the
   complexity significantly.

7. Acknowledgements

   This document evolved over several years thanks to contributions
   from a large number of people, since it is based on work within the
   IETF AVT working group and various ISO MPEG working groups, especially
   the 4-on-IP ad-hoc group. The authors wish to thank Olivier Avaro,
   Stephen Casner, Guido Fransceschini, Art Howarth, Dave Mackie, Dave
   Singer, and Stephan Wenger for their valuable comments and support.
   Attentive readers and early implementers also found flaws and bugs;
   thank you all.

8. References

   [1]  ISO/IEC 14496-1:2001, MPEG-4 Systems.

   [2]  ISO/IEC 14496-2:2001, MPEG-4 Visual.

   [3]  ISO/IEC 14496-3:2001, MPEG-4 Audio.

   [4]  ISO/IEC 14496-6:2001, Delivery Multimedia Integration Framework.

   [5]  H. Schulzrinne, S. Casner, R. Frederick, V. Jacobson, RTP: A
        Transport Protocol for Real Time Applications, RFC 1889,
        Internet Engineering Task Force, January 1996.

   [6]  S. Bradner, Key words for use in RFCs to Indicate Requirement
        Levels, RFC 2119, Internet Engineering Task Force, March 1997.

   [7]  Y. Kikuchi, T. Nomura, S. Fukunaga, Y. Matsui, H. Kimata, RTP
        payload format for MPEG-4 Audio/Visual streams, RFC 3016,
        Internet Engineering Task Force, November 2000.

   [8]  B. Thompson, T. Koren, D. Wing, Tunneling multiplexed Compressed
        RTP ("TCRTP"), work in progress, draft-ietf-avt-tcrtp-04.txt,
        July 2001.

   [9]  D. Singer, Y. Lim, A Framework for the delivery of MPEG-4 over
        IP-based Protocols, work in progress,
        draft-singer-mpeg4-ip-02.txt, May 2001.

   [10] M. Handley, V. Jacobson, SDP: Session Description Protocol, RFC
        2327, Internet Engineering Task Force, April 1998.

   [11] C. Roux et al., RTP Payload Format for MPEG-4 FlexMultiplexed
        Streams, work in progress,
        draft-curet-avt-rtp-mpeg4-flexmux-00.txt, February 2001.

   [12] H. Schulzrinne, RTP Profile for Audio and Video Conferences with
        Minimal Control, RFC 1890, Internet Engineering Task Force,
        January 1996.

   [13] H. Schulzrinne, A. Rao, R. Lanphier, Real Time Streaming
        Protocol, RFC 2326, Internet Engineering Task Force, April 1998.

   [14] M. Handley, C. Perkins, E. Whelan, Session Announcement
        Protocol, RFC 2974, Internet Engineering Task Force, October
        2000.

9. Authors' Addresses

   Andrea Basso
   AT&T Labs Research
   200 Laurel Avenue
   Middletown, NJ 07748
   USA
   e-mail: basso@research.att.com

   M. Reha Civanlar
   AT&T Labs - Research
   200 Laurel Ave. South, A5 4D04
   Middletown, NJ 07748
   USA
   e-mail: civanlar@research.att.com

   Philippe Gentric
   Philips Digital Networks, MP4Net
   51 rue Carnot
   92156 Suresnes
   France
   e-mail: philippe.gentric@philips.com

   Carsten Herpel
   THOMSON multimedia
   Karl-Wiechert-Allee 74
   30625 Hannover
   Germany
   e-mail: herpelc@thmulti.com

   Zvi Lifshitz
   Optibase Ltd.
   7 Shenkar St.
   Herzliya 46120
   Israel
   e-mail: zvil@optibase.com

   Young-kwon Lim
   mp4cast (MPEG-4 Internet Broadcasting Solution Consortium)
   1001-1 Daechi-Dong Gangnam-Gu
   Seoul, 305-333,
   Korea
   e-mail: young@techway.co.kr

   Colin Perkins
   USC Information Sciences Institute
   3811 N. Fairfax Drive suite 200
   Arlington, VA 22203
   USA
   e-mail: csp@isi.edu

   Jan van der Meer
   Philips Digital Networks
   Building WDB-1
   Prof Holstlaan 4
   5656 AA Eindhoven
   Netherlands
   e-mail: jan.vandermeer@philips.com

APPENDIX: Examples of usage

   This section describes a number of examples of how this payload
   format can be used either with or without the Sync Layer. In all
   examples, however, the Sync Layer syntax is given, which shows how it
   becomes invisible in cases 1, 3, 4 and 5.

   A C++-like syntax called SDL (Syntactic Description Language),
   defined in [1, section 14], is used to economically describe MPEG-4
   system data structures.

   These examples assume that the (a=fmtp) SDP syntax is used to convey
   the MIME parameters of the payload format.
Appendix.1 RFC 3016 compatible MPEG-4 Video (no SL)

   This is an example of a video stream where the SL is configured to
   produce RTP packets compatible with RFC 3016.

   SLConfigDescriptor

   In this example the SLConfigDescriptor is:

      class SLConfigDescriptor extends BaseDescriptor : bit(8)
                                       tag=SLConfigDescrTag {
        bit(8) predefined;
        if (predefined==0) {
          bit(1) useAccessUnitStartFlag;                  = 0
          bit(1) useAccessUnitEndFlag;                    = 1
          bit(1) useRandomAccessPointFlag;                = 0
          bit(1) hasRandomAccessUnitsOnlyFlag;            = 0
          bit(1) usePaddingFlag;                          = 0
          bit(1) useTimeStampsFlag;                       = 0
          bit(1) useIdleFlag;                             = 0
          bit(1) durationFlag;                            = 0
          bit(32) timeStampResolution;                    = 0
          bit(32) OCRResolution;                          = 0
          bit(8) timeStampLength;                         = 0
          bit(8) OCRLength;                               = 0
          bit(8) AU_Length;                               = 0
          bit(8) instantBitrateLength;                    = 0
          bit(4) degradationPriorityLength;               = 0
          bit(5) AU_seqNumLength;                         = 0
          bit(5) packetSeqNumLength;                      = 0
          bit(2) reserved=0b11;
        }
        if (durationFlag) {
          bit(32) timeScale;                              // NOT USED
          bit(16) accessUnitDuration;                     // NOT USED
          bit(16) compositionUnitDuration;                // NOT USED
        }
        if (!useTimeStampsFlag) {
          bit(timeStampLength) startDecodingTimeStamp;    = 0
          bit(timeStampLength) startCompositionTimeStamp; = 0
        }
      }

   SL Packet Header structure

   With this configuration we have the following SL packet header
   structure:

      aligned(8) class SL_PacketHeader (SLConfigDescriptor SL) {
        if (SL.useAccessUnitEndFlag) {
          bit(1) accessUnitEndFlag;                       // 1 bit
        }
      }

   In this case, this payload format produces RTP packets that are
   exactly conformant to RFC 3016, and the SL is reduced to a purely
   logical construction that neither sender nor receiver needs to
   implement.
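   Because the SL is purely logical in this example, a receiver only
   needs the RTP fixed header to recover accessUnitEndFlag from the M
   bit; everything after the RTP header is the Access Unit or AU
   fragment. The Python sketch below illustrates this under the
   assumptions of this example ("Single" mode, empty Payload Header,
   empty RSLH). It reads the RTP fixed header as laid out in RFC 1889
   [5]. The function name and the returned tuple are illustrative only,
   and any header extension and padding are skipped rather than
   interpreted.

```python
import struct


def split_single_mode_packet(packet):
    """Return (accessUnitEndFlag, rtp_timestamp, payload) for one RTP
    packet sent in the "Single" mode with an empty Payload Header and an
    empty RSLH (sketch; only basic length checks are performed)."""
    if len(packet) < 12:
        raise ValueError("shorter than the 12-octet RTP fixed header")
    first, second, _seq, timestamp, _ssrc = struct.unpack("!BBHII", packet[:12])
    if first >> 6 != 2:
        raise ValueError("not RTP version 2")
    has_padding = bool(first & 0x20)
    has_extension = bool(first & 0x10)
    csrc_count = first & 0x0F
    marker = second >> 7                  # M bit -> accessUnitEndFlag

    offset = 12 + 4 * csrc_count          # skip the CSRC list
    if has_extension:                     # skip a header extension
        if len(packet) < offset + 4:
            raise ValueError("truncated header extension")
        _profile, ext_words = struct.unpack("!HH", packet[offset:offset + 4])
        offset += 4 + 4 * ext_words
    end = len(packet)
    if has_padding:                       # last octet = padding count
        end -= packet[-1]
    if offset > end:
        raise ValueError("truncated RTP packet")
    return marker, timestamp, packet[offset:end]


if __name__ == "__main__":
    # M = 1, payload type 97, sequence number 1, time stamp 3000.
    pkt = bytes([0x80, 0xE1]) + struct.pack("!HII", 1, 3000, 0x1234) + b"VOP"
    print(split_single_mode_packet(pkt))
```

   The example packet yields (1, 3000, b'VOP'): the M bit set by the
   sender becomes accessUnitEndFlag = 1, and the remaining octets are
   handed to the video decoder unchanged.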
   Parameters

   This configuration is the default one; no parameters are required.

   RTP packet structure

   Note that accessUnitEndFlag is mapped to the RTP header M bit.

   +=========================================+=============+
   | Field                                   | size        |
   +=========================================+=============+
   | RTP header                              | -           |
   +-----------------------------------------+-------------+
   | Access Unit or AU fragment              | 1400 octets |
   +-----------------------------------------+-------------+

   Overhead

   In this example we have an overhead of 40 octets (the IPv4, UDP and
   RTP headers, of 20, 8 and 12 octets respectively) for 1400 octets of
   payload, i.e., about 3% overhead.

Appendix.2 MPEG-4 Video with SL

   Let us consider the case of a 30 frames per second MPEG-4 video
   stream whose bit rate is high enough that Access Units have to be
   split into several SL packets (typically above 300 kb/s).

   Let us also assume that, in that case, the video codec generates
   Video Packets suitable to fit in one SL packet, i.e., that the video
   codec is MTU aware and the MTU is 1500 octets. We assume furthermore
   that this stream contains B frames and that decodingTimeStamps are
   present.
   SLConfigDescriptor

   In this example the SLConfigDescriptor is:

      class SLConfigDescriptor extends BaseDescriptor : bit(8)
                                       tag=SLConfigDescrTag {
        bit(8) predefined;
        if (predefined==0) {
          bit(1) useAccessUnitStartFlag;                  = 1
          bit(1) useAccessUnitEndFlag;                    = 0
          bit(1) useRandomAccessPointFlag;                = 1
          bit(1) hasRandomAccessUnitsOnlyFlag;            = 0
          bit(1) usePaddingFlag;                          = 0
          bit(1) useTimeStampsFlag;                       = 1
          bit(1) useIdleFlag;                             = 0
          bit(1) durationFlag;                            = 0
          bit(32) timeStampResolution;                    = 30
          bit(32) OCRResolution;                          = 0
          bit(8) timeStampLength;                         = 32
          bit(8) OCRLength;                               = 0
          bit(8) AU_Length;                               = 0
          bit(8) instantBitrateLength;                    = 0
          bit(4) degradationPriorityLength;               = 0
          bit(5) AU_seqNumLength;                         = 0
          bit(5) packetSeqNumLength;                      = 0
          bit(2) reserved=0b11;
        }
        if (durationFlag) {
          bit(32) timeScale;                              // NOT USED
          bit(16) accessUnitDuration;                     // NOT USED
          bit(16) compositionUnitDuration;                // NOT USED
        }
        if (!useTimeStampsFlag) {
          bit(timeStampLength) startDecodingTimeStamp;    // NOT USED
          bit(timeStampLength) startCompositionTimeStamp; // NOT USED
        }
      }

   The useRandomAccessPointFlag is set so that the randomAccessPointFlag
   can indicate that the corresponding SL packet contains a GOV and the
   first Video Packet of an Intra-coded frame.
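   To make the bit layout of this example concrete, the following Python
   sketch packs the octets that precede the SL packet payload (the
   Payload Header and the RSLH Section) for the two packet types shown
   in the RTP packet structure tables of this example, using the
   parameter values DTSDeltaLength=7 and RSLHSectionSizeLength=3 given
   there. It is an illustration under those assumptions only. The
   function and class names are hypothetical, the field order is taken
   from the tables, and the padding rule (zero bits up to the next octet
   boundary after the Payload Header and after the RSLH Section) is
   inferred from their "bits to octet alignment" rows.

```python
# Parameter values signaled in this example.
DTS_DELTA_LENGTH = 7          # DTSDeltaLength=7
RSLH_SECTION_SIZE_LENGTH = 3  # RSLHSectionSizeLength=3


class BitWriter:
    """Minimal MSB-first bit writer with zero padding to octet boundaries."""

    def __init__(self):
        self.bits = []

    def write(self, value, width):
        if not 0 <= value < (1 << width):
            raise ValueError(f"{value} does not fit in {width} bits")
        for shift in range(width - 1, -1, -1):
            self.bits.append((value >> shift) & 1)

    def align(self):
        while len(self.bits) % 8:
            self.bits.append(0)

    def to_bytes(self):
        self.align()
        out = bytearray()
        for start in range(0, len(self.bits), 8):
            octet = 0
            for bit in self.bits[start:start + 8]:
                octet = (octet << 1) | bit
            out.append(octet)
        return bytes(out)


def pack_headers(first_fragment, dts_delta=0, rap=0, dts_flag=0, cts_flag=0):
    """Return the Payload Header and RSLH Section octets for one RTP
    packet of this example (hypothetical helper)."""
    w = BitWriter()
    if first_fragment:
        # Payload Header: DTSFlag = 1 followed by a 7-bit DTSDelta.
        w.write(1, 1)
        w.write(dts_delta, DTS_DELTA_LENGTH)
    else:
        # Payload Header: DTSFlag = 0, so no DTSDelta follows.
        w.write(0, 1)
    w.align()
    if first_fragment:
        # RSLH Section: RSLHSectionSize = 4, then the four SL flags.
        w.write(4, RSLH_SECTION_SIZE_LENGTH)
        w.write(1, 1)         # accessUnitStartFlag
        w.write(rap, 1)       # randomAccessPointFlag
        w.write(dts_flag, 1)  # decodingTimeStampFlag
        w.write(cts_flag, 1)  # compositionTimeStampFlag
    else:
        # RSLH Section: RSLHSectionSize = 1, then accessUnitStartFlag = 0.
        w.write(1, RSLH_SECTION_SIZE_LENGTH)
        w.write(0, 1)
    return w.to_bytes()       # to_bytes() zero-pads to an octet boundary


if __name__ == "__main__":
    print(pack_headers(True, dts_delta=3, rap=1, dts_flag=1, cts_flag=1).hex())
    print(pack_headers(False).hex())
```

   A first fragment with DTSDelta = 3 and all three flags set yields the
   two octets 0x83 0x9E, and a non-first fragment yields 0x00 0x20. In
   both cases the header costs two octets per RTP packet. The writer
   rejects a DTSDelta that does not fit in the signaled 7 bits rather
   than silently truncating it.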
   SL Packet Header structure

   With this configuration we have the following SL packet header
   structure:

      aligned(8) class SL_PacketHeader (SLConfigDescriptor SL) {
        bit(1) accessUnitStartFlag;                  // 1 bit
        if (accessUnitStartFlag) {
          bit(1) randomAccessPointFlag;              // 1 bit
          bit(1) decodingTimeStampFlag;              // 1 bit
          bit(1) compositionTimeStampFlag;           // 1 bit
          if (decodingTimeStampFlag) {
            bit(SL.timeStampLength) decodingTimeStamp;
          }
          if (compositionTimeStampFlag) {
            bit(SL.timeStampLength) compositionTimeStamp;
          }
        }
      }

   Parameters

   decodingTimeStamps are encoded on 32 bits, which is much more than
   is needed for a delta. The sender therefore uses DTSDeltaLength to
   signal that only 7 bits are used to code the relative DTS in the
   RTP packet.

   The RSLHSection never exceeds 4 bits, so RSLHSectionSize can be
   encoded on 3 bits, which is signaled by RSLHSectionSizeLength. The
   resulting concatenated fmtp line is:

      a=fmtp: DTSDeltaLength=7;RSLHSectionSizeLength=3

   RTP packet structure

   Two cases can occur. For packets that transport first fragments of
   Access Units we have:

      +=========================================+=============+
      | Field                                   | size        |
      +=========================================+=============+
      | RTP header                              | -           |
      +-----------------------------------------+-------------+
      | DTSFlag = (1)                           | 1 bit       |
      +-----------------------------------------+-------------+
      | DTSDelta                                | 7 bits      |
      +-----------------------------------------+-------------+
      | bits to octet alignment                 | 0 bits      |
      +-----------------------------------------+-------------+
      | RSLHSectionSize = (100)                 | 3 bits      |
      +-----------------------------------------+-------------+
      | accessUnitStartFlag = (1)               | 1 bit       |
      +-----------------------------------------+-------------+
      | randomAccessPointFlag                   | 1 bit       |
      +-----------------------------------------+-------------+
      | decodingTimeStampFlag                   | 1 bit       |
      +-----------------------------------------+-------------+
      | compositionTimeStampFlag                | 1 bit       |
      +-----------------------------------------+-------------+
      | bits to octet alignment = (0)           | 1 bit       |
      +-----------------------------------------+-------------+
      | SL packet payload                       | N octets    |
      +-----------------------------------------+-------------+

   For packets that transport non-first fragments of Access Units we
   have:

      +=========================================+=============+
      | Field                                   | size        |
      +=========================================+=============+
      | RTP header                              | -           |
      +-----------------------------------------+-------------+
      | DTSFlag = (0)                           | 1 bit       |
      +-----------------------------------------+-------------+
      | bits to octet alignment = (0000000)     | 7 bits      |
      +-----------------------------------------+-------------+
      | RSLHSectionSize = (001)                 | 3 bits      |
      +-----------------------------------------+-------------+
      | accessUnitStartFlag = (0)               | 1 bit       |
      +-----------------------------------------+-------------+
      | bits to octet alignment = (0000)        | 4 bits      |
      +-----------------------------------------+-------------+
      | SL packet payload                       | N octets    |
      +-----------------------------------------+-------------+

   Overhead estimation

   In this example we have an RTP overhead
   of 40 + 2 octets for 1400 octets of payload, i.e. 3 % overhead.

Appendix.3 Low delay MPEG-4 Audio (no SL)

   This example is for a low delay audio service. For this reason a
   single Access Unit is transported in each RTP packet (in terms of
   the Sync Layer, each SL packet contains a complete Access Unit).

   SLConfigDescriptor

   Since CTS=DTS and the Access Unit duration is constant, signaling of
   MPEG-4 time stamps is not needed (the durationFlag of the SLConfig
   is set).

   We also assume here an audio Object Type for which all Access Units
   are Random Access Points, which is signaled using the
   hasRandomAccessUnitsOnlyFlag in the SLConfigDescriptor.

   We assume furthermore a mode where the Access Unit size is constant
   and equal to 5 octets (which is signaled with AU_Length).

   In this example the SLConfigDescriptor is:

      class SLConfigDescriptor extends BaseDescriptor : bit(8)
            tag=SLConfigDescrTag {
        bit(8) predefined;
        if (predefined==0) {
          bit(1) useAccessUnitStartFlag;        = 0
          bit(1) useAccessUnitEndFlag;          = 0
          bit(1) useRandomAccessPointFlag;      = 0
          bit(1) hasRandomAccessUnitsOnlyFlag;  = 1
          bit(1) usePaddingFlag;                = 0
          bit(1) useTimeStampsFlag;             = 0
          bit(1) useIdleFlag;                   = 0
          bit(1) durationFlag;                  = 1  // constant AU duration
          bit(32) timeStampResolution;          = 0
          bit(32) OCRResolution;                = 0
          bit(8) timeStampLength;               = 0
          bit(8) OCRLength;                     = 0
          bit(8) AU_Length;                     = 5
          bit(8) instantBitrateLength;          = 0
          bit(4) degradationPriorityLength;     = 0
          bit(5) AU_seqNumLength;               = 0
          bit(5) packetSeqNumLength;            = 0
          bit(2) reserved=0b11;
        }
        if (durationFlag) {
          bit(32) timeScale;                    = 1000  // milliseconds
          bit(16) accessUnitDuration;           = 10    // ms
          bit(16) compositionUnitDuration;      = 10    // ms
        }
        if (!useTimeStampsFlag) {
          bit(timeStampLength) startDecodingTimeStamp;     = 0
          bit(timeStampLength) startCompositionTimeStamp;  = 0
        }
      }

   SL packet header

   With this configuration the SL packet header is empty. The Sync
   Layer is reduced to a purely logical construction that neither
   sender nor receiver needs to implement.

   Parameters

   No parameters are required.

   RTP packet structure

   Note that the RTP header M bit should always be set to 1.

      +=========================================+=============+
      | Field                                   | size        |
      +=========================================+=============+
      | RTP header                              | -           |
      +-----------------------------------------+-------------+
      | Access Unit                             | 5 octets    |
      +-----------------------------------------+-------------+

   Overhead estimation

   The overhead is extremely large, i.e. 800 %, since 40 octets of
   headers are required to transport 5 octets of data. Note, however,
   that RTP header compression would work well here, since the
   timestamp increments are constant.

Appendix.4 Media delivery MPEG-4 Audio (no SL)

   This example is for a media delivery service where delay is not an
   issue but efficiency is. In this case several Access Units are
   transported in each RTP packet.

   SLConfigDescriptor

   Similar to the previous example.

   SL packet header

   With this configuration the SL packet header is empty. The Sync
   Layer is reduced to a purely logical construction that neither
   sender nor receiver needs to implement.

   Parameters

   The absence of RSLHSectionSizeLength indicates that the RSLHSection
   is empty.

   The size of SL packets (which are all complete Access Units in this
   case) is constant and is indicated with:

      a=fmtp: ConstantSize=5

   This also indicates to the receiver that the Multiple mode is used.
   The 2-octet field that would give the size of the
   PayloadHeaderSection is omitted, since in this case it would always
   contain zero (the PayloadHeaderSection is always empty because no
   other MIME parameter is present).

   RTP packet structure

   Note that the RTP header M bit is always set to 1, which indicates
   to the receiver that only complete Access Units are transported.

      +=========================================+=============+
      | Field                                   | size        |
      +=========================================+=============+
      | RTP header                              | -           |
      +-----------------------------------------+-------------+
      | Access Unit data                        | 5 octets    |
      +-----------------------------------------+-------------+
      | Access Unit data                        | 5 octets    |
      +-----------------------------------------+-------------+
      | etc, until MTU is reached                             |
      +-----------------------------------------+-------------+
      | Access Unit data                        | 5 octets    |
      +-----------------------------------------+-------------+

   Overhead estimation

   The overhead is about 3 %, i.e. minimal.

Appendix.5 AAC with interleaving (no SL)

   Let us consider AAC at 128 kb/s where each Access Unit is on
   average 320 octets. Interleaving is applied with a continuous
   interleaving scheme (see the table below) where 4 Access Units are
   used to construct each RTP packet in order to match an MTU of 1500
   octets.

   IndexDelta is constant and equal to 2 (since +1 is automatically
   added, this denotes an index difference of 3); it is encoded on 2
   bits.
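   The continuous interleaving pattern of this example can be
   reproduced with a short sketch. It assumes the following:
   - Access Units are numbered from 1.
   - Each RTP packet holds up to 4 Access Units spaced 3 apart, so the
     transmitted IndexDelta field is 3 - 1 = 2.
   - The RTP timestamp is the CTS of the first Access Unit in the
     packet.

   The sketch also shows the TSBI computation a receiver could use.
   Each following Access Unit's timestamp is the previous one plus
   (IndexDelta + 1) Access Unit durations. The function names are
   hypothetical, and the 1024-tick duration used in the usage example
   is only an illustrative value.

```python
from __future__ import annotations

AUS_PER_PACKET = 4  # Access Units per RTP packet in this example
STRIDE = 3          # index difference between consecutive AUs in a packet


def packet_contents(p: int) -> list[int]:
    """AU numbers (1-based) carried in RTP packet p (1-based) of the continuous scheme."""
    newest = AUS_PER_PACKET * (p - 1) + 1
    aus = [newest - STRIDE * b for b in range(AUS_PER_PACKET)]
    return sorted(n for n in aus if n >= 1)


def index_delta_fields(aus: list[int]) -> list[int]:
    """IndexDelta values as transmitted: actual difference minus 1 (the +1 is implicit)."""
    return [b - a - 1 for a, b in zip(aus, aus[1:])]


def au_timestamps(rtp_ts: int, deltas: list[int], au_duration: int) -> list[int]:
    """TSBI: rebuild each AU's timestamp from the RTP timestamp and IndexDelta fields."""
    ts = [rtp_ts]
    for d in deltas:
        ts.append(ts[-1] + (d + 1) * au_duration)
    return ts


if __name__ == "__main__":
    for p in range(1, 11):
        aus = packet_contents(p)
        print(p, aus, index_delta_fields(aus))
    # Packet 4 starts with AU4; with CTS(AUn) = (n - 1) * 1024 it carries AU4, 7, 10, 13.
    print(au_timestamps(3 * 1024, [2, 2, 2], 1024))
```

   Packets 1 to 10 come out as [1], [2, 5], [3, 6, 9], [4, 7, 10, 13],
   [8, 11, 14, 17] and so on, with IndexDelta fields of 2.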
   As explained in section 3.8, this is a time stamp based interleaving
   (TSBI) scheme (IndexLength=0). Receivers know that each payload is a
   complete Access Unit because all RTP packets have the M bit set to
   1. Since the Access Unit duration is constant, Access Unit
   timestamps can therefore be computed from RTP timestamps and
   IndexDelta values, and this can be used for de-interleaving even in
   case of losses.

   Note that it would also be possible to use IndexLength=2 so as to
   maintain octet alignment in the Payload Header portions; in this
   case, however, the value of these two bits MUST be zero, as stated
   in section 3.8.1.

      +-----------------------------------------------------------------+
      | RTP packet | RTP Timestamp | AUs            | IndexDelta        |
      +-----------------------------------------------------------------+
      | 1          | CTS(AU1)      | 1              | -                 |
      +-----------------------------------------------------------------+
      | 2          | CTS(AU2)      | 2, 5           | -,2               |
      +-----------------------------------------------------------------+
      | 3          | CTS(AU3)      | 3, 6, 9        | -,2,2             |
      +-----------------------------------------------------------------+
      | 4          | CTS(AU4)      | 4, 7,10,13     | -,2,2,2           |
      +-----------------------------------------------------------------+
      | 5          | CTS(AU8)      | 8,11,14,17     | -,2,2,2           |
      +-----------------------------------------------------------------+
      | 6          | CTS(AU12)     | 12,15,18,21    | -,2,2,2           |
      +-----------------------------------------------------------------+
      | 7          | CTS(AU16)     | 16,19,22,25    | -,2,2,2           |
      +-----------------------------------------------------------------+
      | 8          | CTS(AU20)     | 20,23,26,29    | -,2,2,2           |
      +-----------------------------------------------------------------+
      | 9          | CTS(AU24)     | 24,27,30,33    | -,2,2,2           |
      +-----------------------------------------------------------------+
      | 10         | CTS(AU28)     | 28,31,34,37    | -,2,2,2           |
      +-----------------------------------------------------------------+
      | etc                                                             |
      +-----------------------------------------------------------------+

   SLConfigDescriptor

   Similar to the previous example.

   SL Packet Header

   Similar to the previous example (empty).

   Parameters

   The resulting concatenated fmtp line is:

      a=fmtp: SizeLength=9; IndexDeltaLength=2;

   RTP packet structure

      +=========================================+=============+
      | Field                                   | size        |
      +=========================================+=============+
      | RTP header                              | -           |
      +-----------------------------------------+-------------+
        Payload Header Section
      +=========================================+=============+
      | PayloadHeaderSection size = 42 bits     | 2 octets    |
      +-----------------------------------------+-------------+
      | PayloadSize                             | 9 bits      |
      +-----------------------------------------+-------------+
      | PayloadSize                             | 9 bits      |
      +-----------------------------------------+-------------+
      | IndexDelta                              | 2 bits      |
      +-----------------------------------------+-------------+
      | PayloadSize                             | 9 bits      |
      +-----------------------------------------+-------------+
      | IndexDelta                              | 2 bits      |
      +-----------------------------------------+-------------+
      | PayloadSize                             | 9 bits      |
      +-----------------------------------------+-------------+
      | IndexDelta                              | 2 bits      |
      +-----------------------------------------+-------------+
      | bits to octet alignment = (000000)      | 6 bits      |
      +-----------------------------------------+-------------+
        Payload Section
      +=========================================+=============+
      | AAC Access Unit                         | x octets    |
      +-----------------------------------------+-------------+
      | AAC Access Unit                         | x octets    |
      +-----------------------------------------+-------------+
      | AAC Access Unit                         | x octets    |
      +-----------------------------------------+-------------+
      | AAC Access Unit                         | x octets    |
      +-----------------------------------------+-------------+

   Overhead estimation

   The PayloadHeaderSection is 8 octets; in this example we therefore
   have an RTP overhead of 40 + 8 octets for approximately 1400 octets
   of payload, i.e. around 4 % overhead.

Appendix.6 AAC with Index-based interleaving and SL

   Let us consider AAC at around 130 kb/s, where each Access Unit is
   split into 4 SL packets of at most 90 octets each, corresponding to
   Error Sensitivity Categories (ESC). For such packets interleaving is
   very useful in terms of error resilience. We thus use an
   interleaving scheme where 15 SL packets (extracted from 15
   consecutive Access Units) are used to construct each RTP packet in
   order to match an MTU of 1500 octets. Note that since ESC fragments
   are not octet aligned, we also use the paddingFlag and paddingBits
   features of the Sync Layer. The interleaving sequence is 4 RTP
   packets and 350 ms long, which is too long for conferencing but
   acceptable for Internet radio.

   Since the sequence contains 60 SL packets, IndexLength is set to 16
   bits so as to provide a safe margin in case of long loss bursts.
   This also indicates to the receiver that this is an Index-Based
   Interleaving scheme (indeed, CTS cannot be computed for SL packets
   that are not AU starts).

   2 bits are enough for IndexDelta, which is constant and equal to 3
   (since +1 is automatically added, this denotes an index difference
   of 4).
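   The sketch below illustrates the index-based scheme. It relies on a
   hypothetical indexing rule that this document does not specify: SL
   packets are numbered index = 4 * (AU number) + ESC, counting from 0
   at the start of the sequence. Under that rule:
   - Each RTP packet gathers the 15 SL packets of one ESC.
   - The first SL packet carries the full 16-bit Index; the others
     carry an IndexDelta field of 4 - 1 = 3 (binary 11).
   - Only the packet carrying the last ESC, which holds the AU ends,
     has the M bit set.

   A receiver restores the original order by rebuilding the indices and
   sorting on them. The names are illustrative, and 16-bit Index
   wraparound is not handled.

```python
from __future__ import annotations

from dataclasses import dataclass

AUS_PER_SEQUENCE = 15  # consecutive Access Units per interleaving sequence
SL_PER_AU = 4          # one SL packet per Error Sensitivity Category (ESC)


@dataclass
class RtpPacket:
    first_index: int         # value of the 16-bit Index field
    index_deltas: list[int]  # IndexDelta fields as transmitted (difference - 1)
    marker: bool             # M bit: set when the packet carries AU ends


def interleave(base_index: int = 0) -> list[RtpPacket]:
    """Build one interleaving sequence: one RTP packet per ESC."""
    packets = []
    for esc in range(SL_PER_AU):
        idx = [base_index + SL_PER_AU * au + esc for au in range(AUS_PER_SEQUENCE)]
        deltas = [b - a - 1 for a, b in zip(idx, idx[1:])]
        packets.append(RtpPacket(idx[0], deltas, marker=(esc == SL_PER_AU - 1)))
    return packets


def reconstruct_indices(pkt: RtpPacket) -> list[int]:
    """Receiver side: rebuild SL packet indices from the Index and IndexDelta fields."""
    out = [pkt.first_index]
    for d in pkt.index_deltas:
        out.append(out[-1] + d + 1)
    return out


if __name__ == "__main__":
    pkts = interleave()
    # Even if the RTP packets arrive in reverse order, sorting restores SL order.
    received = sorted(i for p in reversed(pkts) for i in reconstruct_indices(p))
    print(received == list(range(AUS_PER_SEQUENCE * SL_PER_AU)))
```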
   Note that the 4th RTP packet in each sequence has its M bit set to 1
   since it contains 15 SL packets transporting the ends of 15
   consecutive Access Units.

   With this scheme a sender can, for example upon reception of RTCP
   reports indicating high loss rates, choose to duplicate, for each
   interleaving sequence, the first RTP packet, which contains the most
   useful data in terms of ESC, or apply other error protection
   techniques, with due care to congestion issues.

   In this example we also show several other SL features (OCR, AU
   boundary flags and padding, as detailed below).

   One feature demonstrated by this example is the degradation
   priority. We assume that the degradation priority can take 4
   different values, mapped to Error Sensitivity Categories, and is
   encoded on 2 bits. This interleaving scheme ensures that only SL
   packets with identical degradation priorities are grouped in the
   same RTP packet (see section 3.6.3) and that only the first RSLH of
   each RTP packet transports the degradation priority.

   We also assume that the server inserts an OCR in the last SL packet
   of each RTP packet.

   SLConfigDescriptor

   In this example the SLConfigDescriptor is:

      class SLConfigDescriptor extends BaseDescriptor : bit(8)
            tag=SLConfigDescrTag {
        bit(8) predefined;
        if (predefined==0) {
          bit(1) useAccessUnitStartFlag;        = 1
          bit(1) useAccessUnitEndFlag;          = 1
          bit(1) useRandomAccessPointFlag;      = 0
          bit(1) hasRandomAccessUnitsOnlyFlag;  = 1
          bit(1) usePaddingFlag;                = 1  // padding bits are signaled
          bit(1) useTimeStampsFlag;             = 0
          bit(1) useIdleFlag;                   = 0
          bit(1) durationFlag;                  = 1
          bit(32) timeStampResolution;          = 0
          bit(32) OCRResolution;                = 30
          bit(8) timeStampLength;               = 0
          bit(8) OCRLength;                     = 32
          bit(8) AU_Length;                     = 0
          bit(8) instantBitrateLength;          = 0
          bit(4) degradationPriorityLength;     = 2
          bit(5) AU_seqNumLength;               = 0
          bit(5) packetSeqNumLength;            = 6
          bit(2) reserved=0b11;
        }
        if (durationFlag) {
          bit(32) timeScale;                    = 44100 // 44.1 kHz
          bit(16) accessUnitDuration;           = 1024  // 23.22 ms
          bit(16) compositionUnitDuration;      = 1024  // 23.22 ms
        }
        if (!useTimeStampsFlag) {
          bit(timeStampLength) startDecodingTimeStamp;     = 0
          bit(timeStampLength) startCompositionTimeStamp;  = 0
        }
      }

   SL Packet Header structure

   With this configuration we have the following SL packet header
   structure:

      aligned(8) class SL_PacketHeader (SLConfigDescriptor SL) {
        bit(1) accessUnitStartFlag;
        bit(1) accessUnitEndFlag;
        bit(1) OCRflag;
        bit(1) paddingFlag;
        if (paddingFlag)
          bit(3) paddingBits;
        bit(SL.packetSeqNumLength) packetSequenceNumber;
        bit(1) DegPrioflag;
        if (DegPrioflag) {
          bit(SL.degradationPriorityLength) degradationPriority;
        }
        if (OCRflag) {
          bit(SL.OCRLength) objectClockReference;
        }
      }

   Parameters

   The resulting concatenated fmtp line is:

      a=fmtp: SizeLength=7; RSLHSectionSizeLength=8;
              IndexLength=16; IndexDeltaLength=2; OCRDeltaLength=16

   RTP packet structure

      +=========================================+=============+
      | Field                                   | size        |
      +=========================================+=============+
      | RTP header                              | -           |
      +-----------------------------------------+-------------+
        Payload Header Section
      +=========================================+=============+
      | Payload Header Section size = 149 bits  | 2 octets    |
      +-----------------------------------------+-------------+
      | PayloadSize                             | 7 bits      |
      +-----------------------------------------+-------------+
      | Index                                   | 16 bits     |
      +-----------------------------------------+-------------+
      | PayloadSize                             | 7 bits      |
      +-----------------------------------------+-------------+
      | IndexDelta = (11)                       | 2 bits      |
      +-----------------------------------------+-------------+
      | etc + 12 times 9 bits                                 |
      +-----------------------------------------+-------------+
      | PayloadSize                             | 7 bits      |
      +-----------------------------------------+-------------+
      | IndexDelta = (11)                       | 2 bits      |
      +-----------------------------------------+-------------+
      | bits to octet alignment = (000)         | 3 bits      |
      +-----------------------------------------+-------------+
        RSLHSection
      +=========================================+=============+
      | RSLHSectionSize = (10000111)            | 8 bits      |
      +-----------------------------------------+-------------+
      | accessUnitStartFlag                     | 1 bit       |
      +-----------------------------------------+-------------+
      | accessUnitEndFlag                       | 1 bit       |
      +-----------------------------------------+-------------+
      | OCRFlag = (0)                           | 1 bit       |
      +-----------------------------------------+-------------+
      | paddingFlag = (1)                       | 1 bit       |
      +-----------------------------------------+-------------+
      | paddingBits                             | 3 bits      |
      +-----------------------------------------+-------------+
      | DegPrioflag = (1)                       | 1 bit       |
      +-----------------------------------------+-------------+
      | degradationPriority                     | 2 bits      |
      +-----------------------------------------+-------------+
      | accessUnitStartFlag                     | 1 bit       |
      +-----------------------------------------+-------------+
      | accessUnitEndFlag                       | 1 bit       |
      +-----------------------------------------+-------------+
      | OCRFlag = (0)                           | 1 bit       |
      +-----------------------------------------+-------------+
      | paddingFlag = (1)                       | 1 bit       |
      +-----------------------------------------+-------------+
      | paddingBits                             | 3 bits      |
      +-----------------------------------------+-------------+
      | DegPrioflag = (0)                       | 1 bit       |
      +-----------------------------------------+-------------+
      | etc + 12 times 8 bits                                 |
      +-----------------------------------------+-------------+
      | accessUnitStartFlag                     | 1 bit       |
      +-----------------------------------------+-------------+
      | accessUnitEndFlag                       | 1 bit       |
      +-----------------------------------------+-------------+
      | OCRFlag = (1)                           | 1 bit       |
      +-----------------------------------------+-------------+
      | OCRDelta                                | 16 bits     |
      +-----------------------------------------+-------------+
      | paddingFlag = (0)                       | 1 bit       |
      +-----------------------------------------+-------------+
      | DegPrioflag = (0)                       | 1 bit       |
      +-----------------------------------------+-------------+
      | bits to octet alignment = (0)           | 1 bit       |
      +-----------------------------------------+-------------+
        Payload Section
      +=========================================+=============+
      | SL packet payload                       |max 90 octets|
      +-----------------------------------------+-------------+
      | etc + 13 SL packets                                   |
      +-----------------------------------------+-------------+
      | SL packet payload                       |max 90 octets|
      +-----------------------------------------+-------------+

   Note that in the above table the last SL packet in the RTP packet
   has a payload that is octet-aligned (at the end). When this happens,
   paddingFlag is set to zero and the paddingBits field is omitted.
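   The sizes in the tables above can be checked by adding up the field
   widths. The sketch below uses hypothetical helper names and counts
   two things:
   - Each RSLH, from the fields shown in the RSLHSection table:
     accessUnitStartFlag and accessUnitEndFlag; OCRFlag with an
     optional 16-bit OCRDelta; paddingFlag with optional 3-bit
     paddingBits; DegPrioflag with an optional 2-bit
     degradationPriority.
   - Each payload header, from PayloadSize (7 bits), Index (16 bits)
     and IndexDelta (2 bits).

   It assumes, as in the Appendix.5 table, that a section's own size
   field is counted before padding the section to a whole number of
   octets.

```python
from __future__ import annotations


def rslh_bits(ocr: bool, padding: bool, deg_prio: bool) -> int:
    """Width in bits of one RSLH, per the RSLHSection table of this example."""
    bits = 2                            # accessUnitStartFlag, accessUnitEndFlag
    bits += 1 + (16 if ocr else 0)      # OCRFlag [+ OCRDelta]
    bits += 1 + (3 if padding else 0)   # paddingFlag [+ paddingBits]
    bits += 1 + (2 if deg_prio else 0)  # DegPrioflag [+ degradationPriority]
    return bits


def octets(total_bits: int) -> int:
    """Round a bit count up to whole octets (the 'bits to octet alignment' rows)."""
    return (total_bits + 7) // 8


def appendix6_sizes() -> dict[str, int]:
    # First RSLH: padding + degradation priority; 13 middle RSLHs: padding
    # only; last RSLH: OCR delta, no padding, no degradation priority.
    rslhs = ([rslh_bits(ocr=False, padding=True, deg_prio=True)]
             + [rslh_bits(ocr=False, padding=True, deg_prio=False)] * 13
             + [rslh_bits(ocr=True, padding=False, deg_prio=False)])
    rslh_section_bits = sum(rslhs)
    # PayloadSize + Index for the first SL packet, PayloadSize + IndexDelta
    # for each of the other 14.
    header_bits = (7 + 16) + 14 * (7 + 2)
    return {
        "RSLHSectionSize": rslh_section_bits,
        "PayloadHeaderSection size": header_bits,
        # Assumption: the 16-bit and 8-bit size fields are counted before padding.
        "PayloadHeaderSection octets": octets(16 + header_bits),
        "RSLHSection octets": octets(8 + rslh_section_bits),
    }


if __name__ == "__main__":
    sizes = appendix6_sizes()
    print(sizes, format(sizes["RSLHSectionSize"], "08b"))
```

   Running it gives an RSLHSectionSize of 135 bits (binary 10000111)
   and a PayloadHeaderSection size of 149 bits, matching the values
   signaled in the table.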
   Overhead estimation

   The PayloadHeaderSection is 21 octets (a 16-bit size field plus 149
   bits of headers, rounded up to a whole octet) and the RSLHSection is
   18 octets (an 8-bit size field plus 135 bits of RSLHs, rounded up to
   a whole octet). In this example we therefore have an RTP overhead of
   40 + 39 octets for at most 1350 octets of payload, i.e. around 6 %
   overhead.
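   The overhead percentages quoted in these appendices are header
   octets divided by payload octets. The minimal sketch below assumes
   that the 40 octets of headers are a 20-octet IPv4 header, an 8-octet
   UDP header and a 12-octet fixed RTP header, with no RTP header
   extension or CSRC list. overhead_percent is a hypothetical helper
   name.

```python
IPV4_UDP_RTP = 20 + 8 + 12  # octets: IPv4 header, UDP header, fixed RTP header


def overhead_percent(payload_header_octets: int, payload_octets: int) -> float:
    """Header overhead relative to the media payload, in percent."""
    if payload_octets <= 0:
        raise ValueError("payload_octets must be positive")
    return 100.0 * (IPV4_UDP_RTP + payload_header_octets) / payload_octets


if __name__ == "__main__":
    print(f"Appendix.1: {overhead_percent(0, 1400):.1f} %")  # one AU or fragment
    print(f"Appendix.2: {overhead_percent(2, 1400):.1f} %")  # 2-octet payload header
    print(f"Appendix.3: {overhead_percent(0, 5):.0f} %")     # one 5-octet AU
    aus = (1500 - IPV4_UDP_RTP) // 5                          # Appendix.4, 1500-octet MTU
    print(f"Appendix.4: {overhead_percent(0, aus * 5):.1f} %")
```

   With a 1500-octet MTU, Appendix.4 fits 292 five-octet Access Units
   per packet, which is why its overhead is slightly below 3 %.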