Internet Engineering Task Force                              Basso-AT&T
Internet Draft                                            Civanlar-AT&T
                                                        Gentric-Philips
                                                         Herpel-Thomson
                                                      Lifshitz-Optibase
                                                            Lim-mp4cast
                                                            Perkins-ISI
                                                   Van Der Meer-Philips
                                                         September 2001
Expires March 2002
Document: draft-ietf-avt-mpeg4-multisl-02.txt

                 RTP Payload Format for MPEG-4 Streams

Status of this Memo

   This document is an Internet-Draft and is in full conformance with
   all provisions of Section 10 of RFC2026.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups. Note that
   other groups may also distribute working documents as
   Internet-Drafts. Internet-Drafts are draft documents valid for a
   maximum of six months and may be updated, replaced, or obsoleted by
   other documents at any time. It is inappropriate to use
   Internet-Drafts as reference material or to cite them other than as
   "work in progress."

   This specification is a product of the Audio/Video Transport working
   group within the Internet Engineering Task Force and ISO/IEC MPEG-4
   ad hoc group on MPEG-4 over Internet. Comments are solicited and
   should be addressed to the working group's mailing list at
   avt@ietf.org and/or the authors.

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt
   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html.

   This document contains a MIME type registration form that is
   intended to be taken as-is and therefore makes reference to this
   document, using the temporary placeholder: .

Abstract

   This document describes a payload format for transporting MPEG-4
   encoded data using RTP. MPEG-4 is a recent standard from ISO/IEC for
   the coding of natural and synthetic audio-visual data.
   Several services provided by RTP are beneficial for MPEG-4 encoded
   data transport over the Internet. Additionally, the use of RTP makes
   it possible to synchronize MPEG-4 data with other real-time data
   types.

Gentric et al.               Expires March 2002                      1

RTP Payload Format for MPEG-4 Streams                    September 2001

1. Introduction

   MPEG-4 is a recent standard from ISO/IEC for the coding of natural
   and synthetic audio-visual data in the form of audiovisual objects
   that are arranged into an audiovisual scene by means of a scene
   description [1][2][3][4]. This draft specifies an RTP [5] payload
   format for transporting MPEG-4 encoded data streams.

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in
   this document are to be interpreted as described in RFC 2119 [6].

   The benefits of using RTP for MPEG-4 data stream transport include:

   i.   Ability to synchronize MPEG-4 streams with other RTP payloads

   ii.  Monitoring MPEG-4 delivery performance through RTCP

   iii. Combining MPEG-4 and other real-time data streams received from
        multiple end-systems into a set of consolidated streams through
        RTP mixers

   iv.  Converting data types, etc. through the use of RTP translators.

1.1 Overview of MPEG-4 End-System Architecture

   Fig. 1 below shows the layered architecture of a terminal, which
   implements the complete MPEG-4 systems model. The Compression Layer
   processes individual audio-visual media streams. The MPEG-4
   compression schemes are defined in the ISO/IEC specifications
   14496-2 [2] and 14496-3 [3]. The compression schemes in MPEG-4
   achieve efficient encoding over a bandwidth ranging from several
   kbps to many Mbps. The audio-visual content compressed by this layer
   is organized into Elementary Streams (ESs).

   The MPEG-4 standard specifies MPEG-4 compliant streams.
   Within the constraint of this compliance, the compression layer is
   unaware of a specific delivery technology, but it can be made to
   react to the characteristics of a particular delivery layer, such as
   the path-MTU or loss characteristics. Also, some compressors can be
   designed to be delivery specific for implementation efficiency. In
   such cases the compressor may work in a non-optimal fashion with
   delivery technologies that are different from the one it is
   specifically designed to operate with.

   The hierarchical relations, location and properties of ESs in a
   presentation are described by a dynamic set of Object Descriptors
   (ODs). Each OD groups one or more ES Descriptors referring to a
   single content item (audio-visual object). Hence, multiple
   alternative or hierarchical representations of each content item
   are possible.

   ODs are themselves conveyed through one or more ESs. A complete set
   of ODs can be seen as an MPEG-4 resource or session description at a
   stream level. The resource description may itself be hierarchical,
   i.e. an ES conveying an OD may describe other ESs conveying other
   ODs.

   The session description is accompanied by a dynamic scene
   description, Binary Format for Scene (BIFS), again conveyed through
   one or more ESs. At this level, content is identified in terms of
   audio-visual objects. The spatio-temporal location of each object is
   defined by BIFS. The audio-visual content of objects that are
   synthetic and static is also described by BIFS. Natural and animated
   synthetic objects may refer to an OD that points to one or more ESs
   that carry the coded representation of the object or its animation
   data.
   By conveying the session (or resource) description as well as the
   scene (or content composition) description through their own ESs,
   it becomes possible to change portions of the content composition,
   as well as the number and properties of the media streams that carry
   the audio-visual content, separately and dynamically at well-known
   instants in time.

   One or more initial Scene Description streams and the corresponding
   OD stream are pointed to by an initial object descriptor (IOD). In
   this context the IOD needs to be made available to the receivers
   through some out-of-band means that are out of scope of this payload
   specification. However, for transport over IP networks, these means
   are defined in a separate document [9]. Note that for applications
   that only use audio and/or video, this payload format can also be
   used without IOD and OD streams (the decoder configuration is then
   transported as MIME parameters, see section 4.1).

   The Compression Layer organizes the ESs in Access Units (AUs), the
   smallest elements that can be assigned individual timestamps. The
   Access Unit concept defines the boundary between media-specific
   processing and delivery-specific processing. That is to say,
   transport should not depend on the nature of the media data but only
   on AU properties.

   The Sync Layer (SL), which primarily provides the synchronization
   between streams, defines a homogeneous encapsulation of ESs carrying
   media or control data (ODs, BIFS). Integer or fractional AUs are
   then encapsulated in SL packets. In the following, this payload
   format is described as transporting SL packets, although in many
   cases SL packet payloads are actually entire Access Unit payloads,
   i.e. encoded media frames. All consecutive data from one stream is
   called an SL-packetized stream at this layer.
   The interface between the compression layer and the SL is called the
   Elementary Stream Interface (ESI). The ESI is informative, i.e. it is
   extremely useful for defining concepts and mechanisms but does not
   have to be implemented. For the same reason this draft describes the
   transport of SL packets, i.e. Access Units or fragments thereof. It
   is important to note, however, that an SL stream can be configured
   so that SL packets are reduced to the media (compressed) data, and in
   that case implementations do not need to be aware of the SL at all.

   The Delivery Layer in MPEG-4 consists of the Delivery Multimedia
   Integration Framework (DMIF) defined in ISO/IEC 14496-6 [4]. This
   layer is media unaware but delivery technology aware. It provides
   transparent access to and delivery of content irrespective of the
   technologies used. The interface between the SL and DMIF is called
   the DMIF Application Interface (DAI). It offers
   content-location-independent procedures for establishing MPEG-4
   sessions and access to transport channels. The specification of this
   payload format is considered as a part of the MPEG-4 Delivery Layer.
   media aware       +-----------------------------------------+
   delivery unaware  |            COMPRESSION LAYER            |
   14496-2 Visual    |streams from as low as Kbps to multi-Mbps|
   14496-3 Audio     +-----------------------------------------+

                                                     Elementary
                                                     Stream
   ==================================================Interface
                                                     (ESI)
                    +-------------------------------------------+
   media and        |                SYNC LAYER                 |
   delivery unaware | manages elementary streams, their synch-  |
   14496-1 Systems  | ronization and hierarchical relations     |
                    +-------------------------------------------+

                                                     DMIF
                                                     Application
   ==================================================Interface
                                                     (DAI)
                    +-------------------------------------------+
   delivery aware   |              DELIVERY LAYER               |
   media unaware    |provides transparent access to and delivery|
   14496-6 DMIF     |    of content irrespective of delivery    |
                    |                technology                 |
                    +-------------------------------------------+

          Figure 1: Conceptual MPEG-4 terminal architecture

1.2 MPEG-4 Elementary Stream Data Packetization

   The ESs from the encoders are fed into the SL with indications of AU
   boundaries, random access points, desired composition time and the
   current time.

   The Sync Layer fragments the ESs into SL packets, each containing a
   header that encodes information conveyed through the ESI. If the AU
   is larger than an SL packet, subsequent packets containing the
   remaining parts of the AU are generated with subset headers until
   the complete AU is packetized.

   The syntax of the Sync Layer is configurable and can be adapted to
   the needs of the stream to be transported. This includes the
   possibility to select the presence or absence of individual syntax
   elements as well as configuration of their length in bits.
   The configuration for each individual stream is conveyed in an
   SLConfigDescriptor, which is an integral part of the ES Descriptor
   for this stream. The MPEG-4 SLConfigDescriptor, being configuration
   information, is not carried by the media stream itself but is rather
   transported via an ObjectDescriptor Stream encoded using the MPEG-4
   Object Description framework. This can be done in a separate stream
   using this payload format (see section 5.2 for details). The
   SLConfigDescriptor MAY also be transported by other means (for
   example as a parameter, see section 4.1). Finally, streams for which
   the SL packet headers are completely empty (or fully map into the
   RTP headers) can also be transported using this payload format; in
   these cases the Sync Layer can be seen as a purely conceptual
   construction that does not have to be implemented at all. Since only
   knowledge of the decoder configuration is then needed, it MAY also
   be transported as a parameter, as described in section 4.1.

2. Analysis of the carriage of MPEG-4 over IP

   When transporting MPEG-4 audio and video, applications may or may
   not require the use of MPEG-4 systems. To achieve the highest level
   of interoperability between all MPEG-4 applications, it is desirable
   that (a) the same MPEG-4 transport format can be used in both cases,
   and (b) receivers that have no MPEG-4 systems knowledge can easily
   skip the MPEG-4 systems specific information, if any.

   RTP is well suited to transporting MPEG-4 audio and MPEG-4 video,
   but when MPEG-4 systems is used, a problem arises from the fact that
   both RTP and MPEG-4 systems contain a synchronization layer. In
   particular, the RTP header duplicates some of the information
   provided in SL packet headers, such as the composition timestamps
   (CTSs) and the marker bit that signals the end of access units.
   To avoid unnecessary overhead and potential interoperability risks
   when transporting MPEG-4 systems, it is desirable to remove the
   redundancy between the SL packet header and the RTP packet header.
   To be independent of the use of MPEG-4 systems, synchronization can
   rely on the parameters provided in the RTP header.

   When SL headers are used, the redundant fields are removed from the
   SL header, producing "reduced SL headers". The remaining information
   from the SL header, if any, is contained inside the RTP packet
   payload, together with the SL packet payload. The combination of RTP
   packet headers and reduced SL packet headers can be used to logically
   map the RTP packets to complete SL packets.

   Some of the information contained in the reduced SL headers is also
   useful for transport over RTP when MPEG-4 systems is not used.

   For that reason the information in the "reduced" SL headers is split
   into "generally useful information" and "MPEG-4 systems only
   information".

   The "generally useful information", hereinafter called the Mapped SL
   Packet Header (MSLH), is carried by a number of fields configurable
   using parameters defined in section 4.1; all receivers MUST parse
   these fields.

   The "MPEG-4 systems only information", if any, is contained in a
   reduced SL header, hereinafter called the Remaining SL Packet Header
   (RSLH), also configured using parameters (see section 4.1) and
   preceded by a length field, so that non-MPEG-4-systems devices MAY
   skip this information.

   This is depicted in figure 2.
                               <----------SL Packet-------->
                               +---------------------------+
                               | SL Packet   | SL Packet   |
                               | Header      | Payload     |
                               +---------------------------+
                                      |              |
                                      |              |
         +-------------+--------------+              |
         |             |              |              |
         V             V              V              V
   +-----------+ +-----------+ +-------------+ +-----------+
   |RTP Packet | | Mapped SL | | Remaining SL| | SL Packet |
   |  Header   | |  Header   | |   Header    | |  Payload  |
   +-----------+ +-----------+ +-------------+ +-----------+

                 <----RTP Packet Payload------------------->

            Figure 2: Mapping of SL Packet into RTP packet

   When the configuration is such that SL packet headers map directly
   to RTP headers, this process of mapping SL packet headers is purely
   conceptual. For example, this RTP payload format has been designed
   so that, by default, it is configured to be identical to RFC 3016
   for the recommended MPEG-4 video configurations (see section 5.5).
   Hence receivers that comply with this payload specification can
   decode such RTP payloads without knowledge of the Sync Layer (see
   the example in Appendix.1). In a similar fashion, MPEG-4 audio (see
   the Appendix for examples) can be transported without explicit use
   of the Sync Layer.

3. Payload Format

   The RTP payload corresponds to an integer number of SL packets.

   If multiple SL packets are transported in each RTP packet, they MUST
   be in decoding order, i.e.:
      i)   decodingTimeStamp order, if present
      ii)  packetSequenceNumber order, if present
      iii) implicit decoding order in all other cases.

   The SL Packet Headers are transformed into RSLHs, with some fields
   extracted to be mapped into the RTP header and others extracted to
   be mapped into the corresponding MSLH. The SL Packet Payload is
   unchanged.

   This payload format has two modes.
   The "Single-SL" mode is a mode where a single SL packet is
   transported per RTP packet. The "Multiple-SL" mode is a mode where
   one or more SL packets may be transported per RTP packet. The
   default mode is the Single-SL mode. The mode can be set to
   Multiple-SL by adding a non-zero ConstantSize or SizeLength
   parameter (see section 4.1).

   RTP packets SHOULD be sent in the SL stream order (as defined
   above). In case of interleaving, the first SL packet of each RTP
   packet is used as the reference, as in the following examples of RTP
   packets containing interleaved SL packets:
      This sequence is correct:    [0,2,4][1,3,5]
      This sequence is correct:    [0,3,6][1,2][4,5]
      This sequence is correct:    [0,3,6][1,4][2,5]
      This sequence is prohibited: [0,4,2][1,5,3]
      This sequence is prohibited: [1,3,5][0,2,4]
      This sequence is prohibited: [0,3,6][2,5][1,4]

   In the Multiple-SL mode, senders MUST make sure that no fields
   undergo roll-over inside one RTP packet. This may limit the number
   of SL packets inside one RTP packet and, when interleaving, may
   limit the interleaving period.

   The size (or number) of the SL packet(s) SHOULD be adjusted such
   that the resulting RTP packet is not larger than the path-MTU. To
   handle larger packets, this payload format relies on lower layers
   for fragmentation, which may not be desirable.

3.1 RTP Header Fields Usage

   Payload Type (PT): The assignment of an RTP payload type for this
   new packet format is outside the scope of this document, and will
   not be specified here. It is expected that the RTP profile for a
   particular class of applications will assign a payload type for this
   encoding; if that is not done, a payload type in the dynamic range
   shall be chosen.
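   The ordering rules at the start of section 3 (decoding order inside
   one RTP packet, and stream order of RTP packets judged by their
   first SL packet) can be checked mechanically. The Python sketch
   below is illustrative only: it assumes each RTP packet is
   represented as a list of integer SL decoding-order indices, and the
   function name in_decoding_order is hypothetical. It treats the
   SHOULD rule on RTP packet order as strictly as the MUST rule on SL
   packet order.

```python
from typing import Sequence


def in_decoding_order(rtp_packets: Sequence[Sequence[int]]) -> bool:
    """Check the section 3 ordering rules.

    Each RTP packet is given as the decoding-order indices of the SL
    packets it carries, in payload order (a hypothetical representation).
    """
    previous_first = None
    for sl_indices in rtp_packets:
        if not sl_indices:
            return False  # the SLPPSection is never empty
        # MUST: SL packets inside one RTP packet are in decoding order.
        if any(a >= b for a, b in zip(sl_indices, sl_indices[1:])):
            return False
        # SHOULD: RTP packets follow SL stream order, using the first
        # SL packet of each RTP packet as the reference.
        first = sl_indices[0]
        if previous_first is not None and first < previous_first:
            return False
        previous_first = first
    return True
```

   Applied to the six example sequences above, the function accepts the
   first three and rejects the last three; for instance
   in_decoding_order([[0, 3, 6], [2, 5], [1, 4]]) returns False because
   the third RTP packet starts with SL packet 1, which precedes 2.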
   Marker (M) bit: The M bit is set to 1 when all SL packets in the RTP
   packet are Access Unit ends, i.e. the M bit maps to the Sync Layer
   accessUnitEndFlag.

   Specifically, the M bit is set to 0 when the RTP packet contains one
   or more Access Unit fragments that are not Access Unit ends, and the
   M bit is set to 1 for RTP packets that contain either:
   .  A single complete Access Unit
   .  The last fragment of an Access Unit
   .  Several complete Access Units
   .  Several last fragments of Access Units
   .  A mix of complete Access Units and last fragments of Access Units

   Therefore, for streams where all SL packets are complete Access
   Units, the M bit is 1 for all RTP packets.

   Extension (X) bit: Defined by the RTP profile used.

   Sequence Number: The RTP sequence number should be generated by the
   sender with a constant random offset and does not have to be
   correlated to any (optional) MPEG-4 SL sequence numbers.

   Timestamp: Set to the value in the compositionTimeStamp field of the
   first SL packet in the RTP packet, if present.

   If compositionTimeStamp is shorter than 32 bits, the RTP timestamp
   is incremented to extend it out to 32 bits. If compositionTimeStamp
   is longer than 32 bits, the RTP timestamp uses its 32 LSBs. The
   length of the timestamp (timeStampLength) is available from the SL
   configuration data and shall be used by receivers to reconstruct
   compositionTimeStamps with the original bit length. When making SL
   streams specifically for use with this payload format, it is
   RECOMMENDED to use timeStampLength=32.

   In all cases, the sender SHALL make sure that RTP timestamps are
   identical only for RTP packets transporting fragments of the same
   Access Unit.
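   The Marker bit rule and the mapping between compositionTimeStamp and
   the 32-bit RTP timestamp can be sketched as follows. This Python
   illustration is not a normative algorithm: the names marker_bit,
   CtsToRtp and cts_from_rtp are invented here, and "incrementing" the
   RTP timestamp is interpreted as letting it keep counting across the
   wraps of a shorter CTS counter, under the assumption that
   consecutive CTS values differ by less than half of the CTS range.

```python
from typing import Optional, Sequence

MASK32 = 0xFFFFFFFF


def marker_bit(access_unit_end_flags: Sequence[bool]) -> int:
    """M = 1 only when every SL packet in the RTP packet ends an AU."""
    return 1 if all(access_unit_end_flags) else 0


class CtsToRtp:
    """Sender side: map compositionTimeStamp values to RTP timestamps.

    Assumption: consecutive CTS values differ by less than half of the
    CTS range, so each wrap of a short CTS counter can be detected
    (moderately reordered CTS values, e.g. around B-frames, still work).
    """

    def __init__(self, timestamp_length: int) -> None:
        self.length = timestamp_length        # timeStampLength
        self.extended: Optional[int] = None   # running unwrapped CTS

    def rtp_timestamp(self, cts: int) -> int:
        if self.length >= 32:
            return cts & MASK32               # keep the 32 LSBs
        modulus = 1 << self.length
        if self.extended is None:
            self.extended = cts
        else:
            half = modulus // 2
            # Signed step in [-half, half) from the previous CTS.
            step = (cts - self.extended + half) % modulus - half
            self.extended += step
        return self.extended & MASK32


def cts_from_rtp(rtp_timestamp: int, timestamp_length: int) -> int:
    """Receiver side: recover a CTS of at most 32 bits."""
    if timestamp_length > 32:
        raise ValueError("only the 32 LSBs travel in the RTP timestamp")
    return rtp_timestamp & ((1 << timestamp_length) - 1)
```

   For instance, with timeStampLength=8 the CTS sequence 250, 254, 2,
   6, 4 maps to RTP timestamps 250, 254, 258, 262, 260, and
   cts_from_rtp recovers the original 8-bit values from them.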
   If compositionTimeStamp is not present in the current SL packet but
   was present in a previous SL packet, the current SL packet is a
   further fragment of the same Access Unit; therefore the same
   timestamp value MUST be used as the RTP timestamp.

   If compositionTimeStamp is never present in SL packets for this
   stream, the RTP packetizer SHOULD convey a reading of a local clock
   at the time the RTP packet is created.

   According to RFC 1889 [5, Section 5.1], timestamps are recommended
   to start at a random value for security reasons. However, a receiver
   is then, in the general case, unable to reconstruct the original
   MPEG-4 time stamps (CTS, DTS, OCR), which can be of use for
   applications where streams from multiple sources are to be
   synchronized. Therefore the usage of such a random offset SHOULD be
   avoided.

   Note that since RTP devices may re-stamp the stream, all time stamps
   inside the RTP payload (CTS and DTS in MSLH, OCR in RSLH) MUST be
   expressed as a difference to the RTP time stamp. Since this
   subtraction may lead to negative values, the offset MUST be encoded
   as a two's complement signed integer in network byte order. Note
   that these offsets (deltas) typically require far fewer bits to be
   encoded than the original length, which is another justification.

   When startCompositionTimeStamp is signaled in the SLConfigDescriptor,
   the RTP time stamps MUST start with this value.

   SSRC, CC and CSRC fields are used as described in RFC 1889 [5].

   RTCP SHOULD be used as defined in RFC 1889 [5].
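   As a sketch of the offset encoding described above (for example for
   CTSDelta), the Python functions below compute the signed difference
   between a timestamp and the RTP timestamp modulo 2^32 and store it
   in a field of a given width as two's complement. The names
   encode_offset and decode_offset are hypothetical; when such a field
   is packed into the payload it is written most significant bit first.
   An offset that does not fit in the configured width raises an error,
   which a sender might take as a signal not to group the SL packets
   concerned in the same RTP packet.

```python
MOD32 = 1 << 32
HALF32 = 1 << 31


def encode_offset(timestamp: int, rtp_timestamp: int, bits: int) -> int:
    """Two's complement field (bits wide) for timestamp - rtp_timestamp.

    Both inputs are 32-bit values; the difference is taken modulo 2**32
    so that a wrap of the RTP timestamp does not corrupt the offset.
    """
    delta = (timestamp - rtp_timestamp + HALF32) % MOD32 - HALF32
    if not -(1 << (bits - 1)) <= delta < (1 << (bits - 1)):
        raise ValueError(f"offset {delta} does not fit in {bits} bits")
    return delta & ((1 << bits) - 1)


def decode_offset(field: int, rtp_timestamp: int, bits: int) -> int:
    """Inverse of encode_offset: rebuild the 32-bit timestamp."""
    delta = field - (1 << bits) if field >> (bits - 1) else field
    return (rtp_timestamp + delta) % MOD32
```

   For example, a CTS of 1000 in an RTP packet whose timestamp is 1010
   becomes the 8-bit field 0xF6 (that is, -10), and
   decode_offset(0xF6, 1010, 8) gives 1000 back.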
   RTP timestamps in RTCP SR packets: according to the RTP timing
   model, the RTP timestamp that is carried in an RTCP SR packet is the
   same as the compositionTimeStamp that would be applied to an RTP
   packet for data that was sampled at the instant the SR packet is
   being generated and sent. The RTP timestamp value is calculated from
   the NTP timestamp for the current time, which also goes in the RTCP
   SR packet. To perform that calculation, an implementation needs to
   periodically establish a correspondence between the CTS value of a
   data packet and the NTP time at which that data was sampled.

3.2 RTP payload structure

   The packet payload structure consists of 3 byte-aligned sections.

   The first section is the MSLHSection and contains Mapped SL Packet
   Headers (MSLHs). The MSLHSection structure is described in 3.3 and
   the MSLH structure in 3.4. In the Single-SL mode this section is
   empty by default.

   The second section is the RSLHSection and contains Remaining SL
   Headers (RSLHs). The RSLH structure is described in 3.5. By default
   this section is empty.

   The last section (SLPPSection) contains the SL packet payloads. This
   section is never empty.

   The Nth MSLH in the MSLHSection, the Nth RSLH in the RSLHSection and
   the Nth SL packet payload in the SLPPSection correspond to the Nth
   SL packet transported by the RTP packet.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |V=2|P|X|  CC   |M|     PT      |       sequence number         |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                           timestamp                           |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |           synchronization source (SSRC) identifier            |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   :            contributing source (CSRC) identifiers             :
   +=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+
   |                                                               |
   |                  MSLHSection (byte aligned)                   |
   |                                                               |
   |                               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                               |                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+                               |
   |                                                               |
   |                  RSLHSection (byte aligned)                   |
   |                                                               |
   |               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |               |                                               |
   +-+-+-+-+-+-+-+-+                                               |
   |                                                               |
   |                  SLPPSection (byte aligned)                   |
   |                                                               |
   |                               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                               :...OPTIONAL RTP padding        |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

                  Figure 3: An RTP packet for MPEG-4

3.3 MSLHSection structure

   If the MSLHSection consumes a non-integer number of bytes, up to 7
   zero-valued padding bits MUST be inserted at the end in order to
   achieve byte alignment.

   In the Single-SL mode the MSLHSection consists of a single MSLH.

   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | MSLH (x bits) : padding bits  |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

         Figure 4: MSLHSection structure in Single-SL mode

   In the Multiple-SL mode this section consists of a 2-byte field
   giving the size in bits (in network byte order) of the following
   block of bit-wise concatenated MSLHs.

   This size field is absent in the Single-SL mode, not because it is
   not needed (omitting it is only a minor gain) but for compatibility
   with RFC 3016.

   This size field is also absent when its value would always be zero
   because the MSLH is always empty, which may happen when a constant
   size is signaled using ConstantSize.
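   The construction of an MSLHSection can be illustrated with the
   following Python sketch, which bit-wise concatenates MSLH fields
   most significant bit first, appends up to 7 zero padding bits, and
   prepends the 16-bit size field only in the Multiple-SL mode. Fields
   are given as (value, width) pairs; the function names and the
   always_empty flag (standing for a configuration in which every MSLH
   is empty, e.g. ConstantSize with no other MSLH field) are
   assumptions made for this example.

```python
from typing import Sequence, Tuple

Field = Tuple[int, int]  # hypothetical representation: (value, width in bits)


def pack_bits(fields: Sequence[Field]) -> Tuple[bytes, int]:
    """Concatenate fields MSB first; return (zero-padded bytes, bit count)."""
    acc, nbits = 0, 0
    for value, width in fields:
        if not 0 <= value < (1 << width):
            raise ValueError(f"{value} does not fit in {width} bits")
        acc = (acc << width) | value
        nbits += width
    pad = (-nbits) % 8  # 0..7 zero-valued padding bits
    return (acc << pad).to_bytes((nbits + pad) // 8, "big"), nbits


def mslh_section(mslhs: Sequence[Sequence[Field]], multiple_sl: bool,
                 always_empty: bool = False) -> bytes:
    """Build an MSLHSection from one field list per SL packet."""
    body, nbits = pack_bits([f for mslh in mslhs for f in mslh])
    if not multiple_sl or always_empty:
        # Single-SL: no size field (RFC 3016 compatibility).
        # always_empty: the size would always be zero, so it is omitted.
        return body
    if nbits > 0xFFFF:
        raise ValueError("MSLHs exceed the 16-bit size field")
    return nbits.to_bytes(2, "big") + body
```

   For example, two MSLHs that each hold only a 13-bit PayloadSize (13
   being an arbitrary SizeLength chosen for illustration) produce a
   size field of 26 followed by 4 bytes, the last 6 bits of which are
   padding.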
    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |   MSLH section size in bits   |     MSLH      |      etc      |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+               |
   |              as many bit-wise concatenated MSLHs              |
   |               as SL packets in this RTP packet                |
   |                                               +-+-+-+-+-+-+-+-+
   |                                               : padding bits  |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

         Figure 5: MSLHSection structure in Multiple-SL mode

3.4 MSLH structure

   The content of the Mapped SL Packet Header depends on parameters (as
   described in section 4.1); by default it is empty in the Single-SL
   mode and, except when ConstantSize is signaled, it contains at least
   the PayloadSize field in the Multiple-SL mode.

   When all options are used, the MSLH structure is given in figure 6.

   +============================+
   |PayloadSize                 |
   +----------------------------+
   |Index or IndexDelta         |
   +----------------------------+
   |CTSFlag                     |
   +----------------------------+
   |CTSDelta                    |
   +----------------------------+
   |DTSFlag                     |
   +----------------------------+
   |DTSDelta                    |
   +============================+

         Figure 6: Mapped SL Packet Header (MSLH) structure

   In the general case a receiver can only discover the size of an MSLH
   by parsing it since, for example, the presence of CTSDelta is
   signaled by the value of CTSFlag.

3.4.1 Fields of MSLH

   PayloadSize: Indicates the size in bytes of the associated SL Packet
   Payload, which can be found in the SLPPSection of the RTP packet.
   The length in bits of this field is signaled by the SizeLength
   parameter (see section 4.1).

   There is an exception to that. In the case that the RTP packet
   contains only one SL packet in the Multiple-SL mode, the
   PayloadSize field SHALL contain the size of the entire corresponding
   Access Unit. There are two reasons: first, the size of the fragment
   is not needed when there is only one fragment; second, this is
   useful in order to detect that a full Access Unit has been received
   after the loss of a packet carrying an M bit set to 1.

   Index, IndexDelta: Encodes the packetSequenceNumber (serial number)
   of the SL Packet. When making streams specifically for transport
   with this payload format, IndexDelta is useful for interleaving (see
   section 3.8). Since a mapping of packetSequenceNumber to RTP
   sequence number is not possible in the Multiple-SL mode, there is no
   requirement for a correspondence.

   Index is optional and, if present, appears for the first SL packet
   in an RTP packet.

   The length in bits of the Index field is defined by the IndexLength
   parameter (see section 4.1).

   IndexDelta is optional and, if present, appears for subsequent
   (non-first) SL packets in an RTP packet.

   The length in bits of the IndexDelta field is defined by the
   IndexDeltaLength parameter (see section 4.1).

   Both Index and IndexDelta MUST be incremented so that two different
   SL packets SHALL NOT have the same packetSequenceNumber. One
   exception for Index is described in 3.8.1.

   If the parameter IndexDeltaLength is defined, non-first SL packets
   inside an RTP packet have their packetSequenceNumber encoded as a
   difference (hence the name IndexDelta). This difference is relative
   to the previous SL packet in the RTP packet according to (with
   i>=0):

      packetSequenceNumber(0)   = Index(0)
      packetSequenceNumber(i+1) = packetSequenceNumber(i) +
                                  IndexDelta(i+1) + 1

   If the parameter IndexDeltaLength is not defined, the default value
   is zero and the IndexDelta field is then not present for non-first
   SL packets.
   Nevertheless, receivers SHALL then apply the above formula with
   IndexDelta equal to zero. In other words, by default
   packetSequenceNumber is incremented by 1 for each SL packet in one
   RTP packet.

   CTSFlag (1 bit): Indicates whether the CTSDelta field is present. A
   value of 1 indicates that the CTSDelta field is present, a value of
   0 that it is not present.

   If CTSDeltaLength is not zero, CTSFlag is present in every MSLH,
   regardless of whether the SL packet is an Access Unit start or not.

   CTSDelta (CTSDeltaLength bits): Specifies the value of the CTS as a
   two's-complement offset (delta) from the timestamp in the RTP header
   of the RTP packet. The length in bits of each CTSDelta field is
   specified by the CTSDeltaLength parameter (see section 4.1).

   The CTSDelta field is present if CTSFlag is 1.

   For the first MSLH of each RTP packet CTSFlag is always 0, since the
   composition time stamp of the first SL packet in the RTP packet is
   mapped to the RTP time stamp. In all cases the sender MUST remove
   the compositionTimeStamp from the RSLH.

   Senders MUST NOT assemble RTP packets for which CTSDelta rolls over
   inside the RTP packet.

   DTSFlag (1 bit): Indicates whether the DTSDelta field is present. A
   value of 1 indicates that DTSDelta is present, a value of 0 that it
   is not present.

   If DTSDeltaLength is not zero, DTSFlag is present in every MSLH,
   regardless of whether the SL packet is an Access Unit start or not;
   the receiver needs this flag in order to reconstruct the
   decodingTimeStampFlag of the SL headers.

   DTSDelta (DTSDeltaLength bits): Encodes (compositionTimeStamp -
   decodingTimeStamp) for the same SL packet (always positive). The
   length in bits of each DTSDelta field is specified by the
   DTSDeltaLength parameter (see section 4.1).
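   Because the presence of several MSLH fields depends on parameters
   and flags, a receiver has to parse an MSLH field by field. The
   following Python sketch illustrates this under stated assumptions:
   BitReader, to_signed and parse_mslh are hypothetical helper names,
   the MIME parameters are assumed to be available as a dictionary
   keyed by parameter name (an absent key meaning the default value
   zero), and the reader is assumed to be positioned on the first bit
   of the MSLH. Fields are read MSB-first in the order of figure 6, and
   CTSDelta is sign-extended as a two's-complement value.

```python
from typing import Dict


class BitReader:
    """Hypothetical helper: reads unsigned integers MSB-first from bytes."""

    def __init__(self, data: bytes, bit_pos: int = 0) -> None:
        self.data = data
        self.pos = bit_pos  # absolute bit offset of the next bit to read

    def read(self, nbits: int) -> int:
        value = 0
        for _ in range(nbits):
            byte = self.data[self.pos // 8]
            value = (value << 1) | ((byte >> (7 - self.pos % 8)) & 1)
            self.pos += 1
        return value


def to_signed(value: int, nbits: int) -> int:
    """Interpret an nbits-wide field as a two's-complement integer."""
    if nbits and value & (1 << (nbits - 1)):
        return value - (1 << nbits)
    return value


def parse_mslh(reader: BitReader, params: Dict[str, int],
               first: bool) -> Dict[str, int]:
    """Parse one MSLH; field order follows figure 6, sizes follow Table 1."""
    header: Dict[str, int] = {}
    if params.get("SizeLength", 0):
        header["PayloadSize"] = reader.read(params["SizeLength"])
    if first and params.get("IndexLength", 0):
        header["Index"] = reader.read(params["IndexLength"])
    if not first and params.get("IndexDeltaLength", 0):
        header["IndexDelta"] = reader.read(params["IndexDeltaLength"])
    cts_len = params.get("CTSDeltaLength", 0)
    if cts_len and reader.read(1):  # CTSFlag is present and set to 1
        header["CTSDelta"] = to_signed(reader.read(cts_len), cts_len)
    dts_len = params.get("DTSDeltaLength", 0)
    if dts_len and reader.read(1):  # DTSFlag is present and set to 1
        header["DTSDelta"] = reader.read(dts_len)  # unsigned CTS - DTS
    return header
```

   For example, with SizeLength=8, IndexDeltaLength=2, CTSDeltaLength=4
   and DTSDeltaLength=3, the bytes 0x2A 0x7B 0xA0 parse as a non-first
   MSLH into PayloadSize=42, IndexDelta=1, CTSDelta=-3 and DTSDelta=5,
   consuming 19 bits.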
   Senders MUST NOT assemble RTP packets for which the difference
   between compositionTimeStamp and decodingTimeStamp cannot be
   expressed in DTSDeltaLength bits.

   The DTSDelta field is present if DTSFlag is 1. The sender MUST
   always remove the decodingTimeStamp from the RSLH.

   If DTSDelta is zero, i.e., if decodingTimeStamp equals
   compositionTimeStamp, then DTSFlag MUST be set to 0 and no DTSDelta
   field SHALL be present.

   At the sender side, DTSDelta MUST be computed taking roll-over into
   account. For example, for an SL stream with the following (CTS, DTS)
   pairs (assuming timeStampLength=3):

      (4,3), (5,4), (6,5), (7,6), (0,7)

   DTSDelta for the last pair is 1 (that is, (0 - 7) modulo 2^3) and
   not -7, which would be illegal and could cause receivers implemented
   following section 5.1 to fail.

3.4.2 Relationship between sizes of MSLH fields and parameters

   The relationship between the fields of a Mapped SL Packet Header and
   the related parameters is as follows:

   +===========================+=================================+
   | Fields of MSLH            | Number of bits (parameters)     |
   +===========================+=================================+
   | PayloadSize               | SizeLength                      |
   +---------------------------+---------------------------------+
   | Index                     | IndexLength                     |
   +---------------------------+---------------------------------+
   | IndexDelta                | IndexDeltaLength                |
   +---------------------------+---------------------------------+
   | CTSFlag                   | 1 if (CTSDeltaLength > 0)       |
   +---------------------------+---------------------------------+
   | CTSDelta                  | CTSDeltaLength if (CTSFlag==1)  |
   +---------------------------+---------------------------------+
   | DTSFlag                   | 1 if (DTSDeltaLength > 0)       |
   +---------------------------+---------------------------------+
   | DTSDelta                  | DTSDeltaLength if (DTSFlag==1)  |
   +---------------------------+---------------------------------+

    Table 1: Relationship between MSLH field size and parameters

3.5 RSLHSection structure

   This section consists of a field (RSLHSectionSize) giving the size
   in bits of the following block of bit-wise concatenated RSLHs.

   If the section consumes a non-integer number of bytes, up to 7 zero
   padding bits MUST be inserted at the end in order to achieve byte
   alignment.
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | RSLHSectionSize (RSLHSectionSizeLength bits)  | RSLH (variable|
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+               |
   | number of bits)                                               |
   |                     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                     | RSLH (variable number of bits)          |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                              etc                              |
   |              as many bit-wise concatenated RSLHs              |
   |               as SL Packets in this RTP packet                |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | RSLH (variable number of bits)                                |
   |                                               +-+-+-+-+-+-+-+-+
   |                                               : padding bits  |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

                   Figure 7: RSLHSection structure

   The length in bits of the RSLHSectionSize field is
   RSLHSectionSizeLength; its default value is zero, which indicates
   that the whole RSLHSection is absent. Compatibility with RFC 3016
   requires that the RSLHSection be empty, including the
   RSLHSectionSize field. This is why the RSLHSectionSize field has a
   variable length, with a default value that indicates its absence.
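   The byte-alignment rule of this section can be illustrated with a
   short Python sketch; the function names are hypothetical. It assumes
   that the RSLHSection starts on a byte boundary (the preceding
   MSLHSection is itself padded, see figure 5) and counts the
   RSLHSectionSize field in addition to the RSLHSectionSize bits of
   concatenated RSLHs that it announces. The result is the number of
   bytes that a receiver which does not parse RSLHs would skip.

```python
def padding_bits(total_bits: int) -> int:
    """Number of zero padding bits (0..7) that byte-align total_bits."""
    return -total_bits % 8


def rslh_section_bytes(size_field_length: int, rslh_section_size: int) -> int:
    """Bytes occupied by an RSLHSection, including size field and padding.

    size_field_length is the RSLHSectionSizeLength parameter;
    rslh_section_size is the value read from the RSLHSectionSize field,
    i.e. the number of bits of concatenated RSLHs that follow it.
    """
    if size_field_length == 0:
        return 0  # default value: the whole RSLHSection is absent
    total_bits = size_field_length + rslh_section_size
    return (total_bits + padding_bits(total_bits)) // 8
```

   For instance, with RSLHSectionSizeLength=16 and an RSLHSectionSize
   of 21 bits, the section spans 37 bits, receives 3 padding bits and
   occupies 5 bytes.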
   +=================================+===============================+
   | Fields of RSLHSection           | Number of bits                |
   +=================================+===============================+
   | RSLHSectionSize                 | RSLHSectionSizeLength         |
   +---------------------------------+-------------------------------+
   | all bit-wise concatenated RSLHs | RSLHSectionSize               |
   +---------------------------------+-------------------------------+

              Table 2: Sizes in bits inside RSLHSection

   Parsing the bit-wise concatenated RSLHs requires MPEG-4 Systems
   awareness; specifically, it requires an understanding of the MPEG-4
   Synchronization Layer (SL) syntax and of the modifications to this
   syntax described in the next section.

   However, thanks to the RSLHSectionSize field, non-MPEG-4-system
   receivers MAY skip this section: the concatenated RSLHs occupy
   RSLHSectionSize bits after the RSLHSectionSize field, and the
   section ends at the next byte boundary.

3.6 RSLH structure

   A Remaining SL Packet Header (RSLH) is what remains of an SL header
   after the modifications required for mapping into this payload
   format.

   The following modifications of the SL packet header MUST be applied.
   The other fields of the SL packet header MUST remain unchanged, but
   are bit-shifted to fill the gaps left by the operations specified
   below.

3.6.1 Removal of fields

   The following SL Packet Header fields, if present, are removed since
   they are mapped either into the RTP header or into the corresponding
   MSLH:
      . compositionTimeStampFlag
      . compositionTimeStamp
      . decodingTimeStampFlag
      . decodingTimeStamp
      . packetSequenceNumber
      . AccessUnitEndFlag (in the Single-SL mode only)

   The AccessUnitEndFlag, when present for a given stream, MUST be
   removed from every RSLH when using the Single-SL mode, since it has
   the same meaning as the Marker bit (and for compatibility with RFC
   3016).
   However, when using the Multiple-SL mode, AccessUnitEndFlag MUST NOT
   be removed, since it is useful to signal individual AU ends.

3.6.2 Mapping of OCR

   Furthermore, if the SL Packet header contains an OCR, this field is
   encoded in the RSLH as a two's-complement difference (delta),
   exactly like a compositionTimeStamp or a decodingTimeStamp in the
   MSLH. The length in bits of this difference is indicated by the
   OCRDeltaLength parameter (see section 4.1).

   With this payload format, OCRs MUST have the same clock resolution
   as time stamps.

   If compositionTimeStamp is not present for an SL packet that has an
   OCR, then the OCR SHALL be encoded as a difference to the RTP time
   stamp.

3.6.3 Degradation Priority

   For streams that use the optional degradationPriority field in the
   SL Packet Headers, an RTP packet SHALL only carry SL packets that
   have the same degradation priority, so that components may dispatch
   the RTP packets according to appropriate QoS or protection schemes.
   Furthermore, only the first RSLH of an RTP packet SHALL contain the
   degradationPriority field, since it would otherwise be redundant.

3.7 SLPPSection structure

   The SLPPSection (SL Packet Payload Section) contains the
   concatenated SL Packet Payloads. By definition SL Packet Payloads
   are byte-aligned.

   For efficiency, SL packets do not carry their own payload size. This
   is not an issue for RTP packets that contain a single SL Packet.

   However, in the Multiple-SL mode the size of each SL packet payload
   MUST be available to the receiver.

   If the SL packet payload size is constant for a stream, the size
   information SHOULD NOT be transported in the RTP packet. In that
   case, however, it MUST be signaled using the ConstantSize parameter
   (see section 4.1).
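   As an illustration of how a receiver might use this size
   information, the following Python sketch (the function name and
   signature are hypothetical) splits an SLPPSection into SL packet
   payloads, using either the ConstantSize parameter or the PayloadSize
   values already read from the MSLHs. It also applies the exception of
   section 3.4.1: when an RTP packet in the Multiple-SL mode carries a
   single SL packet, PayloadSize gives the size of the whole Access
   Unit rather than of the fragment, so the entire section is taken as
   that packet's payload.

```python
from typing import List, Optional, Sequence


def split_slpp_section(section: bytes,
                       constant_size: int = 0,
                       payload_sizes: Optional[Sequence[int]] = None
                       ) -> List[bytes]:
    """Split an SLPPSection into SL packet payloads (Multiple-SL mode)."""
    if constant_size:
        # ConstantSize signaled: the MSLHs carry no PayloadSize field.
        if len(section) % constant_size:
            raise ValueError("section length is not a multiple of ConstantSize")
        return [section[i:i + constant_size]
                for i in range(0, len(section), constant_size)]
    if not payload_sizes:
        raise ValueError("ConstantSize or PayloadSize values are required")
    if len(payload_sizes) == 1:
        # Single SL packet: PayloadSize holds the whole AU size, so the
        # payload is simply the rest of the RTP packet.
        return [section]
    if sum(payload_sizes) != len(section):
        raise ValueError("PayloadSize values do not match the section length")
    payloads, offset = [], 0
    for size in payload_sizes:
        payloads.append(section[offset:offset + size])
        offset += size
    return payloads
```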
   If the SL packet payload size is variable, the size of each SL
   packet payload MUST be indicated in the corresponding MSLH. To do
   so, the MSLH MUST contain a PayloadSize field. The number of bits on
   which this PayloadSize field is encoded MUST be indicated using the
   SizeLength parameter (see section 4.1).

   The absence of both ConstantSize and SizeLength indicates the
   Single-SL mode, i.e., that a single SL packet is transported in each
   RTP packet for that stream.

   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                SLPP (variable number of bytes)                |
   |                                                               |
   |                               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                               |SLPP (variable number of bytes)|
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+                               |
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                              etc                              |
   |             as many byte-wise concatenated SLPPs              |
   |               as SL Packets in this RTP packet                |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

                   Figure 8: SLPPSection structure

3.8 Interleaving

   SL Packets MAY be interleaved. Senders MAY perform interleaving.
   Receivers MUST support interleaving.

   The AUSequenceNumber field of the SL header MUST NOT be used for
   interleaving, since firstly it may collide with the Scene
   Description Carousel usage described in section 5.2, and secondly it
   is not visible to non-MPEG-4-system receivers.

   When interleaving of SL packets is used, it SHALL be implemented
   using the IndexDelta fields of the MSLH. Senders MUST use values of
   IndexDeltaLength large enough for the interleaving algorithm in use.

   Senders SHALL use non-zero values of IndexDeltaLength only for
   streams that may exhibit interleaving, so that receivers can
   interpret a non-zero value as an indication that interleaving may be
   present.
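   The signaling rules above, together with the table in the following
   paragraph, lead to a simple receiver-side decision. The following
   Python sketch is one possible reading; the function name and the
   returned labels are hypothetical. It treats any non-zero Index value
   observed so far as an indication of index-based interleaving, since
   senders using time stamp based interleaving either use IndexLength=0
   or set every Index to zero (section 3.8.1).

```python
from typing import Iterable


def deinterleaving_method(index_length: int, index_delta_length: int,
                          observed_indexes: Iterable[int] = ()) -> str:
    """Choose a de-interleaving method from the signaled parameters.

    Returns "none" (no interleaving), "TSBI" (time stamp based,
    section 3.8.1) or "IBI" (index based, section 3.8.2).
    """
    if index_delta_length == 0:
        return "none"  # no IndexDelta field: no interleaving
    if index_length != 0 and any(i != 0 for i in observed_indexes):
        return "IBI"   # non-zero Index values: index based
    return "TSBI"      # IndexLength = 0, or every Index seen so far is 0
```

   With no packets observed yet, this sketch provisionally classifies a
   stream with non-zero IndexLength and IndexDeltaLength as TSBI; when
   to commit to a method is left to the implementation.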
   Based on these parameters, a receiver can implement de-interleaving
   in one of two ways, using either Index or time stamps. The method is
   signaled using MIME parameters as shown in the following table,
   where TSBI and IBI stand for Time-Stamp-Based Interleaving (see
   section 3.8.1) and Index-Based Interleaving (see section 3.8.2),
   respectively.

   +================+======================+=========================+
   |                | IndexDeltaLength = 0 | IndexDeltaLength != 0   |
   +----------------+----------------------+-------------------------+
   | IndexLength=0  | no interleaving      | TSBI                    |
   +----------------+----------------------+------------+------------+
   | IndexLength!=0 | no interleaving,     | Index=0    | Index!=0   |
   |                | SL.packetSeqNum      +------------+------------+
   |                | transport            | TSBI       | IBI        |
   +================+======================+============+============+

3.8.1 Time stamp based interleaving

   The combination of the RTP time stamp, IndexDelta and CTS may allow
   a receiver to unambiguously reorder SL packets based on their time
   stamps (CTS).

   This is possible and efficient for streams where SL packets
   transport complete Access Units and receivers can always compute the
   CTS of each Access Unit.

   For Access Units of constant duration (e.g., audio streams), the
   explicit presence of CTS in the MSLH is not even required, since
   (with i being the index of SL packets in one RTP packet):

      CTS(0) = RTP-TS
      CTS(i) = CTS(i-1) + (IndexDelta(i) + 1) * AU-duration,  for i >= 1

   AU-duration, when constant, can either be signaled in SLConfig or be
   deduced from the decoder configuration (see the config MIME
   parameter).
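   The CTS recurrence above can be sketched in Python as follows. The
   function name is hypothetical, and the sketch assumes, for
   illustration only, that CTS values are expressed in RTP time stamp
   units and wrap modulo 2^32 like the RTP time stamp. index_deltas
   holds IndexDelta(i) for i >= 1, with zeros when IndexDeltaLength is
   zero and the field is therefore absent.

```python
from typing import List, Sequence


def compute_cts(rtp_ts: int, index_deltas: Sequence[int],
                au_duration: int, ts_bits: int = 32) -> List[int]:
    """CTS of each SL packet in one RTP packet, for constant-duration AUs.

    ts_bits (assumed 32, the RTP time stamp width) sets the wrap-around.
    """
    mask = (1 << ts_bits) - 1
    cts = [rtp_ts & mask]                     # CTS(0) = RTP-TS
    for delta in index_deltas:                # IndexDelta(i), i >= 1
        cts.append((cts[-1] + (delta + 1) * au_duration) & mask)
    return cts
```

   For example, an RTP time stamp of 1000, IndexDelta values 0 and 2,
   and an AU-duration of 1024 yield CTS values 1000, 2024 and 5096 for
   the three SL packets.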
   Senders MUST either use IndexLength=0 or set all Index values in all
   packets to zero, so that receivers can detect this as an indication
   that de-interleaving SHOULD be performed using time stamps.

   Where CTS is transported in the MSLH, senders MUST use sufficiently
   large values of SL.timeStampLength when interleaving, in order to
   prevent the CTS from rolling over. Pre-existing SL streams that do
   not comply with this requirement cannot be interleaved using this
   payload format, other than by using the technique of section 3.8.2.

3.8.2 Index based interleaving

   If the AU duration is not constant (SLConfigDescriptor.durationFlag
   = 0) and CTS is not signaled (SLConfigDescriptor.useTimeStampsFlag
   = 0), or if SL packets transport AU fragments, then the time stamp
   based interleaving algorithm described in section 3.8.1 does not
   work, because a CTS cannot always be computed for every SL packet
   (for example, after a packet loss).

   When interleaving, senders of such streams MUST use the index-based
   technique described in this section.

   The combination of RTP sequence number, Index and IndexDelta can
   produce a quasi-unique identifier for each SL packet, so that a
   receiver can unambiguously reconstruct the original order even in
   case of out-of-order packets, packet loss or duplication (see the
   formula in section 3.4.1 and the pseudo code in section 5.1).

   This requires, however, that IndexLength is not too small. For that
   reason, senders MUST use sufficiently large values of IndexLength
   when interleaving in this fashion. Pre-existing SL streams that do
   not comply with this requirement (specifically, if
   SL.packetSeqNumLength is too small) cannot be interleaved using this
   payload format, other than by using the technique of section 3.8.1.
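   The index-based reconstruction can be sketched in Python as follows;
   both function names are hypothetical. sl_sequence_numbers applies
   the formula of section 3.4.1 modulo 2^IndexLength, and reorder sorts
   SL packets collected from several RTP packets by the forward
   distance of their sequence number from a reference value. How a
   receiver chooses that reference (for example, the sequence number it
   expects next) and how many packets it buffers are assumptions of
   this sketch, not requirements of this specification; detection of
   losses and duplicates is omitted.

```python
from typing import List, Sequence, Tuple


def sl_sequence_numbers(index: int, index_deltas: Sequence[int],
                        index_length: int) -> List[int]:
    """packetSequenceNumber of each SL packet in one RTP packet (3.4.1)."""
    modulus = 1 << index_length
    seq = [index % modulus]
    for delta in index_deltas:  # IndexDelta(i+1) of each non-first SL packet
        seq.append((seq[-1] + delta + 1) % modulus)
    return seq


def reorder(packets: Sequence[Tuple[int, bytes]], index_length: int,
            reference: int) -> List[Tuple[int, bytes]]:
    """Sort (sequence number, payload) pairs, tolerating wrap-around.

    Each pair is keyed by its forward distance from the (assumed)
    reference modulo 2^IndexLength, so 14, 15, 0, 1 stay in order.
    """
    modulus = 1 << index_length
    return sorted(packets, key=lambda p: (p[0] - reference) % modulus)
```

   For example, two RTP packets with Index=10 and Index=11, each
   followed by IndexDelta values 1 and 1, carry SL packets 10, 12, 14
   and 11, 13, 15; sorting from reference 10 restores the order 10
   to 15.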
   Receivers can interpret non-zero values in the Index field as an
   indication that de-interleaving can be performed using Index and
   IndexDelta, and cannot be performed using time stamps.

3.8.3 SL streams that cannot be interleaved

   SL streams for which both SL.timeStampLength and
   SL.packetSeqNumLength are too small cannot be interleaved with this
   payload format.

3.9 Fragmentation Rules

   This section specifies rules for senders in order to prevent media
   decoding difficulties at the receiver end.

   MPEG-4 Access Units are the default fragments for MPEG-4 bitstreams
   and SHOULD be mapped directly into RTP packets of this format, with
   two exceptions:
   - Access Units larger than the MTU;
   - when interleaving is used for better packet loss resilience.

   In all cases, the start of an Access Unit MUST be aligned with the
   start of an SL packet.

   The rest of this section gives rules to apply when performing Access
   Unit fragmentation.

   Some MPEG-4 codecs define an optional syntax for Access Unit
   sub-entities (fragments) that are independently decodable, for error
   resilience purposes. Examples are Video Packets for video and Error
   Sensitivity Categories (ESC) for audio. This always corresponds to a
   specific bitstream syntax, which is signaled in the
   DecoderSpecificInfo inside the DecoderConfigDescriptor and/or using
   the corresponding parameters described in section 4.1. Therefore
   encoders and decoders are both aware of whether or not they are
   operating in such a mode (however, since this codec configuration is
   an opaque data block, it is not explicitly signaled by this payload
   format).

   If not operating in such a mode, the decoder obviously has to skip
   packets after a loss until an Access Unit start is received.
   Similarly, decoder implementations that do not implement robust
   decoding of Access Unit fragments have to discard all packets after
   a packet loss until an Access Unit start is received. In the same
   way, decoder implementations that do not implement
   re-synchronization at every Access Unit start have to discard all
   packets after a packet loss until a Random Access Point Access Unit
   is received. These are all things that a good implementation would
   do.

   However, serious problems arise for decoder implementations that try
   to restart decoding after a packet loss when independently decodable
   fragments are signaled (in the decoder configuration) but the
   fragments actually received are not independently decodable,
   because the RTP sender has built RTP packets on boundaries other
   than those of the fragments provided by the encoder. The decoder
   has, in general, no way to detect such a faulty fragment. This issue
   applies both to the interface between the encoder and the RTP sender
   and to the RTP sender component itself.

   For this reason, the following rules apply to SL streams that are
   specifically made for transport with this payload format:

   SL packets SHOULD be codec-semantic entities in the spirit of ALF
   (Application Level Framing), i.e., either complete Access Units or
   independently decodable fragments of Access Units. Specifically,
   when a given codec has an optional syntax for independently
   decodable Access Unit fragments, this option SHOULD be used.

   Furthermore, when streams are generated using independently
   decodable Access Unit fragments, these fragments MUST be mapped
   one-to-one into SL packets.
   Consequently, independently decodable Access Unit fragments MUST NOT
   be split across several SL packets, and therefore MUST NOT be split
   across several RTP packets.

   For example, an MPEG-4 audio stream encoded using the ESC syntax
   MUST NOT split one ESC across two RTP packets.

   This rule is relaxed for MPEG-4 Video Packets, for two reasons:
   first, Video Packets can be much larger than the typical MTU, and
   second, every Video Packet starts with a specific resynchronization
   marker that can be unambiguously detected. Therefore, for video
   streams using the Video Packet syntax, Video Packets MAY be split
   across several SL packets, although it is strongly RECOMMENDED to
   always adapt the Video Packet size to fit the MTU. A Video Packet
   start MUST always be aligned with an SL packet start, except when a
   GOV is present, in which case the GOV and the first Video Packet of
   the following VOP MUST be included in the same SL packet.

4. Types and Names

   This section describes the MIME types and names associated with this
   payload format. Section 4.1 is intended for registration with IANA
   as in RFC 2048.

   This format may require additional information about the mapping to
   be made available to the receiver. This is done using the parameters
   described in the next section. The absence of any of these
   parameters is equivalent to the parameter being set to its default
   value, which is zero unless otherwise specified in section 4.1. The
   absence of all such parameters resolves into a default "basic"
   configuration compatible with RFC 3016 for MPEG-4 video.

   In the MPEG-4 framework, the SL stream configuration information is
   carried using the Object Descriptor. For compatibility with
   receivers that do not implement the full MPEG-4 Systems
   specification, this information MAY also be signaled using the
   parameters described here. When such information is present both in
   an Object Descriptor and as a parameter of this payload format, it
   MUST be exactly the same.

   For transport of MPEG-4 audio and video without the use of MPEG-4
   Systems, as well as to support non-MPEG-4-system receivers, it is
   also possible to transport information on the profile and level of
   the stream and on the decoder configuration. This is also described
   in the next section.

   Finally, this MIME type also defines a mode parameter and a profile
   parameter that are intended for future derivations of this payload
   format.

4.1 MIME type registration

   MIME media type name: "video" or "audio" or "application"

   "video" SHOULD be used for MPEG-4 Visual streams (i.e., video as
   defined in ISO/IEC 14496-2 [2] and/or graphics as defined in ISO/IEC
   14496-1 [1]) or MPEG-4 Systems streams that convey information
   needed for an audio/visual presentation.

   "audio" SHOULD be used for MPEG-4 Audio streams (ISO/IEC 14496-3) or
   MPEG-4 Systems streams that convey information needed for an
   audio-only presentation.

   "application" SHOULD be used for MPEG-4 Systems streams (ISO/IEC
   14496-1) that serve purposes other than audio/visual presentation,
   e.g., in some cases when MPEG-J streams are transmitted.

   MIME subtype name: mpeg4-generic

   Required parameters: none

   Optional parameters:

   Mode:
      The mode in which this specification is used. This specification
      itself defines only the default mode (Mode=default). When the
      mode parameter is not present, the default mode SHALL be assumed.
      In the default mode all parameters are optional and as defined
      here.
      Other modes may be defined as needed in other RFCs. A mode MUST
      be a subset of this specification. Specifically, when defining a
      mode, care MUST be taken that an implementation of this
      specification can decode the payload format corresponding to this
      new mode. For this reason, a mode MUST NOT specify new default
      values for MIME parameters, and MIME parameters MUST be present
      (unless they have the default value) even if this is redundant
      because the mode assigns them fixed values. A mode may
      additionally define that some MIME parameters are required
      instead of optional, that some MIME parameters have fixed values
      (or ranges), and that there are rules restricting the usage (for
      example, forbidding the carriage of multiple AU fragments in the
      same RTP packet).

   Profile:
      The meaning of this parameter may be defined by a mode. It is
      meant to be used to define sub-configurations of a given mode,
      for example the maximum delay (and therefore the size of buffers)
      induced by the use of interleaving. Implementations of this
      specification can ignore this parameter.

   DTSDeltaLength:
      The number of bits on which the DTSDelta field is encoded in the
      MSLH. The default value is zero and indicates the absence of
      DTSFlag and DTSDelta in the MSLH (the stream does not transport
      decodingTimeStamps). A value larger than zero indicates that
      there is a DTSFlag in each MSLH. Since decodingTimeStamp, if
      present, must be encoded as a difference to the
      compositionTimeStamp (see section 3.4.1), the DTSDeltaLength
      parameter MUST be present in order to transport
      decodingTimeStamps with this payload format.

   CTSDeltaLength:
      The number of bits on which the CTSDelta field is encoded in
      (non-first) MSLHs.
      The default value is zero and indicates the absence of the
      CTSFlag and CTSDelta fields in the MSLH. Non-zero values MUST NOT
      be signaled in the Single-SL mode. Since compositionTimeStamps,
      if present, must be encoded as a difference to the RTP time
      stamp, the CTSDeltaLength parameter MUST be present in order to
      transport compositionTimeStamps using this payload format (in the
      Multiple-SL mode). However, CTSDeltaLength SHOULD be set to zero
      (or not signaled) for streams that have a constant Access Unit
      duration (which can be explicitly signaled using the durationFlag
      and accessUnitDuration fields of the SLConfigDescriptor).

   OCRDeltaLength:
      The number of bits on which the OCRDelta field is encoded in the
      RSLH. The default value is zero and indicates the absence of OCR
      for this stream. Since objectClockReference, if present, must be
      encoded as a difference to the RTP time stamp, the OCRDeltaLength
      parameter MUST be present in order to transport
      objectClockReferences with this payload format.

   SizeLength:
      The number of bits on which the PayloadSize field of the MSLH is
      encoded. The default value is zero and indicates the Single-SL
      mode (unless ConstantSize is present). Simultaneous presence of
      this parameter and ConstantSize is illegal. Either the SizeLength
      or the ConstantSize parameter MUST be present in order to signal
      the Multiple-SL mode of this payload format.

   ConstantSize:
      The constant size in bytes of each SL Packet Payload for this
      stream. The default value is zero and indicates variable SL
      Packet Payload size (or the Single-SL mode if SizeLength is
      absent). Simultaneous presence of this parameter and SizeLength
      is illegal.
      Either the SizeLength or the ConstantSize parameter MUST be
      present in order to signal the Multiple-SL mode of this payload
      format. When ConstantSize is present, the PayloadSize field MUST
      NOT be present in the MSLHs of the RTP packets.

   IndexLength:
      The number of bits on which the Index is encoded in the first
      MSLH. The default value is zero and indicates the absence of
      Index and IndexDelta for all MSLHs. Since packetSequenceNumber, if
      present, must be mapped into the MSLH, the IndexLength parameter
      MUST be present in order to transport packetSequenceNumber with
      this payload format.

   IndexDeltaLength:
      The number of bits on which the IndexDelta field is encoded in
      any non-first MSLH. The default value is zero and indicates that
      packetSequenceNumber MUST be incremented by one for each SL
      packet in the RTP packet (see section 3.4.1). The
      IndexDeltaLength parameter MUST be present when using
      interleaving with this payload format.

   RSLHSectionSizeLength:
      The number of bits used to encode the RSLHSectionSize field. The
      default value is zero and indicates the absence of the whole
      RSLHSection for all RTP packets of this stream.

   SLConfigDescriptor:
      A base64 encoding of the SLConfigDescriptor. This SHALL be the
      original SLConfigDescriptor, and it SHALL be the same as the one
      transported by the OD framework, if any.

   profile-level-id:
      A decimal representation of the MPEG-4 Profile Level indication
      value. For audio, this parameter indicates which MPEG-4 Audio
      tool subsets are applied to encode the audio stream and is
      defined in ISO/IEC 14496-1 [1]. For video, this parameter
      indicates which MPEG-4 Visual tool subsets are applied to encode
      the video stream and is defined in Table G-1 of ISO/IEC 14496-2
      [2].
      This parameter MAY be used in the capability exchange or session
      setup procedure to indicate the MPEG-4 Profile and Level
      combination of which the relevant MPEG-4 media codec is capable.
      If this parameter is not specified, its default value is 1
      (Simple Profile/Level 1) for video (for compatibility with RFC
      3016) and 0xFE (254) otherwise, this value being defined in
      ISO/IEC 14496-1 [1] as the generic default value.

   Config:
      A hexadecimal representation of an octet string that expresses
      the media payload configuration. Configuration data is mapped
      onto the octet string on an MSB-first basis. The first bit of the
      configuration data SHALL be located at the MSB of the first
      octet. In the last octet, zero-valued padding bits, if necessary,
      shall follow the configuration data. For audio streams, config is
      the audio object type specific decoder configuration data
      AudioSpecificConfig() as defined in ISO/IEC 14496-3 [3]. For
      video, it expresses the MPEG-4 Visual configuration information,
      as defined in subclause 6.2.1 "Start codes" of ISO/IEC 14496-2
      [2], and the configuration information indicated by this
      parameter SHALL be the same as the configuration information in
      the corresponding MPEG-4 Visual stream, except for
      first-half-vbv-occupancy and latter-half-vbv-occupancy, if they
      exist, which may vary in the repeated configuration information
      inside an MPEG-4 Visual stream (see subclause 6.2.1 "Start codes"
      of ISO/IEC 14496-2).

   StreamType:
      The integer value that indicates the type of MPEG-4 stream that
      is carried; its coding corresponds to the values of streamType as
      defined for the DecoderConfigDescriptor in ISO/IEC 14496-1.

   Encoding considerations:
      System bitstreams MUST be generated according to the MPEG-4
      Systems specification (ISO/IEC 14496-1).
      Video bitstreams MUST be generated according to the MPEG-4 Visual
      specification (ISO/IEC 14496-2). Audio bitstreams MUST be
      generated according to the MPEG-4 Audio specification (ISO/IEC
      14496-3). All SL streams MUST be generated according to the
      MPEG-4 Sync Layer specification (ISO/IEC 14496-1 section 10); the
      SLConfigDescriptor may be required in order to read this format.
      These bitstreams are binary data and MUST be encoded for
      non-binary transport (for Email, the Base64 encoding is
      sufficient). This type is also defined for transfer via RTP. The
      RTP packets MUST be packetized according to the RTP payload
      format defined in RFC .

   Security considerations:
      As in RFC .

   Interoperability considerations:
      MPEG-4 provides a large and rich set of tools for the coding of
      visual objects. For effective implementation of the standard,
      subsets of the MPEG-4 tool sets have been provided for use in
      specific applications. These subsets, called 'Profiles', limit
      the size of the tool set a decoder is required to implement. In
      order to restrict computational complexity, one or more 'Levels'
      are set for each Profile. A Profile@Level combination allows:
      . a codec builder to implement only the subset of the standard
        needed, while maintaining interoperability with other MPEG-4
        devices included in the same combination, and
      . checking whether MPEG-4 devices comply with the standard
        ('conformance testing').
      A stream SHALL be compliant with the MPEG-4 Profile@Level
      specified by the parameter "profile-level-id". Interoperability
      between a sender and a receiver may be achieved by specifying the
      parameter "profile-level-id" in MIME content, or by arranging in
      the capability exchange/announcement procedure to set this
      parameter mutually to the same value.

Published specification:
The specifications for MPEG-4 streams are presented in ISO/IEC
14496-1, 14496-2, and 14496-3. The RTP payload format is described
in RFC .

Applications that use this media type:
Multimedia streaming and conferencing tools, Internet messaging and
Email applications.

Additional information: none

Magic number(s): none

File extension(s):
None. A file format with the extension .mp4 has been defined for
MPEG-4 content, but it is not directly related to this MIME type,
whose sole purpose is RTP transport.

Macintosh File Type Code(s): none

Person & email address to contact for further information:
Authors of RFC .

Intended usage: COMMON

Author/Change controller:
Authors of RFC .

4.2 Concatenation of parameters

Multiple parameters SHOULD be expressed as a MIME media type string,
in the form of a semicolon-separated list of parameter=value pairs
(see examples below).

4.3 Usage of SDP

4.3.1 The a=fmtp keyword

One typical way to convey the above-described parameters associated
with this payload format is expected to be an SDP [10] message, for
example sent to the client in reply to an RTSP [13] DESCRIBE request
or announced via SAP [14]. In that case, the a=fmtp attribute MUST
be used as described in RFC 2327 [10, section 6], with the following
syntax:

a=fmtp:<format> <parameter name>=<value>

4.3.2 SDP example

The following is an example of SDP syntax for the description of a
session containing one MPEG-4 video stream, one MPEG-4 audio stream
and three MPEG-4 Systems streams: the first is BIFS, the second OD
and the third IPMP. All are transported using this format and the
AVP profile [12].
Note the usage of some MIME parameters: all streams signal their
StreamType; the video stream uses DTS with DTSDelta encoded on 4
bits; the audio stream uses the multiple-SL mode with 12 bits to
describe the size of each SL packet payload. See the Appendix for
more examples.

o= ....
i= ....
c=IN IP4 192.0.2.112

m=video 1034 RTP/AVP 97
a=fmtp:97 StreamType=4;DTSDeltaLength=4
a=rtpmap:97 mpeg4-generic

m=audio 1810 RTP/AVP 98
a=fmtp:98 StreamType=5; SizeLength=12; profile-level-id=1;
   config=7866E7E6EF
a=rtpmap:98 mpeg4-generic

m=application 1234 RTP/AVP 99
a=rtpmap:99 mpeg4-generic
a=fmtp:99 StreamType=3;

m=application 1236 RTP/AVP 99
a=rtpmap:99 mpeg4-generic
a=fmtp:99 StreamType=1;

m=application 1238 RTP/AVP 99
a=rtpmap:99 mpeg4-generic
a=fmtp:99 StreamType=7;

5. Other issues

5.1 SL packetized stream reconstruction

The purpose of this section is to document how a receiver can
reconstruct a valid SL packetized stream. Since this format directly
transports SL packets, this reconstruction is performed by reversing
the payload structure rules (section 3). The most complex
transformations are described explicitly below.

In the following, let (i) be the index of SL packets inside one RTP
packet (starting at zero for each RTP packet), let SLPacketHeader.x
denote field x of the reconstructed SL packet header, let MSLH.x
denote field x of the received MSLH, and so on.

SLPacketHeader.packetSequenceNumber is restored from MSLH.Index and
MSLH.IndexDelta using:

if (IndexLength == 0) { // or IndexLength is absent
    if (SLConfig.packetSeqNumLength == 0) {
        // this stream does not have SL packet sequence numbers
    }
    else {
        // illegal: the sender MUST map
        // SLPacketHeader.packetSequenceNumber into MSLH
        // and set a relevant IndexLength value;
        // otherwise it is impossible for the receiver
        // to reconstruct the correct sequence
    }
}
else { // IndexLength is not zero
    if (SLConfig.packetSeqNumLength == 0) {
        // the original SL stream does not have SL packet
        // sequence numbers; typically the sender inserted them
        // in order to implement interleaving at the RTP level;
        // they must be ignored for SL stream reconstruction
    }
    else {
        if (i == 0) { // first SL packet in RTP packet
            SLPacketHeader.packetSequenceNumber(0) = MSLH.Index(0);
        }
        else { // remaining SL packets
            SLPacketHeader.packetSequenceNumber(i) =
                SLPacketHeader.packetSequenceNumber(i-1)
                + MSLH.IndexDelta(i)
                + 1;
        }
    }
}

All time stamps (CTS, DTS, OCR), when present, are restored from the
delta values. The time stamp flags in MSLH (CTSFlag, DTSFlag) are
used to reconstruct, respectively, the compositionTimeStampFlag and
decodingTimeStampFlag of SLPacketHeader. The function corrected(x)
The function corrected(x) 1490 for the RTP time stamp transformation is the mapping from 32 bits to 1491 SLConfig.timeStampLength, which may be smaller or larger than 32 1492 bits: 1494 If (timeStampLength < 32 ) { // short SL time stamps 1495 corrected(x) = LSB(x); // only the timeStampLength LSBits of x 1496 } 1497 else If (timeStampLength > 32 ) { // long SL time stamps 1498 corrected(x) = x + m; // start with m=0 1499 if ( x(i) < x(i-1) ) { // 32 bits RTPTS roll over has occurred 1500 { 1501 m += 2^32; 1502 } 1503 } 1504 else If (timeStampLength = 32 ) { // recommended value 1505 corrected(x) = x; // direct mapping 1506 } 1508 if ( CTSDeltaLength == 0) { // or CTSDeltaLength is absent 1509 // CTS is not transported for this RTP stream 1511 Gentric et al. Expires March 2002 27 1512 RTP Payload Format for MPEG-4 Streams September 2001 1514 if (i == 0){ // first SL packet in RTP packet 1515 if ( SLConfig.useTimeStamps == 1 ) { 1516 if ( SLPacketHeader.accessUnitStartFlag == 1 ) { 1517 SLPacketHeader.compositionTimeStampFlag(0) = 1; 1518 SLPacketHeader.compositionTimeStamp(0) = 1519 corrected(RTP TimeStamp); 1520 } 1521 else { 1522 // ignore 1523 } 1524 } 1525 else { 1526 // empty 1527 } 1528 } 1529 else { // non-first SL packets in RTP packet 1530 if ( SLConfig.useTimeStamps == 1 ) { 1531 if ( SLPacketHeader.accessUnitStartFlag == 1 ) { 1532 SLPacketHeader.compositionTimeStampFlag(i) = 0; 1533 } 1534 else { 1535 // ignore 1536 } 1537 } 1538 else { 1539 // empty 1540 } 1541 } 1542 } 1543 else { // CTSDeltaLength is not zero 1544 // CTS is transported for this stream 1545 if ( SLConfig.useTimeStamps == 1 ) { 1546 if ( SLPacketHeader.accessUnitStartFlag == 1 ) { 1547 SLPacketHeader.compositionTimeStampFlag(i) = 1548 MSLH.CTSFlag(i); 1549 SLPacketHeader.compositionTimeStamp(i) = 1550 corrected(RTP TimeStamp) + MSLH.CTSDelta(i); 1551 } 1552 else { 1553 // ignore CTSFlag (which must be zero) 1554 } 1555 else { 1556 // this is strange and sub-optimal at best 1557 // a 
        // receiver should ignore this
    }
}

if (DTSDeltaLength == 0) { // or DTSDeltaLength is absent
    // DTS is not transported for this stream
    if (SLConfig.useTimeStamps == 1) {
        if (SLPacketHeader.accessUnitStartFlag == 1) {
            SLPacketHeader.decodingTimeStampFlag(i) = 0;
        }
        else {
            // ignore
        }
    }
    else {
        // empty
    }
}
else { // DTSDeltaLength is not zero
    // DTS is transported for this stream
    if (SLConfig.useTimeStamps == 1) {
        if (SLPacketHeader.accessUnitStartFlag == 1) {
            SLPacketHeader.decodingTimeStampFlag(i) =
                MSLH.DTSFlag(i);
            SLPacketHeader.decodingTimeStamp(i) =
                SLPacketHeader.compositionTimeStamp(i)
                - MSLH.DTSDelta(i); // DTS <= CTS always
        }
        else {
            // ignore DTSFlag (which must be zero)
        }
    }
    else {
        // this is strange and sub-optimal at best;
        // a receiver should ignore this
    }
}

if (OCRDeltaLength == 0) { // or OCRDeltaLength is absent
    // the RTP stream does not transport any OCR
    if (SLConfig.OCRLength == 0) {
        // this stream does not have any OCR
    }
    else {
        // illegal: the sender MUST detect OCRs,
        // replace them with OCRDelta and set
        // a relevant OCRDeltaLength value
    }
}
else { // OCRDeltaLength is not zero
    if (SLConfig.OCRLength == 0) {
        // this is strange and sub-optimal at best;
        // a receiver should ignore this
    }
    else {
        SLPacketHeader.OCRflag(i) = RSLH.OCRFlag(i);
        if (SLPacketHeader.OCRflag(i) == 1) {
            SLPacketHeader.objectClockReference(i) =
                corrected(RTP TimeStamp) + RSLH.OCRDelta(i);
        }
    }
}
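
For illustration only, the reconstruction rules above can be sketched
in Python. This is a non-normative sketch under stated assumptions:
SLConfig.useTimeStamps is 1; the MSLH fields of each SL packet have
already been parsed into the hypothetical MSLHEntry record; the RTP
time stamp passed to restore_time_stamps() has already been through
corrected(); packet sequence numbers are assumed to wrap modulo
2^packetSeqNumLength, which the rules above do not state; and OCR
handling is omitted. When no CTS is available for an SL packet, the
sketch also leaves its DTS unset, a case the rules above do not
cover. All class and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


class Corrector:
    """Hypothetical helper implementing corrected(x); call once per RTP packet, in order."""

    def __init__(self, time_stamp_length: int) -> None:
        self.length = time_stamp_length       # SLConfig.timeStampLength
        self.m = 0                            # accumulated roll-over offset
        self.previous: Optional[int] = None   # RTP time stamp of the previous call

    def corrected(self, x: int) -> int:
        if self.length < 32:                  # short SL time stamps: keep the LSBs
            return x & ((1 << self.length) - 1)
        if self.length > 32:                  # long SL time stamps
            if self.previous is not None and x < self.previous:
                self.m += 1 << 32             # 32-bit RTP time stamp rolled over
            self.previous = x
            return x + self.m
        return x                              # timeStampLength == 32: direct mapping


def restore_sequence_numbers(index0: int, index_deltas: List[int],
                             seq_num_length: int) -> List[int]:
    """seq(0) = Index(0); seq(i) = seq(i-1) + IndexDelta(i) + 1."""
    mask = (1 << seq_num_length) - 1          # assumption: wrap on packetSeqNumLength bits
    seq = [index0 & mask]
    for delta in index_deltas:                # IndexDelta(1), IndexDelta(2), ...
        seq.append((seq[-1] + delta + 1) & mask)
    return seq


@dataclass
class MSLHEntry:                              # hypothetical parsed fields of one SL packet
    au_start: bool                            # SLPacketHeader.accessUnitStartFlag
    cts_flag: int = 0                         # MSLH.CTSFlag
    cts_delta: int = 0                        # MSLH.CTSDelta
    dts_flag: int = 0                         # MSLH.DTSFlag
    dts_delta: int = 0                        # MSLH.DTSDelta


Times = Tuple[int, Optional[int], int, Optional[int]]   # (CTSflag, CTS, DTSflag, DTS)


def restore_time_stamps(rtp_ts: int, entries: List[MSLHEntry],
                        cts_delta_length: int,
                        dts_delta_length: int) -> List[Times]:
    """Return (CTSFlag, CTS, DTSFlag, DTS) for each SL packet of one RTP packet."""
    result: List[Times] = []
    for i, e in enumerate(entries):
        if not e.au_start:                    # non-first AU fragment: flags ignored
            result.append((0, None, 0, None))
            continue
        if cts_delta_length == 0:             # CTS not transported
            cts_flag, cts = (1, rtp_ts) if i == 0 else (0, None)
        else:                                 # CTS = RTP time stamp + CTSDelta
            cts_flag = e.cts_flag
            cts = rtp_ts + e.cts_delta if cts_flag else None
        if dts_delta_length == 0 or cts is None:
            dts_flag, dts = 0, None           # no DTS (or no CTS to derive it from)
        else:                                 # DTS = CTS - DTSDelta, so DTS <= CTS
            dts_flag = e.dts_flag
            dts = cts - e.dts_delta if dts_flag else None
        result.append((cts_flag, cts, dts_flag, dts))
    return result
```

For example, restore_sequence_numbers(10, [0, 2], 5) yields
[10, 11, 14]: the second SL packet directly follows the first, while
an IndexDelta of 2 on the third indicates that two sequence numbers
were skipped, as happens with interleaving.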

In the single-SL mode, the accessUnitEndFlag, if needed, is restored
from the M bit as follows:

if (SLConfig.useAccessUnitEndFlag == 0) {
    // this SL stream does not signal access unit ends
}
else {
    SLPacketHeader.accessUnitEndFlag = M bit;
}

In the multiple-SL mode, the accessUnitEndFlag is left untouched in
RSLH.

The other SL packet header fields SHALL remain as found in RSLH.

In the general case, reconstructing the original SL packetized
stream requires SL awareness. However, in all cases this payload
format allows a receiver that does not know the SL syntax to recover
the SL semantics for the following very useful features:
- Packet order (decoding order)
- Access Unit boundaries (using the M bit)
- Access Unit fragments (i.e. SL packet boundaries, using
  MSLH.PayloadSize)
- Composition Time Stamps (using the RTP time stamp and
  MSLH.CTSDelta)
- Decoding Time Stamps (using the RTP time stamp and MSLH.DTSDelta)
- Packet sequence number (using the RTP sequence number and
  MSLH.Index)

5.2 Handling of scene description streams

MPEG-4 introduces new stream types, as described in section 1,
namely Object Descriptors (OD) and BIFS. In the following, both OD
and BIFS are discussed on the same basis, i.e. as "scene
description".

Considering scene description as a "streamable" type of content is a
rather new concept, and for that reason some specific comments are
needed.

Typically, scene descriptions are encoded in such a way that, in the
general case, information loss would cripple the presentation beyond
any hope of repair by the receiver. This is still well suited to a
number of multimedia applications where the scene is first made
available to the client via reliable channels and then played.
This payload format is not intended for this type of application,
for which downloading MPEG-4 interchange (.mp4) files is typical.
However, this payload format can also be used in that case; it is
then RECOMMENDED that the RTP packets be transported using TCP (for
example, interleaved inside RTSP as described in [13, section
10.12]) or any other reliable protocol.

On the other hand, MPEG-4 has introduced the possibility of
dynamically changing the scene description by sending animation
information (changes in parameters) and structural change
information (updates). Since this information has to be sent in a
timely fashion, MPEG-4 has defined a number of techniques to encode
the scene description so that it behaves similarly to other temporal
encoding schemes such as audio and video. This payload format is
intended for this usage.

Note that in many cases the application will consist of the reliable
transmission of a static initial scene, followed by the streaming of
animations and updates. Using this payload format is attractive in
such cases, since it offers a single solution for both phases.

Senders must be aware that suitable protection schemes should be
used when scene description streams transport sensitive
configuration information. For example, if the RTP packet
transporting an OD-update command is lost, the corresponding media
stream will not be accessible to the receiver.

Redundancy is one possibility; it may be added by tools
hierarchically higher than this payload format, e.g. packet-based
FEC, retransmission, or similar tools. In such a case, the general
congestion control principles have to be observed.

Since BIFS and OD streams may be modified during the session with
update commands, there is a need to send both update commands and
full BIFS/OD refreshes. For that reason, MPEG-4 defines Random Access
Points (RAPs) for scene description streams (OD and BIFS), where by
definition a decoder can restart decoding, i.e. it receives a "full
update" of the scene. This mechanism is called the Scene and Object
Description Carousel. The AU Sequence Number field of the SL packet
header is used to support this behavior at the Synchronization
Layer: when two access units are sent consecutively with the same AU
Sequence Number, the second one is assumed to be a semantic
repetition of the first. If a receiver starts listening in the
middle of a session or has detected losses, it can skip all received
Access Units until the next RAP. The transmission periodicity of
these RAPs should be chosen depending on the application and the
network it is deployed on; exactly as with intra-coded frames in
video, it is the responsibility of the sender to make sure the RAP
periodicity is suitable.

5.3 Multiplexing

An advanced MPEG-4 session may involve a large number of objects,
possibly as many as a few hundred, so transporting each ES as an
individual RTP stream may not always be practical. Allocating and
controlling hundreds of destination addresses for each MPEG-4
session may pose insurmountable session administration problems, and
the input/output processing overhead at the end points would also be
extremely high. Additionally, low-delay transmission of low-bitrate
data streams, e.g. facial animation parameters, results in extremely
high header overhead.

To solve these problems, MPEG-4 data transport requires a
multiplexing scheme that allows selective bundling of several ESs.
This is beyond the scope of the payload format defined here.

The MPEG-4 FlexMux multiplexing scheme may be used for this purpose,
and a specific RTP payload format for it is being developed [11].

Another approach may be to develop a generic RTP multiplexing scheme
usable for MPEG-4 data. The multiplexing scheme reported in [8] may
be a candidate for this approach.

For MPEG-4 applications, the multiplexing technique needs to address
the following requirements:

i. The ESs multiplexed in one stream can change frequently during a
session. Consequently, the coding type, individual packet size and
temporal relationships between the multiplexed data units must be
handled dynamically.

ii. The multiplexing scheme should have a mechanism to determine the
ES identifier (ES_ID) for each of the multiplexed packets, since
ES_ID is not part of the SL header.

iii. In general, an SL packet does not contain information about its
size. The multiplexing scheme should be able to delineate the
multiplexed packets, whose lengths may vary from a few bytes to
close to the path MTU.

5.5 Overlap with RFC 3016

This payload format has been designed to have a (large) overlap with
RFC 3016 [7]. The conditions for this overlap are:

Conditions for RFC 3016:
  i. MPEG-4 video elementary streams only.
  ii. There MUST be a single VOP or Video Packet per RTP packet (this
      is only recommended in RFC 3016).
  iii. The decoder configuration MUST be signaled out-of-band, using
      either the Config MIME parameter or the OD framework.
Conditions for this payload format:
  i. No structural parameters defined (or all set to zero), i.e.
     single-SL mode with empty MSLH and empty RSLH.
  ii. Receivers MUST be ready to accept (and ignore) video
      configuration headers (e.g. VOSH, VO and VOL) and the
      visual-object-sequence-end-code transported in-band.

6. Security Considerations

RTP packets using the payload format defined in this specification
are subject to the security considerations discussed in the RTP
specification [5]. This implies that confidentiality of the media
streams is achieved by encryption. Because the data compression used
with this payload format is applied end-to-end, encryption may be
performed on the compressed data, so there is no conflict between
the two operations. The packet processing complexity of this payload
type (i.e. excluding media data processing) does not exhibit any
significant non-uniformity on the receiver side that could cause a
denial-of-service threat.

However, it is possible to inject non-compliant MPEG streams (Audio,
Video, and Systems) so as to overload the receiver/decoder buffers,
which might compromise the functionality of the receiver or even
crash it. This is especially true for end-to-end systems like MPEG,
where the buffer models are precisely defined.

MPEG-4 Systems supports stream types including commands that are
executed on the terminal, like OD commands, BIFS commands, etc., and
programmatic content, like MPEG-J (Java(TM) Byte Code) and
ECMAScript. It is possible to use one or more of the above in a
manner non-compliant with MPEG to crash the receiver or make it
temporarily unavailable.

Authentication mechanisms can be used to validate the sender and the
data in order to prevent security problems due to non-compliant,
malicious MPEG-4 streams.

MPEG-4 Systems defines a security model for streams carrying MPEG-J
access units, which comprise Java(TM) classes and objects. MPEG-J
defines a set of Java APIs and a secure execution model. MPEG-J
content can call this set of APIs and Java(TM) methods from a set of
Java packages supported in the receiver within the defined security
model. According to this security model, downloaded byte code is
forbidden to load libraries, define native methods, start programs,
read or write files, or read system properties.

Receivers can implement intelligent filters to validate the buffer
requirements or the parametric (OD, BIFS, etc.) or programmatic
(MPEG-J, ECMAScript) commands in the streams. However, this can
increase the complexity significantly.

7. Acknowledgements

This document evolved over several years thanks to contributions
from a large number of people, since it is based on work within the
IETF AVT working group and various ISO MPEG working groups,
especially the 4-on-IP ad-hoc group. The authors wish to thank
Olivier Avaro, Stephen Casner, Guido Fransceschini, Art Howarth,
Dave Mackie, Dave Singer, and Stephan Wenger for their valuable
comments and support. Attentive readers and early implementers also
found flaws and bugs; thank you all.

8. References

[1] ISO/IEC 14496-1:2001, MPEG-4 Systems.

[2] ISO/IEC 14496-2:2001, MPEG-4 Visual.

[3] ISO/IEC 14496-3:2001, MPEG-4 Audio.

[4] ISO/IEC 14496-6:2001, Delivery Multimedia Integration Framework.

[5] H. Schulzrinne, S. Casner, R. Frederick, V. Jacobson, RTP: A
Transport Protocol for Real Time Applications, RFC 1889, Internet
Engineering Task Force, January 1996.

[6] S.
Bradner, Key words for use in RFCs to Indicate Requirement Levels,
RFC 2119, Internet Engineering Task Force, March 1997.

[7] Y. Kikuchi, T. Nomura, S. Fukunaga, Y. Matsui, H. Kimata, RTP
Payload Format for MPEG-4 Audio/Visual Streams, RFC 3016, Internet
Engineering Task Force, November 2000.

[8] B. Thompson, T. Koren, D. Wing, Tunneling multiplexed Compressed
RTP ("TCRTP"), work in progress, draft-ietf-avt-tcrtp-04.txt, July
2001.

[9] D. Singer, Y. Lim, A Framework for the delivery of MPEG-4 over
IP-based Protocols, work in progress, draft-singer-mpeg4-ip-02.txt,
May 2001.

[10] M. Handley, V. Jacobson, SDP: Session Description Protocol, RFC
2327, Internet Engineering Task Force, April 1998.

[11] C. Roux et al., RTP Payload Format for MPEG-4 FlexMultiplexed
Streams, work in progress, draft-curet-avt-rtp-mpeg4-flexmux-00.txt,
February 2001.

[12] H. Schulzrinne, RTP Profile for Audio and Video Conferences with
Minimal Control, RFC 1890, Internet Engineering Task Force, January
1996.

[13] H. Schulzrinne, A. Rao, R. Lanphier, Real Time Streaming
Protocol, RFC 2326, Internet Engineering Task Force, April 1998.

[14] M. Handley, C. Perkins, E. Whelan, Session Announcement
Protocol, RFC 2974, Internet Engineering Task Force, October 2000.

9. Authors' Addresses

Andrea Basso
AT&T Labs Research
200 Laurel Avenue
Middletown, NJ 07748
USA
e-mail: basso@research.att.com

M. Reha Civanlar
AT&T Labs - Research
200 Laurel Ave. South, A5 4D04
Middletown, NJ 07748
USA
e-mail: civanlar@research.att.com

Philippe Gentric
Philips Digital Networks, MP4Net
51 rue Carnot
92156 Suresnes
France
e-mail: philippe.gentric@philips.com

Carsten Herpel
THOMSON multimedia
Karl-Wiechert-Allee 74
30625 Hannover
Germany
e-mail: herpelc@thmulti.com

Zvi Lifshitz
Optibase Ltd.
7 Shenkar St.
Herzliya 46120
Israel
e-mail: zvil@optibase.com

Young-kwon Lim
mp4cast (MPEG-4 Internet Broadcasting Solution Consortium)
1001-1 Daechi-Dong Gangnam-Gu
Seoul, 305-333
Korea
e-mail: young@techway.co.kr

Colin Perkins
USC Information Sciences Institute
4350 N. Fairfax Drive #620
Arlington, VA 22203
USA
e-mail: csp@isi.edu

Jan van der Meer
Philips Digital Networks
Building WDB-1
Prof Holstlaan 4
5656 AA Eindhoven
Netherlands
e-mail: jan.vandermeer@philips.com

APPENDIX: Examples of usage

This payload format has been designed to efficiently transport a
very versatile packetization scheme, the MPEG-4 Sync Layer; as a
result, it is more complex than the average RTP payload format. For
this reason, this appendix describes a number of key examples of how
this payload format can be used.

The C++-like Syntactic Description Language (SDL) defined in [1,
section 14] is used to describe MPEG-4 Systems data structures
concisely.

As discussed in section 2, this payload format can also be used
without explicit knowledge of SL (which is logically equivalent to
configuring the SL headers as empty); several examples (Appendix 1,
3, 4 and 5) cover this case.

Furthermore, these examples assume that the a=fmtp SDP syntax is
used to convey the MIME parameters of the payload format.

Appendix.1 RFC 3016 compatible MPEG-4 Video (no SL)

This is an example of a video stream where the SL is configured to
produce RTP packets compatible with RFC 3016.

SLConfigDescriptor

In this example the SLConfigDescriptor is:

class SLConfigDescriptor extends BaseDescriptor : bit(8)
    tag=SLConfigDescrTag {
  bit(8) predefined;
  if (predefined==0) {
    bit(1) useAccessUnitStartFlag; = 0
    bit(1) useAccessUnitEndFlag; = 1
    bit(1) useRandomAccessPointFlag; = 0
    bit(1) hasRandomAccessUnitsOnlyFlag; = 0
    bit(1) usePaddingFlag; = 0
    bit(1) useTimeStampsFlag; = 0
    bit(1) useIdleFlag; = 0
    bit(1) durationFlag; = 0
    bit(32) timeStampResolution; = 0
    bit(32) OCRResolution; = 0
    bit(8) timeStampLength; = 0
    bit(8) OCRLength; = 0
    bit(8) AU_Length; = 0
    bit(8) instantBitrateLength; = 0
    bit(4) degradationPriorityLength; = 0
    bit(5) AU_seqNumLength; = 0
    bit(5) packetSeqNumLength; = 0
    bit(2) reserved=0b11;
  }
  if (durationFlag) {
    bit(32) timeScale; // NOT USED
    bit(16) accessUnitDuration; // NOT USED
    bit(16) compositionUnitDuration; // NOT USED
  }
  if (!useTimeStampsFlag) {
    bit(timeStampLength) startDecodingTimeStamp; = 0
    bit(timeStampLength) startCompositionTimeStamp; = 0
  }
}

SL Packet Header structure

With this configuration we have the following SL packet header
structure:

aligned(8) class SL_PacketHeader (SLConfigDescriptor SL) {
  if (SL.useAccessUnitEndFlag) {
    bit(1) accessUnitEndFlag; // 1 bit
  }
}

In this case, this payload format produces RTP packets that exactly
conform to RFC 3016, and the Sync Layer is reduced to a purely
logical construct that neither sender nor receiver needs to
implement.

Parameters

This configuration is the default one; no parameters are required.

RTP packet structure

Note that accessUnitEndFlag is mapped to the RTP header M bit.

+=========================================+=============+
| Field                                   | size        |
+=========================================+=============+
| RTP header                              | -           |
+-----------------------------------------+-------------+
| SL packet payload                       | 1400 bytes  |
+-----------------------------------------+-------------+

Overhead estimation

In this example there is an IP/UDP/RTP header overhead of 40 bytes
for 1400 bytes of payload, i.e. 3 % overhead.

Appendix.2 MPEG-4 Video with SL

Consider a 30 frames per second MPEG-4 video stream whose bit rate
is high enough (typically above 300 kb/s) that Access Units have to
be split into several SL packets.

Assume also that the video codec generates Video Packets that fit
into one SL packet, i.e. that the video codec is MTU-aware, and that
the MTU is 1500 bytes. Assume furthermore that this stream contains
B-frames and that decodingTimeStamps are present.

SLConfigDescriptor

In this example the SLConfigDescriptor is:

class SLConfigDescriptor extends BaseDescriptor : bit(8)
    tag=SLConfigDescrTag {
  bit(8) predefined;
  if (predefined==0) {
    bit(1) useAccessUnitStartFlag; = 1
    bit(1) useAccessUnitEndFlag; = 0
    bit(1) useRandomAccessPointFlag; = 1
    bit(1) hasRandomAccessUnitsOnlyFlag; = 0
    bit(1) usePaddingFlag; = 0
    bit(1) useTimeStampsFlag; = 1
    bit(1) useIdleFlag; = 0
    bit(1) durationFlag; = 0
    bit(32) timeStampResolution; = 30
    bit(32) OCRResolution; = 0
    bit(8) timeStampLength; = 32
    bit(8) OCRLength; = 0
    bit(8) AU_Length; = 0
    bit(8) instantBitrateLength; = 0
    bit(4) degradationPriorityLength; = 0
    bit(5) AU_seqNumLength; = 0
    bit(5) packetSeqNumLength; = 0
    bit(2) reserved=0b11;
  }
  if (durationFlag) {
    bit(32) timeScale; // NOT USED
    bit(16) accessUnitDuration; // NOT USED
    bit(16) compositionUnitDuration; // NOT USED
  }
  if (!useTimeStampsFlag) {
    bit(timeStampLength) startDecodingTimeStamp; // NOT USED
    bit(timeStampLength) startCompositionTimeStamp; // NOT USED
  }
}

The useRandomAccessPointFlag is set so that the
randomAccessPointFlag can indicate that the corresponding SL packet
contains a GOV and the first Video Packet of an intra-coded frame.

SL Packet Header structure

With this configuration we have the following SL packet header
structure:

aligned(8) class SL_PacketHeader (SLConfigDescriptor SL) {
  bit(1) accessUnitStartFlag; // 1 bit
  if (accessUnitStartFlag) {
    bit(1) randomAccessPointFlag; // 1 bit
    bit(1) decodingTimeStampFlag; // 1 bit
    bit(1) compositionTimeStampFlag; // 1 bit
    if (decodingTimeStampFlag) {
      bit(SL.timeStampLength) decodingTimeStamp;
    }
    if (compositionTimeStampFlag) {
      bit(SL.timeStampLength) compositionTimeStamp;
    }
  }
}

Parameters

decodingTimeStamps are encoded on 32 bits, which is much more than
needed for the delta. The sender therefore uses DTSDeltaLength to
signal that only 7 bits are used to code the relative DTS in the RTP
packet.

The RSLHSectionSize cannot exceed 4 (bits); this value can be encoded
on 3 bits, which is signaled by RSLHSectionSizeLength. The resulting
concatenated fmtp line is:

a=fmtp:<format> DTSDeltaLength=7;RSLHSectionSizeLength=3

RTP packet structure

Two cases can occur. For packets that transport first fragments of
Access Units we have:

+=========================================+=============+
| Field                                   | size        |
+=========================================+=============+
| RTP header                              | -           |
+-----------------------------------------+-------------+
| DTSFlag = (1)                           | 1 bit       |
+-----------------------------------------+-------------+
| DTSDelta                                | 7 bits      |
+-----------------------------------------+-------------+
| bits to byte alignment                  | 0 bits      |
+-----------------------------------------+-------------+
| RSLHSectionSize = (100)                 | 3 bits      |
+-----------------------------------------+-------------+
| accessUnitStartFlag = (1)               | 1 bit       |
+-----------------------------------------+-------------+
| randomAccessPointFlag                   | 1 bit       |
+-----------------------------------------+-------------+
| decodingTimeStampFlag                   | 1 bit       |
+-----------------------------------------+-------------+
| compositionTimeStampFlag                | 1 bit       |
+-----------------------------------------+-------------+
| bits to byte alignment = (0)            | 1 bit       |
+-----------------------------------------+-------------+
| SL packet payload                       | N bytes     |
+-----------------------------------------+-------------+

For packets that transport non-first fragments of Access Units we
have:

+=========================================+=============+
| Field                                   | size        |
+=========================================+=============+
| RTP header                              | -           |
+-----------------------------------------+-------------+
| DTSFlag = (0)                           | 1 bit       |
+-----------------------------------------+-------------+
| bits to byte alignment = (0000000)      | 7 bits      |
+-----------------------------------------+-------------+
| RSLHSectionSize = (001)                 | 3 bits      |
+-----------------------------------------+-------------+
| accessUnitStartFlag = (0)               | 1 bit       |
+-----------------------------------------+-------------+
| bits to byte alignment = (0000)         | 4 bits      |
+-----------------------------------------+-------------+
| SL packet payload                       | N bytes     |
+-----------------------------------------+-------------+

Overhead estimation

In this example the overhead is 40 bytes of IP/UDP/RTP headers plus
2 bytes for the MSLH and RSLH sections, for 1400 bytes of payload,
i.e. 3 % overhead.

Appendix.3 Low delay MPEG-4 Audio (no SL)

This example is for a low-delay audio service. For this reason, a
single SL packet is transported in each RTP packet; in addition,
each SL packet contains a complete Access Unit.
   SLConfigDescriptor

   Since CTS = DTS and the Access Unit duration is constant, MPEG-4
   time stamps do not need to be signaled (the durationFlag of the
   SLConfigDescriptor is set).

   We also assume an audio Object Type for which all Access Units are
   Random Access Points, which is signaled using the
   hasRandomAccessUnitsOnlyFlag in the SLConfigDescriptor.

   We further assume a mode where the Access Unit size is constant and
   equal to 5 bytes (which is signaled with AU_Length).

   In this example the SLConfigDescriptor is:

      class SLConfigDescriptor extends BaseDescriptor : bit(8)
            tag=SLConfigDescrTag {
         bit(8) predefined;
         if (predefined==0) {
            bit(1) useAccessUnitStartFlag; = 0
            bit(1) useAccessUnitEndFlag; = 0
            bit(1) useRandomAccessPointFlag; = 0
            bit(1) hasRandomAccessUnitsOnlyFlag; = 1
            bit(1) usePaddingFlag; = 0
            bit(1) useTimeStampsFlag; = 0
            bit(1) useIdleFlag; = 0
            bit(1) durationFlag; = 1 // signals constant AU duration
            bit(32) timeStampResolution; = 0
            bit(32) OCRResolution; = 0
            bit(8) timeStampLength; = 0
            bit(8) OCRLength; = 0
            bit(8) AU_Length; = 5
            bit(8) instantBitrateLength; = 0
            bit(4) degradationPriorityLength; = 0
            bit(5) AU_seqNumLength; = 0
            bit(5) packetSeqNumLength; = 0
            bit(2) reserved=0b11;
         }
         if (durationFlag) {
            bit(32) timeScale; = 1000 // for milliseconds
            bit(16) accessUnitDuration; = 10 // ms
            bit(16) compositionUnitDuration; = 10 // ms
         }
         if (!useTimeStampsFlag) {
            bit(timeStampLength) startDecodingTimeStamp; = 0
            bit(timeStampLength) startCompositionTimeStamp; = 0
         }
      }

   SL packet header

   With this configuration the SL packet header is empty. The Synch
   Layer is reduced to a purely logical construction that neither
   sender nor receiver needs to implement.
   Parameters

   No parameters are required.

   RTP packet structure

   Note that the RTP header M bit should always be set to 1.

      +=========================================+=============+
      | Field                                   | size        |
      +=========================================+=============+
      | RTP header                              | -           |
      +-----------------------------------------+-------------+
      | SL packet payload                       | 5 bytes     |
      +-----------------------------------------+-------------+

   Overhead estimation

   The overhead is extremely large, i.e. 800 %, since 40 bytes of
   headers are required to transport 5 bytes of data. Note, however,
   that RTP header compression would work well since the time stamp
   increments are constant.

Appendix.4 Media delivery MPEG-4 Audio (no SL)

   This example is for a media delivery service where delay is not an
   issue but efficiency is. In this case several SL packets are
   transported in each RTP packet.

   SLConfigDescriptor

   Similar to the previous example.

   SL packet header

   With this configuration the SL packet header is empty. The Synch
   Layer is reduced to a purely logical construction that neither
   sender nor receiver needs to implement.

   Parameters

   The absence of RSLHSectionSizeLength indicates that the RSLHSection
   is empty.

   The size of the SL packets (which in this case are all complete
   Access Units) is constant and is indicated with:

      a=fmtp: ConstantSize=5

   This also indicates to the receiver that the Multiple-SL mode is
   used. The 2-byte field that would give the size of the MSLHSection
   is omitted, since in this case it would always contain zero (the
   MSLHSection is always empty because no other MIME parameter is
   present).
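When ConstantSize is signaled and both the MSLHSection and the RSLHSection are empty, a receiver can recover each SL packet from its position alone. The following Python sketch illustrates this. It assumes the RTP header has already been removed, and split_constant_size is a hypothetical helper name, not an API defined by this document.

```python
from typing import List


def split_constant_size(payload: bytes, constant_size: int) -> List[bytes]:
    """Split an RTP payload into SL packets of ConstantSize bytes each.

    Assumes the RTP header has already been stripped and that no
    MSLHSection or RSLHSection precedes the SL packet payloads.
    """
    if constant_size <= 0:
        raise ValueError("ConstantSize must be positive")
    if len(payload) % constant_size != 0:
        # Truncated or malformed packet: reject it rather than guess.
        raise ValueError("payload length is not a multiple of ConstantSize")
    return [payload[i:i + constant_size]
            for i in range(0, len(payload), constant_size)]


# Three 5-byte Access Units carried in one RTP payload (ConstantSize=5).
aus = split_constant_size(bytes(range(15)), 5)

# With a 1500-byte MTU and 40 bytes of IPv4/UDP/RTP headers (an
# illustrative assumption), at most this many 5-byte AUs fit per packet.
max_aus = (1500 - 40) // 5
print(len(aus), max_aus)
```

This sketch rejects any payload whose length is not a multiple of ConstantSize. A receiver could instead discard only the incomplete tail.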
   RTP packet structure

   Note that the RTP header M bit is always set to 1, which indicates
   to the receiver that only complete Access Units are transported.

      +=========================================+=============+
      | Field                                   | size        |
      +=========================================+=============+
      | RTP header                              | -           |
      +-----------------------------------------+-------------+
      | SL packet payload                       | 5 bytes     |
      +-----------------------------------------+-------------+
      | SL packet payload                       | 5 bytes     |
      +-----------------------------------------+-------------+
      | etc, until MTU is reached                             |
      +-----------------------------------------+-------------+
      | SL packet payload                       | 5 bytes     |
      +-----------------------------------------+-------------+

   Overhead estimation

   The overhead is about 3 %, i.e. minimal. For example, with a
   1500-byte MTU, 40 bytes of headers carry 1460 bytes of payload
   (292 Access Units).

Appendix.5 AAC with interleaving (no SL)

   Let us consider AAC at 128 kb/s where each Access Unit is on average
   320 bytes. Interleaving is applied with a continuous interleaving
   scheme (see the table below) where 4 Access Units are used to
   construct each RTP packet in order to match an MTU of 1500 bytes.

   IndexDelta is constant and equal to 2 (since +1 is automatically
   added); it is encoded on 2 bits.

   As explained in section 3.8, this is a time stamp based
   interleaving scheme (IndexLength=0). Receivers know that each SL
   packet is a complete Access Unit because all RTP packets have the M
   bit set to 1. Since the Access Unit duration is constant, Access
   Unit time stamps can therefore be computed from RTP time stamps and
   IndexDelta values. This can be used for de-interleaving even in
   case of losses.
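The time stamp computation described above can be sketched in Python. The sketch assumes the following:

* The receiver knows the constant AU duration in RTP timestamp units. The value 1024 in the usage lines is purely illustrative.
* The RTP timestamp is the CTS of the first AU in the packet, as in this appendix's interleaving table.
* 32-bit RTP timestamp wraparound can be ignored.

The names au_timestamps and deinterleave are hypothetical.

```python
from typing import Dict, Iterable, List, Tuple


def au_timestamps(rtp_timestamp: int, index_deltas: List[int],
                  au_duration: int) -> List[int]:
    """Compute the CTS of every AU carried in one RTP packet.

    rtp_timestamp: CTS of the first AU in the packet.
    index_deltas:  IndexDelta of the 2nd, 3rd, ... AU (none for the 1st).
    au_duration:   constant AU duration in RTP timestamp units (assumed
                   known to the receiver).
    32-bit RTP timestamp wraparound is ignored in this sketch.
    """
    stamps = [rtp_timestamp]
    for delta in index_deltas:
        # The transmitted IndexDelta is the index increment minus one.
        stamps.append(stamps[-1] + (delta + 1) * au_duration)
    return stamps


def deinterleave(packets: Iterable[Tuple[int, List[int], List[bytes]]],
                 au_duration: int) -> List[Tuple[int, bytes]]:
    """Return (CTS, AU) pairs in decoding order from received packets."""
    by_cts: Dict[int, bytes] = {}
    for rtp_ts, deltas, aus in packets:
        for cts, au in zip(au_timestamps(rtp_ts, deltas, au_duration), aus):
            by_cts[cts] = au
    return sorted(by_cts.items())


# Packets 1 to 4 of the interleaving table, with CTS(AUn) = (n - 1) * D.
D = 1024  # illustrative AU duration in RTP timestamp units
received = [
    (0 * D, [], [b"AU1"]),
    (1 * D, [2], [b"AU2", b"AU5"]),
    (2 * D, [2, 2], [b"AU3", b"AU6", b"AU9"]),
    (3 * D, [2, 2, 2], [b"AU4", b"AU7", b"AU10", b"AU13"]),
]
order = [au for _, au in deinterleave(received, D)]
print(order)
```

Because AUs are keyed by their computed CTS, an AU that has not yet arrived leaves a gap in the time line rather than shifting later AUs. Here AU8 is missing because it travels in packet 5.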
   Note that it would also be possible to use IndexLength=2 so as to
   maintain byte alignment in the MSLH portions; in this case, however,
   the value of these two bits MUST be zero, as stated in section
   3.8.1.

   +-----------------------------------------------------------------+
   | RTP packet | RTP Timestamp | AUs             | IndexDelta       |
   +-----------------------------------------------------------------+
   | 1          | CTS(AU1)      | 1               | -                |
   +-----------------------------------------------------------------+
   | 2          | CTS(AU2)      | 2, 5            | -, 2             |
   +-----------------------------------------------------------------+
   | 3          | CTS(AU3)      | 3, 6, 9         | -, 2, 2          |
   +-----------------------------------------------------------------+
   | 4          | CTS(AU4)      | 4, 7, 10, 13    | -, 2, 2, 2       |
   +-----------------------------------------------------------------+
   | 5          | CTS(AU8)      | 8, 11, 14, 17   | -, 2, 2, 2       |
   +-----------------------------------------------------------------+
   | 6          | CTS(AU12)     | 12, 15, 18, 21  | -, 2, 2, 2       |
   +-----------------------------------------------------------------+
   | 7          | CTS(AU16)     | 16, 19, 22, 25  | -, 2, 2, 2       |
   +-----------------------------------------------------------------+
   | 8          | CTS(AU20)     | 20, 23, 26, 29  | -, 2, 2, 2       |
   +-----------------------------------------------------------------+
   | 9          | CTS(AU24)     | 24, 27, 30, 33  | -, 2, 2, 2       |
   +-----------------------------------------------------------------+
   | 10         | CTS(AU28)     | 28, 31, 34, 37  | -, 2, 2, 2       |
   +-----------------------------------------------------------------+
   | etc                                                             |
   +-----------------------------------------------------------------+

   SLConfigDescriptor

   Similar to the previous example.

   SL Packet Header

   Similar to the previous example (empty).
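The packet assignment in the table above follows a simple rule. The Python sketch below reproduces it. The rule is inferred from the table rows; it is not a procedure defined by this document. The names packet_of, build_schedule and index_deltas are hypothetical. The sketch also confirms that consecutive AUs in the same packet differ by 3, i.e. IndexDelta = 2.

```python
from collections import defaultdict
from typing import Dict, List

AUS_PER_PACKET = 4  # taken from this example


def packet_of(au: int) -> int:
    """1-based RTP packet number that carries 1-based Access Unit `au`.

    Rule inferred from the table: write au - 1 = 4*i + j (0 <= j < 4);
    the AU then goes into packet i + j + 1.
    """
    i, j = divmod(au - 1, AUS_PER_PACKET)
    return i + j + 1


def build_schedule(num_aus: int) -> Dict[int, List[int]]:
    """Map each RTP packet number to the AUs it carries, in ascending order."""
    schedule: Dict[int, List[int]] = defaultdict(list)
    for au in range(1, num_aus + 1):
        schedule[packet_of(au)].append(au)
    return dict(schedule)


def index_deltas(aus: List[int]) -> List[int]:
    """IndexDelta values: index increment minus one (first AU has none)."""
    return [b - a - 1 for a, b in zip(aus, aus[1:])]


schedule = build_schedule(37)
for pkt in range(1, 11):
    print(pkt, schedule[pkt], index_deltas(schedule[pkt]))
```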
   Parameters

   The resulting concatenated fmtp line is:

      a=fmtp: SizeLength=9; IndexDeltaLength=2;

   RTP packet structure

      +=========================================+=============+
      | Field                                   | size        |
      +=========================================+=============+
      | RTP header                              | -           |
      +-----------------------------------------+-------------+
      MSLHSection
      +=========================================+=============+
      | MSLHSection size in bits = 42 bits      | 2 bytes     |
      +-----------------------------------------+-------------+
      | PayloadSize                             | 9 bits      |
      +-----------------------------------------+-------------+
      | PayloadSize                             | 9 bits      |
      +-----------------------------------------+-------------+
      | IndexDelta                              | 2 bits      |
      +-----------------------------------------+-------------+
      | PayloadSize                             | 9 bits      |
      +-----------------------------------------+-------------+
      | IndexDelta                              | 2 bits      |
      +-----------------------------------------+-------------+
      | PayloadSize                             | 9 bits      |
      +-----------------------------------------+-------------+
      | IndexDelta                              | 2 bits      |
      +-----------------------------------------+-------------+
      | bits to byte alignment = (000000)       | 6 bits      |
      +-----------------------------------------+-------------+
      SLPPSection
      +=========================================+=============+
      | AAC Access Unit                         | x bytes     |
      +-----------------------------------------+-------------+
      | AAC Access Unit                         | x bytes     |
      +-----------------------------------------+-------------+
      | AAC Access Unit                         | x bytes     |
      +-----------------------------------------+-------------+
      | AAC Access Unit                         | x bytes     |
      +-----------------------------------------+-------------+

   Overhead estimation

   The MSLHSection is 8 bytes, including its 2-byte size field. In
   this example the RTP overhead is therefore 40 + 8 bytes for about
   1280 bytes of payload (4 Access Units of 320 bytes on average),
   i.e. around 4 % overhead.

Appendix.6 AAC with Index-based interleaving and SL

   Let us consider AAC at around 130 kb/s where each Access Unit is
   split into 4 SL packets corresponding to Error Sensitivity
   Categories (ESC) of at most 90 bytes each. Interleaving these SL
   packets is very useful in terms of error resilience. We thus use
   an interleaving scheme where 15 SL packets (extracted from 15
   consecutive Access Units) are used to construct each RTP packet in
   order to match an MTU of 1500 bytes. Note that since ESC fragments
   are not byte aligned, we also use the paddingFlag and paddingBits
   features of the Synch Layer. The interleaving sequence is 4 RTP
   packets and about 350 ms long. This is too long for conferencing
   but perfectly acceptable for Internet radio.

   Since the sequence contains 60 SL packets, IndexLength is set to 16
   bits so as to provide a safe margin in case of long loss bursts.
   This also indicates to the receiver that this is an Index-based
   interleaving scheme (indeed, CTS cannot be computed for SL packets
   that are not AU starts).

   2 bits are enough for IndexDelta, which is constant and equal to 3
   (since +1 is automatically added).
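The constant IndexDelta of 3 follows from how the 60 SL packets of a sequence are grouped. The Python sketch below makes two assumptions:

* SL packets are numbered consecutively in decoding order. The exact Index semantics are those of section 3.8 and are not reproduced here.
* The k-th RTP packet of a sequence carries the k-th ESC fragment of each of the 15 Access Units.

The names sl_index and rtp_packet_fields are hypothetical.

```python
from typing import List, Tuple

AUS_PER_SEQUENCE = 15  # SL packets (one per AU) in each RTP packet
ESC_PER_AU = 4         # SL packets (ESC fragments) per Access Unit
INDEX_MOD = 1 << 16    # IndexLength = 16 bits


def sl_index(au: int, esc: int, base: int = 0) -> int:
    """Index of the SL packet carrying ESC `esc` (0..3) of AU `au` (0..14).

    Assumes SL packets are numbered consecutively in decoding order,
    starting at `base` for the first SL packet of the sequence.
    """
    return (base + au * ESC_PER_AU + esc) % INDEX_MOD


def rtp_packet_fields(esc: int, base: int = 0) -> Tuple[int, List[int], bool]:
    """(Index, IndexDelta list, M bit) for the RTP packet grouping ESC `esc`."""
    indices = [sl_index(au, esc, base) for au in range(AUS_PER_SEQUENCE)]
    # IndexDelta is the (modular) index increment minus one.
    deltas = [(b - a) % INDEX_MOD - 1 for a, b in zip(indices, indices[1:])]
    # Only the packet carrying the last ESC holds the ends of the AUs.
    marker = esc == ESC_PER_AU - 1
    return indices[0], deltas, marker


for esc in range(ESC_PER_AU):
    index, deltas, m = rtp_packet_fields(esc, base=65530)
    print(esc, index, set(deltas), m)
```

The usage lines start the numbering just below 65536. This shows that computing increments modulo 2^16 keeps IndexDelta at 3 even when the 16-bit Index wraps around.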
   Note that the 4th RTP packet in each sequence has its M bit set to 1
   since it contains 15 SL packets transporting the ends of 15
   consecutive Access Units.

   With this scheme a sender can choose to duplicate, for each
   interleaving sequence, the first RTP packet, which contains the most
   useful data in terms of ESC. It may do so, for example, upon
   reception of RTCP reports indicating high loss rates. It can also
   apply other error protection techniques, with due care for
   congestion issues.

   In this example we also show several other SL features (OCR, AU
   boundary flags and padding, as detailed below).

   One feature demonstrated by this example is the degradation
   priority. We assume the degradation priority can take 4 different
   values, mapped to Error Sensitivity Categories, and is encoded on 2
   bits. This interleaving scheme makes sure of two things. First, only
   SL packets with identical degradation priorities are grouped in the
   same RTP packet (see section 3.6.3). Second, only the first RSLH of
   each RTP packet transports the degradation priority.

   We also assume that the server inserts an OCR in the last SL packet
   of each RTP packet.
   SLConfigDescriptor

   In this example the SLConfigDescriptor is:

      class SLConfigDescriptor extends BaseDescriptor : bit(8)
            tag=SLConfigDescrTag {
         bit(8) predefined;
         if (predefined==0) {
            bit(1) useAccessUnitStartFlag; = 1
            bit(1) useAccessUnitEndFlag; = 1
            bit(1) useRandomAccessPointFlag; = 0
            bit(1) hasRandomAccessUnitsOnlyFlag; = 1
            bit(1) usePaddingFlag; = 1 // we need to signal padding bits
            bit(1) useTimeStampsFlag; = 0
            bit(1) useIdleFlag; = 0
            bit(1) durationFlag; = 1
            bit(32) timeStampResolution; = 0
            bit(32) OCRResolution; = 30
            bit(8) timeStampLength; = 0
            bit(8) OCRLength; = 32
            bit(8) AU_Length; = 0
            bit(8) instantBitrateLength; = 0
            bit(4) degradationPriorityLength; = 2
            bit(5) AU_seqNumLength; = 0
            bit(5) packetSeqNumLength; = 6
            bit(2) reserved=0b11;
         }
         if (durationFlag) {
            bit(32) timeScale; = 44100 // assumes 44.1 kHz AAC
            bit(16) accessUnitDuration; = 1024 // 1024/44100 s = 23.22 ms
            bit(16) compositionUnitDuration; = 1024 // 23.22 ms
         }
         if (!useTimeStampsFlag) {
            bit(timeStampLength) startDecodingTimeStamp; = 0
            bit(timeStampLength) startCompositionTimeStamp; = 0
         }
      }
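accessUnitDuration and compositionUnitDuration are 16-bit integer fields counted in ticks of 1/timeScale seconds. The Python sketch below converts such a field to milliseconds. It assumes 44.1 kHz AAC with 1024 samples per Access Unit, for illustration only; this is one configuration that yields the 23.22 ms used in this example. The sketch also checks the roughly 350 ms length of a 15-AU interleaving sequence. unit_duration_ms is a hypothetical name.

```python
def unit_duration_ms(time_scale: int, duration_ticks: int) -> float:
    """Convert an SL duration field (ticks of 1/time_scale s) to ms."""
    if time_scale <= 0:
        raise ValueError("timeScale must be positive")
    if not 0 <= duration_ticks < (1 << 16):
        raise ValueError("accessUnitDuration is a 16-bit field")
    return 1000.0 * duration_ticks / time_scale


# Assumed configuration: 44.1 kHz AAC, 1024 samples per Access Unit.
au_ms = unit_duration_ms(44100, 1024)
sequence_ms = 15 * au_ms  # one interleaving sequence spans 15 Access Units
print(f"{au_ms:.2f} ms per AU, {sequence_ms:.0f} ms per sequence")
```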
   SL Packet Header structure

   With this configuration we have the following SL packet header
   structure:

      aligned(8) class SL_PacketHeader (SLConfigDescriptor SL) {
         bit(1) accessUnitStartFlag;
         bit(1) accessUnitEndFlag;
         bit(1) OCRflag;
         bit(1) paddingFlag;
         if (paddingFlag)
            bit(3) paddingBits;
         bit(SL.packetSeqNumLength) packetSequenceNumber;
         bit(1) DegPrioflag;
         if (DegPrioflag) {
            bit(SL.degradationPriorityLength) degradationPriority;
         }
         if (OCRflag) {
            bit(SL.OCRLength) objectClockReference;
         }
      }

   Parameters

   The resulting concatenated fmtp line is:

      a=fmtp: SizeLength=7; RSLHSectionSizeLength=8;
      IndexLength=16; IndexDeltaLength=2; OCRDeltaLength=16

   RTP packet structure

      +=========================================+=============+
      | Field                                   | size        |
      +=========================================+=============+
      | RTP header                              | -           |
      +-----------------------------------------+-------------+
      MSLHSection
      +=========================================+=============+
      | MSLHSection size in bits = 149 bits     | 2 bytes     |
      +-----------------------------------------+-------------+
      | PayloadSize                             | 7 bits      |
      +-----------------------------------------+-------------+
      | Index                                   | 16 bits     |
      +-----------------------------------------+-------------+
      | PayloadSize                             | 7 bits      |
      +-----------------------------------------+-------------+
      | IndexDelta = (11)                       | 2 bits      |
      +-----------------------------------------+-------------+
      | etc + 12 times 9 bits                                 |
      +-----------------------------------------+-------------+
      | PayloadSize                             | 7 bits      |
      +-----------------------------------------+-------------+
      | IndexDelta = (11)                       | 2 bits      |
      +-----------------------------------------+-------------+
      | bits to byte alignment = (000)          | 3 bits      |
      +-----------------------------------------+-------------+
      RSLHSection
      +=========================================+=============+
      | RSLHSectionSize = (10000111)            | 8 bits      |
      +-----------------------------------------+-------------+
      | accessUnitStartFlag                     | 1 bit       |
      +-----------------------------------------+-------------+
      | accessUnitEndFlag                       | 1 bit       |
      +-----------------------------------------+-------------+
      | OCRFlag = (0)                           | 1 bit       |
      +-----------------------------------------+-------------+
      | paddingFlag = (1)                       | 1 bit       |
      +-----------------------------------------+-------------+
      | paddingBits                             | 3 bits      |
      +-----------------------------------------+-------------+
      | DegPrioflag = (1)                       | 1 bit       |
      +-----------------------------------------+-------------+
      | degradationPriority                     | 2 bits      |
      +-----------------------------------------+-------------+
      | accessUnitStartFlag                     | 1 bit       |
      +-----------------------------------------+-------------+
      | accessUnitEndFlag                       | 1 bit       |
      +-----------------------------------------+-------------+
      | OCRFlag = (0)                           | 1 bit       |
      +-----------------------------------------+-------------+
      | paddingFlag = (1)                       | 1 bit       |
      +-----------------------------------------+-------------+
      | paddingBits                             | 3 bits      |
      +-----------------------------------------+-------------+
      | DegPrioflag = (0)                       | 1 bit       |
      +-----------------------------------------+-------------+
      | etc + 12 times 8 bits                                 |
      +-----------------------------------------+-------------+
      | accessUnitStartFlag                     | 1 bit       |
      +-----------------------------------------+-------------+
      | accessUnitEndFlag                       | 1 bit       |
      +-----------------------------------------+-------------+
      | OCRFlag = (1)                           | 1 bit       |
      +-----------------------------------------+-------------+
      | OCRDelta                                | 16 bits     |
      +-----------------------------------------+-------------+
      | paddingFlag = (0)                       | 1 bit       |
      +-----------------------------------------+-------------+
      | DegPrioflag = (0)                       | 1 bit       |
      +-----------------------------------------+-------------+
      | bits to byte alignment = (0)            | 1 bit       |
      +-----------------------------------------+-------------+
      SLPPSection
      +=========================================+=============+
      | SL packet payload                       | max 90 bytes|
      +-----------------------------------------+-------------+
      | etc + 13 SL packets                                   |
      +-----------------------------------------+-------------+
      | SL packet payload                       | max 90 bytes|
      +-----------------------------------------+-------------+

   Note that in the above table the last SL packet in the RTP packet
   has a payload that is byte-aligned (at the end). When this happens,
   paddingFlag is set to zero and the paddingBits field is omitted.

   Overhead estimation

   The MSLHSection is 21 bytes (19 bytes plus its 2-byte size field)
   and the RSLHSection is 18 bytes. In this example the RTP overhead
   is therefore 40 + 39 bytes for 1350 bytes of payload, i.e. around
   6 % overhead.
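The section sizes follow from the field lengths in the tables above. The Python sketch below recomputes them under these assumptions:

* 40 bytes of IPv4/UDP/RTP headers and 1350 bytes of payload, as in this example.
* The 2-byte MSLHSection size field is counted, as in the 8-byte figure of Appendix.5.

The function names are hypothetical. The sketch gives 135 bits of RSLH entries, a 21-byte MSLHSection, an 18-byte RSLHSection and about 5.9 % overhead. It also reproduces the 8-byte MSLHSection of Appendix.5.

```python
from typing import List

IP_UDP_RTP_BYTES = 40  # IPv4 (20) + UDP (8) + RTP (12), as in these examples


def bytes_for(bits: int) -> int:
    """Round a bit count up to whole bytes (byte alignment)."""
    return (bits + 7) // 8


def mslh_bytes(n_sl: int, size_len: int, index_len: int, delta_len: int) -> int:
    """MSLHSection size in bytes, counting its 2-byte size field."""
    bits = size_len + index_len + (n_sl - 1) * (size_len + delta_len)
    return 2 + bytes_for(bits)


def rslh_entry_bits(padding_bits: bool, deg_prio_len: int,
                    ocr_delta_len: int) -> int:
    """Bits in one RSLH entry with the field layout of this example."""
    bits = 4  # accessUnitStartFlag, accessUnitEndFlag, OCRFlag, paddingFlag
    if padding_bits:
        bits += 3  # paddingBits, present when paddingFlag = 1
    bits += 1 + deg_prio_len  # DegPrioflag (+ degradationPriority if set)
    return bits + ocr_delta_len  # OCRDelta length (0 when OCRFlag = 0)


def rslh_bytes(entries: List[int], size_field_len: int) -> int:
    """RSLHSection size in bytes, counting its RSLHSectionSize field."""
    return bytes_for(size_field_len + sum(entries))


# Appendix.6: 15 SL packets; priority in the first entry, OCR in the last.
entries = ([rslh_entry_bits(True, 2, 0)]
           + [rslh_entry_bits(True, 0, 0)] * 13
           + [rslh_entry_bits(False, 0, 16)])
m = mslh_bytes(15, size_len=7, index_len=16, delta_len=2)
r = rslh_bytes(entries, size_field_len=8)
overhead = 100 * (IP_UDP_RTP_BYTES + m + r) / 1350
print(sum(entries), m, r, f"{overhead:.1f} %")
print(mslh_bytes(4, size_len=9, index_len=0, delta_len=2))  # Appendix.5
```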