Internet Engineering Task Force            Avaro-France Telecom
Internet Draft                             Basso-AT&T
                                           Casner-Packet Design
                                           Civanlar-AT&T
                                           Gentric-Philips
                                           Herpel-Thomson
                                           Lifshitz-Optibase
                                           Lim-mp4cast
                                           Perkins-ISI
                                           van der Meer-Philips
                                           July 2001
                                           Expires Jan. 2002
Document: draft-ietf-avt-mpeg4-multisl-01.txt

                 RTP Payload Format for MPEG-4 Streams

Status of this Memo

   This document is an Internet-Draft and is in full conformance with
   all provisions of Section 10 of RFC2026.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups. Note that
   other groups may also distribute working documents as
   Internet-Drafts. Internet-Drafts are draft documents valid for a
   maximum of six months and may be updated, replaced, or obsoleted by
   other documents at any time. It is inappropriate to use
   Internet-Drafts as reference material or to cite them other than as
   "work in progress."

   This specification is a product of the Audio/Video Transport working
   group within the Internet Engineering Task Force and ISO/IEC MPEG-4
   ad hoc group on MPEG-4 over Internet. Comments are solicited and
   should be addressed to the working group's mailing list at
   avt@ietf.org and/or the authors.

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html.

   This document contains a MIME type registration form that is
   intended to be taken as-is and therefore makes reference to this
   document, using the temporary placeholder: .

Abstract

   This document describes a payload format for transporting MPEG-4
   encoded data using RTP. MPEG-4 is a recent standard from ISO/IEC for
   the coding of natural and synthetic audio-visual data. Several
   services provided by RTP are beneficial for MPEG-4 encoded data
   transport over the Internet. Additionally, the use of RTP makes it
   possible to synchronize MPEG-4 data with other real-time data types.

Gentric et al.              Expires January 2002

1. Introduction

   MPEG-4 is a recent standard from ISO/IEC for the coding of natural
   and synthetic audio-visual data in the form of audiovisual objects
   that are arranged into an audiovisual scene by means of a scene
   description [1][2][3][4]. This draft specifies an RTP [5] payload
   format for transporting MPEG-4 encoded data streams.

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in
   this document are to be interpreted as described in RFC 2119 [6].

   The benefits of using RTP for MPEG-4 data stream transport include:

   i. Ability to synchronize MPEG-4 streams with other RTP payloads

   ii. Monitoring MPEG-4 delivery performance through RTCP

   iii. Combining MPEG-4 and other real-time data streams received from
   multiple end-systems into a set of consolidated streams through RTP
   mixers

   iv. Converting data types, etc. through the use of RTP translators.

1.1 Overview of MPEG-4 End-System Architecture

   Fig. 1 below shows the layered architecture of a terminal which
   implements the complete MPEG-4 systems model. The Compression Layer
   processes individual audio-visual media streams. The MPEG-4
   compression schemes are defined in the ISO/IEC specifications
   14496-2 [2] and 14496-3 [3]. The compression schemes in MPEG-4
   achieve efficient encoding over a bandwidth ranging from several
   kbps to many Mbps.
   The audio-visual content compressed by this layer is organized into
   Elementary Streams (ESs).

   The MPEG-4 standard specifies MPEG-4 compliant streams. Within the
   constraint of this compliance the compression layer is unaware of a
   specific delivery technology, but it can be made to react to the
   characteristics of a particular delivery layer such as the path-MTU
   or loss characteristics. Also, some compressors can be designed to
   be delivery specific for implementation efficiency. In such cases
   the compressor may work in a non-optimal fashion with delivery
   technologies that are different from the one it is specifically
   designed to operate with.

   The hierarchical relations, location and properties of ESs in a
   presentation are described by a dynamic set of Object Descriptors
   (ODs). Each OD groups one or more ES Descriptors referring to a
   single content item (audio-visual object). Hence, multiple
   alternative or hierarchical representations of each content item are
   possible.

   ODs are themselves conveyed through one or more ESs. A complete set
   of ODs can be seen as an MPEG-4 resource or session description at a
   stream level. The resource description may itself be hierarchical,
   i.e. an ES conveying an OD may describe other ESs conveying other
   ODs.

   The session description is accompanied by a dynamic scene
   description, Binary Format for Scene (BIFS), again conveyed through
   one or more ESs. At this level, content is identified in terms of
   audio-visual objects. The spatio-temporal location of each object is
   defined by BIFS. The audio-visual content of those objects that are
   synthetic and static is also described by BIFS. Natural and
   animated synthetic objects may refer to an OD that points to one or
   more ESs that carry the coded representation of the object or its
   animation data.

   By conveying the session (or resource) description as well as the
   scene (or content composition) description through their own ESs, it
   is made possible to change portions of the content composition and
   the number and properties of media streams that carry the
   audio-visual content separately and dynamically at well known
   instants in time.

   One or more initial Scene Description streams and the corresponding
   OD stream are pointed to by an initial object descriptor (IOD). In
   this context the IOD needs to be made available to the receivers
   through some out-of-band means that are out of scope of this payload
   specification. However in the context of transport on IP networks it
   is defined in a separate document [9]. Note that for applications
   that only use audio and/or video this payload format can also be
   used without IOD and OD streams (decoder configuration is then
   transported as MIME parameters, see section 4.1).

   The Compression Layer organizes the ESs in Access Units (AU), the
   smallest elements that can be attributed individual timestamps. The
   Access Unit concept defines the boundary between media specific
   processing and delivery specific processing. That is to say,
   transport should not depend on the nature of the media data but only
   on AU properties.

   The Sync Layer (SL), which primarily provides the synchronization
   between streams, defines a homogeneous encapsulation of ESs carrying
   media or control data (ODs, BIFS). Integer or fractional AUs are
   then encapsulated in SL packets and in the following we will
   describe this payload format as transporting SL packets, although in
   many cases SL packet payloads are actually (entire) Access Unit
   payloads i.e. encoded media frames. All consecutive data from one
   stream is called an SL-packetized stream at this layer.
   The interface between the compression layer and the SL is called the
   Elementary Stream Interface (ESI). The ESI is informative i.e. it is
   extremely useful in order to define concepts and mechanisms but does
   not have to be implemented. For the same reason this draft describes
   the transport of SL packets i.e. Access Units or fragments thereof.
   It is important to note however that an SL stream can be configured
   so that SL packets are reduced to the media (compressed) data and in
   that case implementations do not need to be aware of the SL at all.

   The Delivery Layer in MPEG-4 consists of the Delivery Multimedia
   Integration Framework (DMIF) defined in ISO/IEC 14496-6 [4]. This
   layer is media unaware but delivery technology aware. It provides
   transparent access to and delivery of content irrespective of the
   technologies used. The interface between the SL and DMIF is called
   the DMIF Application Interface (DAI). It offers content location
   independent procedures for establishing MPEG-4 sessions and access
   to transport channels. The specification of this payload format is
   considered as a part of the MPEG-4 Delivery Layer.

   media aware       +-----------------------------------------+
   delivery unaware  |            COMPRESSION LAYER            |
   14496-2 Visual    |streams from as low as Kbps to multi-Mbps|
   14496-3 Audio     +-----------------------------------------+

                                                      Elementary
                                                      Stream
   ===================================================Interface
                                                      (ESI)

                     +-------------------------------------------+
   media and         |                SYNC LAYER                 |
   delivery unaware  | manages elementary streams, their synch-  |
   14496-1 Systems   | ronization and hierarchical relations     |
                     +-------------------------------------------+

                                                       DMIF
                                                       Application
   ====================================================Interface
                                                       (DAI)

                     +-------------------------------------------+
   delivery aware    |              DELIVERY LAYER               |
   media unaware     |provides transparent access to and delivery|
   14496-6 DMIF      | of content irrespective of delivery       |
                     |                technology                 |
                     +-------------------------------------------+

           Figure 1: Conceptual MPEG-4 terminal architecture

1.2 MPEG-4 Elementary Stream Data Packetization

   The ESs from the encoders are fed into the SL with indications of AU
   boundaries, random access points, desired composition time and the
   current time.

   The Sync Layer fragments the ESs into SL packets, each containing a
   header that encodes information conveyed through the ESI. If the AU
   is larger than an SL packet, subsequent packets containing remaining
   parts of the AU are generated with subset headers until the complete
   AU is packetized.

   The syntax of the Sync Layer is configurable and can be adapted to
   the needs of the stream to be transported. This includes the
   possibility to select the presence or absence of individual syntax
   elements as well as configuration of their length in bits.
   The configuration for each individual stream is conveyed in a
   SLConfigDescriptor, which is an integral part of the ES Descriptor
   for this stream. The MPEG-4 SLConfigDescriptor, being configuration
   information, is not carried by the media stream itself but is rather
   transported via an ObjectDescriptor Stream encoded using the MPEG-4
   Object Description framework. This can be done in a separate stream
   using this payload format (see section 5.2 for details). The
   SLConfigDescriptor MAY also be transported by other means (for
   example as a parameter, see section 4.1). Finally, streams for which
   the SL packet headers are completely empty (or fully map into the
   RTP headers) can also be transported using this payload format; in
   these cases the Sync Layer can be seen as a purely conceptual
   construction that does not have to be implemented at all. Since only
   the knowledge of the decoder configuration is then needed, it MAY
   also be transported as a parameter, as described in section 4.1.

2. Analysis of the carriage of MPEG-4 over IP

   When transporting MPEG-4 audio and video, applications may or may
   not require the use of MPEG-4 systems. To achieve the highest level
   of interoperability between all MPEG-4 applications, it is desirable
   that (a) in both cases the same MPEG-4 transport format can be used
   and that (b) receivers that have no MPEG-4 system knowledge can
   easily skip the MPEG-4 system specific information, if any.

   RTP is perfectly suitable to transport MPEG-4 audio and MPEG-4
   video, but when using MPEG-4 systems a problem arises from the fact
   that both RTP and MPEG-4 systems contain a synchronization layer.
   In particular, the RTP header duplicates some of the information
   provided in SL packet headers such as the composition timestamps
   (CTSs) and the marker bit that signals the end of access units.

   To avoid unnecessary overhead and potential interoperability risks
   when transporting MPEG-4 systems, it is desirable to remove the
   redundancy between the SL packet header and the RTP packet header.
   To be independent of the use of MPEG-4 systems, synchronization can
   rely on the parameters provided in the RTP header.

   In case SL headers are used, the redundant fields are removed from
   the SL header, producing "reduced SL headers". The remaining
   information from the SL header, if any, is contained inside the RTP
   packet payload, together with the SL packet payload.

   The combination of RTP packet headers and reduced SL packet headers
   can be used to logically map the RTP packets to complete SL packets.

   Some of the information contained in the reduced SL headers is also
   useful for transport over RTP when MPEG-4 systems is not used.

   For that reason the information in the "reduced" SL headers is split
   into "general useful information" and "MPEG-4 systems only
   information".

   The "general useful information", hereinafter called Mapped SL
   Packet Header (MSLH), is carried by a number of fields configurable
   using parameters defined in section 4.1; all receivers MUST parse
   these fields.

   The "MPEG-4 systems only information", if any, is contained in a
   reduced SL header, hereinafter called Remaining SL Packet Header
   (RSLH), also configured using parameters (see section 4.1) and
   preceded by a length field, so that non-MPEG-4-system devices MAY
   skip this information.

   This is depicted in figure 2.

                           <----------SL Packet-------->

                           +---------------------------+
                           | SL Packet   | SL Packet   |
                           | Header      | Payload     |
                           +---------------------------+
                                  |                  |
                                  |                  |
         +-------------+----------+---+              |
         |             |              |              |
         V             V              V              V
   +-----------+ +-----------+ +-------------+ +-----------+
   |RTP Packet | | Mapped SL | | Remaining SL| | SL Packet |
   |  Header   | |  Header   | |   Header    | |  Payload  |
   +-----------+ +-----------+ +-------------+ +-----------+

                 <----RTP Packet Payload------------------->

           Figure 2: Mapping of SL Packet into RTP packet

   When the configuration is such that SL packet headers map directly
   to RTP headers this process of mapping SL packet headers is purely
   conceptual. For example this RTP payload format has been designed so
   that it is by default configured to be identical to RFC 3016 for the
   recommended MPEG-4 video configurations (see section 5.5). Hence
   receivers that comply with this payload specification can decode
   such RTP payload without knowledge about the Sync Layer (see the
   example in Appendix.1). In a similar fashion MPEG-4 audio (see
   Appendix for examples) can be transported without explicit use of
   the Sync Layer.

3. Payload Format

   The RTP Payload corresponds to an integer number of SL packets.

   If multiple SL packets are transported in each RTP packet, they MUST
   be in decoding order, i.e.:
   i) decodingTimeStamp order, if present
   ii) packetSequenceNumber order, if present
   iii) Implicit decoding order in all other cases.

   The SL Packet Headers are transformed into RSLH with some fields
   extracted to be mapped in the RTP header and others extracted to be
   mapped in the corresponding MSLH. The SL Packet Payload is
   unchanged.

   This payload format has two modes. The "SingleSL" mode is a mode
   where a single SL packet is transported per RTP packet.
   The "MultipleSL" mode is a mode where more than one SL packet may be
   transported per RTP packet. The default mode is the Single-SL
   mode. The mode can be set to Multiple-SL by adding a non-zero
   ConstantSize or SizeLength parameter (see section 4.1).

   RTP Packets SHOULD be sent in the SL stream order (as defined
   above). In case of interleaving the first SL packet of each RTP
   packet is used as reference, as in the following examples of RTP
   packets containing interleaved SL packets.

      This sequence is correct:    [0,2,4][1,3,5]
      This sequence is correct:    [0,3,6][1,2][4,5]
      This sequence is correct:    [0,3,6][1,4][2,5]
      This sequence is prohibited: [0,4,2][1,5,3]
      This sequence is prohibited: [1,3,5][0,2,4]
      This sequence is prohibited: [0,3,6][2,5][1,4]

   The size (or number) of the SL packet(s) SHOULD be adjusted such
   that the resulting RTP packet is not larger than the path-MTU. To
   handle larger packets, this payload format relies on lower layers
   for fragmentation, which may not be desirable.

3.1 RTP Header Fields Usage

   Payload Type (PT): The assignment of an RTP payload type for this
   new packet format is outside the scope of this document, and will
   not be specified here. It is expected that the RTP profile for a
   particular class of applications will assign a payload type for this
   encoding, or if that is not done then a payload type in the dynamic
   range shall be chosen.

   Marker (M) bit: The M bit is set to 1 when all SL packets in the RTP
   packet are Access Unit ends i.e. the M bit maps to the Sync Layer
   accessUnitEndFlag.

   Specifically the M bit is set to 0 when the RTP packet contains one
   or more Access Unit fragments that are not Access Unit ends, and the
   M bit is set to 1 for RTP packets that contain either:
   . A single complete Access Unit
   . The last fragment of an Access Unit
   . Several complete Access Units
   . Several last fragments of Access Units
   . A mix of complete Access Units and last fragments of Access Units

   Therefore for streams where all SL packets are complete Access Units
   the M bit is 1 for all RTP packets.

   Extension (X) bit: Defined by the RTP profile used.

   Sequence Number: The RTP sequence number should be generated by the
   sender with a constant random offset and does not have to be
   correlated to any (optional) MPEG-4 SL sequence numbers.

   Timestamp: Set to the value in the compositionTimeStamp field of the
   first SL packet in the RTP packet, if present. If
   compositionTimeStamp is less than 32 bits long, the MSBs of
   timestamp MUST be set to zero.

   Although it is available from the SL configuration data, the
   resolution of the timestamp may need to be conveyed explicitly
   through some out-of-band means to be used by network elements that
   are not MPEG-4 aware.

   If compositionTimeStamp is more than 32 bits long, this payload
   format cannot be used.

   In all cases, the sender SHALL always make sure that RTP time stamps
   are identical only for RTP packets transporting fragments of the
   same Access Unit.

   In case compositionTimeStamp is not present in the current SL
   packet, but has been present in a previous SL packet, the reason is
   that this is the same Access Unit that has been fragmented;
   therefore the same timestamp value MUST be taken as RTP timestamp.

   If compositionTimeStamp is never present in SL packets for this
   stream, the RTP packetizer SHOULD convey a reading of a local clock
   at the time the RTP packet is created.

   According to RFC1889 [5, Section 5.1] timestamps are recommended to
   start at a random value for security reasons.
   However, a receiver is then in the general case not able to
   reconstruct the original MPEG-4 Time Stamps (CTS, DTS, OCR), which
   can be of use for applications where streams from multiple sources
   are to be synchronized. Therefore the usage of such a random offset
   SHOULD be avoided.

   Note that since RTP devices may re-stamp the stream, all time stamps
   inside of the RTP payload (CTS and DTS in MSLH, OCR in RSLH) MUST be
   expressed as a difference from the RTP time stamp. Since this
   subtraction may lead to negative values, the offset MUST be encoded
   as a two's complement signed integer in network byte order. Note
   that these offsets (deltas) typically require far fewer bits to
   encode than the original time stamps, which is another
   justification.

   When startCompositionTimeStamp is signaled in the SLConfigDescriptor
   the RTP time stamps MUST start with this value.

   SSRC, CC and CSRC fields are used as described in RFC 1889 [5].

   RTCP SHOULD be used as defined in RFC 1889 [5].

   RTP timestamps in RTCP SR packets: according to the RTP timing
   model, the RTP timestamp that is carried into an RTCP SR packet is
   the same as the compositionTimeStamp that would be applied to an RTP
   packet for data that was sampled at the instant the SR packet is
   being generated and sent. The RTP timestamp value is calculated from
   the NTP timestamp for the current time, which also goes in the RTCP
   SR packet. To perform that calculation, an implementation needs to
   periodically establish a correspondence between the CTS value of a
   data packet and the NTP time at which that data was sampled.

3.2 RTP payload structure

   The packet payload structure consists of 3 byte-aligned sections.

   The first section is the MSLHSection and contains Mapped SL Packet
   Headers (MSLH). The MSLH structure is described in 3.3. In the
   Single-SL mode this section is empty by default.

   The second section is the RSLHSection and contains Remaining SL
   Headers (RSLH). The RSLH structure is described in 3.5. By default
   this section is empty.

   The last section (SLPPSection) contains the SL packet payloads. This
   section is never empty.

   The Nth MSLH in the MSLHSection, the Nth RSLH in the RSLHSection and
   the Nth SL packet payload in the SLPPSection correspond to the Nth
   SL packet transported by the RTP packet.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |V=2|P|X|  CC   |M|     PT      |       sequence number         |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                           timestamp                           |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |           synchronization source (SSRC) identifier            |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   :            contributing source (CSRC) identifiers             :
   +=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+
   |                                                               |
   |                  MSLHSection (byte aligned)                   |
   |                                                               |
   |                               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                               |                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+                               |
   |                                                               |
   |                  RSLHSection (byte aligned)                   |
   |                                                               |
   |               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |               |                                               |
   +-+-+-+-+-+-+-+-+                                               |
   |                                                               |
   |                  SLPPSection (byte aligned)                   |
   |                                                               |
   |                               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                               :...OPTIONAL RTP padding        |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

                  Figure 3: An RTP packet for MPEG-4

3.3 MSLHSection structure

   If the MSLHSection consumes a non-integer number of bytes, up to 7
   zero-valued padding bits MUST be inserted at the end in order to
   achieve byte-alignment.

   In the Single-SL mode the MSLHSection consists of a single MSLH.

   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | MSLH (x bits)   : padding bits|
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

         Figure 4: MSLHSection structure in Single-SL mode

   In the Multiple-SL mode this section consists of a 2-byte field
   giving the size in bits (in network byte order) of the following
   block of bit-wise concatenated MSLHs.

   This size field is absent in the Single-SL mode not because it is
   not needed (which would be a minor gain) but for compatibility with
   RFC 3016.

   This size field is also absent when the value would always be zero
   because the MSLH is always empty, which may happen when a constant
   size is signaled using ConstantSize.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |   MSLH section size in bits   |      MSLH     |     etc       |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+               |
   |              as many bit-wise concatenated MSLHs              |
   |               as SL packets in this RTP packet                |
   |                               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                               :         padding bits          |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

         Figure 5: MSLHSection structure in Multiple-SL mode

3.4 MSLH structure

   The Mapped SL Packet Header content depends on parameters (as
   described in section 4.1); by default it is empty for the Single-SL
   mode and, except when ConstantSize is signaled, contains at least
   the PayloadSize field in the Multiple-SL mode.

   When all options are used the MSLH structure is given in figure 6.

   +============================+
   |PayloadSize                 |
   +----------------------------+
   |Index or IndexDelta         |
   +----------------------------+
   |CTSFlag                     |
   +----------------------------+
   |CTSDelta                    |
   +----------------------------+
   |DTSFlag                     |
   +----------------------------+
   |DTSDelta                    |
   +============================+

         Figure 6: Mapped SL Packet Header (MSLH) structure

   In the general case a receiver can only discover the size of an MSLH
   by parsing it, since for example the presence of CTSDelta is
   signaled by the value of CTSFlag.

3.4.1 Fields of MSLH

   PayloadSize: Indicates the size in bytes of the associated SL Packet
   Payload, which can be found in the SLPPSection of the RTP packet.
   The length in bits of this field is signaled by the SizeLength
   parameter (see section 4.1).

   There is one exception: when the RTP packet contains a single
   SL packet the PayloadSize field SHALL contain the size of the entire
   corresponding Access Unit, for two reasons. Firstly, the size of the
   fragment is not needed when there is only one fragment. Secondly,
   this is useful in order to detect that a full Access Unit has been
   received after the loss of a packet carrying M bit set to 1.

   Index, IndexDelta: Encodes the packetSequenceNumber (serial number)
   of the SL Packet. When making streams specifically for transport
   with this payload format IndexDelta is useful for interleaving (see
   section 3.8). Since a mapping of packetSequenceNumber to RTP
   sequence number is not possible in the Multiple-SL mode there is no
   requirement for a correspondence.

   Index is optional and, if present, appears for the first SL packet
   in an RTP packet.

   The length in bits of the Index field is defined by the IndexLength
   parameter (see section 4.1).

   IndexDelta is optional and, if present, appears for subsequent
   (non-first) SL packets in an RTP packet.

   The length in bits of the IndexDelta field is defined by the
   IndexDeltaLength parameter (see section 4.1).

   If the parameter IndexDeltaLength is defined, non-first SL packets
   inside an RTP packet have their packetSequenceNumber encoded as a
   difference (thus the name IndexDelta). This difference is relative
   to the previous SL packet in the RTP packet according to (with
   i>=0):

      packetSequenceNumber(0) = Index(0)
      packetSequenceNumber(i+1) = packetSequenceNumber(i) +
                                  IndexDelta(i+1) + 1

   If the parameter IndexDeltaLength is not defined the default value
   is zero and then the IndexDelta field is not present for non-first
   SL packets. Nevertheless receivers SHALL then apply the above
   formula with IndexDelta equal to zero. In other words by default
   packetSequenceNumber is incremented by 1 for each SL packet in one
   RTP packet.

   CTSFlag (1 bit): Indicates whether the CTSDelta field is present. A
   value of 1 indicates that the CTSDelta field is present, a value of
   0 that it is not present.

   If CTSDeltaLength is not zero, CTSFlag is present in all MSLH
   regardless of whether the SL packet is an Access Unit start or not.

   CTSDelta (CTSDeltaLength bits): Specifies the value of the CTS as a
   two's complement offset (delta) from the timestamp in the RTP header
   of the RTP packet. The length in bits of each CTSDelta field is
   specified by the CTSDeltaLength parameter (see section 4.1).

   The CTSDelta field is present if CTSFlag is 1.

   For the first MSLH of each RTP packet CTSFlag is always 0, since the
   composition time stamp of the first SL packet in the RTP packet is
   mapped to the RTP time stamp. In all cases the sender MUST remove
   the compositionTimeStamp from the RSLH.
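
   As an informal illustration of the Index/IndexDelta formula and of
   the CTSDelta offset described above, a receiver-side reconstruction
   might look like the following minimal sketch. The function names are
   hypothetical and not part of this payload format. The sketch assumes
   that the fields have already been extracted from the MSLHs as
   unsigned integers, ignores wrap-around of packetSequenceNumber, and
   assumes that the reconstructed CTS wraps modulo 2^32 like the RTP
   timestamp.

```python
def sequence_numbers(index, index_deltas):
    """packetSequenceNumber of every SL packet in one RTP packet.

    index        -- Index field of the first SL packet
    index_deltas -- IndexDelta fields of the following SL packets
                    (all zero when IndexDeltaLength is not defined)
    """
    numbers = [index]
    for delta in index_deltas:
        # packetSequenceNumber(i+1) = packetSequenceNumber(i) + IndexDelta(i+1) + 1
        numbers.append(numbers[-1] + delta + 1)
    return numbers


def signed_delta(field, length_bits):
    """Interpret an unsigned field of length_bits as two's complement."""
    if length_bits <= 0 or field < 0 or field >> length_bits:
        raise ValueError("field does not fit in length_bits")
    sign_bit = 1 << (length_bits - 1)
    return field - (1 << length_bits) if field & sign_bit else field


def composition_timestamp(rtp_timestamp, cts_delta_field, cts_delta_length):
    """CTS of an SL packet whose CTSFlag is 1.

    Assumption of this sketch: the result wraps modulo 2^32.
    """
    delta = signed_delta(cts_delta_field, cts_delta_length)
    return (rtp_timestamp + delta) & 0xFFFFFFFF
```

   With IndexDelta values of 2, the first function yields the numbering
   of an interleaved packet such as [0,3,6]; with no IndexDelta fields
   present the caller passes zeros and the numbers simply increase
   by 1.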

DTSFlag (1 bit): Indicates whether the DTSDelta field is present. A
value of 1 indicates that DTSDelta is present, a value of 0 that it
is not present.

If DTSDeltaLength is not zero, DTSFlag is present in all MSLH
regardless of whether the SL packet is an Access Unit start or not;
the receiver needs this flag in order to reconstruct the
decodingTimeStampFlag of SL Headers.

DTSDelta (DTSDeltaLength bits): Encodes (compositionTimeStamp -
decodingTimeStamp) for the same SL packet (always positive). The
length in bits of each DTSDelta field is specified by the
DTSDeltaLength parameter (see section 4.1).

The DTSDelta field appears when DTSFlag is 1. The sender MUST always
remove the decodingTimeStamp from the RSLH.

If DTSDelta is zero, i.e. if decodingTimeStamp equals
compositionTimeStamp, then DTSFlag MUST be set to 0 and no DTSDelta
field SHALL be present.

3.4.2 Relationship between sizes of MSLH fields and parameters

The relationship between a Mapped SL Packet Header and the related
parameters is as follows:

+===========================+=================================+
| Fields of MSLH            | Number of bits (parameters)     |
+===========================+=================================+
| PayloadSize               | SizeLength                      |
+---------------------------+---------------------------------+
| Index                     | IndexLength                     |
+---------------------------+---------------------------------+
| IndexDelta                | IndexDeltaLength                |
+---------------------------+---------------------------------+
| CTSFlag                   | 1 If (CTSDeltaLength > 0)       |
+---------------------------+---------------------------------+
| CTSDelta                  | CTSDeltaLength If (CTSFlag==1)  |
+---------------------------+---------------------------------+
| DTSFlag                   | 1 If (DTSDeltaLength > 0)       |
+---------------------------+---------------------------------+
| DTSDelta                  | DTSDeltaLength If (DTSFlag==1)  |
+---------------------------+---------------------------------+

Table 1: Relationship between MSLH field size and parameters

3.5 RSLHSection structure

This section consists of a field (RSLHSectionSize) giving the size
in bits of the following block of bit-wise concatenated RSLHs.

If the section consumes a non-integer number of bytes, up to 7 zero
padding bits MUST be inserted at the end in order to achieve
byte-alignment.

+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| RSLHSectionSize (RSLHSectionSizeLength bits)| RSLH (variable  |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+                 |
| number of bits)                                               |
|                                                               |
|         +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|         | RSLH (variable number of bits)                      |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| etc                                                           |
| as many bit-wise concatenated RSLHs                           |
| as SL Packets in this RTP packet                              |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| RSLH (variable number of bits)                                |
|                                                 +-+-+-+-+-+-+-+
|                                                 : padding bits|
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

Figure 7: RSLHSection structure

The length in bits of the RSLHSectionSize field is
RSLHSectionSizeLength and is specified with a default value of zero,
indicating that the whole RSLHSection is absent. Compatibility with
RFC 3016 requires that the RSLHSection be empty, including the
RSLHSectionSize field. This is the reason why there is such a
variable length with a default value indicating absence of the
RSLHSectionSize field.
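
As an illustration of the size field and the byte-alignment padding
described in this section, the following Python sketch reads
RSLHSectionSize and computes how many bytes the whole RSLHSection
occupies. The function names are hypothetical. The sketch assumes
that bits are packed most significant bit first and that the padding
is computed over the size field plus the concatenated RSLHs, which is
one reading of Figure 7 and not a normative statement.

```python
def read_rslh_section_size(section, size_field_length_bits):
    """Read RSLHSectionSize from the start of `section` (bytes).

    Assumption: the field is packed MSB first, starting at the first
    bit of the section.
    """
    n_bytes = (size_field_length_bits + 7) // 8
    if len(section) < n_bytes:
        raise ValueError("truncated RSLHSection")
    value = int.from_bytes(section[:n_bytes], "big")
    # Drop the bits that follow the size field in the last byte read.
    return value >> (n_bytes * 8 - size_field_length_bits)


def rslh_section_length_bytes(section_size_bits, size_field_length_bits):
    """Return the byte length of the RSLHSection, padding included."""
    if size_field_length_bits == 0:
        # Default value: the whole RSLHSection is absent.
        return 0
    total_bits = size_field_length_bits + section_size_bits
    return (total_bits + 7) // 8  # round up to a byte boundary
```

With RSLHSectionSizeLength equal to 6 and an RSLHSectionSize of 21
bits, the section occupies 27 bits plus 5 padding bits, i.e. 4 bytes.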

+=================================+===============================+
| Fields of RSLHSection           | Number of bits                |
+=================================+===============================+
| RSLHSectionSize                 | RSLHSectionSizeLength         |
+---------------------------------+-------------------------------+
| all bit-wise concatenated RSLHs | RSLHSectionSize               |
+---------------------------------+-------------------------------+

Table 2: Sizes in bits inside RSLHSection

Parsing of the bit-wise concatenated RSLHs requires MPEG-4 system
awareness; specifically, it requires understanding the MPEG-4
Synchronization Layer (SL) syntax and the modifications to this
syntax described in the next section.

However, thanks to the RSLHSectionSize field, non-MPEG-4-system
receivers MAY skip this part by rounding up RSLHSectionSize/8 to the
next integer number of bytes.

3.6 RSLH structure

A Remaining SL Packet Header (RSLH) is what remains of an SL header
after modifications for mapping into this payload format.

The following modifications of the SL packet header MUST be applied.
The other fields of the SL packet header MUST remain unchanged but
are bit-shifted to fill in the gaps left by the operations specified
below.

3.6.1 Removal of fields

The following SL Packet Header fields -if present- are removed since
they are mapped either in the RTP header or in the corresponding
MSLH:
   . compositionTimeStampFlag
   . compositionTimeStamp
   . decodingTimeStampFlag
   . decodingTimeStamp
   . packetSequenceNumber
   . AccessUnitEndFlag (in Single-SL mode only)

The AccessUnitEndFlag, when present for a given stream, MUST be
removed from every RSLH when using the Single-SL mode since it has
the same meaning as the Marker bit (and for compatibility with RFC
3016).
However, when using the Multiple-SL mode, AccessUnitEndFlag
MUST NOT be removed since it is useful to signal individual AU ends.

3.6.2 Mapping of OCR

Furthermore, if the SL Packet header contains an OCR, then this field
is encoded in the RSLH as a two's complement difference (delta)
exactly like a compositionTimeStamp or a decodingTimeStamp in the
MSLH. The length in bits of this difference is indicated by the
OCRDeltaLength parameter (see section 4.1).

With this payload format OCRs MUST have the same clock resolution as
Time Stamps.

If compositionTimeStamp is not present for an SL packet that has an
OCR, then the OCR SHALL be encoded as a difference to the RTP time
stamp.

3.6.3 Degradation Priority

For streams that use the optional degradationPriority field in the
SL Packet Headers, only SL packets with the same degradation
priority SHALL be transported by one RTP packet so that components
may dispatch the RTP packets according to appropriate QoS or
protection schemes. Furthermore, only the first RSLH of one RTP
packet SHALL contain the degradationPriority field since it would
otherwise be redundant.

3.7 SLPPSection structure

The SLPPSection (SL Packet Payload Section) contains the
concatenated SL Packet Payloads. By definition SL Packet Payloads
are byte aligned.

For efficiency SL packets do not carry their own payload size. This
is not an issue for RTP packets that contain a single SL Packet.

However, in the Multiple-SL mode the size of each SL packet payload
MUST be available to the receiver.

If the SL packet payload size is constant for a stream, the size
information SHOULD NOT be transported in the RTP packet. However, in
that case it MUST be signaled using the ConstantSize parameter (see
section 4.1).
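
As an illustration, a receiver that knows the payload sizes can cut
the SLPPSection into individual SL Packet Payloads. The Python sketch
below takes the sizes either from the ConstantSize parameter or from
the PayloadSize fields of the MSLHs (section 3.4.1). The function
name and its arguments are hypothetical. The sketch assumes that with
ConstantSize every payload, including the last one, has exactly that
size, and it honours the exception of section 3.4.1 that a lone
PayloadSize carries the Access Unit size rather than the fragment
size.

```python
def split_slpp_section(section, constant_size=0, payload_sizes=None):
    """Split the SLPPSection (bytes) into SL Packet Payloads.

    constant_size -- value of the ConstantSize parameter, 0 if absent
    payload_sizes -- PayloadSize values from the MSLHs, None if absent
    """
    if constant_size and payload_sizes is not None:
        raise ValueError("ConstantSize and PayloadSize are exclusive")
    if constant_size:
        # Assumption: the section is an exact multiple of ConstantSize.
        if len(section) % constant_size:
            raise ValueError("section is not a multiple of ConstantSize")
        return [section[i:i + constant_size]
                for i in range(0, len(section), constant_size)]
    if payload_sizes is None or len(payload_sizes) == 1:
        # Single SL packet: a lone PayloadSize gives the size of the
        # whole Access Unit, so the payload is simply the section.
        return [section]
    if sum(payload_sizes) != len(section):
        raise ValueError("PayloadSize values do not match the section")
    payloads, offset = [], 0
    for size in payload_sizes:
        payloads.append(section[offset:offset + size])
        offset += size
    return payloads
```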

If the SL packet payload size is variable, then the size of each SL
packet payload MUST be indicated in the corresponding MSLH. In order
to do so the MSLH MUST contain a PayloadSize field. The number of
bits on which this PayloadSize field is encoded MUST be indicated
using the SizeLength parameter (see section 4.1).

The absence of both ConstantSize and SizeLength indicates the
Single-SL mode, i.e. that a single SL packet is transported in each
RTP packet for that stream.

+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| SLPP (variable number of bytes)                           |
|                                                           |
|                                                           |
|                         +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                         | SLPP (variable number of bytes) |
+-+-+-+-+-+-+-+-+-+-+-+-+-+                                 |
|                                                           |
|                                                           |
|                                                           |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| etc                                                       |
| as many byte-wise concatenated SLPPs                      |
| as SL Packets in this RTP packet                          |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

Figure 8: SLPPSection structure

3.8 Interleaving

SL Packets MAY be interleaved. Senders MAY perform interleaving.
Receivers MUST support interleaving.

When interleaving of SL packets is used it SHALL be implemented
using the Index and IndexDelta fields of MSLH.

The conjunction of RTP sequence number and Index, IndexDelta can
produce a quasi-unique identifier for each SL packet so that a
receiver can unambiguously reconstruct the original order even in
case of out-of-order packets, packet loss or duplication.

However, implementors of receivers must take care that when
IndexLength is small, Index will roll over often; for that reason
timestamps SHOULD be used as a basis for implementation of
de-interleaving, i.e. the reordering algorithm should consider
timestamps and IndexDelta first and use Index only when CTS are not
available.
Symmetrically, senders MUST either use properly large
values for IndexLength or use small values only when CTS are either
present in MSLH or can be otherwise unambiguously computed for each
SL packet (for example audio streams as in Appendix.5).

The AUSequenceNumber field of the SL header MUST NOT be used for
interleaving since firstly it may collide with the Scene Description
Carousel usage described in section 4.1 and secondly it is not
visible to non-MPEG-4 system receivers.

3.9 Fragmentation Rules

This section specifies rules for senders in order to prevent media
decoding difficulties at the receiver end.

MPEG-4 Access Units are the default fragments for MPEG-4 bitstreams
and SHOULD be mapped directly into RTP packets of this format with
two exceptions:
   - Access Units larger than the MTU
   - When using interleaving for better packet loss resilience.

In all cases Access Unit start MUST be aligned with SL packet start.

This section gives rules to apply when performing Access Unit
fragmentation.

Some MPEG-4 codecs define optional syntax for Access Unit
sub-entities (fragments) that are independently decodable for error
resilience purposes. Examples are Video Packets for video and Error
Sensitivity Categories (ESC) for audio. This always corresponds to
specific bitstream syntax, which is signaled in the
DecoderSpecificInfo inside the DecoderConfig in SLConfig, and/or
using the corresponding parameters as described in section 4.1.
Therefore encoders and decoders are both aware whether they are
operating in such a mode or not (however, since this codec
configuration is an opaque data block, this is not explicitly
signaled by this payload format).

If not operating in such a mode it is obvious that the decoder has
to skip packets after a loss until an Access Unit start is received.
Similarly, decoder implementations that do not implement robust
decoding of Access Unit fragments have to discard all packets after
a packet loss until an Access Unit start is received. In the same
way, decoder implementations that do not implement re-synchronization
at any Access Unit start have to discard all packets after a packet
loss until a Random Access Point Access Unit is received. These are
all obvious things that a good implementation would do.

However, serious problems would arise for decoder implementations
that try to restart decoding after a packet loss if independently
decodable fragments are signaled (in the decoder configuration) but
the fragments actually received are not independently decodable.
This happens when the RTP sender has made RTP packets on different
boundaries than the fragments provided by the encoder, so this issue
applies to the interface between the encoder and the RTP sender and
to the RTP sender component itself. The decoder has in general no
way to detect such a faulty fragment.

For this reason the following rules must apply to SL streams that
are specifically made for transport with this payload format:

SL packets SHOULD be codec-semantic entities in the spirit of ALF,
i.e. either complete Access Units or fragments of Access Units that
are independently decodable. Specifically, when a given codec has an
optional syntax for independently decodable Access Unit fragments,
this option SHOULD be used.

Furthermore, when streams are generated using independently decodable
Access Unit fragments, these Access Unit fragments MUST be mapped
one-to-one into SL packets. Consequently independently decodable
Access Unit fragments MUST NOT be split across several SL packets
and therefore MUST NOT be split across several RTP packets.
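
A sender that follows these rules can only place whole fragments in
an RTP packet. The following Python sketch shows one possible way to
do so, by greedily grouping consecutive fragments without ever
splitting one. It is not a packetization algorithm defined by this
payload format. The function name is hypothetical, and max_payload is
assumed to be the byte budget left for the SLPPSection once the RTP
header and the header sections have been accounted for.

```python
def pack_fragments(fragment_sizes, max_payload):
    """Group fragment indices into RTP packets without splitting any.

    fragment_sizes -- size in bytes of each independently decodable
                      fragment (one SL packet each), in stream order
    max_payload    -- assumed byte budget for SL packet payloads
    Returns a list of lists of fragment indices, one list per packet.
    """
    packets, current, used = [], [], 0
    for index, size in enumerate(fragment_sizes):
        if size > max_payload:
            # The fragment cannot be sent whole; the encoder should
            # have produced smaller fragments.
            raise ValueError("fragment %d exceeds the budget" % index)
        if current and used + size > max_payload:
            packets.append(current)
            current, used = [], 0
        current.append(index)
        used += size
    if current:
        packets.append(current)
    return packets
```

For fragments of 400, 500, 700 and 300 bytes and a budget of 1000
bytes, the sketch produces two RTP packets carrying fragments 0-1 and
2-3 respectively.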

For example, an MPEG-4 audio stream encoded using the ESC syntax MUST
NOT split one ESC across 2 RTP packets.

This rule is relaxed when using MPEG-4 Video Packets for two
reasons: firstly, Video Packets can be much larger than the typical
MTU, and secondly, all Video Packets start with a specific
resynchronization marker that can be unambiguously detected.
Therefore, for video streams using the Video Packet syntax, Video
Packets MAY be split across several SL packets although it is
strongly RECOMMENDED to always adapt the Video Packet size to fit
the MTU. A Video Packet start MUST always be aligned with an SL
packet start, except when a GOV is present, in which case the GOV
and the first Video Packet of the following VOP MUST be included in
the same SL packet.

4. Types and Names

This section describes the MIME types and names associated with this
payload format. Section 4.1 is intended for registration with IANA
as in RFC 2048.

This format may require additional information about the mapping to
be made available to the receiver. This is done using parameters
described in the next section. The absence of any of these fields is
equivalent to a field set to the default value, which is always
zero. The absence of any such parameters resolves into a default
"basic" configuration compatible with RFC 3016 for MPEG-4 video.

In the MPEG-4 framework the SL stream configuration information is
carried using the Object Descriptor. For compatibility with
receivers that do not implement the full MPEG-4 system specification
this information MAY also be signaled using parameters described
here. When such information is present both in an Object Descriptor
and as a parameter of this payload format it MUST be exactly the
same.
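
The "absent means zero" convention for the length parameters can be
sketched in Python as follows. The parser is hypothetical: it assumes
the semicolon-separated parameter=value form of section 4.2, treats
parameter names as case-sensitive, and only handles the integer
length and size parameters of section 4.1. It also applies the rule
that SizeLength and ConstantSize must not both be present, and that
one of them signals the Multiple-SL mode.

```python
LENGTH_PARAMETERS = (
    "DTSDeltaLength", "CTSDeltaLength", "OCRDeltaLength",
    "SizeLength", "ConstantSize", "IndexLength",
    "IndexDeltaLength", "RSLHSectionSizeLength",
)


def parse_parameters(text):
    """Parse 'name=value; name=value' and apply the zero defaults."""
    values = dict.fromkeys(LENGTH_PARAMETERS, 0)
    for item in text.split(";"):
        name, sep, value = item.strip().partition("=")
        name = name.strip()
        # Other parameters (StreamType, config, ...) are ignored by
        # this sketch.
        if sep and name in values:
            values[name] = int(value.strip())
    if values["SizeLength"] and values["ConstantSize"]:
        raise ValueError("SizeLength and ConstantSize are exclusive")
    return values


def is_multiple_sl(values):
    """True when the parameters signal the Multiple-SL mode."""
    return bool(values["SizeLength"] or values["ConstantSize"])
```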

For transport of MPEG-4 audio and video without the use of MPEG-4
systems, as well as to support non-MPEG-4 system receivers, it is
also possible to transport information on the profile and level of
the stream and on the decoder configuration. This is also described
in the next section.

Finally, this MIME type also defines a mode parameter and a profile
parameter that are intended for future derivations of this payload
format.

4.1 MIME type registration

MIME media type name: "video" or "audio" or "application"

"video" SHOULD be used for MPEG-4 Visual streams (i.e. video as
defined in ISO/IEC 14496-2 [2] and/or graphics as defined in ISO/IEC
14496-1 [1]) or MPEG-4 Systems streams that convey information
needed for an audio/visual presentation.

"audio" SHOULD be used for MPEG-4 Audio streams (ISO/IEC 14496-3) or
MPEG-4 Systems streams that convey information needed for an
audio-only presentation.

"application" SHOULD be used for MPEG-4 Systems streams
(ISO/IEC 14496-1) that serve other purposes than audio/visual
presentation, e.g. in some cases when MPEG-J streams are
transmitted.

MIME subtype name: mpeg4-generic

Required parameters: none

Optional parameters:

Mode:
The mode in which this specification is used. This specification
itself defines only the default mode (Mode=default). When the mode
parameter is not present the default mode SHALL be assumed. In the
default mode all parameters are optional and as defined here. Other
modes may be defined as needed in other RFCs. A mode MUST be a
subset of this specification. Specifically, when defining a mode care
MUST be taken that an implementation of this specification can
decode the payload format corresponding to this new mode.
For this
reason a mode MUST NOT specify new default values for MIME
parameters, and MIME parameters MUST be present (unless they have the
default value) even if this is redundant because the mode assigns
fixed values. A mode may additionally define that some MIME
parameters are required instead of optional, that some MIME
parameters have fixed values (or ranges), and that there are rules
restricting the usage (for example forbidding the carriage of
multiple AU fragments in the same RTP packet).

Profile:
The meaning of this parameter may be defined by a mode. This is
meant to be used in order to define sub-configurations of a given
mode, for example the maximum delay (and therefore the size of
buffers) induced by the usage of interleaving. Implementations of
this specification can ignore this parameter.

DTSDeltaLength:
The number of bits on which the DTSDelta field is encoded in MSLH.
The default value is zero and indicates the absence of DTSFlag and
DTSDelta in MSLH (the stream does not transport decodingTimeStamps).
A value larger than zero indicates that there is a DTSFlag in each
MSLH. Since decodingTimeStamp -if present- must be encoded as a
difference to the RTP time stamp, the DTSDeltaLength parameter MUST
be present in order to transport decodingTimeStamps with this
payload format.

CTSDeltaLength:
The number of bits on which the CTSDelta field is encoded in
(non-first) MSLH. The default value is zero and indicates the absence
of the CTSFlag and CTSDelta fields in MSLH. Non-zero values MUST NOT
be signaled in the Single-SL mode. Since compositionTimeStamps -if
present- must be encoded as a difference to the RTP time stamp, the
CTSDeltaLength parameter MUST be present in order to transport
compositionTimeStamps using this payload format (in the Multiple-SL
mode).
However, CTSDeltaLength SHOULD be set to zero (or not
signaled) for streams that have a constant Access Unit duration
(which can be explicitly signaled using the DurationFlag and
AccessUnitDuration field of SLConfigDescriptor).

OCRDeltaLength:
The number of bits on which the OCRDelta field is encoded in RSLH.
The default value is zero and indicates the absence of OCR for this
stream. Since objectClockReference -if present- must be encoded as a
difference to the RTP time stamp, the OCRDeltaLength parameter MUST
be present in order to transport objectClockReferences with this
payload format.

SizeLength:
The number of bits on which the PayloadSize field of MSLH is
encoded. The default value is zero and indicates the Single-SL mode
(unless ConstantSize is present). Simultaneous presence of this
parameter and ConstantSize is illegal. Either the SizeLength or
ConstantSize parameter MUST be present in order to signal the
Multiple-SL mode of this payload format.

ConstantSize:
The constant size in bytes of each SL Packet Payload for this
stream. The default value is zero and indicates variable SL Packet
Payload size (or the Single-SL mode if SizeLength is absent).
Simultaneous presence of this parameter and SizeLength is illegal.
Either the SizeLength or ConstantSize parameter MUST be present in
order to signal the Multiple-SL mode of this payload format. When
ConstantSize is present the PayloadSize of MSLH in the RTP packets
MUST NOT be present.

IndexLength:
The number of bits on which the Index is encoded in the first MSLH.
The default value is zero and indicates the absence of Index and
IndexDelta for all MSLHs.
Since packetSequenceNumber -if present-
must be mapped in MSLH, the IndexLength parameter MUST be present in
order to transport packetSequenceNumber with this payload format.

IndexDeltaLength:
The number of bits on which the IndexDelta is encoded in any
non-first MSLH. The default value is zero and indicates that
packetSequenceNumber MUST be incremented by one for each SL packet
in the RTP packet (see section 3.4.1). The IndexDeltaLength parameter
MUST be present when using interleaving with this payload format.

RSLHSectionSizeLength:
The number of bits that is used to encode the RSLHSectionSize field.
The default value is zero and indicates the absence of the whole
RSLHSection for all RTP packets of this stream.

SLConfigDescriptor:
A base-64 encoding of the SLConfigDescriptor. This SHALL be the
original SLConfigDescriptor and it SHALL be the same as the one
transported by the OD framework, if any.

profile-level-id:
A decimal representation of the MPEG-4 Profile Level indication
value. For audio this parameter indicates which MPEG-4 Audio tool
subsets are applied to encode the audio stream and is defined in
ISO/IEC 14496-1 [1]. For video this parameter indicates which MPEG-4
Visual tool subsets are applied to encode the video stream and is
defined in Table G-1 of ISO/IEC 14496-2 [2]. This parameter MAY be
used in the capability exchange or session setup procedure to
indicate the MPEG-4 Profile and Level combination of which the
relevant MPEG-4 media codec is capable. If this parameter is not
specified its default value is 1 (Simple Profile/Level 1) for video
(for compatibility with RFC 3016) and otherwise 0xFE (defined in
ISO/IEC 14496-1 [1] as being the generic default value).

Config:
A hexadecimal representation of an octet string that expresses the
media payload configuration. Configuration data is mapped onto the
octet string on an MSB-first basis. The first bit of the
configuration data SHALL be located at the MSB of the first octet.
In the last octet, zero-valued padding bits, if necessary, shall
follow the configuration data. For audio streams, config is the
audio object type specific decoder configuration data
AudioSpecificConfig() as defined in ISO/IEC 14496-3 [3]. For video
this expresses the MPEG-4 Visual configuration information, as
defined in subclause 6.2.1 Start codes of ISO/IEC 14496-2 [2], and
the configuration information indicated by this parameter SHALL be
the same as the configuration information in the corresponding MPEG-4
Visual stream, except for first-half-vbv-occupancy and
latter-half-vbv-occupancy, if they exist, which may vary in the
repeated configuration information inside an MPEG-4 Visual stream
(see 6.2.1 Start codes of ISO/IEC 14496-2).

StreamType:
The integer value that indicates the type of MPEG-4 stream that is
carried; its coding corresponds to the values of the streamType as
defined for the DecoderConfigDescriptor in ISO/IEC 14496-1.

Encoding considerations:
System bitstreams MUST be generated according to MPEG-4 System
specifications (ISO/IEC 14496-1). Video bitstreams MUST be generated
according to MPEG-4 Visual specifications (ISO/IEC 14496-2). Audio
bitstreams MUST be generated according to MPEG-4 Audio
specifications (ISO/IEC 14496-3). All SL streams MUST be generated
according to MPEG-4 Sync Layer specifications (ISO/IEC 14496-1
section 10); in order to read this format the SLConfigDescriptor may
be required.
These bitstreams are binary data and MUST be encoded for
non-binary transport (for Email, the Base64 encoding is sufficient).
This type is also defined for transfer via RTP. The RTP packets
MUST be packetized according to the RTP payload format defined in
RFC .

Security considerations:
As in RFC .

Interoperability considerations:
MPEG-4 provides a large and rich set of tools for the coding of
visual objects. For effective implementation of the standard,
subsets of the MPEG-4 tool sets have been provided for use in
specific applications. These subsets, called 'Profiles', limit the
size of the tool set a decoder is required to implement. In order to
restrict computational complexity, one or more 'Levels' are set for
each Profile. A Profile@Level combination allows:
   . a codec builder to implement only the subset of the standard he
     needs, while maintaining interoperability with other MPEG-4
     devices included in the same combination, and
   . checking whether MPEG-4 devices comply with the standard
     ('conformance testing').

A stream SHALL be compliant with the MPEG-4 Profile@Level specified
by the parameter "profile-level-id". Interoperability between a
sender and a receiver may be achieved by specifying the parameter
"profile-level-id" in MIME content, or by arranging in the
capability exchange/announcement procedure to set this parameter
mutually to the same value.

Published specification:
The specifications for MPEG-4 streams are presented in ISO/IEC
14496-1, 14496-2, and 14496-3. The RTP payload format is described
in RFC .

Applications that use this media type:
Multimedia streaming and conferencing tools, Internet messaging and
Email applications.
Also trans-galactic supra-relativistic
elementary particle hyperspace tunneling communication devices :-)

Additional information: none

Magic number(s): none

File extension(s):
None. A file format with the extension .mp4 has been defined for
MPEG-4 content but is not directly correlated with this MIME type,
whose sole purpose is RTP transport.

Macintosh File Type Code(s): none

Person & email address to contact for further information:
Authors of RFC .

Intended usage: COMMON

Author/Change controller:
Authors of RFC .

4.2 Concatenation of parameters

Multiple parameters SHOULD be expressed as a MIME media type string,
in the form of a semicolon-separated list of parameter=value pairs
(see examples in Appendix).

4.3 Usage of SDP

4.3.1 The a=fmtp keyword

It is assumed that one typical way to transport the above-described
parameters associated with this payload format is via an SDP [10]
message, for example transported to the client in reply to an RTSP
[13] DESCRIBE message or via SAP [14]. In that case the (a=fmtp)
keyword MUST be used as described in RFC 2327 [10, section 6]. The
syntax is then:

   a=fmtp:<format> <parameter name>=<value>

4.3.2 SDP example

The following is an example of SDP syntax for the description of a
session containing one MPEG-4 audio stream, one MPEG-4 video stream
and three MPEG-4 system streams, the first one being BIFS, the second
one OD and the third one IPMP. All are transported using this format
and the AVP profile [12]. Note that the video stream DTSDelta fields
are encoded on 4 bits in this example. See the Appendix for more
examples.

   o= ....
   i= ....
   c=IN IP4 123.234.71.112

   m=video 1034 RTP/AVP 97
   a=fmtp:97 StreamType=4;DTSDeltaLength=4
   a=rtpmap:97 mpeg4-sl

   m=audio 810 RTP/AVP 98
   a=fmtp:98 StreamType=5; profile-level-id=1; config=7866E7E6EF
   a=rtpmap:98 mpeg4-sl

   m=application 1234 RTP/AVP 99
   a=rtpmap:99 mpeg4-sl
   a=fmtp:99 StreamType=3;

   m=application 1236 RTP/AVP 99
   a=rtpmap:99 mpeg4-sl
   a=fmtp:99 StreamType=1;

   m=application 1238 RTP/AVP 99
   a=rtpmap:99 mpeg4-sl
   a=fmtp:99 StreamType=7;

5. Other issues

5.1 SL packetized stream reconstruction

The purpose of this section is to document how a receiver can
reconstruct a valid SL packetized stream. Since this format directly
transports SL packets this reconstruction is performed by reversing
the payload structure rules (section 3). We explicitly describe here
the most complex transformations.

In the following let (i) be the index of SL packets inside one RTP
packet (starting at zero for each RTP packet), let SLPacketHeader.x
denote field x of the reconstructed SL packet header, let MSLH.x
denote field x of the received MSLH, etc.

SLPacketHeader.packetSequenceNumber is restored from MSLH.Index and
MSLH.IndexDelta using:

if ( IndexLength == 0 ) { // or is absent
  if ( SLConfig.packetSeqNumLength == 0 ) {
    // this stream does not have SL packet sequence number
  }
  else {
    // illegal, normally the sender MUST map
    // SLPacketHeader.packetSequenceNumber in MSLH
    // and set a relevant IndexLength value;
    // otherwise it is unfortunately impossible for the receiver
    // to reconstruct the correct sequence
  }
}
else { // IndexLength is not zero
  if ( SLConfig.packetSeqNumLength == 0 ) {
    // the original SL stream does not have SL packet
    // sequence numbers, typically the sender inserted them
    // in order to implement interleaving at the RTP level;
    // they must be ignored for SL stream reconstruction
  }
  else {
    if ( i == 0 ) { // first SL packet in RTP packet
      SLPacketHeader.packetSequenceNumber(0) = MSLH.Index(0);
    }
    else { // remaining SL packets (i > 0)
      SLPacketHeader.packetSequenceNumber(i) =
        SLPacketHeader.packetSequenceNumber(i-1)
        + MSLH.IndexDelta(i)
        + 1;
    }
  }
}

All time stamps (CTS, DTS, OCR), when present, are restored from the
delta values. Time stamp flags (CTSFlag, DTSFlag) in MSLH are used
to reconstruct respectively the compositionTimeStampFlag and
decodingTimeStampFlag of SLPacketHeader.

if ( CTSDeltaLength == 0 ) { // or CTSDeltaLength is absent
  // CTS is not transported for this RTP stream
  if ( i == 0 ) { // first SL packet in RTP packet
    if ( SLConfig.useTimeStamps == 1 ) {
      if ( SLPacketHeader.accessUnitStartFlag == 1 ) {
        SLPacketHeader.compositionTimeStampFlag(0) = 1;
        SLPacketHeader.compositionTimeStamp(0) = RTP TimeStamp;
      }
      else {
        // ignore
      }
    }
    else {
      // empty
    }
  }
  else { // non-first SL packets in RTP packet
    if ( SLConfig.useTimeStamps == 1 ) {
      if ( SLPacketHeader.accessUnitStartFlag == 1 ) {
        SLPacketHeader.compositionTimeStampFlag(i) = 0;
      }
      else {
        // ignore
      }
    }
    else {
      // empty
    }
  }
}
else { // CTSDeltaLength is not zero
  // CTS is transported for this stream
  if ( SLConfig.useTimeStamps == 1 ) {
    if ( SLPacketHeader.accessUnitStartFlag == 1 ) {
      SLPacketHeader.compositionTimeStampFlag(i) =
        MSLH.CTSFlag(i);
      SLPacketHeader.compositionTimeStamp(i) =
        RTP TimeStamp + MSLH.CTSDelta(i);
    }
    else {
      // ignore CTSFlag (which must be zero)
    }
  }
  else {
    // this is strange and sub-optimal at best
    // a receiver should ignore this
  }
}

if ( DTSDeltaLength == 0 ) { // or DTSDeltaLength is absent
  // DTS is not transported for this stream
  if ( SLConfig.useTimeStamps == 1 ) {
    if ( SLPacketHeader.accessUnitStartFlag == 1 ) {
      SLPacketHeader.decodingTimeStampFlag(i) = 0;
    }
    else {
      // ignore
    }
  }
  else {
    // empty
  }
}
else {
  // DTS is transported for this stream
  if ( SLConfig.useTimeStamps == 1 ) {
    if ( SLPacketHeader.accessUnitStartFlag == 1 ) {
      SLPacketHeader.decodingTimeStampFlag(i) =
        MSLH.DTSFlag(i);
      SLPacketHeader.decodingTimeStamp(i) =
               RTP TimeStamp + MSLH.DTSDelta(i);
         }
         else {
            // ignore DTSFlag (which must be zero)
         }
      }
      else {
         // this is strange and sub-optimal at best
         // a receiver should ignore this
      }
   }

   if ( OCRDeltaLength == 0 ) { // or OCRDeltaLength is absent
      // the RTP stream does not transport any OCR
      if ( SLConfig.OCRLength == 0 ) {
         // this stream does not have any OCR
      }
      else {
         // illegal, normally the sender MUST detect
         // OCRs, replace them with OCRDelta and set
         // a relevant OCRDeltaLength value
      }
   }
   else {
      if ( SLConfig.OCRLength == 0 ) {
         // this is strange and sub-optimal at best
         // a receiver should ignore this
      }
      else {
         SLPacketHeader.OCRflag(i) = RSLH.OCRFlag(i);
         if ( SLPacketHeader.OCRflag(i) == 1 ) {
            SLPacketHeader.objectClockReference(i) =
               RTP TimeStamp + RSLH.OCRDelta(i);
         }
      }
   }

In the Single-SL mode the AccessUnitEndFlag, if needed, is restored
from the M bit, as follows:

   if ( SLConfig.useAccessUnitEndFlag == 0 ) {
      // this SL stream does not signal access unit ends
   }
   else {
      SLPacketHeader.AccessUnitEndFlag = M bit;
   }

In the Multiple-SL mode the AccessUnitEndFlag is untouched in RSLH.

The other SL packet header fields SHALL remain as found in RSLH.

In the general case the reconstruction of the original SL packetized
stream requires SL-awareness. However, this payload format allows in
all cases a receiver that does not know about the SL syntax to
reconstruct the semantics of SL for the following very useful
features:
- Packet order (decoding order)
- Access Unit boundaries (using the M bit)
- Access Unit fragments (i.e.
SL packet boundaries using
  MSLH.PayloadSize)
- Composition Time Stamps (using the RTP Time Stamp and
  MSLH.CTSDelta)
- Decoding Time Stamps (using the RTP Time Stamp and MSLH.DTSDelta)
- Packet sequence number (using the RTP sequence number and
  MSLH.Index)

5.2 Handling of scene description streams

MPEG-4 introduces new stream types, as described in section 1, namely
Object Descriptors and BIFS. In the following both OD and BIFS are
discussed on the same basis, i.e. as "scene description".

Considering scene description as a "stream-able" type of content is
a rather new concept and for that reason some specific comments are
needed.

Typically scene descriptions are encoded in such a way that
information loss would in the general case cripple the presentation
beyond any hope of repair by the receiver. Still this is well suited
for a number of multimedia applications where the scene is first made
available via reliable channels to the client and then played. This
payload format is not intended for this type of application, for
which download of MPEG-4 interchange (.mp4) files is typical.
However it can also be used if the RTP packets are transported using
TCP or any other reliable protocol.

On the other hand MPEG-4 has introduced the possibility to
dynamically change the scene description by sending animation
information (changes in parameters) and structural change
information (updates). Since this information has to be sent in a
timely fashion MPEG-4 has defined a number of techniques in order to
encode the scene description in a manner that makes it behave
similarly to other temporal encoding schemes such as audio and
video. This payload format is intended for this usage.

Note that in many cases the application will consist of first the
reliable transmission of a static initial scene followed by the
streaming of animations and updates. For this reason the usage of
this payload format is attractive since it offers a single solution
for both phases.

Senders must be aware that suitable schemes should be used when
scene description streams transport sensitive configuration
information. For example, if the RTP packet transporting an
OD-update command is lost, the corresponding media stream is not
accessible by the receiver.

Redundancy is one possibility; it may be added by tools
hierarchically higher than this payload format, e.g. packet-based
FEC, re-transmission, or similar tools. In such a case, the general
congestion control principles have to be observed.

Since BIFS and OD streams may be modified during the session with
update commands, there is a need to send both update commands and
full BIFS/OD refresh. For that reason MPEG-4 defines Random Access
Points (RAP) for scene description streams (OD and BIFS) where by
definition a decoder can restart decoding, i.e. receives a "full
update" of the scene. This mechanism is called Scene and Object
Description Carousel. The AU Sequence Number field of the SL Packet
Header is used to support this behavior at the Synchronization
Layer. When two access units are sent consecutively with the same AU
Sequence Number, the second one is assumed to be a semantic
repetition of the first. If a receiver starts to listen in the
middle of a session or has detected losses, it can skip all received
Access Units until such a RAP. The periodicity of transmission of
these RAPs should be chosen/adjusted depending on the application
and the network it is deployed on; i.e.
exactly like Intra-coded
frames for video, it is the responsibility of the sender to make
sure the periodicity of RAPs is suitable.

5.3 Multiplexing

An advanced MPEG-4 session may involve a large number of objects,
possibly as many as a few hundred, so transporting each ES as an
individual RTP stream may not always be practical. Allocating and
controlling hundreds of destination addresses for each MPEG-4
session may pose insurmountable session administration problems.
The input/output processing overhead at the end-points will also be
extremely high. Additionally, low delay transmission of low
bitrate data streams, e.g. facial animation parameters, results in
extremely high header overheads.

To solve these problems, MPEG-4 data transport requires a
multiplexing scheme that allows selective bundling of several ESs.
This is beyond the scope of the payload format defined here.

MPEG-4's Flexmux multiplexing scheme may be used for this
purpose and a specific RTP payload format is being developed [11].

Another approach may be to develop a generic RTP multiplexing scheme
usable for MPEG-4 data. The multiplexing scheme reported in [8] may
be a candidate for this approach.

For MPEG-4 applications, the multiplexing technique needs to address
the following requirements:

i. The ESs multiplexed in one stream can change frequently during a
session. Consequently, the coding type, individual packet size and
temporal relationships between the multiplexed data units must be
handled dynamically.

ii. The multiplexing scheme should have a mechanism to determine the
ES identifier (ES_ID) for each of the multiplexed packets. ES_ID is
not a part of the SL header.

iii. In general, an SL packet does not contain information about its
size.
The multiplexing scheme should be able to delineate the
multiplexed packets whose lengths may vary from a few bytes to close
to the path-MTU.

5.5 Overlap with RFC 3016

This payload format has been designed to have a (large) overlap with
RFC 3016 [7]. The conditions for this overlap are:

Conditions for RFC 3016:
i. MPEG-4 video elementary streams only
ii. There MUST be a single VOP or Video Packet per RTP packet (only
recommended in RFC 3016)
iii. The decoder configuration MUST be signaled out-of-band either
using the Config MIME parameter or using the OD framework

Conditions for this payload format:
i. No structural parameters defined (or all set to zero), i.e.
Single-SL mode with empty MSLH and empty RSLH.
ii. Receivers MUST be ready to accept (and ignore) video
configuration headers (e.g. VOSH, VO and VOL) and
visual-object-sequence-end-code transported in-band.

6. Security Considerations

RTP packets using the payload format defined in this specification
are subject to the security considerations discussed in the RTP
specification [5]. This implies that confidentiality of the media
streams is achieved by encryption. Because the data compression used
with this payload format is applied end-to-end, encryption may be
performed on the compressed data so there is no conflict between the
two operations. The packet processing complexity of this payload
type (i.e. excluding media data processing) does not exhibit any
significant non-uniformity on the receiver side that could cause a
denial-of-service threat.

However, it is possible to inject non-compliant MPEG streams (Audio,
Video, and Systems) to overload the receiver/decoder's buffers which
might compromise the functionality of the receiver or even crash it.
This is especially true for end-to-end systems like MPEG where the
buffer models are precisely defined.

MPEG-4 Systems supports stream types including commands that are
executed on the terminal like OD commands, BIFS commands, etc. and
programmatic content like MPEG-J (Java(TM) Byte Code) and
ECMAScript. It is possible to use one or more of the above in a
manner non-compliant to MPEG to crash the receiver or make it
temporarily unavailable.

Authentication mechanisms can be used to validate the sender and
the data to prevent security problems due to non-compliant malignant
MPEG-4 streams.

A security model is defined for MPEG-4 Systems streams carrying
MPEG-J access units, which comprise Java(TM) classes and objects.
MPEG-J defines a set of Java APIs and a secure execution model. MPEG-J
content can call this set of APIs and Java(TM) methods from a set of
Java packages supported in the receiver within the defined security
model. According to this security model, downloaded byte code is
forbidden to load libraries, define native methods, start programs,
read or write files, or read system properties.

Receivers can implement intelligent filters to validate the buffer
requirements or parametric (OD, BIFS, etc.) or programmatic (MPEG-J,
ECMAScript) commands in the streams. However, this can increase the
complexity significantly.

7. Acknowledgements

This document evolved across several years thanks to contributions
from a large number of people since it is based on work within the
IETF AVT working group and various ISO MPEG working groups,
especially the 4-on-IP ad-hoc group in the last stages. The authors
wish to thank Guido Franceschini, Art Howarth, Dave Mackie, Dave
Singer, and Stephan Wenger for their valuable comments.

8.
References

[1] ISO/IEC 14496-1:2001 MPEG-4 Systems.

[2] ISO/IEC 14496-2:2001 MPEG-4 Visual.

[3] ISO/IEC 14496-3:2001 MPEG-4 Audio.

[4] ISO/IEC 14496-6:2001 Delivery Multimedia Integration Framework.

[5] H. Schulzrinne, S. Casner, R. Frederick, V. Jacobson, RTP: A
Transport Protocol for Real Time Applications, RFC 1889, Internet
Engineering Task Force, January 1996.

[6] S. Bradner, Key words for use in RFCs to Indicate Requirement
Levels, RFC 2119, Internet Engineering Task Force, March 1997.

[7] Y. Kikuchi, T. Nomura, S. Fukunaga, Y. Matsui, H. Kimata, RTP
payload format for MPEG-4 Audio/Visual streams, RFC 3016, Internet
Engineering Task Force.

[8] B. Thompson, T. Koren, D. Wing, Tunneling multiplexed Compressed
RTP ("TCRTP"), work in progress, draft-ietf-avt-tcrtp-02.txt,
November 2000.

[9] D. Singer, Y. Lim, A Framework for the delivery of MPEG-4 over
IP-based Protocols, work in progress, draft-singer-mpeg4-ip-02.txt,
May 2001.

[10] M. Handley, V. Jacobson, SDP: Session Description Protocol, RFC
2327, Internet Engineering Task Force, April 1998.

[11] C. Roux et al., RTP Payload Format for MPEG-4 FlexMultiplexed
Streams, work in progress, draft-curet-avt-rtp-mpeg4-flexmux-00.txt,
February 2001.

[12] H. Schulzrinne, RTP Profile for Audio and Video Conferences
with Minimal Control, RFC 1890, Internet Engineering Task Force,
January 1996.

[13] H. Schulzrinne, A. Rao, R. Lanphier, Real Time Streaming
Protocol, RFC 2326, Internet Engineering Task Force, April 1998.

[14] M. Handley, C. Perkins, E. Whelan, Session Announcement
Protocol, RFC 2974, Internet Engineering Task Force, October 2000.

9.
Authors' Addresses

Olivier Avaro
France Telecom
35 A Schutzenhuttenweg
60598 Frankfurt am Main
Deutschland
e-mail: olivier.avaro@francetelecom.fr

Andrea Basso
AT&T Labs Research
200 Laurel Avenue
Middletown, NJ 07748
USA
e-mail: basso@research.att.com

Stephen L. Casner
Packet Design, Inc.
66 Willow Place
Menlo Park, CA 94025
USA
e-mail: casner@acm.org

M. Reha Civanlar
AT&T Labs - Research
100 Schultz Drive
Red Bank, NJ 07701
USA
e-mail: civanlar@research.att.com

Philippe Gentric
Philips Digital Networks - MP4Net
51 rue Carnot
92156 Suresnes
France
e-mail: philippe.gentric@philips.com

Carsten Herpel
THOMSON multimedia
Karl-Wiechert-Allee 74
30625 Hannover
Germany
e-mail: herpelc@thmulti.com

Zvi Lifshitz
Optibase Ltd.
7 Shenkar St.
Herzliya 46120
Israel
e-mail: zvil@optibase.com

Young-kwon Lim
mp4cast (MPEG-4 Internet Broadcasting Solution Consortium)
1001-1 Daechi-Dong Gangnam-Gu
Seoul, 305-333,
Korea
e-mail: young@techway.co.kr

Colin Perkins
USC Information Sciences Institute
4350 N. Fairfax Drive #620
Arlington, VA 22203
USA
e-mail: csp@isi.edu

Jan van der Meer
Philips Digital Networks
Cederlaan 4
5600 JB Eindhoven
Netherlands
e-mail: jan.vandermeer@philips.com

APPENDIX: Examples of usage

This payload format has been designed to transport efficiently a
very versatile packetization scheme: the MPEG-4 Synch Layer; as a
result its complexity is larger than the average RTP payload format.

For this reason this section describes a number of key examples of
how this payload format can be used.

A C++-like syntax called SDL (Syntactic Description Language)
defined in [1, section 14] is used to economically describe MPEG-4
system data structures.

However, as discussed in section 2, this payload format can also be
used without explicit knowledge of SL (logically equivalent to
configuring the SL headers as being empty); several examples
(Appendix 1, 3, 4, 5) cover this case.

Furthermore these examples assume that the (a=fmtp) SDP syntax is
used to convey the MIME parameters of the payload format.

Appendix.1 RFC 3016 compatible MPEG-4 Video (no SL)

This is an example of a video stream where the SL is configured to
produce RTP packets compatible with RFC 3016.

SLConfigDescriptor

In this example the SLConfigDescriptor is:

   class SLConfigDescriptor extends BaseDescriptor : bit(8)
   tag=SLConfigDescrTag {
      bit(8) predefined;
      if (predefined==0) {
         bit(1) useAccessUnitStartFlag;         = 0
         bit(1) useAccessUnitEndFlag;           = 1
         bit(1) useRandomAccessPointFlag;       = 0
         bit(1) hasRandomAccessUnitsOnlyFlag;   = 0
         bit(1) usePaddingFlag;                 = 0
         bit(1) useTimeStampsFlag;              = 0
         bit(1) useIdleFlag;                    = 0
         bit(1) durationFlag;                   = 0
         bit(32) timeStampResolution;           = 0
         bit(32) OCRResolution;                 = 0
         bit(8) timeStampLength;                = 0
         bit(8) OCRLength;                      = 0
         bit(8) AU_Length;                      = 0
         bit(8) instantBitrateLength;           = 0
         bit(4) degradationPriorityLength;      = 0
         bit(5) AU_seqNumLength;                = 0
         bit(5) packetSeqNumLength;             = 0
         bit(2) reserved=0b11;
      }
      if (durationFlag) {
         bit(32) timeScale;                // NOT USED
         bit(16) accessUnitDuration;       // NOT USED
         bit(16) compositionUnitDuration;  // NOT USED
      }
      if (!useTimeStampsFlag) {
         bit(timeStampLength) startDecodingTimeStamp;    = 0
         bit(timeStampLength) startCompositionTimeStamp; = 0
      }
   }

SL Packet Header structure

With this configuration we have the following SL packet header
structure:

   aligned(8) class SL_PacketHeader (SLConfigDescriptor SL) {
      if (SL.useAccessUnitEndFlag) {
         bit(1) accessUnitEndFlag; // 1 bit
      }
   }

In this case this payload format produces RTP packets that are
exactly conformant to RFC 3016 and the Synch Layer is reduced to a
purely logical construction that neither sender nor receiver need to
implement.

Parameters

This configuration is the default one; no parameters are required.

RTP packet structure

Note that accessUnitEndFlag is mapped to the RTP header M bit.

+=========================================+=============+
| Field                                   | size        |
+=========================================+=============+
| RTP header                              | -           |
+-----------------------------------------+-------------+
| SL packet payload                       | 1400 bytes  |
+-----------------------------------------+-------------+

Overhead

In this example we have an RTP overhead of 40 bytes for 1400 bytes
of payload, i.e. 3 % overhead.

Appendix.2 MPEG-4 Video with SL

Let us consider the case of a 30 frames per second MPEG-4 video
stream whose bit rate is high enough that Access Units have to be
split into several SL packets (typically above 300 kb/s).

Let us assume also that the video codec generates in that case Video
Packets suitable to fit in one SL packet, i.e. that the video codec
is MTU aware and the MTU is 1500 bytes. We assume furthermore that
this stream contains B frames and that decodingTimeStamps are
present.

SLConfigDescriptor

In this example the SLConfigDescriptor is:

   class SLConfigDescriptor extends BaseDescriptor : bit(8)
   tag=SLConfigDescrTag {
      bit(8) predefined;
      if (predefined==0) {
         bit(1) useAccessUnitStartFlag;         = 1
         bit(1) useAccessUnitEndFlag;           = 0
         bit(1) useRandomAccessPointFlag;       = 1
         bit(1) hasRandomAccessUnitsOnlyFlag;   = 0
         bit(1) usePaddingFlag;                 = 0
         bit(1) useTimeStampsFlag;              = 1
         bit(1) useIdleFlag;                    = 0
         bit(1) durationFlag;                   = 0
         bit(32) timeStampResolution;           = 30
         bit(32) OCRResolution;                 = 0
         bit(8) timeStampLength;                = 32
         bit(8) OCRLength;                      = 0
         bit(8) AU_Length;                      = 0
         bit(8) instantBitrateLength;           = 0
         bit(4) degradationPriorityLength;      = 0
         bit(5) AU_seqNumLength;                = 0
         bit(5) packetSeqNumLength;             = 0
         bit(2) reserved=0b11;
      }
      if (durationFlag) {
         bit(32) timeScale;                // NOT USED
         bit(16) accessUnitDuration;       // NOT USED
         bit(16) compositionUnitDuration;  // NOT USED
      }
      if (!useTimeStampsFlag) {
         bit(timeStampLength) startDecodingTimeStamp;    // NOT USED
         bit(timeStampLength) startCompositionTimeStamp; // NOT USED
      }
   }

The useRandomAccessPointFlag is set so that the
randomAccessPointFlag can indicate that the corresponding SL packet
contains a GOV and the first Video Packet of an Intra coded frame.

SL Packet Header structure

With this configuration we have the following SL packet header
structure:

   aligned(8) class SL_PacketHeader (SLConfigDescriptor SL) {
      bit(1) accessUnitStartFlag;            // 1 bit
      if (accessUnitStartFlag) {
         bit(1) randomAccessPointFlag;       // 1 bit
         bit(1) decodingTimeStampFlag;       // 1 bit
         bit(1) compositionTimeStampFlag;    // 1 bit
         if (decodingTimeStampFlag) {
            bit(SL.timeStampLength) decodingTimeStamp;
         }
         if (compositionTimeStampFlag) {
            bit(SL.timeStampLength) compositionTimeStamp;
         }
      }
   }

Parameters

decodingTimeStamps are encoded on 32 bits, which is much more than
needed for a delta. Therefore the sender will use DTSDeltaLength to
signal that only 7 bits are used for the coding of relative DTS in
the RTP packet.

The RSLHSectionSize cannot exceed 4 bits, so it is encoded on 3 bits,
as signaled by RSLHSectionSizeLength. The resulting concatenated
fmtp line is:

a=fmtp: DTSDeltaLength=7;RSLHSectionSizeLength=3

RTP packet structure

Two cases can occur; for packets that transport first fragments of
Access Units we have:

+=========================================+=============+
| Field                                   | size        |
+=========================================+=============+
| RTP header                              | -           |
+-----------------------------------------+-------------+
| DTSFlag = 1                             | 1 bit       |
+-----------------------------------------+-------------+
| DTSDelta                                | 7 bits      |
+-----------------------------------------+-------------+
| bits to byte alignment                  | 0 bits      |
+-----------------------------------------+-------------+
| RSLHSectionSize = 4                     | 3 bits      |
+-----------------------------------------+-------------+
| accessUnitStartFlag = 1                 | 1 bit       |
+-----------------------------------------+-------------+
| randomAccessPointFlag                   | 1 bit       |
+-----------------------------------------+-------------+
| decodingTimeStampFlag                   | 1 bit       |
+-----------------------------------------+-------------+
| compositionTimeStampFlag                | 1 bit       |
+-----------------------------------------+-------------+
| bits to byte alignment                  | 1 bit       |
+-----------------------------------------+-------------+
| SL packet payload                       | N bytes     |
+-----------------------------------------+-------------+

For packets that transport non-first fragments of Access Units we
have:

+=========================================+=============+
| Field                                   | size        |
+=========================================+=============+
| RTP header                              | -           |
+-----------------------------------------+-------------+
| DTSFlag = 0                             | 1 bit       |
+-----------------------------------------+-------------+
| bits to byte alignment                  | 7 bits      |
+-----------------------------------------+-------------+
| RSLHSectionSize = 1                     | 3 bits      |
+-----------------------------------------+-------------+
| accessUnitStartFlag = 0                 | 1 bit       |
+-----------------------------------------+-------------+
| bits to byte alignment                  | 4 bits      |
+-----------------------------------------+-------------+
| SL packet payload                       | N bytes     |
+-----------------------------------------+-------------+

Overhead estimation

In this example we have an RTP overhead of 40 + 2 bytes for 1400
bytes of payload, i.e. 3 % overhead.

Appendix.3 Low delay MPEG-4 Audio (no SL)

This example is for a low delay audio service. For this reason a
single SL packet is transported in each RTP packet. Actually each SL
packet contains a complete Access Unit.

SLConfigDescriptor

Since CTS=DTS and the Access Unit duration is constant, signaling of
MPEG-4 time stamps is not needed (the durationFlag of SLConfig is
set).

We also assume here an audio Object Type for which all Access Units
are Random Access Points, which is signaled using the
hasRandomAccessUnitsOnlyFlag in the SLConfigDescriptor.

We assume furthermore a mode where the Access Unit size is constant
and equal to 5 bytes (which is signaled with AU_Length).

In this example the SLConfigDescriptor is:

   class SLConfigDescriptor extends BaseDescriptor : bit(8)
   tag=SLConfigDescrTag {
      bit(8) predefined;
      if (predefined==0) {
         bit(1) useAccessUnitStartFlag;         = 0
         bit(1) useAccessUnitEndFlag;           = 0
         bit(1) useRandomAccessPointFlag;       = 0
         bit(1) hasRandomAccessUnitsOnlyFlag;   = 1
         bit(1) usePaddingFlag;                 = 0
         bit(1) useTimeStampsFlag;              = 0
         bit(1) useIdleFlag;                    = 0
         bit(1) durationFlag;                   = 1 // signals constant AU duration
         bit(32) timeStampResolution;           = 0
         bit(32) OCRResolution;                 = 0
         bit(8) timeStampLength;                = 0
         bit(8) OCRLength;                      = 0
         bit(8) AU_Length;                      = 5
         bit(8) instantBitrateLength;           = 0
         bit(4) degradationPriorityLength;      = 0
         bit(5) AU_seqNumLength;                = 0
         bit(5) packetSeqNumLength;             = 0
         bit(2) reserved=0b11;
      }
      if (durationFlag) {
         bit(32) timeScale;                = 1000 // for milliseconds
         bit(16) accessUnitDuration;       = 10 // ms
         bit(16) compositionUnitDuration;  = 10 // ms
      }
      if (!useTimeStampsFlag) {
         bit(timeStampLength) startDecodingTimeStamp;    = 0
         bit(timeStampLength) startCompositionTimeStamp; = 0
      }
   }

SL packet header

With this configuration the SL packet header is empty. The Synch
Layer is reduced to a purely logical construction that neither
sender nor receiver need to implement.

Parameters

No parameters are required.

RTP packet structure

Note that the RTP header M bit should always be set to 1.

+=========================================+=============+
| Field                                   | size        |
+=========================================+=============+
| RTP header                              | -           |
+-----------------------------------------+-------------+
| SL packet payload                       | 5 bytes     |
+-----------------------------------------+-------------+

Overhead estimation

The overhead is extremely large, i.e. 800 %, since 40 bytes
of headers are required to transport 5 bytes of data. Note however
that RTP header compression would work well since time stamp
increments are constant.

Appendix.4 Media delivery MPEG-4 Audio (no SL)

This example is for a media delivery service where delay is not an
issue but efficiency is. In this case several SL Packets are
transported in each RTP packet.

SLConfigDescriptor

Similar to previous example.

SL packet header

With this configuration the SL packet header is empty. The Synch
Layer is reduced to a purely logical construction that neither
sender nor receiver need to implement.

Parameters

The absence of RSLHSectionSizeLength indicates that the RSLHSection
is empty.

The size of SL Packets (which are all complete Access Units in this
case) is constant and is indicated with:

a=fmtp: ConstantSize=5

This also indicates to the receiver that the Multiple-SL mode will
be used. The 2-byte field that would give the size of the
MSLHSection is omitted since in this case this field always contains
zero (the MSLHSection is always empty).

RTP packet structure

Note that the RTP header M bit is always set to 1, which indicates
to the receiver that only complete Access Units are transported.

+=========================================+=============+
| Field                                   | size        |
+=========================================+=============+
| RTP header                              | -           |
+-----------------------------------------+-------------+
| SL packet payload                       | 5 bytes     |
+-----------------------------------------+-------------+
| SL packet payload                       | 5 bytes     |
+-----------------------------------------+-------------+
| etc, until MTU is reached                             |
+-----------------------------------------+-------------+
| SL packet payload                       | 5 bytes     |
+-----------------------------------------+-------------+

Overhead estimation

The overhead is 3 %, i.e. minimal.

Appendix.5 AAC with interleaving (no SL)

Let us consider AAC at 128 kb/s where each Access Unit is on
average 320 bytes. Interleaving is applied with a continuous
interleaving scheme (see table below) where 4 Access Units are used
to construct each RTP packet in order to match an MTU of 1500 bytes.

IndexDelta is constant and equal to 2 (since +1 is automatically
added); it is encoded on 3 bits.

Index (being encoded on 3 bits) rolls over very fast and is not very
useful for reordering. However, this is a case, as explained in
section 3.8, where time stamps should be used for de-interleaving:
receivers know that each SL packet is a complete Access Unit because
all RTP packets have the M bit set to 1 and therefore, since Access
Unit duration is constant, Access Unit timestamps can be computed
from RTP timestamps and IndexDelta values; this can be used for
de-interleaving even in case of losses.
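
The timestamp computation described above can be sketched as follows.
This is only an illustration, not part of the payload format: the
function name au_timestamps and the parameter au_duration (the
constant Access Unit duration, expressed in RTP clock ticks and
assumed to be known to the receiver) are hypothetical. The sketch
assumes that the RTP timestamp carries the time stamp of the first
Access Unit in the packet and that each following Access Unit is
IndexDelta + 1 Access Units later than its predecessor.

```python
RTP_TIMESTAMP_MODULUS = 1 << 32  # RTP timestamps are 32-bit values


def au_timestamps(rtp_timestamp, index_deltas, au_duration):
    """Return one timestamp per Access Unit carried in an RTP packet.

    rtp_timestamp: timestamp of the first Access Unit in the packet.
    index_deltas:  IndexDelta of each following Access Unit; the real
                   distance in Access Units is IndexDelta + 1.
    au_duration:   constant Access Unit duration in RTP clock ticks
                   (hypothetical input, known out-of-band).
    """
    stamps = [rtp_timestamp % RTP_TIMESTAMP_MODULUS]
    for delta in index_deltas:
        step = (delta + 1) * au_duration
        stamps.append((stamps[-1] + step) % RTP_TIMESTAMP_MODULUS)
    return stamps


# An RTP packet carrying AUs 4, 7, 10 and 13 (IndexDelta 2 each), with
# an assumed duration of 1024 ticks and AU1 at timestamp 0, so that
# AU4 is at 3 * 1024 = 3072.
print(au_timestamps(3072, [2, 2, 2], 1024))
```

Because each Access Unit gets its own timestamp, a receiver can sort
Access Units by timestamp even when an RTP packet of the interleaving
pattern is missing.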

+-----------------------------------------------------------------+
| RTP packet | RTP Timestamp   | AUs           | Index,IndexDelta |
+-----------------------------------------------------------------+
| 1          | CTS(AU1)        | 1             | 1                |
+-----------------------------------------------------------------+
| 2          | CTS(AU2)        | 2, 5          | 2,2              |
+-----------------------------------------------------------------+
| 3          | CTS(AU3)        | 3, 6, 9       | 3,2,2            |
+-----------------------------------------------------------------+
| 4          | CTS(AU4)        | 4, 7,10,13    | 4,2,2,2          |
+-----------------------------------------------------------------+
| 5          | CTS(AU8)        | 8,11,14,17    | 0,2,2,2          |
+-----------------------------------------------------------------+
| 6          | CTS(AU12)       | 12,15,18,21   | 4,2,2,2          |
+-----------------------------------------------------------------+
| 7          | CTS(AU16)       | 16,19,22,25   | 0,2,2,2          |
+-----------------------------------------------------------------+
| 8          | CTS(AU20)       | 20,23,26,29   | 4,2,2,2          |
+-----------------------------------------------------------------+
| 9          | CTS(AU24)       | 24,27,30,33   | 0,2,2,2          |
+-----------------------------------------------------------------+
| 10         | CTS(AU28)       | 28,31,34,37   | 4,2,2,2          |
+-----------------------------------------------------------------+
| etc                                                             |
+-----------------------------------------------------------------+

SLConfigDescriptor

Similar to previous example.

SL Packet Header

Similar to previous example (empty).
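
The interleaving table of this example can be reproduced with a few
lines of code. The sketch below is illustrative only: the function
name packet_contents and the constants are hypothetical, and the
offset of 3 packets is simply chosen so that RTP packet 1 starts with
Access Unit 1 as in the table. It assumes 4 Access Units per RTP
packet, a distance of 3 Access Units between neighbours in a packet,
and an Index coded on 3 bits.

```python
AUS_PER_PACKET = 4  # Access Units per RTP packet in steady state
STRIDE = 3          # distance between neighbouring AUs in a packet
INDEX_BITS = 3      # Index is coded on 3 bits in this example


def packet_contents(packet_number):
    """Return (access_units, index, index_deltas) for a 1-based RTP
    packet number, following the table of this example."""
    # Offset of 3 packets so that packet 1 begins with Access Unit 1.
    first = AUS_PER_PACKET * (packet_number - 3)
    aus = [first + STRIDE * j for j in range(AUS_PER_PACKET)]
    # The first three packets are shorter: AU numbers start at 1.
    aus = [au for au in aus if au >= 1]
    # Index of the first AU wraps around because of its coded length.
    index = aus[0] % (1 << INDEX_BITS)
    # IndexDelta is the AU distance minus 1 (+1 is added on decoding).
    index_deltas = [b - a - 1 for a, b in zip(aus, aus[1:])]
    return aus, index, index_deltas


for number in range(1, 11):
    print(number, *packet_contents(number))
```

The wrap-around of Index (8 becomes 0, 12 becomes 4) is visible in
the output, which is why Index alone is of little use for reordering
here.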

Parameters

The resulting concatenated fmtp line is:

a=fmtp: SizeLength=13;IndexLength=3;IndexDeltaLength=3

RTP packet structure

+=========================================+=============+
| Field                                   | size        |
+=========================================+=============+
| RTP header                              | -           |
+-----------------------------------------+-------------+
MSLHSection
+=========================================+=============+
| MSLHSection size in bits = 64           | 2 bytes     |
+-----------------------------------------+-------------+
| PayloadSize                             | 13 bits     |
+-----------------------------------------+-------------+
| Index                                   | 3 bits      |
+-----------------------------------------+-------------+
| PayloadSize                             | 13 bits     |
+-----------------------------------------+-------------+
| IndexDelta                              | 3 bits      |
+-----------------------------------------+-------------+
| PayloadSize                             | 13 bits     |
+-----------------------------------------+-------------+
| IndexDelta                              | 3 bits      |
+-----------------------------------------+-------------+
| PayloadSize                             | 13 bits     |
+-----------------------------------------+-------------+
| IndexDelta                              | 3 bits      |
+-----------------------------------------+-------------+
| bits to byte alignment                  | 0 bits      |
+-----------------------------------------+-------------+
SLPPSection
+=========================================+=============+
| AAC Access Unit                         | x bytes     |
+-----------------------------------------+-------------+
| AAC Access Unit                         | x bytes     |
+-----------------------------------------+-------------+
Expires January 2002 42 2262 | AAC Access Unit | x bytes | 2263 +-----------------------------------------+-------------+ 2264 | AAC Access Unit | x bytes | 2265 +-----------------------------------------+-------------+ 2267 Overhead estimation 2269 The MSLHSection is 8 bytes; in this example we have therefore a RTP 2270 overhead of 40 + 8 bytes for 1400 bytes (approx) of payload i.e. 2271 around 4 % overhead. 2273 Appendix.6 A more complex case: AAC with interleaving and SL 2275 Let us consider AAC around 130 kb/s where each Access Unit is split 2276 in 4 SL packets corresponding to Error Sensitivity Categories (ESC) 2277 of maximum 90 bytes for which interleaving is very useful in terms 2278 of error resilience. We thus use an interleaving scheme where 15 SL 2279 Packets (extracted from 15 consecutive Access Units) are used to 2280 construct each RTP packet in order to match a MTU of 1500 bytes. 2281 Note that since ESC fragments are not byte aligned we also use the 2282 paddingFlag and paddingBits features of the Synch Layer. 2284 The interleaving sequence is 4 RTP packets and 350 ms long, which is 2285 too long for conferencing but perfectly OK for Internet radio. 2287 Since the sequence contains 60 SL packets, the sequence number can 2288 be encoded on 6 bits. However 2 bits are actually enough if the 2289 sender always resets the SL packet sequence number to zero at the 2290 start of each sequence, since only the first MSLH in each of the 4 2291 RTP packets in the sequence carries an absolute sequence number 2292 value (0,1,2,3). 2294 2 bits are also enough for IndexDelta, which is constant and equal 2295 to 3 (since +1 is automatically added). 2297 Note that the 4th RTP packet in each sequence has its M bit set to 1 2298 since it contains 15 SL packets transporting the end of 15 2299 consecutive Access Units. 
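
The figures quoted above (60 SL packets, 6 bits versus 2 bits, a constant IndexDelta of 3, roughly 350 ms) can be checked with the following sketch. It is not part of the payload format: the constant and function names are hypothetical, the numbering of SL packets as 4 * AU + ESC within a sequence is an assumption consistent with the absolute values 0, 1, 2, 3 mentioned above, and the 23.22 ms Access Unit duration is taken from this example's SLConfigDescriptor.

```python
AUS_PER_SEQUENCE = 15     # consecutive Access Units per interleaving sequence
SL_PACKETS_PER_AU = 4     # one SL packet per Error Sensitivity Category
AU_DURATION_MS = 23.22    # accessUnitDuration from this example's SLConfigDescriptor


def bits_needed(max_value):
    """Smallest field width, in bits, able to hold max_value."""
    return max(1, max_value.bit_length())


def interleaving_sequence():
    """Return, per RTP packet, (Index, IndexDelta list, SL packet numbers)."""
    packets = []
    for esc in range(SL_PACKETS_PER_AU):
        # RTP packet `esc` takes SL packet number `esc` of every Access Unit
        # (assumed numbering: sequence number = 4 * AU + ESC, reset per sequence).
        numbers = [au * SL_PACKETS_PER_AU + esc for au in range(AUS_PER_SEQUENCE)]
        # IndexDelta stores the gap minus one, since +1 is added automatically.
        deltas = [b - a - 1 for a, b in zip(numbers, numbers[1:])]
        packets.append((numbers[0], deltas, numbers))
    return packets


sequence = interleaving_sequence()
total_sl_packets = AUS_PER_SEQUENCE * SL_PACKETS_PER_AU
print("SL packets per sequence:", total_sl_packets)
print("bits for a full sequence number:", bits_needed(total_sl_packets - 1))
print("bits for Index:", bits_needed(max(p[0] for p in sequence)))
print("bits for IndexDelta:", bits_needed(max(d for p in sequence for d in p[1])))
print("sequence duration (ms):", round(AUS_PER_SEQUENCE * AU_DURATION_MS, 1))
```

The computed duration of 15 Access Units is slightly under the 350 ms quoted in the text, which is a rounded figure.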

With this scheme a sender can, for example upon reception of RTCP
reports indicating high loss rates, choose to duplicate for each
interleaving sequence the first RTP packet, which contains the most
useful data in terms of ESC, or apply other error protection
techniques, with due care to congestion issues.

In this example we will also show several other SL features (OCR, AU
boundary flags, padding, as detailed below).

One feature demonstrated by this example is the degradation
priority. We assume degradation priority can take 4 different
values, mapped to Error Sensitivity Categories, and is encoded on 2
bits. This interleaving scheme makes sure that only SL packets of
identical degradation priorities are grouped in the same RTP packet
(3.6.3) and that only the first RSLH of each RTP packet transports
the degradation priority.

We also assume that the server inserts an OCR in the last SL packet
of each RTP packet.

SLConfigDescriptor

In this example the SLConfigDescriptor is:

class SLConfigDescriptor extends BaseDescriptor : bit(8)
tag=SLConfigDescrTag {
   bit(8) predefined;
   if (predefined==0) {
      bit(1) useAccessUnitStartFlag; = 1
      bit(1) useAccessUnitEndFlag; = 1
      bit(1) useRandomAccessPointFlag; = 0
      bit(1) hasRandomAccessUnitsOnlyFlag; = 1
      bit(1) usePaddingFlag; = 1 // we need to signal padding bits
      bit(1) useTimeStampsFlag; = 0
      bit(1) useIdleFlag; = 0
      bit(1) durationFlag; = 1
      bit(32) timeStampResolution; = 0
      bit(32) OCRResolution; = 30
      bit(8) timeStampLength; = 0
      bit(8) OCRLength; = 32
      bit(8) AU_Length; = 0
      bit(8) instantBitrateLength; = 0
      bit(4) degradationPriorityLength; = 2
      bit(5) AU_seqNumLength; = 0
      bit(5) packetSeqNumLength; = 6
      bit(2) reserved=0b11;
   }
   if (durationFlag) {
      bit(32) timeScale; = 1000 // milliseconds
      bit(16) accessUnitDuration; = 23.22 // ms
      bit(16) compositionUnitDuration; = 23.22 // ms
   }
   if (!useTimeStampsFlag) {
      bit(timeStampLength) startDecodingTimeStamp; = 0
      bit(timeStampLength) startCompositionTimeStamp; = 0
   }
}

SL Packet Header structure

With this configuration we have the following SL packet header
structure:

aligned(8) class SL_PacketHeader (SLConfigDescriptor SL) {
   bit(1) accessUnitStartFlag;
   bit(1) accessUnitEndFlag;
   bit(1) OCRflag;
   bit(1) paddingFlag;
   if (paddingFlag)
      bit(3) paddingBits;
   bit(SL.packetSeqNumLength) packetSequenceNumber;
   bit(1) DegPrioflag;
   if (DegPrioflag) {
      bit(SL.degradationPriorityLength) degradationPriority;
   }
   if (OCRflag) {
      bit(SL.OCRLength) objectClockReference;
   }
}

Parameters

The length of the RSLHSectionSize field is 2 bits, as signaled by
RSLHSectionSizeLength.

The resulting concatenated fmtp line is:

a=fmtp: SizeLength=7;RSLHSectionSizeLength=2;IndexLength=2;
   IndexDeltaLength=2;OCRDeltaLength=16

RTP packet structure

+=========================================+==============+
| Field                                   | size         |
+=========================================+==============+
| RTP header                              | -            |
+-----------------------------------------+--------------+
MSLHSection
+=========================================+==============+
| MSLHSection size in bits = 135          | 2 bytes      |
+-----------------------------------------+--------------+
| PayloadSize                             | 7 bits       |
+-----------------------------------------+--------------+
| Index = 0 or 1 or 2 or 3                | 2 bits       |
+-----------------------------------------+--------------+
| PayloadSize                             | 7 bits       |
+-----------------------------------------+--------------+
| IndexDelta = 3                          | 2 bits       |
+-----------------------------------------+--------------+
| etc + 12 times 9 bits                                  |
+-----------------------------------------+--------------+
| PayloadSize                             | 7 bits       |
+-----------------------------------------+--------------+
| IndexDelta = 3                          | 2 bits       |
+-----------------------------------------+--------------+
| bits to byte alignment                  | 1 bit        |
+-----------------------------------------+--------------+
RSLHSection
+=========================================+==============+
| RSLHSectionSize                         | 6 bits       |
+-----------------------------------------+--------------+
| accessUnitStartFlag                     | 1 bit        |
+-----------------------------------------+--------------+
| accessUnitEndFlag                       | 1 bit        |
+-----------------------------------------+--------------+
| OCRFlag = 0                             | 1 bit        |
+-----------------------------------------+--------------+
| paddingFlag = 1                         | 1 bit        |
+-----------------------------------------+--------------+
| paddingBits                             | 3 bits       |
+-----------------------------------------+--------------+
| DegPrioflag = 1                         | 1 bit        |
+-----------------------------------------+--------------+
| degradationPriority                     | 2 bits       |
+-----------------------------------------+--------------+
| accessUnitStartFlag                     | 1 bit        |
+-----------------------------------------+--------------+
| accessUnitEndFlag                       | 1 bit        |
+-----------------------------------------+--------------+
| OCRFlag = 0                             | 1 bit        |
+-----------------------------------------+--------------+
| paddingFlag = 1                         | 1 bit        |
+-----------------------------------------+--------------+
| paddingBits                             | 3 bits       |
+-----------------------------------------+--------------+
| DegPrioflag = 0                         | 1 bit        |
+-----------------------------------------+--------------+
| etc + 12 times 8 bits                                  |
+-----------------------------------------+--------------+
| accessUnitStartFlag                     | 1 bit        |
+-----------------------------------------+--------------+
| accessUnitEndFlag                       | 1 bit        |
+-----------------------------------------+--------------+
| OCRFlag = 1                             | 1 bit        |
+-----------------------------------------+--------------+
| OCRDelta                                | 16 bits      |
+-----------------------------------------+--------------+
| paddingFlag = 0                         | 1 bit        |
+-----------------------------------------+--------------+
| DegPrioflag = 0                         | 1 bit        |
+-----------------------------------------+--------------+
| bits to byte alignment                  | 5 bits       |
+-----------------------------------------+--------------+
SLPPSection
+=========================================+==============+
| SL packet payload                       | max 90 bytes |
+-----------------------------------------+--------------+
| etc + 13 SL packets                                    |
+-----------------------------------------+--------------+
| SL packet payload                       | max 90 bytes |
+-----------------------------------------+--------------+

Note that in the above table the last SL packet in the RTP packet
has a payload that is byte-aligned (at the end). When this happens
paddingFlag is set to zero and the paddingBits field is omitted.

Overhead estimation

The MSLHSection is 19 bytes and the RSLHSection is 16 bytes; in this
example we therefore have an RTP overhead of 40 + 35 bytes for 1350
bytes (max) of payload, i.e. around 6 % overhead.
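
The overhead arithmetic above can be sketched as follows. The helper names are hypothetical; the sketch assumes 15 MSLH entries of 7 + 2 bits each, preceded by the 2-byte section size field, and it takes the 40 header bytes and the 16-byte RSLHSection directly from the text rather than deriving them.

```python
def mslh_section_bytes(entries, size_length, index_length, size_field_bytes=2):
    """Bytes used by an MSLHSection: size field plus byte-aligned entries."""
    entry_bits = entries * (size_length + index_length)
    return size_field_bytes + (entry_bits + 7) // 8   # round up to whole bytes


def overhead_percent(header_bytes, section_bytes, payload_bytes):
    """Header and section bytes as a percentage of the payload bytes."""
    return 100.0 * (header_bytes + section_bytes) / payload_bytes


mslh = mslh_section_bytes(entries=15, size_length=7, index_length=2)
rslh = 16  # RSLHSection size quoted in the text, not recomputed here
print("MSLHSection bytes:", mslh)
print("overhead (%):", round(overhead_percent(40, mslh + rslh, 1350), 1))
```

Because 1350 bytes is the maximum payload, the percentage obtained this way is a lower bound; packets carrying shorter SL payloads see a proportionally higher overhead.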