idnits 2.17.1 draft-gentric-avt-mpeg4-multisl-03.txt: Checking boilerplate required by RFC 5378 and the IETF Trust (see https://trustee.ietf.org/license-info): ---------------------------------------------------------------------------- ** Looks like you're using RFC 2026 boilerplate. This must be updated to follow RFC 3978/3979, as updated by RFC 4748. Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt: ---------------------------------------------------------------------------- ** The document seems to lack a 1id_guidelines paragraph about 6 months document validity -- however, there's a paragraph with a matching beginning. Boilerplate error? ** The document is more than 15 pages and seems to lack a Table of Contents. == There are 4 instances of lines with non-ascii characters in the document. == No 'Intended status' indicated for this document; assuming Proposed Standard == The page length should not exceed 58 lines per page, but there was 39 longer pages, the longest (page 2) being 59 lines Checking nits according to https://www.ietf.org/id-info/checklist : ---------------------------------------------------------------------------- ** The document seems to lack an IANA Considerations section. (See Section 2.2 of https://www.ietf.org/id-info/checklist for how to handle the case when there are no actions for IANA.) ** The document seems to lack separate sections for Informative/Normative References. All references will be assumed normative when checking for downward references. == There are 1 instance of lines with non-RFC6890-compliant IPv4 addresses in the document. If these are example addresses, they should be changed. Miscellaneous warnings: ---------------------------------------------------------------------------- == Line 178 has weird spacing: '... media unawa...' == Line 1861 has weird spacing: '...dicated with:...' 
-- The document seems to lack a disclaimer for pre-RFC5378 work, but may have content which was first submitted before 10 November 2008. If you have contacted all the original authors and they are all willing to grant the BCP78 rights to the IETF Trust, then this is fine, and you can ignore this comment. If not, you may need to add the pre-RFC5378 disclaimer. (See the Legal Provisions document at https://trustee.ietf.org/license-info for more information.) -- The document date (April 2001) is 8405 days in the past. Is this intentional? Checking references for intended status: Proposed Standard ---------------------------------------------------------------------------- (See RFCs 3967 and 4897 for information about using normative references to lower-maturity documents in RFCs) == Unused Reference: '10' is defined on line 1421, but no explicit reference was found in the text -- Possible downref: Non-RFC (?) normative reference: ref. '1' -- Possible downref: Non-RFC (?) normative reference: ref. '2' -- Possible downref: Non-RFC (?) normative reference: ref. '3' -- Possible downref: Non-RFC (?) normative reference: ref. '4' ** Obsolete normative reference: RFC 1889 (ref. '5') (Obsoleted by RFC 3550) ** Obsolete normative reference: RFC 3016 (ref. '7') (Obsoleted by RFC 6416) == Outdated reference: A later version (-08) exists of draft-ietf-avt-tcrtp-02 == Outdated reference: A later version (-04) exists of draft-singer-mpeg4-ip-01 -- Possible downref: Normative reference to a draft: ref. '9' ** Obsolete normative reference: RFC 2327 (ref. '10') (Obsoleted by RFC 4566) == Outdated reference: A later version (-03) exists of draft-curet-avt-rtp-mpeg4-flexmux-00 -- Possible downref: Normative reference to a draft: ref. '11' ** Obsolete normative reference: RFC 1890 (ref. '12') (Obsoleted by RFC 3551) Summary: 9 errors (**), 0 flaws (~~), 10 warnings (==), 8 comments (--). Run idnits with the --verbose option for more detailed information about the items above. 
-------------------------------------------------------------------------------- 1 Internet Engineering Task Force Avaro-France Telecom 2 Internet Draft Basso-AT&T 3 Casner-Packet Design 4 Civanlar-AT&T 5 Gentric-Philips 6 Herpel-Thomson 7 Lifshitz-Optibase 8 Lim-mp4cast 9 Perkins-ISI 10 van der Meer-Philips 11 April 2001 12 Expires Oct. 2001 13 Document: draft-gentric-avt-mpeg4-multisl-03.txt 15 RTP Payload Format for MPEG-4 Streams 17 Status of this Memo 19 This document is an Internet-Draft and is in full conformance with 20 all provisions of Section 10 of RFC2026. 22 Internet-Drafts are working documents of the Internet Engineering 23 Task Force (IETF), its areas, and its working groups. Note that 24 other groups may also distribute working documents as Internet- 25 Drafts. Internet-Drafts are draft documents valid for a maximum of 26 six months and may be updated, replaced, or obsoleted by other 27 documents at any time. It is inappropriate to use Internet- Drafts 28 as reference material or to cite them other than as "work in 29 progress." 31 The list of current Internet-Drafts can be accessed at 32 http://www.ietf.org/ietf/1id-abstracts.txt 33 The list of Internet-Draft Shadow Directories can be accessed at 34 http://www.ietf.org/shadow.html. 36 Abstract 38 This document describes a payload format for transporting MPEG-4 39 encoded data using RTP. MPEG-4 is a recent standard from ISO/IEC for 40 the coding of natural and synthetic audio-visual data. Several 41 services provided by RTP are beneficial for MPEG-4 encoded data 42 transport over the Internet. Additionally, the use of RTP makes it 43 possible to synchronize MPEG-4 data with other real-time data types. 45 This specification is a product of the Audio/Video Transport working 46 group within the Internet Engineering Task Force and ISO/IEC MPEG-4 47 ad hoc group on MPEG-4 over Internet. 
Comments are solicited and 48 should be addressed to the working group's mailing list at rem- 49 conf@es.net and/or the authors. 51 Gentric et al. Expires October 2001 1 52 1. Introduction 54 MPEG-4 is a recent standard from ISO/IEC for the coding of natural 55 and synthetic audio-visual data in the form of audiovisual objects 56 that are arranged into an audiovisual scene by means of a scene 57 description [1][2][3][4]. This draft specifies an RTP [5] payload 58 format for transporting MPEG-4 encoded data streams. 60 The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", 61 "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in 62 this document are to be interpreted as described in RFC 2119 [6]. 64 The benefits of using RTP for MPEG-4 data stream transport include: 66 i. Ability to synchronize MPEG-4 streams with other RTP payloads 68 ii. Monitoring MPEG-4 delivery performance through RTCP 70 iii. Combining MPEG-4 and other real-time data streams received from 71 multiple end-systems into a set of consolidated streams through RTP 72 mixers 74 iv. Converting data types, etc. through the use of RTP translators. 76 1.1 Overview of MPEG-4 End-System Architecture 78 Fig. 1 below shows the general layered architecture of MPEG-4 79 terminals. The Compression Layer processes individual audio-visual 80 media streams. The MPEG-4 compression schemes are defined in the 81 ISO/IEC specifications 14496-2 [2] and 14496-3 [3]. The compression 82 schemes in MPEG-4 achieve efficient encoding over a bandwidth 83 ranging from several kbps to many Mbps. The audio-visual content 84 compressed by this layer is organized into Elementary Streams (ESs). 85 The MPEG-4 standard specifies MPEG-4 compliant streams. Within the 86 constraint of this compliance the compression layer is unaware of a 87 specific delivery technology, but it can be made to react to the 88 characteristics of a particular delivery layer such as the path-MTU 89 or loss characteristics. 
Also, some compressors can be designed to 90 be delivery specific for implementation efficiency. In such cases 91 the compressor may work in a non-optimal fashion with delivery 92 technologies that are different than the one it is specifically 93 designed to operate with. 95 The hierarchical relations, location and properties of ESs in a 96 presentation are described by a dynamic set of Object Descriptors 97 (ODs). Each OD groups one or more ES Descriptors referring to a 98 single content item (audio-visual object). Hence, multiple 99 alternative or hierarchical representations of each content item are 100 possible. 102 ODs are themselves conveyed through one or more ESs. A complete set 103 of ODs can be seen as an MPEG-4 resource or session description at a 105 Gentric et al. Expires July 2001 2 106 stream level. The resource description may itself be hierarchical, 107 i.e. an ES conveying an OD may describe other ESs conveying other 108 ODs. 110 The session description is accompanied by a dynamic scene 111 description, Binary Format for Scene (BIFS), again conveyed through 112 one or more ESs. At this level, content is identified in terms of 113 audio-visual objects. The spatio-temporal location of each object is 114 defined by BIFS. The audio-visual content of those objects that are 115 synthetic and static are described by BIFS also. Natural and 116 animated synthetic objects may refer to an OD that points to one or 117 more ESs that carries the coded representation of the object or its 118 animation data. 120 By conveying the session (or resource) description as well as the 121 scene (or content composition) description through their own ESs, it 122 is made possible to change portions of the content composition and 123 the number and properties of media streams that carry the audio- 124 visual content separately and dynamically at well known instants in 125 time. 
127 One or more initial Scene Description streams and the corresponding 128 OD stream have to be pointed to by an initial object descriptor 129 (IOD). The IOD needs to be made available to the receivers through 130 some out-of-band means that are out of scope of this payload 131 specification. However in the context of transport on IP networks it 132 is defined in a separate document [9]. 134 A homogeneous encapsulation of ESs carrying media or control (ODs, 135 BIFS) data is defined by the Sync Layer (SL) that primarily provides 136 the synchronization between streams. The Compression Layer organizes 137 the ESs in Access Units (AU), the smallest elements that can be 138 attributed individual timestamps. Integer or fractional AUs are then 139 encapsulated in SL packets. All consecutive data from one stream is 140 called an SL-packetized stream at this layer. The interface between 141 the compression layer and the SL is called the Elementary Stream 142 Interface (ESI). The ESI is informative. 144 The Delivery Layer in MPEG-4 consists of the Delivery Multimedia 145 Integration Framework defined in ISO/IEC 14496-6 [4]. This layer is 146 media unaware but delivery technology aware. It provides transparent 147 access to and delivery of content irrespective of the technologies 148 used. The interface between the SL and DMIF is called the DMIF 149 Application Interface (DAI). It offers content location independent 150 procedures for establishing MPEG-4 sessions and access to transport 151 channels. The specification of this payload format is considered as 152 a part of the MPEG-4 Delivery Layer. 154 media aware +-----------------------------------------+ 155 delivery unaware | COMPRESSION LAYER | 156 14496-2 Visual |streams from as low as Kbps to multi-Mbps| 157 14496-3 Audio +-----------------------------------------+ 159 Gentric et al. 
Expires July 2001 3 160 Elementary 161 Stream 162 ===================================================Interface 164 (ESI) 165 +-------------------------------------------+ 166 media and | SYNC LAYER | 167 delivery unaware | manages elementary streams, their synch- | 168 14496-1 Systems | ronization and hierarchical relations | 169 +-------------------------------------------+ 171 DMIF 172 Application 173 ====================================================Interface 175 (DAI) 176 +-------------------------------------------+ 177 delivery aware | DELIVERY LAYER | 178 media unaware |provides transparent access to and delivery| 179 14496-6 DMIF | of content irrespective of delivery | 180 | technology | 181 +-------------------------------------------+ 183 Figure 1: General MPEG-4 terminal architecture 185 1.2 MPEG-4 Elementary Stream Data Packetization 187 The ESs from the encoders are fed into the SL with indications of AU 188 boundaries, random access points, desired composition time and the 189 current time. 191 The Sync Layer fragments the ESs into SL packets, each containing a 192 header that encodes information conveyed through the ESI. If the AU 193 is larger than a SL packet, subsequent packets containing remaining 194 parts of the AU are generated with subset headers until the complete 195 AU is packetized. 197 The syntax of the Sync Layer is configurable and can be adapted to 198 the needs of the stream to be transported. This includes the 199 possibility to select the presence or absence of individual syntax 200 elements as well as configuration of their length in bits. The 201 configuration for each individual stream is conveyed in a 202 SLConfigDescriptor, which is an integral part of the ES Descriptor 203 for this stream. The MPEG-4 SLConfigDescriptor, being configuration 204 information, is not carried by the media stream itself but is rather 205 transported via an ObjectDescriptor Stream encoded using the MPEG-4 206 Object Description framework. 
This can be done in a separate stream 207 using this payload format (see section 4.2 for details). The 208 SLConfigDescriptor MAY be transported by other means (for example as 209 an a=fmtp parameter, see section 5). 212 2. Analysis of the carriage of MPEG-4 over IP 214 When transporting MPEG-4 audio and video, applications may or may 215 not require the use of MPEG-4 systems. To achieve the highest level 216 of interoperability between all MPEG-4 applications, it is desirable 217 that (a) in both cases the same MPEG-4 transport format can be used 218 and that (b) receivers that have no MPEG-4 system knowledge can 219 easily skip the MPEG-4 system specific information, if any. 221 RTP is well suited to transporting MPEG-4 audio and MPEG-4 222 video, but when using MPEG-4 systems a problem arises from the fact 223 that both RTP and MPEG-4 systems contain a synchronization layer. 224 In particular, the RTP header duplicates some of the information 225 provided in SL packet headers such as the composition timestamps 226 (CTSs) and the marker bit that signals the end of access units. 228 To avoid unnecessary overhead and potential interoperability risks 229 when transporting MPEG-4 systems, it is desirable to remove the 230 redundancy between the SL packet header and the RTP packet header. 231 To be independent of the use of MPEG-4 systems, synchronization can 232 rely on the parameters provided in the RTP header. 234 In case SL headers are used, the redundant fields are removed from 235 the SL header, producing "reduced SL headers". 236 The remaining information from the SL header, if any, is contained 237 inside the RTP packet payload, together with the SL packet payload. 238 The combination of RTP packet headers and reduced SL packet headers 239 can be used to logically map the RTP packets to complete SL packets.
241 Some of the information contained in the reduced SL headers is also 242 useful for transport over RTP when MPEG-4 systems is not used. 244 For that reason the information in the "reduced" SL headers is split 245 into "general useful information" and "MPEG-4 systems only 246 information". 248 The "general useful information", hereinafter called Mapped SL Packet 249 Header (MSLH), is carried by a number of fields configurable using 250 parameters defined in section 5.1; all receivers can parse these 251 fields. 253 The "MPEG-4 systems only information", if any, is contained in a 254 reduced SL header, hereinafter called Remaining SL Packet Header 255 (RSLH), also signaled by parameters (see section 5.1) and preceded by 256 a length field, so as to enable easy skipping of this information by 257 non-MPEG-4 system devices. 259 This is depicted in figure 2. 261 <----------SL Packet--------> 263 +---------------------------+ 266 | SL Packet | SL Packet | 267 | Header | Payload | 268 +---------------------------+ 269 | | 270 | | 271 +-------------+----------+---+ | 272 | | | | 273 V V V V 274 +-----------+ +-----------+ +-------------+ +-----------+ 275 |RTP Packet | | Mapped SL | | Remaining SL| | SL Packet | 276 | Header | | Header | | Header | | Payload | 277 +-----------+ +-----------+ +-------------+ +-----------+ 279 <----RTP Packet Payload-------------------> 281 Figure 2: Mapping of SL Packet into RTP packet 283 This RTP payload format has been designed so that it can be 284 configured (using parameters described in section 5.1) to be 285 identical to RFC 3016 for the recommended MPEG-4 video 286 configurations. Hence receivers that comply with this payload 287 specification can decode such RTP payloads. 289 3. Payload Format 291 The RTP Payload corresponds to an integer number of SL packets.
293 SL packets inside RTP packets MUST be in the SL stream order, i.e.: 294 i) decodingTimeStamp order, if present 295 ii) packetSequenceNumber order, if present 296 iii) Implicit decoding order in all other cases. 298 The SL Packet Headers are transformed into RSLH with some fields 299 extracted to be mapped in the RTP header and others extracted to be 300 mapped in the corresponding MSLH. The SL Packet Payload is 301 unchanged. 303 This payload format has two modes. In the "Single-SL" mode 304 a single SL packet is transported per RTP packet. In the 305 "Multiple-SL" mode more than one SL packet is 306 transported per RTP packet. The default mode is the Single-SL mode. 307 The mode can be set to Multiple-SL by adding a non-zero SLPPSize or 308 SLPPSizeLength parameter (see section 5.1). 310 RTP Packets SHOULD be sent in the SL stream order (as defined 311 above). 313 The size (or number) of the SL packet(s) SHOULD be adjusted such 314 that the resulting RTP packet is not larger than the path-MTU. To 315 handle larger packets, this payload format relies on lower layers 316 for fragmentation, which may not be desirable. 319 3.1 RTP Header Fields Usage 321 Payload Type (PT): The assignment of an RTP payload type for this 322 new packet format is outside the scope of this document, and will 323 not be specified here. It is expected that the RTP profile for a 324 particular class of applications will assign a payload type for this 325 encoding, or if that is not done then a payload type in the dynamic 326 range shall be chosen. 328 Marker (M) bit: The M bit is set to 1 when all SL packets in the RTP 329 packet are Access Unit ends, i.e. the M bit maps to the SL 330 accessUnitEndFlag. 332 M is set to 1 when the RTP packet contains any of the following: 333 . a single SL packet containing a full Access Unit 334 . a single SL packet transporting the last fragment of an Access 335 Unit 336 .
multiple SL packets each containing a full Access Unit 337 . multiple SL packets each containing the last fragment of an Access 338 Unit 339 . multiple SL packets each containing either a full Access Unit or 340 the last fragment of an Access Unit 342 The last 2 cases occur when using specific interleaving schemes. In 343 some interleaving schemes it may not be practical to reshuffle the 344 SL packets so as to group Access Unit ends in the same RTP packet. 345 In that case, Access Unit boundaries SHOULD be transported using one 346 or both of the SL flags accessUnitStartFlag and accessUnitEndFlag. 348 Extension (X) bit: Defined by the RTP profile used. 350 Sequence Number: The RTP sequence number should be generated by the 351 sender with a constant random offset and does not have to be 352 correlated to any (optional) MPEG-4 SL sequence numbers. 354 Timestamp: Set to the value in the compositionTimeStamp field of the 355 first SL packet, if present. If compositionTimeStamp is shorter than 356 32 bits, the MSBs of timestamp MUST be set to zero. 358 Although it is available from the SL configuration data, the 359 resolution of the timestamp may need to be conveyed explicitly 360 through some out-of-band means to be used by network elements which 361 are not MPEG-4 aware. 363 If compositionTimeStamp is longer than 32 bits, this payload 364 format cannot be used. 366 In all cases, the sender SHALL ensure that RTP time stamps 367 are identical only for RTP packets transporting fragments of the 368 same Access Unit. 370 If compositionTimeStamp is not present in the current SL 371 packet but was present in a previous SL packet, the reason is 374 that both packets carry fragments of the same Access Unit; therefore 375 the same timestamp value MUST be used as the RTP timestamp.
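The timestamp rules above can be sketched as follows. This is an illustrative sketch only: the function name rtp_timestamp and its parameters are hypothetical and are not defined by this payload format. It does not cover streams that never carry a compositionTimeStamp.

```python
from typing import Optional

RTP_TS_BITS = 32  # width of the RTP timestamp field


def rtp_timestamp(cts: Optional[int], cts_length: int,
                  previous: Optional[int]) -> int:
    """Derive the RTP timestamp for one RTP packet (hypothetical helper).

    cts        -- compositionTimeStamp of the first SL packet, None if absent
    cts_length -- configured length of compositionTimeStamp in bits
    previous   -- RTP timestamp used for the preceding fragment of the same AU
    """
    if cts_length > RTP_TS_BITS:
        # A compositionTimeStamp longer than 32 bits cannot be mapped.
        raise ValueError("compositionTimeStamp longer than 32 bits")
    if cts is not None:
        if not 0 <= cts < (1 << cts_length):
            raise ValueError("value does not fit the configured length")
        # A shorter CTS is zero-extended: the unused MSBs remain 0.
        return cts
    if previous is None:
        raise ValueError("no compositionTimeStamp seen for this Access Unit")
    # Later fragment of the same Access Unit: reuse the same timestamp.
    return previous
```

For example, a 16-bit compositionTimeStamp of 0x1234 yields the RTP timestamp 0x00001234, and a following fragment without a compositionTimeStamp reuses that value.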
377 If compositionTimeStamp is never present in SL packets for this 378 stream, the RTP packetizer SHOULD convey a reading of a local clock 379 at the time the RTP packet is created. 381 According to RFC1889 [5, Section 5.1] timestamps are recommended to 382 start at a random value for security reasons. However then, a 383 receiver is not in the general case able to reconstruct the original 384 MPEG-4 Time Stamps (CTS, DTS, OCR) which can be of use for 385 applications where streams from multiple sources are to be 386 synchronized. Therefore the usage of such a random offset SHOULD be 387 avoided. 389 Note that since RTP devices may re-stamp the stream, all time stamps 390 inside of the RTP payload (CTS and DTS in MSLH, OCR in RSLH) MUST be 391 expressed as difference to the RTP time stamp. Since this 392 subtraction may lead to negative values, the offset MUST be encoded 393 as a two's complement signed integer in network byte order. Note 394 these offsets (delta) typically require much fewer bits to be 395 encoded than the original length, which is another justification. 397 SSRC, CC and CSRC fields are used as described in RFC 1889 [5]. 399 RTCP SHOULD be used as defined in RFC 1889 [5]. 401 RTP timestamps in RTCP SR packets: according to the RTP timing 402 model, the RTP timestamp that is carried into an RTCP SR packet is 403 the same as the compositionTimeStamp that would be applied to an RTP 404 packet for data that was sampled at the instant the SR packet is 405 being generated and sent. The RTP timestamp value is calculated from 406 the NTP timestamp for the current time, which also goes in the RTCP 407 SR packet. To perform that calculation, an implementation needs to 408 periodically establish a correspondence between the CTS value of a 409 data packet and the NTP time at which that data was sampled. 411 3.2 RTP payload structure 413 The packet payload structure consists of 3 byte-aligned sections. 
415 The first section is the MSLHSection and contains Mapped SL Packet 416 Headers (MSLH). The MSLH structure is described in 3.3. In the 417 Single-SL mode this section is empty by default. 419 The second section is the RSLHSection and contains Remaining SL 420 Headers (RSLH). The RSLH structure is described in 3.5. By default 421 this section is empty. 423 The last section (SLPPSection) contains the SL packet payloads. This 424 section is never empty. 426 Gentric et al. Expires July 2001 8 427 The Nth MSLH in the MSLHSection, the Nth RSLH in the RSLHSection and 428 the Nth SL packet payload in the SLPPSection correspond to the Nth 429 SL packet transported by the RTP packet. 431 0 1 2 3 432 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 433 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 434 |V=2|P|X| CC |M| PT | sequence number | 435 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 436 | timestamp | 437 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 438 | synchronization source (SSRC) identifier | 439 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 440 : contributing source (CSRC) identifiers : 441 +=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+ 442 | | 443 | MSLHSection (byte aligned) | 444 | | 445 | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 446 | | | 447 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | 448 | | 449 | RSLHSection (byte aligned) | 450 | | 451 | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 452 | | | 453 +-+-+-+-+-+-+-+-+ | 454 | | 455 | SLPPSection (byte aligned) | 456 | | 457 | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 458 | :...OPTIONAL RTP padding | 459 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 461 Figure 3: An RTP packet for MPEG-4 463 3.3 MSLHSection structure 465 If the MSLHSection consumes a non-integer number of bytes, up to 7 466 zero-valued padding bits MUST be inserted at the end in order to 467 achieve byte-alignment. 
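The byte-alignment rule can be illustrated with a small sketch. The helper names padding_bits and pack_section are hypothetical, and writing each field most significant bit first is an assumption made for this example.

```python
def padding_bits(section_bits: int) -> int:
    """Zero-valued bits (0 to 7) needed to reach the next byte boundary."""
    return (-section_bits) % 8


def pack_section(fields) -> bytes:
    """Concatenate (value, width) fields bit-wise, then zero-pad to a byte.

    Assumption for this sketch: each field is written MSB first.
    """
    acc = 0
    nbits = 0
    for value, width in fields:
        if not 0 <= value < (1 << width):
            raise ValueError("value does not fit the field width")
        acc = (acc << width) | value
        nbits += width
    pad = padding_bits(nbits)
    acc <<= pad  # the padding bits are the low-order zero bits
    return acc.to_bytes((nbits + pad) // 8, "big")
```

With fields of 3, 1 and 9 bits the section holds 13 bits, so 3 padding bits are appended and the section occupies 2 bytes.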
469 In the Single-SL mode the MSLHSection consists of a single MSLH. 471 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 472 | MSLH (x bits ) : padding bits| 473 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 475 Figure 4: MSLHSection structure in Single-SL mode 478 In the Multiple-SL mode this section consists of a 2-byte field 479 giving the size in bits (in network byte order) of the following 480 block of bit-wise concatenated MSLHs. 482 This size field is absent in the Single-SL mode. The reason is 483 compatibility with RFC 3016 rather than the saving in overhead, 484 which would only be a minor gain. 486 0 1 2 3 487 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 488 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 489 | MSLH section size in bits | MSLH | etc | 490 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | 491 | as many bit-wise concatenated MSLHs | 492 | as SL packets in this RTP packet | 493 | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 494 | : padding bits| 495 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 497 Figure 5: MSLHSection structure in Multiple-SL mode 499 3.4 MSLH structure 501 The Mapped SL Packet Header content depends on parameters (as 502 described in section 5.1); by default it is empty for the Single-SL 503 mode and contains only the SLPPayloadSize (SL Packet Payload Size) 504 field in the Multiple-SL mode. 506 When all options are used the MSLH structure is given in figure 6.
508 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 509 | SLPPayloadSize | 510 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 511 | SLPSeqNum/SLPSeqNumDelta | 512 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 513 | CTSFlag | 514 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 515 | CTSDelta | 516 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 517 | DTSFlag | 518 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 519 | DTSDelta | 520 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 522 Figure 6: Mapped SL Packet Header (MSLH) structure 524 In the general case a receiver can only discover the size of a MSLH 525 by parsing it since for example the presence of CTSDelta is signaled 526 by the value of CTSFlag. 528 3.4.1 Fields of MSLH 530 Gentric et al. Expires July 2001 10 531 SLPPayloadSize (SL Packet Payload Size): Indicates the size in bytes 532 of the associated SL Packet Payload, which can be found in the 533 SLPPSection of the RTP packet. The length in bits of this field is 534 signaled by the SLPPSizeLength parameter (see section 5.1). 536 SLPSeqNum/SLPSeqNumDelta: Encodes the packetSequenceNumber (serial 537 number) of the SL Packet. 539 SLPSeqNum is found only for the first SL packet. SLPSeqNumDelta is 540 optional and -if present- appears for subsequent (non-first) SL 541 packets. 543 The length in bits of the SLPSeqNum field is defined by the 544 SLPSeqNumLength parameter (see section 5.1). 546 The length in bits of the SLPSeqNumDelta field is defined by the 547 SLPSeqNumDeltaLength parameter (see section 5.1). 549 If the parameter SLPSeqNumDeltaLength is defined, non-first SL 550 packets have their packetSequenceNumber encoded as a difference 551 named SLPSeqNumDelta. This difference is relative to the previous SL 552 packet in the RTP packet according to (with i>=0): 553 packetSequenceNumber(0) = SLPSeqNum(0) 554 packetSequenceNumber(i+1) = packetSequenceNumber(i) + 555 SLPSeqNumDelta(i+1) + 1 557 If the parameter SLPSeqNumDeltaLength is not defined the default 558 value is zero i.e. 
the SLPSeqNumDelta field is not present for non-first 559 SL packets. Furthermore receivers SHALL then apply the above 560 formula with SLPSeqNumDelta equal to zero, i.e. by default 561 packetSequenceNumber is incremented by 1 for each SL packet in one 562 RTP packet. This means that for streams that use 563 packetSequenceNumber and are not interleaved the transport of 564 packetSequenceNumber in the Multiple-SL mode is "almost free". 566 CTSFlag (1 bit): Indicates whether the CTSDelta field is present. A 567 value of 1 indicates that the field is present, a value of 0 that it 568 is not present. 570 If CTSDeltaLength is not zero this field is present in every MSLH 571 since the receiver needs it to reconstruct the 572 compositionTimeStampFlag of SL Headers. 574 CTSDelta: Specifies the value of the CTS as a two's complement offset 575 (delta) from the timestamp in the RTP header of this RTP packet. 576 The length in bits of each CTSDelta field is specified by the 577 CTSDeltaLength parameter (see section 5.1). 579 This field is present if CTSFlag is 1 except for the first MSLH 580 since the composition time stamp of the first SL packet is mapped to 581 the RTP time stamp, regardless of whether CTSFlag is 1. In all cases 582 the sender MUST remove the compositionTimeStamp from the RSLH. 585 DTSFlag (1 bit): Indicates whether the DTSDelta field is present. A 586 value of 1 indicates that DTSDelta is present, a value of 0 that it 587 is not present. 589 If DTSDeltaLength is not zero this field is present in every MSLH 590 since the receiver needs it to reconstruct the decodingTimeStampFlag 591 of SL Headers. 593 DTSDelta: Specifies the value of the decodingTimeStamp as a two's 594 complement offset (delta) from the timestamp in the RTP header of 595 this packet. The length in bits of each DTSDelta field is specified 596 by the DTSDeltaLength parameter (see section 5.1). 598 This field appears when DTSFlag is 1.
The sender MUST always remove 599 the decodingTimeStamp from the RSLH. 601 3.4.2 Relationship between sizes of MSLH fields and parameters 603 The relationship between a Mapped SL Packet Header and the related 604 parameters is as follows: 606 +===========================+=================================+ 607 | Fields of MSLH | Number of bits (parameters) | 608 +===========================+=================================+ 609 | SLPPayloadSize | SLPPSizeLength | 610 +---------------------------+---------------------------------+ 611 | SLPSeqNum | SLPSeqNumLength | 612 +---------------------------+---------------------------------+ 613 | SLPSeqNumDelta | SLPSeqNumDeltaLength | 614 +---------------------------+---------------------------------+ 615 | CTSFlag | 1 If ( CTSDeltaLength > 0 ) | 616 +---------------------------+---------------------------------+ 617 | CTSDelta | CTSDeltaLength If(CTSFlag==1) | 618 +---------------------------+---------------------------------+ 619 | DTSFlag | 1 If ( DTSDeltaLength > 0 ) | 620 +---------------------------+---------------------------------+ 621 | DTSDelta | DTSDeltaLength If(DTSFlag==1) | 622 +---------------------------+---------------------------------+ 624 Table 1: Relationship between MSLH fields' size and parameters 626 3.5 RSLHSection structure 628 This section consists of a field (RSLHSectionSize) giving the size 629 in bits of the following block of bit-wise concatenated RSLHs. 631 If the section consumes a non-integer number of bytes, up to 7 zero-valued 632 padding bits MUST be inserted at the end in order to achieve 633 byte-alignment. 635 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 636 | RSLHSectionSize (RSLHSectionSizeLength bits)| RSLH (variable | 637 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | 639 Gentric et al.
Expires July 2001 12 640 | number of bits) | 641 | | 642 | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 643 | | RSLH (variable number of bits) | 644 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 645 | etc | 646 | as many bit-wise concatenated RSLHs | 647 | as SL Packets in this RTP packet | 648 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 649 | RSLH (variable number of bits) | 650 | +-+-+-+-+-+-+-+ 651 | : padding bits| 652 |-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 654 Figure 7: RSLHSection structure 656 The length in bits of the RSLHSectionSize field is 657 RSLHSectionSizeLength and is specified with a default value of zero 658 indicating that the whole RSLHSection is absent. 660 +=================================+===============================+ 661 | Fields of RSLHSection | Number of bits | 662 +=================================+===============================+ 663 | RSLHSectionSize | RSLHSectionSizeLength | 664 +---------------------------------+-------------------------------+ 665 | all bit-wise concatenated RSLHs | RSLHSectionSize | 666 +---------------------------------+-------------------------------+ 668 Table 2: Sizes in bits inside RSLHSection 670 Parsing of the bit-wise concatenated RSLHs requires MPEG-4 system 671 awareness, specifically it requires to understand the MPEG-4 672 Synchronization Layer (SL) syntax and the modifications to this 673 syntax described in the next section. 675 However thanks to the RSLHSectionSize field non-MPEG-4-system 676 receivers MAY skip this part by rounding up RSLPHSize/8 to the next 677 integer number of bytes. 679 3.6 RSLH structure 681 A Remaining SL Packet Header (RSLH) is what remains of an SL header 682 after modifications for mapping into this payload format. 684 The following modifications of the SL packet header MUST be applied. 
685 The other fields of the SL packet header MUST remain unchanged but 686 are bit-shifted to fill in the gaps left by the operations specified 687 below. 689 3.6.1 Removal of fields 691 Gentric et al. Expires July 2001 13 692 The following SL Packet Header fields -if present- are removed since 693 they are mapped either in the RTP header or in the corresponding 694 MSLH: 695 . compositionTimeStampFlag 696 . compositionTimeStamp 697 . decodingTimeStampFlag 698 . decodingTimeStamp 699 . packetSequenceNumber 700 . AccessUnitEndFlag (in Single-SL mode only) 702 The AccessUnitEndFlag, when present for a given stream, MUST be 703 removed from every RSLH when using the Single-SL mode since it has 704 the same meaning as the Marker bit (and for compatibility with RFC 705 3016). However when using the Multiple-SL mode, AccessUnitEndFlag 706 MUST NOT be removed since it is useful to signal individual AU ends. 708 3.6.2 Mapping of OCR 710 Furthermore if the SL Packet header contains an OCR, then this field 711 is encoded in the RSLH as a 2-complement difference (delta) exactly 712 like a compositionTimeStamp or a decodingTimeStamp in the MSLH. The 713 length in bit of this difference is indicated by the OCRDeltaLength 714 parameter (see section 5.1). 716 With this payload format OCRs MUST have the same clock resolution as 717 Time Stamps. 719 If compositionTimeStamp is not present for a SL packet that has OCR 720 then the OCR SHALL be encoded as a difference to the RTP time stamp. 722 3.6.3 Degradation Priority 724 For streams that use the optional degradationPriority field in the 725 SL Packet Headers, only SL packets with the same degradation 726 priority SHALL be transported by one RTP packet so that components 727 may dispatch the RTP packets according to appropriate QOS or 728 protection schemes. Furthermore only the first RSLH of one RTP 729 packet SHALL contain the degradationPriority field since it would be 730 otherwise redundant. 
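
The 2-complement delta encoding described in section 3.6.2 for OCR
(and used in the same way for CTSDelta and DTSDelta in the MSLH) can
be sketched as follows. This is a minimal illustration, not part of
the payload format: the function names decode_delta and
restore_timestamp are hypothetical, the raw field is assumed to have
already been extracted from the bit stream as an unsigned integer,
and the wrap-around modulo 2**32 is an assumption based on the RTP
time stamp being a 32-bit field.

```python
def decode_delta(raw: int, length_bits: int) -> int:
    """Interpret an unsigned field of length_bits bits as two's complement."""
    if length_bits <= 0:
        raise ValueError("length_bits must be positive")
    if not 0 <= raw < (1 << length_bits):
        raise ValueError("raw value does not fit in length_bits")
    sign_bit = 1 << (length_bits - 1)
    # A set top bit means the value is negative.
    return raw - (1 << length_bits) if raw & sign_bit else raw


def restore_timestamp(rtp_timestamp: int, raw_delta: int, length_bits: int) -> int:
    """Add the signed delta to the RTP time stamp.

    Assumption: the result wraps modulo 2**32 like the RTP time stamp.
    """
    return (rtp_timestamp + decode_delta(raw_delta, length_bits)) & 0xFFFFFFFF
```

With OCRDeltaLength (or CTSDeltaLength, DTSDeltaLength) equal to 4,
the raw value 1110 (binary) would decode to -2 under this sketch, so
a time stamp two ticks before the RTP time stamp is restored.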
3.7 SLPPSection structure

The SLPPSection (SL Packet Payload Section) contains the
concatenated SL Packet Payloads. By definition SL Packet Payloads
are byte aligned.

For efficiency SL packets do not carry their own payload size. This
is not an issue for RTP packets that contain a single SL Packet.

However in the Multiple-SL mode the size of each SL packet payload
MUST be available to the receiver.

If the SL packet payload size is constant for a stream, the size
information SHOULD NOT be transported in the RTP packet. However in
that case it MUST be signaled using the SLPPSize parameter (see
section 5.1).

If the SL packet payload size is variable then the size of each SL
packet payload MUST be indicated in the corresponding MSLH. In order
to do so the MSLH MUST contain a SLPPayloadSize field. The number of
bits on which this SLPPayloadSize field is encoded MUST be indicated
using the SLPPSizeLength parameter (see section 5.1).

The absence of both SLPPSize and SLPPSizeLength indicates the
Single-SL mode i.e. that a single SL packet is transported in each
RTP packet for that stream.

+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| SLPP (variable number of bytes)                           |
|                                                           |
|                                                           |
|                         +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                         | SLPP (variable number of bytes) |
+-+-+-+-+-+-+-+-+-+-+-+-+-+                                 |
|                                                           |
|                                                           |
|                                                           |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| etc                                                       |
| as many byte-wise concatenated SLPPs                      |
| as SL Packets in this RTP packet                          |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

Figure 8: SLPPSection structure

3.8 Interleaving

SL Packets MAY be interleaved. Senders MAY perform interleaving.
Receivers MUST support interleaving.

When interleaving of SL packets is used it SHALL be implemented
using the SLPSeqNum field of MSLH.
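
A receiver-side view of interleaving can be sketched as follows,
using the packetSequenceNumber reconstruction formula given in
section 4.1. This is only an illustration under stated assumptions:
the names restore_sequence_numbers and deinterleave are hypothetical,
the MSLH fields are assumed to be already parsed into integers, and
sequence number wrap-around, packet loss and duplication are ignored.

```python
from typing import List, Sequence, Tuple


def restore_sequence_numbers(first_seq_num: int, deltas: Sequence[int]) -> List[int]:
    """Apply psn(i+1) = psn(i) + SLPSeqNumDelta(i+1) + 1, starting at SLPSeqNum.

    `deltas` holds SLPSeqNumDelta for the non-first SL packets of one RTP
    packet; all-zero deltas give the default "increment by one" behaviour.
    """
    numbers = [first_seq_num]
    for delta in deltas:
        numbers.append(numbers[-1] + delta + 1)
    return numbers


def deinterleave(
    packets: Sequence[Tuple[int, Sequence[int], Sequence[bytes]]]
) -> List[bytes]:
    """Return SL packet payloads ordered by restored packetSequenceNumber.

    Each element of `packets` is (SLPSeqNum, deltas, payloads) for one RTP
    packet (a hypothetical, already parsed representation).
    """
    numbered = []
    for first_seq_num, deltas, payloads in packets:
        seq_nums = restore_sequence_numbers(first_seq_num, deltas)
        if len(seq_nums) != len(payloads):
            raise ValueError("one delta is needed per non-first SL packet")
        numbered.extend(zip(seq_nums, payloads))
    # Sort on the sequence number only; payload bytes are never compared.
    return [payload for _, payload in sorted(numbered, key=lambda item: item[0])]
```

For instance, two RTP packets that each carry three SL packets with
SLPSeqNumDelta equal to 1 hold the even-numbered and odd-numbered SL
packets respectively, and sorting on the restored numbers gives back
the original order.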
786 The AUSequenceNumber field of the SL header MUST NOT be used for 787 interleaving since firstly it may collide with the Scene Description 788 Carousel usage described in section 5.1 and secondly it is not 789 visible to non-MPEG-4 system receivers. 791 The conjunction of RTP sequence number and SLPSeqNum can produce a 792 quasi-unique identifier for each SL packet so that a receiver can 793 unambiguously reconstruct the original order even in case of out-of- 794 order packets, packet loss or duplication. 796 3.9 Fragmentation Rules 798 Gentric et al. Expires July 2001 15 799 This section specifies rules for senders in order to prevent media 800 decoding difficulties at the receiver end. 802 MPEG-4 Access Units are the default fragments for MPEG-4 bitstreams 803 and SHOULD be mapped one-to-one to RTP packets of this format with 804 two exceptions: 805 - Access Units larger than the MTU, 806 - When using interleaving for better packet loss resilience. 808 In all cases Access Unit start MUST be aligned with SL packet start. 810 This section gives rules to apply when performing Access Unit 811 fragmentation. 813 Some MPEG-4 codecs define optional syntax for Access Units sub- 814 entities (fragments) that are independently decodable for error 815 resilience purposes. Examples are Video Packets for video and Error 816 Sensitivity Categories (ESC) for audio. This always corresponds to 817 specific bitstream syntax, which is signaled in the 818 DecoderSpecificInfo inside the DecoderConfig in SLConfig, and/or 819 using the corresponding parameters as described in section 5.1. 820 Therefore encoders and decoders are both aware whether they are 821 operating in such a mode or not (however since this codec 822 configuration is an opaque data block this is not explicitly 823 signaled by this payload format). 825 If not operating in such a mode it is obvious that the decoder has 826 to skip packets after a loss until an Access Unit start is received. 
827 Similarly decoder implementations that do not implement robust 828 decoding of Access Units fragments have to discard all packets after 829 a packet loss until an Access Unit start is received. In the same 830 way decoder implementations that do not implement re-synchronization 831 at any Access Units start have to discard all packets after a packet 832 loss until a Random Access Point Access Unit is received. 834 One problem would arise however for decoder implementations that try 835 to restart decoding after a packet loss if independently decodable 836 fragments are signaled (in the decoder configuration) but the 837 fragments actually received are not independently decodable because 838 the RTP sender has made RTP packets on different boundaries than the 839 fragments provided by the encoder (so this issue applies to the 840 interface between the encoder and the RTP sender and to the RTP 841 sender component itself). 843 For this reason the following rules must apply to SL streams that 844 are specifically made for transport with this payload format: 846 SL packets SHOULD be codec-semantic entities in the spirit of ALF 847 i.e. either complete Access Units or fragments of Access Units that 848 are independently decodable. Specifically when a given codec has an 849 independently decodable Access Unit fragments optional syntax this 850 option SHOULD be used. 852 Gentric et al. Expires July 2001 16 853 Furthermore when streams are generated using independently decodable 854 Access Units fragments these Access Units fragments MUST be mapped 855 one-to-one into SL packets. Consequently independently decodable 856 Access Units fragments MUST NOT be split across several SL packets 857 and therefore MUST NOT be split across several RTP packets. 859 For example an MPEG-4 audio stream encoded using the ESC syntax MUST 860 NOT split one ESC across 2 RTP packets. 
This rule is relaxed when using MPEG-4 Video Packets for two
reasons: firstly Video Packets can be much larger than typical MTU
and secondly all Video Packets start with a specific
resynchronization marker that can be unambiguously detected.
Therefore for video streams using the Video Packet syntax Video
Packets MAY be split across several SL packets although it is
strongly RECOMMENDED to always adapt the Video Packet size to fit
the MTU. A Video Packet start MUST always be aligned with a SL
packet start, except when a GOV is present, in which case the GOV
and the first Video Packet of the following VOP MUST be included in
the same SL packet.

4. Other issues

4.1 SL packetized stream reconstruction

The MPEG-4 over IP framework [9] requires that the way a receiver
can reconstruct a valid SL packetized stream shall be documented;
this is the purpose of this section.

Since this format directly transports SL packets this reconstruction
is trivial with the following rules:

- SLPacketHeader.packetSequenceNumber is restored from
  MSLH.SLPSeqNum for the first SL packet in the RTP packet (i = 0):
    SLPacketHeader.packetSequenceNumber(0) = MSLH.SLPSeqNum(0)
  and for subsequent packets using (for i >= 0):
    SLPacketHeader.packetSequenceNumber(i+1) =
    SLPacketHeader.packetSequenceNumber(i) + MSLH.SLPSeqNumDelta(i+1) + 1

- All time stamps (CTS, DTS, OCR), when present, are restored from
  the delta values.
- Time stamp flags (CTSFlag, DTSFlag) in MSLH are used to
  reconstruct respectively the compositionTimeStampFlag and
  decodingTimeStampFlag of SLPacketHeader.

Specifically the reconstruction depends on the parameters as
follows:

If CTSDeltaLength is absent or equals 0:
The SL stream reconstruction rules are:
. for the first (or only) SL packet:
  . if SLConfig.useTimeStamps == true, then:
    . SLPacketHeader.compositionTimeStampFlag = true
    . SLPacketHeader.compositionTimeStamp = RTP TimeStamp
  . if SLConfig.useTimeStamps == false, then:
    . SLPacketHeader.compositionTimeStampFlag is not defined
. for the following SL packets:
  . SLPacketHeader.compositionTimeStampFlag = false

If CTSDeltaLength is not zero:
. SLPacketHeader.compositionTimeStampFlag = MSLH.CTSFlag
. SLPacketHeader.compositionTimeStamp = RTP TimeStamp +
  MSLH.CTSDelta

- The other SL packet header fields SHALL remain as found in RSLH.

It is obvious that in the general case the reconstruction of the
original SL packetized stream requires SL-awareness. However this
payload format allows in all cases a receiver that does not know
about the SL syntax to reconstruct the semantic of SL for the
following very useful features:
- Packet order (decoding order)
- Access Unit boundaries (using the M bit)
- Access Unit fragments (i.e. SL packet boundaries using
  MSLH.SLPPayloadSize)
- Composition Time Stamps (using the RTP Time Stamp and
  MSLH.CTSDelta)
- Decoding Time Stamps (using the RTP Time Stamp and MSLH.DTSDelta)
- Packet sequence number (using the RTP sequence number and
  MSLH.SLPSeqNum)

4.2 Handling of scene description streams

MPEG-4 introduces new stream types as described in section 1 namely
Object Descriptors and BIFS. In the following both OD and BIFS are
discussed on the same basis i.e. as "scene description".

Considering Scene description as a "stream-able" type of content is
a rather new concept and for that reason some specific comments are
needed.

Typically scene descriptions are encoded in such a way that
information loss would in the general case cripple the presentation
beyond any hope of repair by the receiver.
Still this is well suited for a number of multimedia applications
where the scene is first made available via reliable channels to the
client and then played. This payload format is not intended for this
type of application, for which download of MPEG-4 interchange (.mp4)
files is typical. However it can also be used if the RTP packets are
transported using TCP or any other reliable protocol.

On the other hand MPEG-4 has introduced the possibility to
dynamically change the scene description by sending animation
information (changes in parameters) and structural change
information (updates). Since this information has to be sent in a
timely fashion MPEG-4 has defined a number of techniques in order to
encode the scene description in a manner that makes it behave
similarly to other temporal encoding schemes such as audio and
video. This payload format is intended for this usage.

Note that in many cases the application will consist of first the
reliable transmission of a static initial scene followed by the
streaming of animations and updates. For this reason the usage of
this payload format is attractive since it offers a unique solution.

Senders must be aware that suitable schemes should be used when
scene description streams transport sensitive configuration
information. For example, if the RTP packet transporting an
OD-update command is lost, the corresponding media stream is not
accessible to the receiver.

Redundancy is a possibility and may be added by tools
hierarchically higher than this payload format, e.g. by packet based
FEC, re-transmission, or similar tools. In such a case, the general
congestion control principles have to be observed.
983 Since BIFS and OD streams may be modified during the session with 984 update commands, there is a need to send both update commands and 985 full BIFS/OD refresh. For that reason MPEG-4 defines Random Access 986 Points (RAP) for scene description streams (OD and BIFS) where by 987 definition a decoder can restart decoding i.e. receives a "full 988 update" of the scene. This mechanism is called Scene and Object 989 Description Carrousel. The AU Sequence Number field of SL Packet 990 Header is used to support this behavior at the Synchronization 991 Layer. When two access units are sent consecutively with the same AU 992 Sequence Number, the second one is assumed to be a semantic 993 repetition of the first. If a receiver starts to listen in the 994 middle of a session or has detected losses, it can skip all received 995 Access Units until such a RAP. The periodicity of transmission of 996 these RAPs should be chosen/adjusted depending on the application 997 and the network it is deployed on; i.e. exactly like Intra-coded 998 frames for video, it is the responsibility of the sender to make 999 sure the periodicity of RAPs is suitable. 1001 4.3 Multiplexing 1003 An advanced MPEG-4 session may involve a large number of objects 1004 that may be as many as a few hundred, transporting each ES as an 1005 individual RTP stream may not always be practical. Allocating and 1006 controlling hundreds of destination addresses for each MPEG-4 1007 session may pose insurmountable session administration problems. 1008 The input/output processing overhead at the end-points will be 1009 extremely high also. Additionally, low delay transmission of low 1010 bitrate data streams, e.g. facial animation parameters, results in 1011 extremely high header overheads. 1013 Gentric et al. Expires July 2001 19 1014 To solve these problems, MPEG-4 data transport requires a 1015 multiplexing scheme that allows selective bundling of several ESs. 
1016 This is beyond the scope of the payload format defined here. 1018 The MPEG-4's Flexmux multiplexing scheme may be used for this 1019 purpose and a specific RTP payload format is being developed [11]. 1021 Another approach may be to develop a generic RTP multiplexing scheme 1022 usable for MPEG-4 data. The multiplexing scheme reported in [8] may 1023 be a candidate for this approach. 1025 For MPEG-4 applications, the multiplexing technique needs to address 1026 the following requirements: 1028 i. The ESs multiplexed in one stream can change frequently during a 1029 session. Consequently, the coding type, individual packet size and 1030 temporal relationships between the multiplexed data units must be 1031 handled dynamically. 1033 ii. The multiplexing scheme should have a mechanism to determine the 1034 ES identifier (ES_ID) for each of the multiplexed packets. ES_ID is 1035 not a part of the SL header. 1037 iii. In general, an SL packet does not contain information about its 1038 size. The multiplexing scheme should be able to delineate the 1039 multiplexed packets whose lengths may vary from a few bytes to close 1040 to the path-MTU. 1042 4.5 Overlap with RFC 3016 1044 This payload format has been designed to have an overlap with RFC 1045 3016 [7]. The conditions for this overlap are: 1046 Conditions for RFC 3016: 1047 i. MPEG-4 video elementary streams only 1048 ii. Maximum one VOP or Video Packet per RTP packet 1049 Conditions for this payload format: 1050 i. No structural parameters defined (or all set to zero), i.e. 1051 Single-SL mode with empty MSLH and empty RSLH. 1052 ii. Receivers MUST be ready to accept (ignore) video configuration 1053 headers (e.g. VOSH, VO and VOL) and visual-object-sequence-end-code 1054 transported in-band. 1056 5. Types and Names 1058 This section describes the MIME types and names associated with this 1059 payload format. Section 5.1 is intended for registration with IANA 1060 in RFC 2048. 
This format may require additional information about the mapping to
be made available to the receiver. This is done using parameters
described in the next section. The absence of any of these fields is
equivalent to a field set to the default value, which is always
zero. The absence of any such parameters resolves into a default
"basic" configuration.

In the MPEG-4 framework the SL stream configuration information is
carried using the Object Descriptor. For compatibility with
receivers that do not implement the full MPEG-4 system specification
this information MAY also be signaled using parameters described
here. When such information is present both in an Object Descriptor
and as a parameter of this payload format it MUST be exactly the
same.

For transport of MPEG-4 audio and video without the use of MPEG-4
systems, as well as to support non-MPEG-4 system receivers, it is
also possible to transport information on the profile and level of
the stream and on the decoder configuration. This is also described
in the next section.

5.1 MIME type registration

MIME media type name: "video" or "audio" or "application"

"video" SHOULD be used for MPEG-4 Video streams (ISO/IEC 14496-2) or
MPEG-4 Systems streams that convey information needed for an
audio/visual presentation.

"audio" SHOULD be used for MPEG-4 Audio streams (ISO/IEC 14496-3) or
MPEG-4 Systems streams that convey information needed for an audio
only presentation.

"application" SHOULD be used for MPEG-4 Systems streams
(ISO/IEC 14496-1) that serve other purposes than audio/visual
presentation, e.g. in some cases when MPEG-J streams are
transmitted.

MIME subtype name: mpeg4-sl

Required parameters: none

Optional parameters:

DTSDeltaLength:
The number of bits on which the DTSDelta field is encoded in MSLH.
The default value is zero and indicates the absence of DTSFlag and
DTSDelta in MSLH (the stream does not transport decodingTimeStamps).
A value larger than zero indicates that there is a DTSFlag in each
MSLH. Since decodingTimeStamp -if present- must be encoded as a
difference to the RTP time stamp, the DTSDeltaLength parameter MUST
be present in order to transport decodingTimeStamps with this
payload format.

CTSDeltaLength:
The number of bits on which the CTSDelta field is encoded in
(non-first) MSLH. The default value is zero and indicates the
absence of the CTSFlag and CTSDelta fields in MSLH. Non-zero values
MUST NOT be signaled in the Single-SL mode. Since
compositionTimeStamps -if present- must be encoded as a difference
to the RTP time stamp, the CTSDeltaLength parameter MUST be present
in order to transport compositionTimeStamps using this payload
format (in the Multiple-SL mode). However CTSDeltaLength SHOULD be
set to zero (or not signaled) for streams that have a constant
Access Unit duration (which can be explicitly signaled using the
DurationFlag and AccessUnitDuration field of SLConfigDescriptor).

OCRDeltaLength:
The number of bits on which the OCRDelta field is encoded in RSLH.
The default value is zero and indicates the absence of OCR for this
stream. Since objectClockReference -if present- must be encoded as a
difference to the RTP time stamp, the OCRDeltaLength parameter MUST
be present in order to transport objectClockReferences with this
payload format.

SLPPSizeLength:
The number of bits on which the SLPPayloadSize field of MSLH is
encoded. The default value is zero and indicates the Single-SL mode
(unless SLPPSize is present). Simultaneous presence of this
parameter and SLPPSize is illegal.
Either the SLPPSizeLength or 1146 SLPPSize parameter MUST be present in order to signal the Multiple- 1147 SL mode of this payload format. 1149 SLPPSize: 1150 The constant size in bytes of each SL Packet Payload for this 1151 stream. The default value is zero and indicates variable SL Packet 1152 Payload size (or the Single-SL mode if SLPPSizeLength is absent). 1153 Simultaneous presence of this parameter and SLPPSizeLength is 1154 illegal. Either the SLPPSizeLength or SLPPSize parameter MUST be 1155 present in order to signal the Multiple-SL mode of this payload 1156 format. When SLPPSize is present the SLPPayloadSize of MSLH in the 1157 RTP packets MUST NOT be present. 1159 SLPSeqNumLength: 1160 The number of bits on which the SLPSeqNum is encoded in the first 1161 MSLH. The default value is zero and indicates the absence of 1162 SLPSeqNum and SLPSeqNumDelta for all MSLHs. Since 1163 packetSequenceNumber -if present- must be mapped in MSLH, the 1164 SLPSeqNumLength parameter MUST be present in order to transport 1165 packetSequenceNumber with this payload format. 1167 SLPSeqNumDeltaLength: 1168 The number of bits on which the SLPSeqNumDelta are encoded in any 1169 non-first MSLH. The default value is zero and indicates that 1170 packetSequenceNumber MUST be incremented by one for each SL packet 1171 in the RTP packet (see section 3.5). Since when interleaving 1172 packetSequenceNumber does not increment by 1 inside a RTP packet, 1173 the SLPSeqNumDeltaLength parameter MUST be present when using 1174 interleaving with this payload format. 1176 Gentric et al. Expires July 2001 22 1177 RSLHSectionSizeLength: 1178 The number of bits that is used to encode the RSLHSectionSize field. 1179 The default value is zero and indicates the absence of the whole 1180 RSLHSection for all RTP packets of this stream. Compatibility with 1181 RFC 3016 requires that the RSLHSection must be empty, including the 1182 RSLHSectionSize field. 
This is the reason why there is such a variable length with a
default value indicating absence of the RSLHSectionSize field.

SLConfigDescriptor:
A base-64 encoding of the SLConfigDescriptor. This SHALL be the
original SLConfigDescriptor and it SHALL be the same as the one
transported by the OD framework, if any.

profile-level-id:
A decimal representation of the MPEG-4 Profile Level indication
value. For audio this parameter indicates which MPEG-4 Audio tool
subsets are applied to encode the audio stream and is defined in
ISO/IEC 14496-1. For video this parameter indicates which MPEG-4
Visual tool subsets are applied to encode the video stream and is
defined in Table G-1 of ISO/IEC 14496-2. This parameter MAY be used
in the capability exchange or session setup procedure to indicate
the MPEG-4 Profile and Level combination of which the relevant
MPEG-4 media codec is capable. If this parameter is not specified by
the procedure, its default value of 1 (Simple Profile/Level 1) is
used.

Config:
A hexadecimal representation of an octet string that expresses the
media payload configuration. Configuration data is mapped onto the
octet string on an MSB-first basis. The first bit of the
configuration data SHALL be located at the MSB of the first octet.
In the last octet, zero-valued padding bits, if necessary, shall
follow the configuration data. For audio this is a
"StreamMuxConfig", as defined in ISO/IEC 14496-3.
For video this expresses the MPEG-4 Visual configuration
information, as defined in subclause 6.2.1 Start codes of
ISO/IEC14496-2[2][4][9] and the configuration information indicated
by this parameter SHALL be the same as the configuration information
in the corresponding MPEG-4 Visual stream, except for
first-half-vbv-occupancy and latter-half-vbv-occupancy, if it
exists, which may vary in the repeated configuration information
inside an MPEG-4 Visual stream (See 6.2.1 Start codes of
ISO/IEC14496-2).

object-type:
A decimal representation of the MPEG-4 Audio Object Type value
defined in ISO/IEC 14496-3. This parameter specifies the tool used
by the encoder. It can be used to limit the capability within the
specified "profile-level-id".

Bitrate:
A decimal representation of the audio bitrate in bits per second for
the audio bit stream.

Encoding considerations:
System bitstreams MUST be generated according to MPEG-4 System
specifications (ISO/IEC 14496-1). Video bitstreams MUST be generated
according to MPEG-4 Visual specifications (ISO/IEC 14496-2). Audio
bitstreams MUST be generated according to MPEG-4 Audio
specifications (ISO/IEC 14496-3). All SL streams MUST be generated
according to MPEG-4 Sync Layer specifications (ISO/IEC 14496-1
section 10); in order to read this format the SLConfigDescriptor is
REQUIRED. These bitstreams are binary data and MUST be encoded for
non-binary transport (for Email, the Base64 encoding is sufficient).
This type is also defined for transfer via RTP. The RTP packets
MUST be packetized according to the RTP payload format defined in
RFC .

Security considerations:
As in RFC .

Interoperability considerations:
MPEG-4 provides a large and rich set of tools for the coding of
visual objects.
For effective implementation of the standard, subsets of the MPEG-4
tool sets have been provided for use in specific applications. These
subsets, called 'Profiles', limit the size of the tool set a decoder
is required to implement. In order to restrict computational
complexity, one or more 'Levels' are set for each Profile. A
Profile@Level combination allows:
o a codec builder to implement only the subset of the standard he
  needs, while maintaining interworking with other MPEG-4 devices
  included in the same combination, and
o checking whether MPEG-4 devices comply with the standard
  ('conformance testing').
A stream SHALL be compliant with the MPEG-4 Profile@Level specified
by the parameter "profile-level-id". Interoperability between a
sender and a receiver may be achieved by specifying the parameter
"profile-level-id" in MIME content, or by arranging in the
capability exchange/announcement procedure to set this parameter
mutually to the same value.

Published specification:
The specifications for MPEG-4 streams are presented in ISO/IEC
14496-1, 14496-2, and 14496-3. The RTP payload format is described
in RFC .

Applications which use this media type:
Multimedia streaming and conferencing tools, Internet messaging and
Email applications. Also supra-relativistic elementary particle
hyperspace tunneling trans-galactic communication devices :-)

Additional information: none

Magic number(s): none

File extension(s): none

Macintosh File Type Code(s): none

Person & email address to contact for further information:
Authors of RFC .

Intended usage: COMMON

Author/Change controller:
Authors of RFC .
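
The defaulting rules of the structural parameters above, and the way
SLPPSize and SLPPSizeLength select between the Single-SL and
Multiple-SL modes, can be sketched as follows. This is an
illustration only: the function names (parse_parameters,
structural_config, transport_mode) are hypothetical, the input is
assumed to be a semicolon-separated list of parameter=value pairs,
and parameter names are matched case-sensitively, which is an
assumption rather than a rule stated in this document.

```python
from typing import Dict

# Structural parameters of section 5.1; each defaults to zero when absent.
STRUCTURAL_PARAMETERS = (
    "DTSDeltaLength", "CTSDeltaLength", "OCRDeltaLength",
    "SLPPSizeLength", "SLPPSize", "SLPSeqNumLength",
    "SLPSeqNumDeltaLength", "RSLHSectionSizeLength",
)


def parse_parameters(text: str) -> Dict[str, str]:
    """Split a semicolon-separated list of parameter=value pairs."""
    pairs = {}
    for item in text.split(";"):
        item = item.strip()
        if not item:
            continue
        name, _, value = item.partition("=")
        pairs[name.strip()] = value.strip()
    return pairs


def transport_mode(config: Dict[str, int]) -> str:
    """Multiple-SL mode is signaled by SLPPSize or SLPPSizeLength."""
    if config["SLPPSize"] or config["SLPPSizeLength"]:
        return "Multiple-SL"
    return "Single-SL"


def structural_config(params: Dict[str, str]) -> Dict[str, int]:
    """Apply the zero defaults and check the constraints stated in 5.1."""
    if "SLPPSize" in params and "SLPPSizeLength" in params:
        raise ValueError("SLPPSize and SLPPSizeLength must not both be present")
    config = {name: int(params.get(name, "0")) for name in STRUCTURAL_PARAMETERS}
    if transport_mode(config) == "Single-SL" and config["CTSDeltaLength"] != 0:
        raise ValueError("non-zero CTSDeltaLength is not allowed in Single-SL mode")
    return config
```

Under these assumptions, an empty parameter list resolves to the
default "basic" configuration, i.e. the Single-SL mode with all
structural parameters equal to zero.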
5.2 Concatenation of parameters

Multiple parameters SHOULD be expressed as a MIME media type string,
in the form of a semicolon-separated list of parameter=value pairs
(see examples in Appendix).

5.3 Usage of SDP

5.3.1 The a=fmtp keyword

It is assumed that one typical way to transport the above-described
parameters associated with this payload format is via a SDP message,
for example transported to the client in reply to a RTSP DESCRIBE or
via SAP. In that case the (a=fmtp) keyword MUST be used as described
in RFC 2327 [10, section 6]. The syntax being then:

a=fmtp: =

5.3.2 SDP example

The following is an example of SDP syntax for the description of a
session containing one MPEG-4 audio stream, one MPEG-4 video stream
and two MPEG-4 system streams, transported using this format and the
AVP profile [12]. Note that the DTSDelta fields of the video stream
are encoded on 4 bits in this example. See the Appendix for more
examples.

o= ....
I= ....
c=IN IP4 123.234.71.112
m=video 1034 RTP/AVP 97
a=fmtp:DTSDeltaLength=4
a=rtpmap:97 mpeg4-sl
m=audio 810 RTP/AVP 98
a=fmtp: profile-level-id=1; config=7866E7E6EF
a=rtpmap:98 mpeg4-sl
m=application 1234 RTP/AVP 99
a=rtpmap:99 mpeg4-sl
m=application 1234 RTP/AVP 99
a=rtpmap:99 mpeg4-sl

6. Security Considerations

RTP packets using the payload format defined in this specification
are subject to the security considerations discussed in the RTP
specification [5]. This implies that confidentiality of the media
streams is achieved by encryption. Because the data compression used
with this payload format is applied end-to-end, encryption may be
performed on the compressed data so there is no conflict between the
two operations. The packet processing complexity of this payload
type -i.e.
excluding media data processing, does not exhibit any significant
non-uniformity in the receiver side to cause a denial-of-service
threat.

However, it is possible to inject non-compliant MPEG streams (Audio,
Video, and Systems) to overload the receiver/decoder's buffers,
which might compromise the functionality of the receiver or even
crash it. This is especially true for end-to-end systems like MPEG
where the buffer models are precisely defined.

MPEG-4 Systems supports stream types including commands that are
executed on the terminal, like OD commands, BIFS commands, etc., and
programmatic content like MPEG-J (Java(TM) Byte Code) and
ECMAScript. It is possible to use one or more of the above in a
manner non-compliant to MPEG to crash the receiver or make it
temporarily unavailable.

Authentication mechanisms can be used to validate the sender and
the data to prevent security problems due to non-compliant malignant
MPEG-4 streams.

A security model is defined for MPEG-4 Systems streams carrying
MPEG-J access units, which comprise Java(TM) classes and objects.
MPEG-J defines a set of Java APIs and a secure execution model.
MPEG-J content can call this set of APIs and Java(TM) methods from a
set of Java packages supported in the receiver within the defined
security model. According to this security model, downloaded byte
code is forbidden to load libraries, define native methods, start
programs, read or write files, or read system properties.

Receivers can implement intelligent filters to validate the buffer
requirements or parametric (OD, BIFS, etc.) or programmatic (MPEG-J,
ECMAScript) commands in the streams. However, this can increase the
complexity significantly.

7. Acknowledgements

This document evolved across several years thanks to contributions
from a large number of people since it is based on work within the
IETF AVT working group and various ISO MPEG working groups,
especially the 4-on-IP ad-hoc group in the last stages. The authors
wish to thank Guido Fransceschini, Art Howarth, Dave Mackie, Dave
Singer, and Stephan Wenger for their valuable comments.

8. References

[1] ISO/IEC 14496-1:2000 MPEG-4 Systems, October 2000.

[2] ISO/IEC 14496-2:1999/Amd.1:2000(E) MPEG-4 Visual, January 2000.

[3] ISO/IEC 14496-3:1999/FDAM 1:2000 MPEG-4 Audio, January 2000.

[4] ISO/IEC 14496-6 FDIS Delivery Multimedia Integration Framework,
November 1998.

[5] Schulzrinne, Casner, Frederick, Jacobson, RTP: A Transport
Protocol for Real Time Applications, RFC 1889, Internet Engineering
Task Force, January 1996.

[6] S. Bradner, Key words for use in RFCs to Indicate Requirement
Levels, RFC 2119, Internet Engineering Task Force, March 1997.

[7] Y. Kikuchi, T. Nomura, S. Fukunaga, Y. Matsui, H. Kimata, RTP
payload format for MPEG-4 Audio/Visual streams, RFC 3016, Internet
Engineering Task Force.

[8] B. Thompson, T. Koren, D. Wing, Tunneling multiplexed Compressed
RTP ("TCRTP"), work in progress, draft-ietf-avt-tcrtp-02.txt,
November 2000.

[9] D. Singer, Y. Lim, A Framework for the delivery of MPEG-4 over
IP-based Protocols, work in progress, draft-singer-mpeg4-ip-01.txt,
October 2000.

[10] Handley, Jacobson, SDP: Session Description Protocol, RFC 2327,
Internet Engineering Task Force, April 1998.

[11] C. Roux & al, RTP Payload Format for MPEG-4 FlexMultiplexed
Streams, work in progress, draft-curet-avt-rtp-mpeg4-flexmux-00.txt,
February 2001.

[12] H. Schulzrinne, RTP Profile for Audio and Video Conferences
with Minimal Control, RFC 1890, Internet Engineering Task Force,
January 1996.

9. Authors' Addresses

Olivier Avaro
France Telecom
35 A Schutzenhuttenweg
60598 Frankfurt am Main
Deutschland
e-mail: olivier.avaro@francetelecom.fr

Andrea Basso
AT&T Labs Research
200 Laurel Avenue
Middletown, NJ 07748
USA
e-mail: basso@research.att.com

Stephen L. Casner
Packet Design, Inc.
66 Willow Place
Menlo Park, CA 94025
USA
e-mail: casner@acm.org

M. Reha Civanlar
AT&T Labs - Research
100 Schultz Drive
Red Bank, NJ 07701
USA
e-mail: civanlar@research.att.com

Philippe Gentric
Philips Digital Networks
22 Avenue Descartes
94453 Limeil-Brevannes CEDEX
France
e-mail: philippe.gentric@philips.com

Carsten Herpel
THOMSON multimedia
Karl-Wiechert-Allee 74
30625 Hannover
Germany
e-mail: herpelc@thmulti.com

Zvi Lifshitz
Optibase Ltd.
7 Shenkar St.
Herzliya 46120
Israel
e-mail: zvil@optibase.com

Young-kwon Lim
mp4cast (MPEG-4 Internet Broadcasting Solution Consortium)
1001-1 Daechi-Dong Gangnam-Gu
Seoul, 305-333,
Korea
e-mail: young@techway.co.kr

Colin Perkins
USC Information Sciences Institute
4350 N. Fairfax Drive #620
Arlington, VA 22203
USA
e-mail: csp@isi.edu

Jan van der Meer
Philips Digital Networks
Cederlaan 4
5600 JB Eindhoven
Netherlands
e-mail: jan.vandermeer@philips.com

APPENDIX: Examples of usage of this payload format

This payload format has been designed to transport with flexibility
a very versatile packetization scheme (the MPEG-4 Synchronization
Layer); its complexity is therefore larger than the average for RTP
payload formats. For this reason this section describes a number of
key examples of how this payload format can be used.

A C++-like syntax called SDL (Syntactic Description Language)
defined in [1, section 14] is used to economically describe MPEG-4
system data structures.

Appendix.1 MPEG-4 Video

Let us consider the case of a 30 frames per second MPEG-4 video
stream whose bit rate is high enough that Access Units have to be
split into several SL packets (typically above 300 kb/s).

Let us also assume that the video codec generates in that case Video
Packets suitable to fit in one SL packet, i.e. that the video codec
is MTU aware and the MTU is 1500 bytes. We assume furthermore that
this stream contains B frames and that decodingTimeStamps are
present.

SLConfigDescriptor

In this example the SLConfigDescriptor is:

   class SLConfigDescriptor extends BaseDescriptor : bit(8)
   tag=SLConfigDescrTag {
     bit(8) predefined;
     if (predefined==0) {
       bit(1) useAccessUnitStartFlag;        = 1
       bit(1) useAccessUnitEndFlag;          = 0
       bit(1) useRandomAccessPointFlag;      = 1
       bit(1) hasRandomAccessUnitsOnlyFlag;  = 0
       bit(1) usePaddingFlag;                = 0
       bit(1) useTimeStampsFlag;             = 1
       bit(1) useIdleFlag;                   = 0
       bit(1) durationFlag;                  = 0
       bit(32) timeStampResolution;          = 30
       bit(32) OCRResolution;                = 0
       bit(8) timeStampLength;               = 32
       bit(8) OCRLength;                     = 0
       bit(8) AU_Length;                     = 0
       bit(8) instantBitrateLength;          = 0
       bit(4) degradationPriorityLength;     = 0
       bit(5) AU_seqNumLength;               = 0
       bit(5) packetSeqNumLength;            = 0
       bit(2) reserved=0b11;
     }
     if (durationFlag) {
       bit(32) timeScale;                    // NOT USED
       bit(16) accessUnitDuration;           // NOT USED
       bit(16) compositionUnitDuration;      // NOT USED
     }
     if (!useTimeStampsFlag) {
       bit(timeStampLength) startDecodingTimeStamp;     = 0
       bit(timeStampLength) startCompositionTimeStamp;  = 0
     }
   }

The useRandomAccessPointFlag is set so that the
randomAccessPointFlag can indicate that the corresponding SL packet
contains a GOV and the first Video Packet of an Intra coded frame.

SL Packet Header structure

With this configuration we have the following SL packet header
structure:

   aligned(8) class SL_PacketHeader (SLConfigDescriptor SL) {
     if (SL.useAccessUnitStartFlag) {
       bit(1) accessUnitStartFlag;           // 1 bit
     }
     if (accessUnitStartFlag) {
       if (SL.useRandomAccessPointFlag) {
         bit(1) randomAccessPointFlag;       // 1 bit
       }
       if (SL.useTimeStampsFlag) {
         bit(1) decodingTimeStampFlag;       // 1 bit
         bit(1) compositionTimeStampFlag;    // 1 bit
       }
       if (decodingTimeStampFlag) {
         bit(SL.timeStampLength) decodingTimeStamp;
       }
       if (compositionTimeStampFlag) {
         bit(SL.timeStampLength) compositionTimeStamp;
       }
     }
   }

Parameters

decodingTimeStamps are encoded on 32 bits, which is much more than
needed for delta. Therefore the sender will use DTSDeltaLength to
signal that only 6 bits are used for the coding of relative DTS in
the RTP packet.

The RSLHSectionSize cannot exceed 2 bits, which is encoded on 2 bits
and signaled by RSLHSectionSizeLength.
The resulting concatenated fmtp line is:

   a=fmtp:<format> DTSDeltaLength=6;RSLHSectionSizeLength=2

RTP packet structure

Two cases can occur; for packets that transport first fragments of
Access Units we have:

+=========================================+=============+
| Field                                   | size        |
+=========================================+=============+
| RTP header                              | -           |
+-----------------------------------------+-------------+
| CTSFlag = 1                             | 1 bit       |
+-----------------------------------------+-------------+
| DTSFlag = 1                             | 1 bit       |
+-----------------------------------------+-------------+
| DTSDelta                                | 6 bits      |
+-----------------------------------------+-------------+
| bits to byte alignment                  | 0 bits      |
+-----------------------------------------+-------------+
| RSLHSectionSize = 2                     | 2 bits      |
+-----------------------------------------+-------------+
| accessUnitStartFlag = 1                 | 1 bit       |
+-----------------------------------------+-------------+
| randomAccessPointFlag                   | 1 bit       |
+-----------------------------------------+-------------+
| bits to byte alignment                  | 4 bits      |
+-----------------------------------------+-------------+
| SL packet payload                       | N bytes     |
+-----------------------------------------+-------------+

For packets that transport non-first fragments of Access Units we
have:

+=========================================+=============+
| Field                                   | size        |
+=========================================+=============+
| RTP header                              | -           |
+-----------------------------------------+-------------+
| CTSFlag = 0                             | 1 bit       |
+-----------------------------------------+-------------+
| DTSFlag = 0                             | 1 bit       |
+-----------------------------------------+-------------+
| bits to byte alignment                  | 6 bits      |
+-----------------------------------------+-------------+
| RSLHSectionSize = 2                     | 2 bits      |
+-----------------------------------------+-------------+
| accessUnitStartFlag = 0                 | 1 bit       |
+-----------------------------------------+-------------+
| randomAccessPointFlag                   | 1 bit       |
+-----------------------------------------+-------------+
| zero bits to byte alignment             | 4 bits      |
+-----------------------------------------+-------------+
| SL packet payload                       | N bytes     |
+-----------------------------------------+-------------+

Note that the compositionTimeStamp is never present since it would
be redundant with the RTP time stamp. However the value of CTSFlag
is 1 to indicate to the receiver the value of
compositionTimeStampFlag for the corresponding reconstructed SL
packet.

Overhead estimation

In this example we have an RTP overhead of 40 + 2 bytes for 1400
bytes of payload, i.e. 3 % overhead.

Appendix.2 RFC 3016 compatible MPEG-4 Video

This is an example of a video stream where the SL is configured to
produce RTP packets compatible with RFC 3016.

SLConfigDescriptor

In this example the SLConfigDescriptor is:

   class SLConfigDescriptor extends BaseDescriptor : bit(8)
   tag=SLConfigDescrTag {
     bit(8) predefined;
     if (predefined==0) {
       bit(1) useAccessUnitStartFlag;        = 0
       bit(1) useAccessUnitEndFlag;          = 1
       bit(1) useRandomAccessPointFlag;      = 0
       bit(1) hasRandomAccessUnitsOnlyFlag;  = 0
       bit(1) usePaddingFlag;                = 0
       bit(1) useTimeStampsFlag;             = 0
       bit(1) useIdleFlag;                   = 0
       bit(1) durationFlag;                  = 0
       bit(32) timeStampResolution;          = 0
       bit(32) OCRResolution;                = 0
       bit(8) timeStampLength;               = 0
       bit(8) OCRLength;                     = 0
       bit(8) AU_Length;                     = 0
       bit(8) instantBitrateLength;          = 0
       bit(4) degradationPriorityLength;     = 0
       bit(5) AU_seqNumLength;               = 0
       bit(5) packetSeqNumLength;            = 0
       bit(2) reserved=0b11;
     }
     if (durationFlag) {
       bit(32) timeScale;                    // NOT USED
       bit(16) accessUnitDuration;           // NOT USED
       bit(16) compositionUnitDuration;      // NOT USED
     }
     if (!useTimeStampsFlag) {
       bit(timeStampLength) startDecodingTimeStamp;     = 0
       bit(timeStampLength) startCompositionTimeStamp;  = 0
     }
   }

SL Packet Header structure

With this configuration we have the following SL packet header
structure:

   aligned(8) class SL_PacketHeader (SLConfigDescriptor SL) {
     if (SL.useAccessUnitEndFlag) {
       bit(1) accessUnitEndFlag;             // 1 bit
     }
   }

Parameters

This configuration is the default one; no parameters are required.

RTP packet structure

Note that accessUnitEndFlag is mapped to the RTP header M bit.

+=========================================+=============+
| Field                                   | size        |
+=========================================+=============+
| RTP header                              | -           |
+-----------------------------------------+-------------+
| SL packet payload                       | 1400 bytes  |
+-----------------------------------------+-------------+

Overhead

In this example we have an RTP overhead of 40 bytes for 1400 bytes
of payload, i.e. 3 % overhead.

Appendix.3 Low delay MPEG-4 Audio

This example is for a low delay audio service. For this reason a
single SL packet is transported in each RTP packet.

SLConfigDescriptor

Since CTS=DTS and the Access Unit duration is constant, signaling of
MPEG-4 time stamps is not needed (the durationFlag is set).

We also assume here an audio Object Type for which all Access Units
are Random Access Points, which is signaled using the
hasRandomAccessUnitsOnlyFlag in the SLConfigDescriptor.

We assume furthermore a mode where the Access Unit size is constant
and 5 bytes (which is signaled with AU_Length).

In this example the SLConfigDescriptor is:

   class SLConfigDescriptor extends BaseDescriptor : bit(8)
   tag=SLConfigDescrTag {
     bit(8) predefined;
     if (predefined==0) {
       bit(1) useAccessUnitStartFlag;        = 0
       bit(1) useAccessUnitEndFlag;          = 0
       bit(1) useRandomAccessPointFlag;      = 0
       bit(1) hasRandomAccessUnitsOnlyFlag;  = 1
       bit(1) usePaddingFlag;                = 0
       bit(1) useTimeStampsFlag;             = 0
       bit(1) useIdleFlag;                   = 0
       bit(1) durationFlag;                  = 1 // signals constant
                                                 // duration
       bit(32) timeStampResolution;          = 0
       bit(32) OCRResolution;                = 0
       bit(8) timeStampLength;               = 0
       bit(8) OCRLength;                     = 0
       bit(8) AU_Length;                     = 5
       bit(8) instantBitrateLength;          = 0
       bit(4) degradationPriorityLength;     = 0
       bit(5) AU_seqNumLength;               = 0
       bit(5) packetSeqNumLength;            = 0
       bit(2) reserved=0b11;
     }
     if (durationFlag) {
       bit(32) timeScale;                    = 1000 // for milliseconds
       bit(16) accessUnitDuration;           = 10 // ms
       bit(16) compositionUnitDuration;      = 10 // ms
     }
     if (!useTimeStampsFlag) {
       bit(timeStampLength) startDecodingTimeStamp;     = 0
       bit(timeStampLength) startCompositionTimeStamp;  = 0
     }
   }

SL packet header

With this configuration the SL packet header is empty.

Parameters

No parameters are required.

RTP packet structure

Note that the RTP header M bit should always be set to 1.

+=========================================+=============+
| Field                                   | size        |
+=========================================+=============+
| RTP header                              | -           |
+-----------------------------------------+-------------+
| SL packet payload                       | 5 bytes     |
+-----------------------------------------+-------------+

Overhead estimation

The overhead is extremely large, i.e. more than 800 %, since 40
bytes of headers are required to transport 5 bytes of data.
Note however that RTP header compression would work well since time
stamp increments are constant.

Appendix.4 Media delivery MPEG-4 Audio

This example is for a media delivery service where delay is not an
issue but efficiency is. In this case several SL Packets are
transported in each RTP packet.

SLConfigDescriptor

The SLConfigDescriptor is the same as in Appendix.3.

SL packet header

With this configuration the SL packet header is empty.

Parameters

The absence of RSLHSectionSizeLength indicates that the RSLHSection
is empty.

The size of SL Packets (which are all complete Access Units in this
case) is constant and is indicated with:

   a=fmtp:<format> SLPPSize=5

This also indicates to the receiver that the Multiple-SL mode will
be used, i.e. that a 2-byte field will give the size of the
MSLHSection. In this case however this field always contains zero
since the MSLHSection is empty.

RTP packet structure

Note that the RTP header M bit should always be set to 1.

+=========================================+=============+
| Field                                   | size        |
+=========================================+=============+
| RTP header                              | -           |
+-----------------------------------------+-------------+
| MSLHSection size in bits = 0            | 2 bytes     |
+-----------------------------------------+-------------+
| SL packet payload                       | 5 bytes     |
+-----------------------------------------+-------------+
| SL packet payload                       | 5 bytes     |
+-----------------------------------------+-------------+
| etc, until MTU is reached                             |
+-----------------------------------------+-------------+
| SL packet payload                       | 5 bytes     |
+-----------------------------------------+-------------+

Overhead estimation

The overhead is 3 %, i.e. minimal.
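
The overhead figures quoted in Appendix.3 and Appendix.4 can be
checked with a short calculation. The sketch below is illustrative
only: it takes the 40 bytes of packet headers used throughout these
examples at face value and assumes a 1500-byte MTU for Appendix.4
(the value given in Appendix.1; Appendix.4 itself does not state
one). The function names are hypothetical.

```python
TRANSPORT_HEADER_BYTES = 40  # header figure used in these examples


def overhead_percent(payload_bytes, payload_header_bytes=0,
                     transport_header_bytes=TRANSPORT_HEADER_BYTES):
    """Header bytes as a percentage of the media payload bytes."""
    headers = transport_header_bytes + payload_header_bytes
    return 100.0 * headers / payload_bytes


def packed_overhead(au_size, mtu=1500, mslh_size_field_bytes=2):
    """Pack as many constant-size AUs as fit under the (assumed) MTU."""
    room = mtu - TRANSPORT_HEADER_BYTES - mslh_size_field_bytes
    au_count = room // au_size
    payload = au_count * au_size
    return au_count, overhead_percent(payload, mslh_size_field_bytes)


single = overhead_percent(5)        # Appendix.3: one 5-byte AU per packet
count, packed = packed_overhead(5)  # Appendix.4: many AUs per packet
print(f"single: {single:.0f} %, packed: {count} AUs, {packed:.1f} %")
```

Under these assumptions the single-AU case gives 800 % and the
packed case a little under 3 %, in line with the estimates above.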

Appendix.5 A more complex case: AAC with interleaving

Let us consider AAC around 130 kb/s where each Access Unit is split
into 4 SL packets corresponding to Error Sensitivity Categories
(ESC) of maximum 90 bytes, for which interleaving is very useful in
terms of error resilience. We will therefore use an interleaving
scheme where 15 SL Packets from 15 consecutive Access Units will be
interleaved per RTP packet to match an MTU of 1500 bytes.

The interleaving sequence is 4 RTP packets and 350 ms long, which is
too long for conferencing but perfectly OK for Internet radio.

Since the sequence contains 60 SL packets, the sequence number can
be encoded on 6 bits. But 2 bits are actually enough if the sender
always resets the SL packet sequence number to zero at the start of
each sequence, since only the first MSLH in each of the 4 RTP
packets in the sequence carries an absolute sequence number value
(0,1,2,3).

2 bits are also enough for SLPSeqNumDelta, which is constant and
equal to 3 (since +1 is automatically added).

Note that the 4th RTP packet in each sequence has its M bit set to 1
since it contains 15 SL packets transporting the end of 15 different
Access Units.

With this scheme a sender (for example upon reception of RTCP
reports indicating high loss rates) can, for example, choose to
duplicate for each interleaving sequence the first RTP packet that
contains the most useful data in terms of ESC or apply other error
protection techniques, with due care to congestion issues.

In this example we will also show several other SL features (OCR, AU
boundary flags, as detailed below).

One feature demonstrated by this example is the degradation
priority. We assume degradation priority can take 4 different
values, one for each SL packet of an Access Unit, and is encoded on
2 bits.
This interleaving scheme makes sure that only SL packets of
identical degradation priorities are grouped in the same RTP packet
(3.6.3) and that only the first RSLH of each RTP packet transports
the degradation priority.

We also assume that for each last SL packet of each RTP packet the
server inserts an OCR.

SLConfigDescriptor

In this example the SLConfigDescriptor is:

   class SLConfigDescriptor extends BaseDescriptor : bit(8)
   tag=SLConfigDescrTag {
     bit(8) predefined;
     if (predefined==0) {
       bit(1) useAccessUnitStartFlag;        = 1
       bit(1) useAccessUnitEndFlag;          = 1
       bit(1) useRandomAccessPointFlag;      = 0
       bit(1) hasRandomAccessUnitsOnlyFlag;  = 1
       bit(1) usePaddingFlag;                = 0
       bit(1) useTimeStampsFlag;             = 0
       bit(1) useIdleFlag;                   = 0
       bit(1) durationFlag;                  = 1
       bit(32) timeStampResolution;          = 0
       bit(32) OCRResolution;                = 30
       bit(8) timeStampLength;               = 0
       bit(8) OCRLength;                     = 32
       bit(8) AU_Length;                     = 0
       bit(8) instantBitrateLength;          = 0
       bit(4) degradationPriorityLength;     = 2
       bit(5) AU_seqNumLength;               = 0
       bit(5) packetSeqNumLength;            = 6
       bit(2) reserved=0b11;
     }
     if (durationFlag) {
       bit(32) timeScale;                    = 1000 // milliseconds
       bit(16) accessUnitDuration;           = 23.22 // ms
       bit(16) compositionUnitDuration;      = 23.22 // ms
     }
     if (!useTimeStampsFlag) {
       bit(timeStampLength) startDecodingTimeStamp;     = 0
       bit(timeStampLength) startCompositionTimeStamp;  = 0
     }
   }

SL Packet Header structure

With this configuration we have the following SL packet header
structure:

   aligned(8) class SL_PacketHeader (SLConfigDescriptor SL) {
     bit(1) accessUnitStartFlag;
     bit(1) accessUnitEndFlag;
     bit(1) OCRflag;
     bit(SL.packetSeqNumLength) packetSequenceNumber;
     bit(1) DegPrioflag;
     if (DegPrioflag) {
       bit(SL.degradationPriorityLength) degradationPriority;
     }
     if (OCRflag) {
       bit(SL.OCRLength) objectClockReference;
     }
   }

Parameters

The RSLHSectionSize cannot exceed 2 bits, which is encoded on 2 bits
and signaled by RSLHSectionSizeLength.

The resulting concatenated fmtp line is (shown here on two lines for
readability):

   a=fmtp:<format> SLPPSizeLength=7;RSLHSectionSizeLength=2;
   SLPSeqNumLength=2;SLPSeqNumDeltaLength=2;OCRDeltaLength=16

RTP packet structure

+=========================================+=============+
| Field                                   | size        |
+=========================================+=============+
| RTP header                              | -           |
+-----------------------------------------+-------------+
MSLHSection
+=========================================+=============+
| MSLHSection size in bits = 135          | 2 bytes     |
+-----------------------------------------+-------------+
| SLPPayloadSize                          | 7 bits      |
+-----------------------------------------+-------------+
| SLPSeqNum = 0 or 1 or 2 or 3            | 2 bits      |
+-----------------------------------------+-------------+
| SLPPayloadSize                          | 7 bits      |
+-----------------------------------------+-------------+
| SLPSeqNumDelta = 3                      | 2 bits      |
+-----------------------------------------+-------------+
| etc + 12 times 9 bits                                 |
+-----------------------------------------+-------------+
| SLPPayloadSize                          | 7 bits      |
+-----------------------------------------+-------------+
| SLPSeqNumDelta = 3                      | 2 bits      |
+-----------------------------------------+-------------+
| bits to byte alignment                  | 1 bit       |
+-----------------------------------------+-------------+
RSLHSection
+=========================================+=============+
| RSLHSectionSize                         | 6 bits      |
+-----------------------------------------+-------------+
| accessUnitStartFlag                     | 1 bit       |
+-----------------------------------------+-------------+
| accessUnitEndFlag                       | 1 bit       |
+-----------------------------------------+-------------+
| OCRFlag = 0                             | 1 bit       |
+-----------------------------------------+-------------+
| DegPrioflag = 1                         | 1 bit       |
+-----------------------------------------+-------------+
| degradationPriority                     | 2 bits      |
+-----------------------------------------+-------------+
| accessUnitStartFlag                     | 1 bit       |
+-----------------------------------------+-------------+
| accessUnitEndFlag                       | 1 bit       |
+-----------------------------------------+-------------+
| OCRFlag = 0                             | 1 bit       |
+-----------------------------------------+-------------+
| DegPrioflag = 0                         | 1 bit       |
+-----------------------------------------+-------------+
| etc + 12 times 4 bits                                 |
+-----------------------------------------+-------------+
| accessUnitStartFlag                     | 1 bit       |
+-----------------------------------------+-------------+
| accessUnitEndFlag                       | 1 bit       |
+-----------------------------------------+-------------+
| OCRFlag = 1                             | 1 bit       |
+-----------------------------------------+-------------+
| OCRDelta                                | 16 bits     |
+-----------------------------------------+-------------+
| DegPrioflag = 0                         | 1 bit       |
+-----------------------------------------+-------------+
| bits to byte alignment                  | 4 bits      |
+-----------------------------------------+-------------+
SLPPSection
+=========================================+=============+
| SL packet payload                       |max 90 bytes |
+-----------------------------------------+-------------+
| etc + 13 SL packets                                   |
+-----------------------------------------+-------------+
| SL packet payload                       |max 90 bytes |
+-----------------------------------------+-------------+

Overhead estimation

The MSLHSection is 19 bytes, the RSLHSection is 10 bytes; in this
example we therefore have an RTP overhead of 40 + 29 bytes for 1350
bytes (max) of payload, i.e. around 5 % overhead.
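
The interleaving pattern of Appendix.5 can be sketched as follows.
This is an illustration under the assumptions stated in that
appendix (15 Access Units per sequence, 4 SL packets per Access
Unit, sequence numbers reset to zero at the start of each sequence,
SL packets numbered in Access Unit order); the function names are
hypothetical and not part of this payload format.

```python
AUS_PER_SEQUENCE = 15
SL_PACKETS_PER_AU = 4
AU_DURATION_MS = 23.22


def interleave(aus=AUS_PER_SEQUENCE, per_au=SL_PACKETS_PER_AU):
    """RTP packet k carries SL packet k of every AU in the sequence."""
    return [[au * per_au + k for au in range(aus)]
            for k in range(per_au)]


def encode_seq_nums(seq_nums):
    """First number is sent as-is; later ones as delta minus one,
    because the receiver automatically adds +1."""
    first = seq_nums[0]
    deltas = [b - a - 1 for a, b in zip(seq_nums, seq_nums[1:])]
    return first, deltas


rtp_packets = interleave()
for packet in rtp_packets:
    first, deltas = encode_seq_nums(packet)
    print(first, sorted(set(deltas)))

sequence_ms = AUS_PER_SEQUENCE * AU_DURATION_MS
print(f"sequence duration: {sequence_ms:.1f} ms")
```

Each of the 4 RTP packets starts with an absolute sequence number
between 0 and 3 and uses a constant coded delta of 3, so 2 bits
suffice for both fields, and 15 Access Units of 23.22 ms give the
sequence length of roughly 350 ms quoted in the appendix.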