idnits 2.17.1

draft-ietf-avt-mpeg4-multisl-00.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

  ** Looks like you're using RFC 2026 boilerplate. This must be updated to
     follow RFC 3978/3979, as updated by RFC 4748.

  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

  ** The document seems to lack a 1id_guidelines paragraph about 6 months
     document validity -- however, there's a paragraph with a matching
     beginning. Boilerplate error?

  ** The document is more than 15 pages and seems to lack a Table of Contents.

  == There are 2 instances of lines with non-ascii characters in the document.

  == No 'Intended status' indicated for this document; assuming Proposed
     Standard

  == The page length should not exceed 58 lines per page, but there was 42
     longer pages, the longest (page 2) being 59 lines

  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

  ** The document seems to lack an IANA Considerations section. (See Section
     2.2 of https://www.ietf.org/id-info/checklist for how to handle the case
     when there are no actions for IANA.)

  ** The document seems to lack separate sections for Informative/Normative
     References. All references will be assumed normative when checking for
     downward references.

  == There are 1 instance of lines with non-RFC6890-compliant IPv4 addresses
     in the document. If these are example addresses, they should be changed.

  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == Line 202 has weird spacing: '... media unawa...'

  == Line 647 has weird spacing: '...aLength bits)...'

  == Line 2060 has weird spacing: '...dicated with:...'

  -- The document seems to lack a disclaimer for pre-RFC5378 work, but may
     have content which was first submitted before 10 November 2008. If you
     have contacted all the original authors and they are all willing to grant
     the BCP78 rights to the IETF Trust, then this is fine, and you can ignore
     this comment. If not, you may need to add the pre-RFC5378 disclaimer.
     (See the Legal Provisions document at
     https://trustee.ietf.org/license-info for more information.)

  -- The document date (June 2001) is 8344 days in the past. Is this
     intentional?

  Checking references for intended status: Proposed Standard
  ----------------------------------------------------------------------------

     (See RFCs 3967 and 4897 for information about using normative references
     to lower-maturity documents in RFCs)

  == Missing Reference: '0' is mentioned on line 357, but not defined

  == Unused Reference: '10' is defined on line 1627, but no explicit
     reference was found in the text

  -- Possible downref: Non-RFC (?) normative reference: ref. '1'

  -- Possible downref: Non-RFC (?) normative reference: ref. '2'

  -- Possible downref: Non-RFC (?) normative reference: ref. '3'

  -- Possible downref: Non-RFC (?) normative reference: ref. '4'

  ** Obsolete normative reference: RFC 1889 (ref. '5') (Obsoleted by RFC 3550)

  ** Obsolete normative reference: RFC 3016 (ref. '7') (Obsoleted by RFC 6416)

  == Outdated reference: A later version (-08) exists of
     draft-ietf-avt-tcrtp-02

  == Outdated reference: A later version (-04) exists of
     draft-singer-mpeg4-ip-02

  -- Possible downref: Normative reference to a draft: ref. '9'

  ** Obsolete normative reference: RFC 2327 (ref. '10') (Obsoleted by RFC 4566)

  == Outdated reference: A later version (-03) exists of
     draft-curet-avt-rtp-mpeg4-flexmux-00

  -- Possible downref: Normative reference to a draft: ref. '11'

  ** Obsolete normative reference: RFC 1890 (ref. '12') (Obsoleted by RFC 3551)

  Summary: 9 errors (**), 0 flaws (~~), 12 warnings (==), 8 comments (--).
Run idnits with the --verbose option for more detailed information about the items above. -------------------------------------------------------------------------------- 2 Internet Engineering Task Force Avaro-France Telecom 3 Internet Draft Basso-AT&T 4 Casner-Packet Design 5 Civanlar-AT&T 6 Gentric-Philips 7 Herpel-Thomson 8 Lifshitz-Optibase 9 Lim-mp4cast 10 Perkins-ISI 11 van der Meer-Philips 12 June 2001 13 Expires Dec. 2001 14 Document: draft-ietf-avt-mpeg4-multisl-00.txt 16 RTP Payload Format for MPEG-4 Streams 18 Status of this Memo 20 This document is an Internet-Draft and is in full conformance with 21 all provisions of Section 10 of RFC2026. 23 Internet-Drafts are working documents of the Internet Engineering 24 Task Force (IETF), its areas, and its working groups. Note that 25 other groups may also distribute working documents as Internet- 26 Drafts. Internet-Drafts are draft documents valid for a maximum of 27 six months and may be updated, replaced, or obsoleted by other 28 documents at any time. It is inappropriate to use Internet- Drafts 29 as reference material or to cite them other than as "work in 30 progress." 32 This specification is a product of the Audio/Video Transport working 33 group within the Internet Engineering Task Force and ISO/IEC MPEG-4 34 ad hoc group on MPEG-4 over Internet. Comments are solicited and 35 should be addressed to the working group's mailing list at rem- 36 conf@es.net and/or the authors. 38 The list of current Internet-Drafts can be accessed at 39 http://www.ietf.org/ietf/1id-abstracts.txt 40 The list of Internet-Draft Shadow Directories can be accessed at 41 http://www.ietf.org/shadow.html. 43 This document contains a MIME type registration form that is 44 intended to be taken as-is and therefore makes reference to this 45 document, using the temporary placeholder: . 47 Abstract 49 This document describes a payload format for transporting MPEG-4 50 encoded data using RTP. 
MPEG-4 is a recent standard from ISO/IEC for 51 the coding of natural and synthetic audio-visual data. Several 52 services provided by RTP are beneficial for MPEG-4 encoded data 54 Gentric et al. Expires December 2001 1 55 transport over the Internet. Additionally, the use of RTP makes it 56 possible to synchronize MPEG-4 data with other real-time data types. 58 1. Introduction 60 MPEG-4 is a recent standard from ISO/IEC for the coding of natural 61 and synthetic audio-visual data in the form of audiovisual objects 62 that are arranged into an audiovisual scene by means of a scene 63 description [1][2][3][4]. This draft specifies an RTP [5] payload 64 format for transporting MPEG-4 encoded data streams. 66 The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", 67 "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in 68 this document are to be interpreted as described in RFC 2119 [6]. 70 The benefits of using RTP for MPEG-4 data stream transport include: 72 i. Ability to synchronize MPEG-4 streams with other RTP payloads 74 ii. Monitoring MPEG-4 delivery performance through RTCP 76 iii. Combining MPEG-4 and other real-time data streams received from 77 multiple end-systems into a set of consolidated streams through RTP 78 mixers 80 iv. Converting data types, etc. through the use of RTP translators. 82 1.1 Overview of MPEG-4 End-System Architecture 84 Fig. 1 below shows the layered architecture of a terminal which 85 implements the complete MPEG-4 systems model. The Compression Layer 86 processes individual audio-visual media streams. The MPEG-4 87 compression schemes are defined in the ISO/IEC specifications 14496- 88 2 [2] and 14496-3 [3]. The compression schemes in MPEG-4 achieve 89 efficient encoding over a bandwidth ranging from several kbps to 90 many Mbps. The audio-visual content compressed by this layer is 91 organized into Elementary Streams (ESs). 92 The MPEG-4 standard specifies MPEG-4 compliant streams. 
Within the 93 constraint of this compliance the compression layer is unaware of a 94 specific delivery technology, but it can be made to react to the 95 characteristics of a particular delivery layer such as the path-MTU 96 or loss characteristics. Also, some compressors can be designed to 97 be delivery specific for implementation efficiency. In such cases 98 the compressor may work in a non-optimal fashion with delivery 99 technologies that are different than the one it is specifically 100 designed to operate with. 102 The hierarchical relations, location and properties of ESs in a 103 presentation are described by a dynamic set of Object Descriptors 104 (ODs). Each OD groups one or more ES Descriptors referring to a 105 single content item (audio-visual object). Hence, multiple 106 alternative or hierarchical representations of each content item are 107 possible. 109 Gentric et al. Expires December 2001 2 110 ODs are themselves conveyed through one or more ESs. A complete set 111 of ODs can be seen as an MPEG-4 resource or session description at a 112 stream level. The resource description may itself be hierarchical, 113 i.e. an ES conveying an OD may describe other ESs conveying other 114 ODs. 116 The session description is accompanied by a dynamic scene 117 description, Binary Format for Scene (BIFS), again conveyed through 118 one or more ESs. At this level, content is identified in terms of 119 audio-visual objects. The spatio-temporal location of each object is 120 defined by BIFS. The audio-visual content of those objects that are 121 synthetic and static is also described by BIFS. Natural and 122 animated synthetic objects may refer to an OD that points to one or 123 more ESs that carry the coded representation of the object or its 124 animation data. 
126 By conveying the session (or resource) description as well as the 127 scene (or content composition) description through their own ESs, it 128 is made possible to change portions of the content composition and 129 the number and properties of media streams that carry the audio- 130 visual content separately and dynamically at well known instants in 131 time. 133 One or more initial Scene Description streams and the corresponding 134 OD stream are pointed to by an initial object descriptor (IOD). In 135 this context the IOD needs to be made available to the receivers 136 through some out-of-band means that are out of scope of this payload 137 specification. However in the context of transport on IP networks it 138 is defined in a separate document [9]. Note that for applications 139 that only use audio and/or video this payload format can also be 140 used without IOD and OD streams (decoder configuration is then 141 transported as MIME parameters, see section 4.1). 143 The Compression Layer organizes the ESs in Access Units (AU), the 144 smallest elements that can be attributed individual timestamps. The 145 Access Units concept defines the boundary between media specific 146 processing and delivery specific processing. That is to say 147 transport should not depend on the nature of the media data but only 148 on AU properties. 150 The Sync Layer (SL) that primarily provides the synchronization 151 between streams defines a homogeneous encapsulation of ESs carrying 152 media or control data (ODs, BIFS). Integer or fractional AUs are 153 then encapsulated in SL packets and in the following we will 154 describe this payload format as transporting SL packets, although in 155 many cases SL packet payloads are actually (entire) Access Units 156 payloads i.e. encoded media frames. All consecutive data from one 157 stream is called an SL-packetized stream at this layer. 
The 158 interface between the compression layer and the SL is called the 159 Elementary Stream Interface (ESI). The ESI is informative i.e. it is 160 extremely useful in order to define concepts and mechanisms but does 161 not have to be implemented. For the same reason this draft describes 163 Gentric et al. Expires December 2001 3 164 the transport of SL packets i.e. Access Units or fragments thereof. 165 It is important to note however that a SL stream can be configured 166 so that SL packets are reduced to the media (compressed) data and in 167 that case implementations do not need to be aware of the SL at all. 169 The Delivery Layer in MPEG-4 consists of the Delivery Multimedia 170 Integration Framework defined in ISO/IEC 14496-6 [4]. This layer is 171 media unaware but delivery technology aware. It provides transparent 172 access to and delivery of content irrespective of the technologies 173 used. The interface between the SL and DMIF is called the DMIF 174 Application Interface (DAI). It offers content location independent 175 procedures for establishing MPEG-4 sessions and access to transport 176 channels. The specification of this payload format is considered as 177 a part of the MPEG-4 Delivery Layer. 
179 media aware +-----------------------------------------+ 180 delivery unaware | COMPRESSION LAYER | 181 14496-2 Visual |streams from as low as Kbps to multi-Mbps| 182 14496-3 Audio +-----------------------------------------+ 184 Elementary 185 Stream 186 ===================================================Interface 188 (ESI) 189 +-------------------------------------------+ 190 media and | SYNC LAYER | 191 delivery unaware | manages elementary streams, their synch- | 192 14496-1 Systems | ronization and hierarchical relations | 193 +-------------------------------------------+ 195 DMIF 196 Application 197 ====================================================Interface 199 (DAI) 200 +-------------------------------------------+ 201 delivery aware | DELIVERY LAYER | 202 media unaware |provides transparent access to and delivery| 203 14496-6 DMIF | of content irrespective of delivery | 204 | technology | 205 +-------------------------------------------+ 207 Figure 1: Conceptual MPEG-4 terminal architecture 209 1.2 MPEG-4 Elementary Stream Data Packetization 211 The ESs from the encoders are fed into the SL with indications of AU 212 boundaries, random access points, desired composition time and the 213 current time. 215 Gentric et al. Expires December 2001 4 216 The Sync Layer fragments the ESs into SL packets, each containing a 217 header that encodes information conveyed through the ESI. If the AU 218 is larger than a SL packet, subsequent packets containing remaining 219 parts of the AU are generated with subset headers until the complete 220 AU is packetized. 222 The syntax of the Sync Layer is configurable and can be adapted to 223 the needs of the stream to be transported. This includes the 224 possibility to select the presence or absence of individual syntax 225 elements as well as configuration of their length in bits. 
The 226 configuration for each individual stream is conveyed in a 227 SLConfigDescriptor, which is an integral part of the ES Descriptor 228 for this stream. The MPEG-4 SLConfigDescriptor, being configuration 229 information, is not carried by the media stream itself but is rather 230 transported via an ObjectDescriptor Stream encoded using the MPEG-4 231 Object Description framework. This can be done in a separate stream 232 using this payload format (see section 5.2 for details). The 233 SLConfigDescriptor MAY also be transported by other means (for 234 example as a parameter, see section 4.1). Finally streams for which 235 the SL packet headers are completely empty (or fully map into the 236 RTP headers) can also be transported using this payload format; in 237 these cases the Synch Layer can be seen as a purely conceptual 238 construction that does not have to be implemented at all. Since only 239 the knowledge of the decoder configuration is then needed it MAY 240 also be transported as a parameter, as described in section 4.1. 242 2. Analysis of the carriage of MPEG-4 over IP 244 When transporting MPEG-4 audio and video, applications may or may 245 not require the use of MPEG-4 systems. To achieve the highest level 246 of interoperability between all MPEG-4 applications, it is desirable 247 that (a) in both cases the same MPEG-4 transport format can be used 248 and that (b) receivers that have no MPEG-4 system knowledge can 249 easily skip the MPEG-4 system specific information, if any. 251 RTP is perfectly suitable to transport MPEG-4 audio and MPEG-4 252 video, but when using MPEG-4 systems a problem arises from the fact 253 that both RTP and MPEG-4 systems contain a synchronization layer. 254 In particular, the RTP header duplicates some of the information 255 provided in SL packet headers such as the composition timestamps 256 (CTSs) and the marker bit that signals the end of access units. 
258 To avoid unnecessary overhead and potential interoperability risks 259 when transporting MPEG-4 systems, it is desirable to remove the 260 redundancy between the SL packet header and the RTP packet header. 261 To be independent of the use of MPEG-4 systems, synchronization can 262 rely on the parameters provided in the RTP header. 264 In case SL headers are used, the redundant fields are removed from 265 the SL header, producing "reduced SL headers". 266 The remaining information from the SL header, if any, is contained 267 inside the RTP packet payload, together with the SL packet payload. 269 Gentric et al. Expires December 2001 5 270 The combination of RTP packet headers and reduced SL packet headers 271 can be used to logically map the RTP packets to complete SL packets. 273 Some of the information contained in the reduced SL headers is also 274 useful for transport over RTP when MPEG-4 systems is not used. 276 For that reason the information in the "reduced" SL headers is split 277 into "general useful information" and "MPEG-4 systems only 278 information". 280 The "general useful information" hereinafter called Mapped SL Packet 281 Header (MSLH) is carried by a number of fields configurable using 282 parameters defined in section 4.1; all receivers MUST parse these 283 fields. 285 The "MPEG-4 systems only information", if any, is contained in a 286 reduced SL header, hereinafter called Remaining SL Packet Header 287 (RSLH), also configured using parameters (see section 4.1) and 288 preceded by a length field, so that non-MPEG-4-system devices MAY 289 skip this information. 291 This is depicted in figure 2. 
293 <----------SL Packet--------> 295 +---------------------------+ 296 | SL Packet | SL Packet | 297 | Header | Payload | 298 +---------------------------+ 299 | | 300 | | 301 +-------------+----------+---+ | 302 | | | | 303 V V V V 304 +-----------+ +-----------+ +-------------+ +-----------+ 305 |RTP Packet | | Mapped SL | | Remaining SL| | SL Packet | 306 | Header | | Header | | Header | | Payload | 307 +-----------+ +-----------+ +-------------+ +-----------+ 309 <----RTP Packet Payload-------------------> 311 Figure 2: Mapping of SL Packet into RTP packet 313 When the configuration is such that SL packet headers map directly 314 to RTP headers this process of mapping SL packet headers is purely 315 conceptual. For example this RTP payload format has been designed so 316 that it is by default configured to be identical to RFC 3016 for the 317 recommended MPEG-4 video configurations (see section 5.5). Hence 318 receivers that comply with this payload specification can decode 319 such RTP payload without knowledge about the Synch Layer (see also 320 the example in Appendix.2). In a similar fashion MPEG-4 audio (see 322 Gentric et al. Expires December 2001 6 323 Appendix.3 and Appendix.4) can be transported without explicit use 324 of the Synch Layer. 326 3. Payload Format 328 The RTP Payload corresponds to an integer number of SL packets. 330 If multiple SL packets are transported in each RTP packet, they MUST 331 be in decoding order, i.e: 332 i) decodingTimeStamp order, if present 333 ii) packetSequenceNumber order, if present 334 iii) Implicit decoding order in all other cases. 336 The SL Packet Headers are transformed into RSLH with some fields 337 extracted to be mapped in the RTP header and others extracted to be 338 mapped in the corresponding MSLH. The SL Packet Payload is 339 unchanged. 341 This payload format has two modes. The "SingleSL" mode is a mode 342 where a single SL packet is transported per RTP packet. 
The 343 "MultipleSL" mode is a mode where more than one SL packet 344 may be transported per RTP packet. The default mode is the Single-SL 345 mode. The mode can be set to Multiple-SL by adding a non-zero 346 ConstantSize or SizeLength parameter (see section 4.1). 348 RTP Packets SHOULD be sent in the SL stream order (as defined 349 above). In case of interleaving the first SL packet of each RTP 350 packet is used as reference as in the following examples of RTP 351 packets containing interleaved SL packets. 352 This sequence is correct: [0,2,4][1,3,5] 353 This sequence is correct: [0,3,6][1,2][4,5] 354 This sequence is correct: [0,3,6][1,4][2,5] 355 This sequence is prohibited: [0,4,2][1,5,3] 356 This sequence is prohibited: [1,3,5][0,2,4] 357 This sequence is prohibited: [0,3,6][2,5][1,4] 359 The size (or number) of the SL packet(s) SHOULD be adjusted such 360 that the resulting RTP packet is not larger than the path-MTU. To 361 handle larger packets, this payload format relies on lower layers 362 for fragmentation, which may not be desirable. 364 3.1 RTP Header Fields Usage 366 Payload Type (PT): The assignment of an RTP payload type for this 367 new packet format is outside the scope of this document, and will 368 not be specified here. It is expected that the RTP profile for a 369 particular class of applications will assign a payload type for this 370 encoding, or if that is not done then a payload type in the dynamic 371 range shall be chosen. 373 Marker (M) bit: The M bit is set to 1 when all SL packets in the RTP 374 packet are Access Unit ends, i.e. the M bit maps to the Synch Layer 375 accessUnitEndFlag. 377 Gentric et al. Expires December 2001 7 378 Specifically the M bit is set to 0 when the RTP packet contains one 379 or more Access Unit fragments that are not Access Unit ends, and the 380 M bit is set to 1 for RTP packets that contain either: 381 . A single complete Access Unit 382 . The last fragment of an Access Unit 383 . 
Several complete Access Units 384 . Several last fragments of Access Units 385 . A mix of complete Access Units and last fragments of Access Units 387 Therefore for streams where all SL packets are complete Access Units 388 the M bit is 1 for all RTP packets. 390 Extension (X) bit: Defined by the RTP profile used. 392 Sequence Number: The RTP sequence number should be generated by the 393 sender with a constant random offset and does not have to be 394 correlated to any (optional) MPEG-4 SL sequence numbers. 396 Timestamp: Set to the value in the compositionTimeStamp field of the 397 first SL packet in the RTP packet, if present. If 398 compositionTimeStamp has less than 32 bits length, the MSBs of 399 timestamp MUST be set to zero. 401 Although it is available from the SL configuration data, the 402 resolution of the timestamp may need to be conveyed explicitly 403 through some out-of-band means to be used by network elements that 404 are not MPEG-4 aware. 406 If compositionTimeStamp has more than 32 bits length, this payload 407 format cannot be used. 409 In all cases, the sender SHALL always make sure that RTP time stamps 410 are identical only for RTP packets transporting fragments of the 411 same Access Unit. 413 In case compositionTimeStamp is not present in the current SL 414 packet, but has been present in a previous SL packet the reason is 415 that this is the same Access Unit that has been fragmented, 416 therefore the same timestamp value MUST be taken as RTP timestamp. 418 If compositionTimeStamp is never present in SL packets for this 419 stream, the RTP packetizer SHOULD convey a reading of a local clock 420 at the time the RTP packet is created. 422 According to RFC1889 [5, Section 5.1] timestamps are recommended to 423 start at a random value for security reasons. 
However then, a 424 receiver is not in the general case able to reconstruct the original 425 MPEG-4 Time Stamps (CTS, DTS, OCR) which can be of use for 426 applications where streams from multiple sources are to be 427 synchronized. Therefore the usage of such a random offset SHOULD be 428 avoided. 430 Gentric et al. Expires December 2001 8 431 Note that since RTP devices may re-stamp the stream, all time stamps 432 inside of the RTP payload (CTS and DTS in MSLH, OCR in RSLH) MUST be 433 expressed as difference to the RTP time stamp. Since this 434 subtraction may lead to negative values, the offset MUST be encoded 435 as a two's complement signed integer in network byte order. Note 436 these offsets (delta) typically require much fewer bits to be 437 encoded than the original length, which is another justification. 439 When startCompositionTimeStamp is signaled in the SLConfigDescriptor 440 the RTP time stamps MUST start with this value. 442 SSRC, CC and CSRC fields are used as described in RFC 1889 [5]. 444 RTCP SHOULD be used as defined in RFC 1889 [5]. 446 RTP timestamps in RTCP SR packets: according to the RTP timing 447 model, the RTP timestamp that is carried into an RTCP SR packet is 448 the same as the compositionTimeStamp that would be applied to an RTP 449 packet for data that was sampled at the instant the SR packet is 450 being generated and sent. The RTP timestamp value is calculated from 451 the NTP timestamp for the current time, which also goes in the RTCP 452 SR packet. To perform that calculation, an implementation needs to 453 periodically establish a correspondence between the CTS value of a 454 data packet and the NTP time at which that data was sampled. 456 3.2 RTP payload structure 458 The packet payload structure consists of 3 byte-aligned sections. 460 The first section is the MSLHSection and contains Mapped SL Packet 461 Headers (MSLH). The MSLH structure is described in 3.3. In the 462 Single-SL mode this section is empty by default. 
464 The second section is the RSLHSection and contains Remaining SL 465 Headers (RSLH). The RSLH structure is described in 3.5. By default 466 this section is empty. 468 The last section (SLPPSection) contains the SL packet payloads. This 469 section is never empty. 471 The Nth MSLH in the MSLHSection, the Nth RSLH in the RSLHSection and 472 the Nth SL packet payload in the SLPPSection correspond to the Nth 473 SL packet transported by the RTP packet. 475 0 1 2 3 476 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 477 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 478 |V=2|P|X| CC |M| PT | sequence number | 479 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 480 | timestamp | 481 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 482 | synchronization source (SSRC) identifier | 484 Gentric et al. Expires December 2001 9 485 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 486 : contributing source (CSRC) identifiers : 487 +=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+ 488 | | 489 | MSLHSection (byte aligned) | 490 | | 491 | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 492 | | | 493 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | 494 | | 495 | RSLHSection (byte aligned) | 496 | | 497 | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 498 | | | 499 +-+-+-+-+-+-+-+-+ | 500 | | 501 | SLPPSection (byte aligned) | 502 | | 503 | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 504 | :...OPTIONAL RTP padding | 505 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 507 Figure 3: An RTP packet for MPEG-4 509 3.3 MSLHSection structure 511 If the MSLHSection consumes a non-integer number of bytes, up to 7 512 zero-valued padding bits MUST be inserted at the end in order to 513 achieve byte-alignment. 515 In the Single-SL mode the MSLHSection consists of a single MSLH. 
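The byte-alignment rule of section 3.3 can be sketched as follows. This is a minimal illustration, not part of the specification: the helper name pack_bits is hypothetical, and writing each field most significant bit first is an assumption (the draft states network byte order explicitly only for the section size field).

```python
def pack_bits(fields):
    """Concatenate (value, width) pairs bit-wise and pad to a byte boundary.

    Assumption: fields are written most significant bit first.
    Returns the padded bytes and the number of padding bits added (0..7).
    """
    acc = 0      # accumulated bit string, held as an integer
    nbits = 0    # number of meaningful bits accumulated so far
    for value, width in fields:
        if value < 0 or value >> width:
            raise ValueError("value does not fit in the given width")
        acc = (acc << width) | value
        nbits += width
    pad = (-nbits) % 8       # up to 7 zero-valued padding bits
    acc <<= pad              # padding goes at the end of the section
    return acc.to_bytes((nbits + pad) // 8, "big"), pad


# A 13-bit header needs 3 padding bits to reach 2 whole bytes.
section, pad = pack_bits([(5, 3), (1, 1), (0x155, 9)])
print(section.hex(), pad)
```

An empty field list yields an empty section, which matches the Single-SL default where the MSLH carries no fields.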
517 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 518 | MSLH (x bits ) : padding bits| 519 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 521 Figure 4: MSLHSection structure in Single-SL mode 523 In the Multiple-SL mode this section consists of a 2-byte field 524 giving the size in bits (in network byte order) of the following 525 block of bit-wise concatenated MSLHs. 527 This size field is absent in the Single-SL mode not because it is 528 not needed (which would be a minor gain) but for compatibility with 529 RFC 3016. 531 This size field is also absent when the value would always be zero 532 because the MSLH is always empty, which may happen when a constant 533 size is signaled using ConstantSize. 535 0 1 2 3 537 Gentric et al. Expires December 2001 10 538 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 539 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 540 | MSLH section size in bits | MSLH | etc | 541 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | 542 | as many bit-wise concatenated MSLHs | 543 | as SL packets in this RTP packet | 544 | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 545 | : padding bits| 546 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 548 Figure 5: MSLHSection structure in Multiple-SL mode 550 3.4 MSLH structure 552 The Mapped SL Packet Header content depends on parameters (as 553 described in section 4.1); by default it is empty for the Single-SL 554 mode and, except when ConstantSize is signaled, contains at least 555 the PayloadSize field in the Multiple-SL mode. 557 When all options are used the MSLH structure is given in figure 6. 
559 +============================+ 560 |PayloadSize | 561 +----------------------------+ 562 |Index or IndexDelta | 563 +----------------------------+ 564 |CTSFlag | 565 +----------------------------+ 566 |CTSDelta | 567 +----------------------------+ 568 |DTSFlag | 569 +----------------------------+ 570 |DTSDelta | 571 +============================+ 573 Figure 6: Mapped SL Packet Header (MSLH) structure 575 In the general case a receiver can only discover the size of a MSLH 576 by parsing it since for example the presence of CTSDelta is signaled 577 by the value of CTSFlag. 579 3.4.1 Fields of MSLH 581 PayloadSize: Indicates the size in bytes of the associated SL Packet 582 Payload, which can be found in the SLPPSection of the RTP packet. 583 The length in bits of this field is signaled by the SizeLength 584 parameter (see section 4.1). 586 IndexDelta: Encodes the packetSequenceNumber (serial number) of the 587 SL Packet. When making streams specifically for transport with this 588 payload format this is useful for interleaving. Since a mapping to 589 RTP sequence number is not possible in the Multiple-SL mode there is 590 no requirement for a correspondence. 592 Gentric et al. Expires December 2001 11 593 Index is found only for the first SL packet of a RTP packet. 594 IndexDelta is optional and -if present- appears for subsequent (non- 595 first) SL packets in a RTP packet. 597 The length in bits of the Index field is defined by the IndexLength 598 parameter (see section 4.1). 600 The length in bits of the IndexDelta field is defined by the 601 IndexDeltaLength parameter (see section 4.1). 603 If the parameter IndexDeltaLength is defined, non-first SL packets 604 inside a RTP packet have their packetSequenceNumber encoded as a 605 difference named IndexDelta. 
This difference is relative to the 606 previous SL packet in the RTP packet according to (with i>=0): 607 packetSequenceNumber(0) = Index(0) 608 packetSequenceNumber(i+1) = packetSequenceNumber(i) + 609 IndexDelta(i+1) + 1 611 If the parameter IndexDeltaLength is not defined the default value 612 is zero and then the IndexDelta field is not present for non-first 613 SL packets. Nevertheless receivers SHALL then apply the above 614 formula with IndexDelta equal to zero. In other words by default 615 packetSequenceNumber is incremented by 1 for each SL packet in one 616 RTP packet. 618 CTSFlag (1 bit): Indicates whether the CTSDelta field is present. A 619 value of 1 indicates that the CTSDelta field is present, a value of 620 0 that it is not present. 622 If CTSDeltaLength is not zero, CTSFlag is present in all MSLH 623 regardless of whether the SL packet is an Access Unit start or not. 625 CTSDelta (CTSDeltaLength bits): Specifies the value of the CTS as a 626 two's complement offset (delta) from the timestamp in the RTP header of 627 the RTP packet. The length in bits of each CTSDelta field is 628 specified by the CTSDeltaLength parameter (see section 4.1). 630 The CTSDelta field is present if CTSFlag is 1. 632 For the first MSLH of each RTP packet CTSFlag is always 0, since the 633 composition time stamp of the first SL packet in the RTP packet is 634 mapped to the RTP time stamp. In all cases the sender MUST remove 635 the compositionTimeStamp from the RSLH. 637 DTSFlag (1 bit): Indicates whether the DTSDelta field is present. A 638 value of 1 indicates that DTSDelta is present, a value of 0 that it 639 is not present. 641 If DTSDeltaLength is not zero, DTSFlag is present in all MSLH 642 regardless of whether the SL packet is an Access Unit start or not; 643 the receiver needs this flag in order to reconstruct the 644 decodingTimeStampFlag of SL Headers. 646 Gentric et al. 
   DTSDelta (DTSDeltaLength bits): Encodes (compositionTimeStamp -
   decodingTimeStamp) for the same SL packet (always positive). The
   length in bits of each DTSDelta field is specified by the
   DTSDeltaLength parameter (see section 4.1).

   The DTSDelta field appears when DTSFlag is 1. The sender MUST always
   remove the decodingTimeStamp from the RSLH.

   If DTSDelta is zero, i.e. if decodingTimeStamp equals
   compositionTimeStamp, then DTSFlag MUST be set to 0 and no DTSDelta
   field SHALL be present.

3.4.2 Relationship between sizes of MSLH fields and parameters

   The relationship between a Mapped SL Packet Header and the related
   parameters is as follows:

   +===========================+=================================+
   | Fields of MSLH            | Number of bits (parameters)     |
   +===========================+=================================+
   | PayloadSize               | SizeLength                      |
   +---------------------------+---------------------------------+
   | Index                     | IndexLength                     |
   +---------------------------+---------------------------------+
   | IndexDelta                | IndexDeltaLength                |
   +---------------------------+---------------------------------+
   | CTSFlag                   | 1 If (CTSDeltaLength > 0)       |
   +---------------------------+---------------------------------+
   | CTSDelta                  | CTSDeltaLength If (CTSFlag==1)  |
   +---------------------------+---------------------------------+
   | DTSFlag                   | 1 If (DTSDeltaLength > 0)       |
   +---------------------------+---------------------------------+
   | DTSDelta                  | DTSDeltaLength If (DTSFlag==1)  |
   +---------------------------+---------------------------------+

   Table 1: Relationship between MSLH field size and parameters

3.5 RSLHSection structure

   This section consists of a field (RSLHSectionSize) giving the size
   in bits of the following block of bit-wise concatenated RSLHs.
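   Before continuing with the RSLHSection, the MSLH field rules of
   sections 3.4.1 and 3.4.2 can be sketched as a small bit-level
   parser. This is only an illustration under stated assumptions, not
   a normative decoder: the names BitReader, to_signed and parse_mslh
   and the params dictionary are hypothetical, bits are assumed to be
   read MSB first, and the fields are assumed to appear in the order
   of Figure 6. Absent parameters are treated as zero.

```python
class BitReader:
    """Reads unsigned integers from bytes, MSB first (assumed bit order)."""

    def __init__(self, data: bytes) -> None:
        self.data = data
        self.pos = 0  # current read position, in bits

    def read(self, nbits: int) -> int:
        value = 0
        for _ in range(nbits):
            byte = self.data[self.pos // 8]
            bit = (byte >> (7 - self.pos % 8)) & 1
            value = (value << 1) | bit
            self.pos += 1
        return value


def to_signed(value: int, nbits: int) -> int:
    """Interprets an nbits-wide unsigned value as two's complement."""
    if nbits > 0 and value >= 1 << (nbits - 1):
        return value - (1 << nbits)
    return value


def parse_mslh(reader: BitReader, params: dict, first: bool) -> dict:
    """Parses one MSLH following Table 1.

    `first` is True for the first SL packet of the RTP packet.
    """
    def length(name: str) -> int:
        return params.get(name, 0)  # absent parameter means zero

    mslh = {}
    if length("SizeLength"):
        mslh["PayloadSize"] = reader.read(length("SizeLength"))
    if length("IndexLength"):
        if first:
            mslh["Index"] = reader.read(length("IndexLength"))
        elif length("IndexDeltaLength"):
            mslh["IndexDelta"] = reader.read(length("IndexDeltaLength"))
    if length("CTSDeltaLength"):
        mslh["CTSFlag"] = reader.read(1)
        if mslh["CTSFlag"]:
            nbits = length("CTSDeltaLength")
            mslh["CTSDelta"] = to_signed(reader.read(nbits), nbits)
    if length("DTSDeltaLength"):
        mslh["DTSFlag"] = reader.read(1)
        if mslh["DTSFlag"]:
            # DTSDelta is always positive, so it is read as unsigned
            mslh["DTSDelta"] = reader.read(length("DTSDeltaLength"))
    return mslh
```

   The parser advances only as far as the flags dictate, which is why
   the size of an MSLH cannot be known without parsing it.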
   If the section consumes a non-integer number of bytes, up to 7 zero
   padding bits MUST be inserted at the end in order to achieve
   byte-alignment.

   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | RSLHSectionSize (RSLHSectionSizeLength bits)| RSLH (variable  |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+                 |
   | number of bits)                                               |
   |                                                               |
   |         +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |         | RSLH (variable number of bits)                      |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | etc                                                           |
   | as many bit-wise concatenated RSLHs                           |
   | as SL Packets in this RTP packet                              |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | RSLH (variable number of bits)                                |
   |                                                 +-+-+-+-+-+-+-+
   |                                                 : padding bits|
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   Figure 7: RSLHSection structure

   The length in bits of the RSLHSectionSize field is
   RSLHSectionSizeLength and is specified with a default value of zero
   indicating that the whole RSLHSection is absent.

   +=================================+===============================+
   | Fields of RSLHSection           | Number of bits                |
   +=================================+===============================+
   | RSLHSectionSize                 | RSLHSectionSizeLength         |
   +---------------------------------+-------------------------------+
   | all bit-wise concatenated RSLHs | RSLHSectionSize               |
   +---------------------------------+-------------------------------+

   Table 2: Sizes in bits inside RSLHSection

   Parsing of the bit-wise concatenated RSLHs requires MPEG-4 system
   awareness; specifically, it requires understanding the MPEG-4
   Synchronization Layer (SL) syntax and the modifications to this
   syntax described in the next section.
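   The size bookkeeping of Figure 7 and Table 2 can be sketched as
   follows. This is a hedged illustration: the function name
   rslh_section_bytes is hypothetical, and the sketch assumes that the
   RSLHSectionSize field itself counts toward the section length
   before the padding bits are added, which is one reading of
   Figure 7.

```python
def rslh_section_bytes(rslh_section_size_length: int,
                       rslh_section_size: int) -> int:
    """Returns the number of bytes occupied by the whole RSLHSection.

    rslh_section_size_length: the RSLHSectionSizeLength parameter (bits).
    rslh_section_size: value of the RSLHSectionSize field (bits of RSLHs).
    """
    if rslh_section_size_length == 0:
        return 0  # default: the whole RSLHSection is absent
    total_bits = rslh_section_size_length + rslh_section_size
    return (total_bits + 7) // 8  # round up; the remainder is zero padding
```

   A receiver that does not parse the RSLHs could use such a value to
   move directly to the first byte after the section.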
   However, thanks to the RSLHSectionSize field, non-MPEG-4-system
   receivers MAY skip this part by rounding up RSLHSectionSize/8 to
   the next integer number of bytes.

3.6 RSLH structure

   A Remaining SL Packet Header (RSLH) is what remains of an SL header
   after modifications for mapping into this payload format.

   The following modifications of the SL packet header MUST be applied.
   The other fields of the SL packet header MUST remain unchanged but
   are bit-shifted to fill in the gaps left by the operations specified
   below.

3.6.1 Removal of fields

   The following SL Packet Header fields, if present, are removed since
   they are mapped either in the RTP header or in the corresponding
   MSLH:
   . compositionTimeStampFlag
   . compositionTimeStamp
   . decodingTimeStampFlag
   . decodingTimeStamp
   . packetSequenceNumber
   . AccessUnitEndFlag (in Single-SL mode only)

   The AccessUnitEndFlag, when present for a given stream, MUST be
   removed from every RSLH when using the Single-SL mode since it has
   the same meaning as the Marker bit (and for compatibility with RFC
   3016). However when using the Multiple-SL mode, AccessUnitEndFlag
   MUST NOT be removed since it is useful to signal individual AU ends.

3.6.2 Mapping of OCR

   Furthermore if the SL Packet header contains an OCR, then this field
   is encoded in the RSLH as a two's-complement difference (delta)
   exactly like a compositionTimeStamp or a decodingTimeStamp in the
   MSLH. The length in bits of this difference is indicated by the
   OCRDeltaLength parameter (see section 4.1).

   With this payload format OCRs MUST have the same clock resolution as
   Time Stamps.

   If compositionTimeStamp is not present for an SL packet that has an
   OCR, then the OCR SHALL be encoded as a difference to the RTP time
   stamp.
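   The two's-complement delta coding used for OCR can be sketched as
   below. The function names decode_delta and restore_ocr are
   hypothetical, and the sketch covers only the case stated last
   above, where the delta is taken relative to the RTP time stamp.
   Wrap-around of the clock values is deliberately not handled.

```python
def decode_delta(raw: int, nbits: int) -> int:
    """Decodes an nbits-wide two's-complement field into a signed integer."""
    if nbits <= 0:
        raise ValueError("delta length must be at least one bit")
    if raw >= 1 << (nbits - 1):
        return raw - (1 << nbits)
    return raw


def restore_ocr(rtp_timestamp: int, raw_ocr_delta: int,
                ocr_delta_length: int) -> int:
    """Rebuilds objectClockReference from the RTP time stamp and OCRDelta."""
    return rtp_timestamp + decode_delta(raw_ocr_delta, ocr_delta_length)
```

   For example, with OCRDeltaLength equal to 4 the bit pattern 1101
   stands for a delta of -3, so an RTP time stamp of 1000 yields an
   OCR of 997.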
3.6.3 Degradation Priority

   For streams that use the optional degradationPriority field in the
   SL Packet Headers, only SL packets with the same degradation
   priority SHALL be transported by one RTP packet so that components
   may dispatch the RTP packets according to appropriate QoS or
   protection schemes. Furthermore only the first RSLH of one RTP
   packet SHALL contain the degradationPriority field since it would
   otherwise be redundant.

3.7 SLPPSection structure

   The SLPPSection (SL Packet Payload Section) contains the
   concatenated SL Packet Payloads. By definition SL Packet Payloads
   are byte aligned.

   For efficiency SL packets do not carry their own payload size. This
   is not an issue for RTP packets that contain a single SL Packet.

   However in the Multiple-SL mode the size of each SL packet payload
   MUST be available to the receiver.

   If the SL packet payload size is constant for a stream, the size
   information SHOULD NOT be transported in the RTP packet. However in
   that case it MUST be signaled using the ConstantSize parameter (see
   section 4.1).

   If the SL packet payload size is variable then the size of each SL
   packet payload MUST be indicated in the corresponding MSLH. In order
   to do so the MSLH MUST contain a PayloadSize field. The number of
   bits on which this PayloadSize field is encoded MUST be indicated
   using the SizeLength parameter (see section 4.1).

   The absence of either ConstantSize or SizeLength indicates the
   Single-SL mode, i.e. that a single SL packet is transported in each
   RTP packet for that stream.
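   The size rules above can be sketched as a helper that cuts an
   SLPPSection into individual SL packet payloads. The function name
   split_slpp_section and its arguments are hypothetical:
   payload_sizes stands for the PayloadSize values already read from
   the MSLHs, and constant_size for the ConstantSize parameter. The
   sketch assumes that with ConstantSize every payload, including the
   last one, has exactly that size.

```python
def split_slpp_section(section: bytes, payload_sizes=None, constant_size=0):
    """Splits the byte-wise concatenated SL packet payloads."""
    if payload_sizes is not None and constant_size:
        raise ValueError("SizeLength and ConstantSize must not be combined")
    if payload_sizes is None and not constant_size:
        return [section]  # Single-SL mode: one SL packet per RTP packet
    if constant_size:
        if len(section) % constant_size:
            raise ValueError("section is not a multiple of ConstantSize")
        payload_sizes = [constant_size] * (len(section) // constant_size)
    if sum(payload_sizes) != len(section):
        raise ValueError("PayloadSize values do not match the section")
    payloads, offset = [], 0
    for size in payload_sizes:
        payloads.append(section[offset:offset + size])
        offset += size
    return payloads
```

   Rejecting inconsistent sizes early keeps a damaged packet from
   being handed to the decoder as if it were well formed.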
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | SLPP (variable number of bytes)                           |
   |                                                           |
   |                                                           |
   |                         +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                         | SLPP (variable number of bytes) |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+                                 |
   |                                                           |
   |                                                           |
   |                                                           |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | etc                                                       |
   | as many byte-wise concatenated SLPPs                      |
   | as SL Packets in this RTP packet                          |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   Figure 8: SLPPSection structure

3.8 Interleaving

   SL Packets MAY be interleaved. Senders MAY perform interleaving.
   Receivers MUST support interleaving.

   When interleaving of SL packets is used it SHALL be implemented
   using the Index field of MSLH.

   The AUSequenceNumber field of the SL header MUST NOT be used for
   interleaving since firstly it may collide with the Scene Description
   Carousel usage described in section 5.2 and secondly it is not
   visible to non-MPEG-4 system receivers.

   The conjunction of RTP sequence number and Index can produce a
   quasi-unique identifier for each SL packet so that a receiver can
   unambiguously reconstruct the original order even in case of
   out-of-order packets, packet loss or duplication.

3.9 Fragmentation Rules

   This section specifies rules for senders in order to prevent media
   decoding difficulties at the receiver end.

   MPEG-4 Access Units are the default fragments for MPEG-4 bitstreams
   and SHOULD be mapped directly into RTP packets of this format with
   two exceptions:
   - Access Units larger than the MTU
   - When using interleaving for better packet loss resilience.

   In all cases Access Unit start MUST be aligned with SL packet start.

   This section gives rules to apply when performing Access Unit
   fragmentation.
   Some MPEG-4 codecs define optional syntax for Access Unit
   sub-entities (fragments) that are independently decodable for error
   resilience purposes. Examples are Video Packets for video and Error
   Sensitivity Categories (ESC) for audio. This always corresponds to
   specific bitstream syntax, which is signaled in the
   DecoderSpecificInfo inside the DecoderConfig in SLConfig, and/or
   using the corresponding parameters as described in section 4.1.
   Therefore encoders and decoders are both aware whether they are
   operating in such a mode or not (however since this codec
   configuration is an opaque data block this is not explicitly
   signaled by this payload format).

   If not operating in such a mode it is obvious that the decoder has
   to skip packets after a loss until an Access Unit start is received.
   Similarly decoder implementations that do not implement robust
   decoding of Access Unit fragments have to discard all packets after
   a packet loss until an Access Unit start is received. In the same
   way decoder implementations that do not implement re-synchronization
   at any Access Unit start have to discard all packets after a packet
   loss until a Random Access Point Access Unit is received. These are
   all obvious things that a good implementation would do.

   However serious problems would arise for decoder implementations
   that try to restart decoding after a packet loss if independently
   decodable fragments are signaled (in the decoder configuration) but
   the fragments actually received are not independently decodable
   because the RTP sender has made RTP packets on different boundaries
   than the fragments provided by the encoder. This issue applies to
   the interface between the encoder and the RTP sender and to the RTP
   sender component itself, because the decoder has in general no way
   to detect such a faulty fragment.
   For this reason the following rules must apply to SL streams that
   are specifically made for transport with this payload format:

   SL packets SHOULD be codec-semantic entities in the spirit of ALF,
   i.e. either complete Access Units or fragments of Access Units that
   are independently decodable. Specifically when a given codec has an
   optional syntax for independently decodable Access Unit fragments
   this option SHOULD be used.

   Furthermore when streams are generated using independently decodable
   Access Unit fragments these Access Unit fragments MUST be mapped
   one-to-one into SL packets. Consequently independently decodable
   Access Unit fragments MUST NOT be split across several SL packets
   and therefore MUST NOT be split across several RTP packets.

   For example an MPEG-4 audio stream encoded using the ESC syntax MUST
   NOT split one ESC across 2 RTP packets.

   This rule is relaxed when using MPEG-4 Video Packets for two
   reasons: firstly Video Packets can be much larger than typical MTU
   and secondly all Video Packets start with a specific
   resynchronization marker that can be unambiguously detected.
   Therefore for video streams using the Video Packet syntax Video
   Packets MAY be split across several SL packets although it is
   strongly RECOMMENDED to always adapt the Video Packet size to fit
   the MTU. A Video Packet start MUST always be aligned with an SL
   packet start, except when a GOV is present, in which case the GOV
   and the first Video Packet of the following VOP MUST be included in
   the same SL packet.

4. Types and Names

   This section describes the MIME types and names associated with this
   payload format. Section 4.1 is intended for registration with IANA
   as in RFC 2048.

   This format may require additional information about the mapping to
   be made available to the receiver.
   This is done using parameters described in the next section. The
   absence of any of these fields is equivalent to a field set to the
   default value, which is always zero. The absence of any such
   parameters resolves into a default "basic" configuration.

   In the MPEG-4 framework the SL stream configuration information is
   carried using the Object Descriptor. For compatibility with
   receivers that do not implement the full MPEG-4 system specification
   this information MAY also be signaled using parameters described
   here. When such information is present both in an Object Descriptor
   and as a parameter of this payload format it MUST be exactly the
   same.

   For transport of MPEG-4 audio and video without the use of MPEG-4
   systems, as well as to support non-MPEG-4 system receivers, it is
   also possible to transport information on the profile and level of
   the stream and on the decoder configuration. This is also described
   in the next section.

4.1 MIME type registration

   MIME media type name: "video" or "audio" or "application"

   "video" SHOULD be used for MPEG-4 Video streams (ISO/IEC 14496-2) or
   MPEG-4 Systems streams that convey information needed for an
   audio/visual presentation.

   "audio" SHOULD be used for MPEG-4 Audio streams (ISO/IEC 14496-3) or
   MPEG-4 Systems streams that convey information needed for an audio
   only presentation.

   "application" SHOULD be used for MPEG-4 Systems streams
   (ISO/IEC 14496-1) that serve other purposes than audio/visual
   presentation, e.g. in some cases when MPEG-J streams are
   transmitted.

   MIME subtype name: mpeg4-sl

   Required parameters: none

   Optional parameters:

   DTSDeltaLength:
   The number of bits on which the DTSDelta field is encoded in MSLH.
   The default value is zero and indicates the absence of DTSFlag and
   DTSDelta in MSLH (the stream does not transport decodingTimeStamps).
   A value larger than zero indicates that there is a DTSFlag in each
   MSLH. Since decodingTimeStamp -if present- must be encoded as a
   difference to the RTP time stamp, the DTSDeltaLength parameter MUST
   be present in order to transport decodingTimeStamps with this
   payload format.

   CTSDeltaLength:
   The number of bits on which the CTSDelta field is encoded in
   (non-first) MSLH. The default value is zero and indicates the
   absence of the CTSFlag and CTSDelta fields in MSLH. Non-zero values
   MUST NOT be signaled in the Single-SL mode. Since
   compositionTimeStamps -if present- must be encoded as a difference
   to the RTP time stamp, the CTSDeltaLength parameter MUST be present
   in order to transport compositionTimeStamps using this payload
   format (in the Multiple-SL mode). However CTSDeltaLength SHOULD be
   set to zero (or not signaled) for streams that have a constant
   Access Unit duration (which can be explicitly signaled using the
   DurationFlag and AccessUnitDuration field of SLConfigDescriptor).

   OCRDeltaLength:
   The number of bits on which the OCRDelta field is encoded in RSLH.
   The default value is zero and indicates the absence of OCR for this
   stream. Since objectClockReference -if present- must be encoded as a
   difference to the RTP time stamp, the OCRDeltaLength parameter MUST
   be present in order to transport objectClockReferences with this
   payload format.

   SizeLength:
   The number of bits on which the PayloadSize field of MSLH is
   encoded. The default value is zero and indicates the Single-SL mode
   (unless ConstantSize is present). Simultaneous presence of this
   parameter and ConstantSize is illegal.
   Either the SizeLength or ConstantSize parameter MUST be present in
   order to signal the Multiple-SL mode of this payload format.

   ConstantSize:
   The constant size in bytes of each SL Packet Payload for this
   stream. The default value is zero and indicates variable SL Packet
   Payload size (or the Single-SL mode if SizeLength is absent).
   Simultaneous presence of this parameter and SizeLength is illegal.
   Either the SizeLength or ConstantSize parameter MUST be present in
   order to signal the Multiple-SL mode of this payload format. When
   ConstantSize is present the PayloadSize of MSLH in the RTP packets
   MUST NOT be present.

   IndexLength:
   The number of bits on which the Index is encoded in the first MSLH.
   The default value is zero and indicates the absence of Index and
   IndexDelta for all MSLHs. Since packetSequenceNumber -if present-
   must be mapped in MSLH, the IndexLength parameter MUST be present in
   order to transport packetSequenceNumber with this payload format.

   IndexDeltaLength:
   The number of bits on which the IndexDelta is encoded in any
   non-first MSLH. The default value is zero and indicates that
   packetSequenceNumber MUST be incremented by one for each SL packet
   in the RTP packet (see section 3.4.1). Since packetSequenceNumber
   does not increment by 1 inside an RTP packet when interleaving, the
   IndexDeltaLength parameter MUST be present when using interleaving
   with this payload format.

   RSLHSectionSizeLength:
   The number of bits that is used to encode the RSLHSectionSize field.
   The default value is zero and indicates the absence of the whole
   RSLHSection for all RTP packets of this stream. Compatibility with
   RFC 3016 requires that the RSLHSection must be empty, including the
   RSLHSectionSize field.
   This is the reason why there is such a variable length with a
   default value indicating absence of the RSLHSectionSize field.

   SLConfigDescriptor:
   A base-64 encoding of the SLConfigDescriptor. This SHALL be the
   original SLConfigDescriptor and it SHALL be the same as the one
   transported by the OD framework, if any.

   profile-level-id:
   A decimal representation of the MPEG-4 Profile Level indication
   value. For audio this parameter indicates which MPEG-4 Audio tool
   subsets are applied to encode the audio stream and is defined in
   ISO/IEC 14496-1. For video this parameter indicates which MPEG-4
   Visual tool subsets are applied to encode the video stream and is
   defined in Table G-1 of ISO/IEC 14496-2. This parameter MAY be used
   in the capability exchange or session setup procedure to indicate
   the MPEG-4 Profile and Level combination of which the relevant
   MPEG-4 media codec is capable. If this parameter is not specified by
   the procedure, its default value of 1 (Simple Profile/Level 1) is
   used.

   Config:
   A hexadecimal representation of an octet string that expresses the
   media payload configuration. Configuration data is mapped onto the
   octet string on an MSB-first basis. The first bit of the
   configuration data SHALL be located at the MSB of the first octet.
   In the last octet, zero-valued padding bits, if necessary, shall
   follow the configuration data. For audio this is a
   "StreamMuxConfig", as defined in ISO/IEC 14496-3.
   For video this expresses the MPEG-4 Visual configuration
   information, as defined in subclause 6.2.1 Start codes of
   ISO/IEC 14496-2 [2][4][9]. The configuration information indicated
   by this parameter SHALL be the same as the configuration information
   in the corresponding MPEG-4 Visual stream, except for
   first-half-vbv-occupancy and latter-half-vbv-occupancy, if they
   exist, which may vary in the repeated configuration information
   inside an MPEG-4 Visual stream (See 6.2.1 Start codes of
   ISO/IEC 14496-2).

   StreamType:
   The integer value that indicates the type of MPEG-4 stream that is
   carried; its coding corresponds to the values of the streamType as
   defined for the DecoderConfigDescriptor in ISO/IEC 14496-1.

   Encoding considerations:
   System bitstreams MUST be generated according to MPEG-4 System
   specifications (ISO/IEC 14496-1). Video bitstreams MUST be generated
   according to MPEG-4 Visual specifications (ISO/IEC 14496-2). Audio
   bitstreams MUST be generated according to MPEG-4 Audio
   specifications (ISO/IEC 14496-3). All SL streams MUST be generated
   according to MPEG-4 Sync Layer specifications (ISO/IEC 14496-1
   section 10). In order to read this format the SLConfigDescriptor may
   be required. These bitstreams are binary data and MUST be encoded
   for non-binary transport (for Email, the Base64 encoding is
   sufficient). This type is also defined for transfer via RTP. The
   RTP packets MUST be packetized according to the RTP payload format
   defined in RFC .

   Security considerations:
   As in RFC .

   Interoperability considerations:
   MPEG-4 provides a large and rich set of tools for the coding of
   visual objects. For effective implementation of the standard,
   subsets of the MPEG-4 tool sets have been provided for use in
   specific applications.
   These subsets, called 'Profiles', limit the size of the tool set a
   decoder is required to implement. In order to restrict computational
   complexity, one or more 'Levels' are set for each Profile. A
   Profile@Level combination allows:
   . a codec builder to implement only the subset of the standard that
     is needed, while maintaining interworking with other MPEG-4
     devices included in the same combination, and
   . checking whether MPEG-4 devices comply with the standard
     ('conformance testing').
   A stream SHALL be compliant with the MPEG-4 Profile@Level specified
   by the parameter "profile-level-id". Interoperability between a
   sender and a receiver may be achieved by specifying the parameter
   "profile-level-id" in MIME content, or by arranging in the
   capability exchange/announcement procedure to set this parameter
   mutually to the same value.

   Published specification:
   The specifications for MPEG-4 streams are presented in ISO/IEC
   14496-1, 14496-2, and 14496-3. The RTP payload format is described
   in RFC .

   Applications which use this media type:
   Multimedia streaming and conferencing tools, Internet messaging and
   Email applications. Also supra-relativistic elementary particle
   hyperspace tunneling trans-galactic communication devices :-)

   Additional information: none

   Magic number(s): none

   File extension(s):
   None. A file format with the extension .mp4 has been defined for
   MPEG-4 content but is not directly correlated with this MIME type,
   whose sole purpose is RTP transport.

   Macintosh File Type Code(s): none

   Person & email address to contact for further information:
   Authors of RFC .

   Intended usage: COMMON

   Author/Change controller:
   Authors of RFC .
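   The defaulting and mutual-exclusion rules of the optional
   parameters registered in this section can be sketched as a small
   validation helper. The names LENGTH_PARAMETERS and
   resolve_parameters are hypothetical, and the sketch treats a
   non-zero value as "present", which is an assumption since the
   registration speaks of presence rather than of values.

```python
LENGTH_PARAMETERS = (
    "DTSDeltaLength", "CTSDeltaLength", "OCRDeltaLength", "SizeLength",
    "ConstantSize", "IndexLength", "IndexDeltaLength",
    "RSLHSectionSizeLength",
)


def resolve_parameters(signaled: dict) -> dict:
    """Applies the zero defaults and checks the Single/Multiple-SL rules."""
    resolved = {name: int(signaled.get(name, 0)) for name in LENGTH_PARAMETERS}
    if resolved["SizeLength"] and resolved["ConstantSize"]:
        raise ValueError("SizeLength and ConstantSize are mutually exclusive")
    multiple = bool(resolved["SizeLength"] or resolved["ConstantSize"])
    if not multiple and resolved["CTSDeltaLength"]:
        raise ValueError("CTSDeltaLength must be zero in the Single-SL mode")
    resolved["mode"] = "Multiple-SL" if multiple else "Single-SL"
    return resolved
```

   With no parameter signaled at all, the helper resolves to the
   default "basic" configuration, i.e. the Single-SL mode with every
   length equal to zero.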
4.2 Concatenation of parameters

   Multiple parameters SHOULD be expressed as a MIME media type string,
   in the form of a semicolon-separated list of parameter=value pairs
   (see examples in Appendix).

4.3 Usage of SDP

4.3.1 The a=fmtp keyword

   It is assumed that one typical way to transport the above-described
   parameters associated with this payload format is via an SDP
   message, for example transported to the client in reply to an RTSP
   DESCRIBE or via SAP. In that case the (a=fmtp) keyword MUST be used
   as described in RFC 2327 [10, section 6]. The syntax is then:

      a=fmtp:<format> <parameter name>=<value>

4.3.2 SDP example

   The following is an example of SDP syntax for the description of a
   session containing one MPEG-4 audio stream, one MPEG-4 video stream
   and three MPEG-4 system streams, the first one being BIFS, the
   second one OD and the third one IPMP. All are transported using this
   format and the AVP profile [12]. Note that the video stream DTSDelta
   fields are encoded on 4 bits in this example. See the Appendix for
   more examples.

      o= ....
      i= ....
      c=IN IP4 123.234.71.112

      m=video 1034 RTP/AVP 97
      a=fmtp:97 StreamType=4;DTSDeltaLength=4
      a=rtpmap:97 mpeg4-sl

      m=audio 810 RTP/AVP 98
      a=fmtp:98 StreamType=5; profile-level-id=1; config=7866E7E6EF
      a=rtpmap:98 mpeg4-sl

      m=application 1234 RTP/AVP 99
      a=rtpmap:99 mpeg4-sl
      a=fmtp:99 StreamType=3;

      m=application 1236 RTP/AVP 99
      a=rtpmap:99 mpeg4-sl
      a=fmtp:99 StreamType=1;

      m=application 1238 RTP/AVP 99
      a=rtpmap:99 mpeg4-sl
      a=fmtp:99 StreamType=7;

5. Other issues

5.1 SL packetized stream reconstruction

   The purpose of this section is to document how a receiver can
   reconstruct a valid SL packetized stream.
   Since this format directly transports SL packets this reconstruction
   is performed by reversing the payload structure rules (section 3).
   We explicitly describe here the most complex transformations.

   In the following let (i) be the index of SL packets inside one RTP
   packet (starting at zero for each RTP packet), let SLPacketHeader.x
   denote field x of the reconstructed SL packet header, let MSLH.x
   denote field x of the received MSLH, etc.

   SLPacketHeader.packetSequenceNumber is restored from MSLH.Index and
   MSLH.IndexDelta using:

   if ( IndexLength == 0 ) { // or is absent
     if ( SLConfig.packetSeqNumLength == 0 ) {
       // this stream does not have SL packet sequence number
     }
     else {
       // illegal, normally the sender MUST map
       // SLPacketHeader.packetSequenceNumber in MSLH
       // and set a relevant IndexLength value;
       // otherwise it is unfortunately impossible for the receiver
       // to reconstruct the correct sequence
     }
   }
   else { // IndexLength is not zero
     if ( SLConfig.packetSeqNumLength == 0 ) {
       // the original SL stream does not have SL packet
       // sequence numbers, typically the sender inserted them
       // in order to implement interleaving at the RTP level;
       // they must be ignored for SL stream reconstruction
     }
     else {
       if ( i == 0 ) { // first SL packet in RTP packet
         SLPacketHeader.packetSequenceNumber(0) = MSLH.Index(0);
       }
       else { // remaining SL packets (i > 0)
         SLPacketHeader.packetSequenceNumber(i) =
           SLPacketHeader.packetSequenceNumber(i-1)
           + MSLH.IndexDelta(i)
           + 1;
       }
     }
   }

   All time stamps (CTS, DTS, OCR), when present, are restored from the
   delta values. Time stamp flags (CTSFlag, DTSFlag) in MSLH are used
   to reconstruct respectively the compositionTimeStampFlag and
   decodingTimeStampFlag of SLPacketHeader.
   if ( CTSDeltaLength == 0 ) { // or CTSDeltaLength is absent
     // CTS is not transported for this RTP stream
     if ( i == 0 ) { // first SL packet in RTP packet
       if ( SLConfig.useTimeStamps == 1 ) {
         if ( SLPacketHeader.accessUnitStartFlag == 1 ) {
           SLPacketHeader.compositionTimeStampFlag(0) = 1;
           SLPacketHeader.compositionTimeStamp(0) = RTP TimeStamp;
         }
         else {
           // ignore
         }
       }
       else {
         // empty
       }
     }
     else { // non-first SL packets in RTP packet
       if ( SLConfig.useTimeStamps == 1 ) {
         if ( SLPacketHeader.accessUnitStartFlag == 1 ) {
           SLPacketHeader.compositionTimeStampFlag(i) = 0;
         }
         else {
           // ignore
         }
       }
       else {
         // empty
       }
     }
   }
   else { // CTSDeltaLength is not zero
     // CTS is transported for this stream
     if ( SLConfig.useTimeStamps == 1 ) {
       if ( SLPacketHeader.accessUnitStartFlag == 1 ) {
         SLPacketHeader.compositionTimeStampFlag(i) =
           MSLH.CTSFlag(i);
         SLPacketHeader.compositionTimeStamp(i) =
           RTP TimeStamp + MSLH.CTSDelta(i);
       }
       else {
         // ignore CTSFlag (which must be zero)
       }
     }
     else {
       // this is strange and sub-optimal at best
       // a receiver should ignore this
     }
   }

   if ( DTSDeltaLength == 0 ) { // or DTSDeltaLength is absent
     // DTS is not transported for this stream
     if ( SLConfig.useTimeStamps == 1 ) {
       if ( SLPacketHeader.accessUnitStartFlag == 1 ) {
         SLPacketHeader.decodingTimeStampFlag(i) = 0;
       }
       else {
         // ignore
       }
     }
     else {
       // empty
     }
   }
   else {
     // DTS is transported for this stream
     if ( SLConfig.useTimeStamps == 1 ) {
       if ( SLPacketHeader.accessUnitStartFlag == 1 ) {
         SLPacketHeader.decodingTimeStampFlag(i) =
           MSLH.DTSFlag(i);
         // DTSDelta encodes (CTS - DTS), see section 3.4.1
         SLPacketHeader.decodingTimeStamp(i) =
           SLPacketHeader.compositionTimeStamp(i) - MSLH.DTSDelta(i);
       }
       else {
         // ignore DTSFlag (which must be zero)
       }
     }
     else {
       // this is strange and sub-optimal at best
       // a receiver should ignore this
     }
   }

   if ( OCRDeltaLength == 0 ) { // or OCRDeltaLength is absent
     // the RTP stream does not transport any OCR
     if ( SLConfig.OCRLength == 0 ) {
       // this stream does not have any OCR
     }
     else {
       // illegal, normally the sender MUST detect
       // OCRs, replace them with OCRDelta and set
       // a relevant OCRDeltaLength value
     }
   }
   else {
     if ( SLConfig.OCRLength == 0 ) {
       // this is strange and sub-optimal at best
       // a receiver should ignore this
     }
     else {
       SLPacketHeader.OCRflag(i) = RSLH.OCRFlag(i);
       if ( SLPacketHeader.OCRflag(i) == 1 ) {
         SLPacketHeader.objectClockReference(i) =
           RTP TimeStamp + RSLH.OCRDelta(i);
       }
     }
   }

   In the Single-SL mode the AccessUnitEndFlag, if needed, is restored
   from the M bit, as follows:

   if ( SLConfig.useAccessUnitEndFlag == 0 ) {
     // this SL stream does not signal access unit ends
   }
   else {
     SLPacketHeader.AccessUnitEndFlag = M bit;
   }

   In the Multiple-SL mode the AccessUnitEndFlag is untouched in RSLH.

   The other SL packet header fields SHALL remain as found in RSLH.

   It is obvious that in the general case the reconstruction of the
   original SL packetized stream requires SL-awareness.
   However this payload format allows in all cases a receiver that does
   not know about the SL syntax to reconstruct the semantics of SL for
   the following very useful features:
   - Packet order (decoding order)
   - Access Unit boundaries (using the M bit)
   - Access Unit fragments (i.e. SL packet boundaries using
     MSLH.PayloadSize)
   - Composition Time Stamps (using the RTP Time Stamp and
     MSLH.CTSDelta)
   - Decoding Time Stamps (using the RTP Time Stamp and MSLH.DTSDelta)
   - Packet sequence number (using the RTP sequence number and
     MSLH.Index)

5.2 Handling of scene description streams

   MPEG-4 introduces new stream types as described in section 1, namely
   Object Descriptors and BIFS. In the following both OD and BIFS are
   discussed on the same basis, i.e. as "scene description".

   Considering scene description as a "stream-able" type of content is
   a rather new concept and for that reason some specific comments are
   needed.

   Typically scene descriptions are encoded in such a way that
   information loss would in the general case cripple the presentation
   beyond any hope of repair by the receiver. Still this is well suited
   for a number of multimedia applications where the scene is first
   made available via reliable channels to the client and then played.
   This payload format is not intended for this type of application,
   for which download of MPEG-4 interchange (.mp4) files is typical.
   However it can also be used if the RTP packets are transported using
   TCP or any other reliable protocol.

   On the other hand MPEG-4 has introduced the possibility to
   dynamically change the scene description by sending animation
   information (changes in parameters) and structural change
   information (updates).
Since this information has to be sent in a 1445 timely fashion MPEG-4 has defined a number of techniques in order to 1446 encode the scene description in a manner that makes it behave 1447 similarly to other temporal encoding schemes such as audio and 1448 video. This payload format is intended for this usage. 1450 Note that in many cases the application will consist of first the 1451 reliable transmission of a static initial scene followed by the 1452 streaming of animations and updates. For this reason the usage of 1453 this payload format is attractive since it offers a single solution 1454 for both. 1455 Gentric et al. Expires December 2001 27 1456 Senders must be aware that suitable protection schemes should be used when 1457 scene description streams transport sensitive configuration 1458 information. For example, if the RTP packet transporting an 1459 OD-update command were lost, the corresponding media stream would 1460 not be accessible by the receiver. 1462 Redundancy is a possibility and may be added by tools 1463 hierarchically higher than this payload format, e.g. by packet based 1464 FEC, re-transmission, or similar tools. In such a case, the general 1465 congestion control principles have to be observed. 1467 Since BIFS and OD streams may be modified during the session with 1468 update commands, there is a need to send both update commands and 1469 full BIFS/OD refreshes. For that reason MPEG-4 defines Random Access 1470 Points (RAP) for scene description streams (OD and BIFS) where by 1471 definition a decoder can restart decoding i.e. receives a "full 1472 update" of the scene. This mechanism is called Scene and Object 1473 Description Carousel. The AU Sequence Number field of the SL Packet 1474 Header is used to support this behavior at the Synchronization 1475 Layer. When two access units are sent consecutively with the same AU 1476 Sequence Number, the second one is assumed to be a semantic 1477 repetition of the first.
If a receiver starts to listen in the 1478 middle of a session or has detected losses, it can skip all received 1479 Access Units until such a RAP. The periodicity of transmission of 1480 these RAPs should be chosen/adjusted depending on the application 1481 and the network it is deployed on; i.e. exactly like Intra-coded 1482 frames for video, it is the responsibility of the sender to make 1483 sure the periodicity of RAPs is suitable. 1485 5.3 Multiplexing 1487 An advanced MPEG-4 session may involve a large number of objects, 1488 possibly as many as a few hundred; transporting each ES as an 1489 individual RTP stream may therefore not always be practical. Allocating and 1490 controlling hundreds of destination addresses for each MPEG-4 1491 session may pose insurmountable session administration problems. 1492 The input/output processing overhead at the end-points will also be 1493 extremely high. Additionally, low delay transmission of low 1494 bitrate data streams, e.g. facial animation parameters, results in 1495 extremely high header overheads. 1497 To solve these problems, MPEG-4 data transport requires a 1498 multiplexing scheme that allows selective bundling of several ESs. 1499 This is beyond the scope of the payload format defined here. 1501 MPEG-4's Flexmux multiplexing scheme may be used for this 1502 purpose and a specific RTP payload format is being developed [11]. 1504 Another approach may be to develop a generic RTP multiplexing scheme 1505 usable for MPEG-4 data. The multiplexing scheme reported in [8] may 1506 be a candidate for this approach. 1508 Gentric et al. Expires December 2001 28 1509 For MPEG-4 applications, the multiplexing technique needs to address 1510 the following requirements: 1512 i. The ESs multiplexed in one stream can change frequently during a 1513 session. Consequently, the coding type, individual packet size and 1514 temporal relationships between the multiplexed data units must be 1515 handled dynamically. 1517 ii.
The multiplexing scheme should have a mechanism to determine the 1518 ES identifier (ES_ID) for each of the multiplexed packets. ES_ID is 1519 not a part of the SL header. 1521 iii. In general, an SL packet does not contain information about its 1522 size. The multiplexing scheme should be able to delineate the 1523 multiplexed packets whose lengths may vary from a few bytes to close 1524 to the path-MTU. 1526 5.5 Overlap with RFC 3016 1528 This payload format has been designed to have a (large) overlap with 1529 RFC 3016 [7]. The conditions for this overlap are: 1530 Conditions for RFC 3016: 1531 i. MPEG-4 video elementary streams only 1532 ii. There MUST be a single VOP or Video Packet per RTP packet (only 1533 recommended in RFC 3016) 1534 iii. The decoder configuration MUST be signaled out-of-band either 1535 using the Config mime parameter or using the OD framework 1536 Conditions for this payload format: 1537 i. No structural parameters defined (or all set to zero), i.e. 1538 Single-SL mode with empty MSLH and empty RSLH. 1539 ii. Receivers MUST be ready to accept (and ignore) video 1540 configuration headers (e.g. VOSH, VO and VOL) and visual-object- 1541 sequence-end-code transported in-band. 1543 6. Security Considerations 1545 RTP packets using the payload format defined in this specification 1546 are subject to the security considerations discussed in the RTP 1547 specification [5]. This implies that confidentiality of the media 1548 streams is achieved by encryption. Because the data compression used 1549 with this payload format is applied end-to-end, encryption may be 1550 performed on the compressed data so there is no conflict between the 1551 two operations. The packet processing complexity of this payload 1552 type (i.e. excluding media data processing) does not exhibit any 1553 significant non-uniformity in the receiver side to cause a denial- 1554 of-service threat. 
1556 However, it is possible to inject non-compliant MPEG streams (Audio, 1557 Video, and Systems) to overload the receiver/decoder's buffers, which 1558 might compromise the functionality of the receiver or even crash it. 1559 This is especially true for end-to-end systems like MPEG where the 1560 buffer models are precisely defined. 1562 Gentric et al. Expires December 2001 29 1563 MPEG-4 Systems supports stream types including commands that are 1564 executed on the terminal like OD commands, BIFS commands, etc. and 1565 programmatic content like MPEG-J (Java(TM) Byte Code) and 1566 ECMAScript. It is possible to use one or more of the above in a 1567 manner non-compliant to MPEG to crash the receiver or make it 1568 temporarily unavailable. 1570 Authentication mechanisms can be used to validate the sender and 1571 the data to prevent security problems due to non-compliant malignant 1572 MPEG-4 streams. 1574 A security model is defined in MPEG-4 Systems for streams carrying 1575 MPEG-J access units, which comprise Java(TM) classes and objects. MPEG-J 1576 defines a set of Java APIs and a secure execution model. MPEG-J 1577 content can call this set of APIs and Java(TM) methods from a set of 1578 Java packages supported in the receiver within the defined security 1579 model. According to this security model, downloaded byte code is 1580 forbidden to load libraries, define native methods, start programs, 1581 read or write files, or read system properties. 1583 Receivers can implement intelligent filters to validate the buffer 1584 requirements or parametric (OD, BIFS, etc.) or programmatic (MPEG-J, 1585 ECMAScript) commands in the streams. However, this can increase the 1586 complexity significantly. 1588 7.
Acknowledgements 1589 This document evolved across several years thanks to contributions 1590 from a large number of people since it is based on work within the 1591 IETF AVT working group and various ISO MPEG working groups, 1592 especially the 4-on-IP ad-hoc group in the last stages. The authors 1593 wish to thank Guido Fransceschini, Art Howarth, Dave Mackie, Dave 1594 Singer, and Stephan Wenger for their valuable comments. 1596 8. References 1598 [1] ISO/IEC 14496-1:2001 MPEG-4 Systems. 1600 [2] ISO/IEC 14496-2:2001 MPEG-4 Visual. 1602 [3] ISO/IEC 14496-3:2001 MPEG-4 Audio. 1604 [4] ISO/IEC 14496-6:2001 Delivery Multimedia Integration Framework. 1606 [5] Schulzrinne, Casner, Frederick, Jacobson, RTP: A Transport 1607 Protocol for Real-Time Applications, RFC 1889, Internet Engineering 1608 Task Force, January 1996. 1610 [6] S. Bradner, Key words for use in RFCs to Indicate Requirement 1611 Levels, RFC 2119, Internet Engineering Task Force, March 1997. 1613 [7] Y. Kikuchi, T. Nomura, S. Fukunaga, Y. Matsui, H. Kimata, RTP 1614 payload format for MPEG-4 Audio/Visual streams, RFC 3016, Internet 1615 Engineering Task Force. 1617 Gentric et al. Expires December 2001 30 1619 [8] B. Thompson, T. Koren, D. Wing, Tunneling multiplexed Compressed 1620 RTP ("TCRTP"), work in progress, draft-ietf-avt-tcrtp-02.txt, 1621 November 2000. 1623 [9] D. Singer, Y. Lim, A Framework for the delivery of MPEG-4 over 1624 IP-based Protocols, work in progress, draft-singer-mpeg4-ip-02.txt, 1625 May 2001. 1627 [10] Handley, Jacobson, SDP: Session Description Protocol, RFC 2327, 1628 Internet Engineering Task Force, April 1998. 1630 [11] C. Roux et al., RTP Payload Format for MPEG-4 FlexMultiplexed 1631 Streams, work in progress, draft-curet-avt-rtp-mpeg4-flexmux-00.txt, 1632 February 2001. 1634 [12] H. Schulzrinne, RTP Profile for Audio and Video Conferences 1635 with Minimal Control, RFC 1890, Internet Engineering Task Force, 1636 January 1996. 1638 9.
Authors' Addresses 1640 Olivier Avaro 1641 France Telecom 1642 35 A Schutzenhuttenweg 1643 60598 Frankfurt am Main 1644 Deutschland 1645 e-mail: olivier.avaro@francetelecom.fr 1647 Andrea Basso 1648 AT&T Labs Research 1649 200 Laurel Avenue 1650 Middletown, NJ 07748 1651 USA 1652 e-mail: basso@research.att.com 1654 Stephen L. Casner 1655 Packet Design, Inc. 1656 66 Willow Place 1657 Menlo Park, CA 94025 1658 USA 1659 e-mail: casner@acm.org 1661 M. Reha Civanlar 1662 AT&T Labs - Research 1663 100 Schultz Drive 1664 Red Bank, NJ 07701 1665 USA 1666 e-mail: civanlar@research.att.com 1668 Philippe Gentric 1669 Philips Digital Networks - MP4Net 1671 Gentric et al. Expires December 2001 31 1672 51 rue Carnot 1673 92156 Suresnes 1674 France 1675 e-mail: philippe.gentric@philips.com 1677 Carsten Herpel 1678 THOMSON multimedia 1679 Karl-Wiechert-Allee 74 1680 30625 Hannover 1681 Germany 1682 e-mail: herpelc@thmulti.com 1684 Zvi Lifshitz 1685 Optibase Ltd. 1686 7 Shenkar St. 1687 Herzliya 46120 1688 Israel 1689 e-mail: zvil@optibase.com 1691 Young-kwon Lim 1692 mp4cast (MPEG-4 Internet Broadcasting Solution Consortium) 1693 1001-1 Daechi-Dong Gangnam-Gu 1694 Seoul, 305-333, 1695 Korea 1696 e-mail: young@techway.co.kr 1698 Colin Perkins 1699 USC Information Sciences Institute 1700 4350 N. Fairfax Drive #620 1701 Arlington, VA 22203 1702 USA 1703 e-mail: csp@isi.edu 1705 Jan van der Meer 1706 Philips Digital Networks 1707 Cederlaan 4 1708 5600 JB Eindhoven 1709 Netherlands 1710 e-mail: jan.vandermeer@philips.com 1712 APPENDIX: Examples of usage 1714 This payload format has been designed to transport efficiently a 1715 very versatile packetization scheme: the MPEG-4 Synch Layer; as a 1716 result its complexity is larger than the average RTP payload format. 1717 For this reason this section describes a number of key examples of 1718 how this payload format can be used. 1720 Gentric et al.
Expires December 2001 32 1721 A C++-like syntax called SDL (Syntactic Description Language) 1722 defined in [1, section 14] is used to economically describe MPEG-4 1723 system data structures. 1725 Furthermore these examples assume that the (a=fmtp) SDP syntax is 1726 used to convey the MIME parameters of the payload format. 1728 Appendix.1 MPEG-4 Video 1730 Let us consider the case of a 30 frames per second MPEG-4 video 1731 stream whose bit rate is high enough that Access Units have to be 1732 split into several SL packets (typically above 300 kb/s). 1734 Let us also assume that the video codec generates in that case Video 1735 Packets that each fit in one SL packet, i.e. that the video codec is 1736 MTU aware and the MTU is 1500 bytes. We assume furthermore that this 1737 stream contains B frames and that decodingTimeStamps are present. 1739 SLConfigDescriptor 1741 In this example the SLConfigDescriptor is: 1743 class SLConfigDescriptor extends BaseDescriptor : bit(8) 1744 tag=SLConfigDescrTag { 1745 bit(8) predefined; 1746 if (predefined==0) { 1747 bit(1) useAccessUnitStartFlag; = 1 1748 bit(1) useAccessUnitEndFlag; = 0 1749 bit(1) useRandomAccessPointFlag; = 1 1750 bit(1) hasRandomAccessUnitsOnlyFlag; = 0 1751 bit(1) usePaddingFlag; = 0 1752 bit(1) useTimeStampsFlag; = 1 1753 bit(1) useIdleFlag; = 0 1754 bit(1) durationFlag; = 0 1755 bit(32) timeStampResolution; = 30 1756 bit(32) OCRResolution; = 0 1757 bit(8) timeStampLength; = 32 1758 bit(8) OCRLength; = 0 1759 bit(8) AU_Length; = 0 1760 bit(8) instantBitrateLength; = 0 1761 bit(4) degradationPriorityLength; = 0 1762 bit(5) AU_seqNumLength; = 0 1763 bit(5) packetSeqNumLength; = 0 1764 bit(2) reserved=0b11; 1765 } 1766 if (durationFlag) { 1767 bit(32) timeScale; // NOT USED 1768 bit(16) accessUnitDuration; // NOT USED 1769 bit(16) compositionUnitDuration; // NOT USED 1770 } 1771 if (!useTimeStampsFlag) { 1772 bit(timeStampLength) startDecodingTimeStamp; // NOT USED 1773 bit(timeStampLength)
startCompositionTimeStamp; // NOT USED 1775 Gentric et al. Expires December 2001 33 1776 } 1777 } 1779 The useRandomAccessPointFlag is set so that the 1780 randomAccessPointFlag can indicate that the corresponding SL packet 1781 contains a GOV and the first Video Packet of an Intra coded frame. 1783 SL Packet Header structure 1785 With this configuration we have the following SL packet header 1786 structure: 1788 aligned(8) class SL_PacketHeader (SLConfigDescriptor SL) { 1789 bit(1) accessUnitStartFlag; // 1 bit 1790 if (accessUnitStartFlag) { 1791 bit(1) randomAccessPointFlag; // 1 bit 1792 bit(1) decodingTimeStampFlag; // 1 bit 1793 bit(1) compositionTimeStampFlag; // 1 bit 1794 if (decodingTimeStampFlag) { 1795 bit(SL.timeStampLength) decodingTimeStamp; 1796 } 1797 if (compositionTimeStampFlag) { 1798 bit(SL.timeStampLength) compositionTimeStamp; 1799 } 1800 } } 1802 Parameters 1804 decodingTimeStamps are encoded on 32 bits, which is much more than 1805 needed for delta. Therefore the sender will use DTSDeltaLength to 1806 signal that only 7 bits are used for the coding of relative DTS in 1807 the RTP packet. 1809 The RSLHSectionSize cannot exceed 4 (the RSLH is at most 4 bits long), so it is encoded on 3 bits 1810 as signaled by RSLHSectionSizeLength. The resulting concatenated 1811 fmtp line is: 1813 a=fmtp: DTSDeltaLength=7;RSLHSectionSizeLength=3 1815 RTP packet structure 1817 Two cases can occur; for packets that transport first fragments of 1818 Access Units we have: 1820 +=========================================+=============+ 1821 | Field | size | 1822 +=========================================+=============+ 1823 | RTP header | - | 1824 +-----------------------------------------+-------------+ 1825 | DTSFlag = 1 | 1 bit | 1826 +-----------------------------------------+-------------+ 1827 | DTSDelta | 7 bits | 1828 +-----------------------------------------+-------------+ 1830 Gentric et al.
Expires December 2001 34 1831 | bits to byte alignment | 0 bits | 1832 +-----------------------------------------+-------------+ 1833 | RSLHSectionSize = 4 | 3 bits | 1834 +-----------------------------------------+-------------+ 1835 | accessUnitStartFlag = 1 | 1 bit | 1836 +-----------------------------------------+-------------+ 1837 | randomAccessPointFlag | 1 bit | 1838 +-----------------------------------------+-------------+ 1839 | decodingTimeStampFlag | 1 bit | 1840 +-----------------------------------------+-------------+ 1841 | compositionTimeStampFlag | 1 bit | 1842 +-----------------------------------------+-------------+ 1843 | bits to byte alignment | 1 bit | 1844 +-----------------------------------------+-------------+ 1845 | SL packet payload | N bytes | 1846 +-----------------------------------------+-------------+ 1848 For packets that transport non-first fragments of Access Units we 1849 have: 1851 +=========================================+=============+ 1852 | Field | size | 1853 +=========================================+=============+ 1854 | RTP header | - | 1855 +-----------------------------------------+-------------+ 1856 | DTSFlag = 0 | 1 bit | 1857 +-----------------------------------------+-------------+ 1858 | bits to byte alignment | 7 bits | 1859 +-----------------------------------------+-------------+ 1860 | RSLHSectionSize = 1 | 3 bits | 1861 +-----------------------------------------+-------------+ 1862 | accessUnitStartFlag = 0 | 1 bit | 1863 +-----------------------------------------+-------------+ 1864 | bits to byte alignment | 4 bits | 1865 +-----------------------------------------+-------------+ 1866 | SL packet payload | N bytes | 1867 +-----------------------------------------+-------------+ 1869 Overhead estimation 1871 In this example we have a RTP overhead of 40 + 2 bytes for 1400 1872 bytes of payload i.e. 3 % overhead. 
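The two packet layouts of Appendix.1 can be illustrated with a short receiver-side sketch. It is not normative: it assumes that fields are packed most significant bit first and that DTSDelta is read as an unsigned 7-bit value (neither is stated in this appendix), and the function names parse_appendix1_header and restore_dts are hypothetical. The time stamp rule itself, decodingTimeStamp = RTP TimeStamp + DTSDelta, is the one given in the reconstruction pseudocode of this document.

```python
RTP_TS_MODULUS = 1 << 32  # timeStampLength is 32 bits in this example


def parse_appendix1_header(payload: bytes):
    """Split an Appendix.1 RTP payload into header fields and SL payload.

    Assumptions (not stated in the text): fields are packed MSB first,
    and DTSDelta is read as an unsigned 7-bit integer.
    """
    if len(payload) < 2:
        raise ValueError("payload is shorter than the 2-byte header")
    first, second = payload[0], payload[1]

    # Byte 0: DTSFlag (1 bit), then DTSDelta (7 bits) or 7 alignment bits.
    dts_flag = first >> 7
    dts_delta = (first & 0x7F) if dts_flag else None

    # Byte 1: RSLHSectionSize (3 bits), the RSLH bits, then alignment bits.
    fields = {
        "DTSFlag": dts_flag,
        "DTSDelta": dts_delta,
        "RSLHSectionSize": second >> 5,
        "accessUnitStartFlag": (second >> 4) & 1,
    }
    if fields["accessUnitStartFlag"]:
        fields["randomAccessPointFlag"] = (second >> 3) & 1
        fields["decodingTimeStampFlag"] = (second >> 2) & 1
        fields["compositionTimeStampFlag"] = (second >> 1) & 1
    return fields, payload[2:]


def restore_dts(rtp_timestamp: int, dts_delta: int) -> int:
    """decodingTimeStamp = RTP TimeStamp + DTSDelta, wrapped to 32 bits."""
    return (rtp_timestamp + dts_delta) % RTP_TS_MODULUS


# First fragment: DTSFlag=1, DTSDelta=3, RSLHSectionSize=4, all flags set.
fields, sl_payload = parse_appendix1_header(bytes([0x83, 0x9E]) + b"video")
```

In both layouts the header occupies exactly 2 bytes, which is where the "40 + 2 bytes" of the overhead estimation comes from.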
1874 Appendix.2 RFC 3016 compatible MPEG-4 Video 1876 This is an example of a video stream where the SL is configured to 1877 produce RTP packets compatible with RFC 3016. 1879 SLConfigDescriptor 1881 In this example the SLConfigDescriptor is: 1883 Gentric et al. Expires December 2001 35 1884 class SLConfigDescriptor extends BaseDescriptor : bit(8) 1885 tag=SLConfigDescrTag { 1886 bit(8) predefined; 1887 if (predefined==0) { 1888 bit(1) useAccessUnitStartFlag; = 0 1889 bit(1) useAccessUnitEndFlag; = 1 1890 bit(1) useRandomAccessPointFlag; = 0 1891 bit(1) hasRandomAccessUnitsOnlyFlag; = 0 1892 bit(1) usePaddingFlag; = 0 1893 bit(1) useTimeStampsFlag; = 0 1894 bit(1) useIdleFlag; = 0 1895 bit(1) durationFlag; = 0 1896 bit(32) timeStampResolution; = 0 1897 bit(32) OCRResolution; = 0 1898 bit(8) timeStampLength; = 0 1899 bit(8) OCRLength; = 0 1900 bit(8) AU_Length; = 0 1901 bit(8) instantBitrateLength; = 0 1902 bit(4) degradationPriorityLength; = 0 1903 bit(5) AU_seqNumLength; = 0 1904 bit(5) packetSeqNumLength; = 0 1905 bit(2) reserved=0b11; 1906 } 1907 if (durationFlag) { 1908 bit(32) timeScale; // NOT USED 1909 bit(16) accessUnitDuration; // NOT USED 1910 bit(16) compositionUnitDuration; // NOT USED 1911 } 1912 if (!useTimeStampsFlag) { 1913 bit(timeStampLength) startDecodingTimeStamp; = 0 1914 bit(timeStampLength) startCompositionTimeStamp; = 0 1915 } 1916 } 1918 SL Packet Header structure 1920 With this configuration we have the following SL packet header 1921 structure: 1923 aligned(8) class SL_PacketHeader (SLConfigDescriptor SL) { 1924 if (SL.useAccessUnitEndFlag) { 1925 bit(1) accessUnitEndFlag; // 1 bit 1926 } 1927 } 1929 In this case this payload produces RTP packets that are exactly 1930 conformant to RFC 3016 and the Synch Layer is reduced to a purely 1931 logical construction that neither sender nor receiver need to 1932 implement. 1934 Parameters 1936 This configuration is the default one; no parameters are required. 1938 Gentric et al. 
Expires December 2001 36 1939 RTP packet structure 1941 Note that accessUnitEndFlag is mapped to the RTP header M bit. 1943 +=========================================+=============+ 1944 | Field | size | 1945 +=========================================+=============+ 1946 | RTP header | - | 1947 +-----------------------------------------+-------------+ 1948 | SL packet payload | 1400 bytes | 1949 +-----------------------------------------+-------------+ 1951 Overhead 1953 In this example we have a RTP overhead of 40 bytes for 1400 bytes of 1954 payload i.e. 3 % overhead. 1956 Appendix.3 Low delay MPEG-4 Audio 1958 This example is for a low delay audio service. For this reason a 1959 single SL packet is transported in each RTP packet. 1961 SLConfigDescriptor 1963 Since CTS=DTS and Access Unit duration is constant signaling of 1964 MPEG-4 time stamps is not needed (the durationFlag of SLConfig is 1965 set) 1967 We also assume here an audio Object Type for which all Access Units 1968 are Random Access Points, which is signaled using the 1969 hasRandomAccessUnitsOnlyFlag in the SLConfigDescriptor. 1971 We assume furthermore a mode where the Access Unit size is constant 1972 and equal to 5 bytes (which is signaled with AU_Length). 1974 In this example the SLConfigDescriptor is: 1976 class SLConfigDescriptor extends BaseDescriptor : bit(8) 1977 tag=SLConfigDescrTag { 1978 bit(8) predefined; 1979 if (predefined==0) { 1980 bit(1) useAccessUnitStartFlag; = 0 1981 bit(1) useAccessUnitEndFlag; = 0 1982 bit(1) useRandomAccessPointFlag; = 0 1983 bit(1) hasRandomAccessUnitsOnlyFlag; = 1 1984 bit(1) usePaddingFlag; = 0 1985 bit(1) useTimeStampsFlag; = 0 1986 bit(1) useIdleFlag; = 0 1987 bit(1) durationFlag; = 1 // signals constant AU duration 1988 bit(32) timeStampResolution; = 0 1989 bit(32) OCRResolution; = 0 1990 bit(8) timeStampLength; = 0 1992 Gentric et al. 
Expires December 2001 37 1993 bit(8) OCRLength; = 0 1994 bit(8) AU_Length; = 5 1995 bit(8) instantBitrateLength; = 0 1996 bit(4) degradationPriorityLength; = 0 1997 bit(5) AU_seqNumLength; = 0 1998 bit(5) packetSeqNumLength; = 0 1999 bit(2) reserved=0b11; 2000 } 2001 if (durationFlag) { 2002 bit(32) timeScale; = 1000 // for milliseconds 2003 bit(16) accessUnitDuration; = 10 // ms 2004 bit(16) compositionUnitDuration; = 10 // ms 2005 } 2006 if (!useTimeStampsFlag) { 2007 bit(timeStampLength) startDecodingTimeStamp; = 0 2008 bit(timeStampLength) startCompositionTimeStamp; = 0 2009 } 2010 } 2012 SL packet header 2014 With this configuration the SL packet header is empty. 2016 Parameters 2018 No parameters are required. 2020 RTP packet structure 2022 Note that the RTP header M bit should be always set to 1. 2024 +=========================================+=============+ 2025 | Field | size | 2026 +=========================================+=============+ 2027 | RTP header | - | 2028 +-----------------------------------------+-------------+ 2029 | SL packet payload | 5 bytes | 2030 +-----------------------------------------+-------------+ 2032 Overhead estimation 2034 The overhead is extremely large i.e. more than 800 %, since 40 bytes 2035 of headers are required to transport 5 bytes of data. Note however 2036 that RTP header compression would work well since time stamps 2037 increments are constant. 2039 Appendix.4 Media delivery MPEG-4 Audio 2041 Gentric et al. Expires December 2001 38 2042 This example is for a media delivery service where delay is not an 2043 issue but efficiency is. In this case several SL Packets are 2044 transported in each RTP packet. 2046 SLConfigDescriptor 2048 Is the same as in Appendix.3 2050 SL packet header 2052 With this configuration the SL packet header is empty. 2054 Parameters 2056 The absence of RSLHSectionSizeLength indicates that the RSLHSection 2057 is empty. 
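The rule that an absent length parameter means an empty section can be sketched as follows. This is only an illustration: it assumes that every parameter has the form Name=integer, as in the fmtp lines of this appendix, and the helper names parse_fmtp and length_of as well as the payload type 96 in the comment are hypothetical.

```python
def parse_fmtp(line: str) -> dict:
    """Parse the parameters of an 'a=fmtp:' line into a dict of integers.

    Assumption: every parameter is written Name=<integer>. An optional
    leading payload type number (as used in full SDP) is skipped.
    """
    prefix = "a=fmtp:"
    if not line.startswith(prefix):
        raise ValueError("not an a=fmtp line")
    body = line[len(prefix):].strip()
    head, _, rest = body.partition(" ")
    if head.isdigit():  # e.g. "a=fmtp:96 ConstantSize=5" (96 is hypothetical)
        body = rest
    params = {}
    for item in body.split(";"):
        item = item.strip()
        if not item:
            continue
        name, _, value = item.partition("=")
        params[name.strip()] = int(value)
    return params


def length_of(params: dict, name: str) -> int:
    """An absent length parameter is treated as zero (field not present)."""
    return params.get(name, 0)


video = parse_fmtp("a=fmtp: DTSDeltaLength=7;RSLHSectionSizeLength=3")
audio = parse_fmtp("a=fmtp: ConstantSize=5")
print(length_of(video, "RSLHSectionSizeLength"))  # 3
print(length_of(audio, "RSLHSectionSizeLength"))  # 0, RSLHSection is empty
```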
2059 The size of SL Packets (which are all complete Access Units in this 2060 case) is constant and is indicated with: 2062 a=fmtp: ConstantSize=5 2064 This also indicates to the receiver that the Multiple-SL mode will 2065 be used. The 2-byte field that would give the size of the 2066 MSLHSection is omitted since in this case this field always contains 2067 zero (the MSLHSection is always empty). 2069 RTP packet structure 2071 Note that the RTP header M bit is always set to 1, which indicates 2072 to the receiver that only complete Access Units are transported. 2074 +=========================================+=============+ 2075 | Field | size | 2076 +=========================================+=============+ 2077 | RTP header | - | 2078 +-----------------------------------------+-------------+ 2079 | SL packet payload | 5 bytes | 2080 +-----------------------------------------+-------------+ 2081 | SL packet payload | 5 bytes | 2082 +-----------------------------------------+-------------+ 2083 | etc, until MTU is reached | 2084 +-----------------------------------------+-------------+ 2085 | SL packet payload | 5 bytes | 2086 +-----------------------------------------+-------------+ 2088 Overhead estimation 2090 The overhead is 3% i.e. minimal. 2092 Appendix.5 A more complex case: AAC with interleaving 2094 Gentric et al. Expires December 2001 39 2095 Let us consider AAC around 130 kb/s where each Access Unit is split 2096 into 4 SL packets corresponding to Error Sensitivity Categories (ESC) 2097 of maximum 90 bytes for which interleaving is very useful in terms 2098 of error resilience. We thus use an interleaving scheme where 15 SL 2099 Packets (extracted from 15 consecutive Access Units) are used to 2100 construct each RTP packet in order to match an MTU of 1500 bytes. 2101 Note that since ESC fragments are not byte aligned we also use the 2102 paddingFlag and paddingBits features of the Synch Layer.
2104 The interleaving sequence is 4 RTP packets and 350 ms long, which is 2105 too long for conferencing but perfectly OK for Internet radio. 2107 Since the sequence contains 60 SL packets, the sequence number can 2108 be encoded on 6 bits. However 2 bits are actually enough if the 2109 sender always resets the SL packet sequence number to zero at the 2110 start of each sequence, since only the first MSLH in each of the 4 2111 RTP packets in the sequence carries an absolute sequence number 2112 value (0,1,2,3). 2114 2 bits are also enough for IndexDelta, which is constant and equal 2115 to 3 (since +1 is automatically added). 2117 Note that the 4th RTP packet in each sequence has its M bit set to 1 2118 since it contains 15 SL packets transporting the end of 15 2119 consecutive Access Units. 2121 With this scheme a sender (for example upon reception of RTCP 2122 reports indicating high loss rates) can (for example) choose to 2123 duplicate for each interleaving sequence the first RTP packet that 2124 contains the most useful data in terms of ESC or apply other error 2125 protection techniques, with due care to congestion issues. 2127 In this example we will also show several other SL features (OCR, AU 2128 boundary flags, padding, as detailed below). 2130 One feature demonstrated by this example is the degradation 2131 priority. We assume degradation priority can take 4 different 2132 values, mapped to Error Sensitivity Categories, and is encoded on 2 2133 bits. This interleaving scheme makes sure that only SL packets of 2134 identical degradation priorities are grouped in the same RTP packet 2135 (3.6.3) and that only the first RSLH of each RTP packet transports 2136 the degradation priority. 2138 We also assume that for each last SL packet of each RTP packet the 2139 server inserts an OCR. 
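The index arithmetic of this interleaving scheme can be checked with a short sketch. It assumes that SL packets are numbered in decoding order from zero at the start of each sequence, as described above, and that RTP packet k of a sequence carries ESC k of each of the 15 Access Units; the function names are hypothetical.

```python
AUS_PER_SEQUENCE = 15   # consecutive Access Units per interleaving sequence
SL_PACKETS_PER_AU = 4   # one SL packet per Error Sensitivity Category


def interleave():
    """Return, for each RTP packet, the SL packet sequence numbers it carries."""
    return [
        [au * SL_PACKETS_PER_AU + esc for au in range(AUS_PER_SEQUENCE)]
        for esc in range(SL_PACKETS_PER_AU)
    ]


def encode_indexes(seq_numbers):
    """First entry is the absolute Index; the others become IndexDelta values."""
    deltas = [b - a - 1 for a, b in zip(seq_numbers, seq_numbers[1:])]
    return seq_numbers[0], deltas


def decode_indexes(index, deltas):
    """Rebuild the sequence numbers: each step adds IndexDelta + 1."""
    numbers = [index]
    for delta in deltas:
        numbers.append(numbers[-1] + delta + 1)
    return numbers


for rtp_packet in interleave():
    index, deltas = encode_indexes(rtp_packet)
    # Index is 0, 1, 2 or 3 and every IndexDelta is 3: both fit in 2 bits.
    assert index < 4 and set(deltas) == {3}
    assert decode_indexes(index, deltas) == rtp_packet
```

The first RTP packet thus carries SL packets 0, 4, 8, ... 56 and the fourth carries 3, 7, 11, ... 59, i.e. the last fragment of each of the 15 Access Units, which is why only that packet has its M bit set.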
2141 SLConfigDescriptor 2143 In this example the SLConfigDescriptor is: 2145 class SLConfigDescriptor extends BaseDescriptor : bit(8) 2146 tag=SLConfigDescrTag { 2148 Gentric et al. Expires December 2001 40 2149 bit(8) predefined; 2150 if (predefined==0) { 2151 bit(1) useAccessUnitStartFlag; = 1 2152 bit(1) useAccessUnitEndFlag; = 1 2153 bit(1) useRandomAccessPointFlag; = 0 2154 bit(1) hasRandomAccessUnitsOnlyFlag; = 1 2155 bit(1) usePaddingFlag; = 1 // we need to signal padding bits 2156 bit(1) useTimeStampsFlag; = 0 2157 bit(1) useIdleFlag; = 0 2158 bit(1) durationFlag; = 1 2159 bit(32) timeStampResolution; = 0 2160 bit(32) OCRResolution; = 30 2161 bit(8) timeStampLength; = 0 2162 bit(8) OCRLength; = 32 2163 bit(8) AU_Length; = 0 2164 bit(8) instantBitrateLength; = 0 2165 bit(4) degradationPriorityLength; = 2 2166 bit(5) AU_seqNumLength; = 0 2167 bit(5) packetSeqNumLength; = 6 2168 bit(2) reserved=0b11; 2169 } 2170 if (durationFlag) { 2171 bit(32) timeScale; = 1000 // milliseconds 2172 bit(16) accessUnitDuration; = 23.22 // ms 2173 bit(16) compositionUnitDuration; = 23.22 // ms 2174 } 2175 if (!useTimeStampsFlag) { 2176 bit(timeStampLength) startDecodingTimeStamp; = 0 2177 bit(timeStampLength) startCompositionTimeStamp; = 0 2178 } 2179 } 2181 SL Packet Header structure 2183 With this configuration we have the following SL packet header 2184 structure: 2186 aligned(8) class SL_PacketHeader (SLConfigDescriptor SL) { 2187 bit(1) accessUnitStartFlag; 2188 bit(1) accessUnitEndFlag; 2189 bit(1) OCRflag; 2190 bit(1) paddingFlag; 2191 if (paddingFlag) bit(3) paddingBits; 2192 bit(SL.packetSeqNumLength) packetSequenceNumber; 2193 bit(1) DegPrioflag; 2194 if (DegPrioflag) { 2195 bit(SL.degradationPriorityLength) degradationPriority;} 2196 if (OCRflag) { 2197 bit(SL.OCRLength) objectClockReference;} 2198 } 2201 Parameters 2203 Gentric et al.
Expires December 2001 41 2204 The RSLHSectionSize cannot exceed 2 bits, which is encoded on 2 bits 2205 and signaled by RSLHSectionSizeLength. 2207 The resulting concatenated fmtp line is: 2209 a=fmtp: 2210 SizeLength=6;RSLHSectionSizeLength=2;IndexLength=2;IndexDeltaLength= 2211 2;OCRDeltaLength=16 2213 RTP packet structure 2215 +=========================================+=============+ 2216 | Field | size | 2217 +=========================================+=============+ 2218 | RTP header | - | 2219 +-----------------------------------------+-------------+ 2220 MSLHSection 2221 +=========================================+=============+ 2222 | MSLHSection size in bits = 135 | 2 bytes | 2223 +-----------------------------------------+-------------+ 2224 | PayloadSize | 7 bits | 2225 +-----------------------------------------+-------------+ 2226 | Index = 0 or 1 or 2 or 3 | 2 bits | 2227 +-----------------------------------------+-------------+ 2228 | PayloadSize | 7 bits | 2229 +-----------------------------------------+-------------+ 2230 | SLPSeqDeltaNum = 3 | 2 bits | 2231 +-----------------------------------------+-------------+ 2232 | etc + 12 times 9 bits | 2233 +-----------------------------------------+-------------+ 2234 | PayloadSize | 7 bits | 2235 +-----------------------------------------+-------------+ 2236 | SLPSeqDeltaNum = 3 | 2 bits | 2237 +-----------------------------------------+-------------+ 2238 | bits to byte alignment | 7 bits | 2239 +-----------------------------------------+-------------+ 2240 RSLHSection 2241 +=========================================+=============+ 2242 | RSLHSectionSize | 6 bits | 2243 +-----------------------------------------+-------------+ 2244 | accessUnitStartFlag | 1 bit | 2245 +-----------------------------------------+-------------+ 2246 | accessUnitEndFlag | 1 bit | 2247 +-----------------------------------------+-------------+ 2248 | OCRFlag = 0 | 1 bit | 2249 
+-----------------------------------------+-------------+ 2250 | paddingFlag = 1 | 1 bit | 2251 +-----------------------------------------+-------------+ 2252 | paddingBits | 3 bits | 2253 +-----------------------------------------+-------------+ 2254 | DegPrioflag = 1 | 1 bit | 2255 +-----------------------------------------+-------------+ 2257 Gentric et al. Expires December 2001 42 2258 | degradationPriority | 2 bits | 2259 +-----------------------------------------+-------------+ 2260 | accessUnitStartFlag | 1 bit | 2261 +-----------------------------------------+-------------+ 2262 | accessUnitEndFlag | 1 bit | 2263 +-----------------------------------------+-------------+ 2264 | OCRFlag = 0 | 1 bit | 2265 +-----------------------------------------+-------------+ 2266 | paddingFlag = 1 | 1 bit | 2267 +-----------------------------------------+-------------+ 2268 | paddingBits | 3 bits | 2269 +-----------------------------------------+-------------+ 2270 | DegPrioflag = 0 | 1 bit | 2271 +-----------------------------------------+-------------+ 2272 | etc + 12 times 8 bits | 2273 +-----------------------------------------+-------------+ 2274 | accessUnitStartFlag | 1 bit | 2275 +-----------------------------------------+-------------+ 2276 | accessUnitEndFlag | 1 bit | 2277 +-----------------------------------------+-------------+ 2278 | OCRFlag = 1 | 1 bit | 2279 +-----------------------------------------+-------------+ 2280 | OCRDelta | 16 bits | 2281 +-----------------------------------------+-------------+ 2282 | paddingFlag = 0 | 1 bit | 2283 +-----------------------------------------+-------------+ 2284 | DegPrioflag = 0 | 1 bit | 2285 +-----------------------------------------+-------------+ 2286 | bits to byte alignment | 5 bits | 2287 +-----------------------------------------+-------------+ 2288 SLPPSection 2289 +=========================================+=============+ 2290 | SL packet payload |max 90 bytes | 2291 
+-----------------------------------------+-------------+ 2292 | etc + 13 SL packets | 2293 +-----------------------------------------+-------------+ 2294 | SL packet payload |max 90 bytes | 2295 +-----------------------------------------+-------------+ 2297 Note that in the above table the last SL packet in the RTP packet 2298 has a payload that is byte-aligned (at the end). When this happens 2299 paddingFlag is set to zero and the paddingBits field is omitted. 2301 Overhead estimation 2303 The MSLHSection is 19 bytes, the RSLHSection is 16 bytes; in this 2304 example we have therefore a RTP overhead of 40 + 35 bytes for 1350 2305 bytes (max) of payload i.e. around 6 % overhead. 2307 Gentric et al. Expires December 2001 43
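The overhead figures quoted in the examples of this appendix can be reproduced with a few lines of arithmetic. The sketch below uses only the header and payload sizes stated in the text; the 40 bytes are presumably the IPv4, UDP and RTP headers (20 + 8 + 12 bytes), which is an assumption since the text does not break the figure down. Appendix.4 is left out because the number of SL packets per RTP packet is not given.

```python
def overhead_percent(header_bytes: int, payload_bytes: int) -> float:
    """Header bytes expressed as a percentage of payload bytes."""
    return 100.0 * header_bytes / payload_bytes


# (header bytes, payload bytes) as stated in each example
EXAMPLES = {
    "Appendix.1": (40 + 2, 1400),   # 2 bytes of MSLH/RSLH header
    "Appendix.2": (40, 1400),       # RFC 3016 compatible, no extra header
    "Appendix.3": (40, 5),          # one 5-byte Access Unit per packet
    "Appendix.5": (40 + 35, 1350),  # 19-byte MSLHSection + 16-byte RSLHSection
}

for name, (header, payload) in EXAMPLES.items():
    print(f"{name}: {overhead_percent(header, payload):.1f} %")
# Appendix.1: 3.0 %
# Appendix.2: 2.9 %
# Appendix.3: 800.0 %
# Appendix.5: 5.6 %
```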