idnits 2.17.1

draft-gentric-avt-mpeg4-multisl-04.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

  ** Looks like you're using RFC 2026 boilerplate. This must be updated to
     follow RFC 3978/3979, as updated by RFC 4748.

  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

  ** The document seems to lack a 1id_guidelines paragraph about 6 months
     document validity -- however, there's a paragraph with a matching
     beginning. Boilerplate error?

  ** The document is more than 15 pages and seems to lack a Table of Contents.

  == No 'Intended status' indicated for this document; assuming Proposed
     Standard

  == The page length should not exceed 58 lines per page, but there was 42
     longer pages, the longest (page 2) being 59 lines

  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

  ** The document seems to lack an IANA Considerations section. (See Section
     2.2 of https://www.ietf.org/id-info/checklist for how to handle the case
     when there are no actions for IANA.)

  ** The document seems to lack separate sections for Informative/Normative
     References. All references will be assumed normative when checking for
     downward references.

  == There are 1 instance of lines with non-RFC6890-compliant IPv4 addresses
     in the document. If these are example addresses, they should be changed.

  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == Line 200 has weird spacing: '... media unawa...'

  == Line 643 has weird spacing: '...aLength bits)...'

  == Line 2052 has weird spacing: '...dicated with:...'

  -- The document seems to lack a disclaimer for pre-RFC5378 work, but may
     have content which was first submitted before 10 November 2008. If you
     have contacted all the original authors and they are all willing to grant
     the BCP78 rights to the IETF Trust, then this is fine, and you can ignore
     this comment. If not, you may need to add the pre-RFC5378 disclaimer.
     (See the Legal Provisions document at
     https://trustee.ietf.org/license-info for more information.)

  -- The document date (May 2001) is 8376 days in the past. Is this
     intentional?

  Checking references for intended status: Proposed Standard
  ----------------------------------------------------------------------------

     (See RFCs 3967 and 4897 for information about using normative references
     to lower-maturity documents in RFCs)

  == Missing Reference: '0' is mentioned on line 354, but not defined

  == Unused Reference: '10' is defined on line 1611, but no explicit
     reference was found in the text

  -- Possible downref: Non-RFC (?) normative reference: ref. '1'

  -- Possible downref: Non-RFC (?) normative reference: ref. '2'

  -- Possible downref: Non-RFC (?) normative reference: ref. '3'

  -- Possible downref: Non-RFC (?) normative reference: ref. '4'

  ** Obsolete normative reference: RFC 1889 (ref. '5') (Obsoleted by RFC 3550)

  ** Obsolete normative reference: RFC 3016 (ref. '7') (Obsoleted by RFC 6416)

  == Outdated reference: A later version (-08) exists of
     draft-ietf-avt-tcrtp-02

  == Outdated reference: A later version (-04) exists of
     draft-singer-mpeg4-ip-02

  -- Possible downref: Normative reference to a draft: ref. '9'

  ** Obsolete normative reference: RFC 2327 (ref. '10') (Obsoleted by RFC
     4566)

  == Outdated reference: A later version (-03) exists of
     draft-curet-avt-rtp-mpeg4-flexmux-00

  -- Possible downref: Normative reference to a draft: ref. '11'

  ** Obsolete normative reference: RFC 1890 (ref. '12') (Obsoleted by RFC
     3551)

  Summary: 9 errors (**), 0 flaws (~~), 11 warnings (==), 8 comments (--).

  Run idnits with the --verbose option for more detailed information about
  the items above.
-------------------------------------------------------------------------------- 2 Internet Engineering Task Force Avaro-France Telecom 3 Internet Draft Basso-AT&T 4 Casner-Packet Design 5 Civanlar-AT&T 6 Gentric-Philips 7 Herpel-Thomson 8 Lifshitz-Optibase 9 Lim-mp4cast 10 Perkins-ISI 11 van der Meer-Philips 12 May 2001 13 Expires Nov. 2001 14 Document: draft-gentric-avt-mpeg4-multisl-04.txt 16 RTP Payload Format for MPEG-4 Streams 18 Status of this Memo 20 This document is an Internet-Draft and is in full conformance with 21 all provisions of Section 10 of RFC2026. 23 Internet-Drafts are working documents of the Internet Engineering 24 Task Force (IETF), its areas, and its working groups. Note that 25 other groups may also distribute working documents as Internet- 26 Drafts. Internet-Drafts are draft documents valid for a maximum of 27 six months and may be updated, replaced, or obsoleted by other 28 documents at any time. It is inappropriate to use Internet- Drafts 29 as reference material or to cite them other than as "work in 30 progress." 32 This specification is a product of the Audio/Video Transport working 33 group within the Internet Engineering Task Force and ISO/IEC MPEG-4 34 ad hoc group on MPEG-4 over Internet. Comments are solicited and 35 should be addressed to the working group's mailing list at rem- 36 conf@es.net and/or the authors. 38 The list of current Internet-Drafts can be accessed at 39 http://www.ietf.org/ietf/1id-abstracts.txt 40 The list of Internet-Draft Shadow Directories can be accessed at 41 http://www.ietf.org/shadow.html. 43 This document contains a MIME type registration form that is 44 intended to be taken as-is and therefore makes reference to this 45 document, using the temporary placeholder: . 47 Abstract 49 This document describes a payload format for transporting MPEG-4 50 encoded data using RTP. MPEG-4 is a recent standard from ISO/IEC for 51 the coding of natural and synthetic audio-visual data. 
Several 52 services provided by RTP are beneficial for MPEG-4 encoded data 54 Gentric et al. Expires November 2001 1 55 transport over the Internet. Additionally, the use of RTP makes it 56 possible to synchronize MPEG-4 data with other real-time data types. 58 1. Introduction 60 MPEG-4 is a recent standard from ISO/IEC for the coding of natural 61 and synthetic audio-visual data in the form of audiovisual objects 62 that are arranged into an audiovisual scene by means of a scene 63 description [1][2][3][4]. This draft specifies an RTP [5] payload 64 format for transporting MPEG-4 encoded data streams. 66 The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", 67 "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in 68 this document are to be interpreted as described in RFC 2119 [6]. 70 The benefits of using RTP for MPEG-4 data stream transport include: 72 i. Ability to synchronize MPEG-4 streams with other RTP payloads 74 ii. Monitoring MPEG-4 delivery performance through RTCP 76 iii. Combining MPEG-4 and other real-time data streams received from 77 multiple end-systems into a set of consolidated streams through RTP 78 mixers 80 iv. Converting data types, etc. through the use of RTP translators. 82 1.1 Overview of MPEG-4 End-System Architecture 84 Fig. 1 below shows the layered architecture of a terminal which 85 implements the complete MPEG-4 systems model. The Compression Layer 86 processes individual audio-visual media streams. The MPEG-4 87 compression schemes are defined in the ISO/IEC specifications 14496- 88 2 [2] and 14496-3 [3]. The compression schemes in MPEG-4 achieve 89 efficient encoding over a bandwidth ranging from several kbps to 90 many Mbps. The audio-visual content compressed by this layer is 91 organized into Elementary Streams (ESs). 92 The MPEG-4 standard specifies MPEG-4 compliant streams. 
Within the 93 constraint of this compliance the compression layer is unaware of a 94 specific delivery technology, but it can be made to react to the 95 characteristics of a particular delivery layer such as the path-MTU 96 or loss characteristics. Also, some compressors can be designed to 97 be delivery specific for implementation efficiency. In such cases 98 the compressor may work in a non-optimal fashion with delivery 99 technologies that are different from the one it is specifically 100 designed to operate with. 102 The hierarchical relations, location and properties of ESs in a 103 presentation are described by a dynamic set of Object Descriptors 104 (ODs). Each OD groups one or more ES Descriptors referring to a 105 single content item (audio-visual object). Hence, multiple 106 alternative or hierarchical representations of each content item are 107 possible. 109 Gentric et al. Expires November 2001 2 110 ODs are themselves conveyed through one or more ESs. A complete set 111 of ODs can be seen as an MPEG-4 resource or session description at a 112 stream level. The resource description may itself be hierarchical, 113 i.e. an ES conveying an OD may describe other ESs conveying other 114 ODs. 116 The session description is accompanied by a dynamic scene 117 description, Binary Format for Scene (BIFS), again conveyed through 118 one or more ESs. At this level, content is identified in terms of 119 audio-visual objects. The spatio-temporal location of each object is 120 defined by BIFS. The audio-visual content of those objects that are 121 synthetic and static is also described by BIFS. Natural and 122 animated synthetic objects may refer to an OD that points to one or 123 more ESs that carry the coded representation of the object or its 124 animation data. 
126 By conveying the session (or resource) description as well as the 127 scene (or content composition) description through their own ESs, it 128 is made possible to change portions of the content composition and 129 the number and properties of media streams that carry the audio- 130 visual content separately and dynamically at well known instants in 131 time. 133 One or more initial Scene Description streams and the corresponding 134 OD stream are pointed to by an initial object descriptor (IOD). In 135 this context the IOD needs to be made available to the receivers 136 through some out-of-band means that are out of scope of this payload 137 specification. However in the context of transport on IP networks it 138 is defined in a separate document [9]. Note that for applications 139 that only use audio and/or video this payload format can also be 140 used without IOD and OD streams (decoder configuration is then 141 transported as MIME parameters, see section 4.1). 143 The Compression Layer organizes the ESs in Access Units (AU), the 144 smallest elements that can be attributed individual timestamps. The 145 Access Units concept defines the boundary between media specific 146 processing and delivery specific processing. That is to say 147 transport should not depend on the nature of the media data but only 148 on AU properties. 150 A homogeneous encapsulation of ESs carrying media or control (ODs, 151 BIFS) data is defined by the Sync Layer (SL) that primarily provides 152 the synchronization between streams. Integer or fractional AUs are 153 then encapsulated in SL packets. All consecutive data from one 154 stream is called an SL-packetized stream at this layer. The 155 interface between the compression layer and the SL is called the 156 Elementary Stream Interface (ESI). The ESI is informative i.e. it is 157 extremely useful in order to define concepts and mechanisms but does 158 not have to be implemented. 
For the same reason this draft describes 159 the transport of SL packets i.e. Access Units or fragments of Access 160 Units. It is important to note however that a SL stream can be 161 configured so that SL packets are reduced to the media (compressed) 163 Gentric et al. Expires November 2001 3 164 data and in that case implementations do not need to be aware of the 165 SL at all. 167 The Delivery Layer in MPEG-4 consists of the Delivery Multimedia 168 Integration Framework defined in ISO/IEC 14496-6 [4]. This layer is 169 media unaware but delivery technology aware. It provides transparent 170 access to and delivery of content irrespective of the technologies 171 used. The interface between the SL and DMIF is called the DMIF 172 Application Interface (DAI). It offers content location independent 173 procedures for establishing MPEG-4 sessions and access to transport 174 channels. The specification of this payload format is considered as 175 a part of the MPEG-4 Delivery Layer. 177 media aware +-----------------------------------------+ 178 delivery unaware | COMPRESSION LAYER | 179 14496-2 Visual |streams from as low as Kbps to multi-Mbps| 180 14496-3 Audio +-----------------------------------------+ 182 Elementary 183 Stream 184 ===================================================Interface 186 (ESI) 187 +-------------------------------------------+ 188 media and | SYNC LAYER | 189 delivery unaware | manages elementary streams, their synch- | 190 14496-1 Systems | ronization and hierarchical relations | 191 +-------------------------------------------+ 193 DMIF 194 Application 195 ====================================================Interface 197 (DAI) 198 +-------------------------------------------+ 199 delivery aware | DELIVERY LAYER | 200 media unaware |provides transparent access to and delivery| 201 14496-6 DMIF | of content irrespective of delivery | 202 | technology | 203 +-------------------------------------------+ 205 Figure 1: Conceptual MPEG-4 terminal 
architecture 207 1.2 MPEG-4 Elementary Stream Data Packetization 209 The ESs from the encoders are fed into the SL with indications of AU 210 boundaries, random access points, desired composition time and the 211 current time. 213 The Sync Layer fragments the ESs into SL packets, each containing a 214 header that encodes information conveyed through the ESI. If the AU 215 is larger than a SL packet, subsequent packets containing remaining 217 Gentric et al. Expires November 2001 4 218 parts of the AU are generated with subset headers until the complete 219 AU is packetized. 221 The syntax of the Sync Layer is configurable and can be adapted to 222 the needs of the stream to be transported. This includes the 223 possibility to select the presence or absence of individual syntax 224 elements as well as configuration of their length in bits. The 225 configuration for each individual stream is conveyed in a 226 SLConfigDescriptor, which is an integral part of the ES Descriptor 227 for this stream. The MPEG-4 SLConfigDescriptor, being configuration 228 information, is not carried by the media stream itself but is rather 229 transported via an ObjectDescriptor Stream encoded using the MPEG-4 230 Object Description framework. This can be done in a separate stream 231 using this payload format (see section 5.2 for details). The 232 SLConfigDescriptor MAY also be transported by other means (for 233 example as a parameter, see section 4.1). Finally streams for which 234 the SL packet headers are completely empty (or fully map into the 235 RTP headers) can also be transported using this payload format; in 236 these cases the Synch Layer can be seen as a purely conceptual 237 construction that does not have to be implemented at all. Since only 238 the knowledge of the decoder configuration is then needed it can 239 also be transported as a parameter, as described in section 4.1. 241 2. 
Analysis of the carriage of MPEG-4 over IP 243 When transporting MPEG-4 audio and video, applications may or may 244 not require the use of MPEG-4 systems. To achieve the highest level 245 of interoperability between all MPEG-4 applications, it is desirable 246 that (a) in both cases the same MPEG-4 transport format can be used 247 and that (b) receivers that have no MPEG-4 system knowledge can 248 easily skip the MPEG-4 system specific information, if any. 250 RTP is perfectly suitable to transport MPEG-4 audio and MPEG-4 251 video, but when using MPEG-4 systems a problem arises from the fact 252 that both RTP and MPEG-4 systems contain a synchronization layer. 253 In particular, the RTP header duplicates some of the information 254 provided in SL packet headers such as the composition timestamps 255 (CTSs) and the marker bit that signals the end of access units. 257 To avoid unnecessary overhead and potential interoperability risks 258 when transporting MPEG-4 systems, it is desirable to remove the 259 redundancy between the SL packet header and the RTP packet header. 260 To be independent on the use of MPEG-4 systems, synchronization can 261 rely on the parameters provided in the RTP header. 263 In case SL headers are used, the redundant fields are removed from 264 the SL header, producing "reduced SL headers". 265 The remaining information from the SL header, if any, is contained 266 inside the RTP packet payload, together with the SL packet payload. 267 The combination of RTP packet headers and reduced SL packet headers 268 can be used to logically map the RTP packets to complete SL packets. 270 Gentric et al. Expires November 2001 5 271 Some of the information contained in the reduced SL headers is also 272 useful for transport over RTP when MPEG-4 systems is not used. 274 For that reason the information in the "reduced" SL headers is split 275 into "general useful information" and "MPEG-4 systems only 276 information". 
278 The "general useful information" hereinafter called Mapped SL Packet 279 Header (MSLH) is carried by a number of fields configurable using 280 parameters defined in section 4.1; all receivers MUST parse these 281 fields. 283 The "MPEG-4 systems only information", if any, is contained in a 284 reduced SL header, hereinafter called Remaining SL Packet Header 285 (RSLH), also configured using parameters (see section 4.1) and 286 preceded by a length field, so that non-MPEG-4-system devices MAY 287 skip this information. 289 This is depicted in figure 2. 291 <----------SL Packet--------> 293 +---------------------------+ 294 | SL Packet | SL Packet | 295 | Header | Payload | 296 +---------------------------+ 297 | | 298 | | 299 +-------------+----------+---+ | 300 | | | | 301 V V V V 302 +-----------+ +-----------+ +-------------+ +-----------+ 303 |RTP Packet | | Mapped SL | | Remaining SL| | SL Packet | 304 | Header | | Header | | Header | | Payload | 305 +-----------+ +-----------+ +-------------+ +-----------+ 307 <----RTP Packet Payload-------------------> 309 Figure 2: Mapping of SL Packet into RTP packet 311 When the configuration is such that SL packet headers map directly 312 to RTP headers this process of mapping SL packet headers is purely 313 conceptual. For example this RTP payload format has been designed so 314 that it is by default configured to be identical to RFC 3016 for the 315 recommended MPEG-4 video configurations (see section 5.5). Hence 316 receivers that comply with this payload specification can decode 317 such RTP payload without knowledge about the Synch Layer (see also 318 the example in Appendix.2). In a similar fashion MPEG-4 audio (see 319 Appendix.3 and Appendix.4) can be transported without explicit use 320 of the Synch Layer. 322 Gentric et al. Expires November 2001 6 323 3. Payload Format 325 The RTP Payload corresponds to an integer number of SL packets. 
327 If multiple SL packets are transported in each RTP packet, they MUST 328 be in decoding order, i.e.: 329 i) decodingTimeStamp order, if present 330 ii) packetSequenceNumber order, if present 331 iii) Implicit decoding order in all other cases. 333 The SL Packet Headers are transformed into RSLH with some fields 334 extracted to be mapped in the RTP header and others extracted to be 335 mapped in the corresponding MSLH. The SL Packet Payload is 336 unchanged. 338 This payload format has two modes. The "Single-SL" mode is a mode 339 where a single SL packet is transported per RTP packet. The 340 "Multiple-SL" mode is a mode where more than one SL packet is 341 transported per RTP packet. The default mode is the Single-SL mode. 342 The mode can be set to Multiple-SL by adding a non-zero SLPPSize or 343 SLPPSizeLength parameter (see section 4.1). 345 RTP Packets SHOULD be sent in the SL stream order (as defined 346 above). In case of interleaving the first SL packet of each RTP 347 packet is used as reference as in the following examples of RTP 348 packets containing interleaved SL packets. 349 This sequence is correct: [0,2,4][1,3,5] 350 This sequence is correct: [0,3,6][1,2][4,5] 351 This sequence is correct: [0,3,6][1,4][2,5] 352 This sequence is prohibited: [0,4,2][1,5,3] 353 This sequence is prohibited: [1,3,5][0,2,4] 354 This sequence is prohibited: [0,3,6][2,5][1,4] 356 The size (or number) of the SL packet(s) SHOULD be adjusted such 357 that the resulting RTP packet is not larger than the path-MTU. To 358 handle larger packets, this payload format relies on lower layers 359 for fragmentation, which may not be desirable. 361 3.1 RTP Header Fields Usage 363 Payload Type (PT): The assignment of an RTP payload type for this 364 new packet format is outside the scope of this document, and will 365 not be specified here. 
It is expected that the RTP profile for a 366 particular class of applications will assign a payload type for this 367 encoding, or if that is not done then a payload type in the dynamic 368 range shall be chosen. 370 Marker (M) bit: The M bit is set to 1 when all SL packets in the RTP 371 packet are Access Unit ends, i.e. the M bit maps to the Synch Layer 372 accessUnitEndFlag. 374 Gentric et al. Expires November 2001 7 375 Specifically the M bit is set to 0 when the RTP packet contains one 376 or more Access Unit fragments that are not Access Unit ends, and the 377 M bit is set to 1 for RTP packets that contain any of the following: 378 . A single complete Access Unit 379 . The last fragment of an Access Unit 380 . Several complete Access Units 381 . Several last fragments of Access Units 382 . A mix of complete Access Units and last fragments of Access Units 384 For streams where all SL packets are complete Access Units the M bit 385 is 1 for all RTP packets. 387 Extension (X) bit: Defined by the RTP profile used. 389 Sequence Number: The RTP sequence number should be generated by the 390 sender with a constant random offset and does not have to be 391 correlated to any (optional) MPEG-4 SL sequence numbers. 393 Timestamp: Set to the value in the compositionTimeStamp field of the 394 first SL packet in the RTP packet, if present. If 395 compositionTimeStamp is shorter than 32 bits, the MSBs of 396 timestamp MUST be set to zero. 398 Although it is available from the SL configuration data, the 399 resolution of the timestamp may need to be conveyed explicitly 400 through some out-of-band means to be used by network elements which 401 are not MPEG-4 aware. 403 If compositionTimeStamp is longer than 32 bits, this payload 404 format cannot be used. 406 In all cases, the sender SHALL make sure that RTP time stamps 407 are identical only for RTP packets transporting fragments of the 408 same Access Unit. 
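The ordering rule of section 3 and the M bit rule above can be illustrated with a minimal Python sketch. It is not part of the payload format: the function names are hypothetical, SL packets are represented only by their position in decoding order, and the ordering check assumes that "first SL packet as reference" means the first indices of successive RTP packets must increase.

```python
from typing import Sequence


def rtp_order_is_valid(packets: Sequence[Sequence[int]]) -> bool:
    """Check the section 3 ordering rule.

    Each inner sequence lists the decoding-order indices of the SL packets
    carried by one RTP packet (hypothetical representation).
    """
    previous_first = None
    for sl_indices in packets:
        if not sl_indices:
            return False  # an RTP payload carries at least one SL packet
        # SL packets inside one RTP packet must be in decoding order.
        if any(b <= a for a, b in zip(sl_indices, sl_indices[1:])):
            return False
        # RTP packets are ordered by their first SL packet (assumption).
        if previous_first is not None and sl_indices[0] <= previous_first:
            return False
        previous_first = sl_indices[0]
    return True


def marker_bit(access_unit_end_flags: Sequence[bool]) -> int:
    """Return 1 only if every SL packet in the RTP packet ends an Access Unit."""
    return int(all(access_unit_end_flags))
```

Applied to the six example sequences of section 3, the check accepts the three marked correct and rejects the three marked prohibited.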
410 In case compositionTimeStamp is not present in the current SL 411 packet, but has been present in a previous SL packet the reason is 412 that this is the same Access Unit that has been fragmented, 413 therefore the same timestamp value MUST be taken as RTP timestamp. 415 If compositionTimeStamp is never present in SL packets for this 416 stream, the RTP packetizer SHOULD convey a reading of a local clock 417 at the time the RTP packet is created. 419 According to RFC1889 [5, Section 5.1] timestamps are recommended to 420 start at a random value for security reasons. However then, a 421 receiver is not in the general case able to reconstruct the original 422 MPEG-4 Time Stamps (CTS, DTS, OCR) which can be of use for 423 applications where streams from multiple sources are to be 424 synchronized. Therefore the usage of such a random offset SHOULD be 425 avoided. 427 Gentric et al. Expires November 2001 8 428 Note that since RTP devices may re-stamp the stream, all time stamps 429 inside of the RTP payload (CTS and DTS in MSLH, OCR in RSLH) MUST be 430 expressed as difference to the RTP time stamp. Since this 431 subtraction may lead to negative values, the offset MUST be encoded 432 as a two's complement signed integer in network byte order. Note 433 these offsets (delta) typically require much fewer bits to be 434 encoded than the original length, which is another justification. 436 When startCompositionTimeStamp is signaled in the SLConfigDescriptor 437 the RTP time stamps MUST start with this value. 439 SSRC, CC and CSRC fields are used as described in RFC 1889 [5]. 441 RTCP SHOULD be used as defined in RFC 1889 [5]. 443 RTP timestamps in RTCP SR packets: according to the RTP timing 444 model, the RTP timestamp that is carried into an RTCP SR packet is 445 the same as the compositionTimeStamp that would be applied to an RTP 446 packet for data that was sampled at the instant the SR packet is 447 being generated and sent. 
The RTP timestamp value is calculated from 448 the NTP timestamp for the current time, which also goes in the RTCP 449 SR packet. To perform that calculation, an implementation needs to 450 periodically establish a correspondence between the CTS value of a 451 data packet and the NTP time at which that data was sampled. 453 3.2 RTP payload structure 455 The packet payload structure consists of 3 byte-aligned sections. 457 The first section is the MSLHSection and contains Mapped SL Packet 458 Headers (MSLH). The MSLH structure is described in 3.3. In the 459 Single-SL mode this section is empty by default. 461 The second section is the RSLHSection and contains Remaining SL 462 Headers (RSLH). The RSLH structure is described in 3.5. By default 463 this section is empty. 465 The last section (SLPPSection) contains the SL packet payloads. This 466 section is never empty. 468 The Nth MSLH in the MSLHSection, the Nth RSLH in the RSLHSection and 469 the Nth SL packet payload in the SLPPSection correspond to the Nth 470 SL packet transported by the RTP packet. 472 0 1 2 3 473 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 474 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 475 |V=2|P|X| CC |M| PT | sequence number | 476 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 477 | timestamp | 478 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 479 | synchronization source (SSRC) identifier | 481 Gentric et al. 
Expires November 2001 9 482 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 483 : contributing source (CSRC) identifiers : 484 +=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+ 485 | | 486 | MSLHSection (byte aligned) | 487 | | 488 | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 489 | | | 490 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | 491 | | 492 | RSLHSection (byte aligned) | 493 | | 494 | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 495 | | | 496 +-+-+-+-+-+-+-+-+ | 497 | | 498 | SLPPSection (byte aligned) | 499 | | 500 | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 501 | :...OPTIONAL RTP padding | 502 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 504 Figure 3: An RTP packet for MPEG-4 506 3.3 MSLHSection structure 508 If the MSLHSection consumes a non-integer number of bytes, up to 7 509 zero-valued padding bits MUST be inserted at the end in order to 510 achieve byte-alignment. 512 In the Single-SL mode the MSLHSection consists of a single MSLH. 514 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 515 | MSLH (x bits ) : padding bits| 516 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 518 Figure 4: MSLHSection structure in Single-SL mode 520 In the Multiple-SL mode this section consists of a 2-byte field 521 giving the size in bits (in network byte order) of the following 522 block of bit-wise concatenated MSLHs. 524 This size field is absent in the Single-SL mode not because it is 525 not needed (which would be a minor gain) but for compatibility with 526 RFC 3016. 528 0 1 2 3 529 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 530 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 531 | MSLH section size in bits | MSLH | etc | 532 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | 533 | as many bit-wise concatenated MSLHs | 535 Gentric et al. 
Expires November 2001 10 536 | as SL packets in this RTP packet | 537 | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 538 | : padding bits| 539 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 541 Figure 5: MSLHSection structure in Multiple-SL mode 543 3.4 MSLH structure 545 The Mapped SL Packet Header content depends on parameters (as 546 described in section 4.1); by default it is empty for the Single-SL 547 mode and contains only the SLPPayloadSize (SL Packet Payload Size) 548 field in the Multiple-SL mode. 550 When all options are used the MSLH structure is given in figure 6. 552 +============================+ 553 |SLPPayloadSize | 554 +----------------------------+ 555 |SLPSeqNum or SLPSeqNumDelta | 556 +----------------------------+ 557 |CTSFlag | 558 +----------------------------+ 559 |CTSDelta | 560 +----------------------------+ 561 |DTSFlag | 562 +----------------------------+ 563 |DTSDelta | 564 +============================+ 566 Figure 6: Mapped SL Packet Header (MSLH) structure 568 In the general case a receiver can only discover the size of a MSLH 569 by parsing it since for example the presence of CTSDelta is signaled 570 by the value of CTSFlag. 572 3.4.1 Fields of MSLH 574 SLPPayloadSize (SL Packet Payload Size): Indicates the size in bytes 575 of the associated SL Packet Payload, which can be found in the 576 SLPPSection of the RTP packet. The length in bits of this field is 577 signaled by the SLPPSizeLength parameter (see section 4.1). 579 SLPSeqNum/SLPSeqNumDelta: Encodes the packetSequenceNumber (serial 580 number) of the SL Packet. When making streams specifically for 581 transport with this payload format this is useful for interleaving. 582 Since a mapping to RTP sequence number is not possible in the 583 Multiple-SL mode there is no requirement for a correspondence. 585 SLPSeqNum is found only for the first SL packet of a RTP packet. 586 SLPSeqNumDelta is optional and -if present- appears for subsequent 587 (non-first) SL packets in a RTP packet. 
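As an illustration of the MSLHSection layout of section 3.3 in the default Multiple-SL configuration, where each MSLH holds only SLPPayloadSize, the following Python sketch builds the section bytes. The function name is hypothetical, and the sketch assumes that the 2-byte size field counts the MSLH bits only, excluding the trailing padding bits.

```python
from typing import Sequence


def pack_mslh_section(payload_sizes: Sequence[int], slpp_size_length: int) -> bytes:
    """Build an MSLHSection whose MSLHs contain only SLPPayloadSize.

    payload_sizes: size in bytes of each SL packet payload, in packet order.
    slpp_size_length: value of the SLPPSizeLength parameter, in bits.
    """
    if slpp_size_length <= 0:
        raise ValueError("Multiple-SL mode needs a non-zero SLPPSizeLength")
    value = 0
    nbits = 0
    for size in payload_sizes:
        if not 0 <= size < (1 << slpp_size_length):
            raise ValueError("payload size does not fit in SLPPSizeLength bits")
        # MSLHs are concatenated bit-wise, most significant bit first.
        value = (value << slpp_size_length) | size
        nbits += slpp_size_length
    if nbits > 0xFFFF:
        raise ValueError("MSLH block too large for the 2-byte size field")
    padding = (-nbits) % 8          # up to 7 zero-valued padding bits
    value <<= padding
    body = value.to_bytes((nbits + padding) // 8, "big")
    # Size in bits, network byte order (assumed to exclude the padding).
    return nbits.to_bytes(2, "big") + body
```

For example, two payloads of 3 and 5 bytes with SLPPSizeLength set to 10 give a 20-bit MSLH block followed by 4 padding bits.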
589 Gentric et al. Expires November 2001 11 590 The length in bits of the SLPSeqNum field is defined by the 591 SLPSeqNumLength parameter (see section 4.1). 593 The length in bits of the SLPSeqNumDelta field is defined by the 594 SLPSeqNumDeltaLength parameter (see section 4.1). 596 If the parameter SLPSeqNumDeltaLength is defined, non-first SL 597 packets inside an RTP packet have their packetSequenceNumber encoded 598 as a difference named SLPSeqNumDelta. This difference is relative to 599 the previous SL packet in the RTP packet according to (with i>=0): 600 packetSequenceNumber(0) = SLPSeqNum(0) 601 packetSequenceNumber(i+1) = packetSequenceNumber(i) + 602 SLPSeqNumDelta(i+1) + 1 604 If the parameter SLPSeqNumDeltaLength is not defined the default 605 value is zero and then the SLPSeqNumDelta field is not present for 606 non-first SL packets. Nevertheless receivers SHALL then apply the 607 above formula with SLPSeqNumDelta equal to zero. In other words by 608 default packetSequenceNumber is incremented by 1 for each SL packet 609 in one RTP packet. 611 CTSFlag (1 bit): Indicates whether the CTSDelta field is present. A 612 value of 1 indicates that the CTSDelta field is present, a value of 613 0 that it is not present (except for the first SL packet in the RTP 614 packet, see below). 616 If CTSDeltaLength is not zero, CTSFlag is present in all MSLHs 617 regardless of whether the SL packet is an Access Unit start or not; 618 the receiver needs this flag in order to reconstruct the 619 compositionTimeStampFlag of SL Headers. 621 CTSDelta (CTSDeltaLength bits): Specifies the value of the CTS as a 622 two's complement offset (delta) from the timestamp in the RTP header of 623 this RTP packet. 624 The length in bits of each CTSDelta field is specified by the 625 CTSDeltaLength parameter (see section 4.1). 
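The packetSequenceNumber recurrence given above for SLPSeqNum and SLPSeqNumDelta can be sketched in Python as follows. The function name is hypothetical, and the sketch ignores any wrap-around of packetSequenceNumber, which the text above does not address.

```python
from typing import List, Sequence


def reconstruct_sequence_numbers(slp_seq_num: int,
                                 deltas: Sequence[int]) -> List[int]:
    """Rebuild packetSequenceNumber for every SL packet of one RTP packet.

    slp_seq_num: SLPSeqNum of the first SL packet.
    deltas: SLPSeqNumDelta of each non-first SL packet, in order. When
            SLPSeqNumDeltaLength is not defined, pass zeros.
    """
    numbers = [slp_seq_num]
    for delta in deltas:
        # packetSequenceNumber(i+1) = packetSequenceNumber(i) + delta + 1
        numbers.append(numbers[-1] + delta + 1)
    return numbers
```

With SLPSeqNum 0 and deltas (1, 1) this yields 0, 2, 4, which matches the first RTP packet of the interleaving example [0,2,4][1,3,5]; with all deltas zero the numbers are consecutive.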

This field is present if CTSFlag is 1, except in the first MSLH of
each RTP packet: the composition time stamp of the first SL packet
in the RTP packet is mapped to the RTP time stamp, regardless of
whether CTSFlag is 1. In all cases the sender MUST remove the
compositionTimeStamp from the RSLH.

DTSFlag (1 bit): Indicates whether the DTSDelta field is present. A
value of 1 indicates that DTSDelta is present, a value of 0 that it
is not present.

If DTSDeltaLength is not zero, DTSFlag is present in all MSLH
regardless of whether the SL packet is an Access Unit start or not;
the receiver needs this flag in order to reconstruct the
decodingTimeStampFlag of SL Headers.

DTSDelta (DTSDeltaLength bits): Specifies the value of the
decodingTimeStamp as a two's complement offset (delta) from the
timestamp in the RTP header of this packet. The length in bits of
each DTSDelta field is specified by the DTSDeltaLength parameter
(see section 4.1).

The DTSDelta field appears when DTSFlag is 1. The sender MUST always
remove the decodingTimeStamp from the RSLH.
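
As an illustration of the field rules in this section, the following
is a minimal, non-normative sketch of how a receiver might parse
consecutive MSLHs. It assumes the field order of figure 6, MSB-first
bit packing, and parameters supplied as a dictionary in which an
absent entry means zero. The names BitReader, Mslh, parse_mslh and
to_signed are hypothetical helpers introduced for this example only;
wrap-around of packetSequenceNumber is not handled.

```python
from dataclasses import dataclass
from typing import Dict, Optional


class BitReader:
    """MSB-first bit reader over a bytes object (hypothetical helper)."""

    def __init__(self, data: bytes) -> None:
        self.data = data
        self.pos = 0  # current position, in bits

    def read(self, nbits: int) -> int:
        """Read nbits as an unsigned integer; nbits == 0 yields 0."""
        value = 0
        for _ in range(nbits):
            byte = self.data[self.pos // 8]
            value = (value << 1) | ((byte >> (7 - self.pos % 8)) & 1)
            self.pos += 1
        return value


def to_signed(value: int, nbits: int) -> int:
    """Interpret an nbits-wide unsigned field as two's complement."""
    return value - (1 << nbits) if value >> (nbits - 1) else value


@dataclass
class Mslh:
    payload_size: Optional[int] = None
    seq_num: Optional[int] = None
    cts_flag: Optional[int] = None
    cts_delta: Optional[int] = None
    dts_flag: Optional[int] = None
    dts_delta: Optional[int] = None


def parse_mslh(reader: BitReader, params: Dict[str, int], first: bool,
               prev_seq: Optional[int] = None) -> Mslh:
    """Parse one MSLH; 'first' marks the first MSLH of the RTP packet."""
    def get(name: str) -> int:
        return params.get(name, 0)  # an absent parameter counts as zero

    h = Mslh()
    if get("SLPPSizeLength"):
        h.payload_size = reader.read(get("SLPPSizeLength"))
    if get("SLPSeqNumLength"):
        if first:
            h.seq_num = reader.read(get("SLPSeqNumLength"))
        else:
            # A zero-length delta field reads as 0: increment by one.
            delta = reader.read(get("SLPSeqNumDeltaLength"))
            h.seq_num = prev_seq + delta + 1
    if get("CTSDeltaLength"):
        h.cts_flag = reader.read(1)
        if h.cts_flag and not first:  # first CTS is the RTP time stamp
            n = get("CTSDeltaLength")
            h.cts_delta = to_signed(reader.read(n), n)
    if get("DTSDeltaLength"):
        h.dts_flag = reader.read(1)
        if h.dts_flag:
            n = get("DTSDeltaLength")
            h.dts_delta = to_signed(reader.read(n), n)
    return h
```

A caller would invoke parse_mslh once per SL packet, passing the
seq_num of the previous result as prev_seq for non-first headers.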

3.4.2 Relationship between sizes of MSLH fields and parameters

The relationship between a Mapped SL Packet Header and the related
parameters is as follows:

+===========================+=================================+
| Fields of MSLH            | Number of bits (parameters)     |
+===========================+=================================+
| SLPPayloadSize            | SLPPSizeLength                  |
+---------------------------+---------------------------------+
| SLPSeqNum                 | SLPSeqNumLength                 |
+---------------------------+---------------------------------+
| SLPSeqNumDelta            | SLPSeqNumDeltaLength            |
+---------------------------+---------------------------------+
| CTSFlag                   | 1 if ( CTSDeltaLength > 0 )     |
+---------------------------+---------------------------------+
| CTSDelta                  | CTSDeltaLength if (CTSFlag==1)  |
+---------------------------+---------------------------------+
| DTSFlag                   | 1 if ( DTSDeltaLength > 0 )     |
+---------------------------+---------------------------------+
| DTSDelta                  | DTSDeltaLength if (DTSFlag==1)  |
+---------------------------+---------------------------------+

Table 1: Relationship between MSLH field size and parameters

3.5 RSLHSection structure

This section consists of a field (RSLHSectionSize) giving the size
in bits of the following block of bit-wise concatenated RSLHs.

If the section consumes a non-integer number of bytes, up to 7 zero
padding bits MUST be inserted at the end in order to achieve
byte-alignment.

+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| RSLHSectionSize (RSLHSectionSizeLength bits)| RSLH (variable  |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+                 |
|                                               number of bits) |
|                                                               |
|         +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|         | RSLH (variable number of bits)                      |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                             etc                               |
|              as many bit-wise concatenated RSLHs              |
|               as SL Packets in this RTP packet                |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| RSLH (variable number of bits)                                |
|                                                 +-+-+-+-+-+-+-+
|                                                 : padding bits|
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

Figure 7: RSLHSection structure

The length in bits of the RSLHSectionSize field is
RSLHSectionSizeLength. Its default value is zero, which indicates
that the whole RSLHSection is absent.

+=================================+===============================+
| Fields of RSLHSection           | Number of bits                |
+=================================+===============================+
| RSLHSectionSize                 | RSLHSectionSizeLength         |
+---------------------------------+-------------------------------+
| all bit-wise concatenated RSLHs | RSLHSectionSize               |
+---------------------------------+-------------------------------+

Table 2: Sizes in bits inside RSLHSection

Parsing of the bit-wise concatenated RSLHs requires MPEG-4 system
awareness; specifically it requires understanding the MPEG-4
Synchronization Layer (SL) syntax and the modifications to this
syntax described in the next section.

However thanks to the RSLHSectionSize field non-MPEG-4-system
receivers MAY skip this part by rounding up RSLHSectionSize/8 to the
next integer number of bytes.
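
The skipping rule can be sketched as follows. This is a non-normative
illustration: it assumes that the RSLHSection starts on a byte
boundary and that the padding bits align the whole section, i.e. the
RSLHSectionSize field plus the RSLH block, which is one reading of the
padding rule in section 3.5. The function name rslh_section_bytes is
hypothetical.

```python
def rslh_section_bytes(section: bytes, size_length: int) -> int:
    """Return how many bytes the RSLHSection occupies at the start of
    'section', so that a receiver can skip it without parsing RSLHs.

    size_length is the RSLHSectionSizeLength parameter; zero means the
    whole RSLHSection is absent.
    """
    if size_length == 0:
        return 0
    # Read the MSB-first RSLHSectionSize field.
    field_bytes = (size_length + 7) // 8
    if len(section) < field_bytes:
        raise ValueError("truncated RSLHSectionSize field")
    head = int.from_bytes(section[:field_bytes], "big")
    size_bits = head >> (field_bytes * 8 - size_length)
    # Assumption: padding covers the size field plus the RSLH block.
    total = (size_length + size_bits + 7) // 8
    if total > len(section):
        raise ValueError("RSLHSection longer than remaining payload")
    return total
```

The SLPPSection then starts at the returned offset.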

3.6 RSLH structure

A Remaining SL Packet Header (RSLH) is what remains of an SL header
after modifications for mapping into this payload format.

The following modifications of the SL packet header MUST be applied.
The other fields of the SL packet header MUST remain unchanged but
are bit-shifted to fill in the gaps left by the operations specified
below.

3.6.1 Removal of fields

The following SL Packet Header fields -if present- are removed since
they are mapped either in the RTP header or in the corresponding
MSLH:
. compositionTimeStampFlag
. compositionTimeStamp
. decodingTimeStampFlag
. decodingTimeStamp
. packetSequenceNumber
. AccessUnitEndFlag (in Single-SL mode only)

The AccessUnitEndFlag, when present for a given stream, MUST be
removed from every RSLH when using the Single-SL mode since it has
the same meaning as the Marker bit (and for compatibility with RFC
3016). However when using the Multiple-SL mode, AccessUnitEndFlag
MUST NOT be removed since it is useful to signal individual AU ends.

3.6.2 Mapping of OCR

Furthermore if the SL Packet header contains an OCR, then this field
is encoded in the RSLH as a two's complement difference (delta)
exactly like a compositionTimeStamp or a decodingTimeStamp in the
MSLH. The length in bits of this difference is indicated by the
OCRDeltaLength parameter (see section 4.1).

With this payload format OCRs MUST have the same clock resolution as
Time Stamps.

If compositionTimeStamp is not present for a SL packet that has OCR
then the OCR SHALL be encoded as a difference to the RTP time stamp.
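
The delta encoding shared by OCR, CTS and DTS can be illustrated with
the following non-normative sketch. It assumes that both values are
expressed in the same clock resolution and it ignores wrap-around of
the RTP time stamp; encode_delta and decode_delta are hypothetical
names chosen for this example.

```python
def encode_delta(timestamp: int, rtp_timestamp: int, nbits: int) -> int:
    """Sender side: return the nbits-wide two's complement field that
    encodes (timestamp - rtp_timestamp)."""
    delta = timestamp - rtp_timestamp
    lowest = -(1 << (nbits - 1))
    highest = (1 << (nbits - 1)) - 1
    if not lowest <= delta <= highest:
        raise ValueError("delta does not fit in %d bits" % nbits)
    return delta & ((1 << nbits) - 1)


def decode_delta(field: int, rtp_timestamp: int, nbits: int) -> int:
    """Receiver side: rebuild the time stamp from the delta field."""
    if field >= 1 << (nbits - 1):  # sign bit set: negative delta
        field -= 1 << nbits
    return rtp_timestamp + field
```

A sender that hits the ValueError needs a larger delta length
parameter for the stream.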

3.6.3 Degradation Priority

For streams that use the optional degradationPriority field in the
SL Packet Headers, only SL packets with the same degradation
priority SHALL be transported by one RTP packet so that components
may dispatch the RTP packets according to appropriate QoS or
protection schemes. Furthermore only the first RSLH of one RTP
packet SHALL contain the degradationPriority field since it would
otherwise be redundant.

3.7 SLPPSection structure

The SLPPSection (SL Packet Payload Section) contains the
concatenated SL Packet Payloads. By definition SL Packet Payloads
are byte aligned.

For efficiency SL packets do not carry their own payload size. This
is not an issue for RTP packets that contain a single SL Packet.

However in the Multiple-SL mode the size of each SL packet payload
MUST be available to the receiver.

If the SL packet payload size is constant for a stream, the size
information SHOULD NOT be transported in the RTP packet. However in
that case it MUST be signaled using the SLPPSize parameter (see
section 4.1).

If the SL packet payload size is variable then the size of each SL
packet payload MUST be indicated in the corresponding MSLH. In order
to do so the MSLH MUST contain a SLPPayloadSize field. The number of
bits on which this SLPPayloadSize field is encoded MUST be indicated
using the SLPPSizeLength parameter (see section 4.1).

The absence of both SLPPSize and SLPPSizeLength indicates the
Single-SL mode, i.e. that a single SL packet is transported in each
RTP packet for that stream.
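
The size rules above can be sketched as follows. This is a
non-normative example: the function names and the mode labels are
hypothetical, parameters are assumed to be available as a dictionary
of integers, and the check that a constant-size section is an exact
multiple of SLPPSize is an assumption of this sketch rather than a
rule stated by this document.

```python
from typing import Dict, List, Optional


def sl_mode(params: Dict[str, int]) -> str:
    """Derive the mode from SLPPSize and SLPPSizeLength (absent == 0)."""
    size = params.get("SLPPSize", 0)
    size_length = params.get("SLPPSizeLength", 0)
    if size and size_length:
        raise ValueError("SLPPSize and SLPPSizeLength are exclusive")
    if size:
        return "multiple-constant"
    if size_length:
        return "multiple-variable"
    return "single"


def split_slpp_section(section: bytes, params: Dict[str, int],
                       mslh_sizes: Optional[List[int]] = None
                       ) -> List[bytes]:
    """Cut the SLPPSection into individual SL Packet Payloads."""
    mode = sl_mode(params)
    if mode == "single":
        return [section]
    if mode == "multiple-constant":
        size = params["SLPPSize"]
        if len(section) % size:
            raise ValueError("section is not a multiple of SLPPSize")
        return [section[i:i + size]
                for i in range(0, len(section), size)]
    # multiple-variable: sizes come from MSLH.SLPPayloadSize
    if mslh_sizes is None or sum(mslh_sizes) != len(section):
        raise ValueError("SLPPayloadSize values do not match section")
    payloads, offset = [], 0
    for size in mslh_sizes:
        payloads.append(section[offset:offset + size])
        offset += size
    return payloads
```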

+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| SLPP (variable number of bytes)                           |
|                                                           |
|                                                           |
|                         +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                         | SLPP (variable number of bytes) |
+-+-+-+-+-+-+-+-+-+-+-+-+-+                                 |
|                                                           |
|                                                           |
|                                                           |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                            etc                            |
|           as many byte-wise concatenated SLPPs            |
|             as SL Packets in this RTP packet              |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

Figure 8: SLPPSection structure

3.8 Interleaving

SL Packets MAY be interleaved. Senders MAY perform interleaving.
Receivers MUST support interleaving.

When interleaving of SL packets is used it SHALL be implemented
using the SLPSeqNum field of MSLH.

The AUSequenceNumber field of the SL header MUST NOT be used for
interleaving since firstly it may collide with the Scene Description
Carousel usage described in section 4.1 and secondly it is not
visible to non-MPEG-4 system receivers.

The conjunction of RTP sequence number and SLPSeqNum can produce a
quasi-unique identifier for each SL packet so that a receiver can
unambiguously reconstruct the original order even in case of
out-of-order packets, packet loss or duplication.

3.9 Fragmentation Rules

This section specifies rules for senders in order to prevent media
decoding difficulties at the receiver end.

MPEG-4 Access Units are the default fragments for MPEG-4 bitstreams
and SHOULD be mapped directly into RTP packets of this format with
two exceptions:
- Access Units larger than the MTU
- When using interleaving for better packet loss resilience.

In all cases Access Unit start MUST be aligned with SL packet start.

This section gives rules to apply when performing Access Unit
fragmentation.

Some MPEG-4 codecs define optional syntax for Access Unit
sub-entities (fragments) that are independently decodable for error
resilience purposes. Examples are Video Packets for video and Error
Sensitivity Categories (ESC) for audio. This always corresponds to
specific bitstream syntax, which is signaled in the
DecoderSpecificInfo inside the DecoderConfig in SLConfig, and/or
using the corresponding parameters as described in section 4.1.
Therefore encoders and decoders are both aware whether they are
operating in such a mode or not (however since this codec
configuration is an opaque data block this is not explicitly
signaled by this payload format).

If not operating in such a mode, the decoder has to skip packets
after a loss until an Access Unit start is received. Similarly
decoder implementations that do not implement robust decoding of
Access Unit fragments have to discard all packets after a packet
loss until an Access Unit start is received. In the same way decoder
implementations that do not implement re-synchronization at any
Access Unit start have to discard all packets after a packet loss
until a Random Access Point Access Unit is received. A good
implementation would do all of this as a matter of course.

However serious problems would arise for decoder implementations
that try to restart decoding after a packet loss if independently
decodable fragments are signaled (in the decoder configuration) but
the fragments actually received are not independently decodable.
This happens when the RTP sender has made RTP packets on different
boundaries than the fragments provided by the encoder, so the issue
applies to the interface between the encoder and the RTP sender and
to the RTP sender component itself. The decoder has in general no
way to detect such a faulty fragment.

For this reason the following rules must apply to SL streams that
are specifically made for transport with this payload format:

SL packets SHOULD be codec-semantic entities in the spirit of ALF
i.e. either complete Access Units or fragments of Access Units that
are independently decodable. Specifically when a given codec has an
independently decodable Access Unit fragments optional syntax this
option SHOULD be used.

Furthermore when streams are generated using independently decodable
Access Units fragments these Access Units fragments MUST be mapped
one-to-one into SL packets. Consequently independently decodable
Access Units fragments MUST NOT be split across several SL packets
and therefore MUST NOT be split across several RTP packets.

For example an MPEG-4 audio stream encoded using the ESC syntax MUST
NOT split one ESC across 2 RTP packets.

This rule is relaxed when using MPEG-4 Video Packets for two
reasons: firstly Video Packets can be much larger than typical MTU
and secondly all Video Packets start with a specific
resynchronization marker that can be unambiguously detected.
Therefore for video streams using the Video Packet syntax Video
Packets MAY be split across several SL packets although it is
strongly RECOMMENDED to always adapt the Video Packet size to fit
the MTU. A Video Packet start MUST always be aligned with a SL
packet start, except when a GOV is present, in which case the GOV
and the first Video Packet of the following VOP MUST be included in
the same SL packet.

4. Types and Names

This section describes the MIME types and names associated with this
payload format. Section 4.1 is intended for registration with IANA
as in RFC 2048.

This format may require additional information about the mapping to
be made available to the receiver.
This is done using parameters described in the next section. The
absence of any of these fields is equivalent to a field set to the
default value, which is always zero. The absence of any such
parameters resolves into a default "basic" configuration.

In the MPEG-4 framework the SL stream configuration information is
carried using the Object Descriptor. For compatibility with
receivers that do not implement the full MPEG-4 system specification
this information MAY also be signaled using parameters described
here. When such information is present both in an Object Descriptor
and as a parameter of this payload format it MUST be exactly the
same.

For transport of MPEG-4 audio and video without the use of MPEG-4
systems, as well as to support non-MPEG-4 system receivers, it is
also possible to transport information on the profile and level of
the stream and on the decoder configuration. This is also described
in the next section.

4.1 MIME type registration

MIME media type name: "video" or "audio" or "application"

"video" SHOULD be used for MPEG-4 Video streams (ISO/IEC 14496-2) or
MPEG-4 Systems streams that convey information needed for an
audio/visual presentation.

"audio" SHOULD be used for MPEG-4 Audio streams (ISO/IEC 14496-3) or
MPEG-4 Systems streams that convey information needed for an audio
only presentation.

"application" SHOULD be used for MPEG-4 Systems streams
(ISO/IEC 14496-1) that serve other purposes than audio/visual
presentation, e.g. in some cases when MPEG-J streams are
transmitted.

MIME subtype name: mpeg4-sl

Required parameters: none

Optional parameters:

DTSDeltaLength:
The number of bits on which the DTSDelta field is encoded in MSLH.
The default value is zero and indicates the absence of DTSFlag and
DTSDelta in MSLH (the stream does not transport decodingTimeStamps).
A value larger than zero indicates that there is a DTSFlag in each
MSLH. Since decodingTimeStamp -if present- must be encoded as a
difference to the RTP time stamp, the DTSDeltaLength parameter MUST
be present in order to transport decodingTimeStamps with this
payload format.

CTSDeltaLength:
The number of bits on which the CTSDelta field is encoded in
(non-first) MSLH. The default value is zero and indicates the
absence of the CTSFlag and CTSDelta fields in MSLH. Non-zero values
MUST NOT be signaled in the Single-SL mode. Since
compositionTimeStamps -if present- must be encoded as a difference
to the RTP time stamp, the CTSDeltaLength parameter MUST be present
in order to transport compositionTimeStamps using this payload
format (in the Multiple-SL mode). However CTSDeltaLength SHOULD be
set to zero (or not signaled) for streams that have a constant
Access Unit duration (which can be explicitly signaled using the
DurationFlag and AccessUnitDuration field of SLConfigDescriptor).

OCRDeltaLength:
The number of bits on which the OCRDelta field is encoded in RSLH.
The default value is zero and indicates the absence of OCR for this
stream. Since objectClockReference -if present- must be encoded as a
difference to the RTP time stamp, the OCRDeltaLength parameter MUST
be present in order to transport objectClockReferences with this
payload format.

SLPPSizeLength:
The number of bits on which the SLPPayloadSize field of MSLH is
encoded. The default value is zero and indicates the Single-SL mode
(unless SLPPSize is present). Simultaneous presence of this
parameter and SLPPSize is illegal.
Either the SLPPSizeLength or SLPPSize parameter MUST be present in
order to signal the Multiple-SL mode of this payload format.

SLPPSize:
The constant size in bytes of each SL Packet Payload for this
stream. The default value is zero and indicates variable SL Packet
Payload size (or the Single-SL mode if SLPPSizeLength is absent).
Simultaneous presence of this parameter and SLPPSizeLength is
illegal. Either the SLPPSizeLength or SLPPSize parameter MUST be
present in order to signal the Multiple-SL mode of this payload
format. When SLPPSize is present the SLPPayloadSize of MSLH in the
RTP packets MUST NOT be present.

SLPSeqNumLength:
The number of bits on which the SLPSeqNum is encoded in the first
MSLH. The default value is zero and indicates the absence of
SLPSeqNum and SLPSeqNumDelta for all MSLHs. Since
packetSequenceNumber -if present- must be mapped in MSLH, the
SLPSeqNumLength parameter MUST be present in order to transport
packetSequenceNumber with this payload format.

SLPSeqNumDeltaLength:
The number of bits on which the SLPSeqNumDelta is encoded in any
non-first MSLH. The default value is zero and indicates that
packetSequenceNumber MUST be incremented by one for each SL packet
in the RTP packet (see section 3.4.1). Since packetSequenceNumber
does not increment by 1 inside a RTP packet when interleaving, the
SLPSeqNumDeltaLength parameter MUST be present when using
interleaving with this payload format.

RSLHSectionSizeLength:
The number of bits that is used to encode the RSLHSectionSize field.
The default value is zero and indicates the absence of the whole
RSLHSection for all RTP packets of this stream. Compatibility with
RFC 3016 requires that the RSLHSection must be empty, including the
RSLHSectionSize field.
This is the reason why there is such a variable length with a
default value indicating absence of the RSLHSectionSize field.

SLConfigDescriptor:
A base-64 encoding of the SLConfigDescriptor. This SHALL be the
original SLConfigDescriptor and it SHALL be the same as the one
transported by the OD framework, if any.

profile-level-id:
A decimal representation of the MPEG-4 Profile Level indication
value. For audio this parameter indicates which MPEG-4 Audio tool
subsets are applied to encode the audio stream and is defined in
ISO/IEC 14496-1. For video this parameter indicates which MPEG-4
Visual tool subsets are applied to encode the video stream and is
defined in Table G-1 of ISO/IEC 14496-2. This parameter MAY be used
in the capability exchange or session setup procedure to indicate
the MPEG-4 Profile and Level combination of which the relevant
MPEG-4 media codec is capable. If this parameter is not specified by
the procedure, its default value of 1 (Simple Profile/Level 1) is
used.

Config:
A hexadecimal representation of an octet string that expresses the
media payload configuration. Configuration data is mapped onto the
octet string on an MSB-first basis. The first bit of the
configuration data SHALL be located at the MSB of the first octet.
In the last octet, zero-valued padding bits, if necessary, shall
follow the configuration data. For audio this is a
"StreamMuxConfig", as defined in ISO/IEC 14496-3.
For video this expresses the MPEG-4 Visual configuration
information, as defined in subclause 6.2.1 Start codes of ISO/IEC
14496-2 [2][4][9]. The configuration information indicated by this
parameter SHALL be the same as the configuration information in the
corresponding MPEG-4 Visual stream, except for
first-half-vbv-occupancy and latter-half-vbv-occupancy, if they
exist, which may vary in the repeated configuration information
inside an MPEG-4 Visual stream (see 6.2.1 Start codes of ISO/IEC
14496-2).

object-type:
A decimal representation of the MPEG-4 Audio Object Type value
defined in ISO/IEC 14496-3. This parameter specifies the tool used
by the encoder. It can be used to limit the capability within the
specified "profile-level-id".

Bitrate:
A decimal representation of the audio bitrate in bits per second for
the audio bit stream.

Encoding considerations:
System bitstreams MUST be generated according to MPEG-4 System
specifications (ISO/IEC 14496-1). Video bitstreams MUST be generated
according to MPEG-4 Visual specifications (ISO/IEC 14496-2). Audio
bitstreams MUST be generated according to MPEG-4 Audio
specifications (ISO/IEC 14496-3). All SL streams MUST be generated
according to MPEG-4 Sync Layer specifications (ISO/IEC 14496-1
section 10). In order to read this format the SLConfigDescriptor may
be required. These bitstreams are binary data and MUST be encoded
for non-binary transport (for Email, the Base64 encoding is
sufficient). This type is also defined for transfer via RTP. The RTP
packets MUST be packetized according to the RTP payload format
defined in RFC .

Security considerations:
As in RFC .

Interoperability considerations:
MPEG-4 provides a large and rich set of tools for the coding of
visual objects.
For effective implementation of the standard, subsets of the MPEG-4
tool sets have been provided for use in specific applications. These
subsets, called 'Profiles', limit the size of the tool set a decoder
is required to implement. In order to restrict computational
complexity, one or more 'Levels' are set for each Profile. A
Profile@Level combination allows:
. a codec builder to implement only the subset of the standard he
  needs, while maintaining interworking with other MPEG-4 devices
  included in the same combination, and
. checking whether MPEG-4 devices comply with the standard
  ('conformance testing').
A stream SHALL be compliant with the MPEG-4 Profile@Level specified
by the parameter "profile-level-id". Interoperability between a
sender and a receiver may be achieved by specifying the parameter
"profile-level-id" in MIME content, or by arranging in the
capability exchange/announcement procedure to set this parameter
mutually to the same value.

Published specification:
The specifications for MPEG-4 streams are presented in ISO/IEC
14496-1, 14496-2, and 14496-3. The RTP payload format is described
in RFC .

Applications which use this media type:
Multimedia streaming and conferencing tools, Internet messaging and
Email applications. Also supra-relativistic elementary particle
hyperspace tunneling trans-galactic communication devices :-)

Additional information: none

Magic number(s): none

File extension(s):
None. A file format with the extension .mp4 has been defined for
MPEG-4 content but is not directly correlated with this MIME type,
whose sole purpose is RTP transport.

Macintosh File Type Code(s): none

Person & email address to contact for further information:
Authors of RFC .

Intended usage: COMMON

Author/Change controller:
Authors of RFC .

4.2 Concatenation of parameters

Multiple parameters SHOULD be expressed as a MIME media type string,
in the form of a semicolon-separated list of parameter=value pairs
(see examples in Appendix).

4.3 Usage of SDP

4.3.1 The a=fmtp keyword

It is assumed that one typical way to transport the above-described
parameters associated with this payload format is via a SDP message,
for example transported to the client in reply to a RTSP DESCRIBE or
via SAP. In that case the (a=fmtp) keyword MUST be used as described
in RFC 2327 [10, section 6]. The syntax is then:

   a=fmtp:<format> <parameter name>=<value>

4.3.2 SDP example

The following is an example of SDP syntax for the description of a
session containing one MPEG-4 audio stream, one MPEG-4 video stream
and two MPEG-4 system streams, transported using this format and the
AVP profile [12]. Note that the DTSDelta fields of the video stream
are encoded on 4 bits in this example. See the Appendix for more
examples.

   o= ....
   i= ....
   c=IN IP4 123.234.71.112
   m=video 1034 RTP/AVP 97
   a=fmtp:97 DTSDeltaLength=4
   a=rtpmap:97 mpeg4-sl
   m=audio 810 RTP/AVP 98
   a=fmtp:98 profile-level-id=1; config=7866E7E6EF
   a=rtpmap:98 mpeg4-sl
   m=application 1234 RTP/AVP 99
   a=rtpmap:99 mpeg4-sl
   m=application 1234 RTP/AVP 99
   a=rtpmap:99 mpeg4-sl

5. Other issues

5.1 SL packetized stream reconstruction

The purpose of this section is to document how a receiver can
reconstruct a valid SL packetized stream. Since this format directly
transports SL packets this reconstruction is performed by reversing
the payload structure rules (section 3). We explicitly describe here
the most complex transformations.
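
Reversing the payload structure rules first requires the stream
parameters. The following non-normative sketch reads them from an
a=fmtp attribute, assuming the RFC 2327 form "a=fmtp:<format>
<format specific parameters>" with the semicolon-separated
parameter=value list of section 4.2. The names parse_fmtp and
LENGTH_DEFAULTS are hypothetical, and parameter names are matched
case-sensitively for simplicity.

```python
from typing import Dict, Tuple, Union

# Numeric parameters of section 4.1; absence is equivalent to zero.
LENGTH_DEFAULTS = {
    "DTSDeltaLength": 0,
    "CTSDeltaLength": 0,
    "OCRDeltaLength": 0,
    "SLPPSizeLength": 0,
    "SLPPSize": 0,
    "SLPSeqNumLength": 0,
    "SLPSeqNumDeltaLength": 0,
    "RSLHSectionSizeLength": 0,
}


def parse_fmtp(line: str) -> Tuple[int, Dict[str, Union[int, str]]]:
    """Return (payload type, parameters) for one a=fmtp line."""
    prefix = "a=fmtp:"
    if not line.startswith(prefix):
        raise ValueError("not an a=fmtp attribute")
    fmt, _, rest = line[len(prefix):].strip().partition(" ")
    params: Dict[str, Union[int, str]] = dict(LENGTH_DEFAULTS)
    for item in rest.split(";"):
        item = item.strip()
        if not item:
            continue
        name, sep, value = item.partition("=")
        if not sep:
            raise ValueError("expected parameter=value: %r" % item)
        name, value = name.strip(), value.strip()
        # Length and size parameters are decimal integers; the others
        # (config, profile-level-id, ...) are kept as strings here.
        params[name] = int(value) if name in LENGTH_DEFAULTS else value
    if params["SLPPSize"] and params["SLPPSizeLength"]:
        raise ValueError("SLPPSize and SLPPSizeLength are exclusive")
    return int(fmt), params
```

The resulting dictionary supplies the values (SLPSeqNumLength,
CTSDeltaLength and so on) tested by the reconstruction steps of this
section.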

In the following let (i) be the index of SL packets inside one RTP
packet (starting at zero for each RTP packet), let SLPacketHeader.x
denote field x of the reconstructed SL packet header, let MSLH.x
denote field x of the received MSLH, etc.

SLPacketHeader.packetSequenceNumber is restored from MSLH.SLPSeqNum
and MSLH.SLPSeqNumDelta using:

   if ( SLPSeqNumLength == 0 ) { // or is absent
     if ( SLConfig.packetSeqNumLength == 0 ) {
       // this stream does not have SL packet sequence numbers
     }
     else {
       // illegal, normally the sender MUST map
       // SLPacketHeader.packetSequenceNumber in MSLH
       // and set a relevant SLPSeqNumLength value;
       // otherwise it is unfortunately impossible for the receiver
       // to reconstruct the correct sequence
     }
   }
   else { // SLPSeqNumLength is not zero
     if ( SLConfig.packetSeqNumLength == 0 ) {
       // the original SL stream does not have SL packet
       // sequence numbers, typically the sender inserted them
       // in order to implement interleaving at the RTP level;
       // they must be ignored for SL stream reconstruction
     }
     else {
       if ( i == 0 ) { // first SL packet in RTP packet
         SLPacketHeader.packetSequenceNumber(0) = MSLH.SLPSeqNum(0);
       }
       else { // remaining SL packets
         SLPacketHeader.packetSequenceNumber(i) =
           SLPacketHeader.packetSequenceNumber(i-1)
           + MSLH.SLPSeqNumDelta(i)
           + 1;
       }
     }
   }

All time stamps (CTS, DTS, OCR), when present, are restored from the
delta values. The time stamp flags (CTSFlag, DTSFlag) in MSLH are
used to reconstruct respectively the compositionTimeStampFlag and
decodingTimeStampFlag of SLPacketHeader.

   if ( CTSDeltaLength == 0 ) { // or CTSDeltaLength is absent
     // CTS is not transported for this RTP stream
     if ( i == 0 ) { // first SL packet in RTP packet
       if ( SLConfig.useTimeStamps == 1 ) {
         if ( SLPacketHeader.accessUnitStartFlag == 1 ) {
           SLPacketHeader.compositionTimeStampFlag(0) = 1;
           SLPacketHeader.compositionTimeStamp(0) = RTP TimeStamp;
         }
         else {
           // ignore
         }
       }
       else {
         // empty
       }
     }
     else { // non-first SL packets in RTP packet
       if ( SLConfig.useTimeStamps == 1 ) {
         if ( SLPacketHeader.accessUnitStartFlag == 1 ) {
           SLPacketHeader.compositionTimeStampFlag(i) = 0;
         }
         else {
           // ignore
         }
       }
       else {
         // empty
       }
     }
   }
   else { // CTSDeltaLength is not zero
     // CTS is transported for this stream
     if ( SLConfig.useTimeStamps == 1 ) {
       if ( SLPacketHeader.accessUnitStartFlag == 1 ) {
         SLPacketHeader.compositionTimeStampFlag(i) =
           MSLH.CTSFlag(i);
         if ( MSLH.CTSFlag(i) == 1 ) {
           // MSLH.CTSDelta(0) is never transmitted and is taken
           // as zero (see section 3.4.1)
           SLPacketHeader.compositionTimeStamp(i) =
             RTP TimeStamp + MSLH.CTSDelta(i);
         }
       }
       else {
         // ignore CTSFlag (which must be zero)
       }
     }
     else {
       // this is strange and sub-optimal at best
       // a receiver should ignore this
     }
   }

   if ( DTSDeltaLength == 0 ) { // or DTSDeltaLength is absent
     // DTS is not transported for this stream
     if ( SLConfig.useTimeStamps == 1 ) {
       if ( SLPacketHeader.accessUnitStartFlag == 1 ) {
         SLPacketHeader.decodingTimeStampFlag(i) = 0;
       }
       else {
         // ignore
       }
     }
     else {
       // empty
     }
   }
   else {
     // DTS is transported for this stream
     if ( SLConfig.useTimeStamps == 1 ) {
       if ( SLPacketHeader.accessUnitStartFlag == 1 ) {
         SLPacketHeader.decodingTimeStampFlag(i) =
           MSLH.DTSFlag(i);
         if ( MSLH.DTSFlag(i) == 1 ) {
           SLPacketHeader.decodingTimeStamp(i) =
             RTP TimeStamp + MSLH.DTSDelta(i);
         }
       }
       else {
         // ignore DTSFlag (which must be zero)
       }
     }
     else {
       // this is strange and sub-optimal at best
       // a receiver should ignore this
     }
   }

   if ( OCRDeltaLength == 0 ) { // or OCRDeltaLength is absent
     // the RTP stream does not transport any OCR
     if ( SLConfig.OCRLength == 0 ) {
       // this stream does not have any OCR
     }
     else {
       // illegal, normally the sender MUST detect
       // OCRs, replace them with OCRDelta and set
       // a relevant OCRDeltaLength value
     }
   }
   else {
     if ( SLConfig.OCRLength == 0 ) {
       // this is strange and sub-optimal at best
       // a receiver should ignore this
     }
     else {
       SLPacketHeader.OCRflag(i) = RSLH.OCRflag(i);
       if ( SLPacketHeader.OCRflag(i) == 1 ) {
         SLPacketHeader.objectClockReference(i) =
           RTP TimeStamp + RSLH.OCRDelta(i);
       }
     }
   }

In the Single-SL mode the AccessUnitEndFlag, if needed, is restored
from the M bit, as follows:

   if ( SLConfig.useAccessUnitEndFlag == 0 ) {
     // this SL stream does not signal access unit ends
   }
   else {
     SLPacketHeader.AccessUnitEndFlag = M bit;
   }

In the Multiple-SL mode the AccessUnitEndFlag is untouched in RSLH.

The other SL packet header fields SHALL remain as found in RSLH.

In the general case the reconstruction of the original SL packetized
stream requires SL-awareness. However this payload format allows in
all cases a receiver that does not know about the SL syntax to
reconstruct the semantics of SL for the following very useful
features:
- Packet order (decoding order)
- Access Unit boundaries (using the M bit)
- Access Unit fragments (i.e. SL packet boundaries using
  MSLH.SLPPayloadSize)
- Composition Time Stamps (using the RTP Time Stamp and
  MSLH.CTSDelta)
- Decoding Time Stamps (using the RTP Time Stamp and MSLH.DTSDelta)
- Packet sequence number (using the RTP sequence number and
  MSLH.SLPSeqNum)

5.2 Handling of scene description streams

MPEG-4 introduces new stream types as described in section 1, namely
Object Descriptors and BIFS. In the following both OD and BIFS are
discussed on the same basis i.e. as "scene description".

Considering scene description as a "stream-able" type of content is
a rather new concept and for that reason some specific comments are
needed.

Typically scene descriptions are encoded in such a way that
information loss would in the general case cripple the presentation
beyond any hope of repair by the receiver. Still this is well suited
for a number of multimedia applications where the scene is first
made available via reliable channels to the client and then played.
This payload format is not intended for this type of application,
for which download of MPEG-4 interchange (.mp4) files is typical.
However it can also be used if the RTP packets are transported using
TCP or any other reliable protocol.

On the other hand MPEG-4 has introduced the possibility to
dynamically change the scene description by sending animation
information (changes in parameters) and structural change
information (updates). Since this information has to be sent in a
timely fashion MPEG-4 has defined a number of techniques in order to
encode the scene description in a manner that makes it behave
similarly to other temporal encoding schemes such as audio and
video. This payload format is intended for this usage.
1436   Note that in many cases the application will consist of first the
1437   reliable transmission of a static initial scene followed by the
1438   streaming of animations and updates. For this reason the usage of
1439   this payload format is attractive since it offers a single solution.

1441   Senders must be aware that suitable protection schemes should be
1442   used when scene description streams transport sensitive
1443   configuration information. For example, if the RTP packet
1444   transporting an OD-update command is lost, the corresponding media
1445   stream is not accessible by the receiver.

1447   Redundancy is one possibility; it may be added by tools operating
1448   above this payload format, e.g. packet-based FEC, re-transmission,
1449   or similar tools. In such a case, the general congestion control
1450   principles have to be observed.

1452   Since BIFS and OD streams may be modified during the session with
1453   update commands, there is a need to send both update commands and
1456   full BIFS/OD refreshes. For that reason MPEG-4 defines Random Access
1457   Points (RAP) for scene description streams (OD and BIFS) where by
1458   definition a decoder can restart decoding i.e. receives a "full
1459   update" of the scene. This mechanism is called Scene and Object
1460   Description Carousel. The AU Sequence Number field of the SL Packet
1461   Header is used to support this behavior at the Synchronization
1462   Layer. When two access units are sent consecutively with the same AU
1463   Sequence Number, the second one is assumed to be a semantic
1464   repetition of the first. If a receiver starts to listen in the
1465   middle of a session or has detected losses, it can skip all received
1466   Access Units until such a RAP. The periodicity of transmission of
1467   these RAPs should be chosen/adjusted depending on the application
1468   and the network it is deployed on; i.e. exactly like Intra-coded
1469   frames for video, it is the responsibility of the sender to make
1470   sure the periodicity of RAPs is suitable.

1472   5.3 Multiplexing

1474   An advanced MPEG-4 session may involve a large number of objects,
1475   possibly as many as a few hundred, so transporting each ES as an
1476   individual RTP stream may not always be practical. Allocating and
1477   controlling hundreds of destination addresses for each MPEG-4
1478   session may pose insurmountable session administration problems.
1479   The input/output processing overhead at the end-points will also be
1480   extremely high. Additionally, low delay transmission of low
1481   bitrate data streams, e.g. facial animation parameters, results in
1482   extremely high header overheads.

1484   To solve these problems, MPEG-4 data transport requires a
1485   multiplexing scheme that allows selective bundling of several ESs.
1486   This is beyond the scope of the payload format defined here.

1488   MPEG-4's FlexMux multiplexing scheme may be used for this
1489   purpose and a specific RTP payload format is being developed [11].

1491   Another approach may be to develop a generic RTP multiplexing scheme
1492   usable for MPEG-4 data. The multiplexing scheme reported in [8] may
1493   be a candidate for this approach.

1495   For MPEG-4 applications, the multiplexing technique needs to address
1496   the following requirements:

1498   i. The ESs multiplexed in one stream can change frequently during a
1499   session. Consequently, the coding type, individual packet size and
1500   temporal relationships between the multiplexed data units must be
1501   handled dynamically.

1503   ii. The multiplexing scheme should have a mechanism to determine the
1504   ES identifier (ES_ID) for each of the multiplexed packets. ES_ID is
1505   not a part of the SL header.

1507   iii. In general, an SL packet does not contain information about its
1508   size. The multiplexing scheme should be able to delineate the
1511   multiplexed packets whose lengths may vary from a few bytes to close
1512   to the path-MTU.

1514   5.5 Overlap with RFC 3016

1516   This payload format has been designed to have a (large) overlap with
1517   RFC 3016 [7]. The conditions for this overlap are:
1518   Conditions for RFC 3016:
1519   i. MPEG-4 video elementary streams only
1520   ii. Maximum one VOP or Video Packet per RTP packet
1521   Conditions for this payload format:
1522   i. No structural parameters defined (or all set to zero), i.e.
1523   Single-SL mode with empty MSLH and empty RSLH.
1524   ii. Receivers MUST be ready to accept (ignore) video configuration
1525   headers (e.g. VOSH, VO and VOL) and visual-object-sequence-end-code
1526   transported in-band.

1528   6. Security Considerations

1530   RTP packets using the payload format defined in this specification
1531   are subject to the security considerations discussed in the RTP
1532   specification [5]. This implies that confidentiality of the media
1533   streams is achieved by encryption. Because the data compression used
1534   with this payload format is applied end-to-end, encryption may be
1535   performed on the compressed data so there is no conflict between the
1536   two operations. The packet processing complexity of this payload
1537   type (i.e. excluding media data processing) does not exhibit any
1538   significant non-uniformity on the receiver side that could cause a
1539   denial-of-service threat.

1541   However, it is possible to inject non-compliant MPEG streams (Audio,
1542   Video, and Systems) to overload the receiver/decoder's buffers, which
1543   might compromise the functionality of the receiver or even crash it.
1544   This is especially true for end-to-end systems like MPEG where the
1545   buffer models are precisely defined.

1547   MPEG-4 Systems supports stream types including commands that are
1548   executed on the terminal like OD commands, BIFS commands, etc. and
1549   programmatic content like MPEG-J (Java(TM) Byte Code) and
1550   ECMAScript. It is possible to use one or more of the above in a
1551   manner non-compliant to MPEG to crash the receiver or make it
1552   temporarily unavailable.

1554   Authentication mechanisms can be used to validate the sender and
1555   the data to prevent security problems due to non-compliant malicious
1556   MPEG-4 streams.

1558   A security model is defined for MPEG-4 Systems streams carrying
1559   MPEG-J access units, which comprise Java(TM) classes and objects.
1560   MPEG-J defines a set of Java APIs and a secure execution model.
1561   MPEG-J content can call this set of APIs and Java(TM) methods from a
1562   set of Java packages supported in the receiver within the defined
1563   security model. According to this security model, downloaded byte
1566   code is forbidden to load libraries, define native methods, start
1567   programs, read or write files, or read system properties.

1569   Receivers can implement intelligent filters to validate the buffer
1570   requirements or parametric (OD, BIFS, etc.) or programmatic (MPEG-J,
1571   ECMAScript) commands in the streams. However, this can increase the
1572   complexity significantly.

1574   7. Acknowledgements
1575   This document evolved across several years thanks to contributions
1576   from a large number of people since it is based on work within the
1577   IETF AVT working group and various ISO MPEG working groups,
1578   especially the 4-on-IP ad-hoc group in the last stages. The authors
1579   wish to thank Guido Franceschini, Art Howarth, Dave Mackie, Dave
1580   Singer, and Stephan Wenger for their valuable comments.

1582   8. References

1584   [1] ISO/IEC 14496-1:2001 MPEG-4 Systems

1586   [2] ISO/IEC 14496-2:2001 MPEG-4 Visual

1588   [3] ISO/IEC 14496-3:2001 MPEG-4 Audio

1590   [4] ISO/IEC 14496-6:2001 Delivery Multimedia Integration Framework.
1592   [5] H. Schulzrinne, S. Casner, R. Frederick, V. Jacobson, RTP: A
1593   Transport Protocol for Real-Time Applications, RFC 1889, Internet
1594   Engineering Task Force, January 1996.

1596   [6] S. Bradner, Key words for use in RFCs to Indicate Requirement
1597   Levels, RFC 2119, Internet Engineering Task Force, March 1997.

1599   [7] Y. Kikuchi, T. Nomura, S. Fukunaga, Y. Matsui, H. Kimata, RTP
1600   payload format for MPEG-4 Audio/Visual streams, RFC 3016, Internet
1601   Engineering Task Force.

1603   [8] B. Thompson, T. Koren, D. Wing, Tunneling multiplexed Compressed
1604   RTP ("TCRTP"), work in progress, draft-ietf-avt-tcrtp-02.txt,
1605   November 2000.

1607   [9] D. Singer, Y. Lim, A Framework for the delivery of MPEG-4 over
1608   IP-based Protocols, work in progress, draft-singer-mpeg4-ip-02.txt,
1609   May 2001.

1611   [10] M. Handley, V. Jacobson, SDP: Session Description Protocol,
1612   RFC 2327, Internet Engineering Task Force, April 1998.

1614   [11] C. Roux et al., RTP Payload Format for MPEG-4 FlexMultiplexed
1615   Streams, work in progress, draft-curet-avt-rtp-mpeg4-flexmux-00.txt,
1616   February 2001.

1620   [12] H. Schulzrinne, RTP Profile for Audio and Video Conferences
1621   with Minimal Control, RFC 1890, Internet Engineering Task Force,
1622   January 1996.

1624   9. Authors' Addresses

1626   Olivier Avaro
1627   France Telecom
1628   35 A Schutzenhuttenweg
1629   60598 Frankfurt am Main
1630   Deutschland
1631   e-mail: olivier.avaro@francetelecom.fr

1633   Andrea Basso
1634   AT&T Labs Research
1635   200 Laurel Avenue
1636   Middletown, NJ 07748
1637   USA
1638   e-mail: basso@research.att.com

1640   Stephen L. Casner
1641   Packet Design, Inc.
1642   66 Willow Place
1643   Menlo Park, CA 94025
1644   USA
1645   e-mail: casner@acm.org

1647   M. Reha Civanlar
1648   AT&T Labs - Research
1649   100 Schultz Drive
1650   Red Bank, NJ 07701
1651   USA
1652   e-mail: civanlar@research.att.com

1654   Philippe Gentric
1655   Philips Digital Networks
1656   22 Avenue Descartes
1657   94453 Limeil-Brevannes CEDEX
1658   France
1659   e-mail: philippe.gentric@philips.com

1661   Carsten Herpel
1662   THOMSON multimedia
1663   Karl-Wiechert-Allee 74
1664   30625 Hannover
1665   Germany
1666   e-mail: herpelc@thmulti.com

1668   Zvi Lifshitz
1669   Optibase Ltd.
1670   7 Shenkar St.
1671   Herzliya 46120
1674   Israel
1675   e-mail: zvil@optibase.com

1677   Young-kwon Lim
1678   mp4cast (MPEG-4 Internet Broadcasting Solution Consortium)
1679   1001-1 Daechi-Dong Gangnam-Gu
1680   Seoul, 305-333,
1681   Korea
1682   e-mail: young@techway.co.kr

1684   Colin Perkins
1685   USC Information Sciences Institute
1686   4350 N. Fairfax Drive #620
1687   Arlington, VA 22203
1688   USA
1689   e-mail: csp@isi.edu

1691   Jan van der Meer
1692   Philips Digital Networks
1693   Cederlaan 4
1694   5600 JB Eindhoven
1695   Netherlands
1696   e-mail: jan.vandermeer@philips.com

1698   APPENDIX: Examples of usage

1700   This payload format has been designed to transport efficiently a
1701   very versatile packetization scheme: the MPEG-4 Synch Layer; as a
1702   result its complexity is greater than that of the average RTP
1703   payload format. For this reason this section describes a number of
1704   key examples of how this payload format can be used.

1706   A C++-like syntax called SDL (Syntactic Description Language)
1707   defined in [1, section 14] is used to economically describe MPEG-4
1708   system data structures.

1710   Furthermore these examples assume that the (a=fmtp) SDP syntax is
1711   used to convey the MIME parameters of the payload format.

1713   Appendix.1 MPEG-4 Video

1715   Let us consider the case of a 30 frames per second MPEG-4 video
1716   stream whose bit rate is high enough that Access Units have to be
1717   split into several SL packets (typically above 300 kb/s).
1719   Let us assume also that the video codec generates in that case Video
1720   Packets suitable to fit in one SL packet, i.e. that the video codec
1721   is MTU aware and the MTU is 1500 bytes. We assume furthermore that
1722   this stream contains B frames and that decodingTimeStamps are present.

1725   SLConfigDescriptor

1727   In this example the SLConfigDescriptor is:

1729   class SLConfigDescriptor extends BaseDescriptor : bit(8)
1730   tag=SLConfigDescrTag {
1731     bit(8) predefined;
1732     if (predefined==0) {
1733       bit(1) useAccessUnitStartFlag; = 1
1734       bit(1) useAccessUnitEndFlag; = 0
1735       bit(1) useRandomAccessPointFlag; = 1
1736       bit(1) hasRandomAccessUnitsOnlyFlag; = 0
1737       bit(1) usePaddingFlag; = 0
1738       bit(1) useTimeStampsFlag; = 1
1739       bit(1) useIdleFlag; = 0
1740       bit(1) durationFlag; = 0
1741       bit(32) timeStampResolution; = 30
1742       bit(32) OCRResolution; = 0
1743       bit(8) timeStampLength; = 32
1744       bit(8) OCRLength; = 0
1745       bit(8) AU_Length; = 0
1746       bit(8) instantBitrateLength; = 0
1747       bit(4) degradationPriorityLength; = 0
1748       bit(5) AU_seqNumLength; = 0
1749       bit(5) packetSeqNumLength; = 0
1750       bit(2) reserved=0b11;
1751     }
1752     if (durationFlag) {
1753       bit(32) timeScale; // NOT USED
1754       bit(16) accessUnitDuration; // NOT USED
1755       bit(16) compositionUnitDuration; // NOT USED
1756     }
1757     if (!useTimeStampsFlag) {
1758       bit(timeStampLength) startDecodingTimeStamp; // NOT USED
1759       bit(timeStampLength) startCompositionTimeStamp; // NOT USED
1760     }
1761   }

1763   The useRandomAccessPointFlag is set so that the
1764   randomAccessPointFlag can indicate that the corresponding SL packet
1765   contains a GOV and the first Video Packet of an Intra coded frame.
1767   SL Packet Header structure

1769   With this configuration we have the following SL packet header
1770   structure:

1772   aligned(8) class SL_PacketHeader (SLConfigDescriptor SL) {
1773     bit(1) accessUnitStartFlag; // 1 bit
1774     if (accessUnitStartFlag) {
1775       bit(1) randomAccessPointFlag; // 1 bit
1776       bit(1) decodingTimeStampFlag; // 1 bit
1777       bit(1) compositionTimeStampFlag; // 1 bit
1780       if (decodingTimeStampFlag) {
1781         bit(SL.timeStampLength) decodingTimeStamp;
1782       }
1783       if (compositionTimeStampFlag) {
1784         bit(SL.timeStampLength) compositionTimeStamp;
1785       }
1786     }
       }

1788   Parameters

1790   decodingTimeStamps are encoded on 32 bits, which is much more than
1791   needed for a delta. Therefore the sender will use DTSDeltaLength to
1792   signal that only 6 bits are used for the coding of relative DTS in
1793   the RTP packet.

1795   The RSLH section never exceeds 2 bits, so RSLHSectionSize is encoded
1796   on 2 bits, as signaled by RSLHSectionSizeLength. The resulting
1797   concatenated fmtp line is:

1799   a=fmtp: DTSDeltaLength=6;RSLHSectionSizeLength=2

1801   RTP packet structure

1803   Two cases can occur; for packets that transport first fragments of
1804   Access Units we have:

1806   +=========================================+=============+
1807   | Field                                   | size        |
1808   +=========================================+=============+
1809   | RTP header                              | -           |
1810   +-----------------------------------------+-------------+
1811   | CTSFlag = 1                             | 1 bit       |
1812   +-----------------------------------------+-------------+
1813   | DTSFlag = 1                             | 1 bit       |
1814   +-----------------------------------------+-------------+
1815   | DTSDelta                                | 6 bits      |
1816   +-----------------------------------------+-------------+
1817   | bits to byte alignment                  | 0 bits      |
1818   +-----------------------------------------+-------------+
1819   | RSLHSectionSize = 2                     | 2 bits      |
1820   +-----------------------------------------+-------------+
1821   | accessUnitStartFlag = 1                 | 1 bit       |
1822   +-----------------------------------------+-------------+
1823   | randomAccessPointFlag                   | 1 bit       |
1824   +-----------------------------------------+-------------+
1825   | bits to byte alignment                  | 4 bits      |
1826   +-----------------------------------------+-------------+
1827   | SL packet payload                       | N bytes     |
1828   +-----------------------------------------+-------------+

1830   For packets that transport non-first fragments of Access Units we
1831   have:

1834   +=========================================+=============+
1835   | Field                                   | size        |
1836   +=========================================+=============+
1837   | RTP header                              | -           |
1838   +-----------------------------------------+-------------+
1839   | CTSFlag = 0                             | 1 bit       |
1840   +-----------------------------------------+-------------+
1841   | DTSFlag = 0                             | 1 bit       |
1842   +-----------------------------------------+-------------+
1843   | bits to byte alignment                  | 6 bits      |
1844   +-----------------------------------------+-------------+
1845   | RSLHSectionSize = 1                     | 2 bits      |
1846   +-----------------------------------------+-------------+
1847   | accessUnitStartFlag = 0                 | 1 bit       |
1848   +-----------------------------------------+-------------+
1849   | zero bits to byte alignment             | 5 bits      |
1850   +-----------------------------------------+-------------+
1851   | SL packet payload                       | N bytes     |
1852   +-----------------------------------------+-------------+

1854   Note that the compositionTimeStamp is never present since it would
1855   be redundant with the RTP time stamp. However the value of CTSFlag
1856   is 1 in first fragments to indicate to the receiver the value of
1857   compositionTimeStampFlag for the corresponding reconstructed SL
1858   packet.

1860   Overhead estimation

1862   In this example we have an RTP overhead of 40 + 2 bytes for 1400
1863   bytes of payload i.e. 3 % overhead.
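As an illustration of the two layouts above, the following Python sketch packs the two header bytes that precede the SL packet payload in this example and recomputes the overhead figure. It is not part of the payload format definition: the function names are hypothetical, bits are assumed to be written most-significant-bit first, DTSDelta is treated as a raw 6-bit field, and the 40 bytes are assumed to be the IP, UDP and RTP headers.

```python
def pack_first_fragment(dts_delta: int, random_access_point: bool) -> bytes:
    """Header bytes for a packet carrying the first fragment of an Access Unit."""
    if not 0 <= dts_delta < 64:
        raise ValueError("DTSDelta must fit in 6 bits (DTSDeltaLength=6)")
    # MSLH byte: CTSFlag=1, DTSFlag=1, DTSDelta on 6 bits (already byte aligned)
    mslh = (1 << 7) | (1 << 6) | dts_delta
    # RSLH byte: RSLHSectionSize=2 on 2 bits, accessUnitStartFlag=1,
    # randomAccessPointFlag, then zero bits up to the byte boundary
    rslh = (2 << 6) | (1 << 5) | (int(random_access_point) << 4)
    return bytes([mslh, rslh])


def pack_non_first_fragment() -> bytes:
    """Header bytes for a packet carrying a non-first fragment."""
    # MSLH byte: CTSFlag=0, DTSFlag=0, zero bits up to the byte boundary
    mslh = 0
    # RSLH byte: RSLHSectionSize=1 on 2 bits, accessUnitStartFlag=0, zero padding
    rslh = 1 << 6
    return bytes([mslh, rslh])


def overhead_percent(header_bytes: int, payload_bytes: int) -> float:
    """Header overhead relative to the payload size, in percent."""
    return 100.0 * header_bytes / payload_bytes


if __name__ == "__main__":
    print(pack_first_fragment(5, True).hex())
    print(pack_non_first_fragment().hex())
    print(round(overhead_percent(40 + 2, 1400), 1))
```

Both cases cost exactly two bytes in addition to the 40 bytes assumed for the lower-layer headers, which is where the "40 + 2" of the overhead estimation comes from.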
1865   Appendix.2 RFC 3016 compatible MPEG-4 Video

1867   This is an example of a video stream where the SL is configured to
1868   produce RTP packets compatible with RFC 3016.

1870   SLConfigDescriptor

1872   In this example the SLConfigDescriptor is:

1874   class SLConfigDescriptor extends BaseDescriptor : bit(8)
1875   tag=SLConfigDescrTag {
1876     bit(8) predefined;
1877     if (predefined==0) {
1878       bit(1) useAccessUnitStartFlag; = 0
1879       bit(1) useAccessUnitEndFlag; = 1
1880       bit(1) useRandomAccessPointFlag; = 0
1881       bit(1) hasRandomAccessUnitsOnlyFlag; = 0
1882       bit(1) usePaddingFlag; = 0
1883       bit(1) useTimeStampsFlag; = 0
1884       bit(1) useIdleFlag; = 0
1887       bit(1) durationFlag; = 0
1888       bit(32) timeStampResolution; = 0
1889       bit(32) OCRResolution; = 0
1890       bit(8) timeStampLength; = 0
1891       bit(8) OCRLength; = 0
1892       bit(8) AU_Length; = 0
1893       bit(8) instantBitrateLength; = 0
1894       bit(4) degradationPriorityLength; = 0
1895       bit(5) AU_seqNumLength; = 0
1896       bit(5) packetSeqNumLength; = 0
1897       bit(2) reserved=0b11;
1898     }
1899     if (durationFlag) {
1900       bit(32) timeScale; // NOT USED
1901       bit(16) accessUnitDuration; // NOT USED
1902       bit(16) compositionUnitDuration; // NOT USED
1903     }
1904     if (!useTimeStampsFlag) {
1905       bit(timeStampLength) startDecodingTimeStamp; = 0
1906       bit(timeStampLength) startCompositionTimeStamp; = 0
1907     }
1908   }

1910   SL Packet Header structure

1912   With this configuration we have the following SL packet header
1913   structure:

1915   aligned(8) class SL_PacketHeader (SLConfigDescriptor SL) {
1916     if (SL.useAccessUnitEndFlag) {
1917       bit(1) accessUnitEndFlag; // 1 bit
1918     }
1919   }

1921   In this case this payload format produces RTP packets that conform
1922   exactly to RFC 3016 and the Synch Layer is reduced to a purely
1923   logical construction that neither sender nor receiver needs to
1924   implement.

1926   Parameters

1928   This configuration is the default one; no parameters are required.
1930   RTP packet structure

1932   Note that accessUnitEndFlag is mapped to the RTP header M bit.

1934   +=========================================+=============+
1935   | Field                                   | size        |
1936   +=========================================+=============+
1937   | RTP header                              | -           |
1938   +-----------------------------------------+-------------+
1939   | SL packet payload                       | 1400 bytes  |
1942   +-----------------------------------------+-------------+

1944   Overhead

1946   In this example we have an RTP overhead of 40 bytes for 1400 bytes
1947   of payload i.e. 3 % overhead.

1949   Appendix.3 Low delay MPEG-4 Audio

1951   This example is for a low delay audio service. For this reason a
1952   single SL packet is transported in each RTP packet.

1954   SLConfigDescriptor

1956   Since CTS=DTS and the Access Unit duration is constant, signaling of
1957   MPEG-4 time stamps is not needed (the durationFlag of SLConfig is
       set).

1959   We also assume here an audio Object Type for which all Access Units
1960   are Random Access Points, which is signaled using the
1961   hasRandomAccessUnitsOnlyFlag in the SLConfigDescriptor.

1963   We assume furthermore a mode where the Access Unit size is constant
1964   and 5 bytes (which is signaled with AU_Length).
1966   In this example the SLConfigDescriptor is:

1968   class SLConfigDescriptor extends BaseDescriptor : bit(8)
1969   tag=SLConfigDescrTag {
1970     bit(8) predefined;
1971     if (predefined==0) {
1972       bit(1) useAccessUnitStartFlag; = 0
1973       bit(1) useAccessUnitEndFlag; = 0
1974       bit(1) useRandomAccessPointFlag; = 0
1975       bit(1) hasRandomAccessUnitsOnlyFlag; = 1
1976       bit(1) usePaddingFlag; = 0
1977       bit(1) useTimeStampsFlag; = 0
1978       bit(1) useIdleFlag; = 0
1979       bit(1) durationFlag; = 1 // signals constant AU duration
1980       bit(32) timeStampResolution; = 0
1981       bit(32) OCRResolution; = 0
1982       bit(8) timeStampLength; = 0
1983       bit(8) OCRLength; = 0
1984       bit(8) AU_Length; = 5
1985       bit(8) instantBitrateLength; = 0
1986       bit(4) degradationPriorityLength; = 0
1987       bit(5) AU_seqNumLength; = 0
1988       bit(5) packetSeqNumLength; = 0
1989       bit(2) reserved=0b11;
1990     }
1991     if (durationFlag) {
1992       bit(32) timeScale; = 1000 // for milliseconds
1993       bit(16) accessUnitDuration; = 10 // ms
1994       bit(16) compositionUnitDuration; = 10 // ms
1997     }
1998     if (!useTimeStampsFlag) {
1999       bit(timeStampLength) startDecodingTimeStamp; = 0
2000       bit(timeStampLength) startCompositionTimeStamp; = 0
2001     }
2002   }

2004   SL packet header

2006   With this configuration the SL packet header is empty.

2008   Parameters

2010   No parameters are required.

2012   RTP packet structure

2014   Note that the RTP header M bit should always be set to 1.

2016   +=========================================+=============+
2017   | Field                                   | size        |
2018   +=========================================+=============+
2019   | RTP header                              | -           |
2020   +-----------------------------------------+-------------+
2021   | SL packet payload                       | 5 bytes     |
2022   +-----------------------------------------+-------------+

2024   Overhead estimation

2026   The overhead is extremely large, i.e. 800 %, since 40 bytes of
2027   headers are required to transport 5 bytes of data. Note however
2028   that RTP header compression would work well since time stamp
2029   increments are constant.

2031   Appendix.4 Media delivery MPEG-4 Audio

2033   This example is for a media delivery service where delay is not an
2034   issue but efficiency is. In this case several SL Packets are
2035   transported in each RTP packet.

2037   SLConfigDescriptor

2039   The SLConfigDescriptor is the same as in Appendix.3.

2041   SL packet header

2043   With this configuration the SL packet header is empty.

2045   Parameters

2048   The absence of RSLHSectionSizeLength indicates that the RSLHSection
2049   is empty.

2051   The size of SL Packets (which are all complete Access Units in this
2052   case) is constant and is indicated with:

2054   a=fmtp: SLPPSize=5

2056   This also indicates to the receiver that the Multiple-SL mode will
2057   be used, i.e. that a 2-byte field will give the size of the
2058   MSLHSection. In this case however this field always contains zero
2059   since the MSLHSection is empty.

2061   RTP packet structure

2063   Note that the RTP header M bit is always set to 1, which indicates
2064   to the receiver that only complete Access Units are transported.

2066   +=========================================+=============+
2067   | Field                                   | size        |
2068   +=========================================+=============+
2069   | RTP header                              | -           |
2070   +-----------------------------------------+-------------+
2071   | MSLHSection size in bits = 0            | 2 bytes     |
2072   +-----------------------------------------+-------------+
2073   | SL packet payload                       | 5 bytes     |
2074   +-----------------------------------------+-------------+
2075   | SL packet payload                       | 5 bytes     |
2076   +-----------------------------------------+-------------+
2077   | etc, until MTU is reached                             |
2078   +-----------------------------------------+-------------+
2079   | SL packet payload                       | 5 bytes     |
2080   +-----------------------------------------+-------------+

2082   Overhead estimation

2084   The overhead is 3% i.e. minimal.
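The 3 % figure can be checked with a short calculation, and compared with the single-packet case of Appendix.3. The sketch below is illustrative only: it assumes a 1500 byte MTU (the value used in Appendix.1), 40 bytes of IP, UDP and RTP headers, and the 2-byte MSLHSection size field; the constant and function names are hypothetical.

```python
LOWER_LAYER_HEADER_BYTES = 40   # assumed: IP + UDP + RTP headers
MSLH_SIZE_FIELD_BYTES = 2       # size field sent even when the MSLHSection is empty


def access_units_per_packet(mtu: int, au_size: int) -> int:
    """Number of constant-size Access Units that fit in one RTP packet."""
    room = mtu - LOWER_LAYER_HEADER_BYTES - MSLH_SIZE_FIELD_BYTES
    return room // au_size


def overhead_percent(header_bytes: int, payload_bytes: int) -> float:
    """Header overhead relative to the payload size, in percent."""
    return 100.0 * header_bytes / payload_bytes


if __name__ == "__main__":
    count = access_units_per_packet(mtu=1500, au_size=5)
    payload = count * 5
    headers = LOWER_LAYER_HEADER_BYTES + MSLH_SIZE_FIELD_BYTES
    print(count, payload, round(overhead_percent(headers, payload), 1))
    # Appendix.3 for comparison: one 5-byte Access Unit per packet
    print(overhead_percent(LOWER_LAYER_HEADER_BYTES, 5))
```

Under these assumptions grouping Access Units trades delay for efficiency: each packet then carries several seconds of audio at 10 ms per Access Unit, which is why this configuration suits media delivery and not conferencing.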
2086   Appendix.5 A more complex case: AAC with interleaving

2088   Let us consider AAC around 130 kb/s where each Access Unit is split
2089   into 4 SL packets corresponding to Error Sensitivity Categories (ESC)
2090   of maximum 90 bytes for which interleaving is very useful in terms
2091   of error resilience. We thus use an interleaving scheme where 15 SL
2092   Packets (extracted from 15 consecutive Access Units) are used to
2093   construct each RTP packet in order to match an MTU of 1500 bytes.
2094   Note that since ESC fragments are not byte aligned we also use the
2095   paddingFlag and paddingBits features of the Synch Layer.

2097   The interleaving sequence is 4 RTP packets and 350 ms long, which is
2098   too long for conferencing but perfectly OK for Internet radio.

2101   Since the sequence contains 60 SL packets, the sequence number can
2102   be encoded on 6 bits. However 2 bits are actually enough if the
2103   sender always resets the SL packet sequence number to zero at the
2104   start of each sequence, since only the first MSLH in each of the 4
2105   RTP packets in the sequence carries an absolute sequence number
2106   value (0,1,2,3).

2108   2 bits are also enough for SLPSeqNumDelta, which is constant and
2109   equal to 3 (since +1 is automatically added).

2111   Note that the 4th RTP packet in each sequence has its M bit set to 1
2112   since it contains 15 SL packets transporting the end of 15
2113   consecutive Access Units.

2115   With this scheme a sender can, for example upon reception of RTCP
2116   reports indicating high loss rates, choose to duplicate for each
2117   interleaving sequence the first RTP packet, which contains the most
2118   useful data in terms of ESC, or apply other error protection
2119   techniques, with due care to congestion issues.

2121   In this example we will also show several other SL features (OCR, AU
2122   boundary flags, padding, as detailed below).
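The interleaving pattern described above can be made concrete with a small Python sketch. It is an illustration under stated assumptions and not a normative algorithm: SL packets are assumed to be numbered in their original order (Access Unit by Access Unit, restarting at zero for each interleaving sequence), the Access Unit duration of 23.22 ms is the value used in the SLConfigDescriptor of this example, and all names are hypothetical.

```python
AUS_PER_SEQUENCE = 15      # consecutive Access Units per interleaving sequence
SL_PACKETS_PER_AU = 4      # one SL packet per Error Sensitivity Category
AU_DURATION_MS = 23.22     # assumed Access Unit duration for this example


def interleave_sequence():
    """Return, for each of the 4 RTP packets, the SL packet sequence numbers
    it carries: RTP packet k groups the k-th ESC of the 15 Access Units."""
    return [
        [au * SL_PACKETS_PER_AU + esc for au in range(AUS_PER_SEQUENCE)]
        for esc in range(SL_PACKETS_PER_AU)
    ]


def mslh_sequence_fields(seq_nums):
    """Split a packet's sequence numbers into the absolute SLPSeqNum of its
    first SL packet and the SLPSeqNumDelta values of the following ones
    (the receiver adds 1 automatically, hence the subtraction here)."""
    first = seq_nums[0]
    deltas = [b - a - 1 for a, b in zip(seq_nums, seq_nums[1:])]
    return first, deltas


def sequence_duration_ms():
    """Duration covered by one interleaving sequence."""
    return AUS_PER_SEQUENCE * AU_DURATION_MS


if __name__ == "__main__":
    for packet in interleave_sequence():
        first, deltas = mslh_sequence_fields(packet)
        print(first, sorted(set(deltas)))
    print(sequence_duration_ms())
```

With this numbering the first sequence numbers are 0, 1, 2 and 3 and every delta is 3, so both fields fit in 2 bits, and one sequence covers a little under 350 ms of audio.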
2124 One feature demonstrated by this example is the degradation 2125 priority. We assume degradation priority can take 4 different 2126 values, mapped to Error Sensitivity Categories, and is encoded on 2 2127 bits. This interleaving scheme makes sure that only SL packets of 2128 identical degradation priorities are grouped in the same RTP packet 2129 (3.6.3) and that only the first RSLH of each RTP packet transports 2130 the degradation priority. 2132 We also assume that for each last SL packet of each RTP packet the 2133 server inserts an OCR. 2135 SLConfigDescriptor 2137 In this example the SLConfigDescriptor is: 2139 class SLConfigDescriptor extends BaseDescriptor : bit(8) 2140 tag=SLConfigDescrTag { 2141 bit(8) predefined; 2142 if (predefined==0) { 2143 bit(1) useAccessUnitStartFlag; = 1 2144 bit(1) useAccessUnitEndFlag; = 1 2145 bit(1) useRandomAccessPointFlag; = 0 2146 bit(1) hasRandomAccessUnitsOnlyFlag; = 1 2147 bit(1) usePaddingFlag; = 1 // we need to signal padding bits 2148 bit(1) useTimeStampsFlag; = 0 2149 bit(1) useIdleFlag; = 0 2150 bit(1) durationFlag; = 1 2151 bit(32) timeStampResolution; = 0 2153 Gentric et al. 
Expires November 2001 40 2154 bit(32) OCRResolution; = 30 2155 bit(8) timeStampLength; = 0 2156 bit(8) OCRLength; = 32 2157 bit(8) AU_Length; = 0 2158 bit(8) instantBitrateLength; = 0 2159 bit(4) degradationPriorityLength; = 2 2160 bit(5) AU_seqNumLength; = 0 2161 bit(5) packetSeqNumLength; = 6 2162 bit(2) reserved=0b11; 2163 } 2164 if (durationFlag) { 2165 bit(32) timeScale; = 1000// milliseconds 2166 bit(16) accessUnitDuration; = 23.22 // ms 2167 bit(16) compositionUnitDuration; = 23.22 // ms 2168 } 2169 if (!useTimeStampsFlag) { 2170 bit(timeStampLength) startDecodingTimeStamp; = 0 2171 bit(timeStampLength) startCompositionTimeStamp; = 0 2172 } 2173 } 2175 SL Packet Header structure 2177 With this configuration we have the following SL packet header 2178 structure: 2180 aligned(8) class SL_PacketHeader (SLConfigDescriptor SL) { 2181 bit(1) accessUnitStartFlag; 2182 bit(1) accessUnitEndFlag; 2183 bit(1) OCRflag; 2184 bit(1) paddingFlag; 2185 if (paddingFlag) bit(3) paddingBits; 2186 bit(SL.packetSeqNumLength) packetSequenceNumber; 2187 bit(1) DegPrioflag; 2188 if (DegPrioflag) { 2189 bit(SL.degradationPriorityLength) degradationPriority;} 2190 if (OCRflag) { 2191 bit(SL.OCRLength) objectClockReference;} 2192 } 2193 } 2195 Parameters 2197 The RSLHSectionSize cannot exceed 2 bits, which is encoded on 2 bits 2198 and signaled by RSLHSectionSizeLength. 2200 The resulting concatenated fmtp line is: 2202 a=fmtp: 2203 SLPPSizeLength=6;RSLHSectionSizeLength=2;SLPSeqNumLength=2;SLPSeqNum 2204 DeltaLength=2;OCRDeltaLength=16 2206 RTP packet structure 2208 Gentric et al. 
+=========================================+=============+
| Field                                   | size        |
+=========================================+=============+
| RTP header                              | -           |
+-----------------------------------------+-------------+
MSLHSection
+=========================================+=============+
| MSLHSection size in bits = 135          | 2 bytes     |
+-----------------------------------------+-------------+
| SLPPayloadSize                          | 7 bits      |
+-----------------------------------------+-------------+
| SLPSeqNum = 0 or 1 or 2 or 3            | 2 bits      |
+-----------------------------------------+-------------+
| SLPPayloadSize                          | 7 bits      |
+-----------------------------------------+-------------+
| SLPSeqDeltaNum = 3                      | 2 bits      |
+-----------------------------------------+-------------+
| etc + 12 times 9 bits                                 |
+-----------------------------------------+-------------+
| SLPPayloadSize                          | 7 bits      |
+-----------------------------------------+-------------+
| SLPSeqDeltaNum = 3                      | 2 bits      |
+-----------------------------------------+-------------+
| bits to byte alignment                  | 7 bits      |
+-----------------------------------------+-------------+
RSLHSection
+=========================================+=============+
| RSLHSectionSize                         | 6 bits      |
+-----------------------------------------+-------------+
| accessUnitStartFlag                     | 1 bit       |
+-----------------------------------------+-------------+
| accessUnitEndFlag                       | 1 bit       |
+-----------------------------------------+-------------+
| OCRFlag = 0                             | 1 bit       |
+-----------------------------------------+-------------+
| paddingFlag = 1                         | 1 bit       |
+-----------------------------------------+-------------+
| paddingBits                             | 3 bits      |
+-----------------------------------------+-------------+
| DegPrioflag = 1                         | 1 bit       |
+-----------------------------------------+-------------+
| degradationPriority                     | 2 bits      |
+-----------------------------------------+-------------+
| accessUnitStartFlag                     | 1 bit       |
+-----------------------------------------+-------------+
| accessUnitEndFlag                       | 1 bit       |
+-----------------------------------------+-------------+
| OCRFlag = 0                             | 1 bit       |
+-----------------------------------------+-------------+
| paddingFlag = 1                         | 1 bit       |
+-----------------------------------------+-------------+
| paddingBits                             | 3 bits      |
+-----------------------------------------+-------------+
| DegPrioflag = 0                         | 1 bit       |
+-----------------------------------------+-------------+
| etc + 12 times 8 bits                                 |
+-----------------------------------------+-------------+
| accessUnitStartFlag                     | 1 bit       |
+-----------------------------------------+-------------+
| accessUnitEndFlag                       | 1 bit       |
+-----------------------------------------+-------------+
| OCRFlag = 1                             | 1 bit       |
+-----------------------------------------+-------------+
| OCRDelta                                | 16 bits     |
+-----------------------------------------+-------------+
| paddingFlag = 0                         | 1 bit       |
+-----------------------------------------+-------------+
| DegPrioflag = 0                         | 1 bit       |
+-----------------------------------------+-------------+
| bits to byte alignment                  | 5 bits      |
+-----------------------------------------+-------------+
SLPPSection
+=========================================+=============+
| SL packet payload                       | max 90 bytes|
+-----------------------------------------+-------------+
| etc + 13 SL packets                                   |
+-----------------------------------------+-------------+
| SL packet payload                       | max 90 bytes|
+-----------------------------------------+-------------+

Note that in the above table the last SL packet in the RTP packet
has a payload that is byte-aligned (at the end). When this happens,
paddingFlag is set to zero and the paddingBits field is omitted.

Overhead estimation

The MSLHSection is 19 bytes and the RSLHSection is 16 bytes; in this
example we therefore have an RTP overhead of 40 + 35 bytes for 1350
bytes (max) of payload, i.e., around 6% overhead.
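
The overhead estimate above can be checked with a few lines of
arithmetic. The following Python sketch is illustrative only and is not
part of the payload format. It takes the packet count (15), the 9-bit
MSLHSection entries, the 2-byte MSLHSection size field, and the 16-byte
RSLHSection from this example. All constant and function names are
hypothetical, and the split of the 40 bytes into IPv4, UDP and RTP
headers is an assumption that this example does not state.

```python
import math

# Figures read from the example above.
NUM_SL_PACKETS = 15            # SL packets carried in one RTP packet
MAX_SL_PAYLOAD_BYTES = 90      # maximum payload of each SL packet
MSLH_SIZE_FIELD_BITS = 16      # 2-byte MSLHSection size field
MSLH_ENTRY_BITS = 7 + 2        # SLPPayloadSize + SLPSeqNum (or its delta)
RSLH_SECTION_BYTES = 16        # RSLHSection size as stated in the text

# Assumption (not stated in this example): the 40 bytes are the
# IPv4 (20) + UDP (8) + RTP (12) headers.
TRANSPORT_HEADER_BYTES = 20 + 8 + 12


def mslh_section_bytes(num_packets: int) -> int:
    """Size of the MSLHSection, rounded up to a whole number of bytes."""
    bits = MSLH_SIZE_FIELD_BITS + num_packets * MSLH_ENTRY_BITS
    return math.ceil(bits / 8)


def overhead_bytes(num_packets: int) -> int:
    """Transport headers plus both header sections, in bytes.

    The RSLHSection size is kept at the value stated in the text and is
    not recomputed from num_packets.
    """
    return (TRANSPORT_HEADER_BYTES
            + mslh_section_bytes(num_packets)
            + RSLH_SECTION_BYTES)


def overhead_ratio(num_packets: int) -> float:
    """Overhead relative to the maximum SL payload carried."""
    payload = num_packets * MAX_SL_PAYLOAD_BYTES
    return overhead_bytes(num_packets) / payload


if __name__ == "__main__":
    print(mslh_section_bytes(NUM_SL_PACKETS))
    print(overhead_bytes(NUM_SL_PACKETS))
    print(f"{overhead_ratio(NUM_SL_PACKETS):.1%}")
```

With these inputs the sketch gives a 19-byte MSLHSection, 75 bytes of
total overhead, and a ratio of about 5.6% against the 1350-byte maximum
payload, which agrees with the "around 6%" figure.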