idnits 2.17.1 draft-gentric-avt-mpeg4-multisl-02.txt: -(942): Line appears to be too long, but this could be caused by non-ascii characters in UTF-8 encoding -(982): Line appears to be too long, but this could be caused by non-ascii characters in UTF-8 encoding Checking boilerplate required by RFC 5378 and the IETF Trust (see https://trustee.ietf.org/license-info): ---------------------------------------------------------------------------- ** Looks like you're using RFC 2026 boilerplate. This must be updated to follow RFC 3978/3979, as updated by RFC 4748. Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt: ---------------------------------------------------------------------------- ** The document seems to lack a 1id_guidelines paragraph about 6 months document validity -- however, there's a paragraph with a matching beginning. Boilerplate error? ** The document is more than 15 pages and seems to lack a Table of Contents. == There are 6 instances of lines with non-ascii characters in the document. == No 'Intended status' indicated for this document; assuming Proposed Standard == The page length should not exceed 58 lines per page, but there was 39 longer pages, the longest (page 2) being 59 lines Checking nits according to https://www.ietf.org/id-info/checklist : ---------------------------------------------------------------------------- ** The document seems to lack an IANA Considerations section. (See Section 2.2 of https://www.ietf.org/id-info/checklist for how to handle the case when there are no actions for IANA.) ** The document seems to lack separate sections for Informative/Normative References. All references will be assumed normative when checking for downward references. == There are 1 instance of lines with non-RFC6890-compliant IPv4 addresses in the document. If these are example addresses, they should be changed. 
Miscellaneous warnings: ---------------------------------------------------------------------------- == Line 177 has weird spacing: '... media unawa...' -- The document seems to lack a disclaimer for pre-RFC5378 work, but may have content which was first submitted before 10 November 2008. If you have contacted all the original authors and they are all willing to grant the BCP78 rights to the IETF Trust, then this is fine, and you can ignore this comment. If not, you may need to add the pre-RFC5378 disclaimer. (See the Legal Provisions document at https://trustee.ietf.org/license-info for more information.) -- The document date (March 2001) is 8442 days in the past. Is this intentional? Checking references for intended status: Proposed Standard ---------------------------------------------------------------------------- (See RFCs 3967 and 4897 for information about using normative references to lower-maturity documents in RFCs) == Unused Reference: '7' is defined on line 1999, but no explicit reference was found in the text == Unused Reference: '10' is defined on line 2012, but no explicit reference was found in the text -- Possible downref: Non-RFC (?) normative reference: ref. '1' -- Possible downref: Non-RFC (?) normative reference: ref. '2' -- Possible downref: Non-RFC (?) normative reference: ref. '3' -- Possible downref: Non-RFC (?) normative reference: ref. '4' ** Obsolete normative reference: RFC 1889 (ref. '5') (Obsoleted by RFC 3550) ** Obsolete normative reference: RFC 3016 (ref. '7') (Obsoleted by RFC 6416) == Outdated reference: A later version (-08) exists of draft-ietf-avt-tcrtp-02 == Outdated reference: A later version (-04) exists of draft-singer-mpeg4-ip-01 -- Possible downref: Normative reference to a draft: ref. '9' ** Obsolete normative reference: RFC 2327 (ref. '10') (Obsoleted by RFC 4566) == Outdated reference: A later version (-03) exists of draft-curet-avt-rtp-mpeg4-flexmux-00 -- Possible downref: Normative reference to a draft: ref. 
'11' Summary: 8 errors (**), 0 flaws (~~), 10 warnings (==), 8 comments (--). Run idnits with the --verbose option for more detailed information about the items above. -------------------------------------------------------------------------------- 2 Internet Engineering Task Force Avaro-France Telecom 3 Internet Draft Basso-AT&T 4 Casner-Packet Design 5 Civanlar-AT&T 6 Gentric-Philips 7 Herpel-Thomson 8 Lifshitz-Optibase 9 Lim-mp4cast 10 Perkins-ISI 11 van der Meer-Philips 12 March 2001 13 Expires Sept. 2001 14 Document: draft-gentric-avt-mpeg4-multisl-02.txt 16 RTP Payload Format for MPEG-4 Streams 18 Status of this Memo 20 This document is an Internet-Draft and is in full conformance with 21 all provisions of Section 10 of RFC2026. 23 Internet-Drafts are working documents of the Internet Engineering 24 Task Force (IETF), its areas, and its working groups. Note that 25 other groups may also distribute working documents as Internet- 26 Drafts. Internet-Drafts are draft documents valid for a maximum of 27 six months and may be updated, replaced, or obsoleted by other 28 documents at any time. It is inappropriate to use Internet- Drafts 29 as reference material or to cite them other than as "work in 30 progress." 32 The list of current Internet-Drafts can be accessed at 33 http://www.ietf.org/ietf/1id-abstracts.txt 34 The list of Internet-Draft Shadow Directories can be accessed at 35 http://www.ietf.org/shadow.html. 37 Abstract 39 This document describes a payload format for transporting MPEG-4 40 encoded data using RTP. MPEG-4 is a recent standard from ISO/IEC for 41 the coding of natural and synthetic audio-visual data. Several 42 services provided by RTP are beneficial for MPEG-4 encoded data 43 transport over the Internet. Additionally, the use of RTP makes it 44 possible to synchronize MPEG-4 data with other real-time data types. 
46 This specification is a product of the Audio/Video Transport working 47 group within the Internet Engineering Task Force and ISO/IEC MPEG-4 48 ad hoc group on MPEG-4 over Internet. Comments are solicited and 50 Gentric et al. Expires September 2001 1 51 should be addressed to the working group's mailing list at rem- 52 conf@es.net and/or the authors. 54 1. Introduction 56 MPEG-4 is a recent standard from ISO/IEC for the coding of natural 57 and synthetic audio-visual data in the form of audiovisual objects 58 that are arranged into an audiovisual scene by means of a scene 59 description [1][2][3][4]. This draft specifies an RTP [5] payload 60 format for transporting MPEG-4 encoded data streams. 62 The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", 63 "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in 64 this document are to be interpreted as described in RFC 2119 [6]. 66 The benefits of using RTP for MPEG-4 data stream transport include: 68 i. Ability to synchronize MPEG-4 streams with other RTP payloads 70 ii. Monitoring MPEG-4 delivery performance through RTCP 72 iii. Combining MPEG-4 and other real-time data streams received from 73 multiple end-systems into a set of consolidated streams through RTP 74 mixers 76 iv. Converting data types, etc. through the use of RTP translators. 78 1.1 Overview of MPEG-4 End-System Architecture 80 Fig. 1 below shows the general layered architecture of MPEG-4 81 terminals. The Compression Layer processes individual audio-visual 82 media streams. The MPEG-4 compression schemes are defined in the 83 ISO/IEC specifications 14496-2 [2] and 14496-3 [3]. The compression 84 schemes in MPEG-4 achieve efficient encoding over a bandwidth 85 ranging from several kbps to many Mbps. The audio-visual content 86 compressed by this layer is organized into Elementary Streams (ESs). 87 The MPEG-4 standard specifies MPEG-4 compliant streams. 
Within the 88 constraint of this compliance the compression layer is unaware of a 89 specific delivery technology, but it can be made to react to the 90 characteristics of a particular delivery layer such as the path-MTU 91 or loss characteristics. Also, some compressors can be designed to 92 be delivery specific for implementation efficiency. In such cases 93 the compressor may work in a non-optimal fashion with delivery 94 technologies that are different than the one it is specifically 95 designed to operate with. 97 The hierarchical relations, location and properties of ESs in a 98 presentation are described by a dynamic set of Object Descriptors 99 (ODs). Each OD groups one or more ES Descriptors referring to a 100 single content item (audio-visual object). Hence, multiple 101 alternative or hierarchical representations of each content item are 102 possible. 104 Gentric et al. Expires July 2001 2 105 ODs are themselves conveyed through one or more ESs. A complete set 106 of ODs can be seen as an MPEG-4 resource or session description at a 107 stream level. The resource description may itself be hierarchical, 108 i.e. an ES conveying an OD may describe other ESs conveying other 109 ODs. 111 The session description is accompanied by a dynamic scene 112 description, Binary Format for Scene (BIFS), again conveyed through 113 one or more ESs. At this level, content is identified in terms of 114 audio-visual objects. The spatio-temporal location of each object is 115 defined by BIFS. The audio-visual content of those objects that are 116 synthetic and static are described by BIFS also. Natural and 117 animated synthetic objects may refer to an OD that points to one or 118 more ESs that carry the coded representation of the object or its 119 animation data. 
121 By conveying the session (or resource) description as well as the
122 scene (or content composition) description through their own ESs, it
123 is made possible to change portions of the content composition and
124 the number and properties of media streams that carry the audio-
125 visual content separately and dynamically at well known instants in
126 time.

128 One or more initial Scene Description streams and the corresponding
129 OD stream have to be pointed to by an initial object descriptor
130 (IOD). The IOD needs to be made available to the receivers through
131 some out-of-band means that are not defined in this document.

133 A homogeneous encapsulation of ESs carrying media or control (ODs,
134 BIFS) data is defined by the Sync Layer (SL) that primarily provides
135 the synchronization between streams. The Compression Layer organizes
136 the ESs in Access Units (AU), the smallest elements that can be
137 attributed individual timestamps. Integer or fractional AUs are then
138 encapsulated in SL packets. All consecutive data from one stream is
139 called an SL-packetized stream at this layer. The interface between
140 the compression layer and the SL is called the Elementary Stream
141 Interface (ESI). The ESI is informative.

143 The Delivery Layer in MPEG-4 consists of the Delivery Multimedia
144 Integration Framework (DMIF) defined in ISO/IEC 14496-6 [4]. This layer is
145 media unaware but delivery technology aware. It provides transparent
146 access to and delivery of content irrespective of the technologies
147 used. The interface between the SL and DMIF is called the DMIF
148 Application Interface (DAI). It offers content location independent
149 procedures for establishing MPEG-4 sessions and access to transport
150 channels. The specification of this payload format is considered as
151 a part of the MPEG-4 Delivery Layer.
153 media aware        +-----------------------------------------+
154 delivery unaware   |            COMPRESSION LAYER            |
155 14496-2 Visual     |streams from as low as Kbps to multi-Mbps|
156 14496-3 Audio      +-----------------------------------------+

158 Gentric et al. Expires July 2001 3
159                                                    Elementary
160                                                    Stream
161 ===================================================Interface

163                                                    (ESI)
164                    +-------------------------------------------+
165 media and          |                SYNC LAYER                 |
166 delivery unaware   | manages elementary streams, their synch-  |
167 14496-1 Systems    |   ronization and hierarchical relations   |
168                    +-------------------------------------------+

170                                                     DMIF
171                                                     Application
172 ====================================================Interface

174                                                     (DAI)
175                    +-------------------------------------------+
176 delivery aware     |              DELIVERY LAYER               |
177 media unaware      |provides transparent access to and delivery|
178 14496-6 DMIF       |    of content irrespective of delivery    |
179                    |                technology                 |
180                    +-------------------------------------------+

182 Figure 1: General MPEG-4 terminal architecture

184 1.2 MPEG-4 Elementary Stream Data Packetization

186 The ESs from the encoders are fed into the SL with indications of AU
187 boundaries, random access points, desired composition time and the
188 current time.

190 The Sync Layer fragments the ESs into SL packets, each containing a
191 header that encodes information conveyed through the ESI. If the AU
192 is larger than an SL packet, subsequent packets containing remaining
193 parts of the AU are generated with subset headers until the complete
194 AU is packetized.

196 The syntax of the Sync Layer is configurable and can be adapted to
197 the needs of the stream to be transported. This includes the
198 possibility to select the presence or absence of individual syntax
199 elements as well as configuration of their length in bits.
The 200 configuration for each individual stream is conveyed in a 201 SLConfigDescriptor, which is an integral part of the ES Descriptor 202 for this stream. 204 It is assumed that the MPEG-4 SLConfigDescriptor is transported "out 205 of band". This is typically done via an ObjectDescriptorStream using 206 the MPEG-4 Object Description framework. However since some 207 knowledge of the SLConfigDescriptor is required by an RTP receiver 208 in order to parse MPEG-4 System specific elements in the RTP payload 209 defined in this document, the SLConfigDescriptor MAY be transported 211 Gentric et al. Expires July 2001 4 212 in the SDP associated with such a stream using the a=fmtp syntax 213 (see section 8). 215 2. Analysis of the carriage of MPEG-4 over IP 217 When transporting MPEG-4 audio and video, applications may or may 218 not require the use of MPEG-4 systems. To achieve the highest level 219 of interoperability between all MPEG-4 applications, it is desirable 220 that (a) in both cases the same MPEG-4 transport format can be used 221 and that (b) receivers that have no MPEG-4 system knowledge can 222 easily skip the MPEG-4 system specific information -if any-. 224 RTP is perfectly suitable to transport MPEG-4 audio and MPEG-4 225 video, but when using MPEG-4 systems a problem arises from the fact 226 that both RTP and MPEG-4 systems contain a synchronization layer. 227 In particular, the RTP header duplicates some of the information 228 provided in SL packet headers such as the composition timestamps 229 (CTSs) and the marker bit that signals the end of access units. 231 To avoid unnecessary overhead and potential interoperability risks 232 when transporting MPEG-4 systems, it is desirable to remove the 233 redundancy between the SL packet header and the RTP packet header. 234 To be independent on the use of MPEG-4 systems, synchronization can 235 rely on the parameters provided in the RTP header. 
237 In case SL headers are used, the redundant fields are removed from
238 the SL header, producing "reduced SL headers".
239 The remaining information from the SL header, if any, is contained
240 inside the RTP packet payload, together with the SL packet payload.
241 The combination of RTP packet headers and reduced SL packet headers
242 can be used to logically map the RTP packets to complete SL packets.

244 Some of the information contained in the reduced SL headers is also
245 useful for transport over RTP when MPEG-4 systems is not used.

247 For that reason the information in the "reduced" SL headers is split
248 into "generally useful information" and "MPEG-4 systems only
249 information".

251 The "generally useful information", hereinafter called Mapped SL Packet
252 Header (MSLH), is carried by a number of fields configurable using
253 SDP parameters; all receivers can parse these fields.

255 The "MPEG-4 systems only information" -if any- is contained in a
256 reduced SL header, hereinafter called Remaining SL Packet Header
257 (RSLH), also signaled by SDP parameters and preceded by a length
258 field, so as to enable easy skipping of this information by non-
259 MPEG-4 system devices.

261 This is depicted in figure 2.

263 Gentric et al.
Expires July 2001 5
264                             <----------SL Packet-------->

266                             +-+-+-+-+-+-+-+-+-+-+-+-+-+-+
267                             | SL Packet   |  SL Packet  |
268                             |   Header    |   Payload   |
269                             +-+-+-+-+-+-+-+-+-+-+-+-+-+-+
270                                    |              |
271                                    |              |
272       ++++++++++++++++++++++++++++++              |
273       |             |              |              |
274       V             V              V              V
275 +-+-+-+-+-+-+ +-+-+-+-+-+-+ +-+-+-+-+-+-+-+ +-+-+-+-+-+-+
276 |RTP Packet | | Mapped SL | | Remaining SL| | SL Packet |
277 |  Header   | |  Header   | |   Header    | |  Payload  |
278 +-+-+-+-+-+-+ +-+-+-+-+-+-+ +-+-+-+-+-+-+-+ +-+-+-+-+-+-+

280               <----RTP Packet Payload------------------->

282 Figure 2: Mapping of SL Packet into RTP packet

284 This RTP payload format has been designed so that it can be
285 configured (using SDP parameters) to be identical to RFC 3016 for
286 the recommended MPEG-4 video configurations. Hence receivers that
287 comply with this payload specification can decode such RTP payloads.

289 3. Payload Format

291 The RTP Payload corresponds to an integer number of SL packets.

293 SL packets inside RTP packets MUST be in the SL stream order, i.e.:
294 i) decodingTimeStamp order, if present
295 ii) packetSequenceNumber order, if present
296 iii) Implicit decoding order in all other cases.

298 The SL Packet Headers are transformed into RSLH with some fields
299 extracted to be mapped in the RTP header and others extracted to be
300 mapped in the corresponding MSLH. The SL Packet Payload is
301 unchanged.

303 When generating an SL-packetized stream specifically for this format
304 all other fields in the SL Packet Headers that the RTP header does
305 not duplicate (including the decodingTimeStamp) are OPTIONAL.

307 This payload format has two modes. The "Single-SL" mode is a mode
308 where a single SL packet is transported per RTP packet. The
309 "Multiple-SL" mode is a mode where more than one SL packet can be
310 transported per RTP packet. The default mode is the Single-SL mode.
311 The mode can be set to Multiple-SL by adding in SDP a SLPPSize or
312 SLPPSizeLength parameter (see section 8).

314 Gentric et al.
Expires July 2001 6
315 RTP Packets SHOULD be sent in the decoding (MPEG-4
316 decodingTimeStamp) order.

318 The size (or number) of the SL packet(s) SHOULD be adjusted such
319 that the resulting RTP packet is not larger than the path-MTU. To
320 handle larger packets, this payload format relies on lower layers
321 for fragmentation, which may not be desirable.

323 3.1 RTP Header Fields Usage

325 Payload Type (PT): The assignment of an RTP payload type for this
326 new packet format is outside the scope of this document, and will
327 not be specified here. It is expected that the RTP profile for a
328 particular class of applications will assign a payload type for this
329 encoding, or if that is not done then a payload type in the dynamic
330 range shall be chosen.

332 Marker (M) bit: The M bit is set to 1 when every SL packet in the RTP
333 packet ends an Access Unit, i.e. the M bit maps to the SL
334 accessUnitEndFlag.

336 M is set to 1 when the RTP packet contains either:
337 . a single SL packet containing a full Access Unit
338 . a single SL packet transporting the last fragment of an Access
339 Unit
340 . multiple SL packets each containing a full Access Unit
341 . multiple SL packets each containing the last fragment of an Access
342 Unit
343 . multiple SL packets each containing either a full Access Unit or
344 the last fragment of an Access Unit

346 The last 2 cases occur when using specific interleaving schemes. In
347 some interleaving schemes it may not be practical to reshuffle the
348 SL packets so as to group Access Unit ends in the same RTP packet.
349 In that case, Access Unit boundaries -if needed- can be transported
350 using one or both of the SL flags accessUnitStartFlag and
351 accessUnitEndFlag.

353 Extension (X) bit: Defined by the RTP profile used.
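
The marker-bit rule given above for section 3.1 can be sketched in a few lines of Python. This is only an illustration: the SLPacket class and the marker_bit function are hypothetical names introduced here, and the only SL field modeled is accessUnitEndFlag.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class SLPacket:
    """Hypothetical stand-in for an SL packet; only one flag is modeled."""
    access_unit_end: bool  # mirrors the SL accessUnitEndFlag


def marker_bit(sl_packets: Sequence[SLPacket]) -> int:
    """Return 1 only if every SL packet in the RTP packet ends an Access Unit."""
    if not sl_packets:
        raise ValueError("an RTP packet carries at least one SL packet")
    return 1 if all(p.access_unit_end for p in sl_packets) else 0
```

With this rule, an RTP packet that mixes an Access Unit end with a non-final fragment gets M=0, which is the situation where the text suggests carrying accessUnitStartFlag or accessUnitEndFlag explicitly.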
355 Sequence Number: The RTP sequence number should be generated by the 356 sender with a constant random offset and does not have to be 357 correlated to any (optional) MPEG-4 SL sequence numbers. 359 Timestamp: Set to the value in the compositionTimeStamp field of the 360 first SL packet, if present. If compositionTimeStamp has less than 361 32 bits length, the MSBs of timestamp MUST be set to zero. 363 Although it is available from the SL configuration data, the 364 resolution of the timestamp may need to be conveyed explicitly 365 through some out-of-band means to be used by network elements which 366 are not MPEG-4 aware. 368 Gentric et al. Expires July 2001 7 369 If compositionTimeStamp has more than 32 bits length, this payload 370 format cannot be used. 372 In all cases, the sender SHALL always make sure that RTP time stamps 373 are identical only for RTP packets transporting fragments of the 374 same Access Unit. 376 In case compositionTimeStamp is not present in the current SL 377 packet, but has been present in a previous SL packet the reason is 378 that this is the same Access Unit that has been fragmented therefore 379 the same timestamp value MUST be taken as RTP timestamp. 381 If compositionTimeStamp is never present in SL packets for this 382 stream, the RTP packetizer SHOULD convey a reading of a local clock 383 at the time the RTP packet is created. 385 According to RFC1889 [5, Section 5.1] timestamps are recommended to 386 start at a random value for security reasons. However then, a 387 receiver is not in the general case able to reconstruct the original 388 MPEG-4 Time Stamps (CTS, DTS, OCR) which can be of use for 389 applications where streams from multiple sources are to be 390 synchronized. Therefore the usage of such a random offset SHOULD be 391 avoided. 
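
The timestamp rules above can be summarized as a small decision function. The sketch below is an assumption-laden illustration, not part of the format: the function name, its parameters, and the choice to truncate the local clock reading to 32 bits are all introduced here.

```python
from typing import Optional

RTP_TIMESTAMP_BITS = 32


def rtp_timestamp(cts: Optional[int],
                  cts_length_bits: int,
                  same_au_timestamp: Optional[int],
                  local_clock: int) -> int:
    """Pick the RTP timestamp for a packet (hypothetical helper).

    cts               -- compositionTimeStamp of the first SL packet, or None
    cts_length_bits   -- configured length of compositionTimeStamp in bits
    same_au_timestamp -- timestamp already used for an earlier fragment of
                         the same Access Unit, or None
    local_clock       -- local clock reading, used only when the stream
                         never carries a compositionTimeStamp
    """
    if cts_length_bits > RTP_TIMESTAMP_BITS:
        raise ValueError("compositionTimeStamp exceeds 32 bits: "
                         "this payload format cannot be used")
    if cts is not None:
        # A shorter CTS occupies the low-order bits; the MSBs stay zero.
        return cts
    if same_au_timestamp is not None:
        # Fragments of one Access Unit share a single timestamp.
        return same_au_timestamp
    # Assumption: truncate the clock reading to the 32-bit field.
    return local_clock & 0xFFFFFFFF
```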
393 Note that since RTP devices may re-stamp the stream, all time stamps 394 inside of the RTP payload (CTS and DTS in MSLH, OCR in RSLH) MUST be 395 expressed as difference to the RTP time stamp. Since this 396 subtraction may lead to negative values, the offset MUST be encoded 397 as a two's complement signed integer in network byte order. Note 398 these offsets (delta) typically require much fewer bits to be 399 encoded than the original length, which is another justification. 401 SSRC: set as described in RFC1889 [5]. A mapping between the ES 402 identifiers (ESIDs) and SSRCs should be provided through out-of-band 403 means. 405 CC and CSRC fields are used as described in RFC 1889 [5]. 407 RTCP SHOULD be used as defined in RFC 1889 [5]. 409 RTP timestamps in RTCP SR packets: according to the RTP timing 410 model, the RTP timestamp that is carried into an RTCP SR packet is 411 the same as the compositionTimeStamp that would be applied to an RTP 412 packet for data that was sampled at the instant the SR packet is 413 being generated and sent. The RTP timestamp value is calculated from 414 the NTP timestamp for the current time, which also goes in the RTCP 415 SR packet. To perform that calculation, an implementation needs to 416 periodically establish a correspondence between the CTS value of a 417 data packet and the NTP time at which that data was sampled. 419 Gentric et al. Expires July 2001 8 420 3.2 RTP payload structure 422 The packet payload structure consists of 3 byte-aligned sections. 424 The first section is the MSLHSection and contains Mapped SL Packet 425 Headers (MSLH). The MSLH structure is described in 3.3. In the 426 Single-SL mode this section is empty by default. 428 The second section is the RSLHSection and contains Remaining SL 429 Headers (RSLH). The RSLH structure is described in 3.5. By default 430 this section is empty. 432 The last section (SLPPSection) contains the SL packet payloads. This 433 section is never empty. 
435 The Nth MSLH in the MSLHSection, the Nth RSLH in the RSLHSection and
436 the Nth SL packet payload in the SLPPSection correspond to the Nth
437 SL packet transported by the RTP packet.

439  0                   1                   2                   3
440  0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
441 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
442 |V=2|P|X|  CC   |M|     PT      |       sequence number         |
443 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
444 |                           timestamp                           |
445 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
446 |           synchronization source (SSRC) identifier            |
447 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
448 :            contributing source (CSRC) identifiers             :
449 +=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+
450 |                                                               |
451 |                  MSLHSection (byte aligned)                   |
452 |                                                               |
453 +                               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
454 |                               |                               |
455 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+                               |
456 |                                                               |
457 |                  RSLHSection (byte aligned)                   |
458 |                                                               |
459 +               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
460 |               |                                               |
461 +-+-+-+-+-+-+-+-+                                               |
462 |                                                               |
463 |                  SLPPSection (byte aligned)                   |
464 |                                                               |
465 |                               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
466 |                               :...OPTIONAL RTP padding        |
467 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

469 Figure 3: An RTP packet for MPEG-4

471 Gentric et al. Expires July 2001 9
472 3.3 MSLHSection structure

474 If the MSLHSection consumes a non-integer number of bytes, up to 7
475 zero padding bits MUST be inserted at the end in order to achieve
476 byte-alignment.

478 In the Single-SL mode the MSLHSection consists of a single MSLH.

480 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
481 | MSLH (x bits )   | padding bits |
482 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

484 Figure 4: MSLHSection structure in Single-SL mode

486 In the Multiple-SL mode this section consists of a 2-byte field
487 giving the size in bits (in network byte order) of the following
488 block of bit-wise concatenated MSLHs.
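
As an illustration of the Multiple-SL layout just described, the following Python sketch computes how many bytes the MSLHSection occupies at the start of an RTP payload. The function name is hypothetical, and the sketch assumes that the 2-byte size field counts only the MSLH bits, not the size field itself or the trailing padding bits.

```python
import struct


def mslh_section_length(payload: bytes) -> int:
    """Bytes used by the MSLHSection in Multiple-SL mode (hypothetical helper)."""
    if len(payload) < 2:
        raise ValueError("payload too short for the 2-byte size field")
    # The size field is a 16-bit unsigned integer in network byte order.
    (size_bits,) = struct.unpack_from("!H", payload, 0)
    # The concatenated MSLHs are zero-padded up to the next byte boundary.
    mslh_bytes = (size_bits + 7) // 8
    total = 2 + mslh_bytes
    if total > len(payload):
        raise ValueError("MSLH section size exceeds the RTP payload")
    return total
```

A receiver could use the returned length to locate the start of the RSLHSection without parsing the individual MSLHs.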
490 This size field is absent in the Single-SL mode not because it is
491 not needed (which would be a minor gain) but for compatibility with
492 RFC 3016.

494  0                   1                   2                   3
495  0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
496 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
497 | MSLH section size in bits     | MSLH          | etc           |
498 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+               |
499 |              as many bit-wise concatenated MSLHs              |
500 |               as SL packets in this RTP packet                |
501 |                                 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
502 |                                 |padding bits                 |
503 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

505 Figure 5: MSLHSection structure in Multiple-SL mode

507 3.4 MSLH structure

509 The Mapped SL Packet Header content depends on SDP parameters. By
510 default it is empty for the Single-SL mode and contains only the
511 SLPPayloadSize (SL Packet Payload Size) field in the Multiple-SL
512 mode.

514 When all options are signaled in SDP the MSLH structure is given in
515 figure 6.

517 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
518 |            SLPPayloadSize             |
519 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
520 |       SLPSeqNum/SLPSeqNumDelta        |
521 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

523 Gentric et al. Expires July 2001 10
524 |                CTSFlag                |
525 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
526 |               CTSDelta                |
527 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
528 |                DTSFlag                |
529 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
530 |               DTSDelta                |
531 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

533 Figure 6: Mapped SL Packet Header (MSLH) structure

535 In the general case a receiver can only discover the size of a MSLH
536 by parsing it since for example the presence of CTSDelta is signaled
537 by the value of CTSFlag.

539 3.4.1 Fields of MSLH

541 SLPPayloadSize (SL Packet Payload Size): Indicates the size in bytes
542 of the associated SL Packet Payload, which can be found in the
543 SLPPSection of the RTP packet.
The length in bits of this field is 544 signaled in SDP by the SLPPSizeLength SDP parameter (see section 8). 546 SLPSeqNum/SLPSeqNumDelta: Encodes the packetSequenceNumber (serial 547 number) of the SL Packet. 549 SLPSeqNum is found only for the first SL packet. SLPSeqNumDelta is 550 optional and -if present- appears for subsequent (non-first) SL 551 packets. 553 The length in bits of the SLPSeqNum field is defined by the 554 SLPSeqNumLength SDP parameter (see section 8). 556 The length in bits of the SLPSeqNumDelta field is defined by the 557 SLPSeqNumDeltaLength SDP parameter (see section 8). 559 If the parameter SLPSeqNumDeltaLength is defined in SDP, non-first 560 SL packets have their packetSequenceNumber encoded as a difference 561 named SLPSeqNumDelta. This difference is relative to the previous SL 562 packet in the RTP packet according to (with i>=0): 563 packetSequenceNumber(0) = SLPSeqNum(0) 564 packetSequenceNumber(i+1) = packetSequenceNumber(i) + 565 SLPSeqNumDelta(i+1) + 1 567 If the parameter SLPSeqNumDeltaLength is not defined in SDP the 568 default value is zero i.e. this field is not present for non-first 569 SL packets. Furthermore receivers SHALL then apply the above formula 570 with SLPSeqNumDelta equal to zero i.e. by default 571 packetSequenceNumber is incremented by 1 for each SL packet in one 572 RTP packet. This means that for streams that use 573 packetSequenceNumber and are not interleaved the transport of 574 packetSequenceNumber in the Multiple-SL mode is "almost free". 576 Gentric et al. Expires July 2001 11 577 CTSFlag (1 bit): Indicates whether the CTSDelta field is present. A 578 value of 1 indicates that the field is present, a value of 0 that it 579 is not present. 581 This field -if present- appears for all SL packets since the 582 receiver needs it to reconstruct the compositionTimeStampFlag of SL 583 Headers. 
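
The packetSequenceNumber recurrence given above for SLPSeqNum and SLPSeqNumDelta can be sketched as follows. The function name and argument layout are hypothetical. The sketch does not wrap the result to the width of the SL packetSequenceNumber field, since the text above does not address wraparound.

```python
from typing import List, Optional, Sequence


def packet_sequence_numbers(slp_seq_num: int,
                            count: int,
                            deltas: Optional[Sequence[int]] = None) -> List[int]:
    """Rebuild packetSequenceNumber for each SL packet in one RTP packet.

    slp_seq_num -- SLPSeqNum carried for the first SL packet
    count       -- number of SL packets in the RTP packet
    deltas      -- SLPSeqNumDelta values for the non-first SL packets, or
                   None when SLPSeqNumDeltaLength is not defined in SDP
    """
    if count < 1:
        raise ValueError("an RTP packet carries at least one SL packet")
    if deltas is None:
        # Default: the field is absent and treated as zero.
        deltas = [0] * (count - 1)
    if len(deltas) != count - 1:
        raise ValueError("one delta is expected per non-first SL packet")
    numbers = [slp_seq_num]
    for delta in deltas:
        # packetSequenceNumber(i+1) = packetSequenceNumber(i) + delta + 1
        numbers.append(numbers[-1] + delta + 1)
    return numbers
```

For a non-interleaved stream the deltas are all zero, so the numbers simply increase by 1 and no per-packet bits are spent on them.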
585 CTSDelta: Specifies the value of the CTS as a two's complement offset
586 (delta) from the timestamp in the RTP header of this RTP packet.
587 The length in bits of each CTSDelta field is specified in SDP by the
588 CTSDeltaLength parameter (see section 8).

590 This field -if present- appears only for non-first SL packets since
591 the composition time stamp of the first SL packet is mapped to the
592 RTP time stamp, regardless of whether CTSFlag is 1. The sender MUST
593 remove the compositionTimeStamp from the corresponding RSLH.

595 DTSFlag (1 bit): Indicates whether the DTSDelta field is present. A
596 value of 1 indicates that DTSDelta is present, a value of 0 that it
597 is not present.

599 This field -if present- appears for all SL packets since it is
600 needed by the receiver to reconstruct the decodingTimeStampFlag.

602 DTSDelta: Specifies the value of the decodingTimeStamp as a two's
603 complement offset (delta) from the timestamp in the RTP header of
604 this packet. The length in bits of each DTSDelta field is specified
605 in SDP by the DTSDeltaLength parameter (see section 8).

607 This field appears when DTSFlag is 1. Then the sender MUST remove
608 the decodingTimeStamp from the corresponding RSLH.
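
The two's complement delta coding used for CTSDelta and DTSDelta can be illustrated with a pair of helpers. Both function names are hypothetical, and length_bits stands for CTSDeltaLength or DTSDeltaLength. The sketch ignores wraparound of the 32-bit RTP timestamp.

```python
def encode_delta(timestamp: int, rtp_timestamp: int, length_bits: int) -> int:
    """Encode (timestamp - rtp_timestamp) as a two's complement field value."""
    delta = timestamp - rtp_timestamp
    lowest = -(1 << (length_bits - 1))
    highest = (1 << (length_bits - 1)) - 1
    if not lowest <= delta <= highest:
        raise ValueError("delta does not fit in the configured field length")
    # Masking maps negative deltas onto their two's complement bit pattern.
    return delta & ((1 << length_bits) - 1)


def decode_delta(field: int, rtp_timestamp: int, length_bits: int) -> int:
    """Recover the original time stamp from a received delta field."""
    sign_bit = 1 << (length_bits - 1)
    # Flipping and then subtracting the sign bit sign-extends the field.
    delta = (field ^ sign_bit) - sign_bit
    return rtp_timestamp + delta
```

For example, with an 8-bit field and an RTP timestamp of 1000, a decodingTimeStamp of 997 is carried as the field value 253, which is the bit pattern of -3.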
610 3.4.2 Relationship between sizes of MSLH fields and SDP parameters

612 The relationship between a Mapped SL Packet Header and the related
613 SDP parameters is as follows:

615 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
616 | Fields of MSLH      | Number of bits (SDP parameters)       |
617 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
618 | SLPPayloadSize      | SLPPSizeLength                        |
619 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
620 | SLPSeqNum           | SLPSeqNumLength                       |
621 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
622 | SLPSeqNumDelta      | SLPSeqNumDeltaLength                  |
623 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
624 | CTSFlag             | 1 If ( CTSDeltaLength > 0 )           |
625 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
626 | CTSDelta            | CTSDeltaLength If(CTSFlag==1)         |

628 Gentric et al. Expires July 2001 12
629 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
630 | DTSFlag             | 1 If ( DTSDeltaLength > 0 )           |
631 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
632 | DTSDelta            | DTSDeltaLength If(DTSFlag==1)         |
633 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

635 Table 1: Relationship between MSLH field sizes and SDP parameters

637 3.5 RSLHSection structure

639 This section consists of a field (RSLHSize) giving the size in bits
640 of the following block of bit-wise concatenated RSLHs.

642 If the section consumes a non-integer number of bytes, up to 7 zero
643 padding bits MUST be inserted at the end in order to achieve byte-
644 alignment.
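
The RSLHSection layout described above (a size field, then the concatenated RSLHs, then zero padding up to a byte boundary) can be sketched as a length computation. This is a sketch under stated assumptions, not a definitive reading of the format: the function name is hypothetical, size_field_bits stands for the SDP-signaled width of the RSLHSize field, the field is assumed to be stored most significant bit first, and padding is assumed to apply once to the size field and the RSLHs taken together.

```python
def rslh_section_length(payload: bytes, size_field_bits: int) -> int:
    """Bytes occupied by the RSLHSection at the start of `payload`.

    payload         -- bytes beginning at the first byte of the RSLHSection
    size_field_bits -- width in bits of the RSLHSize field (0 = section absent)
    """
    if size_field_bits == 0:
        return 0
    field_bytes = (size_field_bits + 7) // 8
    if len(payload) < field_bytes:
        raise ValueError("payload too short for the RSLHSize field")
    # Read the leading size_field_bits bits as an unsigned integer.
    prefix = int.from_bytes(payload[:field_bytes], "big")
    rslh_size_bits = prefix >> (field_bytes * 8 - size_field_bits)
    # Assumption: size field and RSLHs share one run of padding at the end.
    total_bits = size_field_bits + rslh_size_bits
    return (total_bits + 7) // 8
```

A receiver without MPEG-4 systems knowledge could use such a length to jump straight to the SL packet payloads.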

+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| RSLHSize (RSLHSizeLength bits)| RSLH (variable number of bits)|
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+                               +
|                                                               |
+         +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|         | RSLH (variable number of bits)                      |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                              etc                              |
|              as many bit-wise concatenated RSLHs              |
|               as SL Packets in this RTP packet                |
|-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                RSLH (variable number of bits)                 |
|                                               +-+-+-+-+-+-+-+-+
|                                               | padding bits  |
|-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

Figure 7: RSLHSection structure

The length in bits of the RSLHSize field is RSLHSizeLength and is
specified in SDP, with a default value of zero indicating that the
whole RSLHSection is absent.

+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Fields of RSLHSection           |Number of bits (SDP parameters)|
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| RSLHSize                        | RSLHSizeLength                |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| all bit-wise concatenated RSLHs | RSLHSize                      |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

Table 2: Sizes in bits inside RSLHSection, SDP parameters

Parsing of the bit-wise concatenated RSLHs requires MPEG-4 system
awareness; specifically it requires understanding the MPEG-4
Synchronization Layer (SL) syntax and the modifications to this
syntax described in the next section (3.6).

However, thanks to the RSLHSize field, non-MPEG-4-system receivers
MAY skip this part by rounding up RSLHSize/8 to the next integer
number of bytes.

3.6 RSLH structure

A Remaining SL Packet Header (RSLH) is what remains of an SL header
after modifications for mapping into this payload format.

The following modifications of the SL packet header MUST be applied.
The other fields of the SL packet header MUST remain unchanged but
are bit-shifted to fill in the gaps left by the operations specified
below.

3.6.1 Removal of fields

The following SL Packet Header fields, if present, are removed since
they are mapped either in the RTP header or in the corresponding
MSLH:
   . compositionTimeStampFlag
   . compositionTimeStamp
   . decodingTimeStampFlag
   . decodingTimeStamp
   . packetSequenceNumber

3.6.2 Mapping of OCR

Furthermore, if the SL Packet header contains an OCR, then this field
is encoded in the RSLH as a two's-complement difference (delta),
exactly like a compositionTimeStamp or a decodingTimeStamp in the
MSLH. The length in bits of this difference is indicated by the
OCRDeltaLength parameter in SDP (see section 8).

With this payload format OCRs MUST have the same clock resolution as
Time Stamps.

If compositionTimeStamp is not present for an SL packet that has an
OCR, then the OCR SHALL be encoded as a difference to the RTP time
stamp.

3.6.3 Degradation Priority

For streams that use the optional degradationPriority field in the
SL Packet Headers, only SL packets with the same degradation
priority SHALL be transported by one RTP packet, so that components
may dispatch the RTP packets according to appropriate QoS or
protection schemes. Furthermore, only the first RSLH of one RTP
packet SHALL contain the degradationPriority field since it would
otherwise be redundant.

3.7 SLPPSection structure

The SLPPSection (SL Packet Payload Section) contains the
concatenated SL Packet Payloads. By definition SL Packet Payloads
are byte aligned.

For efficiency SL packets do not carry their own payload size. This
is not an issue for RTP packets that contain a single SL Packet.

However, in the Multiple-SL mode the size of each SL packet payload
MUST be available to the receiver.

If the SL packet payload size is constant for a stream, the size
information SHOULD NOT be transported in the RTP packet. However, in
that case it MUST be signaled in SDP using a (a=fmtp:
SLPPSize=) syntax (see section 8).

If the SL packet payload size is variable then the size of each SL
packet payload MUST be indicated in the corresponding MSLH. In order
to do so the MSLH MUST contain a SLPPayloadSize field. The number of
bits on which this SLPPayloadSize field is encoded MUST be indicated
in the corresponding SDP using a (a=fmtp:
SLPPSizeLength=) syntax (see section 8).

The absence of either SLPPSize or SLPPSizeLength in SDP indicates
the Single-SL mode, i.e. that a single SL packet is transported in
each RTP packet for that stream.

+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|              SLPP (variable number of bytes)              |
+                                                           +
|                                                           |
+                         +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                         | SLPP (variable number of bytes) |
+-+-+-+-+-+-+-+-+-+-+-+-+-+                                 +
|                                                           |
+                                                           +
|                                                           |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                            etc                            |
|           as many byte-wise concatenated SLPPs            |
|             as SL Packets in this RTP packet              |
|-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

Figure 8: SLPPSection structure

3.8 Interleaving

SL Packets MAY be interleaved. Senders MAY perform interleaving.
Receivers MUST support interleaving.

When interleaving of SL packets is used it SHALL be implemented
using the SLPSeqNum field of MSLH.

The AUSequenceNumber field of the SL header MUST NOT be used for
interleaving since firstly it may collide with BIFS Carousel usage
and secondly it is not visible to non-MPEG-4 system receivers.

The conjunction of RTP sequence number and SLPSeqNum can produce a
quasi-unique identifier for each SL packet so that a receiver can
unambiguously reconstruct the original order even in case of
out-of-order packets, packet loss or duplication.

3.9 Fragmentation Rules

MPEG-4 Access Units are the default fragments for MPEG-4 bitstreams
and SHOULD be mapped one-to-one to RTP packets of this format, with
two exceptions:
   - Access Units larger than the MTU,
   - When using interleaving for better packet loss resilience.

In all cases Access Unit start MUST be aligned with SL packet start.

This section gives rules to apply when performing Access Unit
fragmentation.

Some MPEG-4 codecs define optional syntax for Access Unit
sub-entities (fragments) that are independently decodable for error
resilience purposes. Examples are Video Packets for video and Error
Sensitivity Categories (ESC) for audio. This always corresponds to
specific bitstream syntax, which is signaled in the
DecoderSpecificInfo inside the DecoderConfig in SLConfig, and/or
using the corresponding SDP parameters as described in section 8.
Therefore encoders and decoders are both aware whether they are
operating in such a mode or not (however, since this codec
configuration is an opaque data block, this is not explicitly
signaled by this payload format).

If not operating in such a mode, the decoder has to skip packets
after a loss until an Access Unit start is received. Similarly,
decoder implementations that do not implement robust decoding of
Access Unit fragments have to discard all packets after a packet
loss until an Access Unit start is received. In the same way,
decoder implementations that do not implement re-synchronization at
any Access Unit start have to discard all packets after a packet
loss until a Random Access Point Access Unit is received.

One problem would arise, however, for decoder implementations that
try to restart decoding after a packet loss if independently
decodable fragments are signaled (in the decoder configuration) but
the fragments actually received are not independently decodable
because the RTP sender has made RTP packets on different boundaries
than the fragments provided by the encoder.

For this reason the following rules must apply to SL streams that
are specifically made for transport with this payload format:

SL packets SHOULD be codec-semantic entities in the spirit of ALF,
i.e. either complete Access Units or fragments of Access Units that
are independently decodable. Specifically, when a given codec has an
optional syntax for independently decodable Access Unit fragments,
this option SHOULD be used.

Furthermore, when streams are generated using independently
decodable Access Unit fragments, these Access Unit fragments MUST be
mapped one-to-one into SL packets. Consequently independently
decodable Access Unit fragments MUST NOT be split across several SL
packets and therefore MUST NOT be split across several RTP packets.

For example an MPEG-4 audio stream encoded using the ESC syntax MUST
NOT split one ESC across 2 RTP packets.

This rule is relaxed when using MPEG-4 Video Packets for two
reasons: firstly Video Packets can be much larger than the typical
MTU and secondly all Video Packets start with a specific
resynchronization marker that can be unambiguously detected.
Therefore, for video streams using the Video Packet syntax, Video
Packets MAY be split across several SL packets, although it is
strongly RECOMMENDED to always adapt the Video Packet size to fit
the MTU. In all cases a Video Packet start MUST be aligned with an
SL packet start.

The rule is maintained for video Data Partitions since the second
Data Partition of a Video Packet does not start with a non-emulable
resynchronization marker.

4. SL packetized stream reconstruction

The MPEG-4 over IP framework [9] requires that the way a receiver
can reconstruct a valid SL packetized stream be documented; this is
the purpose of this section.

Since this format directly transports SL packets this reconstruction
is trivial with the following rules:

- SLPacketHeader.packetSequenceNumber is restored from
  MSLH.SLPSeqNum for the first SL packet in the RTP packet (i = 0):
     SLPacketHeader.packetSequenceNumber(0) = MSLH.SLPSeqNum(0)
  and for subsequent packets using (for i >= 0):
     SLPacketHeader.packetSequenceNumber(i+1) =
        SLPacketHeader.packetSequenceNumber(i)
        + MSLH.SLPSeqNumDelta(i+1) + 1

- All time stamps (CTS, DTS, OCR), when present, are restored from
  the delta values.

- Time stamp flags (CTSFlag, DTSFlag) in MSLH are used to
  reconstruct respectively the compositionTimeStampFlag and
  decodingTimeStampFlag of SLPacketHeader.

  Specifically the reconstruction depends on the SDP parameters as
  follows:

  If SDP.CTSDeltaLength is absent or equals 0:
  The SL stream reconstruction rules are:
     . for the first (or only) SL packet:
        . if SLConfig.useTimeStamps == true, then:
           . SLPacketHeader.compositionTimeStampFlag = true
           . SLPacketHeader.compositionTimeStamp = RTP TimeStamp
        . if SLConfig.useTimeStamps == false, then:
           . SLPacketHeader.compositionTimeStampFlag is not defined
     . for the following SL packets:
        . SLPacketHeader.compositionTimeStampFlag = false

  If SDP.CTSDeltaLength is not zero:
     . SLPacketHeader.compositionTimeStampFlag = MSLH.CTSFlag
     . SLPacketHeader.compositionTimeStamp = RTP TimeStamp +
       MSLH.CTSDelta

- The other SL packet header fields SHALL remain as found in RSLH.
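
The rules above can be sketched in Python as follows. This is an
illustrative sketch rather than a normative algorithm: the function
names are hypothetical, deltas are assumed to be already decoded into
signed integers, the CTS of a first SL packet is obtained by passing a
delta of zero, and the wrap-around to 32 bits is an assumption based
on the width of the RTP time stamp.

```python
from typing import List, Optional, Tuple


def restore_sequence_numbers(first_seq_num: int, deltas: List[int]) -> List[int]:
    """Rebuild packetSequenceNumber for every SL packet of one RTP packet.

    first_seq_num is MSLH.SLPSeqNum of the first SL packet; deltas[k] is
    MSLH.SLPSeqNumDelta of SL packet k+1.
    """
    seq_nums = [first_seq_num]
    for delta in deltas:
        seq_nums.append(seq_nums[-1] + delta + 1)
    return seq_nums


def restore_cts(rtp_timestamp: int, cts_delta_length: int, is_first: bool,
                use_time_stamps: bool, cts_flag: bool = False,
                cts_delta: int = 0) -> Tuple[Optional[bool], Optional[int]]:
    """Return (compositionTimeStampFlag, compositionTimeStamp).

    None stands for "not defined" or "not present".
    """
    if cts_delta_length == 0:
        if not is_first:
            return False, None
        if use_time_stamps:
            return True, rtp_timestamp
        return None, None
    if not cts_flag:
        return False, None
    # Assumption: arithmetic wraps modulo 2**32 like the RTP time stamp.
    return True, (rtp_timestamp + cts_delta) & 0xFFFFFFFF
```

For example, a first SLPSeqNum of 0 followed by two deltas of 2 gives
the sequence numbers 0, 3 and 6, which is what an interleaved packet
would carry.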

In the general case the reconstruction of the original SL packetized
stream requires SL-awareness. However, this payload format allows in
all cases a receiver that does not know about the SL syntax to
reconstruct the semantics of SL for the following very useful
features:
   - Packet order (decoding order)
   - Access Unit boundaries (using the M bit)
   - Access Unit fragments (i.e. SL packet boundaries using
     MSLH.SLPPayloadSize)
   - Composition Time Stamps (using the RTP Time Stamp and
     MSLH.CTSDelta)
   - Decoding Time Stamps (using the RTP Time Stamp and
     MSLH.DTSDelta)
   - Packet sequence number (using the RTP sequence number and
     MSLH.SLPSeqNum)

5. Other issues

5.1 Handling of scene description streams

MPEG-4 introduces new stream types as described in section 1, namely
Object Descriptors and BIFS. In the following both OD and BIFS are
discussed on the same basis, i.e. as "scene description".

Considering scene description as a "stream-able" type of content is
a rather new concept and for that reason some specific comments are
needed.

In the past scene description has been formatted (encoded) in such a
way that information loss would in the general case cripple the
presentation beyond any hope of repair by the receiver. Still this
type of encoding is well suited for a number of multimedia
applications where the scene is first made available via reliable
channels to the client and then played. This payload format is not
intended for this type of application but can be used if the RTP
packets are transported using TCP or any other reliable protocol.

MPEG-4 in contrast has introduced the possibility to dynamically
change the scene description by sending animation information
(changes in parameters) and structural change information (updates).
Since this information has to be sent in a timely fashion MPEG-4 has
defined a number of techniques in order to encode the scene
description in a manner that makes it behave similarly to other
temporal encoding schemes such as audio and video. This payload
format is intended for this usage.

Note that in many cases the application will consist of first the
reliable transmission of a static initial scene followed by the
streaming of animations and updates. For this reason the usage of
this payload format in both cases offers a useful single solution.

Senders must be aware that suitable schemes should be used when
scene description streams transport sensitive configuration
information; for example, if the RTP packet transporting an
OD-update command signaling a new media subject is lost, the
corresponding media stream is not accessible by the receiver.

Redundancy is a possibility and may be added by tools hierarchically
higher than this payload format, e.g. by packet based FEC,
re-transmission, or similar tools. In such a case, the general
congestion control principles have to be observed.

MPEG-4 also defines Random Access Points (RAP) for scene description
streams (OD and BIFS) where by definition a decoder can restart
decoding, i.e. receives a "full update" of the scene. The periodicity
of transmission of these RAPs can be calculated by observing
parameters such as the packet loss rate and the number of receivers.
Just as for video, where the periodicity of so-called Intra Frames
defines the robustness to errors, it is the responsibility of the
sender to make sure the periodicity of RAPs is suitable.

5.2 Multiplexing

An advanced MPEG-4 session may involve a large number of objects,
possibly as many as a few hundred, so transporting each ES as an
individual RTP stream may not always be practical. Allocating and
controlling hundreds of destination addresses for each MPEG-4
session may pose insurmountable session administration problems.
The input/output processing overhead at the end-points will also be
extremely high. Additionally, low delay transmission of low
bitrate data streams, e.g. facial animation parameters, results in
extremely high header overheads.

To solve these problems, MPEG-4 data transport requires a
multiplexing scheme that allows selective bundling of several ESs.
This is beyond the scope of the payload format defined here.

The MPEG-4 Flexmux multiplexing scheme may be used for this
purpose and a specific RTP payload format is being developed [11].

Another approach may be to develop a generic RTP multiplexing scheme
usable for MPEG-4 data. The multiplexing scheme reported in [8] may
be a candidate for this approach.

For MPEG-4 applications, the multiplexing technique needs to address
the following requirements:

i. The ESs multiplexed in one stream can change frequently during a
session. Consequently, the coding type, individual packet size and
temporal relationships between the multiplexed data units must be
handled dynamically.

ii. The multiplexing scheme should have a mechanism to determine the
ES identifier (ES_ID) for each of the multiplexed packets. ES_ID is
not a part of the SL header.

iii. In general, an SL packet does not contain information about its
size. The multiplexing scheme should be able to delineate the
multiplexed packets whose lengths may vary from a few bytes to close
to the path-MTU.

6. Security Considerations

RTP packets using the payload format defined in this specification
are subject to the security considerations discussed in the RTP
specification [5]. This implies that confidentiality of the media
streams is achieved by encryption.
Because the data compression used
with this payload format is applied end-to-end, encryption may be
performed on the compressed data so there is no conflict between the
two operations. The packet processing complexity of this payload
type does not exhibit any significant non-uniformity on the receiver
side that could cause a denial-of-service threat.

However, it is possible to inject non-compliant MPEG streams (Audio,
Video, and Systems) to overload the receiver/decoder's buffers, which
might compromise the functionality of the receiver or even crash it.
This is especially true for end-to-end systems like MPEG where the
buffer models are precisely defined.

MPEG-4 Systems supports stream types including commands that are
executed on the terminal like OD commands, BIFS commands, etc. and
programmatic content like MPEG-J (Java(TM) Byte Code) and
ECMAScript. It is possible to use one or more of the above in a
manner non-compliant to MPEG to crash the receiver or make it
temporarily unavailable.

Authentication mechanisms can be used to validate the sender and
the data to prevent security problems due to non-compliant malignant
MPEG-4 streams.

A security model is defined in MPEG-4 Systems for streams carrying
MPEG-J access units, which comprise Java(TM) classes and objects.
MPEG-J defines a set of Java APIs and a secure execution model.
MPEG-J content can call this set of APIs and Java(TM) methods from a
set of Java packages supported in the receiver within the defined
security model. According to this security model, downloaded byte
code is forbidden to load libraries, define native methods, start
programs, read or write files, or read system properties.

Receivers can implement intelligent filters to validate the buffer
requirements or parametric (OD, BIFS, etc.)
or programmatic (MPEG-J,
ECMAScript) commands in the streams. However, this can increase the
complexity significantly.

7. Types and names

The encoding name associated to this RTP payload format is:
   - "mpeg4-sl".

The media type may be any of:
   - "video"
   - "audio"
   - "application"

"video" SHOULD be used for MPEG-4 Video streams (ISO/IEC 14496-2) or
MPEG-4 Systems streams that convey information needed for an
audio/visual presentation.

"audio" SHOULD be used for MPEG-4 Audio streams (ISO/IEC 14496-3) or
MPEG-4 Systems streams that convey information needed for an
audio-only presentation.

"application" SHOULD be used for MPEG-4 Systems streams (ISO/IEC
14496-1) that serve other purposes than audio/visual presentation,
e.g. in some cases when MPEG-J streams are transmitted.

8. Additional SDP syntax

8.1 Mapping information

This format may require additional information about the mapping to
be made available to the receiver. This is signaled to the receiver
using SDP (a=fmtp) parameters as in RFC 2327 [10, section 6].

The absence of any of these fields in SDP is equivalent to a field
set to the default value, which is always zero.

The absence of all such parameters resolves into a default "basic"
configuration.

8.1.1 Indication of DTSDelta bit length

The following syntax shall be used:

   a=fmtp: DTSDeltaLength=

The value given is the number of bits on which the DTSDelta field is
encoded in MSLH. The default value is zero and indicates the absence
of DTSFlag and DTSDelta in MSLH (the stream does not transport
decodingTimeStamps). A value larger than zero indicates that there
is a DTSFlag in each MSLH.

Since decodingTimeStamp, if present, must be encoded as a difference
to the RTP time stamp, the DTSDeltaLength parameter MUST be present
in SDP in order to transport decodingTimeStamps with this payload
format.

8.1.2 Indication of CTSDelta bit length

The following syntax shall be used:

   a=fmtp: CTSDeltaLength=

The value given is the number of bits on which the CTSDelta field is
encoded in (non-first) MSLH. The default value is zero and indicates
the absence of the CTSFlag and CTSDelta fields in MSLH (the mode is
Single-SL or the stream does not transport compositionTimeStamps).

Since compositionTimeStamps, if present, must be encoded as a
difference to the RTP time stamp, the CTSDeltaLength parameter MUST
be present in SDP in order to transport compositionTimeStamps using
this payload format (in the Multiple-SL mode).

8.1.3 Indication of OCRDelta bit length

The following syntax shall be used:

   a=fmtp: OCRDeltaLength=

The value given is the number of bits on which the OCRDelta field is
encoded in RSLH. The default value is zero and indicates the absence
of OCR for this stream.

Since objectClockReference, if present, must be encoded as a
difference to the RTP time stamp, the OCRDeltaLength parameter MUST
be present in SDP in order to transport objectClockReferences with
this payload format.

8.1.4 Indication of SLPPayloadSize bit length

The following syntax shall be used:

   a=fmtp: SLPPSizeLength=

The value given is the number of bits on which the SLPPayloadSize
field of MSLH is encoded. The default value is zero and indicates the
Single-SL mode (unless SLPPSize is present in SDP).

Simultaneous presence in SDP of this parameter and SLPPSize is
illegal.

Either the SLPPSizeLength or SLPPSize parameter MUST be present in
SDP in order to signal the Multiple-SL mode of this payload format.
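
The way these parameters select the mode can be sketched in Python as
below. This is a sketch under assumptions: the function names and the
returned mode labels are hypothetical, the parameter string is taken
to be the semicolon-separated list of name=value pairs that follows
the payload type on the fmtp line, and a parameter with value zero is
treated like an absent one for mode selection.

```python
from typing import Dict


def parse_fmtp_params(param_string: str) -> Dict[str, str]:
    """Split 'name=value;name=value' into a dictionary of strings."""
    params = {}
    for item in param_string.split(";"):
        item = item.strip()
        if not item:
            continue
        name, _, value = item.partition("=")
        params[name.strip()] = value.strip()
    return params


def payload_mode(params: Dict[str, str]) -> str:
    """Derive the mode from SLPPSize and SLPPSizeLength."""
    if "SLPPSize" in params and "SLPPSizeLength" in params:
        raise ValueError("SLPPSize and SLPPSizeLength are mutually exclusive")
    # Absent parameters take the default value, which is zero.
    if int(params.get("SLPPSize", "0")) > 0:
        return "multiple-sl-constant-size"
    if int(params.get("SLPPSizeLength", "0")) > 0:
        return "multiple-sl-variable-size"
    return "single-sl"
```

With the parameter string "DTSDeltaLength=6;RSLHSizeLength=2" neither
size parameter is present, so the sketch reports the Single-SL mode.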

8.1.5 Indication of constant SL packet size

The following syntax shall be used:

   a=fmtp: SLPPSize=

The value given is the constant size in bytes of each SL Packet
Payload for this stream. The default value is zero and indicates
variable SL Packet Payload size (or the Single-SL mode if
SLPPSizeLength is absent).

Simultaneous presence in SDP of this parameter and SLPPSizeLength is
illegal.

Either the SLPPSizeLength or SLPPSize parameter MUST be present in
SDP in order to signal the Multiple-SL mode of this payload format.

When SLPPSize is present in SDP the SLPPayloadSize of MSLH in the
RTP packets MUST NOT be present.

8.1.6 Indication of SLPSeqNum bit length

The following syntax shall be used:

   a=fmtp: SLPSeqNumLength=

The value given is the number of bits on which the SLPSeqNum is
encoded in the first MSLH. The default value is zero and indicates
the absence of SLPSeqNum and SLPSeqNumDelta for all MSLHs.

Since packetSequenceNumber, if present, must be mapped in MSLH, the
SLPSeqNumLength parameter MUST be present in SDP in order to
transport packetSequenceNumber with this payload format.

8.1.7 Indication of SLPSeqNumDelta bit length

The following syntax shall be used:

   a=fmtp: SLPSeqNumDeltaLength=

The value given is the number of bits on which the SLPSeqNumDelta
fields are encoded in any non-first MSLH. The default value is zero
and indicates that packetSequenceNumber MUST be incremented by one
for each SL packet in the RTP packet (see section 3.5).

Since, when interleaving, packetSequenceNumber does not increment by
1 inside an RTP packet, the SLPSeqNumDeltaLength parameter MUST be
present in SDP when using interleaving with this payload format.
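
On the sender side, the SLPSeqNumDelta values and a sufficient
SLPSeqNumDeltaLength could be derived as in the Python sketch below.
It inverts the reconstruction rule of section 4 (each sequence number
is the previous one plus the delta plus one). The function names are
hypothetical, and the sketch assumes that deltas are non-negative,
i.e. that SL packets inside one RTP packet are in increasing
packetSequenceNumber order without wrap-around.

```python
from typing import List


def seq_num_deltas(seq_nums: List[int]) -> List[int]:
    """SLPSeqNumDelta for every non-first SL packet of one RTP packet."""
    deltas = []
    for previous, current in zip(seq_nums, seq_nums[1:]):
        delta = current - previous - 1
        if delta < 0:
            # Assumption of this sketch: increasing order, no wrap-around.
            raise ValueError("sequence numbers must be strictly increasing")
        deltas.append(delta)
    return deltas


def min_delta_length(deltas: List[int]) -> int:
    """Smallest bit length able to hold every delta (0 if all are zero)."""
    return max(deltas, default=0).bit_length()
```

Without interleaving every delta is zero and the computed length is
zero, which matches the default value of SLPSeqNumDeltaLength.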

8.1.8 Indication of RSLHSize bit length

The following syntax shall be used:

   a=fmtp: RSLHSizeLength=

The value given is the number of bits used to encode the RSLHSize
field. The default value is zero and indicates the absence of the
whole RSLHSection for all RTP packets of this stream.

Compatibility with RFC 3016 requires that the RSLHSection is empty,
including the RSLHSize field. This is the reason why there is such a
variable length with a default value indicating absence of the
RSLHSize field.

8.2 Optional configuration information

In the MPEG-4 framework the following information is carried using
the Object Descriptor. For compatibility with receivers that do not
implement the full MPEG-4 system specification this information MAY
also be indicated in SDP.

For transport of MPEG-4 audio and video without the use of MPEG-4
systems, as well as to support non-MPEG-4 system receivers, it is
possible to transport information on the profile and level of the
stream and on the decoder configuration.

8.2.1 Indication of SLConfigDescriptor

Senders MAY transmit the SLConfigDescriptor in SDP.

The following syntax shall be used:

   a=fmtp: SLConfigDescriptor=

The value given is a base-64 encoding of the SLConfigDescriptor. This
SHALL be the original SLConfigDescriptor and it SHALL be the same as
the one transported by the OD framework.

8.2.2 Indications for MPEG-4 audio streams

8.2.2.1 Indication of profile level

Senders MAY transmit the profile and level indication in SDP.

The following syntax shall be used:

   a=fmtp: profile-level-id=

The value given is a decimal representation of the MPEG-4 Audio
Profile Level indication value defined in ISO/IEC 14496-1. This
parameter indicates which MPEG-4 Audio tool subsets are applied to
encode the audio stream.

8.2.2.2 Indication of audio object type

Senders MAY transmit the audio object type indication in SDP.

The following syntax shall be used:

   a=fmtp: object-type=

The value given is a decimal representation of the MPEG-4 Audio
Object Type value defined in ISO/IEC 14496-3. This parameter
specifies the tool used by the encoder. It can be used to limit the
capability within the specified "profile-level-id".

8.2.2.3 Indication of audio bitrate

Senders MAY transmit the audio bitrate in SDP.

The following syntax shall be used:

   a=fmtp: bitrate=

The value given is a decimal representation of the audio bitrate in
bits per second for the audio bit stream.

8.2.2.4 Indication of audio decoder configuration

Senders MAY transmit the audio decoder configuration in SDP.

The following syntax shall be used:

   a=fmtp: config=

The value given is a hexadecimal representation of an octet string
that expresses the audio payload configuration data
"StreamMuxConfig", as defined in ISO/IEC 14496-3. Configuration data
is mapped onto the octet string on an MSB-first basis. The first bit
of the configuration data SHALL be located at the MSB of the first
octet.

In the last octet, zero-padding bits, if necessary, shall follow the
configuration data.

8.2.3 Indications for MPEG-4 video streams

8.2.3.1 Indication of profile and level

Senders MAY transmit the video profile and level indication in SDP.

The following syntax shall be used:

   a=fmtp: profile-level-id=

The value given is a decimal representation of the MPEG-4 Visual
Profile Level indication value (profile_and_level_indication) defined
in Table G-1 of ISO/IEC 14496-2. This parameter MAY be used in the
capability exchange or session setup procedure to indicate the MPEG-4
Visual Profile and Level combination of which the MPEG-4 Visual
codec is capable.
If this parameter is not specified by the
procedure, its default value of 1 (Simple Profile/Level 1) is used.

8.2.3.2 Indication of video decoder configuration

Senders MAY transmit the video decoder configuration in SDP. This
parameter indicates the configuration of the corresponding MPEG-4
visual bitstream. It SHALL NOT be used to indicate the codec
capability in the capability exchange procedure.

The following syntax shall be used:

   a=fmtp: config=

The value given is a hexadecimal representation of an octet string
that expresses the MPEG-4 Visual configuration information, as
defined in subclause 6.2.1 Start codes of ISO/IEC14496-2[2][4][9].
The configuration information is mapped onto the octet string on an
MSB-first basis. The first bit of the configuration information SHALL
be located at the MSB of the first octet. The configuration
information indicated by this parameter SHALL be the same as the
configuration information in the corresponding MPEG-4 Visual stream,
except for first_half_vbv_occupancy and latter_half_vbv_occupancy, if
they exist, which may vary in the repeated configuration information
inside an MPEG-4 Visual stream (see 6.2.1 Start codes of
ISO/IEC14496-2).

8.3 Concatenation of fmtp parameters

Multiple fmtp parameters SHOULD be expressed as a MIME media type
string, in the form of a semicolon-separated list of parameter=value
pairs.

8.4 SDP file example

The following is an example of SDP syntax for the description of
a session containing one MPEG-4 audio stream, one MPEG-4 video
stream and one MPEG-4 system stream, transported using this format.
Note that the video stream DTSDelta fields are encoded on 4 bits in
this example.

   o= ....
   i= ....
   c=IN IP4 123.234.71.112
   m=video 1034 RTP/AVP 97
   a=fmtp:97 DTSDeltaLength=4
   a=rtpmap:97 mpeg4-sl
   m=audio 810 RTP/AVP 98
   a=rtpmap:98 mpeg4-sl
   m=application 1234 RTP/AVP 99
   a=rtpmap:99 mpeg4-sl

9. Examples of usage of this payload format

This payload format has been designed to transport with flexibility
a very versatile packetization scheme (the MPEG-4 Synchronization
Layer); its complexity is therefore larger than the average for RTP
payload formats. For this reason this section describes a number of
key examples of how this payload format can be used.

9.1 MPEG-4 Video

Let us consider the case of a 30 frames per second MPEG-4 video
stream whose bit rate is high enough that Access Units have to be
split in several SL packets (typically above 300 kb/s).

Let us also assume that the video codec generates in that case Video
Packets suitable to fit in one SL packet, i.e. that the video codec
is MTU aware and the MTU is 1500 bytes. We assume furthermore that
this stream contains B frames and that decodingTimeStamps are
present.

9.1.1 SLConfigDescriptor

In this example the SLConfigDescriptor is:

   class SLConfigDescriptor extends BaseDescriptor : bit(8)
   tag=SLConfigDescrTag {
      bit(8) predefined;
      if (predefined==0) {
         bit(1) useAccessUnitStartFlag;         = 1
         bit(1) useAccessUnitEndFlag;           = 0
         bit(1) useRandomAccessPointFlag;       = 1
         bit(1) hasRandomAccessUnitsOnlyFlag;   = 0
         bit(1) usePaddingFlag;                 = 0
         bit(1) useTimeStampsFlag;              = 1
         bit(1) useIdleFlag;                    = 0
         bit(1) durationFlag;                   = 0
         bit(32) timeStampResolution;           = 30
         bit(32) OCRResolution;                 = 0
         bit(8) timeStampLength;                = 32
         bit(8) OCRLength;                      = 0
         bit(8) AU_Length;                      = 0
         bit(8) instantBitrateLength;           = 0
         bit(4) degradationPriorityLength;      = 0
         bit(5) AU_seqNumLength;                = 0
         bit(5) packetSeqNumLength;             = 0
         bit(2) reserved=0b11;
      }
      if (durationFlag) {
         bit(32) timeScale;                     // NOT USED
         bit(16) accessUnitDuration;            // NOT USED
         bit(16) compositionUnitDuration;       // NOT USED
      }
      if (!useTimeStampsFlag) {
         bit(timeStampLength) startDecodingTimeStamp;     = 0
         bit(timeStampLength) startCompositionTimeStamp;  = 0
      }
   }

The useRandomAccessPointFlag is set so that the
randomAccessPointFlag can indicate that the corresponding SL packet
contains a GOV and the first Video Packet of an Intra coded frame.

9.1.2 SL Packet Header structure

With this configuration we have the following SL packet header
structure:

   aligned(8) class SL_PacketHeader (SLConfigDescriptor SL) {
      if (SL.useAccessUnitStartFlag) {
         bit(1) accessUnitStartFlag;            // 1 bit
      }
      if (accessUnitStartFlag) {
         if (SL.useRandomAccessPointFlag) {
            bit(1) randomAccessPointFlag;       // 1 bit
         }
         if (SL.useTimeStampsFlag) {
            bit(1) decodingTimeStampFlag;       // 1 bit
            bit(1) compositionTimeStampFlag;    // 1 bit
         }
         if (decodingTimeStampFlag) {
            bit(SL.timeStampLength) decodingTimeStamp;
         }
         if (compositionTimeStampFlag) {
            bit(SL.timeStampLength) compositionTimeStamp;
         }
      }
   }

9.1.3 SDP mapping information

decodingTimeStamps are encoded on 32 bits, which is much more than
needed for a delta. Therefore the sender will use DTSDeltaLength in
the corresponding SDP to signal that only 6 bits are used for the
coding of relative DTS in the RTP packet.

The RSLH size cannot exceed 2 bits, a value that is encoded on 2
bits and signaled by RSLHSizeLength. The resulting concatenated fmtp
line is:

   a=fmtp: DTSDeltaLength=6;RSLHSizeLength=2

9.1.4 RTP packet structure

Two cases can occur; for packets that transport first fragments of
Access Units we have:

+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Field                           | size                |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| RTP header                      | -                   |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| CTSFlag = 1                     | 1 bit               |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| DTSFlag = 1                     | 1 bit               |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| DTSDelta                        | 6 bits              |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| bits to byte alignment          | 0 bits              |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| RSLHSize = 2                    | 2 bits              |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| accessUnitStartFlag = 1         | 1 bit               |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| randomAccessPointFlag           | 1 bit               |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| bits to byte alignment          | 4 bits              |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| SL packet payload               | N bytes             |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

For packets that transport non-first fragments of Access Units we
have:
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1527 | Field | size | 1529 Gentric et al. Expires July 2001 29 1530 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1531 | RTP header | - | 1532 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1533 | CTSFlag = 0 | 1 bit | 1534 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1535 | DTSFlag = 0 | 1 bit | 1536 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1537 | bits to byte alignment | 6 bits | 1538 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1539 | RSLHSize = 2 | 2 bits | 1540 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1541 | accessUnitStartFlag = 0 | 1 bit | 1542 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1543 | randomAccessPointFlag | 1 bit | 1544 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1545 | zero bits to byte alignment | 4 bits | 1546 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1547 | SL packet payload | N bytes | 1548 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1550 Note the compositionTimeStamp is never present since it would be 1551 redundant with the RTP time stamp. However the value of CTSFlag is 1 1552 to indicate to the receiver that the value of 1553 compositionTimeStampFlag for the corresponding reconstructed SL 1554 packed. 1556 9.1.5 Overhead estimation 1558 In this example we have a RTP overhead of 40 + 2 bytes for 1400 1559 bytes of payload i.e. 3 % overhead. 1561 9.2 RFC 3016 compatible MPEG-4 Video 1563 This is an example of a video stream where the SL is configured to 1564 produce RTP packets compatible with RFC 3016. 
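
   For comparison, the RFC 3016 compatible configuration places nothing
   between the RTP header and the SL packet payload, whereas the
   first-fragment packets of section 9.1.4 prepend two octets. A
   minimal Python sketch of those two octets is given here. It assumes
   MSB-first packing of the fields within each octet; the function name
   and the sample DTSDelta value are illustrative only and are not
   defined by this document.

```python
def pack_first_fragment_header(dts_delta, random_access_point):
    """Pack the two header octets listed for first fragments in 9.1.4."""
    if not 0 <= dts_delta < 2 ** 6:
        raise ValueError("DTSDelta must fit in 6 bits (DTSDeltaLength=6)")
    # Octet 1: CTSFlag=1 (1 bit), DTSFlag=1 (1 bit), DTSDelta (6 bits).
    # No alignment bits are needed, matching the "0 bits" row of the table.
    octet1 = (1 << 7) | (1 << 6) | dts_delta
    # Octet 2: RSLHSize=2 (2 bits), accessUnitStartFlag=1 (1 bit),
    # randomAccessPointFlag (1 bit), then 4 zero bits of alignment.
    octet2 = (2 << 6) | (1 << 5) | (int(bool(random_access_point)) << 4)
    return bytes([octet1, octet2])


# Hypothetical values: DTSDelta of 3 on a random access point.
header = pack_first_fragment_header(3, True)
overhead = (40 + len(header)) / 1400
```

   With a 1400-byte payload these two octets add to the 40 bytes of
   lower-layer headers, which gives the 42/1400 = 3 % figure of section
   9.1.5.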

9.2.1 SLConfigDescriptor

   In this example the SLConfigDescriptor is:

      class SLConfigDescriptor extends BaseDescriptor : bit(8)
      tag=SLConfigDescrTag {
        bit(8) predefined;
        if (predefined==0) {
          bit(1) useAccessUnitStartFlag;       = 0
          bit(1) useAccessUnitEndFlag;         = 1
          bit(1) useRandomAccessPointFlag;     = 0
          bit(1) hasRandomAccessUnitsOnlyFlag; = 0
          bit(1) usePaddingFlag;               = 0
          bit(1) useTimeStampsFlag;            = 0
          bit(1) useIdleFlag;                  = 0
          bit(1) durationFlag;                 = 0
          bit(32) timeStampResolution;         = 0
          bit(32) OCRResolution;               = 0
          bit(8) timeStampLength;              = 0
          bit(8) OCRLength;                    = 0
          bit(8) AU_Length;                    = 0
          bit(8) instantBitrateLength;         = 0
          bit(4) degradationPriorityLength;    = 0
          bit(5) AU_seqNumLength;              = 0
          bit(5) packetSeqNumLength;           = 0
          bit(2) reserved=0b11;
        }
        if (durationFlag) {
          bit(32) timeScale;                   // NOT USED
          bit(16) accessUnitDuration;          // NOT USED
          bit(16) compositionUnitDuration;     // NOT USED
        }
        if (!useTimeStampsFlag) {
          bit(timeStampLength) startDecodingTimeStamp;    = 0
          bit(timeStampLength) startCompositionTimeStamp; = 0
        }
      }

9.2.2 SL Packet Header structure

   With this configuration we have the following SL packet header
   structure:

      aligned(8) class SL_PacketHeader (SLConfigDescriptor SL) {
        if (SL.useAccessUnitEndFlag) {
          bit(1) accessUnitEndFlag;            // 1 bit
        }
      }

9.2.3 SDP mapping information

   This configuration is the default one; no SDP parameters are
   required.

9.2.4 RTP packet structure

   Note that accessUnitEndFlag is mapped to the RTP header M bit.

   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | Field                                  | size         |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | RTP header                             | -            |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | SL packet payload                      | 1400 bytes   |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   In this example we have an RTP overhead of 40 bytes for 1400 bytes
   of payload, i.e. 3 % overhead.

9.3 Low delay MPEG-4 Audio

   This example is for a low delay audio service. For this reason a
   single SL packet is transported in each RTP packet.

9.3.1 SLConfigDescriptor

   Since CTS=DTS and the AU duration is constant, signaling of MPEG-4
   time stamps is not needed.

   We also assume here an audio Object Type for which all Access Units
   are Random Access Points, which is signaled using the
   hasRandomAccessUnitsOnlyFlag in the SLConfigDescriptor.

   We assume furthermore a mode where the Access Unit size is constant
   and equal to 5 bytes (which is signaled with AU_Length).

   In this example the SLConfigDescriptor is:

      class SLConfigDescriptor extends BaseDescriptor : bit(8)
      tag=SLConfigDescrTag {
        bit(8) predefined;
        if (predefined==0) {
          bit(1) useAccessUnitStartFlag;       = 0
          bit(1) useAccessUnitEndFlag;         = 0
          bit(1) useRandomAccessPointFlag;     = 0
          bit(1) hasRandomAccessUnitsOnlyFlag; = 1
          bit(1) usePaddingFlag;               = 0
          bit(1) useTimeStampsFlag;            = 0
          bit(1) useIdleFlag;                  = 0
          bit(1) durationFlag;                 = 0
          bit(32) timeStampResolution;         = 0
          bit(32) OCRResolution;               = 0
          bit(8) timeStampLength;              = 0
          bit(8) OCRLength;                    = 0
          bit(8) AU_Length;                    = 5
          bit(8) instantBitrateLength;         = 0
          bit(4) degradationPriorityLength;    = 0
          bit(5) AU_seqNumLength;              = 0
          bit(5) packetSeqNumLength;           = 0
          bit(2) reserved=0b11;
        }
        if (durationFlag) {
          bit(32) timeScale;                   // NOT USED
          bit(16) accessUnitDuration;          = 20 ms (just an example)
          bit(16) compositionUnitDuration;     // NOT USED
        }
        if (!useTimeStampsFlag) {
          bit(timeStampLength) startDecodingTimeStamp;    = 0
          bit(timeStampLength) startCompositionTimeStamp; = 0
        }
      }

9.3.2 SL packet header

   With this configuration the SL packet header is empty.

9.3.3 SDP mapping information

   No SDP parameters are required.

9.3.4 RTP packet structure

   Note that the RTP header M bit should always be set to 1.

   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | Field                                  | size         |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | RTP header                             | -            |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | SL packet payload                      | 5 bytes      |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

9.3.5 Overhead estimation

   The overhead is extremely large, i.e. 800 %, since 40 bytes of
   headers are required to transport 5 bytes of data. Note however
   that RTP header compression would work well, since time stamp
   increments are constant.

9.4 Media delivery MPEG-4 Audio

   This example is for a media delivery service where delay is not an
   issue but efficiency is. In this case several SL Packets are
   transported in each RTP packet.

9.4.1 SLConfigDescriptor

   The SLConfigDescriptor is the same as in 9.3.1.

9.4.2 SL packet header

   With this configuration the SL packet header is empty.

9.4.3 SDP mapping information

   The absence of RSLHSizeLength in SDP indicates that the RSLHSection
   is empty.

   The size of SL Packets (which are all complete Access Units in this
   case) is constant and is indicated in SDP with:

      a=fmtp: SLPPSize=5

   This also indicates to the receiver that the Multiple-SL mode will
   be used, i.e. that a 2-byte field will give the size of the
   MSLHSection. In this case, however, this field always contains zero
   since the MSLHSection is empty.
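
   As a rough illustration of the payload implied by SLPPSize=5, the
   following Python sketch builds and parses such a payload. It assumes
   that the 2-byte MSLHSection size field is in network byte order and
   is immediately followed by the concatenated SL packet payloads; the
   function names are hypothetical and the sketch does not handle a
   non-empty MSLHSection.

```python
import struct

SLPP_SIZE = 5  # constant SL packet payload size, from SLPPSize=5 in SDP


def build_payload(access_units):
    """Concatenate constant-size Access Units behind an empty MSLHSection."""
    for au in access_units:
        if len(au) != SLPP_SIZE:
            raise ValueError("every Access Unit must be SLPP_SIZE bytes long")
    # MSLHSection size in bits = 0 (assumed to be a big-endian 16-bit field).
    return struct.pack("!H", 0) + b"".join(access_units)


def split_payload(payload):
    """Recover the Access Units from a payload made by build_payload()."""
    (mslh_bits,) = struct.unpack_from("!H", payload, 0)
    if mslh_bits != 0:
        raise ValueError("this sketch only handles an empty MSLHSection")
    body = payload[2:]
    if len(body) % SLPP_SIZE != 0:
        raise ValueError("payload is not a whole number of SL packets")
    return [body[i:i + SLPP_SIZE] for i in range(0, len(body), SLPP_SIZE)]
```

   Because the size is fixed, the receiver needs no per-packet length
   field: the number of Access Units follows from the payload length
   alone.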

9.4.4 RTP packet structure

   Note that the RTP header M bit should always be set to 1.

   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | Field                                  | size         |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | RTP header                             | -            |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | MSLHSection size in bits = 0           | 2 bytes      |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | SL packet payload                      | 5 bytes      |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | SL packet payload                      | 5 bytes      |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | etc, until MTU is reached                              |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | SL packet payload                      | 5 bytes      |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

9.4.5 Overhead estimation

   The overhead is 3 %, i.e. minimal.

9.5 A more complex case: AAC with interleaving

   Let us consider AAC at around 130 kb/s, where each Access Unit is
   split into 4 SL packets of at most 90 bytes, corresponding to Error
   Sensitivity Categories (ESC), for which interleaving is very useful
   in terms of error resilience. We will therefore use an interleaving
   scheme where 15 SL Packets from 15 consecutive Access Units are
   interleaved per RTP packet to match an MTU of 1500 bytes.

   The interleaving sequence is 4 RTP packets and 350 ms long, which is
   too long for conferencing but acceptable for Internet radio.

   Since the sequence contains 60 SL packets, the sequence number can
   be encoded on 6 bits. But 2 bits are actually enough if the sender
   always resets the SL packet sequence number to zero at the start of
   each sequence, since only the first MSLH in each of the 4 RTP
   packets in the sequence carries an absolute sequence number value
   (0,1,2,3).
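
   The interleaving pattern described above can be sketched in Python
   as follows. The sketch assumes that SL packets are numbered
   consecutively in Access Unit order, starting from zero at the
   beginning of each interleaving sequence, and that RTP packet k
   carries the k-th SL packet of each of the 15 Access Units. The
   constant and function names are illustrative only.

```python
AUS_PER_SEQUENCE = 15   # consecutive Access Units per interleaving sequence
SL_PACKETS_PER_AU = 4   # one SL packet per Error Sensitivity Category


def interleaving_sequence():
    """Return, for each of the 4 RTP packets, its SL packet sequence numbers."""
    return [
        [au * SL_PACKETS_PER_AU + esc for au in range(AUS_PER_SEQUENCE)]
        for esc in range(SL_PACKETS_PER_AU)
    ]


rtp_packets = interleaving_sequence()
# Absolute sequence number carried by the first MSLH of each RTP packet.
first_numbers = [packet[0] for packet in rtp_packets]
# Largest sequence number in the whole interleaving sequence.
largest_number = max(max(packet) for packet in rtp_packets)
# Distinct steps between consecutive SL packets inside one RTP packet.
steps = {b - a for packet in rtp_packets for a, b in zip(packet, packet[1:])}
```

   Under this numbering the first SL packet of each RTP packet carries
   0, 1, 2 or 3, which fits in 2 bits, while the largest number in the
   sequence is 59, which would need 6 bits. Consecutive SL packets
   inside one RTP packet are always 4 apart.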

   2 bits are also enough for SLPSeqNumDelta, which is constant and
   equal to 3 (since +1 is automatically added).

   Note that the 4th RTP packet in each sequence has its M bit set to 1
   since it contains 15 SL packets transporting the end of 15 different
   Access Units.

   With this scheme a sender (for example upon reception of RTCP
   reports indicating high loss rates) can, for example, choose to
   duplicate for each interleaving sequence the first RTP packet, which
   contains the most useful data in terms of ESC.

   In this example we will also show several other SL features (OCR, AU
   boundary flags, as detailed below).

   One feature demonstrated by this example is the degradation
   priority. We assume that the degradation priority can take 4
   different values, one for each SL packet of an Access Unit, and is
   encoded on 2 bits. This interleaving scheme makes sure that only SL
   packets of identical degradation priority are grouped in the same
   RTP packet (see 3.6.3) and that only the first RSLH of each RTP
   packet transports the degradation priority.

   We also assume that the server inserts an OCR in the last SL packet
   of each RTP packet.

9.5.1 SLConfigDescriptor

   In this example the SLConfigDescriptor is:

      class SLConfigDescriptor extends BaseDescriptor : bit(8)
      tag=SLConfigDescrTag {
        bit(8) predefined;
        if (predefined==0) {
          bit(1) useAccessUnitStartFlag;       = 1
          bit(1) useAccessUnitEndFlag;         = 1
          bit(1) useRandomAccessPointFlag;     = 0
          bit(1) hasRandomAccessUnitsOnlyFlag; = 1
          bit(1) usePaddingFlag;               = 0
          bit(1) useTimeStampsFlag;            = 0
          bit(1) useIdleFlag;                  = 0
          bit(1) durationFlag;                 = 0
          bit(32) timeStampResolution;         = 0
          bit(32) OCRResolution;               = 30
          bit(8) timeStampLength;              = 0
          bit(8) OCRLength;                    = 32
          bit(8) AU_Length;                    = 0
          bit(8) instantBitrateLength;         = 0
          bit(4) degradationPriorityLength;    = 2
          bit(5) AU_seqNumLength;              = 0
          bit(5) packetSeqNumLength;           = 6
          bit(2) reserved=0b11;
        }
        if (durationFlag) {
          bit(32) timeScale;                   // NOT USED
          bit(16) accessUnitDuration;          // NOT USED
          bit(16) compositionUnitDuration;     // NOT USED
        }
        if (!useTimeStampsFlag) {
          bit(timeStampLength) startDecodingTimeStamp;    = 0
          bit(timeStampLength) startCompositionTimeStamp; = 0
        }
      }

9.5.2 SL Packet Header structure

   With this configuration we have the following SL packet header
   structure:

      aligned(8) class SL_PacketHeader (SLConfigDescriptor SL) {
        bit(1) accessUnitStartFlag;
        bit(1) accessUnitEndFlag;
        bit(1) OCRflag;
        bit(SL.packetSeqNumLength) packetSequenceNumber;
        bit(1) DegPrioflag;
        if (DegPrioflag) {
          bit(SL.degradationPriorityLength) degradationPriority;
        }
        if (OCRflag) {
          bit(SL.OCRLength) objectClockReference;
        }
      }

9.5.3 SDP mapping information

   The RSLHSize cannot exceed 2 bits, which is encoded on 2 bits and
   signaled by RSLHSizeLength.

   The resulting concatenated fmtp line is:

      a=fmtp: SLPPSizeLength=7;RSLHSizeLength=2;SLPSeqNumLength=2;
              SLPSeqNumDeltaLength=2;OCRDeltaLength=16

9.5.4 RTP packet structure

   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | Field                                  | size         |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | RTP header                             | -            |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   MSLHSection
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | MSLHSection size in bits = 135         | 2 bytes      |
   +++++++++++++++++++++++++++++++++++++++++++++++++++++++++
   | SLPPayloadSize                         | 7 bits       |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | SLPSeqNum = 0 or 1 or 2 or 3           | 2 bits       |
   +++++++++++++++++++++++++++++++++++++++++++++++++++++++++
   | SLPPayloadSize                         | 7 bits       |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | SLPSeqNumDelta = 3                     | 2 bits       |
   +++++++++++++++++++++++++++++++++++++++++++++++++++++++++
   | etc + 12 times 9 bits                                  |
   +++++++++++++++++++++++++++++++++++++++++++++++++++++++++
   | SLPPayloadSize                         | 7 bits       |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | SLPSeqNumDelta = 3                     | 2 bits       |
   +++++++++++++++++++++++++++++++++++++++++++++++++++++++++
   | bits to byte alignment                 | 1 bit        |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   RSLHSection
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | RSLHSize                               | 6 bits       |
   +++++++++++++++++++++++++++++++++++++++++++++++++++++++++
   | accessUnitStartFlag                    | 1 bit        |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | accessUnitEndFlag                      | 1 bit        |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | OCRFlag = 0                            | 1 bit        |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | DegPrioflag = 1                        | 1 bit        |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | degradationPriority                    | 2 bits       |
   +++++++++++++++++++++++++++++++++++++++++++++++++++++++++
   | accessUnitStartFlag                    | 1 bit        |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | accessUnitEndFlag                      | 1 bit        |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | OCRFlag = 0                            | 1 bit        |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | DegPrioflag = 0                        | 1 bit        |
   +++++++++++++++++++++++++++++++++++++++++++++++++++++++++
   | etc + 12 times 4 bits                                  |
   +++++++++++++++++++++++++++++++++++++++++++++++++++++++++
   | accessUnitStartFlag                    | 1 bit        |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | accessUnitEndFlag                      | 1 bit        |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | OCRFlag = 1                            | 1 bit        |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | OCRDelta                               | 16 bits      |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | DegPrioflag = 0                        | 1 bit        |
   +++++++++++++++++++++++++++++++++++++++++++++++++++++++++
   | bits to byte alignment                 | 4 bits       |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   SLPPSection
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | SL packet payload                      | max 90 bytes |
   +++++++++++++++++++++++++++++++++++++++++++++++++++++++++
   | etc + 13 SL packets                                    |
   +++++++++++++++++++++++++++++++++++++++++++++++++++++++++
   | SL packet payload                      | max 90 bytes |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

9.5.5 Overhead estimation

   The MSLHSection is 19 bytes and the RSLHSection is 10 bytes; in this
   example we therefore have an RTP overhead of 40 + 29 bytes for 1350
   bytes (max) of payload, i.e. around 5 % overhead.

10. Acknowledgements

   This document has benefited, over several years, from useful
   contributions by a large number of people, since it is based on work
   within the IETF AVT working group and various ISO MPEG working
   groups, especially the 4-on-IP ad-hoc group in the last stages. The
   authors wish to thank Guido Franceschini, Art Howarth, Dave Mackie,
   Dave Singer, and Stephan Wenger for their valuable comments.

11. References

   [1] ISO/IEC 14496-1:2000, MPEG-4 Systems, October 2000.

   [2] ISO/IEC 14496-2:1999/Amd.1:2000(E), MPEG-4 Visual, January 2000.

   [3] ISO/IEC 14496-3:1999/FDAM 1:2000, MPEG-4 Audio, January 2000.

   [4] ISO/IEC 14496-6 FDIS, Delivery Multimedia Integration Framework,
   November 1998.

   [5] Schulzrinne, Casner, Frederick, Jacobson, RTP: A Transport
   Protocol for Real Time Applications, RFC 1889, Internet Engineering
   Task Force, January 1996.

   [6] S. Bradner, Key words for use in RFCs to Indicate Requirement
   Levels, RFC 2119, March 1997.

   [7] Y. Kikuchi, T. Nomura, S. Fukunaga, Y. Matsui, H. Kimata, RTP
   payload format for MPEG-4 Audio/Visual streams, RFC 3016.

   [8] B. Thompson, T. Koren, D. Wing, Tunneling multiplexed Compressed
   RTP ("TCRTP"), work in progress, draft-ietf-avt-tcrtp-02.txt,
   November 2000.

   [9] D. Singer, Y. Lim, A Framework for the delivery of MPEG-4 over
   IP-based Protocols, work in progress,
   draft-singer-mpeg4-ip-01.txt, October 2000.

   [10] Handley, Jacobson, SDP: Session Description Protocol, RFC 2327,
   Internet Engineering Task Force, April 1998.

   [11] C. Roux et al., RTP Payload Format for MPEG-4 FlexMultiplexed
   Streams, work in progress, draft-curet-avt-rtp-mpeg4-flexmux-00.txt,
   February 2001.

12. Authors' Addresses

   Olivier Avaro
   France Telecom
   35 A Schutzenhuttenweg
   60598 Frankfurt am Main
   Deutschland
   e-mail: olivier.avaro@francetelecom.fr

   Andrea Basso
   AT&T Labs Research
   200 Laurel Avenue
   Middletown, NJ 07748
   USA
   e-mail: basso@research.att.com

   Stephen L. Casner
   Packet Design, Inc.
   66 Willow Place
   Menlo Park, CA 94025
   USA
   e-mail: casner@acm.org

   M. Reha Civanlar
   AT&T Labs - Research
   100 Schultz Drive
   Red Bank, NJ 07701
   USA
   e-mail: civanlar@research.att.com

   Philippe Gentric
   Philips Digital Networks
   22 Avenue Descartes
   94453 Limeil-Brevannes CEDEX
   France
   e-mail: philippe.gentric@philips.com

   Carsten Herpel
   THOMSON multimedia
   Karl-Wiechert-Allee 74
   30625 Hannover
   Germany
   e-mail: herpelc@thmulti.com

   Zvi Lifshitz
   Optibase Ltd.
   7 Shenkar St.
   Herzliya 46120
   Israel
   e-mail: zvil@optibase.com

   Young-kwon Lim
   mp4cast (MPEG-4 Internet Broadcasting Solution Consortium)
   1001-1 Daechi-Dong Gangnam-Gu
   Seoul, 305-333,
   Korea
   e-mail: young@techway.co.kr

   Colin Perkins
   USC Information Sciences Institute
   4350 N. Fairfax Drive #620
   Arlington, VA 22203
   USA
   e-mail: csp@isi.edu

   Jan van der Meer
   Philips Digital Networks
   Cederlaan 4
   5600 JB Eindhoven
   Netherlands
   e-mail: jan.vandermeer@philips.com