   Internet Engineering Task Force                  Avaro-France Telecom
   Internet Draft                                   Basso-AT&T
                                                    Casner-Packet Design
                                                    Civanlar-AT&T
                                                    Gentric-Philips
                                                    Herpel-Thomson
                                                    Lifshitz-Optibase
                                                    Lim-mp4cast
                                                    Perkins-ISI
                                                    van der Meer-Philips
                                                    January 2001
                                                    Expires July 2001
   Document: draft-gentric-avt-mpeg4-multiSL-01.txt

                  RTP Payload Format for MPEG-4 Streams

Status of this Memo

   This document is an Internet-Draft and is in full conformance with
   all provisions of Section 10 of RFC2026.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups. Note that
   other groups may also distribute working documents as
   Internet-Drafts. Internet-Drafts are draft documents valid for a
   maximum of six months and may be updated, replaced, or obsoleted by
   other documents at any time. It is inappropriate to use
   Internet-Drafts as reference material or to cite them other than as
   "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt
   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html.

Abstract

   This document describes a payload format for transporting MPEG-4
   encoded data using RTP. MPEG-4 is a recent standard from ISO/IEC for
   the coding of natural and synthetic audio-visual data. Several
   services provided by RTP are beneficial for MPEG-4 encoded data
   transport over the Internet. Additionally, the use of RTP makes it
   possible to synchronize MPEG-4 data with other real-time data types.

   This specification is a product of the Audio/Video Transport working
   group within the Internet Engineering Task Force and ISO/IEC MPEG-4
   ad hoc group on MPEG-4 over Internet. Comments are solicited and
   should be addressed to the working group's mailing list at
   rem-conf@es.net and/or the authors.

1. Introduction

   MPEG-4 is a recent standard from ISO/IEC for the coding of natural
   and synthetic audio-visual data in the form of audiovisual objects
   that are arranged into an audiovisual scene by means of a scene
   description [1][2][3][4]. This draft specifies an RTP [5] payload
   format for transporting MPEG-4 encoded data streams.

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in
   this document are to be interpreted as described in RFC 2119 [6].

   The benefits of using RTP for MPEG-4 data stream transport include:

   i. Ability to synchronize MPEG-4 streams with other RTP payloads

   ii. Monitoring MPEG-4 delivery performance through RTCP

   iii. Combining MPEG-4 and other real-time data streams received from
   multiple end-systems into a set of consolidated streams through RTP
   mixers

   iv. Converting data types, etc. through the use of RTP translators.

1.1 Overview of MPEG-4 End-System Architecture

   Fig. 1 below shows the general layered architecture of MPEG-4
   terminals. The Compression Layer processes individual audio-visual
   media streams. The MPEG-4 compression schemes are defined in the
   ISO/IEC specifications 14496-2 [2] and 14496-3 [3]. The compression
   schemes in MPEG-4 achieve efficient encoding over a bandwidth
   ranging from several kbps to many Mbps. The audio-visual content
   compressed by this layer is organized into Elementary Streams (ESs).
   The MPEG-4 standard specifies MPEG-4 compliant streams. Within the
   constraint of this compliance the compression layer is unaware of a
   specific delivery technology, but it can be made to react to the
   characteristics of a particular delivery layer such as the path-MTU
   or loss characteristics.
   Also, some compressors can be designed to be delivery specific for
   implementation efficiency. In such cases the compressor may work in
   a non-optimal fashion with delivery technologies that are different
   than the one it is specifically designed to operate with.

   The hierarchical relations, location and properties of ESs in a
   presentation are described by a dynamic set of Object Descriptors
   (ODs). Each OD groups one or more ES Descriptors referring to a
   single content item (audio-visual object). Hence, multiple
   alternative or hierarchical representations of each content item are
   possible.

   ODs are themselves conveyed through one or more ESs. A complete set
   of ODs can be seen as an MPEG-4 resource or session description at a
   stream level. The resource description may itself be hierarchical,
   i.e. an ES conveying an OD may describe other ESs conveying other
   ODs.

   The session description is accompanied by a dynamic scene
   description, Binary Format for Scene (BIFS), again conveyed through
   one or more ESs. At this level, content is identified in terms of
   audio-visual objects. The spatio-temporal location of each object is
   defined by BIFS. The audio-visual content of those objects that are
   synthetic and static is also described by BIFS. Natural and
   animated synthetic objects may refer to an OD that points to one or
   more ESs that carry the coded representation of the object or its
   animation data.

   Because the session (or resource) description and the scene (or
   content composition) description are each conveyed through their
   own ESs, portions of the content composition, as well as the number
   and properties of the media streams that carry the audio-visual
   content, can be changed separately and dynamically at well known
   instants in time.
   One or more initial Scene Description streams and the corresponding
   OD stream have to be pointed to by an initial object descriptor
   (IOD). The IOD needs to be made available to the receivers through
   some out-of-band means that are not defined in this document.

   A homogeneous encapsulation of ESs carrying media or control (ODs,
   BIFS) data is defined by the Sync Layer (SL) that primarily provides
   the synchronization between streams. The Compression Layer organizes
   the ESs in Access Units (AU), the smallest elements that can be
   attributed individual timestamps. Integer or fractional AUs are then
   encapsulated in SL packets. All consecutive data from one stream is
   called an SL-packetized stream at this layer. The interface between
   the compression layer and the SL is called the Elementary Stream
   Interface (ESI). The ESI is informative.

   The Delivery Layer in MPEG-4 consists of the Delivery Multimedia
   Integration Framework (DMIF) defined in ISO/IEC 14496-6 [4]. This
   layer is media unaware but delivery technology aware. It provides
   transparent access to and delivery of content irrespective of the
   technologies used. The interface between the SL and DMIF is called
   the DMIF Application Interface (DAI). It offers content location
   independent procedures for establishing MPEG-4 sessions and access
   to transport channels. The specification of this payload format is
   considered as a part of the MPEG-4 Delivery Layer.

   media aware        +-----------------------------------------+
   delivery unaware   |            COMPRESSION LAYER            |
   14496-2 Visual     |streams from as low as Kbps to multi-Mbps|
   14496-3 Audio      +-----------------------------------------+

                                                      Elementary
                                                      Stream
   ===================================================Interface
                                                      (ESI)

                      +-------------------------------------------+
   media and          |                SYNC LAYER                 |
   delivery unaware   | manages elementary streams, their synch-  |
   14496-1 Systems    | ronization and hierarchical relations     |
                      +-------------------------------------------+

                                                      DMIF
                                                      Application
   ===================================================Interface
                                                      (DAI)

                      +-------------------------------------------+
   delivery aware     |              DELIVERY LAYER               |
   media unaware      |provides transparent access to and delivery|
   14496-6 DMIF       | of content irrespective of delivery       |
                      | technology                                |
                      +-------------------------------------------+

           Figure 1: General MPEG-4 terminal architecture

1.2 MPEG-4 Elementary Stream Data Packetization

   The ESs from the encoders are fed into the SL with indications of AU
   boundaries, random access points, desired composition time and the
   current time.

   The Sync Layer fragments the ESs into SL packets, each containing a
   header that encodes information conveyed through the ESI. If the AU
   is larger than an SL packet, subsequent packets containing remaining
   parts of the AU are generated with subset headers until the complete
   AU is packetized.

   The syntax of the Sync Layer is configurable and can be adapted to
   the needs of the stream to be transported. This includes the
   possibility to select the presence or absence of individual syntax
   elements as well as configuration of their length in bits. The
   configuration for each individual stream is conveyed in a
   SLConfigDescriptor, which is an integral part of the ES Descriptor
   for this stream.

   It is assumed that the MPEG-4 SLConfigDescriptor is transported "out
   of band".
   This is typically done via an ObjectDescriptorStream using the
   MPEG-4 Object Description framework. However, since some knowledge
   of the SLConfigDescriptor is required by an RTP receiver in order to
   parse MPEG-4 System specific elements in the RTP payload defined in
   this document, the SLConfigDescriptor MAY be transported in the SDP
   associated with such a stream using the a=fmtp syntax (see section
   8).

2. Analysis of the carriage of MPEG-4 over IP

   When transporting MPEG-4 audio and video, applications may or may
   not require the use of MPEG-4 systems. To achieve the highest level
   of interoperability between all MPEG-4 applications, it is desirable
   that (a) in both cases the same MPEG-4 transport format can be used
   and that (b) receivers that have no MPEG-4 system knowledge can
   easily skip the MPEG-4 system specific information, if any.

   RTP is well suited to transporting MPEG-4 audio and MPEG-4 video,
   but when using MPEG-4 systems a problem arises from the fact that
   both RTP and MPEG-4 systems contain a synchronization layer. In
   particular, the RTP header duplicates some of the information
   provided in SL packet headers such as the composition timestamps
   (CTSs) and the marker bit that signals the end of access units.

   To avoid unnecessary overhead and potential interoperability risks
   when transporting MPEG-4 systems, it is desirable to remove the
   redundancy between the SL packet header and the RTP packet header.
   To be independent of the use of MPEG-4 systems, synchronization can
   rely on the parameters provided in the RTP header.

   In case SL headers are used, the redundant fields are removed from
   the SL header, producing "reduced SL headers".
   The remaining information from the SL header, if any, is contained
   inside the RTP packet payload, together with the SL packet payload.
   The combination of RTP packet headers and reduced SL packet headers
   can be used to logically map the RTP packets to complete SL packets.

   Some of the information contained in the reduced SL headers is also
   useful for transport over RTP when MPEG-4 systems is not used.

   For that reason the information in the "reduced" SL headers is split
   into "general useful information" and "MPEG-4 systems only
   information".

   The "general useful information", hereinafter called Mapped SL
   Packet Header (MSLH), is carried by a number of fields configurable
   using SDP parameters; all receivers can parse these fields.

   The "MPEG-4 systems only information" is contained in a reduced SL
   header, hereinafter called Remaining SL Packet Header (RSLH), also
   signaled by SDP parameters and preceded by a length field, so as to
   enable easy skipping of this information by non-MPEG-4 system
   devices.

   This is depicted in figure 2.

                               <----------SL Packet-------->

                               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+
                               | SL Packet   | SL Packet   |
                               | Header      | Payload     |
                               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+
                                      |              |
                                      |              |
         ++++++++++++++++++++++++++++++              |
         |             |              |              |
         V             V              V              V
   +-+-+-+-+-+-+ +-+-+-+-+-+-+ +-+-+-+-+-+-+-+ +-+-+-+-+-+-+
   |RTP Packet | | Mapped SL | | Remaining SL| | SL Packet |
   | Header    | | Header    | | Header      | | Payload   |
   +-+-+-+-+-+-+ +-+-+-+-+-+-+ +-+-+-+-+-+-+-+ +-+-+-+-+-+-+

                 <----RTP Packet Payload------------------->

           Figure 2: Mapping of SL Packet into RTP packet

   This RTP payload format has been designed so that it can be
   configured (using SDP parameters) to be identical to RFC 3016 for
   the recommended MPEG-4 video configurations.
   Hence receivers that comply with this Internet Draft can decode such
   RTP payloads.

3. Payload Format

   The RTP Payload corresponds to an integer number of SL packets.

   SL packets inside RTP packets MUST be in the SL stream order, i.e.:
   i) decodingTimeStamp order, if present
   ii) packetSequenceNumber order, if present
   iii) implicit decoding order in all other cases.

   The SL Packet Headers are transformed into RSLH with some fields
   extracted to be mapped in the RTP header and others extracted to be
   mapped in the corresponding MSLH. The SL Packet Payload is
   unchanged.

   When generating an SL-packetized stream specifically for this
   format, all other fields in the SL Packet Headers that the RTP
   header does not duplicate (including the decodingTimeStamp) are
   OPTIONAL.

   This payload format has two modes. The "Single-SL" mode is a mode
   where a single SL packet is transported per RTP packet. The
   "Multiple-SL" mode is a mode where more than one SL packet is
   transported per RTP packet. The default mode is the Single-SL mode.
   The mode can be set to Multiple-SL by adding in SDP a SLPPSize or
   SLPPSizeLength parameter (see section 8).

   RTP Packets SHOULD be sent in the decoding (MPEG-4
   decodingTimeStamp) order.

   The size (or number) of the SL packet(s) SHOULD be adjusted such
   that the resulting RTP packet is not larger than the path-MTU. To
   handle larger packets, this payload format relies on lower layers
   for fragmentation, which may not be desirable.

3.1 RTP Header Fields Usage

   Payload Type (PT): The assignment of an RTP payload type for this
   new packet format is outside the scope of this document, and will
   not be specified here.
   It is expected that the RTP profile for a particular class of
   applications will assign a payload type for this encoding, or if
   that is not done then a payload type in the dynamic range shall be
   chosen.

   Marker (M) bit: The M bit is set to 1 when all SL packets in the RTP
   packet are Access Unit ends, i.e. the M bit maps to the SL
   accessUnitEndFlag.

   M is set to 1 when the RTP packet contains either:
   . a single SL packet containing a full Access Unit
   . a single SL packet transporting the last fragment of an Access
     Unit
   . multiple SL packets each containing a full Access Unit
   . multiple SL packets each containing the last fragment of an Access
     Unit
   . multiple SL packets each containing either a full Access Unit or
     the last fragment of an Access Unit

   The last 2 cases occur when using specific interleaving schemes. In
   some interleaving schemes it may not be practical to reshuffle the
   SL packets so as to group Access Unit ends in the same RTP packet.
   In that case, Access Unit boundaries, if needed, can be transported
   using one or both of the SL flags accessUnitStartFlag and
   accessUnitEndFlag.

   Extension (X) bit: Defined by the RTP profile used.

   Sequence Number: The RTP sequence number should be generated by the
   sender with a constant random offset and does not have to be
   correlated to any (optional) MPEG-4 SL sequence numbers.

   Timestamp: Set to the value in the compositionTimeStamp field of the
   first SL packet, if present. If compositionTimeStamp is less than
   32 bits long, the MSBs of timestamp MUST be set to zero.

   Although it is available from the SL configuration data, the
   resolution of the timestamp may need to be conveyed explicitly
   through some out-of-band means to be used by network elements which
   are not MPEG-4 aware.
   If compositionTimeStamp is more than 32 bits long, this payload
   format cannot be used.

   In all cases, the sender SHALL make sure that RTP time stamps are
   identical only for RTP packets transporting fragments of the same
   Access Unit.

   In case compositionTimeStamp is not present in the current SL
   packet, but has been present in a previous SL packet, the reason is
   that the same Access Unit has been fragmented; therefore the same
   timestamp value MUST be taken as RTP timestamp.

   If compositionTimeStamp is never present in SL packets for this
   stream, the RTP packetizer SHOULD convey a reading of a local clock
   at the time the RTP packet is created.

   According to RFC1889 [5, Section 5.1] timestamps are recommended to
   start at a random value for security reasons. However, a receiver is
   then in the general case unable to reconstruct the original MPEG-4
   Time Stamps (CTS, DTS, OCR), which can be of use for applications
   where streams from multiple sources are to be synchronized.
   Therefore the usage of such a random offset SHOULD be avoided.

   Note that since RTP devices may re-stamp the stream, all time stamps
   inside of the RTP payload (CTS and DTS in MSLH, OCR in RSLH) MUST be
   expressed as a difference from the RTP time stamp. Since this
   subtraction may lead to negative values, the offset MUST be encoded
   as a two's complement signed integer in network byte order. Note
   that these offsets (deltas) typically require far fewer bits than
   the original fields, which is another justification.

   SSRC: set as described in RFC1889 [5]. A mapping between the ES
   identifiers (ESIDs) and SSRCs should be provided through out-of-band
   means.

   CC and CSRC fields are used as described in RFC 1889 [5].

   RTCP SHOULD be used as defined in RFC 1889 [5].
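
   The two's complement delta encoding required above for CTS, DTS and
   OCR values can be sketched as follows. This is only an illustration
   under stated assumptions: the function names encode_delta and
   decode_delta are hypothetical, the field length is whatever the
   corresponding SDP parameter signals, and wraparound of the 32-bit
   RTP timestamp is deliberately ignored.

```python
def encode_delta(timestamp: int, rtp_timestamp: int, length_bits: int) -> int:
    """Return the delta field value (an unsigned bit pattern of length_bits bits)."""
    delta = timestamp - rtp_timestamp
    lowest = -(1 << (length_bits - 1))
    highest = (1 << (length_bits - 1)) - 1
    if not lowest <= delta <= highest:
        raise ValueError("delta does not fit in the signalled field length")
    # Masking a negative Python int yields its two's complement bit pattern.
    return delta & ((1 << length_bits) - 1)


def decode_delta(field: int, rtp_timestamp: int, length_bits: int) -> int:
    """Recover the original time stamp from a delta field and the RTP timestamp."""
    if field & (1 << (length_bits - 1)):  # sign bit set: value is negative
        field -= 1 << length_bits
    return rtp_timestamp + field


# A time stamp 100 ticks before the RTP timestamp, carried in an 8-bit field.
field = encode_delta(900, 1000, 8)
original = decode_delta(field, 1000, 8)
```

   Because the field carries only the difference, a device that
   re-stamps the RTP header does not need to touch the payload.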
   RTP timestamps in RTCP SR packets: according to the RTP timing
   model, the RTP timestamp that is carried in an RTCP SR packet is
   the same as the compositionTimeStamp that would be applied to an RTP
   packet for data that was sampled at the instant the SR packet is
   being generated and sent. The RTP timestamp value is calculated from
   the NTP timestamp for the current time, which also goes in the RTCP
   SR packet. To perform that calculation, an implementation needs to
   periodically establish a correspondence between the CTS value of a
   data packet and the NTP time at which that data was sampled.

3.2 RTP payload structure

   The packet payload structure consists of 3 byte-aligned sections.

   The first section is the MSLH section and contains Mapped SL Packet
   Headers (MSLH). Its structure is described in 3.3. In the Single-SL
   mode this section is empty by default.

   The second section is the RSLH section and contains Remaining SL
   Headers (RSLH). Its structure is described in 3.5. By default this
   section is empty.

   The last section (SLPP section) contains the SL packet payloads.
   This section is never empty.

   The Nth MSLH in the MSLH section, the Nth RSLH in the RSLH section
   and the Nth SL packet payload in the SLPP section correspond to the
   Nth SL packet transported by the RTP packet.
    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |V=2|P|X|  CC   |M|     PT      |       sequence number         |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                           timestamp                           |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |           synchronization source (SSRC) identifier            |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   :            contributing source (CSRC) identifiers             :
   +=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+
   |                                                               |
   |                  MSLH section (byte aligned)                  |
   |                                                               |
   +                               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                               |                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+                               |
   |                                                               |
   |                  RSLH section (byte aligned)                  |
   |                                                               |
   +               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |               |                                               |
   +-+-+-+-+-+-+-+-+                                               |
   |                                                               |
   |                  SLPP section (byte aligned)                  |
   |                                                               |
   |                               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                               :...OPTIONAL RTP padding        |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

                 Figure 3: An RTP packet for MPEG-4

3.3 MSLH section structure

   If the MSLH section consumes a non-integer number of bytes, up to 7
   zero padding bits MUST be inserted at the end in order to achieve
   byte-alignment.

   In the Single-SL mode this section consists of one MSLH.

   * byte boundary
   +-+-+-+-+-+-+-+-+-+
   | MSLH (x bits)   |
   +-+-+-+-+-+-+-+-+-+
   | padding bits    |
   +-+-+-+-+-+-+-+-+-+
   * byte boundary

         Figure 4: MSLH section structure in Single-SL mode

   In the Multiple-SL mode this section consists of a 2-byte field
   giving the size in bits (in network byte order) of the following
   block of bit-wise concatenated MSLHs.

   This size field is absent in the Single-SL mode, not because it is
   not needed (which would be a minor gain) but for compatibility with
   RFC 3016.
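
   As an illustration of the Multiple-SL layout just described, the
   sketch below computes how many bytes the MSLH section occupies so
   that a receiver can locate the start of the next section. It assumes
   that the 2-byte size field counts only the bits of the concatenated
   MSLHs (not the size field itself and not the padding); the function
   name mslh_section_length is hypothetical.

```python
def mslh_section_length(payload: bytes) -> int:
    """Return the byte length of the MSLH section in Multiple-SL mode.

    `payload` is the RTP payload, starting at the 2-byte size field.
    """
    if len(payload) < 2:
        raise ValueError("payload too short to hold the MSLH size field")
    # The size field is in network byte order and expressed in bits.
    size_bits = int.from_bytes(payload[:2], "big")
    # Up to 7 zero padding bits round the MSLH block up to a whole byte.
    return 2 + (size_bits + 7) // 8


# 11 bits of MSLH data need 2 bytes, plus the 2-byte size field itself.
length = mslh_section_length(bytes([0x00, 0x0B, 0xFF, 0xE0, 0x42]))
```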
    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |   MSLH section size in bits   |    MSLH     |       etc       |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+                 |
   |              as many bit-wise concatenated MSLHs              |
   |               as SL packets in this RTP packet                |
   |                               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                               | padding bits                  |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

        Figure 5: MSLH section structure in Multiple-SL mode

3.4 MSLH structure

   The Mapped SL Packet Header content depends on SDP parameters; by
   default it is empty for the Single-SL mode and contains only the
   SLPPSize (SL Packet Payload Size) field in the Multiple-SL mode.

   When all options are signaled in SDP the MSLH structure is given in
   figure 6.

   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | SLPPSize                              |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | SLPSeqNum/SLPSeqNumDelta              |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | CTSFlag                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | CTSDelta                              |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | DTSFlag                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | DTSDelta                              |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

         Figure 6: Mapped SL Packet Header (MSLH) structure

   In the general case a receiver can only discover the size of a MSLH
   by parsing it since for example the presence of CTSDelta is signaled
   by the value of CTSFlag.

3.4.1 Fields of MSLH

   SLPPSize (SL Packet Payload Size): Indicates the size in bytes of
   the associated SL Packet Payload, which can be found in the SLPP
   section of the RTP packet. The length in bits of this field is
   signaled in SDP by the SLPPSizeLength SDP parameter (see section 8).

   SLPSeqNum/SLPSeqNumDelta: Encodes the packetSequenceNumber (serial
   number) of the SL Packet.
   SLPSeqNum is found only for the first SL packet. SLPSeqNumDelta is
   optional and, if present, appears for subsequent (non-first) SL
   packets.

   The length in bits of the SLPSeqNum field is defined by the
   SLPSeqNumLength SDP parameter (see section 8).

   The length in bits of the SLPSeqNumDelta field is defined by the
   SLPSeqNumDeltaLength SDP parameter (see section 8).

   If the parameter SLPSeqNumDeltaLength is defined in SDP, non-first
   SL packets have their packetSequenceNumber expressed as a difference
   named SLPSeqNumDelta. This difference is relative to the previous SL
   packet in the RTP packet according to (with i>=0):

      packetSequenceNumber(0) = SLPSeqNum(0)
      packetSequenceNumber(i+1) = packetSequenceNumber(i) +
                                  SLPSeqNumDelta(i+1) + 1

   If the parameter SLPSeqNumDeltaLength is not defined in SDP the
   default value is zero, i.e. this field is not present for non-first
   SL packets. Furthermore receivers SHALL then apply the above formula
   with SLPSeqNumDelta equal to zero, i.e. by default
   packetSequenceNumber is incremented by 1 for each SL packet in one
   RTP packet. This means that for streams that use
   packetSequenceNumber and are not interleaved the transport of
   packetSequenceNumber in the Multiple-SL mode is "almost free".

   CTSFlag (1 bit): Indicates whether the CTSDelta field is present. A
   value of 1 indicates that the field is present, a value of 0 that it
   is not present.

   This field, if present, appears for all SL packets since the
   receiver needs it to reconstruct the compositionTimeStampFlag of SL
   Headers.

   CTSDelta: Specifies the value of the CTS as a two's complement
   offset (delta) from the timestamp in the RTP header of this RTP
   packet. The length in bits of each CTSDelta field is specified in
   SDP by the CTSDeltaLength parameter (see section 8).
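
   The packetSequenceNumber recurrence given earlier in this section
   can be sketched as follows. The function name
   reconstruct_sequence_numbers is hypothetical. The sketch assumes the
   caller passes one SLPSeqNumDelta per non-first SL packet, using zero
   for every packet when SLPSeqNumDeltaLength is not signaled, and it
   ignores any wraparound at the packetSequenceNumber field length.

```python
from typing import List, Sequence


def reconstruct_sequence_numbers(slp_seq_num: int,
                                 deltas: Sequence[int]) -> List[int]:
    """Rebuild packetSequenceNumber for every SL packet in one RTP packet.

    slp_seq_num is the SLPSeqNum of the first SL packet; deltas holds
    the SLPSeqNumDelta of each following SL packet, in order.
    """
    numbers = [slp_seq_num]
    for delta in deltas:
        # packetSequenceNumber(i+1) = packetSequenceNumber(i) + delta + 1
        numbers.append(numbers[-1] + delta + 1)
    return numbers


# Not interleaved: all deltas are zero, so numbers simply increment.
plain = reconstruct_sequence_numbers(10, [0, 0, 0])
# Interleaved: a delta of 2 skips two sequence numbers.
interleaved = reconstruct_sequence_numbers(10, [2, 0])
```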
627 This field -if present- appears only for non-first SL packets since 628 the composition time stamp of the first SL packet is mapped to the 629 RTP time stamp, regardless of whether CTSFlag is 1. The sender MUST 630 remove the compositionTimeStamp from the corresponding RSLH. 632 DTSFlag (1 bit): Indicates whether the DTSDelta field is present. A 633 value of 1 indicates that DTSDelta is present, a value of 0 that it 634 is not present. 636 This field -if present- appears for all SL packets since it is 637 needed by the receiver to reconstruct the decodingTimeStampFlag. 639 DTSDelta: Specifies the value of the decodingTimeStamp as a 2 640 complement offset (delta) from the timestamp in the RTP header of 641 this packet. The length in bits of each DTSDelta field is specified 642 in SDP by the DTSDeltaLength parameter (see section 8). 644 This field appears when DTSFlag is 1. Then the sender MUST remove 645 the decodingTimeStamp from the corresponding RSLH. 647 3.4.2 Relationship between sizes of MSLH fields and SDP parameters 649 The relationship between a Mapped SL Packet Header and the related 650 SDP parameters is as follows: 652 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 653 | Fields of MSLPH | Number of bits (in SDP) | 654 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 655 | SLPPSize | SLPPSizeLength | 656 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 657 | SLPSeqNum | SLPSeqNumLength | 658 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 659 | SLPSeqNumDelta | SLPSeqNumDeltaLength | 660 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 662 Gentric et al. 
| CTSFlag             | 1 If (CTSDeltaLength > 0)           |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| CTSDelta            | CTSDeltaLength If (CTSFlag == 1)    |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| DTSFlag             | 1 If (DTSDeltaLength > 0)           |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| DTSDelta            | DTSDeltaLength If (DTSFlag == 1)    |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

Table 1: Relationship between MSLH field sizes and SDP parameters

3.5 RSLH section structure

This section consists of a field (RSLHSize) giving the size in bits
of the following block of bit-wise concatenated RSLHs.

If the section consumes a non-integer number of bytes, up to 7 zero
padding bits MUST be inserted at the end in order to achieve
byte-alignment.

* byte boundary
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| RSLHSize (RSLHSizeLength bits)        |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| RSLH (variable number of bits)        |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| etc                                   |
| as many bit-wise concatenated RSLHs   |
| as SL Packets in this RTP packet      |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| RSLH (variable number of bits)        |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| padding bits                          |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
* byte boundary

Figure 7: RSLH section structure

The length in bits of the RSLHSize field is RSLHSizeLength and is
specified in SDP with a default value of zero indicating that the
whole RSLH section is absent.
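
The byte alignment of the RSLH section can be illustrated with a
small calculation. This is a sketch under one stated assumption: the
section is read as the RSLHSize field (RSLHSizeLength bits) followed
by RSLHSize bits of concatenated RSLHs, as Figure 7 suggests, and the
padding applies to that total. The function names are hypothetical.

```python
def rslh_section_bytes(rslh_size_length: int, rslh_size: int) -> int:
    """Number of bytes occupied by the RSLH section, padding included.

    Assumes the section is the RSLHSize field plus `rslh_size` bits of
    RSLHs, rounded up to the next byte boundary.
    """
    if rslh_size_length == 0:
        return 0  # default value: the whole RSLH section is absent
    total_bits = rslh_size_length + rslh_size
    return (total_bits + 7) // 8


def rslh_padding_bits(rslh_size_length: int, rslh_size: int) -> int:
    """Zero padding bits (0 to 7) needed to reach byte alignment."""
    if rslh_size_length == 0:
        return 0
    total_bits = rslh_size_length + rslh_size
    return rslh_section_bytes(rslh_size_length, rslh_size) * 8 - total_bits
```

For example, a 2-bit RSLHSize field announcing 2 bits of RSLH gives a
one-byte section with 4 padding bits, which agrees with the packet
layout shown in section 9.1.4.
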
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Fields of RSLH               | Number of bits             |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| RSLHSize                     | RSLHSizeLength             |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| bit-wise concatenated RSLHs  | RSLHSize                   |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

Table 2: Sizes in bits inside RSLH section, SDP parameters

Parsing of the bit-wise concatenated RSLHs requires MPEG-4 system
awareness; specifically, it requires an understanding of the MPEG-4
Synchronization Layer (SL) syntax and of the modifications to this
syntax described in the next section (3.6).

However, thanks to the RSLHSize field, non-MPEG-4-system receivers
MAY skip this part by rounding up RSLHSize/8 to the next integer
number of bytes.

3.6 RSLH structure

A Remaining SL Packet Header (RSLH) is what remains of an SL header
after modifications for mapping into this payload format.

The following modifications of the SL packet header MUST be applied.
The other fields of the SL packet header MUST remain unchanged but
are bit-shifted to fill in the gaps left by the operations specified
below.

3.6.1 Removal of fields

The following SL Packet Header fields, if present, are removed since
they are mapped either in the RTP header or in the corresponding
MSLH:
. compositionTimeStampFlag
. compositionTimeStamp
. decodingTimeStampFlag
. decodingTimeStamp
. packetSequenceNumber

3.6.2 Mapping of OCR

Furthermore, if the SL Packet header contains an OCR, then this
field is encoded in the RSLH as a two's-complement difference
(delta), exactly like a compositionTimeStamp or a decodingTimeStamp
in the MSLH. The length in bits of this difference is indicated by
the OCRDeltaLength parameter in SDP (see section 8).
With this payload format OCRs MUST have the same clock resolution as
Time Stamps.

If compositionTimeStamp is not present for an SL packet that has an
OCR, then the OCR SHALL be encoded as a difference to the RTP time
stamp.

3.6.3 Degradation Priority

For streams that use the optional degradationPriority field in the
SL Packet Headers, only SL packets with the same degradation
priority SHALL be transported by one RTP packet so that components
may dispatch the RTP packets according to appropriate QoS or
protection schemes. Furthermore only the first RSLH of one RTP
packet SHALL contain the degradationPriority field since it would
otherwise be redundant.

3.7 SLPP section structure

The SLPP (SL Packet Payload) section contains the concatenated SL
Packet Payloads. By definition SL Packet Payloads are byte aligned.

For efficiency SL packets do not carry their own payload size. This
is not an issue for RTP packets that contain a single SL Packet.

However in the Multiple-SL mode the size of each SL packet payload
MUST be available to the receiver.

If the SL packet payload size is constant for a stream, the size
information SHOULD NOT be transported in the RTP packet. However in
that case it MUST be signaled in SDP using a (a=fmtp: SLPPSize=)
syntax (see section 8).

If the SL packet payload size is variable then the size of each SL
packet payload MUST be indicated in the corresponding MSLH. In order
to do so the MSLH MUST contain a SLPPSize field. The number of bits
on which this SLPPSize is encoded MUST be indicated in the
corresponding SDP using a (a=fmtp: SLPPSizeLength=) syntax (see
section 8).

The absence of both SLPPSize and SLPPSizeLength in SDP indicates
the Single-SL mode, i.e. that a single SL packet is transported in
each RTP packet for that stream.
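
The mode selection rules above, together with the rule in section 8
that SLPPSize and SLPPSizeLength must not be signaled together, can
be sketched as below. The function name, the returned labels and the
dictionary representation of the fmtp parameters are hypothetical;
a value of zero is treated like an absent parameter, following the
default-value rule of section 8.1.

```python
def payload_mode(fmtp: dict[str, int]) -> str:
    """Derive the packetization mode from already-parsed fmtp parameters."""
    slpp_size = fmtp.get("SLPPSize", 0)
    slpp_size_length = fmtp.get("SLPPSizeLength", 0)

    if slpp_size > 0 and slpp_size_length > 0:
        # Simultaneous presence of both parameters is illegal.
        raise ValueError("SLPPSize and SLPPSizeLength are mutually exclusive")
    if slpp_size > 0:
        # Constant payload size: no SLPPSize field is carried in the MSLH.
        return "multiple-sl-constant-size"
    if slpp_size_length > 0:
        # Variable payload size: each MSLH carries a SLPPSize field.
        return "multiple-sl-variable-size"
    return "single-sl"
```

A receiver could call such a function once per stream when the SDP
description is processed, before any RTP packet is parsed.
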
* byte boundary
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| SLPP (variable number of bytes)       |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| etc                                   |
| as many byte-wise concatenated SLPPs  |
| as SL Packets in this RTP packet      |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| SLPP (variable number of bytes)       |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
* byte boundary

Figure 8: SLPP section structure

3.8 Interleaving

SL Packets MAY be interleaved. Senders MAY perform interleaving.
Receivers MUST support interleaving.

When interleaving of SL packets is used it SHALL be implemented
using the SLPSeqNum field of MSLH.

The AUSequenceNumber field of the SL header MUST NOT be used for
interleaving since, firstly, it may collide with BIFS Carousel usage
and, secondly, it is not visible to non-MPEG-4 system receivers.

The conjunction of RTP sequence number and SLPSeqNum can produce a
quasi-unique identifier for each SL packet so that a receiver can
unambiguously reconstruct the original order even in case of
out-of-order packets, packet loss or duplication.

4. SL packetized stream reconstruction

The MPEG-4 over IP framework [9] requires that the way a receiver
can reconstruct a valid SL packetized stream be documented; this is
the purpose of this section.
Since this format directly transports SL packets this reconstruction
is trivial with the following rules:

- SLPacketHeader.packetSequenceNumber is restored from
  MSLH.SLPSeqNum for the first SL packet in the RTP packet (i=0):
     SLPacketHeader.packetSequenceNumber(0) = MSLH.SLPSeqNum(0)
  and for subsequent packets using (for i>=0):
     SLPacketHeader.packetSequenceNumber(i+1) =
        SLPacketHeader.packetSequenceNumber(i) +
        MSLH.SLPSeqNumDelta(i+1) + 1

- All time stamps (CTS, DTS, OCR), if present, are restored from the
  delta values.
- Time stamp flags (CTSFlag, DTSFlag) in MSLH are used to
  reconstruct respectively the compositionTimeStampFlag and
  decodingTimeStampFlag of SLPacketHeader.

  Specifically the reconstruction depends on the SDP parameters as
  follows:

  If SDP.CTSDeltaLength is absent or equals 0:
  The SL stream reconstruction rules are:
  . for the first (or only) SL packet:
    . if SLConfig.useTimeStamps == true, then:
      . SLPacketHeader.compositionTimeStampFlag = true
      . SLPacketHeader.compositionTimeStamp = RTP TimeStamp
    . if SLConfig.useTimeStamps == false, then:
      . SLPacketHeader.compositionTimeStampFlag is not defined
  . for the following SL packets:
    . SLPacketHeader.compositionTimeStampFlag = false

  If SDP.CTSDeltaLength is not zero:
  . SLPacketHeader.compositionTimeStampFlag = MSLH.CTSFlag
  . SLPacketHeader.compositionTimeStamp = RTP TimeStamp +
    MSLH.CTSDelta

- The other SL packet header fields SHALL remain as found in RSLH.

It is obvious that in the general case the reconstruction of the
original SL packetized stream requires SL-awareness.
However this payload format allows in all cases a receiver that does
not know about the SL syntax to reconstruct the semantics of SL for
the following very useful features:
- Packet order (decoding order)
- Access Unit boundaries (using the M bit)
- Access Unit fragments (i.e. SL packet boundaries using
  MSLH.SLPPSize)
- Composition Time Stamps (using the RTP Time Stamp and
  MSLH.CTSDelta)
- Decoding Time Stamps (using the RTP Time Stamp and MSLH.DTSDelta)
- Packet sequence number (using the RTP sequence number and
  MSLH.SLPSeqNum)

5. Multiplexing

Since a typical MPEG-4 session may involve a large number of
objects, possibly as many as a few hundred, transporting each ES
as an individual RTP session may not always be practical. Allocating
and controlling hundreds of destination addresses for each MPEG-4
session may pose insurmountable session administration problems.
The input/output processing overhead at the end-points will also be
extremely high. Additionally, low delay transmission of low
bitrate data streams, e.g. facial animation parameters, results in
extremely high header overheads.

To solve these problems, MPEG-4 data transport requires a
multiplexing scheme that allows selective bundling of several ESs.
This is beyond the scope of the payload format defined here.
MPEG-4's Flexmux multiplexing scheme may be used for this purpose by
defining an additional RTP payload format for "multiplexed MPEG-4
streams." Another approach may be to develop a generic RTP
multiplexing scheme usable for MPEG-4 data. The multiplexing scheme
reported in [8] may be a candidate for this approach.

For MPEG-4 applications, the multiplexing technique needs to address
the following requirements:

i. The ESs multiplexed in one stream can change frequently during a
session.
Consequently, the coding type, individual packet size and
temporal relationships between the multiplexed data units must be
handled dynamically.

ii. The multiplexing scheme should have a mechanism to determine the
ES identifier (ES_ID) for each of the multiplexed packets. ES_ID is
not a part of the SL header.

iii. In general, an SL packet does not contain information about its
size. The multiplexing scheme should be able to delineate the
multiplexed packets whose lengths may vary from a few bytes to close
to the path-MTU.

6. Security Considerations

RTP packets using the payload format defined in this specification
are subject to the security considerations discussed in the RTP
specification [5]. This implies that confidentiality of the media
streams is achieved by encryption. Because the data compression used
with this payload format is applied end-to-end, encryption may be
performed on the compressed data so there is no conflict between the
two operations. The packet processing complexity of this payload
type does not exhibit any significant non-uniformity on the receiver
side that could cause a denial-of-service threat.

However, it is possible to inject non-compliant MPEG streams (Audio,
Video, and Systems) to overload the receiver/decoder's buffers, which
might compromise the functionality of the receiver or even crash it.
This is especially true for end-to-end systems like MPEG where the
buffer models are precisely defined.

MPEG-4 Systems supports stream types including commands that are
executed on the terminal, like OD commands, BIFS commands, etc., and
programmatic content like MPEG-J (Java(TM) Byte Code) and
ECMAScript. It is possible to use one or more of the above in a
manner non-compliant to MPEG to crash the receiver or make it
temporarily unavailable.
Authentication mechanisms can be used to validate the sender and
the data to prevent security problems due to non-compliant malignant
MPEG-4 streams.

A security model is defined for MPEG-4 Systems streams carrying
MPEG-J access units, which comprise Java(TM) classes and objects.
MPEG-J defines a set of Java APIs and a secure execution model.
MPEG-J content can call this set of APIs and Java(TM) methods from a
set of Java packages supported in the receiver within the defined
security model. According to this security model, downloaded byte
code is forbidden to load libraries, define native methods, start
programs, read or write files, or read system properties.

Receivers can implement intelligent filters to validate the buffer
requirements or parametric (OD, BIFS, etc.) or programmatic (MPEG-J,
ECMAScript) commands in the streams. However, this can increase the
complexity significantly.

7. Types and names

The encoding name associated with this RTP payload format is:
- "mpeg4-sl".

The media type may be any of:
- "video"
- "audio"
- "application"

"video" SHOULD be used for MPEG-4 Video streams (ISO/IEC 14496-2) or
MPEG-4 Systems streams that convey information needed for an
audio/visual presentation.

"audio" SHOULD be used for MPEG-4 Audio streams (ISO/IEC 14496-3) or
MPEG-4 Systems streams that convey information needed for an
audio-only presentation.

"application" SHOULD be used for MPEG-4 Systems streams (ISO/IEC
14496-1) that serve other purposes than audio/visual presentation,
e.g. in some cases when MPEG-J streams are transmitted.

8. Additional SDP syntax

8.1 Mapping information

This format may require additional information about the mapping to
be made available to the receiver.
This is signaled to the receiver
using SDP (a=fmtp) parameters as in RFC 2327 [10, section 6].

The absence of any of these fields in SDP is equivalent to a field
set to the default value, which is always zero.

If no such parameter is present, the result is a default "basic"
configuration.

8.1.1 Indication of DTSDelta bit length

The following syntax should be used:

a=fmtp: DTSDeltaLength=

The value is the number of bits on which the DTSDelta field is
encoded in MSLH. The default value is zero and indicates the absence
of DTSFlag and DTSDelta in MSLH (the stream does not transport
decodingTimeStamps). A value larger than zero indicates that there
is a DTSFlag in each MSLH.

Since decodingTimeStamp, if present, must be encoded as a difference
to the RTP time stamp, the DTSDeltaLength parameter MUST be present
in SDP in order to transport decodingTimeStamps with this payload
format.

8.1.2 Indication of CTSDelta bit length

The following syntax should be used:

a=fmtp: CTSDeltaLength=

The value is the number of bits on which the CTSDelta field is
encoded in (non-first) MSLH. The default value is zero and indicates
the absence of the CTSFlag and CTSDelta fields in MSLH (the mode is
Single-SL or the stream does not transport compositionTimeStamps).

Since compositionTimeStamps, if present, must be encoded as a
difference to the RTP time stamp, the CTSDeltaLength parameter MUST
be present in SDP in order to transport compositionTimeStamps using
this payload format (in the Multiple-SL mode).

8.1.3 Indication of OCRDelta bit length

The following syntax should be used:

a=fmtp: OCRDeltaLength=

The value is the number of bits on which the OCRDelta field is
encoded in RSLH.
The default value is zero and indicates the absence
of OCR for this stream.

Since objectClockReference, if present, must be encoded as a
difference to the RTP time stamp, the OCRDeltaLength parameter MUST
be present in SDP in order to transport objectClockReferences with
this payload format.

8.1.4 Indication of SLPPSize bit length

The following syntax should be used:

a=fmtp: SLPPSizeLength=

The value is the number of bits on which the SLPPSize field of MSLH
is encoded. The default value is zero and indicates the Single-SL
mode (unless SLPPSize is present).

Simultaneous presence in SDP of this parameter and SLPPSize is
illegal.

Either the SLPPSizeLength or SLPPSize parameter MUST be present in
SDP in order to signal the Multiple-SL mode of this payload format.

8.1.5 Indication of constant SL packet size

The following syntax should be used:

a=fmtp: SLPPSize=

The value is the constant size in bytes of each SL Packet Payload
for this stream. The default value is zero and indicates variable SL
Packet Payload size (or the Single-SL mode if SLPPSizeLength is
absent).

Simultaneous presence in SDP of this parameter and SLPPSizeLength is
illegal.

Either the SLPPSizeLength or SLPPSize parameter MUST be present in
SDP in order to signal the Multiple-SL mode of this payload format.

8.1.6 Indication of SLPSeqNum bit length

The following syntax should be used:

a=fmtp: SLPSeqNumLength=

The value is the number of bits on which the SLPSeqNum is encoded
in the first MSLH. The default value is zero and indicates the
absence of SLPSeqNum and SLPSeqNumDelta for all MSLHs.
Since packetSequenceNumber, if present, must be mapped in MSLH, the
SLPSeqNumLength parameter MUST be present in SDP in order to
transport packetSequenceNumber with this payload format.

8.1.7 Indication of SLPSeqNumDelta bit length

The following syntax should be used:

a=fmtp: SLPSeqNumDeltaLength=

The value is the number of bits on which SLPSeqNumDelta is
encoded in any non-first MSLH. The default value is zero and
indicates that packetSequenceNumber MUST be incremented by one for
each SL packet in the RTP packet (see section 3.5).

Since packetSequenceNumber does not increment by 1 inside an RTP
packet when interleaving is used, the SLPSeqNumDeltaLength parameter
MUST be present in SDP when using interleaving with this payload
format.

8.1.8 Indication of RSLHSize bit length

The following syntax should be used:

a=fmtp: RSLHSizeLength=

The value is the number of bits that is used to encode the RSLHSize
field. The default value is zero and indicates the absence of the
whole RSLH section for all RTP packets of this stream.

Compatibility with RFC 3016 requires that the RSLH section be empty,
including the RSLHSize field. This is the reason why this length is
variable, with a default value indicating absence of the RSLHSize
field.

8.2 Optional configuration information

In the MPEG-4 framework the following information is carried using
the Object Descriptor. For compatibility with receivers that do not
implement the full MPEG-4 system specification this information MAY
also be indicated in SDP.

For transport of MPEG-4 audio and video without the use of MPEG-4
systems, as well as to support non-MPEG-4 system receivers, it is
possible to transport information on the profile and level of the
stream and on the decoder configuration.
8.2.1 Indication of SLConfigDescriptor

Senders MAY transmit the SLConfigDescriptor in SDP.

The following syntax should be used:

a=fmtp: SLConfigDescriptor=

The value is a base-64 encoding of the SLConfigDescriptor. This
SHALL be the original SLConfigDescriptor and it SHALL be the same as
the one transported by the OD framework.

8.2.2 Indications for MPEG-4 audio streams

8.2.2.1 Indication of profile level

Senders MAY transmit the profile and level indication in SDP.

The following syntax should be used:

a=fmtp: profile-level-id=

The value is a decimal representation of the MPEG-4 Audio Profile
Level indication value defined in ISO/IEC 14496-1. This parameter
indicates which MPEG-4 Audio tool subsets are applied to encode the
audio stream.

8.2.2.2 Indication of audio object type

Senders MAY transmit the audio object type indication in SDP.

The following syntax should be used:

a=fmtp: object-type=

The value is a decimal representation of the MPEG-4 Audio Object
Type value defined in ISO/IEC 14496-3. This parameter specifies the
tool used by the encoder. It can be used to limit the capability
within the specified "profile-level-id".

8.2.2.3 Indication of audio bitrate

Senders MAY transmit the audio bitrate in SDP.

The following syntax should be used:

a=fmtp: bitrate=

The value is a decimal representation of the audio bitrate in bits
per second for the audio bit stream.

8.2.2.4 Indication of audio decoder configuration

Senders MAY transmit the audio decoder configuration in SDP.

The following syntax should be used:

a=fmtp: config=

The value is a hexadecimal representation of an octet string that
expresses the audio payload configuration data "StreamMuxConfig", as
defined in ISO/IEC 14496-3.
Configuration data is mapped onto the
octet string on an MSB-first basis. The first bit of the
configuration data SHALL be located at the MSB of the first octet.

In the last octet, zero-padding bits, if necessary, shall follow the
configuration data.

8.2.3 Indications for MPEG-4 video streams

8.2.3.1 Indication of profile and level

Senders MAY transmit the video profile and level indication in SDP.

The following syntax should be used:

a=fmtp: profile-level-id=

The value is a decimal representation of the MPEG-4 Visual Profile
Level indication value (profile_and_level_indication) defined in
Table G-1 of ISO/IEC 14496-2. This parameter MAY be used in the
capability exchange or session setup procedure to indicate the
MPEG-4 Visual Profile and Level combination of which the MPEG-4
Visual codec is capable. If this parameter is not specified by the
procedure, its default value of 1 (Simple Profile/Level 1) is used.

8.2.3.2 Indication of video decoder configuration

Senders MAY transmit the video decoder configuration in SDP. This
parameter indicates the configuration of the corresponding MPEG-4
visual bitstream. It SHALL NOT be used to indicate the codec
capability in the capability exchange procedure.

The following syntax should be used:

a=fmtp: config=

The value is a hexadecimal representation of an octet string that
expresses the MPEG-4 Visual configuration information, as defined in
subclause 6.2.1 Start codes of ISO/IEC14496-2[2][4][9]. The
configuration information is mapped onto the octet string on an
MSB-first basis. The first bit of the configuration information
SHALL be located at the MSB of the first octet.
The configuration information
indicated by this parameter SHALL be the same as the configuration
information in the corresponding MPEG-4 Visual stream, except for
first_half_vbv_occupancy and latter_half_vbv_occupancy, if they
exist, which may vary in the repeated configuration information
inside an MPEG-4 Visual stream (see 6.2.1 Start codes of
ISO/IEC14496-2).

8.3 Concatenation of fmtp parameters

Multiple fmtp parameters SHOULD be expressed as a MIME media type
string, in the form of a semicolon-separated list of parameter=value
pairs.

8.4 SDP file example

The following is an example of SDP syntax for the description of
a session containing one MPEG-4 audio stream, one MPEG-4 video
stream and one MPEG-4 system stream, transported using this format.
Note that the video stream DTSDelta fields are encoded on 4 bits in
this example.

   o= ....
   i= ....
   c=IN IP4 123.234.71.112
   m=video 1034 RTP/AVP 97
   a=fmtp:97 DTSDeltaLength=4
   a=rtpmap:97 mpeg4-sl
   m=audio 810 RTP/AVP 98
   a=rtpmap:98 mpeg4-sl
   m=application 1234 RTP/AVP 99
   a=rtpmap:99 mpeg4-sl

9. Examples of usage of this payload format

This payload format has been designed to transport with flexibility
a very versatile packetization scheme (the MPEG-4 Synchronization
Layer); its complexity is therefore larger than the average for RTP
payload formats. For this reason this section describes a number of
key examples of how this payload format can be used.

9.1 MPEG-4 Video

Let us consider the case of a 30 frames per second MPEG-4 video
stream whose bit rate is high enough that Access Units have to be
split into several SL packets (typically above 300 kb/s).
Let us assume also that the video codec generates in that case Video
Packets suitable to fit in one SL packet, i.e. that the video codec
is MTU aware and the MTU is 1500 bytes. We assume furthermore that
this stream contains B frames and that decodingTimeStamps are
present.

9.1.1 SLConfigDescriptor

In this example the SLConfigDescriptor is:

   class SLConfigDescriptor extends BaseDescriptor : bit(8)
   tag=SLConfigDescrTag {
      bit(8) predefined;
      if (predefined==0) {
         bit(1) useAccessUnitStartFlag;       = 1
         bit(1) useAccessUnitEndFlag;         = 0
         bit(1) useRandomAccessPointFlag;     = 1
         bit(1) hasRandomAccessUnitsOnlyFlag; = 0
         bit(1) usePaddingFlag;               = 0
         bit(1) useTimeStampsFlag;            = 1
         bit(1) useIdleFlag;                  = 0
         bit(1) durationFlag;                 = 0
         bit(32) timeStampResolution;         = 30
         bit(32) OCRResolution;               = 0
         bit(8) timeStampLength;              = 32
         bit(8) OCRLength;                    = 0
         bit(8) AU_Length;                    = 0
         bit(8) instantBitrateLength;         = 0
         bit(4) degradationPriorityLength;    = 0
         bit(5) AU_seqNumLength;              = 0
         bit(5) packetSeqNumLength;           = 0
         bit(2) reserved=0b11;
      }
      if (durationFlag) {
         bit(32) timeScale;                   // NOT USED
         bit(16) accessUnitDuration;          // NOT USED
         bit(16) compositionUnitDuration;     // NOT USED
      }
      if (!useTimeStampsFlag) {
         bit(timeStampLength) startDecodingTimeStamp;    = 0
         bit(timeStampLength) startCompositionTimeStamp; = 0
      }
   }

The useRandomAccessPointFlag is set so that the
randomAccessPointFlag can indicate that the corresponding SL packet
contains a GOV and the first Video Packet of an Intra coded frame.

9.1.2 SL Packet Header structure

With this configuration we can derive the following SL packet
header structure:

   aligned(8) class SL_PacketHeader (SLConfigDescriptor SL) {
      if (SL.useAccessUnitStartFlag) {
         bit(1) accessUnitStartFlag;            // 1 bit
      }
      if (accessUnitStartFlag) {
         if (SL.useRandomAccessPointFlag) {
            bit(1) randomAccessPointFlag;       // 1 bit
         }
         if (SL.useTimeStampsFlag) {
            bit(1) decodingTimeStampFlag;       // 1 bit
            bit(1) compositionTimeStampFlag;    // 1 bit
         }
         if (decodingTimeStampFlag) {
            bit(SL.timeStampLength) decodingTimeStamp;
         }
         if (compositionTimeStampFlag) {
            bit(SL.timeStampLength) compositionTimeStamp;
         }
      }
   }

9.1.3 SDP mapping information

decodingTimeStamps are encoded on 32 bits, which is much more than
needed for a delta. Therefore the sender will use DTSDeltaLength in
the corresponding SDP to signal that only 6 bits are used for the
coding of relative DTS in the RTP packet.

The size of the RSLH cannot exceed 2 bits (accessUnitStartFlag and
randomAccessPointFlag), so RSLHSize is encoded on 2 bits, which is
signaled by RSLHSizeLength. The resulting concatenated fmtp line is:

a=fmtp: DTSDeltaLength=6;RSLHSizeLength=2

9.1.4 RTP packet structure

Two cases can occur; for packets that transport first fragments of
Access Units we have:

+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Field                              | size             |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| RTP header                         | -                |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| CTSFlag = 1                        | 1 bit            |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| DTSFlag = 1                        | 1 bit            |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| DTSDelta                           | 6 bits           |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| bits to byte alignment             | 0 bits           |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| RSLHSize = 2                       | 2 bits           |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| accessUnitStartFlag = 1            | 1 bit            |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| randomAccessPointFlag              | 1 bit            |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| bits to byte alignment             | 4 bits           |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| SL packet payload                  | N bytes          |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

For packets that transport non-first fragments of Access Units we
have:

+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Field                              | size             |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| RTP header                         | -                |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| CTSFlag = 0                        | 1 bit            |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| DTSFlag = 0                        | 1 bit            |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| bits to byte alignment             | 6 bits           |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| RSLHSize = 2                       | 2 bits           |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| accessUnitStartFlag = 0            | 1 bit            |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| randomAccessPointFlag              | 1 bit            |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| zero bits to byte alignment        | 4 bits           |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| SL packet payload                  | N bytes          |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

Note the compositionTimeStamp is never present since it would be
redundant with the RTP time stamp.
However the value of CTSFlag is 1
to indicate to the receiver the value of
compositionTimeStampFlag for the corresponding reconstructed SL
packet.

9.1.5 Overhead estimation

In this example we have an RTP overhead of 40 + 2 bytes for 1400
bytes of payload, i.e. 3 % overhead.

9.2 RFC 3016 compatible MPEG-4 Video

We assume exactly the same conditions as before except that the SL
is configured to produce RTP packets compatible with RFC 3016.

9.2.1 SLConfigDescriptor

In this example the SLConfigDescriptor is:

   class SLConfigDescriptor extends BaseDescriptor : bit(8)
   tag=SLConfigDescrTag {
      bit(8) predefined;
      if (predefined==0) {
         bit(1) useAccessUnitStartFlag;       = 0
         bit(1) useAccessUnitEndFlag;         = 1
         bit(1) useRandomAccessPointFlag;     = 0
         bit(1) hasRandomAccessUnitsOnlyFlag; = 0
         bit(1) usePaddingFlag;               = 0
         bit(1) useTimeStampsFlag;            = 0
         bit(1) useIdleFlag;                  = 0
         bit(1) durationFlag;                 = 0
         bit(32) timeStampResolution;         = 0
         bit(32) OCRResolution;               = 0
         bit(8) timeStampLength;              = 0
         bit(8) OCRLength;                    = 0
         bit(8) AU_Length;                    = 0
         bit(8) instantBitrateLength;         = 0
         bit(4) degradationPriorityLength;    = 0
         bit(5) AU_seqNumLength;              = 0
         bit(5) packetSeqNumLength;           = 0
         bit(2) reserved=0b11;
      }
      if (durationFlag) {
         bit(32) timeScale;                   // NOT USED
         bit(16) accessUnitDuration;          // NOT USED
         bit(16) compositionUnitDuration;     // NOT USED
      }
      if (!useTimeStampsFlag) {
         bit(timeStampLength) startDecodingTimeStamp;    = 0
         bit(timeStampLength) startCompositionTimeStamp; = 0
      }
   }

9.2.2 SL Packet Header structure

With this configuration we can derive the following SL packet
header structure:

   aligned(8) class SL_PacketHeader (SLConfigDescriptor SL) {
      if (SL.useAccessUnitEndFlag) {
       bit(1) accessUnitEndFlag;   // 1 bit
     }
   }

9.2.3 SDP mapping information

   This configuration is the default one; no SDP parameters are
   required.

9.2.4 RTP packet structure

   Note that accessUnitEndFlag is mapped to the RTP header M bit.

   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | Field                                   | size        |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | RTP header                              | -           |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | SL packet payload                       | 1400 bytes  |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   Note the compositionTimeStamp is never present since it would be
   redundant with the RTP time stamp. However, the value of CTSFlag is
   1 to indicate to the receiver the value of compositionTimeStampFlag
   for the corresponding reconstructed SL packet.

   In this example we have an RTP overhead of 40 bytes for 1400 bytes
   of payload, i.e. 3 % overhead.

9.3 Low delay MPEG-4 Audio

   This example is for a low delay service where a single SL packet is
   transported in each RTP packet.

9.3.1 SLConfigDescriptor

   Since CTS = DTS, signaling of MPEG-4 time stamps is not needed.

   We also assume here an audio Object Type for which all Access Units
   are Random Access Points, which is signaled using the
   hasRandomAccessUnitsOnlyFlag in the SLConfigDescriptor.

   We furthermore assume a mode where the Access Unit size is constant
   at 5 bytes (which is signaled with AU_Length).

   In this example the SLConfigDescriptor is:
   class SLConfigDescriptor extends BaseDescriptor : bit(8)
   tag=SLConfigDescrTag {
     bit(8) predefined;
     if (predefined==0) {
       bit(1) useAccessUnitStartFlag;        = 0
       bit(1) useAccessUnitEndFlag;          = 0
       bit(1) useRandomAccessPointFlag;      = 0
       bit(1) hasRandomAccessUnitsOnlyFlag;  = 1
       bit(1) usePaddingFlag;                = 0
       bit(1) useTimeStampsFlag;             = 0
       bit(1) useIdleFlag;                   = 0
       bit(1) durationFlag;                  = 0
       bit(32) timeStampResolution;          = 0
       bit(32) OCRResolution;                = 0
       bit(8) timeStampLength;               = 0
       bit(8) OCRLength;                     = 0
       bit(8) AU_Length;                     = 5
       bit(8) instantBitrateLength;          = 0
       bit(4) degradationPriorityLength;     = 0
       bit(5) AU_seqNumLength;               = 0
       bit(5) packetSeqNumLength;            = 0
       bit(2) reserved=0b11;
     }
     if (durationFlag) {
       bit(32) timeScale;                    // NOT USED
       bit(16) accessUnitDuration;           // NOT USED
       bit(16) compositionUnitDuration;      // NOT USED
     }
     if (!useTimeStampsFlag) {
       bit(timeStampLength) startDecodingTimeStamp;     = 0
       bit(timeStampLength) startCompositionTimeStamp;  = 0
     }
   }

9.3.2 SL packet header

   With this configuration the SL header is empty.

9.3.3 SDP mapping information

   No SDP parameters are required.

9.3.4 RTP packet structure

   Note that the RTP header M bit should always be set to 1.

   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | Field                                   | size        |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | RTP header                              | -           |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | SL packet payload                       | 5 bytes     |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

9.3.5 Overhead estimation

   The overhead is extremely large, i.e. around 800 % (40 bytes of
   headers for 5 bytes of payload).
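
   The overhead figures quoted in 9.1.5 and 9.3.5 can be reproduced
   with a few lines of arithmetic. The sketch below is illustrative
   only: the function name overhead_percent is hypothetical, and the
   split of the 40 header bytes into 20 (IPv4) + 8 (UDP) + 12 (RTP)
   is an assumption, since the examples only state the 40-byte total.

```python
# Assumed breakdown of the 40-byte figure used in the examples:
# 20 bytes IPv4 + 8 bytes UDP + 12 bytes RTP fixed header.
IP_UDP_RTP_HEADER_BYTES = 20 + 8 + 12


def overhead_percent(header_bytes: int, payload_bytes: int) -> float:
    """Return header overhead as a percentage of the media payload."""
    if payload_bytes <= 0:
        raise ValueError("payload_bytes must be positive")
    return 100.0 * header_bytes / payload_bytes


# 9.3: one 5-byte Access Unit per RTP packet, empty SL header.
low_delay_audio = overhead_percent(IP_UDP_RTP_HEADER_BYTES, 5)

# 9.1: 1400 bytes of video payload plus 2 bytes of payload header.
video_with_rslh = overhead_percent(IP_UDP_RTP_HEADER_BYTES + 2, 1400)

print(f"9.3 low delay audio: {low_delay_audio:.0f} %")
print(f"9.1 video:           {video_with_rslh:.0f} %")
```

   The same function applies to any of the other examples once the
   per-packet header and payload sizes are known.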
9.4 Media delivery MPEG-4 Audio

   This example is for a media delivery service where delay is not an
   issue but efficiency is. In this case several SL Packets are
   transported in each RTP packet.

9.4.1 SLConfigDescriptor

   The SLConfigDescriptor is the same as in 9.3.1.

9.4.2 SL packet header

   With this configuration the SL packet header is empty.

9.4.3 SDP mapping information

   The absence of RSLHSizeLength in SDP indicates that the RSLH section
   is empty.

   The size of SL Packets (which are all complete Access Units in this
   case) is constant and is indicated in SDP with:

   a=fmtp: SLPPSize=5

   This also indicates to the receiver that the Multiple-SL mode will
   be used, i.e. that a 2-byte field will give the size of the MSLH
   section. In this case, however, this field always contains zero.

9.4.4 RTP packet structure

   Note that the RTP header M bit should always be set to 1.

   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | Field                                   | size        |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | RTP header                              | -           |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | MSLH section size in bits = 0           | 2 bytes     |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | SL packet payload                       | 5 bytes     |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | SL packet payload                       | 5 bytes     |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | etc, until MTU is reached                             |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | SL packet payload                       | 5 bytes     |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

9.4.5 Overhead estimation

   The overhead is 3 %, i.e. minimal.
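
   The packet layout of 9.4.4 can be sketched as follows. This is a
   minimal illustration under stated assumptions, not a normative
   implementation: the helper names (build_payload, split_payload,
   max_aus_per_packet) are hypothetical, the 2-byte MSLH section size
   field is assumed to be in network byte order, and an MTU of 1500
   bytes with 40 bytes of IP/UDP/RTP headers is assumed.

```python
import struct

AU_SIZE = 5          # constant SL packet size, signaled by SLPPSize=5
HEADER_BYTES = 40    # IP/UDP/RTP overhead figure used in the examples
MSLH_SIZE_FIELD = 2  # 2-byte field giving the MSLH section size in bits


def build_payload(access_units):
    """Place constant-size AUs behind a zero MSLH section size field."""
    for au in access_units:
        if len(au) != AU_SIZE:
            raise ValueError("every Access Unit must be AU_SIZE bytes")
    # Network byte order is an assumption of this sketch.
    return struct.pack("!H", 0) + b"".join(access_units)


def split_payload(payload):
    """Recover the AUs; the receiver knows AU_SIZE from SDP."""
    (mslh_bits,) = struct.unpack("!H", payload[:MSLH_SIZE_FIELD])
    if mslh_bits != 0:
        raise ValueError("this sketch only handles an empty MSLH section")
    body = payload[MSLH_SIZE_FIELD:]
    if len(body) % AU_SIZE != 0:
        raise ValueError("payload is not a whole number of Access Units")
    return [body[i:i + AU_SIZE] for i in range(0, len(body), AU_SIZE)]


def max_aus_per_packet(mtu=1500):
    """Number of AUs that fit in one packet for the assumed MTU."""
    return (mtu - HEADER_BYTES - MSLH_SIZE_FIELD) // AU_SIZE


aus = [bytes([n]) * AU_SIZE for n in range(3)]
payload = build_payload(aus)
assert split_payload(payload) == aus

count = max_aus_per_packet()
overhead = 100.0 * (HEADER_BYTES + MSLH_SIZE_FIELD) / (count * AU_SIZE)
print(f"{count} AUs per packet, overhead {overhead:.1f} %")
```

   Under these assumptions the per-packet overhead comes out close to
   the 3 % quoted in 9.4.5, because the fixed headers are amortized
   over a few hundred Access Units instead of one.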
9.5 A more complex case: AAC with interleaving

   Let us consider AAC at around 130 kb/s, where each Access Unit is
   split into 4 SL packets of at most 90 bytes, corresponding to Error
   Sensitivity Categories (ESC), and for which interleaving is very
   useful in terms of error resilience. We will therefore use an
   interleaving scheme where 15 SL Packets from 15 consecutive Access
   Units are interleaved per RTP packet to match an MTU of 1500 bytes.

   The interleaving sequence is 4 RTP packets and 350 ms long, which is
   too long for conferencing but perfectly acceptable for Internet
   radio.

   Since the sequence contains 60 SL packets, the sequence number can
   be encoded on 6 bits. But 2 bits are actually enough if the sender
   always resets the SL packet sequence number to zero at the start of
   each sequence, since only the first MSLH in each of the 4 RTP
   packets in the sequence carries an absolute sequence number value
   (0, 1, 2, 3).

   2 bits are also enough for SLPSeqNumDelta, which is constant and
   equal to 3 (since +1 is automatically added).

   Note that the 4th RTP packet in each sequence has its M bit set to 1
   since it contains 15 SL packets transporting the end of 15 different
   Access Units.

   With this scheme a sender (for example upon reception of RTCP
   reports indicating high loss rates) can easily choose to duplicate,
   for each interleaving sequence, the first RTP packet, which contains
   the most useful data in terms of ESC.

   We also want many SL features (OCR, AU boundary flags, as detailed
   below).

   One feature demonstrated by this example is the degradation
   priority. We assume degradation priority can take 4 different
   values, one for each SL packet of an Access Unit, and is encoded on
   2 bits.
   This interleaving scheme ensures that only SL packets of identical
   degradation priority are grouped in the same RTP packet (see 3.6.3)
   and that only the first RSLH of each RTP packet transports the
   degradation priority.

   We also assume that the server inserts an OCR in the last SL packet
   of each RTP packet.

9.5.1 SLConfigDescriptor

   In this example the SLConfigDescriptor is:

   class SLConfigDescriptor extends BaseDescriptor : bit(8)
   tag=SLConfigDescrTag {
     bit(8) predefined;
     if (predefined==0) {
       bit(1) useAccessUnitStartFlag;        = 1
       bit(1) useAccessUnitEndFlag;          = 1
       bit(1) useRandomAccessPointFlag;      = 0
       bit(1) hasRandomAccessUnitsOnlyFlag;  = 1
       bit(1) usePaddingFlag;                = 0
       bit(1) useTimeStampsFlag;             = 0
       bit(1) useIdleFlag;                   = 0
       bit(1) durationFlag;                  = 0
       bit(32) timeStampResolution;          = 0
       bit(32) OCRResolution;                = 30
       bit(8) timeStampLength;               = 0
       bit(8) OCRLength;                     = 32
       bit(8) AU_Length;                     = 0
       bit(8) instantBitrateLength;          = 0
       bit(4) degradationPriorityLength;     = 2
       bit(5) AU_seqNumLength;               = 0
       bit(5) packetSeqNumLength;            = 6
       bit(2) reserved=0b11;
     }
     if (durationFlag) {
       bit(32) timeScale;                    // NOT USED
       bit(16) accessUnitDuration;           // NOT USED
       bit(16) compositionUnitDuration;      // NOT USED
     }
     if (!useTimeStampsFlag) {
       bit(timeStampLength) startDecodingTimeStamp;     = 0
       bit(timeStampLength) startCompositionTimeStamp;  = 0
     }
   }

9.5.2 SL Packet Header structure

   With this configuration we can derive the following SL packet
   header structure:

   aligned(8) class SL_PacketHeader (SLConfigDescriptor SL) {
     bit(1) accessUnitStartFlag;
     bit(1) accessUnitEndFlag;
     bit(1) OCRflag;
     bit(SL.packetSeqNumLength) packetSequenceNumber;
     bit(1) DegPrioflag;
     if (DegPrioflag) {
       bit(SL.degradationPriorityLength) degradationPriority;}
     if (OCRflag) {
       bit(SL.OCRLength) objectClockReference;}
     }
   }

9.5.3 SDP mapping information

   The RSLHSize cannot exceed 2 bits, which is encoded on 2 bits and
   signaled by RSLHSizeLength.

   The resulting concatenated fmtp line is:

   a=fmtp:
   SLPPSizeLength=6;RSLHSizeLength=2;SLPSeqNumLength=2;
   SLPSeqNumDeltaLength=2;OCRDeltaLength=16

9.5.4 RTP packet structure

   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | Field                                   | size        |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | RTP header                              | -           |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
                         MSLH SECTION
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | MSLH section size in bits = 135         | 2 bytes     |
   +++++++++++++++++++++++++++++++++++++++++++++++++++++++++
   | SLPPSize                                | 7 bits      |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | SLPSeqNum = 0 or 1 or 2 or 3            | 2 bits      |
   +++++++++++++++++++++++++++++++++++++++++++++++++++++++++
   | SLPPSize                                | 7 bits      |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | SLPSeqDeltaNum = 3                      | 2 bits      |
   +++++++++++++++++++++++++++++++++++++++++++++++++++++++++
   | etc + 12 times 9 bits                                 |
   +++++++++++++++++++++++++++++++++++++++++++++++++++++++++
   | SLPPSize                                | 7 bits      |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | SLPSeqDeltaNum = 3                      | 2 bits      |
   +++++++++++++++++++++++++++++++++++++++++++++++++++++++++
   | bits to byte alignment                  | 7 bits      |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
                         RSLH SECTION
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | RSLHSize                                | 6 bits      |
   +++++++++++++++++++++++++++++++++++++++++++++++++++++++++
   | accessUnitStartFlag                     | 1 bit       |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | accessUnitEndFlag                       | 1 bit       |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | OCRFlag = 0                             | 1 bit       |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | DegPrioflag = 1                         | 1 bit       |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | degradationPriority                     | 2 bits      |
   +++++++++++++++++++++++++++++++++++++++++++++++++++++++++
   | accessUnitStartFlag                     | 1 bit       |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | accessUnitEndFlag                       | 1 bit       |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | OCRFlag = 0                             | 1 bit       |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | DegPrioflag = 0                         | 1 bit       |
   +++++++++++++++++++++++++++++++++++++++++++++++++++++++++
   | etc + 12 times 4 bits                                 |
   +++++++++++++++++++++++++++++++++++++++++++++++++++++++++
   | accessUnitStartFlag                     | 1 bit       |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | accessUnitEndFlag                       | 1 bit       |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | OCRFlag = 1                             | 1 bit       |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | OCRDelta                                | 16 bits     |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | DegPrioflag = 0                         | 1 bit       |
   +++++++++++++++++++++++++++++++++++++++++++++++++++++++++
   | bits to byte alignment                  | 4 bits      |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
                         SLPP SECTION
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | SL packet payload                       |max 90 bytes |
   +++++++++++++++++++++++++++++++++++++++++++++++++++++++++
   | etc + 13 SL packets                                   |
   +++++++++++++++++++++++++++++++++++++++++++++++++++++++++
   | SL packet payload                       |max 90 bytes |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

9.5.5 Overhead estimation

   The MSLH section is 19 bytes and the RSLH section is 10 bytes; in
   this example we therefore have an RTP overhead of 40 + 29 bytes for
   1350 bytes (max) of payload, i.e. around 5 % overhead.

10. References

   [1] ISO/IEC 14496-1:2000, MPEG-4 Systems, October 2000.

   [2] ISO/IEC 14496-2:1999/Amd.1:2000(E), MPEG-4 Visual, January 2000.

   [3] ISO/IEC 14496-3:1999/FDAM 1:2000, MPEG-4 Audio, January 2000.

   [4] ISO/IEC 14496-6 FDIS, Delivery Multimedia Integration Framework,
   November 1998.

   [5] Schulzrinne, Casner, Frederick, Jacobson, RTP: A Transport
   Protocol for Real Time Applications, RFC 1889, Internet Engineering
   Task Force, January 1996.

   [6] S. Bradner, Key words for use in RFCs to Indicate Requirement
   Levels, RFC 2119, March 1997.

   [7] Y. Kikuchi, T. Nomura, S. Fukunaga, Y. Matsui, H. Kimata, RTP
   payload format for MPEG-4 Audio/Visual streams, RFC 3016.

   [8] B. Thompson, T. Koren, D. Wing, Tunneling multiplexed Compressed
   RTP ("TCRTP"), work in progress, draft-ietf-avt-tcrtp-01.txt, July
   2000.

   [9] D. Singer, Y. Lim, A Framework for the delivery of MPEG-4 over
   IP-based Protocols, work in progress,
   draft-singer-mpeg4-ip-01.txt, October 2000.

   [10] Handley, Jacobson, SDP: Session Description Protocol, RFC 2327,
   Internet Engineering Task Force, April 1998.

11. Authors' Addresses

   Olivier Avaro
   France Telecom
   35 A Schützenhüttenweg
   60598 Frankfurt am Main
   Deutschland
   e-mail: olivier.avaro@francetelecom.fr

   Andrea Basso
   AT&T Labs Research
   200 Laurel Avenue
   Middletown, NJ 07748
   USA
   e-mail: basso@research.att.com

   Stephen L. Casner
   Packet Design, Inc.
   66 Willow Place
   Menlo Park, CA 94025
   USA
   e-mail: casner@acm.org

   M. Reha Civanlar
   AT&T Labs - Research
   100 Schultz Drive
   Red Bank, NJ 07701
   USA
   e-mail: civanlar@research.att.com

   Philippe Gentric
   Philips Digital Networks
   22 Avenue Descartes
   94453 Limeil-Brevannes CEDEX
   France
   e-mail: philippe.gentric@philips.com

   Carsten Herpel
   THOMSON multimedia
   Karl-Wiechert-Allee 74
   30625 Hannover
   Germany
   e-mail: herpelc@thmulti.com

   Zvi Lifshitz
   Optibase Ltd.
   7 Shenkar St.
   Herzliya 46120
   Israel
   e-mail: zvil@optibase.com

   Young-kwon Lim
   mp4cast (MPEG-4 Internet Broadcasting Solution Consortium)
   1001-1 Daechi-Dong Gangnam-Gu
   Seoul, 305-333,
   Korea
   e-mail: young@techway.co.kr

   Colin Perkins
   USC Information Sciences Institute
   4350 N. Fairfax Drive #620
   Arlington, VA 22203
   USA
   e-mail: csp@isi.edu

   Jan van der Meer
   Philips Digital Networks
   Cederlaan 4
   5600 JB Eindhoven
   Netherlands
   e-mail: jan.vandermeer@philips.com