Internet Engineering Task Force                   Avaro-France Telecom
Internet Draft                                    Basso-AT&T
                                                  Casner-Packet Design
                                                  Civanlar-AT&T
                                                  Gentric-Philips
                                                  Herpel-Thomson
                                                  Lifshitz-Optibase
                                                  Lim-mp4cast
                                                  Perkins-ISI
                                                  van der Meer-Philips
Document: draft-gentric-avt-mpeg4-multisl-00.txt  December 2000
                                                  Expires June 2000

              RTP Payload Format for MPEG-4 Streams

Status of this Memo

   This document is an Internet-Draft and is in full conformance with
   all provisions of Section 10 of RFC2026.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups. Note that
   other groups may also distribute working documents as
   Internet-Drafts. Internet-Drafts are draft documents valid for a
   maximum of six months and may be updated, replaced, or obsoleted by
   other documents at any time. It is inappropriate to use
   Internet-Drafts as reference material or to cite them other than as
   "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt
   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html.

Abstract

   This document describes a payload format for transporting MPEG-4
   encoded data using RTP. MPEG-4 is a recent standard from ISO/IEC for
   the coding of natural and synthetic audio-visual data. Several
   services provided by RTP are beneficial for MPEG-4 encoded data
   transport over the Internet. Additionally, the use of RTP makes it
   possible to synchronize MPEG-4 data with other real-time data types.

   This specification is a product of the Audio/Video Transport working
   group within the Internet Engineering Task Force and ISO/IEC MPEG-4
   ad hoc group on MPEG-4 over Internet. Comments are solicited and
   should be addressed to the working group's mailing list at
   rem-conf@es.net and/or the authors.

1. Introduction

   MPEG-4 is a recent standard from ISO/IEC for the coding of natural
   and synthetic audio-visual data in the form of audiovisual objects
   that are arranged into an audiovisual scene by means of a scene
   description [1][2][3][4]. This draft specifies an RTP [5] payload
   format for transporting MPEG-4 encoded data streams.

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in
   this document are to be interpreted as described in RFC 2119 [6].

   The benefits of using RTP for MPEG-4 data stream transport include:

   i. Ability to synchronize MPEG-4 streams with other RTP payloads

   ii. Monitoring MPEG-4 delivery performance through RTCP

   iii. Combining MPEG-4 and other real-time data streams received from
   multiple end-systems into a set of consolidated streams through RTP
   mixers

   iv. Converting data types, etc. through the use of RTP translators.

1.1 Overview of MPEG-4 End-System Architecture

   Fig. 1 below shows the general layered architecture of MPEG-4
   terminals. The Compression Layer processes individual audio-visual
   media streams. The MPEG-4 compression schemes are defined in the
   ISO/IEC specifications 14496-2 [2] and 14496-3 [3]. The compression
   schemes in MPEG-4 achieve efficient encoding over a bandwidth
   ranging from several Kbps to many Mbps. The audio-visual content
   compressed by this layer is organized into Elementary Streams (ESs).
   The MPEG-4 standard specifies MPEG-4 compliant streams. Within the
   constraint of this compliance the compression layer is unaware of a
   specific delivery technology, but it can be made to react to the
   characteristics of a particular delivery layer such as the path-MTU
   or loss characteristics.
   Also, some compressors can be designed to
   be delivery specific for implementation efficiency. In such cases
   the compressor may work in a non-optimal fashion with delivery
   technologies that are different from the one it is specifically
   designed to operate with.

   The hierarchical relations, location and properties of ESs in a
   presentation are described by a dynamic set of Object Descriptors
   (ODs). Each OD groups one or more ES Descriptors referring to a
   single content item (audio-visual object). Hence, multiple
   alternative or hierarchical representations of each content item are
   possible.

   ODs are themselves conveyed through one or more ESs. A complete set
   of ODs can be seen as an MPEG-4 resource or session description at a
   stream level. The resource description may itself be hierarchical,
   i.e. an ES conveying an OD may describe other ESs conveying other
   ODs.

   The session description is accompanied by a dynamic scene
   description, Binary Format for Scene (BIFS), again conveyed through
   one or more ESs. At this level, content is identified in terms of
   audio-visual objects. The spatio-temporal location of each object is
   defined by BIFS. The audio-visual content of those objects that are
   synthetic and static is also described by BIFS. Natural and
   animated synthetic objects may refer to an OD that points to one or
   more ESs that carry the coded representation of the object or its
   animation data.

   By conveying the session (or resource) description as well as the
   scene (or content composition) description through their own ESs, it
   is made possible to change portions of the content composition and
   the number and properties of media streams that carry the
   audio-visual content separately and dynamically at well known
   instants in time.

   One or more initial Scene Description streams and the corresponding
   OD stream have to be pointed to by an initial object descriptor
   (IOD). The IOD needs to be made available to the receivers through
   some out-of-band means that are not defined in this document.

   A homogeneous encapsulation of ESs carrying media or control (ODs,
   BIFS) data is defined by the Sync Layer (SL) that primarily provides
   the synchronization between streams. The Compression Layer organizes
   the ESs in Access Units (AU), the smallest elements that can be
   attributed individual timestamps. Integer or fractional AUs are then
   encapsulated in SL packets. All consecutive data from one stream is
   called an SL-packetized stream at this layer. The interface between
   the compression layer and the SL is called the Elementary Stream
   Interface (ESI). The ESI is informative.

   The Delivery Layer in MPEG-4 consists of the Delivery Multimedia
   Integration Framework defined in ISO/IEC 14496-6 [4]. This layer is
   media unaware but delivery technology aware. It provides transparent
   access to and delivery of content irrespective of the technologies
   used. The interface between the SL and DMIF is called the DMIF
   Application Interface (DAI). It offers content location independent
   procedures for establishing MPEG-4 sessions and access to transport
   channels. The specification of this payload format is considered as
   a part of the MPEG-4 Delivery Layer.

media aware        +-----------------------------------------+
delivery unaware   |            COMPRESSION LAYER            |
14496-2 Visual     |streams from as low as Kbps to multi-Mbps|
14496-3 Audio      +-----------------------------------------+

                                                          Elementary
                                                          Stream
==========================================================Interface
                                                          (ESI)
                   +-------------------------------------------+
media and          |                SYNC LAYER                 |
delivery unaware   | manages elementary streams, their synch-  |
14496-1 Systems    | ronization and hierarchical relations     |
                   +-------------------------------------------+
                                                           DMIF
                                                           Application
===========================================================Interface
                                                           (DAI)
                   +-------------------------------------------+
delivery aware     |              DELIVERY LAYER               |
media unaware      |provides transparent access to and delivery|
14496-6 DMIF       | of content irrespective of delivery       |
                   |                technology                 |
                   +-------------------------------------------+

          Figure 1: General MPEG-4 terminal architecture

1.2 MPEG-4 Elementary Stream Data Packetization

   The ESs from the encoders are fed into the SL with indications of AU
   boundaries, random access points, desired composition time and the
   current time.

   The Sync Layer fragments the ESs into SL packets, each containing a
   header that encodes information conveyed through the ESI. If the AU
   is larger than an SL packet, subsequent packets containing remaining
   parts of the AU are generated with subset headers until the complete
   AU is packetized.

   The syntax of the Sync Layer is not fixed and can be adapted to the
   needs of the stream to be transported. This includes the possibility
   to select the presence or absence of individual syntax elements as
   well as configuration of their length in bits. The configuration for
   each individual stream is conveyed in a SLConfigDescriptor, which is
   an integral part of the ES Descriptor for this stream.

2. Analysis of the alternatives for carrying MPEG-4 over IP

2.1 MPEG-4 over UDP

   Considering that the MPEG-4 SL defines several transport related
   functions such as timing, sequence numbering, etc., this seems to be
   the most straightforward alternative for carrying MPEG-4 data over
   IP. One group of problems with this approach, however, stems from
   the monolithic architecture of MPEG-4. No other multimedia data
   stream (including those carried with RTP) can be synchronized with
   MPEG-4 data carried directly over UDP. Furthermore, the dynamic
   scene and session control concepts can't be extended to non-MPEG-4
   data.

   Even if the coordination with non-MPEG-4 data is overlooked,
   carrying MPEG-4 data over UDP has the following additional
   shortcomings:

   i. Mechanisms need to be defined to protect sensitive parts of
   MPEG-4 data. Some of these (like FEC) are already defined for RTP.

   ii. There is no defined technique for synchronizing MPEG-4 streams
   from different servers in the variable delay environment of the
   Internet.

   iii. MPEG-4 streams originating from two servers may collide (their
   sources may become unresolvable at the destination) in a multicast
   session.

   iv. An MPEG-4 back channel needs to be defined for quality feedback
   similar to that provided by RTCP.

   v. RTP mixers and translators can't be used.

   The back-channel problem may be alleviated by developing a reception
   reporting protocol like RTCP. Such an effort may benefit from RTCP
   design knowledge, but needs extensions.

2.2 RTP header followed by full MPEG-4 headers

   This alternative may be implemented by using the send time or the
   composition time coming from the reference clock as the RTP
   timestamp.
   This way no new feedback protocol needs to be defined for MPEG-4's
   back channel, but RTCP may not be sufficient for MPEG-4's feedback
   requirements that are still in the definition stage. Additionally,
   due to the duplication of header information, such as the sequence
   numbers and time stamps, this alternative causes unnecessary
   increases in the overhead. Scene description or dynamic session
   control can't be extended to non-MPEG-4 streams either.

2.3 MPEG-4 ESs over RTP with individual payload types

   This is the most suitable alternative for coordination with the
   existing Internet multimedia transport techniques and does not use
   MPEG-4 systems at all. Complete implementation of it requires
   definition of potentially many payload types, as already proposed
   for audio and video payloads [7], and might lead to constructing new
   session and scene description mechanisms. Considering the size of
   the work involved which essentially reconstructs MPEG-4 systems,
   this may only be a long term alternative if no other solution can be
   found.

2.4 RTP header followed by a reduced SL header

   The inefficiency of the approach described in 2.2 can be fixed by
   using a reduced SL header that does not carry duplicate information
   following the RTP header.

2.5 Recommendation

   Based on the above analysis, the best compromise is to map the
   MPEG-4 SL packets onto RTP packets, such that the common pieces of
   the headers reside in the RTP header that is followed by an optional
   reduced SL header providing the MPEG-4 specific information. The
   details of this payload format are described in the next section.

3. Payload Format

   The RTP Payload corresponds to an integer number of SL packets.
   The
   SLPacket headers are transformed into reduced SL packet headers,
   with some fields replaced by those in the RTP header and others
   transported in reduced form. The payload is unchanged.

   When generating an SL packetized stream specifically for this format
   all other fields in the SL packet headers that the RTP header does
   not duplicate (including the decodingTimeStamp) are OPTIONAL.

   The packet structure consists of a concatenated header section where
   all reduced SL packet headers are bit-wise concatenated. If the
   resulting concatenated header section does not occupy an integer
   number of bytes, zero padding bits MUST be inserted at the end in
   order to achieve byte-alignment.

   After the concatenated header section is the concatenated payload
   section where all SLPacket payloads are concatenated. SLPacket
   payloads are byte aligned.

   RTP Packets SHOULD be sent in the decoding (MPEG-4
   decodingTimeStamp) order.

   SL packets inside RTP packets MUST be in the decoding (MPEG-4
   decodingTimeStamp) order.

   The size of the SL packet(s) SHOULD be adjusted such that the
   resulting RTP packet is not larger than the path-MTU. To handle
   larger packets, this payload format relies on lower layers for
   fragmentation, which may not be desirable.

   It is assumed that the MPEG-4 SLConfigDescriptor is transported "out
   of band". This is typically done via an ObjectDescriptorStream using
   the MPEG-4 Object Description framework.

   However since some knowledge of the SLConfigDescriptor is required
   by an RTP receiver in order to parse MPEG-4 System specific elements
   in the RTP payload defined in this document, the SLConfigDescriptor
   MAY be transported in the SDP associated with such a stream using
   the a=fmtp syntax (see below).
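
   The concatenated header section and its byte-alignment padding can
   be illustrated with a short sketch. It is not part of the payload
   format: the function names and the representation of each reduced
   SL packet header as a (value, bit length) pair are assumptions made
   only for illustration.

```python
def concat_header_section(headers):
    """Bit-wise concatenate reduced SL packet headers, then zero-pad.

    `headers` is a list of (value, bit_length) pairs, one per reduced
    SL packet header. This representation is hypothetical; the draft
    does not define an in-memory form.
    Returns (header_section_bytes, number_of_padding_bits).
    """
    acc = 0      # all header bits so far, as one big integer
    nbits = 0    # number of header bits accumulated
    for value, length in headers:
        if value < 0 or value >> length:
            raise ValueError("header value does not fit in its bit length")
        acc = (acc << length) | value
        nbits += length
    pad = (-nbits) % 8   # zero bits needed to reach a byte boundary
    acc <<= pad          # padding bits are appended at the end, coded as zero
    return acc.to_bytes((nbits + pad) // 8, "big"), pad


def build_rtp_payload(headers, payloads):
    """Header section first, then the byte-aligned SL packet payloads."""
    header_section, _ = concat_header_section(headers)
    return header_section + b"".join(payloads)
```

   For example, two headers of 3 and 2 bits occupy 5 bits, so 3 zero
   padding bits are appended before the first SL packet payload starts.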

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |V=2|P|X|  CC   |M|     PT      |       sequence number         |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                           timestamp                           |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |           synchronization source (SSRC) identifier            |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   :            contributing source (CSRC) identifiers             :
   +=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+
   |Reduced SL Packet Header (variable # of bits)  | Reduced SL    |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |Packet Header (variable # of bits)   | padding bits to byte    |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | alignment |                                                   |
   +-+-+-+-+-+-+                                                   |
   |                                                               |
   |               SL Packet Payload (byte aligned)                |
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                                                               |
   |                                                               |
   |                                                               |
   |               SL Packet Payload (byte aligned)                |
   |                                                               |
   |                               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                               :...OPTIONAL RTP padding        |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

                Figure 2: An RTP packet for MPEG-4

3.1 RTP Header Fields Usage

   Payload Type (PT): The assignment of an RTP payload type for this
   new packet format is outside the scope of this document, and will
   not be specified here. It is expected that the RTP profile for a
   particular class of applications will assign a payload type for this
   encoding, or if that is not done then a payload type in the dynamic
   range shall be chosen.

   Marker (M) bit: Set to one to mark the last fragment (or only
   fragment) of an AU. Also set to one for RTP packets that

   Extension (X) bit: Defined by the RTP profile used.

   Sequence Number: The RTP sequence number should be generated by the
   sender with a constant random offset and does not have to be
   correlated to any (optional) MPEG-4 SL sequence numbers.

   Timestamp: Set to the value in the compositionTimeStamp field of the
   first SL packet, if present. If compositionTimeStamp is shorter than
   32 bits, the MSBs of the timestamp MUST be set to zero.

   Although it is available from the SL configuration data, the
   resolution of the timestamp may need to be conveyed explicitly
   through some out-of-band means to be used by network elements which
   are not MPEG-4 aware.

   If compositionTimeStamp is longer than 32 bits, this payload
   format cannot be used.

   In all cases, the sender SHALL make sure that RTP time stamps
   are identical only for RTP packets transporting fragments of the
   same Access Unit.

   If compositionTimeStamp is not present in the current SL packet but
   was present in a previous SL packet, the current SL packet carries
   a further fragment of the same Access Unit; therefore the same
   timestamp value MUST be used as the RTP timestamp.

   According to RFC1889 [5, Section 5.1] timestamps are recommended to
   start at a random value for security reasons. In that case,
   however, a receiver is generally unable to reconstruct the original
   MPEG-4 Time Stamps (CTS, DTS, OCR), which can be of use for
   applications where streams from multiple sources are to be
   synchronized. Therefore the usage of such a random offset SHOULD be
   avoided.

   SSRC: set as described in RFC1889 [5]. A mapping between the ES
   identifiers (ESIDs) and SSRCs should be provided through out-of-band
   means.

   CC and CSRC fields are used as described in RFC 1889 [5].

   RTCP SHOULD be used as defined in RFC 1889 [5].

   Reduced SL Packet Header: Defined in section 3.2 and 3.3.
   If the
   Reduced SL Packet Header contains a non-integer number of bytes,
   trailing padding bits, each coded as zero, MUST be inserted to byte
   align the start of the SL Packet Payload.

   SL Packet Payload: The payload of an SL Packet. The payload MUST be
   byte aligned, if needed, by using trailing padding bits, each coded
   as zero.

   RTP timestamps in RTCP SR packets: according to the RTP timing
   model, the RTP timestamp that is carried in an RTCP SR packet is
   the same as the compositionTimeStamp that would be applied to an RTP
   packet for data that was sampled at the instant the SR packet is
   being generated and sent. The RTP timestamp value is calculated from
   the NTP timestamp for the current time, which also goes in the RTCP
   SR packet. To perform that calculation, an implementation needs to
   periodically establish a correspondence between the CTS value of a
   data packet and the NTP time at which that data was sampled.

3.2 Reduced SL Packet header construction

   The following modifications of the SL packet header MUST be applied
   to an SL packetized stream before encapsulation in this RTP payload
   format. The other fields of the SL packet header MUST remain
   unchanged (but are bit-shifted to fill in the gaps left by the
   changes specified below).

3.2.1 Time Stamps transformation

   The first reduced SL packet includes a header without
   compositionTimeStamp field since the RTP time stamp transports it.
   After placing its value in the RTP time stamp, the sender MUST
   remove the compositionTimeStamp, if any, from the first SL packet
   header. All other MPEG-4 Time Stamps are encoded as offsets.

   If compositionTimeStamp is never present in SL packets for this
   stream, the RTP packetizer SHOULD convey a reading of a local clock
   at the time the RTP packet is created.
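
   The offset encoding used in this section can be sketched as
   follows. This is an illustration under stated assumptions, not a
   normative procedure: the function names are hypothetical, the
   offset is stored as a two's complement value in a bit length
   signaled out of band, and timestamp wrap-around is ignored.

```python
def encode_delta(value, reference, length_bits):
    """Encode `value` as a signed offset from `reference`.

    `reference` is the time stamp carried in the RTP header. The result
    is the unsigned bit pattern of the two's complement offset, ready to
    be written into a field of `length_bits` bits.
    """
    delta = value - reference
    lowest = -(1 << (length_bits - 1))
    highest = (1 << (length_bits - 1)) - 1
    if not lowest <= delta <= highest:
        raise ValueError("offset does not fit in the signaled length")
    return delta & ((1 << length_bits) - 1)


def decode_delta(field, reference, length_bits):
    """Restore the original time stamp from a received offset field."""
    if field & (1 << (length_bits - 1)):   # sign bit set: negative offset
        field -= 1 << length_bits
    return reference + field
```

   For instance, a decoding time stamp of 900 against an RTP time
   stamp of 1000 is an offset of -100, which an 8-bit field carries as
   the bit pattern 156; the receiver recovers 900 from it.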

   All decodingTimeStamps, if present, MUST be replaced by the
   difference between their value and the value of the
   compositionTimeStamp. If an OCR (Object Clock Reference) is present
   it MUST also be changed to encode a difference from the
   compositionTimeStamp in the same fashion. With this payload format
   OCRs MUST have the same clock resolution as Time Stamps. If
   compositionTimeStamp is not present for an SL packet that has OCR
   then the OCR SHALL be encoded as a difference to the RTP time stamp.

   Since this subtraction may lead to negative values, the offset MUST
   be encoded as a two's complement signed integer in network byte
   order.

   Because these offsets (delta) typically require fewer bits to be
   encoded, the sender MAY use a different length than the one
   indicated by the original SLConfigDescriptor timeStampLength field.
   The length MUST then be signaled to the receiver by using an SDP
   a=fmtp field (see section 3.3 and section 8).

3.2.2 Indication of size

   For efficiency SL packets do not carry their own size. This is not
   an issue for RTP packets that contain a single SL Packet.

   However when multiple SL packets are carried in an RTP packet the
   size of each SL packet payload MUST be available to the receiver.

   If the SL packet payload size is constant for a stream, the size
   information SHOULD NOT be transported in the RTP packet. However in
   that case it MUST be signaled in SDP using a (a=fmtp:
   SLPacketPayloadSize=) syntax (see section 8).

   If the SL packet payload size is variable then the size of each SL
   packet payload MUST be indicated in the corresponding Reduced SL
   packet header. In order to do so the reduced SL packet header MUST
   contain a SLPacketPayloadSize field. Since this field serves the
   same purpose as the accessUnitLength field, it replaces it, if
   present, i.e.
   senders MUST remove accessUnitLength from the original
   SL packet headers. The number of bits used to encode this size
   MUST be indicated in the corresponding SDP using a
   (a=fmtp: SLPacketPayloadSizeLength=) syntax (see
   section 8).

   The absence of either SLPacketPayloadSize or
   SLPacketPayloadSizeLength in SDP indicates that a single SL packet
   is transported in each RTP packet for that stream.

3.2.3 Interleaving

   SL packets MAY be interleaved.

   When interleaving of SL packets is used it SHALL be implemented
   using packetSequenceNumber.

   Note that AUSequenceNumber in the SL header is not available for
   interleaving since it may collide with BIFS Carousel usage.

   The conjunction of RTP sequence number and packetSequenceNumber can
   produce a quasi-unique identifier for an SL packet so that a receiver
   can unambiguously reconstruct the original order even in case of
   out-of-order packets, packet loss or duplication.

   If packetSequenceNumber is used it SHALL be unchanged for the first
   SL packet but MAY be encoded as a difference for the other SL
   packets in the same RTP packet. In that case the length in bits on
   which these packetSequenceNumber differences (delta) are encoded
   MUST be signaled in SDP using a (a=fmtp:
   packetSequenceNumberDeltaLength=) syntax (see section 8).

3.2.4 Constraints for use of fields in the remainingSLPacketHeader

3.2.4.1 Random Access Points

   Access Units that have the Random Access Point set to true (1) are
   referred to as RAP.

   In case multiple Access Units are transported in an RTP packet, this
   packet may contain either no RAP or one or more RAPs.

   In case one or more RAPs are present the first SL packet MUST be a
   RAP; the reason being that a receiver after a packet loss may have
   to skip packets until a RAP and this is facilitated when only the
   first Reduced SL packet header has to be scanned.

3.2.4.2 Degradation priority

   For streams that use the optional degradation priority field in the
   SL packet headers, only SL packets with the same degradation
   priority SHALL be transported by one RTP packet so that components
   may dispatch the RTP packets according to appropriate QOS or
   protection scheme. Furthermore only the first reduced SL packet
   header SHALL carry the degradationPriority field since it would be
   otherwise redundant.

3.2.4.3 AUSequenceNumber

   If AUSequenceNumber is used it SHALL be unchanged for the first SL
   packet but MAY be encoded as a difference for the other SL packets
   in the same RTP packet. In that case the length in bits on which
   these AUSequenceNumber differences (delta) are encoded MUST be
   signaled in SDP using a (a=fmtp:
   AUSequenceNumberDeltaLength=) syntax (see section 8).

3.3 Reduced SL Packet Headers

   The reduced SL Packet Header is configurable and depends on SDP
   parameters.

   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |         packetSequenceNumber          |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |          SLPacketPayloadSize          |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |       compositionTimeStampFlag        |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |       compositionTimeStampDelta       |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |         decodingTimeStampFlag         |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |        decodingTimeStampDelta         |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |      remainingSLPacketHeaderSize      |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |        remainingSLPacketHeader        |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   Figure 3: Reduced SL Packet Header

3.3.1 Usage of fields

   packetSequenceNumber : Indicates the serial number of the
   SLPacketPayload. The length of the packetSequenceNumber field is
   defined by SDP parameters as follows. For the first reduced SL
   Packet Header in an RTP packet, the length is defined by the
   packetSequenceNumberLength, and for any subsequent reduced SL Packet
   Header by the packetSequenceNumberDeltaLength.

   SLPacketPayloadSize : Indicates the size in bytes of the associated
   SL Packet Payload.

   compositionTimeStampFlag : Indicates whether the
   compositionTimeStampDelta field is present. A value of 1 indicates
   that the field is present, a value of 0 that it is not present.

   compositionTimeStampDelta : Specifies the value of the CTS as a
   two's complement offset from the timestamp in the RTP header of
   this packet.

   decodingTimeStampFlag : Indicates whether the
   decodingTimeStampDelta field is present. A value of 1 indicates that
   the field is present, a value of 0 that it is not present. If the
   decodingTimeStampFlag is true, the sender MUST remove the
   decodingTimeStamp from the original SL packet headers.

   decodingTimeStampDelta : Specifies the value of the DTS as a
   two's complement offset from the timestamp in the RTP header of
   this packet.

   remainingSLPacketHeaderSize : Specifies the length in bits of the
   immediately following remainingSLPacketHeader.

   remainingSLPacketHeader : The remainder of an SL header after
   removal of the CTS and DTS fields, if any, and modification of the
   associated flags. The semantics of the original SL Packet Header is
   defined by a SLConfigDescriptor conveyed in SDP or by other means.
   If the remaining SL Packet header contains an OCR, then this field
   is not coded as defined in such descriptor, but instead as described
   in 3.2.1 with a length indicated by the OCRDeltaLength parameter in
   SDP. Similarly, if the remaining SL Packet header of a subsequent
   (non-first) reduced SL Header in an RTP packet contains the
   AU_sequenceNumber field, then this field may not be coded as defined
   in such descriptor but instead as described in 3.2.4.3 with a length
   indicated by the AUSequenceNumberDeltaLength parameter in SDP.

3.3.2 Relationship between reduced SL Packet header and SDP parameters

   The relationship between a reduced SL Packet Header and the SDP
   parameters is as follows:

   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | Fields of Reduced SL Packet     | Number of bits                |
   | Header                          |                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | If(packetSequenceNumberLength>0)|                               |
   | {                               |                               |
   |   packetSequenceNumber          | packetSequenceNumberLength    |
   | }                               |                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | If(SLPacketPayloadSizeLength>0) |                               |
   | {                               |                               |
   |   SLPacketPayloadSize           | SLPacketPayloadSizeLength     |
   | }                               |                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | If(decodingTime                 |                               |
   |    StampDeltaLength>0)          |                               |
   | {                               |                               |
   |   decodingTimeStampFlag         | 1                             |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |   If(decodingTimeStampFlag==1)  |                               |
   |   {                             |                               |
   |     decodingTimeStampDelta      | decodingTimeStampDeltaLength  |
   |   }                             |                               |
   | }                               |                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | If(compositionTime              |                               |
   |    StampDeltaLength>0)          |                               |
   | {                               |                               |
   |   compositionTimeStampFlag      | 1                             |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | If(compositionTimeStampFlag==1) |                               |
   |   {                             |                               |
   |     compositionTimeStampDelta   |compositionTimeStampDeltaLength|
   |   }                             |                               |
   | }                               |                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | If(remainingSLPacket            |                               |
   |    HeaderSizeLength>0)          |                               |
   | {                               |                               |
   |   remainingSLPacketHeaderSize   | remainingSLPacket             |
   |                                 |   HeaderSizeLength            |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |   remainingSLPacketHeader       | remainingSLPacketHeaderSize   |
   | }                               |                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

4. SL packetized stream reconstruction

   The MPEG-4 over IP framework [9] requires that the way a receiver
   can reconstruct a valid SL packetized stream be documented; that
   is the purpose of this section.

   Since this format directly transports SL packets this reconstruction
   is trivial with the following rules:
   - The SL packet header SHALL remain exactly the same as received
     with the following exceptions:
   - All time stamps (CTS, DTS, OCR), if present, are restored from the
     delta values.
   - All sequence numbers (packetSequenceNumber, AUSequenceNumber), if
     present, are restored from the delta values relative to the first
     SL packet in the RTP packet.
   - AccessUnitLength fields, if present (i.e. if SL.AU_Length is non
     zero), are restored from SLPacketPayloadSize.
   - The other SL packet header fields SHALL remain exactly the same as
     in the remainingSLPacketHeader.

5. Multiplexing

   Since a typical MPEG-4 session may involve a large number of
   objects, that may be as many as a few hundred, transporting each ES
   as an individual RTP session may not always be practical. Allocating
   and controlling hundreds of destination addresses for each MPEG-4
   session may pose insurmountable session administration problems.
The input/output processing overhead at the end-points will also be
extremely high. Additionally, low delay transmission of low bitrate
data streams, e.g. facial animation parameters, results in extremely
high header overheads.

To solve these problems, MPEG-4 data transport requires a
multiplexing scheme that allows selective bundling of several ESs.
This is beyond the scope of the payload format defined here.
MPEG-4's Flexmux multiplexing scheme may be used for this purpose by
defining an additional RTP payload format for "multiplexed MPEG-4
streams." Another approach may be to develop a generic RTP
multiplexing scheme usable for MPEG-4 data. The multiplexing scheme
reported in [8] may be a candidate for this approach.

For MPEG-4 applications, the multiplexing technique needs to address
the following requirements:

i. The ESs multiplexed in one stream can change frequently during a
session. Consequently, the coding type, individual packet size and
temporal relationships between the multiplexed data units must be
handled dynamically.

ii. The multiplexing scheme should have a mechanism to determine the
ES identifier (ES_ID) for each of the multiplexed packets. ES_ID is
not a part of the SL header.

iii. In general, an SL packet does not contain information about its
size. The multiplexing scheme should be able to delineate the
multiplexed packets, whose lengths may vary from a few bytes to close
to the path-MTU.

6. Security Considerations

RTP packets using the payload format defined in this specification
are subject to the security considerations discussed in the RTP
specification [5]. This implies that confidentiality of the media
streams is achieved by encryption.
Because the data compression used with this payload format is
applied end-to-end, encryption may be performed on the compressed
data, so there is no conflict between the two operations. The packet
processing complexity of this payload type does not exhibit any
significant non-uniformity on the receiver side that would cause a
denial-of-service threat.

However, it is possible to inject non-compliant MPEG streams (Audio,
Video, and Systems) to overload the receiver/decoder's buffers, which
might compromise the functionality of the receiver or even crash it.
This is especially true for end-to-end systems like MPEG where the
buffer models are precisely defined.

MPEG-4 Systems supports stream types including commands that are
executed on the terminal, like OD commands, BIFS commands, etc., and
programmatic content like MPEG-J (Java(TM) Byte Code) and
ECMAScript. It is possible to use one or more of the above in a
manner non-compliant with MPEG to crash the receiver or make it
temporarily unavailable.

Authentication mechanisms can be used to validate the sender and
the data, to prevent security problems due to non-compliant malignant
MPEG-4 streams.

A security model is defined in MPEG-4 Systems for streams carrying
MPEG-J access units, which comprise Java(TM) classes and objects.
MPEG-J defines a set of Java APIs and a secure execution model.
MPEG-J content can call this set of APIs and Java(TM) methods from a
set of Java packages supported in the receiver within the defined
security model. According to this security model, downloaded byte
code is forbidden to load libraries, define native methods, start
programs, read or write files, or read system properties.

Receivers can implement intelligent filters to validate the buffer
requirements or parametric (OD, BIFS, etc.) or programmatic (MPEG-J,
ECMAScript) commands in the streams.
However, this can increase the complexity significantly.

7. Types and names

The encoding name associated with this RTP payload format is
"mpeg4-sl".

The media type may be any of:
- "video"
- "audio"
- "application"

"video" SHOULD be used for MPEG-4 Video streams (ISO/IEC 14496-2) or
MPEG-4 Systems streams that convey information needed for an
audio/visual presentation.

"audio" SHOULD be used for MPEG-4 Audio streams (ISO/IEC 14496-3) or
MPEG-4 Systems streams that convey information needed for an
audio-only presentation.

"application" SHOULD be used for MPEG-4 Systems streams (ISO/IEC
14496-1) that serve purposes other than audio/visual presentation,
e.g. in some cases when MPEG-J streams are transmitted.

8. Additional SDP syntax

8.1 Mapping information

This format may require additional information about the mapping to
be made available to the receiver.

For example, as mentioned above, some fields of the SL packet header
MAY be reconfigured for optimal efficiency. When such a change is
performed, however, it MUST be signaled to the receiver using an SDP
(a=fmtp) parameter as in RFC 2327 [10, section 6].

The absence of any of these parameters is equivalent to the
parameter being set to its default value (zero).

8.1.1 Indication of decodingTimeStamp delta bit length

The following syntax should be used:

   a=fmtp: decodingTimeStampDeltaLength=

The value is the number of bits on which the decoding time stamp
deltas are encoded in the reduced SL packet headers. The default
value is zero. A value larger than zero indicates that the
decodingTimeStampFlag is contained in each Reduced SL Packet Header.
A value of zero indicates that the decodingTimeStampFlag is not
present; in that case, the sender MUST remove any
decodingTimeStampFlag from the original SL packet headers.
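The delta coding that decodingTimeStampDeltaLength configures can be
illustrated with a minimal Python sketch. It assumes that the DTS and
the RTP timestamp are expressed in the same clock units and it ignores
timestamp wrap-around; the function names encode_delta and
decode_delta are hypothetical and are not defined by this format.

```python
def encode_delta(timestamp: int, rtp_timestamp: int, bits: int) -> int:
    """Return the `bits`-wide two's-complement field for timestamp - rtp_timestamp."""
    if bits <= 0:
        raise ValueError("no delta field is carried when the length is zero")
    delta = timestamp - rtp_timestamp
    lowest, highest = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    if not lowest <= delta <= highest:
        raise ValueError(f"delta {delta} does not fit in {bits} bits")
    # Masking maps a negative delta onto its two's-complement bit pattern.
    return delta & ((1 << bits) - 1)


def decode_delta(field: int, rtp_timestamp: int, bits: int) -> int:
    """Restore the original time stamp from a `bits`-wide delta field."""
    if field & (1 << (bits - 1)):  # sign bit set: the offset is negative
        field -= 1 << bits
    return rtp_timestamp + field
```

With a 6-bit field, as used in the video example later in this
document, offsets from -32 to +31 clock ticks around the RTP timestamp
can be represented; a sender would need a larger
decodingTimeStampDeltaLength for streams with deeper reordering.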
8.1.2 Indication of compositionTimeStamp delta bit length

The following syntax should be used:

   a=fmtp: compositionTimeStampDeltaLength=

The value is the number of bits on which the composition time stamp
deltas are encoded in the (non-first) reduced SL packet headers.

8.1.3 Indication of OCR delta bit length

The following syntax should be used:

   a=fmtp: OCRDeltaLength=

The value is the number of bits on which the Object Clock Reference
deltas are encoded in the remainingSLPacketHeader.

8.1.4 Indication of SLPacketPayloadSize length

The following syntax should be used:

   a=fmtp: SLPacketPayloadSizeLength=

The value is the number of bits on which the SLPacketPayloadSize
is encoded in the reduced SL packet headers.

Simultaneous presence in SDP of this parameter and
SLPacketPayloadSize is illegal.

8.1.5 Indication of packetSequenceNumber length

The following syntax should be used:

   a=fmtp: packetSequenceNumberLength=

The value is the number of bits on which the packetSequenceNumber
is encoded in the first reduced SL packet header. The default value
is zero and indicates the absence of packetSequenceNumber and
packetSequenceNumberDelta for all reduced SL headers.

8.1.6 Indication of packetSequenceNumber delta length

The following syntax should be used:

   a=fmtp: packetSequenceNumberDeltaLength=

The value is the number of bits on which the
packetSequenceNumberDelta is encoded in any reduced SL packet
header subsequent to the first one. If this parameter is not present
and the packetSequenceNumberLength parameter is present, then the
packetSequenceNumber in any reduced SL header is encoded with the
number of bits defined by the value of packetSequenceNumberLength.
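The interaction between packetSequenceNumberLength and
packetSequenceNumberDeltaLength can be sketched in Python as follows.
This is only an illustration under the assumption that the fmtp
parameters have already been parsed into a name-to-integer mapping;
the function name sequence_number_bits is hypothetical.

```python
from typing import Mapping


def sequence_number_bits(fmtp: Mapping[str, int], first_in_packet: bool) -> int:
    """Bits occupied by the sequence number field in one reduced SL header."""
    length = fmtp.get("packetSequenceNumberLength", 0)
    if length == 0:
        # Default value: no sequence number field in any reduced SL header.
        return 0
    if first_in_packet:
        return length
    # Subsequent headers use the delta length when it is signaled and
    # otherwise fall back to packetSequenceNumberLength.
    return fmtp.get("packetSequenceNumberDeltaLength", length)
```

For instance, a description that signals only
packetSequenceNumberLength=9 yields 9 bits for every reduced SL header
in the RTP packet, first or not.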
8.1.7 Indication of the length in bits of the
remainingSLPacketHeaderSize field

The following syntax should be used:

   a=fmtp: remainingSLPacketHeaderSizeLength=

The value is the number of bits that is used to encode the
subsequent remainingSLPacketHeaderSize field. The default value is
zero and indicates the absence of the remainingSLPacketHeaderSize
and the remainingSLPacketHeader fields.

8.1.8 Indication of AUSequenceNumber delta length

The following syntax should be used:

   a=fmtp: AUSequenceNumberDeltaLength=

The value is the number of bits on which the AUSequenceNumberDelta
is encoded in the remainingSLPacketHeader. The default value is
zero and indicates that AUSequenceNumber, if present, is unchanged
in the remaining SL packet header.

8.1.9 Indication of constant SL packet size

The following syntax should be used:

   a=fmtp: SLPacketPayloadSize=

The value is the constant size in bytes of each SL packet payload.

Simultaneous presence in SDP of this parameter and
SLPacketPayloadSizeLength is illegal.

8.2 Optional configuration information

In the MPEG-4 framework the following information is carried using
the Object Descriptor. For compatibility with receivers that do not
implement the full MPEG-4 system specification, this information MAY
also be indicated in SDP.

For transport of MPEG-4 audio and video without the use of MPEG-4
systems, as well as to support non-MPEG-4 system receivers, it is
possible to transport information on the profile and level of the
stream and on the decoder configuration.

8.2.1 Indication of SLConfigDescriptor

Senders MAY transmit the SLConfigDescriptor in SDP.

The following syntax should be used:

   a=fmtp: SLConfigDescriptor=

The value is a base-64 encoding of the SLConfigDescriptor.
This SHALL be the original SLConfigDescriptor and it SHALL be the
same as the one transported by the OD framework.

8.2.2 Indications for MPEG-4 audio streams

8.2.2.1 Indication of profile level

Senders MAY transmit the profile and level indication in SDP.

The following syntax should be used:

   a=fmtp: profile-level-id=

The value is a decimal representation of the MPEG-4 Audio Profile
Level indication value defined in ISO/IEC 14496-1. This parameter
indicates which MPEG-4 Audio tool subsets are applied to encode the
audio stream.

8.2.2.2 Indication of audio object type

Senders MAY transmit the audio object type indication in SDP.

The following syntax should be used:

   a=fmtp: object-type=

The value is a decimal representation of the MPEG-4 Audio Object
Type value defined in ISO/IEC 14496-3. This parameter specifies the
tool used by the encoder. It can be used to limit the capability
within the specified "profile-level-id".

8.2.2.3 Indication of audio bitrate

Senders MAY transmit the audio bitrate in SDP.

The following syntax should be used:

   a=fmtp: bitrate=

The value is a decimal representation of the audio bitrate in bits
per second for the audio bit stream.

8.2.2.4 Indication of audio decoder configuration

Senders MAY transmit the audio decoder configuration in SDP.

The following syntax should be used:

   a=fmtp: config=

The value is a hexadecimal representation of an octet string that
expresses the audio payload configuration data "StreamMuxConfig", as
defined in ISO/IEC 14496-3. Configuration data is mapped onto the
octet string on an MSB-first basis. The first bit of the
configuration data SHALL be located at the MSB of the first octet.
In the last octet, zero-padding bits, if necessary, shall follow the
configuration data.

8.2.3 Indications for MPEG-4 video streams

8.2.3.1 Indication of profile and level

Senders MAY transmit the video profile and level indication in SDP.

The following syntax should be used:

   a=fmtp: profile-level-id=

The value is a decimal representation of the MPEG-4 Visual Profile
Level indication value (profile_and_level_indication) defined in
Table G-1 of ISO/IEC 14496-2. This parameter MAY be used in the
capability exchange or session setup procedure to indicate the
MPEG-4 Visual Profile and Level combination of which the MPEG-4
Visual codec is capable. If this parameter is not specified by the
procedure, its default value of 1 (Simple Profile/Level 1) is used.

8.2.3.2 Indication of video decoder configuration

Senders MAY transmit the video decoder configuration in SDP. This
parameter indicates the configuration of the corresponding MPEG-4
visual bitstream. It SHALL NOT be used to indicate the codec
capability in the capability exchange procedure.

The following syntax should be used:

   a=fmtp: config=

The value is a hexadecimal representation of an octet string that
expresses the MPEG-4 Visual configuration information, as defined in
subclause 6.2.1 (Start codes) of ISO/IEC 14496-2 [2][4][9]. The
configuration information is mapped onto the octet string on an
MSB-first basis. The first bit of the configuration information
SHALL be located at the MSB of the first octet.
The configuration information indicated by this parameter SHALL be
the same as the configuration information in the corresponding
MPEG-4 Visual stream, except for first_half_vbv_occupancy and
latter_half_vbv_occupancy, if they exist, which may vary in the
repeated configuration information inside an MPEG-4 Visual stream
(see 6.2.1, Start codes, of ISO/IEC 14496-2).

8.3 Concatenation of fmtp parameters

Multiple fmtp parameters SHOULD be expressed as a MIME media type
string, in the form of a semicolon-separated list of parameter=value
pairs.

8.4 SDP file example

The following is an example of SDP syntax for the description of
a session containing one MPEG-4 audio stream, one MPEG-4 video
stream and one MPEG-4 system stream, transported using this format.
Note that the video stream decoding time stamp deltas are encoded on
4 bits in this example.

   o= ....
   i= ....
   c=IN IP4 123.234.71.112
   m=video 1034 RTP/AVP 97
   a=fmtp:97 decodingTimeStampDeltaLength=4
   a=rtpmap:97 mpeg4-sl
   m=audio 810 RTP/AVP 98
   a=rtpmap:98 mpeg4-sl
   m=application 1234 RTP/AVP 99
   a=rtpmap:99 mpeg4-sl

9. Examples of usage of this payload format

9.1 MPEG-4 Video

Let us consider the case of a 30 frames per second MPEG-4 video
stream whose bit rate is high enough that Access Units have to be
split into several SL packets (typically above 300 kb/s).

Let us also assume that the video codec generates in that case Video
Packets suitable to fit in one SL packet, i.e. that the video codec
is MTU aware and the MTU is 1500 bytes. We assume furthermore that
this stream contains B frames and that decodingTimeStamps are
present.
9.1.1 Typical SLConfigDescriptor for video streams

In this example the SLConfigDescriptor is:

   class SLConfigDescriptor extends BaseDescriptor : bit(8)
   tag=SLConfigDescrTag {
     bit(8) predefined;
     if (predefined==0) {
       bit(1) useAccessUnitStartFlag;                    = 1
       bit(1) useAccessUnitEndFlag;                      = 0
       bit(1) useRandomAccessPointFlag;                  = 1
       bit(1) hasRandomAccessUnitsOnlyFlag;              = 0
       bit(1) usePaddingFlag;                            = 0
       bit(1) useTimeStampsFlag;                         = 1
       bit(1) useIdleFlag;                               = 0
       bit(1) durationFlag;                              = 0
       bit(32) timeStampResolution;                      = 30
       bit(32) OCRResolution;                            = 0
       bit(8) timeStampLength; // must be <= 64          = 32
       bit(8) OCRLength; // must be <= 64                = 0
       bit(8) AU_Length; // must be <= 32                = 0
       bit(8) instantBitrateLength;                      = 0
       bit(4) degradationPriorityLength;                 = 0
       bit(5) AU_seqNumLength; // must be <= 16          = 0
       bit(5) packetSeqNumLength; // must be <= 16       = 0
       bit(2) reserved=0b11;
     }
     if (durationFlag) {
       bit(32) timeScale; // NOT USED
       bit(16) accessUnitDuration; // NOT USED
       bit(16) compositionUnitDuration; // NOT USED
     }
     if (!useTimeStampsFlag) {
       bit(timeStampLength) startDecodingTimeStamp;      = 0
       bit(timeStampLength) startCompositionTimeStamp;   = 0
     }
   }

Note that the useRandomAccessPointFlag is set so that the
randomAccessPointFlag can indicate that the corresponding SL packet
contains a GOV and the first Video Packet of an Intra coded frame.
9.1.2 Typical SL packet header structure for video streams

With this configuration we can derive the following SL packet
header structure:

   aligned(8) class SL_PacketHeader (SLConfigDescriptor SL) {
     if (SL.useAccessUnitStartFlag)
       bit(1) accessUnitStartFlag;                // 1 bit
     if (accessUnitStartFlag) {
       if (SL.useRandomAccessPointFlag)
         bit(1) randomAccessPointFlag;            // 1 bit
       if (SL.useTimeStampsFlag) {
         bit(1) decodingTimeStampFlag;            // 1 bit
         bit(1) compositionTimeStampFlag;         // 1 bit
       }
       if (decodingTimeStampFlag)
         bit(SL.timeStampLength) decodingTimeStamp;
       if (compositionTimeStampFlag)
         bit(SL.timeStampLength) compositionTimeStamp;
     }
   }

9.1.3 SDP mapping information

decodingTimeStamps are encoded on 32 bits, which is much more than
needed for a delta. Therefore the sender will use
decodingTimeStampDeltaLength in the corresponding SDP to signal that
only 6 bits are used for the coding of relative DTS in the RTP
packet.

The remaining SL packet header never exceeds 3 bits, so
remainingSLPacketHeaderSize can be encoded on 2 bits; this is
signaled by remainingSLPacketHeaderSizeLength.

The resulting concatenated fmtp line is:
   a=fmtp: decodingTimeStampDeltaLength=6;
   remainingSLPacketHeaderSizeLength=2

9.1.4 RTP packet structure

Such SL packet headers can result in several reduced SL packet
headers:

For packets that transport first fragments of Access Units:

+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| RTP header                              |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| decodingTimeStampFlag = 1 (1 bit)       |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| decodingTimeStampDelta (6 bits)         |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| remainingSLPacketHeaderSize = 3 (2 bits)|
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| accessUnitStartFlag = 1 (1 bit)         |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| randomAccessPointFlag (1 bit)           |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| compositionTimeStampFlag = 1 (1 bit)    |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| 0000 (4 zero bits to byte alignment)    |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| SL packet payload (N bytes)             |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

For packets that transport non-first fragments of Access Units:

+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| RTP header                              |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| decodingTimeStampFlag = 0 (1 bit)       |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| remainingSLPacketHeaderSize = 1 (2 bits)|
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| accessUnitStartFlag = 0 (1 bit)         |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| 0000 (4 zero bits to byte alignment)    |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| SL packet payload (N bytes)             |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

Note the compositionTimeStamp is never present since it would be
redundant with the RTP time stamp.
However, the value of compositionTimeStampFlag is still 1 to
indicate that compositionTimeStamp was present for this SL packet
and should therefore be restored by the receiver using the RTP time
stamp.

In this example we have an RTP overhead of 40 + 2 bytes for 1400
bytes of payload, i.e. 3 % overhead.

9.2 Low delay MPEG-4 Audio

This example is for a low delay service where a single SL packet is
transported in each RTP packet.

9.2.1 Typical SLConfigDescriptor for low delay MPEG-4 Audio

Since CTS=DTS, signaling of MPEG-4 time stamps is not needed.

We also assume here an audio Object Type for which all Access Units
are Random Access Points, which is signaled using the
hasRandomAccessUnitsOnlyFlag in the SLConfigDescriptor.

In this example the SLConfigDescriptor is:

   class SLConfigDescriptor extends BaseDescriptor : bit(8)
   tag=SLConfigDescrTag {
     bit(8) predefined;
     if (predefined==0) {
       bit(1) useAccessUnitStartFlag;                    = 0
       bit(1) useAccessUnitEndFlag;                      = 0
       bit(1) useRandomAccessPointFlag;                  = 0
       bit(1) hasRandomAccessUnitsOnlyFlag;              = 1
       bit(1) usePaddingFlag;                            = 0
       bit(1) useTimeStampsFlag;                         = 0
       bit(1) useIdleFlag;                               = 0
       bit(1) durationFlag;                              = 0
       bit(32) timeStampResolution;                      = 0
       bit(32) OCRResolution;                            = 0
       bit(8) timeStampLength; // must be <= 64          = 0
       bit(8) OCRLength; // must be <= 64                = 0
       bit(8) AU_Length; // must be <= 32                = 0
       bit(8) instantBitrateLength;                      = 0
       bit(4) degradationPriorityLength;                 = 0
       bit(5) AU_seqNumLength; // must be <= 16          = 0
       bit(5) packetSeqNumLength; // must be <= 16       = 0
       bit(2) reserved=0b11;
     }
     if (durationFlag) {
       bit(32) timeScale; // NOT USED
       bit(16) accessUnitDuration; // NOT USED
       bit(16) compositionUnitDuration; // NOT USED
     }
     if (!useTimeStampsFlag) {
       bit(timeStampLength) startDecodingTimeStamp;      = 0
       bit(timeStampLength) startCompositionTimeStamp;   = 0
     }
   }

9.2.2 Typical SL packet header for low delay MPEG-4 Audio

With this configuration the SL header is empty.

This does not have to be indicated in SDP since the default value
for remainingSLPacketHeaderSizeLength and
decodingTimeStampDeltaLength is zero. Therefore the absence of these
parameters in SDP indicates the absence of decodingTimeStampFlag and
remainingSLPacketHeaderSize in RTP packets.

9.2.3 Overhead estimation for low delay MPEG-4 Audio

Depending on the actual MPEG-4 audio Object Type used, the RTP
overhead (IP+UDP+RTP headers) can be very large since the SL packet
payload can be a few bytes or less.

9.3 Media delivery MPEG-4 Audio

This example is for a service where delay is not an issue but
streaming efficiency is of paramount importance. In this example
multiple SL packets are transported in each RTP packet.

9.3.1 RTP packet structure

The SL configuration is the same as in the previous example; we will
however use SLPacketPayloadSizeLength to indicate multiple SL
packets per RTP packet. In this example we will assume that this
size never exceeds 31 bytes and can therefore be encoded on 5 bits.
This will be signaled in SDP using:

   a=fmtp: SLPacketPayloadSizeLength=5

Therefore the structure of the RTP packet will be:

+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| RTP header                    |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| SLPacketPayloadSize (5 bits)  |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| SLPacketPayloadSize (5 bits)  |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|... as many times as SL packets|
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| 0000 (byte alignment)         |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| SL packet payload (N bytes)   |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| SL packet payload (N bytes)   |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|... as many times as SL packets|
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

9.3.2 Overhead estimation for MPEG-4 Audio media delivery

The resulting overhead can be computed as follows:

At bit rate (BR) in bits per second, we compute the average Access
Unit size (AvS) in bytes using the Access Unit duration (AuDur) in
milliseconds as:

   AvS = (int)(BR/8*AuDur/1000)

For example, 8 kb/s CELP with AuDur=20 ms leads to AvS=20 bytes. In
the same context as before (about 1400 bytes of payload per RTP
packet) we can assume 70 Access Units per RTP packet. The overhead
is therefore 40 bytes for RTP+UDP+IP plus 70*5 bits = 44 bytes of SL
headers, i.e. 84 bytes for 1400 bytes of payload, or 6 %.

For high bit rate audio the number of SL packets per RTP packet will
decrease, leading to better overhead figures.

9.4 Interleaving for MPEG-4 Audio

This example is the same as before with the addition of interleaving
for error resilience.

The SL configuration is the same as in the previous example except
that packetSeqNumLength is not zero but 9 bits. We will also use
SLPacketPayloadSizeLength to indicate multiple SL packets per RTP
packet.
Additionally we use packetSequenceNumberLength to signal the length
of all packetSequenceNumber fields (packetSequenceNumberDeltaLength
is not used in this example).

This will be signaled in SDP using:

   a=fmtp: SLPacketPayloadSizeLength=5;packetSequenceNumberLength=9

The RTP packet structure is then:

+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| RTP header                    |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| packetSequenceNumber (9 bits) |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| SLPacketPayloadSize (5 bits)  |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| packetSequenceNumber (9 bits) |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| SLPacketPayloadSize (5 bits)  |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|... as many times as SL packets|
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| 000 (x bits to byte alignment)|
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| SL packet payload (x bytes)   |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| SL packet payload (x bytes)   |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|... as many times as SL packets|
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

10. References

[1] ISO/IEC 14496-1:2000, MPEG-4 Systems, October 2000.

[2] ISO/IEC 14496-2:1999/Amd.1:2000(E), MPEG-4 Visual, January 2000.

[3] ISO/IEC 14496-3:1999/FDAM 1:2000, MPEG-4 Audio, January 2000.

[4] ISO/IEC 14496-6 FDIS, Delivery Multimedia Integration Framework,
November 1998.

[5] Schulzrinne, Casner, Frederick, Jacobson, RTP: A Transport
Protocol for Real Time Applications, RFC 1889, Internet Engineering
Task Force, January 1996.

[6] S. Bradner, Key words for use in RFCs to Indicate Requirement
Levels, RFC 2119, March 1997.

[7] Y. Kikuchi, T. Nomura, S. Fukunaga, Y. Matsui, H. Kimata, RTP
payload format for MPEG-4 Audio/Visual streams, work in progress,
draft-ietf-avt-rtp-mpeg4-es-05.txt, September 2000.

[8] B. Thompson, T. Koren, D. Wing, Tunneling multiplexed Compressed
RTP ("TCRTP"), work in progress, draft-ietf-avt-tcrtp-01.txt, July
2000.

[9] D. Singer, Y. Lim, A Framework for the delivery of MPEG-4 over
IP-based Protocols, work in progress, draft-singer-mpeg4-ip-01.txt,
October 2000.

[10] Handley, Jacobson, SDP: Session Description Protocol, RFC 2327,
Internet Engineering Task Force, April 1998.

11. Authors' Addresses

Olivier Avaro
France Telecom
35 A Schützenhüttenweg
60598 Frankfurt am Main
Deutschland
e-mail: olivier.avaro@francetelecom.fr

Andrea Basso
AT&T Labs Research
200 Laurel Avenue
Middletown, NJ 07748
USA
e-mail: basso@research.att.com

Stephen L. Casner
Packet Design, Inc.
66 Willow Place
Menlo Park, CA 94025
USA
e-mail: casner@acm.org

M. Reha Civanlar
AT&T Labs - Research
100 Schultz Drive
Red Bank, NJ 07701
USA
e-mail: civanlar@research.att.com

Philippe Gentric
Philips Digital Networks
22 Avenue Descartes
94453 Limeil-Brevannes CEDEX
France
e-mail: philippe.gentric@philips.com

Carsten Herpel
THOMSON multimedia
Karl-Wiechert-Allee 74
30625 Hannover
Germany
e-mail: herpelc@thmulti.com

Zvi Lifshitz
Optibase Ltd.
7 Shenkar St.
Herzliya 46120
Israel
e-mail: zvil@optibase.com

Young-kwon Lim
mp4cast (MPEG-4 Internet Broadcasting Solution Consortium)
1001-1 Daechi-Dong Gangnam-Gu
Seoul, 305-333,
Korea
e-mail: young@techway.co.kr

Colin Perkins
USC Information Sciences Institute
4350 N. Fairfax Drive #620
Arlington, VA 22203
USA
e-mail: csp@isi.edu

Jan van der Meer
Philips Digital Networks
Cederlaan 4
5600 JB Eindhoven
Netherlands
e-mail: jan.vandermeer@philips.com