Internet Engineering Task Force                 Civanlar-AT&T/Basso-AT&T
INTERNET DRAFT                                      Casner-Packet Design
File: draft-ietf-avt-rtp-mpeg4-03.txt         Herpel-Thomson/Perkins-ISI
July 13, 2000
Expires: Jan 13, 2001

                 RTP Payload Format for MPEG-4 Streams

STATUS OF THIS MEMO

   This document is an Internet-Draft and is in full conformance with
   all provisions of Section 10 of RFC2026.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups.  Note that
   other groups may also distribute working documents as
   Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.
   It is inappropriate to use Internet-Drafts as reference material or
   to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html.

Abstract

   This document describes a payload format for transporting MPEG-4
   encoded data using RTP.  MPEG-4 is a recent standard from ISO/IEC for
   the coding of natural and synthetic audio-visual data.  Several
   services provided by RTP are beneficial for MPEG-4 encoded data
   transport over the Internet.  Additionally, the use of RTP makes it
   possible to synchronize MPEG-4 data with other real-time data types.

   This specification is a product of the Audio/Video Transport working
   group within the Internet Engineering Task Force and the ISO/IEC
   MPEG-4 ad hoc group on MPEG-4 over Internet.  Comments are solicited
   and should be addressed to the working group's mailing list at
   rem-conf@es.net and/or the authors.

1. Introduction

   MPEG-4 is a recent standard from ISO/IEC for the coding of natural
   and synthetic audio-visual data in the form of audio-visual objects
   that are arranged into an audio-visual scene by means of a scene
   description [1][2][3][4].  This draft specifies an RTP [5] payload
   format for transporting MPEG-4 encoded data streams.

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in RFC 2119 [6].

   The benefits of using RTP for MPEG-4 data stream transport include:

      i.   Ability to synchronize MPEG-4 streams with other RTP
           payloads

      ii.  Monitoring MPEG-4 delivery performance through RTCP

      iii. Combining MPEG-4 and other real-time data streams received
           from multiple end-systems into a set of consolidated streams
           through RTP mixers

      iv.  Converting data types, etc. through the use of RTP
           translators.

1.1 Overview of MPEG-4 End-System Architecture

   Figure 1 below shows the general layered architecture of MPEG-4
   terminals.  The Compression Layer processes individual audio-visual
   media streams.  The MPEG-4 compression schemes are defined in the
   ISO/IEC specifications 14496-2 [2] and 14496-3 [3].  The compression
   schemes in MPEG-4 achieve efficient encoding over a bandwidth ranging
   from several kbps to many Mbps.  The audio-visual content compressed
   by this layer is organized into Elementary Streams (ESs).  The MPEG-4
   standard specifies MPEG-4 compliant streams.  Within the constraint
   of this compliance, the compression layer is unaware of any specific
   delivery technology, but it can be made to react to the
   characteristics of a particular delivery layer, such as the path-MTU
   or loss characteristics.  Also, some compressors can be designed to
   be delivery specific for implementation efficiency.  In such cases,
   the compressor may work in a non-optimal fashion with delivery
   technologies that are different from the one it is specifically
   designed to operate with.

   The hierarchical relations, location and properties of ESs in a
   presentation are described by a dynamic set of Object Descriptors
   (ODs).  Each OD groups one or more ES Descriptors referring to a
   single content item (audio-visual object).  Hence, multiple
   alternative or hierarchical representations of each content item are
   possible.

   ODs are themselves conveyed through one or more ESs.  A complete set
   of ODs can be seen as an MPEG-4 resource or session description at a
   stream level.  The resource description may itself be hierarchical,
   i.e., an ES conveying an OD may describe other ESs conveying other
   ODs.

   The session description is accompanied by a dynamic scene
   description, the Binary Format for Scenes (BIFS), again conveyed
   through one or more ESs.
   At this level, content is identified in terms of audio-visual
   objects.  The spatiotemporal location of each object is defined by
   BIFS.  The audio-visual content of synthetic, static objects is also
   described by BIFS.  Natural and animated synthetic objects may refer
   to an OD that points to one or more ESs that carry the coded
   representation of the object or its animation data.

   Because the session (or resource) description and the scene (or
   content composition) description are each conveyed through their own
   ESs, portions of the content composition, and the number and
   properties of the media streams that carry the audio-visual content,
   can be changed separately and dynamically at well-known instants in
   time.

   One or more initial Scene Description streams and the corresponding
   OD streams have to be pointed to by an initial object descriptor
   (IOD).  The IOD needs to be made available to the receivers through
   some out-of-band means, which are not defined in this document.

   A homogeneous encapsulation of ESs carrying media or control (ODs,
   BIFS) data is defined by the Sync Layer (SL), which primarily
   provides the synchronization between streams.  The Compression Layer
   organizes the ESs into Access Units (AUs), the smallest elements that
   can be attributed individual timestamps.  Integer or fractional AUs
   are then encapsulated in SL packets.  All consecutive data from one
   stream is called an SL-packetized stream at this layer.  The
   interface between the Compression Layer and the SL is called the
   Elementary Stream Interface (ESI).  The ESI is informative.

   The Delivery Layer in MPEG-4 consists of the Delivery Multimedia
   Integration Framework (DMIF) defined in ISO/IEC 14496-6 [4].  This
   layer is media unaware but delivery technology aware.  It provides
   transparent access to and delivery of content irrespective of the
   technologies used.
   The interface between the SL and DMIF is called the DMIF Application
   Interface (DAI).  It offers content-location-independent procedures
   for establishing MPEG-4 sessions and for accessing transport
   channels.  The specification of this payload format is considered
   part of the MPEG-4 Delivery Layer.

   media aware        +-------------------------------------------+
   delivery unaware   |             COMPRESSION LAYER             |
   14496-2 Visual     | streams from as low as kbps to multi-Mbps |
   14496-3 Audio      +-------------------------------------------+

   ============== Elementary Stream Interface (ESI) ===============

                      +-------------------------------------------+
   media and          |                SYNC LAYER                 |
   delivery unaware   | manages elementary streams, their         |
   14496-1 Systems    | synchronization and hierarchical          |
                      | relations                                 |
                      +-------------------------------------------+

   =============== DMIF Application Interface (DAI) ===============

                      +-------------------------------------------+
   delivery aware     |              DELIVERY LAYER               |
   media unaware      | provides transparent access to and        |
   14496-6 DMIF       | delivery of content irrespective of       |
                      | delivery technology                       |
                      +-------------------------------------------+

              Figure 1: General MPEG-4 terminal architecture

1.2 MPEG-4 Elementary Stream Data Packetization

   The ESs from the encoders are fed into the SL with indications of AU
   boundaries, random access points, desired composition time and the
   current time.

   The Sync Layer fragments the ESs into SL packets, each containing a
   header that encodes the information conveyed through the ESI.  If an
   AU is larger than an SL packet, subsequent packets containing the
   remaining parts of the AU are generated with subset headers until the
   complete AU is packetized.

   The syntax of the Sync Layer is not fixed and can be adapted to the
   needs of the stream to be transported.
   This includes the possibility of selecting the presence or absence of
   individual syntax elements, as well as configuring their length in
   bits.  The configuration for each individual stream is conveyed in an
   SLConfigDescriptor, which is an integral part of the ES Descriptor
   for this stream.

2. Analysis of the alternatives for carrying MPEG-4 over IP

2.1 MPEG-4 over UDP

   Because the MPEG-4 SL defines several transport-related functions,
   such as timing and sequence numbering, carrying MPEG-4 data directly
   over UDP seems to be the most straightforward alternative.  One group
   of problems with this approach, however, stems from the monolithic
   architecture of MPEG-4.  No other multimedia data stream (including
   those carried with RTP) can be synchronized with MPEG-4 data carried
   directly over UDP.  Furthermore, the dynamic scene and session
   control concepts cannot be extended to non-MPEG-4 data.

   Even if coordination with non-MPEG-4 data is disregarded, carrying
   MPEG-4 data over UDP has the following additional shortcomings:

      i.   Mechanisms need to be defined to protect sensitive parts of
           MPEG-4 data.  Some of these (like FEC) are already defined
           for RTP.

      ii.  There is no defined technique for synchronizing MPEG-4
           streams from different servers in the variable delay
           environment of the Internet.

      iii. MPEG-4 streams originating from two servers may collide
           (their sources may become unresolvable at the destination) in
           a multicast session.

      iv.  An MPEG-4 backchannel needs to be defined for quality
           feedback similar to that provided by RTCP.

      v.   RTP mixers and translators cannot be used.

   The backchannel problem may be alleviated by developing a reception
   reporting protocol like RTCP.  Such an effort may benefit from RTCP
   design knowledge, but needs extensions.

2.2 RTP header followed by full MPEG-4 headers

   This alternative may be implemented by using the send time or the
   composition time coming from the reference clock as the RTP
   timestamp.  This way, no new feedback protocol needs to be defined
   for MPEG-4's backchannel, but RTCP may not be sufficient for MPEG-4's
   feedback requirements, which are still in the definition stage.
   Additionally, because header information such as the sequence
   numbers and timestamps is duplicated, this alternative adds
   unnecessary overhead.  Furthermore, scene description and dynamic
   session control cannot be extended to non-MPEG-4 streams.

2.3 MPEG-4 ESs over RTP with individual payload types

   This is the most suitable alternative for coordination with the
   existing Internet multimedia transport techniques, and it does not
   use MPEG-4 Systems at all.  Its complete implementation requires the
   definition of potentially many payload types, as already proposed for
   audio and video payloads [7], and might lead to constructing new
   session and scene description mechanisms.  Considering the size of
   the work involved, which essentially reconstructs MPEG-4 Systems,
   this may only be a long-term alternative if no other solution can be
   found.

2.4 RTP header followed by a reduced SL header

   The inefficiency of the approach described in Section 2.2 can be
   fixed by placing, after the RTP header, a reduced SL header that
   does not carry duplicate information.

2.5 Recommendation

   Based on the above analysis, the best compromise is to map the MPEG-4
   SL packets onto RTP packets, such that the common pieces of the
   headers reside in the RTP header, which is followed by an optional
   reduced SL header providing the MPEG-4 specific information.  The
   details of this payload format are described in the next section.

3. Payload Format

   The RTP payload consists of a single SL packet, including an SL
   packet header without the sequenceNumber and compositionTimeStamp
   fields.  Use of all other fields in the SL packet header that the RTP
   header does not duplicate (including the decodingTimeStamp) is
   OPTIONAL.  Packets SHOULD be sent in decoding order.

   If the resulting, smaller SL packet header consumes a non-integer
   number of bytes, zero padding bits MUST be inserted at the end of the
   SL header to byte-align the SL packet payload.

   The size of the SL packets SHOULD be adjusted such that the resulting
   RTP packet is not larger than the path-MTU.  To handle larger
   packets, this payload format relies on lower layers for
   fragmentation, which may not be desirable.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |V=2|P|X|  CC   |M|     PT      |       sequence number         | RTP
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                           timestamp                           | Header
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |           synchronization source (SSRC) identifier            |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   :            contributing source (CSRC) identifiers             :
   +=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+
   |    SL Packet Header (variable # of bytes)     |               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+               | RTP
   |                                                               |
   |               SL Packet Payload (byte aligned)                | Payload
   |                                                               |
   |                               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                               :...OPTIONAL RTP padding        |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

                  Figure 2: An RTP packet for MPEG-4

3.1 RTP Header Fields Usage

   Payload Type (PT): The assignment of an RTP payload type for this new
   packet format is outside the scope of this document and will not be
   specified here.
   It is expected that the RTP profile for a particular class of
   applications will assign a payload type for this encoding; if that
   is not done, a payload type in the dynamic range shall be chosen.

   Marker (M) bit: Set to one to mark the last fragment (or only
   fragment) of an AU.

   Extension (X) bit: Defined by the RTP profile used.

   Sequence Number: Derived from the sequenceNumber field of the SL
   packet by adding a constant random offset.  If the sequenceNumber is
   shorter than 16 bits, the MSBs MUST initially be filled with a random
   value that is incremented by one each time the sequenceNumber value
   of the SL packet returns to zero.  If the value sequenceNumber=0 is
   encountered in multiple consecutive SL packets, indicating a
   deliberate duplication of the SL packet, the sequence number SHOULD
   be incremented by one for each of these packets after the first one.

   In implementations where full SL packets are generated first and then
   packetized in RTP, the sequenceNumber MUST be removed from the SL
   packet header by bit-shifting the subsequent header elements towards
   the beginning of the header.  When unpacking the RTP packet, this
   process can be reversed with the knowledge of the
   SLConfigDescriptor.  For this payload format, MPEG-4 implementations
   that directly produce the RTP header and a stripped-down (perhaps
   null) SL header, rather than first producing the full SL packet, are
   preferable.

   However, the choice between generating SL packets and converting
   them, or generating RTP packets directly, is an implementation detail
   and does not affect what goes on the wire.  Both forms will
   interwork.

   If no sequenceNumber field is configured for this stream (no
   sequenceNumber field present in the SL packet header), then the RTP
   packetizer MUST generate its own sequence numbers.
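
   A sender performs two operations described above: it extends a short
   SL sequenceNumber to a 16-bit RTP sequence number, and it removes a
   field from an already-built SL header.  The Python sketch below
   illustrates both under stated assumptions and is not a normative
   algorithm.  The names RtpSeqMapper and strip_bits are hypothetical.
   The caller is assumed to have derived the bit offset and width of the
   field to remove from the SLConfigDescriptor.  The SHOULD rule for
   repeated sequenceNumber=0 values is not modeled.

```python
import random
from typing import Optional


class RtpSeqMapper:
    """Hypothetical helper: maps n-bit SL sequenceNumber values to 16-bit
    RTP sequence numbers (random MSBs that count SL wrap-arounds, plus a
    constant random offset). Repeated sequenceNumber=0 is not modeled."""

    def __init__(self, sl_bits: int, rng: Optional[random.Random] = None):
        if not 1 <= sl_bits <= 16:
            raise ValueError("sl_bits must be between 1 and 16")
        rng = rng if rng is not None else random.Random()
        self.sl_bits = sl_bits
        # Constant random offset added to every derived sequence number.
        self.offset = rng.getrandbits(16)
        # Random initial MSBs, needed only when the SL field is < 16 bits.
        self.msbs = rng.getrandbits(16 - sl_bits) if sl_bits < 16 else 0
        self.prev: Optional[int] = None

    def map(self, sl_seq: int) -> int:
        if not 0 <= sl_seq < (1 << self.sl_bits):
            raise ValueError("sl_seq does not fit in sl_bits")
        # The SL counter has returned to zero: increment the MSBs.
        # A zero repeated in consecutive packets is not treated as a wrap.
        if sl_seq == 0 and self.prev not in (None, 0):
            self.msbs += 1
        self.prev = sl_seq
        extended = (self.msbs << self.sl_bits) | sl_seq
        return (extended + self.offset) & 0xFFFF


def strip_bits(header: bytes, header_bits: int,
               start: int, length: int) -> bytes:
    """Hypothetical helper: remove bits [start, start + length) from an
    MSB-first header holding header_bits meaningful bits, shift the later
    bits toward the beginning, and zero-pad the result to whole bytes."""
    if (header_bits > 8 * len(header) or start < 0 or length < 0
            or start + length > header_bits):
        raise ValueError("field lies outside the header")
    # Right-align the meaningful bits, dropping any trailing pad bits.
    value = int.from_bytes(header, "big") >> (8 * len(header) - header_bits)
    tail_bits = header_bits - start - length
    head = value >> (length + tail_bits)      # bits before the field
    tail = value & ((1 << tail_bits) - 1)     # bits after the field
    new_bits = header_bits - length
    packed = (head << tail_bits) | tail
    nbytes = (new_bits + 7) // 8
    # Left-align into nbytes, which appends the zero padding bits.
    return (packed << (8 * nbytes - new_bits)).to_bytes(nbytes, "big")
```

   Because strip_bits re-aligns the shortened header to a byte boundary
   with zero bits, it also applies the padding rule of Section 3.  For
   example, removing the two bits at offset 2 from the 8-bit header
   0xAC yields the single byte 0xB0.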

   Timestamp: Set to the value in the compositionTimeStamp field of the
   SL packet, if present.  If the compositionTimeStamp is shorter than
   32 bits, the MSBs of the timestamp MUST be set to zero.

   Although it is available from the SL configuration data, the
   resolution of the timestamp may need to be conveyed explicitly
   through some out-of-band means for use by network elements that are
   not MPEG-4 aware.

   If the compositionTimeStamp is longer than 32 bits, this payload
   format cannot be used.

   If the compositionTimeStamp is not present in the current SL packet
   but has been present in a previous SL packet, that same value MUST be
   taken again as the compositionTimeStamp of the current SL packet.

   If the compositionTimeStamp is never present in the SL packets for
   this stream, the RTP packetizer SHOULD convey a reading of a local
   clock at the time the RTP packet is created.

   As with the sequence numbers in implementations that generate full SL
   packets, the compositionTimeStamp, if present, MUST then be removed
   from the SL packet header by bit-shifting the subsequent header
   elements towards the beginning of the SL packet header.  When
   unpacking the RTP packet, this process can be reversed with the
   knowledge of the SLConfigDescriptor and by evaluating the
   compositionTimeStampFlag.

   It is recommended that timestamps start at a random value for
   security reasons [5, Section 5.1].

   SSRC: Set as described in RFC 1889 [5].  A mapping between the ES
   identifiers (ES_IDs) and SSRCs should be provided through out-of-band
   means.

   CC and CSRC fields: Used as described in RFC 1889 [5].

   RTCP SHOULD be used as defined in RFC 1889 [5].
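
   The timestamp rules above require only a small amount of per-stream
   state.  The Python sketch below is illustrative only, and RtpTimestamper
   and its parameters are hypothetical names.  It also rests on several
   assumptions.  First, cts_bits=0 models a stream whose SL packets never
   carry a compositionTimeStamp.  Second, when a CTS is missing and no
   earlier one exists, the sketch raises an error, because the draft does
   not cover that case.  Third, the optional offset is one way to follow
   the random-start recommendation.  With the default offset of 0, the
   RTP timestamp equals the CTS.

```python
import time
from typing import Callable, Optional

MASK32 = 0xFFFFFFFF


class RtpTimestamper:
    """Hypothetical per-stream helper applying the Section 3.1 timestamp
    rules to SL compositionTimeStamp (CTS) values."""

    def __init__(self, cts_bits: int, resolution_hz: int, offset: int = 0,
                 clock: Callable[[], float] = time.monotonic):
        if not 0 <= cts_bits <= 32:
            # A CTS wider than 32 bits cannot be carried by this format.
            raise ValueError("cts_bits must be between 0 and 32")
        self.cts_bits = cts_bits            # assumption: 0 = no CTS field
        self.resolution_hz = resolution_hz  # timestamp ticks per second
        self.offset = offset & MASK32       # assumption: optional random start
        self.clock = clock                  # local clock, in seconds
        self.last_cts: Optional[int] = None

    def timestamp(self, cts: Optional[int] = None) -> int:
        if self.cts_bits == 0:
            # CTS never present: convey a local clock reading instead.
            ticks = int(self.clock() * self.resolution_hz)
            return (ticks + self.offset) & MASK32
        if cts is not None:
            if cts < 0 or cts >> self.cts_bits:
                raise ValueError("cts does not fit in cts_bits")
            # A CTS shorter than 32 bits is zero-extended (MSBs = 0).
            self.last_cts = cts
        elif self.last_cts is None:
            # Assumption: nothing to repeat yet, so refuse rather than guess.
            raise ValueError("no previous compositionTimeStamp to repeat")
        # Use the current CTS, or repeat the previous one if it was absent.
        return (self.last_cts + self.offset) & MASK32
```

   For example, on an RtpTimestamper(cts_bits=16, resolution_hz=90000),
   timestamp(1234) returns 1234.  A following call without a CTS also
   returns 1234, because the previous value is repeated.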

   RTP timestamps in RTCP SR packets: According to the RTP timing
   model, the RTP timestamp carried in an RTCP SR packet is the same as
   the CTS that would be applied to an RTP packet for data sampled at
   the instant the SR packet is generated and sent.  The RTP timestamp
   value is calculated from the NTP timestamp for the current time,
   which is also carried in the RTCP SR packet.  To perform that
   calculation, an implementation needs to periodically establish a
   correspondence between the CTS value of a data packet and the NTP
   time at which that data was sampled.

4. Multiplexing

   A typical MPEG-4 session may involve a large number of objects,
   possibly as many as a few hundred, so transporting each ES as an
   individual RTP session may not always be practical.  Allocating and
   controlling hundreds of destination addresses for each MPEG-4 session
   may pose insurmountable session administration problems.  The
   input/output processing overhead at the end-points would also be
   extremely high.  Additionally, low-delay transmission of low-bitrate
   data streams, e.g., facial animation parameters, results in extremely
   high header overheads.

   To solve these problems, MPEG-4 data transport requires a
   multiplexing scheme that allows selective bundling of several ESs.
   This is beyond the scope of the payload format defined here.
   MPEG-4's FlexMux multiplexing scheme may be used for this purpose by
   defining an additional RTP payload format for "multiplexed MPEG-4
   streams."  On the other hand, because many other payload types may
   have similar needs, a better approach may be to develop a generic RTP
   multiplexing scheme usable for MPEG-4 data.  The multiplexing scheme
   reported in [8] may be a candidate for this approach.

   For MPEG-4 applications, the multiplexing technique needs to address
   the following requirements:

      i.   The ESs multiplexed in one stream can change frequently
           during a session.  Consequently, the coding type, individual
           packet size and temporal relationships between the
           multiplexed data units must be handled dynamically.

      ii.  The multiplexing scheme should have a mechanism to determine
           the ES identifier (ES_ID) for each of the multiplexed
           packets.  ES_ID is not a part of the SL header.

      iii. In general, an SL packet does not contain information about
           its size.  The multiplexing scheme should be able to
           delineate the multiplexed packets, whose lengths may vary
           from a few bytes to close to the path-MTU.

5. Security Considerations

   RTP packets using the payload format defined in this specification
   are subject to the security considerations discussed in the RTP
   specification [5].  This implies that confidentiality of the media
   streams is achieved by encryption.  Because the data compression used
   with this payload format is applied end-to-end, encryption may be
   performed on the compressed data, so there is no conflict between
   the two operations.

   This payload format does not exhibit any significant non-uniformity
   in receiver-side computational complexity for packet processing that
   could cause a potential denial-of-service threat.

6. References

   [1] ISO/IEC 14496-1 FDIS, "MPEG-4 Systems", November 1998.

   [2] ISO/IEC 14496-2 FDIS, "MPEG-4 Visual", November 1998.

   [3] ISO/IEC 14496-3 FDIS, "MPEG-4 Audio", November 1998.

   [4] ISO/IEC 14496-6 FDIS, "Delivery Multimedia Integration
       Framework", November 1998.

   [5] H. Schulzrinne, S. Casner, R. Frederick, V. Jacobson, "RTP: A
       Transport Protocol for Real-Time Applications", RFC 1889,
       Internet Engineering Task Force, January 1996.

   [6] S. Bradner, "Key words for use in RFCs to Indicate Requirement
       Levels", RFC 2119, March 1997.

   [7] Y. Kikuchi, T. Nomura, S. Fukunaga, Y. Matsui, H. Kimata, "RTP
       payload format for MPEG-4 Audio/Visual streams", work in
       progress, draft-ietf-avt-rtp-mpeg4-es-02.txt, July 2000.

   [8] B. Thompson, T. Koren, D. Wing, "Tunneling multiplexed
       Compressed RTP ("TCRTP")", work in progress,
       draft-ietf-avt-tcrtp-00.txt, March 2000.

7. Authors' Addresses

   M. Reha Civanlar
   AT&T Labs - Research
   100 Schultz Drive
   Red Bank, NJ 07701
   USA
   e-mail: civanlar@research.att.com

   Andrea Basso
   AT&T Labs - Research
   100 Schultz Drive
   Red Bank, NJ 07701
   USA
   e-mail: basso@research.att.com

   Stephen L. Casner
   Packet Design, Inc.
   66 Willow Place
   Menlo Park, CA 94025
   USA
   e-mail: casner@acm.org

   Carsten Herpel
   THOMSON multimedia
   Karl-Wiechert-Allee 74
   30625 Hannover
   Germany
   e-mail: herpelc@thmulti.com

   Colin Perkins
   USC Information Sciences Institute
   4350 N. Fairfax Drive #620
   Arlington, VA 22203
   USA
   e-mail: csp@isi.edu