Internet Engineering Task Force                 Audio-Video Transport WG
INTERNET-DRAFT              J. Geagan, K. Gong, A. Periyannan, D. Singer
draft-singer-rtp-qtfile-00                          Apple Computer, Inc.
                                                          March 13, 1998
                                             Expires: September 13, 1998

          Support for RTP in a stored QuickTime Movie File

Status of This Memo

   This document is an Internet-Draft. Internet-Drafts are working
   documents of the Internet Engineering Task Force (IETF), its areas,
   and its working groups. Note that other groups may also distribute
   working documents as Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time. It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as ``work in progress.''

   To learn the current status of any Internet-Draft, please check the
   ``1id-abstracts.txt'' listing contained in the Internet-Drafts Shadow
   Directories on ftp.is.co.za (Africa), nic.nordu.net (Europe),
   munnari.oz.au (Pacific Rim), ds.internic.net (US East Coast), or
   ftp.isi.edu (US West Coast).

   Distribution of this document is unlimited.

Abstract

   This document proposes structures within a QuickTime movie file which
   permit easy transmission of the media content over RTP.
   This specification is intended to assist those who wish to stream
   stored movies over RTP, those wishing to prepare movies for
   streaming, and those who might wish to record into QuickTime while
   preserving RTP information. The bit-stream(s) of RTP packets are
   normally compliant with the RTP payload definitions for their
   content, and full inter-operability can be achieved. Each QuickTime
   media track within a movie is sent over a separate RTP session and
   synchronized using standard RTP techniques. This specification
   builds on the published QuickTime file format specification.

1 Introduction

   This document outlines how a set of sessions using the Real-time
   Transport Protocol (RTP) [1] may be transmitted by a server program
   by reading a QuickTime movie. RTP is a generic protocol designed to
   carry real-time media data along with synchronization information
   over a datagram protocol (mostly UDP over IP).

   QuickTime files form the storage basis of the QuickTime media
   architecture; however, it is not necessary to use the QuickTime
   software to read, construct, or stream RTP from the files. The file
   format, without support for streaming or RTP, is fully described in
   the published specification [2].

   The file format is capable of referring to media data in other files;
   this enables re-use of content. These other files need not be
   structured as QuickTime movies, and a number of 'foreign' formats can
   thus be streamed over RTP under this specification, provided that
   they can also be described by the QuickTime movie (i.e. described by
   the movie meta-data), and that the streaming server is willing and
   able to follow the links to these other files.

2 QuickTime File Format Overview

   This section gives a brief overview of the file format. Readers
   wanting a detailed description are encouraged to refer to the
   published specification [2].

   A fundamental underlying concept in the QuickTime file format is that
   the physical structure of the media data (the mapping of the media
   onto physical storage records) is independent of the logical
   structure of the media file. A QuickTime media composition is
   described by a set of "movie" meta-data; this meta-data provides
   declarative, structural/compositional, and temporal information about
   the actual media data.

   The media data may be in the same file as the descriptive logical
   data (i.e., with the "movie" meta-data) or in separate files. A movie
   structured into one file is commonly called "flat" or
   "self-contained". Movies which are not self-contained may reference
   some or all of their media data in other files.

   This separation between logical organization and physical
   organization makes the QuickTime file format ideally suited to
   optimization in different ways for different scenarios. When editing
   and compositing, this means that media data need not be copied or
   re-coded as edits are applied and media is re-ordered; the meta-data
   file may be extended and temporal mapping information adjusted. When
   editing is completed, the relevant media data and meta-data may be
   rewritten into a single, interleaved, optimized file for efficient
   local or network access. However, both the structured and the
   optimized files are valid QuickTime files, and both may be inspected,
   played, streamed, and reworked.

   The use of movies which are not self-contained enables the same basic
   media data to be used and re-used in any number of presentations.
   This same advantage applies when serving, as will be seen below.

   In both editing and streaming, this also permits any number of other
   files to be treated as part of a presentation without copying the
   media data which they contain.
   Editing can change and re-write just the meta-data in the movie file,
   which is much quicker than reading and re-writing all the media data.

   The QuickTime file is divided into a set of objects, called atoms.
   Each object starts with an atom header, which declares its size and
   type:

   class Atom {
      int(32) size;
      char type[4];
      int(8) contents[];
   }

   The size is a 32-bit integer, in bytes, including the size and type
   header fields. There is also provision for 64-bit size fields. The
   type field is four characters (usually printable), to permit easy
   documentation and identification. The data in an object after the
   type field may be fields, a sequence of contained objects, or both.
   All field data are stored in big-endian format.

   A QuickTime file consists of a sequence of objects. The two
   highest-level objects are the media-data (mdat) and the meta-data
   (moov) atoms.

   The media-data object(s) contain the actual media (for example,
   sequences of sound samples or video frames). Their format is not
   constrained by the file format; they are not usually objects. Their
   format is described in the meta-data, not by any declarations
   physically contiguous with them. So, for example, in a movie
   consisting solely of motion-JPEG, JPEG frames are stored contiguously
   in the media data with no required intervening extra headers. The
   media data within the media data objects is logically divided into
   chunks; however, there are no explicit chunk markers.

   When the QuickTime file references media data in other files, it is
   not required that these 'secondary' files be formatted to this
   specification, since these media data files are formatted as if they
   were the contents of a media object.
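
   The atom header layout described above (a 32-bit big-endian size that
   includes the 8 header bytes, followed by a four-character type) can
   be walked with very little code. The following Python sketch is
   illustrative only: the names iter_atoms and make_atom are
   hypothetical, and the 64-bit size provision is deliberately not
   handled because its encoding is not described in this document.

```python
import struct
from typing import Iterator, Optional, Tuple


def make_atom(atom_type: bytes, payload: bytes = b"") -> bytes:
    """Build one atom: 32-bit big-endian size, 4-byte type, contents."""
    return struct.pack(">I", 8 + len(payload)) + atom_type + payload


def iter_atoms(buf: bytes, start: int = 0,
               end: Optional[int] = None) -> Iterator[Tuple[str, int, int]]:
    """Yield (type, contents_offset, contents_size) for atoms in buf[start:end]."""
    end = len(buf) if end is None else end
    pos = start
    while pos + 8 <= end:
        size, atom_type = struct.unpack_from(">I4s", buf, pos)
        # The size counts the size and type fields themselves, so it can
        # never be below 8 in this simplified (32-bit only) reader.
        if size < 8 or pos + size > end:
            raise ValueError("unsupported or truncated atom at offset %d" % pos)
        yield atom_type.decode("latin-1"), pos + 8, size - 8
        pos += size


# A tiny file: a free-space atom, media data, and a movie atom that
# contains one nested atom.
movie = make_atom(b"moov", make_atom(b"mvhd", b"\x00" * 4))
data = make_atom(b"free") + make_atom(b"mdat", b"abcd") + movie

top_level = [(t, size) for t, _, size in iter_atoms(data)]

# Descend into the movie atom; a reader would simply skip 'free' atoms
# and any type it does not understand.
nested = []
for atom_type, offset, size in iter_atoms(data):
    if atom_type == "moov":
        nested = [t for t, _, _ in iter_atoms(data, offset, offset + size)]
```

   Because every atom carries its own size, a reader can step over
   objects it does not recognise without knowing their internal layout.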
   Since the format here does not require any headers or other
   information physically contiguous with the media data, it is possible
   for the media data to be files which contain 'foreign' headers (e.g.
   UNIX ".au" files, or AVI files) and for the QuickTime meta-data to
   contain the appropriate declarative information and reference the
   media data in the 'foreign' file. In this way the file format can be
   used to update, without copying, existing bodies of material in
   disparate formats. Thus editing and serving may be done directly from
   these files, greatly extending their utility. The QuickTime file
   format is a true unifying concept; it is both an established format
   and able to work with, include, and thereby bring forward, other
   established formats. (The full range of supported file types is
   large; consult the QuickTime web site for more information.)

   Free space (e.g. deleted by an editing operation) can also be
   described by an object at this level. Any software reading the file
   should ignore free space objects, and objects at any level which it
   does not understand; this permits extension of the file at any level
   by introducing new objects. The primary meta-data is the movie
   object. A QuickTime file normally has exactly one movie object; it is
   typically at the beginning or end of the file, to permit its easy
   location (although this is not required).

   The movie header provides basic information about the overall
   presentation (its creation date, overall timescale, and so on). In
   the sequence of contained objects there would normally be at least
   one track, which describes temporally presented data. A track is a
   media stream.

   The track header provides basic information about the track (its ID,
   timescale, and so on). Information at the track level is independent
   of the media type contained in the track.
   Objects contained in the track might be references to other tracks
   (e.g. for complex compositing), or edit lists. In this sequence of
   contained objects there would normally be a media object, which
   describes the media which is presented when the track is played.

   The media object contains declarations of the exact presentation
   required by the track (e.g. that it is sampled audio, or MIDI, or
   orientation information for a 3D Scene). The type of track is
   declared by its handler.

   Within the media information there is likewise a handler declaration
   for the data handler (which fetches media data), and a data
   information declaration. This defines which files contain the media
   data for this track; it is by using this declaration that movies may
   be built which span several files. At the lowest level, a sample
   table is used which relates the temporal aspect of the track to the
   data stored in the file:

   class sampletable {
      int(32) size;
      char type[4] = 'stbl';
      sampledescription sd;
      timetosample tts;
      syncsampletable syncs;
      sampletochunk stoc;
      samplesize ssize;
      chunkoffset coffset;
   }

   The sample description contains information about the media (e.g. the
   compression formats used in video). The time-to-sample table relates
   time in the track to the sample (by index) which should be displayed
   at that time. The sync sample table declares which of these are sync
   (key) samples, not dependent on other samples.

   The sample-to-chunk object declares how to find the media data for a
   given sample, and its description given its index.

   The sample size table gives the size of each sample; and the chunk
   offset table gives the offset into the containing file of the start
   of each chunk.
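
   The way these tables combine can be sketched as follows. This
   document does not give the internal layout of the tables, so the
   Python model below is an assumption made purely for illustration:
   the time-to-sample table is taken to be runs of (sample count,
   sample duration), the sample-to-chunk table runs of (first chunk,
   samples per chunk), and all indices are 0-based. The function names
   are hypothetical.

```python
import bisect
from typing import List, Sequence, Tuple


def sample_at_time(tts: Sequence[Tuple[int, int]], t: int) -> int:
    """Index of the sample displayed at track time t (assumed run-length table)."""
    index, elapsed = 0, 0
    for count, duration in tts:
        run = count * duration
        if t < elapsed + run:
            return index + (t - elapsed) // duration
        elapsed += run
        index += count
    raise ValueError("time is beyond the end of the track")


def sample_offset(sample: int, stoc: Sequence[Tuple[int, int]],
                  sizes: Sequence[int], chunk_offsets: Sequence[int]) -> int:
    """File offset of a sample: start of its chunk plus earlier samples in it."""
    first_sample = 0
    for i, (first_chunk, per_chunk) in enumerate(stoc):
        next_first = stoc[i + 1][0] if i + 1 < len(stoc) else len(chunk_offsets)
        run_samples = (next_first - first_chunk) * per_chunk
        if sample < first_sample + run_samples:
            chunk = first_chunk + (sample - first_sample) // per_chunk
            first_in_chunk = first_sample + (chunk - first_chunk) * per_chunk
            return chunk_offsets[chunk] + sum(sizes[first_in_chunk:sample])
        first_sample += run_samples
    raise ValueError("sample index out of range")


def preceding_sync(sample: int, syncs: List[int]) -> int:
    """Latest sync (key) sample at or before the given sample; syncs is sorted."""
    i = bisect.bisect_right(syncs, sample)
    if i == 0:
        raise ValueError("no sync sample at or before this sample")
    return syncs[i - 1]


# Example track: four samples of duration 10, then two of duration 20.
tts = [(4, 10), (2, 20)]
stoc = [(0, 2), (2, 1)]            # chunks 0-1 hold 2 samples, chunks 2-3 hold 1
sizes = [50, 60, 70, 80, 30, 40]   # bytes per sample
chunk_offsets = [100, 400, 900, 1000]
syncs = [0, 4]

wanted = sample_at_time(tts, 25)   # sample shown at track time 25
where = sample_offset(wanted + 1, stoc, sizes, chunk_offsets)
restart = preceding_sync(wanted + 1, syncs)
```

   In this model, seeking to a time means finding the sample, backing up
   to the preceding sync sample, and decoding forward from there.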
   The chunk offset table can contain 32-bit or 64-bit file offsets for
   chunks, permitting the use of very large files.

   Walking this structure to find the appropriate data to display for a
   given time is straightforward, mostly involving indexing and adding.
   Using the sync table it is also possible then to back up to the
   preceding sync sample, and roll forward 'silently' accumulating
   deltas to the desired starting point. Note that these tables, which
   give sample timing, size, and position information, are constructed
   in such a way that they are naturally compact.

3 Support for streaming protocols

   The QuickTime file format supports streaming of media data over a
   network as well as local playback. The process of sending protocol
   data units is time-based, just like the display of time-based data,
   and is therefore suitably described by a time-based format. A
   QuickTime file or 'movie' which supports streaming includes
   information about the data units to stream. This information is
   included in additional tracks of the movie called "hint" tracks.

   Hint tracks contain instructions for a streaming server which assist
   in the formation of packets. These instructions may contain
   immediate data for the server to send (e.g. header information) or
   reference segments of the media data. These instructions are encoded
   in the QuickTime file in the same way that editing or presentation
   information is encoded in a QuickTime file for local playback.
   Instead of editing or presentation information, information is
   provided which allows a server to packetize the media data in a
   manner suitable for streaming using a specific network transport.

   The same media data is used in a QuickTime file which contains hints,
   whether it is for local playback, or streaming over a number of
   different transport types.
   Separate 'hint' tracks for different transport types may be included
   within the same file and the media will play over all such transport
   types without making any additional copies of the media itself. In
   addition, existing media can be easily made streamable by the
   addition of appropriate hint tracks for specific transports. The
   media data itself need not be recast or reformatted in any way.

   This approach to streaming is more space efficient than an approach
   that requires that the media information be partitioned into the
   actual data units which will be transmitted for a given transport and
   media format. Under such an approach, local playback requires either
   re-assembling the media from the packets, or having two copies of the
   media: one for local playback and one for streaming. Similarly,
   streaming such media over multiple transports using this approach
   requires a separate copy of the media data for each transport. This
   is much less space efficient than hint tracks, unless the media data
   must be heavily transformed to be streamed (e.g., by the application
   of error-correcting coding techniques, or by encryption).

   Support for streaming in the QuickTime file format is based upon the
   following three design parameters:

   (1) The media data is represented as a set of network-independent
   standard QuickTime tracks, which may be played, edited, and so on, as
   normal;

   (2) There is a common declaration and base structure for server hint
   tracks; this common format is protocol independent, but contains the
   declarations of which protocol(s) are described in the server
   track(s);

   (3) There is a specific design of the server hint tracks for each
   protocol which may be transmitted; all these designs use the same
   basic structure.
   For example, there may be designs for RTP (for the Internet) and
   MPEG-2 transport (for broadcast), or for new standard or
   vendor-specific protocols.

   The resulting streams, sent by the servers under the direction of the
   hint tracks, need contain no trace of QuickTime information. This
   design does not require that QuickTime, or its structures or
   declaration style, be used either in the data on the wire or in the
   decoding station. For example, a QuickTime file using H.261 video and
   DVI audio, streamed under RTP, results in a packet stream which is
   fully compliant with the IETF specifications for packing those
   codings into RTP.

   The hint tracks are built and flagged so that when the presentation
   is viewed directly (not streamed), they are ignored.

3.1 RTP Hint Tracks

   This section presents an example track format for streaming RTP from
   a QuickTime movie. For brevity, only the essential aspects are
   presented here.

   In RTP, each media stream is sent as a separate RTP stream;
   multiplexing is achieved by using IP's port-level multiplexing, not
   by interleaving the data from multiple streams into a single RTP
   session. Therefore each media track in the movie has an associated
   RTP hint track. Each hint track contains a track reference back to
   the media track which it is streaming.

   This design decides the packet size at the time the hint track is
   created; therefore, in the sample description for the hint track (a
   data structure which can contain fields specific to the 'coding' -
   which in this case is a protocol), we indicate the chosen packet
   size. Note that it is valid for there to be several RTP hint tracks
   for each media track, with different packet size choices. Other
   protocols can be parameterized in a similar way. Similarly the
   time-scale for the RTP clock is provided in the sample description.

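   Since several RTP hint tracks with different packet sizes may refer
   to the same media track, a server has to pick one of them. This
   document does not prescribe how; the Python sketch below assumes one
   plausible policy (the largest packet size that fits a given limit).
   The HintTrack record, its field names, and choose_hint_track are all
   hypothetical and exist only for this illustration.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class HintTrack:
    track_id: int
    media_track_id: int    # track reference back to the media track
    max_packet_size: int   # packet size recorded in the sample description
    rtp_timescale: int     # RTP clock time-scale from the sample description


def choose_hint_track(tracks: Iterable[HintTrack], media_track_id: int,
                      size_limit: int) -> Optional[HintTrack]:
    """Pick the hint track for a media track whose packets fit size_limit.

    Assumed policy: prefer the largest packet size that does not exceed
    the limit; return None when no hint track fits.
    """
    candidates = [t for t in tracks
                  if t.media_track_id == media_track_id
                  and t.max_packet_size <= size_limit]
    return max(candidates, key=lambda t: t.max_packet_size, default=None)


# Two hint tracks for media track 1, one hint track for media track 2.
tracks = [
    HintTrack(track_id=3, media_track_id=1, max_packet_size=576, rtp_timescale=90000),
    HintTrack(track_id=4, media_track_id=1, max_packet_size=1450, rtp_timescale=90000),
    HintTrack(track_id=5, media_track_id=2, max_packet_size=1450, rtp_timescale=8000),
]

chosen = choose_hint_track(tracks, media_track_id=1, size_limit=1000)
```
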
   The hint track is related to its base media track by a single track
   reference declaration. (The RTP specification does not permit
   multiplexing of media within a single RTP stream.) The sample
   description for RTP declares the maximum packet size which this hint
   track will generate. Partial session description (SAP/SDP)
   information is stored in the track.

   Each sample in the RTP hint track contains the instructions to send
   out a set of packets which must be emitted at a given time. The time
   in the hint track is emission time, not necessarily the media time of
   the associated media.

   Notice that we now describe the internal structure of samples, which
   are media data, not meta data, in the terminology of this proposal.
   These need not be structured as objects. Each sample contains two
   areas: the instructions to compose the packets, and any extra data
   needed when sending those packets (e.g. an encrypted version of the
   media data).

   struct RTPsample {
      int(16) packetcount;
      RTPpacket packets[packetcount];
      byte extradata[];
   }

   Each RTP packet contains the information to send a single packet. In
   order to separate media time from emission time, an RTP time stamp is
   specifically included, along with data needed to form the RTP header.
   Other header information is supplied; the algorithms for forming the
   RTP header given the information here are simple. Then there is a
   table of construction entries:

   struct RTPpacket {
      int(32) RTPtime;
      int(16) partialRTPheader;
      int(16) RTPsequenceseed;
      int(16) entrycount;
      dataentry constructors[entrycount];
   }

   There are various forms of the constructor. Each constructor is 16
   bytes, to make iteration easier. The first byte is a union
   discriminator:

   struct dataentry {
      int(8) entrytype;
      switch entrytype {
         case immediate:
            int(8) bytecount;
            int(8) bytestocopy[bytecount];
         case mediasample:
            int(8) reserved[5];
            int(16) length;
            int(32) mediasamplenumber;
            int(32) mediasampleoffset;
         case hintsample:
            int(8) reserved[5];
            int(16) length;
            int(32) hintsamplenumber;
            int(32) hintsampleoffset;
      }
   }

   The immediate mode permits the insertion of payload-specific headers
   (e.g. the RTP H.261 header). For hint tracks where the media is sent
   unchanged, the mediasample entry then specifies the bytes to copy
   from the media track, by giving the sample number, data offset, and
   length to copy. For complex cases (e.g. encryption or forward error
   correction), the transformed data would be placed into the hint
   samples, and then hintsample mode would be used. Note that this would
   be from the extradata field in the RTPsample itself.

   Note that these structures should be flexible enough to cover not
   only the standard RTP payloads (H.261, MPEG, etc.) but also private
   packings such as the QuickTime-in-RTP [3], or generic packing as is
   now being proposed [4].

   Notice that there is no requirement that successive packets transmit
   successive bytes from the media stream. For example, to conform with
   RTP-standard packing of H.261, it is sometimes required that a byte
   be sent at the end of one packet and also at the beginning of the
   next (when a macroblock boundary falls within a byte).

4 Open Issues

   The following open issues need to be resolved:

   - What information is needed about the tracks (e.g. average and
   peak data rate)?

   - For tracks which use re-transmission, how should packet
   re-transmissions be marked, and how should they be treated when
   seeking?

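   As an illustration of the 16-byte constructor entries of Section 3.1,
   the following Python sketch assembles a packet payload from a
   constructor table. Several details are assumptions, since this
   document does not specify them: the numeric entrytype values, the
   zero padding of immediate entries to 16 bytes, 0-based sample
   numbers, and offsets taken relative to the start of the referenced
   sample. All function names are hypothetical.

```python
import struct
from typing import Sequence

# Hypothetical discriminator values; the draft names the cases only.
IMMEDIATE, MEDIASAMPLE, HINTSAMPLE = 1, 2, 3
ENTRY_SIZE = 16


def immediate(data: bytes) -> bytes:
    """Encode an immediate entry: type, byte count, data padded to 16 bytes."""
    if len(data) > ENTRY_SIZE - 2:
        raise ValueError("immediate data does not fit in one entry")
    return bytes([IMMEDIATE, len(data)]) + data.ljust(ENTRY_SIZE - 2, b"\x00")


def from_sample(kind: int, number: int, offset: int, length: int) -> bytes:
    """Encode a mediasample or hintsample entry (5 reserved bytes, then fields)."""
    return bytes([kind]) + bytes(5) + struct.pack(">HII", length, number, offset)


def build_payload(constructors: bytes, media_samples: Sequence[bytes],
                  hint_samples: Sequence[bytes]) -> bytes:
    """Concatenate the bytes selected by each 16-byte constructor entry."""
    if len(constructors) % ENTRY_SIZE:
        raise ValueError("constructor table is not a multiple of 16 bytes")
    out = bytearray()
    for pos in range(0, len(constructors), ENTRY_SIZE):
        entry = constructors[pos:pos + ENTRY_SIZE]
        kind = entry[0]
        if kind == IMMEDIATE:
            out += entry[2:2 + entry[1]]
        elif kind in (MEDIASAMPLE, HINTSAMPLE):
            length, number, offset = struct.unpack_from(">HII", entry, 6)
            source = media_samples if kind == MEDIASAMPLE else hint_samples
            piece = source[number][offset:offset + length]
            if len(piece) != length:
                raise ValueError("constructor reads past the end of the sample")
            out += piece
        else:
            raise ValueError("unknown constructor type %d" % kind)
    return bytes(out)


media = [b"AAAABBBBCCCC"]
hints = [b"xxHDRyy"]

# One packet: a 2-byte immediate header, 4 media bytes, 3 hint-sample bytes.
table = (immediate(b"\x01\x02")
         + from_sample(MEDIASAMPLE, 0, 4, 4)
         + from_sample(HINTSAMPLE, 0, 2, 3))
payload = build_payload(table, media, hints)

# Two packets that both carry media byte 4, as in the H.261 case where a
# byte is repeated across a packet boundary.
first = build_payload(from_sample(MEDIASAMPLE, 0, 0, 5), media, hints)
second = build_payload(from_sample(MEDIASAMPLE, 0, 4, 4), media, hints)
```

   Because each entry names its own source, offset, and length, the
   repeated byte in the last two packets needs no special handling.
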
Acknowledgments

   The authors would like to thank a number of people, particularly
   Peter Hoddie (Apple Computer), William Belknap (IBM Corporation),
   Christopher Walton (Netscape), Dave Pawson (Oracle), Ronald Jacoby
   (Silicon Graphics, Inc.), and Gerard Fernando and Michael Speer (Sun
   Microsystems).

References

   [1] H. Schulzrinne, et al., "RTP: A Transport Protocol for Real-Time
   Applications", IETF RFC 1889, January 1996.

   [2] Apple Computer, Inc., "QuickTime File Format Specification", May
   1996.

   [3] A. Jones, et al., "RTP Payload Format for QuickTime Media
   Streams", IETF Draft, draft-ietf-avt-qt-rtp-00.txt, July 22 1997,
   Expires: January 22 1998.

   [4] A. Periyannan, et al., "Delivering Media Generically over RTP",
   IETF Draft, draft-periyannan-generic-rtp-00.txt, March 13 1998,
   Expires: September 13 1998.

Authors' Contact Information

   Alagu Periyannan
   Email: alagu@apple.com
   Tel: (408) 862 5387
   Fax: (408) 974 0234

   Jay Geagan
   Email: geagan@apple.com
   Tel: (408) 862 6562

   Kevin Gong
   Email: kevin@apple.com
   Tel: (408) 974 4175

   David Singer
   Email: singer@apple.com
   Tel: (408) 974 3162

   Apple Computer, Inc.
   One Infinite Loop, MS:302-3MT
   Cupertino CA 95014
   USA