Internet Engineering Task Force                 Audio-Video Transport WG
INTERNET-DRAFT                                                 D. Singer
draft-singer-rtp-qtfile-01.txt                      Apple Computer, Inc.
                                                         October 22 1999
                                                 Expires : April 22 1999

            Support for RTP in a stored QuickTime Movie File

Status of This Memo

   This document is an Internet-Draft and is NOT offered in accordance
   with Section 10 of RFC2026, and the author does not provide the IETF
   with any rights other than to publish as an Internet-Draft. In
   addition, a license may be required to implement some aspects of this
   format.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups. Note that
   other groups may also distribute working documents as
   Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time. It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html.

Abstract

   This document describes structures within a QuickTime movie file
   which permit easy transmission of the media content over RTP. This
   specification is intended to assist those who wish to stream stored
   movies over RTP, those wishing to prepare movies for streaming, and
   those who might wish to record into QuickTime while preserving
   RTP information.
   The bit-stream(s) of RTP packets are normally
   compliant with the RTP payload definitions for their content, and
   full inter-operability can be achieved. Each QuickTime media track
   within a movie is sent over a separate RTP session and synchronized
   using standard RTP techniques. This specification builds on the
   published QuickTime file format specification, and matches the hint
   track format used by the Darwin open-source streaming server.

1 Introduction

   This document outlines how a set of sessions using the Realtime
   Transport Protocol (RTP) [1] may be transmitted by a server program
   by reading a QuickTime movie. RTP is a generic protocol designed to
   carry realtime media data along with synchronization information
   over a datagram protocol (mostly UDP over IP).

   QuickTime files form the storage basis of the QuickTime media
   architecture; however, it is not necessary to use the QuickTime
   software to read, construct, or stream RTP from the files. The file
   format, without support for streaming or RTP, is fully described in
   the published specification [2].

   The file format is capable of referring to media data in other files;
   this enables re-use of content. These other files need not be
   structured as QuickTime movies, and a number of 'foreign' formats can
   thus be streamed over RTP under this specification, provided that
   they can also be described by the QuickTime movie (i.e. described by
   the movie meta-data), and that the streaming server is willing and
   able to follow the links to these other files.

2 QuickTime File Format Overview

   This section gives a brief overview of the file format. Readers
   wanting a detailed description are encouraged to refer to the
   published specification [2].

   A fundamental underlying concept in the QuickTime file format is that
   the physical structure of the media data (the mapping of the media
   onto physical storage records) is independent of the logical
   structure of the media file. A QuickTime media composition is
   described by a set of "movie" meta-data; this meta-data provides
   declarative, structural/compositional, and temporal information
   about the actual media data.

   The media data may be in the same file as the descriptive logical
   data (i.e., with the "movie" meta-data) or in separate files. A movie
   structured into one file is commonly called "flat" or
   "self-contained". Movies which are not self-contained may reference
   some or all of their media data in other files.

   This separation between logical organization and physical
   organization makes the QuickTime file format ideally suited to
   optimization in different ways for different scenarios. When editing
   and compositing, this means that media data need not be copied or
   re-coded as edits are applied and media is re-ordered; the meta-data
   file may be extended and temporal mapping information adjusted. When
   editing is completed, the relevant media data and meta-data may be
   rewritten into a single, interleaved, optimized file for efficient
   local or network access. However, both the structured and the
   optimized files are valid QuickTime files, and both may be inspected,
   played, streamed, and reworked.

   The use of movies which are not self-contained enables the same basic
   media data to be used and re-used in any number of presentations.
   This same advantage applies when serving, as will be seen below.

   In both editing and streaming, this also permits any number of other
   files to be treated as part of a presentation without copying the
   media data which they contain.
   Editing can change and re-write just
   the meta-data in the movie file, which is much quicker than reading
   and re-writing all the media data.

   The QuickTime file is divided into a set of objects, called atoms.
   Each object starts with an atom header, which declares its size and
   type:

      class Atom {
         int(32)  size;
         char     type[4];
         int(8)   contents[];
      }

   The size is a 32-bit integer, in bytes, including the size and type
   header fields. There is also provision for 64-bit size fields. The
   type field is four characters (usually printable), to permit easy
   documentation and identification. The data in an object after the
   type field may be fields, a sequence of contained objects, or both.
   All field data are stored in big-endian format.

   A QuickTime file consists of a sequence of objects. The two
   highest-level objects are the media-data (mdat) and the meta-data
   (moov) atoms.

   The media-data object(s) contain the actual media (for example,
   sequences of sound samples or video frames). Their format is not
   constrained by the file format; their contents are not usually
   structured as objects. Their format is described in the meta-data,
   not by any declarations physically contiguous with them. So, for
   example, in a movie consisting solely of motion-JPEG, JPEG frames
   are stored contiguously in the media data with no required
   intervening extra headers. The media data within the media data
   objects is logically divided into chunks; however, there are no
   explicit chunk markers.

   When the QuickTime file references media data in other files, it is
   not required that these 'secondary' files be formatted to this
   specification, since these media data files are formatted as if they
   were the contents of a media object.
   Since the format here does not
   require any headers or other information physically contiguous with
   the media data, it is possible for the media data to be files which
   contain 'foreign' headers (e.g. UNIX ".au" files, or AVI files) and
   for the QuickTime meta-data to contain the appropriate declarative
   information and reference the media data in the 'foreign' file. In
   this way the file format can be used to update, without copying,
   existing bodies of material in disparate formats. Thus editing and
   serving may be done directly from these files, greatly extending
   their utility. The QuickTime file format is a true unifying concept;
   it is an established format in its own right, and it is able to work
   with, include, and thereby bring forward, other established formats.
   (The full range of supported file types is large; consult the
   QuickTime web site for more information.)

   Free space (e.g. space freed by an editing operation) can also be
   described by an object at this level. Any software reading the file
   should ignore free space objects, and objects at any level which it
   does not understand; this permits extension of the file at any level
   by introducing new objects. The primary meta-data is the movie
   object. A QuickTime file normally has exactly one movie object; it is
   typically at the beginning or end of the file, to permit its easy
   location (although this is not required).

   The movie header provides basic information about the overall
   presentation (its creation date, overall timescale, and so on). In
   the sequence of contained objects there would normally be at least
   one track, which describes temporally presented data. A track is a
   media stream.

   The track header provides basic information about the track (its ID,
   timescale, and so on). Information at the track level is independent
   of the media type contained in the track.
   Objects contained in the
   track might be references to other tracks (e.g. for complex
   compositing), or edit lists. In this sequence of contained objects
   there would normally be a media object, which describes the media
   which is presented when the track is played.

   The media object contains declarations of the exact presentation
   required by the track (e.g. that it is sampled audio, or MIDI, or
   orientation information for a 3D Scene). The type of track is
   declared by its handler.

   Within the media information there is likewise a handler declaration
   for the data handler (which fetches media data), and a data
   information declaration. The data information declaration defines
   which files contain the media data for this track; it is by using
   this declaration that movies may be built which span several files.
   At the lowest level, a sample table is used which relates the
   temporal aspect of the track to the data stored in the file:

      class sampletable {
         int(32)            size;
         char               type[4] = 'stbl';
         sampledescription  sd;
         timetosample       tts;
         syncsampletable    syncs;
         sampletochunk      stoc;
         samplesize         ssize;
         chunkoffset        coffset;
      }

   The sample description contains information about the media (e.g. the
   compression formats used in video). The time-to-sample table relates
   time in the track to the sample (by index) which should be displayed
   at that time. The sync sample table declares which of these samples
   are sync (key) samples, not dependent on other samples.

   The sample-to-chunk object declares how to find the media data for a
   given sample, and its description, given its index.

   The sample size table gives the size of each sample, and the chunk
   offset table gives the offset into the containing file of the start
   of each chunk. The chunk offset table can contain 32-bit or 64-bit
   file offsets for chunks, permitting the use of very large files.
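
   As an illustration of how these tables fit together, the following
   Python sketch locates the sample for a given track time, backs up to
   a sync sample, and computes the file offset of a sample. It is a
   sketch under stated assumptions, not the stored layout: the tables
   are assumed to be already parsed into simple run-length lists, all
   indices are 0-based, and the function and parameter names are
   hypothetical. The on-disk table formats are defined in [2].

```python
from bisect import bisect_right


def sample_at_time(tts, t):
    """Return the 0-based index of the sample displayed at track time t.

    Assumption: tts is a list of (sample_count, sample_duration) runs.
    """
    if t < 0:
        raise ValueError("time must not be negative")
    index = 0   # index of the first sample in the current run
    start = 0   # track time at which the current run starts
    for count, duration in tts:
        span = count * duration
        if t < start + span:
            return index + (t - start) // duration
        index += count
        start += span
    raise ValueError("time is past the end of the track")


def preceding_sync_sample(sync_samples, index):
    """Back up to the nearest sync (key) sample at or before index.

    Assumption: sync_samples is a sorted list of 0-based sample indices.
    """
    pos = bisect_right(sync_samples, index)
    if pos == 0:
        raise ValueError("no sync sample at or before this index")
    return sync_samples[pos - 1]


def sample_file_offset(stoc, sizes, chunk_offsets, index):
    """Return (file_offset, size) for the sample with the given index.

    Assumption: stoc is a list of (first_chunk, samples_per_chunk) runs
    with 0-based chunk numbers; sizes holds one size per sample and
    chunk_offsets one file offset per chunk.
    """
    first_sample = 0  # index of the first sample covered by this run
    for run, (first_chunk, per_chunk) in enumerate(stoc):
        if run + 1 < len(stoc):
            chunks_in_run = stoc[run + 1][0] - first_chunk
        else:
            chunks_in_run = len(chunk_offsets) - first_chunk
        samples_in_run = chunks_in_run * per_chunk
        if index < first_sample + samples_in_run:
            within = index - first_sample
            chunk = first_chunk + within // per_chunk
            first_in_chunk = index - within % per_chunk
            # Samples are contiguous within a chunk, so add the sizes of
            # the samples that precede this one in the same chunk.
            offset = chunk_offsets[chunk] + sum(sizes[first_in_chunk:index])
            return offset, sizes[index]
        first_sample += samples_in_run
    raise IndexError("sample index out of range")
```

   For example, with chunks at offsets 100, 500 and 900, two samples in
   each of the first two chunks, and sample sizes 10, 20, 30, 40, 50,
   the call sample_file_offset([(0, 2), (2, 1)], [10, 20, 30, 40, 50],
   [100, 500, 900], 3) yields (530, 40): the fourth sample follows one
   30-byte sample in the chunk that starts at offset 500.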

   Walking this structure to find the appropriate data to display for a
   given time is straightforward, mostly involving indexing and adding.
   Using the sync table it is then also possible to back up to the
   preceding sync sample, and roll forward 'silently', accumulating
   deltas, to the desired starting point. Note that these tables, which
   give sample timing, size, and position information, are constructed
   in such a way that they are naturally compact.

3 Support for streaming protocols

   The QuickTime file format supports streaming of media data over a
   network as well as local playback. The process of sending protocol
   data units is time-based, just like the display of time-based data,
   and is therefore suitably described by a time-based format. A
   QuickTime file or 'movie' which supports streaming includes
   information about the data units to stream. This information is
   included in additional tracks of the movie called "hint" tracks.

   Hint tracks contain instructions for a streaming server which assist
   in the formation of packets. These instructions may contain
   immediate data for the server to send (e.g. header information) or
   reference segments of the media data. These instructions are encoded
   in the QuickTime file in the same way that editing or presentation
   information is encoded in a QuickTime file for local playback.
   Instead of editing or presentation information, information is
   provided which allows a server to packetize the media data in a
   manner suitable for streaming using a specific network transport.

   The same media data is used in a QuickTime file which contains hints,
   whether it is for local playback, or streaming over a number of
   different transport types.
   Separate 'hint' tracks for different
   transport types may be included within the same file and the media
   will play over all such transport types without making any additional
   copies of the media itself. In addition, existing media can be
   easily made streamable by the addition of appropriate hint tracks for
   specific transports. The media data itself need not be recast or
   reformatted in any way.

   This approach to streaming is more space efficient than an approach
   that requires that the media information be partitioned into the
   actual data units which will be transmitted for a given transport and
   media format. Under such an approach, local playback requires either
   re-assembling the media from the packets, or having two copies of the
   media, one for local playback and one for streaming. Similarly,
   streaming such media over multiple transports using this approach
   requires a separate copy of the media data for each transport. This
   is much less space efficient than hint tracks, unless the media data
   must be heavily transformed to be streamed (e.g., by the application
   of error-correcting coding techniques, or by encryption).

   Support for streaming in the QuickTime file format is based upon the
   following three design parameters:

   (1) The media data is represented as a set of network-independent
   standard QuickTime tracks, which may be played, edited, and so on, as
   normal;

   (2) There is a common declaration and base structure for server hint
   tracks; this common format is protocol independent, but contains the
   declarations of which protocol(s) are described in the server
   track(s);

   (3) There is a specific design of the server hint tracks for each
   protocol which may be transmitted; all these designs use the same
   basic structure.
   For example, there may be designs for RTP (for the
   Internet) and MPEG-2 transport (for broadcast), or for new standard
   or vendor-specific protocols.

   The resulting streams, sent by the servers under the direction of the
   hint tracks, need contain no trace of QuickTime information. This
   design does not require that QuickTime, or its structures or
   declaration style, be used either in the data on the wire or in the
   decoding station. For example, a QuickTime file using H.261 video and
   DVI audio, streamed under RTP, results in a packet stream which is
   fully compliant with the IETF specifications for packing those
   codings into RTP.

   The hint tracks are built and flagged so that when the presentation
   is viewed directly (not streamed), they are ignored.

3.1 RTP Hint Tracks

   The RTP specification recommends sending each media stream as a
   separate RTP stream; multiplexing is achieved by using IP's
   port-level multiplexing, not by interleaving the data from multiple
   streams into a single RTP session. However, MPEG specifications do
   define methods to multiplex several media tracks into one RTP track,
   and this may be necessary in some applications. Each hint track is
   therefore tied not to one media track but to a set of media tracks,
   by track references. The set of references forms a table, which is
   indexed by the samples (see below) when selecting data from the
   media tracks. This makes either multiplexing scheme possible.

   This design decides the packet size at the time the hint track is
   created; therefore, in the sample description for the hint track (a
   data structure which can contain fields specific to the 'coding',
   which in this case is a protocol), we indicate the chosen packet
   size. Note that it is valid for there to be several RTP hint tracks
   for each media track, with different packet size choices. Other
   protocols can be parameterized in a similar way.
   Similarly the time-scale for the RTP clock is provided in the sample
   description.

3.1.1 Sample Description Format

   In the file format, each track has a description of its contents; for
   hint tracks, this description defines and parameterizes the protocol.

   RTP hint tracks are hint tracks (media handler 'hint'), with an
   entry-format in the sample description of 'rtp ':

      aligned(8) class RtpSampleEntry extends SampleEntry('rtp ') {
         unsigned int(32)  timescale;
         unsigned int(16)  rtphinttrackversion = 1;
         unsigned int(16)  rtplastcompatibleversion = 1;
         unsigned int(32)  maxpacketsize;
         rtptags[]         rtpdata;
      }

      aligned(8) class rtptag(tagtype) {
         unsigned int(32)  size;
         unsigned int(32)  type = tagtype;
      }

      aligned(8) class timescaletag extends rtptag('tims') {
         unsigned int(32)  timescale;
      }

      aligned(8) class timestampoffsettag extends rtptag('tsro') {
         unsigned int(32)  timeoffset;
      }

      aligned(8) class sequenceoffsettag extends rtptag('snro') {
         unsigned int(32)  sequenceoffset;
      }

   The semantics of these fields are as follows:

   rtphinttrackversion       is the version of this hint track; this
                             document is version 1
   rtplastcompatibleversion  is the version of the oldest compatible
                             reader that should be able to read this
                             hint track
   maxpacketsize             is the size, in bytes, of the largest
                             packet this track will form
   rtpdata                   is a series of rtptags, to fill the rest
                             of the atom, selected from the subclasses
                             of rtptag
   timescale                 is an obligatory tag; it is the RTP
                             timescale that was used to form this hint
                             track
   timeoffset and sequenceoffset
                             are optional; they indicate that the
                             server should use these fixed offsets for
                             these fields in the RTP packets, instead
                             of truly random numbers

3.1.2 Declarative and Session Description data

   To aid servers which use the SDP format, the hint tracks contain base
   data which can be used in assembling a complete SDP description.
   This data is stored in hint-information ('hnti') atoms within
   user-data ('udta') atoms in the movie atom, or in each track. In the
   movie, the hnti atom has a sub-atom of type 'rtp ', whose contents
   start with 'sdp ' (note the spaces). Within RTP hint tracks, the
   sub-atom has the type 'sdp ' (again, note the space). In either case
   the content is ASCII text, suitable for forming into complete SDP
   descriptions. The server will need to generate a number of the lines
   of the SDP; the data supplied here is only partial, limited to that
   known at hinting time. There is also an optional user-data atom
   giving overall information about the hint track.

      aligned(8) class hintinformation extends Atom('hinf') {
         infotags[]        infodata;
      }

      aligned(8) class infotag(tagtype) {
         unsigned int(32)  size;
         unsigned int(32)  type = tagtype;
      }

   The following information tags and values are defined. They are all
   optional, and unrecognized tags should be ignored.

   tag    value field type          value

   trpy   unsigned int(64)          total bytes that will be sent,
                                    including RTP headers, but not
                                    other headers outside that (e.g.
                                    UDP, IP or link layer headers)
   nump   unsigned int(64)          total number of packets sent
   tpyl   unsigned int(64)          total bytes that will be sent,
                                    not including RTP headers
   maxr   unsigned int(32)[2]       maximum data rate. Two values:
                                    the granularity (in milliseconds),
                                    and m, the maximum data
                                    transmitted in any interval of
                                    that duration. There may be
                                    multiple maxr tags.
   dmed   unsigned int(64)          total bytes copied by reference
                                    from media tracks
   dimm   unsigned int(64)          total bytes sent as immediate
                                    data from the hint track
   drep   unsigned int(64)          total bytes of repeated data
                                    that will be sent
   tmin   unsigned int(32)          smallest relative transmission
                                    time, in milliseconds
   tmax   unsigned int(32)          largest relative transmission
                                    time, in milliseconds
   pmax   unsigned int(32)          largest packet sent, including
                                    RTP header
   dmax   unsigned int(32)          largest packet duration, in
                                    milliseconds
   payt   unsigned int(32), string  the payload type, followed by a
                                    counted string of the rtpmap
                                    information

3.1.3 RTP Sample Format

   Each sample in the RTP hint track contains the instructions to send
   out a set of packets which must be transmitted at a given time. The
   time in the hint track is transmission time, not necessarily the
   media time of the associated media.

   Notice that we now describe the internal structure of samples, which
   are media data, not meta data, in the terminology of this proposal.
   These need not be structured as objects.

   Each sample contains two areas: the instructions to compose the
   packets, and any extra data needed when sending those packets (e.g.
   an encrypted version of the media data).

      aligned(8) class RTPsample {
         unsigned int(16)  packetcount;
         unsigned int(16)  reserved;
         RTPpacket         packets[packetcount];
         byte              extradata[];
      }

   Each RTPpacket entry contains the information to send a single
   packet. In order to separate media time from transmission time, an
   RTP time stamp is specifically included, along with data needed to
   form the RTP header. Other header information is supplied; the
   algorithms for forming the RTP header given the information here are
   simple.
   Then there is a table of construction entries:

      aligned(8) class RTPpacket {
         signed int(32)    relative-time;
         // the next fields form initialization for the RTP
         // header (16 bits), and the bit positions correspond
         bit(2)            reserved;
         bit(1)            P-bit;
         bit(1)            X-bit;
         bit(4)            reserved;
         bit(1)            M-bit;
         bit(7)            payload-type;

         unsigned int(16)  RTPsequenceseed;
         unsigned int(13)  flags;
         unsigned int(1)   x-flag;
         unsigned int(1)   b-flag;
         unsigned int(1)   r-flag;
         unsigned int(16)  entrycount;
         dataentry         constructors[entrycount];
         if (x-flag) {
            unsigned int(32)  extra-information-size;
            TLV               tlventries[];
         }
      }

      aligned(32) class TLV {
         unsigned int(32)  tlvsize;
         unsigned int(32)  tlvtype;
         unsigned int(8)   tlvdata[];
      }

   The relative-time field is a signed value in the hint track's
   timescale, adjusting the transmission time of the packet away from
   the RTP sample time. This allows the hinter to smooth the data rate
   of the transmitted packets.

   The x-flag indicates that there is extra information after the
   constructors, in the form of TLV entries. Only one such entry is
   currently defined; tlvtype = 'rtpo' gives a 32-bit signed integer
   offset to the actual RTP time-stamp to place in the packet. This
   enables packets to be placed in the hint track in decoding order,
   while their presentation time-stamps in the transmitted packets are
   in a different order. Note that all TLV entries are defined to be
   32-bit aligned, and therefore their length should be padded to a
   4-byte boundary; the only existing entry has a length of 4 bytes, so
   this is not currently an issue.

   The b-flag indicates a disposable 'b-frame'. The r-flag indicates a
   'repeat packet', one that is sent as a duplicate of a previous
   packet. Servers may wish to optimize handling of these packets.

   There are various forms of the constructor.
   Each constructor is 16
   bytes, to make iteration easier. The first byte is a union
   discriminator:

      aligned(8) class RTPconstructor(type) {
         unsigned int(8)   constructor-type = type;
      }

      aligned(8) class RTPnoopconstructor
            extends RTPconstructor(0)
      {
         unsigned int(8)   pad[15];   // 15 bytes ignored
      }

      aligned(8) class RTPimmediateconstructor
            extends RTPconstructor(1)
      {
         unsigned int(8)   count;
         unsigned int(8)   data[count];
         unsigned int(8)   pad[14-count];
      }

      aligned(8) class RTPsampleconstructor
            extends RTPconstructor(2)
      {
         unsigned int(8)   trackrefindex;
         unsigned int(16)  length;
         unsigned int(32)  samplenumber;
         unsigned int(32)  sampleoffset;
         unsigned int(16)  bytesperblock = 1;
         unsigned int(16)  samplesperblock = 1;
      }

      aligned(8) class RTPsampledescriptionconstructor
            extends RTPconstructor(3)
      {
         unsigned int(8)   trackrefindex;
         unsigned int(16)  length;
         unsigned int(32)  sampledescriptionindex;
         unsigned int(32)  descriptionoffset;
      }

   The immediate mode permits the insertion of payload-specific headers
   (e.g. the RTP H.261 header). For hint tracks where the media is sent
   unchanged, the sample constructor then specifies the bytes to copy
   from the media track, by giving the sample number, data offset, and
   length to copy. For complex cases (e.g. encryption or forward error
   correction), the transformed data would be placed into the hint
   samples, and then hintsample mode (a sample constructor that refers
   to the hint track itself) would be used. Note that the data would
   then come from the extradata field in the RTPsample itself.

   The bytesperblock and samplesperblock fields concern compressed
   audio; they allow translation of the samplenumber into an actual
   byte offset in the audio track. The sampledescription mode allows
   sending of (portions of) sample descriptions as part of an RTP
   packet.
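
   The fixed 16-byte size makes the constructor table easy to decode.
   The following Python sketch shows one possible way to do so; it is
   illustrative only. The function names and the dictionary keys are
   hypothetical, fields are assumed to be packed big-endian in the
   declared order with no extra padding between them, and
   trackrefindex is read as a signed byte (an assumption, made so that
   the value -1 used for data held in the hint track can be
   represented, although the declarations above say unsigned).

```python
import struct

CONSTRUCTOR_SIZE = 16  # every constructor occupies exactly 16 bytes


def parse_constructor(entry):
    """Decode one 16-byte constructor into a (kind, fields) pair."""
    if len(entry) != CONSTRUCTOR_SIZE:
        raise ValueError("a constructor must be exactly 16 bytes")
    kind = entry[0]  # the first byte is the union discriminator
    if kind == 0:
        # No-op: the remaining 15 bytes are ignored.
        return "noop", {}
    if kind == 1:
        count = entry[1]
        if count > 14:
            raise ValueError("immediate data is limited to 14 bytes")
        return "immediate", {"data": bytes(entry[2:2 + count])}
    if kind == 2:
        # Assumption: trackrefindex is decoded as signed ('b').
        ref, length, number, offset, bpb, spb = struct.unpack(
            ">bHIIHH", entry[1:16])
        return "sample", {
            "trackrefindex": ref,
            "length": length,
            "samplenumber": number,
            "sampleoffset": offset,
            "bytesperblock": bpb,
            "samplesperblock": spb,
        }
    if kind == 3:
        # The declared fields use 12 bytes; the rest is treated as pad.
        ref, length, index, offset = struct.unpack(">bHII", entry[1:12])
        return "sampledescription", {
            "trackrefindex": ref,
            "length": length,
            "sampledescriptionindex": index,
            "descriptionoffset": offset,
        }
    raise ValueError("unknown constructor type %d" % kind)


def parse_constructors(blob, entrycount):
    """Split a packed constructor table into decoded entries."""
    if len(blob) < entrycount * CONSTRUCTOR_SIZE:
        raise ValueError("constructor table is truncated")
    return [
        parse_constructor(
            blob[i * CONSTRUCTOR_SIZE:(i + 1) * CONSTRUCTOR_SIZE])
        for i in range(entrycount)
    ]
```

   A server would walk the decoded entries in order, appending the
   immediate bytes or the referenced media bytes to the packet payload.
   How the referenced bytes are fetched depends on the server and is
   not shown here.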

   Note that these structures should be flexible enough to cover not
   only the standard RTP payloads (H.261, MPEG, etc.) but also private
   packings such as the QuickTime-in-RTP [3], or generic packing as is
   now being proposed [4].

   Notice that there is no requirement that successive packets transmit
   successive bytes from the media stream. For example, to conform with
   RTP-standard packing of H.261, it is sometimes required that a byte
   be sent at the end of one packet and also at the beginning of the
   next (when a macroblock boundary falls within a byte). Conversely,
   payload packings that interleave the data to achieve error resilience
   will skip some bytes, to send them in another packet.

   Note that it is possible, and legal, to copy all data into the hint
   track, and use sample constructors with a trackrefindex of -1
   uniformly. Such hint tracks will be simpler for the server to
   interpret, but the file will be larger.

Acknowledgments

   The author would like to thank a number of people, particularly Peter
   Hoddie (Apple Computer), William Belknap (IBM Corporation),
   Christopher Walton (Netscape), Dave Pawson (Oracle), Ronald Jacoby
   (Silicon Graphics, Inc.), and Gerard Fernando and Michael Speer (Sun
   Microsystems).

References

   [1] H. Schulzrinne, et al., "RTP: A Transport Protocol for
       Real-Time Applications", IETF RFC 1889, January 1996.

   [2] Apple Computer, Inc., "QuickTime File Format Specification", May
       1996.

Expires : April 22 1999

Author's Contact Information

   David Singer
   Email: singer@apple.com
   Tel: (408) 974 3162

   Apple Computer, Inc.
   One Infinite Loop, MS:302-3MT
   Cupertino CA 95014
   USA