INTERNET-DRAFT                                           Eric Fleischman
draft-fleischman-asf-rtp-record-00                        Anders Klemets
                                                   Microsoft Corporation
                                                       November 14, 1997
Expires: May 14, 1998

                 Recording MBone Sessions to ASF Files

Status of This Memo

This document is an Internet-Draft. Internet-Drafts are working
documents of the Internet Engineering Task Force (IETF), its areas, and
its working groups. Note that other groups may also distribute working
documents as Internet-Drafts.

Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference material
or to cite them other than as ``work in progress.''

To learn the current status of any Internet-Draft, please check the
``1id-abstracts.txt'' listing contained in the Internet-Drafts Shadow
Directories on ftp.is.co.za (Africa), nic.nordu.net (Europe),
munnari.oz.au (Pacific Rim), ds.internic.net (US East Coast), or
ftp.isi.edu (US West Coast).

Distribution of this document is unlimited.

Abstract

This document specifies two approaches by which multimedia data (e.g.,
MBone conferences), transmitted using the Real-time Transport Protocol
(RTP), may be recorded to Advanced Streaming Format (ASF) files. The
first approach requires a minimum amount of buffering at the recording
station but results in recordings which identically preserve the
received content, including out-of-order packets, network "jitter",
etc. The second approach requires buffering at the recording station
but results in enhanced recordings (i.e., higher percentage of
correctly ordered packets, elimination of a percentage of received
jitter, potential recovery of a percentage of lost packets). Both
approaches record all received RTP content and the relevant subset of
RTCP information. This recording occurs transparently to the MBone
conference or RTP session, and does not involve any alterations to
normal RTP, RTCP, or ASF use.

1. Introduction

The MBone is the part of the Internet that supports IP multicast, and
thus permits efficient many-to-many communication. It is used
extensively for multimedia conferencing. Such conferences usually have
the property that tight coordination of conference membership is not
necessary; to receive a conference, a user at an MBone site only has to
know the conference's multicast group address and the UDP ports for the
conference data streams. The specific MBone conferences addressed by
this document are those which use the Real-time Transport Protocol (RTP,
see [1]). In addition, the mechanisms described within this document
also support unicast RTP uses.

This document describes two methods for recording multimedia data that
is transmitted using the Real-time Transport Protocol (RTP, see [1])
into Advanced Streaming Format (ASF; see [2]) files.
The approach is independent of the network protocol used to transmit RTP
packets and supports the recording of both unicast and multicast
sessions. Data thus recorded may subsequently be played back by
recreating the original RTP packets and transmitting them using either
unicast or multicast techniques. A recording can also be played back
locally, using a suitable playback tool. Playback can be controlled
using RTSP [4] or other comparable stream control mechanisms.

RTP is a protocol for carrying arbitrary real-time data. Each RTP
packet contains a sequence number and timestamp, which can be used by a
receiver to detect losses and present the data at the right time. RTP
uses a control protocol, RTCP, which can be used to synchronize
different real-time streams. For synchronization to be possible, the
streams must be transmitted such that each stream has a distinct RTP
synchronization source (SSRC) identifier. RTP is most commonly used
over UDP. However, it may be used with any transport protocol that
detects bit errors, and that conveys the length of an RTP packet. RTP
does not specify a mechanism for the reliable transfer of data. The
protocol also does not address the encapsulation of specific media
types, but instead defers it to various profile specifications.

ASF is an extensible file format for recording optionally synchronized
multimedia streams. The format is not tied to any particular media type
or compression scheme. Similarly, the file format was designed to be
operating system and data communications protocol independent.

2. ASF Overview

The Advanced Streaming Format is defined in [2]. An ASF file consists
of three top-level objects: the Header Object, the Data Object and,
optionally, the Index Object.

The Header Object provides global information about the file as a whole
as well as specific information about the multimedia data stored within
the Data Object.
This latter content provides the information necessary to correctly
interpret each of the media streams found within the Data Object. The
Header Object is a container for other objects that provide the
following specific functions:

* File Properties Object -- describes the global file attributes.
* Stream Properties Object -- defines a media stream, its
  characteristics, and the information needed to decode that stream.
* Content Description Object -- contains all bibliographic information,
  which may be either general for the file as a whole or stream
  specific.
* Component Download Object -- provides information on playback
  components.
* Stream Group Object -- logically groups media streams together into
  specific rendering contexts.
* Scaleable Object -- defines scalability relationships among
  (scaleable) media streams containing bands.
* Prioritization Object -- defines the relative prioritization between
  media streams.
* Mutual Exclusion Object -- defines exclusion relationships between
  media streams (e.g., language selection).
* Inter-Media Dependency Object -- defines dependency relationships
  among mixed media streams.
* Rating Object -- provides the W3C PICS ([5], [6]) rating of the file.
* Index Parameters Object -- supplies the information necessary to
  regenerate the index of an ASF file.
* Language List Object -- supplies Language Identifier information that
  is used by several other ASF objects.

The Data Object contains all the data for each of the recorded media
streams. This data is stored in the form of ASF Data Units. In the
general case, ASF Data Units are designed to be directly insertable into
the payloads of data communications transport protocols in order to be
streamed across the network. Each ASF Data Unit is of variable length,
and contains data for only one media stream.
Data Units are sorted within the Data Object based on the time at which
they should be delivered (send time). Due to the way Data Units are
sorted, consecutive Data Units may contain data from different media
streams.

ASF media streams logically (in the general case) consist of
sub-elements that are referred to as objects. What an object happens to
be in a given media stream is entirely media stream dependent (e.g., it
is a specific image within an image media stream, a frame within a
(non-scalable) video stream, etc.).

The Index Object contains a time-based index into the multimedia data of
an ASF file. The time interval that each index entry represents is set
at authoring time and stored in the Index Object. Since it is not
required to index into every media stream in a file, a list of the media
streams that are indexed follows the time interval value. Each index
entry consists of one Data Unit offset per media stream being indexed.
This information allows stream-specific index operations to occur.

A minimal ASF implementation consists of a Header Object containing
solely a File Properties Object, one Stream Properties Object, and one
Language List Object as well as a Data Object containing only a single
ASF Data Unit.

3. Recording MBone Sessions

The process of recording MBone sessions may be viewed as consisting of
four steps, the last two of which are optional:

Step 1 -- Create the ASF Header Object, which will provide the
context for correctly interpreting the data that may subsequently
be recorded.

Step 2 -- Record one or more RTP streams into the ASF Data Object.

Step 3 -- Optionally post-process the ASF Header Object to ensure
that it is as complete and as efficiently stored as possible.

Step 4 -- Optionally create an ASF Index Object.

3.1. Preparing ASF Header Information

The ASF Header Object contains various other objects that contain
information about the media streams in the Data Object. It is often
desirable to create an ASF Header Object before the transmission that is
to be recorded has begun. This would be appropriate if information is
already available that describes the RTP sources that are to be
recorded. Such information might be obtained through SDP [7], RTSP [4],
or some other non-RTP means. It is also possible to add information to
the ASF Header Object as new information is learned during the recording
of the RTP traffic.

ASF requires that an instance of the Stream Properties Object (SPO) be
defined to describe each media stream recorded within the Data Object.
A media stream generally corresponds to an RTP source in an RTP
session. RTP sources, in turn, are identified by the value of the SSRC
field in the RTP header. RTP sessions are identified by the IP address
and port number to which the data is sent. On the MBone, most
applications send audio and video on separate RTP sessions, and thus
audio and video would be recorded as two separate media streams.
However, all RTP packets that belong to a media stream are expected to
have identical RTP Payload Type fields. If an RTP source changes the
value it is using for the RTP Payload Type field "mid-session", then RTP
packets with the new (i.e., different) Payload Type fields should be
stored as a different media stream within ASF with its own unique SPO.
It is recommended that the streams that compose the traffic from a
single RTP source be associated by grouping them via the ASF Header
Object's Stream Group Object.

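As a non-normative illustration of the stream identification rule above,
the following Python sketch assigns one Stream Number to each distinct
(SSRC, Payload Type) pair, so that a mid-session Payload Type change
yields a new media stream. The class name StreamTable, its method names,
and the choice to number streams from 1 are assumptions made for this
example; they are not defined by this document or by [2].

```python
from typing import Dict, List, Tuple


class StreamTable:
    """Hypothetical helper: one Stream Number per (SSRC, payload type)."""

    def __init__(self) -> None:
        self._numbers: Dict[Tuple[int, int], int] = {}

    def stream_number(self, ssrc: int, payload_type: int) -> int:
        # SSRC is a 32-bit field and Payload Type a 7-bit field in RTP.
        key = (ssrc & 0xFFFFFFFF, payload_type & 0x7F)
        if key not in self._numbers:
            # First packet seen for this pair: allocate the next number.
            # A real recorder would also write a new SPO at this point.
            self._numbers[key] = len(self._numbers) + 1
        return self._numbers[key]

    def streams_for_source(self, ssrc: int) -> List[int]:
        # Streams sharing an SSRC are candidates for one Stream Group.
        return sorted(n for (s, _), n in self._numbers.items() if s == ssrc)
```

A recorder built along these lines would call stream_number() for every
received packet and use streams_for_source() when it fills in the Stream
Group Object.
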
While the session announcement will generally provide enough information
to construct an initial File Properties Object (FPO) and some of the
necessary SPOs before the session begins, loosely controlled (MBone)
conferences can permit additional participants to join the conference.
Therefore, provision should be made to anticipate the possibility of
additional speakers joining the session. A recommended way to satisfy
this provision is to reserve space within the ASF Header Object via the
ASF Placeholder Object (see Appendix A) where additional ASF objects may
be written (e.g., additional SPOs) as the MBone session dynamically
progresses.

Static RTP Payload Types may be handled in one of two ways:
1. Static RTP Payload Types should be translated into the equivalent
   ASF standard media type (see Section 8 of [2]) using the equivalent
   ASF codec (e.g., see Reference [10]), if known.
2. Alternatively, they can be recorded as RTP Media Types as defined in
   Appendix B.

Dynamic RTP Payload Types may be handled in one of three ways:
1. The dynamic RTP payload type should be translated into the equivalent
   ASF standard media type (see Section 8 of [2]) using the equivalent
   ASF codec, if known. This means that the recorder will need to
   identify the actual codec used by that dynamic RTP Payload Type
   instance based upon the available information. The identity of this
   codec will then need to be expressed as a specific ASF UUID
   identifier (e.g., see Reference [10]) within the SPO's Codec ID
   field.
2. Alternatively, the recorder can translate the dynamic RTP payload
   type to the appropriate static RTP payload type, if any, and record
   it as an RTP Media Type as defined in Appendix B.
3. Alternatively, the recorder can record it as a dynamic RTP payload
   type as defined in Appendix B.

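The two lists above can be read as a small decision procedure. The
Python sketch below is one possible reading, not a normative algorithm:
it prefers the ASF standard media type whenever the codec is known and
otherwise falls back to the RTP Media Type of Appendix B. The names
RecordAs, is_dynamic, and choose_recording are hypothetical, and the
96-127 dynamic range is taken from the RTP audio/video profile rather
than from this document.

```python
from enum import Enum
from typing import Optional


class RecordAs(Enum):
    ASF_STANDARD_MEDIA_TYPE = 1   # option 1 in both lists
    RTP_MEDIA_TYPE_STATIC = 2     # static option 2, dynamic option 2
    RTP_MEDIA_TYPE_DYNAMIC = 3    # dynamic option 3


def is_dynamic(payload_type: int) -> bool:
    # Assumption: dynamic payload types occupy 96-127, as in the RTP
    # audio/video profile. This document does not state the range.
    return 96 <= payload_type <= 127


def choose_recording(payload_type: int,
                     asf_codec_known: bool,
                     static_equivalent: Optional[int] = None) -> RecordAs:
    if asf_codec_known:
        return RecordAs.ASF_STANDARD_MEDIA_TYPE
    if not is_dynamic(payload_type) or static_equivalent is not None:
        # Either already static, or a dynamic type that the recorder
        # can translate to a static one.
        return RecordAs.RTP_MEDIA_TYPE_STATIC
    return RecordAs.RTP_MEDIA_TYPE_DYNAMIC
```

The ordering of the checks is a design choice of this sketch; a recorder
is free to prefer the RTP Media Type if exact regeneration of the
original packets matters more than native ASF playback.
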
RTP payload types that cannot be deciphered by any of the above
approaches should be ignored (i.e., that media stream cannot be
recorded).

Note that if the RTP payload is translated into the equivalent ASF
standard media type, an inverse transformation will need to be applied
by a playback device if the recording is retransmitted as RTP packets.

3.2. Two Recording Approaches

The capabilities of local systems vary. For this reason, the document
suggests that limited capability systems seek to record data via the
Packet Capture Mode, which is described in Section 3.2.1. More capable
systems are recommended to use the Record Structure Mode, described in
Section 3.2.2.

3.2.1. Packet Capture Mode (Limited Buffering)

The Packet Capture Mode recording alternative seeks to write RTP data as
it is received to the ASF Data Object on the disk. The clock of the
recording computer is used to determine the ASF Data Unit's Send Time
value. The Send Time value is calculated by subtracting the multimedia
session's start time (as recorded by the recording computer) from the
recording computer's current time and converting the result into
millisecond units.

The RTP timestamp is directly written as the ASF Data Unit's
Presentation Time value, again making the necessary conversions to
account for the fact that the initial RTP timestamp value is random
while the initial ASF Send Time and Presentation Time values are zero.
The granularity of the Presentation Time units (i.e., the Presentation
Time Numerator and Presentation Time Denominator fields within the SPO)
should be set to the clock granularity for that RTP source. ASF's
default presentation time granularity (i.e., a millisecond) should
initially be used for those cases in which the actual clock granularity
is not known.

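The two time computations described above can be sketched as follows.
This is an illustration only: the function names are hypothetical, the
clock readings are assumed to be in seconds, and the modulo arithmetic
used to absorb a wrap of the 32-bit RTP timestamp is an assumption of
the sketch (the text above only requires that the random initial value
be removed).

```python
RTP_TIMESTAMP_MODULUS = 1 << 32  # the RTP timestamp is a 32-bit field


def send_time_ms(now_s: float, session_start_s: float) -> int:
    """Send Time: recorder's clock relative to session start, in ms."""
    return int(round((now_s - session_start_s) * 1000))


def presentation_time(rtp_timestamp: int, initial_rtp_timestamp: int) -> int:
    """Presentation Time in RTP clock units, rebased to start at zero."""
    return (rtp_timestamp - initial_rtp_timestamp) % RTP_TIMESTAMP_MODULUS


# A packet received 2.5 seconds into the session:
print(send_time_ms(12.5, 10.0))
# A timestamp that has wrapped past 2^32 since the first packet:
print(presentation_time(100, 0xFFFFFF00))
```

The Presentation Time stays in the source's clock units because the
SPO's Presentation Time Numerator and Denominator fields carry the
granularity.
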
The value of the Presentation Time Flags within the SPO for this media
stream shall thus be configured to be "11" (i.e., Full Data Unit
Presentation Time).

RTCP Sender Reports (SR) for the RTP source being recorded can be used
to calculate the clock granularity of the source. This is useful if the
clock granularity is otherwise unknown. It is also possible to use
Sender Reports to detect skews between the clock granularity used by the
source, and the granularity that is given by the RTP Payload Type
specification or profile. If such a skew is detected, the Rational Time
Values (i.e., Presentation Time Numerator and Presentation Time
Denominator fields) of the SPO should be altered accordingly.

This approach has the advantage of being simple and direct to implement.
It has the following disadvantages:
* Jitter is preserved, and repeated re-recordings of the same content
  in this manner may exacerbate the jitter on each subsequent
  recording.
* Out-of-order packets remain out of order.

3.2.2. Record Structure Mode (Buffering)

The Record Structure Mode requires that packets be buffered for a finite
amount of time (e.g., 5 seconds) before being written to disk. Packets
within the buffer should be correctly ordered. Packet holes occurring
within the buffer interval should be filled by retransmitted packets (if
any).

Within this approach, the value of the RTP Timestamp field is used to
compute the send time. Since the RTP timestamp starts at a random value,
while the ASF Send Time and Presentation Time start at zero, a
conversion into appropriate ASF Send Time values must be made. The send
time is stored with a 1-millisecond granularity. The appropriate RTP
Payload Type specification or profile gives the granularity of the RTP
Timestamp. RTCP Sender Reports (SR) may be used to calculate the
granularity of the RTP Timestamp if it is otherwise unknown.
Sender Reports can also be used to detect skews between the RTP
Timestamp granularity and the granularity specified in the RTP Payload
Type specification or profile. If such a skew is detected, the send time
values for currently buffered packets of that media type have to be
altered (retaining their millisecond granularities) to correctly reflect
the skew.

The following values should be recorded within the Stream Properties
Object for the media streams recorded by this approach: The clock
frequency of the RTP payload type should be appropriately recorded into
the Presentation Time Numerator and Presentation Time Denominator
fields. The Presentation Time Flags should have the value of "01"
and the Presentation Time Delta field should have a value of zero. This
means that both the ASF send time and presentation time have the same
value and that subsequent RTP retransmissions of this data will contain
only one timestamp (i.e., RTP's timestamp).

This approach has the advantage of correcting some of the received
jitter, correctly sorting some of the out-of-order packets, and
potentially filling in some lost packets (assuming a retransmission
scheme is used). The disadvantage of this approach is that it is more
complex to implement. This is particularly the case if the RTP payload
type's clock frequency is not known ahead of time and has to be
subsequently learned via RTCP transmissions. In addition, it requires
additional buffering on the recording computer.

3.3. Recording MBone Sessions

The following translations from RTP packet fields to ASF data fields are
identical for both recording approaches.

3.3.1. RTP Mixers and Translators

The combined streams resulting from Mixers and Translators need to be
demultiplexed back into their original component streams when being
recorded into ASF, if possible.
If this is not possible, then copies of the RTP packet containing data
that is attributed to multiple sources need to be stored into each of
these sources' media streams (i.e., ASF Data Units). In either case,
these streams may be optionally re-mixed when they are subsequently
replayed from the ASF files depending upon local implementation
considerations.

3.3.2. RTP Packet Information

The RTP Header's Payload Type field combined with the SSRC is used to
determine the ASF Stream Number value for that media stream. This Stream
Number value identifies which SPO instance should be used to define this
media stream. This value is recorded into the Stream Number field of the
ASF Data Unit.

The Version field in the RTP header is not recorded into the ASF file
unless it is a version other than 2. If the Version field in the RTP
header is other than 2, the RTP version number should be recorded into
the ASF Header Object's Content Description Object (CDO; see Section 5.4
of [2]) using a value of 73 for the Field Type field.

The Padding bit, and the Padding field that is present if the bit is
set, are not recorded. If an RTP packet where the Padding bit was set is
received, the padding field should be removed from the RTP payload.
Padding may be regenerated when retransmitting the recording, if
necessary.

SSRC information should be written into the CDO as an aid for
remembering the association between an SSRC and a Media Stream. This
will also permit the original SSRC value to be optionally recreated
once the recorded data is retransmitted. The 32-bit SSRC value will need
to be converted into a Unicode string when it is stored into the Value
field of the CDO. When storing the SSRC as a Unicode string, the SSRC is
treated as an unsigned 32-bit integer, and it must be converted to the
local byte order (i.e., host byte order).
The value of the Field Type field is 70.

Because the initial RTP timestamp value is a random value, the initial
RTP timestamp value should also be recorded into the CDO. This will
permit the original timestamp sequence to be optionally recreated once
the recorded data is retransmitted. The 32-bit timestamp value will need
to be converted into a Unicode string when it is recorded into the Value
field of the CDO. The value of the CDO's Field Type field is 71.

The initial RTP Sequence Number value should be recorded into the CDO.
This will permit the original number to be optionally recreated once the
recorded data is retransmitted. The 16-bit Sequence Number value will
need to be converted into a Unicode string when it is stored within the
Value field of the CDO. When storing the Sequence Number as a string,
the Sequence Number is treated as an unsigned 16-bit integer, and it
must be converted to the local byte order (i.e., host byte order). The
value of the Field Type field is 72.

It should be noted that ASF's concept of Object Number differs from
RTP's concept of Sequence Number although they are both used to identify
out-of-order and missing information. [Note: earlier versions of the ASF
spec used the term "ObjectID" instead of "Object Number".] The former
identifies specific media stream "objects" as a part of a fragmentation
and grouping schema. What an object happens to be in a given media
stream is entirely media stream dependent (e.g., it is a specific image
within an image media stream, a frame within a (non-scalable) video
stream, etc.). Since object fragmentation occurs within a specific RTP
Payload Type instance and RTP headers do not indicate this type of
information, an identical translation of the original Object Number
semantics would require a decoding of the media stream.
The value of pursuing this type of overhead is highly questionable,
especially when the ultimate goal of identifying missing or out-of-order
information is common between the two approaches. Therefore, the RTP
sequence number should be directly mapped into the Object Number field
of the ASF Data Unit. Since the 16-bit Sequence Number starts at a
random value while the 8-bit Object Number starts at zero, the mapping
between the Sequence Number and Object Number needs to reflect this
difference (e.g., Current Sequence Number value minus Original Sequence
Number value = Object Number) and account for the fact that Object
Numbers "wrap around" to zero every 2^8 packets and Sequence Numbers
"wrap around" when their value hits 2^16.

If the sources listed in the CSRC fields of the RTP header are
demultiplexed into their original component streams when being recorded,
then the CSRC fields are not recorded. If, however, this is not
possible, then the CSRC information should be written into the ASF Data
Unit's extension field as described below.

If the RTP payload has been converted into an "equivalent ASF standard
media type" (see Section 3.1), then the RTP Extension Object described
by the next paragraph is optional. However, if the RTP Media Type
described in Appendix B has been used to record the data, then the RTP
Extension Object is required to be used if either the RTP Header's M-bit
or the RTP Header's eXtension (X) bit is ever set within that stream,
or if CSRC information ever needs to be recorded within that media
stream. The RTP Extension Object permits exact copies of the original
RTP packets to be regenerated, if desired.

The RTP Extension Object is an instance of the Extension Object that is
described within Section 5.3.1 of [2].
Extension Objects are associated with a specific media stream's SPO and
indicate the semantics and format of specific data (i.e., in this case
RTP Packet Header data) that is stored on a per packet basis within the
Extension Data field of the ASF Data Unit (see Section 6.1 of [2]). The
RTP Extension Object is defined as follows:
* The value of the Extension Data Size field is 0xFFFF.
* The UUID value of the Extension System field is
  {96800c63-4c94-11d1-837b-0080c7a37f95}.

These definitions indicate that this recording shall follow the
"variable length" extension data encoding format (i.e., a one-byte
length field followed by the extension data) within the Extension Data
field of the ASF Data Unit.

In the case of the RTP Extension Object, the Extension Data field of the
ASF Data Unit has the following syntax:

Field Name:         Size:
Extension Length    8 bits -- Size in bytes of the Extension Data and
                    Flag fields (i.e., sizeof (Extension Data) + 1)
Flag                8 bits
  X-bit             1 bit (LSB) -- contains the RTP Header's X-bit value
  CSRC Count        4 bits -- contains the RTP Header's CC value
  M-bit             1 bit -- contains the RTP Header's M-bit value
  Reserved          2 bits (MSB)
Extension Data      RTP Header CSRC list, if any, followed by Extension
                    Data, if any

The "variable length" encoding means that if either the X-bit is set or
the CSRC Count has a non-zero value, then the Extension Length, Flag,
and RTP header extension data are written into the Extension Data field
of the ASF Data Unit. If the X-bit is cleared and the CSRC Count has a
zero value, then only the Extension Length and Flag fields are written
to the Extension Data field of the ASF Data Unit. If the X-bit is set
and the CSRC Count field has a non-zero value, then the CSRC list of the
RTP Header appears first, immediately followed by the RTP Header
Extension data within the Extension Data field.
These fields are arranged in big-endian order (also known as network
byte order).

3.3.3. RTCP Packet Information

RR and BYE packets are not recorded into ASF files. Clock skew
information obtained from SR packets is used for the timestamp
calculations described in Sections 3.2.1 and 3.2.2. Other information
contained in SR packets, except for APP and SDES information, is not
recorded.

SDES information is stored in the ASF Header Object's Content
Description Object (CDO). Appropriate SDES items (i.e., "CNAME", "NAME",
"EMAIL", "PHONE", "LOC", "TOOL", "NOTE", and "PRIV") shall be written
into the CDO as described by Appendix C. Synchronization relationships
between media streams containing the same CNAME value should be retained
via associating them by ASF's Inter-Media Dependency Object (Section
5.12 of [2]).

APP information should be handled in one of two ways.
1. If the recorder understands (through out-of-band mechanisms outside
   of the scope of both ASF and RTP) that the APP information contains
   script commands or invocations, which correspond to either the ASF
   Header Object's Script Command Object (see Section 5.5 of [2]) or to
   a Command Media stream type (see Section 8.7 of [2]), then the
   recorder can convert the APP information into the appropriate ASF
   constructs.
2. If the recorder does not understand the APP information then that
   information should be appropriately recorded "as is" into the ASF
   Header Object's Script Command Object.

If the values of the SDES fields from a particular RTP source change
during the recording, it is recommended that the CDO contain the initial
value for the SDES field. Subsequent values of the SDES fields should
then be recorded as a separate media stream, via the mechanisms
described in Appendix D.

3.4. Optional Post-Processing of the ASF Header

Whenever live recordings are made, the Live Bit must be set in ASF's
File Properties Object. This signifies that certain fields in the ASF
File Properties Object and the Stream Properties Object(s) are invalid
and should be ignored. In addition, these same files are likely to also
contain the ASF Placeholder Object (see Appendix A). It is highly
recommended, but not required, that post-processing be done to ASF files
to clear the Live Bit, remove the ASF Placeholder Object, and to write
valid data into the fields which are invalid when the Live Bit is set.

3.5. Optional Creation of the ASF Index Object

ASF uses the Index Parameters Object in the ASF Header to identify the
parameters and media streams whose data will be indexed. This object is
described in Section 5.14 of [2]. If the Index Parameters Object does
not yet exist for this file, then it needs to be constructed before the
Index Object is built. Using the information contained within the Index
Parameters Object, the Index Object is constructed as defined in Section
7 of [2].

3.6. Playback of the Recorded RTP Data

Recorded media streams are stored into the ASF Data Object as ASF Data
Units (see Section 6.1 of [2]). Each ASF Data Unit contains a "header
field" together with the media data which is being stored. The payload
of each RTP packet comprises the media data stored within the ASF Data
Unit. The RTP header itself is not stored but its content is mapped into
the SPO, CDO, and the header field of the ASF Data Unit.

The ASF file contains sufficient information to play back the recorded
data, either locally or via a remote playback device.
When RTP packets are recorded into the ASF file using the RTP Media
Type (see Appendix B), sufficient information exists to regenerate RTP
packets with the same SSRC and sequence numbers as the original packets,
if desired. Additionally, it is possible to regenerate RTCP SDES and APP
packets with the same content as those sent by the original RTP source.
This permits recorded data to be retransmitted into an existing MBone
conference, for example, in such a manner that the data appears to
originate from the original RTP source.

This specification does not define a required feature set for playback
devices. For example, even though it is possible to retransmit the
recorded data using RTP, playback devices are not required to do so.

Appendix A. ASF Placeholder Object Definition

"Loosely controlled" sessions permit participants to enter and leave
without membership control or parameter negotiation. Since one cannot
always predict how many participants will speak, nor what media types
they will use, a mechanism is needed to reserve space within the Header
Object so that new header objects (e.g., Stream Properties Objects) may
be readily added to the header when needed without requiring the header
to be rewritten.

The purpose of the ASF Placeholder Object is to fulfill this "place
holder" function. New header objects are added into the space reserved
by the ASF Placeholder Object. The ASF Placeholder Object then reduces
the amount of space it is reserving by the amount taken by the new
object(s).

ASF Placeholder Objects are ignored (skipped over) when ASF Header
information is conveyed to remote nodes. Even so, it is recommended that
they be removed by post-processing (see Section 3.4) to make files more
compact.
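
The space-reservation behaviour described above can be sketched in
Python as follows. This is an illustration under stated assumptions,
not a normative procedure. The helper names are hypothetical, the new
object is assumed to be written at the start of the reserved region
with the shrunken Placeholder Object following it, the Object ID is
assumed to be serialized in the mixed-endian GUID form, and the Object
Size is assumed to be little-endian. The 24-byte constant is the
combined size of the 128-bit Object ID and 64-bit Object Size fields
given in the definition in this appendix.

```python
import struct
import uuid

PLACEHOLDER_ID = uuid.UUID("D6E22A0F-35DA-11d1-9034-00A0C90349BE")
HEADER_LEN = 24  # 16-byte Object ID + 8-byte Object Size


def make_placeholder(total_size):
    """Return a Placeholder Object that occupies total_size bytes."""
    if total_size < HEADER_LEN:
        raise ValueError("a Placeholder Object needs at least 24 bytes")
    reserved = bytes(total_size - HEADER_LEN)  # zero-filled reserved space
    # Assumption: GUID in mixed-endian form, size as little-endian UINT64.
    return PLACEHOLDER_ID.bytes_le + struct.pack("<Q", total_size) + reserved


def insert_object(placeholder, new_object):
    """Carve new_object out of the placeholder's reserved space.

    The returned bytes have the same total length as the original
    placeholder, so the rest of the header does not move.
    """
    (size,) = struct.unpack_from("<Q", placeholder, 16)
    remaining = size - len(new_object)
    if remaining == 0:
        return new_object  # the placeholder is fully consumed
    if remaining < HEADER_LEN:
        raise ValueError("not enough reserved space for the new object")
    return new_object + make_placeholder(remaining)
```

Because the combined length never changes, a recorder following this
sketch could add a Stream Properties Object for a late-joining
participant without rewriting the data that follows the header.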

The ASF Placeholder Object is defined as follows:

Field Name:   Size:               Value:
Object ID     128 bits            This field contains the following
                                  UUID value:
                                  {D6E22A0F-35DA-11d1-9034-00A0C90349BE}
Object Size   64 bits             The size of this object in bytes
                                  (i.e., size of the Reserved field
                                  + 24)
Reserved      (Object Size - 24)  Reserved space
              * 8 bits

Appendix B. RTP Media Type

ASF has defined standard media types for Audio, Video, Image, Timecode,
Text, MIDI, Command, and Media-Objects (Hotspots) in Section 8 of [2].
Implementations that support these types of media streams are expected
to implement them in the manner defined within the ASF standard. MBone
content that is stored within ASF is therefore expected to be mapped
into the standard ASF media stream formats whenever possible.

However, occasions will exist when it will not be possible to conform to
this requirement. Possible reasons include the following:
* The recorder may not be aware of which media type is associated with
  an RTP Payload Type (i.e., whether the RTP Payload Type is referring
  to Audio, Video, or some other media type).
* The recorder may not know which ASF-defined codec corresponds to the
  codec assumed by the RTP Payload Type and would therefore be unable
  to complete the mapping into a standard ASF media type.
* The RTP Payload Type may indicate an interleaved data stream (e.g.,
  video and audio combined into a single stream). No standard ASF media
  type has yet been defined for such interleaved data.
* The RTP Payload Type may indicate a media type which is not among the
  standard ASF Media Types.
For these reasons and others, a provision must exist to record MBone
data as a distinct RTP Media Type. This appendix defines the format of
the RTP Media Type.

The RTP Media Type is defined within the Stream Properties Object (SPO)
by placing the UUID value {96800c65-4c94-11d1-837b-0080c7a37f95} into
the Stream Type field. The following information is then stored in the
Type-Specific Data field within the SPO:

Field Name:          Field Type: Size (bits): Description:

Payload Type         UINT        8            The Payload Type value
                                              indicated by the RTP
                                              header.
Profile Size         UINT        16           Size in bytes of the
                                              Profile field.
Profile              UINT8       ?            ASCII string identifying
                                              the Profile which has
                                              defined the Payload Type.
                                              (E.g., "AVP" for the
                                              profile defined by [3] and
                                              [9].) An empty string is
                                              used if the profile is not
                                              known.
Announcement ID Size UINT        16           Size in bytes of the
                                              Announcement ID field.
Announcement ID      UINT8       ?            MIME Type of the session
                                              announcement mechanism
                                              used. (E.g.,
                                              "application/x-sdp" for
                                              SDP [7] announcements.)
Announcement Size    UINT        16           Size in bytes of the
                                              Announcement field.
Announcement         UINT8       ?            ASCII string containing
                                              the definition for this
                                              media stream. (E.g., for
                                              SDP [7] announcements,
                                              this would contain the
                                              entire rtpmap entry for
                                              this media stream.)

All ASCII strings in the RTP Media Type are terminated by a NULL
character. These fields should be stored in little-endian byte order
(i.e., the orientation used in the ASF Header Object).

The final four fields (i.e., Announcement ID Size, Announcement ID,
Announcement Size, and Announcement) are used to convey information
about a dynamic RTP payload type. This information might have been
available to the recording device through non-RTP means. Examples of
possible sources of such information include session descriptions, such
as SDP [7], and presentation descriptions [4].
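
As an illustration of the layout above, the following Python sketch
serializes the Type-Specific Data for an RTP Media Type stream. It is
a sketch under assumptions that this document does not settle: each
Size field is assumed to count the terminating NULL character, and an
unspecified Announcement ID or Announcement is assumed to be written
with a Size of zero and no bytes at all. The function name and the
sample rtpmap value are hypothetical.

```python
import struct


def _c_string(text):
    """Encode text as a NULL-terminated ASCII string."""
    return text.encode("ascii") + b"\x00"


def pack_rtp_media_type(payload_type, profile="",
                        announcement_id=None, announcement=None):
    """Serialize the Type-Specific Data fields in little-endian order."""
    if not 0 <= payload_type <= 127:
        raise ValueError("RTP Payload Type is a 7-bit value")
    out = struct.pack("<B", payload_type)
    # An unknown profile is an empty string, i.e. just the terminator.
    profile_bytes = _c_string(profile)
    out += struct.pack("<H", len(profile_bytes)) + profile_bytes
    # Announcement ID, then Announcement; None means "not specified".
    for text in (announcement_id, announcement):
        field = b"" if text is None else _c_string(text)
        out += struct.pack("<H", len(field)) + field
    return out


dynamic = pack_rtp_media_type(96, "AVP", "application/x-sdp",
                              "a=rtpmap:96 L16/8000")
```

The sample call describes a dynamic payload type (96) whose meaning is
carried by an SDP announcement.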
However, if a static RTP Payload Type is being specified, both the
Announcement ID Size and the Announcement Size fields may have a value
of zero, indicating that the Announcement ID and Announcement fields
have not been specified.

The rest of the SPO should be specified as indicated in Section 3.2
above.

The received RTP data of this media stream is stored into the ASF Data
Object as described in Section 3.2 and Section 3.3 above.

Appendix C. Recording SDES Information

Section 5.4 of [2] describes the syntax and semantics of the Content
Description Object (CDO) within the ASF Header Object. This object
consists of an array of Description Records, each containing four
logical entries:
1. Field Type - a value which identifies the semantics of the entry.
   Each SDES item may be recorded to the CDO using the following
   pre-defined Field Type (unsigned integer) values:

   SDES entry:    Field Type Value:
   CNAME          61
   NAME           62
   EMAIL          63
   PHONE          64
   LOC            65
   TOOL           66
   NOTE           67
   PRIV           68

2. Stream Number - identifies the media stream to which this CDO entry
   refers.
3. Name - the name of the entry. This field is redundant with the Field
   Type value and is therefore frequently not used. However,
   applications may optionally use this field for language
   "localization" reasons (e.g., to translate the entry into a specific
   target language).
4. Value - the information conveyed by the specific SDES item (e.g.,
   the user and domain name in a CNAME item).

Appendix D. SDES Media Streams

Section 3.3.3 stated that the first occurrence of a specific SDES RTCP
instance (i.e., a specific SDES item associated with a specific RTP
source identifier; e.g., a CNAME value for a specific SSRC) should be
recorded into the Content Description Object (CDO).
The Stream Number field within the CDO should refer to the media stream
associated with the RTP source identifier (i.e., the SSRC/CSRC field of
Section 6.4 of [1]) of that SDES packet chunk. The CDO has provisions
for storing only one SDES type instance (e.g., only one instance of a
CNAME) for any given media stream. Therefore, subsequent instances of
the same SDES type for that media stream will need to be recorded as a
distinct "media stream" if that information is to be preserved. This
appendix defines how to create such an SDES media stream.

An SDES media stream consists of SDES information written into the ASF
Data Object via the mechanisms described in Section 3.2. Each SDES media
stream records SDES information from only one RTP source identifier. A
Stream Properties Object (SPO) is constructed for each SDES media
stream. That SDES media stream should also be associated with (i.e.,
synchronized with) the media stream containing the RTP data of that same
RTP source identifier via the ASF Header Object's Inter-Media Dependency
Object.

The SPO for an SDES media stream should be constructed as follows:
* The UUID of the SDES Media Stream is
  {96800c62-4c94-11d1-837b-0080c7a37f95}. This value should be written
  into the Stream Type field of ASF's Stream Properties Object (SPO) to
  identify SDES Media Streams.
* The value of the Type-Specific Data Length field within the SPO is
  zero (i.e., no Type-Specific Data).

The format of an SDES Media Stream consists of one or more instances
(per ASF Data Unit) of the following structure:

Field Name:          Field Type: Size (bits): Description:
Type Array Size      UINT        16           Size in bytes of the Type
                                              Array
Value Array Size     UINT        16           Size in bytes of the Value
                                              Array
Type Array           UINT8       ?            UTF-2 string [8]
                                              identifying the specific
                                              SDES type instance (e.g.,
                                              "CNAME")
Value Array          UINT8       ?            UTF-2 string [8]
                                              containing the SDES value
                                              (e.g., "user and domain
                                              name" for a CNAME)

Authors' Addresses

Eric Fleischman
E-mail: ericfl@microsoft.com
and
Anders Klemets
E-mail: anderskl@microsoft.com
Microsoft Corporation
1 Microsoft Way
Redmond, WA 98052-8300
USA

References:

1  H. Schulzrinne, S. Casner, R. Frederick, and V. Jacobson, "RTP: A
   Transport Protocol for Real-Time Applications", IETF RFC 1889,
   January 1996.
2  Microsoft Corporation, "Advanced Streaming Format (ASF)
   Specification", http://www.microsoft.com/asf/specs.htm, September
   1997.
3  H. Schulzrinne, "RTP Profile for Audio and Video Conference with
   Minimal Control", IETF RFC 1890, January 1996.
4  H. Schulzrinne, A. Rao, and R. Lanphier, "Real Time Streaming
   Protocol (RTSP)", work in progress.
5  J. Miller, P. Resnick, and D. Singer, "Rating Services and Rating
   Systems (and Their Machine Readable Descriptions)", World Wide Web
   Consortium, http://www.w3.org/PICS/services.html, May 5, 1996.
6  T. Krauskopf, J. Miller, P. Resnick, and G. W. Treese, "Label Syntax
   and Communication Protocols", World Wide Web Consortium,
   http://www.w3.org/PICS/labels.html, May 5, 1996.
7  M. Handley and V. Jacobson, "SDP: Session Description Protocol", work
   in progress.
8  International Standards Organization, "ISO/IEC DIS 10646-1:1993
   information technology - universal multiple-octet coded character
   set (UCS) - part I: Architecture and basic multilingual plane",
   1993.
9  "RTP Payload types (PT) for standard audio and video encodings",
   ftp://ftp.isi.edu/in-notes/iana/assignments/rtp-av-payload-types
10 "ASF Codec GUIDs", http://www.microsoft.com/asf/guids.htm