Internet Engineering Task Force        Audio-Video Transport WG & Others
INTERNET-DRAFT                                  D. Singer & P. Westerink
draft-singer-mpeg4-ip-00                            Apple Computer & IBM
July 3 2000                                     Expires: January 3, 2001
                                             MPEG Document number: M6150

     A Framework for the delivery of MPEG-4 over IP-based Protocols

Status of This Memo

This document is an Internet-Draft and is in full conformance with all
provisions of Section 10 of RFC2026.

Internet-Drafts are working documents of the Internet Engineering Task
Force (IETF), its areas, and its working groups. Note that other groups
may also distribute working documents as Internet-Drafts.

Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference material
or to cite them other than as "work in progress."

The list of current Internet-Drafts can be accessed at
http://www.ietf.org/ietf/1id-abstracts.txt

The list of Internet-Draft Shadow Directories can be accessed at
http://www.ietf.org/shadow.html.

To learn the current status of any Internet-Draft, please check the
"1id-abstracts.txt" listing contained in the Internet-Drafts Shadow
Directories on ftp.is.co.za (Africa), nic.nordu.net (Europe),
munnari.oz.au (Pacific Rim), ds.internic.net (US East Coast), or
ftp.isi.edu (US West Coast).

Distribution of this document is unlimited.

Abstract

This document forms an umbrella specification for the carriage and
operation of MPEG-4 multimedia sessions over IP-based protocols,
including RTP, RTSP, and HTTP, among others. It also serves to document
the standard MIME types associated with MPEG-4.

1 Introduction

MPEG-4 is a complex multimedia system designed for delivery over a
variety of transport protocols. It includes scene management,
interactivity, video, audio, and other streams.
This document provides a number of specifications for the detailed
mapping of MPEG-4 into several IP-based protocols.

Open issues: it might be desirable to signal to the terminal the amount
of buffering assumed by the encoding/transmission process (in addition
to any network jitter).

2 Use of RTP

There are a number of Internet Drafts describing RTP packetization
schemes for MPEG-4 data [5] [6] [7] [8] [9]. This draft does not specify
a new one. Media-aware packetization (e.g. video frames split at
recoverable sub-frame boundaries) is a principle in RTP, and thus it is
likely that several RTP schemes will be needed, to suit both the
different kinds of media (audio, video, etc.) and the different
encodings (e.g. AAC and CELP audio codecs).

Whatever packetization scheme is used, this specification requires a
number of common characteristics that all schemes MUST have.

2.1] The RTP timestamp corresponds to the CTS of the earliest AU within
the packet.

2.2] RTP packets SHOULD have sequence numbers, and be sent, in decoding
order. Note that where multiple, interleaved access units are sent in
one packet, this will not be adhered to completely. However, any
transmission scheme MUST respect decoding dependencies in any
re-ordering it does, so that dependent access units do not arrive before
the access units they depend on.

2.3] The MPEG-4 timescale (clock ticks per second) SHOULD be used as the
RTP timescale, e.g. as declared in the SDP RTPMAP attribute.

2.4] To achieve a base level of interoperability, and to ensure that any
MPEG-4 stream may be carried, all senders and receivers SHOULD implement
the simple scheme for the carriage of one elementary stream over RTP
[8]. It is not a requirement that any particular session use this scheme
for any particular stream, merely that every terminal be able to receive
this scheme.

2.5] There is a single payload format for the carriage of Flexmux over
RTP [5]. Senders and receivers MAY implement this scheme.

2.6] Streams SHOULD be synchronized using RTP techniques (notably RTCP
sender reports); the MPEG-4 OCR is logically mapped to the NTP time axis
used in RTCP.

Other payload formats MAY be used. They are signalled as dynamic payload
IDs, defined by a suitable name (e.g. a payload name in an SDP RTPMAP
attribute). In particular, the development of specialized RTP payloads
for video (e.g. respecting video packets) and audio (e.g. providing
interleave [10]) is expected. It is possible that these schemes can be
compatible with the simple scheme required here [8].

For those streams requiring reliable delivery, the recommendation is to
leverage existing IETF work in this area (including, but not limited to,
FEC, re-transmission, or repetition), rather than making reliability a
characteristic of the packing scheme itself. However, techniques in
combined source/channel coding, or error correction that depends on the
coding scheme, may make other schemes attractive [9].

3 SDP Information

This specification currently assumes that any session described by SDP
(e.g. in SAP, as a file download, as a DESCRIBE over RTSP) has at most
one MPEG-4 session. It is desirable that this restriction be lifted.

3.1] Senders SHOULD alert receivers that an MPEG-4 session is included,
by means of a session-level SDP attribute (i.e. one placed before any
"media" lines). This takes the form of an attribute line:

   a=mpeg4-iod [<location>]

In an RTSP session, the location is optional. If the location is not
supplied, the IOD is retrieved over the RTSP session by using DESCRIBE
with an Accept of type application/mpeg4-iod. Where the SDP information
is supplied by some other means (e.g. as a file, in SAP), the location
is obligatory and should be a URL enclosed in double quotes, which will
supply the IOD (e.g. small ones may be encoded using DATA:, or otherwise
HTTP: or another suitable file-access URL).
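
As an illustration of 3.1 only, the following Python sketch shows one
way a receiver might look for the session-level mpeg4-iod attribute. It
is not part of this specification. The function name find_mpeg4_iod and
the example URL are hypothetical, and the sketch assumes the attribute
name and the quoted location are separated by a single space, as in the
form shown above.

```python
from typing import Optional, Tuple


def find_mpeg4_iod(sdp: str) -> Tuple[bool, Optional[str]]:
    """Return (present, location) for a session-level a=mpeg4-iod line.

    Hypothetical helper: 'location' is the URL without its double quotes,
    or None when the attribute carries no location.
    """
    prefix = "a=mpeg4-iod"
    for raw in sdp.splitlines():
        line = raw.strip()
        if line.startswith("m="):
            # First media line: session-level attributes end here.
            break
        if line == prefix:
            return True, None
        if line.startswith(prefix + " "):
            value = line[len(prefix) + 1:].strip()
            if not value:
                return True, None
            # The location is expected to be a URL in double quotes.
            if len(value) >= 2 and value[0] == '"' and value[-1] == '"':
                return True, value[1:-1]
            raise ValueError("mpeg4-iod location is not a quoted URL")
    return False, None


if __name__ == "__main__":
    example = (
        "v=0\r\n"
        "s=demo\r\n"
        'a=mpeg4-iod "http://example.com/scene.iod"\r\n'
        "m=video 0 RTP/AVP 96\r\n"
    )
    print(find_mpeg4_iod(example))
```

In an RTSP session, a result of (True, None) would correspond to the
case where the terminal fetches the IOD with a further DESCRIBE that
accepts application/mpeg4-iod.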

3.2] The mapping of RTP streams to elementary streams needs to cover the
Flexmux case as well as the single-stream case. Within the SDP
information, a stream-specific attribute SHOULD be present for each
MPEG-4 stream. It takes one of two forms, depending on whether a single
elementary stream, or a flexmux, is carried:

   a=mpeg4-esid a.b.c

or

   a=mpeg4-esids m.i:a.b.c,n.k:p.q

The first attribute is used for single streams; the second for flexmux.
Here, a is the ESID of the top-level OD stream (declared within the
IOD), b is the ESID within that scope of another OD stream, and c is the
ESID within that scope of the stream; similarly for p and q. m and n are
muxcodes (identifying the table used), and i and k are flexmux channel
numbers within the indicated muxcode.

3.3] The flexmux stream also needs a muxcode table supplied to the
receiver. These tables are indicated via a stream-level attribute:

   a=mpeg4-muxcodetable <location>

where <location> is a URL enclosed in double quotes that will supply the
table(s). If they are small, a DATA: URL will probably suffice to carry
them in-line. If not, the URL should use a file-retrieval scheme (e.g.
HTTP, FTP). The data at the indicated URL consists of some number of
concatenated muxcode tables, complete, in binary format (but note that
DATA URLs allow for base64 encoding of binary data, which would be
needed here). The MIME type of a muxcode table needs deciding
(application/mpeg4-muxcodetable). These tables have an intrinsic length,
so simple concatenation suffices.

4 MIME Types

Amendment 1 of the MPEG-4 standard (also known as version 2) is nearing
completion, and includes a standard file type for encapsulating MPEG-4
data. This file type can be used in a number of ways: perhaps the most
important are its use as an interchange format for MPEG-4 data, its use
as a content-download format, and as the format read by streaming media
servers.
These first two uses will be greatly facilitated if there is a standard
MIME type for serving these files (e.g. over HTTP).

The MPEG-4 standard is broad, and therefore the type of data which may
be in such a file can vary. In brief, such a file may include simple
compressed video and audio (using a number of different compression
algorithms), interactive scene information, meta-data about the
presentation, references to MPEG-4 media streams outside the file, and
so on.

The historical approach for MPEG data is to declare it under "video".
Though MPEG-4 is considerably broader than MPEG-1 and MPEG-2, we believe
that this precedent should be followed, as "multipart" seems
inappropriate and "application" too diffuse. However, MPEG-4 may be used
for a purely audio environment, and in that case the type audio/mpeg4
should be used. In either case, these indicate files conforming to the
"MP4" specification (ISO/IEC 14496 Amendment 1, systems file format).

4.1] When an MP4 file is served (e.g. over HTTP) or otherwise must be
identified by a MIME type, the type "video/mpeg4" SHOULD be used. The
type "audio/mpeg4" MAY be used when the MPEG-4 presentation contained
within the MP4 file has no visual aspects and is entirely auditory.

4.2] In some cases, the initial object descriptor needs to be identified
with a MIME type. In this case, the type "application/mpeg4-iod" SHOULD
be used.

4.3] In some cases, the muxcode table needed by a flexmux decoder needs
to be identified with a MIME type. In this case, the type
"application/mpeg4-muxcodetable" SHOULD be used.

4.4] The payload names used in an RTPMAP attribute within SDP, to
specify the mapping of payload number to its definition, also come from
the MIME namespace. Each of the RTP payload mappings defined above has a
distinct name. For those payloads carrying a variety of MPEG stream
types, the name SHOULD be drawn from the "video" namespace.
For those payloads specific to audio only, the name SHOULD be drawn from
the "audio" namespace.

Given the broad and general nature of MPEG-4, and the interactive
environment, it is hard to say that there are no security
considerations. However, none are known to the authors at this time, and
the standard was developed with the intent that there be none.

MIME media type name: video, and audio
MIME subtype name: mpeg4

MIME media type name: application
MIME subtype name: mpeg4-iod, mpeg4-muxcodetable

Required parameters: none
Optional parameters: none
Encoding considerations: base64 generally preferred; files are binary
   and should be transmitted without CR/LF conversion, 7-bit stripping
   etc.
Security considerations: None known at the time of writing
Interoperability considerations: A number of interoperating
   implementations exist within the MPEG-4 community; and that community
   has reference software for reading and writing the file format.
Published specification: Pending (ISO/IEC 14496, MPEG-4 Systems
   Amendment 1).
Applications: Multimedia
Additional information:
   Magic number(s): none
   File extension(s): mp4 and mpg4 are both declared at
   Macintosh File Type Code(s): mpg4 is registered with Apple
Person to contact for info: David Singer, singer@apple.com
Intended usage: Common
Author/Change controller: David Singer, MPEG-4 file format chair

5 RTSP usage

RTSP may be used as a session control protocol for sessions which carry
MPEG-4 information. When RTSP is used as a session-control protocol:

5.1] RTP SHOULD be used as the transport protocol.

5.2] The initial DESCRIBE format SHOULD be SDP. If the SDP information
reveals that an IOD is needed, and the terminal does not already have
it, then a second DESCRIBE accepting an IOD SHOULD be performed (see
above).

5.3] Note that if all MPEG-4 streams are closed (TEARDOWN) then the RTSP
session ID will be lost.
The next (re-)opened stream will supply a new session ID. Care should be
taken that the target of the URL has not changed in the interval; new
DESCRIBEs may be needed.

Acknowledgments

This draft has benefited greatly from contributions from many people,
including Mike Coleman, Jean-Claude Duford, Carsten Herpel, Olivier
Avaro, Paul Christ, and many others. Their insight, foresight, and
contributions are gratefully acknowledged. Little has been invented here
by the authors; this is mostly a collation of greatness that has gone
before.

References

[1] H. Schulzrinne et al., "RTP: A Transport Protocol for Real-Time
    Applications", IETF RFC 1889, January 1996.

[2] H. Schulzrinne et al., "RTP Profile for Audio and Video Conferences
    with Minimal Control", IETF RFC 1890, January 1996.

[3] H. Schulzrinne et al., "Real Time Streaming Protocol", IETF Draft,
    draft-ietf-mmusic-rtsp-09.txt, February 2 1998, expires August 2
    1998.

[4] M. Handley, "SDP: Session Description Protocol", IETF Draft,
    draft-ietf-mmusic-sdp-05.txt, November 21 1997, expires November 21
    1998.

[5] C. Roux et al., "RTP Payload Format for Flexmultiplexed MPEG-4
    Streams", IETF Draft, draft-rgcc-avt-mpeg4flexmux-00, March 9 2000,
    expires Sept 9 2000.

[6] Y. Kikuchi et al., "RTP payload format for MPEG-4 Audio/Visual
    streams", IETF Draft, draft-ietf-avt-rtp-mpeg4-es-01, Feb 1 2000,
    expires Aug 1 2000.

[7] C. Guillemot et al., "RTP Payload Format for MPEG-4 with Flexible
    Error Resiliency", IETF Draft, draft-ietf-avt-mpeg4streams-00, March
    1 2000, expires Sept 1 2000.

[8] R. Civanlar et al., "RTP Payload Format for MPEG-4 Streams", IETF
    Draft, draft-ietf-avt-rtp-mpeg4-02, ?? 2000, expires ?? 2000.

[9] C. Guillemot et al., "RTP payload format for MPEG-4 Visual Advanced
    Profiles", IETF Draft, draft-gc-avt-mpeg4visual-00.txt, March 1
    2000, expires Sept 1 2000.

[10] R. Finlayson, "A More Loss-Tolerant RTP Payload Format for MP3
     Audio", IETF Draft, draft-ietf-avt-rtp-mp3-01.txt, Mar 10 2000,
     expires Sep 10 2000.

Authors' Contact Information

David Singer
Apple Computer, Inc.
One Infinite Loop, MS:302-3MT
Cupertino CA 95014
USA
Email: singer@apple.com
Tel: +1 408 974 3162

Peter Westerink
IBM
30 Saw Mill River Road
Hawthorne, NY 10532
USA
Email: peterw@us.ibm.com
Tel: +1 914 784 7173