Audio-Video Transport WG                               D. Singer, Y. Lim
Internet Draft                                   Apple Computer, mp4cast
Document: draft-singer-mpeg4-ip-03                             July 2001
Category:                                           Expires January 2002
MPEG reference: N4282

     A Framework for the delivery of MPEG-4 over IP-based Protocols

Status of This Memo

This document is an Internet-Draft and is in full conformance with all
provisions of Section 10 of RFC2026.

This document is an Internet-Draft. Internet-Drafts are working
documents of the Internet Engineering Task Force (IETF), its areas, and
its working groups. Note that other groups may also distribute working
documents as Internet-Drafts.

Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference material
or to cite them other than as "work in progress."

The list of current Internet-Drafts can be accessed at
http://www.ietf.org/1id-abstracts.html

The list of Internet-Draft Shadow Directories can be accessed at
http://www.ietf.org/shadow.html.

To learn the current status of any Internet-Draft, please check the
"1id-abstracts.txt" listing contained in the Internet-Drafts Shadow
Directories on ftp.is.co.za (Africa), nic.nordu.net (Europe),
munnari.oz.au (Pacific Rim), ds.internic.net (US East Coast), or
ftp.isi.edu (US West Coast).

Distribution of this document is unlimited.

Abstract

This document forms an umbrella specification for the carriage and
operation of MPEG-4 multimedia sessions over IP-based protocols,
including RTP, RTSP, and HTTP, among others. It addresses IP Multicast
as well. It also serves to document the standard MIME types associated
with MPEG-4 files.

Singer & Lim           Informational - Expires Jan 2002               1
A Framework for the delivery of MPEG-4                         July 2001

1 Introduction

ISO/IEC 14496 is a standard designed for the representation and
delivery of multimedia information over a variety of transport
protocols.
It includes interactive scene management, visual and audio
representations, as well as systems functionality such as multiplexing,
synchronization, and an object descriptor framework.

This document provides a framework for the carriage of ISO/IEC 14496
content over IP networks, and guidelines for designing payload format
specifications that map ISO/IEC 14496 content in detail into several
IP-based protocols.

Glossary of terms and acronyms

AAC       - MPEG-4 advanced audio coding
AU        - access unit in an ES (the smallest media data unit to
            which timing can be attributed)
BIFS      - binary format for scenes; the MPEG-4 scene composition
            system
CELP      - MPEG-4 speech codec
CTS       - composition time stamp
DTS       - decoding time stamp
ES        - elementary stream
ESID      - elementary stream ID
FCR       - FlexMux clock reference
FlexMux   - a multiplex of several PDUs into a single unit; not used
            for multiplexing in RTP
IOD       - initial object descriptor; the 'hook' to the MPEG-4
            streams needed to start a session
OCR       - object clock reference; an external clock reference for
            an MPEG-4 stream
OD        - object descriptor; declares and defines an MPEG-4 stream
SL        - synchronization layer
SL Packet - synchronization layer protocol data unit, in MPEG-4
            systems

2 Use of RTP

There are a number of RTP packetization schemes for ISO/IEC 14496 data
[5] [6] [9]. Media-aware packetization (e.g. video frames split at
recoverable sub-frame boundaries) is a principle in RTP, and thus it is
likely that several RTP schemes will be needed, to suit both the
different kinds of media (audio, video, etc.) and different encodings
(e.g. AAC and CELP audio codecs) [8]. This specification does not
define any payload format, but it does specify a general framework for
designing and using payload formats appropriately.

This specification requires that, no matter what packetization scheme
is used, all payload formats MUST have a number of common
characteristics. These characteristics depend on whether the RTP
session contains a single elementary stream or a FlexMux stream.

In case an RTP session contains a single elementary stream, the
following characteristics apply:

2.1] The RTP timestamp corresponds to the presentation time (e.g. CTS)
of the earliest AU within the packet.

2.2] RTP packets have sequence numbers in transmission order. The
payloads logically or physically have SL sequence numbers, which are in
decoding order, for each elementary stream.

2.3] The ISO/IEC 14496 timescale (clock ticks per second), which is
timeStampResolution in the case of ISO/IEC 14496 Systems, MUST be used
as the RTP timescale, e.g. as declared in SDP for an RTP stream.

2.4] To achieve a base level of interoperability, and to ensure that
any ISO/IEC 14496 stream may be carried, all senders and receivers MUST
implement a default RTP payload mapping scheme. It is highly desirable
that this default scheme be common to pure Audio and Visual streams as
well as to SL-packetized streams. This default scheme is not yet
identified.

2.5] Streams SHOULD be synchronized using RTP techniques (notably RTCP
sender reports). When the ISO/IEC 14496 OCR is used, it is logically
mapped to the NTP time axis used in RTCP.

2.6] The RTP packetization schemes may be used for ISO/IEC 14496
elementary streams 'standing alone' (e.g. without ISO/IEC 14496
Systems, including BIFS), or they may be used within an overall
presentation using the object descriptor framework. In the latter case,
an SLConfigDescriptor is sent describing the stream. Logically, each
RTP stream is passed through a mapping function which is specific to
the payload format used; this mapping function yields an SL-packetized
stream. The SLConfigDescriptor describes this logical stream, not the
actual bits in the RTP payload.
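
As a rough illustration of the logical mapping function described in
2.6], the Python sketch below derives a logical SL packet header from
two RTP header fields. It is not taken from any payload format
specification. The class and field names, the timestamp_offset
parameter (standing in for however a receiver learns the initial RTP
timestamp offset) and the 5-bit SL sequence number length are all
hypothetical. The sketch also assumes that packets are sent in decoding
order and that the RTP clock rate equals timeStampResolution, as 2.3]
requires.

```python
from dataclasses import dataclass

RTP_TS_MODULUS = 1 << 32  # RTP timestamps are 32 bits wide


@dataclass(frozen=True)
class RtpHeader:
    """The two RTP header fields this sketch uses."""
    sequence_number: int  # 16-bit counter, transmission order
    timestamp: int        # 32-bit value in RTP clock ticks


@dataclass(frozen=True)
class LogicalSlHeader:
    """Hypothetical subset of SL packet header fields (illustrative names)."""
    packet_sequence_number: int
    cts_present: bool
    composition_time_stamp: int


def map_rtp_to_sl(rtp: RtpHeader, timestamp_offset: int = 0,
                  sl_seq_bits: int = 5) -> LogicalSlHeader:
    """Build the logical SL header for one RTP packet.

    Assumptions (not from the specification): transmission order equals
    decoding order, so the RTP sequence number can seed the SL sequence
    number; timestamp_offset is the initial RTP timestamp, known to the
    receiver by some other means.
    """
    # Truncate the RTP sequence number to the SL sequence number length.
    seq = rtp.sequence_number % (1 << sl_seq_bits)
    # Same tick rate on both sides, so only the offset is removed,
    # with 32-bit wrap-around.
    cts = (rtp.timestamp - timestamp_offset) % RTP_TS_MODULUS
    # Every RTP packet carries a timestamp, so the flag is static.
    return LogicalSlHeader(seq, True, cts)
```

A real payload format would define this mapping normatively and would
also have to cover the remaining SLConfigDescriptor and SLPacketHeader
fields.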

For example, the RTP sequence number may be used to form the
SLPacketHeader sequence number; other SL fields may be set in the same
way, dynamically, or from static values in the payload specification.
Similarly, as all RTP packets carry a composition time stamp, the flag
in the SL header indicating its presence can normally be statically
defined as 'true'. Each payload format for ISO/IEC 14496 content MUST
specify the mapping function for the formation of the
SLConfigDescriptor and the SLPacketHeader. In the case of RFC 3016, the
mapping will be defined in a new section.

+----------------+        +---------------+       +---------+
|   RTP Packet   |        |   Normative   |       |         |
|                | -----> |    mapping    | ----->|         |
|(visual, audio) |        |   function    |       |         |
+----------------+        +---------------+       |         |
                                                  | ISO/IEC |
+----------------+        +---------------+       |         |
|   RTP Packet   |        |   Normative   |       |  14496  |
|                | -----> |    mapping    | ----->|         |
|(generic format)|        |   function    |       |   SL    |
+----------------+        +---------------+       |         |
        .                         .               | packets |
        .                         .               |         |
        .                         .               |         |
+----------------+        +---------------+       |         |
|   RTP Packet   |        |   Normative   |       |         |
|                | -----> |    mapping    | ----->|         |
|(FlexMux format)|        |   function    |       |         |
+----------------+        +---------------+       +---------+

In case an RTP session contains a FlexMux stream, the following
characteristics apply:

2.6] There is a single payload format for the carriage of FlexMux
streams over RTP [5]. Senders and receivers MAY implement this scheme.

2.7] The RTP timestamp corresponds to the FCR, if present at the
FlexMux level.

2.8] The ISO/IEC 14496 FlexMux timescale (FCR resolution in ticks per
second) SHOULD be used as the RTP timescale (as can be declared in
SDP).

2.9] The ISO/IEC 14496 FCR is logically mapped to the NTP time axis
used in RTCP.

Other payload formats MAY be used. They are signalled as dynamic
payload IDs, defined by a suitable name (e.g. a payload name in an SDP
RTPMAP attribute).
In particular, the development of specialized RTP payloads for video
(e.g. respecting video packets) and audio (e.g. providing interleave)
is expected. It is possible that these schemes can be compatible with
the default scheme required here.

There may be a choice of RTP payload formats for a given stream (e.g.
as an elementary stream, an SL-packetized stream, using FlexMux, and so
on). It is recommended that

* terminals implementing a given sub-system (e.g. video) accept at
  least an ES packing and the default SL packing of that stream; for
  video, for example, this means accepting the format defined by
  RFC 3016 and also the generic payload format for ISO/IEC 14496
  Visual;

* terminals implementing a given payload format accept any stream over
  that format for which they have a decoder, even if that packing is
  not normally the 'best' packing.

Future versions of this specification will identify the single standard
RTP packing format for each ISO/IEC 14496 stream type. However, at the
time of writing the RTP payload format specifications are still being
defined, and the set is incomplete. These recommendations will form the
basis for improved interoperability.

For streams that require a certain, appropriately specified, Quality of
Service, the recommendation is to investigate further possible
solutions, such as leveraging existing work in the IETF in this area
(including, but not limited to, FEC, retransmission, or repetition).
However, techniques in data-dependent error correction, or combined
source/channel coding solutions, make other schemes attractive. It is
also recommended that requirements such as efficient grouping
mechanisms (i.e. the ability to send multiple consecutive AUs, each
with its own SL information, in a single RTP packet) and low overhead
be taken into account.

3 SDP Information

This specification considers only ISO/IEC 14496 Systems related issues.
Usage of SDP information for a specific payload format shall be
specified in the corresponding RTP payload format RFC. The usage of
elementary streams in other contexts is not addressed here: codepoints
for this case are specified in [6], and in other places.

This specification currently assumes that any session described by SDP
(e.g. in SAP, as a file download, or in response to a DESCRIBE over
RTSP) has at most one ISO/IEC 14496 session. It is desirable that this
restriction be lifted.

3.1] Senders SHOULD alert receivers that an ISO/IEC 14496 session is
included, by means of a session-level SDP attribute (i.e. one placed
before any "media" lines). This takes the form of an attribute line:

    a=mpeg4-iod []

location: In an RTSP session, this is an optional attribute. If not
supplied, the IOD is retrieved over the RTSP session by using DESCRIBE
with an Accept header of type application/mpeg4-iod. Where the SDP
information is supplied by some other means (e.g. as a file, in SAP),
the location is obligatory. The location should be a URL enclosed in
double-quotes, which will supply the IOD (e.g. small ones may be
encoded using "data:", otherwise "http:" or another suitable
file-access URL). The InitialObjectDescriptor is defined in sub-clause
8.6.3.1 of ISO/IEC 14496-1.

or:

    a=mpeg4-iod-xmt []

location: In an RTSP session, this is an optional attribute. If not
supplied, the IOD is retrieved over the RTSP session by using DESCRIBE
with an Accept header of type application/mpeg4-iod-xmt. Where the SDP
information is supplied by some other means (e.g. as a file, in SAP),
the location is obligatory. The location shall be a URL enclosed in
double-quotes, which will supply the IOD in XMT format (e.g. small ones
may be encoded using "data:", otherwise "http:" or another suitable
file-access URL).
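
As an informal illustration of 3.1], the Python sketch below looks for
either attribute in the session-level part of an SDP description, i.e.
before the first "m=" line. The function name find_iod_attribute, the
regular expression and the URLs in the usage example are hypothetical.
The sketch assumes the attribute name is followed by white space and an
optional URL in double quotes, as the attribute lines above suggest; it
is not a complete SDP parser.

```python
import re
from typing import Optional, Tuple

# Attribute name, then an optional double-quoted location URL.
_IOD_ATTR = re.compile(r'^a=(mpeg4-iod(?:-xmt)?)(?:\s+"([^"]*)")?\s*$')


def find_iod_attribute(sdp: str) -> Optional[Tuple[str, Optional[str]]]:
    """Return (attribute name, location or None), or None if absent.

    Only session-level lines are examined; scanning stops at the first
    media ("m=") line.
    """
    for line in sdp.splitlines():
        line = line.strip()
        if line.startswith("m="):
            break  # media sections begin here
        match = _IOD_ATTR.match(line)
        if match:
            return match.group(1), match.group(2)
    return None


if __name__ == "__main__":
    example = (
        "v=0\r\n"
        "s=demo\r\n"
        'a=mpeg4-iod "http://example.com/demo.iod"\r\n'
        "m=video 0 RTP/AVP 96\r\n"
    )
    print(find_iod_attribute(example))
```

When the location is None in an RTSP session, the receiver would fall
back to retrieving the IOD with DESCRIBE, as described above.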

The InitialObjectDescriptor is defined in sub-clause 8.6.3.1 of ISO/IEC
14496-1, and its XMT format is defined in ISO/IEC 14496-1 2001 PDAM 2.
Any receiver using the IOD shall understand the binary IOD and may
understand the textual IOD.

3.2] New encoding names for the a=rtpmap attribute

It is recommended that, no matter what payload format is used, each
media stream be placed in an appropriate media section. For example, a
payload format which can carry both video and audio streams may be used
in sections of SDP starting with either "m=video" or "m=audio". The
MIME name for the payload format is thus registered under all
applicable branches.

    a = rtpmap: /