Internet Engineering Task Force              Audio Visual Transport WG
Internet-Draft              C.Guillemot, P.Christ, S.Wesner, A. Klemets
draft-ietf-avt-mpeg4streams-00.txt
                             INRIA / Univ. Stuttgart - RUS / Microsoft
                                                         March, 1 2000
                                            Expires: September, 1 2000

     RTP Payload Format for MPEG-4 with Flexible Error Resiliency

STATUS OF THIS MEMO

This document is an Internet-Draft and is in full conformance with all
provisions of Section 10 of RFC2026.

Internet-Drafts are working documents of the Internet Engineering Task
Force (IETF), its areas, and its working groups. Note that other groups
may also distribute working documents as Internet-Drafts.

Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference material
or to cite them other than as "work in progress."

The list of current Internet-Drafts can be accessed at
http://www.ietf.org/ietf/1id-abstracts.txt

The list of Internet-Draft Shadow Directories can be accessed at
http://www.ietf.org/shadow.html.

Abstract

This document describes a payload format which can be used for the
transport, in RTP [1] packets, of both MPEG-4 Elementary Streams (ES),
i.e. audio, visual, BIFS and OD streams, and MPEG-4 Sync Layer and
FlexMux packet streams. The payload format allows for protection
against loss in a generic way. The mechanisms proposed can operate on
full and partial MPEG-4 ES Access Units, on Sync Layer packets, or on
FlexMux packets. These mechanisms can cover a broad range of protection
schemes and avoid extra connection management complexity (e.g. for
separate FEC channels) in MPEG-4 applications with a potentially high
number of streams.

Guillemot/Christ/Wesner/Klemets.
[Page 1] Internet-Draft Payload Format for MPEG-4 Streams March 2000

Table of Contents

1    Introduction
2    MPEG-4 overview
2.1  Scene description framework
2.2  MPEG-4 Systems
2.3  MPEG-4 profiles
3    Design Considerations
4    Payload Format specification
4.1  RTP Header Usage
4.2  Payload Header
4.3  Payload for the transport of ES
4.4  Payload for the transport of SL-PDUs
4.5  Payload for the transport of FlexMux-PDUs
5    Extension data field for FEC data
5.1  Extension data field for Parity Codes
6    Multiplexing
7    Security Considerations
8    Authors Addresses
9    References

List of Figures

Figure 1: Structure of FlexMux packet in simple mode
Figure 2: Structure of FlexMux packet in MuxCode mode
Figure 3: Architecture
Figure 4: Example of ESI
Figure 5: RTP payload format
Figure 6: Portrait of the unified approach for transport of ES and
          SL packetized streams
Figure 7: Sample RTP payload for SL PDU transport
Figure 8: Sample RTP payload for FlexMux-PDU transport with
          protection support
Figure 9: Sample RTP payload for FlexMux-PDU transport
Figure 10: FEC Header for Parity
           Codes
Figure 11: Simplified FEC Header for Parity Codes (with default
           masks)
Figure 12: FEC Header for Reed-Solomon Codes
Figure 13: Example of Interleaving (for P=7)

1 Introduction

The MPEG-4 standard targets a very large range of applications, from
classical videotelephony and videoconferencing applications to
applications requiring a very high degree of interaction with
audio-visual scenes. In order to reach this latter goal, very advanced
tools have been specified in the different parts of the standard
(Audio, Visual, Systems), which can be configured according to
profiles to meet various application requirements.

This document is motivated by the large number of profiles, the large
variety of MPEG-4 compressed streams (audio, visual, BIFS, OD, SL,
FlexMux), and by the need for a flexible degree of protection to be
applied to them. In addition to having a single payload format for
MPEG-4 Elementary Streams (ES), Synchronization Layer packet streams
(SL-PDU streams) and FlexMux packet streams, another motivation is
flexibility in associating error control mechanisms with the
compressed media streams, in order to provide protection to various
applications, not restricted just to simple profiles. The error
control mechanisms can be dynamically adapted to different types of
stream elements (e.g. Access Units, segments, packets) and/or to
network characteristics.

The design of this payload format has been inspired by previous
proposals for generic payload formats [2-3]. Additionally, it attempts
to federate different error control approaches under a single protocol
support mechanism.
The rationale for this payload format consists of:

- A unified approach for MPEG-4 ES, MPEG-4 sync layer, and FlexMux
  packet streams, with simple grouping mechanisms.

- A solution independent of the usage or non-usage of the MPEG-4 OD
  framework, and not restricted to MPEG-4 simple profiles.

- Protection against packet loss, with flexible support for a range
  of loss control mechanisms (redundant data, such as repeated
  important segments of the elementary streams, or FEC) adapted to
  typed segments of streams. Typed segments are parts of Access Units
  (AUs) that are, in terms of the encoding syntax, syntactically and
  semantically meaningful parts of an AU (cf. [4], 7.2.3: "Such
  partial AUs may have significance for improved error resilience").

- Access Units are the smallest entities in the bitstream that can be
  attributed individual timestamps. The in-band mechanism proposed
  avoids the extra connection management complexity that separate FEC
  channels could bring. Indeed, in MPEG-4 applications, the number of
  streams can potentially be high.

- Protection against packet loss with protocol support that is easily
  adaptable to varying network conditions, for both "live" and
  "pre-recorded" visual contents.

The list of all the protection schemes supported will be announced via
out-of-band signaling at the beginning of the session, using for
example SDP [7]. The protection scheme used at a specific instant
during the session will be signaled via the extension type (XT) field
in the payload header.

2 MPEG-4 overview

2.1 Scene description framework

An MPEG-4 scene is composed of media objects. The MPEG-4 dynamic-scene
description framework, which defines the spatio-temporal relation of
the media objects as well as their contents, is inspired by VRML. The
compressed binary representation of the scene description is called
BIFS (Binary Format for Scenes) [4].
The compressed scene description is conveyed through one or more
Elementary Streams (ES).

A compression layer produces the compressed representations of the
audio-visual objects that will be inserted into the scene. These
compressed representations are organized into Elementary Streams (ES).
Elementary Stream Descriptors provide information relative to the
stream, such as the compression scheme used.

Elementary stream data is partitioned into Access Units. The
delineation of an Access Unit is completely determined by the entity
(the compression layer) that generates the elementary stream. An
Access Unit is the smallest data entity to which timing information
can be attributed. Two Access Units shall never refer to the same
point in time.

Natural and animated synthetic objects may refer to an Object
Descriptor (OD), which points to one or more Elementary Streams that
carry the coded representation of the object or its animation data. An
OD serves as a grouping of one or more Elementary Stream Descriptors
that refer to a single media object. The OD also defines the
hierarchical relations and properties of the Elementary Stream
Descriptors.

A complete set of ODs can be seen as an MPEG-4 resource or session
description. The Object Descriptors are conveyed through one or more
Elementary Streams. By conveying the session (or resource) description
as well as the scene description through their own Elementary Streams,
it becomes possible to change portions of scenes and/or properties of
media streams separately and dynamically at well-known instants of
time.

2.2 MPEG-4 Systems

The MPEG-4 Systems specification [4] also defines a packetization of
ES data into access units or parts thereof. The packets are called SL
packets, or SL-PDUs. The resulting sequence of SL packets is called
the SL-Packetized Stream (SPS).
Access Units are the only semantic entities at this layer and their
content is opaque. Packetization information has to be exchanged
between the entity that generates an elementary stream and the sync
layer. This relation is best described by a conceptual interface
between both layers, termed the Elementary Stream Interface (ESI).

An SL packet (SL-PDU) consists of an SL packet header and an SL packet
payload. The SL packet header provides means for continuity checking
in case of data loss and carries the coded representation of the time
stamps and associated information. The header syntax is configurable,
to adapt to the needs of different types of elementary streams, and is
defined in the SLConfigDescriptor.

An SL-PDU does not contain an indication of its length. Therefore, SL
packets must be framed by a suitable lower layer protocol.
Consequently, an SL-PDU stream is not a self-contained data stream
that can be stored or decoded without such framing.

SL-PDUs of varying instantaneous bit rate can then be interleaved by
using the FlexMux tool. The basic data entity of the FlexMux is a
FlexMux packet, which has a variable length. The FlexMux has two modes
of operation: the Simple Mode and the MuxCode Mode. In the simple mode
one SL packet is encapsulated in one FlexMux packet and tagged by an
index which is equal to the FlexMux Channel number, as shown in Figure
1 below [4].

+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+... +-+-+-+-+-+-+-+
|     index     |    length     |           SL-PDU            |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+... +-+-+-+-+-+-+-+
                                |  header   |     payload     |
                                +-+-+-+-+-+-+... +-+-+-+-+-+-+-+

      Figure 1: Structure of FlexMux packet in simple mode

In the MuxCode mode one or more SL packets are encapsulated in one
FlexMux packet. In this mode the index value is used to dereference
configuration information that defines the allocation of the FlexMux
packet payload to different FlexMux Channels.
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+... +-+-+-+-+-+-+-+
|     index     |    length     |version|SL-PDU |...|   SL-PDU    |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+... +-+-+-+-+-+-+-+

      Figure 2: Structure of FlexMux packet in MuxCode mode

The FlexMux tool is optional.

2.3 MPEG-4 profiles

In order to allow effective implementations of the standard, subsets
of the MPEG-4 Systems, Visual, and Audio tool sets that can be used
for specific applications have been identified. Profiles exist for
various types of media content (audio [8], visual [5], and graphics
[5]) and for scene descriptions [4].

Depending on the visual profile, different sets of parameters will be
present in the header of the VideoObjectPlane().

A set of error resilience tools has been defined in the MPEG-4 visual
syntax in order to recover corrupted headers [5]. In particular, the
VideoObjectPlane data is structured in video packets, the entry point
being defined by the function video_packet_header(), and delimited by
resync_markers. Basic configuration parameters can be inserted in the
packet header. However, this concerns only parameters used in the
simple visual profile; many parameters essential in the simple
scalable, main and core profiles are not covered by this mechanism
[5].

Also, no such mechanism has been defined for BIFS and OD streams.
Although TCP could be envisaged for the transport of BIFS and ODs
under mild time constraints, it may not be suited to the tight timing
constraints of scene animation and update, or to multicast scenarios.

3 Design Considerations

The design goals of this RTP payload format are to provide the
following:

- a unified solution, with error protection easily adaptable to
  varying network conditions, for both "live" and "pre-recorded"
  contents.
- a unified solution for the transport of SL packet streams (with a
  possible N-to-1 mapping), of FlexMux packet streams, and for the
  transport of robust ES (audio, visual, BIFS, ODs, IPMP) data.

- a solution supporting advanced profiles (i.e. not restricted to the
  simple audio/visual profile), and independent of the usage or
  non-usage of the OD framework.

Figure 3 shows the adopted model. It relies on an optional network
adaptation layer, which supports protection mechanisms. Ideally, this
network adaptation layer is both media and network aware.

The compression layer organizes the ESs in Access Units (AU). The AUs
are the smallest entities that can be attributed individual
timestamps. The timestamps may be obtained directly, through the ESI,
with syntax as specified by the SLConfigDescriptor. If the
SLConfigDescriptor indicates that timestamps are absent, the
timestamps may be obtained indirectly, for example, by using the frame
rate.

The compression layer passes full or partial Access Units, together
with indications of AU boundaries, random access points, and desired
timing information as described by the SLConfigDescriptor, directly or
indirectly (via the sync layer) to the network adaptation layer. It is
however preferable, for implementation efficiency, to pass the ES data
directly to the network adaptation layer, i.e. to avoid producing the
full SL packets. Partial AUs or typed segments are, in terms of the
encoding syntax, syntactically and semantically meaningful parts of an
AU (cf. [4], 7.2.3: "Such partial AUs may have significance for
improved error resilience").
[Figure 3: block diagram. Labelled elements: the SLConfigDescriptor
(SLConf.Descr.), shown alongside the whole stack; the media-aware
Compression Layer; the ES Descriptor, with ES Type, RAP Flag and QoS
indications crossing the ESI; the network-aware Network Adaptation
Layer (redundancy, FEC), with an input from QoS monitoring; and the
resulting packet, made of RTP Hdr., Ext. Data (e.g. FEC), "SL"
parameters and Media.]

                      Figure 3: Architecture

Figure 4 lists parameters that should be passed along with the ES
data. The SLConfigDescriptor indicates the presence or absence of each
parameter. When any of these parameters is present, the adaptation
layer will directly produce the "stripped down" SL header to be
inserted in the payload of the RTP packet.

Note that the normative behavior at the receiving side can be assured,
when the OD framework is present, by using the SLConfigDescriptor,
which is visible in the compression layer, or, outside the OD
framework, by other means of signaling the ES syntax, e.g. through a
"capability exchange".

   DTS: Decoding Time Stamp
   CTS: Composition Time Stamp
   OCR: Object Clock Reference
   IdleFlag
   loop(
        randomAccessFlag
        AUStartFlag
        AUEndFlag
        Esdata
        dataLength
        degradationPriority
        segmentType
       )

                     Figure 4: Example of ESI

The payload format also specifies a mechanism for grouping an AU or a
partial AU, an SL-PDU or a FlexMux PDU together with protection data
(FEC, redundant data).
This mechanism makes it possible to adapt the protection of the
different typed segments, or SL-PDUs, to varying network conditions
during the session, as well as to a degradation priority indicated by
the SLConfigDescriptor. The grouping mechanism can also be used for
grouping SL-PDUs, or possibly FlexMux PDUs (the length of which is
currently limited to 255 bytes by the 8-bit length field).

The payload format also supports a fragmentation mechanism where the
full AUs or the partial AUs passed by the compression layer are
fragmented at arbitrary boundaries. This may result in fragments that
are not independently decodable. This kind of fragmentation may be
used in situations when the RTP packets are not allowed to exceed the
path-MTU size. However, this media-unaware fragmentation is not
recommended. It is preferable that the compression layer provides
partial AUs, in the form of typed segments, of a size small enough so
that the resulting RTP packet can fit the MTU size. Media-unaware
fragmentation may nevertheless be useful in the case of large audio
frames which would have to be fragmented to fit the MTU size.

Consecutive segments (e.g. video packets [5]) of the same type will be
packed consecutively in the same RTP payload. The compression layer
should provide partial AUs of a size small enough so that the
resulting RTP packet can fit the MTU size. Note that passing partial
AUs of small size will also facilitate congestion and rate control
based on the real output buffer management.

RTP packets that transport fragments belonging to the same AU will
have their RTP timestamp set to the same value.

4 Payload Format specification

The packet will consist of an RTP header followed by possibly multiple
payloads. They should be sent in the decoding order.

4.1 RTP Header Usage

Each RTP packet starts with a fixed RTP header.
The following fields of the fixed RTP header are used:

- Marker bit (M bit): The marker bit of the RTP header is set to 1
  when the current packet carries the end of an Access Unit (AU).

- Payload Type (PT): Different payload types should be assigned for
  MPEG-4 ES, MPEG-4 SL-PDU, and MPEG-4 FlexMux streams. A payload type
  in the dynamic range should be chosen.

- Timestamp: The RTP timestamp is set to the composition timestamp
  (CTS), if its presence is indicated by the SLConfigDescriptor and
  if its length is not more than 32 bits. Otherwise, i.e. if the CTS
  is not present or when not using the OD framework, the RTP
  timestamp should be set to the sampling instant of the first AU
  contained in the packet. The RTP timestamp encodes in this case the
  presentation time of the first AU contained in the packet. The RTP
  timestamp may be the same on successive packets if an AU occupies
  more than one packet. If the packet contains only 'extension' data
  objects (see below), then the RTP timestamp is set to the value of
  the presentation time of the AU to which the first extension data
  object (e.g. FEC or redundant data) applies.

- SSRC: A mapping between the ES identifiers and the SSRCs should be
  provided via out-of-band signaling (e.g. SDP).

4.2 Payload Header

The payload header is always present, with a variable length, and is
defined as follows:

 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|G|E|    XT     |         LENGTH          |EBITS|               |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+               |
|                                                               .
+                Extension data                 +-+-+-+-+-+-+-+-+
.                                               |G|E|F|   res   |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|         LENGTH          |               FOFFSET               |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                                                               .
.                         Media Payload                         |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

                  Figure 5: RTP payload format

G (Group) (1 bit): If this field is 1, it indicates that the object
associated with the current header is followed by another object.

E (Extension) (1 bit): If its value is 1, then the next object
contains Extension data. If its value is 0, then the next object
contains AU data (a full AU or a partial AU, i.e. a typed segment).

res (Reserved) (5 bits): This field is only present if the E field is
0, so that {G,E=1,XT} and {G,E=0,F,res} each always occupy exactly 1
byte.

XT (Extension type) (6 bits): This field is only present if E is set
to 1. It then specifies the type of extension data. Examples of types
will be FEC data with the specification of the FEC coding scheme
(parity codes, block codes such as Reed-Solomon codes, etc.),
redundant data with duplicated high priority headers, etc.

LENGTH (13 bits): This field specifies the length in bytes of the next
object. If the object is the last object of the payload (G=0) then
this field is not present.

EBITS (3 bits): Indicates the number of bits that shall be ignored in
the last byte of the extension data. If the object is the last object
of the payload (G=0) then this field is not present.

F (Fragmentation) (1 bit): This field is only present when the E field
is 0. If its value is 1, then the next object is a fragment of a typed
segment. If this field is 0, then the next object is a complete typed
segment or a complete AU.

FOFFSET (16 bits): This field is present only when the F field is
present and F=1. It contains the byte offset, from the beginning of
the AU, of the first byte of the fragment. This field should rarely be
needed, but may be useful to position the segment in the AU when large
AUs (e.g. audio frames) have to be fragmented.
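The field definitions above can be turned into a small parser. The
following Python sketch is illustrative only and makes assumptions that
the text does not state explicitly: bits are read most-significant
first (G is the top bit of the first byte), LENGTH and EBITS together
occupy the two bytes that follow an extension header byte, and the
names ObjectHeader and parse_object_header are hypothetical. Because
the byte alignment of LENGTH and FOFFSET after a media header byte is
not spelled out, the sketch decodes only the first byte of a media
object header.

```python
from typing import NamedTuple, Optional


class ObjectHeader(NamedTuple):
    """Decoded payload header fields (hypothetical helper type)."""
    g: int                 # 1 if another object follows this one
    e: int                 # 1 if this header introduces extension data
    xt: Optional[int]      # extension type, only when e == 1
    f: Optional[int]       # fragmentation flag, only when e == 0
    length: Optional[int]  # object length in bytes (decoded for e == 1, g == 1)
    ebits: Optional[int]   # bits to ignore in the last byte (same condition)
    size: int              # number of header bytes consumed


def parse_object_header(buf: bytes, pos: int = 0) -> ObjectHeader:
    """Parse one payload header starting at buf[pos].

    Assumption: MSB-first bit order. LENGTH/FOFFSET of media objects
    are deliberately not decoded here.
    """
    first = buf[pos]
    g = (first >> 7) & 1
    e = (first >> 6) & 1
    if e == 0:
        f = (first >> 5) & 1          # the remaining 5 bits are reserved
        return ObjectHeader(g, e, None, f, None, None, 1)
    xt = first & 0x3F                 # low 6 bits carry the extension type
    if g == 0:
        # Last object of the payload: LENGTH and EBITS are absent.
        return ObjectHeader(g, e, xt, None, None, None, 1)
    if pos + 3 > len(buf):
        raise ValueError("truncated extension header")
    word = (buf[pos + 1] << 8) | buf[pos + 2]
    # Upper 13 bits: LENGTH, lower 3 bits: EBITS.
    return ObjectHeader(g, e, xt, None, word >> 3, word & 0x7, 3)
```

Under these assumptions, the three bytes C5 09 62 (hexadecimal) would
decode as G=1, E=1, XT=5, LENGTH=300 and EBITS=2.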
4.3 Payload for the transport of ES

An AU may be fragmented across packets. However, AU headers and
independently decodable partial AUs (or segments, e.g. video packets
in the case of video streams) shall not be split across RTP packets.

All AU-level decoder configuration information can be considered as
information of high priority, since, if it is lost, the whole AU is
lost. The extension data field may then be used for repeating the
corresponding headers.

4.4 Payload for the transport of SL-PDUs

First SL-PDU in the payload:

If the presence of the DTS (Decoding Time Stamp) is indicated by the
SLConfigDescriptor, then the DTS value is placed as the first data of
the media payload, the length of the field being provided by the
SLConfigDescriptor.

If the presence of the OCR (Object Clock Reference) is indicated by
the SLConfigDescriptor, then the OCR value is placed as the second
field of the media payload, the length of the field being provided by
the SLConfigDescriptor.

If the payload format is used to accommodate SL-packet streams, the
SN, if present, can be placed as the third field of the media payload.
Corresponding length values are provided by the SLConfigDescriptor.

If the resulting optional parameters consume a non-integer number of
bytes, zero padding bits must be inserted at the end of these
parameters to byte-align the rest of the payload.

+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|Payload Header | Optional Extension| Opt. parameters |  Media  |
|               |       data        | as indicated by |.........|
|               |                   |  SLConfigDesc   | payload |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   Figure 6: Portrait of the unified approach for transport of ES
             and SL packetized streams

In scenarios where the sync layer is used without a need for further
protection, the payload will be as illustrated in Figure 7.
 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|G|E|F|   res   | optional SL header parameters as indicated by .
+-+-+-+-+-+-+-+-+            the SLConfigDescriptor             .
|                                                               .
.                         Media payload                         |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

        Figure 7: Sample RTP payload for SL PDU transport.

N SL-PDUs in one RTP packet:

The first SL-PDU in the packet will be treated as above. Each of the
subsequent SL-PDUs will be a media object delimited by G, E, F, res
and LENGTH fields. The corresponding media object will start with the
SL-PDU header, immediately followed by the SL-PDU payload. The LENGTH
field will indicate the length of the corresponding SL-PDU.

4.5 Payload for the transport of FlexMux-PDUs

The RTP payload consists of one or more complete FlexMux-PDUs, as
shown in the figures below. Each FlexMux-PDU consists of an index
element that identifies the content of the FlexMux-PDU, the length of
the payload, a version number (only for MuxCode mode) and the payload
itself, as specified in [4].

The length, the index and the version (only in the MuxCode mode)
elements are placed as the first bytes in the media payload. If
preceded by an extension data field, the whole payload of the packet
will be as shown in Figure 8 below. If the extension data field is not
used, then the payload will be as shown in Figure 9.

 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|G|E|    XT     |         LENGTH          |EBITS|               |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+               |
|                                                               .
+                Extension data                 +-+-+-+-+-+-+-+-+
.                                               |G|E|F|   res   |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|     index     |    length     |    version    |               |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+               |
|                                                               .
.                      FlexMux PDU Payload                      |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   Figure 8: Sample RTP payload for FlexMux-PDU transport with
             protection support.

The extension data mechanism makes it possible to apply FEC directly
to FlexMux PDUs and hence, especially in the simple FlexMux mode, to
apply different levels of protection to FlexMux PDUs transporting data
from streams of different types (e.g. BIFS, OD, IPMP, Audio, Video).

 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|G|E|F|   res   |     index     |    length     |    version    |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                                                               |
.                      FlexMux PDU payload                      .
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

     Figure 9: Sample RTP payload for FlexMux-PDU transport

5 Extension data field for FEC data

5.1 Extension data field for Parity Codes

The Extension data field can be used for transporting FEC (parity
codes) data in the spirit of [9]. The XT field is set to the type
associated with the FEC mechanism (parity codes) used. The semantics
of the XT field, covering all the FEC mechanisms supported, are
announced via non-RTP out-of-band signaling, such as SDP [7], with
appropriate extensions. The FEC mechanisms can then be adapted during
the session, depending on the segment type and on the network
characteristics, with simple in-band signaling.

The FEC operation, as defined in [9], acts on a stream of media
packets without extension data, and generates a stream of FEC packets.
The media payload of the above media packets is then encapsulated in
the object containing the AU data. The FEC header and FEC data are
encapsulated in the extension data field. The extension data length
field is set to the length of the FEC header plus FEC payload.
The FEC header in the case of parity codes is given in Figure 10. It
is derived from the header specified in [9], with the following
modifications:

1) the PT recovery field is not used, since the payload type of the
   packets transported in a given channel is supposed to be known,
   namely to be of the type corresponding to this proposed payload;

2) an R bit has been added in order to protect the marker bit of the
   media packets;

3) in order for the FEC header to be byte-aligned, it is also proposed
   to reduce the mask length by 2 bits (22 bits instead of 24). This
   should be acceptable, since a mask of 24 bits induces a very high
   delay.

+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|            SN Base            |        length recovery        |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|E|R|                   Mask                    |               .
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
.                  TS Recovery                  |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

             Figure 10: FEC Header for Parity Codes

On the receiver side, the FEC packets will be reconstructed as defined
in [9], by copying the sequence number, SSRC, CC field, RTP version
and extension bit from the RTP header of the packets received.

The fields SN Base, E, Mask and TS Recovery of the FEC header are
defined as in [9]. The R bit is the marker recovery bit. It is
computed by applying the protection operation to the marker bits (M)
of the RTP media packets.

The Length Recovery field determines the length of the recovered
packets and is here computed via the protection operation applied to
the 16-bit natural binary representation of the lengths (in bytes) of
the media payload, CSRC list, extension and padding of the media
packets associated with this FEC data, PLUS THE MARKER BIT.

The length recovery field makes it possible to apply the procedure to
media packets that are not of the same length.
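The parity protection described above can be illustrated with a short
Python sketch. It assumes that the protection operation is a bytewise
XOR, as is usual for parity codes, with shorter payloads padded with
zero bytes. It is a simplification: it covers only the payload bytes,
the length recovery value and the marker recovery bit R, it takes the
protected length to be the payload length alone, and it handles the
loss of a single packet. The function names are hypothetical and the
sketch does not implement the full procedure of [9].

```python
from typing import List, Tuple

Packet = Tuple[int, bytes]        # (marker bit M, media payload)
Parity = Tuple[int, int, bytes]   # (length recovery, R bit, FEC payload)


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """XOR two byte strings, zero-padding the shorter one."""
    n = max(len(a), len(b))
    a = a.ljust(n, b"\x00")
    b = b.ljust(n, b"\x00")
    return bytes(x ^ y for x, y in zip(a, b))


def make_parity(packets: List[Packet]) -> Parity:
    """Build parity data protecting a group of media packets."""
    length_recovery, r_bit, data = 0, 0, b""
    for marker, payload in packets:
        length_recovery ^= len(payload)   # protects the payload lengths
        r_bit ^= marker                   # protects the marker bits
        data = xor_bytes(data, payload)   # protects the payload bytes
    return length_recovery, r_bit, data


def recover(parity: Parity, received: List[Packet]) -> Packet:
    """Rebuild the single missing packet of a protected group."""
    length_recovery, r_bit, data = parity
    for marker, payload in received:
        length_recovery ^= len(payload)
        r_bit ^= marker
        data = xor_bytes(data, payload)
    # Whatever remains after XOR-ing out the received packets is the
    # missing one; the recovered length trims the zero padding.
    return r_bit, data[:length_recovery]
```

With a group of three packets of different lengths, XOR-ing the parity
data with the two packets that arrived yields the marker bit and the
exact payload of the third, which is what the length recovery field is
for.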
The protection also applies to sync layer parameters when present in
the payload of the media packets.

The advantage of the approach, with respect to having separate FEC
packets, is a reduced overhead for sending the FEC data.

It is also proposed to allocate 3 Extension Types to parity codes,
with 3 different default masks, in order to reduce the overhead of the
FEC header, which would therefore become as in Figure 11 below:

+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|            SN Base            |        length recovery        |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|E|R|                        TS Recovery                        .
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
.   |    res    |
+-+-+-+-+-+-+-+-+

   Figure 11: Simplified FEC Header for Parity Codes (with default
              masks)

The Extension data field can likewise be used for transporting FEC
data based on Reed-Solomon codes, in the spirit of [10]. The XT field
is set to the type associated with the FEC mechanism used. The
semantics of the XT field, covering all the FEC mechanisms supported,
are announced via non-RTP out-of-band signaling, such as SDP [7], with
appropriate extensions.

The FEC operation, as defined in [10], acts on a stream of media
packets without extension data, generating a stream of FEC packets.
The media payload of the above media packets is then encapsulated in
the object containing the AU data. The FEC header and FEC data are
encapsulated in the extension data field. The extension data length
field is set to the length of the FEC header plus FEC payload.

The FEC header for Reed-Solomon codes is provided in Figure 12. It is
derived from the header specified in [10], with the following
modifications:

1) the PT recovery field is not used, since the payload type of the
   packets transported in a given channel is supposed to be known,
   namely to be of the type corresponding to this proposed payload;

2) an R bit has been added in order to protect the marker
[Page 15] Internet-Draft Payload Format for MPEG-4 Streams March 2000 bit of the media packets; 3)- In order for the FEC header to be byte-aligned, it is also proposed to reduce the length of the K field to 6 bits instead of 8 bits. Indeed, 8 bits would allow to process 256 media packets inducing a very high delay. The length of the N field is also reduced to 7 bits (corresponding to the maximum code rate of ») instead of 8 bits, and accordingly reduce the length of the i field from 8 to 6 bits, since the i field indicates the po- sition of the packet within the N-K FEC packets.4)- A P field has been added allowing for interleaving in order to create a FEC code capable of correcting longer bursts of packet losses. The P field defines the interleaving periodicity minus 1, as illustrated in fig- ure 11 below for the special case of P=7. +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | SN Base | length recovery | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ |E|R| N | k | i | P |TS Recovery . +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ . TS Recovery (cnt'd) | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ Figure 12: FEC Header for Reed-Solomon Codes The advantage of the approach - with respect to having separate FEC packets - is a reduced overhead for sending the FEC data. +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | 1 | 8 | 15 | 22 | à.. |mn-6 | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | 2 | 9 | 16 | 23 | à. |mn-5 | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ . . +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | 7 | 14 | 21 | 28 | à.. |mn | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ Figure 13: Example of Interleaving (for P=7) Guillemot/Christ/Wesner/Klemets. [Page 16] Internet-Draft Payload Format for MPEG-4 Streams March 2000 6 Multiplexing MPEG-4 applications can involve a large number of ESs, and thus also a large number of RTP sessions. 
A multiplexing scheme allowing selective bundling of ESs may
therefore be necessary for some applications. The multiplexing
problem is outside the scope of this payload format.

7 Security Considerations

RTP packets transporting information with the proposed payload
format are subject to the security considerations discussed in the
RTP specification [1]. This implies that confidentiality of the
media streams is achieved by encryption. If the entire stream
(extension data and AU data) is to be secured and all the
participants are expected to have the keys to decode the entire
stream, then the encryption is performed in the usual manner, and
there is no conflict between the two operations (encapsulation and
encryption).

The need for a portion of the stream (e.g. extension data) to be
encrypted with a different key, or not to be encrypted, would
require application-level signaling protocols to be aware of the
usage of the XT field, and to exchange keys and negotiate their
usage on the media and extension data separately.

8 Authors' Addresses

Christine Guillemot
INRIA
Campus Universitaire de Beaulieu
35042 RENNES Cedex, FRANCE
email: Christine.Guillemot@irisa.fr

Paul Christ
Computer Center - RUS
University of Stuttgart
Allmandring 30
D70550 Stuttgart, Germany
email: Paul.Christ@rus.uni-stuttgart.de

Stefan Wesner
Computer Center - RUS
University of Stuttgart
Allmandring 30
D70550 Stuttgart, Germany
email: wesner@rus.uni-stuttgart.de

Anders Klemets
1 Microsoft Way
Redmond, WA 98052-6399, USA
email: anderskl@microsoft.com

9 References

[1] H. Schulzrinne, S. Casner, R. Frederick, V. Jacobson, "RTP: A
    Transport Protocol for Real Time Applications", RFC 1889,
    Internet Engineering Task Force, January 1996.

[2] A. Klemets, "Common Generic RTP Payload Format",
    draft-klemets-generic-rtp-00, March 13, 1998.

[3] A. Periyannan, D. Singer, M. Speer, "Delivering Media
    Generically over RTP", draft-periyannan-generic-rtp-00,
    March 13, 1998.

[4] ISO/IEC 14496-1 FDIS, MPEG-4 Systems, November 1998.

[5] ISO/IEC 14496-2 FDIS, MPEG-4 Visual, November 1998.

[6] M. Handley, V. Jacobson, "SDP: Session Description Protocol",
    draft-ietf-mmusic-sdp-07.txt, April 2, 1998.

[7] ISO/IEC 14496-3 FDIS, MPEG-4 Audio, November 1998.

[8] J. Rosenberg, H. Schulzrinne, "An RTP Payload format for Generic
    Forward Error Correction", draft-ietf-avt-fec-05.txt,
    February 26, 1999.

[9] J. Rosenberg, H. Schulzrinne, "An RTP Payload format for Reed
    Solomon Codes", draft-ietf-avt-reedsolomon-00.txt,
    November 3, 1998.