Internet Engineering Task Force G. Liebl Internet Draft LNT, Munich Univ. of Technology Document: draft-ietf-avt-uxp-06.txt October 2003 M. Wagner, J. Pandel, W. Weng Expires: April 2004 Siemens AG, Munich An RTP Payload Format for Erasure-Resilient Transmission of Progressive Multimedia Streams Status of this Memo This document is an Internet-Draft and is in full conformance with all provisions of Section 10 of RFC2026. Internet-Drafts are working documents of the Internet Engineering Task Force (IETF), its areas, and its working groups. Note that other groups may also distribute working documents as Internet- Drafts. Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet- Drafts as reference material or to cite them other than as "work in progress." The list of current Internet-Drafts can be accessed at http://www.ietf.org/ietf/1id-abstracts.txt The list of Internet-Draft Shadow Directories can be accessed at http://www.ietf.org/shadow.html. Abstract This document specifies an efficient way to ensure erasure- resilient transmission of progressively encoded multimedia sources via RTP using Reed-Solomon (RS) codes together with interleaving. The level of erasure protection can be explicitly adapted to the importance of the respective parts in the source stream, thus allowing a graceful degradation of application quality with increasing packet loss rate on the network. Hence, this type of unequal erasure protection (UXP) schemes is intended to cope with the rapidly varying channel conditions on wireless access links to the Internet backbone. Furthermore, protection of non-progressive multimedia streams is ensured, since equal erasure protection (EXP) represents a subset of generic UXP. By applying interleaving and RS codes a payload format is defined, which can be easily integrated into the existing framework for RTP. Table of Contents 1. Introduction.................................................2 2. Conventions used in this Document............................3 3. Reed-Solomon Codes...........................................6 4. Progressive Source Coding....................................7 5. General Structure of UXP Schemes.............................8 Liebl,Wagner,Pandel,Weng [Page1] Internet Draft Unequal Erasure Protection October 2003 6. RTP payload structure.......................................13 7. Indication of UXP in SDP....................................20 8. Security Considerations.....................................21 9. Application Statement.......................................22 10. Intellectual Property Considerations.......................23 11. References.................................................23 12. Acknowledgments............................................24 13. Author's Addresses.........................................24 1. Introduction Due to the increasing popularity of high-quality multimedia applications over the Internet and the high level of public acceptance of existing mobile communication systems, there is a strong demand for a future combination of these two techniques: One possible scenario consists of an integrated communication environment, where users can set up multimedia connections anytime and anywhere via radio access links to the Internet. For this reason, several packet-oriented transmission modes like EGPRS (Enhanced General Packet Radio Service) or UMTS (Universal Mobile Telecommunications System) can be used, which are mostly based on the same principle: Long message blocks, i.e. IP packets, that enter the wireless part of the network are split up into segments of desired length, which can be multiplexed onto link layer packets of fixed size. The latter are then transmitted sequentially over the wireless link, reassembled, and passed on to the next network element. However, compared to the rather benign channel characteristics on today's fixed networks, wireless links suffer from severe fading, noise, and interference conditions in general, thus resulting in a comparably high residual bit error rate after detection and decoding. By use of efficient CRC-mechanisms, these bit errors are usually detected with very high probability, and every corrupted segment, i.e. which contains at least one erroneous bit, is discarded to prevent error propagation through the network. But if only one single segment is missing at the reassemble stage, the upper layer IP packet cannot be reconstructed anymore. The result is a significant increase in packet loss rate at IP level. Since most multimedia applications can only recover from a very limited number of lost IP packets, it is vitally necessary to keep packet loss at IP level within a certain acceptable range depending on the individual quality-of-service requirements. However, due to the delay constraints typically imposed by most audio or video codecs, the use of ARQ-schemes is often prohibited both at link level and at transport level. In addition, retransmission strategies cannot be applied to any broadcast or multicast scenarios. Thus, forward erasure correction strategies have to be considered, which provide a simple means to Liebl,Wagner,Pandel,Weng [Page 2] Internet Draft Unequal Erasure Protection October 2003 reconstruct the content of lost packets at the receiver from the redundancy that has been spread out over a certain number of consecutive packets. There already exist some previous studies and proposals regarding erasure-resilient packet transmission [1,8]. Since most of them are based on the assumption that all parts in a message block are equally important to the receiver, i.e. the respective application cannot operate on partly complete blocks, they were optimized with respect to assigning equal erasure protection over the whole message block. However, recent developments both in audio and video coding have introduced the notion of progressively encoded media streams, for which unequal erasure protection strategies seem to be more promising, as it will be explained in more detail below. Although the scheme defined in [1] is in principle capable of supporting some kind of unequal erasure protection, possible implementations seem to be quite complex with respect to the gain in performance. Finally, in [1] it is assumed that consecutive RTP packets can have variable length, which would cause significant segmentation overhead at the link layer of almost all wireless systems. This document defines a payload format for RTP, such that different elements in a progressively encoded multimedia stream can be protected against packet erasures according to their respective quality-of-service requirement. The general principle, including the use of Reed-Solomon codes together with an appropriate interleaving scheme for adding redundancy, follows the ideas already presented in [3], but allows for finer granularity in the structure of the progressive media stream. The proposed scheme is generic in the way that it (1) is independent of the type of media stream, be it audio or video, and (2) can be adapted to varying transmission quality very quickly by use of inband-signaling. 2. Conventions used in this Document The following terms are used throughout this document: 1.) Segment: denotes a link layer transport unit. 2.) Segmentation/Reassembly Process: If the size of the transport units at the link layer is smaller than that at the upper layers, message blocks have to be split up into several parts, i.e. segments, which are then transmitted subsequently over the link. If nothing is lost, the original message block can be restored at the receiving entity (reassembly). 3.) Codec: denotes a functional pair consisting of a source encoding unit at the sender and a corresponding source decoding unit at the receiver; usually standardized for different media applications like audio or video. Liebl,Wagner,Pandel,Weng [Page 3] Internet Draft Unequal Erasure Protection October 2003 4.) Media stream: A bitstream. which results at the output of an encoder for a specific media type, e.g. H.263, MPEG-4 Visual. 5.) Progressive media stream: A media stream which can be divided into successive elements. The distinct elements are of different importance to the decoding process and are commonly ordered from highest to least importance, where the latter elements depend on the previous. 6.) Progressive source coding: results in a progressive media stream. 7.) Reed-Solomon (RS) code: belongs to the class of linear nonbinary block codes, and is uniquely specified by the block length n, the number of parity symbols t, and the symbol alphabet. 8.) n: is a variable, which denotes both the block length of a RS codeword, and the number of columns in a TB (see 19). 9.) k: is a variable, which denotes the number of information symbols in an RS codeword. 10.) t: is a variable, which denotes the number of parity symbols in an RS codeword. 11.) Erasure: When a packet is lost during transmission, an erasure is said to have happened. Since the position of the erased packet in a sequence is usually known, a corresponding erasure marker can be set at the receiving entity. 12.) Base layer: comprises the first and most important elements of the progressive media stream, without which all subsequent information is useless. 13.) Enhancement layer: comprises one or more sets of the less important subsequent elements of the progressive media stream. A specific enhancement layer can be decoded, if and only if the base layer and all previous enhancement layer data (of higher importance) are available. 14.) Info stream: denotes the bitstream which has to be protected by the UXP scheme. It usually consists of the media stream (progressively source encoded or not), which is arranged according to a desired syntax (e.g. to achieve an appropriate framing, see Sect. 6.3 ). In any case, it is assumed that every info stream is already octet-aligned according to the standard procedures defined in the context of the used syntax specifications. 15.) Info octet: Denotes one element of the info stream. 16.) Transmission block (TB): denotes a memory array of L rows and n columns. Each row of a TB represents a RS codeword, whereas each column, together with the respective UXP header (see 36) in front, forms the payload of a single RTP packet. Each TB consists of at least two distinct transmission sub blocks (TSB, see20): The first L_s rows belong to the signaling TSB, whereas the last L_d=(L-L_s) rows belong to one or more data TSB. 17.) Transmission sub block (TSB): denotes a memory array of 0 +-+-+-+-+-+-+-+-+-+ |&|&|&|&|&|&|&|*|*| +-+-+-+-+-+-+-+-+-+ <------------><---> k=n-t t (&:info) (*:parity) Fig. 1: Structure of a systematic RS codeword Liebl,Wagner,Pandel,Weng [Page 6] Internet Draft Unequal Erasure Protection October 2003 4. Progressive Source Coding The output of an encoder for a specific media type, e.g. H.263 or MPEG-4 Visual is said to be a media stream. If the media stream consists of several distinct elements, which are of different importance with respect to the quality of the decoding process at the receiver, then the media stream is progressive. The progressive media stream is often organized in separate layers. Hence, there exists at least one layer, often called base layer, without which decoding fails at all, whereas all the other layers, often called enhancement layers, just help to continually improve the quality. Consequently, the different layers are usually contained in the (source-)encoded media stream in decreasing order of importance, i.e. the base layer data is followed by the various enhancement layers. An example can be found in the fine granular scalability modes which have been proposed to various standardization bodies like MPEG, where the resolution of the scaling process in the progressive source encoder is as low as one symbol in the enhancement layer [4]. Another example is given by data partitioning which can be applied to the ITU/MPEG H.26L standard [5], MPEG-4, and H.263++. Also, the existence of I,P, and B frames in streams which comply with standards like MPEG-2 can be interpreted as progressive. From the above definition, it is quite obvious that the most important base layer data must be protected as strongly as possible against packet loss during transmission. However, the protection of the enhancement layers can be continually lowered, since a loss at these stages has only minor consequences for the decoding process. Thus, by using a suitable unequal erasure protection strategy across a progressive media stream, the overhead due to redundancy is reduced. Furthermore, if channel conditions get worse during transmission, only more and more enhancement layers are lost, i.e. a graceful degradation in application quality at the receiver is achieved [6]. Nevertheless, it should be mentioned that the specific structure of the media stream strongly depends on the actual media codec in use and does not always provide suitable mechanisms for transport over data networks, like framing (see also Sect. 6.3 ). In order to keep the description of the unequal erasure protection strategy in Sect. 5 as general as possible, the final bitstream which has to be protected by the proposed UXP scheme will be called "info stream" in the following. Furthermore, it is assumed that every info stream is already octet-aligned according to the standard procedures defined in the context of the used syntax specifications. Liebl,Wagner,Pandel,Weng [Page 7] Internet Draft Unequal Erasure Protection October 2003 5. General Structure of UXP Schemes In this section, the principle features of the proposed UXP scheme are described with a special focus on the protection and reconstruction procedure which is applied to the info stream. In addition, the behavior of the sender and receiver is specified as far as it concerns the reconstruction of the info stream. However, the complete UXP payload structure, including the additional UXP header, is described in Sect. 6. The reason for using the term "info stream" as well as the details of the construction are described in Sect. 6.3 . For now, we assume that we have an info stream which has to be protected. Fig. 1 already illustrated the structure of a systematic RS codeword, which shall be represented by a single row with n successive symbols that contain the information and the parity octets. This structure shall now be extended by forming a transmission block (TB) consisting of L codewords of length n octets each, which amounts to a total of L rows and n columns [7]: Each column, together with the respective UXP header in front, shall represent the payload of an RTP packet, i.e. the whole data of a TB is transmitted via a sequence of n RTP packets all carrying a payload of length (L+2) octets (UXP header included). Each TB usually consists of two or more horizontal sub blocks, the so-called transmission sub blocks (TSB), as can be seen in Fig. 2: The first L_s rows always belong to the signaling TSB, which is used to convey the actual redundancy profile in the data part to the receiver (see 6.4.). The following L_d=(L-L_s) rows belong to one or more data TSBs, which contain the interleaved and RS encoded info stream, as will be described below. Liebl,Wagner,Pandel,Weng [Page 8] Internet Draft Unequal Erasure Protection October 2003 Transmission Block (TB) /\ +-+-+-+-+-+-+-+-+-+ /\ | | signaling TSB | | L_s octets | +-+-+-+-+-+-+-+-+-+ \/ | | | /\ /\ | + data TSB #1 + | L_d(1) octets | | | | | | | +-+-+-+-+-+-+-+-+-+ \/ | L octets | | | /\ | payload | + data TSB #2 + | L_d(2) octets | per packet | + | | | L_d oct. | +-+-+-+-+-+-+-+-+-+ \/ | | | . | . | | + . + . | | | . | . | | +-+-+-+-+-+-+-+-+-+ /\ | | | data TSB #z | | L_d(z) octets | \/ +-+-+-+-+-+-+-+-+-+ \/ \/ <-----------------> n packets Fig. 2: General structure of a TB Since the UXP procedure is mainly applied to the data TSBs, it will be described next, whereas the content and syntax of the signaling TSB will be defined in section 6.4. For means of simplification, only one single data TSB will be assumed throughout the following explanation of the encoding and decoding procedure. However, an extension to more than one data TSB per TB is straightforward, and will be shown in section 6.5. As depicted in Fig. 3, the rows of a transmission sub block shall be assembled into T+1 different classes EPC_i, where i=0...T, such that each class contains exactly R_i=|EPC_i| consecutive rows of the matrix, where the R_i have to satisfy the following relationship: R_0+R_1+...+R_T=L_d Liebl,Wagner,Pandel,Weng [Page 9] Internet Draft Unequal Erasure Protection October 2003 Data Transmission Sub Block (data TSB) T <-------> /\ +-+-+-+-+-+-+-+-+-+ /\ | |&|&|&|&|&|*|*|*|*| | | +-+-+-+-+-+-+-+-+-+ | R_T=3 | |&|&|&|&|&|*|*|*|*| | | +-+-+-+-+-+-+-+-+-+ | L_d octets | |&|&|&|&|&|*|*|*|*| \/ per packet | +-+-+-+-+-+-+-+-+-+ /\ | |%|%|%|%|%|%|*|*|*| | R_(T-1)=1 | +-+-+-+-+-+-+-+-+-+ \/ | |$|$|$|$|$|$|$|*|*| . | +-+-+-+-+-+-+-+-+-+ . | |!|!|!|!|!|!|!|!|*| . | +-+-+-+-+-+-+-+-+-+ /\ | |#|#|#|#|#|#|#|#|#| | R_0=1 \/ +-+-+-+-+-+-+-+-+-+ \/ <-----------------> n packets &,%,$,!,# : info octets belonging to a certain info stream in decreasing order of importance * : parity octets gained from Reed-Solomon coding Fig. 3: General structure for coding with unequal erasure protection Furthermore, all rows in a particular class EPC_i shall contain exactly the same number of parity octets, which is equal to the index i of the class. For each row in a certain class EPC_i, the same (n,n-i) RS code shall be applied. As can be observed from Fig. 3, class EPC_T contains the largest number of parity octets per row, i.e. offers the highest erasure protection capability in the block. Consequently, the most important element in the info stream must be assigned to class EPC_T, where the value of T should be chosen according to the desired outage threshold of the application given a certain packet erasure rate on the link. All other classes EPC_(T-1)...EPC_0 shall be sequentially filled with the remaining elements of the info stream in decreasing order of importance, where the optimal choice for the size of each class (0 or more rows), i.e. the structure of the redundancy profile, should depend on the quality-of-service requirements for the various (progressively-encoded) layers. The following set of rules contains a compact description of all the operations that must be performed for each transmission block: 1.) The total number of columns n of the TB shall be chosen according to the actual delay constraints of the application. Liebl,Wagner,Pandel,Weng [Page 10] Internet Draft Unequal Erasure Protection October 2003 2.) Next, the expected number of rows reserved for the signaling TSB has to selected, which limits the data TSB to L_d=(L-L_s) rows. 3.) The maximum erasure correction capability T in the data TSB should be chosen according to the desired outage threshold of the application given the actual packet erasure rate on the link. 4.) The redundancy profile for the rest of the data TSB should depend on the size and number of the various layers in the info stream, as well as the desired probability of successful decoding for each of them (quality-of-service requirement). 5.) Any suitable optimization algorithm may be used for deriving an adequate redundancy profile. However, the result has to satisfy the following constraints: a) All available info octet positions in the data TSB have to be completely filled. If the info stream is too short for a desired profile, media stuffing may be applied to the empty info octet positions at the end of the data TSB by appending a sufficient number of octets (with arbitrary value, e.g. 0x00). The actual number of stuffing symbols per data TSB is then signaled via the respective stuffing indicator (see Sect. 6.4.). However, before resorting to any stuffing, it should be checked whether it is possible to strengthen the protection of certain rows instead, thus improving the overall robustness of the decoding process. b) The info stream SHOULD be fully contained within the data TSB (unless cutting it off at a specific point is explicitly allowed by the properties of the info stream). c) The number of required descriptors and stuffing indicators (see section 6.4.) to signal the profile SHALL NOT exceed the space initially reserved for them in the signaling TSB. Constraints a) and b) should be already incorporated in the optimization algorithm. However, if constraint c) is not met, the data TSB has to be reduced by one row in favor of the signaling TSB to accommodate more space for the descriptors and stuffing indicators, i.e. steps 2-5 have to be repeated until a valid redundancy profile has been obtained. 6.) For each nonempty class EPC_i, i=T...0, in the data TSB, the following steps have to be performed: a) All rows of this specific class SHALL be filled from left to right and top to bottom with data octets of the info stream. b) For each row in the class, the required i parity-check octets are computed from the same set of codewords of an (n,n-i) RS code, and filled in the empty positions at the end of each row. Thus, every row in the class constitutes a valid codeword of the chosen RS code. 7.) After having filled the whole data TSB with information and parity octets, the redundancy profile is mapped to the signaling TSB as described in section 6.4. 8.) Each column of the resulting TB is now read out octet-wise from top to bottom and, together with the respective UXP header (see section 6.2.) in front, is mapped onto the payload section of one and only one RTP packet. Liebl,Wagner,Pandel,Weng [Page 11] Internet Draft Unequal Erasure Protection October 2003 9.) The n resulting RTP packets SHALL be transmitted consecutively to the remote host, starting with the leftmost one. 10.) At the corresponding protocol entity at the remote host, the payload (without the UXP header) of all successfully received RTP packets belonging to the same sending TB SHALL be filled into a similar receiving TB column-wise from top to bottom and left to right. 11.) For every erased packet of a received TB, the respective column in the TB shall be filled with a suitable erasure marker. 12.) Before any other operations can be performed, the redundancy profile has to be restored from the signaling TSB according to the procedure defined in Sect. 6.4.. If the attempt fails because of too many lost packets, the whole TB SHALL be discarded and the receiving entity should wait for the next incoming TB. 13.) If the attempt to recover the redundancy profile has been successful, a decoding operation shall be performed for each row of the data TSB by applying any suitable algorithm for erasure decoding. 14.) For all rows of the data TSB for which the decoding operation has been successful, the reconstructed data octets are read out from left to right and top to bottom, and appended to the reconstructed version of the info stream. One can easily realize that the above rules describe an interleaver, i.e. at the sender a single codeword of a TB is spread out over n successive packets. Thus, each codeword of a transmitted TB experiences the same number of erasures at exactly the same positions. Two important conclusions can be drawn from this: a) Since the same RS code is applied to all rows contained in a specific class, either all of them can be correctly decoded or none. Hence, there exist no partly decodable classes at the receiver. b) If decoding is successful for a certain class EPC_i, all the classes EPC_(i+1)...EPC_T can also be decoded, since they are protected by at least one more parity octet per row. Together with rule 6, it is therefore always ensured, that in case a decodable enhancement layer exists, all other layers it depends on can also be reconstructed! Given the maximum erasure protection value T, the redundancy profile for a data TSB of size (L_d x n) shall be denoted by a so-called erasure protection vector EPV of length (T+1), where EPV:=(R_0,R_1,...,R_(T-1),R_T) From the above definition, it is easy to realize that the trivial cases of no erasure protection and EXP are a subset of UXP: a) no erasure protection at all: all application data is mapped onto class EPC_0, i.e. EPV=(L_d,0,0,...,0). b) EXP: all application data is mapped onto class EPC_T, i.e. EPV=(0,0,...,0,R_T=L_d). Liebl,Wagner,Pandel,Weng [Page 12] Internet Draft Unequal Erasure Protection October 2003 Hence, the UXP payload format also can be used with info streams which are non progressive. 6. RTP payload structure This section is organized as follows. First, the specific settings in the RTP header are shown. Next, the RTP payload header for UXP (the so-called UXP header) is specified. After that, the structure of the bitstream which is protected by UXP, the so-called info stream, is discussed. Finally, the in-band signaling of the erasure protection vector is introduced. For every packet, the UXP payload is formed by reading out a column of the TB and prefixing it with the UXP header. Thus, an UXP-compliant RTP packet looks as follows: +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+- |RTP Header| UXP Header| one column of the TB | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+- 6.1 Specific Settings in the RTP Header The timestamp of each RTP packet is set to the sampling time of the first octet of the progressive media stream in the corresponding TB. If several data TSBs are included in one TB, the sampling time of data TSB #1 is relevant. This results in the TS value being the same for all RTP packets belonging to a specific TB. The payload type is of dynamic type, and obtained through out-of- band signaling similar to [1]. End systems, which cannot recognize a payload type, must discard it. The marker bit is set to 1 in the last packet of a TB; otherwise, its value is 0. All other fields in the RTP header are set to those values proposed for regular multimedia transmission using the RTP-format of the media stream which is protected by UXP, e.g for MPEG-4 Visual as specified in RFC 3016. 6.2. Structure of the UXP Header The UXP header shall consist of 2 octets, and is shown in Fig. 4: Liebl,Wagner,Pandel,Weng [Page 13] Internet Draft Unequal Erasure Protection October 2003 0 1 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ |X| block PT | block length n| +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ Fig. 4: Proposed UXP header The fields in the UXP header are defined as follows: - X (bit 0): extension bit, reserved for future enhancements, currently not in use -> default value: 0 - block PT (bits 1-7): regular RTP payload type to indicate the media type contained in the info stream - block length n (bits 8-15): indicates total number of RTP - packets resulting from one TB (which equals the number of columns of the TB) The syntax of the info stream which is protected by UXP is specified by the RTP payload type field contained in the UXP header. The details of the info stream are described in Sec. 6.3 For example, payload type H.263 means that the info stream conforms to the specifications of the RTP profile for H.263 and does not represent the "raw" H.263 media stream produced by an H.263 encoder. However, UXP can also be applied to the "raw" media stream (in case it is already octet-aligned), if this can be signaled to the receiver via other means, e.g. by use of H.245 or SDP. Based on the RTP sequence number, the marker bit, and the repetition of the block length n in each UXP header, the receiving entity is able to recognize both TB boundaries and the actual position of packets (both received and lost ones) in the TB. 6.3 Framing and Timing Mechanism in UXP: The Info Stream As described in Sect. 5, UXP creates its own packetization scheme by interleaving. The regular framing and timing structure of RTP is therefore destroyed. This section describes which kind of problems arise with interleaving and how they can be solved. This finally leads to the specification of the info stream. The timestamp of an RTP packet usually describes the sampling time of the first octet included in the RTP data packet. This is in principle also true for UXP RTP packets. According to the time stamp definition in Sect. 6.1 every packet contains the timestamp of the sampling time of the first octet in the corresponding TB. Therefore, all packets which belong to one TB contain the same timestamp. This can lead to problems since due to the theoretical size limit of a TB (the limit for the number of columns is 256, and the limit for the number of rows is the maximum packet size), it can contain data from different sampling time instances, e.g. several video frames. Then the timing Liebl,Wagner,Pandel,Weng [Page 14] Internet Draft Unequal Erasure Protection October 2003 information of the later frames has to be determined from the media stream itself and not from the RTP timestamp. A second problem arising with interleaving is that the framing mechanism of RTP is not supported. Consider a media encoder, which does not create a fully decodable bitstream, e.g. H.26L with the video coding layer (VCL) and network adaptation layer (NAL) concept [9]. In this concept the VCL creates slices which are prepared for transmission over several networks at the NAL. Consequently, in case of RTP transmission, header information which allows to decode the slices is included only in the RTP packets. Thus, to fill an UXP TB with the "raw" media stream from the VCL can lead, even without packet losses, to a non-decodable stream. The framing problem can be solved in two ways: One solution could be to use the RTP payload specification of a given media stream to create a bitstream with an appropriate framing, resulting in the so-called info stream. For example, to create an H.263 info stream, the following steps are necessary: 1.) Generate an H.263-compliant media stream, i.e. take a slice or a video frame directly from the H.263 encoder. 2.) Apply the H.263 payload specification (e.g. RFC 2429) to create the RTP payload for only one packet. 3.) Insert the latter row by row into one data TSB. It is possible to apply the procedure mentioned above several times for different data TSBs (see Sect. 6.5.). Due to the in- band signaling, it is possible to determine the beginning and end of every TSB without parsing the whole TB. This allows a fast decomposition of the TB into the different TSBs. Another solution of the framing problem would be to rely on the framing mechanism of the media stream. This is, for example, possible for media streams which contain start codes. The timing problem can be solved in two ways. One solution is to comply with the RTP payload specification of the media stream. If the specification allows to put into one packet octets which belong to different sampling times, this should also be allowed for a TB. The second solution for the timing problem is to rely on the timing information contained in the media stream itself, if available. Therefore, there are two different modes for framing: 1.) RTP payload framing (if an RTP payload specification exists for the media stream), 2.) pure media stream framing (if framing is contained in the media stream), and two different modes for timing: 1.) timing rules of the RTP payload specification for the media stream, 2.) timing information within the media stream. Liebl,Wagner,Pandel,Weng [Page 15] Internet Draft Unequal Erasure Protection October 2003 All combinations of timing and framing modes are possible, but framing mode 1 and timing mode 1 represent the default mode of operation for UXP. The use of other timing and framing modes has to be signaled by non RTP means. The info stream is thus defined by the media stream together with framing and timing rules. In the following, some examples will be given: 1.) The info stream for MPEG-4 Visual according to RFC 3016 is the pure MPEG-4 compliant media stream, since RFC 3016 specifies (in case of video) to take the MPEG-4 compliant video stream as payload. 2.) The info stream for H.263+ can be created according to RFC 2429 as follows: +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+- |H.263+ payload| H.263+ compliant stream (possibly changed with| |header | respect to RFC 2429) containing a slice/frame | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+- This info stream is inserted into one single data TSB. If necessary, for example, if the slices are too short to achieve a reasonable TB size, several info streams can be inserted in one TB by concatenating several data TSBs to a single TB (see Sect. 6.5.). 6.4. In-band Signaling of the Structure of the Redundancy Profile To enable a dynamic adaptation to varying link conditions, the actual redundancy profile used in the data TSB as well as the beginning and end of a TSB must be signaled to the receiving entity. Since out-of-band signaling either results in excessive additional control traffic, or prevents quick changes of the profile between successive TBs, an in-band signaling procedure is desired. Since without knowledge of the correct redundancy profile, the decoding process cannot be applied to any of the erasure protection classes, the redundancy profile has to be protected at least as strongly as the most important element in the info stream. Therefore, an additional class EPC_P is used in the signaling TSB, where the number of parity symbols is by default set to the following value: P=ceil(n/2.0) Hence, up to 50% of the RTP packets can be lost, before the redundancy profile cannot be recovered anymore. This seems to be a reasonable value for the lowest point of operation over a lossy link. Alternatively, P may be explicitly signaled during session setup by means of SDP or H.245 protocol. Consequently, since all other classes must have equal or less erasure protection capability, the maximum allowable value for class EPC_T in the data TSB is now limited to T<=P. Liebl,Wagner,Pandel,Weng [Page 16] Internet Draft Unequal Erasure Protection October 2003 The signaling of the erasure protection vector is accomplished by means of descriptors. In the following we describe an efficient encoding scheme for the descriptors. For each class EPC_i with R_i>0, there is a descriptor DP_i providing information about the size of class EPC_i (i.e. the value of R_i) and establishing a relationship between the erasure protection of class EPC_i and that of the class EPC_(i+j), where j>0 and j is the smallest value for which R_(i+j)>0 is true. A descriptor DP_i is mapped onto one octet, which is sub-divided into two half-octets (i.e. the higher and the lower four bits). The first half-octet is of type unsigned and contains the 4-bit representation of the decimal value R_i. The second half-octet is of type signed and contains the difference in erasure protection between class EPC_i and class EPC_(i+j), i.e. the signed 4-bit representation of the decimal value (-j) (where the MSB denotes the sign, and the lower three bits the absolute value). Note that the erasure protection P of class EPC_p is fixed, whereas the size R_P may vary. Thus, the data to be filled into class EPC_P shall consist of a sequence of descriptors separated by stuffing indicators (see below), where the number of descriptors is primarily given by the number of protection classes EPC_i, 0<=i<=T, in the data TSB with R_i>0. Without a-priori knowledge, the initial value for the size of the signaling TSB, R_P, should be set to one (row). When the number of necessary descriptors and stuffing indicators exceeds the (n- P) information positions, one or more additional rows have to be reserved. This is usually done by increasing the value for L_s to R_P>1, i.e. the data TSB is reduced to (L-R_P) rows. Hence, in order to indicate the actual size of the signaling TSB, an additional descriptor is inserted at the very beginning, which takes on the value 0xq0, where q denotes the (octal) four bit representation of the decimal value R_P. Furthermore, the end of each data TSB is signaled by the otherwise unused descriptor value 0x00, followed by exactly one stuffing indicator (SI). The latter is mapped onto an octet, which is of type unsigned and contains the 8-bit representation of the decimal value of the number of media stuffing symbols used at the end of the respective data TSB. The (extended) sequence of descriptors and stuffing indicators is then mapped to the octet positions in the R_P rows of the signaling TSB from left to right and top to bottom. Each row is then encoded with the same (n,n-P) RS code. If the number of descriptors and stuffing indicators is less than the available octet positions, however, empty positions in class EPC_P may be filled up with the otherwise unused descriptor 0x00. At the receiving entity, the sequence of descriptors shall be recovered by performing erasure decoding on the first row of the TB (which definitely belongs to the signaling TSB) using the same algorithm as later for the data TSB. If successful, the very first descriptor now indicates the number of rows of the signaling TSB, and the next (R_P-1) rows are decoded to Liebl,Wagner,Pandel,Weng [Page 17] Internet Draft Unequal Erasure Protection October 2003 reconstruct the redundancy profile for the data TSB(s), together with the number of media stuffing symbols denoted by the respective SI(s). The complete structure of the TB is now depicted in Fig. 5. Transmission Block (TB) P <---------> /\ +-+-+-+-+-+-+-+-+-+ /\ | |?|?|?|?|*|*|*|*|*| | R_P=1 | +-+-+-+-+-+-+-+-+-+ \/ | |&|&|&|&|&|*|*|*|*| /\ | +-+-+-+-+-+-+-+-+-+ | R_T=3 | |&|&|&|&|&|*|*|*|*| | | +-+-+-+-+-+-+-+-+-+ | L octets | |&|&|&|&|&|*|*|*|*| \/ payload | +-+-+-+-+-+-+-+-+-+ /\ per packet | |%|%|%|%|%|%|*|*|*| | R_(T-1)=1 | +-+-+-+-+-+-+-+-+-+ \/ | |$|$|$|$|$|$|$|*|*| . | +-+-+-+-+-+-+-+-+-+ . | |!|!|!|!|!|!|!|!|*| . | +-+-+-+-+-+-+-+-+-+ /\ | |#|#|#|#|#|#|#|#|#| | R_0=1 \/ +-+-+-+-+-+-+-+-+-+ \/ <-----------------> n packets ? : descriptors and stuffing indicators for in-band signaling of the redundancy profile &,%,$,!,# : info octets belonging to a certain element of the info stream in decreasing order of importance * : parity octets gained from Reed-Solomon coding Fig. 5: General structure for UXP with in-band signaling of the redundancy profile The following simple example is meant to illustrate the idea behind using descriptors: Let an erasure protection vector of length T+1=7 be given as follows: EPV=(R_0,R_1,...,R_5,R_6)=(7,0,2,2,0,3,10) Hence, the length L of the TB (including one row for the signaling TSB) is equal to 7+2+2+3+10+1=25 (rows/octets). If the Liebl,Wagner,Pandel,Weng [Page 18] Internet Draft Unequal Erasure Protection October 2003 width is assumed to be equal to 20 (columns/packets), then the erasure protection of the descriptors is P=10. The corresponding sequence of descriptors can be written as DP=(DP_6,DP_5,DP_3,DP_2,DP_0)=(0xAC,0x39,0x2A,0x29,0x7A), where the values of the descriptors are given in hexadecimal notation. Next, the descriptor indicating the length of the signaling TSB has to be inserted, the end of the data TSB has to be marked by 0x00, and the SI has to be appended. If the number of media stuffing symbols is assumed to be 3, the 10 info octets in the signaling TSB take on the following values (descriptor stuffing included): (0x10,0xAC,0x39,0x2A,0x29,0x7A,0x00,0x03,0x00,0x00) 6.5. Optional Concatenation of Transmission Sub Blocks The following procedure may be applied if a single info stream would be too short to achieve an efficient mapping to a transmission block with respect to the fixed payload length L and the desired number of packets n. For example, intra-coded video frames (I-frames) are usually much larger than the following predicted ones (P-frames). In this case, a certain number z of successive small info streams should be each mapped to a transmission sub block with length L_d(y) and width n, such that L_d(1)+L_d(2)+...+L_d(z)=L_d. The resulting transmission sub blocks can then be easily concatenated to form a TB of size L x n having one common signaling TSB (see Fig. 2): Since the second half-octet of the descriptors is of type signed (cf. Sect. 6.4.), we are able to signal both decreasing and increasing erasure protection profiles. Again, we will give a simple example to illustrate this idea: Let the erasure protection vectors for two concatenated data TSBs be given as follows: EPV1=(R1_0,R1_1,...,R1_5,R1_6)=(0,0,2,2,0,3,10), EPV2=(R2_0,R2_1,...,R2_5,R2_6)=(0,0,2,2,0,3,10). Hence, two single identical data TSBs will be concatenated to form a TB of length L=2*(2+2+3+10)+2=36 (rows/octets). If the width is again assumed to be equal to 20 (columns/packets), then the erasure protection of the descriptors is P=10. We reserve a total of two rows for the signaling TSB. The corresponding sequence of descriptors can now be written as DP=(0xAC,0x39,0x2A,0x29,0xA4,0x39,0x2A,0x29), where the values of the descriptors are given in hexadecimal notation. The values of the first four descriptors are taken from the descriptor of EPV1 as described in Sect. 6.4. (without the SI). The last four descriptors are taken from the descriptor of EPV2 (without SI) with one exception. The fifth descriptor of DP (i.e. 0xA4) is created as follows: The first half-octed is created according to Sect. 6.4. However, the second half-octed describes no longer the difference between R_P and R2_6. It rather describes the Liebl,Wagner,Pandel,Weng [Page 19] Internet Draft Unequal Erasure Protection October 2003 difference between R1_2 and R2_6, i.e. R1_2-R2_6, which can be a positive or negative number. If the number of media stuffing symbols is assumed to be 3 for each data TSB, the 20 info octet positions in the signaling TSB are filled with the following values (descriptor stuffing included): (0x20,0xAC,0x39,0x2A,0x29,0x00,0x03,0xA4,0x39,0x2A,0x29,0x00,0x03 , 0x00,0x00,0x00,0x00,0x00,0x00,0x00) Therefore from the example above, the following general rule MUST be used to create the resulting descriptors for concatenated data TSB #u and data TSB #v, where v=u+1: Let EPVu=(Au_0,Au_1,...) and EPVv=(Av_0, Av_1,...) be the corresponding erasure protection vectors and DPu and DPv the corresponding descriptors created according to Sect. 6.4. (with stuffing). Let w be the smallest index for which Au_w >0. Let x be the largest index for which Av_x >0. The resulting descriptor can be created by concatenation of DPu and DPv where the first descriptor of DPv should be changed as follows: The second half byte is defined by Au_w-Av_x. 7. Indication of UXP in SDP From the discussion in Sect. 6.3 , we know that UXP encapsulates and protects the info stream. The info stream consists usually of a regular RTP-Payload format, e.g. RFC 3016. There is no static payload type assignment for UXP, so dynamic payload type numbers MUST be used. The binding to the number is indicated by an rtpmap attribute. The name used in this binding is "UXP". The payload type number of UXP is indicated in the "m" line of the media as well as the payload type of the info-stream. A sample indication of UXP in SDP is as follows: m = video 8000 RTP/AVP 98 99 a = rtpmap:98 UXP/90000 a = rtpmap:99 MP4V-ES/90000 Here, PT 98 indicates that the payload consists of UXP with the corresponding info stream "MP4V-ES". Alternatively, PT 99 can be used which indicates "MP4V-ES" without UXP. Since UXP is generic, several payload types can be protected. The lines m = video 8000 RTP/AVP 98 99 100 a = rtpmap:98 UXP/90000 Liebl,Wagner,Pandel,Weng [Page 20] Internet Draft Unequal Erasure Protection October 2003 a = rtpmap:99 MP4V-ES/90000 a = rtpmap:100 H263-1998/90000 mean that UXP can be used with either "MP4V-ES" or "H263-1998" as info stream (indicated by PT 98 in the RTP-Header and either block PT=99 or block PT=100 in the UXP-Header). Alternatively, PT=99 or PT=100 in the RTP-Header means the use of "MP4V-ES" or "H263-1998" without UXP. As described in Sect. 6.4., the parameter P has the default value P=ceil(n/2.0), if not otherwise stated. The parameter P MAY be specified explicitly by means of SDP: a = fmtp:98 UXP-prof: fvalue where fvalue is a floating point number in the interval (0 < fvalue <1) and specifies P by P=ceil(n*fvalue). For example, if we set fvalue=0.5, a = fmtp:98 UXP-prof: 0.5 we get the default value for P, since P=ceil(n/2.0). The ABNF for fvalue according to RFC 2234 is fvalue = "0" "." 1*2DIGIT 8. Security Considerations The payload of the RTP-packets consists of an interleaved media and parity stream. Therefore, it is reasonable to encrypt the resulting stream with one key rather than using different keys for media and parity data. It should also be noted that encryption of the media data without encryption of the parity data could enable known-plaintext attacks. The overall proportion between parity octets and info octets should be chosen carefully if the packet loss is due to network congestion. If the proportion of parity octets per TB is increased in this case, it could lead to increasing network congestion. Therefore, the proportion between parity octets and info octets per TB MUST NOT be increased as packet loss increases due to network congestion. The overall ratio between parity and info octets MUST NOT be higher than 1:1, i.e. the absolute bitrate spent for redundancy must not be larger than the bitrate required for transmission of multimedia data itself. Liebl,Wagner,Pandel,Weng [Page 21] Internet Draft Unequal Erasure Protection October 2003 9. Application Statement There are currently two different schemes proposed for unequal error protection in the IETF-AVT: Unequal Level Protection (ULP) and Unequal Erasure Protection (UXP). Although both methods seem to address the same problem, the proposed solutions differ in many respects. This section tries to describe possible application scenarios and to show the strengths and weaknesses of both approaches. The main difference between both approaches is that while ULP preserves the structure of the packets which have to be protected and provides the redundancy in extra packets, UXP interleaves the info stream which has to be protected, inserts the redundancy information, and thus creates a totally new packet structure. Another difference concerns multicast compatibility: It cannot be assumed that all future terminals will be able to apply UXP/ULP. Therefore, backward compatibility could be an issue in some cases. Since ULP does not change the original packet structure, but only adds some extra packets, it is possible for terminals which do not support ULP to discard the extra packets. In case of UXP, however, two separate streams with and without erasure protection have to be sent, which increases the overall data rate. Next, both approaches offer different mechanisms to adjust packet sizes, if necessary: UXP allows to adjust the packet sizes arbitrarily. This is an advantage in case the loss probability is dependent on the packet length, which happens, for example, if the end-to-end connection contains wireless links. In this case proper adjustment of the packet size is one essential network adaptation technique. In addition, if a preencoded stream is sent over the network, the packet size can be adjusted independently of slice structures. Since ULP does not change the existing packetization scheme, this flexibility does not exist. The ability of UXP to adjust the packet size arbitrarily can be especially exploited in a streaming scenario, if a delay of several hundred milliseconds is acceptable. It is then possible to fill several video frames into a single TB of desired size, e.g. a group of pictures consisting of I-frame, P-frames and B- frames. The redundancy scheme can thus be selected in such a way as to guarantee the following property: In case of packet loss, the P-frames are only recoverable if the I-frame on which the decoding of P-frames depends is recoverable. The same is true for B-frames, which can only be decoded if the respective P-frames are recoverable. This prevents situations in which, for example, the B-frames have been received correctly, but the P-frames have been lost, i.e. assures a gradual decrease in application quality also on the frame level. Of course, a similar encoding is possible with ULP. But in this case one might have to send several frames within one packet which leads to large packet sizes. Furthermore, decoding delay is also a crucial issue in communications. Again, both approaches have different delay Liebl,Wagner,Pandel,Weng [Page 22] Internet Draft Unequal Erasure Protection October 2003 properties: UXP introduces a decoding delay because a reasonable amount of correctly received packets are necessary to start decoding of a TB. The delay in general depends on the dimensions of the interleaver. This should be considered for any system design which includes UXP. With ULP, every correctly received media packet can be decoded right away. However, a significant delay is introduced, if packets are corrupted, because in this case one has to wait for several redundancy packets. Thus, the delay is in general dependent on the actual ULP-FEC-packet scheme and cannot be considered in advance during the system design phase. Finally, we want to point out that UXP uses RS codes which are known to be the most efficient type of block codes in terms of erasure correction capability. 10. Intellectual Property Considerations Siemens AG has filed patent applications that might possibly have technical relations to this contribution. On IPR related issues, Siemens AG refers to the Siemens Statement on Patent Licensing, see http://www.ietf.org/ietf/IPR/SIEMENS- General. 11. References Normative References [1] J. Rosenberg and H. Schulzrinne, "An RTP Payload Format for Generic Forward Error Correction", Request for Comments 2733, Internet Engineering Task Force, Dec. 1999. [2] Shu Lin and Daniel J. Costello, Error Control Coding: Fundamentals and Applications, Prentice-Hall, Inc., Englewood Cliffs, N.J., 1983. Informative References [3] A. Albanese, J. Bloemer, J. Edmonds, M. Luby, and M. Sudan, "Priority encoding transmission", IEEE Trans. Inform. Theory, vol. 42, no. 6, pp. 1737-1744, Nov. 1996. [4] W. Li: "Streaming video profile in MPEG-4", IEEE Trans. on Circuits and Systems for Video Technology, Vol. 11, no. 3, 301- 317, March 2001. [5] G. Blaettermann, G. Heising, and D. Marpe: "A Quality Scalable Mode for H.26L", ITU-T SG16, Q.15, Q15-J24, Osaka, May 2000. [6] F. Burkert, T. Stockhammer, and J. Pandel, "Progressive A/V coding for lossy packet networks - a principle approach", Tech. Rep., ITU-T SG16, Q.15, Q15-I36, Red Bank, N.J., Oct. 1999. Liebl,Wagner,Pandel,Weng [Page 23] Internet Draft Unequal Erasure Protection October 2003 [7] Guenther Liebl, "Modeling, theoretical analysis, and coding for wireless packet erasure channels", Diploma Thesis, Inst. for Communications Engineering, Munich University of Technology, 1999. [8] U. Horn, K. Stuhlmuller, M. Link, and B. Girod, "Robust Internet video transmission based on scalable coding and unequal error protection", Image Com., vol. 15, no. 1-2, pp. 77-94, Sep. 1999. [9] S. Wenger, "H.26L over IP: The IP-Network Adaptation Layer", Packet Video 2002, Pittsburgh, Pennsylvania, USA, April 24- 26,2002. 12. Acknowledgments Many thanks to Philippe Gentric, Stephen Casner, and Hermann Hellwagner for helpful comments and improvements. The authors would like to thank Thomas Stockhammer who came up with the original idea of UXP. Also, the help of Gero Baese, Frank Burkert, and Minh Ha Nguyen for the development of UXP is well acknowledged. 13. Author's Addresses Guenther Liebl Institute for Communications Engineering (LNT) Munich University of Technology D-80290 Munich Germany Email: {liebl}@lnt.e-technik.tu-muenchen.de Marcel Wagner, Juergen Pandel, Wenrong Weng Siemens AG - Corporate Technology CT IC 2 D-81730 Munich Germany Email: {marcel.wagner,juergen.pandel,wenrong.weng}@mchp.siemens.de Full Copyright Statement "Copyright (C) The Internet Society (date). All Rights Reserved. This document and translations of it may be copied and furnished to others, and derivative works that comment on or otherwise Liebl,Wagner,Pandel,Weng [Page 24] Internet Draft Unequal Erasure Protection October 2003 explain it or assist in its implementation may be prepared, copied, published and distributed, in whole or in part, without restriction of any kind, provided that the above copyright notice and this paragraph are included on all such copies and derivative works. However, this document itself may not be modified in any way, such as by removing the copyright notice or references to the Internet Society or other Internet organizations, except as needed for the purpose of developing Internet standards in which case the procedures for copyrights defined in the Internet Standards process must be followed, or as required to translate it into languages other than English. The limited permissions granted above are perpetual and will not be revoked by the Internet Society or its successors or assigns. This document and the information contained herein is provided on an "AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET ENGINEERING TASK FORCE DISCLAIMS ALL WARRANTIES; EXPRESS OR IMPLIED; INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. Liebl,Wagner,Pandel,Weng [Page 25]