Internet Engineering Task Force                               G. Liebl,
                                                         T. Stockhammer
Internet Draft                                        LNT, Munich Univ.
                                                          of Technology
Document: draft-ietf-avt-uxp-02.txt
March 1, 2002                                     M. Wagner, J. Pandel,
                                                     W. Weng, G. Baese,
                                                  M. Nguyen, F. Burkert
Expires: Sept. 1, 2002                               Siemens AG, Munich

 An RTP Payload Format for Erasure-Resilient Transmission of Progressive
                           Multimedia Streams

Status of this Memo

   This document is an Internet-Draft and is in full conformance with
   all provisions of Section 10 of RFC2026 [].

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups.
   Note that other groups may also distribute working documents as
   Internet-Drafts. Internet-Drafts are draft documents valid for a
   maximum of six months and may be updated, replaced, or obsoleted by
   other documents at any time. It is inappropriate to use
   Internet-Drafts as reference material or to cite them other than as
   "work in progress."
   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt
   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html.

1. Abstract

   This document specifies an efficient way to ensure erasure-resilient
   transmission of progressively encoded multimedia sources via RTP
   using Reed-Solomon codes. The level of erasure protection can be
   explicitly adapted to the importance of the respective parts in the
   source stream, thus allowing a graceful degradation of application
   quality with increasing packet loss rate on the network. Hence, this
   type of unequal erasure protection (UXP) scheme is intended to cope
   with the rapidly varying channel conditions on wireless access links
   to the Internet backbone. Nevertheless, backward compatibility to
   currently standardized non-progressive multimedia codecs is ensured,
   since equal erasure protection (EXP) represents a subset of generic
   UXP. By defining a comparatively simple payload format, the proposed
   scheme can be easily integrated into the existing framework for RTP.

Liebl,Stockhammer,Wagner,Pandel,Weng,Baese,Nguyen,Burkert       [Page1]

2. Conventions used in this document

   The following terms are used throughout this document:

   1.) Message block: a higher layer transport unit (e.g. an IP
   packet) that enters/leaves the segmentation/reassembly stage at the
   interface to wireless data link layers.

   2.) Segment: denotes a link layer transport unit.

   3.) CRC: Cyclic Redundancy Check, usually added to transport units
   at the sender to detect the existence of erroneous bits in a
   transport unit at the receiver.

   4.) Segmentation/Reassembly Process: If the size of the transport
   units at the link layer is smaller than that at the upper layers,
   message blocks have to be split up into several parts, i.e.
   segments, which are then transmitted successively over the link. If
   nothing is lost, the original message block can be restored at the
   receiving entity (reassembly).

   5.) Quality-of-service: application-dependent criterion to define a
   certain desired operation point.

   6.) Codec: denotes a functional pair consisting of a source encoding
   unit at the sender and a corresponding source decoding unit at the
   receiver; usually standardized for different multimedia applications
   like audio or video.

   7.) Progressive source coding: results in successive blocks of
   (source-)encoded data (e.g. a single video or audio frame), each of
   which can be viewed as a bitstream of certain length, whose distinct
   elements are of different importance to the reconstruction process
   at the decoder. Elements are commonly ordered from highest to lowest
   importance, where later elements depend on the previous ones.

   8.) Reed-Solomon (RS) code: belongs to the class of linear nonbinary
   block codes, and is uniquely specified by the block length n, the
   number of parity symbols t, and the symbol alphabet.

   9.) n: is a variable which denotes both the block length of an RS
   codeword and the number of columns in a TB (see 16).

   10.) k: is a variable which denotes the number of information
   symbols in an RS codeword.

   11.) t: is a variable which denotes the number of parity symbols in
   an RS codeword.

   12.) Erasure: When a packet is lost during transmission, an erasure
   is said to have happened.
   Since the position of the erased packet in a sequence is usually
   known, a corresponding erasure marker can be set at the receiving
   entity.

   13.) Base layer: comprises the first and most important elements in
   a progressively encoded bitstream, without which all subsequent
   information is useless.

   14.) Enhancement layer: comprises one or more sets of the less
   important subsequent elements in a progressively encoded bitstream.
   A specific enhancement layer can be decoded if and only if the base
   layer and all previous enhancement layer data (of higher importance)
   are available.

   15.) Info stream: denotes the final bitstream which has to be
   protected by the proposed UXP scheme. It usually consists of the
   (source-encoded) bitstream (progressive or not), which is already
   arranged according to a desired syntax (e.g. as specified in the
   respective RTP profile for the media codec in use).
   In any case, it is assumed that every info stream is already
   octet-aligned according to the standard procedures defined in the
   context of the used syntax specifications.

   16.) Transmission block (TB): denotes a memory array of L rows and n
   columns. Each row of a TB represents an RS codeword, whereas each
   column, together with the respective UXP header (see 33) in front,
   forms the payload of a single RTP packet.
   Each TB consists of at least two distinct transmission sub blocks
   (TSB, see 17): The first L_s rows belong to the signaling TSB,
   whereas the last L_d=(L-L_s) rows belong to one or more data TSBs.

   17.) Transmission sub block (TSB): denotes a memory array of 0

                  +-+-+-+-+-+-+-+-+-+
                  |&|&|&|&|&|&|&|*|*|
                  +-+-+-+-+-+-+-+-+-+
                  <------------><--->
                      k=n-t       t
                   (&:info)   (*:parity)

            Fig. 1: Structure of a systematic RS codeword

5. Progressive Source Coding

   If the output of a multimedia codec, be it audio or video, is said
   to be progressive, the encoded bitstream must consist of several
   distinct elements, often organized in separate layers. The latter
   shall be defined via their relative importance with respect to the
   quality of the reconstruction process at the receiver. Hence, there
   exists at least one layer, often called base layer, without which
   reconstruction fails altogether, whereas all the other layers, often
   called enhancement layers, just help to continually improve the
   quality. Consequently, the different layers are usually contained in
   the (source-)encoded bitstream in decreasing order of importance,
   i.e. the base layer data is followed by the various enhancement
   layers.
   An example can be found in the fine granular scalability modes which
   have been proposed to various standardization bodies like MPEG-4 [4]
   or ITU (H.26L) [5], where the resolution of the scaling process in
   the progressive source encoder is as low as one symbol in the
   enhancement layer.

   From the above definition, it follows that the most important base
   layer data must be protected as strongly as possible against packet
   loss during transmission. However, the protection of the enhancement
   layers could be continually lowered, since a loss at this stage has
   only minor consequences for the reconstruction process. Thus, by
   using a suitable unequal erasure protection strategy across a
   progressive source stream, the overhead due to redundancy spent per
   (channel-)encoded block is reduced.
   Furthermore, if channel conditions get worse during transmission,
   only more and more enhancement layers are lost, i.e. a graceful
   degradation in application quality at the receiver is achieved [6].

   Nevertheless, it should be mentioned that the specific structure of
   a (source-)encoded bitstream strongly depends on the actual media
   codec in use, and the desired syntax which is used for adapting the
   output of the codec to a suitable transport level format (see also
   7.3). In order to keep the description of the unequal erasure
   protection strategy in section 6 as general as possible, the final
   bitstream which has to be protected by the proposed UXP scheme will
   be called "info stream" in the following. Furthermore, it is assumed
   that every info stream is already octet-aligned according to the
   standard procedures defined in the context of the used syntax
   specifications.

6. General Structure of UXP schemes

   In this section, the principal features of the proposed UXP scheme
   are described with a special focus on the protection and
   reconstruction procedure which is applied to the info stream. In
   addition, the behavior of the sender and receiver is specified as
   far as it concerns the reconstruction of the info stream. However,
   the complete UXP payload structure, including the additional UXP
   header, is described in section 7.

   Fig. 1 already illustrated the structure of a systematic codeword,
   which shall be represented by a single row and n successive columns
   that contain the information and the parity bytes. This structure
   shall now be extended by forming a transmission block (TB)
   consisting of L codewords of length n bytes each, which amounts to a
   total of L rows and n columns [7]: Each column, together with the
   respective UXP header in front, shall represent the payload of an
   RTP packet, i.e. the whole data of a TB is transmitted via a
   sequence of n RTP packets all carrying a payload of length (L+2)
   bytes (UXP header included).

   The value of L should be chosen in such a way that the whole length
   of the resulting IP packet (i.e.
   RTP payload plus sum of RTP, UDP, and IP header) equals a multiple
   of the segment size on the wireless link to avoid stuffing at the
   data link layer.

   Each TB usually consists of two or more horizontal slices, the
   so-called transmission sub blocks (TSB), as can be seen in Fig. 2:
   The first L_s rows always belong to the signaling TSB, which is used
   to convey the actual redundancy profile in the data part to the
   receiver (see 7.3). The following L_d=(L-L_s) rows belong to one or
   more data TSBs, which contain the interleaved and RS encoded info
   stream, as will be described below.

                Transmission Block (TB)

              /\  +-+-+-+-+-+-+-+-+-+ /\
              |   |  signaling TSB  | |   L_s bytes
              |   +-+-+-+-+-+-+-+-+-+ \/
              |   |                 | /\                /\
              |   +   data TSB #1   + |   L_d(1) bytes  |
              |   |                 | |                 |
              |   +-+-+-+-+-+-+-+-+-+ \/                |
   L bytes    |   |                 | /\                |
   payload    |   +   data TSB #2   + |   L_d(2) bytes  |
   per packet |   |                 | |                 |  L_d bytes
              |   +-+-+-+-+-+-+-+-+-+ \/                |
              |   |        .        | .                 |
              |   +        .        + .                 |
              |   |        .        | .                 |
              |   +-+-+-+-+-+-+-+-+-+ /\                |
              |   |   data TSB #z   | |   L_d(z) bytes  |
              \/  +-+-+-+-+-+-+-+-+-+ \/                \/
                  <----------------->
                       n packets

                Fig. 2: General structure of a TB

   Since the UXP procedure is mainly applied to the data TSBs, it will
   be described next, whereas the content and syntax of the signaling
   TSB will be defined in section 7.3.

   For simplicity, only one single data TSB will be assumed throughout
   the following explanation of the encoding and decoding procedure.
   However, an extension to more than one data TSB per TB is
   straightforward, and will be shown in section 7.4.

   As depicted in Fig.
   3, the rows of a transmission sub block shall be partitioned into
   T+1 different classes CA_i, where i=0...T, such that each class
   contains exactly A_i=|CA_i| consecutive rows of the matrix, where
   the A_i have to satisfy the following relationship:

                  A_0+A_1+...+A_T=L_d

        Data Transmission Sub Block (data TSB)
                                T
                            <------->
              /\  +-+-+-+-+-+-+-+-+-+ /\
              |   |&|&|&|&|&|*|*|*|*| |
              |   +-+-+-+-+-+-+-+-+-+ |  A_T=3
              |   |&|&|&|&|&|*|*|*|*| |
              |   +-+-+-+-+-+-+-+-+-+ |
   L_d bytes  |   |&|&|&|&|&|*|*|*|*| \/
   per packet |   +-+-+-+-+-+-+-+-+-+ /\
              |   |%|%|%|%|%|%|*|*|*| |  A_(T-1)=1
              |   +-+-+-+-+-+-+-+-+-+ \/
              |   |$|$|$|$|$|$|$|*|*| .
              |   +-+-+-+-+-+-+-+-+-+ .
              |   |!|!|!|!|!|!|!|!|*| .
              |   +-+-+-+-+-+-+-+-+-+ /\
              |   |#|#|#|#|#|#|#|#|#| |  A_0=1
              \/  +-+-+-+-+-+-+-+-+-+ \/
                  <----------------->
                       n packets

   &,%,$,!,# : info bytes belonging to a certain info stream in
               decreasing order of importance
   *         : parity bytes gained from Reed-Solomon coding

   Fig. 3: General structure for coding with unequal erasure protection

   Furthermore, all rows in a particular class CA_i shall contain
   exactly the same number of parity bytes, which is equal to the index
   i of the class. For each row in a certain class CA_i, the same
   (n,n-i) RS code shall be applied.

   As can be observed from Fig. 3, class CA_T contains the largest
   number of parity bytes per row, i.e. offers the highest erasure
   protection capability in the block. Consequently, the most important
   element in the info stream must be assigned to class CA_T, where the
   value of T should be chosen according to the desired outage
   threshold of the application given a certain packet erasure rate on
   the link.
   All other classes CA_(T-1)...CA_0 shall be sequentially filled with
   the remaining elements of the info stream in decreasing order of
   importance, where the optimal choice for the size of each class (0
   or more rows), i.e.
   the structure of the redundancy profile, should depend on the
   quality-of-service requirements for the various
   (progressively-encoded) layers.

   The following set of rules contains a compact description of all the
   operations that must be performed for each transmission block:

   1.) The total number of columns n of the TB shall be chosen
   according to the actual delay constraints of the application.

   2.) Next, the expected number of rows reserved for the signaling TSB
   has to be selected, which limits the data TSB to L_d=(L-L_s) rows.

   3.) The maximum erasure correction capability T in the data TSB
   should be chosen according to the desired outage threshold of the
   application given the actual packet erasure rate on the link.

   4.) The redundancy profile for the rest of the data TSB should
   depend on the size and number of the various layers in the info
   stream, as well as the desired probability of successful decoding
   for each of them (quality-of-service requirement).

   5.) Any suitable optimization algorithm may be used for deriving an
   adequate redundancy profile. However, the result has to satisfy the
   following constraints:
   a) All available info byte positions in the data TSB have to be
   completely filled. If the info stream is too short for a desired
   profile, media stuffing may be applied to the empty info byte
   positions at the end of the data TSB by appending a sufficient
   number of bytes (with arbitrary value, e.g. 0x00). The actual number
   of stuffing symbols per data TSB is then signaled via the respective
   stuffing indicator (see 7.3). However, before resorting to any
   stuffing, it should be checked whether it is possible to strengthen
   the protection of certain rows instead, thus improving the overall
   robustness of the decoding process.
   b) The info stream should be fully contained within the data TSB
   (unless cutting it off at a specific point is explicitly allowed by
   the properties of the used media codec).
   c) The number of required descriptors and stuffing indicators (see
   section 7.3) to signal the profile shall not exceed the space
   initially reserved for them in the signaling TSB.
   Constraints a) and b) should be already incorporated in the
   optimization algorithm. However, if constraint c) is not met, the
   data TSB has to be reduced by one row in favor of the signaling TSB
   to accommodate more space for the descriptors and stuffing
   indicators, i.e. steps 2-5 have to be repeated until a valid
   redundancy profile has been obtained.

   6.) For each nonempty class CA_i, i=T...0, in the data TSB, the
   following steps have to be performed:
   a) All rows of this specific class shall be filled from left to
   right and top to bottom with data bytes of the info stream in
   decreasing order of importance (i.e. starting with the most
   important element).
   b) For each row in the class, the required i parity-check bytes are
   computed from the same set of codewords of an (n,n-i) RS code, and
   filled in the empty positions at the end of each row. Thus, every
   row in the class constitutes a valid codeword of the chosen RS code.

   7.) After having filled the whole data TSB with information and
   parity bytes, the redundancy profile is mapped to the signaling TSB
   as described in section 7.3.

   8.) Each column of the resulting TB is now read out byte-wise from
   top to bottom and, together with the respective UXP header (see
   section 7.2) in front, is mapped onto the payload section of one and
   only one RTP packet.

   9.) The n resulting RTP packets shall be transmitted in sequence to
   the remote host, starting with the leftmost one.

   10.) At the corresponding protocol entity at the remote host, the
   payload (without the UXP header) of all successfully received RTP
   packets belonging to the same sending TB shall be filled into a
   similar receiving TB column-wise from top to bottom and left to
   right.

   11.) For every erased packet of a received TB, the respective column
   in the TB shall be filled with a suitable erasure marker.

   12.) Before any other operations can be performed, the redundancy
   profile has to be restored from the signaling TSB according to the
   procedure defined in section 7.3. If the attempt fails because of
   too many lost packets, the whole TB shall be discarded and the
   receiving entity should wait for the next incoming TB (the source
   decoder may be informed about the missing info stream, if required).

   13.) If the attempt to recover the redundancy profile has been
   successful, a decoding operation shall be performed for each row of
   the data TSB by applying any suitable algorithm for erasure
   decoding.

   14.) For all rows of the data TSB for which the decoding operation
   has been successful, the reconstructed data bytes are read out from
   left to right and top to bottom, and appended to the reconstructed
   version of the info stream.

   15.) For all rows of the data TSB for which the decoding operation
   has failed, a sufficient number of suitable dummy symbols may be
   added to the reconstructed info stream to inform the source decoder
   about the missing symbols.

   The above rules describe an interleaver, i.e. at the sender a single
   codeword of a TB is spread out over n successive packets. Thus, each
   codeword of a transmitted TB experiences the same number of erasures
   at exactly the same positions.
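
   The column-wise read-out and refill of rules 8, 10 and 11 can be
   sketched as follows. This is only an illustration under stated
   assumptions: the function names, the use of None as erasure marker,
   and the toy TB contents are hypothetical and not part of this
   specification, and the UXP header and RS coding are left out.

```python
ERASED = None  # assumed erasure marker; the text only asks for "a suitable" one


def tb_to_payloads(tb):
    """Rule 8: read each column of the TB top to bottom into one payload."""
    n = len(tb[0])
    return [bytes(row[c] for row in tb) for c in range(n)]


def payloads_to_tb(payloads, num_rows):
    """Rules 10-11: refill the receiving TB column by column. A lost packet
    (given as None) leaves an erasure marker in every row of its column."""
    n = len(payloads)
    tb = [[ERASED] * n for _ in range(num_rows)]
    for c, column in enumerate(payloads):
        if column is None:
            continue
        for r in range(num_rows):
            tb[r][c] = column[r]
    return tb


def erased_positions(row):
    """Column indices at which a received row carries an erasure marker."""
    return [c for c, value in enumerate(row) if value is ERASED]


# Toy TB with L=3 rows and n=5 columns (byte values are placeholders).
tb = [bytes(range(r * 5, r * 5 + 5)) for r in range(3)]
payloads = tb_to_payloads(tb)
payloads[1] = None  # pretend the packets for columns 1 and 4 were lost
payloads[4] = None
received = payloads_to_tb(payloads, num_rows=3)
print([erased_positions(row) for row in received])
```

   In this sketch every row of the receiving TB reports the same erased
   column indices, which is the interleaver property stated above.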
   Two important conclusions can be drawn from this:
   a) Since the same RS code is applied to all rows contained in a
   specific class, either all of them can be correctly decoded or none
   of them. Hence, there exist no partly decodable classes at the
   receiver.
   b) If decoding is successful for a certain class CA_i, all the
   classes CA_(i+1)...CA_T can also be decoded, since they are
   protected by at least one more parity byte per row. Together with
   rule 6, it is therefore always ensured that, in case a decodable
   enhancement layer exists, all other layers it depends on can also be
   reconstructed.

   Given the maximum erasure protection value T, the redundancy profile
   for a data TSB of size (L_d x n) shall be denoted by a so-called
   erasure protection vector AV of length (T+1), where

                  AV:=(A_0,A_1,...,A_(T-1),A_T)

   From the above definition, it follows that the trivial cases of no
   erasure protection and EXP are a subset of UXP:
   a) no erasure protection at all: all application data is mapped onto
   class CA_0, i.e. AV=(L_d,0,0,...,0).
   b) EXP: all application data is mapped onto class CA_T, i.e.
   AV=(0,0,...,0,A_T=L_d).

   Hence, backward compatibility to currently standardized
   non-progressive multimedia codecs is achieved.

7. RTP payload structure

   For every packet whose payload is formed by reading out a column of
   the TB, the RTP header must be followed by a UXP header.

7.1. Specific settings in the RTP header

   The timestamp of each RTP packet resulting from reading out a TB is
   set to the time instant when the first byte of the progressive
   source data stream has been written into the TB. This results in the
   TS value being the same for all RTP packets belonging to a specific
   TB.

   The payload type is of dynamic type, and obtained through
   out-of-band signaling similar to [1].
   The signaling protocol must establish a payload length to be
   associated with the payload type value. End systems that do not
   recognize a payload type must discard such packets.

   The marker bit is set to 1 for the last packet of every TB.
   Otherwise, its value is 0.

   All other fields in the RTP header are set to those values proposed
   for regular multimedia transmission using the same source codecs,
   but no erasure protection scheme enabled.

   The RTP payload shall consist of the UXP header followed by one
   column of the TB.

7.2. Structure of the UXP header

   The UXP header shall consist of 2 octets, and is shown in Fig. 4:

    0                   1 1 1 1 1 1
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |X|  block PT   | block length n|
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

                Fig. 4: Proposed UXP header

   The fields in the header shall be defined as follows:
   - X (bit 0): extension bit, reserved for future enhancements,
                currently not in use -> default value: 0

   - block PT (bits 1-7): regular RTP payload type to indicate the
                          media type contained in the info stream

   - block length n (bits 8-15): indicates total number of RTP packets
                                 resulting from one TB (which equals
                                 the number of columns of the TB)

   The syntax of the info stream which is protected by UXP is specified
   by the RTP payload type field contained in the UXP header. For
   example, payload type H.263 means that the info stream conforms to
   the specifications of the RTP profile for H.263, but does not
   represent the "raw" H.263 stream produced by an H.263 encoder.
   However, UXP can also be applied to the raw output of the media
   codec (in case it is already octet-aligned), if this can be signaled
   to the receiver via other means, e.g. by use of H.245 or SDP.
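
   A minimal sketch of packing and parsing the 2-octet UXP header of
   Fig. 4 is given below. It assumes the usual network bit order, i.e.
   bit 0 is the most significant bit of the first octet. The function
   names and the payload type value 96 are hypothetical and only serve
   as an illustration.

```python
import struct


def pack_uxp_header(block_pt, n, x=0):
    """Build the UXP header: X (1 bit), block PT (7 bits), block length n (8 bits)."""
    if x not in (0, 1):
        raise ValueError("X is a single bit")
    if not 0 <= block_pt <= 0x7F:
        raise ValueError("block PT must fit into 7 bits")
    if not 0 <= n <= 0xFF:
        raise ValueError("block length n must fit into 8 bits")
    return struct.pack("!BB", (x << 7) | block_pt, n)


def unpack_uxp_header(payload):
    """Return (X, block PT, n) from the first two octets of an RTP payload."""
    first, n = struct.unpack("!BB", payload[:2])
    return first >> 7, first & 0x7F, n


# Example: hypothetical dynamic payload type 96, TB with n=20 columns.
header = pack_uxp_header(block_pt=96, n=20)
print(header.hex(), unpack_uxp_header(header))
```

   One column of the TB would then be appended to these two octets to
   form the RTP payload of length (L+2) bytes.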

   Based on the RTP sequence number, the marker bit, and the repetition
   of the block length n in each UXP header, the receiving entity is
   able to recognize both TB boundaries and the actual position of lost
   packets in the TB. Furthermore, the specific choice of equal TS
   values for all RTP packets belonging to a TB allows for overcoming
   possible sequence number overflow.

7.3. In-band signaling of the structure of the redundancy profile

   To enable a dynamic adaptation to varying link conditions, the
   actual redundancy profile used in the data TSB must be signaled to
   the receiving entity. Since out-of-band signaling either results in
   excessive additional control traffic, or prevents quick changes of
   the profile between successive TBs, an in-band signaling procedure
   is desired.

   Since the decoding process cannot be applied to any of the erasure
   protection classes without knowledge of the correct redundancy
   profile, the profile has to be protected against packet loss at
   least as strongly as the most important element in the info stream.
   Therefore, an additional class CA_P is used in the signaling TSB,
   where the number of parity symbols is by default set to the
   following value:

                  P=ceil(n/2)

   Hence, up to 50% of the RTP packets can be lost before the
   redundancy profile can no longer be recovered. This seems to be a
   reasonable value for the lowest point of operation over a lossy
   link. Alternatively, p may be explicitly signaled during session
   setup by means of the SDP or H.245 protocol.

   Consequently, since all other classes must have equal or less
   erasure protection capability, the maximum allowable value for class
   CA_T in the data TSB is now limited to T<=P.

   The signaling of the erasure protection vector is accomplished by
   means of descriptors. For each class CA_i with A_i>0, there is a
   descriptor DP_i providing information about the size of class CA_i
   (i.e.
   the value of A_i) and establishing a relationship between the
   erasure protection of class CA_i and that of the first preceding
   class CA_(i+j) with A_(i+j)>0, where j>0. A descriptor DP_i is
   mapped onto one byte, which is sub-divided into two half-bytes (i.e.
   the higher and the lower four bits). The first half-byte is of type
   unsigned and contains the 4-bit representation of the decimal value
   A_i. The second half-byte is of type signed and contains the
   difference in erasure protection between class CA_i and class
   CA_(i+j), i.e. the signed 4-bit representation of the decimal value
   (-j) (where the MSB denotes the sign, and the lower three bits the
   absolute value). Note that the erasure protection p of class CA_p is
   fixed, whereas the size A_p may vary.

   Thus, the data to be filled into class CA_p shall consist of a
   sequence of descriptors separated by stuffing indicators (see
   below), where the number of descriptors is primarily given by the
   number of protection classes CA_i, 0<=i<=T, in the data TSB with
   A_i>0.
   Without a-priori knowledge, the initial value for the size of the
   signaling TSB should be set to one (row). When the number of
   necessary descriptors and stuffing indicators exceeds the (n-p)
   information positions, one or more additional rows have to be
   reserved. This is usually done by increasing the value for L_s to
   A_p>1, i.e. the data TSB is reduced to (L-A_p) rows. Hence, in order
   to indicate the actual size of the signaling TSB, an additional
   descriptor is inserted at the very beginning, which takes on the
   value 0xq0, where q denotes the four-bit (single hexadecimal digit)
   representation of the decimal value A_p.

   Furthermore, the end of each data TSB is signaled by the otherwise
   unused descriptor value 0x00, followed by exactly one stuffing
   indicator (SI).
   The latter is mapped onto a byte, which is of type unsigned and
   contains the 8-bit representation of the decimal value of the number
   of media stuffing symbols used at the end of the respective data
   TSB.

   The (extended) sequence of descriptors and stuffing indicators is
   then mapped to the info byte positions in the A_p rows of the
   signaling TSB from left to right and top to bottom. Each row is then
   encoded with the same (n,n-p) RS code.

   If the number of descriptors and stuffing indicators is less than
   the available info byte positions, however, empty positions in class
   CA_p may be filled up with the otherwise unused descriptor 0x00.

   At the receiving entity, the sequence of descriptors shall be
   recovered by performing erasure decoding on the first row of the TB
   (which definitely belongs to the signaling TSB) using the same
   algorithm as later for the data TSB. If successful, the very first
   descriptor now indicates the number of rows of the signaling TSB,
   and the next (A_p-1) rows are decoded to reconstruct the redundancy
   profile for the data TSB(s), together with the number of media
   stuffing symbols denoted by the respective SI(s).

   The complete structure of the TB is now depicted in Fig. 5.

                Transmission Block (TB)
                               P
                          <--------->
              /\  +-+-+-+-+-+-+-+-+-+ /\
              |   |?|?|?|?|*|*|*|*|*| |  A_P=1
              |   +-+-+-+-+-+-+-+-+-+ \/
              |   |&|&|&|&|&|*|*|*|*| /\
              |   +-+-+-+-+-+-+-+-+-+ |  A_T=3
              |   |&|&|&|&|&|*|*|*|*| |
              |   +-+-+-+-+-+-+-+-+-+ |
   L bytes    |   |&|&|&|&|&|*|*|*|*| \/
   payload    |   +-+-+-+-+-+-+-+-+-+ /\
   per packet |   |%|%|%|%|%|%|*|*|*| |  A_(T-1)=1
              |   +-+-+-+-+-+-+-+-+-+ \/
              |   |$|$|$|$|$|$|$|*|*| .
              |   +-+-+-+-+-+-+-+-+-+ .
              |   |!|!|!|!|!|!|!|!|*| .
              |   +-+-+-+-+-+-+-+-+-+ /\
              |   |#|#|#|#|#|#|#|#|#| |  A_0=1
              \/  +-+-+-+-+-+-+-+-+-+ \/
                  <----------------->
                       n packets

   ?         : descriptors and stuffing indicators for in-band
               signaling of the redundancy profile

   &,%,$,!,# : info bytes belonging to a certain element of the
               info stream in decreasing order of importance

   *         : parity bytes gained from Reed-Solomon coding

   Fig. 5: General structure for UXP with in-band signaling of the
           redundancy profile

   The following simple example is meant to illustrate the idea behind
   using descriptors: Let an erasure protection vector of length T+1=7
   be given as follows:
   AV=(A_0,A_1,...,A_5,A_6)=(7,0,2,2,0,3,10)
   Hence, the length L of the TB (including one row for the signaling
   TSB) is equal to 7+2+2+3+10+1=25 (rows/bytes). If the width is
   assumed to be equal to 20 (columns/packets), then the erasure
   protection of the descriptors is p=10.
   The corresponding sequence of descriptors can be written as
   DP=(DP_6,DP_5,DP_3,DP_2,DP_0)=(0xAC,0x39,0x2A,0x29,0x7A),
   where the values of the descriptors are given in hexadecimal
   notation. Next, the descriptor indicating the length of the
   signaling TSB has to be inserted, the end of the data TSB has to be
   marked by 0x00, and the SI has to be appended. If the number of
   media stuffing symbols is assumed to be 3, the 10 info bytes in the
   signaling TSB take on the following values (descriptor stuffing
   included):

   (0x10,0xAC,0x39,0x2A,0x29,0x7A,0x00,0x03,0x00,0x00)

7.4 Optional Concatenation of Transmission Sub Blocks:

   The following procedure may be applied if a single info stream would
   be too short to achieve an efficient mapping to a transmission block
   with respect to the fixed payload length L and the desired number of
   packets n. For example, intra-coded video frames (I-frames) are
   usually much larger than the following predicted ones (P-frames).
In this case, a certain number z of successive small info streams
should each be mapped to a transmission sub block with length L_d(y)
and width n, such that L_d(1)+L_d(2)+...+L_d(z)=L_d.
The resulting transmission sub blocks can then be easily
concatenated to form a TB of size L x n having one common signaling
TSB: Since the second half-byte of the descriptors is of type
signed, we are able to incorporate both decreasing and increasing
erasure protection profiles within one single signaling TSB.
Note that once the lengths L_d(y) of the individual blocks have been
fixed, the respective redundancy profiles can be determined
independently of each other. However, the space initially reserved
for the signaling TSB should already be large enough to avoid
profile recalculation for each of the data TSBs in case the sequence
of descriptors gets too long!

Again, we will give a simple example to illustrate this idea: Let
the erasure protection vectors for two concatenated data TSBs be
given as follows:

AV1=(A1_0,A1_1,...,A1_5,A1_6)=(0,0,2,2,0,3,10),
AV2=(A2_0,A2_1,...,A2_5,A2_6)=(0,0,2,2,0,3,10).

Hence, two single identical data TSBs will be concatenated to form a
TB of length L=2*(2+2+3+10)+2=36 (rows/bytes). If the width is again
assumed to be equal to 20 (columns/packets), then the erasure
protection of the descriptors is p=10, and therefore a total of two
rows for the signaling TSB have been reserved this time. The
corresponding sequence of descriptors can now be written as
DP=(0xAC,0x39,0x2A,0x29,0xA4,0x39,0x2A,0x29), where the values of
the descriptors are given in hexadecimal notation.
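
The descriptor sequence DP given above can be reproduced with a
short sketch. It is illustrative only and not part of the payload
format: the function names are hypothetical, the index i of a class
CA_i is taken to be its erasure protection (as the examples imply),
and values that do not fit into one half-byte are simply rejected,
since their handling is not covered in this section.

```python
def encode_descriptor(size, diff):
    """Pack one descriptor byte (hypothetical helper).

    High half-byte: unsigned class size A_i.
    Low half-byte: sign-magnitude difference in erasure protection
    (MSB = sign, lower three bits = absolute value).
    """
    if not 1 <= size <= 15:
        # Assumption: larger sizes are out of scope for this sketch.
        raise ValueError("A_i must fit into an unsigned half-byte")
    if abs(diff) > 7:
        # Assumption: larger differences are out of scope as well.
        raise ValueError("difference must fit into sign + 3 bits")
    sign = 0x8 if diff < 0 else 0x0
    return (size << 4) | sign | abs(diff)


def descriptor_sequence(av_list, p):
    """Descriptors for one or more concatenated data TSBs.

    av_list: erasure protection vectors (A_0, ..., A_T), one per TSB.
    p: erasure protection of the signaling class CA_p.
    """
    descriptors = []
    prev = p  # the signaling class precedes the first data class
    for av in av_list:
        # Walk from the best protected class down to class 0 and
        # skip empty classes (A_i = 0).
        for i in range(len(av) - 1, -1, -1):
            if av[i] > 0:
                descriptors.append(encode_descriptor(av[i], i - prev))
                prev = i
    return descriptors


av = (0, 0, 2, 2, 0, 3, 10)
dp = descriptor_sequence([av, av], p=10)
print([f"0x{d:02X}" for d in dp])
# ['0xAC', '0x39', '0x2A', '0x29', '0xA4', '0x39', '0x2A', '0x29']
```

Note how the first descriptor of the second data TSB (0xA4) carries
a positive difference (+4), because the protection rises from class
2 at the end of the first TSB back to class 6.
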
If the number of
media stuffing symbols is assumed to be 3 for each data TSB, the 20
info byte positions in the signaling TSB are filled with the
following values (descriptor stuffing included):

(0x20,0xAC,0x39,0x2A,0x29,0x00,0x03,0xA4,0x39,0x2A,0x29,0x00,0x03,
0x00,0x00,0x00,0x00,0x00,0x00,0x00)

8. Security Considerations

The payload of the RTP packets consists of an interleaved multimedia
and parity stream. Therefore, it is reasonable to encrypt the
resulting stream with one key rather than using different keys for
multimedia and parity data. It should also be noted that encryption
of the multimedia data without encryption of the parity data could
enable known-plaintext attacks.

The overall proportion between parity bytes and info bytes should be
chosen carefully if the packet loss is due to network congestion. If
the proportion of parity bytes per TB is increased in this case, it
could lead to increasing network congestion. Therefore, the
proportion between parity bytes and info bytes per TB MUST NOT be
increased as packet loss increases due to network congestion.

The overall ratio between parity and info bytes MUST NOT be higher
than 1:1, i.e. the absolute bitrate spent for redundancy must not be
larger than the bitrate required for transmission of multimedia data
itself.

9. Application Statement

There are currently two different schemes proposed for unequal error
protection in the IETF-AVT: Unequal Level Protection (ULP) and
Unequal Erasure Protection (UXP).
Although both methods seem to address the same problem, the proposed
solutions differ in many respects. This section tries to describe
possible application scenarios and to show the strengths and
weaknesses of both approaches.

The main difference between both approaches is that while ULP
preserves the structure of the packets which have to be protected
and provides the redundancy in extra packets, UXP interleaves the
info stream which has to be protected, inserts the redundancy
information, and thus creates a totally new packet structure.

Another difference concerns multicast compatibility: It cannot be
assumed that all future terminals will be able to apply UXP/ULP.
Therefore, backward compatibility could be an issue in some cases.
Since ULP does not change the original packet structure, but only
adds some extra packets, it is possible for terminals which do not
support ULP to discard the extra packets. In case of UXP, however,
two separate streams with and without erasure protection have to be
sent, which increases the bandwidth.

Next, both approaches offer different mechanisms to adjust packet
sizes, if necessary: UXP allows the packet sizes to be adjusted
arbitrarily. This is an advantage in case the loss probability is
dependent on the packet length, which happens, for example, if the
end-to-end connection contains wireless links. In this case proper
adjustment of the packet size is one essential network adaptation
technique. In addition, if a pre-encoded stream is sent over the
network, the packet size can be adjusted independently of slice
structures.
Since ULP does not change the existing packetization scheme, this
flexibility does not exist.

The ability of UXP to adjust the packet size arbitrarily can be
especially exploited in a streaming scenario, if a delay of several
hundred milliseconds is acceptable. It is then possible to fill
several video frames into a single TB of desired size, e.g. a group
of pictures consisting of I-frame, P-frames and B-frames.
The
redundancy scheme can thus be selected in such a way as to guarantee
the following property: In case of packet loss, the streams for
P-frames are only recoverable if the I-frame, on which the decoding
of P-frames depends, is recoverable. The same is true for B-frames,
which can only be decoded if the respective P-frames are recoverable.
This prevents situations in which, for example, the B-frames have
been received correctly, but the P-frames have been lost, i.e. it
assures a gradual decrease in application quality also on the frame
level. Of course, a similar encoding is possible with ULP. But in
this case one might have to send several frames within one packet,
which leads to large packet sizes.

Furthermore, decoding delay is also a crucial issue in
communications. Again, both approaches have different delay
properties: UXP introduces a decoding delay because a reasonable
number of correctly received packets is necessary to start decoding
of a TB. The delay in general depends on the dimensions of the
interleaver. This should be considered for any system design which
includes UXP.
With ULP, every correctly received media packet can be decoded right
away. However, a significant delay is introduced if packets are
corrupted, because in this case one has to wait for several
redundancy packets. Thus, the delay is in general dependent on the
actual ULP-FEC-packet scheme and cannot be considered in advance
during the system design phase.

Finally, we want to point out that UXP uses RS codes, which are
known to be the most efficient type of block codes in terms of
erasure correction capability.

10. Intellectual Property Considerations

Siemens AG has filed patent applications that might possibly have
technical relations to this contribution.
On IPR related issues, Siemens AG refers to the Siemens Statement on
Patent Licensing, see http://www.ietf.org/ietf/IPR/SIEMENS-General.

11. References

[1] J. Rosenberg and H. Schulzrinne, "An RTP Payload Format for
Generic Forward Error Correction", Request for Comments 2733,
Internet Engineering Task Force, Dec. 1999.

[2] A. Albanese, J. Bloemer, J. Edmonds, M. Luby, and M. Sudan,
"Priority encoding transmission", IEEE Trans. Inform. Theory, vol.
42, no. 6, pp. 1737-1744, Nov. 1996.

[3] Shu Lin and Daniel J. Costello, Error Control Coding:
Fundamentals and Applications, Prentice-Hall, Inc., Englewood
Cliffs, N.J., 1983.

[4] W. Li, "Fine Granularity Scalability Using Bit-Plane Coding of
DCT Coefficients", ISO/IEC JTC1/SC29/WG11, Doc. MPEG98/M4204, Dec.
1998.

[5] G. Blaettermann, G. Heising, and D. Marpe, "A Quality Scalable
Mode for H.26L", ITU-T SG16, Q.15, Q15-J24, Osaka, May 2000.

[6] F. Burkert, T. Stockhammer, and J. Pandel, "Progressive A/V
coding for lossy packet networks - a principle approach", Tech.
Rep., ITU-T SG16, Q.15, Q15-I36, Red Bank, N.J., Oct. 1999.

[7] Guenther Liebl, "Modeling, theoretical analysis, and coding for
wireless packet erasure channels", Diploma Thesis, Inst. for
Communications Engineering, Munich University of Technology, 1999.

12. Acknowledgments

Many thanks to Thomas Stockhammer, who initially came up with the
idea of unequal erasure protection to improve progressive video
transmission over lossy networks.

13.
Author's Addresses

Guenther Liebl, Thomas Stockhammer
Institute for Communications Engineering (LNT)
Munich University of Technology
D-80290 Munich
Germany
Email: {liebl,tom}@lnt.e-technik.tu-muenchen.de

Minh-Ha Nguyen, Frank Burkert
Siemens AG - ICM D MP RD MCH 83/81
D-81675 Munich
Germany
Email: {minhha.nguyen,frank.burkert}@mch.siemens.de

Marcel Wagner, Juergen Pandel, Wenrong Weng, Gero Baese
Siemens AG - Corporate Technology CT IC 2
D-81730 Munich
Germany
Email:
{marcel.wagner,juergen.pandel,wenrong.weng,gero.baese}@mchp.siemens.de

Full Copyright Statement

"Copyright (C) The Internet Society (date). All Rights Reserved.
This document and translations of it may be copied and furnished to
others, and derivative works that comment on or otherwise explain it
or assist in its implementation may be prepared, copied, published
and distributed, in whole or in part, without restriction of any
kind, provided that the above copyright notice and this paragraph
are included on all such copies and derivative works. However, this
document itself may not be modified in any way, such as by removing
the copyright notice or references to the Internet Society or other
Internet organizations, except as needed for the purpose of
developing Internet standards in which case the procedures for
copyrights defined in the Internet Standards process must be
followed, or as required to translate it into languages other than
English.

The limited permissions granted above are perpetual and will not be
revoked by the Internet Society or its successors or assigns.

This document and the information contained herein is provided on an
"AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET ENGINEERING
TASK FORCE DISCLAIMS ALL WARRANTIES; EXPRESS OR IMPLIED; INCLUDING
BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF INFORMATION HEREIN
WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF
MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.