Internet Engineering Task Force                               G. Liebl,
                                                         T. Stockhammer
Internet Draft                                     LNT, Munich Univ. of
                                                             Technology
Document: draft-ietf-avt-uxp-01.txt
November 2001                                     M. Wagner, J. Pandel,
                                                     W. Weng, G. Baese,
                                                  M. Nguyen, F. Burkert
Expires: May 2002                                    Siemens AG, Munich


 An RTP Payload Format for Erasure-Resilient Transmission of Progressive
                           Multimedia Streams


Status of this Memo

   This document is an Internet-Draft and is in full conformance with
   all provisions of Section 10 of RFC2026 [].

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups. Note that
   other groups may also distribute working documents as
   Internet-Drafts. Internet-Drafts are draft documents valid for a
   maximum of six months and may be updated, replaced, or obsoleted by
   other documents at any time. It is inappropriate to use
   Internet-Drafts as reference material or to cite them other than as
   "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html.

1. Abstract

   This document specifies an efficient way to ensure erasure-resilient
   transmission of progressively encoded multimedia sources via RTP
   using Reed-Solomon codes.
   The level of erasure protection can be explicitly adapted to the
   importance of the respective parts in the source stream, thus
   allowing a graceful degradation of application quality with
   increasing packet loss rate on the network. Hence, this type of
   unequal erasure protection (UXP) scheme is intended to cope with the
   rapidly varying channel conditions on wireless access links to the
   Internet backbone. Nevertheless, backward compatibility with
   currently standardized non-progressive multimedia codecs is ensured,
   since equal erasure protection (EXP) represents a subset of generic
   UXP. By defining a comparatively simple payload format, the proposed
   scheme can be easily integrated into the existing framework for RTP.

Liebl,Stockhammer,Wagner,Pandel,Weng,Baese,Nguyen,Burkert       [Page1]

2. Conventions used in this document

   The following terms are used throughout this document:

   1.) Message block: a higher layer transport unit (e.g. an IP
   packet) that enters/leaves the segmentation/reassembly stage at the
   interface to wireless data link layers.

   2.) Segment: denotes a link layer transport unit.

   3.) CRC: Cyclic Redundancy Check, usually added to transport units
   at the sender to detect the existence of erroneous bits in a
   transport unit at the receiver.

   4.) Segmentation/Reassembly Process: If the size of the transport
   units at the link layer is smaller than that at the upper layers,
   message blocks have to be split up into several parts, i.e.
   segments, which are then transmitted successively over the link. If
   nothing is lost, the original message block can be restored at the
   receiving entity (reassembly).

   5.) Quality-of-service: application-dependent criterion to define a
   certain desired operation point.

   6.) Codec: denotes a functional pair consisting of a source encoding
   unit at the sender and a corresponding source decoding unit at the
   receiver; usually standardized for different multimedia applications
   like audio or video.

   7.) Progressive source coding: results in successive blocks of
   (source-)encoded data (e.g. a single video or audio frame), each of
   which can be viewed as a bitstream of certain length, whose distinct
   elements are of different importance to the reconstruction process
   at the decoder. Elements are commonly ordered from highest to lowest
   importance, where later elements depend on the earlier ones.

   8.) Reed-Solomon (RS) code: belongs to the class of linear nonbinary
   block codes, and is uniquely specified by the block length n, the
   number of parity symbols t, and the symbol alphabet.

   9.) n: is a variable which denotes both the block length of an RS
   codeword and the number of columns in a TB (see 16).

   10.) k: is a variable which denotes the number of information
   symbols in an RS codeword.

   11.) t: is a variable which denotes the number of parity symbols in
   an RS codeword.

   12.) Erasure: When a packet is lost during transmission, an erasure
   is said to have happened. Since the position of the erased packet in
   a sequence is usually known, a corresponding erasure marker can be
   set at the receiving entity.

   13.) Base layer: comprises the first and most important elements in
   a progressively encoded bitstream, without which all subsequent
   information is useless.

   14.) Enhancement layer: comprises one or more sets of the less
   important subsequent elements in a progressively encoded bitstream.
   A specific enhancement layer can be decoded if and only if the base
   layer and all previous enhancement layer data (of higher importance)
   are available.

   15.) Info stream: denotes the final bitstream which has to be
   protected by the proposed UXP scheme.
   It usually consists of the (source-encoded) bitstream (progressive
   or not), which is already arranged according to a desired syntax
   (e.g. as specified in the respective RTP profile for the media codec
   in use).
   In any case, it is assumed that every info stream is already
   octet-aligned according to the standard procedures defined in the
   context of the used syntax specifications.

   16.) Transmission block (TB): denotes a memory array of L rows and n
   columns. Each row of a TB represents an RS codeword, whereas each
   column, together with the respective UXP header (see 33) in front,
   forms the payload of a single RTP packet.
   Each TB consists of at least two distinct transmission sub blocks
   (TSB, see 17): The first L_s rows belong to the signaling TSB,
   whereas the last L_d=(L-L_s) rows belong to one or more data TSBs.

   17.) Transmission sub block (TSB): denotes a memory array of 0

               +-+-+-+-+-+-+-+-+-+
               |&|&|&|&|&|&|&|*|*|
               +-+-+-+-+-+-+-+-+-+
               <------------><--->
                   k=n-t       t
                 (&:info)  (*:parity)

   Fig. 1: Structure of a systematic RS codeword

5. Progressive Source Coding

   If the output of a multimedia codec, be it audio or video, is said
   to be progressive, the encoded bitstream must consist of several
   distinct elements, often organized in separate layers. The latter
   shall be defined via their relative importance with respect to the
   quality of the reconstruction process at the receiver. Hence, there
   exists at least one layer, often called base layer, without which
   reconstruction fails altogether, whereas all the other layers, often
   called enhancement layers, just help to continually improve the
   quality. Consequently, the different layers are usually contained in
   the (source-)encoded bitstream in decreasing order of importance,
   i.e. the base layer data is followed by the various enhancement
   layers.
   An example can be found in the fine granular scalability modes which
   have been proposed to various standardization bodies like MPEG-4 [4]
   or ITU (H.26L) [5], where the resolution of the scaling process in
   the progressive source encoder is as fine as one symbol in the
   enhancement layer.

   From the above definition, it follows that the most important base
   layer data must be protected as strongly as possible against packet
   loss during transmission. However, the protection of the enhancement
   layers can be progressively lowered, since a loss at this stage has
   only minor consequences for the reconstruction process. Thus, by
   using a suitable unequal erasure protection strategy across a
   progressive source stream, the overhead due to redundancy spent per
   (channel-)encoded block is reduced.
   Furthermore, if channel conditions get worse during transmission,
   only more and more enhancement layers are lost, i.e. a graceful
   degradation in application quality at the receiver is achieved [6].

   Nevertheless, it should be mentioned that the specific structure of
   a (source-)encoded bitstream strongly depends on the actual media
   codec in use, and the desired syntax which is used for adapting the
   output of the codec to a suitable transport level format (see also
   7.3). In order to keep the description of the unequal erasure
   protection strategy in section 6 as general as possible, the final
   bitstream which has to be protected by the proposed UXP scheme will
   be called "info stream" in the following. Furthermore, it is assumed
   that every info stream is already octet-aligned according to the
   standard procedures defined in the context of the used syntax
   specifications.

6. General Structure of UXP schemes

   In this section, the principal features of the proposed UXP scheme
   are described with a special focus on the protection and
   reconstruction procedure which is applied to the info stream. In
   addition, the behavior of the sender and receiver is specified as
   far as it concerns the reconstruction of the info stream. However,
   the complete UXP payload structure, including the additional UXP
   header, is described in section 7.

   Fig. 1 already illustrated the structure of a systematic codeword,
   which shall be represented by a single row and n successive columns
   that contain the information and the parity bytes. This structure
   shall now be extended by forming a transmission block (TB)
   consisting of L codewords of length n bytes each, which amounts to a
   total of L rows and n columns [7]: Each column, together with the
   respective UXP header in front, shall represent the payload of an
   RTP packet, i.e. the whole data of a TB is transmitted via a
   sequence of n RTP packets all carrying a payload of length (L+2)
   bytes (UXP header included).

   The value of L should be chosen in such a way that the whole length
   of the resulting IP packet (i.e. RTP payload plus sum of RTP, UDP,
   and IP header) equals a multiple of the segment size on the wireless
   link to avoid stuffing at the data link layer.

   Each TB usually consists of two or more horizontal slices, the
   so-called transmission sub blocks (TSB), as can be seen in Fig. 2:
   The first L_s rows always belong to the signaling TSB, which is used
   to convey the actual redundancy profile in the data part to the
   receiver (see 7.3). The following L_d=(L-L_s) rows belong to one or
   more data TSBs, which contain the interleaved and RS encoded info
   stream, as will be described below.
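
   The sizing rule for L given above can be illustrated with a short
   Python sketch. It is only a sketch under stated assumptions: the
   header sizes (IPv4 without options, UDP, RTP without CSRC entries or
   header extension), the function names, and the sample numbers are
   not part of this specification, and header compression or IPv6 would
   change the arithmetic.

```python
# Assumed fixed header sizes in bytes; these are NOT defined by this
# document (IPv4 without options, UDP, RTP without CSRC/extension).
IP_HDR, UDP_HDR, RTP_HDR = 20, 8, 12
UXP_HDR = 2  # two-octet UXP header, giving the (L+2)-byte RTP payload


def rows_for_segments(segment_size: int, num_segments: int) -> int:
    """Return L such that one IP packet fills num_segments segments exactly."""
    packet_size = segment_size * num_segments
    rows = packet_size - (IP_HDR + UDP_HDR + RTP_HDR + UXP_HDR)
    if rows < 2:
        # A TB needs at least one signaling row and one data row.
        raise ValueError("packet too small for a signaling and a data row")
    return rows


def tb_geometry(n: int, L: int, L_s: int = 1) -> dict:
    """Summarize a TB of L rows and n columns with L_s signaling rows."""
    if not 0 < L_s < L:
        raise ValueError("need 0 < L_s < L")
    return {
        "packets": n,                   # one RTP packet per column
        "rtp_payload_bytes": L + UXP_HDR,
        "data_rows": L - L_s,           # L_d = L - L_s
        "tb_bytes": L * n,
    }


if __name__ == "__main__":
    L = rows_for_segments(segment_size=32, num_segments=3)
    print(L, tb_geometry(n=20, L=L))
```

   With a hypothetical 32-byte link segment and three segments per
   packet, the sketch yields L=54 rows, so that no stuffing is needed
   at the data link layer under the assumed header sizes.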

                    Transmission Block (TB)

           /\  +-+-+-+-+-+-+-+-+-+  /\
           |   |  signaling TSB  |  |  L_s bytes
           |   +-+-+-+-+-+-+-+-+-+  \/
           |   |                 |  /\                /\
           |   +   data TSB #1   +  |  L_d(1) bytes   |
           |   |                 |  |                 |
           |   +-+-+-+-+-+-+-+-+-+  \/                |
L bytes    |   |                 |  /\                |
payload    |   +   data TSB #2   +  |  L_d(2) bytes   |
per packet |   |                 |  |                 |  L_d bytes
           |   +-+-+-+-+-+-+-+-+-+  \/                |
           |   |        .        |  .                 |
           |   +        .        +  .                 |
           |   |        .        |  .                 |
           |   +-+-+-+-+-+-+-+-+-+  /\                |
           |   |   data TSB #z   |  |  L_d(z) bytes   |
           \/  +-+-+-+-+-+-+-+-+-+  \/                \/
               <----------------->
                    n packets

   Fig. 2: General structure of a TB

   Since the UXP procedure is mainly applied to the data TSBs, it will
   be described next, whereas the content and syntax of the signaling
   TSB will be defined in section 7.3.

   For simplicity, only one single data TSB will be assumed throughout
   the following explanation of the encoding and decoding procedure.
   However, an extension to more than one data TSB per TB is
   straightforward, and will be shown in section 7.4.

   As depicted in Fig. 3, the rows of a transmission sub block shall be
   partitioned into T+1 different classes CA_i, where i=0...T, such
   that each class contains exactly A_i=|CA_i| consecutive rows of the
   matrix, where the A_i have to satisfy the following relationship:

      A_0+A_1+...+A_T=L_d

             Data Transmission Sub Block (data TSB)
                             T
                         <------->
           /\  +-+-+-+-+-+-+-+-+-+  /\
           |   |&|&|&|&|&|*|*|*|*|  |
           |   +-+-+-+-+-+-+-+-+-+  |  A_T=3
           |   |&|&|&|&|&|*|*|*|*|  |
           |   +-+-+-+-+-+-+-+-+-+  |
L_d bytes  |   |&|&|&|&|&|*|*|*|*|  \/
per packet |   +-+-+-+-+-+-+-+-+-+  /\
           |   |%|%|%|%|%|%|*|*|*|  |  A_(T-1)=1
           |   +-+-+-+-+-+-+-+-+-+  \/
           |   |$|$|$|$|$|$|$|*|*|  .
           |   +-+-+-+-+-+-+-+-+-+  .
           |   |§|§|§|§|§|§|§|§|*|  .
           |   +-+-+-+-+-+-+-+-+-+  /\
           |   |#|#|#|#|#|#|#|#|#|  |  A_0=1
           \/  +-+-+-+-+-+-+-+-+-+  \/
               <----------------->
                    n packets

   &,%,$,§,# : info bytes belonging to a certain info stream in
               decreasing order of importance
   *         : parity bytes gained from Reed-Solomon coding

   Fig. 3: General structure for coding with unequal erasure protection

   Furthermore, all rows in a particular class CA_i shall contain
   exactly the same number of parity bytes, which is equal to the index
   i of the class. For each row in a certain class CA_i, the same
   (n,n-i) RS code shall be applied.

   As can be observed from Fig. 3, class CA_T contains the largest
   number of parity bytes per row, i.e. offers the highest erasure
   protection capability in the block. Consequently, the most important
   element in the info stream must be assigned to class CA_T, where the
   value of T should be chosen according to the desired outage
   threshold of the application given a certain packet erasure rate on
   the link.
   All other classes CA_(T-1)...CA_0 shall be sequentially filled with
   the remaining elements of the info stream in decreasing order of
   importance, where the optimal choice for the size of each class (0
   or more rows), i.e. the structure of the redundancy profile, should
   depend on the quality-of-service requirements for the various
   (progressively-encoded) layers.

   The following set of rules contains a compact description of all the
   operations that must be performed for each transmission block:

   1.) The total number of columns n of the TB shall be chosen
   according to the actual delay constraints of the application.

   2.) Next, the expected number of rows reserved for the signaling TSB
   has to be selected, which limits the data TSB to L_d=(L-L_s) rows.

   3.) The maximum erasure correction capability T in the data TSB
   should be chosen according to the desired outage threshold of the
   application given the actual packet erasure rate on the link.

   4.) The redundancy profile for the rest of the data TSB should
   depend on the size and number of the various layers in the info
   stream, as well as the desired probability of successful decoding
   for each of them (quality-of-service requirement).

   5.) Any suitable optimization algorithm may be used for deriving an
   adequate redundancy profile. However, the result has to satisfy the
   following constraints:
   a) All available info byte positions in the data TSB have to be
   completely filled. If the info stream is too short for a desired
   profile, media stuffing may be applied to the empty info byte
   positions at the end of the data TSB by appending a sufficient
   number of bytes (with arbitrary value, e.g. 0x00). The actual number
   of stuffing symbols per data TSB is then signaled via the respective
   stuffing indicator (see 7.3). However, before resorting to any
   stuffing, it should be checked whether it is possible to strengthen
   the protection of certain rows instead, thus improving the overall
   robustness of the decoding process.
   b) The info stream should be fully contained within the data TSB
   (unless cutting it off at a specific point is explicitly allowed by
   the properties of the used media codec).
   c) The number of required descriptors and stuffing indicators (see
   section 7.3) to signal the profile shall not exceed the space
   initially reserved for them in the signaling TSB.
   Constraints a) and b) should already be incorporated in the
   optimization algorithm. However, if constraint c) is not met, the
   data TSB has to be reduced by one row in favor of the signaling TSB
   to accommodate the descriptors and stuffing indicators, i.e. steps
   2-5 have to be repeated until a valid redundancy profile has been
   obtained.

   6.) For each nonempty class CA_i, i=T...0, in the data TSB, the
   following steps have to be performed:
   a) All rows of this specific class shall be filled from left to
   right and top to bottom with data bytes of the info stream in
   decreasing order of importance (i.e. starting with the most
   important element).
   b) For each row in the class, the required i parity-check bytes are
   computed using the same (n,n-i) RS code, and filled in the empty
   positions at the end of each row. Thus, every row in the class
   constitutes a valid codeword of the chosen RS code.

   7.) After having filled the whole data TSB with information and
   parity bytes, the redundancy profile is mapped to the signaling TSB
   as described in section 7.3.

   8.) Each column of the resulting TB is now read out byte-wise from
   top to bottom and, together with the respective UXP header (see
   section 7.2) in front, is mapped onto the payload section of one and
   only one RTP packet.

   9.) The n resulting RTP packets shall be transmitted successively to
   the remote host, starting with the leftmost one.

   10.) At the corresponding protocol entity at the remote host, the
   payload (without the UXP header) of all successfully received RTP
   packets belonging to the same sending TB shall be filled into a
   similar receiving TB column-wise from top to bottom and left to
   right.

   11.) For every erased packet of a received TB, the respective column
   in the TB shall be filled with a suitable erasure marker.

   12.) Before any other operations can be performed, the redundancy
   profile has to be restored from the signaling TSB according to the
   procedure defined in section 7.3. If the attempt fails because of
   too many lost packets, the whole TB shall be discarded and the
   receiving entity should wait for the next incoming TB (the source
   decoder may be informed about the missing info stream, if required).

   13.) If the attempt to recover the redundancy profile has been
   successful, a decoding operation shall be performed for each row of
   the data TSB by applying any suitable algorithm for erasure
   decoding.

   14.) For all rows of the data TSB for which the decoding operation
   has been successful, the reconstructed data bytes are read out from
   left to right and top to bottom, and appended to the reconstructed
   version of the info stream.

   15.) For all rows of the data TSB for which the decoding operation
   has failed, a sufficient number of suitable dummy symbols may be
   added to the reconstructed info stream to inform the source decoder
   about the missing symbols.

   The above rules describe an interleaver, i.e. at the sender a single
   codeword of a TB is spread out over n successive packets. Thus, each
   codeword of a transmitted TB experiences the same number of erasures
   at exactly the same positions.
   Two important conclusions can be drawn from this:
   a) Since the same RS code is applied to all rows contained in a
   specific class, either all of them can be correctly decoded or none
   of them can. Hence, there exist no partly decodable classes at the
   receiver.
   b) If decoding is successful for a certain class CA_i, all the
   classes CA_(i+1)...CA_T can also be decoded, since they are
   protected by at least one more parity byte per row. Together with
   rule 6, it is therefore always ensured that, in case a decodable
   enhancement layer exists, all other layers it depends on can also be
   reconstructed.
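
   The interleaving in rules 8 to 11 and the two conclusions above can
   be sketched in Python as follows. This is an illustrative sketch,
   not a normative procedure: the function names and the use of None as
   erasure marker are assumptions, the Reed-Solomon encoder and decoder
   themselves are omitted, and the decodability test relies on the
   standard property that an (n,n-i) RS code can restore a row as long
   as at most i of its n symbols are erased.

```python
def columns_to_payloads(tb):
    """Sender side: read a TB (list of L rows of n byte values) column-wise.

    Returns n byte strings of length L, one per RTP packet (the UXP
    header is not included here).
    """
    L, n = len(tb), len(tb[0])
    return [bytes(tb[r][c] for r in range(L)) for c in range(n)]


def payloads_to_tb(payloads, L):
    """Receiver side: rebuild the TB; a None payload marks an erased packet.

    Erased columns are filled with None, used here as the erasure marker.
    """
    n = len(payloads)
    tb = [[None] * n for _ in range(L)]
    for c, column in enumerate(payloads):
        if column is not None:
            for r in range(L):
                tb[r][c] = column[r]
    return tb


def decodable_classes(av, erasures):
    """Return the indices i of nonempty classes CA_i that can be decoded.

    av is the vector (A_0, ..., A_T) of class sizes; every row sees the
    same number of erasures, so class CA_i is decodable iff erasures <= i.
    """
    return [i for i, rows in enumerate(av) if rows > 0 and erasures <= i]


if __name__ == "__main__":
    sent = columns_to_payloads([[1, 2, 3], [4, 5, 6]])
    sent[1] = None                      # second packet lost in transit
    print(payloads_to_tb(sent, L=2))
    print(decodable_classes((7, 0, 2, 2, 0, 3, 10), erasures=3))
```

   Since every row of the receiving TB is erased at the same column
   positions, a class is either decodable as a whole or not at all,
   and the decodable classes always form a contiguous set ending at
   CA_T.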

   Given the maximum erasure protection value T, the redundancy profile
   for a data TSB of size (L_d x n) shall be denoted by a so-called
   erasure protection vector AV of length (T+1), where

      AV:=(A_0,A_1,...,A_(T-1),A_T)

   From the above definition, it follows that the trivial cases of no
   erasure protection and EXP are a subset of UXP:
   a) no erasure protection at all: all application data is mapped onto
   class CA_0, i.e. AV=(L_d,0,0,...,0).
   b) EXP: all application data is mapped onto class CA_T, i.e.
   AV=(0,0,...,0,A_T=L_d).

   Hence, backward compatibility with currently standardized
   non-progressive multimedia codecs is achieved.

7. RTP payload structure

   For every packet whose payload is formed by reading out a column of
   the TB, the RTP header must be followed by a UXP header.

7.1. Specific settings in the RTP header

   The timestamp of each RTP packet resulting from reading out a TB is
   set to the time instant when the first byte of the progressive
   source data stream has been written into the TB. This results in the
   TS value being the same for all RTP packets belonging to a specific
   TB.

   The payload type is of dynamic type, and obtained through
   out-of-band signaling similar to [1]. The signaling protocol must
   establish a payload length to be associated with the payload type
   value. End systems that cannot recognize a payload type must discard
   packets carrying it.

   The marker bit is set to 1 for the last packet of every TB.
   Otherwise, its value is 0.

   All other fields in the RTP header are set to those values proposed
   for regular multimedia transmission using the same source codecs,
   but no erasure protection scheme enabled.

   The RTP payload shall consist of the UXP header followed by one
   column of the TB.

7.2. Structure of the UXP header

   The UXP header shall consist of 2 octets, and is shown in Fig. 4:

       0                   1 1 1 1 1 1
       0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |X|  block PT   | block length n|
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   Fig. 4: Proposed UXP header

   The fields in the header shall be defined as follows:

   - X (bit 0): extension bit, reserved for future enhancements,
                currently not in use -> default value: 0

   - block PT (bits 1-7): regular RTP payload type to indicate the
                          media type contained in the info stream

   - block length n (bits 8-15): indicates total number of RTP packets
                                 resulting from one TB (which equals
                                 the number of columns of the TB)

   The syntax of the info stream which is protected by UXP is specified
   by the RTP payload type field contained in the UXP header. For
   example, payload type H.263 means that the info stream conforms to
   the specifications of the RTP profile for H.263, but does not
   represent the "raw" H.263 stream produced by an H.263 encoder.
   However, UXP can also be applied to the raw output of the media
   codec (in case it is already octet-aligned), if this can be signaled
   to the receiver via other means, e.g. by use of H.245 or SDP.

   Based on the RTP sequence number, the marker bit, and the repetition
   of the block length n in each UXP header, the receiving entity is
   able to recognize both TB boundaries and the actual position of lost
   packets in the TB. Furthermore, the specific choice of equal TS
   values for all RTP packets belonging to a TB allows for overcoming
   possible sequence number overflow.

7.3. In-band signaling of the structure of the redundancy profile

   To enable a dynamic adaptation to varying link conditions, the
   actual redundancy profile used in the data TSB must be signaled to
   the receiving entity. Since out-of-band signaling either results in
   excessive additional control traffic, or prevents quick changes of
   the profile between successive TBs, an in-band signaling procedure
   is desired.

   Since the decoding process cannot be applied to any of the erasure
   protection classes without knowledge of the correct redundancy
   profile, the profile has to be protected against packet loss at
   least as strongly as the most important element in the info stream.
   Therefore, an additional class CA_P is used in the signaling TSB,
   where the number of parity symbols is by default set to the
   following value:

      P=ceil(n/2)

   Hence, up to 50% of the RTP packets can be lost before the
   redundancy profile can no longer be recovered. This seems to be a
   reasonable value for the lowest point of operation over a lossy
   link. Alternatively, P may be explicitly signaled during session
   setup by means of SDP or the H.245 protocol.

   Consequently, since all other classes must have equal or less
   erasure protection capability, the maximum allowable value for class
   CA_T in the data TSB is now limited to T<=P.

   The signaling of the erasure protection vector is accomplished by
   means of descriptors. For each class CA_i with A_i>0, there is a
   descriptor DP_i providing information about the size of class CA_i
   (i.e. the value of A_i) and establishing a relationship between the
   erasure protection of class CA_i and that of the first preceding
   class CA_(i+j) with A_(i+j)>0, where j>0. A descriptor DP_i is
   mapped onto one byte, which is sub-divided into two half-bytes (i.e.
   the higher and the lower four bits). The first half-byte is of type
   unsigned and contains the 4-bit representation of the decimal value
   A_i. The second half-byte is of type signed and contains the
   difference in erasure protection between class CA_i and class
   CA_(i+j), i.e. the signed 4-bit representation of the decimal value
   (-j) (where the MSB denotes the sign, and the lower three bits the
   absolute value). Note that the erasure protection P of class CA_P is
   fixed, whereas the size A_P may vary.

   Thus, the data to be filled into class CA_P shall consist of a
   sequence of descriptors separated by stuffing indicators (see
   below), where the number of descriptors is primarily given by the
   number of protection classes CA_i, 0<=i<=T, in the data TSB with
   A_i>0.
   Without a-priori knowledge, the initial value for the size of the
   signaling TSB should be set to one (row). When the number of
   necessary descriptors and stuffing indicators exceeds the (n-P)
   information positions, one or more additional rows have to be
   reserved. This is usually done by increasing the value for L_s to
   A_P>1, i.e. the data TSB is reduced to (L-A_P) rows. Hence, in order
   to indicate the actual size of the signaling TSB, an additional
   descriptor is inserted at the very beginning, which takes on the
   value 0xq0, where q denotes the (hexadecimal) four-bit
   representation of the decimal value A_P.

   Furthermore, the end of each data TSB is signaled by the otherwise
   unused descriptor value 0x00, followed by exactly one stuffing
   indicator (SI). The latter is mapped onto a byte, which is of type
   unsigned and contains the 8-bit representation of the decimal value
   of the number of media stuffing symbols used at the end of the
   respective data TSB.

   The (extended) sequence of descriptors and stuffing indicators is
   then mapped to the info byte positions in the A_P rows of the
   signaling TSB from left to right and top to bottom. Each row is then
   encoded with the same (n,n-P) RS code.
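
   The mapping of one or more erasure protection vectors to the info
   bytes of the signaling TSB can be sketched in Python as follows. The
   function names, the argument layout, and the error handling (for
   values that do not fit into a half-byte, or a signaling TSB that is
   too small) are assumptions made for this sketch; the byte layout
   follows the descriptor rules above, including filling unused info
   byte positions with 0x00. For several data TSBs, the difference in
   the first descriptor of a TSB is taken relative to the last nonempty
   class of the previous TSB, which may be positive.

```python
def signed_nibble(value: int) -> int:
    """Sign-magnitude half-byte: MSB is the sign, low three bits the magnitude."""
    if not -7 <= value <= 7:
        raise ValueError("difference does not fit in a signed half-byte")
    return (0x8 | -value) if value < 0 else value


def signaling_bytes(avs, stuffing, n, P, A_P=1):
    """Build the info bytes of the signaling TSB (before RS encoding).

    avs      -- erasure protection vectors (A_0, ..., A_T), one per data TSB
    stuffing -- number of media stuffing symbols for each data TSB
    n, P     -- TB width and parity symbols per row of class CA_P
    A_P      -- number of rows reserved for the signaling TSB
    """
    out = [A_P << 4]                  # leading descriptor 0xq0 with q = A_P
    prev = P                          # protection of the preceding class
    for av, si in zip(avs, stuffing):
        for i in range(len(av) - 1, -1, -1):    # classes CA_T down to CA_0
            if av[i] == 0:
                continue              # empty classes get no descriptor
            if av[i] > 15:
                raise ValueError("A_i does not fit in four bits")
            out.append((av[i] << 4) | signed_nibble(i - prev))
            prev = i
        out += [0x00, si]             # end-of-TSB marker, then the SI
    capacity = A_P * (n - P)          # info byte positions in class CA_P
    if len(out) > capacity:
        raise ValueError("signaling TSB too small; reserve more rows")
    return out + [0x00] * (capacity - len(out))   # descriptor stuffing


if __name__ == "__main__":
    row = signaling_bytes([(7, 0, 2, 2, 0, 3, 10)], [3], n=20, P=10)
    print(", ".join(f"0x{b:02X}" for b in row))
```

   A caller that receives the "too small" error would reserve one more
   row for the signaling TSB and recompute the redundancy profile, in
   line with rule 5 of section 6.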
   If the number of descriptors and stuffing indicators is less than
   the number of available info byte positions, however, empty
   positions in class CA_p may be filled up with the otherwise unused
   descriptor 0x00.

   At the receiving entity, the sequence of descriptors shall be
   recovered by performing erasure decoding on the first row of the TB
   (which definitely belongs to the signaling TSB) using the same
   algorithm as later for the data TSB. If successful, the very first
   descriptor now indicates the number of rows of the signaling TSB,
   and the next (A_p-1) rows are decoded to reconstruct the redundancy
   profile for the data TSB(s), together with the number of media
   stuffing symbols denoted by the respective SI(s).

   The complete structure of the TB is depicted in Fig. 5.

                       Transmission Block (TB)
                                       P
                                  <--------->
              /\  +-+-+-+-+-+-+-+-+-+  /\
              |   |?|?|?|?|*|*|*|*|*|  |  A_P=1
              |   +-+-+-+-+-+-+-+-+-+  \/
              |   |&|&|&|&|&|*|*|*|*|  /\
              |   +-+-+-+-+-+-+-+-+-+  |  A_T=3
              |   |&|&|&|&|&|*|*|*|*|  |
              |   +-+-+-+-+-+-+-+-+-+  |
     L bytes  |   |&|&|&|&|&|*|*|*|*|  \/
     payload  |   +-+-+-+-+-+-+-+-+-+  /\
  per packet  |   |%|%|%|%|%|%|*|*|*|  |  A_(T-1)=1
              |   +-+-+-+-+-+-+-+-+-+  \/
              |   |$|$|$|$|$|$|$|*|*|  .
              |   +-+-+-+-+-+-+-+-+-+  .
              |   |@|@|@|@|@|@|@|@|*|  .
              |   +-+-+-+-+-+-+-+-+-+  /\
              |   |#|#|#|#|#|#|#|#|#|  |  A_0=1
              \/  +-+-+-+-+-+-+-+-+-+  \/
                  <----------------->
                       n packets

   ?         : descriptors and stuffing indicators for in-band
               signaling of the redundancy profile

   &,%,$,@,# : info bytes belonging to a certain element of the
               info stream in decreasing order of importance

   *         : parity bytes gained from Reed-Solomon coding

   Fig. 5: General structure for UXP with in-band signaling of the
           redundancy profile

   The following simple example is meant to illustrate the idea behind
   using descriptors: Let an erasure protection vector of length T+1=7
   be given as follows:

      AV=(A_0,A_1,...,A_5,A_6)=(7,0,2,2,0,3,10)

   Hence, the length L of the TB (including one row for the signaling
   TSB) is equal to 7+2+2+3+10+1=25 (rows/bytes). If the width is
   assumed to be equal to 20 (columns/packets), then the erasure
   protection of the descriptors is p=10.
   The corresponding sequence of descriptors can be written as

      DP=(DP_6,DP_5,DP_3,DP_2,DP_0)=(0xAC,0x39,0x2A,0x29,0x7A),

   where the values of the descriptors are given in hexadecimal
   notation. Next, the descriptor indicating the length of the
   signaling TSB has to be inserted, the end of the data TSB has to be
   marked by 0x00, and the SI has to be appended. If the number of
   media stuffing symbols is assumed to be 3, the 10 info bytes in the
   signaling TSB take on the following values (descriptor stuffing
   included):

      (0x10,0xAC,0x39,0x2A,0x29,0x7A,0x00,0x03,0x00,0x00)

7.4 Optional Concatenation of Transmission Sub Blocks:

   The following procedure may be applied if a single info stream would
   be too short to achieve an efficient mapping to a transmission block
   with respect to the fixed payload length L and the desired number of
   packets n. For example, intra-coded video frames (I-frames) are
   usually much larger than the following predicted ones (P-frames). In
   this case, a certain number z of successive small info streams
   should each be mapped to a transmission sub block with length L_d(y)
   and width n, such that L_d(1)+L_d(2)+...+L_d(z)=L_d.
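   The length constraint above can be checked mechanically. The
   following Python sketch is illustrative only: the function name is
   hypothetical, and it assumes that L_d(y) is the sum of the class
   sizes A_i of sub block y and that L_d = L - A_p, i.e. the TB length
   minus the rows reserved for the signaling TSB.

```python
def sub_block_lengths(avs, L, a_p):
    """Return [L_d(1), ..., L_d(z)] and check that they add up to L_d.

    avs  -- one erasure protection vector (A_0, ..., A_T) per sub block
    L    -- payload length of the TB in rows/bytes
    a_p  -- number of rows reserved for the signaling TSB
    """
    lengths = [sum(av) for av in avs]   # assumed: L_d(y) = sum of A_i
    l_d = L - a_p                       # assumed: rows left for data
    if sum(lengths) != l_d:
        raise ValueError(
            "sub blocks use %d rows, but %d are available"
            % (sum(lengths), l_d))
    return lengths
```

   With two sub blocks of vector (0,0,2,2,0,3,10), L=36 and two
   signaling rows, each sub block occupies 17 rows and the check
   passes.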
   The resulting transmission sub blocks can then be easily
   concatenated to form a TB of size L x n having one common signaling
   TSB: Since the second half-byte of the descriptors is of type
   signed, we are able to incorporate both decreasing and increasing
   erasure protection profiles within one single signaling TSB.
   Note that once the lengths L_d(y) of the individual blocks have been
   fixed, the respective redundancy profiles can be determined
   independently of each other. However, the space initially reserved
   for the signaling TSB should already be large enough to avoid
   profile recalculation for each of the data TSBs in case the sequence
   of descriptors gets too long.

   Again, we will give a simple example to illustrate this idea: Let
   the erasure protection vectors for two concatenated data TSBs be
   given as follows:

      AV1=(A1_0,A1_1,...,A1_5,A1_6)=(0,0,2,2,0,3,10),
      AV2=(A2_0,A2_1,...,A2_5,A2_6)=(0,0,2,2,0,3,10).

   Hence, two identical data TSBs will be concatenated to form a TB of
   length L=2*(2+2+3+10)+2=36 (rows/bytes). If the width is again
   assumed to be equal to 20 (columns/packets), then the erasure
   protection of the descriptors is p=10, and therefore a total of two
   rows for the signaling TSB have been reserved this time. The
   corresponding sequence of descriptors can now be written as

      DP=(0xAC,0x39,0x2A,0x29,0xA4,0x39,0x2A,0x29),

   where the values of the descriptors are given in hexadecimal
   notation. If the number of media stuffing symbols is assumed to be 3
   for each data TSB, the 20 info byte positions in the signaling TSB
   are filled with the following values (descriptor stuffing included):

   (0x20,0xAC,0x39,0x2A,0x29,0x00,0x03,0xA4,0x39,0x2A,0x29,0x00,0x03,
    0x00,0x00,0x00,0x00,0x00,0x00,0x00)

8. Security Considerations

   The payload of the RTP-packets consists of an interleaved multimedia
   and parity stream.
   Therefore, it is reasonable to encrypt the resulting stream with one
   key rather than using different keys for multimedia and parity data.
   It should also be noted that encryption of the multimedia data
   without encryption of the parity data could enable known-plaintext
   attacks.

   The overall proportion between parity bytes and info bytes should be
   chosen carefully if the packet loss is due to network congestion. If
   the proportion of parity bytes per TB is increased in this case, it
   could lead to increasing network congestion. Therefore, the
   proportion between parity bytes and info bytes per TB MUST NOT be
   increased as packet loss increases due to network congestion.

   The overall ratio between parity and info bytes MUST NOT be higher
   than 1:1, i.e. the absolute bitrate spent for redundancy must not be
   larger than the bitrate required for transmission of the multimedia
   data itself.

9. Application Statement

   There are currently two different schemes proposed for unequal error
   protection in the IETF-AVT: Unequal Level Protection (ULP) and
   Unequal Erasure Protection (UXP).
   Although both methods seem to address the same problem, the proposed
   solutions differ in many respects. This section tries to describe
   possible application scenarios and to show the strengths and
   weaknesses of both approaches.

   The main difference between both approaches is that while ULP
   preserves the structure of the packets which have to be protected
   and provides the redundancy in extra packets, UXP interleaves the
   info stream which has to be protected, inserts the redundancy
   information, and thus creates a totally new packet structure.

   Another difference concerns multicast compatibility: It cannot be
   assumed that all future terminals will be able to apply UXP/ULP.
   Therefore, backward compatibility could be an issue in some cases.
   Since ULP does not change the original packet structure, but only
   adds some extra packets, it is possible for terminals which do not
   support ULP to discard the extra packets. In case of UXP, however,
   two separate streams with and without erasure protection have to be
   sent, which increases the bandwidth.

   Next, both approaches offer different mechanisms to adjust packet
   sizes, if necessary: UXP allows the packet sizes to be adjusted
   arbitrarily. This is an advantage in case the loss probability is
   dependent on the packet length, which happens, for example, if the
   end-to-end connection contains wireless links. In this case proper
   adjustment of the packet size is one essential network adaptation
   technique. In addition, if a pre-encoded stream is sent over the
   network, the packet size can be adjusted independently of slice
   structures.
   Since ULP does not change the existing packetization scheme, this
   flexibility does not exist.

   The ability of UXP to adjust the packet size arbitrarily can be
   especially exploited in a streaming scenario, if a delay of several
   hundred milliseconds is acceptable. It is then possible to fill
   several video frames into a single TB of desired size, e.g. a group
   of pictures consisting of I-frame, P-frames and B-frames. The
   redundancy scheme can thus be selected in such a way as to guarantee
   the following property: In case of packet loss, the streams for
   P-frames are only recoverable if the I-frame, on which the decoding
   of P-frames depends, is recoverable. The same is true for B-frames,
   which can only be decoded if the respective P-frames are
   recoverable. This prevents situations in which, for example, the
   B-frames have been received correctly, but the P-frames have been
   lost, i.e. it assures a gradual decrease in application quality also
   on the frame level. Of course, a similar encoding is possible with
   ULP.
   But in this case one might have to send several frames within one
   packet, which leads to large packet sizes.

   Finally, decoding delay is also a crucial issue in communications.
   Again, both approaches have different delay properties: UXP
   introduces a decoding delay because a sufficient number of correctly
   received packets is necessary to start decoding of a TB. The delay
   in general depends on the dimensions of the interleaver. This should
   be considered for any system design which includes UXP.
   With ULP, every correctly received media packet can be decoded right
   away. However, a significant delay is introduced if packets are
   corrupted, because in this case one has to wait for several
   redundancy packets. Thus, the delay is in general dependent on the
   actual ULP-FEC-packet scheme and cannot be considered in advance
   during the system design phase.

10. Intellectual Property Considerations

   Siemens AG has filed patent applications that might possibly have
   technical relations to this contribution.
   On IPR related issues, Siemens AG refers to the Siemens Statement on
   Patent Licensing, see http://www.ietf.org/ietf/IPR/SIEMENS-General.

11. References

   [1] J. Rosenberg and H. Schulzrinne, "An RTP Payload Format for
   Generic Forward Error Correction", Request for Comments 2733,
   Internet Engineering Task Force, Dec. 1999.

   [2] A. Albanese, J. Bloemer, J. Edmonds, M. Luby, and M. Sudan,
   "Priority encoding transmission", IEEE Trans. Inform. Theory, vol.
   42, no. 6, pp. 1737-1744, Nov. 1996.

   [3] Shu Lin and Daniel J. Costello, Error Control Coding:
   Fundamentals and Applications, Prentice-Hall, Inc., Englewood
   Cliffs, N.J., 1983.

   [4] W. Li, "Fine Granularity Scalability Using Bit-Plane Coding of
   DCT Coefficients", ISO/IEC JTC1/SC29/WG11, Doc. MPEG98/M4204, Dec.
   1998.

   [5] G. Blaettermann, G. Heising, and D.
   Marpe, "A Quality Scalable Mode for H.26L", ITU-T SG16, Q.15,
   Q15-J24, Osaka, May 2000.

   [6] F. Burkert, T. Stockhammer, and J. Pandel, "Progressive A/V
   coding for lossy packet networks - a principle approach", Tech.
   Rep., ITU-T SG16, Q.15, Q15-I36, Red Bank, N.J., Oct. 1999.

   [7] Guenther Liebl, "Modeling, theoretical analysis, and coding for
   wireless packet erasure channels", Diploma Thesis, Inst. for
   Communications Engineering, Munich University of Technology, 1999.

12. Acknowledgments

   Many thanks to Thomas Stockhammer, who initially came up with the
   idea of unequal erasure protection to improve progressive video
   transmission over lossy networks.

13. Authors' Addresses

   Guenther Liebl, Thomas Stockhammer
   Institute for Communications Engineering (LNT)
   Munich University of Technology
   D-80290 Munich
   Germany
   Email: {liebl,tom}@lnt.e-technik.tu-muenchen.de

   Minh-Ha Nguyen, Frank Burkert
   Siemens AG - ICM D MP RD MCH 83/81
   D-81675 Munich
   Germany
   Email: {minhha.nguyen,frank.burkert}@mch.siemens.de

   Marcel Wagner, Juergen Pandel, Wenrong Weng, Gero Baese
   Siemens AG - Corporate Technology CT IC 2
   D-81730 Munich
   Germany
   Email:
   {marcel.wagner,juergen.pandel,wenrong.weng,gero.baese}@mchp.siemens.de

Full Copyright Statement

   Copyright (C) The Internet Society (date). All Rights Reserved.
   This document and translations of it may be copied and furnished to
   others, and derivative works that comment on or otherwise explain it
   or assist in its implementation may be prepared, copied, published
   and distributed, in whole or in part, without restriction of any
   kind, provided that the above copyright notice and this paragraph
   are included on all such copies and derivative works.
   However, this document itself may not be modified in any way, such
   as by removing the copyright notice or references to the Internet
   Society or other Internet organizations, except as needed for the
   purpose of developing Internet standards in which case the
   procedures for copyrights defined in the Internet Standards process
   must be followed, or as required to translate it into languages
   other than English.

   The limited permissions granted above are perpetual and will not be
   revoked by the Internet Society or its successors or assigns.

   This document and the information contained herein is provided on an
   "AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET ENGINEERING
   TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING
   BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF INFORMATION HEREIN
   WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF
   MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.