Audio/Video Transport WG                                   Ari Lakaniemi
Internet Draft                                                     Nokia
Intended status: Standards track                             Ye-Kui Wang
Expires: October 2010                                Huawei Technologies
                                                          April 22, 2010

               RTP payload format for G.718 speech/audio
                    draft-ietf-avt-rtp-g718-03.txt

Status of this Memo

   This Internet-Draft is submitted to IETF in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups. Note that
   other groups may also distribute working documents as
   Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time. It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt.

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html.

   This Internet-Draft will expire on October 22, 2010.

Copyright Notice

   Copyright (c) 2010 IETF Trust and the persons identified as the
   document authors. All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document. Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document. Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the BSD License.

Abstract

   This document specifies the Real-Time Transport Protocol (RTP)
   payload format for the Embedded Variable Bit-Rate (EV-VBR)
   speech/audio codec, specified in ITU-T G.718. A media type
   registration for this RTP payload format is also included.

Conventions used in this document

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in RFC 2119 [RFC2119].

Table of Contents

   1. Introduction
   2. Background
      2.1. The G.718 codec
      2.2. Benefits of layered design
      2.3. Transmitting layered data
      2.4. Scaling scenarios & rate control
   3. G.718 RTP payload format
      3.1. Payload Structure
         3.1.1. Payload Header
         3.1.2. G.718 transport blocks
      3.2. Handling the Encoded data
      3.3. G.718 scaling
      3.4. CRC verification
      3.5. G.718 session
      3.6. Cross-stream/cross-layer timing synchronization
      3.7. RTP Header usage
   4. Payload Format Parameters
      4.1. Media Type Registration
      4.2. Mapping to SDP Parameters
      4.3. Offer/answer considerations
      4.4. Declarative usage of SDP
      4.5. SDP examples
   5. Security Considerations
   6. Congestion control
   7. IANA Considerations
   APPENDIX A: Payload examples
      A.1. Simple payload examples
         A.1.1. All the layers in the same payload
         A.1.2. Layers in separate RTP streams
      A.2. Advanced examples
         A.2.1. Different update rate for subset of layers
         A.2.2. Redundant frames with limited set of layers
   8. References
      8.1. Normative References
      8.2. Informative References
   Author's Addresses
   Acknowledgment
   9. Open Issues
   10. Changes Log

1. Introduction

   The International Telecommunication Union (ITU-T) Recommendation
   G.718 [G.718] specifies the Embedded Variable Bit Rate (EV-VBR)
   speech/audio codec. This document specifies the Real-time Transport
   Protocol (RTP) [RFC3550] payload format for this codec.

2. Background

2.1. The G.718 codec

   G.718 is an embedded variable rate speech codec having a layered
   design. The bitstream of the G.718 core codec consists of a core
   layer, denoted as L1, and four enhancement layers, denoted as L2-L5.
   The bit-rates of the G.718 core codec range from 8 kbit/s (core layer
   only) to 32 kbit/s (with all layers up to L5). Furthermore, the G.718
   codec also supports discontinuous transmission (DTX) and comfort
   noise generation (CNG) by sending Silence Descriptor (SID) frames
   during periods of non-active input signal, resulting in a reduced
   bit-rate. The sampling frequency of the core codec is 16 kHz and the
   codec operates on 20 ms frames. The G.718 codec is also capable of
   narrowband operation with audio input and/or output at 8 kHz sampling
   frequency.

   While transmitting/receiving the core layer L1 is enough for
   successful decoding of the audio content, each of the enhancement
   layers Ln (n being 2 to 5, inclusive) provides an improvement to
   reconstructed audio quality. Thus, the core layer ensures the basic
   communication while the enhancement layers can be used to improve the
   perceptual quality.
   Furthermore, enhancement layers are dependent on all the lower
   layers in the sense that successful decoding of layer Ln also
   requires all the layers Lm with m < n.

   [...] n MUST also be discarded.

3.5. G.718 session

   A G.718 session consists of one or several RTP sessions carrying
   encoded G.718 data according to the payload format specified in
   Section 3.1.

3.6. Cross-stream/cross-layer timing synchronization

   In case a G.718 session consists of multiple RTP sessions, the RTP
   packets transmitted on separate RTP sessions need to be synchronized
   in order to enable reconstruction of the frames at the receiving end.
   Since each of the RTP sessions uses its own random initial value for
   the RTP timestamp, there is also a random offset between the RTP
   timestamp values carrying the EDUs belonging to the same encoded
   frame in different RTP sessions.

   The receiver MUST use the traditional RTCP based mechanism to
   synchronize streams by using the RTP and NTP timestamps of the RTCP
   Sender Reports (SR) it receives.

3.7. RTP Header usage

   This section specifies the usage of some fields of the RTP header
   (specified in Section 5 of [RFC3550]) with the G.718 RTP payload
   format. Setting of other RTP header fields is as specified in
   [RFC3550].

   The RTP timestamp corresponds to the sampling instant of the first
   encoded sample of the earliest frame in the payload. The timestamp
   clock frequency is 32 kHz.

   The marker bit (M) of each of the RTP streams of the session SHALL be
   set to value 1 if the payload carries an EDU belonging to the first
   frame after an inactive period, i.e. an EDU from the first frame of a
   talkspurt. For all other packets the marker bit is set to value 0.

4. Payload Format Parameters

   This section defines the parameters that may be used to configure
   optional features in the G.718 RTP transmission.

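   As a rough illustration of the kind of configuration this section
   covers, the sketch below parses the "mode" and "layers" parameters
   (defined in Section 4.1) from a semicolon-separated list of
   parameter=value pairs of the form carried in an SDP "a=fmtp"
   attribute (Section 4.2). The function name parse_g718_fmtp and the
   returned dictionary layout are hypothetical and are not part of
   this specification. The defaults follow the parameter definitions:
   mode 0 when "mode" is absent, and all layers up to L5 when "layers"
   is absent.

```python
def parse_g718_fmtp(params: str) -> dict:
    """Parse G.718 media type parameters (hypothetical helper).

    `params` is the semicolon-separated parameter=value list, e.g. the
    text following the payload type in an SDP "a=fmtp" attribute.
    """
    mode = 0                    # absent "mode": mode with layer L1 in use
    layers = [1, 2, 3, 4, 5]    # absent "layers": all layers up to L5 MAY be used

    for item in params.split(";"):
        item = item.strip()
        if not item:
            continue
        name, _, value = item.partition("=")
        name, value = name.strip(), value.strip()

        if name == "mode":
            # When present, the value MUST be either 0 or 1.
            if value not in ("0", "1"):
                raise ValueError("mode MUST be 0 or 1")
            mode = int(value)
        elif name == "layers":
            # Comma-separated layer numbers, each in the range 1 to 5.
            # int() raises ValueError for a non-numeric entry.
            layers = sorted({int(v) for v in value.split(",")})
            if any(n < 1 or n > 5 for n in layers):
                raise ValueError("layer numbers must be in the range 1..5")
        # Other parameters are ignored in this sketch.

    # The rule that L1 (or L1') must appear in one of the RTP sessions of
    # the G.718 session spans several sessions, so it is not checked here.
    return {"mode": mode, "layers": layers}


print(parse_g718_fmtp("layers=1,2"))   # {'mode': 0, 'layers': [1, 2]}
```

   The sketch validates a single RTP session only. A complete
   implementation would also need to check the combined layer
   configuration across all RTP sessions of the G.718 session.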
   The parameters are defined here as part of the media subtype
   registration for the G.718 codec. Mapping of the parameters into the
   Session Description Protocol (SDP) [RFC4566] is also provided for
   those applications that use SDP. In control protocols that do not
   use MIME or SDP, the media type parameters must be mapped to the
   appropriate format used with that control protocol.

4.1. Media Type Registration

   This registration is done using the template defined in RFC 4288
   [RFC4288] and following RFC 4855 [RFC4855].

   Type name: audio

   Subtype name: G718

   Required parameters: none

   Optional parameters:

      mode:     This parameter MAY be used to indicate whether the
                mode with layer L1 being present or the AMR-WB
                compatible mode (with layer L1' being present) is in
                use. If this parameter is not present or the value of
                this parameter is equal to 0, the mode with layer L1
                being present is in use. Otherwise, the AMR-WB
                compatible mode is in use. When this parameter is
                present, the value MUST be either 0 or 1.

      Author's note: When the upcoming stereo and SWB options are
      present, the semantics of this parameter may change.

      layers:   The numbers of the layers (in range from 1 to 5,
                denoting layers from L1 to L5, respectively)
                transmitted in this session, expressed as a
                comma-separated list of layer numbers. If the
                parameter is present, at least layer L1 or L1' MUST be
                included in the list of layers in one of the RTP
                sessions included in the G.718 session. If the
                parameter is not present, all layers up to layer L5
                MAY be used in the session.

      Author's note: Why not use semantics similar to those of L-ID?

      ptime:    The recommended length of time (in milliseconds)
                represented by the media in a packet. See Section 6
                of [RFC4566].

      maxptime: The maximum length of time (in milliseconds) that can
                be encapsulated in a packet. See Section 6 of
                [RFC4566].

      Author's note: Some further study is needed to see if separate
      parameters for sending and receiving capabilities/preferences are
      needed -- especially for upcoming stereo and SWB options.

      Author's note: The support for upcoming SWB and stereo options
      needs to be taken into account. Basically we can either 1) extend
      the parameter "layers" to cover also this aspect, or 2) define
      separate parameter(s) for these new options when more details on
      the stereo/SWB support are available.

   Encoding considerations:

      This media type is framed and contains binary data; see Section
      4.8 of [RFC4288].

   Security considerations: See Section 5 of RFC xxxx

   Interoperability considerations: none

   Published specification: RFC xxxx

   Applications which use this media type:

      For example Voice over IP, audio and video conferencing, audio
      streaming and voice messaging.

   Additional information: none

   Person & email address to contact for further information:

      Ari Lakaniemi, ari.lakaniemi@nokia.com

   Intended usage: COMMON

   Restrictions on usage:

      This media type depends on RTP framing, and hence is only defined
      for transfer via RTP [RFC3550].

   Author:

      Ari Lakaniemi, ari.lakaniemi@nokia.com

   Change controller:

      IETF Audio/Video Transport working group delegated from the IESG

4.2. Mapping to SDP Parameters

   The information carried in the media type specification has a
   specific mapping to fields of the SDP [RFC4566], which is commonly
   used to describe RTP sessions. When SDP is used to specify sessions
   employing the G.718 codec, the mapping is as follows:

   o The media type ("audio") goes in SDP "m=" as the media name.

   o The media subtype ("G718") goes in SDP "a=rtpmap" as the encoding
     name. The RTP clock rate in "a=rtpmap" MUST be 32000 for G.718.

      Author's note: The current choice for the RTP clock rate is a
      'placeholder'. The clock rate needs to be set according to SWB
      sampling rate, which is still T.B.D. Since the core codec employs
      16000 Hz sampling rate, an integer multiple of 16000 Hz seems to
      be a preferable choice.

   o The parameters "ptime" and "maxptime" go in the SDP "a=ptime" and
     "a=maxptime" attributes, respectively.

   o Any remaining parameters go in the SDP "a=fmtp" attribute by
     copying them directly from the media type string as a
     semicolon-separated list of parameter=value pairs.

4.3. Offer/answer considerations

   The following considerations apply when using the SDP offer/answer
   [RFC3264] mechanism to negotiate the G.718 transport. The parameter
   "layers" MAY be used to indicate, for each RTP session belonging to
   the current G.718 session, the layer configuration that the
   end-point making the offer is ready to transmit and wishes to
   receive.

   o In case the G.718 session consists of a single RTP session, it is
     RECOMMENDED not to impose any layer restrictions for the session
     but to use the rate control functionality to set possible
     restrictions on usage of the higher or highest layers. If the
     offer includes a layer configuration parameter, the answer MAY use
     a different configuration, but the highest layer in the answer
     MUST NOT be higher than the highest layer of the offered
     configuration.

      Author's note: Support for answer modifying the layer
      configuration is FFS.

   o In case the G.718 session consists of multiple RTP sessions, the
     answer MUST use the layer configurations provided in the offer for
     the sessions it accepts.

4.4. Declarative usage of SDP

   In declarative usage, such as SDP in RTSP [RFC2326] or SAP [RFC2974],
   the parameter "layers" SHALL be interpreted to provide a set of
   layers that the sender may use in the session.

4.5. SDP examples

   Some example SDP session descriptions utilizing G.718 encodings are
   provided below.

   The first example illustrates the simple case where a G.718 session
   employing a single RTP session and the AVPF profile is offered, and
   the answer accepts the offer without any changes.

   Offer:

      m=audio 49120 RTP/AVPF 97
      a=rtpmap:97 G718/32000/1

   Answer:

      m=audio 49120 RTP/AVPF 97
      a=rtpmap:97 G718/32000/1

   The second example shows a slightly more complex case where a G.718
   session using a single RTP session and the AVPF profile is offered
   with a restriction to send/receive only layers L1 and L2. The
   answer indicates that the other end-point is happy to receive (and
   send) layers up to L5.

   Offer:

      m=audio 49120 RTP/AVPF 97
      a=rtpmap:97 G718/32000/1
      a=fmtp:97 layers=1,2

   Answer:

      m=audio 49120 RTP/AVPF 97
      a=rtpmap:97 G718/32000/1
      a=fmtp:97 layers=1,2,3,4,5

   The third example shows a G.718 session using multiple RTP sessions
   with the AVPF profile. The answerer wishes to use only layers up to
   L3.

   Offer:

      m=audio 49120 RTP/AVPF 97
      a=rtpmap:97 G718/32000/1
      a=fmtp:97 layers=1,2
      a=mid:1

      m=audio 49122 RTP/AVPF 98
      a=rtpmap:98 G718/32000/1
      a=fmtp:98 layers=3
      a=mid:2
      a=depend:lay 1

      m=audio 49124 RTP/AVPF 99
      a=rtpmap:99 G718/32000/1
      a=fmtp:99 layers=4,5
      a=mid:3
      a=depend:lay 1 2

   Answer:

      m=audio 49120 RTP/AVPF 97
      a=rtpmap:97 G718/32000/1
      a=fmtp:97 layers=1,2
      a=mid:1

      m=audio 49120 RTP/AVPF 98
      a=rtpmap:98 G718/32000/1
      a=fmtp:98 layers=3
      a=mid:2
      a=depend:lay 1

   Note that the dependency signaling according to [smd-sdp] is used in
   the third example above to indicate the relationship between the
   layers distributed into separate RTP sessions.

5. Security Considerations

   RTP packets using the payload format defined in this specification
   are subject to the security considerations discussed in the RTP
   specification [RFC3550], and in any appropriate RTP profile (for
   example [RFC3551] or [RFC4585]). This implies that confidentiality
   of the media streams is achieved by encryption; for example, through
   the application of SRTP [RFC3711]. Because the data compression used
   with this payload format is applied end-to-end, any encryption needs
   to be performed after compression.

   A potential denial-of-service threat exists for data encodings using
   compression techniques that have non-uniform receiver-end
   computational load. The attacker can inject pathological datagrams
   into the stream that will increase the processing load of the decoder
   and may cause the receiver to be overloaded. For example, inserting
   additional EDUs representing the higher enhancement layers on top of
   the ones actually transmitted may increase the decoder load. However,
   the G.718 codec is not particularly vulnerable to such an attack,
   since the majority of the computational load in a G.718 session is
   associated with the encoder. Another possible form of attack is the
   forging of codec bit-rate control messages, which may result in the
   encoder employing a higher number of enhancement layers than
   originally intended and thereby requiring a larger amount of
   computational resources. Therefore, the usage of data origin
   authentication and data integrity protection of at least the RTP
   packet is RECOMMENDED; for example, with SRTP [RFC3711].

   Note that the appropriate mechanism to ensure confidentiality and
   integrity of RTP packets and their payloads is very dependent on the
   application and on the transport and signaling protocols employed.
   Thus, although SRTP is given as an example above, other possible
   choices exist.

   Note that end-to-end security with either authentication, integrity
   or confidentiality protection will prevent a network element not
   within the security context from performing media-aware operations
   other than discarding complete packets. To allow any (media-aware)
   intermediate network element to perform its operations, it is
   required to be a trusted entity which is included in the security
   context establishment.

6. Congestion control

   As a scalable codec, G.718 implicitly provides a means for
   congestion control by making it possible to 'thin' the bitstream.
   The RTP payload format according to this specification provides
   several different means for reducing the G.718 session bandwidth.
   The most appropriate mechanism (in terms of impact to the user
   experience) depends on the employed payload structure and also on
   the employed session configuration (single RTP session or multiple
   RTP sessions). The following means (in no particular order) can be
   used to assist congestion control procedures -- either by the sender
   or by the intermediate node.

   o The transport blocks carrying the EDUs representing the highest
     layers within the payload may be dropped.

   o The payloads carrying the EDUs representing the highest layers in
     a G.718 session may be dropped.

   o Transport blocks or payloads carrying EDUs belonging to redundant
     frames included in the payload may be dropped.

7. IANA Considerations

   IANA is kindly requested to register a media type for the G.718 codec
   for RTP transport, as specified in Section 4.1 of this document.

APPENDIX A: Payload examples

   The G.718 payload structure enables flexible transport either by
   carrying all layers in the same payload or separating the layers into
   separate payloads. The following subsections illustrate different
   possibilities for transport by simple examples. Note that the
   examples do not show the full payload structure, to keep the
   illustration simple.

A.1. Simple payload examples

A.1.1. All the layers in the same payload

   The illustration below shows layers L1-L3 from two encoded frames
   encapsulated into separate payloads, each using a single transport
   block.

   +-------+--------+-----+------+------+------+
   | RTP1  | L-ID=3 |NF=0 |F1-L1 |F1-L2 |F1-L3 |
   +-------+--------+-----+------+------+------+

   +-------+--------+-----+------+------+------+
   | RTP2  | L-ID=3 |NF=0 |F2-L1 |F2-L2 |F2-L3 |
   +-------+--------+-----+------+------+------+

   In case the same layers from two input frames are encapsulated into
   one payload using a single transport block, the structure is as
   shown below.

   +-------+--------+-----+------+------+------+------+------+------+
   | RTP1  | L-ID=3 |NF=1 |F1-L1 |F2-L1 |F1-L2 |F2-L2 |F1-L3 |F2-L3 |
   +-------+--------+-----+------+------+------+------+------+------+

   The third example illustrates the case where the layers L1-L3 from
   two input frames are encapsulated into one payload using two separate
   transport blocks, the first one carrying L1 and the other one
   containing L2 and L3.

   +-------+--------+-----+------+------+
   | RTP1  | L-ID=1 |NF=1 |F1-L1 |F2-L1 |
   +-------+--------+-----+------+------+------+------+
           | L-ID=7 |NF=1 |F1-L2 |F2-L2 |F1-L3 |F2-L3 |
           +--------+-----+------+------+------+------+

A.1.2. Layers in separate RTP streams

   In this case the data for each layer is transmitted in its own
   payload.

   In the first example each transport block including a single EDU is
   carried in its own RTP payload.

   +-------+--------+-----+-----+  +-------+--------+-----+-----+
   | RTP1a | L-ID=1 |NF=0 |F1-L1|  | RTP1b | L-ID=6 |NF=0 |F1-L2|
   +-------+--------+-----+-----+  +-------+--------+-----+-----+

   +-------+--------+-----+-----+  +-------+--------+-----+-----+
   | RTP1c |L-ID=10 |NF=0 |F1-L3|  | RTP2a | L-ID=1 |NF=0 |F2-L1|
   +-------+--------+-----+-----+  +-------+--------+-----+-----+

   +-------+--------+-----+-----+  +-------+--------+-----+-----+
   | RTP2b | L-ID=6 |NF=0 |F2-L2|  | RTP2c |L-ID=10 |NF=0 |F2-L3|
   +-------+--------+-----+-----+  +-------+--------+-----+-----+

   If the payloads carry data from two consecutive input frames, the
   same encoded data as in the previous example is arranged as follows.

   +-------+--------+-----+-----+-----+
   | RTP1a | L-ID=1 |NF=1 |F1-L1|F2-L1|
   +-------+--------+-----+-----+-----+

   +-------+--------+-----+-----+-----+
   | RTP1b | L-ID=6 |NF=1 |F1-L2|F2-L2|
   +-------+--------+-----+-----+-----+

   +-------+--------+-----+-----+-----+
   | RTP1c |L-ID=10 |NF=1 |F1-L3|F2-L3|
   +-------+--------+-----+-----+-----+

A.2. Advanced examples

A.2.1. Different update rate for subset of layers

   This example employs different update rates (i.e. a different number
   of frames per packet) for selected subsets of layers. All core codec
   layers L1-L5 are shown.

   +-------+--------+-----+-----+-----+-----+-----+
   | RTP1  | L-ID=1 |NF=3 |F1-L1|F2-L1|F3-L1|F4-L1|
   +-------+--------+-----+-----+-----+-----+-----+

   +-------+--------+-----+-----+-----+-----+-----+
   | RTP2a | L-ID=7 |NF=1 |F1-L2|F2-L2|F1-L3|F2-L3|
   +-------+--------+-----+-----+-----+-----+-----+

   +-------+--------+-----+-----+-----+
   | RTP3a |L-ID=14 |NF=0 |F1-L4|F1-L5|
   +-------+--------+-----+-----+-----+

   +-------+--------+-----+-----+-----+
   | RTP3b |L-ID=14 |NF=0 |F2-L4|F2-L5|
   +-------+--------+-----+-----+-----+

   +-------+--------+-----+-----+-----+-----+-----+
   | RTP2b | L-ID=7 |NF=1 |F3-L2|F4-L2|F3-L3|F4-L3|
   +-------+--------+-----+-----+-----+-----+-----+

   +-------+--------+-----+-----+-----+
   | RTP3c |L-ID=14 |NF=0 |F3-L4|F3-L5|
   +-------+--------+-----+-----+-----+

   +-------+--------+-----+-----+-----+
   | RTP3d |L-ID=14 |NF=0 |F4-L4|F4-L5|
   +-------+--------+-----+-----+-----+

A.2.2. Redundant frames with limited set of layers

   An example transmitting layers L1-L3 as primary data and L1 (of the
   previous frame) as redundant data is shown below. Each payload
   carries one primary (i.e. new) frame in one transport block and one
   redundant frame, which in this example is the frame preceding the
   primary frame, in another transport block.

   +-------+--------+-----+-----+--------+-----+-----+-----+-----+
   | RTP1  | L-ID=1 |NF=0 |F0-L1| L-ID=3 |NF=0 |F1-L1|F1-L2|F1-L3|
   +-------+--------+-----+-----+--------+-----+-----+-----+-----+

   +-------+--------+-----+-----+--------+-----+-----+-----+-----+
   | RTP2  | L-ID=1 |NF=0 |F1-L1| L-ID=3 |NF=0 |F2-L1|F2-L2|F2-L3|
   +-------+--------+-----+-----+--------+-----+-----+-----+-----+

   +-------+--------+-----+-----+--------+-----+-----+-----+-----+
   | RTP3  | L-ID=1 |NF=0 |F2-L1| L-ID=3 |NF=0 |F3-L1|F3-L2|F3-L3|
   +-------+--------+-----+-----+--------+-----+-----+-----+-----+

   Alternatively, the payload carrying also redundant data for a subset
   of layers can be arranged differently, as shown in the example below.

   +-------+--------+-----+-----+-----+-----+--------+-----+-----+
   | RTP1  | L-ID=3 |NF=0 |F0-L1|F0-L2|F0-L3| L-ID=1 |NF=0 |F1-L1|
   +-------+--------+-----+-----+-----+-----+--------+-----+-----+

   +-------+--------+-----+-----+-----+-----+--------+-----+-----+
   | RTP2  | L-ID=3 |NF=0 |F1-L1|F1-L2|F1-L3| L-ID=1 |NF=0 |F2-L1|
   +-------+--------+-----+-----+-----+-----+--------+-----+-----+

   +-------+--------+-----+-----+-----+-----+--------+-----+-----+
   | RTP3  | L-ID=3 |NF=0 |F2-L1|F2-L2|F2-L3| L-ID=1 |NF=0 |F3-L1|
   +-------+--------+-----+-----+-----+-----+--------+-----+-----+

   Now the first transport block carries the primary data and the second
   transport block carries the redundant data, which in this case covers
   the frame following the primary frame. The benefit of this approach
   is that the redundant data is included in the last (secondary)
   transport block of the payload, which might be beneficial for a
   possible payload scaling operation within the network.

8. References

8.1. Normative References

   [AMR-WB]  3GPP TS 26.171, "Adaptive Multi-Rate Wideband (AMR-WB)
             speech codec; General description (Release 7)", v7.0.0,
             September 2006.

   [G.718]   ITU-T Recommendation G.718, "Frame Error Robust Narrowband
             and Wideband Embedded Variable Bit-Rate Coding of Speech
             and Audio from 8-32 Kbit/s", (consented) May 2008.

   [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate
             Requirement Levels", BCP 14, RFC 2119, March 1997.

   [RFC3264] Rosenberg, J., Schulzrinne, H., "An Offer/Answer Model
             with Session Description Protocol (SDP)", RFC 3264, June
             2002.

   [RFC3550] Schulzrinne, H., Casner, S., Frederick, R. and Jacobson,
             V., "RTP: A Transport Protocol for Real-Time
             Applications", STD 64, RFC 3550, July 2003.

   [RFC3551] Schulzrinne, H., Casner, S., "RTP Profile for Audio and
             Video Conferences with Minimal Control", STD 65, RFC 3551,
             July 2003.

   [RFC3711] Baugher, M., McGrew, D., Naslund, M., Carrara, E.,
             Norrman, K., "The Secure Real-Time Transport Protocol
             (SRTP)", RFC 3711, March 2004.

   [RFC4288] Freed, N., Klensin, J., "Media Type Specifications and
             Registration Procedures", BCP 13, RFC 4288, December 2005.

   [RFC4566] Handley, M., Jacobson, V. and Perkins, C., "SDP: Session
             Description Protocol", RFC 4566, July 2006.

   [RFC4585] Ott, J., Wenger, S., Sato, N., Burmeister, C., Rey, J.,
             "Extended RTP Profile for Real-Time Transport Control
             Protocol (RTCP)-Based Feedback (RTP/AVPF)", RFC 4585, July
             2006.

   [RFC4855] Casner, S., "Media Type Registration of RTP Payload
             Formats", RFC 4855, February 2007.

   [RFC4867] Sjoberg, J., Westerlund, M., Lakaniemi, A., Xie, Q., "RTP
             Payload Format and File Storage Format for the Adaptive
             Multi-Rate (AMR) and Adaptive Multi-Rate Wideband (AMR-WB)
             Audio Codecs", RFC 4867, April 2007.

1159 [RFC5104] Wenger, S., Chandra, U., Westerlund, M., Burman, B., "Codec 1160 Control Messages in the RTP Audio-Visual Profile with 1161 Feedback (AVPF)", RFC 5104, Feburary 2008. 1163 [smd-sdp] Schierl, T., Wenger, S., "Signaling media decoding 1164 dependency in Session Description Protocol (SDP)", draft- 1165 schierl-mmusic-layered-codec-04 (work in progress), June 1166 2007. 1168 8.2. Informative References 1170 [McCanne] McCanne, S., Jacobson, V., and Vetterli, M., "Receiver- 1171 driven layered multicast", in Proc. of ACM SIGCOMM'96, 1172 pages 117--130, Stanford, CA, August 1996. 1174 [RFC2326] Schulzrinne, H., Rao, A., Lanphier, R., "Real Time 1175 Streaming Protocol (RTSP)", RFC 2326, April 1998. 1177 [RFC2974] Handley, M., Perkins, C., Whelan, E., "Session Announcement 1178 Protocol", RFC 2974, October 2000. 1180 [RFC3828] Larzon, L-A., Degermark, M., Pink, S., Jonsson, L-E., 1181 Fairhurst, G., "The Lightweight User Datagram Protocol 1182 (UDP-Lite)", RFC 3828, July 2004. 1184 [RFC4340] Kohler, E., Handley, M., Floyd, S., "Data Congestion 1185 Control Protocol (DCCP)", RFC 4340, March 2006. 1187 [RFC5117] Westerlund, M., Wenger, S., "RTP Topologies", RFC 5117, 1188 January 2008. 1190 Author's Addresses 1192 Ari Lakaniemi 1193 Nokia 1194 P.O.Box 407 1195 FIN-00045 Nokia Group, FINLAND 1197 Phone: +358-71-8008000 1198 Email: ari.lakaniemi@nokia.com 1200 Ye-Kui Wang 1201 Huawei Technologies 1202 400 Somerset Corp Blvd, Suite 602 1203 Bridgewater, NJ 08807, USA 1205 Phone: +1-908-541-3518 1206 EMail: yekuiwang@huawei.com 1208 Acknowledgment 1210 Funding for the RFC Editor function is currently provided by the 1211 Internet Society. 1213 9. Open Issues 1215 1) Support of super-wideband (SWB) audio and stereophonic encoding 1216 extensions to ITU-T G.718 currently being worked on by ITU-T is to 1217 be specified after ITU-T completes the work in that regards. 1219 a. 
Some further study is needed to see if separate parameters 1220 for sending and receiving capabilities/preferences are needed 1221 -- especially for upcoming stereo and SWB options. 1223 b. The support for upcoming SWB and stereo options needs to be 1224 taken into account. Basically we can either 1) extend the 1225 parameter "layers" to also cover this aspect, or 2) define 1226 separate parameter(s) for these new options when more details 1227 on the stereo/SWB support are available. 1229 2) For streaming or other applications that allow for relatively long 1230 end-to-end delay, it would sometimes be beneficial to aggregate 1231 more than 4 frames in one Transport Block (TB). Should the length 1232 of the NF field be larger? 1234 3) On layer structure and configuration signalling. Currently, a 1235 unique layer ID is assigned to each possible layer combination. 1237 See the editing notes below Table 3 for other possible approaches. 1238 One of the alternative ways may be chosen in the final draft. 1240 4) Currently, it is mandated that lower layer EDUs of later frames go 1241 before higher layer EDUs of earlier frames in a transport block. 1242 This ordering is friendlier to adaptation (dropping of higher layers). 1243 However, if all layers are received, then the depacketizer needs 1244 to reorder the EDUs to their decoding order before feeding them to 1245 the decoder. Therefore, the other way around (i.e. lower layer 1246 EDUs of later frames go after higher layer EDUs of earlier frames, 1247 or EDUs in transport blocks are placed in decoding order) is 1248 friendlier to the depacketizer. Another benefit of the latter is 1249 that it does not introduce any end-to-end delay. Which ordering is to be 1250 specified (or whether both are allowed if needed) is FFS. 1252 5) MANEs dropping RTP packets are RTP translators. But are those 1253 MANEs dropping a subset of the transport blocks in one packet also 1254 RTP translators? 
1256 6) RTCP-based cross-session synchronization is not possible until 1257 the first RTCP SRs are received in all sessions. This implies that 1258 decoding only a subset of layers may be possible until RTCP SRs in 1259 all sessions have been received. This may impose higher end-to- 1260 end delay or higher bandwidth for RTCP data, and the approach may 1261 not work perfectly for some multicast topologies. There is a study 1262 ongoing by some AVT members. Once there is an acceptable solution, 1263 the draft documenting that solution may be referenced in this 1264 draft. 1266 7) It might be better to change the semantics of the media type 1267 parameter 'layers' to be similar to those of L-ID. 1269 8) Offer/answer with the answer being capable of modifying the layer 1270 configuration is FFS. 1272 9) Some references need to be updated in the final draft. 1274 10. Change Log 1276 From draft-ietf-avt-rtp-g718-00 to draft-ietf-avt-rtp-g718-01 1278 - Updated the boilerplate. 1280 - Changed Ye-Kui Wang's affiliation and address. 1282 From draft-ietf-avt-rtp-g718-01 to draft-ietf-avt-rtp-g718-02 1283 - Updated the boilerplate (added the last sentence in Copyright 1284 Notice).