idnits 2.17.1 draft-ietf-avt-rtp-g718-01.txt: Checking boilerplate required by RFC 5378 and the IETF Trust (see https://trustee.ietf.org/license-info): ---------------------------------------------------------------------------- ** The document seems to lack a License Notice according IETF Trust Provisions of 28 Dec 2009, Section 6.b.i or Provisions of 12 Sep 2009 Section 6.b -- however, there's a paragraph with a matching beginning. Boilerplate error? (You're using the IETF Trust Provisions' Section 6.b License Notice from 12 Feb 2009 rather than one of the newer Notices. See https://trustee.ietf.org/license-info/.) Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt: ---------------------------------------------------------------------------- No issues found here. Checking nits according to https://www.ietf.org/id-info/checklist : ---------------------------------------------------------------------------- ** There are 8 instances of too long lines in the document, the longest one being 1 character in excess of 72. Miscellaneous warnings: ---------------------------------------------------------------------------- == The copyright year in the IETF Trust and authors Copyright Line does not match the current year -- The document seems to contain a disclaimer for pre-RFC5378 work, and may have content which was first submitted before 10 November 2008. The disclaimer is necessary when there are original authors that you have been unable to contact, or if some do not wish to grant the BCP78 rights to the IETF Trust. If you are able to get all authors (current and original) to grant those rights, you can and should remove the disclaimer; otherwise, the disclaimer is needed and you can ignore this comment. (See the Legal Provisions document at https://trustee.ietf.org/license-info for more information.) -- The document date (April 28, 2009) is 5475 days in the past. Is this intentional? 
Checking references for intended status: Proposed Standard ---------------------------------------------------------------------------- (See RFCs 3967 and 4897 for information about using normative references to lower-maturity documents in RFCs) == Missing Reference: 'RFC3550' is mentioned on line 895, but not defined -- Looks like a reference, but probably isn't: '4340' on line 600 == Unused Reference: 'RFC5104' is defined on line 1164, but no explicit reference was found in the text == Unused Reference: 'RFC4340' is defined on line 1189, but no explicit reference was found in the text -- Possible downref: Non-RFC (?) normative reference: ref. 'AMR-WB' -- Possible downref: Non-RFC (?) normative reference: ref. 'G.718' ** Obsolete normative reference: RFC 4288 (Obsoleted by RFC 6838) ** Obsolete normative reference: RFC 4566 (Obsoleted by RFC 8866) -- Obsolete informational reference (is this intentional?): RFC 2326 (Obsoleted by RFC 7826) -- Obsolete informational reference (is this intentional?): RFC 5117 (Obsoleted by RFC 7667) Summary: 4 errors (**), 0 flaws (~~), 4 warnings (==), 7 comments (--). Run idnits with the --verbose option for more detailed information about the items above. -------------------------------------------------------------------------------- 1 Audio/Video Transport WG Ari Lakaniemi 2 Internet Draft Nokia 3 Intended status: Standards track Ye-Kui Wang 4 Expires: October 2009 Huawei Technologies 5 April 28, 2009 7 RTP payload format for G.718 speech/audio 8 draft-ietf-avt-rtp-g718-01.txt 10 Status of this Memo 12 This Internet-Draft is submitted to IETF in full conformance with the 13 provisions of BCP 78 and BCP 79. This document may contain material 14 from IETF Documents or IETF Contributions published or made publicly 15 available before November 10, 2008. 
The person(s) controlling the 16 copyright in some of this material may not have granted the IETF 17 Trust the right to allow modifications of such material outside the 18 IETF Standards Process. Without obtaining an adequate license from 19 the person(s) controlling the copyright in such materials, this 20 document may not be modified outside the IETF Standards Process, and 21 derivative works of it may not be created outside the IETF Standards 22 Process, except to format it for publication as an RFC or to 23 translate it into languages other than English. 25 Internet-Drafts are working documents of the Internet Engineering 26 Task Force (IETF), its areas, and its working groups. Note that 27 other groups may also distribute working documents as Internet-Drafts. 29 Internet-Drafts are draft documents valid for a maximum of six months 30 and may be updated, replaced, or obsoleted by other documents at any 31 time. It is inappropriate to use Internet-Drafts as reference 32 material or to cite them other than as "work in progress." 34 The list of current Internet-Drafts can be accessed at 35 http://www.ietf.org/ietf/1id-abstracts.txt. 37 The list of Internet-Draft Shadow Directories can be accessed at 38 http://www.ietf.org/shadow.html. 40 This Internet-Draft will expire on October 28, 2009. 42 Copyright Notice 44 Copyright (c) 2009 IETF Trust and the persons identified as the 45 document authors. All rights reserved. 47 This document is subject to BCP 78 and the IETF Trust's Legal 48 Provisions Relating to IETF Documents in effect on the date of 49 publication of this document (http://trustee.ietf.org/license-info). 50 Please review these documents carefully, as they describe your rights 51 and restrictions with respect to this document. 53 Abstract 55 This document specifies the Real-Time Transport Protocol (RTP) 56 payload format for the Embedded Variable Bit-Rate (EV-VBR) 57 speech/audio codec, specified in ITU-T G.718. 
A media type 58 registration for this RTP payload format is also included.
60 Conventions used in this document
62 The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", 63 "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this 64 document are to be interpreted as described in RFC 2119 [RFC2119].
66 Table of Contents
68 1. Introduction
69 2. Background
70    2.1. The G.718 codec
71    2.2. Benefits of layered design
72    2.3. Transmitting layered data
73    2.4. Scaling scenarios & rate control
74 3. G.718 RTP payload format
75    3.1. Payload Structure
76       3.1.1. Payload Header
77       3.1.2. G.718 transport blocks
78    3.2. Handling the Encoded data
79    3.3. G.718 scaling
80    3.4. CRC verification
81    3.5. G.718 session
82    3.6. Cross-stream/cross-layer timing synchronization
83    3.7. RTP Header usage
84 4. Payload Format Parameters
85    4.1. Media Type Registration
86    4.2. Mapping to SDP Parameters
87    4.3. Offer/answer considerations
88    4.4. Declarative usage of SDP
89    4.5. SDP examples
90 5. Security Considerations
91 6. Congestion control
92 7. IANA Considerations
93 APPENDIX A: Payload examples
94    A.1. Simple payload examples
95       A.1.1. All the layers in the same payload
96       A.1.2. Layers in separate RTP streams
97    A.2. Advanced examples
98       A.2.1. Different update rate for subset of layers
99       A.2.2. Redundant frames with limited set of layers
100 8. References
101    8.1. Normative References
102    8.2. Informative References
103 Author's Addresses
104 Acknowledgment
105 9. Open Issues
106 10. Changes Log
108 1. Introduction
110 The International Telecommunication Union (ITU-T) Recommendation 111 G.718 [G.718] specifies the Embedded Variable Bit Rate (EV-VBR) 112 speech/audio codec. This document specifies the Real-time Transport 113 Protocol (RTP) [RFC3550] payload format for this codec.
115 2. Background
117 2.1. The G.718 codec
119 G.718 is an embedded variable rate speech codec having a layered 120 design. The bitstream of the G.718 core codec consists of a core 121 layer, denoted as L1, and four enhancement layers, denoted as L2-L5. 122 The bit-rates of the G.718 core codec range from 8 kbit/s (core layer 123 only) to 32 kbit/s (with all layers up to L5). Furthermore, the G.718 124 codec supports also discontinuous transmission (DTX) and comfort 125 noise generation (CNG) by sending Silence Descriptor (SID) frames 126 during periods of non-active input signal, resulting in a reduced 127 bit-rate.
The sampling frequency of the core codec is 16 kHz and the 128 codec operates on 20 ms frames. The G.718 codec is also capable of 129 narrowband operation with audio input and/or output at 8 kHz sampling 130 frequency.
132 While transmitting/receiving the core layer L1 is enough for 133 successful decoding of the audio content, each of the enhancement 134 layers Ln (n being 2 to 5, inclusive) provides an improvement to 135 reconstructed audio quality. Thus, the core layer ensures the basic 136 communication while the enhancement layers can be used to improve the 137 perceptual quality. Furthermore, enhancement layers are dependent on 138 all the lower layers in a sense that successful decoding of layer Ln 139 requires also all the layers Lm with mn MUST 620 also be discarded.
622 3.5. G.718 session
624 A G.718 session consists of one or several RTP sessions carrying 625 encoded G.718 data according to the payload format specified in section 626 3.1.
628 3.6. Cross-stream/cross-layer timing synchronization
630 In case a G.718 session consists of multiple RTP sessions, the RTP 631 packets transmitted on separate RTP sessions need to be synchronized 632 in order to enable reconstruction of the frames at the receiving end. 633 Since each of the RTP sessions uses its own random initial value for 634 the RTP timestamp, there is also a random offset between the RTP 635 timestamp values of the packets carrying the EDUs belonging to the same encoded 636 frame in different RTP sessions.
638 The receiver MUST use the traditional RTCP based mechanism to 639 synchronize streams by using the RTP and NTP timestamps of the RTCP 640 Sender Reports (SR) it receives.
642 3.7. RTP Header usage
644 This section specifies the usage of some fields of the RTP header 645 (specified in section 5 of [RFC3550]) with the G.718 RTP payload 646 format. Setting of other RTP header fields is as specified in 647 [RFC3550].
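As an illustration of the RTCP based synchronization required in section 3.6, the following minimal Python sketch maps RTP timestamps from different RTP sessions onto a common wallclock by using the (NTP, RTP) timestamp pair of the latest Sender Report of each session. The names SenderReport and rtp_to_wallclock are hypothetical and not part of this specification. The sketch assumes that the NTP timestamp has already been converted to seconds by the caller, and it uses the 32 kHz timestamp clock given in section 3.7 as a default.

```python
from dataclasses import dataclass

# RTP timestamp clock rate from section 3.7 (a placeholder value per the draft).
CLOCK_RATE_HZ = 32000


@dataclass(frozen=True)
class SenderReport:
    """Hypothetical holder for the timestamp pair of one RTCP Sender Report."""
    ntp_seconds: float   # NTP timestamp, assumed already converted to seconds
    rtp_timestamp: int   # 32-bit RTP timestamp sampled at the same instant


def rtp_to_wallclock(rtp_ts: int, sr: SenderReport,
                     clock_rate: int = CLOCK_RATE_HZ) -> float:
    """Map an RTP timestamp to sender wallclock time using one Sender Report."""
    # Signed 32-bit difference, so that timestamp wrap-around is tolerated.
    diff = (rtp_ts - sr.rtp_timestamp) & 0xFFFFFFFF
    if diff >= 0x80000000:
        diff -= 0x100000000
    return sr.ntp_seconds + diff / clock_rate
```

EDUs received on different RTP sessions whose RTP timestamps map to the same wallclock value belong to the same encoded frame, regardless of the random timestamp offset of each session.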
649 The RTP timestamp corresponds to the sampling instant of the first 650 encoded sample of the earliest frame in the payload. The timestamp 651 clock frequency is 32 kHz. 653 The marker bit (M) of each of the RTP streams of the session SHALL be 654 set to value 1 if the payload carries an EDU belonging to the first 655 frame after an inactive period, i.e. an EDU from the first frame of a 656 talkspurt. For all other packets the marker bit is set to value 0. 658 4. Payload Format Parameters 660 This section defines the parameters that may be used to configure 661 optional features in the G.718 RTP transmission. 663 The parameters are defined here as part of the media subtype 664 registration for the G.718 codec. Mapping of the parameters into the 665 Session Description Protocol (SDP) [RFC4566] is also provided for 666 those applications that use SDP. In control protocols that do not 667 use MIME or SDP, the media type parameters must be mapped to the 668 appropriate format used with that control protocol. 670 4.1. Media Type Registration 672 This registration is done using the template defined in RFC 4288 673 [RFC4288] and following RFC 4855 [RFC4855]. 675 Type name: audio 677 Subtype name: G718 679 Required parameters: none 681 Optional parameters: 683 mode: This parameter MAY be used to indicate whether the 684 mode with layer L1 being present or the AMR-WB 685 compatible mode (with layer L1' being present) is in 686 use. If this parameter is not present or the value of 687 this parameter is equal to 0, the mode with layer L1 688 being present is in use. Otherwise, the AMR-WB 689 compatible mode is in use. When this parameter is 690 present, the value MUST be either 0 or 1. 692 Author's note: When the upcoming stereo and SWB options are 693 present, the semantics of this parameter may change. 
695 layers: The numbers of the layers (in range from 1 to 5, 696 denoting layers from L1 to L5, respectively) 697 transmitted in this session, expressed as comma- 698 separated list of layer numbers. If the parameter is 699 present, at least layer L1 or L1' MUST be included in 700 the list of layers in one of the RTP sessions included 701 in the G.718 session. If the parameter is not present, 702 all layers up to layer L5 MAY be used in the session. 704 Author's note: Why not use semantics similarly as L-ID? 706 ptime: The recommended length of time (in milliseconds) 707 represented by the media in a packet. See Section 6 708 of [RFC4566]. 710 maxptime: The maximum length of time (in milliseconds) that can 711 be encapsulated in a packet. See Section 6 of 712 [RFC4566] 714 Author's note: Some further study is needed to see if separate 715 parameters for sending and receiving capabilities/preferences are 716 needed -- especially for upcoming stereo and SWB options. 718 Author's note: The support for upcoming SWB and stereo options 719 needs to be taken into account. Basically we can either 1) extend 720 the parameter "layers" to cover also this aspect, or 2) define 721 separate parameter(s) for these new options when more details on 722 the stereo/SWB support are available. 724 Encoding considerations: 726 This media type is framed and contains binary data; see Section 4.8 727 of [RFC4288]. 729 Security considerations: See Section 6 of RFC xxxx 730 Interoperability considerations: none 732 Published specification: RFC xxxx 734 Applications which use this media type: 736 For example Voice over IP, audio and video conferencing, audio 737 streaming and voice messaging. 
739 Additional information: none
741 Person & email address to contact for further information:
743 Ari Lakaniemi, ari.lakaniemi@nokia.com
745 Intended usage: COMMON
747 Restrictions on usage:
749 This media type depends on RTP framing, and hence is only defined 750 for transfer via RTP [RFC3550].
752 Author:
754 Ari Lakaniemi, ari.lakaniemi@nokia.com
756 Change controller:
758 IETF Audio/Video Transport working group delegated from the IESG
760 4.2. Mapping to SDP Parameters
762 The information carried in the media type specification has a 763 specific mapping to fields of the SDP [RFC4566], which is commonly 764 used to describe RTP sessions. When SDP is used to specify sessions 765 employing the G.718 codec, the mapping is as follows:
767 o The media type ("audio") goes in SDP "m=" as the media name.
769 o The media subtype ("G718") goes in SDP "a=rtpmap" as the encoding 770 name. The RTP clock rate in "a=rtpmap" MUST be 32000 for G.718.
772 Author's note: The current choice for the RTP clock rate is a 773 'placeholder'. The clock rate needs to be set according to SWB 774 sampling rate, which is still T.B.D. Since the core codec employs 775 16000 Hz sampling rate, an integer multiple of 16000 Hz seems to 776 be a preferable choice.
778 o The parameters "ptime" and "maxptime" go in the SDP "a=ptime" and 779 "a=maxptime" attributes, respectively.
781 o Any remaining parameters go in the SDP "a=fmtp" attribute by 782 copying them directly from the media type string as a 783 semicolon-separated list of parameter=value pairs.
785 4.3. Offer/answer considerations
787 The following considerations apply when using the SDP offer/answer 788 [RFC3264] mechanism to negotiate the G.718 transport. The parameter 789 "layers" MAY be used to indicate, for each RTP session belonging to 790 the current G.718 session, the layer configuration that the end-point making 791 the offer is ready to transmit and wishes to receive.
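The mapping rules of section 4.2 can be sketched in Python as follows. The helper name g718_sdp_lines is hypothetical, and the sketch rests on several assumptions: the "; " separator is one possible rendering of the semicolon separated list, the channel count "/1" follows the SDP examples of section 4.5, and the AVPF profile is only a default.

```python
from typing import Dict, List


def g718_sdp_lines(port: int, payload_type: int, params: Dict[str, str],
                   profile: str = "RTP/AVPF",
                   clock_rate: int = 32000) -> List[str]:
    """Build the SDP lines of one G.718 media description (illustrative only)."""
    lines = [
        # Media type "audio" goes in "m=" as the media name.
        f"m=audio {port} {profile} {payload_type}",
        # Media subtype "G718" goes in "a=rtpmap" as the encoding name.
        f"a=rtpmap:{payload_type} G718/{clock_rate}/1",
    ]
    # All parameters other than ptime/maxptime are copied into "a=fmtp".
    fmtp = [f"{name}={value}" for name, value in params.items()
            if name not in ("ptime", "maxptime")]
    if fmtp:
        lines.append(f"a=fmtp:{payload_type} " + "; ".join(fmtp))
    # "ptime" and "maxptime" go in their own SDP attributes.
    for name in ("ptime", "maxptime"):
        if name in params:
            lines.append(f"a={name}:{params[name]}")
    return lines
```

For instance, calling g718_sdp_lines(49120, 97, {"layers": "1,2"}) yields an "m=" line, an "a=rtpmap" line and an "a=fmtp:97 layers=1,2" line for a session restricted to layers L1 and L2.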
793 o In case the G.718 session consists of a single RTP session, it is 794 RECOMMENDED not to impose any layer restrictions for the session 795 but to use the rate control functionality to set possible 796 restrictions on usage of the higher or highest layers. If the 797 offer includes a layer configuration parameter, the answer MAY use 798 a different configuration, but the highest layer in the answer MUST 799 NOT be higher than the highest layer of the offered configuration.
801 Author's note: Support for answer modifying the layer 802 configuration is FFS.
804 In case the G.718 session consists of multiple RTP sessions, the 805 answer MUST use the layer configurations provided in the offer for 806 the sessions it accepts.
808 4.4. Declarative usage of SDP
810 In declarative usage, such as SDP in RTSP [RFC2326] or SAP [RFC2974], 811 the parameter "layers" SHALL be interpreted to provide a set of 812 layers that the sender may use in the session.
814 4.5. SDP examples
816 Some example SDP session descriptions utilizing G.718 encodings are 817 provided below.
819 The first example illustrates the simple case where the G.718 session 820 employing a single RTP session and the AVPF profile is offered, and 821 the answer accepts the offer without any changes.

823 Offer:
825 m=audio 49120 RTP/AVPF 97
826 a=rtpmap:97 G718/32000/1

828 Answer:
830 m=audio 49120 RTP/AVPF 97
831 a=rtpmap:97 G718/32000/1

833 The second example shows a slightly more complex case where the G.718 834 session using a single RTP session and the AVPF profile is offered 835 with a restriction to send/receive only layers L1 and L2. The 836 answer indicates that the other end-point is happy to receive (and 837 send) layers up to L5.
839 Offer:
841 m=audio 49120 RTP/AVPF 97
842 a=rtpmap:97 G718/32000/1
843 a=fmtp:97 layers=1,2

845 Answer:
847 m=audio 49120 RTP/AVPF 97
848 a=rtpmap:97 G718/32000/1
849 a=fmtp:97 layers=1,2,3,4,5

851 The third example shows a G.718 session using multiple RTP sessions 852 with the AVPF profile. The answerer wishes to use only layers up to 853 L3.

855 Offer:
857 m=audio 49120 RTP/AVPF 97
858 a=rtpmap:97 G718/32000/1
859 a=fmtp:97 layers=1,2
860 a=mid:1

862 m=audio 49122 RTP/AVPF 98
863 a=rtpmap:98 G718/32000/1
864 a=fmtp:98 layers=3
865 a=mid:2
866 a=depend:lay 1

868 m=audio 49124 RTP/AVPF 99
869 a=rtpmap:99 G718/32000/1
870 a=fmtp:99 layers=4,5
871 a=mid:3
872 a=depend:lay 1 2

874 Answer:
876 m=audio 49120 RTP/AVPF 97
877 a=rtpmap:97 G718/32000/1
878 a=fmtp:97 layers=1,2
879 a=mid:1

881 m=audio 49120 RTP/AVPF 98
882 a=rtpmap:98 G718/32000/1
883 a=fmtp:98 layers=3
884 a=mid:2
885 a=depend:lay 1

887 Note that the dependency signaling according to [smd-sdp] is used in 888 the third example above to indicate the relationship between the 889 layers distributed into separate RTP sessions.
891 5. Security Considerations
893 RTP packets using the payload format defined in this specification 894 are subject to the security considerations discussed in the RTP 895 specification [RFC3550], and in any appropriate RTP profile (for 896 example [RFC3551] or [RFC4585]). This implies that confidentiality 897 of the media streams is achieved by encryption; for example, through 898 the application of SRTP [RFC3711]. Because the data compression used 899 with this payload format is applied end-to-end, any encryption needs 900 to be performed after compression.
902 A potential denial-of-service threat exists for data encodings using 903 compression techniques that have non-uniform receiver-end 904 computational load. The attacker can inject pathological datagrams 905 into the stream that will increase the processing load of the decoder 906 and may cause the receiver to be overloaded.
For example, inserting 907 additional EDUs representing the higher enhancement layers on top of 908 the ones actually transmitted may increase the decoder load. However, 909 the G.718 codec is not particularly vulnerable to such an attack, 910 since the majority of the computational load in a G.718 session is 911 associated with the encoder. Another possible form of attack might be 912 the forging of codec bit-rate control messages, which may result in 913 the encoder employing a higher number of enhancement layers than 914 originally intended and thereby requiring a larger amount of 915 computational resources. Therefore, the usage of data origin 916 authentication and data integrity protection of at least the RTP 917 packet is RECOMMENDED; for example, with SRTP [RFC3711].
919 Note that the appropriate mechanism to ensure confidentiality and 920 integrity of RTP packets and their payloads is very dependent on the 921 application and on the transport and signaling protocols employed. 922 Thus, although SRTP is given as an example above, other possible 923 choices exist.
925 Note that end-to-end security with either authentication, integrity 926 or confidentiality protection will prevent a network element not 927 within the security context from performing media-aware operations 928 other than discarding complete packets. To allow any (media-aware) 929 intermediate network element to perform its operations, it is 930 required to be a trusted entity which is included in the security 931 context establishment.
933 6. Congestion control
935 As a scalable codec, G.718 implicitly provides a means for congestion 936 control by offering the possibility of 'thinning' the bitstream. The 937 RTP payload format according to this specification provides several 938 different means for reducing the G.718 session bandwidth.
The most 939 appropriate mechanism (in terms of impact to the user experience) 940 depends on the employed payload structure and also on the employed 941 session configuration (single RTP session or multiple RTP sessions). 942 The following means (in no particular order) can be used to assist 943 congestion control procedures -- either by the sender or by the 944 intermediate node.
946 o The transport blocks carrying the EDUs representing the highest 947 layers within the payload may be dropped.
949 o The payloads carrying the EDUs representing the highest layers in 950 a G.718 session may be dropped.
952 o Transport blocks or payloads carrying EDUs belonging to redundant 953 frames included in the payload may be dropped.
955 7. IANA Considerations
957 IANA is kindly requested to register a media type for the G.718 codec 958 for RTP transport, as specified in section 4.1 of this document.
960 APPENDIX A: Payload examples
962 The G.718 payload structure enables flexible transport either by 963 carrying all layers in the same payload or separating the layers into 964 separate payloads. The following subsections illustrate different 965 possibilities for transport by simple examples. Note that examples do 966 not show the full payload structure to keep the illustration simple.
968 A.1. Simple payload examples
970 A.1.1. All the layers in the same payload
972 The illustration below shows layers L1-L3 from two encoded frames 973 encapsulated into separate payloads, each using a single transport block.

975 +-------+--------+-----+------+------+------+
976 | RTP1  | L-ID=3 |NF=0 |F1-L1 |F1-L2 |F1-L3 |
977 +-------+--------+-----+------+------+------+

979 +-------+--------+-----+------+------+------+
980 | RTP2  | L-ID=3 |NF=0 |F2-L1 |F2-L2 |F2-L3 |
981 +-------+--------+-----+------+------+------+

983 In case the same layers from two input frames are encapsulated into 984 one payload using a single transport block, the structure is as shown 985 below.

987 +-------+--------+-----+------+------+------+------+------+------+
988 | RTP1  | L-ID=3 |NF=1 |F1-L1 |F2-L1 |F1-L2 |F2-L2 |F1-L3 |F2-L3 |
989 +-------+--------+-----+------+------+------+------+------+------+

991 The third example illustrates the case where the layers L1-L3 from 992 two input frames are encapsulated into one payload using two separate 993 transport blocks, the first one carrying L1 and the other one 994 containing L2 and L3.

996 +-------+--------+-----+------+------+
997 | RTP1  | L-ID=1 |NF=1 |F1-L1 |F2-L1 |
998 +-------+--------+-----+------+------+------+------+
999         | L-ID=7 |NF=1 |F1-L2 |F2-L2 |F1-L3 |F2-L3 |
1000        +--------+-----+------+------+------+------+

1002 A.1.2. Layers in separate RTP streams
1004 In this case the data for each layer is transmitted in its own 1005 payload.
1007 In the first example each transport block including a single EDU is 1008 carried in its own RTP payload.

1010 +-------+--------+-----+-----+  +-------+--------+-----+-----+
1011 | RTP1a | L-ID=1 |NF=0 |F1-L1|  | RTP1b | L-ID=6 |NF=0 |F1-L2|
1012 +-------+--------+-----+-----+  +-------+--------+-----+-----+

1014 +-------+--------+-----+-----+  +-------+--------+-----+-----+
1015 | RTP1c |L-ID=10 |NF=0 |F1-L3|  | RTP2a | L-ID=1 |NF=0 |F2-L1|
1016 +-------+--------+-----+-----+  +-------+--------+-----+-----+

1018 +-------+--------+-----+-----+  +-------+--------+-----+-----+
1019 | RTP2b | L-ID=6 |NF=0 |F2-L2|  | RTP2c |L-ID=10 |NF=0 |F2-L3|
1020 +-------+--------+-----+-----+  +-------+--------+-----+-----+

1022 If the payloads carry data from two consecutive input frames, the 1023 same encoded data as in the previous example is arranged as follows.

1025 +-------+--------+-----+-----+-----+
1026 | RTP1a | L-ID=1 |NF=1 |F1-L1|F2-L1|
1027 +-------+--------+-----+-----+-----+

1029 +-------+--------+-----+-----+-----+
1030 | RTP1b | L-ID=6 |NF=1 |F1-L2|F2-L2|
1031 +-------+--------+-----+-----+-----+

1033 +-------+--------+-----+-----+-----+
1034 | RTP1c |L-ID=10 |NF=1 |F1-L3|F2-L3|
1035 +-------+--------+-----+-----+-----+

1037 A.2. Advanced examples
1039 A.2.1. Different update rate for subset of layers
1041 An example employing different update rates (i.e. different number of 1042 frames per packet) for selected subsets of layers. In these examples 1043 all core codec layers L1-L5 are shown.

1045 +-------+--------+-----+-----+-----+-----+-----+
1046 | RTP1  | L-ID=1 |NF=3 |F1-L1|F2-L1|F3-L1|F4-L1|
1047 +-------+--------+-----+-----+-----+-----+-----+

1049 +-------+--------+-----+-----+-----+-----+-----+
1050 | RTP2a | L-ID=7 |NF=1 |F1-L2|F2-L2|F1-L3|F2-L3|
1051 +-------+--------+-----+-----+-----+-----+-----+

1053 +-------+--------+-----+-----+-----+
1054 | RTP3a |L-ID=14 |NF=0 |F1-L4|F1-L5|
1055 +-------+--------+-----+-----+-----+

1057 +-------+--------+-----+-----+-----+
1058 | RTP3b |L-ID=14 |NF=0 |F2-L4|F2-L5|
1059 +-------+--------+-----+-----+-----+

1061 +-------+--------+-----+-----+-----+-----+-----+
1062 | RTP2b | L-ID=7 |NF=1 |F3-L2|F4-L2|F3-L3|F4-L3|
1063 +-------+--------+-----+-----+-----+-----+-----+

1065 +-------+--------+-----+-----+-----+
1066 | RTP3c |L-ID=14 |NF=0 |F3-L4|F3-L5|
1067 +-------+--------+-----+-----+-----+

1069 +-------+--------+-----+-----+-----+
1070 | RTP3d |L-ID=14 |NF=0 |F4-L4|F4-L5|
1071 +-------+--------+-----+-----+-----+

1073 A.2.2. Redundant frames with limited set of layers
1075 An example transmitting layers L1-L3 as primary data and L1 (of the 1076 previous frame) as redundant data is shown below. Each payload 1077 carries one primary (i.e.
new) frame in one 1078 transport block and one redundant frame, which in this example is the frame preceding the 1079 primary frame, in another transport block.

1081 +-------+--------+-----+-----+--------+-----+-----+-----+-----+
1082 | RTP1  | L-ID=1 |NF=0 |F0-L1| L-ID=3 |NF=0 |F1-L1|F1-L2|F1-L3|
1083 +-------+--------+-----+-----+--------+-----+-----+-----+-----+

1085 +-------+--------+-----+-----+--------+-----+-----+-----+-----+
1086 | RTP2  | L-ID=1 |NF=0 |F1-L1| L-ID=3 |NF=0 |F2-L1|F2-L2|F2-L3|
1087 +-------+--------+-----+-----+--------+-----+-----+-----+-----+

1089 +-------+--------+-----+-----+--------+-----+-----+-----+-----+
1090 | RTP3  | L-ID=1 |NF=0 |F2-L1| L-ID=3 |NF=0 |F3-L1|F3-L2|F3-L3|
1091 +-------+--------+-----+-----+--------+-----+-----+-----+-----+

1093 Alternatively, the payload carrying also redundant data for a subset 1094 of layers can be arranged differently, as shown in the example below.

1096 +-------+--------+-----+-----+-----+-----+--------+-----+-----+
1097 | RTP1  | L-ID=3 |NF=0 |F0-L1|F0-L2|F0-L3| L-ID=1 |NF=0 |F1-L1|
1098 +-------+--------+-----+-----+-----+-----+--------+-----+-----+

1100 +-------+--------+-----+-----+-----+-----+--------+-----+-----+
1101 | RTP2  | L-ID=3 |NF=0 |F1-L1|F1-L2|F1-L3| L-ID=1 |NF=0 |F2-L1|
1102 +-------+--------+-----+-----+-----+-----+--------+-----+-----+

1104 +-------+--------+-----+-----+-----+-----+--------+-----+-----+
1105 | RTP3  | L-ID=3 |NF=0 |F2-L1|F2-L2|F2-L3| L-ID=1 |NF=0 |F3-L1|
1106 +-------+--------+-----+-----+-----+-----+--------+-----+-----+

1108 Now the first transport block carries the primary data and the second 1109 transport block carries the redundant data, which in this case covers 1110 the frame following the primary frame. The benefit of this approach 1111 is that the redundant data is included in the last (secondary) 1112 transport block of the payload, which might be beneficial for 1113 possible payload scaling operation within the network.
1115 8. References
1117 8.1.
Normative References
1119 [AMR-WB] 3GPP TS 26.171, "Adaptive Multi-Rate Wideband (AMR-WB) 1120 speech codec; General description (Release 7)", v7.0.0, 1121 September 2006.
1123 [G.718] ITU-T Recommendation G.718, "Frame Error Robust Narrowband 1124 and Wideband Embedded Variable Bit-Rate Coding of Speech 1125 and Audio from 8-32 Kbit/s", (consented) May 2008.
1127 [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate 1128 Requirement Levels", BCP 14, RFC 2119, March 1997.
1130 [RFC3264] Rosenberg, J., Schulzrinne, H., "An Offer/Answer Model with 1131 Session Description Protocol (SDP)", RFC 3264, June 2002.
1133 [RFC3550] Schulzrinne, H., Casner, S., Frederick, R. and Jacobson, V., 1134 "RTP: A Transport Protocol for Real-Time Applications", STD 1135 64, RFC 3550, July 2003.
1137 [RFC3551] Schulzrinne, H., Casner, S., "RTP Profile for Audio and 1138 Video Conferences with Minimal Control", STD 65, RFC 3551, 1139 July 2003.
1141 [RFC3711] Baugher, M., McGrew, D., Naslund, M., Carrara, E., Norrman, 1142 K., "The Secure Real-Time Transport Protocol (SRTP)", RFC 1143 3711, March 2004.
1145 [RFC4288] Freed, N., Klensin, J., "Media Type Specifications and 1146 Registration Procedures", BCP 13, RFC 4288, December 2005.
1148 [RFC4566] Handley, M., Jacobson, V. and Perkins, C., "SDP: Session 1149 Description Protocol", RFC 4566, July 2006.
1151 [RFC4585] Ott, J., Wenger, S., Sato, N., Burmeister, C., Rey, J., 1152 "Extended RTP Profile for Real-Time Transport Control 1153 Protocol (RTCP)-Based Feedback (RTP/AVPF)", RFC 4585, July 1154 2006.
1156 [RFC4855] Casner, S., "Media Type Registration of RTP Payload 1157 Formats", RFC 4855, February 2007.
1159 [RFC4867] Sjoberg, J., Westerlund, M., Lakaniemi, A., Xie, Q., "RTP 1160 Payload Format and File Storage Format for the Adaptive 1161 Multi-Rate (AMR) and Adaptive Multi-Rate Wideband (AMR-WB) 1162 Audio Codecs", RFC 4867, April 2007.
1164 [RFC5104] Wenger, S., Chandra, U., Westerlund, M., Burman, B., "Codec 1165 Control Messages in the RTP Audio-Visual Profile with 1166 Feedback (AVPF)", RFC 5104, February 2008.
1168 [smd-sdp] Schierl, T., Wenger, S., "Signaling media decoding 1169 dependency in Session Description Protocol (SDP)", 1170 draft-schierl-mmusic-layered-codec-04 (work in progress), June 1171 2007.
1173 8.2. Informative References
1175 [McCanne] McCanne, S., Jacobson, V., and Vetterli, M., "Receiver-driven 1176 layered multicast", in Proc. of ACM SIGCOMM'96, 1177 pages 117--130, Stanford, CA, August 1996.
1179 [RFC2326] Schulzrinne, H., Rao, A., Lanphier, R., "Real Time 1180 Streaming Protocol (RTSP)", RFC 2326, April 1998.
1182 [RFC2974] Handley, M., Perkins, C., Whelan, E., "Session Announcement 1183 Protocol", RFC 2974, October 2000.
1185 [RFC3828] Larzon, L-A., Degermark, M., Pink, S., Jonsson, L-E., 1186 Fairhurst, G., "The Lightweight User Datagram Protocol 1187 (UDP-Lite)", RFC 3828, July 2004.
1189 [RFC4340] Kohler, E., Handley, M., Floyd, S., "Datagram Congestion 1190 Control Protocol (DCCP)", RFC 4340, March 2006.
1192 [RFC5117] Westerlund, M., Wenger, S., "RTP Topologies", RFC 5117, 1193 January 2008.
1195 Author's Addresses
1197 Ari Lakaniemi
1198 Nokia
1199 P.O.Box 407
1200 FIN-00045 Nokia Group, FINLAND
1202 Phone: +358-71-8008000
1203 Email: ari.lakaniemi@nokia.com
1205 Ye-Kui Wang
1206 Huawei Technologies
1207 400 Somerset Corp Blvd, Suite 602
1208 Bridgewater, NJ 08807, USA
1210 Phone: +1-908-541-3518
1211 EMail: yekuiwang@huawei.com
1213 Acknowledgment
1215 Funding for the RFC Editor function is currently provided by the 1216 Internet Society.
1218 9. Open Issues
1220 1) Support of super-wideband (SWB) audio and stereophonic encoding 1221 extensions to ITU-T G.718 currently being worked on by ITU-T is to 1222 be specified after ITU-T completes the work in that regard.
1224 a.
Some further study is needed to see if separate parameters
1225            for sending and receiving capabilities/preferences are needed,
1226            especially for upcoming stereo and SWB options.

1228         b. The support for upcoming SWB and stereo options needs to be
1229            taken into account. Basically we can either 1) extend the
1230            parameter "layers" to also cover this aspect, or 2) define
1231            separate parameter(s) for these new options when more details
1232            on the stereo/SWB support are available.

1234      2) For streaming or other applications that allow for relatively long
1235         end-to-end delay, it would sometimes be beneficial to aggregate
1236         more than 4 frames in one Transport Block (TB). Should the length
1237         of the NF field be larger?

1239      3) On layer structure and configuration signalling. Currently, a
1240         unique layer ID is assigned to each possible layer combination.

1242         See the editing notes below Table 3 for other possible approaches.
1243         One of the alternative ways may be chosen in the final draft.

1245      4) Currently, it is mandated that lower layer EDUs of later frames go
1246         before higher layer EDUs of earlier frames in a transport block.
1247         This way is friendlier to adaptation (dropping of higher layers).
1248         However, if all layers are received, then the depacketizer needs
1249         to reorder the EDUs to their decoding order before feeding them to
1250         the decoder. Therefore, the other way around (i.e. lower layer
1251         EDUs of later frames go after higher layer EDUs of earlier frames,
1252         or EDUs in transport blocks are placed in decoding order) is more
1253         friendly to the depacketizer. Another benefit of the latter is
1254         that it does not introduce any end-to-end delay. Which way is to be
1255         specified (or whether both are allowed, if needed) is FFS.

1257      5) MANEs dropping RTP packets are RTP translators. But are those
1258         MANEs dropping a subset of the transport blocks in one packet also
1259         RTP translators?
1261      6) The RTCP-based cross-session synchronization is not possible until
1262         the first RTCP SRs are received in all sessions. This implies that
1263         decoding only a subset of layers may be possible until RTCP SRs in
1264         all sessions have been received. This may impose higher end-to-
1265         end delay or higher bandwidth for RTCP data, and the approach may
1266         not work perfectly for some multicast topologies. There is a study
1267         ongoing by some AVT members. Once an acceptable solution is
1268         found, the draft documenting that solution may be referenced in
1269         this draft.

1271      7) It might be better to change the semantics of the media type
1272         parameter 'layers' to be similar to those of L-ID.

1274      8) Offer/answer with the answer being capable of modifying the layer
1275         configuration is FFS.

1277      9) Some references need to be updated in the final draft.

1279   10. Change Log

1281      From draft-ietf-avt-rtp-g718-00 to draft-ietf-avt-rtp-g718-01

1283      - Updated the boilerplate.

1285      - Changed Ye-Kui Wang's affiliation and address.