Network Working Group                                      Johan Sjoberg
INTERNET-DRAFT                                         Magnus Westerlund
Category: Standards Track                                       Ericsson
Expires: August 2004                                       Ari Lakaniemi
                                                                   Nokia
                                                       February 13, 2004

       Real-Time Transport Protocol (RTP) Payload Format for
      Adaptive Multi-Rate Wideband plus (AMR-WB+) Audio Codec

Status of this memo

   This document is an Internet-Draft and is in full conformance with
   all provisions of Section 10 of RFC2026.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups. Note that
   other groups may also distribute working documents as
   Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six
   months and may be updated, replaced, or obsoleted by other documents
   at any time. It is inappropriate to use Internet-Drafts as reference
   material or cite them other than as "work in progress".

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/lid-abstracts.txt

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html

   This document is an individual submission to the IETF. Comments
   should be directed to the authors.

Copyright Notice

   Copyright (C) The Internet Society (2004). All Rights Reserved.

Abstract

   This document specifies a real-time transport protocol (RTP) payload
   format to be used for Adaptive Multi-Rate Wideband plus (AMR-WB+)
   encoded audio signals. The AMR-WB+ codec is an audio extension of
   the AMR-WB codec providing additional modes designed to give higher
   quality of music and speech than the original modes. The payload
   format is designed according to the principles outlined in the
   existing payload formats for AMR and AMR-WB, RFC3267. A MIME type
   registration is included for AMR-WB+.

Sjoberg, et. al.                                                [Page 1]

INTERNET-DRAFT       RTP payload format for AMR-WB+    February 13, 2004

TABLE OF CONTENTS

   1. Definitions
      1.1. Glossary
      1.2. Terminology
   2. Introduction
   3. Background on AMR-WB+ and Design Principles
      3.1. The AMR-WB+ Audio Codec
      3.2. Multi-rate Encoding and Mode Adaptation
      3.3. Voice Activity Detection and Discontinuous Transmission
      3.4. Support for Multi-Channel Session
      3.5. Unequal Bit-error Detection and Protection
         3.5.1. Applying UEP and UED in an IP Network
      3.6. Robustness against Packet Loss
         3.6.1. Use of Forward Error Correction (FEC)
         3.6.2. Use of Frame Interleaving
      3.7. AMR-WB+ Audio over IP scenarios
   4. RTP Payload Format for AMR-WB+
      4.1. RTP Header Usage
      4.2. Payload Structure
      4.3. Payload definitions
         4.3.1. The Payload Header
         4.3.2. The Payload Table of Contents and Frame CRCs
         4.3.3. Audio Data
         4.3.4. Methods for Forming the Payload
         4.3.5. Payload Examples
      4.4. Implementation Considerations
   5. Congestion Control
   6. Security Considerations
      6.1. Confidentiality
      6.2. Authentication
      6.3. Decoding Validation
   7. Payload Format Parameters
      7.1. MIME Registration
      7.2. Mapping MIME Parameters into SDP
         7.2.1. Offer-Answer Model Considerations
         7.2.2. Examples
   8. IANA Considerations
   9. Acknowledgements
   10. References
      10.1. Normative references
      10.2. Informative References
   11. Authors' Addresses
   12. IPR Notice
   13. Copyright Notice

1. Definitions

1.1. Glossary

   3GPP    - the Third Generation Partnership Project
   AMR     - Adaptive Multi-Rate Codec
   AMR-WB  - Adaptive Multi-Rate Wideband Codec
   AMR-WB+ - Adaptive Multi-Rate Wideband plus Codec
   CMR     - Codec Mode Request
   CN      - Comfort Noise
   DTX     - Discontinuous Transmission
   FEC     - Forward Error Correction
   SCR     - Source Controlled Rate Operation
   SID     - Silence Indicator (the frames containing only CN
             parameters)
   VAD     - Voice Activity Detection
   UED     - Unequal Error Detection
   UEP     - Unequal Error Protection

1.2. Terminology

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in RFC 2119 [3].

2. Introduction

   This document specifies the payload format for packetization of
   AMR-WB+ encoded audio signals into the Real-time Transport Protocol
   (RTP) [4].
   The payload format supports transmission of multiple channels
   according to the mode definition (each mode is either a mono or a
   stereo mode), multiple frames per payload, and robustness against
   packet loss and bit errors.

   Background on AMR-WB+ and design principles can be found in Section
   3. The payload format itself is specified in Section 4 and follows
   the principles used in [4], [8], and [9]. In Section 7, a MIME type
   registration is provided.

   The intention of this RTP payload format definition is to follow
   closely the payload format definitions of AMR and AMR-WB [9].
   However, AMR-WB+ has two features not available in AMR or AMR-WB:
   not all modes have the same sampling rate, and each mode is either a
   mono or a stereo mode. On the other hand, AMR-WB+ is intended to use
   IP transport, which removes the need for interworking with other
   transport networks.

   The bandwidth efficient mode defined in [9] is not specified for
   AMR-WB+. AMR-WB+ will mainly be used in streaming scenarios, where
   an octet-aligned format brings a large benefit by decreasing the
   complexity of the server. The bandwidth saved by using the bandwidth
   efficient mode would also be very small for all extension modes.

   The built-in codec support for stereo encoding makes the
   implementation of multi-channel support difficult, but also less
   needed. Therefore, the multi-channel support found in the AMR and
   AMR-WB payload format is removed from this payload format.

   There is no file format for AMR-WB+ defined within this
   specification. Instead, the 3GPP-defined, ISO-based 3GP file format
   [18] will support AMR-WB+ and provides all functionality needed from
   a file format. This format also supports storage of AMR and AMR-WB,
   plus other multimedia formats, allowing for synchronized playback.
   As the 3GP format provides much greater capability than the
   previously defined formats for AMR and AMR-WB, this format is
   expected to be used and be sufficient for all use cases.

3. Background on AMR-WB+ and Design Principles

   The Adaptive Multi-Rate Wideband plus (AMR-WB+) audio codec is
   designed for encoding and transport of speech and low bit-rate audio
   with good quality. The codec is being specified by 3GPP, and the
   primary target applications within 3GPP are packet switched
   streaming (PSS) [17] and multimedia messaging (MMS) services.
   However, due to its flexibility and robustness, AMR-WB+ is very well
   suited for streaming services in highly varying transport
   environments, e.g. the Internet.

   Because of the flexibility of this codec, the behavior in a
   particular application is controlled by several parameters that
   select options or specify the acceptable values for a variable.
   These options and variables are described in general terms at
   appropriate points in the text of this specification as parameters
   to be established through out-of-band means. In Section 7, all of
   the parameters are specified in the form of MIME subtype
   registrations for the AMR-WB+ encoding. The method used to signal
   these parameters at session setup or to arrange prior agreement of
   the participants is beyond the scope of this document; however,
   Section 7 provides a mapping of the parameters into the Session
   Description Protocol (SDP) [7] for those applications that use SDP.

   Note that the AMR-WB+ design and specification work in 3GPP is still
   in progress. The target is to finalize the codec specifications
   within the 3GPP Release 6 timeline; the release will be frozen in
   June 2004 at the earliest. Because the codec work is not finished,
   some of the issues discussed in this Internet-Draft are still
   subject to change, but the draft presents the situation according to
   the authors' best knowledge at the time of writing.

3.1. The AMR-WB+ Audio Codec

   The AMR-WB+ audio codec was originally developed by 3GPP to be used
   for streaming and messaging services in GSM and 3G cellular systems.

   AMR-WB+ is designed as an audio extension to the AMR-WB speech
   codec. Thus, it includes the nine coding modes specified for AMR-WB,
   extended with four new modes with bit rates ranging from 14 to 24
   kbit/s. Whereas the AMR-WB modes employ a 16000 Hz sampling
   frequency and operate on a monophonic signal in all modes, the
   extension modes operate at sampling rates of 16000, 24000 or 32000
   Hz, and the input signal can be either monophonic or stereophonic
   audio, depending on the mode. The audio processing is performed on
   equal-size frames; each transport frame corresponds to a duration of
   20 ms. This means that each AMR-WB+ transport frame represents 320,
   480 or 640 audio samples for each channel, depending on the employed
   sampling frequency.

   The AMR-WB+ codec includes four extension modes in addition to the
   AMR-WB modes, as introduced in Table 1 below. However, since the
   codec design work is still going on, the final specification may
   include a different set of modes.

                          Sampling    Mono/   Number of       Number of
   Index  Mode            rate [kHz]  stereo  bits per frame  class A bits
   -----------------------------------------------------------------------
     0    WB  6.60 kbps   16          mono    132             54
     1    WB  8.85 kbps   16          mono    177             64
     2    WB 12.65 kbps   16          mono    253             72
     3    WB 14.25 kbps   16          mono    285             72
     4    WB 15.85 kbps   16          mono    317             72
     5    WB 18.25 kbps   16          mono    365             72
     6    WB 19.85 kbps   16          mono    397             72
     7    WB 23.05 kbps   16          mono    461             72
     8    WB 23.85 kbps   16          mono    477             72
     9    WB SID          16          mono     40             40
    10    WB+ 14 kbps     16          mono    280             ??
    11    WB+ 18 kbps     16/24       stereo  360             ??
    12    WB+ 24 kbps     16/24       mono    480             ??
    13    WB+ 24 kbps     16/24       stereo  480             ??
    14    LOST_SPEECH     -           -         0
    15    NO_DATA         -           -         0

   Table 1: AMR-WB+ modes.

   NOTE! THIS TABLE WILL BE REPLACED BY A REFERENCE TO THE APPROPRIATE
   3GPP SPECIFICATION AS SOON AS IT IS AVAILABLE.
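   The frame sizes quoted in this section follow directly from the
   20 ms transport frame duration. The short Python sketch below is
   only an illustration of that arithmetic; the function names are
   hypothetical and are not taken from any 3GPP or IETF specification,
   and the bit rates used are a few of the values listed in Table 1.

```python
FRAME_DURATION_MS = 20  # duration of one AMR-WB+ transport frame


def samples_per_frame(sampling_rate_hz: int) -> int:
    """Audio samples per channel in one 20 ms transport frame."""
    return sampling_rate_hz * FRAME_DURATION_MS // 1000


def bits_per_frame(bit_rate_bps: int) -> int:
    """Encoded bits produced in one 20 ms frame at the given bit rate."""
    return bit_rate_bps * FRAME_DURATION_MS // 1000


if __name__ == "__main__":
    for rate_hz in (16000, 24000, 32000):
        print(rate_hz, "Hz ->", samples_per_frame(rate_hz), "samples")
    for bps in (6600, 23850, 14000, 24000):
        print(bps, "bit/s ->", bits_per_frame(bps), "bits")
```

   Under these assumptions the three sampling rates give 320, 480 and
   640 samples per channel, and the listed bit rates give 132, 477, 280
   and 480 bits per frame, matching the corresponding rows of Table 1.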
   Note that modes with index in the range 0-9 are the same as defined
   for AMR-WB in [9], and modes with index in the range 10-13 are the
   extension modes.

3.2. Multi-rate Encoding and Mode Adaptation

   The multi-rate encoding (i.e., multi-mode) capability of AMR-WB+ is
   designed for preserving high audio quality under a wide range of
   bandwidth requirements and transmission conditions.

   AMR-WB+ enables seamless switching between modes using the same
   number of audio channels and the same sampling frequency. Every
   AMR-WB+ codec implementation is required to support all the
   respective audio coding modes defined by the codec and must be able
   to handle mode switching between any two modes. Switching between
   modes employing a different number of audio channels or a different
   sampling frequency is possible, but it requires the receiver to be
   equipped with the processing capabilities needed to handle the
   changed characteristics of the incoming audio stream. Such switching
   is therefore not recommended, because it is likely to cause severe
   audio quality problems if not handled properly.

3.3. Voice Activity Detection and Discontinuous Transmission

   AMR-WB+ supports the same algorithms for voice activity detection
   (VAD) and generation of comfort noise (CN) parameters during silence
   periods as used by the AMR-WB codec. Hence, the AMR-WB+ codec also
   has the option to reduce the number of transmitted bits and packets
   during silence periods to a minimum. The operation of sending CN
   parameters at regular intervals during silence periods is usually
   called discontinuous transmission (DTX) or source controlled rate
   (SCR) operation. The AMR-WB+ frames containing CN parameters are
   called Silence Indicator (SID) frames. See [5] and [6] for more
   details about VAD and DTX functionality.

3.4. Support for Multi-Channel Session

   Some of the AMR-WB+ modes support encoding of stereophonic audio.
   Because of this native support for two-channel stereophonic signals,
   it does not seem necessary to support multi-channel transport with
   separate codecs as done in the AMR-WB RTP payload format [9].

   However, to make the signalling of channels explicit, a sender of
   AMR-WB+ must use separate RTP payload types for mono and stereo
   modes. A reason for having the number of channels present at the RTP
   level is that the codec-external requirements are different, i.e.
   the playback facilities of a receiver need to handle stereo or mono
   signals. This does not make switching between mono and stereo any
   more difficult, since payload type switching can be done without
   problems: the same RTP timestamp rate is used in both cases.

3.5. Unequal Bit-error Detection and Protection

   The audio bits encoded in each AMR-WB+ frame have different
   perceptual sensitivity to bit errors. This property can be
   exploited, e.g. in cellular systems, to achieve better voice quality
   by using unequal error protection and detection (UEP and UED)
   mechanisms.

   The UEP/UED mechanisms focus the protection and detection of
   corrupted bits on the perceptually most sensitive bits in an AMR-WB+
   frame. In particular, audio bits in an AMR-WB+ frame are divided
   into classes A and B, where bits in class A are most sensitive,
   while class B bits can tolerate some errors with only minor
   degradations in the speech quality. [NOTE: reference to appropriate
   3GPP specification will be added as soon as it is available] A frame
   is only declared damaged if there are bit errors found in the most
   sensitive bits, i.e., the class A bits. On the other hand, it is
   acceptable to have some bit errors in the other bits, i.e. class B
   bits.

   Moreover, a damaged frame is still useful for error concealment at
   the decoder since some of the less sensitive bits can still be used.
   This approach can improve the audio quality compared to discarding
   the damaged frame.

3.5.1. Applying UEP and UED in an IP Network

   To take full advantage of the bit-error robustness of the AMR-WB+
   codec, the RTP payload format is designed to facilitate UEP/UED in
   an IP network. It should be noted however that the utilization of
   UEP and UED discussed below is OPTIONAL.

   UEP/UED in an IP network can be achieved by detecting bit errors in
   class A bits and tolerating bit errors in class B bits of the
   AMR-WB+ frame(s) in each RTP payload.

   Today there exist some link layers that do not discard packets with
   bit errors, e.g., SLIP and some wireless links. With the Internet
   traffic pattern shifting towards a more multimedia-centric one, more
   link layers of such nature may emerge in the future. With transport
   layer support for partial checksums, for example those supported by
   UDP-Lite [10], bit error tolerant AMR-WB+ traffic could achieve
   better performance over these types of links.

   There are at least two basic approaches for carrying AMR-WB+ traffic
   over bit error tolerant IP networks:

   1) Utilizing a partial checksum to cover headers and the most
      important audio bits of the payload. At least all class A bits
      should be covered by the checksum, since the bits of the
      extension modes are not sorted in sensitivity order but just
      classified in class A and B bits.

   2) Utilizing a partial checksum to only cover headers, but a frame
      CRC to cover the class A bits of each audio frame in the RTP
      payload.

   In either approach, at least part of the class B bits are left
   without error-check and thus bit error tolerance is achieved.

   The application interface to the UEP/UED transport protocol (e.g.,
   UDP-Lite) may not provide any control over the link error rate.
   Therefore, it is incumbent upon the designer of a node with a link
   interface of this type to choose a residual bit error rate that is
   low enough to support applications such as AMR-WB+ encoding when
   transmitting packets of a UEP/UED transport protocol.

   Approach 1 is bit-efficient, flexible and simple, but comes with two
   disadvantages, namely, a) bit errors in protected audio bits will
   cause the payload to be discarded, and b) when transporting multiple
   frames in a payload there is the possibility that a single bit error
   in protected bits will cause all the frames to be discarded.

   These disadvantages can be avoided, if needed, with some overhead in
   the form of a frame-wise CRC (Approach 2). For a), the CRC makes it
   possible to detect bit errors in class A bits and still use the
   frame for error concealment, which gives a small improvement in
   audio quality. For b), when transporting multiple frames in a
   payload, the CRCs remove the possibility that a single bit error in
   a class A bit will cause all the frames to be discarded. Avoiding
   that gives an improvement in audio quality when transporting
   multiple frames over links subject to bit errors.

   The choice between the above two approaches must be made based on
   the available bandwidth and the desired tolerance to bit errors.
   Neither solution is appropriate in all cases. Section 7 defines
   parameters that may be used at session setup to select between these
   approaches.

3.6. Robustness against Packet Loss

   The payload format supports several means, including forward error
   correction (FEC) and frame interleaving, to increase robustness
   against packet loss.

3.6.1. Use of Forward Error Correction (FEC)

   The simple scheme of repetition of previously sent data is one way
   of achieving FEC. Another possible scheme, which can be more
   bandwidth efficient, is to use payload-external FEC, e.g., RFC2733
   [14], which generates extra packets containing repair data.
   The whole payload can also be sorted in sensitivity order to support
   external FEC schemes using UEP. There is also work in progress on a
   generic version of such a scheme [12] that can be applied to AMR-WB+
   payload transport.

   For the AMR-WB+ extension modes, it is only possible to use the
   codec to send redundant copies of the same mode. Such a scheme is
   described next.

   This involves the simple retransmission of previously transmitted
   frames together with the current frame(s). This is done by using a
   sliding window to group the audio frames to send in each payload.
   Figure 1 below shows an example.

   --+--------+--------+--------+--------+--------+--------+--------+--
     | f(n-2) | f(n-1) |  f(n)  | f(n+1) | f(n+2) | f(n+3) | f(n+4) |
   --+--------+--------+--------+--------+--------+--------+--------+--

     <---- p(n-1) ---->
              <----- p(n) ----->
                       <---- p(n+1) ---->
                                <---- p(n+2) ---->
                                         <---- p(n+3) ---->
                                                  <---- p(n+4) ---->

   Figure 1: An example of redundant transmission.

   In this example each frame is retransmitted one time in the
   following RTP payload packet. Here, f(n-2)..f(n+4) denotes a
   sequence of audio frames and p(n-1)..p(n+4) a sequence of payload
   packets.

   The use of this approach does not require signaling at the session
   setup. In other words, the audio sender can choose to use this
   scheme without consulting the receiver. This is because a packet
   containing redundant frames will not look different from a packet
   with only new frames. The receiver may receive multiple copies or
   versions (encoded with different modes) of a frame for a certain
   timestamp if no packet is lost. If multiple versions of the same
   audio frame are received, it is recommended that the mode with the
   highest rate be used by the audio decoder.

   This redundancy scheme provides the same functionality as the one
   described in RFC 2198 "RTP Payload for Redundant Audio Data" [15].
   In most cases the mechanism in this payload format is more efficient
   and simpler than requiring both endpoints to support RFC 2198 in
   addition. There are two situations in which use of RFC 2198 is
   indicated: if the spread in time required between the primary and
   redundant encodings is larger than 5 frame times, the bandwidth
   overhead of RFC 2198 will be lower; or, if some other codec than
   AMR-WB+ is desired for the redundant encoding, the AMR-WB+ payload
   format won't be able to carry it.

   The sender is responsible for selecting an appropriate amount of
   redundancy based on feedback about the channel, e.g., in RTCP
   receiver reports. The sender is also responsible for avoiding
   congestion, which may be exacerbated by redundancy (see Section 5
   for more details).

3.6.2. Use of Frame Interleaving

   To decrease protocol overhead, the payload design allows several
   audio frames to be encapsulated into a single RTP packet. One of the
   drawbacks of such an approach is that a packet loss then means the
   loss of several consecutive audio frames, which usually causes
   clearly audible distortion in the reconstructed audio. Interleaving
   of frames can improve the audio quality in such cases by
   distributing the consecutive losses into a series of single frame
   losses. However, interleaving and bundling several frames per
   payload will also increase end-to-end delay and is therefore not
   appropriate for all usage scenarios. Streaming applications,
   however, will most likely be able to exploit interleaving to improve
   audio quality in lossy transmission conditions.

   This payload design supports the use of frame interleaving as an
   option. For the encoder (audio sender) to use frame interleaving in
   its outbound RTP packets for a given session, the decoder (audio
   receiver) needs to indicate its support via out-of-band means (see
   Section 7).

3.7. AMR-WB+ Audio over IP scenarios

   Since the primary target for the AMR-WB+ codec is packet switched
   streaming, the most relevant usage scenario for this payload format
   is IP end-to-end between a server and a terminal, as shown in Figure
   2.

   +----------+                          +----------+
   |          |    IP/UDP/RTP/AMR-WB+    |          |
   |  SERVER  |<------------------------>| TERMINAL |
   |          |                          |          |
   +----------+                          +----------+

   Figure 2: Server to terminal IP scenario

4. RTP Payload Format for AMR-WB+

   The AMR-WB+ payload format has a structure identical to that of the
   AMR and AMR-WB payload formats [9]. The differences are that the
   number of modes is extended compared to the original AMR-WB format
   and that some features are removed. The motivation for the reduced
   functionality is that only IP transport is expected for AMR-WB+,
   i.e. functionality used for gateway scenarios is removed. The
   payload format consists of the RTP header, payload header and
   payload data.

   Since the AMR-WB speech modes are included in the AMR-WB+ codec, an
   end-point supporting AMR-WB+ is in principle also able to support
   the AMR-WB payload format and MIME subtype. To enable communication
   with an end-point supporting only AMR-WB coding, an AMR-WB+
   end-point SHOULD also indicate its capability to communicate using
   the AMR-WB MIME subtype and RTP payload format to facilitate
   interoperability. However, it should be noted that this is not
   possible in all scenarios: e.g. when the AMR-WB+ RTP payload format
   is used for streaming audio that is stored at a server, it is not
   possible to transform data stored using one of the AMR-WB+ extension
   modes into one of the AMR-WB modes without full transcoding. A
   similar scenario occurs with messaging services where the message
   containing AMR-WB+ audio is pre-stored at a messaging server. On the
   other hand, e.g. in a live streaming scenario, an AMR-WB+ end-point
   might have the possibility to limit its operation to AMR-WB modes
   only.

4.1. RTP Header Usage

   The format of the RTP header is specified in [4]. This payload
   format uses the fields of the header in a manner consistent with
   that specification.

   The RTP timestamp corresponds to the sampling instant of the first
   sample encoded for the first frame in the packet. The timestamp
   clock frequency SHALL be 96000 Hz, the lowest frequency that is an
   integer multiple of the sampling frequencies used by any of the
   AMR-WB+ modes. The duration of one AMR-WB+ audio transport frame is
   20 ms. The sampling frequency is either 16 kHz, 24 kHz, or 32 kHz,
   corresponding to 320, 480, or 640 encoded audio samples per frame
   from each channel, and to a timestamp increase of 6x320, 4x480, or
   3x640, all equal to 1920 timestamp units per frame. A packet MAY
   contain multiple frames of encoded audio or comfort noise
   parameters. If interleaving is employed, the frames encapsulated
   into a payload are picked according to the interleaving rules as
   defined in Section 4.3.1. Otherwise, each packet covers a period of
   one or more contiguous 20 ms frames.

   To allow for error resiliency through redundant transmission, the
   periods covered by multiple packets MAY overlap in time. A receiver
   MUST be prepared to receive any audio frame multiple times. All
   multiply sent frames MUST use the same mode.

   The payload is always made an integral number of octets long by
   padding with zero bits if necessary. If additional padding is
   required to bring the payload length to a larger multiple of octets
   or for some other purpose, then the P bit in the RTP header MAY be
   set and padding appended as specified in [4].

   The RTP header marker bit (M) SHALL be set to 1 if the first frame
   carried in the packet contains an audio frame that is the first in a
   talkspurt. For all other packets the marker bit SHALL be set to zero
   (M=0).
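   The timestamp arithmetic in this section can be checked with a few
   lines of Python. This is a minimal sketch, not normative text: the
   helper names are hypothetical, and next_timestamp() assumes packets
   that carry contiguous frames with no interleaving, no redundant
   frames and no DTX gaps, which is only the simplest of the cases
   allowed above.

```python
from functools import reduce
from math import gcd

SAMPLING_RATES_HZ = (16000, 24000, 32000)
RTP_CLOCK_HZ = 96000        # timestamp clock frequency from Section 4.1
FRAME_DURATION_MS = 20      # one AMR-WB+ transport frame


def lcm(a: int, b: int) -> int:
    """Least common multiple of two positive integers."""
    return a * b // gcd(a, b)


def ticks_per_frame(sampling_rate_hz: int) -> int:
    """Timestamp units covered by one 20 ms frame at this sampling rate."""
    samples = sampling_rate_hz * FRAME_DURATION_MS // 1000
    ticks_per_sample = RTP_CLOCK_HZ // sampling_rate_hz   # 6, 4 or 3
    return samples * ticks_per_sample


def next_timestamp(timestamp: int, frames_in_packet: int) -> int:
    """Timestamp of the following packet (assumption: contiguous frames,
    no interleaving or redundancy). RTP timestamps are 32 bits and wrap."""
    return (timestamp + frames_in_packet * 1920) % 2**32


if __name__ == "__main__":
    print(reduce(lcm, SAMPLING_RATES_HZ))                  # common multiple
    print([ticks_per_frame(r) for r in SAMPLING_RATES_HZ])
    print(next_timestamp(0, 3))
```

   Because every mode advances the clock by the same 1920 units per
   frame, a receiver following this sketch would not need to know the
   sampling rate of a mode to place its frames on the timeline.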
   The assignment of an RTP payload type for this new packet format is
   outside the scope of this document, and will not be specified here.
   It is expected that the RTP profile under which this payload format
   is being used will assign a payload type for this encoding or
   specify that the payload type is to be bound dynamically.

   An RTP payload type MUST carry either only mono or only stereo
   encoded AMR-WB+ frames. If both mono and stereo are to be sent by an
   application, two different payload types must be used. Switching
   between mono and stereo modes MAY be done through switching of the
   payload types, if the necessary extra processing is available in the
   receiver (see Section 3.2).

4.2. Payload Structure

   The complete payload consists of a payload header, a payload table
   of contents, and audio data representing one or more audio frames.
   The following diagram shows the general payload format layout:

   +----------------+-------------------+----------------
   | payload header | table of contents | audio data ...
   +----------------+-------------------+----------------

   Payloads containing more than one audio frame are called compound
   payloads.

   The following sections describe the variations taken by the payload
   format depending on whether the AMR-WB+ session is set up to use any
   of the OPTIONAL functions for robust sorting, interleaving, and
   frame CRCs.

4.3. Payload definitions

4.3.1. The Payload Header

   The payload header consists of a 4 bit CMR, 4 reserved bits, and
   optionally, an 8 bit interleaving header, as shown below:

    0                   1
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5
   +-+-+-+-+-+-+-+-+- - - - - - - -
   |  CMR  |R|R|R|R|  ILL  |  ILP  |
   +-+-+-+-+-+-+-+-+- - - - - - - -

   CMR (4 bits): Is used by the AMR and AMR-WB formats to indicate a
      codec mode request sent to the audio encoder at the site of the
      receiver of this payload.
The value of the CMR field is set to the frame type index of the corresponding audio mode being requested. AMR-WB+ is not intended for conversational use and no gateway scenarios are identified. Hence, this field is not needed for AMR- WB+. The CMR field is kept for conformity with AMR and AMR-WB formats, but MUST be set to the value 15, indicating that no mode request is present. R: is a reserved bit that MUST be set to zero. All R bits MUST be ignored by the receiver. ILL (4 bits, unsigned integer): This is an OPTIONAL field that is present only if interleaving is signaled out-of-band for the session. ILL=L indicates to the receiver that the interleaving length is L+1, in number of frames. ILP (4 bits, unsigned integer): This is an OPTIONAL field that is present only if interleaving is signaled. ILP MUST take a value between 0 and ILL, inclusive, indicating the interleaving index for frames in this payload in the interleave group. If the value of ILP is found greater than ILL, the payload SHOULD be discarded. ILL and ILP fields MUST be present in each packet in a session if interleaving is signaled for the session. Interleaving MUST be performed on a frame basis. The following example illustrates the arrangement of audio frames in an interleave group during an interleave session. Here we assume ILL=L for the interleave group that starts at audio frame n. We also assume that the first payload packet of the interleave group is s and the number of audio frames carried in each payload is N. Then we will have: Sjoberg, et. al. Standards Track [Page 13] INTERNET-DRAFT RTP payload format for AMR-WB+ February 13, 2004 Payload s (the first packet of this interleave group): ILL=L, ILP=0, Carry frames: n, n+(L+1), n+2*(L+1), ..., n+(N-1)*(L+1) Payload s+1 (the second packet of this interleave group): ILL=L, ILP=1, frames: n+1, n+1+(L+1), n+1+2*(L+1), ..., n+1+(N-1)*(L+1) ... 
Payload s+L (the last packet of this interleave group): ILL=L, ILP=L, frames: n+L, n+L+(L+1), n+L+2*(L+1), ..., n+L+(N-1)*(L+1) The next interleave group will start at frame n+N*(L+1). There will be no interleaving effect unless the number of frames per packet (N) is at least 2. Moreover, the number of frames per payload (N) and the value of ILL MUST NOT be changed inside an interleave group. In other words, all payloads in an interleave group MUST have the same ILL and MUST contain the same number of audio frames. The sender of the payload MUST only apply interleaving if the receiver has signaled its use through out-of-band means. Since interleaving will increase buffering requirements at the receiver, the receiver uses MIME parameter "interleaving=I" to set the maximum number of frames allowed in an interleaving group to I. When performing interleaving the sender MUST use a proper number of frames per payload (N) and ILL so that the resulting size of an interleave group is less or equal to I, i.e., N*(L+1)<=I. 4.3.2. The Payload Table of Contents and Frame CRCs The table of contents (ToC) consists of a list of ToC entries where each entry corresponds to an audio frame carried in the payload and, optionally, a list of audio frame CRCs, i.e., +---------------------+ | list of ToC entries | +---------------------+ | list of frame CRCs | (optional) - - - - - - - - - - - Note, for ToC entries with FT=14 or 15, there will be no corresponding audio frame or frame CRC present in the payload. When multiple frames are present in a packet, the ToC entries will be placed in the packet in order of their creation time, with the following exception; when interleaving is used the frames in the ToC will almost never be placed consecutive in time. Sjoberg, et. al. 
A ToC entry takes the following format:

    0 1 2 3 4 5 6 7
   +-+-+-+-+-+-+-+-+
   |F|  FT   |Q|P|P|
   +-+-+-+-+-+-+-+-+

F (1 bit): If set to 1, indicates that this frame is followed by another audio frame in this payload; if set to 0, indicates that this frame is the last frame in this payload.

FT (4 bits): Frame type index, indicating the AMR-WB+ audio coding mode or comfort noise (SID) mode of the corresponding frame carried in this payload.

The value of FT is defined in Table 1 in Section 3.1. FT=14 (AUDIO_LOST) and FT=15 (NO_DATA) are used to indicate frames that are either lost or not being transmitted in this payload, respectively.

A NO_DATA (FT=15) frame could mean either that there is no data produced by the audio encoder for that frame or that no data for that frame is transmitted in the current payload (i.e., valid data for that frame could be sent in either an earlier or later packet).

If a ToC entry with an undefined FT value is received, the whole packet SHOULD be discarded. This is to avoid the loss of data synchronization in the depacketization process, which can result in a huge degradation in audio quality.

Note that packets containing only NO_DATA frames SHOULD NOT be transmitted. Also, NO_DATA frames at the end of a packet SHOULD NOT be transmitted, except in the case of interleaving. The AMR-WB+ SCR/DTX is identical to the AMR-WB SCR/DTX described in [6].

Q (1 bit): Frame quality indicator. If set to 0, indicates that the corresponding frame is severely damaged and the receiver should set the RX_TYPE (see [6]) to either AUDIO_BAD or SID_BAD depending on the frame type (FT).

The frame quality indicator enables damaged frames to be forwarded to the audio decoder for error concealment. This can improve the audio quality compared to dropping the damaged frames. See Section 4.3.2.1 for more details.

P bits: padding bits, MUST be set to zero. All padding bits MUST be ignored by the receiver.

When multiple frames are present, their ToC entries will be placed in the ToC in order of their creation time.

The following figure shows an example of a ToC of three entries.

    0                   1                   2
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |1|  FT   |Q|P|P|1|  FT   |Q|P|P|0|  FT   |Q|P|P|
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

The list of CRCs is OPTIONAL. It only exists if the use of CRC is signaled out-of-band for the session. When present, each CRC in the list is 8 bits long and corresponds to an audio frame carried in the payload. Calculation and use of the CRC is specified in Section 4.3.2.1.

4.3.2.1. Use of Frame CRC for UED over IP

The general concept of UED/UEP over IP is discussed in Section 3.5. This section provides more details on how to use the frame CRC in the payload header together with a partial transport layer checksum to achieve UED.

To achieve UED, one SHOULD use a transport layer checksum, for example, the one defined in UDP-Lite [10], to protect the RTP header, payload header, and table of contents bits in a payload. The frame CRC, when used, MUST be calculated only over all class A bits in the frame. Class B and possible C bits in the frame MUST NOT be included in the CRC calculation and SHOULD NOT be covered by the transport checksum.

Note, the number of class A bits for the various coding modes in the AMR-WB+ codec is specified as normative in Table 1 in Section 3.1, and the SID frame (FT=9) has 40 class A bits. These definitions of class A bits MUST be used for this payload format.

A packet SHOULD be discarded if the transport layer checksum detects errors.
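As an informal illustration of the ToC entry layout given in Section 4.3.2, the following Python sketch walks a list of ToC entries and extracts the FT and Q fields of each. It is not part of the payload format definition. The names TocEntry and parse_toc are hypothetical, and the sketch assumes that bit 0 in the figures is the most significant bit of the octet, as is conventional in RFC bit diagrams. It does not validate FT against Table 1.

```python
from typing import List, NamedTuple, Tuple


class TocEntry(NamedTuple):
    ft: int  # frame type index (4 bits)
    q: int   # frame quality indicator (0 = severely damaged)


def parse_toc(buf: bytes, offset: int = 0) -> Tuple[List[TocEntry], int]:
    """Parse one-octet ToC entries starting at buf[offset].

    Returns the entries and the offset of the first octet after the ToC.
    Assumption: bit 0 of each figure is the MSB of the octet.
    """
    entries: List[TocEntry] = []
    pos = offset
    while True:
        if pos >= len(buf):
            raise ValueError("ToC runs past the end of the payload")
        octet = buf[pos]
        pos += 1
        f = (octet >> 7) & 0x1    # 1 = another ToC entry follows
        ft = (octet >> 3) & 0xF   # frame type index
        q = (octet >> 2) & 0x1    # frame quality indicator
        # The two low-order P bits are padding and are ignored.
        entries.append(TocEntry(ft, q))
        if f == 0:
            break
    return entries, pos
```

Under this bit numbering, the two ToC octets of a payload carrying two FT=10 frames with Q=1 would be 0xD4 and 0x54.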
The receiver of the payload SHOULD examine the data integrity of the received class A bits by re-calculating the CRC over the received class A bits and comparing the result to the value found in the received payload header. If the two values mismatch, the receiver SHALL consider the class A bits in the received frame damaged and MUST clear the Q flag of the frame (i.e., set it to 0). This will subsequently cause the frame to be marked as AUDIO_BAD, if the FT of the frame is 0..8 or 10..13, or SID_BAD, if the FT of the frame is 9, before it is passed to the audio decoder. See [6] for more details.

The following example shows an octet-aligned ToC with a CRC list for a payload containing 3 audio frames from a single channel session (assuming none of the FTs is equal to 14 or 15):

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |1| FT#1  |Q|P|P|1| FT#2  |Q|P|P|0| FT#3  |Q|P|P|     CRC#1     |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |     CRC#2     |     CRC#3     |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

Each of the CRCs takes 8 bits

     0   1   2   3   4   5   6   7
   +---+---+---+---+---+---+---+---+
   | c0| c1| c2| c3| c4| c5| c6| c7|
   +---+---+---+---+---+---+---+---+
   (MSB)                       (LSB)

and is calculated by the cyclic generator polynomial,

   C(x) = 1 + x^2 + x^3 + x^4 + x^8

where ^ is the exponentiation operator. In binary form the polynomial has the following form: 101110001 (MSB..LSB).

The actual calculation of the CRC is made as follows: First, an 8-bit CRC register is reset to zero: 00000000. For each bit over which the CRC shall be calculated, an XOR operation is made between the rightmost (LSB) bit of the CRC register and the bit. The CRC register is then right-shifted one step (each bit's significance is reduced by one), inputting a "0" as the leftmost bit (MSB). If the result of the XOR operation mentioned above is a "1", then "10111000" is bit-wise XOR-ed into the CRC register. This operation is repeated for each bit that the CRC should cover. In this case, the first bit would be d(0) of the audio frame that the CRC covers. When the last bit (e.g., d(71) for AMR-WB 15.85 according to Table 1 in Section 3.1) has been used in this CRC calculation, the contents of the CRC register should simply be copied to the corresponding field in the list of CRCs.

Fast calculation of the CRC on a general-purpose CPU is possible using a table-driven algorithm.

4.3.3. Audio Data

Audio data of a payload contains one or more audio frames or comfort noise frames, as described in the ToC of the payload.

Note, for ToC entries with FT=14 or 15, there will be no corresponding audio frame present in the audio data.

Each audio frame represents 20 ms of audio encoded with the mode indicated in the FT field of the corresponding ToC entry. The length of the audio frame is implicitly defined by the mode indicated in the FT field. The order and numbering notation of the bits are as specified in [2]. As specified there, the bits of audio frames have been rearranged in order of decreasing sensitivity or, for the extension modes, in two sensitivity classes, while the bits of comfort noise frames are in the order produced by the encoder. The resulting bit sequence for a frame of length K bits is denoted d(0), d(1), ..., d(K-1).

The last octet of each audio frame MUST be padded with zeroes at the end if not all bits in the octet are used. In other words, each audio frame MUST be octet-aligned.

When multiple audio frames are present in the audio data (i.e., compound payload), the audio frames can be arranged either one whole frame after another as usual, or with the octets of all frames interleaved together at the octet level.
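The bit-serial frame CRC procedure described in Section 4.3.2.1 can be sketched informally in Python as follows. This is an illustration under stated assumptions, not a normative implementation. The function name frame_crc is hypothetical; the argument is assumed to be the class A bits d(0), d(1), ... of one frame as integers 0 or 1; and the final register value is assumed to be copied unchanged into the CRC octet. The constant 0xB8 is the "10111000" pattern given in the text.

```python
from typing import Iterable

CRC_XOR_PATTERN = 0xB8  # "10111000" from Section 4.3.2.1


def frame_crc(class_a_bits: Iterable[int]) -> int:
    """Bit-serial 8-bit frame CRC over the class A bits of one frame."""
    register = 0  # the 8-bit CRC register is reset to zero
    for bit in class_a_bits:
        # XOR the rightmost (LSB) bit of the register with the input bit.
        feedback = (register & 0x1) ^ (bit & 0x1)
        # Right-shift one step; a "0" enters as the leftmost bit (MSB).
        register >>= 1
        # If the XOR result was 1, XOR the pattern into the register.
        if feedback:
            register ^= CRC_XOR_PATTERN
    return register
```

A receiver following Section 4.3.2.1 would compare this value with the CRC octet received for the frame and clear the Q flag on a mismatch. A table-driven variant, as mentioned in that section, would process eight input bits per step.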
Since the bits within each frame are ordered with the most error-sensitive bits first, interleaving the octets collects those sensitive bits from all frames to be nearer the beginning of the packet. This is called "robust sorting order", which allows the application of UED (such as UDP-Lite [10]) or UEP (such as ULP [12]) mechanisms to the payload data. The details of assembling the payload are given in the next section.

The use of robust sorting order for a session MUST be agreed via out-of-band means. Section 7.1 specifies a MIME parameter for this purpose.

4.3.4. Methods for Forming the Payload

Two different packetization methods, namely normal order and robust sorting order, exist for forming a payload. In both cases, the payload header and table of contents are packed into the payload the same way; the difference is in the packing of the audio frames.

The payload begins with the payload header of one octet, or two if frame interleaving is selected. The payload header is followed by the table of contents consisting of a list of one-octet ToC entries. If frame CRCs are to be included, they follow the table of contents with one 8-bit CRC filling each octet. Note that if a given frame has a ToC entry with FT=14 or 15, there will be no CRC present.

The audio data follows the table of contents, or the CRCs if present. For packetization in the normal order, all of the octets comprising an audio frame are appended to the payload as a unit. The audio frames are packed in the same order as their corresponding ToC entries are arranged in the ToC list, with the exception that if a given frame has a ToC entry with FT=14 or 15, there will be no data octets present for that frame.

For packetization in robust sorting order, the octets of all audio frames are interleaved together at the octet level. That is, the data portion of the payload begins with the first octet of the first frame, followed by the first octet of the second frame, then the first octet of the third frame, and so on. After the first octet of the last frame has been appended, the cycle repeats with the second octet of each frame. The process continues for as many octets as are present in the longest frame. If the frames are not all the same octet length, a shorter frame is skipped once all octets in it have been appended. The order of the frames in the cycle will be sequential if frame interleaving is not in use, or according to the interleave pattern specified in the payload header if frame interleaving is in use. Note that if a given frame has a ToC entry with FT=14 or 15, there will be no data octets present for that frame, so that frame is skipped in the robust sorting cycle.

The UED and/or UEP SHOULD cover at least the RTP header, payload header, table of contents, and all class A bits of a sorted payload. All class A bits SHOULD be covered since the extension modes do not have accurate sorting of the bits in sensitivity order. The bits are only sorted in different classes, with the most sensitive bits (class A bits) placed in the beginning. Exactly how many octets need to be covered depends on the network and application. If CRCs are used together with robust sorting, only the RTP header, the payload header, and the ToC SHOULD be covered by UED/UEP. The means to communicate to other layers performing UED/UEP the number of octets to be covered is beyond the scope of this specification.

4.3.5. Payload Examples

4.3.5.1. Example 1, Basic Payload Carrying Multiple Frames

The following diagram shows a payload from a session that carries two AMR-WB+ frames of the 14 kbps coding mode (FT=10). In the payload, the codec mode request is set to the mandated value (CMR=15), indicating that no mode request is present. No frame CRC, interleaving, or robust sorting is in use.
    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |CMR=15 |R|R|R|R|1|FT#1=10|Q|P|P|0|FT#2=10|Q|P|P|   f1(0..7)    |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |   f1(8..15)   |  f1(16..23)   |  ....                         |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   :                              ...                              :
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |      ...      | f1(272..279)  |   f2(0..7)    |   f2(8..15)   |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |  f2(16..23)   |  ....                                         |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   :                              ...                              :
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | f2(272..279)  |
   +-+-+-+-+-+-+-+-+

4.3.5.2. Example 2, Payload with CRC, Interleaving, and Robust-sorting

This example shows a payload carrying two frames of the 18 kbps stereo coding mode (FT=11). In the payload, the codec mode request is set to the mandated value (CMR=15). Moreover, frame CRC and interleaving are both enabled for the session. The interleaving length is 2 (ILL=1) and this payload is the first one in an interleave group (ILP=0).

The first frame in the payload is frame #1, consisting of bits f1(0..359), and the next frame is frame #3, consisting of bits f3(0..359), due to interleaving. For each of the two audio frames a CRC is calculated, CRC1(0..7) and CRC3(0..7), respectively. Finally, the payload is robust sorted.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |CMR=15 |R|R|R|R| ILL=1 | ILP=0 |1|FT#1=11|Q|P|P|0|FT#3=11|Q|P|P|
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |     CRC1      |     CRC3      |   f1(0..7)    |   f3(0..7)    |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |   f1(8..15)   |   f3(8..15)   |  f1(16..23)   |  f3(16..23)   |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   :                              ...                              :
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |  ...                          | f1(336..343)  | f3(336..343)  |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | f1(344..351)  | f3(344..351)  | f1(352..359)  | f3(352..359)  |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

4.4. Implementation Considerations

An application implementing this payload format MUST understand all the payload parameters in the out-of-band signaling used. For example, if an application uses SDP, all the SDP and MIME parameters in this document MUST be understood. This requirement ensures that an implementation can always decide whether or not it is capable of communicating.

Only the basic operation mode of the payload format is mandatory to implement. The other modes of operation, i.e., interleaving, robust sorting, and frame-wise CRC, are OPTIONAL to implement. The requirements of the application using the payload format should be used to determine what to implement.

5. Congestion Control

The general congestion control considerations for transporting RTP data apply to AMR-WB+ audio over RTP as well. However, the multi-rate capability of AMR-WB+ audio coding may provide an advantage over other payload formats for controlling congestion since the bandwidth demand can be adjusted by selecting a different coding mode.
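As a rough worked example of how the coding mode and the payload options influence bandwidth demand, the following Python sketch computes the payload size and the resulting packet bit rate. It is illustrative only. The function names are hypothetical. The header overhead assumes IPv4 without options (20 octets), UDP (8 octets), and RTP without CSRC entries or header extensions (12 octets). All frames in a payload are assumed to use the same mode, and no ToC entry is assumed to have FT=14 or 15.

```python
IPV4_UDP_RTP_OCTETS = 20 + 8 + 12  # assumed header overhead per packet
FRAME_DURATION_MS = 20             # each audio frame represents 20 ms


def payload_octets(frame_bits: int, n_frames: int,
                   crc: bool = False, interleaving: bool = False) -> int:
    """Size of one payload carrying n_frames frames of frame_bits bits."""
    header = 2 if interleaving else 1    # payload header (plus ILL/ILP)
    toc = n_frames                       # one-octet ToC entry per frame
    crcs = n_frames if crc else 0        # one 8-bit CRC per frame
    frame_octets = (frame_bits + 7) // 8 # frames are octet-aligned
    return header + toc + crcs + n_frames * frame_octets


def packet_bit_rate(frame_bits: int, n_frames: int, **options: bool) -> float:
    """Bit rate in bit/s on the wire, including the assumed headers."""
    octets = IPV4_UDP_RTP_OCTETS + payload_octets(frame_bits, n_frames,
                                                  **options)
    return octets * 8 * 1000 / (n_frames * FRAME_DURATION_MS)
```

Under these assumptions, two 280-bit frames (FT=10, as in Example 1) give a 73-octet payload, and the payload of Example 2 (two 360-bit frames with CRC and interleaving) is 96 octets. For 280-bit frames the packet bit rate is 30800 bit/s at one frame per packet and 22600 bit/s at two frames per packet.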
Another parameter that may impact the bandwidth demand for AMR-WB+ is the number of frames that are encapsulated in each RTP payload. Packing more frames in each RTP payload can reduce the number of packets sent and hence the overhead from IP/UDP/RTP headers, at the expense of increased delay and reduced error robustness against packet losses.

If forward error correction (FEC) is used to combat packet loss, the amount of redundancy added by FEC will need to be regulated so that the use of FEC itself does not cause a congestion problem.

It is RECOMMENDED that AMR-WB+ applications using this payload format employ congestion control. The actual mechanism for congestion control is not specified but should be suitable for real-time flows, e.g., TCP Friendly Rate Control [11]. In the future, the usage of congestion-controlled transport protocols like the Datagram Congestion Control Protocol (DCCP) [16] may simplify the usage of congestion control for application developers.

6. Security Considerations

RTP packets using the payload format defined in this specification are subject to the general security considerations discussed in RFC 3550 [4]. As this format transports encoded audio, the main security issues include confidentiality, integrity protection, and authentication of the audio itself. The payload format itself does not have any built-in security mechanisms. External mechanisms, such as SRTP [13], MAY be used.

Neither this payload format nor the AMR-WB+ decoder exhibits any significant non-uniformity in the receiver side computational complexity for packet processing, and thus is unlikely to pose a denial-of-service threat due to the receipt of pathological data.

6.1. Confidentiality

To achieve confidentiality of the encoded AMR-WB+ audio, all audio data bits will need to be encrypted. There is less need to encrypt the payload header or the table of contents because (1) they only carry information about the requested audio mode, frame type, and frame quality, and (2) this information could be useful to some third party, e.g., for quality monitoring.

As long as the AMR-WB+ payload is only packed and unpacked at either end, encryption may be performed after packet encapsulation so that there is no conflict between the two operations.

Interleaving may affect encryption. Depending on the encryption scheme used, there may be restrictions on, for example, the time when keys can be changed. Specifically, the key change may need to occur at the boundary between interleave groups.

The type of encryption method used may impact the error robustness of the payload data. The error robustness may be severely reduced when the data is encrypted unless an encryption method without error propagation is used, e.g., a stream cipher. Therefore, UED/UEP based on robust sorting may be difficult to apply when the payload data is encrypted.

6.2. Authentication

To authenticate the sender of the audio and provide integrity protection, an external mechanism has to be used. It is RECOMMENDED that such a mechanism protect all the audio data bits and the RTP header. Note that the use of UED/UEP may be difficult to combine with authentication because any bit errors will cause authentication to fail.

Data tampering by a man-in-the-middle attacker could result in erroneous depacketization/decoding that could lower the audio quality. To prevent a man-in-the-middle attacker from tampering with the payload packets, some additional information besides the audio bits SHOULD be protected. This may include the payload header, ToC, frame CRCs, RTP timestamp, RTP sequence number, and the RTP marker bit.

6.3. Decoding Validation

When processing a received payload packet, if the receiver finds that the calculated payload length, based on the information of the session and the values found in the payload header fields, does not match the size of the received packet, the receiver SHOULD discard the packet. This is because decoding a packet that has errors in its length field could severely degrade the audio quality.

7. Payload Format Parameters

This section defines the parameters that may be used to select optional features of the AMR-WB+ payload format. The parameters are defined here as part of the MIME subtype registrations for the AMR-WB+ audio codec. A mapping of the parameters into the Session Description Protocol (SDP) [7] is also provided for those applications that use SDP. Equivalent parameters could be defined elsewhere for use with control protocols that do not use MIME or SDP.

The data format and parameters are only specified for real-time transport in RTP.

7.1. MIME Registration

The MIME subtype for the Adaptive Multi-Rate Wideband plus (AMR-WB+) codec is allocated from the IETF tree since AMR-WB+ is expected to be a widely used audio codec in general streaming applications.

Note, any unspecified parameter MUST be ignored by the receiver.

Media Type name: audio

Media subtype name: AMR-WB+

Required parameters: none

Optional parameters:

These parameters apply to RTP transfer only.

channels: The number of audio channels present in the audio frames. Permissible values are 1 (mono) or 2 (stereo). An RTP payload type SHALL only contain mono or stereo modes, not both. If switching between mono and stereo is desired, two payload types will need to be declared. If no parameter is present, the number of channels is 1 (mono).

maxptime: see Section 8 in RFC 3267 [9].

crc: Permissible values are 0 and 1.
If 1, frame CRCs SHALL be included in the payload, otherwise not. If 0 or if not present, CRCs SHALL NOT be included.

robust-sorting: Permissible values are 0 and 1. If 1, the payload SHALL employ robust payload sorting. If 0 or if not present, simple payload sorting SHALL be used.

interleaving: Indicates that frame level interleaving SHALL be used for the session and its value defines the maximum number of frames allowed in an interleaving group (see Section 4.3.1). If this parameter is not present, interleaving SHALL NOT be used.

ptime: see RFC 2327 [7].

Encoding considerations: This type is only defined for transfer via RTP (RFC 3550) and as described in Section 4 of RFC XXXX.

Security considerations: See Section 6 of RFC XXXX.

Public specification: Please refer to Section 10 of RFC XXXX.

Additional information: File storage of the AMR-WB+ format is recommended to be done in the 3GPP defined ISO based multimedia file format defined in 3GPP TS 26.244, see reference [18] of RFC XXXX. The file format has the MIME types "audio/3GPP" or "video/3GPP".

To maintain interoperability with AMR-WB capable end-points, in cases where negotiation is possible, an AMR-WB+ end-point SHOULD declare itself also as AMR-WB capable.

As the AMR-WB+ decoder is capable of performing stereo to mono conversions, all receivers of AMR-WB+ should be able to receive both stereo and mono, even if the receiver is only capable of playing out mono signals.

Person & email address to contact for further information:
   johan.sjoberg@ericsson.com
   ari.lakaniemi@nokia.com

Intended usage: COMMON. It is expected that many IP based streaming applications will use this type.

Author/Change controller:
   johan.sjoberg@ericsson.com
   ari.lakaniemi@nokia.com
   IETF Audio/Video transport working group

7.2. Mapping MIME Parameters into SDP

The information carried in the MIME media type specification has a specific mapping to fields in the Session Description Protocol (SDP) [7], which is commonly used to describe RTP sessions. When SDP is used to specify sessions employing the AMR-WB+ codec, the mapping is as follows:

- The MIME type ("audio") goes in SDP "m=" as the media name.

- The MIME subtype (payload format name) goes in SDP "a=rtpmap" as the encoding name. The RTP clock rate in "a=rtpmap" SHALL be 96000 for AMR-WB+, and the encoding parameter number of channels MUST either be explicitly set to 1 or 2, or be omitted, implying the default value of 1. Only codec modes agreeing with the signalled number of channels may be used.

- The parameters "ptime" and "maxptime" go in the SDP "a=ptime" and "a=maxptime" attributes, respectively.

- Any remaining parameters go in the SDP "a=fmtp" attribute by copying them directly from the MIME media type string as a semicolon separated list of parameter=value pairs.

7.2.1. Offer-Answer Model Considerations

To achieve good interoperability for the AMR-WB+ RTP payload when used with Offer-Answer negotiation in SDP, the following considerations should be made:

- Each combination of the RTP payload configuration parameters (crc, robust-sorting, and interleaving) is unique in its bit-pattern and not compatible with any other combination. Because any configuration is application dependent and these features are optional to implement, care must be taken. When creating an offer in an application desiring to use the more advanced features (crc, robust-sorting, or interleaving), the offerer is RECOMMENDED to also offer a payload type containing only the octet-align configuration. If multiple configurations are of interest to the application they may all be offered; however, care should be taken not to offer too many payload types.
- As one can use both mono and stereo modes, and these require different payload types to be declared/negotiated, both stereo and mono payload types SHOULD be offered.

- The parameters "maxptime" and "ptime" should in most cases not affect the interoperability; however, the setting of the parameters can affect the performance of the application.

- To maintain interoperability with AMR-WB in cases where negotiation is possible, an AMR-WB+ capable end-point SHOULD also declare itself capable of AMR-WB, as it is a subset of AMR-WB+.

7.2.2. Examples

One example SDP session description utilizing AMR-WB+ mono and stereo encoding follows.

   m=audio 49120 RTP/AVP 98 99
   a=rtpmap:98 AMR-WB+/96000/1
   a=rtpmap:99 AMR-WB+/96000/2
   a=fmtp:98 interleaving=30
   a=fmtp:99 interleaving=30
   a=maxptime:100

Note that the payload format (encoding) names are commonly shown in upper case. MIME subtypes are commonly shown in lower case. These names are case-insensitive in both places. Similarly, parameter names are case-insensitive both in MIME types and in the default mapping to the SDP a=fmtp attribute.

8. IANA Considerations

It is requested that one new MIME subtype be registered by IANA, see Section 7.

9. Acknowledgements

The authors would like to thank Redwan Salami and Stefan Bruhn for their significant contributions made throughout the writing and reviewing of this document. We would also like to acknowledge Qiaobing Xie, coauthor of RFC 3267, on which this document is based.

10. References

10.1. Normative references

[1] 3GPP TS 26.xxx "AMR Wideband plus audio codec; Transcoding functions", version 6.0.0 (2004-xx), 3rd Generation Partnership Project (3GPP).

[2] 3GPP TS 26.xxx "AMR Wideband plus audio codec; Frame Structure", version 6.0.0 (2004-xx), 3rd Generation Partnership Project (3GPP).

[3] Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, March 1997.

[4] Schulzrinne, H., Casner, S., Frederick, R. and V. Jacobson, "RTP: A Transport Protocol for Real-Time Applications", RFC 3550, July 2003.

[5] 3GPP TS 26.192 "AMR Wideband speech codec; Comfort Noise aspects", version 5.0.0 (2001-03), 3rd Generation Partnership Project (3GPP).

[6] 3GPP TS 26.193 "AMR Wideband speech codec; Source Controlled Rate operation", version 5.0.0 (2001-03), 3rd Generation Partnership Project (3GPP).

[7] Handley, M. and V. Jacobson, "SDP: Session Description Protocol", RFC 2327, April 1998.

[8] Schulzrinne, H., "RTP Profile for Audio and Video Conferences with Minimal Control", RFC 3551, July 2003.

[9] Sjoberg, J., Westerlund, M., Lakaniemi, A. and Q. Xie, "Real-Time Transport Protocol (RTP) Payload Format and File Storage Format for the Adaptive Multi-Rate (AMR) and Adaptive Multi-Rate Wideband (AMR-WB) Audio Codecs", RFC 3267, June 2002.

10.2. Informative References

[10] Larzon, L., Degermark, M. and S. Pink, "The UDP Lite Protocol", Work in Progress.

[11] Floyd, S., Handley, M., Padhye, J. and J. Widmer, "TCP Friendly Rate Control (TFRC): Protocol Specification", RFC 3448, January 2003.

[12] Li, A., et. al., "An RTP Payload Format for Generic FEC with Uneven Level Protection", Work in Progress.

[13] Baugher, et. al., "The Secure Real Time Transport Protocol", Work in Progress.

[14] Rosenberg, J. and H. Schulzrinne, "An RTP Payload Format for Generic Forward Error Correction", RFC 2733, December 1999.

[15] Perkins, C., Kouvelas, I., Hodson, O., Hardman, V., Handley, M., Bolot, J., Vega-Garcia, A. and S. Fosse-Parisis, "RTP Payload for Redundant Audio Data", RFC 2198, September 1997.

[16] Kohler, E. et. al., "Datagram Congestion Control Protocol (DCCP)", Work in Progress.

[17] 3GPP TS 26.233 "Packet Switched Streaming service", version 5.0.0 (2001-03), 3rd Generation Partnership Project (3GPP).
[18] 3GPP TS 26.244 "Transparent end-to-end packet switched streaming service (PSS); 3GPP file format (3GP)", version 1.0.0 (2003-11-28), 3rd Generation Partnership Project (3GPP).

ETSI documents can be downloaded from the ETSI web server, "http://www.etsi.org/". Any 3GPP document can be downloaded from the 3GPP webserver, "http://www.3gpp.org/", see specifications. TIA documents can be obtained from "www.tiaonline.org".

11. Authors' Addresses

   Johan Sjoberg
   Ericsson Research
   Ericsson AB
   SE-164 80 Stockholm, SWEDEN

   Phone: +46 8 50878230
   EMail: Johan.Sjoberg@ericsson.com

   Magnus Westerlund
   Ericsson Research
   Ericsson AB
   SE-164 80 Stockholm, SWEDEN

   Phone: +46 8 4048287
   EMail: Magnus.Westerlund@ericsson.com

   Ari Lakaniemi
   Nokia Research Center
   P.O.Box 407
   FIN-00045 Nokia Group, FINLAND

   Phone: +358-71-8008000
   EMail: ari.lakaniemi@nokia.com

12. IPR Notice

The IETF takes no position regarding the validity or scope of any intellectual property or other rights that might be claimed to pertain to the implementation or use of the technology described in this document or the extent to which any license under such rights might or might not be available; neither does it represent that it has made any effort to identify any such rights. Information on the IETF's procedures with respect to rights in standards-track and standards-related documentation can be found in BCP-11. Copies of claims of rights made available for publication and any assurances of licenses to be made available, or the result of an attempt made to obtain a general license or permission for the use of such proprietary rights by implementors or users of this specification can be obtained from the IETF Secretariat.

The IETF invites any interested party to bring to its attention any copyrights, patents or patent applications, or other proprietary rights which may cover technology that may be required to practice this standard. Please address the information to the IETF Executive Director.

13. Copyright Notice

Copyright (C) The Internet Society (2004). All Rights Reserved.

This document and translations of it may be copied and furnished to others, and derivative works that comment on or otherwise explain it or assist in its implementation may be prepared, copied, published and distributed, in whole or in part, without restriction of any kind, provided that the above copyright notice and this paragraph are included on all such copies and derivative works. However, this document itself may not be modified in any way, such as by removing the copyright notice or references to the Internet Society or other Internet organizations, except as needed for the purpose of developing Internet standards in which case the procedures for copyrights defined in the Internet Standards process must be followed, or as required to translate it into languages other than English.

The limited permissions granted above are perpetual and will not be revoked by the Internet Society or its successors or assigns.

This document and the information contained herein is provided on an "AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET ENGINEERING TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.

This Internet-Draft expires in August 2004.