Internet Engineering Task Force                  Johan Sjoberg, Ericsson
Audio Video Transport WG                     Magnus Westerlund, Ericsson
INTERNET-DRAFT                                      Ari Lakaniemi, Nokia
February 27, 2001                               Petri Koskelainen, Nokia
Expires: August 27, 2001                        Bernhard Wimmer, Siemens
                                                Tim Fingscheidt, Siemens
                                                  Qiaobing Xie, Motorola
                                                  Sanjay Gupta, Motorola


        RTP payload format and file storage format for AMR audio

Status of this Memo

   This document is an Internet-Draft and is in full conformance with all provisions of Section 10 of RFC2026.

   Internet-Drafts are working documents of the Internet Engineering Task Force (IETF), its areas, and its working groups. Note that other groups may also distribute working documents as Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or cite them other than as "work in progress".

   The list of current Internet-Drafts can be accessed at http://www.ietf.org/ietf/lid-abstracts.txt

   The list of Internet-Draft Shadow Directories can be accessed at http://www.ietf.org/shadow.html

   This document is an individual submission to the IETF. Comments should be directed to the authors.

Abstract

   This document describes a proposed real-time transport protocol (RTP) payload format for AMR speech encoded signals. The AMR payload format is designed to be able to interoperate with existing AMR transport formats. This document also includes a MIME type registration for AMR. The MIME type is specified for both real-time transport and storage.

Sjoberg et al.                                                  [Page 1]

INTERNET-DRAFT        RTP Payload Format for AMR       February 27, 2001

1. Introduction

   The adaptive multi-rate (AMR) speech codec [1] was developed by the European Telecommunications Standards Institute (ETSI). The AMR codec is standardized for GSM and has also been chosen by 3GPP as the mandatory codec for third generation systems.
   It is currently under standardization for TDMA as well, so the AMR codec is expected to be widely used in cellular systems.

   The AMR codec was developed to preserve high speech quality under a wide range of transmission conditions. It is a multi-mode codec with 8 narrow band speech modes with bit rates between 4.75 and 12.2 kbps. The sampling frequency is 8000 Hz and processing is done on 20 ms frames, i.e. 160 samples per frame. The AMR modes are closely related to each other and use the same coding framework. Three of the AMR modes are already adopted standards of their own: the 6.7 kbps mode as PDC-EFR [7], the 7.4 kbps mode as the IS-641 codec in TDMA [6], and the 12.2 kbps mode as GSM-EFR [5].

   The AMR codec includes a voice activity detector (VAD) and generates comfort noise (CN) parameters during silence periods. Hence, the AMR codec can reduce the number of transmitted bits and packets during silence periods to a minimum. Sending CN parameters at regular intervals during silence periods is usually called discontinuous transmission (DTX) or source controlled rate (SCR) operation. The frames containing CN parameters are called Silence Indicator (SID) frames.

   AMR implementations must support all 8 speech coding modes, and mode switching can occur to any mode at any time. The mode information must therefore be transmitted together with the speech encoded bits. The AMR speech codec has modes with different bit rates so that the source bit rate can be adapted to the radio link quality in mobile phone systems. The objective was to give the highest possible speech quality under a variety of radio channel conditions. To realize rate adaptation, the decoder needs to signal to the encoder the mode it prefers to receive.

   Due to the flexibility and robustness of AMR, it is also suitable for purposes other than circuit switched cellular systems.
   Other suitable applications are real-time services over packet switched networks.

   The payload format should be designed for robustness against both bit errors and packet loss. The speech encoded bits have different perceptual sensitivity to bit errors, and cellular systems exploit this by using unequal error protection and detection (UEP and UED). The UED/UEP mechanism focuses the correction and detection of corrupted bits on the perceptually most sensitive bits. A speech frame is only declared damaged if there are bit errors in the most sensitive bits, i.e. the class A bits. Some bit errors are acceptable in the other bits, i.e. classes B and C. Moreover, a damaged frame is still useful for error concealment in the decoder, which uses some of the less sensitive bits. This improves the speech quality compared to discarding the data.

   Today there exist some link layers that do not discard packets with bit errors, e.g. SLIP and some wireless links. With the Internet traffic pattern shifting towards a more media-centric one, more link layers of this nature may emerge in the future. With transport layer support for partial checksums, for example as supported by UDP-Lite [10] (work in progress), bit error tolerant AMR traffic could achieve better performance over these types of links.

   There are at least two basic approaches for carrying AMR traffic over bit error tolerant networks:

   1) Utilizing a partial checksum to cover headers and the most important AMR speech bits of the payload. It is recommended that at least all class A bits are covered by the checksum.

   2) Utilizing a partial checksum to cover only headers, and a frame CRC to cover the class A bits of each AMR frame in the payload.

   In either approach, at least part of the class B/C bits are left without an error check, and thus bit error tolerance is achieved.
   It is still important that the network designer pay attention to the class B and C residual bit error rate. Though less sensitive to errors than class A bits, class B bits are not insignificant, and undetected errors in these bits cause degradation in speech quality. An example of residual error rates considered acceptable for AMR in UMTS can be found in [17].

   Approach 1 is bit efficient, flexible and simple, but comes with two disadvantages: a) bit errors in protected speech bits will cause the payload to be discarded, and b) when multiple frames are transported in a payload, a single bit error in protected bits may get all the frames discarded. These disadvantages can be avoided if needed, at the cost of some overhead in the form of a frame-wise CRC (Approach 2). For problem a), the CRC makes it possible to detect bit errors in class A bits and still use the frame for error concealment, which gives a small improvement in speech quality. For problem b), when multiple frames are transported in a payload, the per-frame CRCs remove the possibility that a single bit error in a class A bit gets all the frames discarded. This improves speech quality when multiple frames are transported and are subject to bit errors. The choice between the two approaches must be based on the available bandwidth and the desired tolerance to bit errors. Neither solution is appropriate for all cases.

   The payload format supports several means to increase robustness against packet loss. The simple scheme of repeating previously sent data is one possibility. Another possible scheme, which is more bandwidth efficient, is to use FEC external to the payload, e.g. RFC2733 [16], which generates extra packets containing repair data. The whole payload can also be sorted in sensitivity order to support external FEC schemes using UEP. There is work in progress on a generic version of such a scheme [15].
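   The frame arithmetic described in this introduction can be checked with a small C sketch. It derives the number of speech bits in one 20 ms frame from each mode's bit rate, and the number of 8 kHz samples per frame; that 160-sample step is also the per-frame RTP timestamp increment given in section 4. The identifiers (amr_rate_bps, amr_bits_per_frame, amr_samples_per_frame, amr_frame_timestamp) are hypothetical names chosen for this illustration, not part of any AMR specification.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical table: bit rates (bit/s) of the eight AMR speech modes,
   indexed by frame type 0..7 (4.75 ... 12.2 kbps). */
static const unsigned amr_rate_bps[8] = {
    4750, 5150, 5900, 6700, 7400, 7950, 10200, 12200
};

#define AMR_SAMPLE_RATE_HZ 8000u  /* sampling frequency */
#define AMR_FRAME_MS       20u    /* frame duration */

/* Speech bits in one 20 ms frame of speech mode ft: rate * 20 / 1000. */
unsigned amr_bits_per_frame(unsigned ft)
{
    assert(ft < 8);  /* only the speech modes have a bit rate here */
    return amr_rate_bps[ft] * AMR_FRAME_MS / 1000u;
}

/* Samples covered by one frame: 8000 * 20 / 1000 = 160. */
unsigned amr_samples_per_frame(void)
{
    return AMR_SAMPLE_RATE_HZ * AMR_FRAME_MS / 1000u;
}

/* Timestamp of the i-th frame (i = 0 for the first) of a packet with RTP
   timestamp ts0. uint32_t arithmetic wraps modulo 2^32, as the RTP field does. */
uint32_t amr_frame_timestamp(uint32_t ts0, unsigned i)
{
    return ts0 + (uint32_t)i * amr_samples_per_frame();
}
```

   For example, amr_bits_per_frame(7) returns 244 and amr_bits_per_frame(0) returns 95, the totals listed for AMR 12.2 and AMR 4.75 in Table 1; amr_frame_timestamp(ts0, 2) is the timestamp the third frame of a packet would carry if it were sent on its own.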
2. Requirements

   The AMR payload format for RTP was designed to meet the following requirements:

   o  Different levels of robustness must be supported, from no redundant data to extreme robustness capable of handling very high packet loss rates with no or small speech quality degradation.

   o  Fast, bandwidth efficient, frame-wise AMR mode adaptation must be supported. This means that it must be possible to send Codec Mode Requests from the receiving side back to the transmitting side with information on the preferred mode.

   o  Source controlled rate (SCR) operation (also called DTX) and transmission of the SID frames defined in AMR must be supported.

3. Payload format

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC2119 [3].

   The AMR payload format is designed to be flexible, ranging from very low overhead to an extended format with the possibility to increase bit error robustness and to pack several speech frames in one packet. The payload format consists of one payload header, a table of contents, optionally one CRC per payload frame, and zero or more payload frames.

   The payload format is bandwidth efficient. This is achieved by not using octet alignment for the payload header, the table of contents or the payload frames; only the full payload is octet aligned. If the option to transmit a robust sorted payload is enabled and employed, the full payload SHALL finally be ordered in descending bit error sensitivity order to be prepared for unequal error protection or unequal error detection schemes. The AMR encoded bit streams are defined in sensitivity order in Annex B of [2]; the original order as delivered from the speech encoder is defined in [1]. The last octet of an AMR payload packet MUST be padded with zeroes at the end if not all bits are used.
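   As an illustration of the octet alignment and padding rule, the following C sketch computes how many bits a compound payload occupies before padding, and how many octets and zero padding bits result. It is a minimal sketch under two assumptions: the caller already knows each frame's speech bit count (from Table 1), and, following the statement in section 3.2 that a frame with FT=15 has no CRC or payload frame, only frames with a nonzero bit count contribute CRC bits. The function names are hypothetical. A receiver could compare such a result with the size of the payload actually received, as section 3.5 describes.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: payload bits before padding.
   F[j] : speech bits of frame j (0 when FT = 15, no transmission)
   crc  : nonzero if the payload header bit C is set
   Assumption (section 3.2): a frame with FT = 15 carries no CRC,
   so only frames with F[j] > 0 add 8 CRC bits. */
size_t amr_payload_bits(const unsigned F[], size_t N, int crc)
{
    size_t k = 6 + 6 * N;          /* payload header + one ToC entry per frame */
    for (size_t j = 0; j < N; j++) {
        if (crc && F[j] > 0)
            k += 8;                /* optional 8-bit CRC of frame j */
        k += F[j];                 /* speech bits of frame j */
    }
    return k;
}

/* Octets after zero-padding the last octet; *pad_bits (if non-NULL)
   receives the number of zero bits appended. */
size_t amr_payload_octets(size_t bits, unsigned *pad_bits)
{
    if (pad_bits != NULL)
        *pad_bits = (unsigned)((8 - bits % 8) % 8);
    return (bits + 7) / 8;
}
```

   With one 5.9 kbps frame and no CRC this gives 130 bits, i.e. 17 octets with 6 padding bits; with two 6.7 kbps frames and CRCs it gives 302 bits, i.e. 38 octets with 2 padding bits. These agree with the octet counts of the examples in section 7.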
   The AMR frame types, or modes, are defined in [2]. Frame type 15, no transmission, is needed to indicate frames that are not transmitted or are lost. Not transmitted can mean either that the speech encoder produced no data for the frame, or that no data for the frame is transmitted in this payload, i.e. valid data for the frame could be sent in an earlier or a following packet. This occurs, for example, when multiple frames are sent in each payload and comfort noise starts. In a payload with 8 frames, where speech frames with AMR mode 7 are interrupted by DTX operation at the fifth frame, the frame type sequence could look like {7,7,7,7,8,15,15,8}. The AMR SCR/DTX is described in [4].

   The AMR payload format supports robust transmission, multiple frames in one payload packet, and the use of fast codec mode adaptation. Robustness against packet loss can be accomplished by retransmitting previously transmitted frames together with the current frame or frames.

   The AMR performance over error tolerant links can be improved by also delivering speech frames with bit errors. Unequal error detection is needed, since bit errors SHOULD only be allowed in the least error sensitive bits. This payload format provides two alternative methods to implement unequal error detection:

   A. CRC calculation over the class A speech bits

      If several consecutive speech frames are packed into each payload, the optional CRC MAY be used to protect the class A speech bits, see Table 1. The number of class A bits is specified only as informative in [2] and is therefore copied into Table 1, where it is normative for this payload format. Speech frames with errors in class A bits MUST be marked with SPEECH_BAD for corrupted speech frames (FT=0..7) or SID_BAD for corrupted SID frames (FT=8) and be sent to the speech decoder, see [4]. In this case the RTP header, payload header and table of contents should be covered by a transport layer checksum, e.g. UDP-Lite [10]. Packets should be discarded if the transport layer checksum detects errors.

   B. Robust sorting of payload bits

      Robust behavior can also be accomplished by robust sorting of the payload. This enables the use of UED (e.g. UDP-Lite) and UEP (e.g. ULP [15]). The UED and/or UEP is recommended to cover at least the RTP header, payload header, table of contents and class A bits.

   Support for unequal error detection is OPTIONAL. If either scheme is to be used, it MUST be signaled out of band (see section 8).

   Index   Mode       Class A bits   Total speech bits
   ---------------------------------------------------
     0     AMR 4.75        42                95
     1     AMR 5.15        49               103
     2     AMR 5.9         55               118
     3     AMR 6.7         58               134
     4     AMR 7.4         61               148
     5     AMR 7.95        75               159
     6     AMR 10.2        65               204
     7     AMR 12.2        81               244
     8     AMR SID         39                39

         Table 1. Specification of the number of class A bits.

   A frame quality indicator is included for interoperability with the ATM payload format described in ITU-T I.366.2, the UMTS Iu interface [13] and other transport formats. The speech quality is increased if damaged frames are forwarded to the error concealment unit of the speech decoder rather than dropped.

   In many communication scenarios the AMR encoded bits will be transmitted from one IP/UDP/RTP terminal to a terminal in a system with another transport format, and/or vice versa. The conversion between transport formats is done in a gateway. A second likely scenario is that IP/UDP/RTP is used as transport between other systems, i.e. IP is originated and terminated in gateways on both sides of the IP transport.

   AMR over
   I.366.{2,3} or +------+                        +----------+
   3G Iu or       |      |     IP/UDP/RTP/AMR     |          |
   -------------->|  GW  |----------------------->| TERMINAL |
   GSM Abis       |      |                        |          |
   etc.           +------+                        +----------+

              Figure 1: GW to VoIP terminal scenario

   AMR over                                             AMR over
   I.366.{2,3} or +------+                     +------+ I.366.{2,3} or
   3G Iu or       |      |   IP/UDP/RTP/AMR    |      | 3G Iu or
   -------------->|  GW  |-------------------->|  GW  |--------------->
   GSM Abis       |      |                     |      | GSM Abis
   etc.           +------+                     +------+ etc.

                  Figure 2: GW to GW scenario
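   Table 1 can be carried directly into an implementation as a lookup table. The C sketch below (all names hypothetical) returns the total, class A and class B/C bit counts for a frame type, treats FT=15 (no transmission) as carrying no speech bits, and reports -1 for frame types that Table 1 does not list.

```c
#include <assert.h>

/* Table 1: class A bits and total speech bits per frame type
   (FT 0..7 speech modes, FT 8 SID). Hypothetical names. */
typedef struct {
    const char *mode;
    unsigned class_a_bits;
    unsigned total_bits;
} amr_ft_info;

static const amr_ft_info amr_table1[9] = {
    {"AMR 4.75", 42,  95}, {"AMR 5.15", 49, 103}, {"AMR 5.9",  55, 118},
    {"AMR 6.7",  58, 134}, {"AMR 7.4",  61, 148}, {"AMR 7.95", 75, 159},
    {"AMR 10.2", 65, 204}, {"AMR 12.2", 81, 244}, {"AMR SID",  39,  39},
};

#define AMR_FT_NO_DATA 15u   /* no transmission: no CRC, no speech bits */

/* Total speech bits; 0 for FT 15, -1 for FT values not in Table 1. */
int amr_total_bits(unsigned ft)
{
    if (ft == AMR_FT_NO_DATA)
        return 0;
    return ft <= 8 ? (int)amr_table1[ft].total_bits : -1;
}

/* Class A bits: the bits a frame CRC or partial checksum should cover. */
int amr_class_a_bits(unsigned ft)
{
    if (ft == AMR_FT_NO_DATA)
        return 0;
    return ft <= 8 ? (int)amr_table1[ft].class_a_bits : -1;
}

/* Class B and C bits: the remainder of the frame after class A. */
int amr_class_bc_bits(unsigned ft)
{
    int total = amr_total_bits(ft);
    int a = amr_class_a_bits(ft);
    return (total < 0 || a < 0) ? -1 : total - a;
}
```

   For AMR 12.2, for example, 81 of the 244 bits are class A, leaving 163 class B/C bits that a frame CRC or partial checksum covering only the class A bits leaves unchecked.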
3.1. The payload header

   The length of the payload header is 6 bits. The bits in the header are specified as follows:

   S (1 bit): If set, indicates that the payload is robust sorted; otherwise simple payload sorting is employed. This bit MUST NOT be set unless the receiver has signaled support for robust payload sorting (see section 8).

   C (1 bit): Indicates the presence of the optional CRC fields, which follow the table of contents in the payload. This bit MUST NOT be set unless the receiver has signaled support for the OPTIONAL CRC (see section 8).

   R (1 bit): If set, indicates that the Codec Mode Request (CMR) is valid.

   CMR (3 bits): Codec Mode Request for the other communication direction. This field is only valid if the R bit is set (R=1). Only the speech modes, frame type indices 0-7 (see Table 1a in [2]), may be requested. If R=0, the CMR bits SHALL be set to zero; other values are reserved for future use.

    0
    0 1 2 3 4 5
   +-+-+-+-+-+-+
   |S|C|R| CMR |
   +-+-+-+-+-+-+

   Figure 3: AMR payload header

3.2. The payload table of contents and CRCs

   The table of contents (ToC) consists of one entry for each speech frame in the payload. A table of contents entry includes the following fields:

   F (1 bit): Indicates whether this frame is followed by further frames. F=1: further frames follow; F=0: last frame.

   Q (1 bit): The frame quality bit. If not set, it indicates that the frame is severely damaged, and the receiver should set RX_TYPE (see [4]) to SPEECH_BAD or SID_BAD depending on the frame type (FT).

   FT (4 bits): Frame type indicator, indicating the AMR speech coding mode or comfort noise (SID) mode. The mapping of existing AMR modes to FT is given in Table 1a in [2]. If FT=15 (no transmission), no CRC or payload frame is present.
    0
    0 1 2 3 4 5
   +-+-+-+-+-+-+
   |F|Q|  FT   |
   +-+-+-+-+-+-+

   Figure 4: Table of contents entry field

   CRC (8 bits): OPTIONAL field, present if the payload header bit C is set (C=1). The 8-bit CRC is used for error detection. These 8 parity bits are generated according to section 4.1.4 in [2].

    0
    0 1 2 3 4 5 6 7
   +-+-+-+-+-+-+-+-+
   |      CRC      |
   +-+-+-+-+-+-+-+-+

   Figure 5: CRC field

   The ToC and CRCs are arranged with all table of contents entries first, followed by all CRC fields. The ToC starts with the entry belonging to the oldest speech frame.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |F|Q|  FT   |F|Q|  FT   |F|Q|  FT   |      CRC      |    CRC    |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |   |      CRC      |
   +-+-+-+-+-+-+-+-+-+-+

   Figure 6: The ToC and CRCs for a payload with three speech frames

3.3. AMR speech frame

   An AMR speech frame represents one speech frame encoded with the mode given by the ToC field FT. The length of this field is implicitly defined by the AMR mode in the FT field. The bits SHALL be sorted according to Annex B of [2].

3.4. Compound AMR payload

   The compound AMR payload consists of one AMR payload header, the table of contents and one or more AMR payload frames, see sections 3.1, 3.2 and 3.3. These can be put together with robust or simple payload sorting. The payload header bit S indicates the method used.
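   The bit layout of sections 3.1 and 3.2 can be illustrated with a C sketch that writes the 6-bit payload header and the 6-bit ToC entries MSB first, as drawn in Figures 3 and 4, and that counts ToC entries by following the F bits. The helper names are hypothetical, and the sketch assumes the caller supplies a buffer large enough for the bits written or read.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Write the low nbits of value at bit offset *k, MSB first
   (bit 0 of an octet is its most significant bit). */
static void put_bits(uint8_t *o, size_t *k, unsigned value, unsigned nbits)
{
    while (nbits-- > 0) {
        uint8_t mask = (uint8_t)(0x80u >> (*k % 8));
        if ((value >> nbits) & 1u)
            o[*k / 8] |= mask;
        else
            o[*k / 8] &= (uint8_t)~mask;
        (*k)++;
    }
}

/* Read nbits at bit offset *k, MSB first. */
static unsigned get_bits(const uint8_t *o, size_t *k, unsigned nbits)
{
    unsigned v = 0;
    while (nbits-- > 0) {
        v = (v << 1) | ((o[*k / 8] >> (7 - *k % 8)) & 1u);
        (*k)++;
    }
    return v;
}

/* Payload header (section 3.1): S, C, R (1 bit each) and CMR (3 bits). */
void amr_put_header(uint8_t *o, size_t *k, int s, int c, int r, unsigned cmr)
{
    assert(cmr < 8);
    put_bits(o, k, s != 0, 1);
    put_bits(o, k, c != 0, 1);
    put_bits(o, k, r != 0, 1);
    put_bits(o, k, r ? cmr : 0u, 3);   /* CMR SHALL be zero if R = 0 */
}

/* ToC entry (section 3.2): F, Q (1 bit each) and FT (4 bits). */
void amr_put_toc(uint8_t *o, size_t *k, int f, int q, unsigned ft)
{
    assert(ft < 16);
    put_bits(o, k, f != 0, 1);
    put_bits(o, k, q != 0, 1);
    put_bits(o, k, ft, 4);
}

/* Count ToC entries, starting after the 6-bit header and following the
   F bits; reads at most max_entries entries. */
size_t amr_count_toc(const uint8_t *o, size_t max_entries)
{
    size_t k = 6, n = 0;
    while (n < max_entries) {
        unsigned entry = get_bits(o, &k, 6);
        n++;
        if ((entry & 0x20u) == 0)      /* F = 0: this was the last frame */
            break;
    }
    return n;
}
```

   Packing the header and ToC of the example in section 7.2 (C=1, R=1, CMR=6, two FT=3 frames) produces the octets 0x7B and 0x34 followed by two 1 bits, matching octets 0 to 2 of Figure 8.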
   Definitions for describing the compound AMR payload:

      b(m)   - bit m of the compound AMR payload, octet aligned
      o(n,m) - bit m of octet n in the octet description of the compound AMR payload, bit 0 is MSB
      t(n,m) - bit m in the table of contents entry for speech frame n
      p(n,m) - bit m in the CRC for speech frame n
      f(n,m) - bit m in speech frame n
      F(n)   - number of bits in speech frame n, defined by FT
      h(m)   - bit m of the payload header
      C      - number of CRC bits, 0 or 8 bits
      N      - number of payload frames in the payload
      S      - number of unused bits

   Payload frames f(n,m) are ordered consecutively, i.e. frame n precedes frame n+1. Within one payload, all frames between the oldest and the most recent MUST be present. If speech data is missing for one or more frames in the sequence of frames in the payload, due to e.g. DTX, the NO_TRANSMISSION frame type is sent for these frames. This does not mean that all frames must be sent, only that the sequence of frames in one payload MUST indicate missing frames.

   The compound AMR payload, b, is mapped into octets, o, where bit 0 is the MSB.

3.4.1. Robust payload sorting

   A bit error in a more sensitive bit is subjectively more annoying than one in a less sensitive bit. Therefore, to be able to protect only the most sensitive bits in a payload packet with a forward error detection code, e.g. a CRC outside RTP, the bits inside a frame are ordered in sensitivity order. The protection SHOULD cover an appropriate number of octets from the beginning of the payload, covering at least the AMR payload header, ToC and class A bits (see [2]). Exactly how many octets need protection depends on the network and application.

   To maintain sensitivity ordering inside the AMR payload when more than one speech frame is transmitted in one payload, the data needs to be reordered.
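   The bit-level reordering of the payload frames, which the C-style algorithms of sections 3.4.1 and 3.4.2 specify in full, can be sketched in plain C. In this sketch (hypothetical function names), each speech frame is held MSB first in its own buffer, already in sensitivity order; amr_place_frames() writes the frames either interleaved one bit at a time (robust sorting) or concatenated (simple sorting), starting at the bit position that follows the header, ToC and CRCs. amr_robust_cover_octets() gives a conservative answer to how many leading octets a UED/UEP scheme must cover: it counts the header, ToC and CRC bits plus the first max A(j) interleaving rounds, which contain every class A bit, where A(j) is the class A count of frame j from Table 1. The exact minimum may be a few bits smaller.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Set or clear bit k of out[], MSB first: o(k/8, k%8) = v. */
static void put_bit(uint8_t *out, size_t k, int v)
{
    uint8_t mask = (uint8_t)(0x80u >> (k % 8));
    if (v)
        out[k / 8] |= mask;
    else
        out[k / 8] &= (uint8_t)~mask;
}

/* Read bit m of a frame buffer stored MSB first. */
static int get_bit(const uint8_t *src, size_t m)
{
    return (src[m / 8] >> (7 - m % 8)) & 1;
}

/* Write the speech bits of N frames starting at bit k. frames[j] holds
   F[j] bits. robust != 0: take one bit in turn from each frame;
   robust == 0: concatenate the frames. Returns the next free bit. */
size_t amr_place_frames(uint8_t *out, size_t k,
                        const uint8_t *const frames[], const unsigned F[],
                        size_t N, int robust)
{
    if (robust) {
        unsigned maxF = 0;
        for (size_t j = 0; j < N; j++)
            if (F[j] > maxF)
                maxF = F[j];
        for (unsigned i = 0; i < maxF; i++)
            for (size_t j = 0; j < N; j++)
                if (i < F[j])
                    put_bit(out, k++, get_bit(frames[j], i));
    } else {
        for (size_t j = 0; j < N; j++)
            for (unsigned i = 0; i < F[j]; i++)
                put_bit(out, k++, get_bit(frames[j], i));
    }
    return k;
}

/* Zero-pad to the next octet boundary; returns the length in octets. */
size_t amr_pad(uint8_t *out, size_t k)
{
    while (k % 8 != 0)
        put_bit(out, k++, 0);
    return k / 8;
}

/* Conservative UED/UEP coverage (octets) of a robust-sorted payload:
   header (6), ToC (6 per frame), crc_bits, and the first max A[j]
   interleaving rounds, which hold every class A bit. */
size_t amr_robust_cover_octets(const unsigned A[], const unsigned F[],
                               size_t N, size_t crc_bits)
{
    unsigned maxA = 0;
    size_t bits = 6 + 6 * N + crc_bits;
    for (size_t j = 0; j < N; j++)
        if (A[j] > maxA)
            maxA = A[j];
    for (size_t j = 0; j < N; j++)
        bits += (F[j] < maxA) ? F[j] : maxA;
    return (bits + 7) / 8;
}
```

   For the two 5.9 kbps frames of the example in section 7.3 (55 class A bits each, no CRC), the coverage is 6 + 12 + 2 x 55 = 128 bits, i.e. the first 16 octets of the payload.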
   When robust sorting mode is used, the reordering that maintains the sensitivity ordered AMR payload SHALL be performed on bit level. The AMR payload header, ToC and CRCs SHALL still be placed unchanged at the beginning of the payload. Thereafter, the payload frames are sorted by taking one bit in turn from each payload frame.

   The robust payload sorting algorithm is defined in C-style as:

      /* payload header */
      k = 0;
      for (i = 0; i < 6; i++) {
          b(k++) = h(i);
      }
      /* table of contents */
      for (j = 0; j < N; j++) {
          for (i = 0; i < 6; i++) {
              b(k++) = t(j,i);
          }
      }
      /* CRCs */
      for (j = 0; j < N; j++) {
          for (i = 0; i < C; i++) {
              b(k++) = p(j,i);
          }
      }
      /* payload frames: max = max(F(0),..,F(N-1)) */
      max = 0;
      for (j = 0; j < N; j++) {
          if (F(j) > max) {
              max = F(j);
          }
      }
      for (i = 0; i < max; i++) {
          for (j = 0; j < N; j++) {
              if (i < F(j)) {
                  b(k++) = f(j,i);
              }
          }
      }
      /* padding */
      S = 8 - k%8;
      if (S < 8) {
          for (i = 0; i < S; i++) {
              b(k++) = 0;
          }
      }
      /* map into octets */
      for (i = 0; i < k; i++) {
          o(i/8,i%8) = b(i);
      }

3.4.2. Simple payload sorting

   If multiple new frames are encapsulated into the payload and robust payload sorting is not used, the payload is formed by concatenating the payload header, the ToC, the optional CRC fields and the speech frames in the payload. However, the bits inside a frame are still ordered in sensitivity order as defined in [2].

   The simple payload sorting algorithm is defined in C-style as:

      /* payload header */
      k = 0;
      for (i = 0; i < 6; i++) {
          b(k++) = h(i);
      }
      /* table of contents */
      for (j = 0; j < N; j++) {
          for (i = 0; i < 6; i++) {
              b(k++) = t(j,i);
          }
      }
      /* CRCs */
      for (j = 0; j < N; j++) {
          for (i = 0; i < C; i++) {
              b(k++) = p(j,i);
          }
      }
      /* payload frames */
      for (j = 0; j < N; j++) {
          for (i = 0; i < F(j); i++) {
              b(k++) = f(j,i);
          }
      }
      /* padding */
      S = 8 - k%8;
      if (S < 8) {
          for (i = 0; i < S; i++) {
              b(k++) = 0;
          }
      }
      /* map into octets */
      for (i = 0; i < k; i++) {
          o(i/8,i%8) = b(i);
      }

3.5. Decoding security consideration

   If the payload length calculated from the C, F and FT fields does not match the size of the payload actually received, the payload should be dropped. Decoding a packet that has errors in the length indicator bits could severely degrade the speech quality.

4. RTP header usage

   The RTP header marker bit (M) is set (M=1) in packets containing the first speech frame after DTX operation. For all other packets the marker bit is set to 0 (M=0).

   The timestamp corresponds to the sampling instant of the first sample encoded for the first frame in the packet. A frame can be either encoded speech, comfort noise parameters, or NO_TRANSMISSION. The timestamp is measured in samples. The duration of one AMR speech frame is 20 ms and the sampling frequency is 8 kHz, corresponding to 160 encoded speech samples per frame. Thus, the timestamp is increased by 160 for each consecutive frame. All frames in a packet MUST be successive 20 ms frames.

5. Congestion Control

   The need for congestion control for data transported with RTP has to be considered. AMR speech data have some elastic properties due to the different bandwidth demands of the modes. Another parameter that can reduce the bandwidth demand for AMR is the number of speech frames encapsulated in each payload: more frames per payload reduce the number of packets and thus the overhead from IP/UDP/RTP headers. If forward error correction (FEC) is used, its amount also needs to be regulated so that the FEC itself does not worsen the problem. Therefore, it is RECOMMENDED that applications using this payload format implement congestion control. The actual mechanism for congestion control is not specified but should be suitable for real-time flows, e.g. "Equation-Based Congestion Control for Unicast Applications" [14].

6. Security Considerations

   RTP packets using the payload format defined in this specification are subject to the security considerations discussed in the RTP specification [8]. This implies that confidentiality of the media streams is achieved by encryption. Because the payload format is arranged end-to-end, encryption MAY be performed after encapsulation, so there is no conflict between the two operations.

   This payload type does not exhibit any significant non-uniformity in the receiver side computational complexity for packet processing that could cause a potential denial-of-service threat.

   As this format transports encoded speech, the main security issues are decoding security (see section 3.5), and confidentiality and authentication of the speech itself. The payload format itself does not have any support for security. These issues have to be solved by a mechanism external to the payload.

6.1. Confidentiality

   To achieve confidentiality of the encoded speech, all speech data bits must be encrypted. There is less need to encrypt the payload header or the table of contents, as they only carry information about the requested AMR mode, AMR frame type and frame quality. This information could be useful to some third party, e.g. for quality monitoring.

   The type of encryption used affects not only confidentiality but also error robustness. There will be no robustness against bit errors unless an encryption method without error propagation is used, e.g. a stream cipher. This is only an issue when using UEP/UED, i.e. when bit errors can be accepted in some part of the payload.

6.2. Authentication

   To authenticate the sender of the speech, an external mechanism has to be added. It is RECOMMENDED that such a mechanism protects all the speech data bits. Note that the use of UED/UEP is difficult to combine with authentication.
   To prevent a man in the middle from tampering with the packetization of the speech data, some extra data SHOULD be protected: the payload header, ToC, CRCs, RTP timestamp, RTP sequence number, and the RTP marker bit. Tampering could result in erroneous depacketization/decoding that could lower speech quality. Tampering with the AMR mode request field can cause the requesting side to receive speech in a different quality than desired.

7. Examples

7.1. Simple example

   In the simple example, we send just one frame in each RTP packet, no valid Codec Mode Request (CMR) is sent (R=0), the payload was not damaged at IP origin (Q=1), and no CRC is used. The AMR mode is the 5.9 kbps mode (FT=2). The speech encoded bits are put into f(0) to f(117) in descending sensitivity order according to [2]. Simple payload sorting is used, S=0.

      |                            Bit no.                            |
   Oct|   0       1       2       3       4       5       6       7   |
   ---+-------+-------+-------+-------+-------+-------+-------+-------+
    0 |  S=0  |  C=0  |  R=0  |   0   |   0   |   0   |  F=0  |  Q=1  |
   ---+-------+-------+-------+-------+-------+-------+-------+-------+
    1 |   0   |   0   |   1   |   0   | f(0)  | f(1)  | f(2)  |  ...  |
   ---+-------+-------+-------+-------+-------+-------+-------+-------+
   16 |f(116) |f(117) |   0   |   0   |   0   |   0   |   0   |   0   |
   ---+-------+-------+-------+-------+-------+-------+-------+-------+

   Figure 7: One frame per packet example.

7.2. Example with CRCs

   In this example, two frames with the 6.7 kbps mode (FT=3) are sent in the payload. A mode request is sent (R=1), requesting the 10.2 kbps mode for the other link (CMR=6). CRC is used (C=1). Frame 1 (134 bits) is f1(0..133) and frame 2 is f2(0..133). A CRC is calculated for each payload frame: p1(0..7) for frame 1 and p2(0..7) for frame 2. Simple payload sorting is used, S=0.

      |                            Bit no.                            |
   Oct|   0       1       2       3       4       5       6       7   |
   ---+-------+-------+-------+-------+-------+-------+-------+-------+
    0 |  S=0  |  C=1  |  R=1  |   1   |   1   |   0   |  F=1  |  Q=1  |
   ---+-------+-------+-------+-------+-------+-------+-------+-------+
    1 |   0   |   0   |   1   |   1   |  F=0  |  Q=1  |   0   |   0   |
   ---+-------+-------+-------+-------+-------+-------+-------+-------+
    2 |   1   |   1   | p1(0) | p1(1) | p1(2) | p1(3) | p1(4) | p1(5) |
   ---+-------+-------+-------+-------+-------+-------+-------+-------+
    3 | p1(6) | p1(7) | p2(0) | p2(1) | p2(2) | p2(3) | p2(4) | p2(5) |
   ---+-------+-------+-------+-------+-------+-------+-------+-------+
    4 | p2(6) | p2(7) | f1(0) | f1(1) |  ...  |  ...  |  ...  |  ...  |
   ---+-------+-------+-------+-------+-------+-------+-------+-------+
   20 |  ...  |  ...  |  ...  |  ...  |  ...  |  ...  |f1(132)|f1(133)|
   ---+-------+-------+-------+-------+-------+-------+-------+-------+
   21 | f2(0) | f2(1) |  ...  |  ...  |  ...  |  ...  |  ...  |  ...  |
   ---+-------+-------+-------+-------+-------+-------+-------+-------+
   37 |  ...  |  ...  |  ...  |f2(131)|f2(132)|f2(133)|   0   |   0   |
   ---+-------+-------+-------+-------+-------+-------+-------+-------+

   Figure 8: Example with CRCs.

7.3. Example with multiple frames per payload and robust sorting

   In this example, two 5.9 kbps mode (FT=2) frames are sent in one payload. No CRC is used (C=0). A mode request is sent (R=1), requesting the 7.95 kbps mode for the other link (CMR=5). The first frame is represented by the 118 bits f(0) to f(117) and the subsequent frame by g(0) to g(117). Robust sorting is used, S=1.

      |                            Bit no.                            |
   Oct|   0       1       2       3       4       5       6       7   |
   ---+-------+-------+-------+-------+-------+-------+-------+-------+
    0 |  S=1  |  C=0  |  R=1  |   1   |   0   |   1   |  F=1  |  Q=1  |
   ---+-------+-------+-------+-------+-------+-------+-------+-------+
    1 |   0   |   0   |   1   |   0   |  F=0  |  Q=1  |   0   |   0   |
   ---+-------+-------+-------+-------+-------+-------+-------+-------+
    2 |   1   |   0   | f(0)  | g(0)  | f(1)  | g(1)  |  ...  |  ...  |
   ---+-------+-------+-------+-------+-------+-------+-------+-------+
   31 |  ...  |  ...  |f(116) |g(116) |f(117) |g(117) |   0   |   0   |
   ---+-------+-------+-------+-------+-------+-------+-------+-------+

   Figure 9: Example with two frames per payload and robust sorting.

8. The AMR MIME type registration

   This chapter defines the MIME type for the Adaptive Multi-Rate (AMR) speech codec [1]. The data format and parameters are specified for both real-time transport and storage type applications (e.g. e-mail attachments, multimedia messaging). The former is referred to as RTP mode and the latter as storage mode.

   AMR implementations according to [1] MUST support all eight coding modes. A mode change can occur at any time during operation, and therefore the mode information is transmitted in-band together with the speech bits to allow mode changes without any additional signaling.

   In addition to the speech codec, the AMR specifications also include Discontinuous Transmission / comfort noise (DTX/CN) functionality [11]. DTX/CN switches transmission off during silent parts of the speech, and only CN parameter updates, i.e. SID frames, are sent at regular intervals.

8.1. RTP mode

   The decoder may want to receive a certain AMR mode or a subset of AMR modes, due to link limitations in some cellular systems; e.g. the GSM radio link can only use a subset of at most four modes, which can be any combination of the 8 AMR modes. Therefore, it is possible to request a specific set of AMR modes in the capability description, and the encoder MUST abide by this request. If no mode set is requested, any mode may be used or requested.

   The AMR codec can in principle perform a mode change at any time between any two modes. To support interoperability with GSM through a gateway, it is possible to set limitations for mode changes. The decoder has the possibility to define the minimum number of frames
   between mode changes and to restrict mode changes to neighboring modes only.

   It is also possible to limit the number of AMR frames encapsulated into one RTP packet. This is an optional feature; if no parameter is given in the capability description, the transmitter MAY encapsulate any number of AMR speech frames into one RTP packet.

   The payload CRCs for UED MUST only be used if the receiver has signaled support for this functionality in the capability description.

   To support unequal error protection and/or detection, the payload format supports robust payload sorting. Robust payload sorting is an OPTIONAL feature and MUST only be used if the receiver has signaled support for this functionality in the capability description.

8.2. Storage mode

   The AMR storage mode is used for storing AMR frames, e.g. as a file or e-mail attachment. Frames are stored consecutively in an octet aligned manner: the octet following the last octet of frame n is the first octet of frame n+1. Each stored AMR frame consists of a Q bit and the 4-bit FT field (see definition in section 3.2), followed by the AMR encoded speech bits (see section 3.3). The last octet of each frame is padded with zeroes, if needed, to achieve octet alignment. An example is given in Figure 10.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |Q|  FT   |                                                     |
   +-+-+-+-+-+                                                     +
   |                                                               |
   +                  AMR speech bits for frame n                  +
   |                                                               |
   +                                                     +-+-+-+-+-+
   |                                                     | Padding |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |Q|  FT   |                                                     |
   +-+-+-+-+-+                                                     +
   |                                                               |
   +                 AMR speech bits for frame n+1                 +
   |                                                               |
   +                                                     +-+-+-+-+-+
   |                                                     | Padding |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   Figure 10: An example of storage format with two AMR 5.9 kbit/s frames (118 speech bits). Note that bits marked as 'padding' MUST be set to zero.
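   A storage-mode writer for a single frame can be sketched in C as follows. It emits the Q bit and the 4-bit FT field, then the speech bits, and zero-pads to the next octet boundary, as in Figure 10. The function name is hypothetical, and the sketch assumes the speech bits arrive MSB first in sensitivity order and that the caller knows their count from Table 1 (0 for FT=15).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: append one storage-mode frame (section 8.2) to out[]:
   Q bit, 4-bit FT, nbits speech bits taken MSB first from speech[], then
   zero padding to an octet boundary. out must hold (5 + nbits + 7) / 8
   octets. Returns the number of octets written. */
size_t amr_store_frame(uint8_t *out, int q, unsigned ft,
                       const uint8_t *speech, unsigned nbits)
{
    size_t total = (5u + nbits + 7u) / 8u;   /* header + speech, rounded up */
    size_t k = 0;
    memset(out, 0, total);                   /* padding bits MUST be zero */

    unsigned hdr = ((unsigned)(q != 0) << 4) | (ft & 0x0Fu);
    for (int b = 4; b >= 0; b--, k++)        /* Q, then FT, MSB first */
        if ((hdr >> b) & 1u)
            out[k / 8] |= (uint8_t)(0x80u >> (k % 8));

    for (unsigned m = 0; m < nbits; m++, k++)  /* speech bits */
        if ((speech[m / 8] >> (7 - m % 8)) & 1u)
            out[k / 8] |= (uint8_t)(0x80u >> (k % 8));

    return total;
}
```

   For a 5.9 kbit/s frame, 5 + 118 = 123 bits round up to 16 octets with 5 padding bits, which is the four 32-bit rows per frame shown in Figure 10; a NO_TRANSMISSION frame (FT=15) occupies a single octet.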
Frames lost in transmission and non-received frames between SID updates during non-speech periods must be stored as NO_TRANSMISSION frames (frame type 15, see definition in [2]) to keep synchronization with the original media.

The receiving entity (AMR decoder) MUST be able to decode all eight coding modes as well as AMR DTX/CN [11]. Since no codec parameters can be negotiated before stored AMR data is downloaded or received, the optional features specified for RTP mode (robust sorting, CRC) MUST NOT be used in storage mode.

8.3. MIME Registration

The MIME name for the AMR codec is allocated from the IETF tree, since AMR is expected to be a widely used speech codec in VoIP applications. Some parts of this registration distinguish between RTP mode and storage mode.

Media type name: audio

Media subtype name: AMR

Required parameters: none

Optional parameters for RTP mode:

mode-set: Requested AMR mode set. Restricts the active codec mode set to a subset of all modes. Possible values are a comma-separated list of modes from the set 0,...,7 (see Table 1a in [2]; an example is given in section 8.4). If not present, all speech modes are available.

mode-change-period: Defines a number N that restricts mode changes so that they are only allowed at multiples of N frames; the initial phase is arbitrary. If this parameter is not present, mode changes can happen at any time.

mode-change-neighbor: If present, mode changes SHALL only be made to neighboring modes in the active codec mode set. Neighboring modes are the modes closest in bit rate to the current mode, both the next higher and the next lower rate. If not present, changes between any two modes in the active codec mode set are allowed.

maxframes: Maximum number of AMR speech frames in one RTP packet. The receiver may set this parameter to limit buffering requirements or delay.
crc: If present, transmission of CRCs in the payload is supported; otherwise it is not supported.

robust-sorting: If present, robust payload sorting is supported; otherwise it is not supported and simple payload sorting SHALL be used.

Optional parameters for storage mode: none

Encoding considerations for RTP mode: See section 3 of this document.

Encoding considerations for storage mode: See section 8.2 of this document.

Security considerations: See chapter 6, "Security".

Public specification: Please refer to chapter 9, "References".

Additional information for storage mode:

   Magic number: none
   File extensions: amr, AMR
   Macintosh file type code: none
   Object identifier or OID: none

Person & email address to contact for further information:
   johan.sjoberg@ericsson.com
   ari.lakaniemi@nokia.com
   Bernhard.Wimmer@mch.siemens.de

Intended usage: COMMON. It is expected that many VoIP applications (as well as mobile applications) will use this type.

Author/Change controller:
   johan.sjoberg@ericsson.com
   ari.lakaniemi@nokia.com

8.4. Mapping to SDP Parameters

This section applies to RTP mode only.

Example of usage in SDP [12]:

   m=audio 49120 RTP/AVP 97
   a=rtpmap:97 AMR/8000
   a=fmtp:97 mode-set=0,2,5,7; maxframes=1

9. References

[1] 3G TS 26.090, "Adaptive Multi-Rate (AMR) speech transcoding".
[2] 3G TS 26.101, "AMR Speech Codec Frame Structure".
[3] IETF RFC 2119, "Key words for use in RFCs to Indicate Requirement Levels".
[4] 3G TS 26.093, "AMR Speech Codec; Source Controlled Rate operation".
[5] GSM 06.60, "Enhanced Full Rate (EFR) speech transcoding".
[6] TIA/EIA-136-Rev.A, part 410, "TDMA Cellular/PCS - Radio Interface, Enhanced Full Rate Voice Codec (ACELP)", formerly IS-641, TIA published standard, 1998.
[7] ARIB, RCR STD-27H, "Personal Digital Cellular Telecommunication System RCR Standard".
[8] IETF RFC 1889, "RTP: A Transport Protocol for Real-Time Applications".
[9] IETF draft-westberg-realtime-cellular-01.txt, "Realtime Traffic over Cellular Access Networks".
[10] IETF draft-larzon-udplite-03.txt, "The UDP Lite Protocol".
[11] GSM 06.92, "Comfort noise aspects for Adaptive Multi-Rate (AMR) speech traffic channels".
[12] M. Handley and V. Jacobson, "SDP: Session Description Protocol", IETF RFC 2327, April 1998.
[13] 3G TS 25.415, "UTRAN Iu Interface User Plane Protocols".
[14] S. Floyd, M. Handley, J. Padhye, J. Widmer, "Equation-Based Congestion Control for Unicast Applications", ACM SIGCOMM 2000, Stockholm, Sweden.
[15] IETF draft-ietf-avt-ulp-00.txt, "An RTP Payload Format for Generic FEC with Uneven Level Protection".
[16] IETF RFC 2733, "An RTP Payload Format for Generic Forward Error Correction".
[17] 3G TS 26.102, "AMR speech codec interface to Iu and Uu".

10. Authors' addresses

Johan Sjoberg
Ericsson Research
Ericsson Radio Systems AB
Torshamnsgatan 23
SE-164 80 Stockholm
SWEDEN
Tel: +46 8 50878230
EMail: Johan.Sjoberg@ericsson.com

Magnus Westerlund
Ericsson Research
Ericsson Radio Systems AB
Torshamnsgatan 23
SE-164 80 Stockholm
SWEDEN
Tel: +46 8 4048287
EMail: Magnus.Westerlund@ericsson.com

Ari Lakaniemi
Nokia Research Center
P.O.Box 407
FIN-00045 Nokia Group
Finland
Tel: +358 40 5276440
EMail: ari.lakaniemi@nokia.com

Petri Koskelainen
Nokia Research Center
P.O.Box 100
FIN-33721 Tampere
Finland
EMail: petri.koskelainen@nokia.com

Tim Fingscheidt
Siemens AG, ICP CD
Grillparzerstrasse 10-18
D-81675 Munich
Germany
Tel: +49 89 722 57658
Fax: +49 89 722 46489
EMail: Tim.Fingscheidt@mch.siemens.de

Bernhard Wimmer
Siemens AG, ICP CD
Grillparzerstrasse 10-18
D-81675 Munich
Germany
Tel: +49 89 722 23247
Fax: +49 89 722 46489
EMail: Bernhard.Wimmer@mch.siemens.de

Qiaobing Xie
Motorola, Inc.
1501 W. Shure Drive, #2309
Arlington Heights, IL 60004
USA
Tel: +1-847-632-3028
EMail: qxie1@email.mot.com

Sanjay Gupta
Motorola, Inc.
1501 W. Shure Drive, #3205
Arlington Heights, IL 60004
USA
Tel: +1-847-435-0306
EMail: QA4496@email.mot.com

This Internet-Draft expires August 27, 2001.