Internet Engineering Task Force                     Ari Lakaniemi, Nokia
Audio Video Transport WG                               Pasi Ojala, Nokia
INTERNET-DRAFT                                   Johan Sjöberg, Ericsson
February 23, 2001                            Magnus Westerlund, Ericsson
Expires: August 23, 2001

                      RTP payload format for AMR-WB


Status of this Memo

   This document is an Internet-Draft and is in full conformance with
   all provisions of Section 10 of RFC 2026.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups. Note that other
   groups may also distribute working documents as Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time. It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress".

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html

   This document is an individual submission to the IETF. Comments
   should be directed to the authors.

Abstract

   This document specifies a real-time transport protocol (RTP) payload
   format for Adaptive Multi-Rate Wideband (AMR-WB) speech encoded
   signals. The AMR-WB payload format is designed to interoperate with
   existing AMR-WB transport formats. This document also includes a
   MIME type registration for AMR-WB.

1. Introduction

   The Adaptive Multi-Rate Wideband (AMR-WB) speech codec [1] was
   originally developed by the Third Generation Partnership Project
   (3GPP) for use in GSM and 3G systems, and is therefore expected to
   be widely used in cellular systems. The AMR-WB codec was designed to
   preserve high speech quality under a wide range of transmission
   conditions.

   The AMR-WB codec is a multi-mode speech codec with 9 wideband speech
   coding modes, with bit rates between 6.6 and 23.85 kbps. The sampling
   frequency is 16000 Hz and processing is performed on 20 ms frames,
   i.e., 320 speech samples per frame. The AMR-WB modes are closely
   related to each other and employ the same coding framework. Mode
   adaptation is one valuable aspect of AMR-WB operation. In mobile
   radio systems (GSM), it allows the system to adapt the balance
   between speech coding and error protection to achieve the best
   possible speech quality in the prevailing transmission conditions.
   AMR-WB mode adaptation can also be used to adapt to varying
   available transmission bandwidth. In principle, a change to any mode
   can occur at any time.

   The name and operational principles of the AMR-WB codec largely
   resemble those of the Adaptive Multi-Rate (AMR-NB) codec [2,12].
   However, these are two separate speech codecs, the principal
   difference being that AMR-NB is a narrowband codec using an 8000 Hz
   sampling frequency, compared to the 16000 Hz used by AMR-WB.

   The AMR-WB codec includes a voice activity detector (VAD) [6] and
   generates comfort noise (CN) parameters during silence periods [5].
   Hence, the AMR-WB codec can reduce the number of transmitted bits
   and packets during silence periods to a minimum.
   The operation of sending silence descriptor (SID) frames containing
   CN parameters at regular intervals during non-speech periods is
   usually called discontinuous transmission (DTX) or source controlled
   rate (SCR) operation [4].

   AMR-WB implementations must support all 9 speech coding modes.
   AMR-WB mode switching can occur between any two speech frames, and
   the current mode must be indicated by transmitting the mode
   information together with the speech encoded bits. The objective of
   the AMR-WB design has been to enable the highest possible speech
   quality under a variety of transmission channel conditions. To
   realize mode adaptation, the receiver needs to signal to the
   transmitter the AMR-WB mode it prefers to receive.

   Due to the flexibility and robustness of AMR-WB, it is also suitable
   for purposes other than circuit switched cellular systems, e.g.,
   real-time services over packet switched networks. A payload format
   for such services should be robust against both bit errors and
   packet loss. The speech encoded bits have different perceptual
   sensitivity to bit errors, and cellular systems exploit this by
   using unequal error protection and detection (UEP and UED).

   The UED/UEP mechanism focuses the correction and detection of
   corrupted bits on the perceptually most sensitive bits. A speech
   frame is only declared damaged if there are bit errors in the most
   sensitive bits, i.e., the class A bits. Some bit errors are
   acceptable in the other bits, i.e., class B and C. Moreover, a
   damaged frame is still useful for error concealment in the decoder,
   which uses some of the less sensitive bits of the damaged data. This
   improves speech quality compared to discarding the data.

   Today there exist some link layers that do not discard packets with
   bit errors, e.g.
   SLIP and some wireless links (with Internet traffic shifting towards
   a more media-centric pattern, more link layers of this nature may
   emerge in the future). With transport layer support for partial
   checksums, such as that provided by UDP-Lite [14], bit error
   tolerant AMR-WB traffic could achieve better performance over these
   types of links.

   There are at least two basic approaches for carrying AMR-WB traffic
   over bit error tolerant networks:

   1) Using a partial checksum to cover headers and the most important
      AMR-WB speech bits of the payload. It is recommended that at
      least all class A bits are covered by the checksum.

   2) Using a partial checksum to cover only headers, and a frame CRC
      to cover the class A bits of each AMR-WB frame in the payload.

   In either approach, at least part of the class B/C bits are left
   without error check, and thus bit error tolerance is achieved.

   It is still important that the network designer pays attention to
   the class B and C residual bit error rate. Though less sensitive to
   errors than class A bits, class B and C bits are not insignificant,
   and undetected errors in these bits degrade speech quality. An
   example of residual error rates considered acceptable for AMR-WB in
   UMTS can be found in [17].

   Approach 1 is bit-efficient, flexible, and simple, but has two
   disadvantages: a) bit errors in protected speech bits will cause the
   payload to be discarded, and b) when multiple frames are transported
   in a payload, a single bit error in protected bits can cause all the
   frames to be discarded.

   These disadvantages can be avoided, if needed, at the cost of some
   overhead in the form of a frame-wise CRC (Approach 2).
   For problem a), the CRC makes it possible to detect bit errors in
   class A bits and still use the frame for error concealment, which
   gives a small improvement in speech quality. For problem b), when
   multiple frames are transported in a payload, the CRCs remove the
   possibility that a single bit error in a class A bit causes all the
   frames to be discarded. This improves speech quality when multiple
   frames are transported and are subject to bit errors.

   The choice between the two approaches must be based on the available
   bandwidth and the desired tolerance to bit errors. Neither solution
   is appropriate for all cases.

   To achieve better robustness against packet loss, the payload format
   supports Forward Error Correction (FEC). The simple scheme of
   repeating previously sent data is one possibility. Another, more
   bandwidth efficient, scheme is to use payload external FEC, e.g.,
   RFC 2733, which generates extra packets containing repair data. The
   whole payload can also be sorted in sensitivity order to support
   external FEC schemes using UEP. There is work in progress on a
   generic version of such a scheme [15].

   Yet another mechanism to enhance error robustness is interleaving of
   AMR-WB speech frames. Several frames can be encapsulated into a
   single RTP packet to decrease protocol overhead. One drawback of
   this approach is that a single packet loss results in the loss of
   several consecutive speech frames, which usually causes clearly
   audible distortion in the reconstructed speech. Interleaving frames
   can improve speech quality in such cases by turning the consecutive
   losses into a series of single-frame losses. However, interleaving
   and bundling several frames per payload also increase end-to-end
   delay and are therefore not applicable to all usage scenarios.
   Nevertheless, for example,
   streaming applications are likely to be able to exploit interleaving
   to improve speech quality in lossy transmission conditions.

2. Requirements

   The AMR-WB RTP payload format was designed to meet the following
   requirements:

   o  Different levels of robustness must be supported, from no
      redundant data to extreme robustness capable of handling very high
      packet loss rates with no or small speech quality degradation.

   o  Fast, bandwidth efficient, frame-wise AMR-WB mode adaptation must
      be supported. This means that it must be possible to send Codec
      Mode Requests from the receiving side back to the transmitting
      side with information on the preferred mode.

   o  Source controlled rate (SCR) operation (also called DTX) and
      comfort noise (CN) parameter transmission as defined in AMR-WB
      must be supported.

3. Payload format

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in RFC 2119 [9].

   The AMR-WB payload format supports transmission of multiple frames
   per payload, fast codec mode adaptation, and robustness against
   packet losses and bit errors.

   The AMR-WB payload format consists of one payload header, a table of
   contents, optionally one CRC per payload frame, and zero or more
   AMR-WB payload frames. The payload format is made as bandwidth
   efficient as possible by not octet-aligning the payload header, the
   table of contents, or the payload frames. However, the full payload
   is octet aligned; therefore, any unused bits in the last octet MUST
   be padded with zeros.

   If the option to transmit a robust sorted payload is enabled by the
   receiver, the transmitter may choose to sort the bits in the payload
   in descending order of bit error sensitivity in order to enable
   UEP/UED outside RTP (e.g., UDP-Lite).
   The sensitivity order of the AMR-WB encoded speech bits for each mode
   is defined in Annex B of [3], the original bit order being that
   delivered by the AMR-WB speech encoder [1]. The AMR-WB frame types,
   or modes, are defined in [3].

   Robustness against packet loss can be achieved by retransmitting
   previously transmitted frames together with the current (new) frame
   or frames. Another approach is to use interleaving to reduce the
   effect of packet losses on speech quality. Note that the use of these
   options can be restricted by the MIME parameters during session
   set-up. AMR-WB performance over error tolerant links can be improved
   by also delivering speech frames that have been corrupted by bit
   errors. However, UEP/UED MUST be used in such a way that bit errors
   are allowed only in the least error sensitive bits. Bit errors in
   class A bits MUST NOT be allowed under any circumstances. This
   payload format provides two alternative methods to implement UED:

   A. CRC calculation over the class A speech bits

      If several consecutive speech frames are encapsulated into each
      payload, the optional CRC may be used to protect the class A
      speech bits of each frame, see Table 1. The number of class A bits
      is specified only informatively in [3] and is therefore reproduced
      in Table 1 as normative for this payload format. Speech frames
      with errors in class A bits MUST be marked with SPEECH_BAD for
      corrupted speech frames (FT=0..8) or SID_BAD for corrupted SID
      frames (FT=9), and be sent to the speech decoder to assist error
      concealment, see [7]. In this case, the RTP header, payload
      header, and table of contents should be covered by a transport
      layer CRC, e.g., UDP-Lite. A packet MUST be discarded if the
      transport layer CRC detects errors in these bits.

   B. Robust sorting of payload bits

      Robust behavior can also be achieved by robust sorting of the
      payload. This enables the use of UED (e.g., UDP-Lite) and UEP
      (e.g., ULP [15]). Note that payloads containing a single frame are
      sorted in the same robust way regardless of whether simple or
      robust sorting is used. The UED and/or UEP is recommended to cover
      at least the RTP header, payload header, table of contents, and
      all class A bits of all frames in the payload.

   Support for unequal error detection is OPTIONAL. If either scheme is
   used, it MUST be signaled out of band (see Section 8).

                                 Class A    Total speech
      Index   Mode                bits          bits
      --------------------------------------------------
        0     AMR-WB 6.60          54           132
        1     AMR-WB 8.85          64           177
        2     AMR-WB 12.65         72           253
        3     AMR-WB 14.25         72           285
        4     AMR-WB 15.85         72           317
        5     AMR-WB 18.25         72           365
        6     AMR-WB 19.85         72           397
        7     AMR-WB 23.05         72           461
        8     AMR-WB 23.85         72           477
        9     AMR-WB SID           40            40

      Table 1. Specification of the number of class A bits for AMR-WB.

   The speech quality under channel error conditions can be improved by
   also delivering frames that were corrupted, e.g., during transmission
   over a radio link, to the receiver. Despite the bit errors, providing
   damaged frames to the error concealment unit can improve speech
   quality compared to the case where corrupted frames are dropped.
   However, to accomplish this, a frame quality indicator is needed to
   mark the corrupted frames for the decoder. In many communication
   scenarios, AMR-WB frames will be transmitted from an IP/UDP/RTP
   terminal to a terminal in a system with another transport format,
   and/or vice versa. The transport format conversion will then be done
   in a gateway. A second likely scenario is that IP/UDP/RTP is used as
   the transport between other systems, i.e., IP is originated and
   terminated in gateways on both sides of the IP transport.
   AMR-WB over   +------+                         +----------+
   3G Iu or      |      |    IP/UDP/RTP/AMR-WB    |          |
   ------------->|  GW  |------------------------>| TERMINAL |
   GSM Abis      |      |                         |          |
   etc.          +------+                         +----------+

               Figure 1: GW to VoIP terminal scenario.

   AMR-WB over   +------+                      +------+   AMR-WB over
   3G Iu or      |      |  IP/UDP/RTP/AMR-WB   |      |   3G Iu or
   ------------->|  GW  |--------------------->|  GW  |--------------->
   GSM Abis      |      |                      |      |   GSM Abis
   etc.          +------+                      +------+   etc.

                     Figure 2: GW to GW scenario.

   The speech quality in case of packet losses when transmitting several
   AMR-WB frames per packet can be improved by using OPTIONAL frame
   interleaving. Interleaving improves perceived speech quality since it
   produces single frame errors instead of several consecutive frame
   errors. Note that interleaving can be applied only if the receiver
   has signaled support for it in its capability description.

3.1. The payload header

   The length of the payload header is either 7 or 15 bits, depending on
   whether interleaving is used or not. Figures 3a and 3b illustrate the
   header structure. The header bits are specified in the following two
   subsections.

3.1.1. Required fields of the payload header

   S (1 bit): Indicates, if set, that the bits in the payload are
   robust sorted. If not set, simple payload sorting is employed. Note
   that this bit can be set only if the receiver has signaled support
   for the OPTIONAL robust payload sorting.

   C (1 bit): Indicates the existence of OPTIONAL CRC fields in the
   payload table of contents. Note that this bit can be set only if the
   receiver has signaled support for the OPTIONAL CRC.

   I (1 bit): Indicates, if set, that the frames in this payload are
   interleaved, and that the ILL and ILP fields are present in the
   payload header. If not set, the frames in this payload are
   successive frames and the ILL and ILP fields are not present in the
   payload header.
Note that 339 this bit can be set only if the receiver has signaled support for 340 interleaving. 342 CMR (4 bits): Indicates Codec Mode Requested for the other 343 communication direction. It is only allowed to request one of the 344 AMR-WB speech modes (frame type index 0...8, see Table 1a in [3]). 345 CMR value 15 indicates that no mode request is present, other values 346 are for future use. 348 3.1.2. Optional fields of the payload header 350 ILL (4 bits): OPTIONAL field that is present only if I=1. The value 351 of this field specifies the interleaving length used for frames in 352 this payload. 354 ILP (4 bits): OPTIONAL field that is present only if I=1. The value 355 of this field indicates the interleaving index for frames in this 356 payload. The value of ILP MUST be smaller than or equal to the value 357 of ILL. Erroneous value of ILP SHOULD cause the payload to be 358 discarded. 360 The value of the ILL field defines the length of an interleave group: 361 ILL=L implies that frames in (L+1)-frame intervals are picked into 362 the same interleaved payload, and the interleave group consists of 363 L+1 payloads. The value of ILP=p in payloads belonging to the same 364 group runs from 0 to L. The interleaving is meaningful only when 365 number of frames per payload N is greater than or equal to 2. Thus, 366 when N frames are transmitted in each payload of a group, the 367 interleave group consists of payloads with sequence numbers s...s+L, 368 and frames encapsulated into these payloads are f...f+N*(L+1)-1. 370 To put this in a form of an equation, let's assume that the first 371 frame of an interleave group is n, the first payload of the group is 372 s, number of frames per payload is N, ILL=L and ILP=p (p in range 373 0...L), the frames contained by the payload s+p are n + p + k*(L+1), 374 where k runs from 0 to N-1. I.e. 
   The first packet of an interleave group: ILL=L, ILP=0
      Payload: s
      Frames:  n, n+(L+1), n+2*(L+1), ..., n+(N-1)*(L+1)

   The second packet of an interleave group: ILL=L, ILP=1
      Payload: s+1
      Frames:  n+1, n+1+(L+1), n+1+2*(L+1), ..., n+1+(N-1)*(L+1)

   ...

   The last packet of an interleave group: ILL=L, ILP=L
      Payload: s+L
      Frames:  n+L, n+L+(L+1), n+L+2*(L+1), ..., n+L+(N-1)*(L+1)

   Interleaved frames MUST be stored in the payload in
   timestamp-increasing order. Furthermore, the interleaved payloads
   within an interleave group MUST be sent in increasing order of the
   ILP field, and each payload of an interleave group MUST contain an
   equal number of frames. It is RECOMMENDED that ILL remains constant
   throughout the session. If ILL is to be changed, the change SHOULD
   be made between interleave groups, i.e., when the ILP of the
   previous packet was L. Furthermore, because of the inter-frame
   dependent nature of AMR-WB coding, it is RECOMMENDED that ILL values
   greater than or equal to 2 are used, to enable better error recovery
   in the decoder in case of a lost interleaved payload. Note also that
   using the value ILL=0, or using interleaving for a payload carrying
   only one frame, is not meaningful.

       0
       0 1 2 3 4 5 6
      +-+-+-+-+-+-+-+
      |S|C|I|  CMR  |
      +-+-+-+-+-+-+-+

      Figure 3a: AMR-WB payload header, I=0.

       0                   1
       0 1 2 3 4 5 6 7 8 9 0 1 2 3 4
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |S|C|I|  CMR  |  ILL  |  ILP  |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

      Figure 3b: AMR-WB payload header, I=1.

3.2. The payload table of contents and CRCs

   The table of contents (ToC) consists of one ToC entry for each
   speech frame in the payload. A ToC entry contains the following
   fields:

   F (1 bit): Indicates whether this frame is followed by further
   frames in this payload. F=1: further frames follow; F=0: this is the
   last frame.
   FT (4 bits): Frame type indicator, indicating the AMR-WB speech
   coding mode or comfort noise (CN) mode. The mapping of AMR-WB modes
   to FT is given in Table 1a in [3]. If FT=14 (lost frame) or FT=15 (no
   transmission/no reception), no CRC or payload frame is present.

   Q (1 bit): The frame quality bit indicates, if not set, that the
   payload frame is corrupted, and the receiver should set the RX_TYPE
   (see [4]) to SPEECH_BAD or SID_BAD depending on the frame type (FT).

       0
       0 1 2 3 4 5
      +-+-+-+-+-+-+
      |F|  FT   |Q|
      +-+-+-+-+-+-+

      Figure 4: Table of contents (ToC) entry field.

   CRC (8 bits): OPTIONAL field, present if the payload header bit C is
   set (C=1). The 8-bit CRC is used for error detection. These 8 parity
   bits are generated according to Section 4.1.4 of [3].

       0
       0 1 2 3 4 5 6 7
      +-+-+-+-+-+-+-+-+
      |      CRC      |
      +-+-+-+-+-+-+-+-+

      Figure 5: CRC field.

   The ToC and CRCs are arranged with all ToC entries first, followed
   by all CRC fields. The ToC starts with the entry belonging to the
   oldest speech frame in the payload.

       0                   1                   2                   3
       0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |F|  FT   |Q|F|  FT   |Q|F|  FT   |Q|      CRC      |    CRC    |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |   |      CRC      |
      +-+-+-+-+-+-+-+-+-+-+

      Figure 6: The ToC and CRCs for a payload with three speech frames.

3.3. AMR-WB speech frame

   An AMR-WB speech frame contains one speech frame encoded with the
   mode indicated by the FT field of the corresponding ToC entry. The
   length of this field is implicitly defined by the AMR-WB mode in the
   FT field. The AMR-WB speech bits SHALL be sorted according to Annex B
   of [3].

3.4. Compound AMR-WB payload

   The compound AMR-WB payload consists of one AMR-WB payload header,
   the table of contents, the optional CRCs, and one or more AMR-WB
   payload frames; see Sections 3.1, 3.2, and 3.3. These can be
   combined using either robust or simple payload sorting. The S bit in
   the AMR-WB payload header indicates which method is used.

   Definitions for describing the compound AMR-WB payload:

      b(m)    - bit m of the compound AMR-WB payload
      t(n,m)  - bit m in the table of contents entry for speech frame n
      p(n,m)  - bit m in the CRC for speech frame n
      f(n,m)  - bit m in speech frame n
      F(n)    - number of bits in speech frame n, defined by FT
      h(m)    - bit m of the payload header
      H       - number of bits in the payload header, 7 or 15 bits
      C       - number of CRC bits per frame, 0 or 8 bits
      N       - number of payload frames in the payload
      S       - number of unused bits in the last octet of the payload

   Payload frames f(n,m) are ordered in the order they are delivered by
   the AMR-WB speech encoder, i.e., frame n precedes frame n+1. All
   frames between the oldest and the most recent one MUST be present in
   the payload; the only exception is interleaving, for which the frame
   order is defined in Section 3.1.2. If some of the frames are not
   available because of a frame loss, or because they are not
   transmitted, e.g., due to DTX, they MUST be replaced by lost speech
   frames or by no transmission/no reception frames, respectively.

3.4.1. Robust payload sorting

   As described earlier, a bit error in a more sensitive bit is
   subjectively more annoying than in a less sensitive bit. Therefore,
   to enable protection of only the most sensitive bits of a payload
   with a forward error detection code, e.g., a CRC outside RTP, the
   bits inside a payload can be ordered into sensitivity order.
   The protection SHOULD cover an appropriate number of octets from the
   beginning of the payload, covering at least the AMR-WB payload
   header, ToC, and class A bits (see Table 1). Exactly how many octets
   need protection depends on the network and application. To maintain
   sensitivity ordering inside the AMR-WB payload when more than one
   speech frame is transmitted in one payload, the bits in the payload
   need to be reordered.

   The AMR-WB payload header, ToC, and CRCs SHALL still be placed
   unchanged at the beginning of the robust sorted payload. Thereafter,
   the payload frames are sorted by taking one bit at a time from each
   AMR-WB payload frame in turn.

   The robust payload sorting algorithm is defined in C-style as:

      /* payload header */
      k = 0;
      for (i = 0; i < H; i++) {
          b(k++) = h(i);
      }
      /* table of contents: 6 bits per entry */
      for (j = 0; j < N; j++) {
          for (i = 0; i < 6; i++) {
              b(k++) = t(j,i);
          }
      }
      /* CRCs (C = 0 when no CRCs are present) */
      for (j = 0; j < N; j++) {
          for (i = 0; i < C; i++) {
              b(k++) = p(j,i);
          }
      }
      /* payload frames: bit i of every frame, then bit i+1, ... */
      maxF = max(F(0), ..., F(N-1));
      for (i = 0; i < maxF; i++) {
          for (j = 0; j < N; j++) {
              if (i < F(j)) {
                  b(k++) = f(j,i);
              }
          }
      }
      /* padding with zeros up to the next octet boundary */
      S = 8 - k%8;
      if (S < 8) {
          for (i = 0; i < S; i++) {
              b(k++) = 0;
          }
      }

3.4.2. Simple payload sorting

   If multiple frames are encapsulated into the payload and robust
   payload sorting is not used, the payload is formed as the
   concatenation of the AMR-WB payload header, the ToC, the optional
   CRC fields, and the AMR-WB speech frames. However, the bits inside
   each AMR-WB payload frame are ordered in sensitivity order as
   defined in Annex B of [3].
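   The only difference between the robust ordering of Section 3.4.1 and
   the concatenation described above lies in how the speech bits of the
   payload frames are placed after the header, ToC, and CRCs. The
   following C sketch illustrates just that step, together with the
   zero-padding rule shared by both algorithms. The function names
   order_frame_bits() and padding_bits(), and the representation of one
   bit per array element, are hypothetical choices made for illustration
   only; they are not part of this payload format. For three frames of
   3, 2, and 4 bits whose bits are labeled 10-12, 20-21, and 30-33
   (labels instead of 0/1 so that the order is visible), robust sorting
   yields 10 20 30 11 21 31 12 32 33, while simple sorting yields
   10 11 12 20 21 30 31 32 33.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (illustration only): writes the speech bits of
 * n_frames payload frames into out[], one bit value per element.
 *   frames[j] - bits f(j,0..len[j]-1), already in Annex B order
 *   len[j]    - F(j), frame size in bits (0 for FT=14 or FT=15)
 *   robust    - nonzero: robust sorting (one bit from each frame in
 *               turn); zero: simple sorting (frames concatenated)
 * Returns the number of bits written to out[]. */
static size_t order_frame_bits(const uint8_t *const frames[],
                               const size_t len[], size_t n_frames,
                               int robust, uint8_t out[])
{
    size_t k = 0;
    assert(frames != NULL && len != NULL && out != NULL);

    if (robust) {
        /* maxF = max(F(0), ..., F(N-1)) */
        size_t max_len = 0;
        for (size_t j = 0; j < n_frames; j++) {
            if (len[j] > max_len) {
                max_len = len[j];
            }
        }
        /* bit i of every frame long enough to have it, then bit i+1 */
        for (size_t i = 0; i < max_len; i++) {
            for (size_t j = 0; j < n_frames; j++) {
                if (i < len[j]) {
                    out[k++] = frames[j][i];
                }
            }
        }
    } else {
        /* all bits of frame 0, then all bits of frame 1, ... */
        for (size_t j = 0; j < n_frames; j++) {
            for (size_t i = 0; i < len[j]; i++) {
                out[k++] = frames[j][i];
            }
        }
    }
    return k;
}

/* Zero bits needed after k payload bits to reach an octet boundary:
 * equals S = 8 - k%8 when S < 8, and 0 when k is already aligned. */
static size_t padding_bits(size_t k)
{
    return (8 - k % 8) % 8;
}
```

   As a check against the single-frame payload of Section 7.1, the
   header, ToC entry, and 12.65 kbps frame occupy 7 + 6 + 253 = 266
   bits, so padding_bits(266) returns 6 and the payload is 34 octets.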
   The simple payload sorting algorithm is defined in C-style as:

      /* payload header */
      k = 0;
      for (i = 0; i < H; i++) {
          b(k++) = h(i);
      }
      /* table of contents: 6 bits per entry */
      for (j = 0; j < N; j++) {
          for (i = 0; i < 6; i++) {
              b(k++) = t(j,i);
          }
      }
      /* CRCs (C = 0 when no CRCs are present) */
      for (j = 0; j < N; j++) {
          for (i = 0; i < C; i++) {
              b(k++) = p(j,i);
          }
      }
      /* payload frames: all bits of frame 0, then frame 1, ... */
      for (j = 0; j < N; j++) {
          for (i = 0; i < F(j); i++) {
              b(k++) = f(j,i);
          }
      }
      /* padding with zeros up to the next octet boundary */
      S = 8 - k%8;
      if (S < 8) {
          for (i = 0; i < S; i++) {
              b(k++) = 0;
          }
      }

3.5. Decoding security consideration

   If the payload length calculated from the C, I, F, and FT fields
   does not match the actually received payload size, the payload
   should be dropped as erroneous. Decoding AMR-WB frames that are
   parsed based on erroneous header information could severely degrade
   speech quality.

4. RTP header usage

   The RTP header marker bit (M) is used to mark (M=1) the payloads
   containing the first speech frame after a CN period. For all other
   payloads, the marker bit is set to 0 (M=0).

   The timestamp corresponds to the sampling time of the first sample
   of the first encoded AMR-WB frame in the payload. A frame can be
   encoded speech, comfort noise parameters, LOST_FRAME, or
   NO_TRANSMISSION. The timestamp unit is one sample. The duration of
   one AMR-WB speech frame is 20 ms and the sampling frequency is
   16 kHz, corresponding to 320 speech samples per frame. Thus, the
   timestamp is increased by 320 for each consecutive frame. If the
   optional interleaving functionality is not used, all frames in a
   packet MUST be successive frames, stored in the same order as
   delivered by the AMR-WB speech encoder. If interleaving is employed,
   the frames encapsulated into a payload MUST be picked as defined in
   Section 3.1.2.

5. Congestion Control

   The need for congestion control for data transported with RTP has to
   be considered. AMR-WB speech data has some elastic properties due to
   the different bandwidth demands of each mode. Another parameter that
   can reduce the bandwidth demand of AMR-WB is the number of speech
   frames encapsulated in each payload, since this reduces the number
   of packets and the overhead from IP/UDP/RTP headers. If forward
   error correction (FEC) is used, its amount also needs to be
   regulated, so that the FEC itself does not worsen the congestion
   problem. Therefore, it is RECOMMENDED that applications using this
   payload format implement congestion control. The actual mechanism
   for congestion control is not specified, but should be suitable for
   real-time flows, e.g., [16].

6. Security Considerations

   RTP packets using the payload format defined in this specification
   are subject to the security considerations discussed in the RTP
   specification [10]. This implies that confidentiality of the media
   streams is achieved by encryption. Because the payload format is
   arranged end-to-end, encryption MAY be performed after encapsulation,
   so there is no conflict between the two operations.

   This payload type does not exhibit any significant non-uniformity in
   the receiver side computational complexity for packet processing
   that could cause a potential denial-of-service threat.

   As this format transports encoded speech data, the main security
   issues are confidentiality and authentication of the speech itself.
   Some other, smaller issues also exist. The payload format itself does
   not provide any support for security; these issues have to be solved
   by a payload-external mechanism.

6.1. Confidentiality

   To achieve confidentiality of the encoded speech, all speech data
   bits must be encrypted.
There is less need to encrypt the payload header or the frame header, as they only carry information about the requested AMR-WB mode, the AMR-WB frame type, and the frame quality. This information could be useful to some third party, e.g. for quality monitoring. The type of encryption used can affect not only confidentiality but also error robustness: there is no robustness against bit errors unless an encryption method without error propagation, e.g. a stream cipher, is used. This is an issue only when UEP/UED is used, i.e. when bit errors can be accepted in some part of the payload.

6.2. Authentication

To authenticate the sender of the speech, an external mechanism has to be added. It is recommended that such a mechanism protects all the speech data bits. Note that the use of UED/UEP is difficult to combine with authentication. To prevent a man in the middle from tampering with the packetization of the speech data, some extra data could be protected: the RTP timestamp, the RTP sequence number, and the RTP marker bit. Tampering could result in erroneous decapsulation/decoding that could lower speech quality. Tampering with the AMR-WB mode request field can cause the sender to receive speech in a different quality than desired.

7. Examples

7.1. Simple example

In the simple example, one AMR-WB frame is encapsulated into the payload. Simple payload sorting is used (S=0), no CRC fields are present (C=0), and interleaving is not used (I=0). The 23.05 kbps mode is requested for the reverse link (CMR=7), and the payload was not damaged at IP origin (Q=1). The frame is encoded in the 12.65 kbps mode (FT=2). The encoded speech bits are put into f(0...252) in descending sensitivity order according to [3].

      |                            Bit no.                            |
   Oct|   0       1       2       3       4       5       6       7   |
   ---+-------+-------+-------+-------+-------+-------+-------+-------+
    0 |  S=0  |  C=0  |  I=0  |   0   |   1   |   1   |   1   |  F=0  |
   ---+-------+-------+-------+-------+-------+-------+-------+-------+
    1 |   0   |   0   |   1   |   0   |  Q=1  | f(0)  | f(1)  |  ...  |
   ---+-------+-------+-------+-------+-------+-------+-------+-------+
   32 |  ...  |  ...  |  ...  |  ...  |  ...  |  ...  |f(249) |f(250) |
   ---+-------+-------+-------+-------+-------+-------+-------+-------+
   33 |f(251) |f(252) |   0   |   0   |   0   |   0   |   0   |   0   |
   ---+-------+-------+-------+-------+-------+-------+-------+-------+

   Figure 7: One AMR-WB frame per payload example.

7.2. Example with CRCs

In this example, two frames are transmitted in one payload. Simple payload sorting is used (S=0), CRC fields are present (C=1), and interleaving is not used (I=0). No mode request is sent (CMR=15), and neither of the frames is corrupted (Q=1). The payload contains one frame in the 14.25 kbps mode (FT=3) and one frame in the 15.85 kbps mode (FT=4). Bits p1(0...7) and p2(0...7) denote the CRC checksums for the first and second frames, respectively. The bits of the first frame are denoted by f1(0...284), and the bits of the second frame by f2(0...316).

      |                            Bit no.                            |
   Oct|   0       1       2       3       4       5       6       7   |
   ---+-------+-------+-------+-------+-------+-------+-------+-------+
    0 |  S=0  |  C=1  |  I=0  |   1   |   1   |   1   |   1   |  F=1  |
   ---+-------+-------+-------+-------+-------+-------+-------+-------+
    1 |   0   |   0   |   1   |   1   |  Q=1  |  F=0  |   0   |   1   |
   ---+-------+-------+-------+-------+-------+-------+-------+-------+
    2 |   0   |   0   |  Q=1  | p1(0) | p1(1) | p1(2) | p1(3) | p1(4) |
   ---+-------+-------+-------+-------+-------+-------+-------+-------+
    3 | p1(5) | p1(6) | p1(7) | p2(0) | p2(1) | p2(2) | p2(3) | p2(4) |
   ---+-------+-------+-------+-------+-------+-------+-------+-------+
    4 | p2(5) | p2(6) | p2(7) | f1(0) | f1(1) |  ...  |  ...  |  ...  |
   ---+-------+-------+-------+-------+-------+-------+-------+-------+
   39 |  ...  |  ...  |  ...  |  ...  |  ...  |  ...  |f1(283)|f1(284)|
   ---+-------+-------+-------+-------+-------+-------+-------+-------+
   40 | f2(0) | f2(1) |  ...  |  ...  |  ...  |  ...  |  ...  |  ...  |
   ---+-------+-------+-------+-------+-------+-------+-------+-------+
   79 |  ...  |  ...  |  ...  |f2(315)|f2(316)|   0   |   0   |   0   |
   ---+-------+-------+-------+-------+-------+-------+-------+-------+

   Figure 8: Example with two AMR-WB frames and CRCs.

7.3. Example with multiple frames per payload and robust sorting

In this example, two frames are transmitted in one payload with robust sorting (S=1). No CRC is used (C=0), interleaving is not used (I=0), and the 8.85 kbps mode is requested for the reverse link (CMR=1). Both frames are undamaged (Q=1), and the two frames in the payload are encoded in the 14.25 kbps (FT=3) and 15.85 kbps (FT=4) modes. The first frame is represented by f1(0...284) and the subsequent frame by f2(0...316).

      |                            Bit no.                            |
   Oct|   0       1       2       3       4       5       6       7   |
   ---+-------+-------+-------+-------+-------+-------+-------+-------+
    0 |  S=1  |  C=0  |  I=0  |   0   |   0   |   0   |   1   |  F=1  |
   ---+-------+-------+-------+-------+-------+-------+-------+-------+
    1 |   0   |   0   |   1   |   1   |  Q=1  |  F=0  |   0   |   1   |
   ---+-------+-------+-------+-------+-------+-------+-------+-------+
    2 |   0   |   0   |  Q=1  | f1(0) | f2(0) | f1(1) | f2(1) |  ...  |
   ---+-------+-------+-------+-------+-------+-------+-------+-------+
   73 |  ...  |f1(283)|f2(283)|f1(284)|f2(284)|f2(285)|f2(286)|  ...  |
   ---+-------+-------+-------+-------+-------+-------+-------+-------+
   77 |  ...  |  ...  |  ...  |f2(315)|f2(316)|   0   |   0   |   0   |
   ---+-------+-------+-------+-------+-------+-------+-------+-------+

   Figure 9: Example with two AMR-WB frames per payload and robust
   sorting.

8. The AMR-WB MIME type registration

This chapter defines the MIME type for the Adaptive Multi-Rate Wideband (AMR-WB) speech codec. AMR-WB implementations according to [1] MUST support all nine coding modes. Fast mode adaptation is supported by transmitting the mode information in-band together with the encoded speech data, which allows mode changes without any additional signaling. Furthermore, fast mode adaptation requires transmission of the codec mode request inside the payload.

In addition to the speech codec, the AMR-WB specifications also include Discontinuous Transmission / comfort noise (DTX/CN) functionality [4]. DTX/CN switches the transmission off during silent periods of the speech, and only SID frames containing CN parameter updates are sent at regular intervals. The AMR-WB DTX/CN functionality MUST also be supported.

The receiver may want to receive only a certain AMR-WB mode or a subset of AMR-WB modes due to link limitations in some cellular systems; e.g. the GSM/GERAN radio link can require that only a subset of AMR-WB modes is used. Therefore, it is possible to request a specific set of AMR-WB modes in the capability description, and the encoder MUST abide by this request. If no mode set is requested, any mode may be used or requested.

The AMR-WB codec can in principle perform a mode change at any time between any two modes. To support interoperability with GSM through a gateway, it is possible to set limitations for mode changes. The decoder can define the minimum number of frames between mode changes and can restrict mode changes to neighboring modes only.

The receiver can limit the number of AMR-WB frames encapsulated into one RTP packet, and if a maximum number of frames per packet is given in the capability description, the transmitter MUST comply with this limitation.
This is an OPTIONAL feature, and if no parameter is given in the capability description, the transmitter can encapsulate any number of AMR-WB speech frames into one RTP packet.

The payload CRC UED MUST only be used if the receiver has signaled support for this functionality in the capability description.

To enable unequal error protection and/or detection outside RTP, the payload format supports robust payload sorting. Robust payload sorting is an optional feature and MUST only be used if the receiver has signaled support for this functionality in the capability description.

The speech quality in case of packet losses when transmitting several AMR-WB frames per packet can be improved by using OPTIONAL frame interleaving. Interleaving improves perceived speech quality since it introduces a series of single frame errors instead of several consecutive frame errors. Interleaving MUST only be applied if the receiver has signaled support for it, and if used, the interleaving length MUST NOT exceed the limitation given in the capability description. Note that the receiver can use the MIME parameters to limit the increased buffering requirements caused by interleaving. For example, by specifying maxframes=N and interleaving=L, the maximum size of an interleave group would be N*(L+1) (see section 3.1.2 for details on interleaving).

8.1. MIME Registration

The MIME name for the AMR-WB codec is allocated from the IETF tree since AMR-WB is expected to be a widely used speech codec in VoIP applications.

Media Type name: audio

Media subtype name: AMR-WB

Required parameters: none

Optional parameters:
   mode-set: Requested AMR-WB mode set. Restricts the active codec mode set to a subset of all modes. Possible values are a comma-separated list of modes from 0 to 8 (see Table 1a of [3]; an example is given in section 8.2). If not present, all speech modes are available.
   mode-change-period: Defines a number N which restricts mode changes so that they are only allowed at multiples of N frames; the initial phase is arbitrary. If this parameter is not present, mode changes can happen at any time.
   mode-change-neighbor: If present, mode changes SHALL only be made to neighboring modes in the active codec mode set. If not present, changes between any two modes in the active codec mode set are allowed.
   maxframes: Maximum number of AMR-WB speech frames in one RTP packet. The receiver may set this parameter in order to limit the buffering requirements or delay.
   crc: If present, transmission of CRCs in the payload is supported; otherwise it is not supported.
   robust-sorting: If present, robust payload sorting is supported; otherwise it is not supported and simple payload sorting SHALL be used.
   interleaving: Indicates that frame interleaving is supported and defines a maximum value for the interleaving length field ILL (see section 3.1.2). If this parameter is not present, interleaving is not supported.

Encoding considerations: See section 3 of this document.

Security considerations: See chapter 6, "Security Considerations".

Public specification: Please refer to chapter 9, "References".

Person & email address to contact for further information:
   ari.lakaniemi@nokia.com
   pasi.s.ojala@nokia.com

Intended usage: COMMON. It is expected that many VoIP applications (as well as mobile applications) will use this type.

Author/Change controller:
   ari.lakaniemi@nokia.com
   pasi.s.ojala@nokia.com

8.2. Mapping to SDP Parameters

Parameters are mapped to SDP [11] as usual. Example usage in SDP:

   m=audio 49120 RTP/AVP 97
   a=rtpmap:97 AMR-WB/16000
   a=fmtp:97 mode-set=2,3,4,5,6; maxframes=1

9. References

[1] 3GPP TS 26.190, "AMR Wideband speech codec; Transcoding functions".
[2] 3GPP TS 26.090, "AMR speech codec; Transcoding functions".

[3] 3GPP TS 26.201, "AMR Wideband speech codec; Frame Structure".

[4] 3GPP TS 26.193, "AMR Wideband Speech Codec; Source Controlled Rate operation".

[5] 3GPP TS 26.192, "AMR Wideband speech codec; Comfort Noise aspects".

[6] 3GPP TS 26.194, "AMR Wideband speech codec; Voice Activity Detector (VAD)".

[7] 3GPP TS 26.191, "AMR Wideband speech codec; Error concealment of lost frames".

[8] 3GPP TS 25.415, "UTRAN Iu Interface User Plane Protocols".

[9] IETF RFC 2119, "Key words for use in RFCs to Indicate Requirement Levels".

[10] IETF RFC 1889, "RTP: A Transport Protocol for Real-Time Applications".

[11] IETF RFC 2327, "SDP: Session Description Protocol", April 1998.

[12] IETF draft-ietf-avt-rtp-amr-03.txt, "RTP payload format for AMR", work in progress.

[13] IETF draft-westberg-realtime-cellular-01.txt, "Realtime Traffic over Cellular Access Networks", work in progress.

[14] IETF draft-larzon-udplite-03.txt, "The UDP Lite Protocol", work in progress.

[15] IETF draft-ietf-avt-ulp-00.txt, "An RTP Payload Format for Generic FEC with Uneven Level Protection", work in progress.

[16] S. Floyd, M. Handley, J. Padhye, J. Widmer, "Equation-Based Congestion Control for Unicast Applications", ACM SIGCOMM 2000, Stockholm, Sweden.

[17] 3GPP TS 26.202, "AMR Wideband speech codec; Interface to Iu and Uu".

10. Authors' addresses

Ari Lakaniemi
Nokia Research Center
P.O.Box 407
FIN-00045 Nokia Group
Finland
E-mail: ari.lakaniemi@nokia.com

Pasi Ojala
Nokia Research Center
P.O.Box 100
FIN-33721 Tampere
Finland
E-mail: pasi.s.ojala@nokia.com

Johan Sjöberg
Ericsson Research
Ericsson Radio System AB
Torshamsgatan 23
SE-164 80 Stockholm
SWEDEN
E-mail: johan.sjoberg@ericsson.com

Magnus Westerlund
Ericsson Research
Ericsson Radio System AB
Torshamsgatan 23
SE-164 80 Stockholm
SWEDEN
E-mail: magnus.westerlund@ericsson.com

This Internet-Draft expires on August 23, 2001.