idnits 2.17.1 draft-sjoberg-avt-rtp-amrwbplus-01.txt: -(192): Line appears to be too long, but this could be caused by non-ascii characters in UTF-8 encoding -(240): Line appears to be too long, but this could be caused by non-ascii characters in UTF-8 encoding Checking boilerplate required by RFC 5378 and the IETF Trust (see https://trustee.ietf.org/license-info): ---------------------------------------------------------------------------- ** Looks like you're using RFC 2026 boilerplate. This must be updated to follow RFC 3978/3979, as updated by RFC 4748. Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt: ---------------------------------------------------------------------------- ** The document seems to lack a 1id_guidelines paragraph about 6 months document validity -- however, there's a paragraph with a matching beginning. Boilerplate error? ** The document seems to lack a 1id_guidelines paragraph about the list of current Internet-Drafts -- however, there's a paragraph with a matching beginning. Boilerplate error? == There are 3 instances of lines with non-ascii characters in the document. Checking nits according to https://www.ietf.org/id-info/checklist : ---------------------------------------------------------------------------- ** There is 1 instance of too long lines in the document, the longest one being 1 character in excess of 72. == There are 2 instances of lines with non-RFC6890-compliant IPv4 addresses in the document. If these are example addresses, they should be changed. Miscellaneous warnings: ---------------------------------------------------------------------------- == The copyright year in the RFC 3978 Section 5.4 Copyright Line does not match the current year == Line 217 has weird spacing: '... stereo bits ...' == Line 1382 has weird spacing: '...for the purpo...' == Using lowercase 'not' together with uppercase 'MUST', 'SHALL', 'SHOULD', or 'RECOMMENDED' is not an accepted usage according to RFC 2119. 
Please use uppercase 'NOT' together with RFC 2119 keywords (if that is what you mean). Found 'SHALL not' in this paragraph: crc: Permissible values are 0 and 1. If 1, frame CRCs SHALL be included in the payload, otherwise not. If 0 or if not present, CRCs SHALL not be included. == Using lowercase 'not' together with uppercase 'MUST', 'SHALL', 'SHOULD', or 'RECOMMENDED' is not an accepted usage according to RFC 2119. Please use uppercase 'NOT' together with RFC 2119 keywords (if that is what you mean). Found 'SHALL not' in this paragraph: interleaving: Indicates that frame level interleaving SHALL be used for the session and its value defines the maximum number of frame allowed in an interleaving group (see Section 4.3.1). If this parameter is not present, interleaving SHALL not be used. -- The document seems to lack a disclaimer for pre-RFC5378 work, but may have content which was first submitted before 10 November 2008. If you have contacted all the original authors and they are all willing to grant the BCP78 rights to the IETF Trust, then this is fine, and you can ignore this comment. If not, you may need to add the pre-RFC5378 disclaimer. (See the Legal Provisions document at https://trustee.ietf.org/license-info for more information.) -- The document date (February 13, 2004) is 7378 days in the past. Is this intentional? Checking references for intended status: Proposed Standard ---------------------------------------------------------------------------- (See RFCs 3967 and 4897 for information about using normative references to lower-maturity documents in RFCs) == Unused Reference: '1' is defined on line 1265, but no explicit reference was found in the text -- Possible downref: Non-RFC (?) normative reference: ref. '1' -- Possible downref: Non-RFC (?) normative reference: ref. '2' -- Possible downref: Non-RFC (?) normative reference: ref. '5' -- Possible downref: Non-RFC (?) normative reference: ref. '6' ** Obsolete normative reference: RFC 2327 (ref. 
'7') (Obsoleted by RFC 4566) ** Obsolete normative reference: RFC 3267 (ref. '9') (Obsoleted by RFC 4867) -- Obsolete informational reference (is this intentional?): RFC 3448 (ref. '11') (Obsoleted by RFC 5348) -- Obsolete informational reference (is this intentional?): RFC 2733 (ref. '14') (Obsoleted by RFC 5109) Summary: 6 errors (**), 0 flaws (~~), 8 warnings (==), 8 comments (--). Run idnits with the --verbose option for more detailed information about the items above. -------------------------------------------------------------------------------- 2 Network Working Group Johan Sjoberg 3 INTERNET-DRAFT Magnus Westerlund 4 Category: Standards Track Ericsson 5 Expires: August 2004 Ari Lakaniemi 6 Nokia 7 February 13, 2004 9 Real-Time Transport Protocol (RTP) Payload Format for Adaptive Multi- 10 Rate Wideband plus (AMR-WB+) Audio Codec 11 13 Status of this memo 15 This document is an Internet-Draft and is in full conformance with 16 all provisions of Section 10 of RFC2026. 18 Internet-Drafts are working documents of the Internet Engineering 19 Task Force (IETF), its areas, and its working groups. Note that other 20 groups may also distribute working documents as Internet-Drafts. 22 Internet-Drafts are draft documents valid for a maximum of six months 23 and may be updated, replaced, or obsoleted by other documents at any 24 time. It is inappropriate to use Internet-Drafts as reference 25 material or cite them other than as "work in progress". 27 The list of current Internet-Drafts can be accessed at 28 http://www.ietf.org/ietf/lid-abstracts.txt 30 The list of Internet-Draft Shadow Directories can be accessed at 31 http://www.ietf.org/shadow.html 33 This document is an individual submission to the IETF. Comments 34 should be directed to the authors. 36 Copyright Notice 38 Copyright (C) The Internet Society (2004). All Rights Reserved. 
40 Abstract 42 This document specifies a real-time transport protocol (RTP) payload 43 format to be used for Adaptive Multi-Rate Wideband plus (AMR-WB+) 44 encoded audio signals. The AMR-WB+ codec is an audio extension of the 45 AMR-WB codec providing additional modes designed to give higher 46 quality of music and speech than the original modes. The payload 47 format is designed according to the principles outlined in the 48 existing payload formats for AMR and AMR-WB, RFC3267. A MIME type 49 registration is included for AMR-WB+. 51 TABLE OF CONTENTS
53 1. Definitions
54    1.1. Glossary
55    1.2. Terminology
56 2. Introduction
57 3. Background on AMR-WB+ and Design Principles
58    3.1. The AMR-WB+ Audio Codec
59    3.2. Multi-rate Encoding and Mode Adaptation
60    3.3. Voice Activity Detection and Discontinuous Transmission
61    3.4. Support for Multi-Channel Session
62    3.5. Unequal Bit-error Detection and Protection
63       3.5.1. Applying UEP and UED in an IP Network
64    3.6. Robustness against Packet Loss
65       3.6.1. Use of Forward Error Correction (FEC)
66       3.6.2. Use of Frame Interleaving
67    3.7. AMR-WB+ Audio over IP scenarios
68 4. RTP Payload Format for AMR-WB+
69    4.1. RTP Header Usage
70    4.2. Payload Structure
71    4.3. Payload definitions
72       4.3.1. The Payload Header
73       4.3.2. The Payload Table of Contents and Frame CRCs
74       4.3.3. Audio Data
75       4.3.4. Methods for Forming the Payload
76       4.3.5. Payload Examples
77    4.4. Implementation Considerations
78 5. Congestion Control
79 6. Security Considerations
80    6.1. Confidentiality
81    6.2. Authentication
82    6.3. Decoding Validation
83 7. Payload Format Parameters
84    7.1. MIME Registration
85    7.2. Mapping MIME Parameters into SDP
86       7.2.1. Offer-Answer Model Considerations
87       7.2.2. Examples
88 8. IANA Considerations
89 9. Acknowledgements
90 10. References
91    10.1. Normative references
92    10.2. Informative References
93 11. Authors' Addresses
94 12. IPR Notice
95 13. Copyright Notice
96 1. Definitions 98 1.1.
Glossary
100 3GPP - the Third Generation Partnership Project
101 AMR - Adaptive Multi-Rate Codec
102 AMR-WB - Adaptive Multi-Rate Wideband Codec
103 AMR-WB+ - Adaptive Multi-Rate Wideband plus Codec
104 CMR - Codec Mode Request
105 CN - Comfort Noise
106 DTX - Discontinuous Transmission
107 FEC - Forward Error Correction
108 SCR - Source Controlled Rate Operation
109 SID - Silence Indicator (the frames containing only CN 110 parameters)
111 VAD - Voice Activity Detection
112 UED - Unequal Error Detection
113 UEP - Unequal Error Protection
115 1.2. Terminology 117 The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", 118 "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this 119 document are to be interpreted as described in RFC 2119 [3]. 121 2. Introduction 123 This document specifies the payload format for packetization of AMR- 124 WB+ encoded audio signals into the Real-time Transport Protocol (RTP) 125 [4]. The payload format supports transmission of multiple channels 126 according to the mode definition (modes are mono or stereo modes), 127 multiple frames per payload, and robustness against packet loss and 128 bit errors. 130 Background on AMR-WB+ and design principles can be found in Section 131 3. The payload format itself is specified in Section 4 and follows 132 the principles used in [4], [8], and [9]. In Section 7, a MIME type 133 registration is provided. 135 The intention with this RTP payload format definition is to closely 136 follow the payload format definitions of AMR and AMR-WB [9]. 137 However, AMR-WB+ has a couple of features not available in AMR or 138 AMR-WB. The new features are that not all modes have the same 139 sampling rate, and that each mode is either a mono or a stereo mode. On the 140 other hand, AMR-WB+ is intended to use IP transport, and this removes 141 the need for interworking with other transport networks. 143 The bandwidth efficient mode defined in [9] is not specified for AMR- 144 WB+.
AMR-WB+ will mainly be used in streaming scenarios, where 145 the benefit of using an octet-aligned format to decrease the 146 complexity of the server is large. The bandwidth saved by using the 147 bandwidth efficient mode would also be very small for all extension 148 modes. 150 The inbuilt codec support for stereo encoding makes the 151 implementation of multi-channel support difficult, but also less 152 needed. Therefore the multi-channel support is removed from this 153 payload format compared to the AMR and AMR-WB payload formats. 155 There is no file format for AMR-WB+ defined within this 156 specification. Instead the 3GPP defined ISO based 3GP file format 157 [18] will support AMR-WB+, and provides all functionality needed from a 158 file format. This format also supports storage of AMR and AMR- 159 WB, plus other multi-media formats allowing for synchronized 160 playback. As the 3GP format provides much greater capability than 161 the previously defined formats for AMR and AMR-WB, this format is 162 expected to be used and be sufficient for all use cases. 164 3. Background on AMR-WB+ and Design Principles 166 The Adaptive Multi-Rate Wideband plus (AMR-WB+) audio codec is designed for 167 encoding and transport of speech and low bit-rate audio with good 168 quality. The codec is being specified by 3GPP, and the primary target 169 applications within 3GPP are packet switched streaming (PSS) [17] and 170 multimedia messaging (MMS) services. However, due to its flexibility 171 and robustness, AMR-WB+ is very well suited for streaming services in 172 highly varying transport environments, e.g. the Internet. 174 Because of the flexibility of this codec, the behavior in a 175 particular application is controlled by several parameters that 176 select options or specify the acceptable values for a variable.
These 177 options and variables are described in general terms at appropriate 178 points in the text of this specification as parameters to be 179 established through out-of-band means. In Section 7, all of the 180 parameters are specified in the form of MIME subtype registrations 181 for the AMR-WB+ encoding. The method used to signal these parameters 182 at session setup or to arrange prior agreement of the participants is 183 beyond the scope of this document; however, Section 7 provides a 184 mapping of the parameters into the Session Description Protocol (SDP) 185 [7] for those applications that use SDP. 187 Note that the AMR-WB+ design and specification work in 3GPP is still 188 work in progress. The target is to finalize the codec specifications 189 within the 3GPP Release 6 timeline; the release will be frozen in June 2004 190 at the earliest. Because the codec work is unfinished, 191 some of the issues discussed in this internet-draft are still subject 192 to change, but the draft presents the situation according to the authors' 193 best knowledge at the time of writing. 195 3.1. The AMR-WB+ Audio Codec 197 The AMR-WB+ audio codec was originally developed by 3GPP to be used 198 for streaming and messaging services in GSM and 3G cellular systems. 199 AMR-WB+ is designed as an audio extension to the AMR-WB speech codec. 200 Thus, it includes the nine coding modes specified for AMR-WB, 201 extended with four new modes with bit rates ranging from 14 to 24 202 kbit/s. Whereas the AMR-WB modes employ a 16000 Hz sampling frequency 203 and operate on a monophonic signal in all modes, the extension modes 204 operate at sampling rates of 16000, 24000 or 32000 Hz, and the input 205 signal can be either monophonic or stereophonic audio, depending on 206 the mode. The audio processing is performed on equal-size frames; the 207 transport frames correspond to a duration of 20 ms.
This means that each 208 AMR-WB+ transport frame represents 320, 480 or 640 audio samples for 209 each channel, depending on the employed sampling frequency. 211 The AMR-WB+ codec includes four extension modes in addition to the 212 AMR-WB modes, as introduced in Table 1 below. However, since the 213 codec design work is still going on, the final specification may 214 include a different set of modes.
216                         Sampling    Mono/   Number of       Number of
217 Index  Mode             rate [kHz]  stereo  bits per frame  class A bits
218 --------------------------------------------------------------------
219   0    WB 6.60 kbps     16          mono    132             54
220   1    WB 8.85 kbps     16          mono    177             64
221   2    WB 12.65 kbps    16          mono    253             72
222   3    WB 14.25 kbps    16          mono    285             72
223   4    WB 15.85 kbps    16          mono    317             72
224   5    WB 18.25 kbps    16          mono    365             72
225   6    WB 19.85 kbps    16          mono    397             72
226   7    WB 23.05 kbps    16          mono    461             72
227   8    WB 23.85 kbps    16          mono    477             72
228   9    WB SID           16          mono    40              40
229  10    WB+ 14 kbps      16          mono    280             ??
230  11    WB+ 18 kbps      16/24       stereo  360             ??
231  12    WB+ 24 kbps      16/24       mono    480             ??
232  13    WB+ 24 kbps      16/24       stereo  480             ??
233  14    LOST_SPEECH      -           -       0
234  15    NO_DATA          -           -       0
236 Table 1: AMR-WB+ modes. NOTE! THIS TABLE WILL BE REPLACED BY A 237 REFERENCE TO THE APPROPRIATE 3GPP SPECIFICATION AS SOON AS IT IS 238 AVAILABLE. 240 Note that modes with index in the range 0 - 9 are the same as defined 241 for AMR-WB in [9], and modes with index in the range 10 - 13 are the 242 extension modes. 244 3.2. Multi-rate Encoding and Mode Adaptation 246 The multi-rate encoding (i.e., multi-mode) capability of AMR-WB+ is 247 designed for preserving high audio quality under a wide range of 248 bandwidth requirements and transmission conditions. 250 AMR-WB+ enables seamless switching between modes using the same 251 number of audio channels and the same sampling frequency. Every AMR- 252 WB+ codec implementation is required to support all the respective 253 audio coding modes defined by the codec and must be able to handle 254 mode switching between any two modes.
Switching between modes 255 employing a different number of audio channels or a different sampling 256 frequency is possible, but it requires the receiver to be equipped 257 with the necessary processing capabilities to handle the changed 258 characteristics of the incoming audio stream. It is therefore not 259 recommended, because it is likely to cause severe audio quality 260 problems if not handled properly. 262 3.3. Voice Activity Detection and Discontinuous Transmission 264 AMR-WB+ supports the same algorithms for voice activity detection 265 (VAD) and generation of comfort noise (CN) parameters during silence 266 periods as used by the AMR-WB codec. Hence, the AMR-WB+ codec also 267 has the option to reduce the number of transmitted bits and packets 268 during silence periods to a minimum. The operation of sending CN 269 parameters at regular intervals during silence periods is usually 270 called discontinuous transmission (DTX) or source controlled rate 271 (SCR) operation. The AMR-WB+ frames containing CN parameters are 272 called Silence Indicator (SID) frames. More details about VAD and 273 DTX functionality can be found in [5] and [6]. 275 3.4. Support for Multi-Channel Session 277 Some of the AMR-WB+ modes support encoding of stereophonic audio. 278 Because of this native support for two-channel stereophonic signals, it 279 does not seem necessary to support multi-channel transport with 280 separate codecs as done in the AMR-WB RTP payload format [9]. However, to 281 make the signalling of channels explicit, a sender of AMR-WB+ must 282 use separate RTP payload types for mono and stereo modes. A reason 283 for having the number of channels present at RTP level is that the 284 codec external requirements are different, i.e. the playback 285 facilities of a receiver need to handle stereo or mono signals.
287 This will not make switching between mono and stereo any more 288 different as payload type switching can be done without problems 289 since the same RTP timestamp rate is used in both cases. 291 3.5. Unequal Bit-error Detection and Protection 293 The audio bits encoded in each AMR-WB+ frame have different 294 perceptual sensitivity to bit errors. This property can be exploited 295 e.g. in cellular systems to achieve better voice quality by using 296 unequal error protection and detection (UEP and UED) mechanisms. 298 The UEP/UED mechanisms focus the protection and detection of 299 corrupted bits to the perceptually most sensitive bits in an AMR-WB+ 300 frame. In particular, audio bits in an AMR-WB+ frame are divided into 301 classes A and B, where bits in class A are most sensitive, while 302 class B bits can tolerate some errors with only minor degradations in 303 the speech quality. [NOTE: reference to appropriate 3GPP 304 specification will be added as soon as it is available] A frame is 305 only declared damaged if there are bit errors found in the most 306 sensitive bits, i.e., the class A bits. On the other hand, it is 307 acceptable to have some bit errors in the other bits, i.e. class B 308 bits. 310 Moreover, a damaged frame is still useful for error concealment at 311 the decoder since some of the less sensitive bits can still be used. 312 This approach can improve the audio quality compared to discarding 313 the damaged frame. 315 3.5.1. Applying UEP and UED in an IP Network 317 To take full advantage of the bit-error robustness of the AMR-WB+ 318 codec, the RTP payload format is designed to facilitate UEP/UED in an 319 IP network. It should be noted however that the utilization of UEP 320 and UED discussed below is OPTIONAL. 322 UEP/UED in an IP network can be achieved by detecting bit errors in 323 class A bits and tolerating bit errors in class B bits of the AMR-WB+ 324 frame(s) in each RTP payload. 
326 Today there exist some link layers that do not discard packets with 327 bit errors, e.g., SLIP and some wireless links. With the Internet 328 traffic pattern shifting towards a more multimedia-centric one, more 329 link layers of such nature may emerge in the future. With transport 330 layer support for partial checksums, for example those supported by 331 UDP-Lite [10], bit error tolerant AMR-WB+ traffic could achieve 332 better performance over these types of links. 334 There are at least two basic approaches for carrying AMR-WB+ traffic 335 over bit error tolerant IP networks: 337 1) Utilizing a partial checksum to cover headers and the most 338 important audio bits of the payload. At least all class A bits 339 should be covered by the checksum, since the bits of the extension 340 modes are not sorted in sensitivity order but just classified in 341 class A and B bits. 343 2) Utilizing a partial checksum to only cover headers, but a frame 344 CRC to cover the class A bits of each audio frame in the RTP 345 payload. 347 In either approach, at least part of the class B bits are left 348 without error-check and thus bit error tolerance is achieved. 350 The application interface to the UEP/UED transport protocol (e.g., 351 UDP-Lite) may not provide any control over the link error rate. 352 Therefore, it is incumbent upon the designer of a node with a link 353 interface of this type to choose a residual bit error rate that is 354 low enough to support applications such as AMR-WB+ encoding when 355 transmitting packets of a UEP/UED transport protocol. 357 Approach 1 is a bit efficient, flexible and simple way, but comes 358 with two disadvantages, namely, a) bit errors in protected audio bits 359 will cause the payload to be discarded, and b) when transporting 360 multiple frames in a payload there is the possibility that a single 361 bit error in protected bits will cause all the frames to be 362 discarded. 
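The two disadvantages of Approach 1 can be made concrete with a little arithmetic. The following Python sketch is illustrative only and is not part of the payload format. It assumes independent bit errors at a fixed residual bit error rate, ignores header and CRC bits, and assumes that every frame in a payload uses the same mode. The function names are hypothetical. The figure of 72 class A bits is taken from Table 1 for the WB 12.65 kbps mode.

```python
def p_any_error(ber, bits):
    """Probability that at least one of `bits` bits is in error,
    assuming independent bit errors at rate `ber` (an assumption)."""
    return 1.0 - (1.0 - ber) ** bits


def expected_frames_lost_approach1(ber, class_a_bits, frames):
    """Approach 1: one partial checksum covers the class A bits of every
    frame, so a single error in any protected bit discards the whole payload."""
    protected_bits = class_a_bits * frames
    return frames * p_any_error(ber, protected_bits)


def expected_frames_damaged_approach2(ber, class_a_bits, frames):
    """Approach 2: each frame has its own CRC over its class A bits, so an
    error only marks the affected frame as damaged (it can still be used
    for error concealment)."""
    return frames * p_any_error(ber, class_a_bits)


if __name__ == "__main__":
    ber = 1e-4          # hypothetical residual bit error rate
    class_a_bits = 72   # WB 12.65 kbps, from Table 1
    for frames in (1, 2, 4, 8):
        a1 = expected_frames_lost_approach1(ber, class_a_bits, frames)
        a2 = expected_frames_damaged_approach2(ber, class_a_bits, frames)
        print(frames, a1, a2)
```

With one frame per payload the two figures coincide; the gap grows with the number of frames per payload, which is disadvantage b) above.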
364 These disadvantages can be avoided, if needed, with some overhead in 365 the form of a frame-wise CRC (Approach 2). In problem a), the CRC 366 makes it possible to detect bit errors in class A bits and use the 367 frame for error concealment, which gives a small improvement in audio 368 quality. For b), when transporting multiple frames in a payload, the 369 CRCs remove the possibility that a single bit error in a class A bit 370 will cause all the frames to be discarded. Avoiding that gives an 371 improvement in audio quality when transporting multiple frames over 372 links subject to bit errors. 374 The choice between the above two approaches must be made based on the 375 available bandwidth, and desired tolerance to bit errors. Neither 376 solution is appropriate to all cases. Section 7 defines parameters 377 that may be used at session setup to select between these approaches. 379 3.6. Robustness against Packet Loss 381 The payload format supports several means, including forward error 382 correction (FEC) and frame interleaving, to increase robustness 383 against packet loss. 385 3.6.1. Use of Forward Error Correction (FEC) 387 The simple scheme of repetition of previously sent data is one way of 388 achieving FEC. Another possible scheme which can be more bandwidth 389 efficient is to use payload external FEC, e.g., RFC2733 [14], which 390 generates extra packets containing repair data. The whole payload can 391 also be sorted in sensitivity order to support external FEC schemes 392 using UEP. There is also a work in progress on a generic version of 393 such a scheme [12] that can be applied to AMR-WB+ payload transport. 395 For the AMR-WB+ extension modes, it is only possible to use the codec 396 to send redundant copies of the same mode. We describe such a scheme 397 next. 399 This involves the simple retransmission of previously transmitted 400 frames together with the current frame(s). 
This is done by using a 401 sliding window to group the audio frames to send in each payload. 402 Figure 1 below shows us an example.
404 --+--------+--------+--------+--------+--------+--------+--------+--
405   | f(n-2) | f(n-1) |  f(n)  | f(n+1) | f(n+2) | f(n+3) | f(n+4) |
406 --+--------+--------+--------+--------+--------+--------+--------+--
408    <---- p(n-1) ---->
409             <----- p(n) ----->
410                      <---- p(n+1) ---->
411                               <---- p(n+2) ---->
412                                        <---- p(n+3) ---->
413                                                 <---- p(n+4) ---->
415 Figure 1: An example of redundant transmission. 417 In this example each frame is retransmitted one time in the following 418 RTP payload packet. Here, f(n-2)..f(n+4) denotes a sequence of audio 419 frames and p(n-1)..p(n+4) a sequence of payload packets. 421 The use of this approach does not require signaling at the session 422 setup. In other words, the audio sender can choose to use this scheme 423 without consulting the receiver. This is because a packet containing 424 redundant frames will not look different from a packet with only new 425 frames. The receiver may receive multiple copies or versions (encoded 426 with different modes) of a frame for a certain timestamp if no packet 427 is lost. If multiple versions of the same audio frame are received, 428 it is recommended that the mode with the highest rate be used by the 429 audio decoder. 431 This redundancy scheme provides the same functionality as the one 432 described in RFC 2198 "RTP Payload for Redundant Audio Data" [15]. In 433 most cases the mechanism in this payload format is more efficient and 434 simpler than requiring both endpoints to support RFC 2198 in 435 addition.
There are two situations in which use of RFC 2198 is 436 indicated: if the spread in time required between the primary and 437 redundant encodings is larger than 5 frame times, the bandwidth 438 overhead of RFC 2198 will be lower; or, if some other codec than AMR- 439 WB+ is desired for the redundant encoding, the AMR-WB+ payload format 440 cannot carry it. 442 The sender is responsible for selecting an appropriate amount of 443 redundancy based on feedback about the channel, e.g., in RTCP 444 receiver reports. The sender is also responsible for avoiding 445 congestion, which may be exacerbated by redundancy (see Section 5 for 446 more details). 448 3.6.2. Use of Frame Interleaving 450 To decrease protocol overhead, the payload design allows several 451 audio frames to be encapsulated into a single RTP packet. One of the 452 drawbacks of such an approach is that a packet loss then 453 means the loss of several consecutive audio frames, which usually causes 454 clearly audible distortion in the reconstructed audio. Interleaving 455 of frames can improve the audio quality in such cases by distributing 456 the consecutive losses into a series of single frame losses. 457 However, interleaving and bundling several frames per payload will 458 also increase end-to-end delay and is therefore not appropriate for 459 all usage scenarios. Nevertheless, streaming applications will most likely 460 be able to exploit interleaving to improve audio quality in lossy 461 transmission conditions. 463 This payload design supports the use of frame interleaving as an 464 option. For the encoder (audio sender) to use frame interleaving in 465 its outbound RTP packets for a given session, the decoder (audio 466 receiver) needs to indicate its support via out-of-band means (see 467 Section 7). 469 3.7.
AMR-WB+ Audio over IP scenarios 471 Since the primary target for the AMR-WB+ codec is packet switched 472 streaming, the most relevant usage scenario for this payload format 473 is IP end-to-end between a server and a terminal, as shown in 474 Figure 2.
476 +----------+                          +----------+
477 |          |    IP/UDP/RTP/AMR-WB+    |          |
478 |  SERVER  |<------------------------>| TERMINAL |
479 |          |                          |          |
480 +----------+                          +----------+
482 Figure 2: Server to terminal IP scenario 484 4. RTP Payload Format for AMR-WB+ 486 The AMR-WB+ payload format has the same structure as the AMR 487 and AMR-WB payload formats [9]. The differences are that the number 488 of modes is extended compared to the original AMR-WB format and that 489 some features are removed. The motivation for the reduced 490 functionality is that only IP transport is expected for AMR-WB+, i.e. 491 functionality used for gateway scenarios is removed. The payload 492 format consists of the RTP header, payload header and payload data. 494 Since the AMR-WB speech modes are included in the AMR-WB+ codec, an 495 end-point supporting AMR-WB+ is in principle also able to support the 496 AMR-WB payload format and MIME subtype. To enable communication with 497 an end-point supporting only AMR-WB coding, an AMR-WB+ end-point SHOULD also 498 indicate its capability to communicate using the AMR-WB MIME subtype and 499 RTP payload format to facilitate interoperability. However, it should 500 be noted that this is not possible in all scenarios: e.g. when the AMR- 501 WB+ RTP payload format is used for streaming audio that is stored at 502 a server, it is not possible to transform data stored using one of the 503 AMR-WB+ extension modes into one of the AMR-WB modes without full 504 transcoding. A similar scenario occurs with messaging services where 505 the message containing AMR-WB+ audio is pre-stored at a messaging 506 server. On the other hand, e.g.
in a live streaming scenario an AMR-WB+ 507 end-point might have the possibility to limit its operation to AMR-WB 508 modes only. 510 4.1. RTP Header Usage 512 The format of the RTP header is specified in [4]. This payload 513 format uses the fields of the header in a manner consistent with that 514 specification. 516 The RTP timestamp corresponds to the sampling instant of the first 517 sample encoded for the first frame in the packet. The timestamp 518 clock frequency SHALL be 96000 Hz, the lowest frequency that is an 519 integer multiple of the sampling frequencies used by any of the AMR- 520 WB+ modes. 522 The duration of one AMR-WB+ audio transport frame is 20 ms. The 523 sampling frequency is either 16 kHz, 24 kHz, or 32 kHz, which corresponds 524 to 320, 480, or 640 encoded audio samples per frame from each channel 525 and to a timestamp increase of 6x320, 4x480, or 3x640, all 526 equal to 1920 timestamp units per frame. A packet MAY contain 527 multiple frames of encoded audio or comfort noise parameters. If 528 interleaving is employed, the frames encapsulated into a payload are 529 picked according to the interleaving rules as defined in Section 530 4.3.1. Otherwise, each packet covers a period of one or more 531 contiguous 20 ms frames. 533 To allow for error resiliency through redundant transmission, the 534 periods covered by multiple packets MAY overlap in time. A receiver 535 MUST be prepared to receive any audio frame multiple times; all 536 multiply sent frames MUST use the same mode. 538 The payload is always made an integral number of octets long by 539 padding with zero bits if necessary. If additional padding is 540 required to bring the payload length to a larger multiple of octets 541 or for some other purpose, then the P bit in the RTP header 542 MAY be set and padding appended as specified in [4].
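The timestamp arithmetic of this section can be checked with a short Python sketch. It is illustrative only: the function names are hypothetical, and frame_timestamp() assumes a non-interleaved packet whose frames are consecutive 20 ms frames. The 32-bit wrap-around of the RTP timestamp follows from the RTP header format in [4].

```python
from functools import reduce
from math import gcd

RTP_CLOCK_HZ = 96000                      # timestamp clock rate (SHALL be 96000 Hz)
FRAME_DURATION_MS = 20                    # one AMR-WB+ transport frame
SAMPLING_RATES_HZ = (16000, 24000, 32000)


def lowest_common_clock(rates):
    """Lowest frequency that is an integer multiple of every rate (the LCM)."""
    return reduce(lambda a, b: a * b // gcd(a, b), rates)


def samples_per_frame(sampling_hz):
    """Audio samples per channel in one 20 ms frame."""
    return sampling_hz * FRAME_DURATION_MS // 1000


def timestamp_units_per_frame(sampling_hz):
    """Timestamp increase per frame: samples times clock ticks per sample."""
    ticks_per_sample = RTP_CLOCK_HZ // sampling_hz   # 6, 4 or 3
    return samples_per_frame(sampling_hz) * ticks_per_sample


def frame_timestamp(packet_timestamp, frame_index):
    """Timestamp of the frame at position frame_index (0-based) in a packet,
    assuming consecutive, non-interleaved frames (an assumption)."""
    step = RTP_CLOCK_HZ * FRAME_DURATION_MS // 1000  # 1920 units
    return (packet_timestamp + frame_index * step) % 2**32


if __name__ == "__main__":
    print(lowest_common_clock(SAMPLING_RATES_HZ))
    for rate in SAMPLING_RATES_HZ:
        print(rate, samples_per_frame(rate), timestamp_units_per_frame(rate))
```

Because every mode advances the timestamp by the same 1920 units per frame, a receiver can place frames on a common timeline without knowing the sampling rate of each mode in advance.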
544 The RTP header marker bit (M) SHALL be set to 1 if the first frame 545 carried in the packet is the first audio frame in 546 a talkspurt. For all other packets the marker bit SHALL be set to 547 zero (M=0). 549 The assignment of an RTP payload type for this new packet format is 550 outside the scope of this document, and will not be specified here. 551 It is expected that the RTP profile under which this payload format 552 is being used will assign a payload type for this encoding or specify 553 that the payload type is to be bound dynamically. 555 An RTP payload type MUST carry either mono or stereo encoded AMR-WB+ 556 frames, not both. If both mono and stereo are to be sent by an application, two 557 different payload types must be used. Switching between mono and 558 stereo modes MAY be done by switching payload types, if the necessary extra processing is available 559 in the receiver (see Section 3.2). 562 4.2. Payload Structure 564 The complete payload consists of a payload header, a payload table of 565 contents, and audio data representing one or more audio frames. The 566 following diagram shows the general payload format layout: 568 +----------------+-------------------+---------------- 569 | payload header | table of contents | audio data ... 570 +----------------+-------------------+---------------- 572 Payloads containing more than one audio frame are called compound 573 payloads. 575 The following sections describe the variations taken by the payload 576 format depending on whether the AMR-WB+ session is set up to use any 577 of the OPTIONAL functions for robust sorting, interleaving, and frame 578 CRCs. 580 4.3. Payload Definitions 582 4.3.1.
The Payload Header 584 The payload header consists of a 4 bit CMR, 4 reserved bits, and 585 optionally, an 8 bit interleaving header, as shown below: 587 0 1 588 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 589 +-+-+-+-+-+-+-+-+- - - - - - - - 590 | CMR |R|R|R|R| ILL | ILP | 591 +-+-+-+-+-+-+-+-+- - - - - - - - 593 CMR (4 bits): Is used by the AMR and AMR-WB formats to indicate a 594 codec mode request sent to the audio encoder at the site of the 595 receiver of this payload. The value of the CMR field is set to the 596 frame type index of the corresponding audio mode being requested. 597 AMR-WB+ is not intended for conversational use and no gateway 598 scenarios are identified. Hence, this field is not needed for AMR- 599 WB+. The CMR field is kept for conformity with AMR and AMR-WB 600 formats, but MUST be set to the value 15, indicating that no mode 601 request is present. 603 R: is a reserved bit that MUST be set to zero. All R bits MUST be 604 ignored by the receiver. 606 ILL (4 bits, unsigned integer): This is an OPTIONAL field that is 607 present only if interleaving is signaled out-of-band for the 608 session. ILL=L indicates to the receiver that the interleaving 609 length is L+1, in number of frames. 611 ILP (4 bits, unsigned integer): This is an OPTIONAL field that is 612 present only if interleaving is signaled. ILP MUST take a value 613 between 0 and ILL, inclusive, indicating the interleaving index 614 for frames in this payload in the interleave group. If the 615 value of ILP is found greater than ILL, the payload SHOULD be 616 discarded. 618 ILL and ILP fields MUST be present in each packet in a session if 619 interleaving is signaled for the session. Interleaving MUST be 620 performed on a frame basis. 622 The following example illustrates the arrangement of audio frames in 623 an interleave group during an interleave session. Here we assume 624 ILL=L for the interleave group that starts at audio frame n. 
We also 625 assume that the first payload packet of the interleave group is s and 626 the number of audio frames carried in each payload is N. Then we will 627 have: 629 Payload s (the first packet of this interleave group): 630 ILL=L, ILP=0, 631 frames: n, n+(L+1), n+2*(L+1), ..., n+(N-1)*(L+1) 633 Payload s+1 (the second packet of this interleave group): 634 ILL=L, ILP=1, 635 frames: n+1, n+1+(L+1), n+1+2*(L+1), ..., n+1+(N-1)*(L+1) 636 ... 638 Payload s+L (the last packet of this interleave group): 639 ILL=L, ILP=L, 640 frames: n+L, n+L+(L+1), n+L+2*(L+1), ..., n+L+(N-1)*(L+1) 642 The next interleave group will start at frame n+N*(L+1). 644 There will be no interleaving effect unless the number of frames per 645 packet (N) is at least 2. Moreover, the number of frames per payload 646 (N) and the value of ILL MUST NOT be changed inside an interleave 647 group. In other words, all payloads in an interleave group MUST have 648 the same ILL and MUST contain the same number of audio frames. 650 The sender of the payload MUST apply interleaving only if the 651 receiver has signaled its use through out-of-band means. Since 652 interleaving will increase buffering requirements at the receiver, 653 the receiver uses the MIME parameter "interleaving=I" to set the maximum 654 number of frames allowed in an interleaving group to I. 656 When performing interleaving, the sender MUST choose the number of 657 frames per payload (N) and ILL so that the resulting size of an 658 interleave group is less than or equal to I, i.e., N*(L+1)<=I. 660 4.3.2.
The Payload Table of Contents and Frame CRCs 662 The table of contents (ToC) consists of a list of ToC entries where 663 each entry corresponds to an audio frame carried in the payload and, 664 optionally, a list of audio frame CRCs, i.e., 666 +---------------------+ 667 | list of ToC entries | 668 +---------------------+ 669 | list of frame CRCs | (optional) 670 - - - - - - - - - - - 672 Note, for ToC entries with FT=14 or 15, there will be no 673 corresponding audio frame or frame CRC present in the payload. 675 When multiple frames are present in a packet, the ToC entries will be 676 placed in the packet in order of their creation time, with the 677 following exception: when interleaving is used, the frames in the ToC 678 will almost never be consecutive in time. 680 A ToC entry takes the following format: 682 0 1 2 3 4 5 6 7 683 +-+-+-+-+-+-+-+-+ 684 |F| FT |Q|P|P| 685 +-+-+-+-+-+-+-+-+ 687 F (1 bit): If set to 1, indicates that this frame is followed by 688 another audio frame in this payload; if set to 0, indicates that 689 this frame is the last frame in this payload. 691 FT (4 bits): Frame type index, indicating the AMR-WB+ 692 audio coding mode or comfort noise (SID) mode of the 693 corresponding frame carried in this payload. 695 The value of FT is defined in Table 1 in Section 3.1. FT=14 696 (AUDIO_LOST) and FT=15 (NO_DATA) are used to indicate frames that 697 are either lost or not being transmitted in this payload, 698 respectively. 700 A NO_DATA (FT=15) frame could mean either that there is no data 701 produced by the audio encoder for that frame or that no data for that 702 frame is transmitted in the current payload (i.e., valid data for 703 that frame could be sent in either an earlier or later packet). 705 If a ToC entry with an undefined FT value is received, the whole packet 706 SHOULD be discarded. This is to avoid the loss of data 707 synchronization in the depacketization process, which can result in a 708 huge degradation in audio quality.
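As a non-normative illustration of the ToC entry layout described above, the following sketch unpacks a list of one-octet ToC entries. The names TocEntry and parse_toc are hypothetical. The sketch assumes that bit 0 in the diagram is the most significant bit of the octet, and it does not check FT against Table 1 of Section 3.1, which a real receiver would need to do.

```python
from typing import List, NamedTuple


class TocEntry(NamedTuple):
    more_follows: bool  # F bit: another frame follows in this payload
    frame_type: int     # FT field, 4 bits (14 = AUDIO_LOST, 15 = NO_DATA)
    quality_ok: bool    # Q bit: False means the frame is severely damaged


def parse_toc(data: bytes) -> List[TocEntry]:
    """Parse ToC entries from the start of `data`, stopping after the
    first entry whose F bit is 0. The two padding bits are ignored."""
    entries = []
    for octet in data:
        entry = TocEntry(
            more_follows=bool(octet & 0x80),
            frame_type=(octet >> 3) & 0x0F,
            quality_ok=bool(octet & 0x04),
        )
        entries.append(entry)
        if not entry.more_follows:
            return entries
    raise ValueError("ToC ended before an entry with F=0 was found")
```

For instance, the octets 0xD4 0x54 describe two frames with FT=10 and Q=1, the first with F=1 and the second with F=0.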
710 Note that packets containing only NO_DATA frames SHOULD NOT be 711 transmitted. Also, NO_DATA frames at the end 712 of a packet SHOULD NOT be transmitted, except in the case of 713 interleaving. The AMR-WB+ SCR/DTX is identical to the AMR-WB SCR/DTX 714 described in [6]. 716 Q (1 bit): Frame quality indicator. If set to 0, indicates the 717 corresponding frame is severely damaged and the receiver should 718 set the RX_TYPE (see [6]) to either AUDIO_BAD or SID_BAD 719 depending on the frame type (FT). 721 The frame quality indicator enables damaged frames to be forwarded to 722 the audio decoder for error concealment. This can improve the audio 723 quality compared to dropping the damaged frames. See Section 724 4.3.2.1 for more details. 726 P bits: padding bits, MUST be set to zero. All padding bits MUST be 727 ignored by the receiver. 729 When multiple frames are present, their ToC entries will be placed in 730 the ToC in order of their creation time. 732 The following figure shows an example of a ToC of three entries. 734 0 1 2 735 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 736 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 737 |1| FT |Q|P|P|1| FT |Q|P|P|0| FT |Q|P|P| 738 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 740 The list of CRCs is OPTIONAL. It only exists if the use of CRC is 741 signaled out-of-band for the session. When present, each CRC in the 742 list is 8 bits long and corresponds to an audio frame carried in the 743 payload. Calculation and use of the CRC are specified in Section 744 4.3.2.1. 746 4.3.2.1. Use of Frame CRC for UED over IP 748 The general concept of UED/UEP over IP is discussed in Section 3.5. 749 This section provides more details on how to use the frame CRC in the 750 payload header together with a partial transport layer checksum to 751 achieve UED.
753 To achieve UED, one SHOULD use a transport layer checksum, for 754 example, the one defined in UDP-Lite [10], to protect the RTP header, 755 payload header, and table of contents bits in a payload. The frame 756 CRC, when used, MUST be calculated only over all class A bits in the 757 frame. Class B and any class C bits in the frame MUST NOT be included 758 in the CRC calculation and SHOULD NOT be covered by the transport 759 checksum. 761 Note, the number of class A bits for various coding modes in the 762 AMR-WB+ codec is specified as normative in Table 1 in Section 3.1, 763 and the SID frame (FT=9) has 40 class A bits. These definitions 764 of class A bits MUST be used for this payload format. 766 A packet SHOULD be discarded if the transport layer checksum detects 767 errors. 769 The receiver of the payload SHOULD examine the data integrity of the 770 received class A bits by re-calculating the CRC over the received 771 class A bits and comparing the result to the value found in the 772 received payload header. If the two values do not match, the receiver 773 SHALL consider the class A bits in the received frame damaged and 774 MUST clear the Q flag of the frame (i.e., set it to 0). This will 775 subsequently cause the frame to be marked as AUDIO_BAD, if the FT of 776 the frame is 0..8 or 10..13, or SID_BAD, if the FT of the frame is 9, 777 before it is passed to the audio decoder. See [6] for more details.
779 The following example shows an octet-aligned ToC with a CRC list for 780 a payload containing 3 audio frames from a single channel session 781 (assuming none of the FTs is equal to 14 or 15): 783 0 1 2 3 784 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 785 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 786 |1| FT#1 |Q|P|P|1| FT#2 |Q|P|P|0| FT#3 |Q|P|P| CRC#1 | 787 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 788 | CRC#2 | CRC#3 | 789 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 791 Each of the CRCs takes 8 bits 793 0 1 2 3 4 5 6 7 794 +---+---+---+---+---+---+---+---+ 795 | c0| c1| c2| c3| c4| c5| c6| c7| 796 +---+---+---+---+---+---+---+---+ 797 (MSB) (LSB) 799 and is calculated by the cyclic generator polynomial, 801 C(x) = 1 + x^2 + x^3 + x^4 + x^8 803 where ^ is the exponentiation operator. 805 In binary form the polynomial has the following form: 101110001 806 (MSB..LSB). 808 The actual calculation of the CRC is made as follows: First, an 8- 809 bit CRC register is reset to zero: 00000000. For each bit over which 810 the CRC shall be calculated, an XOR operation is made between the 811 rightmost (LSB) bit of the CRC register and the bit. The CRC register 812 is then right shifted one step (each bit's significance is reduced 813 by one), inputting a "0" as the leftmost bit (MSB). If the result of 814 the XOR operation mentioned above is a "1", then "10111000" is bit- 815 wise XOR-ed into the CRC register. This operation is repeated for 816 each bit that the CRC should cover. In this case, the first bit 817 would be d(0) of the audio frame that the CRC should cover. 818 When the last bit (e.g., d(71) for AMR-WB 15.85 according to Table 1 819 in Section 3.1) has been used in this CRC calculation, the contents 820 of the CRC register should simply be copied to the corresponding field in 821 the list of CRCs.
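The bit-serial procedure described above can be sketched as follows. This is a non-normative illustration; the names frame_crc and leading_bits are hypothetical. The helper leading_bits assumes that d(0) is the most significant bit of the first frame octet, which is an assumption of this sketch and should be checked against the bit ordering specified in [2].

```python
from typing import Iterable, Iterator

CRC_FEEDBACK = 0b10111000  # value XOR-ed into the register, as given above


def frame_crc(bits: Iterable[int]) -> int:
    """Compute the 8-bit frame CRC over a sequence of bits (0 or 1)."""
    register = 0  # the CRC register is reset to 00000000
    for bit in bits:
        # XOR the rightmost (LSB) register bit with the input bit.
        feedback = (register & 1) ^ (bit & 1)
        # Right shift one step; a 0 enters as the leftmost bit.
        register >>= 1
        if feedback:
            register ^= CRC_FEEDBACK
    return register


def leading_bits(frame: bytes, count: int) -> Iterator[int]:
    """Yield the first `count` bits of `frame`, most significant bit of
    each octet first (assumed ordering of d(0), d(1), ...)."""
    for i in range(count):
        yield (frame[i // 8] >> (7 - i % 8)) & 1
```

A receiver could then compare frame_crc(leading_bits(frame, class_a_bit_count)) with the CRC octet carried for that frame, where class_a_bit_count is taken from Table 1 in Section 3.1.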
823 Fast calculation of the CRC on a general-purpose CPU is possible 824 using a table-driven algorithm. 826 4.3.3. Audio Data 828 Audio data of a payload contains one or more audio frames or comfort 829 noise frames, as described in the ToC of the payload. 831 Note, for ToC entries with FT=14 or 15, there will be no 832 corresponding audio frame present in the audio data. 834 Each audio frame represents 20 ms of audio encoded with the mode 835 indicated in the FT field of the corresponding ToC entry. The length 836 of the audio frame is implicitly defined by the mode indicated in the 837 FT field. The order and numbering notation of the bits are as 838 specified in [2]. As specified there, the bits of audio frames have 839 been rearranged in order of decreasing sensitivity or for the 840 extension modes in two sensitivity classes, while the bits of comfort 841 noise frames are in the order produced by the encoder. The resulting 842 bit sequence for a frame of length K bits is denoted d(0), d(1), ..., 843 d(K-1). The last octet of each audio frame MUST be padded with zeroes 844 at the end if not all bits in the octet are used. In other words, 845 each audio frame MUST be octet-aligned. 847 When multiple audio frames are present in the audio data (i.e., 848 compound payload), the audio frames can be arranged either one whole 849 frame after another as usual, or with the octets of all frames 850 interleaved together at the octet level. Since the bits within each 851 frame are ordered with the most error-sensitive bits first, 852 interleaving the octets collects those sensitive bits from all frames 853 to be nearer the beginning of the packet. This is called "robust 854 sorting order" which allows the application of UED (such as UDP-Lite 855 [10]) or UEP (such as ULP [12]) mechanisms to the payload data. The 856 details of assembling the payload are given in the next section. 858 The use of robust sorting order for a session MUST be agreed via out- 859 of-band means. 
Section 7.1 specifies a MIME parameter for this 860 purpose. 862 4.3.4. Methods for Forming the Payload 864 Two different packetization methods, namely normal order and robust 865 sorting order, exist for forming a payload. In both cases, the 866 payload header and table of contents are packed into the payload the 867 same way; the difference is in the packing of the audio frames. 869 The payload begins with the payload header of one octet, or two octets if 870 frame interleaving is selected. The payload header is followed by 871 the table of contents consisting of a list of one-octet ToC entries. 872 If frame CRCs are to be included, they follow the table of contents 873 with one 8-bit CRC filling each octet. Note that if a given frame 874 has a ToC entry with FT=14 or 15, there will be no CRC present for that frame. 876 The audio data follows the table of contents, or the CRCs if present. 877 For packetization in the normal order, all of the octets comprising an 878 audio frame are appended to the payload as a unit. The audio frames 879 are packed in the same order as their corresponding ToC entries are 880 arranged in the ToC list, with the exception that if a given frame 881 has a ToC entry with FT=14 or 15, there will be no data octets 882 present for that frame. 884 For packetization in robust sorting order, the octets of all audio 885 frames are interleaved together at the octet level. That is, the 886 data portion of the payload begins with the first octet of the first 887 frame, followed by the first octet of the second frame, then the 888 first octet of the third frame, and so on. After the first octet of 889 the last frame has been appended, the cycle repeats with the second 890 octet of each frame. The process continues for as many octets as are 891 present in the longest frame. If the frames are not all the same 892 octet length, a shorter frame is skipped once all octets in it have 893 been appended.
The order of the frames in the cycle will be 894 sequential if frame interleaving is not in use, or according to the 895 interleave pattern specified in the payload header if frame 896 interleaving is in use. Note that if a given frame has a ToC entry 897 with FT=14 or 15, there will be no data octets present for that frame, 898 so that frame is skipped in the robust sorting cycle. 900 The UED and/or UEP SHOULD cover at least the RTP header, payload 901 header, table of contents, and all class A bits of a sorted payload. 902 All class A bits SHOULD be covered since the extension modes do not 903 have accurate sorting of the bits in sensitivity order. The bits are 904 only sorted in different classes, with the most sensitive bits (class 905 A bits) placed in the beginning. Exactly how many octets need to be 906 covered depends on the network and application. If CRCs are used 907 together with robust sorting, only the RTP header, the payload 908 header, and the ToC SHOULD be covered by UED/UEP. The means to 909 communicate to other layers performing UED/UEP the number of octets 910 to be covered is beyond the scope of this specification. 912 4.3.5. Payload Examples 914 4.3.5.1. Example 1, Basic Payload Carrying Multiple Frames 916 The following diagram shows a payload from a session that carries two 917 AMR-WB+ frames of 14 kbps coding mode (FT=10). In the payload, the 918 codec mode request is set to the mandated value (CMR=15), which 919 indicates that no mode request is present. No frame CRC, interleaving, or robust-sorting is 920 in use. 922 0 1 2 3 923 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 924 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 925 |CMR=15 |R|R|R|R|1|FT#1=10|Q|P|P|0|FT#2=10|Q|P|P| f1(0..7) | 926 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 927 | f1(8..15) | f1(16..23) | .... | 928 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 929 : ...
: 930 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 931 | ... |f1(272..279) | f2(0..7) | f2(8..15) | 932 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 933 | f2(16..23) | .... | 934 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 935 : ... : 936 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 937 |f2(272..279) | 938 +-+-+-+-+-+-+-+-+ 940 4.3.5.2. Example 2, Payload with CRC, Interleaving, and Robust-sorting 942 This example shows a payload carrying two frames of 18 kbps 943 stereo coding mode (FT=11). In the 944 payload, the codec mode request is set to the mandated value (CMR=15). 946 Moreover, frame CRC and interleaving are both enabled for the 947 session. The interleaving length is 2 (ILL=1) and this payload is 948 the first one in an interleave group (ILP=0). 950 The first frame in the payload is frame #1, consisting of bits 951 f1(0..359), and the next frame is frame #3, consisting of bits 952 f3(0..359), due to interleaving. For each of the two audio frames a 953 CRC is calculated as CRC1(0..7) and CRC3(0..7), respectively. Finally, 954 the payload is robust sorted. 956 0 1 2 3 957 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 958 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 959 |CMR=15 |R|R|R|R| ILL=1 | ILP=0 |1|FT#1=11|Q|P|P|0|FT#3=11|Q|P|P| 960 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 961 | CRC1 | CRC3 | f1(0..7) | f3(0..7) | 962 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 963 | f1(8..15) | f3(8..15) | f1(16..23) | f3(16..23) | 964 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 965 : ... : 966 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 967 | ...
| f1(336..343) | f3(336..343) | 968 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 969 | f1(344..351) | f3(344..351) | f1(352..359) | f3(352..359) | 970 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 972 4.4. Implementation Considerations 974 An application implementing this payload format MUST understand all 975 the payload parameters in the out-of-band signaling used. For 976 example, if an application uses SDP, all the SDP and MIME parameters 977 in this document MUST be understood. This requirement ensures that 978 an implementation can always decide whether or not it is capable of 979 communicating. 981 Only the basic operation mode of the payload format is mandatory to 982 implement. The other modes of operation, i.e., interleaving, robust 983 sorting, and frame-wise CRC, are OPTIONAL to implement. The 984 requirements of the application using the payload format should be 985 used to determine what to implement. 987 5. Congestion Control 989 The general congestion control considerations for transporting RTP 990 data apply to AMR-WB+ audio over RTP as well. However, the multi- 991 rate capability of AMR-WB+ audio coding may provide an advantage over 992 other payload formats for controlling congestion since the bandwidth 993 demand can be adjusted by selecting a different coding mode. 995 Another parameter that may impact the bandwidth demand for AMR-WB+ is 996 the number of frames that are encapsulated in each RTP payload. 997 Packing more frames in each RTP payload can reduce the number of 998 packets sent and hence the overhead from IP/UDP/RTP headers, at the 999 expense of increased delay and reduced robustness against 1000 packet losses. 1002 If forward error correction (FEC) is used to combat packet loss, the 1003 amount of redundancy added by FEC will need to be regulated so that 1004 the use of FEC itself does not cause a congestion problem.
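The trade-off between frames per payload and header overhead mentioned in this section can be illustrated with a small, non-normative calculation. The sketch assumes IPv4 without options (20 octets), UDP (8 octets), and an RTP header without CSRC entries or extensions (12 octets). It also assumes the basic payload mode (one-octet payload header, one ToC octet per frame, no CRCs) and the 35-octet frames of Example 1. All names are hypothetical.

```python
IP_UDP_RTP_HEADER_OCTETS = 20 + 8 + 12  # assumed IPv4 + UDP + RTP headers
FRAME_DURATION_MS = 20


def packet_octets(frames_per_packet: int, frame_octets: int = 35) -> int:
    """Total packet size in octets for the basic payload mode."""
    payload_header = 1
    toc = frames_per_packet  # one ToC octet per frame
    audio = frames_per_packet * frame_octets
    return IP_UDP_RTP_HEADER_OCTETS + payload_header + toc + audio


def overhead_fraction(frames_per_packet: int, frame_octets: int = 35) -> float:
    """Share of the packet that is not audio data."""
    total = packet_octets(frames_per_packet, frame_octets)
    return (total - frames_per_packet * frame_octets) / total


def packetization_delay_ms(frames_per_packet: int) -> int:
    """Audio duration that must be collected before the packet is sent."""
    return frames_per_packet * FRAME_DURATION_MS
```

Under these assumptions, one frame per packet gives a 77-octet packet of which more than half is overhead, while four frames per packet gives a 185-octet packet with roughly a quarter overhead, at the cost of 80 ms of packetization delay.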
1006 It is RECOMMENDED that AMR-WB+ applications using this payload format 1007 employ congestion control. The actual mechanism for congestion 1008 control is not specified but should be suitable for real-time flows, 1009 e.g., TCP Friendly Rate Control [11]. In the future the usage of 1010 congestion controlled transport protocols like Datagram Congestion 1011 Control Protocol (DCCP) [16] may simplify the usage of congestion 1012 control for application developers. 1014 6. Security Considerations 1016 RTP packets using the payload format defined in this specification 1017 are subject to the general security considerations discussed in 1018 RFC 3550 [4]. As this format transports encoded audio, the main 1019 security issues include confidentiality, integrity protection, and 1020 authentication of the audio itself. The payload format itself does 1021 not have any built-in security mechanisms. External mechanisms, such 1022 as SRTP [13], MAY be used. 1024 Neither this payload format nor the AMR-WB+ decoder exhibits any 1025 significant non-uniformity in the receiver side computational 1026 complexity for packet processing, and thus neither is likely to pose a 1027 denial-of-service threat due to the receipt of pathological data. 1029 6.1. Confidentiality 1031 To achieve confidentiality of the encoded AMR-WB+ audio, all audio 1032 data bits will need to be encrypted. There is less need to encrypt 1033 the payload header or the table of contents because 1) they only 1034 carry information about the requested audio mode, frame type, and 1035 frame quality, and 2) this information could be useful to some 1036 third party, e.g., for quality monitoring. 1038 As long as the AMR-WB+ payload is only packed and unpacked at either 1039 end, encryption may be performed after packet encapsulation so that 1040 there is no conflict between the two operations. 1042 Interleaving may affect encryption.
Depending on the encryption 1043 scheme used, there may be restrictions on, for example, the time when 1044 keys can be changed. Specifically, the key change may need to occur 1045 at the boundary between interleave groups. 1047 The type of encryption method used may impact the error robustness of 1048 the payload data. The error robustness may be severely reduced when 1049 the data is encrypted unless an encryption method without error- 1050 propagation is used, e.g. a stream cipher. Therefore, UED/UEP based 1051 on robust sorting may be difficult to apply when the payload data is 1052 encrypted. 1054 6.2. Authentication 1056 To authenticate the sender of the audio and provide integrity 1057 protection, an external mechanism has to be used. It is RECOMMENDED 1058 that such a mechanism protect all the audio data bits and the RTP 1059 header. Note that the use of UED/UEP may be difficult to combine 1060 with authentication because any bit errors will cause authentication 1061 to fail. 1063 Data tampering by a man-in-the-middle attacker could result in 1064 erroneous depacketization/decoding that could lower the audio 1065 quality. 1067 To prevent a man-in-the-middle attacker from tampering with the 1068 payload packets, some additional information besides the audio bits 1069 SHOULD be protected. This may include the payload header, ToC, frame 1070 CRCs, RTP timestamp, RTP sequence number, and the RTP marker bit. 1072 6.3. Decoding Validation 1074 When processing a received payload packet, if the receiver finds that 1075 the calculated payload length, based on the information of the 1076 session and the values found in the payload header fields, does not 1077 match the size of the received packet, the receiver SHOULD discard 1078 the packet. This is because decoding a packet that has errors in its 1079 length field could severely degrade the audio quality. 1081 7. 
Payload Format Parameters 1083 This section defines the parameters that may be used to select 1084 optional features of the AMR-WB+ payload format. The parameters are 1085 defined here as part of the MIME subtype registration for the AMR- 1086 WB+ audio codec. A mapping of the parameters into the Session 1087 Description Protocol (SDP) [7] is also provided for those 1088 applications that use SDP. Equivalent parameters could be defined 1089 elsewhere for use with control protocols that do not use MIME or SDP. 1091 The data format and parameters are only specified for real-time 1092 transport in RTP. 1094 7.1. MIME Registration 1096 The MIME subtype for the Adaptive Multi-Rate Wideband plus (AMR-WB+) 1097 codec is allocated from the IETF tree since AMR-WB+ is expected to be 1098 a widely used audio codec in general streaming applications. 1100 Note, any unspecified parameter MUST be ignored by the receiver. 1102 Media Type name: audio 1104 Media subtype name: AMR-WB+ 1106 Required parameters: none 1108 Optional parameters: 1110 These parameters apply to RTP transfer only. 1112 channels: The number of audio channels present in the audio 1113 frames. Permissible values are 1 (mono) and 2 1114 (stereo). An RTP payload type SHALL contain either mono 1115 or stereo modes, not both. If switching between mono 1116 and stereo is desired, two payload types will need to 1117 be declared. If no parameter is present, the number 1118 of channels is 1 (mono). 1120 maxptime: see Section 8 in RFC 3267 [9]. 1122 crc: Permissible values are 0 and 1. If 1, frame CRCs 1123 SHALL be included in the payload. If 0 1124 or if not present, CRCs SHALL NOT be included. 1126 robust-sorting: Permissible values are 0 and 1. If 1, the payload 1127 SHALL employ robust payload sorting. If 0 or if not 1128 present, simple payload sorting SHALL be used.
1130 interleaving: Indicates that frame level interleaving SHALL be 1131 used for the session and its value defines the 1132 maximum number of frames allowed in an interleaving 1133 group (see Section 4.3.1). If this parameter is not 1134 present, interleaving SHALL NOT be used. 1136 ptime: see RFC 2327 [7]. 1138 Encoding considerations: 1139 This type is only defined for transfer via RTP (RFC 1140 3550) and as described in Section 4 of RFC XXXX. 1142 Security considerations: 1143 See Section 6 of RFC XXXX. 1145 Public specification: 1146 Please refer to Section 10 of RFC XXXX. 1148 Additional information: 1149 It is recommended that file storage of the AMR-WB+ format 1150 use the 3GPP-defined ISO-based multimedia file 1151 format specified in 3GPP TS 26.244, see reference [18] of 1152 RFC XXXX. The file format has the MIME types 1153 "audio/3GPP" or "video/3GPP". 1155 To maintain interoperability with AMR-WB capable end- 1156 points, in cases where negotiation is possible, an AMR- 1157 WB+ end-point SHOULD declare itself also as AMR-WB 1158 capable. 1160 As the AMR-WB+ decoder is capable of performing stereo 1161 to mono conversions, all receivers of AMR-WB+ should be 1162 able to receive both stereo and mono, even if the 1163 receiver is only capable of playing out mono signals. 1165 Person & email address to contact for further information: 1166 johan.sjoberg@ericsson.com 1167 ari.lakaniemi@nokia.com 1169 Intended usage: COMMON. 1170 It is expected that many IP based streaming 1171 applications will use this type. 1173 Author/Change controller: 1174 johan.sjoberg@ericsson.com 1175 ari.lakaniemi@nokia.com 1176 IETF Audio/Video transport working group 1178 7.2. Mapping MIME Parameters into SDP 1180 The information carried in the MIME media type specification has a 1181 specific mapping to fields in the Session Description Protocol (SDP) 1182 [7], which is commonly used to describe RTP sessions.
When SDP is 1183 used to specify sessions employing the AMR-WB+ codec, the mapping is 1184 as follows: 1186 - The MIME type ("audio") goes in SDP "m=" as the media name. 1188 - The MIME subtype (payload format name) goes in SDP "a=rtpmap" as 1189 the encoding name. The RTP clock rate in "a=rtpmap" SHALL be 1190 96000 for AMR-WB+, and the encoding parameter number of channels 1191 MUST either be explicitly set to 1 or 2, or be omitted, implying 1192 the default value of 1. Only codec modes agreeing with the 1193 signaled number of channels may be used. 1195 - The parameters "ptime" and "maxptime" go in the SDP "a=ptime" and 1196 "a=maxptime" attributes, respectively. 1198 - Any remaining parameters go in the SDP "a=fmtp" attribute by 1199 copying them directly from the MIME media type string as a 1200 semicolon separated list of parameter=value pairs. 1202 7.2.1. Offer-Answer Model Considerations 1204 To achieve good interoperability for the AMR-WB+ RTP payload in an 1205 Offer-Answer negotiation using SDP, the following considerations 1206 should be taken into account: 1208 - Each combination of the RTP payload configuration parameters (crc, 1209 robust-sorting, and interleaving) is unique in its bit-pattern and 1210 not compatible with any other combination. Because any configuration is application 1211 dependent and these features are optional to 1212 implement, care must be taken. When creating an offer in an 1213 application desiring to use the more advanced features (crc, 1214 robust-sorting, or interleaving), the offerer is RECOMMENDED to 1215 also offer a payload type containing only the octet-align 1216 configuration. If multiple configurations are of interest to the 1217 application, they may all be offered; however, care should be taken 1218 not to offer too many payload types. 1220 - As one can use both mono and stereo modes, and these require 1221 different payload types to be declared/negotiated, both stereo and 1222 mono payload types SHOULD be offered.
1224 - The parameters "maxptime" and "ptime" should in most cases not 1225 affect the interoperability; however, the setting of the parameters 1226 can affect the performance of the application. 1228 - To maintain interoperability with AMR-WB in cases where 1229 negotiation is possible, an AMR-WB+ capable end-point SHOULD also 1230 declare itself capable of AMR-WB as it is a subset of AMR-WB+. 1232 7.2.2. Examples 1234 One example SDP session description utilizing AMR-WB+ mono and stereo 1235 encoding follows. 1237 m=audio 49120 RTP/AVP 98 99 1238 a=rtpmap:98 AMR-WB+/96000/1 1239 a=rtpmap:99 AMR-WB+/96000/2 1240 a=fmtp:98 interleaving=30 1241 a=fmtp:99 interleaving=30 a=maxptime:100 1243 Note that the payload format (encoding) names are commonly shown in 1244 upper case. MIME subtypes are commonly shown in lower case. These 1245 names are case-insensitive in both places. Similarly, parameter 1246 names are case-insensitive both in MIME types and in the default 1247 mapping to the SDP a=fmtp attribute. 1249 8. IANA Considerations 1251 It is requested that one new MIME subtype be registered by IANA; see 1252 Section 7. 1254 9. Acknowledgements 1256 The authors would like to thank Redwan Salami and Stefan Bruhn for 1257 their significant contributions made throughout the writing and 1258 reviewing of this document. We would also like to acknowledge 1259 Qiaobing Xie, coauthor of RFC 3267, on which this document is based. 1261 10. References 1263 10.1. Normative References 1265 [1] 3GPP TS 26.xxx "AMR Wideband plus audio codec; Transcoding 1266 functions", version 6.0.0 (2004-xx), 3rd Generation Partnership 1267 Project (3GPP). 1268 [2] 3GPP TS 26.xxx "AMR Wideband plus audio codec; Frame Structure", 1269 version 6.0.0 (2004-xx), 3rd Generation Partnership Project 1270 (3GPP). 1271 [3] Bradner, S., "Key words for use in RFCs to Indicate Requirement 1272 Levels", BCP 14, RFC 2119, March 1997. 1273 [4] H. Schulzrinne, S. Casner, R. Frederick, V.
Jacobson, "RTP: A
1274        Transport Protocol for Real-Time Applications", RFC 3550, July
1275        2003.
1276    [5] 3GPP TS 26.192 "AMR Wideband speech codec; Comfort Noise
1277        aspects", version 5.0.0 (2001-03), 3rd Generation Partnership
1278        Project (3GPP).
1279    [6] 3GPP TS 26.193 "AMR Wideband speech codec; Source Controlled Rate
1280        operation", version 5.0.0 (2001-03), 3rd Generation Partnership
1281        Project (3GPP).
1282    [7] Handley, M. and V. Jacobson, "SDP: Session Description
1283        Protocol", RFC 2327, April 1998.
1284    [8] Schulzrinne, H., "RTP Profile for Audio and Video Conferences
1285        with Minimal Control", RFC 3551, July 2003.
1286    [9] Sjoberg, J., Westerlund, M., Lakaniemi, A., and Q. Xie, "Real-
1287        Time Transport Protocol (RTP) Payload Format and File Storage
1288        Format for the Adaptive Multi-Rate (AMR) and Adaptive Multi-Rate
1289        Wideband (AMR-WB) Audio Codecs", RFC 3267, June 2002.

1291 10.2. Informative References

1293    [10] Larzon, L., Degermark, M. and S. Pink, "The UDP Lite Protocol",
1294         Work in Progress.
1295    [11] Floyd, S., Handley, M., Padhye, J., and J. Widmer, "TCP Friendly
1296         Rate Control (TFRC): Protocol Specification", RFC 3448, Internet
1297         Engineering Task Force, January 2003.
1298    [12] Li, A., et al., "An RTP Payload Format for Generic FEC with
1299         Uneven Level Protection", Work in Progress.
1300    [13] Baugher, et al., "The Secure Real Time Transport Protocol",
1301         Work in Progress.
1302    [14] Rosenberg, J. and H. Schulzrinne, "An RTP Payload Format for
1303         Generic Forward Error Correction", RFC 2733, December 1999.
1304    [15] Perkins, C., Kouvelas, I., Hodson, O., Hardman, V., Handley, M.,
1305         Bolot, J., Vega-Garcia, A. and S. Fosse-Parisis, "RTP Payload
1306         for Redundant Audio Data", RFC 2198, September 1997.
1307    [16] Kohler, E., et al., "Datagram Congestion Control Protocol
1308         (DCCP)", Internet Draft, work in progress.
1309    [17] 3GPP TS 26.233 "Packet Switched Streaming service", version
1310         5.0.0 (2001-03), 3rd Generation Partnership Project (3GPP).

1312    [18] 3GPP TS 26.244 "Transparent end-to-end packet switched
1313         streaming service (PSS); 3GPP file format (3GP)", version 1.0.0
1314         (2003-11-28), 3rd Generation Partnership Project (3GPP).

1316    ETSI documents can be downloaded from the ETSI web server,
1317    "http://www.etsi.org/". Any 3GPP document can be downloaded from the
1318    3GPP webserver, "http://www.3gpp.org/", see specifications. TIA
1319    documents can be obtained from "www.tiaonline.org".

1321 11. Authors' Addresses

1323    Johan Sjoberg
1324    Ericsson Research
1325    Ericsson AB
1326    SE-164 80 Stockholm, SWEDEN

1328    Phone: +46 8 50878230
1329    EMail: Johan.Sjoberg@ericsson.com

1331    Magnus Westerlund
1332    Ericsson Research
1333    Ericsson AB
1334    SE-164 80 Stockholm, SWEDEN

1336    Phone: +46 8 4048287
1337    EMail: Magnus.Westerlund@ericsson.com

1339    Ari Lakaniemi
1340    Nokia Research Center
1341    P.O.Box 407
1342    FIN-00045 Nokia Group, FINLAND

1344    Phone: +358-71-8008000
1345    EMail: ari.lakaniemi@nokia.com

1347 12. IPR Notice

1349    The IETF takes no position regarding the validity or scope of any
1350    intellectual property or other rights that might be claimed to
1351    pertain to the implementation or use of the technology described in
1352    this document or the extent to which any license under such rights
1353    might or might not be available; neither does it represent that it
1354    has made any effort to identify any such rights. Information on the
1355    IETF's procedures with respect to rights in standards-track and
1356    standards-related documentation can be found in BCP-11.
Copies of
1357    claims of rights made available for publication and any assurances of
1358    licenses to be made available, or the result of an attempt made to
1359    obtain a general license or permission for the use of such
1360    proprietary rights by implementors or users of this specification can
1361    be obtained from the IETF Secretariat.

1363    The IETF invites any interested party to bring to its attention any
1364    copyrights, patents or patent applications, or other proprietary
1365    rights which may cover technology that may be required to practice
1366    this standard. Please address the information to the IETF Executive
1367    Director.

1369 13. Copyright Notice

1371    Copyright (C) The Internet Society (2004). All Rights Reserved.

1373    This document and translations of it may be copied and
1374    furnished to others, and derivative works that comment on or
1375    otherwise explain it or assist in its implementation may be
1376    prepared, copied, published and distributed, in whole or in
1377    part, without restriction of any kind, provided that the above
1378    copyright notice and this paragraph are included on all such
1379    copies and derivative works. However, this document itself may
1380    not be modified in any way, such as by removing the copyright
1381    notice or references to the Internet Society or other Internet
1382    organizations, except as needed for the purpose of developing
1383    Internet standards in which case the procedures for copyrights
1384    defined in the Internet Standards process must be followed, or
1385    as required to translate it into languages other than English.

1387    The limited permissions granted above are perpetual and will
1388    not be revoked by the Internet Society or its successors or
1389    assigns.

1391    This document and the information contained herein is provided
1392    on an "AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET
1393    ENGINEERING TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR
1394    IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE
1395    OF THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY
1396    IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A
1397    PARTICULAR PURPOSE.

1399    This Internet-Draft expires in August 2004.