Network Working Group                                      Johan Sjoberg
INTERNET-DRAFT                                         Magnus Westerlund
Expires: March 2007                                             Ericsson
Obsoletes (if approved): RFC 3267                          Ari Lakaniemi
                                                                   Nokia
                                                                  Q. Xie
                                                                Motorola
                                                      September 18, 2006


  RTP Payload Format and File Storage Format for the Adaptive Multi-Rate
     (AMR) and Adaptive Multi-Rate Wideband (AMR-WB) Audio Codecs


Status of this memo

   By submitting this Internet-Draft, each author represents that any
   applicable patent or other IPR claims of which he or she is aware
   have been or will be disclosed, and any of which he or she becomes
   aware will be disclosed, in accordance with Section 6 of BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups.  Note that
   other groups may also distribute working documents as
   Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six
   months and may be updated, replaced, or obsoleted by other documents
   at any time.  It is inappropriate to use Internet-Drafts as
   reference material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/1id-abstracts.txt

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html

   This document is a submission of the IETF AVT WG.
   Comments should be directed to the AVT WG mailing list, avt@ietf.org.

Abstract

   This document specifies a real-time transport protocol (RTP) payload
   format to be used for Adaptive Multi-Rate (AMR) and Adaptive
   Multi-Rate Wideband (AMR-WB) encoded speech signals.  The payload
   format is designed to be able to interoperate with existing AMR and
   AMR-WB transport formats on non-IP networks.  In addition, a file
   format is specified for transport of AMR and AMR-WB speech data in
   storage mode applications such as email.  Two separate media type
   registrations are included, one for AMR and one for AMR-WB,
   specifying use of both the RTP payload format and the storage
   format.  This document obsoletes RFC 3267.

Table of Contents

   1. Introduction
   2. Conventions and Acronyms
   3. Background on AMR/AMR-WB and Design Principles
      3.1. The Adaptive Multi-Rate (AMR) Speech Codec
      3.2. The Adaptive Multi-Rate Wideband (AMR-WB) Speech Codec
      3.3. Multi-rate Encoding and Mode Adaptation
      3.4. Voice Activity Detection and Discontinuous Transmission
      3.5. Support for Multi-Channel Session
      3.6. Unequal Bit-error Detection and Protection
           3.6.1. Applying UEP and UED in an IP Network
      3.7. Robustness against Packet Loss
           3.7.1. Use of Forward Error Correction (FEC)
           3.7.2. Use of Frame Interleaving
      3.8. Bandwidth Efficient or Octet-aligned Mode
      3.9. AMR or AMR-WB Speech over IP scenarios
   4. AMR and AMR-WB RTP Payload Formats
      4.1. RTP Header Usage
      4.2. Payload Structure
      4.3. Bandwidth-Efficient Mode
           4.3.1. The Payload Header
           4.3.2. The Payload Table of Contents
           4.3.3. Speech Data
           4.3.4. Algorithm for Forming the Payload
           4.3.5. Payload Examples
                  4.3.5.1. Single Channel Payload Carrying a Single Frame
                  4.3.5.2. Single Channel Payload Carrying Multiple Frames
                  4.3.5.3. Multi-Channel Payload Carrying Multiple Frames
      4.4. Octet-aligned Mode
           4.4.1. The Payload Header
           4.4.2. The Payload Table of Contents and Frame CRCs
                  4.4.2.1. Use of Frame CRC for UED over IP
           4.4.3. Speech Data
           4.4.4. Methods for Forming the Payload
           4.4.5. Payload Examples
                  4.4.5.1. Basic Single Channel Payload Carrying Multiple
                           Frames
                  4.4.5.2. Two Channel Payload with CRC, Interleaving, and
                           Robust-sorting
      4.5. Implementation Considerations
           4.5.1. Decoding Validation
   5. AMR and AMR-WB Storage Format
      5.1. Single channel Header
      5.2. Multi-channel Header
      5.3. Speech Frames
   6. Congestion Control
   7. Security Considerations
      7.1. Confidentiality
      7.2. Authentication and Integrity
   8. Payload Format Parameters
      8.1. AMR Media Type Registration
      8.2. AMR-WB Media Type Registration
      8.3. Mapping Media Type Parameters into SDP
           8.3.1. Offer-Answer Model Considerations
           8.3.2. Usage of declarative SDP
           8.3.3. Examples
   9. IANA Considerations
   10. Changes
   11. Acknowledgements
   12. References
       12.1. Normative References
       12.2. Informative References
   13. Authors' Addresses
   14. IPR Notice
   15. Copyright Notice

1. Introduction

   This document obsoletes RFC 3267 and extends that specification with
   offer/answer rules.  See Section 10 for the changes made to this
   format in relation to RFC 3267.

   This document specifies the payload format for packetization of AMR
   and AMR-WB encoded speech signals into the Real-time Transport
   Protocol (RTP) [8].  The payload format supports transmission of
   multiple channels, multiple frames per payload, the use of fast
   codec mode adaptation, robustness against packet loss and bit
   errors, and interoperation with existing AMR and AMR-WB transport
   formats on non-IP networks, as described in Section 3.
   The payload format itself is specified in Section 4.  A related file
   format is specified in Section 5 for transport of AMR and AMR-WB
   speech data in storage mode applications such as email.  In Section
   8, two separate media type registrations are provided, one for AMR
   and one for AMR-WB.

   Even though this RTP payload format definition supports the
   transport of both AMR and AMR-WB speech, it is important to remember
   that AMR and AMR-WB are two different codecs and they are always
   handled as different payload types in RTP.

2. Conventions and Acronyms

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in RFC 2119 [5].

   The following acronyms are used in this document:

      3GPP   - the Third Generation Partnership Project
      AMR    - Adaptive Multi-Rate (Codec)
      AMR-WB - Adaptive Multi-Rate Wideband (Codec)
      CMR    - Codec Mode Request
      CN     - Comfort Noise
      DTX    - Discontinuous Transmission
      ETSI   - European Telecommunications Standards Institute
      FEC    - Forward Error Correction
      SCR    - Source Controlled Rate Operation
      SID    - Silence Indicator (the frames containing only CN
               parameters)
      UED    - Unequal Error Detection
      UEP    - Unequal Error Protection
      VAD    - Voice Activity Detection

   The term "frame-block" is used in this document to describe the
   time-synchronized set of speech frames in a multi-channel AMR or
   AMR-WB session.  In particular, in an N-channel session, a
   frame-block will contain N speech frames, one from each of the
   channels, and all N speech frames represent exactly the same time
   period.

3. Background on AMR/AMR-WB and Design Principles

   AMR and AMR-WB were originally designed for circuit-switched mobile
   radio systems.
   Due to their flexibility and robustness, they are also suitable for
   other real-time speech communication services over packet-switched
   networks such as the Internet.

   Because of the flexibility of these codecs, the behavior in a
   particular application is controlled by several parameters that
   select options or specify the acceptable values for a variable.
   These options and variables are described in general terms at
   appropriate points in the text of this specification as parameters
   to be established through out-of-band means.  In Section 8, all of
   the parameters are specified in the form of media subtype
   registrations for the AMR and AMR-WB encodings.  The method used to
   signal these parameters at session setup or to arrange prior
   agreement of the participants is beyond the scope of this document;
   however, Section 8.3 provides a mapping of the parameters into the
   Session Description Protocol (SDP) [11] for those applications that
   use SDP.

3.1. The Adaptive Multi-Rate (AMR) Speech Codec

   The AMR codec was originally developed and standardized by the
   European Telecommunications Standards Institute (ETSI) for GSM
   cellular systems.  It has since been chosen by the Third Generation
   Partnership Project (3GPP) as the mandatory codec for
   third-generation (3G) cellular systems [1].

   The AMR codec is a multi-mode codec that supports eight narrowband
   speech encoding modes with bit rates between 4.75 and 12.2 kbps.
   The sampling frequency used in AMR is 8000 Hz, and the speech
   encoding is performed on 20 ms speech frames.  Therefore, each
   encoded AMR speech frame represents 160 samples of the original
   speech.

   Three of the eight AMR encoding modes have also been adopted as
   standards in their own right: the 6.7 kbps mode as PDC-EFR [18],
   the 7.4 kbps mode as the IS-641 codec in TDMA [17], and the 12.2
   kbps mode as GSM-EFR [16].

3.2. The Adaptive Multi-Rate Wideband (AMR-WB) Speech Codec

   The Adaptive Multi-Rate Wideband (AMR-WB) speech codec [3] was
   originally developed by 3GPP to be used in GSM and 3G cellular
   systems.

   Similar to AMR, the AMR-WB codec is a multi-mode speech codec.
   AMR-WB supports nine wideband speech coding modes with bit rates
   ranging from 6.6 to 23.85 kbps.  The sampling frequency used in
   AMR-WB is 16000 Hz, and the speech processing is performed on 20 ms
   frames.  This means that each AMR-WB encoded frame represents 320
   speech samples.

3.3. Multi-rate Encoding and Mode Adaptation

   The multi-rate encoding (i.e., multi-mode) capability of AMR and
   AMR-WB is designed to preserve high speech quality under a wide
   range of transmission conditions.

   With AMR or AMR-WB, mobile radio systems are able to use the
   available bandwidth as effectively as possible.  For example, in GSM
   the speech encoding rate can be adjusted dynamically during a
   session to adapt continuously to varying transmission conditions:
   the fixed overall bandwidth is divided between speech data and
   error-protective coding so as to achieve the best possible trade-off
   between speech compression rate and error tolerance.  To perform
   mode adaptation, the decoder (speech receiver) needs to signal to
   the encoder (speech sender) the mode it prefers.  This mode change
   signal is called a Codec Mode Request, or CMR.

   Since in most sessions speech is sent in both directions between the
   two ends, the mode requests from the decoder at one end to the
   encoder at the other end are piggy-backed on the speech frames in
   the reverse direction.  In other words, no out-of-band signaling is
   needed for sending CMRs.
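   The piggy-backing of mode requests described above can be modeled
   with a minimal Python sketch.  It is a toy model, not the wire
   format: the names Packet, Endpoint, send, and receive are
   hypothetical, modes are plain integer indices as in Table 1
   (7 = AMR 12.2, 4 = AMR 7.4), and None stands in for "no mode
   request" (the actual CMR field is defined in Section 4.3.1).

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Packet:
    # Mode the sender of this packet asks the *receiver's* encoder to use.
    # None models "no request" (hypothetical; not the real CMR encoding).
    cmr: Optional[int]
    frames: List[bytes] = field(default_factory=list)


@dataclass
class Endpoint:
    encoder_mode: int                         # mode this end encodes with
    preferred_rx_mode: Optional[int] = None   # mode this end's decoder wants

    def send(self, frames: List[bytes]) -> Packet:
        # The request for the reverse direction rides on outgoing speech,
        # so no separate out-of-band message is needed.
        return Packet(cmr=self.preferred_rx_mode, frames=frames)

    def receive(self, packet: Packet) -> None:
        # The local encoder follows the mode requested by the far-end decoder.
        if packet.cmr is not None:
            self.encoder_mode = packet.cmr


# B's decoder asks A's encoder to drop from mode 7 (12.2) to mode 4 (7.4).
a = Endpoint(encoder_mode=7)
b = Endpoint(encoder_mode=7, preferred_rx_mode=4)
a.receive(b.send([b"speech"]))
print(a.encoder_mode)  # 4
```

   Because the request travels inside ordinary speech packets, a
   one-way session (or a long silence period) leaves no vehicle for
   the CMR; the sketch simply keeps the last requested mode.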
   Every AMR or AMR-WB codec implementation is required to support all
   the respective speech coding modes defined by the codec and must be
   able to handle mode switching to any of the modes at any time.
   However, some transport systems may impose limitations on the number
   of modes supported and on how often the mode can change, due to
   bandwidth limitations or other constraints.  For this reason, the
   decoder is allowed to indicate its acceptance of a particular mode
   or a subset of the defined modes for the session using out-of-band
   means.

   For example, the GSM radio link can only use a subset of at most
   four different modes in a given session.  This subset can be any
   combination of the eight AMR modes for an AMR session or any
   combination of the nine AMR-WB modes for an AMR-WB session.

   Moreover, for better interoperability with GSM through a gateway,
   the decoder is allowed to use out-of-band means to set the minimum
   number of frames between two mode changes and to restrict mode
   changes to neighboring modes only.

   Section 8 specifies a set of media type parameters that may be used
   to signal these mode adaptation controls at session setup.

3.4. Voice Activity Detection and Discontinuous Transmission

   Both codecs support voice activity detection (VAD) and generation of
   comfort noise (CN) parameters during silence periods.  Hence, the
   codecs have the option to reduce the number of transmitted bits and
   packets during silence periods to a minimum.  The operation of
   sending CN parameters at regular intervals during silence periods is
   usually called discontinuous transmission (DTX) or source controlled
   rate (SCR) operation.  The AMR or AMR-WB frames containing CN
   parameters are called Silence Indicator (SID) frames.  More details
   about VAD and DTX functionality can be found in [9] and [10].

3.5. Support for Multi-Channel Session

   Both the RTP payload format and the storage format defined in this
   document support multi-channel audio content (e.g., a stereophonic
   speech session).

   Although the AMR and AMR-WB codecs themselves do not support
   encoding of multi-channel audio content into a single bit stream,
   they can be used to separately encode and decode each of the
   individual channels.

   To transport (or store) the separately encoded multi-channel
   content, the speech frames for all channels that are framed and
   encoded for the same 20 ms periods are logically collected in a
   frame-block.

   At session setup, out-of-band signaling must be used to indicate the
   number of channels in the session and the order of the speech
   frames from different channels in each frame-block.  When using SDP
   for signaling, the number of channels is specified in the rtpmap
   attribute, and the order of channels carried in each frame-block is
   implied by the number of channels as specified in Section 4.1 of
   [12].

3.6. Unequal Bit-error Detection and Protection

   The speech bits encoded in each AMR or AMR-WB frame have different
   perceptual sensitivity to bit errors.  This property has been
   exploited in cellular systems to achieve better voice quality by
   using unequal error protection and detection (UEP and UED)
   mechanisms.

   The UEP/UED mechanisms focus the protection and detection of
   corrupted bits on the perceptually most sensitive bits in an AMR or
   AMR-WB frame.  In particular, speech bits in an AMR or AMR-WB frame
   are divided into classes A, B, and C, where bits in class A are the
   most sensitive and bits in class C the least sensitive (see Table 1
   below for AMR and [4] for AMR-WB).  An AMR or AMR-WB frame is only
   declared damaged if bit errors are found in the most sensitive bits,
   i.e., the class A bits.
   On the other hand, it is acceptable to have some bit errors in the
   other bits, i.e., class B and C bits.

                                  Class A     Total speech
            Index   Mode          bits        bits
            ------------------------------------------
              0     AMR 4.75        42            95
              1     AMR 5.15        49           103
              2     AMR 5.9         55           118
              3     AMR 6.7         58           134
              4     AMR 7.4         61           148
              5     AMR 7.95        75           159
              6     AMR 10.2        65           204
              7     AMR 12.2        81           244
              8     AMR SID         39            39

            Table 1. The number of class A bits for the AMR codec.

   Moreover, a damaged frame is still useful for error concealment at
   the decoder, since some of the less sensitive bits can still be
   used.  This approach can improve the speech quality compared to
   discarding the damaged frame.

3.6.1. Applying UEP and UED in an IP Network

   To take full advantage of the bit-error robustness of the AMR and
   AMR-WB codecs, the RTP payload format is designed to facilitate
   UEP/UED in an IP network.  Note, however, that the use of UEP and
   UED discussed below is OPTIONAL.

   UEP/UED in an IP network can be achieved by detecting bit errors in
   class A bits and tolerating bit errors in class B/C bits of the AMR
   or AMR-WB frame(s) in each RTP payload.

   Some link layer protocols do not discard packets containing bit
   errors, e.g., SLIP and some wireless links.  As Internet traffic
   becomes more multimedia-centric, more link layers of this kind may
   emerge.  With transport layer support for partial checksums, such
   as that provided by UDP-Lite [19], bit-error-tolerant AMR and AMR-WB
   traffic could achieve better performance over these types of links.
   The relationship between UDP-Lite's partial checksum at the
   transport layer and the checksum coverage provided by the link-layer
   frame is described in the UDP-Lite specification [19].
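   The figures in Table 1 and the damage rule from Section 3.6 can be
   checked with a minimal Python sketch.  It confirms that each speech
   mode's total bits equal its bit rate times the 20 ms frame duration
   (Sections 3.1 and 3.2), and it declares a frame damaged only when an
   error hits a class A bit.  The function names are hypothetical, and
   the sketch assumes bits are numbered 0 to N-1 in sensitivity order
   with the class A bits first.

```python
# Table 1: frame type index -> (mode, class A bits, total speech bits)
AMR_TABLE_1 = {
    0: ("AMR 4.75", 42, 95),
    1: ("AMR 5.15", 49, 103),
    2: ("AMR 5.9", 55, 118),
    3: ("AMR 6.7", 58, 134),
    4: ("AMR 7.4", 61, 148),
    5: ("AMR 7.95", 75, 159),
    6: ("AMR 10.2", 65, 204),
    7: ("AMR 12.2", 81, 244),
    8: ("AMR SID", 39, 39),
}

# Speech-mode bit rates in bit/s for indices 0..7.
AMR_RATES_BPS = [4750, 5150, 5900, 6700, 7400, 7950, 10200, 12200]
FRAME_MS = 20


def bits_per_frame(rate_bps: int) -> int:
    # Integer arithmetic avoids floating-point rounding surprises.
    return rate_bps * FRAME_MS // 1000


def samples_per_frame(sampling_hz: int) -> int:
    return sampling_hz * FRAME_MS // 1000


def frame_is_damaged(index: int, error_positions: set) -> bool:
    # UED rule: only errors among the class A bits make a frame damaged.
    # Assumes class A bits occupy positions 0 .. class_a - 1.
    _, class_a, _ = AMR_TABLE_1[index]
    return any(pos < class_a for pos in error_positions)


for idx, rate in enumerate(AMR_RATES_BPS):
    assert bits_per_frame(rate) == AMR_TABLE_1[idx][2]

print(samples_per_frame(8000), samples_per_frame(16000))     # 160 320
print(frame_is_damaged(7, {100}), frame_is_damaged(7, {80}))  # False True
```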
   There are at least two basic approaches for carrying AMR and AMR-WB
   traffic over bit-error-tolerant IP networks:

   a) Utilizing a partial checksum to cover the IP, transport protocol
      (e.g., UDP-Lite), RTP, and payload headers, and the most
      important speech bits of the payload.  The IP, UDP, and RTP
      headers need to be protected, and it is recommended that at least
      all class A bits be covered by the checksum.

   b) Utilizing a partial checksum to cover only the IP, transport
      protocol, RTP, and payload headers, together with an AMR or
      AMR-WB frame CRC to cover the class A bits of each speech frame
      in the RTP payload.

   In either approach, at least part of the class B/C bits are left
   without an error check, and thus bit-error tolerance is achieved.

      Note: it is still important that the network designer pay
      attention to the class B and C residual bit error rate.  Though
      less sensitive to errors than class A bits, class B and C bits
      are not insignificant, and undetected errors in these bits cause
      degradation in speech quality.  An example of residual error
      rates considered acceptable for AMR in UMTS can be found in [24]
      and for AMR-WB in [25].

   The application interface to the UEP/UED transport protocol (e.g.,
   UDP-Lite) may not provide any control over the link error rate,
   especially in a gateway scenario.  Therefore, it is incumbent upon
   the designer of a node with a link interface of this type to choose
   a residual bit error rate that is low enough to support applications
   such as AMR encoding when transmitting packets of a UEP/UED
   transport protocol.
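   For approach a), the sender has to pick a UDP-Lite checksum coverage
   that spans the headers and at least the class A bits.  The minimal
   Python sketch below computes that value for a single speech frame.
   It assumes an octet-aligned payload carrying one frame with no
   interleaving or frame CRC (so the payload header and the table of
   contents entry take one octet each), a 12-octet RTP header without
   CSRCs or extensions, and class A bits placed first in the frame; the
   function name is hypothetical.  Per the UDP-Lite specification [19],
   coverage is counted in octets from the start of the 8-octet UDP-Lite
   header, and the IP pseudo-header is always covered.

```python
UDPLITE_HEADER_OCTETS = 8   # UDP-Lite header, always covered
RTP_HEADER_OCTETS = 12      # fixed RTP header; assumes no CSRCs/extension
PAYLOAD_HEADER_OCTETS = 1   # octet-aligned CMR octet (assumption)
TOC_ENTRY_OCTETS = 1        # one ToC entry for a single frame (assumption)


def class_a_coverage(class_a_bits: int) -> int:
    """UDP-Lite checksum coverage (octets) for headers plus class A bits."""
    class_a_octets = (class_a_bits + 7) // 8  # round a partial octet up
    return (UDPLITE_HEADER_OCTETS + RTP_HEADER_OCTETS
            + PAYLOAD_HEADER_OCTETS + TOC_ENTRY_OCTETS + class_a_octets)


# AMR 12.2 has 81 class A bits (Table 1): 8 + 12 + 1 + 1 + 11 = 33 octets.
print(class_a_coverage(81))  # 33
```

   Because coverage is expressed in whole octets, a few class B bits
   following the last class A bit are covered as well.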
   Approach a) is bit efficient, flexible, and simple, but it has two
   disadvantages: (1) bit errors in protected speech bits will cause
   the payload to be discarded, and (2) when multiple AMR or AMR-WB
   frames are transported in an RTP payload, a single bit error in
   protected bits may cause all the frames to be discarded.

   These disadvantages can be avoided, if needed, at the cost of some
   overhead in the form of a frame-wise CRC (approach b).  For problem
   (1), the CRC makes it possible to detect bit errors in class A bits
   and still use the frame for error concealment, which gives a small
   improvement in speech quality.  For problem (2), when multiple
   frames are transported in a payload, the CRCs remove the possibility
   that a single bit error in a class A bit will cause all the frames
   to be discarded.  Avoiding that gives an improvement in speech
   quality when transporting multiple AMR or AMR-WB frames over links
   subject to bit errors.

   The choice between the two approaches must be made based on the
   available bandwidth and the desired tolerance to bit errors.
   Neither solution is appropriate in all cases.  Section 8 defines
   parameters that may be used at session setup to select between these
   approaches.

3.7. Robustness against Packet Loss

   The payload format supports several means, including forward error
   correction (FEC) and frame interleaving, to increase robustness
   against packet loss.

3.7.1. Use of Forward Error Correction (FEC)

   Simple repetition of previously sent data is one way of achieving
   FEC.  Another possible scheme, which is more bandwidth efficient, is
   to use payload-external FEC, e.g., RFC 2733 [23], which generates
   extra packets containing repair data.  The whole payload can also be
   sorted in sensitivity order to support external FEC schemes using
   UEP.
   There is also work in progress on a generic version of such a
   scheme [22] that can be applied to AMR or AMR-WB payload transport.

   With AMR or AMR-WB, it is possible to use the multi-rate capability
   of the codec to send redundant copies of the same mode or of another
   mode, e.g., one with lower bandwidth.  Such a scheme is described
   next.

   The scheme involves the simple retransmission of previously
   transmitted frame-blocks together with the current frame-block(s).
   This is done by using a sliding window to group the speech
   frame-blocks to send in each payload.  Figure 1 shows an example.

   --+--------+--------+--------+--------+--------+--------+--------+--
     | f(n-2) | f(n-1) |  f(n)  | f(n+1) | f(n+2) | f(n+3) | f(n+4) |
   --+--------+--------+--------+--------+--------+--------+--------+--

     <---- p(n-1) ---->
              <----- p(n) ----->
                       <---- p(n+1) ---->
                                <---- p(n+2) ---->
                                         <---- p(n+3) ---->
                                                  <---- p(n+4) ---->

               Figure 1: An example of redundant transmission.

   In this example, each frame-block is retransmitted once in the
   following RTP payload packet.  Here, f(n-2)..f(n+4) denotes a
   sequence of speech frame-blocks, and p(n-1)..p(n+4) a sequence of
   payload packets.

   The use of this approach does not require signaling at session
   setup: the speech sender can choose to use this scheme without
   consulting the receiver, because a packet containing redundant
   frames does not look different from a packet with only new frames.
   (A parameter for providing a maximum delay in transmitting any
   redundant frame is nevertheless defined.)  If no packet is lost, the
   receiver may receive multiple copies or versions (encoded with
   different modes) of a frame for a certain timestamp.  If multiple
   versions of the same speech frame are received, it is recommended
   that the speech decoder use the mode with the highest rate.
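   The sliding window of Figure 1 and the receiver rule above can be
   sketched in Python.  The representation is hypothetical: a
   frame-block is a tuple (index, rate_bps, data), redundancy is the
   number of previous frame-blocks repeated in each payload (1 in
   Figure 1), and, for simplicity, the redundant copies reuse the same
   encoding rather than a lower-rate mode.

```python
from typing import Dict, List, Tuple

# (frame-block index, mode rate in bit/s, encoded data): hypothetical model
FrameBlock = Tuple[int, int, bytes]


def packetize_with_redundancy(blocks: List[FrameBlock],
                              redundancy: int = 1) -> List[List[FrameBlock]]:
    # Payload p(n) carries f(n) plus the `redundancy` frame-blocks before it.
    return [blocks[max(0, n - redundancy):n + 1] for n in range(len(blocks))]


def select_best(packets: List[List[FrameBlock]]) -> Dict[int, FrameBlock]:
    # Keep one version per frame-block, preferring the highest-rate mode.
    best: Dict[int, FrameBlock] = {}
    for packet in packets:
        for block in packet:
            index, rate, _ = block
            if index not in best or rate > best[index][1]:
                best[index] = block
    return best


blocks = [(i, 12200, bytes([i])) for i in range(5)]
packets = packetize_with_redundancy(blocks)
received = packets[:2] + packets[3:]   # packet p(2) is lost
print(sorted(select_best(received)))   # [0, 1, 2, 3, 4]
```

   Frame-block 2 survives the loss of p(2) because p(3) repeats it;
   losing two consecutive payloads would still lose a frame-block
   unless the redundancy depth is increased.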
   This redundancy scheme provides the same functionality as the one
   described in RFC 2198, "RTP Payload for Redundant Audio Data" [27].
   In most cases, the mechanism in this payload format is more
   efficient and simpler than requiring both endpoints to support
   RFC 2198 in addition.  There are two situations in which use of
   RFC 2198 is indicated: if the spread in time required between the
   primary and redundant encodings is larger than 5 frame times, in
   which case the bandwidth overhead of RFC 2198 will be lower; or if a
   non-AMR codec is desired for the redundant encoding, which the AMR
   payload format cannot carry.

   The sender is responsible for selecting an appropriate amount of
   redundancy based on feedback about the channel, e.g., in RTCP
   receiver reports.  A sender should not base the selection of FEC on
   the CMR, as this parameter was most probably set based on non-IP
   information, e.g., radio link performance measures.  The sender is
   also responsible for avoiding congestion, which may be exacerbated
   by redundancy (see Section 6 for more details).

3.7.2. Use of Frame Interleaving

   To decrease protocol overhead, the payload design allows several
   speech frame-blocks to be encapsulated into a single RTP packet.
   One drawback of this approach is that the loss of a single packet
   causes the loss of several consecutive speech frame-blocks, which
   usually causes clearly audible distortion in the reconstructed
   speech.  Interleaving of frame-blocks can improve the speech quality
   in such cases by distributing the consecutive losses into a series
   of single frame-block losses.  However, interleaving and bundling
   several frame-blocks per payload also increase end-to-end delay and
   are therefore not appropriate for all types of applications.
   Streaming applications will most likely be able to exploit
   interleaving to improve speech quality in lossy transmission
   conditions.
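   The effect of interleaving on loss patterns can be illustrated with
   a minimal Python sketch.  It uses a generic stride interleaver chosen
   for illustration only, not the exact interleaving scheme defined for
   the octet-aligned mode in Section 4.4; the function names are
   hypothetical, and the number of frame-blocks is assumed to be a
   multiple of the frame-blocks per packet.

```python
from typing import List


def bundle(n_blocks: int, per_packet: int, interleave: bool) -> List[List[int]]:
    """Group frame-block indices into packets, optionally interleaved."""
    n_packets = n_blocks // per_packet  # assumes an exact multiple
    if interleave:
        # Packet k carries blocks k, k + n_packets, k + 2 * n_packets, ...
        return [list(range(k, n_blocks, n_packets)) for k in range(n_packets)]
    return [list(range(k * per_packet, (k + 1) * per_packet))
            for k in range(n_packets)]


def longest_loss_run(lost: List[int], n_blocks: int) -> int:
    """Length of the longest run of consecutive lost frame-blocks."""
    lost_set, run, longest = set(lost), 0, 0
    for i in range(n_blocks):
        run = run + 1 if i in lost_set else 0
        longest = max(longest, run)
    return longest


# Lose the first packet in each arrangement.
for interleave in (False, True):
    packets = bundle(8, 4, interleave)
    print(packets, longest_loss_run(packets[0], 8))
# [[0, 1, 2, 3], [4, 5, 6, 7]] 4
# [[0, 2, 4, 6], [1, 3, 5, 7]] 1
```

   The delay cost noted above is visible here: with interleaving, the
   receiver cannot play frame-block 1 until the second packet arrives.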
   This payload design supports the use of frame interleaving as an
   option.  For the encoder (speech sender) to use frame interleaving
   in its outbound RTP packets for a given session, the decoder (speech
   receiver) needs to indicate its support via out-of-band means (see
   Section 8).

3.8. Bandwidth Efficient or Octet-aligned Mode

   For a given session, the payload format can be either bandwidth
   efficient or octet aligned, depending on the mode of operation that
   is established for the session via out-of-band means.

   In the octet-aligned format, all the fields in a payload, including
   the payload header, table of contents entries, and speech frames
   themselves, are individually aligned to octet boundaries to make
   implementations efficient.  In the bandwidth-efficient format, only
   the full payload is octet aligned, so fewer padding bits are added.

   Note that octet alignment of a field or payload means that the last
   octet is padded with zeroes in the least significant bits to fill
   the octet.  Also note that this padding is separate from padding
   indicated by the P bit in the RTP header.

   Of the two operation modes, only the octet-aligned mode has the
   capability to use robust sorting, interleaving, and frame CRCs to
   make the speech transport more robust to packet loss and bit errors.

3.9. AMR or AMR-WB Speech over IP scenarios

   The primary scenario for this payload format is IP end-to-end
   between two terminals, as shown in Figure 2.  This payload format is
   expected to be useful for both conversational and streaming
   services.

      +----------+                         +----------+
      |          |    IP/UDP/RTP/AMR or    |          |
      | TERMINAL |<----------------------->| TERMINAL |
      |          |    IP/UDP/RTP/AMR-WB    |          |
      +----------+                         +----------+

             Figure 2: IP terminal to IP terminal scenario

   A conversational service puts requirements on the payload format.
   Low delay is one very important factor, i.e., few speech
   frame-blocks per payload packet. Low overhead is also required when
   the payload format traverses low bandwidth links, especially as the
   frequency of packets will be high. For low bandwidth links it is
   also an advantage to support UED, which allows a link provider to
   reduce delay and packet loss or to reduce the utilization of link
   resources.

   A streaming service has less strict real-time requirements and
   therefore can use a larger number of frame-blocks per packet than a
   conversational service. This reduces the overhead from IP, UDP, and
   RTP headers. However, including several frame-blocks per packet
   makes the transmission more vulnerable to packet loss, so
   interleaving may be used to reduce the effect packet loss will have
   on speech quality. A streaming server handling a large number of
   clients also needs a payload format that requires as few resources
   as possible when doing packetization. The octet-aligned and
   interleaving modes require the least amount of resources, while CRC,
   robust sorting, and bandwidth-efficient modes have higher demands.

   Another scenario occurs when AMR or AMR-WB encoded speech will be
   transmitted from a non-IP system (e.g., a GSM or 3GPP UMTS network)
   to an IP/UDP/RTP VoIP terminal, and/or vice versa, as depicted in
   Figure 3.

      AMR or AMR-WB
          over
   I.366.{2,3} or +------+                        +----------+
   3G Iu or       |      |   IP/UDP/RTP/AMR or    |          |
   <------------->|  GW  |<---------------------->| TERMINAL |
   GSM Abis       |      |   IP/UDP/RTP/AMR-WB    |          |
   etc.           +------+                        +----------+
                     |
   GSM/              |          IP network
   3GPP UMTS network |

                Figure 3: GW to VoIP terminal scenario

   In such a case, it is likely that the AMR or AMR-WB frame is
   packetized in a different way in the non-IP network and will need to
   be re-packetized into RTP at the gateway.
   Also, speech frames from the non-IP network may come with some
   UEP/UED information (e.g., a frame quality indicator) that will need
   to be preserved and forwarded on to the decoder along with the
   speech bits. This is specified in Section 4.3.2.

   AMR's capability to do fast mode switching is exploited in some
   non-IP networks to optimize speech quality. To preserve this
   functionality in scenarios including a gateway to an IP network, a
   codec mode request (CMR) field is needed. The gateway will be
   responsible for forwarding the CMR between the non-IP and IP parts
   in both directions. The IP terminal should follow the CMR forwarded
   by the gateway to optimize speech quality going to the non-IP
   decoder. The mode control algorithm in the gateway must accommodate
   the delay imposed by the IP network on the response to CMR by the IP
   terminal.

   The IP terminal should not set the CMR (see Section 4.3.1), but the
   gateway can set the CMR value on frames going toward the encoder in
   the non-IP part to optimize speech quality from that encoder to the
   gateway. The gateway can alternatively set a lower CMR value, if
   desired, as one means to control congestion on the IP network.

   A third likely scenario is that IP/UDP/RTP is used as transport
   between two non-IP systems, i.e., IP is originated and terminated in
   gateways on both sides of the IP transport, as illustrated in
   Figure 4 below.

      AMR or AMR-WB                                      AMR or AMR-WB
          over                                               over
   I.366.{2,3} or +------+                     +------+ I.366.{2,3} or
   3G Iu or       |      |  IP/UDP/RTP/AMR or  |      | 3G Iu or
   <------------->|  GW  |<------------------->|  GW  |<------------->
   GSM Abis       |      |  IP/UDP/RTP/AMR-WB  |      | GSM Abis
   etc.           +------+                     +------+ etc.
                     |                            |
   GSM/              |         IP network         |  GSM/
   3GPP UMTS network |                            |  3GPP UMTS network

                      Figure 4: GW to GW scenario

   This scenario requires the same mechanisms for preserving UED/UEP
   and CMR information as in the single gateway scenario. In addition,
   the CMR value may be set in packets received by the gateways on the
   IP network side. The gateway should forward to the non-IP side a
   CMR value that is the minimum of three values:

   - the CMR value it receives on the IP side;

   - the CMR value it calculates based on its reception quality on
     the non-IP side; and

   - a CMR value it may choose for congestion control of
     transmission on the IP side.

   The details of the control algorithm are left to the implementation.

4.  AMR and AMR-WB RTP Payload Formats

   The AMR and AMR-WB payload formats have identical structure, so they
   are specified together. The only differences are in the types of
   codec frames contained in the payload. The payload format consists
   of the RTP header, payload header, and payload data.

4.1.  RTP Header Usage

   The format of the RTP header is specified in [8]. This payload
   format uses the fields of the header in a manner consistent with
   that specification.

   The RTP timestamp corresponds to the sampling instant of the first
   sample encoded for the first frame-block in the packet. The
   timestamp clock frequency is the same as the sampling frequency, so
   the timestamp unit is in samples.

   The duration of one speech frame-block is 20 ms for both AMR and
   AMR-WB. For AMR, the sampling frequency is 8 kHz, corresponding to
   160 encoded speech samples per frame from each channel. For AMR-WB,
   the sampling frequency is 16 kHz, corresponding to 320 samples per
   frame from each channel. Thus, the timestamp is increased by 160
   for AMR and 320 for AMR-WB for each consecutive frame-block.
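
   To make the timestamp arithmetic concrete, here is a minimal Python
   sketch (not part of this specification) that computes the RTP
   timestamp of each frame-block in a non-interleaved packet. The names
   CLOCK_HZ, samples_per_frame_block, and frame_block_timestamp are
   hypothetical; the 20 ms frame-block duration and the 8 kHz and
   16 kHz clock rates come from the text above, and the modulo-2^32
   wrap reflects the 32-bit RTP timestamp field of [8]. Interleaved
   packets, whose frame-blocks are not contiguous, are not covered.

```python
# Illustrative sketch; all names are hypothetical, not defined by this memo.
FRAME_BLOCK_MS = 20                          # one frame-block, AMR and AMR-WB
CLOCK_HZ = {"AMR": 8000, "AMR-WB": 16000}    # RTP clock = sampling frequency


def samples_per_frame_block(codec: str) -> int:
    """Timestamp increment per frame-block: 160 (AMR) or 320 (AMR-WB)."""
    return CLOCK_HZ[codec] * FRAME_BLOCK_MS // 1000


def frame_block_timestamp(first_ts: int, k: int, codec: str) -> int:
    """Timestamp of the k-th contiguous frame-block in a packet.

    k=0 is the first frame-block, whose timestamp is the one carried in
    the RTP header. The result wraps modulo 2^32 like the RTP field."""
    return (first_ts + k * samples_per_frame_block(codec)) & 0xFFFFFFFF


if __name__ == "__main__":
    # Three contiguous AMR-WB frame-blocks in one packet.
    print([frame_block_timestamp(1000, k, "AMR-WB") for k in range(3)])
```

   Run as a script, this prints [1000, 1320, 1640].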

   A packet may contain multiple frame-blocks of encoded speech or
   comfort noise parameters. If interleaving is employed, the
   frame-blocks encapsulated into a payload are picked according to the
   interleaving rules as defined in Section 4.4.1. Otherwise, each
   packet covers a period of one or more contiguous 20 ms frame-block
   intervals. In case the data from all the channels for a particular
   frame-block in the period is missing, for example at a gateway from
   some other transport format, it is possible to indicate that no data
   is present for that frame-block rather than breaking a
   multi-frame-block packet into two, as explained in Section 4.3.2.

   To allow for error resiliency through redundant transmission, the
   periods covered by multiple packets MAY overlap in time. A receiver
   MUST be prepared to receive any speech frame multiple times, either
   in exact duplicates, or in different AMR rate modes, or with data
   present in one packet and not present in another. If multiple
   versions of the same speech frame are received, it is RECOMMENDED
   that the mode with the highest rate be used by the speech decoder.
   A given frame MUST NOT be encoded as speech in one packet and
   comfort noise parameters in another.

   The payload is always made an integral number of octets long by
   padding with zero bits if necessary. If additional padding is
   required to bring the payload length to a larger multiple of octets
   or for some other purpose, then the P bit in the RTP header may be
   set and padding appended as specified in [8].

   The RTP header marker bit (M) SHALL be set to 1 if the first
   frame-block carried in the packet contains a speech frame which is
   the first in a talkspurt. For all other packets the marker bit
   SHALL be set to zero (M=0).

   The assignment of an RTP payload type for this new packet format is
   outside the scope of this document, and will not be specified here.
   It is expected that the RTP profile under which this payload format
   is being used will assign a payload type for this encoding or
   specify that the payload type is to be bound dynamically.

4.2.  Payload Structure

   The complete payload consists of a payload header, a payload table
   of contents, and speech data representing one or more speech
   frame-blocks. The following diagram shows the general payload
   format layout:

   +----------------+-------------------+----------------
   | payload header | table of contents | speech data ...
   +----------------+-------------------+----------------

   Payloads containing more than one speech frame-block are called
   compound payloads.

   The following sections describe the variations taken by the payload
   format depending on whether the AMR session is set up to use the
   bandwidth-efficient mode or octet-aligned mode and any of the
   OPTIONAL functions for robust sorting, interleaving, and frame CRCs.
   Implementations SHOULD support both bandwidth-efficient and
   octet-aligned operation to increase interoperability.

4.3.  Bandwidth-Efficient Mode

4.3.1.  The Payload Header

   In bandwidth-efficient mode, the payload header simply consists of a
   4-bit codec mode request:

    0 1 2 3
   +-+-+-+-+
   |  CMR  |
   +-+-+-+-+

   CMR (4 bits): Indicates a codec mode request sent to the speech
      encoder at the site of the receiver of this payload. The value
      of the CMR field is set to the frame type index of the
      corresponding speech mode being requested. The frame type index
      may be 0-7 for AMR, as defined in Table 1a in [2], or 0-8 for
      AMR-WB, as defined in Table 1a in [4]. CMR value 15 indicates
      that no mode request is present, and other values are for future
      use.
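
   The following Python sketch illustrates how a receiver might extract
   the CMR from the first octet of a bandwidth-efficient payload and
   track the request currently in force. It is a sketch under stated
   assumptions, not a reference implementation: the function names are
   hypothetical, None is used here to represent "no mode request"
   (CMR=15), and the update rule follows the validity rules given in
   the next paragraphs (a speech mode or NO_DATA replaces the previous
   request; any other value is ignored).

```python
from typing import Optional

# Highest speech-mode frame type index (Table 1a in [2] and [4]).
MAX_SPEECH_FT = {"AMR": 7, "AMR-WB": 8}
CMR_NO_REQUEST = 15   # NO_DATA: no mode request present


def read_cmr(payload: bytes) -> int:
    """In bandwidth-efficient mode the 4-bit CMR occupies the four most
    significant bits of the first payload octet."""
    return payload[0] >> 4


def update_mode_request(current: Optional[int], cmr: int,
                        codec: str) -> Optional[int]:
    """Return the mode request in force after receiving `cmr`.

    None stands for "no preference" (a hypothetical representation)."""
    if 0 <= cmr <= MAX_SPEECH_FT[codec]:
        return cmr              # a speech mode overrides the previous request
    if cmr == CMR_NO_REQUEST:
        return None             # NO_DATA also overrides it
    return current              # reserved values are ignored
```

   For the payload shown in Section 4.3.5.1, the first octet is 0xF2,
   so read_cmr(bytes([0xF2])) returns 15 (no mode request).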

   The codec mode request received in the CMR field is valid until the
   next codec mode request is received, i.e., a newly received CMR
   value corresponding to a speech mode or NO_DATA overrides the
   previously received CMR value corresponding to a speech mode or
   NO_DATA. Therefore, if a terminal continuously wishes to receive
   frames in the same mode X, it needs to set CMR=X for all its
   outbound payloads, and if a terminal has no preference in which mode
   to receive, it SHOULD set CMR=15 in all its outbound payloads.

   If a payload is received with a CMR value that is neither a speech
   mode nor NO_DATA, the CMR MUST be ignored by the receiver.

   In a multi-channel session, the codec mode request SHOULD be
   interpreted by the receiver of the payload as the desired encoding
   mode for all the channels in the session.

   An IP end-point SHOULD NOT set the codec mode request based on
   packet losses or other congestion indications, for several reasons:

   - The other end of the IP path may be a gateway to a non-IP
     network (such as a radio link) that needs to set the CMR field
     to optimize performance on that network.

   - Congestion on the IP network is managed by the IP sender, in
     this case at the other end of the IP path. Feedback about
     congestion SHOULD be provided to that IP sender through RTCP
     or other means, and then the sender can choose to avoid
     congestion using the most appropriate mechanism. That may
     include adjusting the codec mode, but also includes adjusting
     the level of redundancy or number of frames per packet.

   The encoder SHOULD follow a received codec mode request, but MAY
   change to a lower-numbered mode if it so chooses, for example to
   control congestion.

   The CMR field MUST be set to 15 for packets sent to a multicast
   group.
   The encoder in the speech sender SHOULD ignore codec mode
   requests when sending speech to a multicast session but MAY use RTCP
   feedback information as a hint that a codec mode change is needed.

   The codec mode selection MAY be restricted by a session parameter to
   a subset of the available modes. If so, the requested mode MUST be
   among the signalled subset (see Section 8).

4.3.2.  The Payload Table of Contents

   The table of contents (ToC) consists of a list of ToC entries, each
   representing a speech frame.

   In bandwidth-efficient mode, a ToC entry takes the following format:

    0 1 2 3 4 5
   +-+-+-+-+-+-+
   |F|  FT   |Q|
   +-+-+-+-+-+-+

   F (1 bit): If set to 1, indicates that this frame is followed by
      another speech frame in this payload; if set to 0, indicates that
      this frame is the last frame in this payload.

   FT (4 bits): Frame type index, indicating either the AMR or AMR-WB
      speech coding mode or comfort noise (SID) mode of the
      corresponding frame carried in this payload.

   The value of FT is defined in Table 1a in [2] for AMR and in
   Table 1a in [4] for AMR-WB. FT=14 (SPEECH_LOST, only available for
   AMR-WB) and FT=15 (NO_DATA) are used to indicate frames that are
   either lost or not being transmitted in this payload, respectively.

   A NO_DATA (FT=15) frame could mean either that there is no data
   produced by the speech encoder for that frame or that no data for
   that frame is transmitted in the current payload (i.e., valid data
   for that frame could be sent in either an earlier or later packet).

   If a ToC entry is received with an FT value in the range 9-14 for
   AMR or 10-13 for AMR-WB, the whole packet SHOULD be discarded. This
   is to avoid the loss of data synchronization in the depacketization
   process, which can result in a huge degradation in speech quality.
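
   As an illustration of the ToC entry layout and the discard rule
   above, the Python sketch below splits a 6-bit bandwidth-efficient
   ToC entry into its F, FT, and Q fields and checks whether its FT
   value calls for discarding the packet. The TocEntry type and the
   function names are hypothetical; the sketch assumes the entry has
   already been extracted from the bit stream as an integer in the
   range 0-63.

```python
from typing import NamedTuple


class TocEntry(NamedTuple):
    f: int    # 1 = another frame follows in this payload
    ft: int   # frame type index
    q: int    # frame quality indicator


def parse_be_toc_entry(entry: int) -> TocEntry:
    """Split a 6-bit entry laid out as F(1) | FT(4) | Q(1), MSB first."""
    if not 0 <= entry < 64:
        raise ValueError("a bandwidth-efficient ToC entry has 6 bits")
    return TocEntry(f=(entry >> 5) & 1, ft=(entry >> 1) & 0xF, q=entry & 1)


def ft_requires_discard(ft: int, codec: str) -> bool:
    """True if FT falls in the range for which the whole packet SHOULD
    be discarded (9-14 for AMR, 10-13 for AMR-WB)."""
    if codec == "AMR":
        return 9 <= ft <= 14
    if codec == "AMR-WB":
        return 10 <= ft <= 13
    raise ValueError(f"unknown codec {codec!r}")
```

   In the example of Section 4.3.5.1, the single ToC entry is 0b001001,
   which this sketch decodes as F=0, FT=4, Q=1.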

   Note that packets containing only NO_DATA frames SHOULD NOT be
   transmitted in any payload format configuration, except when
   interleaving is used. Also, frame-blocks containing only NO_DATA
   frames at the end of a packet SHOULD NOT be transmitted in any
   payload format configuration, except in the case of interleaving.
   The AMR SCR/DTX is described in [6] and AMR-WB SCR/DTX in [7].

   The extra comfort noise frame types specified in Table 1a in [2]
   (i.e., GSM-EFR CN, IS-641 CN, and PDC-EFR CN) MUST NOT be used in
   this payload format because the standardized AMR codec is only
   required to implement the general AMR SID frame type and not those
   that are native to the incorporated encodings.

   Q (1 bit): Frame quality indicator. If set to 0, indicates the
      corresponding frame is severely damaged and the receiver should
      set the RX_TYPE (see [6]) to either SPEECH_BAD or SID_BAD
      depending on the frame type (FT).

   The frame quality indicator is included for interoperability with
   the ATM payload format described in ITU-T I.366.2, the UMTS Iu
   interface [20], as well as other transport formats. The frame
   quality indicator enables damaged frames to be forwarded to the
   speech decoder for error concealment. This can improve the speech
   quality compared to dropping the damaged frames. See
   Section 4.4.2.1 for more details.

   For multi-channel sessions, the ToC entries of all frames from a
   frame-block are placed in the ToC in consecutive order as defined in
   Section 4.1 in [12]. When multiple frame-blocks are present in a
   packet in bandwidth-efficient mode, they will be placed in the
   packet in order of their creation time.

   Therefore, with N channels and K speech frame-blocks in a packet,
   there MUST be N*K entries in the ToC, and the first N entries will
   be from the first frame-block, the second N entries will be from the
   second frame-block, and so on.
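
   The N*K ordering rule amounts to a simple grouping step. The Python
   sketch below is illustrative only (the function name is
   hypothetical): it splits a flat list of ToC entries into
   frame-blocks of N entries each and rejects a ToC whose length is not
   a multiple of N.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def group_into_frame_blocks(toc: Sequence[T], n_channels: int) -> List[List[T]]:
    """Split a ToC holding N*K entries into K frame-blocks of N entries.

    Entry i belongs to frame-block i // N and to channel i % N."""
    if n_channels < 1:
        raise ValueError("n_channels must be positive")
    if len(toc) % n_channels != 0:
        raise ValueError("ToC length must be a multiple of the channel count")
    return [list(toc[i:i + n_channels]) for i in range(0, len(toc), n_channels)]
```

   For the two-channel example that follows, the entries 1L, 1R, 2L,
   2R, 3L, 3R are grouped as [1L, 1R], [2L, 2R], [3L, 3R].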

   The following figure shows an example of a ToC of three entries in a
   single channel session using bandwidth-efficient mode.

    0                   1
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |1|  FT   |Q|1|  FT   |Q|0|  FT   |Q|
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   Below is an example of how the ToC entries will appear in the ToC of
   a packet carrying 3 consecutive frame-blocks in a session with two
   channels (L and R).

   +----+----+----+----+----+----+
   | 1L | 1R | 2L | 2R | 3L | 3R |
   +----+----+----+----+----+----+
   |<------->|<------->|<------->|
     Frame-    Frame-    Frame-
     Block 1   Block 2   Block 3

4.3.3.  Speech Data

   Speech data of a payload contains zero or more speech frames or
   comfort noise frames, as described in the ToC of the payload.

   Note, for ToC entries with FT=14 or 15, there will be no
   corresponding speech frame present in the speech data.

   Each speech frame represents 20 ms of speech encoded with the mode
   indicated in the FT field of the corresponding ToC entry. The
   length of the speech frame is implicitly defined by the mode
   indicated in the FT field. The order and numbering notation of the
   bits are as specified for Interface Format 1 (IF1) in [2] for AMR
   and [4] for AMR-WB. As specified there, the bits of speech frames
   have been rearranged in order of decreasing sensitivity, while the
   bits of comfort noise frames are in the order produced by the
   encoder. The resulting bit sequence for a frame of length K bits is
   denoted d(0), d(1), ..., d(K-1).

4.3.4.  Algorithm for Forming the Payload

   The complete RTP payload in bandwidth-efficient mode is formed by
   packing bits from the payload header, table of contents, and speech
   frames, in order as defined by their corresponding ToC entries in
   the ToC list, contiguously into octets beginning with the most
   significant bits of the fields and the octets.
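
   The packing rule can be sketched in Python as follows. This is a
   minimal, unoptimized illustration, not a reference implementation:
   BitWriter and pack_bandwidth_efficient are hypothetical names, each
   frame is assumed to be supplied as its FT, its Q bit, and its bits
   d(0)..d(K-1) already in the IF1 order of Section 4.3.3, and a
   NO_DATA frame is passed with an empty bit list. Validation of FT
   values and frame lengths is omitted.

```python
from typing import List, Sequence, Tuple

# (FT, Q, frame bits d(0)..d(K-1)); a hypothetical input representation.
Frame = Tuple[int, int, Sequence[int]]


class BitWriter:
    """Accumulates fields most significant bit first."""

    def __init__(self) -> None:
        self.bits: List[int] = []

    def write(self, value: int, width: int) -> None:
        for shift in range(width - 1, -1, -1):
            self.bits.append((value >> shift) & 1)

    def to_bytes(self) -> bytes:
        # Zero-pad the final octet so the payload is octet aligned.
        padded = self.bits + [0] * (-len(self.bits) % 8)
        out = bytearray()
        for i in range(0, len(padded), 8):
            octet = 0
            for bit in padded[i:i + 8]:
                octet = (octet << 1) | bit
            out.append(octet)
        return bytes(out)


def pack_bandwidth_efficient(cmr: int, frames: Sequence[Frame]) -> bytes:
    """CMR (4 bits), then one 6-bit ToC entry per frame, then frame bits."""
    w = BitWriter()
    w.write(cmr, 4)
    for i, (ft, q, _) in enumerate(frames):
        w.write(1 if i < len(frames) - 1 else 0, 1)   # F: more frames follow?
        w.write(ft, 4)
        w.write(q, 1)
    for _, _, bits in frames:                         # NO_DATA: no bits
        for bit in bits:
            w.write(bit, 1)
    return w.to_bytes()
```

   For the payload of Section 4.3.5.1 (CMR=15, one FT=4 frame of 148
   bits), the sketch yields 20 octets beginning with 0xF2 0x40; for the
   four-frame AMR-WB payload of Section 4.3.5.2, it yields 48 octets
   (377 bits plus 7 padding bits).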

   To be precise, the four-bit payload header is packed into the first
   octet of the payload with bit 0 of the payload header in the most
   significant bit of the octet. The four most significant bits
   (numbered 0-3) of the first ToC entry are packed into the least
   significant bits of the octet, ending with bit 3 in the least
   significant bit. Packing continues in the second octet with bit 4
   of the first ToC entry in the most significant bit of the octet. If
   more than one frame is contained in the payload, then packing
   continues with the second and successive ToC entries. Bit 0 of the
   first data frame follows immediately after the last ToC bit,
   proceeding through all the bits of the frame in numerical order.
   Bits from any successive frames follow contiguously in numerical
   order for each frame and in consecutive order of the frames.

   If speech data is missing for one or more speech frames within the
   sequence, because of, for example, DTX, a ToC entry with FT set to
   NO_DATA SHALL be included in the ToC for each of the missing frames,
   but no data bits are included in the payload for the missing frame
   (see Section 4.3.5.2 for an example).

4.3.5.  Payload Examples

4.3.5.1.  Single Channel Payload Carrying a Single Frame

   The following diagram shows a bandwidth-efficient AMR payload from a
   single channel session carrying a single speech frame-block.

   In the payload, no specific mode is requested (CMR=15), the speech
   frame is not damaged at the IP origin (Q=1), and the coding mode is
   AMR 7.4 kbps (FT=4). The encoded speech bits, d(0) to d(147), are
   arranged in descending sensitivity order according to [2]. Finally,
   two zero bits are added to the end as padding to make the payload
   octet aligned.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | CMR=15|0| FT=4  |1|d(0)                                       |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                                                     d(147)|P|P|
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

4.3.5.2.  Single Channel Payload Carrying Multiple Frames

   The following diagram shows a single channel, bandwidth-efficient
   compound AMR-WB payload that contains four frames, of which one has
   no speech data. The first frame is a speech frame at 6.6 kbps mode
   (FT=0) that is composed of speech bits d(0) to d(131). The second
   frame is an AMR-WB SID frame (FT=9), consisting of bits g(0) to
   g(39). The third frame is a NO_DATA frame that does not carry any
   speech information; it is represented in the payload only by its
   ToC entry. The fourth frame in the payload is a speech frame at
   8.85 kbps mode (FT=1); it consists of speech bits h(0) to h(176).

   As shown below, the payload carries a mode request for the encoder
   on the receiver's side to change its future coding mode to AMR-WB
   8.85 kbps (CMR=1). None of the frames are damaged at IP origin
   (Q=1). The encoded speech and SID bits, d(0) to d(131), g(0) to
   g(39), and h(0) to h(176), are arranged in the payload in descending
   sensitivity order according to [4]. (Note, no speech bits are
   present for the third frame.) Finally, seven zero bits are padded
   to the end to make the payload octet aligned.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | CMR=1 |1| FT=0  |1|1| FT=9  |1|1| FT=15 |1|0| FT=1  |1|d(0)   |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                                                         d(131)|
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |g(0)                                                           |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |          g(39)|h(0)                                           |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                                           h(176)|P|P|P|P|P|P|P|
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

4.3.5.3.  Multi-Channel Payload Carrying Multiple Frames

   The following diagram shows a two channel payload carrying 3
   frame-blocks, i.e., the payload will contain 6 speech frames.

   In the payload, all speech frames contain the same mode 7.4 kbit/s
   (FT=4) and are not damaged at IP origin. The CMR is set to 15,
   i.e., no specific mode is requested. The two channels are defined
   as left (L) and right (R) in that order. The encoded speech bits are
   designated dXY(0) to dXY(K-1), where X = block number, Y = channel,
   and K is the number of speech bits for that mode. For example, for
   frame-block 1 of the left channel, the encoded bits are designated
   as d1L(0) to d1L(147).

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | CMR=15|1|1L FT=4|1|1|1R FT=4|1|1|2L FT=4|1|1|2R FT=4|1|1|3L FT|
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |4|1|0|3R FT=4|1|d1L(0)                                         |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                                               d1L(147)|d1R(0) |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   :                              ...                              :
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                       d1R(147)|d2L(0)                         |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   :                              ...                              :
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |d2L(147|d2R(0)                                                 |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   :                              ...                              :
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                                       d2R(147)|d3L(0)         |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   :                              ...                              :
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |               d3L(147)|d3R(0)                                 |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   :                              ...                              :
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                                                       d3R(147)|
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

4.4.  Octet-aligned Mode

4.4.1.  The Payload Header

   In octet-aligned mode, the payload header consists of a 4-bit CMR,
   4 reserved bits, and optionally, an 8-bit interleaving header, as
   shown below:

    0                   1
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5
   +-+-+-+-+-+-+-+-+- - - - - - - -
   |  CMR  |R|R|R|R|  ILL  |  ILP  |
   +-+-+-+-+-+-+-+-+- - - - - - - -

   CMR (4 bits): same as defined in Section 4.3.1.

   R: A reserved bit that MUST be set to zero. All R bits MUST be
      ignored by the receiver.

   ILL (4 bits, unsigned integer): This is an OPTIONAL field that is
      present only if interleaving is signalled out-of-band for the
      session. ILL=L indicates to the receiver that the interleaving
      length is L+1, in number of frame-blocks.

   ILP (4 bits, unsigned integer): This is an OPTIONAL field that is
      present only if interleaving is signalled. ILP MUST take a value
      between 0 and ILL, inclusive, indicating the interleaving index
      for frame-blocks in this payload in the interleave group. If the
      value of ILP is found to be greater than ILL, the payload SHOULD
      be discarded.

   ILL and ILP fields MUST be present in each packet in a session if
   interleaving is signalled for the session. Interleaving MUST be
   performed on a frame-block basis (i.e., NOT on a frame basis) in a
   multi-channel session.

   The following example illustrates the arrangement of speech
   frame-blocks in an interleave group during an interleave session.
   Here we assume ILL=L for the interleave group that starts at speech
   frame-block n. We also assume that the first payload packet of the
   interleave group is s and the number of speech frame-blocks carried
   in each payload is N.
   Then we will have:

      Payload s (the first packet of this interleave group):
        ILL=L, ILP=0,
        frame-blocks: n, n+(L+1), n+2*(L+1), ..., n+(N-1)*(L+1)

      Payload s+1 (the second packet of this interleave group):
        ILL=L, ILP=1,
        frame-blocks: n+1, n+1+(L+1), n+1+2*(L+1), ..., n+1+(N-1)*(L+1)
      ...

      Payload s+L (the last packet of this interleave group):
        ILL=L, ILP=L,
        frame-blocks: n+L, n+L+(L+1), n+L+2*(L+1), ..., n+L+(N-1)*(L+1)

   The next interleave group will start at frame-block n+N*(L+1).

   There will be no interleaving effect unless the number of
   frame-blocks per packet (N) is at least 2. Moreover, the number of
   frame-blocks per payload (N) and the value of ILL MUST NOT be
   changed inside an interleave group. In other words, all payloads in
   an interleave group MUST have the same ILL and MUST contain the same
   number of speech frame-blocks.

   The sender of the payload MUST only apply interleaving if the
   receiver has signalled its use through out-of-band means. Since
   interleaving will increase buffering requirements at the receiver,
   the receiver uses the media type parameter "interleaving=I" to set
   the maximum number of frame-blocks allowed in an interleaving group
   to I.

   When performing interleaving, the sender MUST use a proper number of
   frame-blocks per payload (N) and ILL so that the resulting size of
   an interleave group is less than or equal to I, i.e., N*(L+1)<=I.

4.4.2.  The Payload Table of Contents and Frame CRCs

   The table of contents (ToC) in octet-aligned mode consists of a list
   of ToC entries where each entry corresponds to a speech frame
   carried in the payload and, optionally, a list of speech frame CRCs,
   i.e.,

   +---------------------+
   | list of ToC entries |
   +---------------------+
   | list of frame CRCs  | (optional)
    - - - - - - - - - - -

   Note, for ToC entries with FT=14 or 15, there will be no
   corresponding speech frame or frame CRC present in the payload.

   The list of ToC entries is organized in the same way as described
   for bandwidth-efficient mode in Section 4.3.2, with the following
   exception: when interleaving is used, the frame-blocks in the ToC
   will almost never be placed consecutively in time. Instead, the
   presence and order of the frame-blocks in a packet will follow the
   pattern described in Section 4.4.1.

   The following example shows the ToC of three consecutive packets,
   each carrying 3 frame-blocks, in an interleaved two-channel session.
   Here, the two channels are left (L) and right (R) with L coming
   before R, and the interleaving length is 3 (i.e., ILL=2). This
   makes the interleave group 9 frame-blocks large.

   Packet #1
   ---------

   ILL=2, ILP=0:
   +----+----+----+----+----+----+
   | 1L | 1R | 4L | 4R | 7L | 7R |
   +----+----+----+----+----+----+
   |<------->|<------->|<------->|
     Frame-    Frame-    Frame-
     Block 1   Block 4   Block 7

   Packet #2
   ---------

   ILL=2, ILP=1:
   +----+----+----+----+----+----+
   | 2L | 2R | 5L | 5R | 8L | 8R |
   +----+----+----+----+----+----+
   |<------->|<------->|<------->|
     Frame-    Frame-    Frame-
     Block 2   Block 5   Block 8

   Packet #3
   ---------

   ILL=2, ILP=2:
   +----+----+----+----+----+----+
   | 3L | 3R | 6L | 6R | 9L | 9R |
   +----+----+----+----+----+----+
   |<------->|<------->|<------->|
     Frame-    Frame-    Frame-
     Block 3   Block 6   Block 9

   A ToC entry takes the following format in octet-aligned mode:

    0 1 2 3 4 5 6 7
   +-+-+-+-+-+-+-+-+
   |F|  FT   |Q|P|P|
   +-+-+-+-+-+-+-+-+

   F (1 bit): see definition in Section 4.3.2.

   FT (4 bits, unsigned integer): see definition in Section 4.3.2.

   Q (1 bit): see definition in Section 4.3.2.

   P bits: padding bits, MUST be set to zero, and MUST be ignored on
      reception.

   The list of CRCs is OPTIONAL. It only exists if the use of CRC is
   signalled out-of-band for the session. When present, each CRC in
   the list is 8 bits long and corresponds to a speech frame (NOT a
   frame-block) carried in the payload. Calculation and use of the CRC
   are specified in the next section.

4.4.2.1.  Use of Frame CRC for UED over IP

   The general concept of UED/UEP over IP is discussed in Section 3.6.
   This section provides more details on how to use the frame CRC in
   the octet-aligned payload header together with a partial transport
   layer checksum to achieve UED.

   To achieve UED, one SHOULD use a transport layer checksum, for
   example, the one defined in UDP-Lite [19], to protect the IP,
   transport protocol (e.g., UDP-Lite), and RTP headers, and in the
   payload the payload header and the table of contents. The frame
   CRC, when used, MUST be calculated only over all class A bits in the
   AMR or AMR-WB frame. Class B and C bits in the AMR or AMR-WB frame
   MUST NOT be included in the CRC calculation and SHOULD NOT be
   covered by the transport checksum.

   Note, the number of class A bits for various coding modes in the AMR
   codec is specified as informative in [2] and is therefore copied
   into Table 1 in Section 3.6 to make it normative for this payload
   format. The number of class A bits for various coding modes in the
   AMR-WB codec is specified as normative in Table 2 in [4], and the
   SID frame (FT=9) has 40 class A bits. These definitions of class
   A bits MUST be used for this payload format.

   If the transport layer checksum or link layer checksum detects any
   errors within the protected (sensitive) part, it is assumed that the
   complete packet will be discarded as defined by UDP-Lite [19].

   The receiver of the payload SHOULD examine the data integrity of the
   received class A bits by re-calculating the CRC over the received
   class A bits and comparing the result to the value found in the
   received payload header. If the two values mismatch, the receiver
   SHALL consider the class A bits in the received frame damaged and
   MUST clear the Q flag of the frame (i.e., set it to 0). This will
   subsequently cause the frame to be marked as SPEECH_BAD, if the FT
   of the frame is 0..7 for AMR or 0..8 for AMR-WB, or SID_BAD if the
   FT of the frame is 8 for AMR or 9 for AMR-WB, before it is passed to
   the speech decoder. See [6] and [7] for more details.
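
   The receiver-side consequence of a CRC mismatch can be sketched as a
   small classification step. In this Python sketch the CRC comparison
   is assumed to have been done already and is passed in as crc_ok; the
   function name and the labels SPEECH_GOOD and SID_GOOD are
   hypothetical simplifications, since the actual RX_TYPE values of [6]
   and [7] distinguish more cases (for example, different kinds of SID
   frames).

```python
SPEECH_FT_MAX = {"AMR": 7, "AMR-WB": 8}   # speech modes are FT 0..max
SID_FT = {"AMR": 8, "AMR-WB": 9}          # general SID frame type


def classify_received_frame(ft: int, q: int, codec: str,
                            crc_ok: bool = True) -> str:
    """Simplified receive classification driven by FT and the Q flag."""
    if not crc_ok:
        q = 0                             # CRC mismatch: clear the Q flag
    if 0 <= ft <= SPEECH_FT_MAX[codec]:
        return "SPEECH_GOOD" if q else "SPEECH_BAD"
    if ft == SID_FT[codec]:
        return "SID_GOOD" if q else "SID_BAD"
    if ft == 14 and codec == "AMR-WB":
        return "SPEECH_LOST"
    if ft == 15:
        return "NO_DATA"
    raise ValueError(f"FT={ft} is not valid for {codec} in this format")
```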
   The following example shows an octet-aligned ToC with a CRC list for
   a payload containing 3 speech frames from a single-channel session
   (assuming none of the FTs is equal to 14 or 15):

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |1|  FT#1 |Q|P|P|1|  FT#2 |Q|P|P|0|  FT#3 |Q|P|P|     CRC#1     |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |     CRC#2     |     CRC#3     |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   Each CRC is 8 bits long:

     0   1   2   3   4   5   6   7
   +---+---+---+---+---+---+---+---+
   | c0| c1| c2| c3| c4| c5| c6| c7|
   +---+---+---+---+---+---+---+---+
   (MSB)                       (LSB)

   and is calculated with the cyclic generator polynomial

      C(x) = 1 + x^2 + x^3 + x^4 + x^8

   where ^ is the exponentiation operator.

   In binary form, the polynomial is written as 101110001 (MSB..LSB).

   The actual calculation of the CRC is made as follows: First, an
   8-bit CRC register is reset to zero: 00000000. For each bit over
   which the CRC shall be calculated, an XOR operation is made between
   the rightmost (LSB) bit of the CRC register and the bit. The CRC
   register is then right-shifted one step (each bit's significance is
   reduced by one), inputting a "0" as the leftmost bit (MSB). If the
   result of the XOR operation mentioned above is a "1", then
   "10111000" is bit-wise XORed into the CRC register. This operation
   is repeated for each bit that the CRC should cover. In this case,
   the first bit would be d(0) of the speech frame that the CRC
   covers. When the last bit (e.g., d(54) for AMR 5.9 according to
   Table 1 in Section 3.6) has been used in this CRC calculation, the
   contents of the CRC register are simply copied to the corresponding
   field in the list of CRCs.
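   The register procedure above translates directly into a bit-serial
   loop. The following Python sketch is illustrative only; the function
   names are hypothetical, and frame_bits assumes that d(0) is the most
   significant bit of the first octet of an octet-aligned frame. A
   table-driven variant that consumes eight bits per lookup is included
   and cross-checked against the bit-serial version.

```python
from typing import Iterable, Iterator

CRC_POLY_REFLECTED = 0b10111000  # value XORed into the register (0xB8)


def amr_crc8(bits: Iterable[int], crc: int = 0) -> int:
    """Bit-serial CRC as described: start from 00000000, XOR the register
    LSB with the input bit, shift right (a 0 enters at the MSB), and XOR in
    10111000 when that XOR result was 1."""
    for bit in bits:
        feedback = (crc ^ bit) & 1
        crc >>= 1
        if feedback:
            crc ^= CRC_POLY_REFLECTED
    return crc


def frame_bits(frame: bytes, count: int) -> Iterator[int]:
    """Yield bits d(0)..d(count-1), assuming d(0) is the MSB of octet 0."""
    for i in range(count):
        yield (frame[i // 8] >> (7 - i % 8)) & 1


def _advance(crc: int, steps: int) -> int:
    """Run the register for `steps` zero-valued input bits."""
    for _ in range(steps):
        crc = (crc >> 1) ^ (CRC_POLY_REFLECTED if crc & 1 else 0)
    return crc


_TABLE = [_advance(i, 8) for i in range(256)]
# Bit reversal folds an MSB-first input octet into the LSB-first register.
_REV8 = [int(f"{i:08b}"[::-1], 2) for i in range(256)]


def amr_crc8_fast(frame: bytes, count: int) -> int:
    """Same result as amr_crc8(frame_bits(frame, count)), using the table
    for whole octets and the bit-serial loop for any trailing bits."""
    whole, tail = divmod(count, 8)
    crc = 0
    for octet in frame[:whole]:
        crc = _TABLE[crc ^ _REV8[octet]]
    if tail:
        crc = amr_crc8(((frame[whole] >> (7 - i)) & 1 for i in range(tail)), crc)
    return crc
```

   For an AMR 5.9 frame, for instance, the CRC would cover the 55
   class A bits d(0)..d(54), i.e., amr_crc8_fast(frame, 55).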
   Fast calculation of the CRC on a general-purpose CPU is possible
   using a table-driven algorithm.

4.4.3.  Speech Data

   In octet-aligned mode, speech data is carried in a similar way as in
   the bandwidth-efficient mode (see Section 4.3.3), with the following
   exceptions:

   -  The last octet of each speech frame MUST be padded with zero bits
      at the end if not all bits in the octet are used. The padding
      bits MUST be ignored on reception. In other words, each speech
      frame MUST be octet-aligned.

   -  When multiple speech frames are present in the speech data (i.e.,
      a compound payload), the speech frames can be arranged either one
      whole frame after another as usual, or with the octets of all
      frames interleaved together at the octet level. Since the bits
      within each frame are ordered with the most error-sensitive bits
      first, interleaving the octets collects those sensitive bits from
      all frames nearer the beginning of the packet. This is called
      "robust sorting order", and it allows the application of UED
      (such as UDP-Lite [19]) or UEP (such as ULP [22]) mechanisms to
      the payload data. The details of assembling the payload are given
      in the next section.

   The use of robust sorting order for a session MUST be agreed via
   out-of-band means. Section 8 specifies a media type parameter for
   this purpose.

   Note, robust sorting order MUST only be performed on the frame level
   and is thus independent of interleaving, which is at the frame-block
   level, as described in Section 4.4.1. In other words, robust sorting
   can be applied to either non-interleaved or interleaved sessions.

4.4.4.  Methods for Forming the Payload

   Two different packetization methods, namely normal order and robust
   sorting order, exist for forming a payload in octet-aligned mode.
   In both cases, the payload header and table of contents are packed
   into the payload the same way; the difference is in the packing of
   the speech frames.

   The payload begins with the payload header, which is one octet long,
   or two octets if frame interleaving is selected. The payload header
   is followed by the table of contents, consisting of a list of
   one-octet ToC entries. If frame CRCs are to be included, they follow
   the table of contents, with one 8-bit CRC filling each octet. Note
   that if a given frame has a ToC entry with FT=14 or 15, there will
   be no CRC present.

   The speech data follows the table of contents, or the CRCs if
   present. For packetization in the normal order, all of the octets
   comprising a speech frame are appended to the payload as a unit.
   The speech frames are packed in the same order as their
   corresponding ToC entries are arranged in the ToC list, with the
   exception that if a given frame has a ToC entry with FT=14 or 15,
   there will be no data octets present for that frame.

   For packetization in robust sorting order, the octets of all speech
   frames are interleaved together at the octet level. That is, the
   data portion of the payload begins with the first octet of the first
   frame, followed by the first octet of the second frame, then the
   first octet of the third frame, and so on. After the first octet of
   the last frame has been appended, the cycle repeats with the second
   octet of each frame. The process continues for as many octets as
   are present in the longest frame. If the frames are not all of the
   same octet length, a shorter frame is skipped once all of its octets
   have been appended. The order of the frames in the cycle will be
   sequential if frame interleaving is not in use, or according to the
   interleave pattern specified in the payload header if frame
   interleaving is in use.
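   The two packing methods can be sketched in a few lines of Python.
   This is an illustration under stated assumptions, not a normative
   algorithm: the function names are hypothetical, the frames are
   assumed to be already octet-aligned and listed in ToC order (that
   is, already in interleave order when interleaving is used), and a
   frame whose ToC entry has FT=14 or 15 is represented as an empty
   byte string, so it contributes no octets and drops out of the
   robust sorting cycle.

```python
from typing import List, Sequence


def pack_speech_data(frames: Sequence[bytes], robust_sorting: bool) -> bytes:
    """Build the speech-data part of an octet-aligned payload."""
    if not robust_sorting:
        # Normal order: each frame is appended as a unit, in ToC order.
        return b"".join(frames)
    out = bytearray()
    longest = max((len(f) for f in frames), default=0)
    for i in range(longest):        # the i-th octet of every frame ...
        for frame in frames:        # ... taken in ToC order
            if i < len(frame):      # skip frames whose octets are exhausted
                out.append(frame[i])
    return bytes(out)


def unpack_robust(data: bytes, lengths: Sequence[int]) -> List[bytes]:
    """Inverse of robust sorting, given each frame's octet length.

    The lengths are assumed to be derived from the FT values in the ToC.
    """
    frames = [bytearray() for _ in lengths]
    pos = 0
    for i in range(max(lengths, default=0)):
        for k, n in enumerate(lengths):
            if i < n:
                frames[k].append(data[pos])
                pos += 1
    return [bytes(f) for f in frames]
```

   With frames of 3, 0, and 2 octets, robust sorting yields the octet
   sequence 1A, 1B? no: frame 1 octet 1, frame 3 octet 1, frame 1
   octet 2, frame 3 octet 2, frame 1 octet 3, which is what a receiver
   reverses using the lengths implied by the ToC.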
   Note that if a given frame has a ToC entry with FT=14 or 15, there
   will be no data octets present for that frame, so that frame is
   skipped in the robust sorting cycle.

   UED and/or UEP is RECOMMENDED to cover at least the RTP header,
   payload header, table of contents, and class A bits of a sorted
   payload. Exactly how many octets need to be covered depends on the
   network and application. If CRCs are used together with robust
   sorting, only the RTP header, the payload header, and the ToC SHOULD
   be covered by UED/UEP. The means of communicating to other layers
   performing UED/UEP the number of octets to be covered is beyond the
   scope of this specification.

4.4.5.  Payload Examples

4.4.5.1.  Basic Single Channel Payload Carrying Multiple Frames

   The following diagram shows an octet-aligned payload from a
   single-channel session that carries two AMR frames of the 7.95 kbps
   coding mode (FT=5). In the payload, a codec mode request is sent
   (CMR=6), requesting the encoder at the receiver's side to use the
   AMR 10.2 kbps coding mode. No frame CRC, interleaving, or robust
   sorting is in use.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | CMR=6 |R|R|R|R|1|FT#1=5 |Q|P|P|0|FT#2=5 |Q|P|P|   f1(0..7)    |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |   f1(8..15)   |  f1(16..23)   |             ....              |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   :                              ...                              :
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |              ...              |f1(152..158) |P|   f2(0..7)    |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |   f2(8..15)   |  f2(16..23)   |             ....              |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   :                              ...                              :
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |              ...              |f2(152..158) |P|
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   Note, in the above example, the last octet in both speech frames is
   padded with one zero bit to make it octet-aligned.

4.4.5.2.  Two Channel Payload with CRC, Interleaving, and Robust-sorting

   This example shows an octet-aligned payload from a two-channel
   session. Two frame-blocks, each containing two speech frames of the
   7.95 kbps coding mode (FT=5), are carried in this payload.

   The two channels are left (L) and right (R), with L coming before R.
   In the payload, a codec mode request is also sent (CMR=6),
   requesting the encoder at the receiver's side to use the
   AMR 10.2 kbps coding mode.

   Moreover, frame CRC and frame-block interleaving are both enabled
   for the session. The interleaving length is 2 (ILL=1), and this
   payload is the first one in an interleave group (ILP=0).

   The first two frames in the payload are the L and R channel speech
   frames of frame-block #1, consisting of bits f1L(0..158) and
   f1R(0..158), respectively. The next two frames are, due to
   interleaving, the L and R channel frames of frame-block #3,
   consisting of bits f3L(0..158) and f3R(0..158), respectively. For
   each of the four speech frames, a CRC is calculated as CRC1L(0..7),
   CRC1R(0..7), CRC3L(0..7), and CRC3R(0..7), respectively. Finally,
   the payload is robust sorted.
    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | CMR=6 |R|R|R|R| ILL=1 | ILP=0 |1|FT#1L=5|Q|P|P|1|FT#1R=5|Q|P|P|
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |1|FT#3L=5|Q|P|P|0|FT#3R=5|Q|P|P|     CRC1L     |     CRC1R     |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |     CRC3L     |     CRC3R     |   f1L(0..7)   |   f1R(0..7)   |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |   f3L(0..7)   |   f3R(0..7)   |  f1L(8..15)   |  f1R(8..15)   |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |  f3L(8..15)   |  f3R(8..15)   |  f1L(16..23)  |  f1R(16..23)  |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   :                              ...                              :
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | f3L(144..151) | f3R(144..151) |f1L(152..158)|P|f1R(152..158)|P|
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |f3L(152..158)|P|f3R(152..158)|P|
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   Note, in the above example, the last octet in all four speech frames
   is padded with one zero bit to make it octet-aligned.

4.5.  Implementation Considerations

   An application implementing this payload format MUST understand all
   the payload parameters in the out-of-band signaling used. For
   example, if an application uses SDP, all the SDP and media type
   parameters in this document MUST be understood. This requirement
   ensures that an implementation can always decide whether or not it
   is capable of communicating.

   No operating mode of the payload format is mandatory to implement.
   The requirements of the application using the payload format should
   be used to determine what to implement.
   To achieve basic interoperability, an implementation SHOULD at least
   implement both bandwidth-efficient and octet-aligned modes for a
   single audio channel. The other operating modes (interleaving,
   robust sorting, and frame-wise CRC, in both single- and
   multi-channel sessions) are OPTIONAL to implement.

   The mode-change-period and mode-change-neighbor parameters are
   intended for signaling with GSM endpoints. When interoperability
   with GSM is desired, encoders SHOULD only perform codec mode changes
   to neighboring modes and in integer multiples of 40 ms (two
   frame-blocks), but decoders SHOULD accept codec mode changes at any
   time, i.e., for every frame-block. The encoder may arbitrarily
   select the initial phase (odd or even frame-block) at which codec
   mode changes are performed, but SHOULD then stick to that phase as
   far as possible. Handovers or other events (e.g., call forwarding)
   may, however, in rare cases change this phase and may also cause
   mode changes to non-neighboring modes. The decoder SHALL therefore
   be prepared to accept changes also in the other phase and to other
   modes. Section 8 specifies the usage of the parameters
   mode-change-period and mode-change-capability to indicate the
   desired behavior in applications.

   See 3GPP TS 26.103 [28] for preferred AMR and AMR-WB configurations
   for operation in GSM and 3GPP UMTS networks. In gateway scenarios,
   encoders can be requested through the "mode-set" parameter to use a
   limited mode set that is supported by the link beyond the gateway.
   Further, to avoid congestion on that link, the encoder SHOULD limit
   the initial codec mode for a session to a lower mode until at least
   one frame-block has been received with rate control information.

4.5.1.  Decoding Validation

   When processing a received payload packet, if the receiver finds
   that the calculated payload length, based on the information of the
   session and the values found in the payload header fields, does not
   match the size of the received packet, the receiver SHOULD discard
   the packet. This is because decoding a packet that has errors in
   its length field could severely degrade the speech quality.

5.  AMR and AMR-WB Storage Format

   The storage format is used for storing AMR or AMR-WB speech frames
   in a file or as an e-mail attachment. Multi-channel content is
   supported.

   In general, an AMR or AMR-WB file has the following structure:

   +------------------+
   |      Header      |
   +------------------+
   |  Speech frame 1  |
   +------------------+
   :       ...        :
   +------------------+
   |  Speech frame n  |
   +------------------+

   Note, to preserve interoperability with already deployed
   implementations, single-channel content uses a file header format
   different from that of multi-channel content.

   There also exists another storage format for AMR and AMR-WB that is
   suitable for applications with more advanced demands on the storage
   format, like random access or synchronization with video. This
   format is the 3GPP-specified ISO-based multimedia file format 3GP
   [31]. Its media type is specified by RFC 3839 [32].

5.1.  Single-Channel Header

   A single-channel AMR or AMR-WB file header contains only a magic
   number, and different magic numbers are defined to distinguish AMR
   from AMR-WB.

   The magic number for single-channel AMR files MUST consist of the
   ASCII character string:

      "#!AMR\n"
      (or 0x2321414d520a in hexadecimal).

   The magic number for single-channel AMR-WB files MUST consist of the
   ASCII character string:

      "#!AMR-WB\n"
      (or 0x2321414d522d57420a in hexadecimal).
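   A reader can tell the two single-channel variants apart by comparing
   the complete magic number, including the trailing newline. The
   Python sketch below is illustrative only; single_channel_codec is a
   hypothetical name, and a return value of None simply means "not a
   single-channel AMR or AMR-WB file".

```python
from typing import Optional, Tuple

AMR_MAGIC = b"#!AMR\n"        # 0x2321414d520a
AMR_WB_MAGIC = b"#!AMR-WB\n"  # 0x2321414d522d57420a


def single_channel_codec(data: bytes) -> Optional[Tuple[str, int]]:
    """Return (codec name, header length) for a single-channel file, else None.

    The comparison includes the trailing newline: a check for "#!AMR" alone
    would also accept "#!AMR-WB\n" and the multi-channel magic numbers.
    """
    for name, magic in (("AMR", AMR_MAGIC), ("AMR-WB", AMR_WB_MAGIC)):
        if data.startswith(magic):
            return name, len(magic)
    return None
```

   The first speech frame then starts at the returned header length,
   i.e., at octet offset 6 for AMR and 9 for AMR-WB.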
   Note, the "\n" is an important part of the magic numbers and MUST be
   included in the comparison, since, otherwise, the single-channel
   magic numbers above would become indistinguishable from those of the
   multi-channel files defined in the next section.

5.2.  Multi-Channel Header

   The multi-channel header consists of a magic number followed by a
   32-bit channel description field, giving the multi-channel header
   the following structure:

   +------------------+
   |   magic number   |
   +------------------+
   | chan-desc field  |
   +------------------+

   The magic number for multi-channel AMR files MUST consist of the
   ASCII character string:

      "#!AMR_MC1.0\n"
      (or 0x2321414d525F4D43312E300a in hexadecimal).

   The magic number for multi-channel AMR-WB files MUST consist of the
   ASCII character string:

      "#!AMR-WB_MC1.0\n"
      (or 0x2321414d522d57425F4D43312E300a in hexadecimal).

   The version number in the magic numbers refers to the version of the
   file format.

   The 32-bit channel description field is defined as:

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                     Reserved bits                     | CHAN  |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   Reserved bits: MUST be set to 0 when written, and a reader MUST
   ignore them.

   CHAN (4-bit unsigned integer): Indicates the number of audio
   channels contained in this storage file. The valid values and the
   order of the channels within a frame-block are specified in
   Section 4.1 of [12].

5.3.  Speech Frames

   After the file header, speech frame-blocks consecutive in time are
   stored in the file. Each frame-block contains a number of
   octet-aligned speech frames equal to the number of channels, stored
   in increasing order, starting with channel 1.
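   Reading the multi-channel header amounts to matching one of the two
   magic numbers and taking CHAN from the last four bits of the
   following 32-bit field. The Python sketch below is illustrative;
   parse_multichannel_header and frame_ordinal are hypothetical names,
   the chan-desc field is assumed to be in network byte order (so CHAN
   is its four least significant bits), and CHAN values are not checked
   against the valid set in [12].

```python
import struct
from typing import Optional, Tuple

MC_MAGICS = {
    b"#!AMR_MC1.0\n": "AMR",        # 0x2321414d525F4D43312E300a
    b"#!AMR-WB_MC1.0\n": "AMR-WB",  # 0x2321414d522d57425F4D43312E300a
}


def parse_multichannel_header(data: bytes) -> Optional[Tuple[str, int, int]]:
    """Return (codec, CHAN, header length) for a multi-channel file, else None."""
    for magic, codec in MC_MAGICS.items():
        if data.startswith(magic):
            # The 32-bit chan-desc field follows the magic number directly.
            (chan_desc,) = struct.unpack_from(">I", data, len(magic))
            channels = chan_desc & 0xF  # CHAN; reserved bits are ignored
            return codec, channels, len(magic) + 4
    return None


def frame_ordinal(block: int, channel: int, channels: int) -> int:
    """0-based position of the frame for frame-block `block` (0-based) and
    channel `channel` (1-based), counted in frames rather than octets."""
    return block * channels + (channel - 1)
```

   Positions are counted in frames because stored frames have variable
   length; a reader still has to walk the frame headers to find octet
   offsets.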
   Each stored speech frame starts with a one-octet frame header with
   the following format:

    0 1 2 3 4 5 6 7
   +-+-+-+-+-+-+-+-+
   |P|  FT   |Q|P|P|
   +-+-+-+-+-+-+-+-+

   The FT field and the Q bit are defined in the same way as in
   Section 4.3.2. The P bits are padding and MUST be set to 0, and MUST
   be ignored.

   Following this one-octet header come the speech bits as defined in
   Section 4.4.3. The last octet of each frame is padded with zeroes,
   if needed, to achieve octet alignment.

   The following example shows an AMR frame in the 5.9 kbps coding mode
   (with 118 speech bits) in the storage format.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |P|  FT=2 |Q|P|P|                                               |
   +-+-+-+-+-+-+-+-+                                               +
   |                                                               |
   +           Speech bits for frame-block n, channel k            +
   |                                                               |
   +                                                           +-+-+
   |                                                           |P|P|
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   Non-received speech frames or frame-blocks between SID updates
   during non-speech periods MUST be stored as NO_DATA frames (frame
   type 15, as defined in [2] and [4]). Frames or frame-blocks lost in
   transmission MUST be stored as NO_DATA frames or SPEECH_LOST frames
   (frame type 14, only available for AMR-WB) in complete frame-blocks
   to keep synchronization with the original media.

   Comfort noise frames of types other than AMR SID (FT=8), i.e., frame
   types 9, 10, and 11 for AMR, SHALL NOT be used in the AMR file
   format.

6.  Congestion Control

   The general congestion control considerations for transporting RTP
   data apply to AMR or AMR-WB speech over RTP as well. However, the
   multi-rate capability of AMR and AMR-WB speech coding may provide an
   advantage over other payload formats for controlling congestion,
   since the bandwidth demand can be adjusted by selecting a different
   coding mode.
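   To make the effect of mode selection (and of packing several
   frame-blocks per packet) concrete, the following Python sketch
   estimates the IP-level bit rate of an AMR stream. It rests on
   assumptions that are not part of this specification: a single
   channel, octet-aligned mode without CRCs or interleaving, IPv4
   without options, no header compression, and one 20 ms frame per
   frame-block. The per-mode speech bit counts are those of the AMR
   codec (consistent with the 118 bits for 5.9 kbps and 159 bits for
   7.95 kbps used in this document); the function name is hypothetical.

```python
# Speech bits per AMR coding mode, FT 0..7 (4.75 ... 12.2 kbps).
AMR_SPEECH_BITS = {0: 95, 1: 103, 2: 118, 3: 134, 4: 148, 5: 159, 6: 204, 7: 244}
# Assumption: IPv4 without options (20), UDP (8), RTP fixed header (12).
IP_UDP_RTP_OCTETS = 20 + 8 + 12
FRAME_MS = 20


def octet_aligned_bitrate(ft: int, frames_per_packet: int) -> float:
    """IP-level bit rate (bit/s) for the assumptions stated above."""
    frame_octets = (AMR_SPEECH_BITS[ft] + 7) // 8          # pad to whole octets
    # One CMR octet, then one ToC octet plus the frame octets per frame.
    payload = 1 + frames_per_packet * (1 + frame_octets)
    packet_bits = 8 * (IP_UDP_RTP_OCTETS + payload)
    return packet_bits * 1000 / (FRAME_MS * frames_per_packet)
```

   Under these assumptions, the 12.2 kbps mode (FT=7) costs 29.2 kbit/s
   with one frame per packet but 16.9 kbit/s with four frames per
   packet (at the price of 60 ms more packetization delay), while the
   4.75 kbps mode (FT=0) costs 21.6 kbit/s with one frame per packet.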
   Another parameter that may impact the bandwidth demand for AMR and
   AMR-WB is the number of frame-blocks that are encapsulated in each
   RTP payload. Packing more frame-blocks in each RTP payload can
   reduce the number of packets sent, and hence the overhead from
   IP/UDP/RTP headers, at the expense of increased delay.

   If forward error correction (FEC) is used to combat packet loss, the
   amount of redundancy added by FEC will need to be regulated so that
   the use of FEC itself does not cause a congestion problem.

   It is RECOMMENDED that AMR or AMR-WB applications using this payload
   format employ congestion control. The actual mechanism for
   congestion control is not specified but should be suitable for
   real-time flows, possibly "TCP Friendly Rate Control" [21].

7.  Security Considerations

   RTP packets using the payload format defined in this specification
   are subject to the general security considerations discussed in [8]
   and in any profile used, like AVP [12] or SAVP [26].

   As this format transports encoded speech, the main security issues
   include confidentiality, authentication, and integrity of the speech
   itself. The payload format itself does not have any built-in
   security mechanisms. External mechanisms, such as SRTP [26], need to
   be used for this functionality. Note that the appropriate mechanism
   to provide security to RTP and the payloads following this memo may
   vary. It is dependent on the application, the transport, and the
   signaling protocol employed. Therefore, a single mechanism is not
   sufficient, although, if suitable, the usage of SRTP [26] is
   RECOMMENDED. Other known mechanisms that may be used are IPsec [33]
   and TLS [34] (RTP over TCP), but other alternatives may also exist.
   This payload format does not exhibit any significant non-uniformity
   in the receiver-side computational complexity for packet processing
   and thus is unlikely to pose a denial-of-service threat due to the
   receipt of pathological data.

7.1.  Confidentiality

   To achieve confidentiality of the encoded AMR or AMR-WB speech, all
   speech data bits will need to be encrypted. There is less need to
   encrypt the payload header or the table of contents, because a) they
   only carry information about the requested speech mode, frame type,
   and frame quality, and b) this information could be useful to some
   third party, e.g., for quality monitoring.

   The packetization and depacketization of the AMR and AMR-WB payload
   are done only at the endpoints. Therefore, encryption should be
   performed after packet encapsulation, and decryption should be
   performed before packet decapsulation.

   Encryption may affect interleaving. Specifically, a change of keys
   should occur at the boundary between interleave groups. If it is
   not done at that boundary on both endpoints, the speech quality will
   be degraded during the complete interleave group for any receiver.

   The encryption mechanism may impact the robustness of the error
   correcting mechanism. This is discussed in Section 9.5 of SRTP
   [26]. Consequently, UED/UEP based on robust sorting may be difficult
   to apply when the payload data is encrypted.

7.2.  Authentication and Integrity

   To authenticate the sender and to protect the integrity of the RTP
   packets in transit, an external mechanism has to be used. As stated
   before, it is RECOMMENDED that SRTP [26] be used for common
   interoperability. Note that the use of UED/UEP may be difficult to
   combine with some integrity protection mechanisms, because any bit
   errors will cause the integrity check to fail.
   Data tampering by a man-in-the-middle attacker could result in
   erroneous depacketization/decoding that could lower the speech
   quality or produce unintelligible communications. Tampering with
   the CMR field may result in speech of a different quality than
   desired.

8.  Payload Format Parameters

   This section defines the parameters that may be used to select
   optional features of the AMR and AMR-WB payload formats. The
   parameters are defined here as part of the media type registrations
   for the AMR and AMR-WB speech codecs. The registrations are done
   following RFC 3555 [15] and the media registration rules [14].

   A mapping of the parameters into the Session Description Protocol
   (SDP) [11] is also provided for those applications that use SDP.
   Equivalent parameters could be defined elsewhere for use with
   control protocols that do not use media types or SDP.

   Two separate media type registrations are made, one for AMR and one
   for AMR-WB, because they are distinct encodings that must be
   distinguished by their own media types.

   Data formats are specified both for real-time transport in RTP and
   for storage-type applications such as e-mail attachments.

8.1.  AMR Media Type Registration

   The media type for the Adaptive Multi-Rate (AMR) codec is allocated
   from the IETF tree, since AMR is a widely used speech codec in
   general VoIP and messaging applications. This media type
   registration covers both real-time transfer via RTP and
   non-real-time transfers via stored files.

   Note, any unspecified parameter MUST be ignored by the receiver.

   Media type name: audio

   Media subtype name: AMR

   Required parameters: none

   Optional parameters:

   These parameters apply to RTP transfer only.

   octet-align: Permissible values are 0 and 1. If 1, octet-aligned
      operation SHALL be used.
      If 0 or if not present, bandwidth-efficient operation is
      employed.

   mode-set: Restricts the active codec mode set to a subset of all
      modes, for example, to be able to support transport channels such
      as GSM networks in gateway use cases. Possible values are a
      comma-separated list of modes from the set 0,...,7 (see Table 1a
      of [2]). The SID frame type 8 and NO_DATA (frame type 15) are
      never included in the mode set, but can always be used. If
      mode-set is specified, it MUST be abided by, and frames encoded
      with modes outside of the subset MUST NOT be sent in any RTP
      payload or used in codec mode requests. If not present, all codec
      modes are allowed for the session.

   mode-change-period: Specifies a number of frame-blocks, N (1 or 2),
      that is the frame-block period at which codec mode changes are
      allowed for the sender. The initial phase of the interval is
      arbitrary, but changes must be separated by multiples of N
      frame-blocks, i.e., a value of 2 allows the sender to change mode
      every second frame-block. The value of N SHALL be either 1 or 2.
      If this parameter is not present, mode changes are allowed at any
      time during the session, i.e., N=1.

   mode-change-capability: Specifies whether the client is capable of
      transmitting with a restricted mode change period. The parameter
      may take a value of 1 or 2. A value of 1 indicates that the
      client is not capable of restricting the mode change period to 2,
      and that the codec mode may be changed at any point. A value of 2
      indicates that the client has the capability to restrict the mode
      change period to 2, and thus that the client can correctly
      interoperate with a receiver requiring a mode-change-period=2. If
      this parameter is not present, the mode-change restriction
      capability is not supported, i.e., mode-change-capability=1.
      To be able to interoperate fully with gateways to circuit-switched
      networks, for example, GSM networks, transmissions with
      restricted mode changes (value = 2) are required. Thus, clients
      are RECOMMENDED to have the capability to support transmission
      according to mode-change-capability=2.

   mode-change-neighbor: Permissible values are 0 and 1. If 1, the
      sender SHOULD only perform mode changes to the neighboring modes
      in the active codec mode set. Neighboring modes are the ones
      closest in bit rate to the current mode, either the next higher
      or next lower rate. If 0 or if not present, change between any
      two modes in the active codec mode set is allowed.

   maxptime: The maximum amount of media that can be encapsulated in a
      payload packet, expressed as time in milliseconds. The time is
      calculated as the sum of the time the media present in the packet
      represents. The time SHOULD be an integer multiple of the frame
      size. If this parameter is not present, the sender MAY
      encapsulate any number of speech frames into one RTP packet.

   crc: Permissible values are 0 and 1. If 1, frame CRCs SHALL be
      included in the payload. If 0 or not present, CRCs SHALL NOT be
      used. If crc=1, this also automatically implies that
      octet-aligned operation SHALL be used for the session.

   robust-sorting: Permissible values are 0 and 1. If 1, the payload
      SHALL employ robust payload sorting. If 0 or if not present,
      simple payload sorting SHALL be used. If robust-sorting=1, this
      also automatically implies that octet-aligned operation SHALL be
      used for the session.

   interleaving: Indicates that frame-block level interleaving SHALL be
      used for the session, and its value defines the maximum number of
      frame-blocks allowed in an interleaving group (see Section
      4.4.1). If this parameter is not present, interleaving SHALL NOT
      be used.
      The presence of this parameter also automatically implies that
      octet-aligned operation SHALL be used.

   ptime: see RFC 4566 [11].

   channels: The number of audio channels. The possible values (1-6)
      and their respective channel order are specified in Section 4.1
      of [12]. If omitted, it has the default value of 1.

   max-red: The maximum duration in milliseconds that elapses between
      the primary (first) transmission of a frame and any redundant
      transmission that the sender will use. This parameter allows a
      receiver to have a bounded delay when redundancy is used. Allowed
      values are between 0 (no redundancy will be used) and 65535. If
      the parameter is omitted, no limitation on the use of redundancy
      is present.

   Encoding considerations:
      The audio data is binary data and must be encoded for non-binary
      transport; the Base64 encoding is suitable for e-mail. When used
      in an RTP context, the data is framed as defined in [14].

   Security considerations:
      See Section 7 of RFC XXXX.

   Public specification:
      Please refer to Section 11 of RFC XXXX.

   Additional information:

      The following applies to stored-file transfer methods:

      Magic numbers:
         single channel:
            ASCII character string "#!AMR\n"
            (or 0x2321414d520a in hexadecimal)
         multi-channel:
            ASCII character string "#!AMR_MC1.0\n"
            (or 0x2321414d525F4D43312E300a in hexadecimal)

      AMR speech frames may also be stored in the file format "3GP"
      defined in 3GPP TS 26.244 [31], which is identified using the
      media types "audio/3GPP" or "video/3GPP" as registered by
      RFC 3839 [32].

      File extensions: amr, AMR
      Macintosh file type code: "amr " (fourth character is space)

   Person & email address to contact for further information:
      magnus.westerlund@ericsson.com
      ari.lakaniemi@nokia.com

   Intended usage: COMMON.
      This media type is widely used in streaming, VoIP, and messaging
      applications on many types of devices.

   Restrictions on usage:
      When this media type is used in the context of transfer over RTP,
      the RTP payload format specified in Section 4 SHALL be used. In
      all other contexts, the file format defined in Section 5 SHALL be
      used.

   Author:
      magnus.westerlund@ericsson.com
      ari.lakaniemi@nokia.com

   Change controller:
      IETF Audio/Video Transport working group delegated from the IESG.

8.2.  AMR-WB Media Type Registration

   The media type for the Adaptive Multi-Rate Wideband (AMR-WB) codec
   is allocated from the IETF tree, since AMR-WB is a widely used
   speech codec in general VoIP and messaging applications. This media
   type registration covers both real-time transfer via RTP and
   non-real-time transfers via stored files.

   Note, any unspecified parameter MUST be ignored by the receiver.

   Media type name: audio

   Media subtype name: AMR-WB

   Required parameters: none

   Optional parameters:

   These parameters apply to RTP transfer only.

   octet-align: Permissible values are 0 and 1. If 1, octet-aligned
      operation SHALL be used. If 0 or if not present,
      bandwidth-efficient operation is employed.

   mode-set: Restricts the active codec mode set to a subset of all
      modes, for example, to be able to support transport channels such
      as GSM networks in gateway use cases. Possible values are a
      comma-separated list of modes from the set 0,...,8 (see Table 1a
      of [4]). The SID frame type 9, SPEECH_LOST (frame type 14), and
      NO_DATA (frame type 15) are never included in the mode set, but
      can always be used. If mode-set is specified, it MUST be abided
      by, and frames encoded with modes outside of the subset MUST NOT
      be sent in any RTP payload or used in codec mode requests. If not
      present, all codec modes are allowed for the session.
   mode-change-period: Specifies a number of frame-blocks, N (1 or
                2), that is the frame-block period at which codec mode
                changes are allowed for the sender. The initial phase of
                the interval is arbitrary, but changes must be separated
                by multiples of N frame-blocks, i.e., a value of two
                allows the sender to change mode every second
                frame-block. The value of N SHALL be either 1 or 2. If
                this parameter is not present, mode changes are allowed
                at any time during the session, i.e., N=1.

   mode-change-capability: Specifies if the client is capable of
                transmitting with a restricted mode change period. The
                parameter may take a value of 1 or 2. A value of 1
                indicates that the client is not capable of restricting
                the mode change period to 2, and that the codec mode may
                be changed at any point. A value of 2 indicates that the
                client has the capability to restrict the mode change
                period to 2, and thus that the client can correctly
                interoperate with a receiver requiring
                mode-change-period=2. If this parameter is not present,
                the mode-change restriction capability is not supported,
                i.e., mode-change-capability=1. To be able to
                interoperate fully with gateways to circuit-switched
                networks, for example GSM networks, transmissions with
                restricted mode changes (value = 2) are required. Thus,
                clients are RECOMMENDED to have the capability to
                support transmission according to
                mode-change-capability=2.

   mode-change-neighbor: Permissible values are 0 and 1. If 1, the
                sender SHOULD only perform mode changes to the
                neighboring modes in the active codec mode set.
                Neighboring modes are the ones closest in bit rate to
                the current mode, either the next higher or next lower
                rate. If 0 or if not present, change between any two
                modes in the active codec mode set is allowed.
   maxptime: The maximum amount of media that can be encapsulated
                in a payload packet, expressed as time in milliseconds.
                The time is calculated as the sum of the durations
                represented by the media present in the packet. The time
                SHOULD be an integer multiple of the frame size. If this
                parameter is not present, the sender MAY encapsulate any
                number of speech frames into one RTP packet.

   crc: Permissible values are 0 and 1. If 1, frame CRCs SHALL be
                included in the payload. If 0 or not present, CRCs
                SHALL NOT be used. If crc=1, this also automatically
                implies that octet-aligned operation SHALL be used for
                the session.

   robust-sorting: Permissible values are 0 and 1. If 1, the
                payload SHALL employ robust payload sorting. If 0 or if
                not present, simple payload sorting SHALL be used. If
                robust-sorting=1, this also automatically implies that
                octet-aligned operation SHALL be used for the session.

   interleaving: Indicates that frame-block level interleaving SHALL
                be used for the session, and its value defines the
                maximum number of frame-blocks allowed in an
                interleaving group (see Section 4.4.1). If this
                parameter is not present, interleaving SHALL NOT be
                used. The presence of this parameter also automatically
                implies that octet-aligned operation SHALL be used.

   ptime: see RFC 4566 [11].

   channels: The number of audio channels. The possible values (1-6)
                and their respective channel order are specified in
                Section 4.1 of [12]. If omitted, it has the default
                value of 1.

   max-red: The maximum duration in milliseconds that elapses between
                the primary (first) transmission of a frame and any
                redundant transmission that the sender will use. This
                parameter allows a receiver to have a bounded delay when
                redundancy is used. Allowed values are between 0 (no
                redundancy will be used) and 65535.
                If the parameter is omitted, no limitation on the use of
                redundancy is present.

   Encoding considerations:
      The audio data is binary data and must be encoded for
      non-binary transport; the Base64 encoding is suitable
      for email. When used in an RTP context, the data is framed
      as defined in [14].

   Security considerations:
      See Section 7 of RFC XXXX.

   Public specification:
      Please refer to Section 11 of RFC XXXX.

   Additional information:
      The following applies to stored-file transfer methods:

      Magic numbers:
        single channel:
          ASCII character string "#!AMR-WB\n"
          (or 0x2321414d522d57420a in hexadecimal)
        multi-channel:
          ASCII character string "#!AMR-WB_MC1.0\n"
          (or 0x2321414d522d57425f4d43312e300a in hexadecimal)
      File extensions: awb, AWB
      Macintosh file type code: amrw
      Object identifier or OID: none

      AMR-WB speech frames may also be stored in the file
      format "3GP" defined in 3GPP TS 26.244 [31] and
      identified using the media type "audio/3GPP" or
      "video/3GPP" as registered by RFC 3839 [32].

   Person & email address to contact for further information:
      magnus.westerlund@ericsson.com
      ari.lakaniemi@nokia.com

   Intended usage: COMMON.
      This media type is widely used in streaming, VoIP and
      messaging applications on many types of devices.

   Restrictions on usage:
      When this media type is used in the context of transfer
      over RTP, the RTP payload format specified in Section 4
      SHALL be used. In all other contexts, the file format
      defined in Section 5 SHALL be used.

   Author:
      magnus.westerlund@ericsson.com
      ari.lakaniemi@nokia.com

   Change controller:
      IETF Audio/Video Transport working group delegated from
      the IESG.

8.3.  Mapping Media Type Parameters into SDP

   The information carried in the media type specification has a
   specific mapping to fields in the Session Description Protocol (SDP)
   [11], which is commonly used to describe RTP sessions. When SDP is
   used to specify sessions employing the AMR or AMR-WB codec, the
   mapping is as follows:

   - The media type ("audio") goes in SDP "m=" as the media name.

   - The media subtype (payload format name) goes in SDP "a=rtpmap"
     as the encoding name. The RTP clock rate in "a=rtpmap" MUST
     be 8000 for AMR and 16000 for AMR-WB, and the encoding
     parameters (number of channels) MUST either be explicitly set
     to N or omitted, implying a default value of 1. The values of
     N that are allowed are specified in Section 4.1 of [12].

   - The parameters "ptime" and "maxptime" go in the SDP "a=ptime"
     and "a=maxptime" attributes, respectively.

   - Any remaining parameters go in the SDP "a=fmtp" attribute by
     copying them directly from the media type parameter string as
     a semicolon-separated list of parameter=value pairs.

8.3.1.  Offer-Answer Model Considerations

   The following considerations apply when using SDP Offer-Answer
   procedures to negotiate the use of AMR or AMR-WB payload in RTP:

   - Each combination of the RTP payload transport format
     configuration parameters (octet-align, crc, robust-sorting,
     interleaving, and channels) is unique in its bit-pattern
     and not compatible with any other combination. When
     creating an offer in an application desiring to use the
     more advanced features (crc, robust-sorting, interleaving,
     or more than one channel), the offerer is RECOMMENDED to
     also offer a payload type containing only the octet-align
     or bandwidth efficient configuration with a single channel.
     If multiple configurations are of interest to the
     application, they may all be offered; however, care should be
     taken not to offer too many payload types. An SDP answerer
     MUST include in the SDP answer for a payload type the
     following parameters unmodified from the SDP offer, unless
     it removes the payload type: "octet-align", "crc",
     "robust-sorting", "interleaving", and "channels". The SDP
     offerer and answerer MUST generate AMR or AMR-WB packets as
     described by these parameters.

   - The "mode-set" parameter can be used to restrict the set of
     active AMR/AMR-WB modes used in a session. This is
     primarily intended for gateways to networks such as GSM or
     3GPP UMTS, whose transport supports only a subset of the
     modes. The 3GPP preferred codec configurations are defined
     in 3GPP TS 26.103 [28], and it is RECOMMENDED that other
     networks needing to restrict the mode set also follow the
     preferred codec configurations defined in 3GPP for greatest
     interoperability.

     The parameter is bi-directional, i.e., the restricted set
     applies to media both to be received and sent by the
     declaring entity. If a mode set was supplied in the offer,
     the answerer SHALL return the mode-set unmodified or reject
     the payload type. The answerer is free to choose a mode-set
     in the answer only if no mode-set was supplied in the offer
     for a unicast two-peer session. The mode-set in the answer
     is binding for both offerer and answerer. Thus, an offerer
     supporting all modes and subsets SHOULD NOT include the
     mode-set parameter. For any other offerer, it is RECOMMENDED
     to include each mode-set it can support as a separate
     payload type within the offer. For multicast sessions, the
     answerer SHALL only participate in the session if it
     supports the offered mode-set.
     Thus, it is RECOMMENDED that any offer for a multicast
     session include only the mode-set it will require the
     answerers to support, and that the mode-set be likely to be
     supported by all participants.

   - The parameters "mode-change-period" and
     "mode-change-capability" are intended to be used in sessions
     with gateways, for example when interoperating with GSM
     networks. Both parameters are declarative and are combined
     to allow a session participant to determine if the payload
     type can be supported. The mode-change-period indicates
     what the offerer or answerer requires of data it receives,
     while the mode-change-capability indicates its transmission
     capabilities.

     A mode-change-period=2 in the offer indicates a requirement
     on the answerer to send with a mode-change period of 2,
     i.e., support mode-change-capability=2. If the answerer
     requires mode-change-period=2, it SHALL only include it in
     the answer if the offerer either has indicated support with
     mode-change-capability=2 or has indicated
     mode-change-period=2; otherwise, the payload type SHALL be
     rejected. An offerer that supports mode-change-capability=2
     SHALL include the parameter in all offers to ensure the
     greatest possible interoperability, unless it includes
     mode-change-period=2 in the offer. The
     mode-change-capability SHOULD be included in answers. It
     then indicates the answerer's capability to transmit with
     that mode-change-period for the provided payload format
     configuration. The information is useful in future
     re-negotiation of the payload formats.

   - The parameter "mode-change-neighbor" is a recommendation to
     restrict the switching of codec modes to neighboring modes and
     SHOULD be followed.
     It is intended to be used in gateway scenarios, for example
     with GSM networks, where support of this parameter and the
     operations it implies improves interoperability.

     "mode-change-neighbor" is a declarative parameter. By
     including the parameter, the offerer or answerer indicates
     that it desires to receive streams with "mode-change-neighbor"
     restrictions.

   - The parameters "maxptime" and "ptime" will in most cases
     not affect interoperability; however, the setting of the
     parameters can affect the performance of the application.
     The SDP offer-answer handling of the "ptime" parameter is
     described in RFC 3264 [13]. The "maxptime" parameter MUST be
     handled in the same way.

   - The parameter "max-red" is a stream property parameter. For
     sendonly or sendrecv unicast media streams, the parameter
     declares the limitation on redundancy that the stream
     sender will use. For recvonly streams, it indicates the
     desired value for the stream sent to the receiver. The
     answerer MAY change the value but is RECOMMENDED to use the
     same limitation as the offer declares. In the case of
     multicast, the offerer MAY declare a limitation; this SHALL
     be answered using the same value. A media sender is
     RECOMMENDED to always include the parameter and bound its
     usage of redundancy to simplify the receiver's task. This is
     especially true if no redundancy will be used, in which
     case "max-red" is set to 0. As this parameter was not
     originally defined, some senders will not declare it even
     if they limit their use of redundancy or send no redundancy
     at all.

   - Any unknown parameter in an offer SHALL be removed in the
     answer.

8.3.2.  Usage of declarative SDP

   In declarative usage, like SDP in RTSP [29] or SAP [30], the
   parameters SHALL be interpreted as follows:

   - The payload format configuration parameters (octet-align, crc,
     robust-sorting, interleaving, and channels) are all declarative,
     and a participant MUST use the configuration(s) that are provided
     for the session. More than one configuration may be provided if
     necessary by declaring multiple RTP payload types; however, the
     number of types should be kept small.

   - Any restriction of the AMR or AMR-WB encoder mode-switching and
     mode usage through the "mode-set" and "mode-change-period" MUST
     be followed by all participants of the session. The restriction
     indicated by "mode-change-neighbor" SHOULD be followed. Please
     note that such restrictions may be necessary if gateways to other
     transport systems like GSM participate in the session. Failure to
     consider such restrictions may result in failure for a peer
     behind such a gateway to correctly receive all or parts of the
     session. Also, if different restrictions are needed by different
     peers in the same session, some peer will not be able to
     participate unless a common subset of the restrictions exists.
     Note that the usage of mode-change-capability is meaningless when
     no negotiation exists, and can thus be excluded in any
     declarations.

   - Any "maxptime" and "ptime" values should be selected with care to
     ensure that the session's participants can achieve reasonable
     performance.

   - The usage of "max-red" puts a global upper limit on the usage of
     redundancy that needs to be followed by all that understand the
     parameter. However, due to the late addition of this parameter, it
     may be ignored by some implementations.

8.3.3.  Examples

   Some example SDP session descriptions utilizing AMR and AMR-WB
   encodings follow.
   In these examples, long a=fmtp lines are folded to meet the column
   width constraints of this document; the backslash ("\") at the end
   of a line and the carriage return that follows it should be ignored.

   In an example of the usage of AMR in a possible GSM
   gateway-to-gateway scenario, the offerer is capable of supporting
   three different mode-sets and needs the mode-change-period to be 2
   in combination with mode-change-neighbor restrictions. The other
   gateway can only support two of these mode-sets and removes
   payload type 97 in the answer. If the offering GSM gateway supports
   only a single active mode-set at a time, it should consider
   applying the 1 out of N selection procedures described in
   Section 10.2 of [13]:

      Offer:

      m=audio 49120 RTP/AVP 97 98 99
      a=rtpmap:97 AMR/8000/1
      a=fmtp:97 mode-set=0,2,5,7; mode-change-period=2; \
          mode-change-capability=2; mode-change-neighbor=1
      a=rtpmap:98 AMR/8000/1
      a=fmtp:98 mode-set=0,2,3,6; mode-change-period=2; \
          mode-change-capability=2; mode-change-neighbor=1
      a=rtpmap:99 AMR/8000/1
      a=fmtp:99 mode-set=0,2,3,4; mode-change-period=2; \
          mode-change-capability=2; mode-change-neighbor=1
      a=maxptime:20

      Answer:

      m=audio 49120 RTP/AVP 98 99
      a=rtpmap:98 AMR/8000/1
      a=fmtp:98 mode-set=0,2,3,6; mode-change-period=2; \
          mode-change-capability=2; mode-change-neighbor=1
      a=rtpmap:99 AMR/8000/1
      a=fmtp:99 mode-set=0,2,3,4; mode-change-period=2; \
          mode-change-capability=2; mode-change-neighbor=1
      a=maxptime:20

   The following example shows the usage of AMR between a non-GSM
   endpoint and a GSM gateway. The non-GSM offerer requires no
   restrictions of the mode-change-period or mode-change-neighbor, but
   must signal its mode-change-capability in the offer and abide by
   the restrictions that the answer imposes.
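   The mode-set and mode-change-period rules of Section 8.3.1 that the
   gateway answerer applies in this example can be sketched in Python.
   This is a minimal, hypothetical sketch, not part of this
   specification: the function names (parse_fmtp, build_answer,
   format_fmtp) and their inputs are illustrative; an offered mode-set
   is assumed to be supportable whenever all of its modes are in the
   answerer's supported set; and the answerer is assumed to be able to
   transmit with mode-change-period=2.

```python
from typing import Optional

# Configuration parameters an answerer must echo unmodified (Section
# 8.3.1); "channels" is carried in a=rtpmap, not in a=fmtp.
ECHOED_PARAMS = ("octet-align", "crc", "robust-sorting", "interleaving")


def parse_fmtp(params: str) -> dict:
    """Split an a=fmtp parameter string into a name -> value dict."""
    result = {}
    for item in params.split(";"):
        item = item.strip()
        if not item:
            continue
        name, _, value = item.partition("=")
        # Parameter names are case-insensitive, so normalize them.
        result[name.strip().lower()] = value.strip()
    return result


def format_fmtp(params: dict) -> str:
    """Join parameters back into a semicolon-separated fmtp string."""
    return "; ".join(f"{name}={value}" for name, value in params.items())


def build_answer(offer: dict,
                 supported_modes: set,
                 own_mode_set: Optional[str],
                 requires_period_2: bool,
                 wants_neighbor: bool,
                 unicast: bool = True) -> Optional[dict]:
    """Return the answer's fmtp parameters, or None to reject the
    payload type. Hypothetical: assumes this answerer can itself
    transmit with mode-change-period=2."""
    answer = {}
    # Only known parameters are copied, so unknown ones are dropped.
    for name in ECHOED_PARAMS:
        if name in offer:
            answer[name] = offer[name]

    if "mode-set" in offer:
        offered = {int(m) for m in offer["mode-set"].split(",")}
        # An offered mode-set is returned unmodified or rejected.
        if not offered <= supported_modes:
            return None
        answer["mode-set"] = offer["mode-set"]
    elif own_mode_set is not None:
        # Only a unicast answerer may choose its own mode-set.
        if not unicast:
            return None
        answer["mode-set"] = own_mode_set

    if requires_period_2:
        # Allowed only if the offerer can send with period 2.
        if (offer.get("mode-change-capability") != "2"
                and offer.get("mode-change-period") != "2"):
            return None
        answer["mode-change-period"] = "2"

    # Declare this answerer's transmit capability (SHOULD be included).
    answer["mode-change-capability"] = "2"
    if wants_neighbor:
        answer["mode-change-neighbor"] = "1"
    return answer


# The non-GSM offer of this example, answered by the GSM gateway.
offer = parse_fmtp("mode-change-capability=2")
answer = build_answer(offer, supported_modes={0, 2, 4, 7},
                      own_mode_set="0,2,4,7",
                      requires_period_2=True, wants_neighbor=True)
print(format_fmtp(answer))
```

   With these inputs the sketch prints the same fmtp parameters as the
   answer below. Copying only the known parameters is one simple way
   to satisfy the rule that unknown offer parameters are removed from
   the answer.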
      Offer:

      m=audio 49120 RTP/AVP 97
      a=rtpmap:97 AMR/8000/1
      a=fmtp:97 mode-change-capability=2
      a=maxptime:20

      Answer:

      m=audio 49120 RTP/AVP 97
      a=rtpmap:97 AMR/8000/1
      a=fmtp:97 mode-set=0,2,4,7; mode-change-period=2; \
          mode-change-capability=2; mode-change-neighbor=1
      a=maxptime:20

   Example of usage of AMR-WB in a possible VoIP scenario where
   unequal error protection (UEP) may be used (payload type 99), with
   a fallback declaration (payload type 98):

      m=audio 49120 RTP/AVP 99 98
      a=rtpmap:98 AMR-WB/16000
      a=fmtp:98 octet-align=1; mode-change-capability=2
      a=rtpmap:99 AMR-WB/16000
      a=fmtp:99 octet-align=1; crc=1; mode-change-capability=2

   Example of usage of AMR-WB in a possible streaming scenario
   (two-channel stereo):

      m=audio 49120 RTP/AVP 99
      a=rtpmap:99 AMR-WB/16000/2
      a=fmtp:99 interleaving=30
      a=maxptime:100

   Note that the payload format (encoding) names are commonly shown in
   upper case. Media subtypes are commonly shown in lower case. These
   names are case-insensitive in both places. Similarly, parameter
   names are case-insensitive both in media types and in the default
   mapping to the SDP a=fmtp attribute.

9.  IANA Considerations

   Two media types (audio/amr and audio/amr-wb) are updated; see
   Section 8.

10.  Changes

   The differences between RFC 3267 and this document are as follows:

   - Added a clarification of the behavior expected from an IP client
     with regard to mode-change-period and mode-change-neighbor; see
     Section 4.5.
   - Updated the maxptime definition for better clarity. The sentence
     that previously read "The time SHOULD be a multiple of the frame
     size." now says "The time SHOULD be an integer multiple of the
     frame size." This should have no impact on interoperability.
   - Updated the definition of the mode-set parameter for
     clarification.
   - Restricted the values for mode-change-period to 1 or 2, which are
     the values used in circuit-switched AMR systems.
   - Added a new media type parameter, mode-change-capability, that
     defaults to 1, which is the assumed behavior of any non-updated
     implementation. This enables the offer-answer procedures to work.
   - Changed mode-change-neighbor to indicate a recommended behavior
     rather than a required one.
   - Added an Offer-Answer section; see Section 8.3.1. This will have
     implications for interoperability with implementations that have
     guessed how to perform offer/answer negotiation of the payload
     parameters.
   - Clarified and aligned the unequal error detection usage with the
     published UDP-Lite specification in Sections 3.6.1 and 4.4.2.1.
     This includes replacing a normative statement about packet
     handling with an informative paragraph with a reference to
     UDP-Lite.
   - Clarified the bit order in the CRC calculation in Section
     4.4.2.1.
   - Corrected the reference in Section 5.3 for the Q and FT fields.
   - Changed the padding bit definition in Sections 4.4.2 and 5.3 so
     that it is clear that padding bits shall be ignored.
   - Added a clarification that Comfort Noise frames with frame types
     9, 10, and 11 SHALL NOT be used in the AMR file format.
   - Clarified in Section 4.3.2 that the rules about not sending
     NO_DATA frames apply to all payload format configurations except
     the interleaved mode.
   - The reference list has been updated to the now-published RFCs:
     RFC 3711, RFC 3828, RFC 3550, RFC 3448, and RFC 3551. A reference
     to 3GPP TS 26.101 has also been added.
   - Added notes in the storage format section and media type
     registration that AMR and AMR-WB frames can also be stored in the
     3GP file format.
   - Added a media type parameter "max-red" that allows the sender to
     declare a bounded usage of redundancy.
     This parameter allows a receiver to optimize its function, as it
     will know whether redundancy may be used. If it is used, the
     maximum extra delay introduced by the sender, which the receiver
     needs to consider to fully utilize the redundancy, will be known.
     The addition of this parameter should have no negative effects on
     older implementations, as they are mandated to ignore unknown
     parameters per RFC 3267 and, in addition, are required to operate
     as if the value of max-red is unknown and possibly infinite.
   - Updated the media type registration to comply with the new
     registration rules.
   - Moved the section on decoding validation from Security
     Considerations to Implementation Considerations, where it makes
     more sense.
   - Clarified the application of encryption, integrity protection,
     and authentication mechanisms to the payload.

11.  Acknowledgements

   The authors would like to thank Petri Koskelainen, Bernhard Wimmer,
   Tim Fingscheidt, Sanjay Gupta, Stephen Casner, and Colin Perkins for
   their significant contributions made throughout the writing and
   reviewing of RFC 3267 and this update. The authors would also like
   to thank Richard Ejzak, Thomas Belling, and Gorry Fairhurst for
   their input on this update of RFC 3267.

12.  References

12.1.  Normative References

   [1]  3GPP TS 26.090, "Adaptive Multi-Rate (AMR) speech transcoding",
        version 4.0.0 (2001-03), 3rd Generation Partnership Project
        (3GPP).
   [2]  3GPP TS 26.101, "AMR Speech Codec Frame Structure", version
        4.1.0 (2001-06), 3rd Generation Partnership Project (3GPP).
   [3]  3GPP TS 26.190, "AMR Wideband speech codec; Transcoding
        functions", version 5.0.0 (2001-03), 3rd Generation Partnership
        Project (3GPP).
   [4]  3GPP TS 26.201, "AMR Wideband speech codec; Frame Structure",
        version 5.0.0 (2001-03), 3rd Generation Partnership Project
        (3GPP).
   [5]  Bradner, S., "Key words for use in RFCs to Indicate Requirement
        Levels", BCP 14, RFC 2119, March 1997.
   [6]  3GPP TS 26.093, "AMR Speech Codec; Source Controlled Rate
        operation", version 4.0.0 (2000-12), 3rd Generation Partnership
        Project (3GPP).
   [7]  3GPP TS 26.193, "AMR Wideband Speech Codec; Source Controlled
        Rate operation", version 5.0.0 (2001-03), 3rd Generation
        Partnership Project (3GPP).
   [8]  Schulzrinne, H., Casner, S., Frederick, R., and V. Jacobson,
        "RTP: A Transport Protocol for Real-Time Applications", STD 64,
        RFC 3550, July 2003.
   [9]  3GPP TS 26.092, "AMR Speech Codec; Comfort noise aspects",
        version 4.0.0 (2001-03), 3rd Generation Partnership Project
        (3GPP).
   [10] 3GPP TS 26.192, "AMR Wideband speech codec; Comfort Noise
        aspects", version 5.0.0 (2001-03), 3rd Generation Partnership
        Project (3GPP).
   [11] Handley, M., Jacobson, V., and C. Perkins, "SDP: Session
        Description Protocol", RFC 4566, July 2006.
   [12] Schulzrinne, H. and S. Casner, "RTP Profile for Audio and Video
        Conferences with Minimal Control", STD 65, RFC 3551, July 2003.
   [13] Rosenberg, J. and H. Schulzrinne, "An Offer/Answer Model with
        Session Description Protocol (SDP)", RFC 3264, June 2002.
   [14] Freed, N. and J. Klensin, "Media Type Specifications and
        Registration Procedures", BCP 13, RFC 4288, December 2005.
   [15] Casner, S., "Media Type Registration of RTP Payload Formats",
        draft-ietf-avt-rfc3555bis-04, April 17, 2006.

12.2.  Informative References

   [16] GSM 06.60, "Enhanced Full Rate (EFR) speech transcoding",
        version 8.0.1 (2000-11), European Telecommunications Standards
        Institute (ETSI).
   [17] ANSI/TIA/EIA-136-Rev.C, part 410 - "TDMA Cellular/PCS - Radio
        Interface, Enhanced Full Rate Voice Codec (ACELP)." Formerly
        IS-641. TIA published standard, June 1 2001.
   [18] ARIB, RCR STD-27H, "Personal Digital Cellular Telecommunication
        System RCR Standard", Association of Radio Industries and
        Businesses (ARIB).
   [19] Larzon, L-A., Degermark, M., Pink, S., Jonsson, L-E., and G.
        Fairhurst, "The Lightweight User Datagram Protocol (UDP-Lite)",
        RFC 3828, July 2004.
   [20] 3GPP TS 25.415, "UTRAN Iu Interface User Plane Protocols",
        version 4.2.0 (2001-09), 3rd Generation Partnership Project
        (3GPP).
   [21] Handley, M., Floyd, S., Padhye, J., and J. Widmer, "TCP
        Friendly Rate Control (TFRC): Protocol Specification",
        RFC 3448, January 2003.
   [22] Li, A., et al., "An RTP Payload Format for Generic FEC with
        Uneven Level Protection", Work in Progress.
   [23] Rosenberg, J. and H. Schulzrinne, "An RTP Payload Format for
        Generic Forward Error Correction", RFC 2733, December 1999.
   [24] 3GPP TS 26.102, "AMR speech codec interface to Iu and Uu",
        version 4.0.0 (2001-03), 3rd Generation Partnership Project
        (3GPP).
   [25] 3GPP TS 26.202, "AMR Wideband speech codec; Interface to Iu and
        Uu", version 5.0.0 (2001-03), 3rd Generation Partnership
        Project (3GPP).
   [26] Baugher, M., McGrew, D., Naslund, M., Carrara, E., and K.
        Norrman, "The Secure Real-time Transport Protocol (SRTP)",
        RFC 3711, March 2004.
   [27] Perkins, C., Kouvelas, I., Hodson, O., Hardman, V., Handley,
        M., Bolot, J., Vega-Garcia, A., and S. Fosse-Parisis, "RTP
        Payload for Redundant Audio Data", RFC 2198, September 1997.
   [28] 3GPP TS 26.103, "Speech codec list for GSM and UMTS", version
        5.5.0 (2004-09), 3rd Generation Partnership Project (3GPP).
   [29] Schulzrinne, H., Rao, A., and R. Lanphier, "Real Time Streaming
        Protocol (RTSP)", RFC 2326, April 1998.
   [30] Handley, M., Perkins, C., and E. Whelan, "Session Announcement
        Protocol", RFC 2974, October 2000.
   [31] 3GPP TS 26.244, "3GPP file format (3GP)", version 6.1.0
        (2004-09), 3rd Generation Partnership Project (3GPP).
   [32] Castagno, R. and D. Singer, "MIME Type Registrations for 3rd
        Generation Partnership Project (3GPP) Multimedia files",
        RFC 3839, July 2004.
   [33] Kent, S. and K. Seo, "Security Architecture for the Internet
        Protocol", RFC 4301, December 2005.
   [34] Dierks, T. and E. Rescorla, "The Transport Layer Security (TLS)
        Protocol Version 1.1", RFC 4346, April 2006.

   ETSI documents can be downloaded from the ETSI web server,
   "http://www.etsi.org/". Any 3GPP document can be downloaded from the
   3GPP web server, "http://www.3gpp.org/", see specifications. TIA
   documents can be obtained from "www.tiaonline.org".

13.  Authors' Addresses

   Johan Sjoberg
   Ericsson AB
   SE-164 80 Stockholm, SWEDEN

   Phone: +46 8 7190000
   EMail: Johan.Sjoberg@ericsson.com

   Magnus Westerlund
   Ericsson Research
   Ericsson AB
   SE-164 80 Stockholm, SWEDEN

   Phone: +46 8 7190000
   EMail: Magnus.Westerlund@ericsson.com

   Ari Lakaniemi
   Nokia Research Center
   P.O.Box 407
   FIN-00045 Nokia Group, FINLAND

   Phone: +358-71-8008000
   EMail: ari.lakaniemi@nokia.com

   Qiaobing Xie
   Motorola, Inc.
   1501 W. Shure Drive, 2-B8
   Arlington Heights, IL 60004, USA

   Phone: +1-847-632-3028
   EMail: qxie1@email.mot.com

14.  IPR Notice

   The IETF takes no position regarding the validity or scope of any
   Intellectual Property Rights or other rights that might be claimed
   to pertain to the implementation or use of the technology described
   in this document or the extent to which any license under such
   rights might or might not be available; nor does it represent that
   it has made any independent effort to identify any such rights.
   Information on the procedures with respect to rights in RFC
   documents can be found in BCP 78 and BCP 79.

   Copies of IPR disclosures made to the IETF Secretariat and any
   assurances of licenses to be made available, or the result of an
   attempt made to obtain a general license or permission for the use
   of such proprietary rights by implementers or users of this
   specification can be obtained from the IETF on-line IPR repository
   at http://www.ietf.org/ipr.

   The IETF invites any interested party to bring to its attention any
   copyrights, patents or patent applications, or other proprietary
   rights that may cover technology that may be required to implement
   this standard. Please address the information to the IETF at
   ietf-ipr@ietf.org.

15.  Copyright Notice

   Copyright (C) The Internet Society (2006).

   This document is subject to the rights, licenses and restrictions
   contained in BCP 78, and except as set forth therein, the authors
   retain all their rights.

   This document and the information contained herein are provided on
   an "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE
   REPRESENTS OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY AND THE
   INTERNET ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS OR
   IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF
   THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED
   WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.

   This Internet-Draft expires in January 2007.

RFC Editor Considerations

   - The RFC editor is requested to replace all occurrences of XXXX
     with the RFC number this document receives.