Network Working Group                                          S. Proust
Internet-Draft                                                    Orange
Intended status: Informational                          December 2, 2015
Expires: June 4, 2016


          Additional WebRTC audio codecs for interoperability
             draft-ietf-rtcweb-audio-codecs-for-interop-03

Abstract

   To ensure a baseline level of interoperability between WebRTC
   clients, a minimum set of required codecs is specified.
   However, to maximize the possibility of establishing a session
   without the need for audio transcoding, it is also recommended to
   include in the offer other suitable audio codecs that are available
   to the browser.

   This document provides some guidelines on the suitable codecs to be
   considered for WebRTC clients to address the most relevant
   interoperability use cases.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current
   Internet-Drafts is at http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on June 4, 2016.

Copyright Notice

   Copyright (c) 2015 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.  Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1.  Introduction
   2.  Definition and abbreviations
   3.  Rationale for additional WebRTC codecs
   4.  Additional suitable codecs for WebRTC
     4.1.  AMR-WB
       4.1.1.  AMR-WB General description
       4.1.2.  WebRTC relevant use case for AMR-WB
       4.1.3.  Guidelines for AMR-WB usage and implementation with
               WebRTC
     4.2.  AMR
       4.2.1.  AMR General description
       4.2.2.  WebRTC relevant use case for AMR
       4.2.3.  Guidelines for AMR usage and implementation with
               WebRTC
     4.3.  G.722
       4.3.1.  G.722 General description
       4.3.2.  WebRTC relevant use case for G.722
       4.3.3.  Guidelines for G.722 usage and implementation
     4.4.  Other codecs
   5.  Security Considerations
   6.  IANA Considerations
   7.  Acknowledgements
   8.  References
     8.1.  Normative references
     8.2.  Informative references
   Author's Address

1.  Introduction

   As indicated in [I-D.ietf-rtcweb-overview], it has been anticipated
   that WebRTC will not remain an isolated island and that some WebRTC
   endpoints will need to communicate with devices used in other
   existing networks with the help of a gateway.
   Therefore, in order to maximize the possibility of establishing a
   session without the need for audio transcoding, it is recommended
   in [I-D.ietf-rtcweb-audio] to include in the offer other suitable
   audio codecs that are available to the browser.  This document
   provides some guidelines on the suitable codecs to be considered
   for WebRTC clients to address the most relevant interoperability
   use cases.

   The codecs considered in this document are recommended to be
   supported and included in the offer only for WebRTC clients for
   which interoperability with other non-WebRTC endpoints and
   non-WebRTC based services is relevant, as described in
   Section 4.1.2, Section 4.2.2, and Section 4.3.2.  Other use cases
   may justify offering additional codecs to avoid transcoding.

2.  Definition and abbreviations

   o  Legacy networks: In this document, legacy networks encompass the
      conversational networks that are already deployed, such as the
      PSTN, the PLMN, and the IP/IMS networks offering VoIP services,
      including the 3GPP "4G" Evolved Packet System [TS23.002]
      supporting voice over LTE radio access (VoLTE) [IR.92].

   o  AMR: Adaptive Multi-Rate

   o  AMR-WB: Adaptive Multi-Rate WideBand

   o  CAT-iq: Cordless Advanced Technology-internet and quality

   o  DECT: Digital Enhanced Cordless Telecommunications

   o  IMS: IP Multimedia Subsystem

   o  LTE: Long Term Evolution (3GPP "4G" wireless data transmission
      standard)

   o  MOS: Mean Opinion Score

   o  PSTN: Public Switched Telephone Network

   o  PLMN: Public Land Mobile Network

   o  VoLTE: Voice Over LTE

3.  Rationale for additional WebRTC codecs

   The mandatory implementation of OPUS [RFC6716] in WebRTC clients can
   guarantee codec interoperability (without transcoding) at state of
   the art voice quality (better than narrow band "PSTN" quality)
   between WebRTC clients.
   The WebRTC technology is also expected to be used to communicate
   with other types of clients using other technologies.  It can be
   used, for instance, as an access technology to VoLTE services
   (Voice over LTE as specified in [IR.92]) or to interoperate with
   fixed or mobile Circuit Switched or VoIP services, such as mobile
   Circuit Switched voice over 3GPP 2G/3G mobile networks [TS23.002]
   or DECT based VoIP telephony [EN300175-1].  Consequently, a
   significant number of calls are likely to occur between terminals
   supporting WebRTC clients and other terminals, such as mobile
   handsets, fixed VoIP terminals, and DECT terminals, that neither
   support WebRTC clients nor implement OPUS.  As a consequence, these
   calls are likely to be either of low narrow band PSTN quality, using
   G.711 [G.711] at both ends, or affected by transcoding operations.
   The drawbacks of such transcoding operations are recalled below:

   o  Degraded user experience with respect to voice quality: voice
      quality is significantly degraded by transcoding.  For instance,
      the degradation is around 0.2 to 0.3 MOS for most transcoding
      use cases with the AMR-WB codec (Section 4.1) at 12.65 kbit/s and
      in the same range for other wideband transcoding cases.  It
      should be stressed that if G.711 is used as a fallback codec for
      interoperation, wideband voice quality will be lost.  Such a
      bandwidth reduction down to narrow band clearly degrades the
      user perceived quality of service, leading to shorter and less
      frequent calls.  Such a switch to G.711 is a less than desirable
      or acceptable choice for customers.  If transcoding is performed
      between OPUS and any other wideband codec, wideband communication
      could be maintained but with degraded quality (MOS scores of
      transcoding between AMR-WB at 12.65 kbit/s and OPUS at 16 kbit/s
      in both directions are significantly lower than those of AMR-WB
      at 12.65 kbit/s or OPUS at 16 kbit/s).
      Furthermore, in degraded conditions, the addition of defects,
      like audio artifacts due to packet losses, and the audio effects
      resulting from the cascading of different packet loss recovery
      algorithms may result in a quality below the acceptable limit for
      the customers.

   o  Degraded user experience with respect to conversational
      interactivity: the degradation of conversational interactivity is
      due to the increase of end to end latency in both directions that
      is introduced by the transcoding operations.  Transcoding
      requires full de-packetization for decoding of the media stream
      (including mechanisms of de-jitter buffering and packet loss
      recovery), then re-encoding, re-packetization and re-sending.
      The delays produced by all these operations are additive and may
      increase the end to end delay beyond acceptable limits, for
      example to more than 1 s of end to end latency.

   o  Additional costs in networks: transcoding places important
      additional costs on network gateways, mainly related to codec
      implementation, codec licenses, deployment, testing and
      validation.  It must be noted that transcoding between wideband
      codecs would require more CPU processing and be more costly than
      transcoding between narrowband codecs.

4.  Additional suitable codecs for WebRTC

   The following codecs are considered relevant and suitable with
   respect to the general purpose described in Section 3.  This list
   reflects the current status of foreseen WebRTC use cases.  It is not
   exhaustive and is open to the inclusion of other codecs for which
   relevant use cases can be identified.  These additional codecs are
   recommended to be included in the offer in addition to OPUS and
   G.711, according to the foreseen interoperability cases to be
   addressed.

4.1.  AMR-WB

4.1.1.  AMR-WB General description

   The Adaptive Multi-Rate WideBand (AMR-WB) is a 3GPP defined speech
   codec that is mandatory to implement in any 3GPP terminal that
   supports wideband speech communication.  It is being used in circuit
   switched mobile telephony services and in new multimedia telephony
   services over IP/IMS, such as voice over LTE as specified by GSMA in
   [IR.92].  More detailed information on AMR-WB can be found in
   [IR.36].  References for AMR-WB related specifications, including
   the detailed codec description and source code, are in [TS26.171],
   [TS26.173], [TS26.190], [TS26.204].

4.1.2.  WebRTC relevant use case for AMR-WB

   The market of personal voice communication is driven by mobile
   terminals.  AMR-WB is now implemented in several hundred device
   models and 145 HD mobile networks in 85 countries, with a customer
   base of more than 450 million.  A high number of calls are
   consequently likely to occur between WebRTC clients and mobile 3GPP
   terminals.  The use of AMR-WB by WebRTC clients would consequently
   allow transcoding-free interoperation with all mobile 3GPP wideband
   terminals.  Besides, WebRTC clients running on mobile terminals
   (smartphones) may reuse the AMR-WB codec already implemented on
   these devices.

4.1.3.  Guidelines for AMR-WB usage and implementation with WebRTC

   The payload format to be used for AMR-WB is described in [RFC4867],
   with the bandwidth efficient format and one speech frame
   encapsulated in each RTP packet.  Further guidelines for
   implementing and using AMR-WB and ensuring interoperability with
   3GPP mobile services can be found in [TS26.114].  In order to ensure
   interoperability with 4G/VoLTE as specified by GSMA, the more
   specific IMS profile for voice in [IR.92], which is derived from
   [TS26.114], should be considered.
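
   As an illustration of the bandwidth efficient format with one
   speech frame per RTP packet, the following Python sketch estimates
   the RTP payload size for each AMR-WB codec mode.  The field sizes
   (a 4-bit codec mode request and a 6-bit table-of-contents entry)
   and the per-mode speech bit counts reflect a reading of [RFC4867]
   and should be checked against that document.  The constant and
   function names are illustrative only and are not defined by any
   specification.

```python
# Sketch only: payload size of one AMR-WB speech frame in the
# bandwidth efficient RTP payload format. Values are assumptions
# taken from a reading of RFC 4867; verify before relying on them.

# Speech bits per 20 ms frame, keyed by codec mode in kbit/s.
AMR_WB_SPEECH_BITS = {
    6.60: 132, 8.85: 177, 12.65: 253, 14.25: 285, 15.85: 317,
    18.25: 365, 19.85: 397, 23.05: 461, 23.85: 477,
}

CMR_BITS = 4  # codec mode request in the payload header
TOC_BITS = 6  # one table-of-contents entry: F (1) + FT (4) + Q (1)


def bandwidth_efficient_payload_octets(mode_kbps: float) -> int:
    """Return the octets needed for one frame of the given mode."""
    total_bits = CMR_BITS + TOC_BITS + AMR_WB_SPEECH_BITS[mode_kbps]
    # The payload is padded with zero bits up to an octet boundary.
    return (total_bits + 7) // 8


if __name__ == "__main__":
    for mode in sorted(AMR_WB_SPEECH_BITS):
        octets = bandwidth_efficient_payload_octets(mode)
        print(f"{mode:5.2f} kbit/s -> {octets} octets per RTP payload")
```

   Under these assumptions, the 12.65 kbit/s mode needs 263 bits,
   which rounds up to 33 octets per packet, before RTP, UDP and IP
   headers are added.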
   In order to maximize the possibility of successful call
   establishment for a WebRTC client offering AMR-WB, it is important
   that the WebRTC client:

   o  Offer AMR in addition to AMR-WB, with AMR-WB, being a wideband
      codec, listed first as the preferred payload type ahead of
      narrow band codecs (AMR, G.711, ...), and with the bandwidth
      efficient payload format preferred.

   o  Be capable of operating AMR-WB with any subset of the nine codec
      modes and source controlled rate operation.  Offer at least one
      AMR-WB configuration with parameter settings as defined in
      Table 6.1 of [TS26.114].  In order to maximize interoperability
      and quality, this offer does not restrict the codec modes
      offered.  Restrictions in the use of codec modes may be included
      in the answer.

4.2.  AMR

4.2.1.  AMR General description

   Adaptive Multi-Rate (AMR) is a 3GPP defined speech codec that is
   mandatory to implement in any 3GPP terminal that supports voice
   communication, i.e., several hundred million terminals.  This
   includes both mobile phone calls using GSM and 3G cellular systems
   and multimedia telephony services over IP/IMS and 4G/VoLTE, such as
   the GSMA voice IMS profile for VoLTE in [IR.92].  In addition to
   the impacts listed above, support of AMR can avoid degrading the
   high efficiency over mobile radio access.  References for AMR
   related specifications, including the detailed codec description
   and source code, are in [TS26.071], [TS26.073], [TS26.090],
   [TS26.104].

4.2.2.  WebRTC relevant use case for AMR

   A user of a WebRTC endpoint on a device integrating an AMR module
   wants to communicate with another user that can only be reached on a
   mobile device that only supports AMR.
   Although more and more terminal devices are now "HD voice" capable
   and support AMR-WB, a high number of legacy terminals supporting
   only AMR (terminals with no wideband / HD Voice capabilities) are
   still in use.  The use of AMR by WebRTC clients would consequently
   allow transcoding-free interoperation with all mobile 3GPP
   terminals.  Besides, WebRTC clients running on mobile terminals
   (smartphones) may reuse the AMR codec already implemented on these
   devices.

4.2.3.  Guidelines for AMR usage and implementation with WebRTC

   The payload format to be used for AMR is described in [RFC4867],
   with the bandwidth efficient format and one speech frame
   encapsulated in each RTP packet.  Further guidelines for
   implementing and using AMR in order to ensure interoperability with
   3GPP mobile services can be found in [TS26.114].  In order to ensure
   interoperability with 4G/VoLTE as specified by GSMA, the more
   specific IMS profile for voice in [IR.92], which is derived from
   [TS26.114], should be considered.  In order to maximize the
   possibility of successful call establishment for a WebRTC client
   offering AMR, it is important that the WebRTC client:

   o  Be capable of operating AMR with any subset of the eight codec
      modes and source controlled rate operation.

   o  Offer at least one configuration with parameter settings as
      defined in Table 6.1 and Table 6.2 of [TS26.114].  In order to
      maximize interoperability and quality, this offer shall not
      restrict the AMR codec modes offered.  Restrictions in the use of
      codec modes may be included in the answer.

4.3.  G.722

4.3.1.  G.722 General description

   G.722 [G.722] is an ITU-T defined wideband speech codec.  G.722 was
   approved by ITU-T in 1988.  It is a royalty free codec that is
   common in a wide range of terminals and endpoints supporting
   wideband speech and requiring low complexity.
   The complexity of G.722 is estimated at 10 MIPS [EN300175-8], which
   is 2.5 to 3 times lower than AMR-WB.  In particular, G.722 has been
   chosen by ETSI DECT as the mandatory wideband codec for New
   Generation DECT in order to greatly increase the voice quality by
   extending the bandwidth from narrow band to wideband.  G.722 is the
   wideband codec required for CAT-iq DECT certified terminals, and
   V2.0 of the CAT-iq specifications has been approved by GSMA as the
   minimum requirements for HD voice logo usage on "fixed" devices,
   i.e., broadband connections using the G.722 codec.

4.3.2.  WebRTC relevant use case for G.722

   G.722 is the wideband codec required for DECT CAT-iq terminals.  The
   market for DECT cordless phones, including DECT chipsets, is more
   than 150 million per year, and CAT-iq is a registered trademark in
   47 countries worldwide.  G.722 has also been specified by ETSI in
   [TS181005] as the mandatory wideband codec for IMS multimedia
   telephony communication service and supplementary services using
   fixed broadband access.  The support of G.722 would consequently
   allow transcoding-free IP interoperation between WebRTC clients and
   fixed VoIP terminals, including DECT / CAT-iq terminals supporting
   G.722.  Besides, WebRTC clients running on fixed terminals
   implementing G.722 may reuse the G.722 codec already implemented on
   these devices.

4.3.3.  Guidelines for G.722 usage and implementation

   The payload format to be used for G.722 is defined in [RFC3551],
   with each octet of the stream of octets produced by the codec to be
   octet-aligned in an RTP packet.  The sampling frequency for G.722 is
   16 kHz, but the RTP clock rate is set to 8000 Hz in SDP to stay
   backward compatible with an erroneous definition in the original
   version of the RTP A/V profile.
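
   The mismatch between the sampling frequency and the RTP clock rate
   is a frequent source of implementation errors.  The following
   Python sketch works through the arithmetic for one packet.  It
   assumes the 64 kbit/s G.722 mode and the "G722/8000" encoding name
   and clock rate registered in [RFC3551]; the function and constant
   names are illustrative only.

```python
# Sketch only: per-packet numbers for G.722 over RTP, assuming the
# 64 kbit/s mode and the 8000 Hz RTP clock rate kept by RFC 3551.

G722_SAMPLING_HZ = 16000   # actual audio sampling frequency
G722_RTP_CLOCK_HZ = 8000   # clock rate signalled in SDP (G722/8000)
G722_BITRATE_BPS = 64000   # codec bit rate assumed in this sketch


def g722_packet_numbers(ptime_ms: int) -> dict:
    """Return sample, timestamp and payload figures for one packet."""
    return {
        # PCM samples consumed by the encoder for this packet.
        "pcm_samples": G722_SAMPLING_HZ * ptime_ms // 1000,
        # RTP timestamp advance, counted at 8000 Hz, not 16000 Hz.
        "rtp_timestamp_increment": G722_RTP_CLOCK_HZ * ptime_ms // 1000,
        # Encoded octets carried in the RTP payload.
        "payload_octets": G722_BITRATE_BPS * ptime_ms // 8000,
    }


if __name__ == "__main__":
    print(g722_packet_numbers(20))
```

   For a 20 ms packet, the encoder consumes 320 samples, while the RTP
   timestamp advances by only 160 and the payload carries 160 octets.
   An implementation that advances the timestamp by the number of
   16 kHz samples would drift against peers that follow [RFC3551].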
   Further guidelines for implementing and using G.722 in order to
   ensure interoperability with Multimedia Telephony services over IMS
   can be found in Section 7 of [TS26.114].  Additional information on
   G.722 implementation in DECT can be found in [EN300175-8], and the
   full codec description and C source code in [G.722].

4.4.  Other codecs

   Other interoperability use cases may justify the use of other
   codecs.

5.  Security Considerations

   Security considerations for WebRTC Audio Codec and Processing
   Requirements can be found in [I-D.ietf-rtcweb-audio].  Implementors
   making use of the additional codecs considered in this document are
   advised to also refer more specifically to the "Security
   Considerations" sections of [RFC4867] (for AMR and AMR-WB) and
   [RFC3551].

6.  IANA Considerations

   None.

7.  Acknowledgements

   Special thanks to Espen Berger, Bernhard Feiten, Bo Burman, Kalyani
   Bogineni, Miao Lei, and Enrico Marocco, who co-authored the initial
   document.  Thanks, as well, to Magnus Westerlund and Barry Dingle,
   who carefully reviewed the document and helped to improve it.

8.  References

8.1.  Normative references

   [G.722]    ITU, "Recommendation ITU-T G.722 (2012): 7 kHz
              audio-coding within 64 kbit/s", 2012-09.

   [I-D.ietf-rtcweb-audio]
              Valin, J. and C. Bran, "WebRTC Audio Codec and Processing
              Requirements", draft-ietf-rtcweb-audio-09 (work in
              progress), November 2015.

   [IR.92]    GSMA, "IMS Profile for Voice and SMS V9.0", April 2015.

   [RFC3551]  Schulzrinne, H. and S. Casner, "RTP Profile for Audio and
              Video Conferences with Minimal Control", STD 65, RFC 3551,
              DOI 10.17487/RFC3551, July 2003.

   [RFC4867]  Sjoberg, J., Westerlund, M., Lakaniemi, A., and Q. Xie,
              "RTP Payload Format and File Storage Format for the
              Adaptive Multi-Rate (AMR) and Adaptive Multi-Rate Wideband
              (AMR-WB) Audio Codecs", RFC 4867, DOI 10.17487/RFC4867,
              April 2007.

   [TS26.071]
              3GPP, "3GPP TS 26.071 v12.0.0: Mandatory Speech Codec
              speech processing functions; AMR Speech CODEC; General
              description", 2014-09.

   [TS26.073]
              3GPP, "3GPP TS 26.073 v12.0.0: ANSI C code for the
              Adaptive Multi Rate (AMR) speech codec", 2014-09.

   [TS26.090]
              3GPP, "3GPP TS 26.090 v12.0.0: Mandatory Speech Codec
              speech processing functions; Adaptive Multi-Rate (AMR)
              speech codec; Transcoding functions", 2014-09.

   [TS26.104]
              3GPP, "3GPP TS 26.104 v12.0.0: ANSI C code for the
              floating-point Adaptive Multi Rate (AMR) speech codec",
              2014-09.

   [TS26.114]
              3GPP, "IP Multimedia Subsystem (IMS); Multimedia
              telephony; Media handling and interaction V13.0.0", June
              2015.

   [TS26.171]
              3GPP, "3GPP TS 26.171 v12.0.0: Speech codec speech
              processing functions; Adaptive Multi-Rate - Wideband
              (AMR-WB) speech codec; General description", 2014-09.

   [TS26.173]
              3GPP, "3GPP TS 26.173 v12.1.0: ANSI-C code for the
              Adaptive Multi-Rate - Wideband (AMR-WB) speech codec",
              2015-03.

   [TS26.190]
              3GPP, "3GPP TS 26.190 v12.0.0: Speech codec speech
              processing functions; Adaptive Multi-Rate - Wideband
              (AMR-WB) speech codec; Transcoding functions", 2014-09.

   [TS26.204]
              3GPP, "3GPP TS 26.204 v12.1.0: Speech codec speech
              processing functions; Adaptive Multi-Rate - Wideband
              (AMR-WB) speech codec; ANSI-C code", 2015-03.

8.2.  Informative references

   [EN300175-1]
              ETSI, "ETSI EN 300 175-1, v2.5.1: Digital Enhanced
              Cordless Telecommunications (DECT); Common Interface (CI);
              Part 1: Overview", 2009.

   [EN300175-8]
              ETSI, "ETSI EN 300 175-8, v2.5.1: Digital Enhanced
              Cordless Telecommunications (DECT); Common Interface (CI);
              Part 8: Speech and audio coding and transmission", 2009.

   [G.711]    ITU, "Recommendation ITU-T G.711: Pulse code modulation
              (PCM) of voice frequencies", 1988-11.

   [I-D.ietf-rtcweb-overview]
              Alvestrand, H., "Overview: Real Time Protocols for
              Browser-based Applications", draft-ietf-rtcweb-overview-14
              (work in progress), June 2015.

   [IR.36]    GSMA, "Adaptive Multirate Wide Band V3.0", September 2014.

   [RFC6716]  Valin, JM., Vos, K., and T. Terriberry, "Definition of the
              Opus Audio Codec", RFC 6716, DOI 10.17487/RFC6716,
              September 2012.

   [RFC7478]  Holmberg, C., Hakansson, S., and G. Eriksson, "Web
              Real-Time Communication Use Cases and Requirements",
              RFC 7478, DOI 10.17487/RFC7478, March 2015.

   [TS181005]
              ETSI, "Telecommunications and Internet converged Services
              and Protocols for Advanced Networking (TISPAN); Service
              and Capability Requirements V3.3.1 (2009-12)", 2009.

   [TS23.002]
              3GPP, "3GPP TS 23.002 v13.3.0: Network architecture",
              2015-09.

Author's Address

   Stephane Proust
   Orange
   2, avenue Pierre Marzin
   Lannion 22307
   France

   Email: stephane.proust@orange.com