codec                                                     C. Hoene, Ed.
Internet-Draft                                            Symonics GmbH
Intended status: Informational                                JM. Valin
Expires: November 18, 2013                          Mozilla Corporation
                                                                 K. Vos
                                                Skype Technologies S.A.
                                                            J. Skoglund
                                                                 Google
                                                            May 17, 2013

                 Summary of Opus listening test results
                      draft-ietf-codec-results-03

Abstract

   This document describes and examines listening test results obtained
   for the Opus codec and how they relate to the requirements.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current Internet-
   Drafts is at http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six
   months and may be updated, replaced, or obsoleted by other documents
   at any time.  It is inappropriate to use Internet-Drafts as
   reference material or to cite them other than as "work in progress."

   This Internet-Draft will expire on November 18, 2013.

Copyright Notice

   Copyright (c) 2013 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with
   respect to this document.  Code Components extracted from this
   document must include Simplified BSD License text as described in
   Section 4.e of the Trust Legal Provisions and are provided without
   warranty as described in the Simplified BSD License.

Table of Contents

   1.  Introduction
   2.  Opus listening tests on final bit-stream
     2.1.  Google listening tests
       2.1.1.  Google narrowband listening test
       2.1.2.  Google wideband and fullband listening test
       2.1.3.  Google stereo music listening test
       2.1.4.  Google transcoding test
       2.1.5.  Google Mandarin tests
     2.2.  HydrogenAudio stereo music listening test
     2.3.  Nokia Interspeech 2011 listening test
     2.4.  Universitaet Tuebingen stereo and binaural tests
   3.  Conclusion on the requirements
     3.1.  Comparison to Speex (narrowband)
     3.2.  Comparison to iLBC
     3.3.  Comparison to Speex (wideband)
     3.4.  Comparison to G.722.1
     3.5.  Comparison to G.722.1C
     3.6.  Comparison to AMR-NB
     3.7.  Comparison to AMR-WB
   4.  Security Considerations
   5.  IANA Considerations
   6.  Acknowledgments
   7.  Informative References
   Appendix A.  Pre-Opus listening tests
     A.1.  SILK Dynastat listening test
     A.2.  SILK Deutsche Telekom test
     A.3.  SILK Nokia test
     A.4.  CELT 0.3.2 listening test
     A.5.  CELT 0.5.0 listening test
   Appendix B.  Opus listening tests on non-final bit-stream
     B.1.  First hybrid mode test
     B.2.  Broadcom stereo music test
   Appendix C.  In-the-field testing
   Authors' Addresses
1.  Introduction

   This document describes and examines listening test results obtained
   for the Opus codec.  Some of the test results presented are based on
   older versions of the codec or on older versions of the SILK or CELT
   components.  While they do not necessarily represent the exact
   quality of the current version, they are nonetheless useful for
   validating the technology used and as an indication of a lower bound
   on quality (based on the assumption that the codec has been improved
   since they were performed).

   Throughout this document, all statements about one codec being
   better than or worse than another codec are based on 95% confidence.
   When no statistically significant difference can be shown with 95%
   confidence, the two codecs are said to be "tied".
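   As an illustration of this decision rule, the following Python
   sketch applies a paired two-sided t-test at the 95% level to per-
   listener scores.  The scores here are hypothetical, and the actual
   analyses (such as the t-tests mentioned in Section 2.1.1) may have
   differed in detail:

      import numpy as np
      from scipy import stats

      def verdict(scores_a, scores_b, alpha=0.05):
          # Per-listener mean scores for codecs A and B (paired
          # samples).  Returns the verdict for A relative to B.
          t, p = stats.ttest_rel(scores_a, scores_b)
          if p >= alpha:
              return "tied"    # no significant difference at 95%
          return "better" if t > 0 else "worse"

      # Hypothetical per-listener means for two codecs (17 listeners):
      rng = np.random.default_rng(0)
      print(verdict(rng.normal(70, 8, 17), rng.normal(62, 8, 17)))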
   In addition to the results summarized in this draft, Opus has been
   subjected to many informal subjective listening tests, as well as
   objective testing.

2.  Opus listening tests on final bit-stream

   The following tests were performed on the Opus codec _after_ the
   bit-stream was finalized.

2.1.  Google listening tests

   The tests followed the MUSHRA test methodology.  Two anchors were
   used, one lowpass-filtered at 3.5 kHz and one lowpass-filtered at
   7.0 kHz.  Both trained and untrained listeners participated in the
   tests.  The reference signals were manually normalized to the same
   subjective levels according to the experimenters' opinion.
   Experiments with automatic normalization with respect to both level
   and loudness (in Adobe Audition) did not result in signals having
   equal subjective loudness.  The sample magnitude levels were kept
   below 2^14 to provide headroom for possible amplification through
   the codecs.  However, the normalization exercise was not repeated
   with the processed sequences, as neither the experimenters nor any
   of the subjects (which included expert listeners) noticed any
   significant level differences between the conditions in the tests.
   The only post-processing performed was to remove noticeable delays
   in the MP3 files, as the MP3 samples could otherwise be identified
   when switching between conditions because of their longer delay.
   The testing tool Step from ARL was used for the tests, and all
   listeners were instructed to listen carefully through the conditions
   before starting the grading.  The results of the tests are available
   in the testing slides presented at the Prague meeting [Prague-80].

2.1.1.  Google narrowband listening test

   The test sequences in Test 1 were mono recordings (between 2 and 6
   seconds long) of 4 different male and 4 different female speakers
   sampled at 48 kHz in low background noise.  17 listeners were
   presented with 6 stimuli according to Table 1 for each test
   sequence.  The corresponding bit rate for the reference is 48000
   (sampling frequency in Hz) x 16 (bits/sample) = 768 kbps.  Since the
   anchors are low-pass filtered, they can also be downsampled for
   transmission, which corresponds to lower bit rates.  Three
   narrowband codecs were compared in this test: Opus NB, the royalty-
   free iLBC, and the royalty-free Speex.  All three codecs have an
   encoder frame length of 20 ms.  Both Opus and Speex used variable
   bit rate, whereas iLBC operated at a fixed bit rate.

   +-----------+----------------------+----------------+
   | Type      | Signal bandwidth     | Bitrate        |
   +-----------+----------------------+----------------+
   | Reference | 24 kHz (Fullband)    |                |
   |           |                      |                |
   | Anchor 1  | 3.5 kHz (Narrowband) |                |
   |           |                      |                |
   | Anchor 2  | 7 kHz (Wideband)     |                |
   |           |                      |                |
   | iLBC      | 4 kHz (Narrowband)   | 15.2 kbps, CBR |
   |           |                      |                |
   | Opus NB   | 4 kHz (Narrowband)   | 11 kbps, VBR   |
   |           |                      |                |
   | Speex NB  | 3.5 kHz (Narrowband) | 11 kbps, VBR   |
   +-----------+----------------------+----------------+

            Table 1: Narrowband mono voice: test conditions

   The overall results of the narrowband test, i.e., averaged over all
   listeners for all sequences, are presented in the Prague meeting
   slides [Prague-80].  The results suggest that Opus at 11 kbps is
   superior to both iLBC at 15 kbps and Speex at 11 kbps.  T-tests
   performed by Greg Maxwell confirm that there is indeed a
   statistically significant difference.  Note also that Opus has a
   slightly higher average score than the 3.5 kHz anchor, likely due to
   the higher bandwidth of Opus.
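   The reference and anchor bit rates above follow from plain PCM
   arithmetic; a small worked example (the downsampled anchor rates are
   implied rather than stated in the draft):

      def pcm_kbps(sample_rate_hz, bits=16, channels=1):
          # Bit rate of uncompressed PCM audio in kbit/s.
          return sample_rate_hz * bits * channels / 1000.0

      print(pcm_kbps(48000))  # 768.0: the fullband mono reference
      print(pcm_kbps(8000))   # 128.0: a 3.5 kHz anchor at 8 kHz
      print(pcm_kbps(16000))  # 256.0: a 7 kHz anchor at 16 kHz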
2.1.2.  Google wideband and fullband listening test

   The eight test sequences from the previous test were also used in
   this test.  16 listeners rated the stimuli listed in Table 2.  This
   test compared four wideband codecs: Opus WB, the royalty-free Speex,
   the royalty-free ITU-T G.722.1, and AMR-WB (ITU-T G.722.2), as well
   as two fullband codecs: Opus FB and the royalty-free ITU-T G.719.
   All six codecs use 20 ms encoding frames.  Opus used variable bit
   rate, while the other codecs used constant bit rate.

   +-----------+----------------------+-----------------+
   | Type      | Signal bandwidth     | Bitrate         |
   +-----------+----------------------+-----------------+
   | Reference | 24 kHz (Fullband)    |                 |
   |           |                      |                 |
   | Anchor 1  | 3.5 kHz (Narrowband) |                 |
   |           |                      |                 |
   | Anchor 2  | 7 kHz (Wideband)     |                 |
   |           |                      |                 |
   | G.722.1   | 7 kHz (Wideband)     | 24 kbps, CBR    |
   |           |                      |                 |
   | Speex WB  | 7 kHz (Wideband)     | 23.8 kbps, CBR  |
   |           |                      |                 |
   | AMR-WB    | 7 kHz (Wideband)     | 19.85 kbps, CBR |
   |           |                      |                 |
   | Opus WB   | 8 kHz (Wideband)     | 19.85 kbps, VBR |
   |           |                      |                 |
   | G.719     | ~20 kHz (Fullband)   | 32 kbps, CBR    |
   |           |                      |                 |
   | Opus FB   | ~20 kHz (Fullband)   | 32 kbps, CBR    |
   +-----------+----------------------+-----------------+

       Table 2: Wideband and fullband mono voice: test conditions

   The results from this test are depicted in the Prague meeting
   slides [Prague-80].  Opus at 32 kbps is almost transparent, although
   there is a small, but statistically significant, difference from the
   fullband reference material.  Opus at 20 kbps is significantly
   better than all the other codecs, including AMR-WB and the fullband
   G.719, and both low-pass anchors.

2.1.3.  Google stereo music listening test

   The sequences in this test were excerpts from 10 different stereo
   music files:

   o  Rock/RnB (Boz Scaggs)

   o  Soft Rock (Steely Dan)

   o  Rock (Queen)

   o  Jazz (Harry James)

   o  Classical (Purcell)

   o  Electronica (Matmos)

   o  Piano (Moonlight Sonata)

   o  Vocals (Suzanne Vega)

   o  Glockenspiel

   o  Castanets

   These sequences were originally recorded at a sampling frequency of
   44.1 kHz and were upsampled to 48 kHz prior to processing.  Test 3
   compared six codec conditions (cf. Table 3): Opus at three rates,
   G.719, AAC-LC (Nero 1.5.1), and MP3 (Lame 3.98.4).  G.719 is a mono
   codec, so the two channels were each coded independently at 32 kbps.
   9 listeners participated in Test 3, and the results are depicted in
   the Prague meeting slides [Prague-80].  The codecs operated at
   constant (or comparable) bit rates.

   +-------------+--------------------+-----------+--------------------+
   | Type        | Signal bandwidth   | Frame     | Bitrate            |
   |             |                    | size (ms) |                    |
   +-------------+--------------------+-----------+--------------------+
   | Reference   | 22 kHz (Fullband)  | -         | (1536 kbps)        |
   |             |                    |           |                    |
   | Anchor 1    | 3.5 kHz            | -         | (256 kbps)         |
   |             | (Narrowband)       |           |                    |
   |             |                    |           |                    |
   | Anchor 2    | 7 kHz (Wideband)   | -         | (512 kbps)         |
   |             |                    |           |                    |
   | MP3         | 16 kHz (Super      | >100      | 96 kbps, CBR       |
   |             | wideband)          |           |                    |
   |             |                    |           |                    |
   | AAC-LC      | ~20 kHz (Fullband) | 21        | 64 kbps, CBR (bit  |
   |             |                    |           | reservoir)         |
   |             |                    |           |                    |
   | G.719       | ~20 kHz (Fullband) | 20        | 64 kbps (2x32),    |
   |             |                    |           | CBR                |
   |             |                    |           |                    |
   | Opus FB     | ~20 kHz (Fullband) | 20        | 64 kbps,           |
   |             |                    |           | constrained VBR    |
   |             |                    |           |                    |
   | Opus FB     | ~20 kHz (Fullband) | 10        | 80 kbps,           |
   |             |                    |           | constrained VBR    |
   |             |                    |           |                    |
   | Opus FB     | ~20 kHz (Fullband) | 5         | 128 kbps,          |
   |             |                    |           | constrained VBR    |
   +-------------+--------------------+-----------+--------------------+

                Table 3: Stereo music: test conditions

   The results indicate that all codecs had comparable performance,
   except for G.719, which had a considerably lower score.  T-tests by
   Greg Maxwell verified that the low-delay Opus at 128 kbps performed
   significantly better, and that G.719 performed significantly worse,
   than the other four conditions.
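   The three Opus conditions of Table 3 can be approximated today with
   opusenc from opus-tools.  The test itself used the test tools
   accompanying the Opus draft, so the sketch below (and its flag
   choices) is an illustration rather than the setup that was graded:

      import subprocess

      # Constrained VBR via --cvbr; frame sizes in milliseconds.
      for kbps, frame_ms in [(64, "20"), (80, "10"), (128, "5")]:
          subprocess.run(
              ["opusenc", "--bitrate", str(kbps), "--cvbr",
               "--framesize", frame_ms,
               "music48k_stereo.wav",
               "opus_%dk_%sms.opus" % (kbps, frame_ms)],
              check=True)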
2.1.4.  Google transcoding test

   When two telephone networks of different technologies are coupled,
   speech frequently has to be transcoded: it must be decoded and re-
   encoded before it can be forwarded to the next network.  The two
   codecs then operate in series, which is called tandem coding.

   In the following tests, Jan Skoglund studied the impact of
   transcoding when an Opus call is forwarded to a cellular phone
   system [Skoglund2011].  Two tests were conducted, one with
   narrowband and one with wideband speech items.  The conditions of
   the narrowband test are given in Table 4 and the respective results
   in Table 5.  For the wideband conditions and results, refer to
   Table 6 and Table 7.

   +-----------------+-------------------------------------------------+
   | Condition       | Value                                           |
   +-----------------+-------------------------------------------------+
   | Laboratory      | Google                                          |
   |                 |                                                 |
   | Examiner        | Jan Skoglund                                    |
   |                 |                                                 |
   | Date            | August and September 2011                       |
   |                 |                                                 |
   | Methodology     | ITU-R BS.1534-1 (MUSHRA)                        |
   |                 |                                                 |
   | Reference items | Two male and two female speakers from ITU-T     |
   |                 | P.501.  Two male and two female speakers from   |
   |                 | the McGill database.  All recorded at 48 kHz in |
   |                 | a room with low background noise.               |
   |                 |                                                 |
   | Listeners       | 19 listeners / no listeners rejected / trained  |
   |                 | and untrained English-speaking listeners        |
   |                 |                                                 |
   | Anchor 1        | Reference file lowpass-filtered at 3.5 kHz      |
   |                 |                                                 |
   | Anchor 2        | Reference file resampled at 8 kHz, with MNRU at |
   |                 | 15 dB SNR                                       |
   |                 |                                                 |
   | Test Condition  | G.711 at 64 kbps -> Opus NB at 12.2 kbps,       |
   | 1               | variable bit rate                               |
   |                 |                                                 |
   | Test Condition  | G.711 at 64 kbps -> AMR NB at 12.2 kbps,        |
   | 2               | constant bit rate                               |
   |                 |                                                 |
   | Test Condition  | AMR NB at 12.2 kbps -> G.711 at 64 kbps -> Opus |
   | 3               | NB at 12.2 kbps                                 |
   |                 |                                                 |
   | Test Condition  | Opus NB at 12.2 kbps -> G.711 at 64 kbps -> AMR |
   | 4               | NB at 12.2 kbps                                 |
   |                 |                                                 |
   | Test Condition  | AMR NB at 12.2 kbps -> G.711 at 64 kbps -> AMR  |
   | 5               | NB at 12.2 kbps                                 |
   +-----------------+-------------------------------------------------+

           Table 4: Narrowband tandem coding: test conditions

   +------------------+-------------------------+--------+
   | Test Item        | Subjective MUSHRA score | 95% CI |
   +------------------+-------------------------+--------+
   | Reference        | 99.47                   | 0.36   |
   |                  |                         |        |
   | LP3.5            | 63.49                   | 3.01   |
   |                  |                         |        |
   | G.711->Opus      | 54.51                   | 2.85   |
   |                  |                         |        |
   | G.711->AMR       | 54.13                   | 2.67   |
   |                  |                         |        |
   | AMR->G.711->Opus | 51.11                   | 2.74   |
   |                  |                         |        |
   | Opus->G.711->AMR | 50.95                   | 2.76   |
   |                  |                         |        |
   | AMR->G.711->AMR  | 47.81                   | 2.95   |
   |                  |                         |        |
   | MNRU             | 14.94                   | 2.21   |
   +------------------+-------------------------+--------+

            Table 5: Narrowband tandem coding: test results
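   Test condition 1 of Table 4 (G.711 followed by Opus) can be sketched
   with common tools.  The study used its own processing chain, so the
   sox/opus-tools pipeline below, including the assumption that Opus
   picks a narrowband coding mode at this rate, is an illustration
   only:

      import subprocess

      def run(cmd):
          subprocess.run(cmd, check=True)

      # Stage 1: G.711 (8 kHz mu-law, 64 kbps) encode/decode via sox.
      run(["sox", "speech48k.wav", "-r", "8000", "-e", "u-law",
           "g711.wav"])
      run(["sox", "g711.wav", "-e", "signed-integer", "-b", "16",
           "stage1.wav"])
      # Stage 2: Opus at ~12 kbps; at such rates the encoder typically
      # selects a narrowband mode on its own.
      run(["opusenc", "--bitrate", "12", "stage1.wav", "stage2.opus"])
      run(["opusdec", "stage2.opus", "tandem_out.wav"])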
   +-----------------+-------------------------------------------------+
   | Condition       | Value                                           |
   +-----------------+-------------------------------------------------+
   | Laboratory      | Google                                          |
   |                 |                                                 |
   | Examiner        | Jan Skoglund                                    |
   |                 |                                                 |
   | Date            | August and September 2011                       |
   |                 |                                                 |
   | Methodology     | MUSHRA                                          |
   |                 |                                                 |
   | Reference items | Two male and two female speakers from ITU-T     |
   |                 | P.501.  Two male and two female speakers        |
   |                 | recorded at Google at 48 kHz in a room with low |
   |                 | background noise.                               |
   |                 |                                                 |
   | Listeners       | 18 listeners after post-screening / no          |
   |                 | listeners rejected / untrained and trained      |
   |                 | English-speaking listeners                      |
   |                 |                                                 |
   | Anchor 1        | Reference file lowpass-filtered at 3.5 kHz (LP  |
   |                 | 3.5)                                            |
   |                 |                                                 |
   | Anchor 2        | Reference file lowpass-filtered at 7 kHz (LP 7) |
   |                 |                                                 |
   | Test Condition  | Opus WB at 19.85 kbps, variable bit rate (Opus) |
   | 1               |                                                 |
   |                 |                                                 |
   | Test Condition  | AMR WB at 19.85 kbps, constant bit rate (AMR    |
   | 2               | WB)                                             |
   |                 |                                                 |
   | Test Condition  | AMR WB at 19.85 kbps -> Opus WB at 19.85 kbps   |
   | 3               |                                                 |
   |                 |                                                 |
   | Test Condition  | Opus WB at 19.85 kbps -> AMR WB at 19.85 kbps   |
   | 4               |                                                 |
   +-----------------+-------------------------------------------------+

            Table 6: Wideband tandem coding: test conditions

   +--------------+-------------------------+--------+
   | Test Item    | Subjective MUSHRA score | 95% CI |
   +--------------+-------------------------+--------+
   | Reference    | 99.44                   | 0.38   |
   |              |                         |        |
   | Opus         | 78.38                   | 2.16   |
   |              |                         |        |
   | LP7          | 74.24                   | 2.24   |
   |              |                         |        |
   | AMR WB       | 65.26                   | 2.85   |
   |              |                         |        |
   | AMR WB->Opus | 63.97                   | 2.95   |
   |              |                         |        |
   | Opus->AMR WB | 62.83                   | 2.94   |
   |              |                         |        |
   | LP3.5        | 37.01                   | 2.95   |
   +--------------+-------------------------+--------+

             Table 7: Wideband tandem coding: test results

   Within the given statistical confidence, the narrowband tandem
   coding conditions using AMR and/or Opus are of similar quality.
   However, the results indicate that Opus slightly outperforms AMR NB.
   In any case, narrowband transcoding is worse than lowpass filtering
   at 3.5 kHz.

   Opus at 20 kbps outperforms AMR WB at a similar coding rate and
   matches the quality of a 7 kHz lowpass-filtered signal.  Tandem
   coding with Opus does not reduce the quality of AMR WB encoded
   speech in the studied conditions.
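   The scores in Tables 5 and 7 are condition means with 95% confidence
   intervals.  One common way to obtain such an interval from per-
   listener ratings is the t-based interval sketched below (the draft
   does not document the exact procedure that was used):

      import numpy as np
      from scipy import stats

      def mean_ci95(ratings):
          # Mean and half-width of a 95% t-based confidence interval.
          x = np.asarray(ratings, dtype=float)
          half = (stats.t.ppf(0.975, x.size - 1)
                  * x.std(ddof=1) / np.sqrt(x.size))
          return x.mean(), half

      # Made-up MUSHRA ratings for one condition:
      m, ci = mean_ci95([54, 61, 48, 57, 50, 59, 52, 55])
      print("%.2f +/- %.2f" % (m, ci))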
2.1.5.  Google Mandarin tests

   Modern Standard Chinese, also called Mandarin, is a tonal language
   spoken by about 845 million people.  In the past, codecs have been
   developed without consideration of the unique properties of tonal
   languages.  For the testing of Opus, Jan Skoglund conducted
   subjective listening-only tests to verify whether Opus copes well
   with Mandarin [Skoglund2011].  Two tests were conducted, one with
   narrowband and one with wideband speech items.  The conditions of
   the narrowband test are given in Table 8 and the respective results
   in Table 9.  For the wideband conditions and results, refer to
   Table 10 and Table 11.

   +-----------------+-------------------------------------------------+
   | Condition       | Value                                           |
   +-----------------+-------------------------------------------------+
   | Laboratory      | Google                                          |
   |                 |                                                 |
   | Examiner        | Jan Skoglund                                    |
   |                 |                                                 |
   | Date            | August and September 2011                       |
   |                 |                                                 |
   | Methodology     | ITU-R BS.1534-1 (MUSHRA)                        |
   |                 |                                                 |
   | Reference items | Two male and two female speakers from ITU-T     |
   |                 | P.501.  Two male and two female speakers        |
   |                 | recorded at Google at 48 kHz in a room with low |
   |                 | background noise.                               |
   |                 |                                                 |
   | Listeners       | 21 listeners after post-screening / no          |
   |                 | listeners rejected / untrained Mandarin-        |
   |                 | speaking listeners                              |
   |                 |                                                 |
   | Anchor 1        | Reference file lowpass-filtered at 3.5 kHz (LP  |
   |                 | 3.5)                                            |
   |                 |                                                 |
   | Anchor 2        | Reference file resampled at 8 kHz, with MNRU at |
   |                 | 15 dB SNR (MNRU)                                |
   |                 |                                                 |
   | Test Condition  | Opus NB at 11 kbps, variable bit rate (Opus 11) |
   | 1               |                                                 |
   |                 |                                                 |
   | Test Condition  | Speex NB at 11 kbps, variable bit rate (Speex   |
   | 2               | 11)                                             |
   |                 |                                                 |
   | Test Condition  | iLBC at 15.2 kbps, constant bit rate (iLBC 15)  |
   | 3               |                                                 |
   +-----------------+-------------------------------------------------+

            Table 8: Narrowband Mandarin: test conditions

   +-----------+----------------------------+--------+
   | Test Item | Subjective BS.1534-1 Score | 95% CI |
   +-----------+----------------------------+--------+
   | Reference | 99.79                      | 0.19   |
   |           |                            |        |
   | Opus 11   | 77.90                      | 2.15   |
   |           |                            |        |
   | iLBC 15   | 76.76                      | 2.08   |
   |           |                            |        |
   | LP 3.5    | 76.25                      | 2.34   |
   |           |                            |        |
   | Speex 11  | 63.60                      | 3.30   |
   |           |                            |        |
   | MNRU      | 22.83                      | 2.50   |
   +-----------+----------------------------+--------+

         Table 9: Mandarin narrowband speech: test results
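   The MNRU anchor used in Tables 4 and 8 adds speech-modulated noise
   at a given Q (here 15 dB).  A sketch along the lines of the ITU-T
   P.810 definition, assuming a floating-point signal in [-1, 1]:

      import numpy as np

      def mnru(x, q_db, rng=None):
          # Modulated Noise Reference Unit (cf. ITU-T P.810): adds
          # speech-correlated noise so that the ratio of speech power
          # to modulated-noise power is approximately q_db.
          rng = rng or np.random.default_rng(0)
          noise = rng.standard_normal(len(x))
          return x * (1.0 + 10.0 ** (-q_db / 20.0) * noise)

      # 15 dB condition, as used for Anchor 2 (stand-in test signal):
      t = np.arange(8000) / 8000.0
      anchor2 = mnru(np.sin(2 * np.pi * 440 * t), 15.0)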
   +-----------------+-------------------------------------------------+
   | Condition       | Value                                           |
   +-----------------+-------------------------------------------------+
   | Laboratory      | Google                                          |
   |                 |                                                 |
   | Examiner        | Jan Skoglund                                    |
   |                 |                                                 |
   | Date            | August and September 2011                       |
   |                 |                                                 |
   | Methodology     | MUSHRA                                          |
   |                 |                                                 |
   | Reference items | Two male and two female speakers from ITU-T     |
   |                 | P.501.  Two male and two female speakers        |
   |                 | recorded at Google at 48 kHz in a room with low |
   |                 | background noise.                               |
   |                 |                                                 |
   | Listeners       | 19 listeners after post-screening / rejected 3  |
   |                 | listeners whose score correlation with the      |
   |                 | total average was lower than R=0.8.             |
   |                 |                                                 |
   | Anchor 1        | Reference file lowpass-filtered at 3.5 kHz (LP  |
   |                 | 3.5)                                            |
   |                 |                                                 |
   | Anchor 2        | Reference file lowpass-filtered at 7 kHz (LP 7) |
   |                 |                                                 |
   | Test Condition  | Opus WB at 19.85 kbps, variable bit rate (Opus  |
   | 1               | 20)                                             |
   |                 |                                                 |
   | Test Condition  | Speex WB at 23.8 kbps, constant bit rate (Speex |
   | 2               | 24)                                             |
   |                 |                                                 |
   | Test Condition  | G.722.1 at 24 kbps, constant bit rate (G.722.1  |
   | 3               | 24)                                             |
   |                 |                                                 |
   | Test Condition  | Opus FB at 32 kbps, constant bit rate (Opus 32) |
   | 4               |                                                 |
   |                 |                                                 |
   | Test Condition  | G.719 at 32 kbps, constant bit rate (G.719 32)  |
   | 5               |                                                 |
   +-----------------+-------------------------------------------------+

          Table 10: Mandarin wideband speech: test conditions

   +------------+-------------------------+--------+
   | Test Item  | Subjective MUSHRA score | 95% CI |
   +------------+-------------------------+--------+
   | Reference  | 98.95                   | 0.59   |
   |            |                         |        |
   | Opus 32    | 98.13                   | 0.72   |
   |            |                         |        |
   | G.719 32   | 93.43                   | 1.51   |
   |            |                         |        |
   | Opus 20    | 81.59                   | 2.48   |
   |            |                         |        |
   | LP 7       | 79.51                   | 2.53   |
   |            |                         |        |
   | G.722.1 24 | 72.55                   | 3.06   |
   |            |                         |        |
   | LP 3.5     | 54.57                   | 3.44   |
   |            |                         |        |
   | Speex 24   | 53.63                   | 4.23   |
   +------------+-------------------------+--------+

           Table 11: Mandarin wideband speech: test results

   Within the given confidence intervals, the quality of Opus at 11
   kbps equals the quality of iLBC at 15 kbps and the quality after
   lowpass filtering at 3.5 kHz.  Speex at 11 kbps does not perform as
   well.  According to the listening-only tests, Opus at 32 kbps is
   better than G.719 at 32 kbps, and Opus at 20 kbps outperforms
   G.722.1 and Speex at 24 kbps.  Comparing the Mandarin results with
   those for English (Section 2.1.1 and Section 2.1.2) shows that they
   are quite consistent.  The only difference is that, with English
   stimuli, Opus at 20 kbps outperforms G.719 at 32 kbps.  This is
   probably because Mandarin speech does not contain as many high-
   frequency-rich consonants, such as [s], as English does.
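   The post-screening used for Table 10 (rejecting listeners whose
   scores correlate with the panel average at R < 0.8) can be sketched
   as follows.  The exact computation, e.g. whether a listener's own
   scores are excluded from the average, is not specified, so this is
   one plausible variant:

      import numpy as np

      def screen_listeners(scores, r_min=0.8):
          # scores: (listeners x conditions) matrix of mean ratings.
          # Keep listeners whose ratings correlate with the panel
          # average at r_min or better.
          scores = np.asarray(scores, dtype=float)
          avg = scores.mean(axis=0)
          keep = [i for i, row in enumerate(scores)
                  if np.corrcoef(row, avg)[0, 1] >= r_min]
          return scores[keep], keep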
2.2.  HydrogenAudio stereo music listening test

   In March 2011, the HydrogenAudio community conducted a listening
   test comparing codec performance on stereo audio at 64 kb/s
   [ha-test].  The Opus codec was compared to the Apple and Nero
   implementations of HE-AAC, as well as to the Vorbis codec.  The test
   included 30 audio samples, including known "hard to code" samples
   from previous HydrogenAudio listening tests.

   A total of 33 listeners participated in the test, 10 of whom
   provided results for all the audio samples.  The results of the test
   showed that Opus out-performed both HE-AAC implementations as well
   as Vorbis.

2.3.  Nokia Interspeech 2011 listening test

   In 2011, Anssi Ramo from Nokia submitted [Ramo2011] the results of a
   second listening test, focusing specifically on the Opus codec, to
   Interspeech 2011.  As in the previous test, the methodology used was
   a 9-scale ACR MOS test with clean and noisy speech samples.

   The results show Opus clearly out-performing both G.722.1C and G.719
   on clean speech at 24 kb/s and above, while on noisy speech all
   codecs and bit-rates above 24 kb/s are very close.  It is also found
   that the Opus hybrid mode at 28 kb/s has quality that is very close
   to the recent G.718B standard at the same rate.  At 20 kb/s, the
   Opus wideband mode also out-performs AMR-WB, while the situation is
   reversed at 12 kb/s and below.  The only narrowband rate tested is 6
   kb/s, which is below what Opus targets and unsurprisingly shows
   poorer quality than AMR-NB at 5.9 kb/s.

2.4.  Universitaet Tuebingen stereo and binaural tests

   Modern teleconferencing systems use stereo or spatially rendered
   speech to enhance the conversation quality, so that talkers can be
   identified by their acoustic locations.  Opus can encode speech in a
   stereo mode.  The tests conducted by Christian Hoene [Hoene2011]
   studied the performance of Opus when coding stereo and binaural
   speech.

   +-------------+-----------------------------------------------------+
   | Condition   | Value                                               |
   +-------------+-----------------------------------------------------+
   | Laboratory  | Universitaet Tuebingen                              |
   |             |                                                     |
   | Examiner    | Christian Hoene and Mansoor Hyder                   |
   |             |                                                     |
   | Date        | August 2011                                         |
   |             |                                                     |
   | Methodology | ITU-R BS.1534-1 (MUSHRA) using a modified "rateit   |
   |             | v0.1" software with German translations.            |
   |             |                                                     |
   | Reference   | One German female voice recorded in stereo (8 s).   |
   | items       | Two female voices (stereo recording) mixed together |
   |             | (9 s).  One moving talker binaurally rendered with  |
   |             | an HRTF and an artificial room impulse response     |
   |             | (13 s).  Two voices binaurally rendered at two      |
   |             | different stationary positions.  A cappella song    |
   |             | "Mein Fahrrad" by "Die Prinzen" (10.5 s, mono).     |
   |             |                                                     |
   | Listeners   | 20 German native speakers.  Age between 20 and 59   |
   |             | (avg. 30.55).  9 male and 11 female.  All have an   |
   |             | academic background.  Three listeners were rejected |
   |             | because their ratings showed a low correlation      |
   |             | (R<0.8) to the average ratings.                     |
   |             |                                                     |
   | Anchor      | Reference file lowpass-filtered at 3.5 kHz calling  |
   |             | "sox in.wav -r48000 -c1 out.wav lowpass 3500"       |
   |             |                                                     |
   | Test        | Opus in the SILK mode, 12 kbps, stereo, 60 ms,      |
   | Condition 1 | calling "draft-ietf-codec-opus-07/test_opus 0 48000 |
   |             | 2 12000 -cbr -framesize 60 -bandwidth NB"           |
   |             |                                                     |
   | Test        | Opus in the SILK mode, 16 kbps, stereo, 20 ms,      |
   | Condition 2 | calling "draft-ietf-codec-opus-07/test_opus 0 48000 |
   |             | 2 16000 -cbr -framesize 20 -bandwidth WB"           |
   |             |                                                     |
   | Test        | Opus in the HYBRID mode, 32 kbps, stereo, 20 ms,    |
   | Condition 3 | calling "draft-ietf-codec-opus-07/test_opus 0 48000 |
   |             | 2 32000 -cbr -framesize 20 -bandwidth FB"           |
   |             |                                                     |
   | Test        | Opus in the CELT mode, 64 kbps, stereo, 20 ms,      |
   | Condition 4 | calling "draft-ietf-codec-opus-07/test_opus 1 48000 |
   |             | 2 64000 -cbr -framesize 20 -bandwidth FB"           |
   |             |                                                     |
   | Test        | AMR-WB+ at 12 kbps, 80 ms, using 26304_ANSI-        |
   | Condition 5 | C_source_code_v6_6_0 with arguments "-rate 12"      |
   |             |                                                     |
   | Test        | AMR-WB+ at 15.2 kbps, 80 ms, using 26304_ANSI-      |
   | Condition 6 | C_source_code_v6_6_0 with arguments "-rate 16"      |
   |             |                                                     |
   | Test        | AMR-WB+ at 32 kbps, 60 ms, using 26304_ANSI-        |
   | Condition 7 | C_source_code_v6_6_0 with arguments "-rate 32"      |
   +-------------+-----------------------------------------------------+

      Table 12: Stereo and binaural speech coding: test conditions

   +------------+----------------------------+--------+
   | Test Item  | Subjective BS.1534-1 Score | 95% CI |
   +------------+----------------------------+--------+
   | Reference  | 97.36                      | 1.31   |
   |            |                            |        |
   | Opus 64    | 95.58                      | 1.76   |
   |            |                            |        |
   | AMR-WB+ 32 | 80.11                      | 4.79   |
   |            |                            |        |
   | Opus 32    | 55.42                      | 5.96   |
   |            |                            |        |
   | AMR-WB+ 16 | 49.69                      | 6.05   |
   |            |                            |        |
   | LP 3.5     | 48.35                      | 4.50   |
   |            |                            |        |
   | Opus 16    | 39.31                      | 4.80   |
   |            |                            |        |
   | AMR-WB+ 12 | 35.40                      | 5.79   |
   |            |                            |        |
   | Opus 12    | 16.99                      | 3.49   |
   +------------+----------------------------+--------+

              Table 13: Binaural speech: test results

   According to the test results, Opus transmits binaural content well
   at 64 kbps.  The other Opus results are no longer valid, as the
   codec implementation has since been updated.

3.  Conclusion on the requirements

   The requirements call for the Opus codec to be better than Speex and
   iLBC in narrowband mode, better than Speex and G.722.1 in wideband
   mode, and better than G.722.1C in super-wideband/fullband mode.

3.1.  Comparison to Speex (narrowband)

   The Opus codec was compared to Speex in narrowband mode in the
   Google narrowband test (Section 2.1.1).  This test showed that Opus
   at 11 kb/s was significantly better than Speex at the same rate.  In
   fact, Opus at 11 kb/s was tied with the 3.5 kHz low-pass of the
   original.  Considering the results, we conclude that the Opus codec
   is better than the Speex codec.

3.2.  Comparison to iLBC

   The Opus codec was compared to iLBC in the Google narrowband test
   (Section 2.1.1).  This test showed that Opus at 11 kb/s was
   significantly better than iLBC running at 15 kb/s.  Considering the
   results, we conclude that the Opus codec is better than the iLBC
   codec.

3.3.  Comparison to Speex (wideband)

   The Opus codec was compared to Speex in wideband mode in the Google
   wideband and fullband test (Section 2.1.2).  This test showed that
   Opus at 20 kb/s was significantly better than Speex at 24 kb/s.
   In fact, Opus at 20 kb/s was better than the 7 kHz low-pass of the
   original.  These results are consistent with an earlier Dynastat
   test (Appendix A.1) that also concluded that SILK had significantly
   higher quality than Speex in wideband mode at the same bit-rate.
   Considering the results, we conclude that the Opus codec is better
   than the Speex codec for wideband.

3.4.  Comparison to G.722.1

   In the Google wideband and fullband test (Section 2.1.2), Opus at 20
   kb/s was shown to significantly out-perform G.722.1 operating at 24
   kb/s.  An indirect comparison point also comes from the Nokia
   Interspeech 2011 listening test (Section 2.3), which shows Opus out-
   performing AMR-WB at 20 kb/s, while AMR-WB is known to out-perform
   G.722.1.  Considering these results, we conclude that the Opus codec
   is better than the G.722.1 codec for wideband.

3.5.  Comparison to G.722.1C

   Opus has been compared to G.722.1C in multiple listening tests.  As
   early as 2008, an old version of the CELT codec (Appendix A.4) using
   very short frames was found to have higher quality than G.722.1C at
   48 kb/s.  More recently, the Nokia Interspeech 2011 listening test
   (Section 2.3) showed that Opus out-performed G.722.1C at 24 kb/s, 32
   kb/s, and 48 kb/s.  We thus conclude that the Opus codec is better
   than the G.722.1C codec for superwideband/fullband audio.

3.6.  Comparison to AMR-NB

   In the Google transcoding test (Section 2.1.4), Opus was shown to
   perform as well as, and with indications of slightly out-performing,
   AMR-NB at 12 kb/s.  On the other hand, in the Nokia Interspeech 2011
   listening test (Section 2.3), AMR-NB was found to have better
   quality than Opus at 6 kb/s.  This indicates that Opus is better
   than AMR-NB at higher rates and worse at lower rates, which is to be
   expected given Opus' emphasis on higher quality and higher rates.

3.7.  Comparison to AMR-WB

   In the Google wideband and fullband test (Section 2.1.2), Opus at 20
   kb/s was shown to out-perform AMR-WB at the same rate.  This was
   also confirmed by the Nokia Interspeech 2011 listening test
   (Section 2.3), which also found AMR-WB to out-perform Opus at 12
   kb/s and below.  As with AMR-NB, we conclude that Opus is better
   than AMR-WB at higher rates and worse at lower rates.

4.  Security Considerations

   This document introduces no new security considerations.

5.  IANA Considerations

   This document has no actions for IANA.

6.  Acknowledgments

   The authors would like to thank Anssi Ramo and the HydrogenAudio
   community, who conducted some of the Opus listening tests cited in
   this draft.

7.  Informative References

   [valin2010]
              Valin, J.M., Terriberry, T., Montgomery, C., and G.
              Maxwell, "A High-Quality Speech and Audio Codec With Less
              Than 10 ms Delay", 2010.

   [valin2009]
              Valin, J.M., Terriberry, T., and G. Maxwell, "A High-
              Quality Speech and Audio Codec With Less Than 10 ms
              Delay", 2009.

   [Wustenhagen2010]
              Wuestenhagen, U., Feiten, B., Kroll, J., Raake, A., and
              M. Waeltermann, "Evaluation of Super-Wideband Speech and
              Audio Codecs", 2010.

   [Ramo2010]
              Ramo, A. and H. Toukomaa, "Voice Quality Evaluation of
              Recent Open Source Codecs", 2010.

   [Ramo2011]
              Ramo, A. and H. Toukomaa, "Voice Quality Characterization
              of IETF Opus Codec", 2011.

   [Maastricht-78]
              Valin, J.M. and K. Vos, "Codec Prototype", 2010.

   [Prague-80]
              Chen, R., Terriberry, T., Maxwell, G., Skoglund, J., and
              H. Nguyet, "Testing results", 2011.
   [SILK-Dynastat]
              Skype, "SILK Datasheet", 2009.

   [ha-test]  Dyakonov, "Results of the public multiformat listening
              test @ 64 kbps", 2011.

   [Skoglund2011]
              Skoglund, J., "Listening tests of Opus at Google",
              September 2011.

   [Hoene2011]
              Hoene, C. and M. Hyder, "MUSHRA Listening Tests -
              Focusing on Stereo Voice Coding", August 2011.

Appendix A.  Pre-Opus listening tests

   Several listening tests were performed on the SILK and CELT codecs
   before they were merged into the Opus codec.

A.1.  SILK Dynastat listening test

   The original (pre-Opus) SILK codec was characterized in a Dynastat
   listening test [SILK-Dynastat].  The test included 32 conditions
   with 4 male and 4 female talkers.  The test signals were wideband
   speech with and without office background noise at 15 dB SNR.
   Packet loss was tested at 2, 5, and 10% loss rates.  The bitrates
   ranged from 8.85 kb/s to 64 kb/s.  The codecs included in the test
   were SILK-WB, AMR-WB, Speex-WB, and G.722 (which ran at 64 kb/s).

   The results showed that for clean speech (1) SILK out-performs
   AMR-WB at all bit-rates except 8.85 kb/s (which was a tie); (2) SILK
   out-performs Speex at all bit-rates; and (3) SILK running at 18.25
   kb/s and above out-performs G.722 at 64 kb/s.  For noisy speech,
   tested at 18.25 kb/s, SILK is tied with AMR-WB and out-performs
   Speex.  For 2, 5, and 10% packet loss, tested at 18.25 kb/s, SILK
   out-performs both AMR-WB and Speex in all conditions.

A.2.  SILK Deutsche Telekom test

   In 2010, Deutsche Telekom published the results [Wustenhagen2010] of
   their evaluation of super-wideband speech and audio codecs.  The
   test included the version of SILK submitted to the IETF.  The
   results showed that for clean speech (item "speechsample") SILK was
   tied with AMR-WB and G.718, and out-performed Speex.  For noisy
   speech (item "arbeit") SILK out-performed AMR-WB and G.718 at 12 and
   24 kb/s, and Speex at all bitrates.  At bitrates above 24 kb/s, SILK
   and G.718 were tied.

A.3.  SILK Nokia test

   In 2010, Anssi Ramo from Nokia presented [Ramo2010] the results of a
   listening test focusing on open-source codecs at Interspeech 2010.
   The methodology used was a 9-scale ACR MOS test with clean and noisy
   speech samples.

   It was noted in the test that:

   "Especially at around 16 kbit/s or above Silk is better than AMR-WB
   at comparable bitrates.  This is due to the fact that Silk wideband
   is critically sampled up to 8 kHz instead of ITU-T or 3GPP defined 7
   kHz.  This added bandwidth (from 7 to 8 kHz) shows up in the results
   favourable to Silk.  It seems that Silk provides quite artifact free
   voice quality for the whole 16-24 kbit/s range with WB signals.  At
   32 and 40 kbit/s Silk is SWB and competes quite equally against
   G.718B or G.722.1C although having a slightly narrower bandwidth
   than the ITU-T standardized codecs."

A.4.  CELT 0.3.2 listening test

   The first listening tests, conducted on CELT version 0.3.2 in 2009
   and published in 2010 [valin2010], included AAC-LD (Apple), G.722.1C,
   and MP3 (Lame).  Two MUSHRA tests were conducted: a 48 kb/s test and
   a 64 kb/s test, both at a 44.1 kHz sampling rate.  CELT was used
   with 256-sample frames (5.8 ms).  All codecs used constant bit-rate
   (CBR).  The algorithmic delay was 8.7 ms for CELT, 34.8 ms for
   AAC-LD, 40 ms for G.722.1C, and more than 100 ms for MP3.
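   These frame and delay figures follow from sample counts at 44.1 kHz.
   The sketch below reproduces them, under the assumption (not stated
   in [valin2010]) that the 8.7 ms figure corresponds to the 256-sample
   frame plus a 128-sample overlap:

      def ms(samples, rate_hz=44100):
          # Duration of a sample count in milliseconds.
          return 1000.0 * samples / rate_hz

      print(round(ms(256), 1))        # 5.8: CELT frame duration
      print(round(ms(256 + 128), 1))  # 8.7: frame plus assumed overlap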
   The 48 kb/s test included two clean speech samples (one male, one
   female) from the EBU SQAM database, four clean speech files (two
   male, two female) from the NTT multi-lingual speech database for
   telephonometry, and two music samples.  In this test, CELT out-
   performed AAC-LD, G.722.1C, and MP3.

   The 64 kb/s test included two clean speech samples (one male, one
   female) from the EBU SQAM database, and six music files.  In this
   test, AAC-LD out-performed CELT, but CELT out-performed both MP3 and
   G.722.1C (running at its highest rate of 48 kb/s).

A.5.  CELT 0.5.0 listening test

   Another CELT listening test was conducted in 2009 on version 0.5.0
   and presented at EUSIPCO 2009 [valin2009].  In that test, CELT was
   compared to G.722.1C and to the Fraunhofer Ultra Low-Delay (ULD)
   codec on 9 audio samples: 2 clean speech samples and 7 music
   samples.  At 64 kb/s with 5.3 ms frames, CELT clearly out-performed
   G.722.1C running at 48 kb/s with 20 ms frames.  Also, at 96 kb/s and
   equal frame size (2.7 ms), CELT clearly out-performed the ULD codec.

Appendix B.  Opus listening tests on non-final bit-stream

   The following listening tests were conducted on versions of the Opus
   codec prior to the bit-stream freeze.  While Opus has evolved since
   these tests were conducted, the results should be considered a
   _lower bound_ on the quality of the final codec.

B.1.  First hybrid mode test

   In July 2010, the Opus codec authors conducted a preliminary MUSHRA
   listening test to evaluate the quality of the recently created
   "hybrid" mode combining the SILK and CELT codecs.  That test was
   conducted at 32 kb/s and compared the following codecs:

   o  Opus hybrid mode (fullband)

   o  G.719 (fullband)

   o  CELT (fullband)

   o  SILK (wideband)

   o  BroadVoice32 (wideband)

   The test material consisted of two English speech samples (one male,
   one female) from the EBU SQAM database and six speech samples (three
   male, three female) from the NTT multi-lingual speech database for
   telephonometry.  Although only eight listeners participated in the
   test, the difference between the Opus hybrid mode and all other
   codecs was large enough to obtain 95% confidence that the Opus
   hybrid mode provided better quality than all other codecs tested.
   This test is of interest because it shows that the hybrid mode
   clearly out-performs the codecs that it combines (SILK and CELT).
   It also out-performs G.719, which is the only fullband interactive
   codec standardized by the ITU-T.  These results were presented
   [Maastricht-78] at the 78th IETF meeting in Maastricht.

B.2.  Broadcom stereo music test

   In December 2010, Broadcom conducted an ITU-R BS.1116-style
   subjective listening test comparing different configurations of the
   CELT-only mode of the IETF Opus codec along with MP3 and AAC-LC.
   The test included 10 stereo audio samples, sampled at 44.1 kHz and
   distributed as follows:

   o  2 pure speech

   o  2 vocal

   o  2 solo instruments

   o  1 rock-and-roll

   o  1 pop

   o  1 classical orchestra

   o  1 jazz

   A total of 17 listeners participated in the test.  The results of
   the test are available in the testing slides presented at the Prague
   meeting [Prague-80].  Although at the time Opus was not properly
   optimized for 44.1 kHz audio, the quality of the Opus codec at 96
   kb/s with 22 ms frames was significantly better than MP3 and only
   slightly worse than AAC-LC.
   Even in ultra low-delay mode (5.4 ms), Opus still outperformed MP3.
   The test also confirmed the usefulness of the prefilter/postfilter
   contribution by Raymond Chen, showing that this contribution
   significantly improves quality for small frames (long frames were
   not tested with the prefilter/postfilter disabled).

Appendix C.  In-the-field testing

   Various versions of Opus (or its SILK/CELT components) are currently
   in use in production in the following applications:

   o  Skype: VoIP client used by hundreds of millions of people

   o  Steam: Gaming distribution and communications platform with over
      30 million users

   o  Mumble: Gaming VoIP client with more than 200 thousand users

   o  Soundjack: Client for live network music performances

   o  Freeswitch: Open-source telephony platform

   o  Ekiga: Open-source VoIP client

   o  CHNC: Radio station using CELT for its studio-transmitter link

Authors' Addresses

   Christian Hoene (editor)
   Symonics GmbH
   Sand 13
   Tuebingen 72076
   Germany

   Email: christian.hoene@symonics.com

   Jean-Marc Valin
   Mozilla Corporation
   650 Castro Street
   Mountain View, CA 94041
   USA

   Phone: +1 650 903-0800
   Email: jmvalin@jmvalin.ca

   Koen Vos
   Skype Technologies S.A.
   Stadsgarden 6
   Stockholm 11645
   Sweden

   Email: koen.vos@skype.net

   Jan Skoglund
   Google

   Email: jks@google.com