idnits 2.17.1 draft-ietf-codec-results-02.txt: Checking boilerplate required by RFC 5378 and the IETF Trust (see https://trustee.ietf.org/license-info): ---------------------------------------------------------------------------- No issues found here. Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt: ---------------------------------------------------------------------------- No issues found here. Checking nits according to https://www.ietf.org/id-info/checklist : ---------------------------------------------------------------------------- No issues found here. Miscellaneous warnings: ---------------------------------------------------------------------------- == The copyright year in the IETF Trust and authors Copyright Line does not match the current year -- The document date (November 5, 2012) is 4161 days in the past. Is this intentional? Checking references for intended status: Informational ---------------------------------------------------------------------------- No issues found here. Summary: 0 errors (**), 0 flaws (~~), 1 warning (==), 1 comment (--). Run idnits with the --verbose option for more detailed information about the items above. -------------------------------------------------------------------------------- 2 codec C. Hoene, Ed. 3 Internet-Draft Symonics GmbH 4 Intended status: Informational JM. Valin 5 Expires: May 9, 2013 Mozilla Corporation 6 K. Vos 7 Skype Technologies S.A. 8 J. Skoglund 9 Google 10 November 5, 2012 12 Summary of Opus listening test results 13 draft-ietf-codec-results-02 15 Abstract 17 This document describes and examines listening test results obtained 18 for the Opus codec and how they relate to the requirements. 20 Status of this Memo 22 This Internet-Draft is submitted in full conformance with the 23 provisions of BCP 78 and BCP 79. 25 Internet-Drafts are working documents of the Internet Engineering 26 Task Force (IETF). Note that other groups may also distribute 27 working documents as Internet-Drafts. The list of current Internet- 28 Drafts is at http://datatracker.ietf.org/drafts/current/. 30 Internet-Drafts are draft documents valid for a maximum of six months 31 and may be updated, replaced, or obsoleted by other documents at any 32 time. It is inappropriate to use Internet-Drafts as reference 33 material or to cite them other than as "work in progress." 35 This Internet-Draft will expire on May 9, 2013. 37 Copyright Notice 39 Copyright (c) 2012 IETF Trust and the persons identified as the 40 document authors. All rights reserved. 42 This document is subject to BCP 78 and the IETF Trust's Legal 43 Provisions Relating to IETF Documents 44 (http://trustee.ietf.org/license-info) in effect on the date of 45 publication of this document. Please review these documents 46 carefully, as they describe your rights and restrictions with respect 47 to this document. Code Components extracted from this document must 48 include Simplified BSD License text as described in Section 4.e of 49 the Trust Legal Provisions and are provided without warranty as 50 described in the Simplified BSD License. 52 Table of Contents 54 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 3 55 2. Opus listening tests on final bit-stream . . . . . . . . . . . 4 56 2.1. Google listening tests . . . . . . . . . . . . . . . . . . 4 57 2.1.1. Google narrowband listening test . . . . . . . . . . . 4 58 2.1.2. Google wideband and fullband listening test . . . . . 5 59 2.1.3. Google stereo music listening test . . . . . . . . . . 6 60 2.1.4. Google transcoding test . . . . . . . . . . . . . . . 8 61 2.1.5. Google mandarin tests . . . . . . . . . . . . . . . . 12 62 2.2. HydrogenAudio stereo music listening test . . . . . . . . 16 63 2.3. Nokia Interspeech 2011 listening test . . . . . . . . . . 17 64 2.4. Universitaet Tuebingen stereo and binaural tests . . . . . 17 65 3. Conclusion on the requirements . . . . . . . . . . . . . . . . 20 66 3.1. Comparison to Speex (narrowband) . . . . . . . . . . . . . 20 67 3.2. Comparison to iLBC . . . . . . . . . . . . . . . . . . . . 20 68 3.3. Comparison to Speex (wideband) . . . . . . . . . . . . . . 20 69 3.4. Comparison to G.722.1 . . . . . . . . . . . . . . . . . . 20 70 3.5. Comparison to G.722.1C . . . . . . . . . . . . . . . . . . 21 71 3.6. Comparison to AMR-NB . . . . . . . . . . . . . . . . . . . 21 72 3.7. Comparison to AMR-WB . . . . . . . . . . . . . . . . . . . 21 73 4. Security Considerations . . . . . . . . . . . . . . . . . . . 22 74 5. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 23 75 6. Acknowledgments . . . . . . . . . . . . . . . . . . . . . . . 24 76 Appendix A. Pre-Opus listening tests . . . . . . . . . . . . . . 25 77 A.1. SILK Dynastat listening test . . . . . . . . . . . . . . . 25 78 A.2. SILK Deutsche Telekom test . . . . . . . . . . . . . . . . 25 79 A.3. SILK Nokia test . . . . . . . . . . . . . . . . . . . . . 25 80 A.4. CELT 0.3.2 listening test . . . . . . . . . . . . . . . . 26 81 A.5. CELT 0.5.0 listening test . . . . . . . . . . . . . . . . 26 82 Appendix B. Opus listening tests on non-final bit-stream . . . . 27 83 B.1. First hybrid mode test . . . . . . . . . . . . . . . . . . 27 84 B.2. Broadcom stereo music test . . . . . . . . . . . . . . . . 27 85 Appendix C. In-the-field testing . . . . . . . . . . . . . . . . 29 86 7. Informative References . . . . . . . . . . . . . . . . . . . . 30 87 Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . . 31 89 1. Introduction 91 This document describes and examines listening test results obtained 92 for the Opus codec. Some of the test results presented are based on 93 older versions of the codec or on older versions of the SILK or CELT 94 components. While they do not necessarily represent the exact 95 quality of the current version, they are nonetheless useful for 96 validating the technology used and as an indication of a lower bound 97 on quality (based on the assumption that the codec has been improved 98 since they were performed). 100 Throughout this document, all statements about one codec being better 101 than or worse than another codec are based on 95% confidence. When 102 no statistically significant difference can be shown with 95% 103 confidence, then two codecs are said to be "tied". 105 In addition to the results summarized in this draft, Opus has been 106 subjected to many informal subjective listening tests, as well as 107 objective testing. 109 2. Opus listening tests on final bit-stream 111 The following tests were performed on the Opus codec _after_ the bit- 112 stream was finalized. 114 2.1. Google listening tests 116 The tests followed the MUSHRA test methodology. Two anchors were 117 used, one lowpass-filtered at 3.5 kHz and one lowpass-filtered at 7.0 118 kHz. Both trained and untrained listeners participated in the tests. 119 The reference signals were manually normalized to the same subjective 120 levels according to the experimenters' opinion. Experiments with 121 automatic normalization with respect to both level and loudness (in 122 Adobe Audition) did not result in signals having equal subjective 123 loudness. The sample magnitude levels were kept lower than 2^14 to 124 provide headroom for possible amplification through the codecs. 125 However, the normalization exercise was not repeated with the 126 processed sequences as neither the experimenters nor any of the 127 subjects (which included expert listeners) noticed any significant 128 level differences between the conditions in the tests. The only 129 post-processing performed was to remove noticeable delays in the MP3 130 files, as one could identify the MP3 samples when switching between 131 conditions when the MP3 had the longer delay. The testing tool Step 132 from ARL was used for tests and all listeners were instructed to to 133 carefully listen through the conditions before starting the grading. 134 The results of the tests are a available on the testing slides 135 presented at the Prague meeting [Prague-80]. 137 2.1.1. Google narrowband listening test 139 The test sequences in Test 1 were mono recordings (between 2 and 6 140 seconds long) of 4 different male and 4 different female speakers 141 sampled at 48 kHz in low background noise. 17 listeners were 142 presented with 6 stimuli according to Table 1 for each test sequence. 143 The corresponding bit rate for the reference is 48000 (sampling 144 frequency in Hz) x 16 (bits/sample) = 768 kbps. Since the anchors 145 are low-pass filtered they can also be downsampled for transmission 146 which corresponds to lower bit rates. Three narrowband codecs were 147 compared in this test: Opus NB, the royalty-free iLBC, and the 148 royalty-free Speex. The codecs all have an encoder frame length of 149 20 ms. Both Opus and Speex had variable rate whereas iLBC operated 150 at a fixed bit rate. 152 +-----------+----------------------+----------------+ 153 | Type | Signal bandwidth | Bitrate | 154 +-----------+----------------------+----------------+ 155 | Reference | 24 kHz (Fullband) | | 156 | | | | 157 | Anchor 1 | 3.5 kHz (Narrowband) | | 158 | | | | 159 | Anchor 2 | 7 kHz (Wideband) | | 160 | | | | 161 | iLBC | 4 kHz (Narrowband) | 15.2 kbps, CBR | 162 | | | | 163 | Opus NB | 4 kHz (Narrowband) | 11 kbps, VBR | 164 | | | | 165 | Speex NB | 3.5 kHz (Narrowband) | 11 kbps, VBR | 166 +-----------+----------------------+----------------+ 168 Table 1: Narrowband mono voice: test conditions 170 The overall results of the narrowband test, i.e., averaged over all 171 listeners for all sequences, are presented in the Prague meeting 172 slides [Prague-80]. The results suggest that Opus at 11 kbps is 173 superior to both iLBC at 15 kpbs and Speex at 11 kbps. T-tests 174 performed by Greg Maxwell confirm that there is indeed a 175 statistically significant difference. Note also that Opus has a 176 slightly higher average score than the 3.5 kHz anchor, likely due to 177 the higher bandwidth of Opus. 179 2.1.2. Google wideband and fullband listening test 181 The eight test sequences for the previous test were also used in this 182 Test. 16 listeners rated the stimuli listed in Table 2. In this test 183 comparisons were made between four wideband codecs: Opus WB, the 184 royalty-free Speex, the royalty-free ITU-T G.722.1, AMR-WB (ITU-T 185 G.722.2), and two fullband codecs: Opus FB and the royalty-free ITU-T 186 G.719. All six codecs utilize 20 ms encoding frames. Opus used 187 variable bitrate, while other codecs used constant bit rate. 189 +-----------+----------------------+-----------------+ 190 | Type | Signal bandwidth | Bitrate | 191 +-----------+----------------------+-----------------+ 192 | Reference | 24 kHz (Fullband) | | 193 | | | | 194 | Anchor 1 | 3.5 kHz (Narrowband) | | 195 | | | | 196 | Anchor 2 | 7 kHz (Wideband) | | 197 | | | | 198 | G.722.1 | 7 kHz (Wideband) | 24 kbps, CBR | 199 | | | | 200 | Speex WB | 7 kHz (Wideband) | 23.8 kbps, CBR | 201 | | | | 202 | AMR-WB | 7 kHz (Wideband) | 19.85 kbps, CBR | 203 | | | | 204 | Opus WB | 8 kHz (Wideband) | 19.85 kbps, VBR | 205 | | | | 206 | G.719 | ~20 kHz (Fullband) | 32 kbps, CBR | 207 | | | | 208 | Opus FB | ~20 kHz (Fullband) | 32 kbps, CBR | 209 +-----------+----------------------+-----------------+ 211 Table 2: Wideband and fullband mono voice: test conditions 213 The results from this test are depicted in the Prague meeting 214 slides[Prague-80]. Opus at 32 kbps is almost transparent, although 215 there is a small, but statistically significant, difference from the 216 fullband reference material. Opus at 20 kbps is significantly better 217 than all the other codecs, including AMR-WB and the fullband G.719, 218 and both low-pass anchors. 220 2.1.3. Google stereo music listening test 222 The sequences in this test were excerpts from 10 different stereo 223 music files: 225 o Rock/RnB (Boz Scaggs) 227 o Soft Rock (Steely Dan) 229 o Rock (Queen) 231 o Jazz (Harry James) 233 o Classical (Purcell) 235 o Electronica (Matmos) 236 o Piano (Moonlight Sonata) 238 o Vocals (Suzanne Vega) 240 o Glockenspiel 242 o Castanets 244 These sequences were originally recorded at a sampling frequency of 245 44.1 kHz and were upsampled to 48 kHz prior to processing. Test 3 246 included comparisons between six codecs (c.f., Table 3): Opus at 247 three rates, G.719, AAC-LC (Nero 1.5.1), and MP3 (Lame 3.98.4). 248 G.719 is a mono codec, so the two channels were each coded 249 independently at 32 kbps. 9 listeners participated in Test 3, and the 250 results are depicted in the Prague meeting slides[Prague-80]. The 251 codecs operated at constant (or comparable) bit rate. 253 +-----------+-------------------+-------------+---------------------+ 254 | Type | Signal bandwidth | Frame size | Bitrate | 255 | | | (ms) | | 256 +-----------+-------------------+-------------+---------------------+ 257 | Reference | 22 kHz (Fullband) | - | (1536 kbps) | 258 | | | | | 259 | Anchor 1 | 3.5 kHz | - | (256 kbps) | 260 | | (Narrowband) | | | 261 | | | | | 262 | Anchor 2 | 7 kHz (Wideband) | - | (512 kbps) | 263 | | | | | 264 | MP3 | 16 kHz (Super | >100 | 96 kbps, CBR | 265 | | wideband) | | | 266 | | | | | 267 | AAC-LC | ~20 kHz | 21 | 64 kbps, CBR (bit | 268 | | (Fullband) | | reservoir) | 269 | | | | | 270 | G.719 | ~20 kHz | 20 | 64 kbps (2x32), CBR | 271 | | (Fullband) | | | 272 | | | | | 273 | Opus FB | ~20 kHz | 20 | 64 kbps, | 274 | | (Fullband) | | constrained VBR | 275 | | | | | 276 | Opus FB | ~20 kHz | 10 | 80 kbps, | 277 | | (Fullband) | | constrained VBR | 278 | | | | | 279 | Opus FB | ~20 kHz | 5 | 128 kbps, | 280 | | (Fullband) | | constrained VBR | 281 +-----------+-------------------+-------------+---------------------+ 283 Table 3: Stereo music: Test conditions 285 The results indicate that all codecs had comparable performance, 286 except for G.719, which had a considerably lower score. T-tests by 287 Greg Maxwell verified that the low-delay Opus at 128 kbps had a 288 significantly higher performance and that G.719 had a significantly 289 lower performance than the other four. 291 2.1.4. Google transcoding test 293 If two telephone networks of different technology are coupled, 294 frequently speech has to be transcoded: It must be decoded and 295 encoded before it can be forward to the next network. Then, two 296 codecs are cooperating in a row, which is called tandem coding. 298 In the following tests, Jan Skoglund studied the impact of 299 transcoding if Opus call is forwarded to a cellular phone system. 300 [Skoglund2011]. Two tests were conducted for both narrowband and 301 wideband speech items. The test conditions of the narrow-band tests 302 are given in Table and the respective results in . For the wide-band 303 conditions and results refer to Table and . 305 +-------------+-----------------------------------------------------+ 306 | Condition | Value | 307 +-------------+-----------------------------------------------------+ 308 | Laboratory | Google | 309 | | | 310 | Examiner | Jan Skoglund | 311 | | | 312 | Date | August and September 2011 | 313 | | | 314 | Methodology | ITU-R BS.1534-1 (MUSHRA) | 315 | | | 316 | Reference | Two male and two female speakers from ITU-T P.501. | 317 | items | Two male and two female speakers from McGill | 318 | | database. All recorded at 48kHz in a room with low | 319 | | background noise. | 320 | | | 321 | Listeners | 19 listeners no listeners rejected / trained and | 322 | | untrained English-speaking listeners | 323 | | | 324 | Anchor 1 | Reference file lowpass-filtered at 3.5 kHz | 325 | | | 326 | Anchor 2 | Reference file resampled at 8 kHz, with MNRU at 15 | 327 | | dB SNR | 328 | | | 329 | Test | G.711 at 64 kbps -> Opus NB at 12.2 kbps, variable | 330 | Condition 1 | bit rate | 331 | | | 332 | Test | G.711 at 64 kbps -> AMR NB at 12.2 kbps, constant | 333 | Condition 2 | bit rate | 334 | | | 335 | Test | AMR NB at 12.2 kbps -> G.711 at 64 kbps -> Opus NB | 336 | Condition 3 | at 12.2 kbps | 337 | | | 338 | Test | Opus NB at 12.2 kbps > G.711 at 64 kbps > AMR NB at | 339 | Condition 4 | 12.2 kbps | 340 | | | 341 | Test | AMR NB at 12.2 kbps -> G.711 at 64 kbps -> AMR NB | 342 | Condition 5 | at 12.2 kbps | 343 +-------------+-----------------------------------------------------+ 345 Table 4: Narrowband tandem coding: test conditions 347 +------------------+-------------------------+--------+ 348 | Test Item | Subjective MUSHRA score | 95% CI | 349 +------------------+-------------------------+--------+ 350 | Reference | 99.47 | 0.36 | 351 | | | | 352 | LP3.5 | 63.49 | 3.01 | 353 | | | | 354 | G.711->Opus | 54.51 | 2.85 | 355 | | | | 356 | G.711->AMR | 54.13 | 2.67 | 357 | | | | 358 | AMR->G.711->Opus | 51.11 | 2.74 | 359 | | | | 360 | Opus->G.711->AMR | 50.95 | 2.76 | 361 | | | | 362 | AMR->G.711->AMR | 47.81 | 2.95 | 363 | | | | 364 | MNRU | 14.94 | 2.21 | 365 +------------------+-------------------------+--------+ 367 Table 5: Tandem narrowband coding: test results 369 +-------------+-----------------------------------------------------+ 370 | Condition | Value | 371 +-------------+-----------------------------------------------------+ 372 | Laboratory | Google | 373 | | | 374 | Examiner | Jan Skoglund | 375 | | | 376 | Date | August and September 2011 | 377 | | | 378 | Methodology | MUSHRA | 379 | | | 380 | Reference | Two male and two female speakers from ITU-T P.501. | 381 | items | Two male and two female speakers recorded at Google | 382 | | at 48kHz in a room with low background noise | 383 | | | 384 | Listeners | 18 listeners after post-screening / no listener | 385 | | rejects / untrained and trained English speaking | 386 | | listeners | 387 | | | 388 | Anchor 1 | Reference file lowpass-filtered at 3.5 kHz (LP 3.5) | 389 | | | 390 | Anchor 2 | Reference file lowpass-filtered at 7 kHz (LP 7) | 391 | | | 392 | Test | Opus WB at 19.85 kbps, variable bit rate (Opus) | 393 | Condition 1 | | 394 | | | 395 | Test | AMR WB at 19.85 kbps, constant bit rate (AMR WB) | 396 | Condition 2 | | 397 | | | 398 | Test | AMR WB at 19.85 kbps > Opus WB at 19.85 kbps | 399 | Condition 3 | | 400 | | | 401 | Test | Opus WB at 19.85 kbps -> AMR WB at 19.85 kbps | 402 | Condition 4 | | 403 +-------------+-----------------------------------------------------+ 405 Table 6: Tandem wideband coding: test conditions 407 +--------------+--------------------------+--------+ 408 | Test Item | Subjective BS.1587 Score | 95% CI | 409 +--------------+--------------------------+--------+ 410 | Reference | 99.44 | 0.38 | 411 | | | | 412 | Opus | 78.38 | 2.16 | 413 | | | | 414 | LP7 | 74.24 | 2.24 | 415 | | | | 416 | AMR WB | 65.26 | 2.85 | 417 | | | | 418 | AMR WB->Opus | 63.97 | 2.95 | 419 | | | | 420 | Opus->AMR WB | 62.83 | 2.94 | 421 | | | | 422 | LP3.5 | 37.01 | 2.95 | 423 +--------------+--------------------------+--------+ 425 Table 7: Tandem wideband coding: test results 427 Under the given statistical confidence, narrowband tandem coding 428 condition using AMR and/or Opus are of similar quality. However, the 429 results have indications that Opus outperforms AMR NB slightly. In 430 any case, narrow band transcoding is worse than a low pass filtering 431 at 3.5kbps. 433 Opus at 20kbps outperforms AMR WB at a similar coding rate and 434 matches the quality of a 7kHz lowpass filtered signal. Tandem coding 435 with Opus does not reduce the quality of AMR WB encoded speech in the 436 studied conditions. 438 2.1.5. Google mandarin tests 440 Modern Standard Chinese - also called Mandarin - is a tonal language 441 that is spoken by about 845 million persons. In past, codecs have 442 been developed without consideration of the unique properties of 443 tonal languages. For the testing of Opus, Jan Skoglund has conducted 444 subjective listening-only tests to verify whether Opus can cope well 445 for Mandarin [Skoglund2011]. Two tests were conducted for both 446 narrow- and wide-band speech items. The test conditions of the 447 narrow-band tests are given in Table and the respective results in . 448 For the wide-band conditions and results refer to Table and Table 449 +-------------+-----------------------------------------------------+ 450 | Condition | Value | 451 +-------------+-----------------------------------------------------+ 452 | Laboratory | Google | 453 | | | 454 | Examiner | Jan Skoglund | 455 | | | 456 | Date | August and September 2011 | 457 | | | 458 | Methodology | ITU-R BS.1534-1 (MUSHRA) | 459 | | | 460 | Reference | Two male and two female speakers from ITU-T P.501. | 461 | items | Two male and two female speakers recorded at Google | 462 | | at 48kHz in a room with low background noise. | 463 | | | 464 | Listeners | 21 listeners after post-screening / no listeners | 465 | | rejected / untrained Mandarin-speaking listeners | 466 | | | 467 | Anchor 1 | Reference file lowpass-filtered at 3.5 kHz (LP 3.5) | 468 | | | 469 | Anchor 2 | Reference file resampled at 8 kHz, with MNRU at 15 | 470 | | dB SNR (MNRU) | 471 | | | 472 | Test | Opus NB at 11 kbps, variable bit rate (Opus 11) | 473 | Condition 1 | | 474 | | | 475 | Test | Speex NB at 11 kbps, variable bit rate (Speex 11) | 476 | Condition 2 | | 477 | | | 478 | Test | iLBC at 15.2 kbps, constant bit rate (iBLC 15) | 479 | Condition 3 | | 480 +-------------+-----------------------------------------------------+ 482 Table 8: Narrowband mandarin: test conditions 484 +-----------+----------------------------+--------+ 485 | Test Item | Subjective BS.1534-1 Score | 95% CI | 486 +-----------+----------------------------+--------+ 487 | Reference | 99.79 | 0.19 | 488 | | | | 489 | Opus 11 | 77.90 | 2.15 | 490 | | | | 491 | iLBC 15 | 76.76 | 2.08 | 492 | | | | 493 | LP 3.5 | 76.25 | 2.34 | 494 | | | | 495 | Speex 11 | 63.60 | 3.30 | 496 | | | | 497 | MNRU | 22.83 | 2.50 | 498 +-----------+----------------------------+--------+ 500 Table 9: Mandarin narrowband speech: test results 502 +-------------+-----------------------------------------------------+ 503 | Condition | Value | 504 +-------------+-----------------------------------------------------+ 505 | Laboratory | Google | 506 | | | 507 | Examiner | Jan Skoglund | 508 | | | 509 | Date | August and September 2011 | 510 | | | 511 | Methodology | MUSHRA | 512 | | | 513 | Reference | Two male and two female speakers from ITU-T P.501. | 514 | items | Two male and two female speakers recorded at Google | 515 | | at 48kHz in a room with low background noise | 516 | | | 517 | Listeners | 19 listeners after post-screening / Rejected 3 | 518 | | listeners having score correlation with the total | 519 | | average lower than R=0.8. | 520 | | | 521 | Anchor 1 | Reference file lowpass-filtered at 3.5 kHz (LP 3.5) | 522 | | | 523 | Anchor 2 | Reference file lowpass-filtered at 7 kHz (LP 7) | 524 | | | 525 | Test | Opus WB at 19.85 kbps, variable bit rate (Opus 20) | 526 | Condition 1 | | 527 | | | 528 | Test | Speex WB at 23.8 kbps, constant bit rate (Speex 24) | 529 | Condition 2 | | 530 | | | 531 | Test | G.722.1 at 24 kbps, constant bit rate (G.722.1 24) | 532 | Condition 3 | | 533 | | | 534 | Test | Opus FB at 32 kbps, constant bit rate (Opus 32) | 535 | Condition 4 | | 536 | | | 537 | Test | G.719 at 32 kbps, constant bit rate (G.719 32) | 538 | Condition 5 | | 539 +-------------+-----------------------------------------------------+ 541 Table 10: Mandarin wideband speech: test conditions 542 +------------+--------------------------+--------+ 543 | Test Item | Subjective BS.1587 Score | 95% CI | 544 +------------+--------------------------+--------+ 545 | Reference | 98.95 | 0.59 | 546 | | | | 547 | Opus 32 | 98.13 | 0.72 | 548 | | | | 549 | G.719 32 | 93.43 | 1.51 | 550 | | | | 551 | Opus 20 | 81.59 | 2.48 | 552 | | | | 553 | LP 7 | 79.51 | 2.53 | 554 | | | | 555 | G.722.1 24 | 72.55 | 3.06 | 556 | | | | 557 | LP 3.5 | 54.57 | 3.44 | 558 | | | | 559 | Speex 24 | 53.63 | 4.23 | 560 +------------+--------------------------+--------+ 562 Table 11: Mandarin wideband speech: test results 564 Under the given confidence intervals, the quality of Opus at 11 kbps 565 equals the quality of iLBC at 15 kbps and the quality aferlowpass 566 filtering at 3.5 kHz. Speex at 11 kbps does not perform as well. 567 According to the listening-only tests, Opus at 32 kbps is better than 568 G.719 at 32 kbps. Opus at 20 kbps outperforms G.722.1 and Speex at 569 24 kbps. If one compares the Mandarin results with those for English 570 (Section 2.1.1 and Section 2.1.2), one can see that are pretty 571 consistent. The only difference is that using English stimuli Opus 572 at 20 kbps outperforms G.719 at 32 kbps. Probabily, this is due to 573 the fact that Mandarin speech does not contain as many high 574 frequency-rich consonants such as [s] as English. 576 2.2. HydrogenAudio stereo music listening test 578 In March 2011, the HydrogenAudio community conducted a listening test 579 comparing codec performance on stereo audio at 64 kb/s [ha-test]. 580 The Opus codec was compared to the Apple and Nero implementations of 581 HE-AAC, as well as to the Vorbis codec. The test included 30 audio 582 samples, including known "hard to code" samples from previous 583 HydrogenAudio listening tests. 585 A total of 33 listeners participated in the test, 10 of which 586 provided results for all the audio samples. The results of test 587 showed that Opus out-performed both HE-AAC implementations as well as 588 Vorbis. 590 2.3. Nokia Interspeech 2011 listening test 592 In 2011, Anssi Ramo from Nokia submitted [Ramo2011] the results of a 593 second listening test, focusing specifically on the Opus codec, to 594 Interspeech 2011. As in the previous test, the methodology used was 595 a 9-scale ACR MOS test with clean and noisy speech samples. 597 The results show Opus clearly out-performing both G.722.1C and G.719 598 on clean speech at 24 kb/s and above, while on noisy speech all 599 codecs and bit-rates above 24 kb/s are very close. It is also found 600 that the Opus hybrid mode at 28 kb/s has quality that is very close 601 to the recent G.718B standard at the same rate. At 20 kb/s, the Opus 602 wideband mode also out-performs AMR-WB, while the situation is 603 reversed for 12 kb/s and below. The only narrowband rate tested is 6 604 kb/s, which is below what Opus targets and unsurprisingly shows 605 poorer quality than AMR-NB at 5.9 kb/s.M 607 2.4. Universitaet Tuebingen stereo and binaural tests 609 Modern teleconferencing system use stereo or spatialy rendered speech 610 to enhance the conversation quality. Then, talkers can be identified 611 according to their acoustic locations. Opus allows to encode speech 612 in a stereo mode. In the tests conducted by Christian 613 Hoene[Hoene2011], the performance of Opus coding stereo and binaural 614 speech was studied. 616 +-------------+-----------------------------------------------------+ 617 | Condition | Value | 618 +-------------+-----------------------------------------------------+ 619 | Laboratory | Univesitaet Tuebingen | 620 | | | 621 | Examiner | Christian Hoene and Mansoor Hyder | 622 | | | 623 | Date | August 2011 | 624 | | | 625 | Methodology | ITU-R BS.1534-1 (MUSHRA) using a modified "rateit | 626 | | v0.1" software with German translations. | 627 | | | 628 | Reference | One German female voice recorded in stereo (8s). | 629 | items | Two female voices (stereo recording) mixed together | 630 | | (9 s). One moving talker binaural rendered with | 631 | | HTRF and an artificial room impulse response (13 | 632 | | s). Two voices binaural render at two different | 633 | | stationary positions. Acappella Song "Mein | 634 | | Fahrrad" by "Die Prinzen" (10.5s, mono) | 635 | | | 636 | Listeners | 20 German native speakers. Age between 20 and 59 | 637 | | (avg. 30.55). 9 male and 11 female. All have | 638 | | academic background. Three listeners were rejected | 639 | | because their rating showed a low correlation | 640 | | (R<0.8) to the average ratings. | 641 | | | 642 | Anchor | Reference file lowpass-filtered at 3.5 kHz calling | 643 | | "sox in.wav -r48000 -c1 out.wav lowpass 3500" | 644 | | | 645 | Test | Opus in the SILK mode, 12kbps, stereo, 60ms calling | 646 | Condition 1 | "draft-ietf-codec-opus-07/test_opus 0 48000 2 12000 | 647 | | -cbr -framesize 60 -bandwidth NB" | 648 | | | 649 | Test | Opus in the SILK mode, 16kbps, stereo, 20ms calling | 650 | Condition 2 | "draft-ietf-codec-opus-07/test_opus 0 48000 2 16000 | 651 | | -cbr -framesize 20 -bandwidth WB" | 652 | | | 653 | Test | Opus in the HYBRID mode, 32kbps, stereo, 20ms | 654 | Condition 3 | calling "draft-ietf-codec-opus-07/test_opus 0 48000 | 655 | | 2 32000 -cbr -framesize 20 -bandwidth FB" | 656 | | | 657 | Test | Opus in the CELT mode, 64kbps, stereo, 20ms calling | 658 | Condition 4 | "draft-ietf-codec-opus-07/test_opus 1 48000 2 64000 | 659 | | -cbr -framesize 20 -bandwidth FB" | 660 | | | 661 | Test | AMR-WB+ at 12kbps, 80ms using | 662 | Condition 5 | 26304_ANSI-C_source_code_v6_6_0: Arguments: -rate | 663 | | 12 | 664 | Test | AMR-WB+ at 15.2kbps, 80ms using | 665 | Condition 6 | 26304_ANSI-C_source_code_v6_6_0: Arguments: -rate | 666 | | 16 | 667 | | | 668 | Test | AMR-WB+ at 32kbps, 60ms using | 669 | Condition 7 | 26304_ANSI-C_source_code_v6_6_0: Arguments: -rate | 670 | | 32 | 671 +-------------+-----------------------------------------------------+ 673 Table 12: Stereo and binaural speech coding: test conditions 675 +------------+----------------------------+--------+ 676 | Test Item | Subjective BS.1534-1 Score | 95% CI | 677 +------------+----------------------------+--------+ 678 | Reference | 97.36 | 1.31 | 679 | | | | 680 | Opus 64 | 95.58 | 1.76 | 681 | | | | 682 | AMR-WB+ 32 | 80.11 | 4.79 | 683 | | | | 684 | Opus 32 | 55.42 | 5.96 | 685 | | | | 686 | AMR-WB+ 16 | 49.69 | 6.05 | 687 | | | | 688 | LP 3.5 | 48.35 | 4.50 | 689 | | | | 690 | Opus 16 | 39.31 | 4.80 | 691 | | | | 692 | AMR-WP+ 12 | 35.40 | 5.79 | 693 | | | | 694 | Opus 12 | 16.99 | 3.49 | 695 +------------+----------------------------+--------+ 697 Table 13: Binaural Speech: Test Results 699 According to the test results, Opus transmits binaural content well 700 at 64kbps. The other Opus results are not valid anymore as the codec 701 implementation have been updated. 703 3. Conclusion on the requirements 705 The requirements call for the Opus codec to be better than Speex and 706 iLBC in narrowband mode, better than Speex and G.722.1 in wideband 707 mode, and better than G.722.1C in super-wideband/fullband mode. 709 3.1. Comparison to Speex (narrowband) 711 The Opus codec was compared to Speex in narrowband mode in the Google 712 narrowband test (Section 2.1.1). This test showed that Opus at 11 713 kb/s was significantly better than Speex at the same rate. In fact, 714 Opus at 11 kb/s was tied with the 3.5 low-pass of the original. 715 Considering the results, we conclude that the Opus codec is better 716 than the Speex codec. 718 3.2. Comparison to iLBC 720 The Opus codec was compared to iLBC in the Google narrowband test 721 (Section 2.1.1). This test showed that Opus at 11 kb/s was 722 significantly better than iLBC running at 15 kb/s. Considering the 723 results, we conclude that the Opus codec is better than the iLBC 724 codec. 726 3.3. Comparison to Speex (wideband) 728 The Opus codec was compared to Speex in wideband mode in the Google 729 wideband and fullband test (Section 2.1.2). This test showed that 730 Opus at 20 kb/s was significantly better than Speex at at 24 kb/s. 731 In fact, Opus at 20 kb/s was better than the 7 kHz low-pass of the 732 original. These results are consistent with an earlier Dynastat test 733 (Appendix A.1) that also concluded that SILK had significantly higher 734 quality than Speex in wideband mode at the same bit-rate. 735 Considering the results, we conclude that the Opus codec is better 736 than the Speex codec for wideband. 738 3.4. Comparison to G.722.1 740 In the Google wideband and fullband test (Section 2.1.2), Opus at 20 741 kb/s was shown to significantly out-perform G.722.1 operating at 24 742 kb/s. An indirect comparison point also comes from the Nokia 743 Interspeech 2011 listening test (Section 2.3) that shows Opus out- 744 performing AMR-WB at 20 kb/s, while AMR-WB is known to out-perform 745 G.722.1. Considering these results, we conclude that the Opus codec 746 is better than the G.722.1 codec for wideband. 748 3.5. Comparison to G.722.1C 750 Opus has been compared to G.722.1C in multiple listening tests. As 751 early as 2008, an old version of the CELT codec (Appendix A.4) using 752 very short frames was found to have higher quality than G.722.1C at 753 48 kb/s. More recently, the Nokia Interspeech 2011 listening test 754 (Section 2.3) showed that Opus out-performed G.722.1C at 24 kb/s, 32 755 kb/s, and 48 kb/s. We thus conclude that the Opus codec is better 756 than the G.722.1C codec for superwideband/fullband audio. 758 3.6. Comparison to AMR-NB 760 In the Google narrowband test (Section 2.1.1), Opus was shown to out- 761 perform AMR-NB at 12 kb/s. On the other hand, in the Nokia 762 Interspeech 2011 listening test (Section 2.3), AMB-NB was found to 763 have better quality than Opus at 6 kb/s. This indicates that Opus is 764 better than AMR-NB at higher rates and worse at lower rates, which is 765 to be expected given Opus' emphasis on higher quality and higher 766 rates. 768 3.7. Comparison to AMR-WB 770 In the Google wideband and fullband test (Section 2.1.2), Opus at 20 771 kb/s was shown to out-perform AMR-WB at the same rate. This was also 772 confirmed by the Nokia Interspeech 2011 listening test (Section 2.3), 773 with also found AMR-WB to out-perform Opus at 12 kb/s and below. As 774 with AMR-NB, we conclude that Opus is better than AMR-WB at higher 775 rates and worse at lower rates. 777 4. Security Considerations 779 No security considerations. 781 5. IANA Considerations 783 This document has no actions for IANA. 785 6. Acknowledgments 787 The authors would like to thank Anssi Ramo and the HydrogenAudio 788 community, who conducted some of the Opus listening test cited in 789 this draft. 791 Appendix A. Pre-Opus listening tests 793 Several listening tests have been performed on the SILK and CELT 794 codecs prior to them being merged as part of the Opus codec. 796 A.1. SILK Dynastat listening test 798 The original (pre-Opus) SILK codec was characterized in a Dynastat 799 listening test [SILK-Dynastat]. The test included 32 conditions with 800 4 male and 4 female talkers. The test signals were wideband speech 801 with and without office background noise at 15 dB SNR. Packet loss 802 was tested at 2, 5, and 10% loss rates. The bitrates ranged from 803 8.85 kb/s to 64 kb/s. The codecs included in the test were SILK-WB, 804 AMR-WB, Speex-WB and G.722 (which ran at 64 kb/s). 806 The results showed that for clean speech (1) SILK out-performs AMR-WB 807 at all bit-rates except 8.85 kb/s (which was a tie); (2) SILK out- 808 performs Speex at all bit-rates; and (3) SILK running at 18.25 kb/s 809 and above out-performs G.722 at 64 kbps. For noisy speech, tested at 810 18.25 kb/s, SILK is tied with AMR-WB, and out-performs Speex. For 2, 811 5 and 10% packet loss, tested at 18.25 kb/s, SILK out-performs both 812 AMR-WB and Speex in all conditions. 814 A.2. SILK Deutsche Telekom test 816 In 2010 Deutsche Telekom published results [Wustenhagen2010] of their 817 evaulation of super-wideband speech and audio codecs. The test 818 included the version of SILK submitted to the IETF. The results 819 showed that for clean speech (item "speechsample") SILK was tied with 820 AMR-WB and G.718, and out-performed Speex. For noisy speech (item 821 "arbeit") SILK out-performed AMR-WB and G.718 at 12 and 24 kb/s, and 822 Speex at all bitrates. At bitrates above 24 kb/s SILK and G.718 were 823 tied. 825 A.3. SILK Nokia test 827 In 2010, Anssi Ramo from Nokia presented [Ramo2010] the results of a 828 listening test focusing on open-source codecs at Interspeech 2010. 829 The methodology used was a 9-scale ACR MOS test with clean and noisy 830 speech samples. 832 It was noted in the test that: 834 "Especially at around 16 kbit/s or above Silk is better than AMR-WB 835 at comparable bitrates. This is due to the fact that Silk wideband 836 is critically sampled up to 8 kHz instead of ITU- T or 3GPP defined 7 837 kHz. This added bandwidth (from 7 to 8 kHz) shows up in the results 838 favourable to Silk. It seems that Silk provides quite artifact free 839 voice quality for the whole 16- 24 kbit/s range with WB signals. At 840 32 and 40 kbit/s Silk is SWB and competes quite equally against 841 G.718B or G.722.1C although having a slightly narrower bandwidth than 842 the ITU-T standardized codecs." 844 A.4. CELT 0.3.2 listening test 846 The first listening tests conducted on CELT version 0.3.2 in 2009 and 847 published in 2010 [valin2010] included AAC-LD (Apple), G.722.1C and 848 MP3 (Lame). Two MUSHRA tests were conducted: a 48 kb/s test and a 64 849 kb/s test, both at a 44.1 kHz sampling rate. CELT was used with 256- 850 sample frames (5.8 ms). All codecs used constant bit-rate (CBR). 851 The algorithmic delay was 8.7 ms for CELT, 34.8 ms for AAC-LD, 40 ms 852 for G.722.1C and more than 100 ms for MP3. 854 The 48 kb/s test included two clean speech samples (one male, one 855 female) from the EBU SQAM database, four clean speech files (two 856 male, two female) from the NTT multi-lingual speech database for 857 telephonometry, and two music samples. In this test, CELT out- 858 performed AAC-LD, G.722.1C and MP3. 860 The 64 kb/s test included two clean speech samples (one male, one 861 female) from the EBU SQAM database, and six music files. In this 862 test, AAC-LD out-performed CELT, but CELT out-performed both MP3 and 863 G.722.1C (running at its highest rate of 48 kb/s). 865 A.5. CELT 0.5.0 listening test 867 Another CELT listening test was conducted in 2009 on version 0.5.0 868 and presented at EUSIPCO 2009 [valin2009]. In that test, CELT was 869 compared to G.722.1C and to the Fraunhofer Ultra Low-Delay (ULD) 870 codec on 9 audio samples: 2 clean speech samples and 7 music samples. 871 At 64 kb/s with 5.3 ms frames, CELT clearly out-performed G.722.1C 872 running at 48 kb/s with 20 ms frames. Also, at 96 kb/s and equal 873 frame size (2.7 ms), CELT clearly out-performed the ULD codec. 875 Appendix B. Opus listening tests on non-final bit-stream 877 The following listening tests were conducted on the Opus codec on 878 versions prior to the bit-stream freeze. While Opus has evolved 879 since these tests were conducted, the results should be considered as 880 a _lower bound_ on the quality of the final codec. 882 B.1. First hybrid mode test 884 In July 2010, the Opus codec authors conducted a preliminary MUSHRA 885 listening test to evaluate the quality of the recently created 886 "hybrid" mode combining the SILK and CELT codecs. That test was 887 conducted at 32 kb/s and compared the following codecs: 889 o Opus hybrid mode (fullband) 891 o G.719 (fullband) 893 o CELT (fullband) 895 o SILK (wideband) 897 o BroadVoice32 (wideband) 899 The test material consisted of two English speech samples from the 900 EBU SQAM (one male, one female) database and six speech samples 901 (three male, three female) from the NTT multi-lingual speech database 902 for telephonometry. Although only eight listeners participated to 903 the test, the difference between the Opus hybrid mode and all other 904 codecs was large enough to obtain 95% confidence that the Opus hybrid 905 mode provided better quality than all other codecs tested. This test 906 is of interest because it shows that the hybrid clearly out-performs 907 the codecs that it combines (SILK and CELT). It also out-performs 908 G.719, which is the only fullband interactive codec standardized by 909 the ITU-T. These results were presented [Maastricht-78] at the 78th 910 IETF meeting Maastricht. 912 B.2. Broadcom stereo music test 914 In December 2010, Broadcom conducted an ITU-R BS.1116-style 915 subjective listening test comparing different configurations of the 916 CELT-only mode of the IETF Opus codec along with MP3 and AAC-LC. The 917 test included stereo 10 audio samples sampled at 44.1 kHz and 918 distributed as follows: 920 o 2 pure speech 921 o 2 vocal 923 o 2 solo instruments 925 o 1 rock-and-roll 927 o 1 pop 929 o 1 classical orchestra 931 o 1 jazz 933 A total of 17 listeners participated to the test. The results of the 934 test are a available on the testing slides presented at the Prague 935 meeting [Prague-80]. Although at the time, Opus was not properly 936 optimised for 44.1 kHz audio, the quality of the Opus codec at 96 937 kb/s with 22 ms frame was significantly better than MP3 and only 938 slightly worse than AAC-LC. Even in ultra low-delay mode (5.4 ms), 939 Opus still outperformed MP3. The test also confirmed the usefulness 940 of the prefilter/postfilter contribution by Raymond Chen, showing 941 that this contribution significantly improves quality for small 942 frames (long frames were not tested with the prefilter/postfilter 943 disabled). 945 Appendix C. In-the-field testing 947 Various versions of Opus (or SILK/CELT components) are currently in 948 use in production in the following applications: 950 o Skype: VoIP client used by hundreds of millions of people 952 o Steam: Gaming distribution and communications platform with over 953 30 million users 955 o Mumble: Gaming VoIP client with more than 200 thousand users 957 o Soundjack: Client for live network music performances 959 o Freeswitch: Open-source telephony platform 961 o Ekiga: Open-source VoIP client 963 o CHNC: Radio station using CELT for its studio-transmitter link 965 7. Informative References 967 [valin2010] 968 Valin, J., Terriberry, T., Montgomery, C., and G. Maxwell, 969 "A High-Quality Speech and Audio Codec With Less Than 10 970 ms delay", 2010. 972 [valin2009] 973 Valin, J., Terriberry, T., and G. Maxwell, "A High-Quality 974 Speech and Audio Codec With Less Than 10 ms delay", 2010. 976 [Wustenhagen2010] 977 Wuestenhagen, U., Feiten, B., Kroll, J., Raake, A., and M. 978 Waeltermann, "Evaluation of Super-Wideband Speech and 979 Audio Codecs", 2010. 981 [Ramo2010] 982 Ramo, A. and H. Toukomaa, "Voice Quality Evaluation of 983 Recent Open Source Codecs", 2010. 985 [Ramo2011] 986 Ramo, A. and H. Toukomaa, "Voice Quality Characterization 987 of IETF Opus Codec", 2011. 989 [Maastricht-78] 990 Valin, J. and K. Vos, "Codec Prototype", 2010. 992 [Prague-80] 993 Chen, R., Terriberry, T., Maxwell, G., Skoglund, J., and 994 H. Nguyet, "Testing results", 2011. 996 [SILK-Dynastat] 997 Skype, "SILK Datasheet", 2009. 999 [ha-test] Dyakonov, "Results of the public multiformat listening 1000 test @ 64 kbps", 2011. 1002 [Skoglund2011] 1003 Skoglund, "Listening tests of Opus at Google", 1004 September 2011. 1006 [Hoene2011] 1007 Hoene and Hyder, "MUSHRA Listening Tests - Focusing on 1008 Stereo Voice Coding", August 2011. 1010 Authors' Addresses 1012 Christian Hoene (editor) 1013 Symonics GmbH 1014 Sand 13 1015 Tuebingen, 72076 1016 Germany 1018 Email: christian.hoene@symonics.com 1020 Jean-Marc Valin 1021 Mozilla Corporation 1022 650 Castro Street 1023 Mountain View, CA 94041 1024 USA 1026 Phone: +1 650 903-0800 1027 Email: jmvalin@jmvalin.ca 1029 Koen Vos 1030 Skype Technologies S.A. 1031 Stadsgarden 6 1032 Stockholm, 11645 1033 Sweden 1035 Email: koen.vos@skype.net 1037 Jan Skoglund 1038 Google 1040 Email: jks@google.com