Network Working Group                                          JM. Valin
Internet-Draft                                              Octasic Inc.
Intended status: Standards Track                              S. Borilin
Expires: April 29, 2010                                       SPIRIT DSP
                                                                  K. Vos
                                                                   Skype
                                                           C. Montgomery
                                                     Xiph.Org Foundation
                                                                 R. Chen
                                                    Broadcom Corporation
                                                        October 26, 2009


                           Codec Requirements
                   draft-valin-codec-requirements-02

Status of this Memo

   This Internet-Draft is submitted to IETF in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups.  Note that
   other groups may also distribute working documents as Internet-
   Drafts.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt.

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html.

   This Internet-Draft will expire on April 29, 2010.

Copyright Notice

   Copyright (c) 2009 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents in effect on the date of
   publication of this document (http://trustee.ietf.org/license-info).
   Please review these documents carefully, as they describe your rights
   and restrictions with respect to this document.

Abstract

   This document provides specific requirements for Internet audio
   codecs.  These requirements address quality, sampling rate, bit-rate,
   and packet loss robustness, as well as other desirable properties.

Table of Contents

   1.  Introduction
   2.  Applications
     2.1.  Point to point calls
     2.2.  Conferencing
     2.3.  Telepresence
     2.4.  Teleoperation
     2.5.  In-game voice chat
     2.6.  Live distributed music performances / Internet music
           lessons
     2.7.  Other applications
   3.  Constraints Imposed by the Internet on the Codec
     3.1.  Security
   4.  Detailed Basic Requirements
     4.1.  Operating space
     4.2.  Quality and bit-rate
     4.3.  Packet loss robustness
     4.4.  Computational resources
   5.  Additional considerations
     5.1.  Low-complexity audio mixing
     5.2.  Encoder side potential for improvement
     5.3.  Layered bit-stream
     5.4.  Partial redundancy
     5.5.  Bit error robustness
     5.6.  Partial redundancy
     5.7.  Time stretching and shortening
     5.8.  Legacy compatibility
   6.  Security Considerations
   7.  IANA Considerations
   8.  Acknowledgments
   9.  Informative References
   Authors' Addresses

1.  Introduction

   This document provides requirements for audio codecs designed
   specifically for use over the Internet.
   The requirements attempt to address the needs of the most common
   interactive audio transmission applications on the Internet and to
   ensure good quality when operating in conditions that are typical
   for the Internet.  These requirements address quality, sampling
   rate, delay, bit-rate, and packet loss robustness.  Other desirable
   codec properties are considered as well.

   Throughout this document, we will use the following conventions when
   referring to the sampling rate of a signal:

      Narrowband: 8 kHz sampling rate

      Wideband: 16 kHz sampling rate

      Super-wideband: 32 kHz sampling rate

      Full-band: 44.1/48 kHz sampling rate and above

   Codec bit-rates in bits per second (b/s) will be considered without
   counting any overhead (IP/UDP/RTP headers, padding, ...).  The codec
   delay is the total algorithmic delay, that is, the codec frame size
   plus the "look-ahead".  It is thus the minimum theoretically
   achievable end-to-end delay of a transmission system that uses the
   codec.

2.  Applications

   The following applications should be considered for Internet audio
   codecs, along with their requirements:

   o  Point to point calls

   o  Conferencing

   o  Telepresence

   o  Teleoperation

   o  In-game voice chat

   o  Live distributed music performances / Internet music lessons

   o  Other applications

2.1.  Point to point calls

   Point to point calls are voice over IP (VoIP) calls between two
   "standard" (fixed or mobile) phones, implemented in hardware or
   software.  For these applications, a wideband codec is required,
   along with narrowband support for compatibility with legacy telephony
   equipment (PSTN).  The range of useful bit-rates is expected to be
   12 - 32 kb/s for wideband speech and 8 - 16 kb/s for narrowband
   speech.  The codec delay must be less than 40 ms, and a delay of no
   more than 25 ms is desirable.
   Support for encoding music is not required, but it is desirable for
   the codecs not to make background (on-hold) music excessively
   unpleasant to hear.  Also, the codec should be robust to noise
   (produce intelligible speech and no annoying artifacts) even at
   lower bit-rates.

2.2.  Conferencing

   Conferencing applications (which support multi-party calls) have
   additional requirements on top of the requirements for point-to-point
   calls.  Conferencing systems often have higher-fidelity audio
   equipment and have greater network bandwidth available -- especially
   when video transmission is involved.  For that reason, support for
   super-wideband audio becomes important, with useful bit-rates in the
   32 - 64 kb/s range.  The ability to vary the bit-rate according to
   the "difficulty" of the audio signal (VBR) is a desirable feature for
   codecs.  This not only saves bandwidth "on average", but it can also
   help conference servers make more efficient use of the available
   bandwidth by using more bandwidth for important audio streams and
   less bandwidth for less important ones (e.g. background noise).

   Conferencing end-points often operate in hands-free conditions, which
   creates acoustic echo problems.  Lower delay is therefore important,
   as it reduces the quality degradation due to any residual echo after
   acoustic echo cancellation (AEC).  Accordingly, the codec delay must
   be less than 30 ms for this application.  An optional low-delay mode
   with less than 10 ms delay is desirable, but not required.

   Most conferencing systems operate with a bridge that mixes some (or
   all) of the audio streams and sends them back to all the
   participants.  In that case, it is important that the codec not
   produce annoying artifacts when two voices are present at the same
   time.  Also, this mixing operation should be as easy as possible to
   perform.
   To make it easier to determine which streams have to be mixed (and
   which are noise/silence), it must be possible to measure (or
   estimate) the voice activity in a packet without having to fully
   decode the packet (saving most of the complexity when the packet need
   not be decoded).  The ability to reduce the computational complexity
   of mixing is also desirable, but not required.  For example, a
   transform codec may make it possible to mix the streams in the
   transform domain, without having to go back to the time domain.
   Low-complexity up-sampling and down-sampling within the codec is also
   a desirable feature when mixing streams with different sampling
   rates.

2.3.  Telepresence

   Most telepresence applications can be considered to be essentially
   very high-quality video-conferencing environments, so all of the
   conferencing requirements also apply to telepresence.  In addition,
   telepresence applications require super-wideband and full-band audio
   capability with useful bit-rates in the 32 - 80 kb/s range.  While
   voice is still the most important signal to be encoded, it must be
   possible to obtain good quality (even if not transparent) music.

   Most telepresence applications require more than one audio channel,
   so support for stereo and multi-channel is important.  While this can
   always be accomplished by encoding multiple single-channel streams,
   it is preferable to take advantage of the redundancy that exists
   between channels.

2.4.  Teleoperation

   Teleoperation applications are similar to telepresence, with the
   exception that they involve remote physical interactions.  For
   example, the user may be controlling a robot while receiving real-
   time audio feedback from that robot.  For these applications, the
   codec delay has to be less than 10 ms.  The other requirements of
   telepresence (quality, bit-rate, multi-channel) apply to
   teleoperation as well.
   The only exception is that mixing is not an important issue for
   teleoperation.

2.5.  In-game voice chat

   An increasing number of computer/console games make use of VoIP to
   allow players to communicate in real-time.  The requirements for
   gaming are similar to those of conferencing, with the main difference
   being that narrowband compatibility is not necessary.  While for most
   applications a codec delay up to 30 ms is acceptable, a low-delay
   (< 10 ms) option is highly desirable, especially for games with rapid
   interactions.  The ability to use VBR (with a maximum allowed
   bit-rate) is also highly desirable because it can significantly
   reduce the bandwidth requirement for a game server.

2.6.  Live distributed music performances / Internet music lessons

   Live music over the Internet requires extremely low end-to-end delay
   and is one of the most demanding applications for interactive audio
   transmission.  It has been observed that for most scenarios, total
   end-to-end delays up to 25 ms could be tolerated by musicians, with
   the absolute limit (where none of the scenarios are possible) being
   around 50 ms [carot09].  In order to achieve this low delay on the
   Internet -- either in the same city or a nearby city -- the network
   propagation time must be taken into account.  When also subtracting
   the delay of the audio buffer, jitter buffer, and acoustic path, that
   leaves around 2 ms to 10 ms for the total delay of the codec.
   Considering the speed of light in fiber, every 1 ms reduction in the
   codec delay increases the range over which synchronization is
   possible by approximately 200 km.

   Acoustic echo is expected to be an even more important issue for
   network music than it is in conferencing, especially considering that
   the music quality requirements essentially forbid the use of a
   "nonlinear processor" (NLP) with the AEC.
   This is another reason why very low delay is essential.

   Considering that the application is music, the full audio bandwidth
   (44.1 or 48 kHz sampling rate) must be transmitted with a bit-rate
   that is sufficient to provide near-transparent to transparent
   quality.  With the current audio coding technology, this corresponds
   to approximately 64 kb/s to 128 kb/s per channel.  As for
   telepresence, support for two or more channels is often desired, so
   it would be useful for a codec to be able to take advantage of the
   redundancy that is often present between audio channels.

2.7.  Other applications

   The above list is by no means a complete list of all applications
   involving interactive audio transmission on the Internet.  However,
   it is believed that meeting the needs of all these different
   applications should be sufficient to ensure that the needs of most
   applications not listed will also be met.

3.  Constraints Imposed by the Internet on the Codec

   Packet losses are inevitable on the Internet and dealing with those
   is one of the most fundamental requirements for an Internet audio
   codec.  While any audio codec can be combined with a good packet loss
   concealment (PLC) algorithm, the important aspect is what happens on
   the first packets received _after_ the loss.  More specifically, this
   means that:

   o  it should be possible to interpret the contents of any received
      packet, irrespective of previous losses, as specified in BCP 36
      [PAYLOADS]; and

   o  the decoder should re-synchronize as quickly as possible (i.e. the
      output should quickly converge to the output that would have been
      obtained if no loss had occurred).
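
   The re-synchronization requirement can be illustrated with a toy
   model.  The sketch below is not taken from any real codec: it assumes
   a hypothetical first-order predictive decoder, y[n] = leak * y[n-1] +
   r[n], conceals a lost frame by treating its residual as zero, and
   counts how many frames the output needs to converge back to the
   loss-free output.  The names decode, frames_to_resync, leak, and tol
   are illustrative assumptions.

```python
def decode(residuals, leak, lost=frozenset()):
    """Toy predictive decoder (assumption, not a real codec):
    y[n] = leak * y[n-1] + r[n].  A lost frame is concealed by
    using a zero residual, so only the prediction is played out."""
    state = 0.0
    output = []
    for n, r in enumerate(residuals):
        if n in lost:
            r = 0.0  # trivial concealment: the residual is unknown
        state = leak * state + r
        output.append(state)
    return output


def frames_to_resync(residuals, leak, lost_index, tol=1e-3):
    """Number of frames after a single loss until the damaged output
    is within `tol` of the loss-free output (None if it never is)."""
    reference = decode(residuals, leak)
    damaged = decode(residuals, leak, {lost_index})
    for n in range(lost_index + 1, len(reference)):
        if abs(reference[n] - damaged[n]) < tol:
            return n - lost_index
    return None


if __name__ == "__main__":
    frames = [1.0] * 100
    for leak in (0.0, 0.5, 0.9):
        print(leak, frames_to_resync(frames, leak, lost_index=10))
```

   In this model the mismatch caused by one lost frame decays as
   leak**m, so a decoder that relies less on past frames (smaller leak)
   converges in fewer frames, at the cost of a weaker prediction.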

   The constraint of being able to decode any packet implies the
   following considerations for an audio codec:

   o  The size of a compressed frame must be kept smaller than the MTU
      to avoid fragmentation;

   o  The interpretation of any parameter encoded in the bit-stream must
      not depend on information contained in other packets.  For
      example, it is not acceptable for a codec to allow signaling a
      mode change in one packet and assume that subsequent frames will
      be decoded according to that mode.

   Although the interpretation of parameters cannot depend on other
   packets, it is still reasonable to use some amount of prediction
   across frames, provided that the predictors can resynchronize quickly
   in case of a lost packet.  In this case, it is important to find the
   best compromise between the gain in coding efficiency and the loss in
   packet loss robustness due to the use of inter-frame prediction.  It
   is a desirable property for the codecs to allow some real-time
   control of that trade-off so that they can take advantage of more
   prediction when the loss rate is small, while being more robust to
   losses when the loss rate is high.

   To improve the robustness to packet loss, it would be desirable for
   the codec to allow an adaptive (data- and network-dependent) amount
   of side information to help improve audio quality when losses occur.
   For example, this side information may include the retransmission of
   certain parameters encoded in the previous frame(s).

   Another important property of the Internet is that it is mostly a
   best-effort network, with no guaranteed bandwidth.  This means that
   the codecs have to be able to vary their output bit-rate dynamically
   (in real-time), without requiring an out-of-band signaling mechanism,
   and without causing audible artifacts at the bit-rate change
   boundaries.
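
   As a rough illustration of the MTU constraint, the sketch below
   converts a codec bit-rate and frame duration into a payload size in
   whole bytes and checks whether the resulting packet fits in one MTU.
   The 1500-byte MTU (typical for Ethernet) and the 40 bytes of
   IPv4/UDP/RTP headers without options or extensions are assumptions
   made for the example, as are the function names.

```python
# Assumption: IPv4 (20) + UDP (8) + RTP (12) fixed headers, no options.
IPV4_UDP_RTP_HEADER_BYTES = 20 + 8 + 12
# Assumption: typical Ethernet MTU; real paths may have a smaller one.
ASSUMED_MTU_BYTES = 1500


def frame_payload_bytes(bitrate_bps, frame_ms):
    """Whole bytes needed to carry one frame at the given codec
    bit-rate (integer arithmetic, rounded up to the next byte)."""
    bits_times_1000 = bitrate_bps * frame_ms  # bits * 1000
    return (bits_times_1000 + 7999) // 8000


def fits_in_mtu(bitrate_bps, frame_ms, mtu=ASSUMED_MTU_BYTES):
    """True if one frame plus the assumed headers fits in one MTU."""
    packet = frame_payload_bytes(bitrate_bps, frame_ms)
    return packet + IPV4_UDP_RTP_HEADER_BYTES <= mtu


if __name__ == "__main__":
    for bitrate, frame_ms in ((8000, 20), (32000, 20), (128000, 20)):
        size = frame_payload_bytes(bitrate, frame_ms)
        print(bitrate, frame_ms, size, fits_in_mtu(bitrate, frame_ms))
```

   At the bit-rates and frame sizes discussed in this document the
   payload stays far below the assumed MTU; the constraint matters
   mainly when many frames or channels are packed into one packet.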
   Additional desirable features are:

   o  Having the possibility to use smooth bit-rate changes with one
      byte/frame resolution;

   o  Making it possible for a codec to adapt its bit-rate based on the
      source signal being encoded (source-controlled VBR) to maximize
      the quality for a certain _average_ bit-rate.

   Because the Internet transmits data in bytes, codecs should produce
   compressed data in integer numbers of bytes.  In general, the codec
   design should take into consideration explicit congestion
   notification (ECN) and may include features that would improve the
   quality of an ECN implementation.

   The IETF has defined a set of application-layer protocols to be used
   for the real-time transport of multimedia data, including voice.  It
   is thus important for the resulting codecs to be easy to use with
   these protocols.  For example, it must be possible to create an [RTP]
   payload format that conforms to BCP 36 [PAYLOADS].  If any codec
   parameters need to be negotiated between end-points, the negotiation
   should be as easy as possible to carry over SIP/SDP or alternatively
   over XMPP/Jingle.

3.1.  Security

   Just like for any protocol to be used over the Internet, security is
   a very important aspect to consider.  This goes beyond the obvious
   considerations of preventing buffer overflows and similar attacks
   that can lead to denial-of-service (DoS) or remote code execution.
   One very important security aspect is to make sure that the decoders
   have a bounded and reasonable worst-case complexity.  This prevents
   an attacker from causing a DoS by sending packets that are specially
   crafted to take a very long (or infinite) time to decode.

   A more subtle aspect is the information leak that can occur when the
   codec is used over an encrypted channel (e.g. [SRTP]).
   For example, it was suggested [wright08] that the use of
   source-controlled VBR may reveal some information about a
   conversation through the size of the compressed packets.  This would
   have to be investigated when standardizing a codec.

4.  Detailed Basic Requirements

   This section summarizes all the constraints imposed by the target
   applications and by the Internet into a set of actual requirements
   for codec development.

4.1.  Operating space

   The operating space for the target applications can be divided in
   terms of delay: most applications require a "medium delay" (20-30
   ms), while a few require a "very low delay" (< 10 ms).  It makes
   sense to divide the space based on delay because lowering the delay
   has a cost in terms of quality vs bit-rate.

   For medium delay, the resulting codecs must be able to efficiently
   operate within the following range of bit-rates (per channel):

   o  Narrowband: 8 to 16 kb/s

   o  Wideband: 12 to 32 kb/s

   o  Super-wideband: 24 to 64 kb/s

   o  Full-band: 32 to 80 kb/s

   Obviously, a lower-delay codec that can operate in the above range is
   also acceptable.

   For very low delay, the resulting codecs will need to operate within
   the following range of bit-rates (per channel):

   o  Super-wideband: 32 to 80 kb/s

   o  Full-band: 48 to 128 kb/s

   o  (Narrowband and wideband not required)

4.2.  Quality and bit-rate

   The quality of a codec is directly linked to the bit-rate, so these
   two must be considered jointly.  When comparing the bit-rates of
   codecs, the overhead of IP/UDP/RTP headers should not be considered,
   but any additional bits required in the RTP payload format after the
   header (e.g. required signaling) should be considered.
   In terms of quality vs bit-rate, the codecs to be developed must be
   better than the currently available codecs that satisfy the IPR
   requirements in the guidelines document, which are:

   o  For narrowband: Speex (NB), GSM-FR, and iLBC(*)

   o  For wideband: Speex (WB), G.722, and G.722.1(*)

   o  For super-wideband: Speex (UWB) and G.722.1C(*)

   The codecs marked with (*) do not meet all the licensing guidelines,
   but the codecs to be developed should still not perform significantly
   worse than them.  Quality should be measured for multiple languages,
   including tonal languages.  The case of multiple simultaneous voices
   (as sometimes happens in conferencing) should be evaluated as well.

   The comparison with the above codecs assumes that the codecs being
   compared have similar delay characteristics.  The bit-rate required
   for a certain level of quality may be higher than that of the
   referenced codecs in cases where a much lower delay is required.  In
   that case, the increase in bit-rate must be less than the ratio
   between the delays.

   It is desirable for the codecs to support source-controlled variable
   bit-rate (VBR) to take advantage of the fact that different inputs
   require a different bit-rate to achieve the same quality.  However,
   it should still be possible to use the codecs at a truly constant
   bit-rate to ensure that no information leak is possible when using an
   encrypted channel.

4.3.  Packet loss robustness

   Robustness to packet loss is a very important aspect of any codec to
   be used on the Internet.  Codecs must maintain acceptable quality at
   loss rates up to 5% and maintain good intelligibility up to 15% loss
   rate.  At any sampling rate, bit-rate, and packet loss rate, the
   quality must be no less than the quality obtained with the Speex
   codec or the GSM-FR codec in the same conditions.
   The actual packet loss "patterns" to be used in testing must be
   obtained from real packet loss traces collected on the Internet,
   rather than from loss models.  These traces should be representative
   of the typical environments in which the applications of Section 2
   operate.  For example, traces related to VoIP calls should consider
   the loss patterns observed for typical home broadband and corporate
   connections.

4.4.  Computational resources

   The resulting codecs should be implementable on a wide range of
   devices, so there should be a fixed-point implementation or at least
   assurance that a reasonable fixed-point implementation is possible.
   The computational resource figures listed below are meant to be upper
   bounds.  Even below these bounds, resources should still be
   minimized.  Any proposed increase in computational resource
   consumption (e.g. to increase quality) should be carefully evaluated
   even if the resulting resource consumption is below the upper bound.
   Having variable complexity would be useful (but not required) in
   achieving that goal as it would allow trading quality/bit-rate for
   lower complexity.

   The computational requirements for real-time encoding and decoding
   are:

   o  Narrowband should require little CPU resources and be
      implementable on most DSPs with a 16x16 multiplier (e.g. < 40
      MIPS).

   o  Wideband can have a bit more complexity than narrowband, but
      should still be implementable on a cheap DSP (e.g. < 80 MIPS).

   o  Super-wideband/full-band may require higher complexity, but should
      be implementable on a higher-end DSP (e.g. < 200 MIPS), and if
      possible on cheaper DSPs as well.

   The MIPS values are approximate clock frequencies required for real-
   time encoding+decoding on a DSP capable of single-cycle MAC
   operations (16x16 multiplication accumulated into 32 bits).  Similar
   computational requirements apply to floating-point processors.
   For example, narrowband encoding and decoding should be possible
   using 40 MHz on a modern x86 CPU (2% of a 2 GHz CPU).  For
   applications that require mixing (e.g. conferencing), it must be
   possible to estimate the energy of the decoded signal with less than
   10% of the complexity figures listed above.

   In terms of memory use, the codec context/state size required should
   be no more than 2*R*C bytes in floating-point, where R is the
   sampling rate and C is the number of channels.  For fixed-point, that
   size should be less than R*C bytes.  The scratch space required
   should also be less than 2*R*C bytes for floating-point or less than
   R*C bytes for fixed-point.  The combined codec size and data ROM
   should be small enough not to cause significant implementation
   problems.  Code size is more difficult to evaluate since it is highly
   dependent on the architecture, but when implemented on an x86 CPU,
   the codec should require no more than 100 kB for instructions and
   constant data.

   The intent is to maximize the range of devices on which a codec can
   be implemented.  For this reason, the reference implementation must
   not depend on "special hardware features" being present in order to
   meet the complexity requirement.  However, it might be desirable to
   take advantage of such hardware (e.g. hardware accelerators for
   operations like FFTs and convolutions).  A codec should also minimize
   the use of saturating arithmetic so as to be implementable on
   architectures that do not provide hardware saturation (e.g. ARMv4).

5.  Additional considerations

   There are additional features or characteristics that may be
   desirable under some circumstances, but should not be part of the
   strict requirements.  The benefit of meeting these considerations
   should be weighed against the associated cost.

5.1.  Low-complexity audio mixing

   In many applications that require a mixing server (e.g.
   conferencing, games), it is important to minimize the computational
   cost of the mixing.  As much as possible, it should be possible to
   perform the mixing with fewer computations than it would take to
   decode all the streams, mix them, and re-encode the result.
   Properties that reduce the complexity of the mixing process include:

   o  the ability to derive sufficient parameters, such as loudness
      and/or spectral envelope, for estimating voice activity of a
      compressed frame without fully decoding that frame;

   o  the ability to mix the streams in an intermediate representation
      (e.g. transform domain), rather than having to fully decode the
      signals before the mixing;

   o  the use of bit-stream layers (Section 5.3) by aggregating a small
      number of active streams at lower quality.

   For conferencing applications, the total complexity of the decoding,
   VAD and mixing should be considered when evaluating proposals.

5.2.  Encoder side potential for improvement

   In many codecs, it is possible to improve the quality by improving
   the encoder without breaking compatibility (i.e. without changing the
   decoder).  Potential for improvement varies from one codec to
   another.  It is generally low for PCM or ADPCM codecs and higher for
   perceptual transform codecs.  All things being equal, being able to
   improve a codec after the bit-stream format is frozen is a desirable
   property.  However, this should not be done at the expense of quality
   in the reference encoder.

5.3.  Layered bit-stream

   A layered codec makes it possible to transmit only a certain subset
   of the bits and still obtain a valid bit-stream with a quality that
   is equivalent to the quality that would be obtained from encoding at
   the corresponding rate.
   While this is not a necessary feature for most applications, it can
   be desirable for cases where a "mixing server" needs to handle a
   large number of streams with limited computational resources.

5.4.  Partial redundancy

   One possible way of increasing robustness to packet loss is to
   include partial redundancy within packets.  This can be achieved
   either by including the base layer of the previous frame (for a
   layered codec) or by transmitting other parameters from the previous
   frame(s) to assist the PLC algorithm in case of loss.  The ability to
   include partial redundancy for high-loss scenarios is desirable,
   provided that the feature can be dynamically turned on or off (so
   that no bandwidth is wasted in case of loss-free transmission).

5.5.  Bit error robustness

   The vast majority of Internet-based applications do not need to be
   robust to bit errors because packets either arrive unaltered, or do
   not arrive at all.  Considering that, the emphasis should be on
   packet loss robustness and packet loss concealment.  That being said,
   it is often the case that extra robustness to bit errors can be
   achieved at no cost at all (i.e. no increase in size, complexity or
   bit-rate, no decrease in quality or packet loss robustness, ...).  In
   those cases, it is useful to make a change that increases the
   robustness to bit errors.  This can be useful for applications that
   use UDP Lite transmission (e.g. over a wireless LAN).  Robustness to
   packet loss should *never* be sacrificed to achieve higher bit error
   robustness.

5.6.  Partial redundancy

   One possible way of increasing robustness to packet loss is to
   include partial redundancy within packets.  This can be achieved
   either by including the base layer of the previous frame (for a
   layered codec) or by transmitting other parameters from the previous
   frame(s) to assist the PLC algorithm in case of loss.
   The ability to include partial redundancy for high-loss scenarios is
   desirable, provided that the feature can be dynamically turned on or
   off (so that no bandwidth is wasted in case of loss-free
   transmission).

5.7.  Time stretching and shortening

   When adaptive jitter buffers are used, it is often necessary to
   stretch or shorten the audio signal to allow changes in buffering.
   While this operation can be performed directly on the decoder's
   output, it is often more computationally efficient to stretch or
   shorten the signal directly within the decoder.  It is desirable for
   the reference implementation to provide a time stretching/shortening
   implementation, although it should not be normative.

5.8.  Legacy compatibility

   In order to create the best possible codec for the Internet, there is
   no requirement for compatibility with legacy Internet codecs.

6.  Security Considerations

   The codec requirements themselves do not have security
   considerations.  However, codec security issues are discussed in
   Section 3.1.

7.  IANA Considerations

   This document has no actions for IANA.

8.  Acknowledgments

   We would like to thank all the other people who contributed directly
   or indirectly to this document, including Jason Fischl, Gregory
   Maxwell, Alan Duric, Jonathan Christensen, Julian Spittka, and Henry
   Sinnreich.  We would also like to thank Cullen Jennings and Gregory
   Lebovitz for their advice.

9.  Informative References

   [carot09]  Carot, A., Werner, C., and T. Fischinger, "Towards a
              Comprehensive Cognitive Analysis of Delay-Influenced
              Rhythmical Interaction", 2009.

   [PAYLOADS]
              Handley, M. and C. Perkins, "Guidelines for Writers of RTP
              Payload Format Specifications", BCP 36, RFC 2736,
              December 1999.

   [RTP]      Schulzrinne, H., Casner, S., Frederick, R., and V.
              Jacobson, "RTP: A Transport Protocol for Real-Time
              Applications", STD 64, RFC 3550, July 2003.

   [SRTP]     Baugher, M., McGrew, D., Naslund, M., Carrara, E., and K.
              Norrman, "The Secure Real-time Transport Protocol (SRTP)",
              RFC 3711, March 2004.

   [wright08]
              Wright, C., Ballard, L., Coull, S., Monrose, F., and G.
              Masson, "Spot me if you can: Uncovering spoken phrases in
              encrypted VoIP conversations", 2008.

Authors' Addresses

   Jean-Marc Valin
   Octasic Inc.
   4101, Molson Street
   Montreal, Quebec
   Canada

   Email: jean-marc.valin@octasic.com


   Slava Borilin
   SPIRIT DSP

   Email: borilin@spiritdsp.net


   Koen Vos
   Skype

   Email: koen.vos@skype.net


   Christopher Montgomery
   Xiph.Org Foundation

   Email: xiphmont@xiph.org


   Raymond (Juin-Hwey) Chen
   Broadcom Corporation

   Email: rchen@broadcom.com