idnits 2.17.1

draft-spittka-payload-rtp-opus-00.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

  ** You're using the IETF Trust Provisions' Section 6.b License Notice from
     12 Sep 2009 rather than the newer Notice from 28 Dec 2009. (See
     https://trustee.ietf.org/license-info/)

  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

     No issues found here.

  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == The copyright year in the IETF Trust and authors Copyright Line does not
     match the current year

  -- The document date (July 4, 2011) is 4679 days in the past. Is this
     intentional?

  Checking references for intended status: Informational
  ----------------------------------------------------------------------------

  == Missing Reference: 'Opus' is mentioned on line 837, but not defined

  == Missing Reference: 'SILK' is mentioned on line 835, but not defined

  == Missing Reference: 'CELT' is mentioned on line 836, but not defined

  ** Obsolete normative reference: RFC 2326 (Obsoleted by RFC 7826)

  ** Obsolete normative reference: RFC 4288 (Obsoleted by RFC 6838)

  ** Obsolete normative reference: RFC 4566 (Obsoleted by RFC 8866)

     Summary: 4 errors (**), 0 flaws (~~), 4 warnings (==), 1 comment (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.

--------------------------------------------------------------------------------

2    Network Working Group                                         J. Spittka
3    Internet-Draft                                                    K. Vos
4    Intended status: Informational                   Skype Technologies S.A.
5    Expires: January 5, 2012                                       JM. Valin
6                                                                Octasic Inc.
7                                                                July 4, 2011

9      RTP Payload Format and File Storage Format for Opus Speech and Audio
10                                     Codec
11                       draft-spittka-payload-rtp-opus-00

13   Abstract

15      This document defines the Real-time Transport Protocol (RTP) payload
16      format and file storage format for packetization of Opus encoded
17      speech and audio data that is essential to integrate the codec in the
18      most compatible way. Further, media type registrations are described
19      for the RTP payload format and the file storage format.

21   Status of this Memo

23      This Internet-Draft is submitted to IETF in full conformance with the
24      provisions of BCP 78 and BCP 79.

26      Internet-Drafts are working documents of the Internet Engineering
27      Task Force (IETF), its areas, and its working groups. Note that
28      other groups may also distribute working documents as
29      Internet-Drafts.

31      Internet-Drafts are draft documents valid for a maximum of six months
32      and may be updated, replaced, or obsoleted by other documents at any
33      time. It is inappropriate to use Internet-Drafts as reference
34      material or to cite them other than as "work in progress."

36      The list of current Internet-Drafts can be accessed at
37      http://www.ietf.org/ietf/1id-abstracts.txt.

39      The list of Internet-Draft Shadow Directories can be accessed at
40      http://www.ietf.org/shadow.html.

42      This Internet-Draft will expire on January 5, 2012.

44   Copyright Notice

46      Copyright (c) 2011 IETF Trust and the persons identified as the
47      document authors. All rights reserved.

49      This document is subject to BCP 78 and the IETF Trust's Legal
50      Provisions Relating to IETF Documents
51      (http://trustee.ietf.org/license-info) in effect on the date of
52      publication of this document. Please review these documents
53      carefully, as they describe your rights and restrictions with respect
54      to this document.
        Code Components extracted from this document must
55      include Simplified BSD License text as described in Section 4.e of
56      the Trust Legal Provisions and are provided without warranty as
57      described in the BSD License.

59   Table of Contents

61      1.  Introduction
62      2.  Conventions, Definitions and Acronyms used in this document
63      3.  Opus Codec
64        3.1.  Modes
65          3.1.1.  Voice Mode
66          3.1.2.  Audio Mode
67        3.2.  Network Bandwidth
68          3.2.1.  Variable versus Constant Bit Rate
69          3.2.2.  Discontinuous Transmission (DTX)
70        3.3.  Complexity
71        3.4.  Forward Error Correction (FEC)
72        3.5.  Stereo Operation
73      4.  Opus RTP Payload Format
74        4.1.  RTP Header Usage
75        4.2.  Payload Structure
76      5.  Opus Storage Format
77        5.1.  Storage Header Structure
78        5.2.  Storage Block Structure
79      6.  Congestion Control
80      7.  IANA Considerations
81        7.1.  Opus Media Type Registration
82        7.2.  Mapping to SDP Parameters
83          7.2.1.  Offer-Answer Model Considerations for Opus
84          7.2.2.  Declarative SDP Considerations for Opus
85      8.  Security Considerations
86      9.  Acknowledgements
87      10. Normative References
88      A.  Informational References
89      Authors' Addresses

91   1.  Introduction

93      The Opus codec is a speech and audio codec developed within the IETF
94      Internet Wideband Audio Codec working group [codec]. The codec has a
95      very low algorithmic delay and is highly scalable in terms of
96      audio bandwidth, network bit rate, and complexity. Further, it
97      provides different modes to efficiently encode speech signals as well
98      as music signals, thus making it the codec of choice for various
99      applications using the Internet or similar networks.

101     This document defines the Real-time Transport Protocol (RTP)
102     [RFC3550] payload format and file storage format for packetization of
103     Opus encoded speech and audio data that is essential to integrate the
104     Opus codec in the most compatible way. Further, media type
105     registrations are described for the RTP payload format and the file
106     storage format. More information on the Opus codec can be obtained
107     from the IETF draft [Opus].

109  2.  Conventions, Definitions and Acronyms used in this document

111     The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
112     "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
113     document are to be interpreted as described in [RFC2119].

115     CPU: Central Processing Unit
116     IP: Internet Protocol
117     PSTN: Public Switched Telephone Network
118     samples: Speech or audio samples
119     SDP: Session Description Protocol

121  3.  Opus Codec

123     The Opus speech and audio codec has been developed to encode speech
124     signals as well as audio signals.
        Two different modes, a voice mode
125     or an audio mode, may be chosen to allow the most efficient coding
126     depending on the type of input signal, the sampling frequency of the
127     input signal, and the specific application.

129     The voice mode efficiently encodes voice signals at lower
130     bit rates, while the audio mode is optimized for audio signals at
131     medium and higher bit rates.

133     The Opus speech and audio codec is highly scalable in terms of audio
134     bandwidth, network bit rate, and complexity. Further, Opus can
135     transmit stereo signals.

137     The Opus speech and audio codec is based on the SILK codec [SILK] and
138     the CELT codec [CELT]. For more detailed information on how Opus
139     operates, also refer to [Opus].

141  3.1.  Modes

143     Opus supports five different audio bandwidths, 8000, 12000, 16000,
144     24000, and 48000 Hz sampling frequency, for the voice mode and four
145     different audio bandwidths, 8000, 16000, 24000, and 48000 Hz sampling
146     frequency, for the audio mode.

148  3.1.1.  Voice Mode

150     For low bit rate applications transmitting mostly speech signals, the
151     voice mode of Opus SHOULD be used. The voice mode can encode
152     voice signals at 8000, 12000, 16000, 24000, and 48000 Hz sampling
153     frequency.

155     A sampling rate of 8000 Hz SHOULD only be used to interface to PSTN
156     networks or on low end devices that do not support greater than 8000
157     Hz sampling frequency. A sampling rate of 12000 Hz SHOULD be used
158     for lower end devices that do not support greater than 12000 Hz
159     sampling frequency or are under severe network bandwidth constraints
160     (e.g. wireless devices). A sampling rate of 16000 Hz SHOULD be used
161     for all-IP platforms that do not support greater than 16000 Hz
162     sampling frequency. Higher sampling rates are recommended for all
163     devices that support those high sampling rates and desire
164     full-bandwidth speech at medium bit rates.

166  3.1.2.  Audio Mode

168     For applications desiring very low delay speech transmission as well
169     as music transmission in exchange for a higher bit rate, the audio
170     mode SHOULD be used. This mode supports audio sampling rates of
171     8000, 16000, 24000, and 48000 Hz.

173  3.2.  Network Bandwidth

175     The network bit rate is adaptive within the range specified in
176     Table 1 for corresponding modes and audio sampling rates. The
177     average target network bit rate can be defined and modified in real
178     time, while the actual bit rate depends on the settings of
179     Opus and the input signal and may change over time.

181                       +-------+---------+-----------+
182                       | Mode  | fs (Hz) | BR (kbps) |
183                       +-------+---------+-----------+
184                       | voice | 8000    | 6 - 20    |
185                       |       |         |           |
186                       | voice | 12000   | 7 - 25    |
187                       |       |         |           |
188                       | voice | 16000   | 8 - 30    |
189                       |       |         |           |
190                       | voice | 24000   | 18 - 28   |
191                       |       |         |           |
192                       | voice | 48000   | 24 - 32   |
193                       |       |         |           |
194                       | audio | 8000    | 20 - 28   |
195                       |       |         |           |
196                       | audio | 16000   | 24 - 32   |
197                       |       |         |           |
198                       | audio | 24000   | 28 - 40   |
199                       |       |         |           |
200                       | audio | 48000   | 32 - 128  |
201                       +-------+---------+-----------+

203     Mode specifies the Opus mode of operation; fs specifies the audio
204     sampling frequency in Hertz (Hz); BR specifies the network bit rate
205     range in kilobits per second (kbps).

207                                  Table 1

209  3.2.1.  Variable versus Constant Bit Rate

211     The voice mode will always use a variable bit rate at audio sampling
212     rates of 8000, 12000, and 16000 Hz. The average target bit rate can
213     be adjusted at any point in time. To avoid congestion of the
214     connection, the average target bit rate SHOULD be adjusted to the
215     available network bandwidth. If no target bit rate is specified, the
216     average bit rate may go up to the highest bit rate specified in
217     Table 1.

219     In voice mode at audio sampling rates higher than 16000 Hz, i.e.
220     24000 and 48000 Hz, and in audio mode, Opus can be operated at either
221     a variable or a constant bit rate.
        The target bit rate can be adjusted
222     at any point in time.

224  3.2.2.  Discontinuous Transmission (DTX)

226     The Opus codec may, as described in Section 3.2.1, be operated with
227     an adaptive bit rate. In that case, the bit rate will automatically
228     be reduced for certain input signals like periods of silence. During
229     continuous transmission the bit rate will be reduced when the input
230     signal allows it, but the transmission to the receiver itself
231     will never be interrupted. Therefore, the received signal will
232     maintain the same high level of quality over the full duration of a
233     transmission while minimizing the average bit rate over time.

235     In cases where the bit rate of Opus needs to be reduced even further
236     or in cases where only constant bit rate is available, the Opus
237     encoder may be set to use discontinuous transmission (DTX), where
238     parts of the encoded signal that correspond to periods of silence in
239     the input speech or audio signal are not transmitted to the receiver.

241     On the receiving side, the non-transmitted parts will be handled by a
242     frame loss concealment unit in the Opus decoder which generates a
243     comfort noise signal to replace the non-transmitted parts of the
244     speech or audio signal.

246     The DTX mode of Opus will have a slightly lower speech or audio
247     quality than the continuous mode. Therefore, it is RECOMMENDED to
248     use Opus in the continuous mode unless constraints on network
249     bandwidth are severe. The DTX mode can be engaged for operation at
250     either adaptive or constant bit rate.

252  3.3.  Complexity

254     Complexity can be scaled to optimize for CPU resources in real-time,
255     mostly in trade-off against network bit rate. Also, different modes of
256     Opus have different complexity.

258  3.4.  Forward Error Correction (FEC)

260     The voice mode of Opus allows for "in-band" forward error correction
261     (FEC) data to be embedded into the bit stream of Opus.
        This FEC
262     scheme adds redundant information about the previous packet (n-1) to
263     the current output packet n. For each frame, the encoder decides
264     whether to use FEC based on (1) an externally-provided estimate of
265     the channel's packet loss rate; (2) an externally-provided estimate
266     of the channel's capacity; (3) the sensitivity of the audio or speech
267     signal to packet loss; and (4) whether the receiving decoder has
268     indicated it can take advantage of "in-band" FEC information. The
269     decision to send "in-band" FEC information is entirely controlled by
270     the encoder and therefore no special precautions for the payload or
271     storage format have to be taken.

273     On the receiving side, the decoder can take advantage of this
274     additional information when, in case of a packet loss, the next
275     packet is available. In order to use the FEC data, the jitter buffer
276     needs to provide access to payloads with the FEC data. The decoder
277     API function has a flag to indicate that a FEC frame rather than a
278     regular frame should be decoded. If no FEC data is available for the
279     current frame, the decoder will consider the frame lost and invoke
280     the frame loss concealment.

282     If the FEC scheme is not implemented on the receiving side, FEC
283     SHOULD NOT be used, as it leads to an inefficient usage of network
284     bandwidth. Decoder support for FEC SHOULD be indicated at the time a
285     session is set up.

287  3.5.  Stereo Operation

289     Opus allows for transmission of stereo audio signals. This operation
290     will be signaled in the Opus payload and no special arrangements have
291     to be made in the payload format. Any implementation of the Opus
292     decoder MUST be capable of receiving stereo signals.

294     If a decoder cannot take advantage of the benefits of a stereo
295     signal, this SHOULD be indicated at the time a session is set up.
        In
296     that case the sending side SHOULD NOT send stereo signals as it leads
297     to an inefficient usage of network bandwidth.

299  4.  Opus RTP Payload Format

301     The payload format for Opus consists of the RTP header and Opus
302     payload data.

304  4.1.  RTP Header Usage

306     The format of the RTP header is specified in [RFC3550]. The Opus
307     payload format uses the fields of the RTP header consistent with that
308     specification.

310     The payload length of Opus is an integer number of octets and
311     therefore no padding is required. The payload MAY be padded by an
312     integer number of octets according to [RFC3550].

314     The marker bit (M) of the RTP header has no function in combination
315     with Opus and MAY be ignored.

317     The RTP payload type for Opus has not been assigned statically and is
318     expected to be assigned dynamically.

320     The receiving side MUST be prepared to receive duplicates of RTP
321     packets. Only one of those payloads MUST be provided to the Opus
322     decoder for decoding and others MUST be discarded.

324     Opus supports 5 different sampling rates which may be adjusted during
325     the duration of a call. The RTP timestamp clock frequency is defined
326     as the highest supported sampling frequency of Opus, i.e. 48000 Hz,
327     for all modes and sampling rates of Opus. The unit for the timestamp
328     is samples. The RTP timestamp corresponds to the sample time of the
329     first encoded sample in the encoded frame. For sampling rates lower
330     than 48000 Hz the number of samples has to be multiplied by the
331     multiplier given in Table 2 to determine the RTP timestamp.
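
        The scaling rule above, together with the multipliers listed in
        Table 2, can be sketched as follows. This is only an illustration:
        the constant and function names are hypothetical and do not come
        from the draft or from any Opus library. The 32-bit wrap-around
        reflects the width of the RTP timestamp field in [RFC3550].

```python
# Sketch of the timestamp scaling described in Section 4.1.
# All names here are illustrative assumptions, not part of the draft.

RTP_CLOCK_HZ = 48000  # RTP timestamp clock rate used for all Opus modes

# Multipliers from Table 2, keyed by audio sampling frequency in Hz.
TIMESTAMP_MULTIPLIER = {8000: 6, 12000: 4, 16000: 3, 24000: 2, 48000: 1}


def timestamp_increment(num_samples: int, fs_hz: int) -> int:
    """Convert a sample count at fs_hz into RTP timestamp units (48 kHz)."""
    try:
        multiplier = TIMESTAMP_MULTIPLIER[fs_hz]
    except KeyError:
        raise ValueError(f"unsupported sampling frequency: {fs_hz} Hz") from None
    return num_samples * multiplier


def next_timestamp(current: int, num_samples: int, fs_hz: int) -> int:
    """Advance a timestamp; the RTP timestamp field is 32 bits and wraps."""
    return (current + timestamp_increment(num_samples, fs_hz)) & 0xFFFFFFFF


if __name__ == "__main__":
    # A 20 ms frame at 16000 Hz holds 320 samples: 320 * 3 = 960 units.
    print(timestamp_increment(320, 16000))
```

        A 20 ms frame therefore advances the timestamp by 960 units at
        every sampling rate, which agrees with the "ts incr" row of Table 3.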
333                          +---------+------------+
334                          | fs (Hz) | Multiplier |
335                          +---------+------------+
336                          | 8000    | 6          |
337                          |         |            |
338                          | 12000   | 4          |
339                          |         |            |
340                          | 16000   | 3          |
341                          |         |            |
342                          | 24000   | 2          |
343                          |         |            |
344                          | 48000   | 1          |
345                          +---------+------------+

347     fs specifies the audio sampling frequency in Hertz (Hz); Multiplier
348     is the value that the number of samples has to be multiplied by to
349     calculate the RTP timestamp.

351                                  Table 2

353  4.2.  Payload Structure

355     The Opus encoder can be set to output encoded frames representing
356     2.5, 5, 10, 20, 40, or 60 ms of speech or audio data. Further, an
357     arbitrary number of frames can be combined into a packet. The
358     maximum packet length is limited to the amount of encoded data
359     representing 120 ms of speech or audio data. The packetization of
360     encoded data is done entirely by the Opus encoder and therefore only
361     one packet output from the Opus encoder MUST be used as a payload.

363     Figure 1 shows the structure combined with the RTP header.

365                        +----------+--------------+
366                        |RTP Header| Opus Payload |
367                        +----------+--------------+

369                 Figure 1: Payload Structure with RTP header

371     Table 3 shows supported frame sizes for different modes and sampling
372     rates of Opus and how the timestamp needs to be incremented for
373     packetization.

375  +---------+------------------------+-----+-----+-----+-----+------+------+
376  | Mode    | fs (Hz)                | 2.5 | 5   | 10  | 20  | 40   | 60   |
377  |         |                        |     |     |     |     |      |      |
378  +---------+------------------------+-----+-----+-----+-----+------+------+
379  | ts incr | all                    | 120 | 240 | 480 | 960 | 1920 | 2880 |
380  |         |                        |     |     |     |     |      |      |
381  |         |                        |     |     |     |     |      |      |
382  | voice   | 8000/12000/16000/24000 |     |     | x   | x   | x    | x    |
383  |         | /48000                 |     |     |     |     |      |      |
384  |         |                        |     |     |     |     |      |      |
385  | audio   | 8000/16000/24000/48000 | x   | x   | x   | x   |      |      |
386  |         |                        |     |     |     |     |      |      |
387  +---------+------------------------+-----+-----+-----+-----+------+------+

389     Mode specifies the Opus mode of operation; fs specifies the audio
390     sampling frequency in Hertz (Hz); 2.5, 5, 10, 20, 40, and 60
391     represent the duration in ms of encoded speech or audio data in a
392     packet; ts incr specifies the value by which the timestamp needs to be
393     incremented for the corresponding packet size. For multiple frames in
394     a packet, these values have to be multiplied by the number of frames.

396                                  Table 3

398  5.  Opus Storage Format

400     The Opus storage format allows Opus encoded data to be stored in,
401     e.g., a file or an email attachment. The storage format consists of a
402     header and a series of blocks containing encoded speech or audio
403     frames. The storage format closely mimics the real-time payload
404     format, so packets, e.g. those received by a voicemail system, can
405     easily be converted into the storage format and vice versa, allowing
406     maximum flexibility and low overhead. Please note that this storage
407     format is not meant to be a robust storage format, nor the most
408     efficient storage format. For a robust storage format that allows
409     advanced functionality such as seeking, a more advanced container
410     format should be used.

412     Figure 2 shows an example of an Opus encoded file. Note that due to
413     the potentially adaptive bit rate the packet length may be variable
414     and no fixed block size can be defined for blocks containing encoded
415     data.
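
        As a rough sketch of the layout that Sections 5.1 and 5.2 specify
        (the magic-number header, then blocks that each carry a 16-bit
        octet count and a 32-bit timestamp in front of the payload), the
        following reader and writer may help. The draft does not state a
        byte order for the two block header fields; network byte order
        (big-endian) is assumed here. The function names are hypothetical.

```python
import io
import struct

MAGIC = b"#!opus\n"  # storage header magic number from Section 5.1

# Block header: 16-bit payload length, then 32-bit RTP timestamp.
# ASSUMPTION: big-endian; the draft does not specify the byte order.
BLOCK_HEADER = struct.Struct(">HI")


def write_opus_file(stream, packets):
    """Write (timestamp, payload) pairs as a header plus one block each."""
    stream.write(MAGIC)
    for timestamp, payload in packets:
        if len(payload) > 0xFFFF:
            raise ValueError("payload does not fit the 16-bit length field")
        stream.write(BLOCK_HEADER.pack(len(payload), timestamp & 0xFFFFFFFF))
        stream.write(payload)


def read_opus_file(stream):
    """Yield (timestamp, payload) pairs from a stream in storage format."""
    if stream.read(len(MAGIC)) != MAGIC:
        raise ValueError("missing Opus storage magic number")
    while True:
        header = stream.read(BLOCK_HEADER.size)
        if not header:
            return  # clean end of file
        if len(header) != BLOCK_HEADER.size:
            raise ValueError("truncated block header")
        length, timestamp = BLOCK_HEADER.unpack(header)
        payload = stream.read(length)
        if len(payload) != length:
            raise ValueError("truncated block payload")
        yield timestamp, payload


if __name__ == "__main__":
    buf = io.BytesIO()
    # Placeholder payload bytes, not real Opus packets.
    write_opus_file(buf, [(0, b"\x00" * 40), (960, b"\x00" * 35)])
    buf.seek(0)
    for ts, data in read_opus_file(buf):
        print(ts, len(data))
```

        Because each block records its own length, blocks of different
        sizes can follow one another directly, and gaps in the timestamps
        (for example during DTX) survive a round trip through the file.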
417                       +------------------+
418                       |      Header      |
419                       +-----------+------+
420                       |  block 1  |
421                       +-----------+--+
422                       |   block 2    |
423                       +--------------+--+
424                       :       ...       :
425                       +--------------+--+
426                       |     block n     |
427                       +-----------------+

429     Figure 2: Example of Opus file storage format showing different block
430               lengths due to potentially adaptive bit rate of Opus

432  5.1.  Storage Header Structure

434     An Opus storage header contains the following ASCII character string
435     as a magic number:

437     "#!opus\n" (hexadecimal: 0x23 0x21 0x6f 0x70 0x75 0x73 0x0A)

439  5.2.  Storage Block Structure

441     Following the storage header, blocks of encoded data are stored in
442     consecutive order in time according to Figure 2. Each block contains
443     a block header followed by a payload according to Figure 3.

445     The block header contains information that, for an RTP-based session,
446     can be derived from the IP and RTP headers: the number of octets
447     contained in the subsequent payload and the RTP timestamp.

449     The number of octets in the payload is represented by 16 bits and the
450     timestamp is specified by 32 bits. For the first block, the
451     timestamp MAY be a random number. For the following blocks, the
452     timestamp MUST be incremented according to the way timestamps are
453     incremented when Opus payloads are transmitted over RTP.

455     0                   16                           48
456     +-------------------+----------------------------+-----------------
457     |    # of octets    |         Timestamp          |     Payload
458     +-------------------+----------------------------+-----------------

460                Figure 3: Storage block header structure

462     The payload of each block in Figure 2 represents one packet of Opus
463     encoded data as originally encoded by the Opus encoder.
464     Information about frame size representing the duration of encoded
465     speech or audio data, number of encoded frames, stereo information,
466     and DTX is embedded into the payload of Opus and not subject to the
467     storage format.
        It can be extracted from the payload during decoding
468     of the encoded data.

470     During the usage of DTX no blocks are stored when the channel is
471     inactive. Timestamps MUST be used to reassemble the decoded signal
472     in a time-aligned way.

474  6.  Congestion Control

476     The adaptive nature of the Opus codec allows for efficient
477     congestion control.

479     The voice mode of Opus at audio sampling rates of 8000, 12000, and
480     16000 Hz always runs with a variable bit rate. The average bit rate
481     in that mode is dependent on the input signal and will especially
482     decrease during silent periods. The voice mode at audio sampling
483     rates of 24000 and 48000 Hz and the audio mode may run at a variable
484     or constant bit rate. Either way, the target bit rate of Opus can
485     be adjusted at any point in time, thus allowing for efficient
486     congestion control.

488     Furthermore, the amount of encoded speech or audio data encoded in a
489     single packet can be used for congestion control since the packet
490     transmission rate is inversely proportional to these frame sizes. A
491     lower packet transmission rate reduces the amount of header overhead
492     but at the same time increases latency and error sensitivity, so
493     lowering it should be done with care.

495     It is RECOMMENDED that congestion control is applied during the
496     transmission of Opus encoded data.

498  7.  IANA Considerations

500     One media subtype (audio/opus) has been defined and registered as
501     described in the following section.

503  7.1.  Opus Media Type Registration

505     Media type registration is done according to [RFC4288] and [RFC4855].

507     Type name: audio

509     Subtype name: opus

511     Required parameters:

513     rate: the RTP timestamp is incremented with a 48000 Hz clock
514        rate for all modes of Opus and all sampling frequencies. For
515        audio sampling rates other than 48000 Hz the rate has to be
516        adjusted to 48000 Hz according to Table 2.
518     Optional parameters:

520     maxcodedaudiobandwidth: the decoder's maximum sampling frequency
521        specified in Hertz (Hz) that the application can take advantage
522        of. The decoder MUST be capable of receiving any allowed sampling
523        frequency but due to hardware limitations only signals up to the
524        specified sampling frequency can be processed. Sending signals
525        with higher sampling frequency may result in higher than necessary
526        network bandwidth and encoding complexity. Possible values are
527        8000, 12000, 16000, 24000, and 48000.

529     maxptime: the decoder's maximum length of time in milliseconds
530        rounded up to the next full integer value represented by the media
531        in a packet that can be encapsulated in a received packet
532        according to Section 6 of [RFC4566]. Possible values are 3, 5,
533        10, 20, 40, and 60 or an arbitrary multiple of Opus frame sizes
534        rounded up to the next full integer value up to a maximum value of
535        120 as defined in Section 4 and Section 5 of this document. If no
536        value is specified, 120 is assumed as default. This value is a
537        recommendation by the decoding side to ensure the best performance
538        for the decoder. The decoder MUST be capable of accepting any
539        allowed packet size to ensure maximum compatibility.

541     ptime: the decoder's recommended length of time in milliseconds
542        rounded up to the next full integer value represented by the media
543        in a packet according to Section 6 of [RFC4566]. Possible values
544        are 3, 5, 10, 20, 40, or 60 or an arbitrary multiple of Opus frame
545        sizes rounded up to the next full integer value up to a maximum
546        value of 120 as defined in Section 4 and Section 5 of this
547        document. If no value is specified, 20 is assumed as default. If
548        ptime is greater than maxptime, ptime MUST be ignored. This
549        parameter MAY be changed during a session. This value is a
550        recommendation by the decoding side to ensure the best performance
551        for the decoder.
           The decoder MUST be capable of accepting any
552        allowed packet size to ensure maximum compatibility.

554     minptime: the decoder's minimum length of time in milliseconds
555        rounded up to the next full integer value represented by the media
556        in a packet that SHOULD be encapsulated in a received packet
557        according to Section 6 of [RFC4566]. Possible values are 3, 5,
558        10, 20, 40, and 60 or an arbitrary multiple of Opus frame sizes
559        rounded up to the next full integer value up to a maximum value of
560        120 as defined in Section 4 and Section 5 of this document. If no
561        value is specified, 3 is assumed as default. This value is a
562        recommendation by the decoding side to ensure the best performance
563        for the decoder. The decoder MUST be capable of accepting any
564        allowed packet size to ensure maximum compatibility.

566     maxaveragebitrate: specifies the maximum average receive bit rate of
567        a session in bits per second (bps). The actual value of the bit
568        rate may vary as it is dependent on the characteristics of the
569        media in a packet. Note that the maximum average bit rate MAY be
570        modified dynamically during a session. Any positive integer is
571        allowed but values outside the range between 6000 and 510000
572        SHOULD be ignored. If no value is specified, the maximum value
573        specified in Table 1 for the corresponding mode of Opus and
574        corresponding clock rate will be the default.

576     stereo: specifies if the decoder prefers to receive stereo signals
577        versus mono signals. Possible values are 1 and 0 where 1
578        specifies that stereo signals are preferred and 0 specifies that
579        only mono signals are preferred. Independent of the stereo
580        parameter every receiver MUST be able to receive and decode stereo
581        signals but sending stereo signals to a receiver that signaled a
582        preference for mono signals may result in higher than necessary
583        network bandwidth and encoding complexity.
           If no value is
584        specified, stereo is assumed to be 0.

586     cbr: specifies if the decoder prefers the use of a constant bit rate
587        versus variable bit rate. Possible values are 1 and 0 where 1
588        specifies constant bit rate and 0 specifies variable bit rate. If
589        no value is specified, cbr is assumed to be 0. Note that the
590        maximum average bit rate may still be changed, e.g. to adapt to
591        changing network conditions.

593     useinbandfec: specifies that Opus in-band FEC is supported by the
594        decoder and MAY be used during a session. Possible values are 1
595        and 0. It is RECOMMENDED to provide 0 in case FEC is not
596        implemented on the receiving side. If no value is specified,
597        useinbandfec is assumed to be 1.

599     usedtx: specifies if the decoder prefers the use of DTX. Possible
600        values are 1 and 0. If no value is specified, usedtx is assumed
601        to be 0.

603     Encoding considerations:

605        Opus media type is framed and consists of binary data according to
606        Section 4.8 in [RFC4288].

608     Security considerations:

610        See Section 8 of this document.

612     Interoperability considerations: none

614     Published specification: none

616     Applications that use this media type:

618        Any application that requires the transport or storage of speech
619        or audio data may use this media type. Examples include, but are
620        not limited to, audio and video conferencing, Voice over IP, voice
621        recording, media streaming, and voice messaging.

623     Additional information:

625        For storage transfer methods the following applies:

627        Magic number: "#!opus\n" (hexadecimal: 0x23 0x21 0x6f 0x70 0x75
628           0x73 0x0A)

630        File extension(s): ops, OPS

632        Macintosh file type code(s): "opus"

634     Person & email address to contact for further information:

636        SILK Support silksupport@skype.net
637        Jean-Marc Valin jean-marc.valin@octasic.com

639     Intended usage: COMMON

641     Restrictions on usage:

643        For transfer over RTP, the RTP payload format (Section 4 of this
644        document) SHALL be used.
           For storage usage, the storage format
645        (Section 5 of this document) SHALL be used.

647     Author:

649        Julian Spittka julian.spittka@skype.net

651        Koen Vos koen.vos@skype.net

653        Jean-Marc Valin jean-marc.valin@octasic.com

655     Change controller: TBD

657  7.2.  Mapping to SDP Parameters

659     The information described in the media type specification has a
660     specific mapping to fields in the Session Description Protocol (SDP)
661     [RFC4566], which is commonly used to describe RTP sessions. When SDP
662     is used to specify sessions employing the Opus codec, the mapping is
663     as follows:

665     o  The media type ("audio") goes in SDP "m=" as the media name.
666     o  The media subtype ("opus") goes in SDP "a=rtpmap" as the encoding
667        name. The RTP clock rate in "a=rtpmap" MUST be mapped to the
668        required media type parameter "rate".
669     o  The optional media type parameters "ptime" and "maxptime" are
670        mapped to "a=ptime" and "a=maxptime" attributes, respectively, in
671        the SDP.

673     o  All remaining media type parameters are mapped to the "a=fmtp"
674        attribute in the SDP by copying them directly from the media type
675        parameter string as a semicolon-separated list of parameter=value
676        pairs (e.g. maxaveragebitrate=20000).

678     Below are some examples of SDP session descriptions for Opus:

680     Example 1: Standard session with 48000 Hz clock rate

682         m=audio 54312 RTP/AVP 101
683         a=rtpmap:101 opus/48000

685     Example 2: 16000 Hz clock rate, maximum packet size of 40 ms,
686     recommended packet size of 40 ms, maximum average bit rate of 20000
687     bps, stereo signals are preferred, FEC is allowed, DTX is not allowed

689         m=audio 54312 RTP/AVP 101
690         a=rtpmap:101 opus/48000
691         a=fmtp:101 maxcodedaudiobandwidth=16000; maxaveragebitrate=20000;
692         stereo=1; useinbandfec=1; usedtx=0
693         a=ptime:40
694         a=maxptime:40

696  7.2.1.  Offer-Answer Model Considerations for Opus

698     When using the offer-answer procedure described in [RFC3264] to
699     negotiate the use of Opus, the following considerations apply:

701     o  Opus supports several clock rates. For signaling purposes only
702        the highest, i.e. 48000, is used. The actual clock rate of the
703        corresponding media is signaled inside the payload and is not
704        subject to this payload format description. The decoder MUST be
705        capable of decoding every received clock rate. An example is
706        shown below:

708            m=audio 54312 RTP/AVP 100
709            a=rtpmap:100 opus/48000

711     o  The parameters "ptime" and "maxptime" are unidirectional
712        receive-only parameters and typically will not compromise
713        interoperability; however, depending on the values of the
714        parameters, the performance of the application may suffer.

716        [RFC3264] defines the SDP offer-answer handling of the "ptime"
717        parameter. The "maxptime" parameter MUST be handled in the same
718        way.
719     o  The parameter "minptime" is a unidirectional receive-only
720        parameter and typically will not compromise interoperability;
721        however, depending on the value of the parameter, the
722        performance of the application may suffer, so it should be set
723        with care.
724     o  The parameter "maxcodedaudiobandwidth" is a unidirectional
725        receive-only parameter that reflects limitations of the local
726        receiver. The sender on the other side SHOULD NOT send with a
727        sampling rate higher than "maxcodedaudiobandwidth" as it
728        represents an inefficient use of network bandwidth resources and
729        CPU cycles on the encoding side. The parameter
730        "maxcodedaudiobandwidth" typically will not compromise
731        interoperability; however, depending on the value of the
732        parameter, the performance of the application may suffer, so it
733        should be set with care.
734     o  The parameter "maxaveragebitrate" is a unidirectional receive-only
735        parameter that reflects limitations of the local receiver.
The 736 sender of the other side MUST NOT send with an average bit rate 737 higher than "maxaveragebitrate" as it might overload the network 738 and/or receiver. The parameter "maxaveragebitrate" typically will 739 not compromise interoperability; however, dependent on the set 740 value of the parameter the performance of the application may 741 suffer and should be set with care. 742 o If the parameter "maxaveragebitrate" is below the range specified 743 in Table 1 the session MUST be rejected. 744 o The parameter "stereo" is a unidirectional receive-only parameter. 745 o The parameter "cbr" is a unidirectional receive-only parameter. 746 o The parameter "useinbandfec" is a unidirectional receive-only 747 parameter. 748 o The parameter "usedtx" is a unidirectional receive-only parameter. 749 o Any unknown parameter in an offer MUST be ignored by the receiver 750 and MUST be removed from the answer. 752 7.2.2. Declarative SDP Considerations for Opus 754 For declarative use of SDP such as in Session Announcement Protocol 755 (SAP), [RFC2974], and RTSP, [RFC2326], for Opus, the following needs 756 to be considered: 758 o The values for "maxptime", "ptime", "minptime", 759 "maxcodedaudiobandwidth", and "maxaveragebitrate" should be 760 selected carefully to ensure that a reasonable performance can be 761 achieved for the participants of a session. 763 o The values for "maxptime", "ptime", and "minptime" of the payload 764 format configuration are recommendations by the decoding side to 765 ensure the best performance for the decoder. The decoder MUST be 766 capable to accept any allowed packet sizes to ensure maximum 767 compatibility. 768 o All other parameters of the payload format configuration are 769 declarative and a participant MUST use the configurations that are 770 provided for the session. More than one configuration may be 771 provided if necessary by declaring multiple RTP payload types; 772 however, the number of types should be kept small. 774 8. 
Security Considerations 776 All RTP packets using the payload format defined in this 777 specification are subject to the general security considerations 778 discussed in the RTP specification [RFC3550] and any profile from 779 e.g. [RFC3711] or [RFC3551]. 781 This payload format transports Opus encoded speech or audio data, 782 hence, security issues include confidentiality, integrity protection, 783 and authentication of the speech or audio itself. The Opus payload 784 format does not have any built-in security mechanisms. Any suitable 785 external mechanisms, such as SRTP [RFC3711], MAY be used. 787 This payload format and the Opus encoding do not exhibit any 788 significant non-uniformity in the receiver-end computational load and 789 thus are unlikely to pose a denial-of-service threat due to the 790 receipt of pathological datagrams. 792 9. Acknowledgements 794 TBD 796 10. Normative References 798 [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate 799 Requirement Levels", BCP 14, RFC 2119, March 1997. 801 [RFC2326] Schulzrinne, H., Rao, A., and R. Lanphier, "Real Time 802 Streaming Protocol (RTSP)", RFC 2326, April 1998. 804 [RFC2974] Handley, M., Perkins, C., and E. Whelan, "Session 805 Announcement Protocol", RFC 2974, October 2000. 807 [RFC3264] Rosenberg, J. and H. Schulzrinne, "An Offer/Answer Model 808 with Session Description Protocol (SDP)", RFC 3264, 809 June 2002. 811 [RFC3550] Schulzrinne, H., Casner, S., Frederick, R., and V. 812 Jacobson, "RTP: A Transport Protocol for Real-Time 813 Applications", STD 64, RFC 3550, July 2003. 815 [RFC3551] Schulzrinne, H. and S. Casner, "RTP Profile for Audio and 816 Video Conferences with Minimal Control", STD 65, RFC 3551, 817 July 2003. 819 [RFC3711] Baugher, M., McGrew, D., Naslund, M., Carrara, E., and K. 820 Norrman, "The Secure Real-time Transport Protocol (SRTP)", 821 RFC 3711, March 2004. 823 [RFC4288] Freed, N. and J. 
Klensin, "Media Type Specifications and 824 Registration Procedures", BCP 13, RFC 4288, December 2005. 826 [RFC4566] Handley, M., Jacobson, V., and C. Perkins, "SDP: Session 827 Description Protocol", RFC 4566, July 2006. 829 [RFC4855] Casner, S., "Media Type Registration of RTP Payload 830 Formats", RFC 4855, February 2007. 832 Appendix A. Informational References 834 [codec] http://datatracker.ietf.org/wg/codec/ 835 [SILK] https://developer.skype.com/silk 836 [CELT] http://www.celt-codec.org/ 837 [Opus] http://datatracker.ietf.org/doc/draft-ietf-codec-opus/ 839 Authors' Addresses 841 Julian Spittka 842 Skype Technologies S.A. 843 3210 Porter Drive 844 Palo Alto, CA 94304 845 USA 847 Email: julian.spittka@skype.net 849 Koen Vos 850 Skype Technologies S.A. 851 3210 Porter Drive 852 Palo Alto, CA 94304 853 USA 855 Email: koen.vos@skype.net 857 Jean-Marc Valin 858 Octasic Inc. 859 4101 Molson Street 860 Montreal, Quebec 861 Canada 863 Email: jean-marc.valin@octasic.com
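
Illustrative sketch (non-normative)

   The mapping rules of Section 7.2 can be illustrated with a short
   Python sketch.  It is not part of the payload format.  The function
   name opus_sdp_lines and its arguments are hypothetical, and the
   sketch assumes that the media type parameters arrive as a dictionary
   in the order in which they should appear in "a=fmtp".

```python
def opus_sdp_lines(port, payload_type, params):
    """Build SDP lines for one Opus stream from media type parameters.

    Hypothetical helper: it follows the Section 7.2 mapping but does
    no validation of parameter names or values.
    """
    # The required "rate" parameter maps to the "a=rtpmap" clock rate.
    # For signaling, only the highest clock rate (48000) is used.
    rate = params.get("rate", 48000)
    lines = [
        f"m=audio {port} RTP/AVP {payload_type}",
        f"a=rtpmap:{payload_type} opus/{rate}",
    ]

    # "ptime" and "maxptime" get their own attributes; all remaining
    # parameters are copied into "a=fmtp" as parameter=value pairs.
    own_attributes = ("ptime", "maxptime")
    fmtp = [
        f"{name}={value}"
        for name, value in params.items()
        if name not in own_attributes and name != "rate"
    ]
    if fmtp:
        lines.append(f"a=fmtp:{payload_type} " + "; ".join(fmtp))

    for name in own_attributes:
        if name in params:
            lines.append(f"a={name}:{params[name]}")
    return lines


if __name__ == "__main__":
    # Parameters taken from Example 2 in Section 7.2.
    example_2 = {
        "maxcodedaudiobandwidth": 16000,
        "maxaveragebitrate": 20000,
        "stereo": 1,
        "useinbandfec": 1,
        "usedtx": 0,
        "ptime": 40,
        "maxptime": 40,
    }
    print("\n".join(opus_sdp_lines(54312, 101, example_2)))
```

   Called with the parameters of Example 2, the sketch produces the
   same attributes as that example, except that "a=fmtp" is emitted on
   a single line rather than folded.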