Network Working Group                                         J. Spittka
Internet-Draft
Intended status: Informational                                    K. Vos
Expires: June 3, 2013                            Skype Technologies S.A.
                                                               JM. Valin
                                                                 Mozilla
                                                       November 30, 2012


           RTP Payload Format for Opus Speech and Audio Codec
                   draft-spittka-payload-rtp-opus-03

Abstract

   This document defines the Real-time Transport Protocol (RTP) payload
   format for packetization of Opus encoded speech and audio data that
   is essential to integrate the codec in the most compatible way.
   Further, media type registrations are described for the RTP payload
   format.

Status of this Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF). Note that other groups may also distribute
   working documents as Internet-Drafts. The list of current
   Internet-Drafts is at http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time. It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on June 3, 2013.

Copyright Notice

   Copyright (c) 2012 IETF Trust and the persons identified as the
   document authors. All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document. Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document. Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1.  Introduction
   2.  Conventions, Definitions and Acronyms used in this document
     2.1.  Audio Bandwidth
   3.  Opus Codec
     3.1.  Network Bandwidth
       3.1.1.  Recommended Bitrate
       3.1.2.  Variable versus Constant Bit Rate
       3.1.3.  Discontinuous Transmission (DTX)
     3.2.  Complexity
     3.3.  Forward Error Correction (FEC)
     3.4.  Stereo Operation
   4.  Opus RTP Payload Format
     4.1.  RTP Header Usage
     4.2.  Payload Structure
   5.  Congestion Control
   6.  IANA Considerations
     6.1.  Opus Media Type Registration
     6.2.  Mapping to SDP Parameters
       6.2.1.  Offer-Answer Model Considerations for Opus
       6.2.2.  Declarative SDP Considerations for Opus
   7.  Security Considerations
   8.  Acknowledgements
   9.  Normative References
   Authors' Addresses

1.  Introduction

   The Opus codec is a speech and audio codec developed within the IETF
   Internet Wideband Audio Codec working group (codec). The codec has a
   very low algorithmic delay and it is highly scalable in terms of
   audio bandwidth, bitrate, and complexity.
   Further, it provides different modes to efficiently encode speech
   signals as well as music signals, thus making it the codec of choice
   for various applications using the Internet or similar networks.

   This document defines the Real-time Transport Protocol (RTP)
   [RFC3550] payload format for packetization of Opus encoded speech and
   audio data that is essential to integrate the Opus codec in the most
   compatible way. Further, media type registrations are described for
   the RTP payload format. More information on the Opus codec can be
   obtained from [RFC6716].

2.  Conventions, Definitions and Acronyms used in this document

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in [RFC2119].

   CBR: Constant bitrate
   CPU: Central Processing Unit
   DTX: Discontinuous transmission
   FEC: Forward error correction
   IP: Internet Protocol
   samples: Speech or audio samples (usually per channel)
   SDP: Session Description Protocol
   VBR: Variable bitrate

2.1.  Audio Bandwidth

   Throughout this document, we refer to the following definitions:

   +--------------+----------------+----------------+---------------+
   | Abbreviation | Name           | Bandwidth (Hz) | Sampling (Hz) |
   +--------------+----------------+----------------+---------------+
   | nb           | Narrowband     | 0 - 4000       | 8000          |
   |              |                |                |               |
   | mb           | Mediumband     | 0 - 6000       | 12000         |
   |              |                |                |               |
   | wb           | Wideband       | 0 - 8000       | 16000         |
   |              |                |                |               |
   | swb          | Super-wideband | 0 - 12000      | 24000         |
   |              |                |                |               |
   | fb           | Fullband       | 0 - 20000      | 48000         |
   +--------------+----------------+----------------+---------------+

                    Table 1: Audio bandwidth naming

3.  Opus Codec

   The Opus [RFC6716] speech and audio codec has been developed to
   encode speech signals as well as audio signals.
   Two different modes, a voice mode or an audio mode, may be chosen to
   allow the most efficient coding depending on the type of input
   signal, the sampling frequency of the input signal, and the specific
   application.

   The voice mode allows efficient encoding of voice signals at lower
   bitrates while the audio mode is optimized for audio signals at
   medium and higher bitrates.

   The Opus speech and audio codec is highly scalable in terms of audio
   bandwidth, bitrate, and complexity. Further, Opus allows
   transmitting stereo signals.

3.1.  Network Bandwidth

   Opus supports all bitrates from 6 kb/s to 510 kb/s. The bitrate can
   be changed dynamically within that range. All other parameters being
   equal, higher bitrate results in higher quality.

3.1.1.  Recommended Bitrate

   For a frame size of 20 ms, these are the bitrate "sweet spots" for
   Opus in various configurations:

   o  8-12 kb/s for NB speech,
   o  16-20 kb/s for WB speech,
   o  28-40 kb/s for FB speech,
   o  48-64 kb/s for FB mono music, and
   o  64-128 kb/s for FB stereo music.

3.1.2.  Variable versus Constant Bit Rate

   For the same average bitrate, variable bitrate (VBR) can achieve
   higher quality than constant bitrate (CBR). For the majority of
   voice transmission applications, VBR is the best choice. One
   potential reason for choosing CBR is the potential information leak
   that _may_ occur when encrypting the compressed stream. See
   [RFC6562] for guidelines on when VBR is appropriate for encrypted
   audio communications. In the case where an existing VBR stream needs
   to be converted to CBR for security reasons, the Opus padding
   mechanism described in [RFC6716] is the RECOMMENDED way to achieve
   padding because the RTP padding bit is unencrypted.

   The bitrate can be adjusted at any point in time. To avoid
   congestion, the average bitrate SHOULD be adjusted to the available
   network capacity.
   If no target bitrate is specified, the bitrates specified in
   Section 3.1.1 are RECOMMENDED.

3.1.3.  Discontinuous Transmission (DTX)

   The Opus codec may, as described in Section 3.1.2, be operated with
   an adaptive bitrate. In that case, the bitrate will automatically be
   reduced for certain input signals like periods of silence. During
   continuous transmission the bitrate will be reduced when the input
   signal allows it, but the transmission to the receiver itself will
   never be interrupted. Therefore, the received signal will maintain
   the same high level of quality over the full duration of a
   transmission while minimizing the average bitrate over time.

   In cases where the bitrate of Opus needs to be reduced even further
   or in cases where only constant bitrate is available, the Opus
   encoder may be set to use discontinuous transmission (DTX), where
   parts of the encoded signal that correspond to periods of silence in
   the input speech or audio signal are not transmitted to the receiver.

   On the receiving side, the non-transmitted parts will be handled by a
   frame loss concealment unit in the Opus decoder which generates a
   comfort noise signal to replace the non-transmitted parts of the
   speech or audio signal.

   The DTX mode of Opus will have a slightly lower speech or audio
   quality than the continuous mode. Therefore, it is RECOMMENDED to
   use Opus in the continuous mode unless constraints on network
   capacity are severe. The DTX mode can be engaged for operation in
   either adaptive or constant bitrate.

3.2.  Complexity

   Complexity can be scaled to optimize for CPU resources in real-time,
   mostly as a trade-off between audio quality and bitrate. Also,
   different modes of Opus have different complexity.

3.3.  Forward Error Correction (FEC)

   The voice mode of Opus allows for "in-band" forward error correction
   (FEC) data to be embedded into the bit stream of Opus.
   This FEC scheme adds redundant information about the previous packet
   (n-1) to the current output packet n. For each frame, the encoder
   decides whether to use FEC based on (1) an externally-provided
   estimate of the channel's packet loss rate; (2) an
   externally-provided estimate of the channel's capacity; (3) the
   sensitivity of the audio or speech signal to packet loss; and (4)
   whether the receiving decoder has indicated it can take advantage of
   "in-band" FEC information. The decision to send "in-band" FEC
   information is entirely controlled by the encoder and therefore no
   special precautions for the payload have to be taken.

   On the receiving side, the decoder can take advantage of this
   additional information when, in case of a packet loss, the next
   packet is available. In order to use the FEC data, the jitter buffer
   needs to provide access to payloads with the FEC data. The decoder
   API function has a flag to indicate that a FEC frame rather than a
   regular frame should be decoded. If no FEC data is available for the
   current frame, the decoder will consider the frame lost and invoke
   the frame loss concealment.

   If the FEC scheme is not implemented on the receiving side, FEC
   SHOULD NOT be used, as it leads to an inefficient usage of network
   resources. Decoder support for FEC SHOULD be indicated at the time a
   session is set up.

3.4.  Stereo Operation

   Opus allows for transmission of stereo audio signals. This operation
   is signaled in-band in the Opus payload and no special arrangement is
   required in the payload format. Any implementation of the Opus
   decoder MUST be capable of receiving stereo signals, although it MAY
   decode those signals as mono.

   If a decoder cannot take advantage of the benefits of a stereo
   signal, this SHOULD be indicated at the time a session is set up.
   In that case the sending side SHOULD NOT send stereo signals as it
   leads to an inefficient usage of the network.

4.  Opus RTP Payload Format

   The payload format for Opus consists of the RTP header and Opus
   payload data.

4.1.  RTP Header Usage

   The format of the RTP header is specified in [RFC3550]. The Opus
   payload format uses the fields of the RTP header consistent with this
   specification.

   The payload length of Opus is an integer number of octets and
   therefore no padding is required. The payload MAY be padded by an
   integer number of octets according to [RFC3550].

   The marker bit (M) of the RTP header is used in accordance with
   Section 4.1 of [RFC3551].

   The RTP payload type for Opus has not been assigned statically and is
   expected to be assigned dynamically.

   The receiving side MUST be prepared to receive duplicates of RTP
   packets. Only one of those payloads MUST be provided to the Opus
   decoder for decoding and others MUST be discarded.

   Opus supports 5 different audio bandwidths which may be adjusted
   during a call. The RTP timestamp clock frequency is defined as the
   highest supported sampling frequency of Opus, i.e. 48000 Hz, for all
   modes and sampling rates of Opus. The unit for the timestamp is
   samples per single (mono) channel. The RTP timestamp corresponds to
   the sample time of the first encoded sample in the encoded frame.
   For sampling rates lower than 48000 Hz the number of samples has to
   be multiplied with a multiplier according to Table 2 to determine the
   RTP timestamp. For example, a 20 ms frame sampled at 16000 Hz
   contains 320 samples, which corresponds to a timestamp increment of
   320 * 3 = 960.

                        +---------+------------+
                        | fs (Hz) | Multiplier |
                        +---------+------------+
                        |   8000  |      6     |
                        |         |            |
                        |  12000  |      4     |
                        |         |            |
                        |  16000  |      3     |
                        |         |            |
                        |  24000  |      2     |
                        |         |            |
                        |  48000  |      1     |
                        +---------+------------+

                     Table 2: Timestamp multiplier

4.2.  Payload Structure

   The Opus encoder can be set to output encoded frames representing
   2.5, 5, 10, 20, 40, or 60 ms of speech or audio data. Further, an
   arbitrary number of frames can be combined into a packet. The
   maximum packet length is limited to the amount of encoded data
   representing 120 ms of speech or audio data. The packetization of
   encoded data is purely done by the Opus encoder and therefore only
   one packet output from the Opus encoder MUST be used as a payload.

   Figure 1 shows the structure combined with the RTP header.

                      +----------+--------------+
                      |RTP Header| Opus Payload |
                      +----------+--------------+

              Figure 1: Payload Structure with RTP header

   Table 3 shows supported frame sizes in milliseconds of encoded speech
   or audio data for speech and audio mode (Mode) and sampling rates
   (fs) of Opus and how the timestamp needs to be incremented for
   packetization (ts incr). If the Opus encoder outputs multiple
   encoded frames into a single packet the timestamps have to be added
   up according to the combined frames.

   +---------+-----------------+-----+-----+-----+-----+------+------+
   |   Mode  |        fs       | 2.5 |  5  |  10 |  20 |  40  |  60  |
   +---------+-----------------+-----+-----+-----+-----+------+------+
   | ts incr |       all       | 120 | 240 | 480 | 960 | 1920 | 2880 |
   |         |                 |     |     |     |     |      |      |
   |  voice  | nb/mb/wb/swb/fb |     |     |  x  |  x  |   x  |   x  |
   |         |                 |     |     |     |     |      |      |
   |  audio  |   nb/wb/swb/fb  |  x  |  x  |  x  |  x  |      |      |
   +---------+-----------------+-----+-----+-----+-----+------+------+

      Table 3: Supported Opus frame sizes and timestamp increments

5.  Congestion Control

   The adaptive nature of the Opus codec allows for an efficient
   congestion control.

   The target bitrate of Opus can be adjusted at any point in time, thus
   allowing for an efficient congestion control.
   Furthermore, the amount of encoded speech or audio data encoded in a
   single packet can be used for congestion control since the
   transmission rate is inversely proportional to these frame sizes. A
   lower packet transmission rate reduces the amount of header overhead
   but at the same time increases latency and error sensitivity, so it
   should be used with care.

   It is RECOMMENDED that congestion control is applied during the
   transmission of Opus encoded data.

6.  IANA Considerations

   One media subtype (audio/opus) has been defined and registered as
   described in the following section.

6.1.  Opus Media Type Registration

   Media type registration is done according to [RFC4288] and [RFC4855].

   Type name: audio

   Subtype name: opus

   Required parameters:

   rate: the RTP timestamp is incremented with a 48000 Hz clock rate
      for all modes of Opus and all sampling frequencies. For audio
      sampling rates other than 48000 Hz the rate has to be adjusted to
      48000 Hz according to Table 2.

   Optional parameters:

   maxplaybackrate: a hint about the maximum output sampling rate that
      the receiver is capable of rendering in Hz. The decoder MUST be
      capable of decoding any audio bandwidth but due to hardware
      limitations only signals up to the specified sampling rate can be
      played back. Sending signals with higher audio bandwidth results
      in higher than necessary network usage and encoding complexity, so
      an encoder SHOULD NOT encode frequencies above the audio bandwidth
      specified by maxplaybackrate. This parameter can take any value
      between 8000 and 48000, although commonly the value will match one
      of the Opus bandwidths (Table 1). By default, the receiver is
      assumed to have no limitations, i.e. 48000.

   sprop-maxcapturerate: a hint about the maximum input sampling rate
      that the sender is likely to produce.
      This is not a guarantee that the sender will never send any higher
      bandwidth (e.g. it could send a pre-recorded prompt that uses a
      higher bandwidth), but it indicates to the receiver that
      frequencies above this maximum can safely be discarded. This
      parameter is useful to avoid wasting receiver resources by
      operating the audio processing pipeline (e.g. echo cancellation)
      at a higher rate than necessary. This parameter can take any
      value between 8000 and 48000, although commonly the value will
      match one of the Opus bandwidths (Table 1). By default, the
      sender is assumed to have no limitations, i.e. 48000.

   maxptime: the maximum duration of media, in milliseconds rounded up
      to the next full integer value, that the decoder wants to receive
      in a single packet, according to Section 6 of [RFC4566]. Possible
      values are 3, 5, 10, 20, 40, and 60 or an arbitrary multiple of
      Opus frame sizes rounded up to the next full integer value up to a
      maximum value of 120 as defined in Section 4. If no value is
      specified, 120 is assumed as default. This value is a
      recommendation by the decoding side to ensure the best performance
      for the decoder. The decoder MUST be capable of accepting any
      allowed packet sizes to ensure maximum compatibility.

   ptime: the duration of media, in milliseconds rounded up to the next
      full integer value, that the decoder recommends be carried in a
      single packet, according to Section 6 of [RFC4566]. Possible
      values are 3, 5, 10, 20, 40, or 60 or an arbitrary multiple of
      Opus frame sizes rounded up to the next full integer value up to a
      maximum value of 120 as defined in Section 4. If no value is
      specified, 20 is assumed as default. If ptime is greater than
      maxptime, ptime MUST be ignored. This parameter MAY be changed
      during a session.
      This value is a recommendation by the decoding side to ensure the
      best performance for the decoder. The decoder MUST be capable of
      accepting any allowed packet sizes to ensure maximum
      compatibility.

   minptime: the minimum duration of media, in milliseconds rounded up
      to the next full integer value, that SHOULD be carried in a single
      received packet, according to Section 6 of [RFC4566]. Possible
      values are 3, 5, 10, 20, 40, and 60 or an arbitrary multiple of
      Opus frame sizes rounded up to the next full integer value up to a
      maximum value of 120 as defined in Section 4. If no value is
      specified, 3 is assumed as default. This value is a
      recommendation by the decoding side to ensure the best performance
      for the decoder. The decoder MUST be capable of accepting any
      allowed packet sizes to ensure maximum compatibility.

   maxaveragebitrate: specifies the maximum average receive bitrate of
      a session in bits per second (b/s). The actual value of the
      bitrate may vary as it is dependent on the characteristics of the
      media in a packet. Note that the maximum average bitrate MAY be
      modified dynamically during a session. Any positive integer is
      allowed but values outside the range between 6000 and 510000
      SHOULD be ignored. If no value is specified, the maximum value
      specified in Section 3.1.1 for the corresponding mode of Opus and
      corresponding maxplaybackrate will be the default.

   stereo: specifies whether the decoder prefers receiving stereo or
      mono signals. Possible values are 1 and 0 where 1 specifies that
      stereo signals are preferred and 0 specifies that only mono
      signals are preferred.
      Independent of the stereo parameter every receiver MUST be able to
      receive and decode stereo signals but sending stereo signals to a
      receiver that signaled a preference for mono signals may result in
      higher than necessary network utilisation and encoding complexity.
      If no value is specified, mono is assumed (stereo=0).

   sprop-stereo: specifies whether the sender is likely to produce
      stereo audio. Possible values are 1 and 0 where 1 specifies that
      stereo signals are likely to be sent, and 0 specifies that the
      sender will likely only send mono. This is not a guarantee that
      the sender will never send stereo audio (e.g. it could send a
      pre-recorded prompt that uses stereo), but it indicates to the
      receiver that the received signal can be safely downmixed to mono.
      This parameter is useful to avoid wasting receiver resources by
      operating the audio processing pipeline (e.g. echo cancellation)
      in stereo when not necessary. If no value is specified, mono is
      assumed (sprop-stereo=0).

   cbr: specifies if the decoder prefers the use of a constant bitrate
      versus variable bitrate. Possible values are 1 and 0 where 1
      specifies constant bitrate and 0 specifies variable bitrate. If
      no value is specified, cbr is assumed to be 0. Note that the
      maximum average bitrate may still be changed, e.g. to adapt to
      changing network conditions.

   useinbandfec: specifies that the decoder has the capability to take
      advantage of the Opus in-band FEC. Possible values are 1 and 0.
      It is RECOMMENDED to provide 0 in case FEC cannot be utilized on
      the receiving side. If no value is specified, useinbandfec is
      assumed to be 0. This parameter is only a preference and the
      receiver MUST be able to process packets that include FEC
      information, even if it means the FEC part is discarded.

   usedtx: specifies if the decoder prefers the use of DTX. Possible
      values are 1 and 0.
      If no value is specified, usedtx is assumed to be 0.

   Encoding considerations:

      Opus media type is framed and consists of binary data according to
      Section 4.8 in [RFC4288].

   Security considerations:

      See Section 7 of this document.

   Interoperability considerations: none

   Published specification: none

   Applications that use this media type:

      Any application that requires the transport of speech or audio
      data may use this media type. Examples include, but are not
      limited to, audio and video conferencing, Voice over IP, and media
      streaming.

   Person & email address to contact for further information:

      SILK Support silksupport@skype.net
      Jean-Marc Valin jmvalin@jmvalin.ca

   Intended usage: COMMON

   Restrictions on usage:

      For transfer over RTP, the RTP payload format (Section 4 of this
      document) SHALL be used.

   Author:

      Julian Spittka jspittka@gmail.com

      Koen Vos koenvos74@gmail.com

      Jean-Marc Valin jmvalin@jmvalin.ca

   Change controller: TBD

6.2.  Mapping to SDP Parameters

   The information described in the media type specification has a
   specific mapping to fields in the Session Description Protocol (SDP)
   [RFC4566], which is commonly used to describe RTP sessions. When SDP
   is used to specify sessions employing the Opus codec, the mapping is
   as follows:

   o  The media type ("audio") goes in SDP "m=" as the media name.
   o  The media subtype ("opus") goes in SDP "a=rtpmap" as the encoding
      name. The RTP clock rate in "a=rtpmap" MUST be 48000 and the
      number of channels MUST be 2.
   o  The OPTIONAL media type parameters "ptime" and "maxptime" are
      mapped to "a=ptime" and "a=maxptime" attributes, respectively, in
      the SDP.
   o  The OPTIONAL media type parameters "maxaveragebitrate",
      "maxplaybackrate", "minptime", "stereo", "cbr", "useinbandfec",
      and "usedtx", when present, MUST be included in the "a=fmtp"
      attribute in the SDP, expressed as a media type string in the form
      of a semicolon-separated list of parameter=value pairs (e.g.,
      maxaveragebitrate=20000). They MUST NOT be specified in an
      SSRC-specific "fmtp" source-level attribute (as defined in
      Section 6.3 of [RFC5576]).
   o  The OPTIONAL media type parameters "sprop-maxcapturerate" and
      "sprop-stereo" MAY be mapped to the "a=fmtp" SDP attribute by
      copying them directly from the media type parameter string as part
      of the semicolon-separated list of parameter=value pairs (e.g.,
      sprop-stereo=1). These same OPTIONAL media type parameters MAY
      also be specified using an SSRC-specific "fmtp" source-level
      attribute as described in Section 6.3 of [RFC5576]. They MAY be
      specified in both places, in which case the parameter in the
      source-level attribute overrides the one found on the "a=fmtp"
      line. The value of any parameter which is not specified in a
      source-level source attribute MUST be taken from the "a=fmtp"
      line, if it is present there.
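
   As an illustration of the mapping rules above, the following Python
   sketch builds the media-level SDP lines for one Opus stream from a
   dictionary of media type parameters. The function name
   opus_sdp_lines, its arguments, and the fixed parameter ordering are
   assumptions made for this example and are not defined by this
   document. The sketch writes the "a=fmtp" attribute on a single line
   and does not cover the SSRC-specific source-level attributes of
   [RFC5576].

```python
# Illustrative sketch only: opus_sdp_lines is a hypothetical helper, not an
# API defined by this document or by any SDP library.

# Parameters carried in "a=fmtp" (ordering chosen for this example only).
FMTP_PARAMS = (
    "maxplaybackrate", "sprop-maxcapturerate", "maxaveragebitrate",
    "minptime", "stereo", "sprop-stereo", "cbr", "useinbandfec", "usedtx",
)
# Parameters that map to their own SDP attributes.
ATTRIBUTE_PARAMS = ("ptime", "maxptime")


def opus_sdp_lines(port, payload_type, params):
    """Return the media-level SDP lines for one Opus stream as a list."""
    unknown = set(params) - set(FMTP_PARAMS) - set(ATTRIBUTE_PARAMS)
    if unknown:
        raise ValueError("unknown Opus parameters: %s" % sorted(unknown))

    lines = [
        # The media type "audio" is the media name in "m=".
        "m=audio %d RTP/AVP %d" % (port, payload_type),
        # The clock rate MUST be 48000 and the number of channels MUST be 2.
        "a=rtpmap:%d opus/48000/2" % payload_type,
    ]

    # Semicolon-separated list of parameter=value pairs.
    pairs = ["%s=%d" % (name, params[name])
             for name in FMTP_PARAMS if name in params]
    if pairs:
        lines.append("a=fmtp:%d %s" % (payload_type, "; ".join(pairs)))

    # "ptime" and "maxptime" become "a=ptime" and "a=maxptime".
    for name in ATTRIBUTE_PARAMS:
        if name in params:
            lines.append("a=%s:%d" % (name, params[name]))
    return lines


example = opus_sdp_lines(54312, 101, {
    "maxplaybackrate": 16000, "sprop-maxcapturerate": 16000,
    "maxaveragebitrate": 20000, "stereo": 1, "useinbandfec": 1,
    "usedtx": 0, "ptime": 40, "maxptime": 40,
})
print("\n".join(example))
```

   The call at the end uses the port, payload type, and parameter
   values of Example 2. Rejecting unknown parameter names is a design
   choice of this sketch for a sender building its own description; it
   is not a rule for processing a received offer.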

   Below are some examples of SDP session descriptions for Opus:

   Example 1: Standard mono session with 48000 Hz clock rate

       m=audio 54312 RTP/AVP 101
       a=rtpmap:101 opus/48000/2

   Example 2: 16000 Hz maximum playback and capture rate, maximum
   packet size of 40 ms, recommended packet size of 40 ms, maximum
   average bitrate of 20000 bps, prefers to receive stereo but only
   plans to send mono, FEC is allowed, DTX is not allowed

       m=audio 54312 RTP/AVP 101
       a=rtpmap:101 opus/48000/2
       a=fmtp:101 maxplaybackrate=16000; sprop-maxcapturerate=16000;
        maxaveragebitrate=20000; stereo=1; useinbandfec=1; usedtx=0
       a=ptime:40
       a=maxptime:40

   Example 3: Two-way full-band stereo preferred

       m=audio 54312 RTP/AVP 101
       a=rtpmap:101 opus/48000/2
       a=fmtp:101 stereo=1; sprop-stereo=1

6.2.1.  Offer-Answer Model Considerations for Opus

   When using the offer-answer procedure described in [RFC3264] to
   negotiate the use of Opus, the following considerations apply:

   o  Opus supports several clock rates. For signaling purposes only
      the highest, i.e. 48000, is used. The actual clock rate of the
      corresponding media is signaled inside the payload and is not
      subject to this payload format description. The decoder MUST be
      capable of decoding every received clock rate. An example is
      shown below:

          m=audio 54312 RTP/AVP 100
          a=rtpmap:100 opus/48000/2

   o  The "ptime" and "maxptime" parameters are unidirectional
      receive-only parameters and typically will not compromise
      interoperability; however, depending on the values set, the
      performance of the application may suffer. [RFC3264] defines the
      SDP offer-answer handling of the "ptime" parameter. The
      "maxptime" parameter MUST be handled in the same way.
   o  The "minptime" parameter is a unidirectional receive-only
      parameter and typically will not compromise interoperability;
      however, depending on the value set, the performance of the
      application may suffer, so the parameter should be set with care.
   o  The "maxplaybackrate" parameter is a unidirectional receive-only
      parameter that reflects limitations of the local receiver. The
      sender of the other side SHOULD NOT send with an audio bandwidth
      higher than "maxplaybackrate" as this would lead to inefficient
      use of network resources. The "maxplaybackrate" parameter does
      not affect interoperability. Also, this parameter SHOULD NOT be
      used to adjust the audio bandwidth as a function of the bitrates,
      as this is the responsibility of the Opus encoder implementation.
   o  The "maxaveragebitrate" parameter is a unidirectional receive-only
      parameter that reflects limitations of the local receiver. The
      sender of the other side MUST NOT send with an average bitrate
      higher than "maxaveragebitrate" as it might overload the network
      and/or receiver. The "maxaveragebitrate" parameter typically will
      not compromise interoperability; however, depending on the value
      set, the performance of the application may suffer, so the
      parameter should be set with care.
   o  The "sprop-maxcapturerate" and "sprop-stereo" parameters are
      unidirectional sender-only parameters that reflect limitations of
      the sender side. They allow the receiver to set up a
      reduced-complexity audio processing pipeline if the sender is not
      planning to use the full range of Opus's capabilities. Neither
      "sprop-maxcapturerate" nor "sprop-stereo" affects interoperability
      and the receiver MUST be capable of receiving any signal.
   o  The "stereo" parameter is a unidirectional receive-only parameter.
   o  The "cbr" parameter is a unidirectional receive-only parameter.
   o  The "useinbandfec" parameter is a unidirectional receive-only
      parameter.
   o  The "usedtx" parameter is a unidirectional receive-only parameter.
   o  Any unknown parameter in an offer MUST be ignored by the receiver
      and MUST be removed from the answer.

6.2.2.  Declarative SDP Considerations for Opus

   For declarative use of SDP such as in Session Announcement Protocol
   (SAP), [RFC2974], and RTSP, [RFC2326], for Opus, the following needs
   to be considered:

   o  The values for "maxptime", "ptime", "minptime", "maxplaybackrate",
      and "maxaveragebitrate" should be selected carefully to ensure
      that a reasonable performance can be achieved for the participants
      of a session.
   o  The values for "maxptime", "ptime", and "minptime" of the payload
      format configuration are recommendations by the decoding side to
      ensure the best performance for the decoder. The decoder MUST be
      capable of accepting any allowed packet sizes to ensure maximum
      compatibility.
   o  All other parameters of the payload format configuration are
      declarative and a participant MUST use the configurations that are
      provided for the session. More than one configuration may be
      provided if necessary by declaring multiple RTP payload types;
      however, the number of types should be kept small.

7.  Security Considerations

   All RTP packets using the payload format defined in this
   specification are subject to the general security considerations
   discussed in the RTP specification [RFC3550] and any profile, e.g.
   [RFC3711] or [RFC3551].

   This payload format transports Opus encoded speech or audio data;
   hence, security issues include confidentiality, integrity protection,
   and authentication of the speech or audio itself. The Opus payload
   format does not have any built-in security mechanisms. Any suitable
   external mechanisms, such as SRTP [RFC3711], MAY be used.

   This payload format and the Opus encoding do not exhibit any
   significant non-uniformity in the receiver-end computational load and
   thus are unlikely to pose a denial-of-service threat due to the
   receipt of pathological datagrams.

8.  Acknowledgements

   TBD

9.  Normative References

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119, March 1997.

   [RFC2326]  Schulzrinne, H., Rao, A., and R. Lanphier, "Real Time
              Streaming Protocol (RTSP)", RFC 2326, April 1998.

   [RFC2974]  Handley, M., Perkins, C., and E. Whelan, "Session
              Announcement Protocol", RFC 2974, October 2000.

   [RFC3264]  Rosenberg, J. and H. Schulzrinne, "An Offer/Answer Model
              with Session Description Protocol (SDP)", RFC 3264,
              June 2002.

   [RFC3550]  Schulzrinne, H., Casner, S., Frederick, R., and V.
              Jacobson, "RTP: A Transport Protocol for Real-Time
              Applications", STD 64, RFC 3550, July 2003.

   [RFC3551]  Schulzrinne, H. and S. Casner, "RTP Profile for Audio and
              Video Conferences with Minimal Control", STD 65, RFC 3551,
              July 2003.

   [RFC3711]  Baugher, M., McGrew, D., Naslund, M., Carrara, E., and K.
              Norrman, "The Secure Real-time Transport Protocol (SRTP)",
              RFC 3711, March 2004.

   [RFC4288]  Freed, N. and J. Klensin, "Media Type Specifications and
              Registration Procedures", BCP 13, RFC 4288, December 2005.

   [RFC4566]  Handley, M., Jacobson, V., and C. Perkins, "SDP: Session
              Description Protocol", RFC 4566, July 2006.

   [RFC4855]  Casner, S., "Media Type Registration of RTP Payload
              Formats", RFC 4855, February 2007.

   [RFC5576]  Lennox, J., Ott, J., and T. Schierl, "Source-Specific
              Media Attributes in the Session Description Protocol
              (SDP)", RFC 5576, June 2009.

   [RFC6562]  Perkins, C. and JM. Valin, "Guidelines for the Use of
              Variable Bit Rate Audio with Secure RTP", RFC 6562,
              March 2012.

   [RFC6716]  Valin, JM., Vos, K., and T. Terriberry, "Definition of the
              Opus Audio Codec", RFC 6716, September 2012.

Authors' Addresses

   Julian Spittka

   Email: jspittka@gmail.com


   Koen Vos
   Skype Technologies S.A.
   3210 Porter Drive
   Palo Alto, CA 94304
   USA

   Email: koenvos74@gmail.com


   Jean-Marc Valin
   Mozilla
   650 Castro Street
   Mountain View, CA 94041
   USA

   Email: jmvalin@jmvalin.ca