idnits 2.17.1

draft-ietf-payload-rtp-opus-03.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

     No issues found here.

  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == The copyright year in the IETF Trust and authors Copyright Line does not
     match the current year

  -- The document date (July 30, 2014) is 3558 days in the past.  Is this
     intentional?

  Checking references for intended status: Proposed Standard
  ----------------------------------------------------------------------------

     (See RFCs 3967 and 4897 for information about using normative references
     to lower-maturity documents in RFCs)

  ** Obsolete normative reference: RFC 2326 (Obsoleted by RFC 7826)

  ** Downref: Normative reference to an Experimental RFC: RFC 2974

  ** Obsolete normative reference: RFC 4288 (Obsoleted by RFC 6838)

  ** Obsolete normative reference: RFC 4566 (Obsoleted by RFC 8866)

     Summary: 4 errors (**), 0 flaws (~~), 1 warning (==), 1 comment (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.

--------------------------------------------------------------------------------

Network Working Group                                         J. Spittka
Internet-Draft
Intended status: Standards Track                                  K. Vos
Expires: January 31, 2015                                        vocTone
                                                               JM. Valin
                                                                 Mozilla
                                                           July 30, 2014

           RTP Payload Format for Opus Speech and Audio Codec
                     draft-ietf-payload-rtp-opus-03

Abstract

   This document defines the Real-time Transport Protocol (RTP) payload
   format for packetization of Opus encoded speech and audio data
   necessary to integrate the codec in the most compatible way.
   Further, it describes media type registrations for the RTP payload
   format.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current Internet-
   Drafts is at http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on January 31, 2015.

Copyright Notice

   Copyright (c) 2014 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.  Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1.  Introduction
   2.  Conventions, Definitions and Acronyms used in this document
     2.1.  Audio Bandwidth
   3.  Opus Codec
     3.1.  Network Bandwidth
       3.1.1.  Recommended Bitrate
       3.1.2.  Variable versus Constant Bitrate
       3.1.3.  Discontinuous Transmission (DTX)
     3.2.  Complexity
     3.3.  Forward Error Correction (FEC)
     3.4.  Stereo Operation
   4.  Opus RTP Payload Format
     4.1.  RTP Header Usage
     4.2.  Payload Structure
   5.  Congestion Control
   6.  IANA Considerations
     6.1.  Opus Media Type Registration
     6.2.  Mapping to SDP Parameters
       6.2.1.  Offer-Answer Model Considerations for Opus
       6.2.2.  Declarative SDP Considerations for Opus
   7.  Security Considerations
   8.  Acknowledgements
   9.  Normative References
   Authors' Addresses

1.  Introduction

   The Opus codec is a speech and audio codec developed within the IETF
   Internet Wideband Audio Codec working group.  The codec has a very
   low algorithmic delay and it is highly scalable in terms of audio
   bandwidth, bitrate, and complexity.  Further, it provides different
   modes to efficiently encode speech signals as well as music signals,
   thus making it the codec of choice for various applications using the
   Internet or similar networks.

   This document defines the Real-time Transport Protocol (RTP)
   [RFC3550] payload format for packetization of Opus encoded speech and
   audio data necessary to integrate the Opus codec in the most
   compatible way.  Further, it describes media type registrations for
   the RTP payload format.  More information on the Opus codec can be
   obtained from [RFC6716].

2.  Conventions, Definitions and Acronyms used in this document

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in [RFC2119].

   CBR:      Constant bitrate
   CPU:      Central Processing Unit
   DTX:      Discontinuous transmission
   FEC:      Forward error correction
   IP:       Internet Protocol
   samples:  Speech or audio samples (per channel)
   SDP:      Session Description Protocol
   VBR:      Variable bitrate

2.1.  Audio Bandwidth

   Throughout this document, we refer to the following definitions:

   +--------------+----------------+-----------------+-----------------+
   | Abbreviation | Name           | Audio Bandwidth | Sampling Rate   |
   |              |                | (Hz)            | (Hz)            |
   +--------------+----------------+-----------------+-----------------+
   | NB           | Narrowband     | 0 - 4000        | 8000            |
   | MB           | Mediumband     | 0 - 6000        | 12000           |
   | WB           | Wideband       | 0 - 8000        | 16000           |
   | SWB          | Super-wideband | 0 - 12000       | 24000           |
   | FB           | Fullband       | 0 - 20000       | 48000           |
   +--------------+----------------+-----------------+-----------------+

                          Audio bandwidth naming

                                 Table 1

3.  Opus Codec

   The Opus [RFC6716] codec encodes speech signals as well as general
   audio signals.  Two different modes can be chosen, a voice mode or an
   audio mode, to allow the most efficient coding depending on the type
   of the input signal, the sampling frequency of the input signal, and
   the intended application.

   The voice mode allows efficient encoding of voice signals at lower
   bitrates while the audio mode is optimized for general audio signals
   at medium and higher bitrates.

   The Opus speech and audio codec is highly scalable in terms of audio
   bandwidth, bitrate, and complexity.  Further, Opus allows
   transmitting stereo signals.

3.1.  Network Bandwidth

   Opus supports all bitrates from 6 kb/s to 510 kb/s.  The bitrate can
   be changed dynamically within that range.  All other parameters being
   equal, higher bitrates result in higher quality.

3.1.1.  Recommended Bitrate

   For a frame size of 20 ms, these are the bitrate "sweet spots" for
   Opus in various configurations:

   o  8-12 kb/s for NB speech,
   o  16-20 kb/s for WB speech,
   o  28-40 kb/s for FB speech,
   o  48-64 kb/s for FB mono music, and
   o  64-128 kb/s for FB stereo music.

3.1.2.  Variable versus Constant Bitrate

   For the same average bitrate, variable bitrate (VBR) can achieve
   higher quality than constant bitrate (CBR).  For the majority of
   voice transmission applications, VBR is the best choice.  One reason
   for choosing CBR is the potential information leak that _might_ occur
   when encrypting the compressed stream.  See [RFC6562] for guidelines
   on when VBR is appropriate for encrypted audio communications.  In
   the case where an existing VBR stream needs to be converted to CBR
   for security reasons, the Opus padding mechanism described in
   [RFC6716] is the RECOMMENDED way to achieve padding because the RTP
   padding bit is unencrypted.

   The bitrate can be adjusted at any point in time.  To avoid
   congestion, the average bitrate SHOULD NOT exceed the available
   network capacity.  If no target bitrate is specified, the bitrates
   specified in Section 3.1.1 are RECOMMENDED.

3.1.3.  Discontinuous Transmission (DTX)

   The Opus codec can, as described in Section 3.1.2, be operated with a
   variable bitrate.  In that case, the encoder will automatically
   reduce the bitrate for certain input signals, like periods of
   silence.  When using continuous transmission, it will reduce the
   bitrate when the characteristics of the input signal permit, but will
   never interrupt the transmission to the receiver.  Therefore, the
   received signal will maintain the same high level of quality over the
   full duration of a transmission while minimizing the average bitrate
   over time.

   In cases where the bitrate of Opus needs to be reduced even further
   or in cases where only constant bitrate is available, the Opus
   encoder can use discontinuous transmission (DTX), where parts of the
   encoded signal that correspond to periods of silence in the input
   speech or audio signal are not transmitted to the receiver.  A
   receiver can distinguish between DTX and packet loss by looking for
   gaps in the sequence number, as described by Section 4.1
   of [RFC3551].

   On the receiving side, the non-transmitted parts will be handled by a
   frame loss concealment unit in the Opus decoder which generates a
   comfort noise signal to replace the non-transmitted parts of the
   speech or audio signal.  Use of [RFC3389] Comfort Noise (CN) with
   Opus is discouraged.  The transmitter MUST drop whole frames only,
   based on the size of the last transmitted frame, to ensure successive
   RTP timestamps differ by a multiple of 120 and to allow the receiver
   to use whole frames for concealment.

   DTX can be used with both variable and constant bitrate.  It will
   have a slightly lower speech or audio quality than continuous
   transmission.  Therefore, using continuous transmission is
   RECOMMENDED unless constraints on network capacity are severe.

3.2.  Complexity

   Complexity can be scaled to optimize for CPU resources in real-time,
   mostly as a trade-off between audio quality and bitrate.  Also,
   different modes of Opus have different complexity.

3.3.  Forward Error Correction (FEC)

   The voice mode of Opus allows for embedding "in-band" forward error
   correction (FEC) data into the Opus bit stream.  This FEC scheme adds
   redundant information about the previous packet (N-1) to the current
   output packet N.  For each frame, the encoder decides whether to use
   FEC based on (1) an externally-provided estimate of the channel's
   packet loss rate; (2) an externally-provided estimate of the
   channel's capacity; (3) the sensitivity of the audio or speech signal
   to packet loss; and (4) whether the receiving decoder has indicated
   it can take advantage of "in-band" FEC information.  The decision to
   send "in-band" FEC information is entirely controlled by the encoder
   and therefore no special precautions for the payload have to be
   taken.

   On the receiving side, the decoder can take advantage of this
   additional information when it loses a packet and the next packet is
   available.  In order to use the FEC data, the jitter buffer needs to
   provide access to payloads with the FEC data.  The receiver can then
   configure its decoder to decode the FEC data from the packet rather
   than the regular audio data.  If no FEC data is available for the
   current frame, the decoder will consider the frame lost and invoke
   frame loss concealment.

   If the FEC scheme is not implemented on the receiving side, FEC
   SHOULD NOT be used, as it leads to an inefficient usage of network
   resources.  Decoder support for FEC SHOULD be indicated at the time a
   session is set up.

3.4.  Stereo Operation

   Opus allows for transmission of stereo audio signals.  This operation
   is signaled in-band in the Opus payload and no special arrangement is
   needed in the payload format.  Any implementation of the Opus decoder
   MUST be capable of receiving stereo signals, although it MAY decode
   those signals as mono.
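
   The in-band stereo signaling mentioned above can be observed in the
   first octet of the Opus packet.  The following is a minimal,
   non-normative sketch, assuming the TOC byte layout of Section 3.1 of
   [RFC6716] (a 5-bit configuration number, a 1-bit stereo flag, and a
   2-bit frame count code).  The function name parse_opus_toc and the
   returned field names are illustrative only and are not part of any
   specification or library.

```python
def parse_opus_toc(payload: bytes) -> dict:
    """Split the first octet (TOC byte) of an Opus packet into its fields.

    Assumed layout (RFC 6716, Section 3.1), most significant bit first:
    config (5 bits) | s (1 bit) | c (2 bits).
    """
    if not payload:
        raise ValueError("empty Opus payload")
    toc = payload[0]
    return {
        # Configuration number: selects mode, audio bandwidth and frame size.
        "config": toc >> 3,
        # The "s" bit: 0 means mono, 1 means stereo.
        "stereo": bool((toc >> 2) & 0x1),
        # The "c" field: code for the number of frames in the packet.
        "frame_count_code": toc & 0x3,
    }
```

   A receiver that only renders mono could use such a check to decide
   whether to ask its decoder for a mono downmix; the payload format
   itself requires no action.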

   If a decoder cannot take advantage of the benefits of a stereo
   signal, this SHOULD be indicated at the time a session is set up.  In
   that case the sending side SHOULD NOT send stereo signals as it leads
   to an inefficient usage of network resources.

4.  Opus RTP Payload Format

   The payload format for Opus consists of the RTP header and Opus
   payload data.

4.1.  RTP Header Usage

   The format of the RTP header is specified in [RFC3550].  The use of
   the fields of the RTP header by the Opus payload format is consistent
   with that specification.

   The payload length of Opus is an integer number of octets and
   therefore no padding is necessary.  The payload MAY be padded by an
   integer number of octets according to [RFC3550].

   The timestamp, sequence number, and marker bit (M) of the RTP header
   are used in accordance with Section 4.1 of [RFC3551].

   The RTP payload type for Opus has not been assigned statically and is
   expected to be assigned dynamically.

   The receiving side MUST be prepared to receive duplicate RTP packets.
   The receiver MUST provide at most one of those payloads to the Opus
   decoder for decoding, and MUST discard the others.

   Opus supports 5 different audio bandwidths, which can be adjusted
   during a call.  The RTP timestamp is incremented with a 48000 Hz
   clock rate for all modes of Opus and all sampling rates.  The unit
   for the timestamp is samples per single (mono) channel.  The RTP
   timestamp corresponds to the sample time of the first encoded sample
   in the encoded frame.  For data encoded with sampling rates other
   than 48000 Hz, the sampling rate has to be adjusted to 48000 Hz using
   the corresponding multiplier in Table 2.
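
   As a non-normative illustration of the conversion described above,
   the following sketch scales a per-channel sample count at the coded
   sampling rate to RTP timestamp units at the 48000 Hz clock.  The
   multipliers mirror Table 2; the names RTP_CLOCK_HZ,
   TIMESTAMP_MULTIPLIER, and rtp_timestamp_increment are hypothetical
   and chosen for this example only.

```python
RTP_CLOCK_HZ = 48000

# Sampling rate (Hz) -> multiplier that scales sample counts to 48000 Hz.
TIMESTAMP_MULTIPLIER = {8000: 6, 12000: 4, 16000: 3, 24000: 2, 48000: 1}


def rtp_timestamp_increment(samples_per_channel: int, sampling_rate_hz: int) -> int:
    """Return the RTP timestamp increment for a block of coded audio.

    samples_per_channel is counted at sampling_rate_hz for a single
    (mono) channel, matching the timestamp unit used by the payload format.
    """
    try:
        multiplier = TIMESTAMP_MULTIPLIER[sampling_rate_hz]
    except KeyError:
        raise ValueError(f"unsupported sampling rate: {sampling_rate_hz} Hz")
    return samples_per_channel * multiplier
```

   For instance, a 20 ms frame coded at 16000 Hz holds 320 samples per
   channel, so the sketch returns 320 * 3 = 960 timestamp units.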

                   +--------------------+------------+
                   | Sampling Rate (Hz) | Multiplier |
                   +--------------------+------------+
                   | 8000               | 6          |
                   | 12000              | 4          |
                   | 16000              | 3          |
                   | 24000              | 2          |
                   | 48000              | 1          |
                   +--------------------+------------+

                      Table 2: Timestamp multiplier

4.2.  Payload Structure

   The Opus encoder can output encoded frames representing 2.5, 5, 10,
   20, 40, or 60 ms of speech or audio data.  Further, an arbitrary
   number of frames can be combined into a packet, up to a maximum
   packet duration representing 120 ms of speech or audio data.  The
   grouping of one or more Opus frames into a single Opus packet is
   defined in Section 3 of [RFC6716].  An RTP payload MUST contain
   exactly one Opus packet as defined by that document.

   Figure 1 shows the structure combined with the RTP header.

                       +----------+--------------+
                       |RTP Header| Opus Payload |
                       +----------+--------------+

               Figure 1: Payload Structure with RTP header

   Table 3 shows supported frame sizes in milliseconds of encoded speech
   or audio data for the speech and audio modes (Mode) and sampling
   rates (fs) of Opus and shows how the timestamp is incremented for
   packetization (ts incr).  If the Opus encoder outputs multiple
   encoded frames into a single packet, the timestamp increment is the
   sum of the increments for the individual frames.

   +---------+-----------------+-----+-----+-----+-----+------+------+
   | Mode    | fs              | 2.5 | 5   | 10  | 20  | 40   | 60   |
   +---------+-----------------+-----+-----+-----+-----+------+------+
   | ts incr | all             | 120 | 240 | 480 | 960 | 1920 | 2880 |
   | voice   | NB/MB/WB/SWB/FB |     |     | x   | x   | x    | x    |
   | audio   | NB/WB/SWB/FB    | x   | x   | x   | x   |      |      |
   +---------+-----------------+-----+-----+-----+-----+------+------+

       Table 3: Supported Opus frame sizes and timestamp increments

5.  Congestion Control

   The target bitrate of Opus can be adjusted at any point in time, thus
   allowing efficient congestion control.  Furthermore, the amount of
   speech or audio data encoded in a single packet can be used for
   congestion control, since the packet transmission rate is inversely
   proportional to the packet duration.  A lower packet transmission
   rate reduces the amount of header overhead, but at the same time
   increases latency and loss sensitivity, so it ought to be used with
   care.

   It is RECOMMENDED that senders of Opus encoded data apply congestion
   control.

6.  IANA Considerations

   One media subtype (audio/opus) has been defined and registered as
   described in the following section.

6.1.  Opus Media Type Registration

   Media type registration is done according to [RFC4288] and [RFC4855].

   Type name: audio

   Subtype name: opus

   Required parameters:

   rate:  the RTP timestamp is incremented with a 48000 Hz clock rate
      for all modes of Opus and all sampling rates.  For data encoded
      with sampling rates other than 48000 Hz, the sampling rate has to
      be adjusted to 48000 Hz using the corresponding multiplier in
      Table 2.

   Optional parameters:

   maxplaybackrate:  a hint about the maximum output sampling rate that
      the receiver is capable of rendering in Hz.  The decoder MUST be
      capable of decoding any audio bandwidth but due to hardware
      limitations only signals up to the specified sampling rate can be
      played back.  Sending signals with higher audio bandwidth results
      in higher than necessary network usage and encoding complexity, so
      an encoder SHOULD NOT encode frequencies above the audio bandwidth
      specified by maxplaybackrate.  This parameter can take any value
      between 8000 and 48000, although commonly the value will match one
      of the Opus bandwidths (Table 1).  By default, the receiver is
      assumed to have no limitations, i.e. 48000.

   sprop-maxcapturerate:  a hint about the maximum input sampling rate
      that the sender is likely to produce.  This is not a guarantee
      that the sender will never send any higher bandwidth (e.g. it
      could send a pre-recorded prompt that uses a higher bandwidth),
      but it indicates to the receiver that frequencies above this
      maximum can safely be discarded.  This parameter is useful to
      avoid wasting receiver resources by operating the audio processing
      pipeline (e.g. echo cancellation) at a higher rate than necessary.
      This parameter can take any value between 8000 and 48000, although
      commonly the value will match one of the Opus bandwidths
      (Table 1).  By default, the sender is assumed to have no
      limitations, i.e. 48000.

   maxptime:  the maximum duration of media represented by a packet
      (according to Section 6 of [RFC4566]) that a decoder wants to
      receive, in milliseconds rounded up to the next full integer
      value.  Possible values are 3, 5, 10, 20, 40, 60, or an arbitrary
      multiple of an Opus frame size rounded up to the next full integer
      value, up to a maximum value of 120, as defined in Section 4.  If
      no value is specified, the default is 120.  This value is a
      recommendation by the decoding side to ensure the best performance
      for the decoder.  The decoder MUST be capable of accepting any
      allowed packet sizes to ensure maximum compatibility.

   ptime:  the preferred duration of media represented by a packet
      (according to Section 6 of [RFC4566]) that a decoder wants to
      receive, in milliseconds rounded up to the next full integer
      value.  Possible values are 3, 5, 10, 20, 40, 60, or an arbitrary
      multiple of an Opus frame size rounded up to the next full integer
      value, up to a maximum value of 120, as defined in Section 4.  If
      no value is specified, the default is 20.  If ptime is greater
      than maxptime, ptime MUST be ignored.  This parameter MAY be
      changed during a session.  This value is a recommendation by the
      decoding side to ensure the best performance for the decoder.  The
      decoder MUST be capable of accepting any allowed packet sizes to
      ensure maximum compatibility.

   minptime:  the minimum duration of media represented by a packet
      (according to Section 6 of [RFC4566]) that SHOULD be encapsulated
      in a received packet, in milliseconds rounded up to the next full
      integer value.  Possible values are 3, 5, 10, 20, 40, 60, or an
      arbitrary multiple of an Opus frame size rounded up to the next
      full integer value, up to a maximum value of 120, as defined in
      Section 4.  If no value is specified, the default is 3.  This
      value is a recommendation by the decoding side to ensure the best
      performance for the decoder.  The decoder MUST be capable of
      accepting any allowed packet sizes to ensure maximum
      compatibility.

   maxaveragebitrate:  specifies the maximum average receive bitrate of
      a session in bits per second (b/s).  The actual value of the
      bitrate can vary, as it is dependent on the characteristics of the
      media in a packet.  Note that the maximum average bitrate MAY be
      modified dynamically during a session.  Any positive integer is
      allowed, but values outside the range 6000 to 510000 SHOULD be
      ignored.  If no value is specified, the maximum value specified in
      Section 3.1.1 for the corresponding mode of Opus and corresponding
      maxplaybackrate is the default.

   stereo:  specifies whether the decoder prefers receiving stereo or
      mono signals.  Possible values are 1 and 0, where 1 specifies that
      stereo signals are preferred, and 0 specifies that only mono
      signals are preferred.  Independent of the stereo parameter, every
      receiver MUST be able to receive and decode stereo signals, but
      sending stereo signals to a receiver that signaled a preference
      for mono signals may result in higher than necessary network
      utilization and encoding complexity.  If no value is specified,
      the default is 0 (mono).

   sprop-stereo:  specifies whether the sender is likely to produce
      stereo audio.  Possible values are 1 and 0, where 1 specifies that
      stereo signals are likely to be sent, and 0 specifies that the
      sender will likely only send mono.  This is not a guarantee that
      the sender will never send stereo audio (e.g. it could send a pre-
      recorded prompt that uses stereo), but it indicates to the
      receiver that the received signal can be safely downmixed to mono.
      This parameter is useful to avoid wasting receiver resources by
      operating the audio processing pipeline (e.g. echo cancellation)
      in stereo when not necessary.  If no value is specified, the
      default is 0 (mono).

   cbr:  specifies if the decoder prefers the use of a constant bitrate
      versus variable bitrate.  Possible values are 1 and 0, where 1
      specifies constant bitrate and 0 specifies variable bitrate.  If
      no value is specified, the default is 0 (VBR).  When cbr is 1, the
      maximum average bitrate can still change, e.g. to adapt to
      changing network conditions.

   useinbandfec:  specifies that the decoder has the capability to take
      advantage of the Opus in-band FEC.  Possible values are 1 and 0.
      Providing 0 when FEC cannot be used on the receiving side is
      RECOMMENDED.  If no value is specified, useinbandfec is assumed to
      be 0.  This parameter is only a preference and the receiver MUST
      be able to process packets that include FEC information, even if
      it means the FEC part is discarded.

   usedtx:  specifies if the decoder prefers the use of DTX.  Possible
      values are 1 and 0.  If no value is specified, the default is 0.

   Encoding considerations:

      The Opus media type is framed and consists of binary data
      according to Section 4.8 in [RFC4288].

   Security considerations:

      See Section 7 of this document.

   Interoperability considerations: none

   Published specification: none

   Applications that use this media type:

      Any application that requires the transport of speech or audio
      data can use this media type.  Examples include, but are not
      limited to, audio and video conferencing, Voice over IP, and media
      streaming.

   Person & email address to contact for further information:

      SILK Support silksupport@skype.net
      Jean-Marc Valin jmvalin@jmvalin.ca

   Intended usage: COMMON

   Restrictions on usage:

      For transfer over RTP, the RTP payload format (Section 4 of this
      document) SHALL be used.

   Author:

      Julian Spittka jspittka@gmail.com

      Koen Vos koenvos74@gmail.com

      Jean-Marc Valin jmvalin@jmvalin.ca

   Change controller: TBD

6.2.  Mapping to SDP Parameters

   The information described in the media type specification has a
   specific mapping to fields in the Session Description Protocol (SDP)
   [RFC4566], which is commonly used to describe RTP sessions.  When SDP
   is used to specify sessions employing the Opus codec, the mapping is
   as follows:

   o  The media type ("audio") goes in SDP "m=" as the media name.

   o  The media subtype ("opus") goes in SDP "a=rtpmap" as the encoding
      name.  The RTP clock rate in "a=rtpmap" MUST be 48000 and the
      number of channels MUST be 2.

   o  The OPTIONAL media type parameters "ptime" and "maxptime" are
      mapped to "a=ptime" and "a=maxptime" attributes, respectively, in
      the SDP.

   o  The OPTIONAL media type parameters "maxaveragebitrate",
      "maxplaybackrate", "minptime", "stereo", "cbr", "useinbandfec",
      and "usedtx", when present, MUST be included in the "a=fmtp"
      attribute in the SDP, expressed as a media type string in the form
      of a semicolon-separated list of parameter=value pairs (e.g.,
      maxaveragebitrate=20000).  They MUST NOT be specified in an SSRC-
      specific "fmtp" source-level attribute (as defined in Section 6.3
      of [RFC5576]).

   o  The OPTIONAL media type parameters "sprop-maxcapturerate" and
      "sprop-stereo" MAY be mapped to the "a=fmtp" SDP attribute by
      copying them directly from the media type parameter string as part
      of the semicolon-separated list of parameter=value pairs (e.g.,
      sprop-stereo=1).  These same OPTIONAL media type parameters MAY
      also be specified using an SSRC-specific "fmtp" source-level
      attribute as described in Section 6.3 of [RFC5576].  They MAY be
      specified in both places, in which case the parameter in the
      source-level attribute overrides the one found on the "a=fmtp"
      line.  The value of any parameter which is not specified in a
      source-level source attribute MUST be taken from the "a=fmtp"
      line, if it is present there.

   Below are some examples of SDP session descriptions for Opus:

   Example 1: Standard mono session with 48000 Hz clock rate

       m=audio 54312 RTP/AVP 101
       a=rtpmap:101 opus/48000/2

   Example 2: 16000 Hz clock rate, maximum packet size of 40 ms,
   recommended packet size of 40 ms, maximum average bitrate of 20000
   bps, prefers to receive stereo but only plans to send mono, FEC is
   desired, DTX is not desired

       m=audio 54312 RTP/AVP 101
       a=rtpmap:101 opus/48000/2
       a=fmtp:101 maxplaybackrate=16000; sprop-maxcapturerate=16000;
       maxaveragebitrate=20000; stereo=1; useinbandfec=1; usedtx=0
       a=ptime:40
       a=maxptime:40

   Example 3: Two-way full-band stereo preferred

       m=audio 54312 RTP/AVP 101
       a=rtpmap:101 opus/48000/2
       a=fmtp:101 stereo=1; sprop-stereo=1

6.2.1.  Offer-Answer Model Considerations for Opus

   When using the offer-answer procedure described in [RFC3264] to
   negotiate the use of Opus, the following considerations apply:

   o  Opus supports several clock rates.  For signaling purposes only
      the highest, i.e. 48000, is used.  The actual clock rate of the
      corresponding media is signaled inside the payload and is not
      restricted by this payload format description.  The decoder MUST
      be capable of decoding every received clock rate.  An example is
      shown below:

          m=audio 54312 RTP/AVP 100
          a=rtpmap:100 opus/48000/2

   o  The "ptime" and "maxptime" parameters are unidirectional receive-
      only parameters and typically will not compromise
      interoperability; however, some values might cause application
      performance to suffer.  [RFC3264] defines the SDP offer-answer
      handling of the "ptime" parameter.  The "maxptime" parameter MUST
      be handled in the same way.

   o  The "minptime" parameter is a unidirectional receive-only
      parameter and typically will not compromise interoperability;
      however, some values might cause application performance to suffer
      and ought to be used with care.

   o  The "maxplaybackrate" parameter is a unidirectional receive-only
      parameter that reflects limitations of the local receiver.  When
      sending to a single destination, a sender MUST NOT use an audio
      bandwidth higher than necessary to make full use of audio sampled
      at a sampling rate of "maxplaybackrate".  Gateways or senders that
      are sending the same encoded audio to multiple destinations SHOULD
      NOT use an audio bandwidth higher than necessary to represent
      audio sampled at "maxplaybackrate", as this would lead to
      inefficient use of network resources.  The "maxplaybackrate"
      parameter does not affect interoperability.  Also, this parameter
      SHOULD NOT be used to adjust the audio bandwidth as a function of
      the bitrate, as this is the responsibility of the Opus encoder
      implementation.

   o  The "maxaveragebitrate" parameter is a unidirectional receive-only
      parameter that reflects limitations of the local receiver.
      The sender of the other side MUST NOT send with an average bitrate
      higher than "maxaveragebitrate" as it might overload the network
      and/or receiver. The "maxaveragebitrate" parameter typically will
      not compromise interoperability; however, some values might cause
      application performance to suffer, and it ought to be set with
      care.
   o  The "sprop-maxcapturerate" and "sprop-stereo" parameters are
      unidirectional sender-only parameters that reflect limitations of
      the sender side. They allow the receiver to set up a
      reduced-complexity audio processing pipeline if the sender is not
      planning to use the full range of Opus's capabilities. Neither
      "sprop-maxcapturerate" nor "sprop-stereo" affects
      interoperability, and the receiver MUST be capable of receiving
      any signal.
   o  The "stereo" parameter is a unidirectional receive-only parameter.
      When sending to a single destination, a sender MUST NOT use stereo
      when "stereo" is 0. Gateways or senders that are sending the same
      encoded audio to multiple destinations SHOULD NOT use stereo when
      "stereo" is 0, as this would lead to inefficient use of network
      resources. The "stereo" parameter does not affect
      interoperability.
   o  The "cbr" parameter is a unidirectional receive-only parameter.
   o  The "useinbandfec" parameter is a unidirectional receive-only
      parameter.
   o  The "usedtx" parameter is a unidirectional receive-only parameter.
   o  Any unknown parameter in an offer MUST be ignored by the receiver
      and MUST be removed from the answer.

6.2.2. Declarative SDP Considerations for Opus

   For declarative use of SDP with Opus, such as in the Session
   Announcement Protocol (SAP) [RFC2974] and RTSP [RFC2326], the
   following needs to be considered:

   o  The values for "maxptime", "ptime", "minptime", "maxplaybackrate",
      and "maxaveragebitrate" ought to be selected carefully to ensure
      that reasonable performance can be achieved for the participants
      of a session.
   o  The values for "maxptime", "ptime", and "minptime" of the payload
      format configuration are recommendations by the decoding side to
      ensure the best performance for the decoder. The decoder MUST be
      capable of accepting any allowed packet size to ensure maximum
      compatibility.
   o  All other parameters of the payload format configuration are
      declarative, and a participant MUST use the configurations that
      are provided for the session. More than one configuration can be
      provided if necessary by declaring multiple RTP payload types;
      however, the number of types ought to be kept small.

7. Security Considerations

   All RTP packets using the payload format defined in this
   specification are subject to the general security considerations
   discussed in the RTP specification [RFC3550] and in any applicable
   RTP profile, e.g., [RFC3711] or [RFC3551].

   This payload format transports Opus encoded speech or audio data.
   Hence, security issues include confidentiality, integrity protection,
   and authentication of the speech or audio itself. The Opus payload
   format does not have any built-in security mechanisms. Any suitable
   external mechanisms, such as SRTP [RFC3711], MAY be used.

   This payload format and the Opus encoding do not exhibit any
   significant non-uniformity in the receiver-end computational load and
   thus are unlikely to pose a denial-of-service threat due to the
   receipt of pathological datagrams.

8. Acknowledgements

   TBD

9. Normative References

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119, March 1997.

   [RFC2326]  Schulzrinne, H., Rao, A., and R. Lanphier, "Real Time
              Streaming Protocol (RTSP)", RFC 2326, April 1998.

   [RFC2974]  Handley, M., Perkins, C., and E. Whelan, "Session
              Announcement Protocol", RFC 2974, October 2000.

   [RFC3264]  Rosenberg, J. and H. Schulzrinne, "An Offer/Answer Model
              with Session Description Protocol (SDP)", RFC 3264, June
              2002.

   [RFC3389]  Zopf, R., "Real-time Transport Protocol (RTP) Payload for
              Comfort Noise (CN)", RFC 3389, September 2002.

   [RFC3550]  Schulzrinne, H., Casner, S., Frederick, R., and V.
              Jacobson, "RTP: A Transport Protocol for Real-Time
              Applications", STD 64, RFC 3550, July 2003.

   [RFC3551]  Schulzrinne, H. and S. Casner, "RTP Profile for Audio and
              Video Conferences with Minimal Control", STD 65, RFC 3551,
              July 2003.

   [RFC3711]  Baugher, M., McGrew, D., Naslund, M., Carrara, E., and K.
              Norrman, "The Secure Real-time Transport Protocol (SRTP)",
              RFC 3711, March 2004.

   [RFC4288]  Freed, N. and J. Klensin, "Media Type Specifications and
              Registration Procedures", RFC 4288, December 2005.

   [RFC4566]  Handley, M., Jacobson, V., and C. Perkins, "SDP: Session
              Description Protocol", RFC 4566, July 2006.

   [RFC4855]  Casner, S., "Media Type Registration of RTP Payload
              Formats", RFC 4855, February 2007.

   [RFC5576]  Lennox, J., Ott, J., and T. Schierl, "Source-Specific
              Media Attributes in the Session Description Protocol
              (SDP)", RFC 5576, June 2009.

   [RFC6562]  Perkins, C. and JM. Valin, "Guidelines for the Use of
              Variable Bit Rate Audio with Secure RTP", RFC 6562, March
              2012.

   [RFC6716]  Valin, JM., Vos, K., and T. Terriberry, "Definition of the
              Opus Audio Codec", RFC 6716, September 2012.
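
Appendix A. Illustrative Handling of Receive-Only Parameters

   This appendix is informative. The offer-answer considerations that
   precede Section 6.2.2 describe how a sender reacts to the receive-
   only parameters announced by its peer. The following Python sketch
   shows one possible way to apply three of them ("maxplaybackrate",
   "stereo", and "maxaveragebitrate") and to drop unknown parameters.
   The function names, the returned dictionary layout, the set of known
   parameter names, and the defaults used for absent parameters (48000,
   mono, no bitrate cap) are assumptions made for this illustration;
   the mapping from sampling rate to Opus audio bandwidth follows
   [RFC6716].

```python
# Illustrative sketch only. All names below are hypothetical helpers,
# not part of this payload format or of any Opus library.

# Parameter names discussed in the offer-answer considerations
# (assumed here to be the complete set of known fmtp parameters).
KNOWN_FMTP = {
    "minptime", "maxplaybackrate", "maxaveragebitrate",
    "sprop-maxcapturerate", "sprop-stereo", "stereo",
    "cbr", "useinbandfec", "usedtx",
}

# Opus audio bandwidths (RFC 6716) with the sampling rate each one
# fully represents, in ascending order.
BANDWIDTH_BY_RATE = [
    (8000, "NB"), (12000, "MB"), (16000, "WB"),
    (24000, "SWB"), (48000, "FB"),
]


def parse_fmtp(line):
    """Parse 'a=fmtp:<pt> k=v; k=v' into (payload_type, {k: v})."""
    prefix = "a=fmtp:"
    if not line.startswith(prefix):
        raise ValueError("not an fmtp attribute")
    pt, _, rest = line[len(prefix):].partition(" ")
    params = {}
    for item in rest.split(";"):
        key, sep, value = item.strip().partition("=")
        if sep:  # skip empty or malformed items
            params[key] = value
    return int(pt), params


def strip_unknown(params):
    """Ignore unknown parameters, as required for offers."""
    return {k: v for k, v in params.items() if k in KNOWN_FMTP}


def sender_limits(receiver_params):
    """Derive limits a sender applies from the peer's parameters."""
    # Assumed defaults when a parameter is absent.
    max_rate = int(receiver_params.get("maxplaybackrate", 48000))
    stereo = receiver_params.get("stereo", "0") == "1"
    cap = receiver_params.get("maxaveragebitrate")

    # Narrowest bandwidth that still makes full use of audio sampled
    # at max_rate; the sender should not exceed it.
    bandwidth = "FB"
    for rate, name in BANDWIDTH_BY_RATE:
        if rate >= max_rate:
            bandwidth = name
            break
    return {
        "max_bandwidth": bandwidth,
        "channels": 2 if stereo else 1,
        "max_average_bitrate": int(cap) if cap is not None else None,
    }
```

   For example, parsing "a=fmtp:100 maxplaybackrate=16000; stereo=0"
   with parse_fmtp() and passing the result through strip_unknown() and
   sender_limits() limits the sender to wideband mono with no explicit
   bitrate cap. A real implementation would also validate value ranges,
   which this sketch omits.
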

Authors' Addresses

   Julian Spittka

   Email: jspittka@gmail.com

   Koen Vos
   vocTone

   Email: koenvos74@gmail.com

   Jean-Marc Valin
   Mozilla
   331 E. Evelyn Avenue
   Mountain View, CA 94041
   USA

   Email: jmvalin@jmvalin.ca