idnits 2.17.1

draft-spittka-payload-rtp-opus-01.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

     No issues found here.

  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == The copyright year in the IETF Trust and authors Copyright Line does not
     match the current year

  -- The document date (July 9, 2012) is 4309 days in the past. Is this
     intentional?

  Checking references for intended status: Informational
  ----------------------------------------------------------------------------

  == Missing Reference: 'Opus' is mentioned on line 694, but not defined

  ** Obsolete normative reference: RFC 2326 (Obsoleted by RFC 7826)

  ** Obsolete normative reference: RFC 4288 (Obsoleted by RFC 6838)

  ** Obsolete normative reference: RFC 4566 (Obsoleted by RFC 8866)

     Summary: 3 errors (**), 0 flaws (~~), 2 warnings (==), 1 comment (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.

--------------------------------------------------------------------------------

2 Network Working Group                                       J. Spittka
3 Internet-Draft                                                  K. Vos
4 Intended status: Informational                 Skype Technologies S.A.
5 Expires: January 10, 2013                                          JM.
Valin
6                                                                Mozilla
7                                                           July 9, 2012

9            RTP Payload Format for Opus Speech and Audio Codec
10                 draft-spittka-payload-rtp-opus-01.txt

12 Abstract

14    This document defines the Real-time Transport Protocol (RTP) payload
15    format for packetization of Opus encoded speech and audio data that
16    is essential to integrate the codec in the most compatible way.
17    Further, media type registrations are described for the RTP payload
18    format.

20 Status of this Memo

22    This Internet-Draft is submitted in full conformance with the
23    provisions of BCP 78 and BCP 79.

25    Internet-Drafts are working documents of the Internet Engineering
26    Task Force (IETF). Note that other groups may also distribute
27    working documents as Internet-Drafts. The list of current Internet-
28    Drafts is at http://datatracker.ietf.org/drafts/current/.

30    Internet-Drafts are draft documents valid for a maximum of six months
31    and may be updated, replaced, or obsoleted by other documents at any
32    time. It is inappropriate to use Internet-Drafts as reference
33    material or to cite them other than as "work in progress."

35    This Internet-Draft will expire on January 10, 2013.

37 Copyright Notice

39    Copyright (c) 2012 IETF Trust and the persons identified as the
40    document authors. All rights reserved.

42    This document is subject to BCP 78 and the IETF Trust's Legal
43    Provisions Relating to IETF Documents
44    (http://trustee.ietf.org/license-info) in effect on the date of
45    publication of this document. Please review these documents
46    carefully, as they describe your rights and restrictions with respect
47    to this document. Code Components extracted from this document must
48    include Simplified BSD License text as described in Section 4.e of
49    the Trust Legal Provisions and are provided without warranty as
50    described in the Simplified BSD License.

52 Table of Contents

54    1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 3
55    2. Conventions, Definitions and Acronyms used in this document .
4
56      2.1. Audio Bandwidth . . . . . . . . . . . . . . . . . . . . . 4
57    3. Opus Codec . . . . . . . . . . . . . . . . . . . . . . . . . . 5
58      3.1. Network Bandwidth . . . . . . . . . . . . . . . . . . . . 5
59        3.1.1. Recommended Bitrate . . . . . . . . . . . . . . . . . 5
60        3.1.2. Variable versus Constant Bit Rate . . . . . . . . . . 5
61        3.1.3. Discontinuous Transmission (DTX) . . . . . . . . . . . 6
62      3.2. Complexity . . . . . . . . . . . . . . . . . . . . . . . . 6
63      3.3. Forward Error Correction (FEC) . . . . . . . . . . . . . . 6
64      3.4. Stereo Operation . . . . . . . . . . . . . . . . . . . . . 7
65    4. Opus RTP Payload Format . . . . . . . . . . . . . . . . . . . 8
66      4.1. RTP Header Usage . . . . . . . . . . . . . . . . . . . . . 8
67      4.2. Payload Structure . . . . . . . . . . . . . . . . . . . . 9
68    5. Congestion Control . . . . . . . . . . . . . . . . . . . . . . 11
69    6. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 12
70      6.1. Opus Media Type Registration . . . . . . . . . . . . . . . 12
71      6.2. Mapping to SDP Parameters . . . . . . . . . . . . . . . . 15
72        6.2.1. Offer-Answer Model Considerations for Opus . . . . . . 16
73        6.2.2. Declarative SDP Considerations for Opus . . . . . . . 17
74    7. Security Considerations . . . . . . . . . . . . . . . . . . . 18
75    8. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . . 19
76    9. Normative References . . . . . . . . . . . . . . . . . . . . . 20
77    A. Informational References . . . . . . . . . . . . . . . . . . . 21
78    Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . . 22

80 1. Introduction

82    The Opus codec is a speech and audio codec developed within the IETF
83    Internet Wideband Audio Codec working group [codec]. The codec has a
84    very low algorithmic delay and is highly scalable in terms of
85    audio bandwidth, bitrate, and complexity.
Further, it provides
86    different modes to efficiently encode speech signals as well as music
87    signals, making it a suitable codec for various applications
88    using the Internet or similar networks.

90    This document defines the Real-time Transport Protocol (RTP)
91    [RFC3550] payload format for packetization of Opus encoded speech and
92    audio data that is essential to integrate the Opus codec in the most
93    compatible way. Further, media type registrations are described for
94    the RTP payload format. More information on the Opus codec can be
95    found in the IETF draft [Opus].

97 2. Conventions, Definitions and Acronyms used in this document

99    The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
100    "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
101    document are to be interpreted as described in [RFC2119].

103    CPU: Central Processing Unit
104    IP: Internet Protocol
105    PSTN: Public Switched Telephone Network
106    samples: Speech or audio samples
107    SDP: Session Description Protocol

109 2.1. Audio Bandwidth

111    Throughout this document, we refer to the following definitions:

113    +--------------+----------------+----------------+---------------+
114    | Abbreviation | Name           | Bandwidth (Hz) | Sampling (Hz) |
115    +--------------+----------------+----------------+---------------+
116    | nb           | Narrowband     | 0 - 4000       | 8000          |
117    |              |                |                |               |
118    | mb           | Mediumband     | 0 - 6000       | 12000         |
119    |              |                |                |               |
120    | wb           | Wideband       | 0 - 8000       | 16000         |
121    |              |                |                |               |
122    | swb          | Super-wideband | 0 - 12000      | 24000         |
123    |              |                |                |               |
124    | fb           | Fullband       | 0 - 20000      | 48000         |
125    +--------------+----------------+----------------+---------------+

127                         Audio bandwidth naming

129                                Table 1

131 3. Opus Codec

133    The Opus [Opus] speech and audio codec has been developed to encode
134    speech signals as well as audio signals.
Two different modes, a
135    voice mode or an audio mode, may be chosen to allow the most
136    efficient coding dependent on the type of input signal, the sampling
137    frequency of the input signal, and the specific application.

139    The voice mode efficiently encodes voice signals at lower bit
140    rates, while the audio mode is optimized for audio signals at
141    medium and higher bitrates.

143    The Opus speech and audio codec is highly scalable in terms of audio
144    bandwidth, bitrate, and complexity. Further, Opus supports the
145    transmission of stereo signals.

147 3.1. Network Bandwidth

149    Opus supports all bitrates from 6 kb/s to 510 kb/s. The bitrate can
150    be changed dynamically within that range. All other parameters being
151    equal, higher bitrate results in higher quality.

153 3.1.1. Recommended Bitrate

155    For a frame size of 20 ms, these are the bitrate "sweet spots" for
156    Opus in various configurations:
157    o 8-12 kb/s for NB speech,
158    o 16-20 kb/s for WB speech,
159    o 28-40 kb/s for FB speech,
160    o 48-64 kb/s for FB mono music, and
161    o 64-128 kb/s for FB stereo music.

163 3.1.2. Variable versus Constant Bit Rate

165    For the same average bitrate, variable bitrate (VBR) can achieve
166    higher quality than constant bitrate (CBR). For the majority of
167    voice transmission applications, VBR is the best choice. One
168    potential reason for choosing CBR is the potential information leak
169    that _may_ occur when encrypting the compressed stream. See
170    [RFC6562] for guidelines on when VBR is appropriate for encrypted
171    audio communications. In the case where an existing VBR stream needs
172    to be converted to CBR for security reasons, the Opus padding
173    mechanism described in [Opus] is the RECOMMENDED way to achieve
174    padding because the RTP padding bit is unencrypted.

176    The bitrate can be adjusted at any point in time. To avoid
177    congestion, the average bitrate SHOULD be adjusted to the available
178    network capacity.
If no target bitrate is specified, the average
179    bitrate may go up to the highest bitrate specified in Section 3.1.1.

181 3.1.3. Discontinuous Transmission (DTX)

183    The Opus codec may, as described in Section 3.1.2, be operated with
184    an adaptive bitrate. In that case, the bitrate will automatically be
185    reduced for certain input signals like periods of silence. During
186    continuous transmission the bitrate will be reduced when the input
187    signal allows it, but the transmission to the receiver itself
188    will never be interrupted. Therefore, the received signal will
189    maintain the same high level of quality over the full duration of a
190    transmission while minimizing the average bit rate over time.

192    In cases where the bitrate of Opus needs to be reduced even further
193    or in cases where only constant bitrate is available, the Opus
194    encoder may be set to use discontinuous transmission (DTX), where
195    parts of the encoded signal that correspond to periods of silence in
196    the input speech or audio signal are not transmitted to the receiver.

198    On the receiving side, the non-transmitted parts will be handled by a
199    frame loss concealment unit in the Opus decoder which generates a
200    comfort noise signal to replace the non-transmitted parts of the
201    speech or audio signal.

203    The DTX mode of Opus will have a slightly lower speech or audio
204    quality than the continuous mode. Therefore, it is RECOMMENDED to
205    use Opus in the continuous mode unless constraints on network capacity
206    are severe. The DTX mode can be engaged for operation with either
207    adaptive or constant bitrate.

209 3.2. Complexity

211    Complexity can be scaled to optimize for CPU resources in real-time,
212    mostly as a trade-off between audio quality and bitrate. Also,
213    different modes of Opus have different complexity.

215 3.3.
Forward Error Correction (FEC)

217    The voice mode of Opus allows for "in-band" forward error correction
218    (FEC) data to be embedded into the bit stream of Opus. This FEC
219    scheme adds redundant information about the previous packet (n-1) to
220    the current output packet n. For each frame, the encoder decides
221    whether to use FEC based on (1) an externally-provided estimate of
222    the channel's packet loss rate; (2) an externally-provided estimate
223    of the channel's capacity; (3) the sensitivity of the audio or speech
224    signal to packet loss; (4) whether the receiving decoder has
225    indicated it can take advantage of "in-band" FEC information. The
226    decision to send "in-band" FEC information is entirely controlled by
227    the encoder and therefore no special precautions for the payload have
228    to be taken.

230    On the receiving side, the decoder can take advantage of this
231    additional information when, in case of a packet loss, the next
232    packet is available. In order to use the FEC data, the jitter buffer
233    needs to provide access to payloads with the FEC data. The decoder
234    API function has a flag to indicate that a FEC frame rather than a
235    regular frame should be decoded. If no FEC data is available for the
236    current frame, the decoder considers the frame lost and invokes
237    the frame loss concealment.

239    If the FEC scheme is not implemented on the receiving side, FEC
240    SHOULD NOT be used, as it leads to an inefficient usage of network
241    resources. Decoder support for FEC SHOULD be indicated at the time a
242    session is set up.

244 3.4. Stereo Operation

246    Opus allows for transmission of stereo audio signals. This operation
247    is signaled in-band in the Opus payload and no special arrangement is
248    required in the payload format. Any implementation of the Opus
249    decoder MUST be capable of receiving stereo signals.
251    If a decoder cannot take advantage of the benefits of a stereo
252    signal, this SHOULD be indicated at the time a session is set up. In
253    that case the sending side SHOULD NOT send stereo signals as it leads
254    to an inefficient usage of the network.

256 4. Opus RTP Payload Format

258    The payload format for Opus consists of the RTP header and Opus
259    payload data.

261 4.1. RTP Header Usage

263    The format of the RTP header is specified in [RFC3550]. The Opus
264    payload format uses the fields of the RTP header consistent with this
265    specification.

267    The payload length of Opus is an integer number of octets and
268    therefore no padding is required. The payload MAY be padded by an
269    integer number of octets according to [RFC3550].

271    The marker bit (M) of the RTP header has no function in combination
272    with Opus and MAY be ignored.

274    The RTP payload type for Opus has not been assigned statically and is
275    expected to be assigned dynamically.

277    The receiving side MUST be prepared to receive duplicates of RTP
278    packets. Only one of those payloads MUST be provided to the Opus
279    decoder for decoding and others MUST be discarded.

281    Opus supports 5 different audio bandwidths which may be adjusted
282    during a call. The RTP timestamp clock frequency is
283    defined as the highest supported sampling frequency of Opus, i.e.
284    48000 Hz, for all modes and sampling rates of Opus. The unit for the
285    timestamp is samples per single (mono) channel. The RTP timestamp
286    corresponds to the sample time of the first encoded sample in the
287    encoded frame. For sampling rates lower than 48000 Hz the number of
288    samples has to be multiplied by the multiplier given in Table 2
289    to determine the RTP timestamp.
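   As a minimal sketch of the timestamp rule described above, the
   following Python fragment advances an RTP timestamp by one packet.
   The function names (timestamp_multiplier, next_timestamp) are
   hypothetical and not part of this specification or of any Opus API;
   the sketch assumes the multiplier is simply 48000 divided by the
   sampling frequency, which matches the values listed in Table 2, and
   that the timestamp wraps modulo 2^32 as for any RTP stream.

```python
RTP_CLOCK_HZ = 48000  # RTP timestamp clock rate used for all Opus modes
SUPPORTED_FS_HZ = (8000, 12000, 16000, 24000, 48000)


def timestamp_multiplier(fs_hz: int) -> int:
    """Factor converting a sample count at fs_hz into 48000 Hz clock ticks."""
    if fs_hz not in SUPPORTED_FS_HZ:
        raise ValueError(f"unsupported Opus sampling rate: {fs_hz}")
    return RTP_CLOCK_HZ // fs_hz


def next_timestamp(prev_ts: int, samples_per_channel: int, fs_hz: int) -> int:
    """Advance a 32-bit RTP timestamp by one packet's worth of audio.

    samples_per_channel counts samples of a single (mono) channel,
    measured at the codec's current sampling frequency fs_hz.
    """
    ticks = samples_per_channel * timestamp_multiplier(fs_hz)
    return (prev_ts + ticks) & 0xFFFFFFFF  # RTP timestamps wrap at 2^32
```

   For example, a 20 ms packet at 16000 Hz carries 320 samples per
   channel, so the timestamp advances by 320 * 3 = 960 ticks.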
291    +---------+------------+
292    | fs (Hz) | Multiplier |
293    +---------+------------+
294    | 8000    | 6          |
295    |         |            |
296    | 12000   | 4          |
297    |         |            |
298    | 16000   | 3          |
299    |         |            |
300    | 24000   | 2          |
301    |         |            |
302    | 48000   | 1          |
303    +---------+------------+

305    fs specifies the audio sampling frequency in Hertz (Hz); Multiplier
306    is the value that the number of samples has to be multiplied by to
307    calculate the RTP timestamp.

309                                Table 2

311 4.2. Payload Structure

313    The Opus encoder can be set to output encoded frames representing
314    2.5, 5, 10, 20, 40, or 60 ms of speech or audio data. Further, an
315    arbitrary number of frames can be combined into a packet. The
316    maximum packet length is limited to the amount of encoded data
317    representing 120 ms of speech or audio data. The packetization of
318    encoded data is done purely by the Opus encoder and therefore only
319    one packet output from the Opus encoder MUST be used as a payload.

321    Figure 1 shows the structure combined with the RTP header.

323    +----------+--------------+
324    |RTP Header| Opus Payload |
325    +----------+--------------+

327    Figure 1: Payload Structure with RTP header

329    Table 3 shows supported frame sizes for different modes and sampling
330    rates of Opus and how the timestamp needs to be incremented for
331    packetization.
333    +---------+-----------------+-----+-----+-----+-----+------+------+
334    | Mode    | fs              | 2.5 | 5   | 10  | 20  | 40   | 60   |
335    +---------+-----------------+-----+-----+-----+-----+------+------+
336    | ts incr | all             | 120 | 240 | 480 | 960 | 1920 | 2880 |
337    |         |                 |     |     |     |     |      |      |
338    | voice   | nb/mb/wb/swb/fb |     |     | x   | x   | x    | x    |
339    |         |                 |     |     |     |     |      |      |
340    | audio   | nb/wb/swb/fb    | x   | x   | x   | x   |      |      |
341    +---------+-----------------+-----+-----+-----+-----+------+------+

343    Mode specifies the Opus mode of operation; fs specifies the supported
344    audio sampling frequencies, given as the abbreviations from Table 1;
345    2.5, 5, 10, 20, 40, and 60 represent the duration in milliseconds of
346    encoded speech or audio data in a packet; ts incr specifies the
347    amount by which the timestamp needs to be incremented for the
348    respective packet duration. For multiple frames in a packet these
349    values have to be multiplied by the respective number of frames.

350                                Table 3

352 5. Congestion Control

354    The adaptive nature of the Opus codec allows for efficient
355    congestion control.

357    The target bitrate of Opus can be adjusted at any point in time,
358    thus allowing for efficient congestion control. Furthermore, the
359    amount of encoded speech or audio data encoded in a single packet can
360    be used for congestion control since the transmission rate is
361    inversely proportional to these frame sizes. A lower packet
362    transmission rate reduces the amount of header overhead but at the
363    same time increases latency and error sensitivity and should be used
364    with care.

366    It is RECOMMENDED that congestion control is applied during the
367    transmission of Opus encoded data.

369 6. IANA Considerations

371    One media subtype (audio/opus) has been defined and registered as
372    described in the following section.

374 6.1. Opus Media Type Registration

376    Media type registration is done according to [RFC4288] and [RFC4855].
378    Type name: audio

380    Subtype name: opus

382    Required parameters:

384    rate: the RTP timestamp is incremented with a 48000 Hz clock
385       rate for all modes of Opus and all sampling frequencies. For
386       audio sampling rates other than 48000 Hz the timestamp has to be
387       scaled to 48000 Hz according to Table 2.

389    Optional parameters:

391    maxcodedaudiobandwidth: a hint about the maximum audio bandwidth
392       that the receiver is capable of rendering. The decoder MUST be
393       capable of decoding any audio bandwidth but due to hardware
394       limitations only signals up to the specified audio bandwidth can
395       be processed. Sending signals with higher audio bandwidth results
396       in higher than necessary network usage and encoding complexity, so
397       an encoder SHOULD NOT encode frequencies above the audio bandwidth
398       specified by maxcodedaudiobandwidth. Possible values are nb, mb,
399       wb, swb, fb. By default, the receiver is assumed to have no
400       limitations, i.e. fb.

402    maxptime: the decoder's maximum length of time in milliseconds
403       rounded up to the next full integer value represented by the media
404       in a packet that can be encapsulated in a received packet
405       according to Section 6 of [RFC4566]. Possible values are 3, 5,
406       10, 20, 40, and 60 or an arbitrary multiple of Opus frame sizes
407       rounded up to the next full integer value up to a maximum value of
408       120 as defined in Section 4. If no value is specified, 120 is
409       assumed as default. This value is a recommendation by the
410       decoding side to ensure the best performance for the decoder. The
411       decoder MUST be capable of accepting any allowed packet sizes to
412       ensure maximum compatibility.

414    ptime: the decoder's recommended length of time in milliseconds
415       rounded up to the next full integer value represented by the media
416       in a packet according to Section 6 of [RFC4566].
Possible values
417       are 3, 5, 10, 20, 40, or 60 or an arbitrary multiple of Opus frame
418       sizes rounded up to the next full integer value up to a maximum
419       value of 120 as defined in Section 4. If no value is specified,
420       20 is assumed as default. If ptime is greater than maxptime,
421       ptime MUST be ignored. This parameter MAY be changed during a
422       session. This value is a recommendation by the decoding side to
423       ensure the best performance for the decoder. The decoder MUST be
424       capable of accepting any allowed packet sizes to ensure maximum
425       compatibility.

427    minptime: the decoder's minimum length of time in milliseconds
428       rounded up to the next full integer value represented by the media
429       in a packet that SHOULD be encapsulated in a received packet
430       according to Section 6 of [RFC4566]. Possible values are 3, 5,
431       10, 20, 40, and 60 or an arbitrary multiple of Opus frame sizes
432       rounded up to the next full integer value up to a maximum value of
433       120 as defined in Section 4. If no value is specified, 3 is
434       assumed as default. This value is a recommendation by the
435       decoding side to ensure the best performance for the decoder. The
436       decoder MUST be capable of accepting any allowed packet sizes to
437       ensure maximum compatibility.

439    maxaveragebitrate: specifies the maximum average receive bitrate of
440       a session in bits per second (b/s). The actual value of the
441       bitrate may vary as it is dependent on the characteristics of the
442       media in a packet. Note that the maximum average bitrate MAY be
443       modified dynamically during a session. Any positive integer is
444       allowed but values outside the range between 6000 and 510000
445       SHOULD be ignored. If no value is specified, the maximum value
446       specified in Section 3.1.1 for the corresponding mode of Opus and
447       corresponding maxcodedaudiobandwidth will be the default.

449    stereo: specifies whether the decoder prefers receiving stereo or
450       mono signals.
Possible values are 1 and 0 where 1 specifies that
451       stereo signals are preferred and 0 specifies that only mono
452       signals are preferred. Independent of the stereo parameter every
453       receiver MUST be able to receive and decode stereo signals but
454       sending stereo signals to a receiver that signaled a preference
455       for mono signals may result in higher than necessary network
456       utilisation and encoding complexity. If no value is specified,
457       mono is assumed (stereo=0).

459    cbr: specifies if the decoder prefers the use of a constant bitrate
460       versus variable bitrate. Possible values are 1 and 0 where 1
461       specifies constant bitrate and 0 specifies variable bitrate. If
462       no value is specified, cbr is assumed to be 0. Note that the
463       maximum average bitrate may still be changed, e.g. to adapt to
464       changing network conditions.

466    useinbandfec: specifies that Opus in-band FEC is supported by the
467       decoder and MAY be used during a session. Possible values are 1
468       and 0. It is RECOMMENDED to provide 0 in case FEC is not
469       implemented on the receiving side. If no value is specified,
470       useinbandfec is assumed to be 1.

472    usedtx: specifies if the decoder prefers the use of DTX. Possible
473       values are 1 and 0. If no value is specified, usedtx is assumed
474       to be 0.

476    Encoding considerations:

478       Opus media type is framed and consists of binary data according to
479       Section 4.8 in [RFC4288].

481    Security considerations:

483       See Section 7 of this document.

485    Interoperability considerations: none

487    Published specification: none

489    Applications that use this media type:

491       Any application that requires the transport of speech or audio
492       data may use this media type. Examples include, but are not
493       limited to, audio and video conferencing, Voice over IP, and
      media streaming.
495    Person & email address to contact for further information:

497       SILK Support silksupport@skype.net
498       Jean-Marc Valin jmvalin@jmvalin.ca

500    Intended usage: COMMON
501    Restrictions on usage:

503       For transfer over RTP, the RTP payload format (Section 4 of this
504       document) SHALL be used.

506    Author:

508       Julian Spittka julian.spittka@skype.net

510       Koen Vos koen.vos@skype.net

512       Jean-Marc Valin jmvalin@jmvalin.ca

514    Change controller: TBD

516 6.2. Mapping to SDP Parameters

518    The information described in the media type specification has a
519    specific mapping to fields in the Session Description Protocol (SDP)
520    [RFC4566], which is commonly used to describe RTP sessions. When SDP
521    is used to specify sessions employing the Opus codec, the mapping is
522    as follows:

524    o The media type ("audio") goes in SDP "m=" as the media name.
525    o The media subtype ("opus") goes in SDP "a=rtpmap" as the encoding
526      name. The RTP clock rate in "a=rtpmap" MUST be mapped to the
527      required media type parameter "rate".
528    o The optional media type parameters "ptime" and "maxptime" are
529      mapped to "a=ptime" and "a=maxptime" attributes, respectively, in
530      the SDP.
531    o All remaining media type parameters are mapped to the "a=fmtp"
532      attribute in the SDP by copying them directly from the media type
533      parameter string as a semicolon-separated list of parameter=value
534      pairs (e.g. maxaveragebitrate=20000).
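   The mapping rules listed above can be sketched in a few lines of
   Python. This is only an illustration under stated assumptions: the
   function name opus_sdp_lines and its arguments are hypothetical, the
   caller is assumed to pass the optional media type parameters as a
   dictionary, and the "rate" parameter is fixed at 48000 as required
   for the "a=rtpmap" attribute. No validation of parameter names or
   values is attempted.

```python
def opus_sdp_lines(port, payload_type, params):
    """Build the SDP media lines for one Opus stream.

    params maps optional media type parameter names to values,
    e.g. {"maxaveragebitrate": 20000, "ptime": 40}.
    """
    lines = [
        f"m=audio {port} RTP/AVP {payload_type}",   # media type -> "m="
        f"a=rtpmap:{payload_type} opus/48000",      # subtype and clock rate
    ]
    fmtp = []
    for name, value in params.items():
        if name in ("ptime", "maxptime"):
            # These two parameters have dedicated SDP attributes.
            lines.append(f"a={name}:{value}")
        else:
            # Everything else is copied into "a=fmtp" as name=value.
            fmtp.append(f"{name}={value}")
    if fmtp:
        # Place the fmtp line directly after the rtpmap line.
        lines.insert(2, f"a=fmtp:{payload_type} " + "; ".join(fmtp))
    return lines
```

   Calling opus_sdp_lines(54312, 101, {"maxaveragebitrate": 20000,
   "ptime": 40}) yields an "m=" line, an "a=rtpmap" line, one "a=fmtp"
   line carrying maxaveragebitrate=20000, and an "a=ptime:40" line.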
536    Below are some examples of SDP session descriptions for Opus:

538    Example 1: Standard session with 48000 Hz clock rate

540       m=audio 54312 RTP/AVP 101
541       a=rtpmap:101 opus/48000

543    Example 2: Audio bandwidth limited to wideband (16000 Hz sampling
544    rate), maximum packet size of 40 ms, recommended packet size of
545    40 ms, maximum average bitrate of 20000 bps, stereo signals are
       preferred, FEC is allowed, DTX is not allowed

547       m=audio 54312 RTP/AVP 101
548       a=rtpmap:101 opus/48000
549       a=fmtp:101 maxcodedaudiobandwidth=wb; maxaveragebitrate=20000;
550       stereo=1; useinbandfec=1; usedtx=0
551       a=ptime:40
552       a=maxptime:40

554 6.2.1. Offer-Answer Model Considerations for Opus

556    When using the offer-answer procedure described in [RFC3264] to
557    negotiate the use of Opus, the following considerations apply:

559    o Opus supports several clock rates. For signaling purposes only
560      the highest, i.e. 48000, is used. The actual clock rate of the
561      corresponding media is signaled inside the payload and is not
562      subject to this payload format description. The decoder MUST be
563      capable of decoding every received clock rate. An example is shown
564      below:

566         m=audio 54312 RTP/AVP 100
567         a=rtpmap:100 opus/48000

569    o The parameters "ptime" and "maxptime" are unidirectional receive-
570      only parameters and typically will not compromise
571      interoperability; however, dependent on the set values of the
572      parameters the performance of the application may suffer.
573      [RFC3264] defines the SDP offer-answer handling of the "ptime"
574      parameter. The "maxptime" parameter MUST be handled in the same
575      way.
576    o The parameter "minptime" is a unidirectional receive-only
577      parameter and typically will not compromise interoperability;
578      however, dependent on the set value of the parameter the
579      performance of the application may suffer, so it should be set with
580      care.
581    o The parameter "maxcodedaudiobandwidth" is a unidirectional
582      receive-only parameter that reflects limitations of the local
583      receiver. The sender on the other side SHOULD NOT send with an
584      audio bandwidth higher than "maxcodedaudiobandwidth" as this would
585      lead to inefficient use of network resources. The
586      "maxcodedaudiobandwidth" parameter does not affect
587      interoperability. Also, this parameter SHOULD NOT be used to
588      adjust the audio bandwidth as a function of the bitrates, as this
589      is the responsibility of the Opus encoder implementation.
590    o The parameter "maxaveragebitrate" is a unidirectional receive-only
591      parameter that reflects limitations of the local receiver. The
592      sender on the other side MUST NOT send with an average bitrate
593      higher than "maxaveragebitrate" as it might overload the network
594      and/or receiver. The parameter "maxaveragebitrate" typically will
595      not compromise interoperability; however, dependent on the set
596      value of the parameter the performance of the application may
597      suffer, so it should be set with care.
598    o If the parameter "maxaveragebitrate" is below the range specified
599      in Section 3.1.1 the session MUST be rejected.
600    o The parameter "stereo" is a unidirectional receive-only parameter.
601    o The parameter "cbr" is a unidirectional receive-only parameter.
602    o The parameter "useinbandfec" is a unidirectional receive-only
603      parameter.
604    o The parameter "usedtx" is a unidirectional receive-only parameter.
605    o Any unknown parameter in an offer MUST be ignored by the receiver
606      and MUST be removed from the answer.

608 6.2.2.
Declarative SDP Considerations for Opus

610    For declarative use of SDP with Opus, such as in the Session
611    Announcement Protocol (SAP) [RFC2974] and RTSP [RFC2326], the
612    following needs to be considered:

614    o The values for "maxptime", "ptime", "minptime",
615      "maxcodedaudiobandwidth", and "maxaveragebitrate" should be
616      selected carefully to ensure that a reasonable performance can be
617      achieved for the participants of a session.
618    o The values for "maxptime", "ptime", and "minptime" of the payload
619      format configuration are recommendations by the decoding side to
620      ensure the best performance for the decoder. The decoder MUST be
621      capable of accepting any allowed packet sizes to ensure maximum
622      compatibility.
623    o All other parameters of the payload format configuration are
624      declarative and a participant MUST use the configurations that are
625      provided for the session. More than one configuration may be
626      provided if necessary by declaring multiple RTP payload types;
627      however, the number of types should be kept small.

629 7. Security Considerations

631    All RTP packets using the payload format defined in this
632    specification are subject to the general security considerations
633    discussed in the RTP specification [RFC3550] and any profile,
634    e.g. [RFC3711] or [RFC3551].

636    This payload format transports Opus encoded speech or audio data;
637    hence, security issues include confidentiality, integrity protection,
638    and authentication of the speech or audio itself. The Opus payload
639    format does not have any built-in security mechanisms. Any suitable
640    external mechanisms, such as SRTP [RFC3711], MAY be used.

642    This payload format and the Opus encoding do not exhibit any
643    significant non-uniformity in the receiver-end computational load and
644    thus are unlikely to pose a denial-of-service threat due to the
645    receipt of pathological datagrams.

647 8. Acknowledgements

649    TBD

651 9.
Normative References

653    [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate
654              Requirement Levels", BCP 14, RFC 2119, March 1997.

656    [RFC2326] Schulzrinne, H., Rao, A., and R. Lanphier, "Real Time
657              Streaming Protocol (RTSP)", RFC 2326, April 1998.

659    [RFC2974] Handley, M., Perkins, C., and E. Whelan, "Session
660              Announcement Protocol", RFC 2974, October 2000.

662    [RFC3264] Rosenberg, J. and H. Schulzrinne, "An Offer/Answer Model
663              with Session Description Protocol (SDP)", RFC 3264,
664              June 2002.

666    [RFC3550] Schulzrinne, H., Casner, S., Frederick, R., and V.
667              Jacobson, "RTP: A Transport Protocol for Real-Time
668              Applications", STD 64, RFC 3550, July 2003.

670    [RFC3551] Schulzrinne, H. and S. Casner, "RTP Profile for Audio and
671              Video Conferences with Minimal Control", STD 65, RFC 3551,
672              July 2003.

674    [RFC3711] Baugher, M., McGrew, D., Naslund, M., Carrara, E., and K.
675              Norrman, "The Secure Real-time Transport Protocol (SRTP)",
676              RFC 3711, March 2004.

678    [RFC4288] Freed, N. and J. Klensin, "Media Type Specifications and
679              Registration Procedures", BCP 13, RFC 4288, December 2005.

681    [RFC4566] Handley, M., Jacobson, V., and C. Perkins, "SDP: Session
682              Description Protocol", RFC 4566, July 2006.

684    [RFC4855] Casner, S., "Media Type Registration of RTP Payload
685              Formats", RFC 4855, February 2007.

687    [RFC6562] Perkins, C. and JM. Valin, "Guidelines for the Use of
688              Variable Bit Rate Audio with Secure RTP", RFC 6562,
689              March 2012.

691 Appendix A. Informational References

693    [codec] http://datatracker.ietf.org/wg/codec/
694    [Opus] http://datatracker.ietf.org/doc/draft-ietf-codec-opus/

696 Authors' Addresses

698    Julian Spittka
699    Skype Technologies S.A.
700    3210 Porter Drive
701    Palo Alto, CA 94304
702    USA

704    Email: julian.spittka@skype.net

706    Koen Vos
707    Skype Technologies S.A.
708    3210 Porter Drive
709    Palo Alto, CA 94304
710    USA

712    Email: koen.vos@skype.net

714    Jean-Marc Valin
715    Mozilla
716    650 Castro Street
717    Mountain View, CA 94041
718    USA

720    Email: jmvalin@jmvalin.ca