idnits 2.17.1 

draft-spittka-payload-rtp-opus-01.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

     No issues found here.

  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == The copyright year in the IETF Trust and authors Copyright Line does not
     match the current year

  -- The document date (July 9, 2012) is 4309 days in the past.  Is this
     intentional?


  Checking references for intended status: Informational
  ----------------------------------------------------------------------------

  == Missing Reference: 'Opus' is mentioned on line 694, but not defined

  ** Obsolete normative reference: RFC 2326 (Obsoleted by RFC 7826)

  ** Obsolete normative reference: RFC 4288 (Obsoleted by RFC 6838)

  ** Obsolete normative reference: RFC 4566 (Obsoleted by RFC 8866)


     Summary: 3 errors (**), 0 flaws (~~), 2 warnings (==), 1 comment (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.

--------------------------------------------------------------------------------


2	Network Working Group                                         J. Spittka
3	Internet-Draft                                                    K. Vos
4	Intended status: Informational                   Skype Technologies S.A.
5	Expires: January 10, 2013                                      JM. Valin
6	                                                                 Mozilla
7	                                                            July 9, 2012

9	           RTP Payload Format for Opus Speech and Audio Codec
10	                 draft-spittka-payload-rtp-opus-01.txt

12	Abstract

14	   This document defines the Real-time Transport Protocol (RTP) payload
15	   format for packetization of Opus encoded speech and audio data that
16	   is essential to integrate the codec in the most compatible way.
17	   Further, media type registrations are described for the RTP payload
18	   format.

20	Status of this Memo

22	   This Internet-Draft is submitted in full conformance with the
23	   provisions of BCP 78 and BCP 79.

25	   Internet-Drafts are working documents of the Internet Engineering
26	   Task Force (IETF).  Note that other groups may also distribute
27	   working documents as Internet-Drafts.  The list of current Internet-
28	   Drafts is at http://datatracker.ietf.org/drafts/current/.

30	   Internet-Drafts are draft documents valid for a maximum of six months
31	   and may be updated, replaced, or obsoleted by other documents at any
32	   time.  It is inappropriate to use Internet-Drafts as reference
33	   material or to cite them other than as "work in progress."

35	   This Internet-Draft will expire on January 10, 2013.

37	Copyright Notice

39	   Copyright (c) 2012 IETF Trust and the persons identified as the
40	   document authors.  All rights reserved.

42	   This document is subject to BCP 78 and the IETF Trust's Legal
43	   Provisions Relating to IETF Documents
44	   (http://trustee.ietf.org/license-info) in effect on the date of
45	   publication of this document.  Please review these documents
46	   carefully, as they describe your rights and restrictions with respect
47	   to this document.  Code Components extracted from this document must
48	   include Simplified BSD License text as described in Section 4.e of
49	   the Trust Legal Provisions and are provided without warranty as
50	   described in the Simplified BSD License.

52	Table of Contents

54	   1.  Introduction
55	   2.  Conventions, Definitions and Acronyms used in this document
56	     2.1.  Audio Bandwidth
57	   3.  Opus Codec
58	     3.1.  Network Bandwidth
59	       3.1.1.  Recommended Bitrate
60	       3.1.2.  Variable versus Constant Bit Rate
61	       3.1.3.  Discontinuous Transmission (DTX)
62	     3.2.  Complexity
63	     3.3.  Forward Error Correction (FEC)
64	     3.4.  Stereo Operation
65	   4.  Opus RTP Payload Format
66	     4.1.  RTP Header Usage
67	     4.2.  Payload Structure
68	   5.  Congestion Control
69	   6.  IANA Considerations
70	     6.1.  Opus Media Type Registration
71	     6.2.  Mapping to SDP Parameters
72	       6.2.1.  Offer-Answer Model Considerations for Opus
73	       6.2.2.  Declarative SDP Considerations for Opus
74	   7.  Security Considerations
75	   8.  Acknowledgements
76	   9.  Normative References
77	   A.  Informational References
78	   Authors' Addresses

80	1.  Introduction

82	   The Opus codec is a speech and audio codec developed within the IETF
83	   Internet Wideband Audio Codec working group [codec].  The codec has a
84	   very low algorithmic delay and is highly scalable in terms of
85	   audio bandwidth, bitrate, and complexity.  Further, it provides
86	   different modes to efficiently encode speech signals as well as music
87	   signals, thus making it the codec of choice for various applications
88	   using the Internet or similar networks.

90	   This document defines the Real-time Transport Protocol (RTP)
91	   [RFC3550] payload format for packetization of Opus encoded speech and
92	   audio data that is essential to integrate the Opus codec in the most
93	   compatible way.  Further, media type registrations are described for
94	   the RTP payload format.  More information on the Opus codec can be
95	   obtained from the following IETF draft [Opus].

97	2.  Conventions, Definitions and Acronyms used in this document

99	   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
100	   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
101	   document are to be interpreted as described in [RFC2119].

103	   CPU:  Central Processing Unit
104	   IP:  Internet Protocol
105	   PSTN:  Public Switched Telephone Network
106	   samples:  Speech or audio samples
107	   SDP:  Session Description Protocol

109	2.1.  Audio Bandwidth

111	   Throughout this document, we refer to the following definitions:

113	         +--------------+----------------+-----------+----------+
114	         | Abbreviation |      Name      | Bandwidth | Sampling |
115	         +--------------+----------------+-----------+----------+
116	         |      nb      |   Narrowband   |  0 - 4000 |   8000   |
117	         |              |                |           |          |
118	         |      mb      |   Mediumband   |  0 - 6000 |   12000  |
119	         |              |                |           |          |
120	         |      wb      |    Wideband    |  0 - 8000 |   16000  |
121	         |              |                |           |          |
122	         |      swb     | Super-wideband | 0 - 12000 |   24000  |
123	         |              |                |           |          |
124	         |      fb      |    Fullband    | 0 - 20000 |   48000  |
125	         +--------------+----------------+-----------+----------+

127	                          Audio bandwidth naming

129	                                  Table 1

131	3.  Opus Codec

133	   The Opus [Opus] speech and audio codec has been developed to encode
134	   speech signals as well as audio signals.  Two different modes, a
135	   voice mode and an audio mode, may be chosen to allow the most
136	   efficient coding dependent on the type of input signal, the sampling
137	   frequency of the input signal, and the specific application.

139	   The voice mode allows efficient encoding of voice signals at lower
140	   bitrates while the audio mode is optimized for audio signals at
141	   medium and higher bitrates.

143	   The Opus speech and audio codec is highly scalable in terms of audio
144	   bandwidth, bitrate, and complexity.  Further, Opus supports the
145	   transmission of stereo signals.

147	3.1.  Network Bandwidth

149	   Opus supports all bitrates from 6 kb/s to 510 kb/s.  The bitrate can
150	   be changed dynamically within that range.  All other parameters being
151	   equal, higher bitrate results in higher quality.

153	3.1.1.  Recommended Bitrate

155	   For a frame size of 20 ms, these are the bitrate "sweet spots" for
156	   Opus in various configurations:
157	   o  8-12 kb/s for NB speech,
158	   o  16-20 kb/s for WB speech,
159	   o  28-40 kb/s for FB speech,
160	   o  48-64 kb/s for FB mono music, and
161	   o  64-128 kb/s for FB stereo music.
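   The list above can be captured as a small lookup table.  The
   following Python sketch is illustrative only: the key layout and the
   function name recommended_bitrate_range are assumptions made for this
   example and are not defined by the payload format.

```python
from typing import Optional, Tuple

# Bitrate "sweet spots" in bits per second for a 20 ms frame size,
# copied from the list above. The (bandwidth, content) key layout is
# an assumption made for this sketch.
SWEET_SPOTS_BPS = {
    ("nb", "speech"): (8000, 12000),
    ("wb", "speech"): (16000, 20000),
    ("fb", "speech"): (28000, 40000),
    ("fb", "mono music"): (48000, 64000),
    ("fb", "stereo music"): (64000, 128000),
}


def recommended_bitrate_range(bandwidth: str, content: str) -> Optional[Tuple[int, int]]:
    """Return (low, high) in b/s, or None if the list gives no entry."""
    return SWEET_SPOTS_BPS.get((bandwidth.lower(), content.lower()))
```

   Configurations that the list does not mention (for example mb or swb
   speech) return None here rather than a guessed value.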

163	3.1.2.  Variable versus Constant Bit Rate

165	   For the same average bitrate, variable bitrate (VBR) can achieve
166	   higher quality than constant bitrate (CBR).  For the majority of
167	   voice transmission applications, VBR is the best choice.  One
168	   potential reason for choosing CBR is the potential information leak
169	   that _may_ occur when encrypting the compressed stream.  See
170	   [RFC6562] for guidelines on when VBR is appropriate for encrypted
171	   audio communications.  In the case where an existing VBR stream needs
172	   to be converted to CBR for security reasons, then the Opus padding
173	   mechanism described in [Opus] is the RECOMMENDED way to achieve
174	   padding because the RTP padding bit is unencrypted.

176	   The bitrate can be adjusted at any point in time.  To avoid
177	   congestion, the average bitrate SHOULD be adjusted to the available
178	   network capacity.  If no target bitrate is specified the average
179	   bitrate may go up to the highest bitrate specified in Section 3.1.1.

181	3.1.3.  Discontinuous Transmission (DTX)

183	   The Opus codec may, as described in Section 3.1.2, be operated with
184	   an adaptive bitrate.  In that case, the bitrate will automatically be
185	   reduced for certain input signals like periods of silence.  During
186	   continuous transmission, the bitrate will be reduced when the input
187	   signal permits, but the transmission to the receiver itself
188	   will never be interrupted.  Therefore, the received signal will
189	   maintain the same high level of quality over the full duration of a
190	   transmission while minimizing the average bit rate over time.

192	   In cases where the bitrate of Opus needs to be reduced even further
193	   or in cases where only constant bitrate is available, the Opus
194	   encoder may be set to use discontinuous transmission (DTX), where
195	   parts of the encoded signal that correspond to periods of silence in
196	   the input speech or audio signal are not transmitted to the receiver.

198	   On the receiving side, the non-transmitted parts will be handled by a
199	   frame loss concealment unit in the Opus decoder which generates a
200	   comfort noise signal to replace the non-transmitted parts of the
201	   speech or audio signal.

203	   The DTX mode of Opus will have a slightly lower speech or audio
204	   quality than the continuous mode.  Therefore, it is RECOMMENDED to
205	   use Opus in the continuous mode unless constraints on network
206	   capacity are severe.  The DTX mode can be engaged for operation
207	   with either adaptive or constant bitrate.

209	3.2.  Complexity

211	   Complexity can be scaled to optimize for CPU resources in real-time,
212	   mostly as a trade-off between audio quality and bitrate.  Also,
213	   different modes of Opus have different complexity.

215	3.3.  Forward Error Correction (FEC)

217	   The voice mode of Opus allows for "in-band" forward error correction
218	   (FEC) data to be embedded into the bit stream of Opus.  This FEC
219	   scheme adds redundant information about the previous packet (n-1) to
220	   the current output packet n.  For each frame, the encoder decides
221	   whether to use FEC based on (1) an externally-provided estimate of
222	   the channel's packet loss rate; (2) an externally-provided estimate
223	   of the channel's capacity; (3) the sensitivity of the audio or speech
224	   signal to packet loss; (4) whether the receiving decoder has
225	   indicated it can take advantage of "in-band" FEC information.  The
226	   decision to send "in-band" FEC information is entirely controlled by
227	   the encoder and therefore no special precautions for the payload have
228	   to be taken.

230	   On the receiving side, the decoder can take advantage of this
231	   additional information when, in case of a packet loss, the next
232	   packet is available.  In order to use the FEC data, the jitter buffer
233	   needs to provide access to payloads with the FEC data.  The decoder
234	   API function has a flag to indicate that a FEC frame rather than a
235	   regular frame should be decoded.  If no FEC data is available for the
236	   current frame, the decoder will consider the frame lost and invoke
237	   the frame loss concealment.
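   The receive-side behaviour described above can be sketched as a
   small decision function.  This is a minimal Python sketch under
   stated assumptions: the jitter buffer is modelled as a dictionary
   from RTP sequence number to payload, and the action names "decode",
   "decode_fec", and "conceal" are hypothetical labels, not part of any
   Opus API.  Whether the next packet actually carries FEC data is only
   known to the decoder.

```python
from typing import Dict, Optional, Tuple


def choose_decode_action(jitter_buffer: Dict[int, bytes],
                         seq: int) -> Tuple[str, Optional[bytes]]:
    """Decide how to produce audio for the packet with sequence number seq.

    Returns (action, payload), where action is one of the hypothetical
    labels "decode", "decode_fec", or "conceal".
    """
    if seq in jitter_buffer:
        # Packet n arrived: decode it as a regular frame.
        return ("decode", jitter_buffer[seq])

    next_seq = (seq + 1) & 0xFFFF  # RTP sequence numbers are 16 bits wide
    if next_seq in jitter_buffer:
        # Packet n is missing but packet n+1 is available: hand n+1 to the
        # decoder with its FEC flag set. If n+1 holds no FEC data, the
        # decoder itself falls back to frame loss concealment.
        return ("decode_fec", jitter_buffer[next_seq])

    # Neither packet is available: conceal the loss.
    return ("conceal", None)
```

   A caller would invoke this once per playout interval and then pass
   the payload to the decoder in the indicated way.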

239	   If the FEC scheme is not implemented on the receiving side, FEC
240	   SHOULD NOT be used, as it leads to an inefficient usage of network
241	   resources.  Decoder support for FEC SHOULD be indicated at the time a
242	   session is set up.

244	3.4.  Stereo Operation

246	   Opus allows for transmission of stereo audio signals.  This operation
247	   is signaled in-band in the Opus payload and no special arrangement is
248	   required in the payload format.  Any implementation of the Opus
249	   decoder MUST be capable of receiving stereo signals.

251	   If a decoder cannot take advantage of the benefits of a stereo
252	   signal, this SHOULD be indicated at the time a session is set up.  In
253	   that case the sending side SHOULD NOT send stereo signals as it leads
254	   to an inefficient usage of the network.

256	4.  Opus RTP Payload Format

258	   The payload format for Opus consists of the RTP header and Opus
259	   payload data.

261	4.1.  RTP Header Usage

263	   The format of the RTP header is specified in [RFC3550].  The Opus
264	   payload format uses the fields of the RTP header consistent with this
265	   specification.

267	   The payload length of Opus is an integer number of octets and
268	   therefore no padding is required.  The payload MAY be padded by an
269	   integer number of octets according to [RFC3550].

271	   The marker bit (M) of the RTP header has no function in combination
272	   with Opus and MAY be ignored.

274	   The RTP payload type for Opus has not been assigned statically and is
275	   expected to be assigned dynamically.

277	   The receiving side MUST be prepared to receive duplicates of RTP
278	   packets.  Exactly one of those payloads MUST be provided to the Opus
279	   decoder for decoding, and the others MUST be discarded.
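   One possible way to meet the duplicate-handling requirement is to
   remember recently seen sequence numbers.  The Python sketch below is
   an assumption-laden illustration: the class name DuplicateFilter, the
   history size, and keying on the sequence number alone (that is, one
   filter per SSRC) are choices made for this example.

```python
from collections import deque


class DuplicateFilter:
    """Tracks recently seen RTP sequence numbers for a single SSRC."""

    def __init__(self, history: int = 512) -> None:
        self._history = history  # how many sequence numbers to remember
        self._order = deque()    # arrival order, oldest first
        self._seen = set()       # fast membership test

    def accept(self, seq: int) -> bool:
        """Return True the first time seq is seen, False for duplicates."""
        if seq in self._seen:
            return False
        self._seen.add(seq)
        self._order.append(seq)
        if len(self._order) > self._history:
            # Forget the oldest entry so memory use stays bounded.
            self._seen.discard(self._order.popleft())
        return True
```

   Only payloads for which accept() returns True would be handed to the
   Opus decoder; all others are discarded.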

281	   Opus supports 5 different audio bandwidths, which may be adjusted
282	   during a call.  The RTP timestamp clock frequency is
283	   defined as the highest supported sampling frequency of Opus, i.e.
284	   48000 Hz, for all modes and sampling rates of Opus.  The unit for the
285	   timestamp is samples per single (mono) channel.  The RTP timestamp
286	   corresponds to the sample time of the first encoded sample in the
287	   encoded frame.  For sampling rates lower than 48000 Hz, the number of
288	   samples has to be multiplied by the multiplier given in Table 2
289	   to determine the RTP timestamp.

291	                         +---------+------------+
292	                         | fs (Hz) | Multiplier |
293	                         +---------+------------+
294	                         |   8000  |      6     |
295	                         |         |            |
296	                         |  12000  |      4     |
297	                         |         |            |
298	                         |  16000  |      3     |
299	                         |         |            |
300	                         |  24000  |      2     |
301	                         |         |            |
302	                         |  48000  |      1     |
303	                         +---------+------------+

305	    fs specifies the audio sampling frequency in Hertz (Hz); Multiplier
306	    is the value that the number of samples has to be multiplied by to
307	                       calculate the RTP timestamp.

309	                                  Table 2
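   The timestamp rule and Table 2 can be expressed in a few lines.  The
   following Python sketch is illustrative; the function names are
   assumptions, and the modulo-2**32 wrap reflects the 32-bit RTP
   timestamp field of [RFC3550].

```python
# Multipliers from Table 2: each equals 48000 / fs.
TIMESTAMP_MULTIPLIER = {8000: 6, 12000: 4, 16000: 3, 24000: 2, 48000: 1}


def samples_to_rtp_ticks(num_samples: int, fs_hz: int) -> int:
    """Convert a per-channel sample count at fs_hz into 48000 Hz RTP ticks."""
    if fs_hz not in TIMESTAMP_MULTIPLIER:
        raise ValueError(f"unsupported sampling rate: {fs_hz}")
    return num_samples * TIMESTAMP_MULTIPLIER[fs_hz]


def next_timestamp(prev_timestamp: int, num_samples: int, fs_hz: int) -> int:
    """Advance a 32-bit RTP timestamp, wrapping modulo 2**32."""
    return (prev_timestamp + samples_to_rtp_ticks(num_samples, fs_hz)) % (1 << 32)
```

   For example, a 20 ms frame holds 160 samples at 8000 Hz and 320
   samples at 16000 Hz; both map to 960 RTP ticks.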

311	4.2.  Payload Structure

313	   The Opus encoder can be set to output encoded frames representing
314	   2.5, 5, 10, 20, 40, or 60 ms of speech or audio data.  Further, an
315	   arbitrary number of frames can be combined into a packet.  The
316	   maximum packet length is limited to the amount of encoded data
317	   representing 120 ms of speech or audio data.  The packetization of
318	   encoded data is done entirely by the Opus encoder; therefore exactly
319	   one packet output from the Opus encoder MUST be used as a payload.

321	   Figure 1 shows the structure combined with the RTP header.

323	   +----------+--------------+
324	   |RTP Header| Opus Payload |
325	   +----------+--------------+

327	                Figure 1: Payload Structure with RTP header

329	   Table 3 shows supported frame sizes for different modes and sampling
330	   rates of Opus and how the timestamp needs to be incremented for
331	   packetization.

333	    +---------+-----------------+-----+-----+-----+-----+------+------+
334	    |   Mode  |        fs       | 2.5 |  5  |  10 |  20 |  40  |  60  |
335	    +---------+-----------------+-----+-----+-----+-----+------+------+
336	    | ts incr |       all       | 120 | 240 | 480 | 960 | 1920 | 2880 |
337	    |         |                 |     |     |     |     |      |      |
338	    |  voice  | nb/mb/wb/swb/fb |     |     |  x  |  x  |   x  |   x  |
339	    |         |                 |     |     |     |     |      |      |
340	    |  audio  |   nb/wb/swb/fb  |  x  |  x  |  x  |  x  |      |      |
341	    +---------+-----------------+-----+-----+-----+-----+------+------+

343	     Mode specifies the Opus mode of operation; fs specifies the audio
344	       sampling frequency in Hertz (Hz); 2.5, 5, 10, 20, 40, and 60
345	    represent the duration of encoded speech or audio data in a packet;
346	    ts incr specifies the timestamp increment for the corresponding
347	   packet size.  For multiple frames in a packet, these values have to
348	           be multiplied by the respective number of frames.

350	                                  Table 3
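   The "ts incr" row of Table 3, the 120 ms packet limit, and the
   rounding used for the ptime-style parameters can be checked with a
   short calculation.  This Python sketch is illustrative only; the
   function names are assumptions made for this example.

```python
import math

ALLOWED_FRAME_MS = (2.5, 5, 10, 20, 40, 60)  # frame durations from Table 3
MAX_PACKET_MS = 120                          # maximum audio duration per packet
TICKS_PER_MS = 48                            # 48000 Hz RTP clock


def timestamp_increment(frame_ms: float, frames_per_packet: int = 1) -> int:
    """Return the RTP timestamp increment for one packet."""
    if frame_ms not in ALLOWED_FRAME_MS:
        raise ValueError(f"unsupported frame duration: {frame_ms} ms")
    if frames_per_packet < 1:
        raise ValueError("a packet holds at least one frame")
    packet_ms = frame_ms * frames_per_packet
    if packet_ms > MAX_PACKET_MS:
        raise ValueError(f"packet would exceed {MAX_PACKET_MS} ms")
    return int(packet_ms * TICKS_PER_MS)


def ptime_value(frame_ms: float, frames_per_packet: int = 1) -> int:
    """Packet duration in ms, rounded up to the next full integer."""
    return math.ceil(frame_ms * frames_per_packet)
```

   A single 2.5 ms frame therefore advances the timestamp by 120 and is
   signaled as a packet time of 3.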

352	5.  Congestion Control

354	   The adaptive nature of the Opus codec allows for efficient
355	   congestion control.

357	   The target bitrate of Opus can be adjusted at any point in time,
358	   thus allowing for efficient congestion control.  Furthermore, the
359	   amount of encoded speech or audio data encoded in a single packet can
360	   be used for congestion control since the transmission rate is
361	   inversely proportional to these frame sizes.  A lower packet
362	   transmission rate reduces the amount of header overhead but at the
363	   same time increases latency and error sensitivity and should be done
364	   with care.

366	   It is RECOMMENDED that congestion control is applied during the
367	   transmission of Opus encoded data.
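   The trade-off between packet duration and header overhead described
   in this section can be quantified.  The Python sketch below assumes
   40 octets of headers per packet (IPv4, UDP, and RTP without options
   or extensions); that figure and the function names are assumptions
   for illustration and are not specified by this document.

```python
# Assumed per-packet header size: 20 (IPv4) + 8 (UDP) + 12 (RTP) octets.
HEADER_BYTES = 20 + 8 + 12


def packets_per_second(packet_ms: float) -> float:
    """Packet rate is inversely proportional to the packet duration."""
    return 1000.0 / packet_ms


def header_overhead_bps(packet_ms: float, header_bytes: int = HEADER_BYTES) -> float:
    """Bitrate consumed by packet headers alone."""
    return header_bytes * 8 * packets_per_second(packet_ms)


def total_bitrate_bps(codec_bps: float, packet_ms: float) -> float:
    """Codec bitrate plus header overhead on the wire."""
    return codec_bps + header_overhead_bps(packet_ms)
```

   Under these assumptions, moving from 20 ms to 40 ms packets halves
   the header overhead from 16000 b/s to 8000 b/s, at the cost of
   higher latency and a larger loss per dropped packet.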

369	6.  IANA Considerations

371	   One media subtype (audio/opus) has been defined and registered as
372	   described in the following section.

374	6.1.  Opus Media Type Registration

376	   Media type registration is done according to [RFC4288] and [RFC4855].

378	   Type name: audio

380	   Subtype name: opus

382	   Required parameters:

384	   rate:  the RTP timestamp is incremented with a 48000 Hz clock
385	      rate for all modes of Opus and all sampling frequencies.  For
386	      audio sampling rates other than 48000 Hz the rate has to be
387	      adjusted to 48000 Hz according to Table 2.

389	   Optional parameters:

391	   maxcodedaudiobandwidth:  a hint about the maximum audio bandwidth
392	      that the receiver is capable of rendering.  The decoder MUST be
393	      capable of decoding any audio bandwidth but due to hardware
394	      limitations only signals up to the specified audio bandwidth can
395	      be processed.  Sending signals with higher audio bandwidth results
396	      in higher than necessary network usage and encoding complexity, so
397	      an encoder SHOULD NOT encode frequencies above the audio bandwidth
398	      specified by maxcodedaudiobandwidth.  Possible values are nb, mb,
399	      wb, swb, fb.  By default, the receiver is assumed to have no
400	      limitations, i.e. fb.

402	   maxptime:  the decoder's maximum length of time in milliseconds
403	      rounded up to the next full integer value represented by the media
404	      in a packet that can be encapsulated in a received packet
405	      according to Section 6 of [RFC4566].  Possible values are 3, 5,
406	      10, 20, 40, and 60 or an arbitrary multiple of Opus frame sizes
407	      rounded up to the next full integer value up to a maximum value of
408	      120 as defined in Section 4.  If no value is specified, 120 is
409	      assumed as default.  This value is a recommendation by the
410	      decoding side to ensure the best performance for the decoder.  The
411	      decoder MUST be capable of accepting any allowed packet sizes to
412	      ensure maximum compatibility.

414	   ptime:  the decoder's recommended length of time in milliseconds
415	      rounded up to the next full integer value represented by the media
416	      in a packet according to Section 6 of [RFC4566].  Possible values
417	      are 3, 5, 10, 20, 40, or 60 or an arbitrary multiple of Opus frame
418	      sizes rounded up to the next full integer value up to a maximum
419	      value of 120 as defined in Section 4.  If no value is specified,
420	      20 is assumed as default.  If ptime is greater than maxptime,
421	      ptime MUST be ignored.  This parameter MAY be changed during a
422	      session.  This value is a recommendation by the decoding side to
423	      ensure the best performance for the decoder.  The decoder MUST be
424	      capable of accepting any allowed packet sizes to ensure maximum
425	      compatibility.

427	   minptime:  the decoder's minimum length of time in milliseconds
428	      rounded up to the next full integer value represented by the media
429	      in a packet that SHOULD be encapsulated in a received packet
430	      according to Section 6 of [RFC4566].  Possible values are 3, 5,
431	      10, 20, 40, and 60 or an arbitrary multiple of Opus frame sizes
432	      rounded up to the next full integer value up to a maximum value of
433	      120 as defined in Section 4.  If no value is specified, 3 is
434	      assumed as default.  This value is a recommendation by the
435	      decoding side to ensure the best performance for the decoder.  The
436	      decoder MUST be capable of accepting any allowed packet sizes to
437	      ensure maximum compatibility.

439	   maxaveragebitrate:  specifies the maximum average receive bitrate of
440	      a session in bits per second (b/s).  The actual value of the
441	      bitrate may vary as it is dependent on the characteristics of the
442	      media in a packet.  Note that the maximum average bitrate MAY be
443	      modified dynamically during a session.  Any positive integer is
444	      allowed but values outside the range between 6000 and 510000
445	      SHOULD be ignored.  If no value is specified, the maximum value
446	      specified in Section 3.1.1 for the corresponding mode of Opus and
447	      corresponding maxcodedaudiobandwidth will be the default.

449	   stereo:  specifies whether the decoder prefers receiving stereo or
450	      mono signals.  Possible values are 1 and 0 where 1 specifies that
451	      stereo signals are preferred and 0 specifies that only mono
452	      signals are preferred.  Independent of the stereo parameter every
453	      receiver MUST be able to receive and decode stereo signals but
454	      sending stereo signals to a receiver that signaled a preference
455	      for mono signals may result in higher than necessary network
456	      utilisation and encoding complexity.  If no value is specified,
457	      mono is assumed (stereo=0).

459	   cbr:  specifies if the decoder prefers the use of a constant bitrate
460	      versus variable bitrate.  Possible values are 1 and 0 where 1
461	      specifies constant bitrate and 0 specifies variable bitrate.  If
462	      no value is specified, cbr is assumed to be 0.  Note that the
463	      maximum average bitrate may still be changed, e.g. to adapt to
464	      changing network conditions.

466	   useinbandfec:  specifies that Opus in-band FEC is supported by the
467	      decoder and MAY be used during a session.  Possible values are 1
468	      and 0.  It is RECOMMENDED to provide 0 in case FEC is not
469	      implemented on the receiving side.  If no value is specified,
470	      useinbandfec is assumed to be 1.

472	   usedtx:  specifies if the decoder prefers the use of DTX.  Possible
473	      values are 1 and 0.  If no value is specified, usedtx is assumed
474	      to be 0.

476	   Encoding considerations:

478	      Opus media type is framed and consists of binary data according to
479	      Section 4.8 in [RFC4288].

481	   Security considerations:

483	      See Section 7 of this document.

485	   Interoperability considerations: none

487	   Published specification: none

489	   Applications that use this media type:

491	      Any application that requires the transport of speech or audio
492	      data may use this media type.  Examples include audio and video
493	      conferencing, Voice over IP, and media streaming.

495	   Person & email address to contact for further information:

497	      SILK Support silksupport@skype.net
498	      Jean-Marc Valin jmvalin@jmvalin.ca

500	   Intended usage: COMMON
501	   Restrictions on usage:

503	      For transfer over RTP, the RTP payload format (Section 4 of this
504	      document) SHALL be used.

506	   Author:

508	      Julian Spittka julian.spittka@skype.net

510	      Koen Vos koen.vos@skype.net

512	      Jean-Marc Valin jmvalin@jmvalin.ca

514	   Change controller: TBD

516	6.2.  Mapping to SDP Parameters

518	   The information described in the media type specification has a
519	   specific mapping to fields in the Session Description Protocol (SDP)
520	   [RFC4566], which is commonly used to describe RTP sessions.  When SDP
521	   is used to specify sessions employing the Opus codec, the mapping is
522	   as follows:

524	   o  The media type ("audio") goes in SDP "m=" as the media name.
525	   o  The media subtype ("opus") goes in SDP "a=rtpmap" as the encoding
526	      name.  The RTP clock rate in "a=rtpmap" MUST be mapped to the
527	      required media type parameter "rate".
528	   o  The optional media type parameters "ptime" and "maxptime" are
529	      mapped to "a=ptime" and "a=maxptime" attributes, respectively, in
530	      the SDP.
531	   o  All remaining media type parameters are mapped to the "a=fmtp"
532	      attribute in the SDP by copying them directly from the media type
533	      parameter string as a semicolon-separated list of parameter=value
534	      pairs (e.g. maxaveragebitrate=20000).
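   The mapping rules above can be sketched as a function that turns a
   parameter dictionary into SDP lines.  This Python sketch is
   illustrative: the function name build_opus_sdp and the dictionary
   input are assumptions, and no validation of parameter values is
   attempted.

```python
from typing import Dict, List, Union

# Parameters that map to their own SDP attributes rather than to "a=fmtp".
ATTRIBUTE_PARAMS = ("ptime", "maxptime")


def build_opus_sdp(port: int, payload_type: int,
                   params: Dict[str, Union[str, int]]) -> List[str]:
    """Build the SDP media lines for one Opus payload type."""
    lines = [
        f"m=audio {port} RTP/AVP {payload_type}",
        # The required "rate" parameter is carried in "a=rtpmap".
        f"a=rtpmap:{payload_type} opus/48000",
    ]
    fmtp = [f"{name}={value}" for name, value in params.items()
            if name not in ATTRIBUTE_PARAMS and name != "rate"]
    if fmtp:
        lines.append(f"a=fmtp:{payload_type} " + "; ".join(fmtp))
    for name in ATTRIBUTE_PARAMS:
        if name in params:
            lines.append(f"a={name}:{params[name]}")
    return lines
```

   Parameters are emitted in dictionary order, so the caller controls
   the order of the parameter=value pairs in the "a=fmtp" line.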

536	   Below are some examples of SDP session descriptions for Opus:

538	   Example 1: Standard session with 48000 Hz clock rate

540	       m=audio 54312 RTP/AVP 101
541	       a=rtpmap:101 opus/48000

543	   Example 2: 16000 Hz sampling rate (wb), maximum packet size of 40 ms,
544	   recommended packet size of 40 ms, maximum average bitrate of 20000
545	   bps, stereo signals are preferred, FEC is allowed, DTX is not allowed

547	       m=audio 54312 RTP/AVP 101
548	       a=rtpmap:101 opus/48000
549	       a=fmtp:101 maxcodedaudiobandwidth=wb; maxaveragebitrate=20000;
550	       stereo=1; useinbandfec=1; usedtx=0
551	       a=ptime:40
552	       a=maxptime:40
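   A receiver of such a description needs to apply the defaults from
   Section 6.1 to any parameter that is absent.  The Python sketch below
   parses a single unfolded "a=fmtp" line; the function name
   parse_opus_fmtp and the choice to silently skip unknown or malformed
   parameters are assumptions made for this example.

```python
from typing import Dict, Union

BANDWIDTHS = ("nb", "mb", "wb", "swb", "fb")
FLAG_PARAMS = ("stereo", "cbr", "useinbandfec", "usedtx")

# Defaults stated in Section 6.1 for parameters carried in "a=fmtp".
DEFAULTS: Dict[str, Union[str, int]] = {
    "maxcodedaudiobandwidth": "fb",
    "minptime": 3,
    "stereo": 0,
    "cbr": 0,
    "useinbandfec": 1,
    "usedtx": 0,
}


def parse_opus_fmtp(line: str) -> Dict[str, Union[str, int]]:
    """Parse 'a=fmtp:<pt> name=value; ...' and fill in defaults."""
    params = dict(DEFAULTS)
    _, _, body = line.partition(" ")  # drop the "a=fmtp:<pt>" prefix
    for item in body.split(";"):
        name, sep, value = item.strip().partition("=")
        name, value = name.strip(), value.strip()
        if not sep:
            continue  # not a name=value pair
        if name == "maxcodedaudiobandwidth" and value in BANDWIDTHS:
            params[name] = value
        elif name == "maxaveragebitrate" and value.isdigit():
            # Values outside 6000..510000 b/s are ignored.
            if 6000 <= int(value) <= 510000:
                params[name] = int(value)
        elif name == "minptime" and value.isdigit():
            params[name] = int(value)
        elif name in FLAG_PARAMS and value in ("0", "1"):
            params[name] = int(value)
        # Anything else is unknown or malformed and is skipped.
    return params
```

   No default is stored for maxaveragebitrate, because Section 6.1 ties
   its default to the mode and audio bandwidth in use.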

554	6.2.1.  Offer-Answer Model Considerations for Opus

556	   When using the offer-answer procedure described in [RFC3264] to
557	   negotiate the use of Opus, the following considerations apply:

559	   o  Opus supports several clock rates.  For signaling purposes only
560	      the highest, i.e. 48000, is used.  The actual clock rate of the
561	      corresponding media is signaled inside the payload and is not
562	      subject to this payload format description.  The decoder MUST be
563	      capable of decoding every received clock rate.  An example is
564	      shown below:

566	           m=audio 54312 RTP/AVP 100
567	           a=rtpmap:100 opus/48000

569	   o  The parameters "ptime" and "maxptime" are unidirectional receive-
570	      only parameters and typically will not compromise
571	      interoperability; however, depending on the values of the
572	      parameters, the performance of the application may suffer.
573	      [RFC3264] defines the SDP offer-answer handling of the "ptime"
574	      parameter.  The "maxptime" parameter MUST be handled in the same
575	      way.
576	   o  The parameter "minptime" is a unidirectional receive-only
577	      parameter and typically will not compromise interoperability;
578	      however, depending on the value of the parameter, the
579	      performance of the application may suffer, so the value should be
580	      set with care.
581	   o  The parameter "maxcodedaudiobandwidth" is a unidirectional
582	      receive-only parameter that reflects limitations of the local
583	      receiver.  The sender on the other side SHOULD NOT send with an
584	      audio bandwidth higher than "maxcodedaudiobandwidth" as this would
585	      lead to inefficient use of network resources.  The
586	      "maxcodedaudiobandwidth" parameter does not affect
587	      interoperability.  Also, this parameter SHOULD NOT be used to
588	      adjust the audio bandwidth as a function of the bitrates, as this
589	      is the responsibility of the Opus encoder implementation.
590	   o  The parameter "maxaveragebitrate" is a unidirectional receive-only
591	      parameter that reflects limitations of the local receiver.  The
592	      sender on the other side MUST NOT send with an average bitrate
593	      higher than "maxaveragebitrate" as it might overload the network
594	      and/or receiver.  The parameter "maxaveragebitrate" typically will
595	      not compromise interoperability; however, depending on the
596	      value of the parameter, the performance of the application may
597	      suffer, so the value should be set with care.
598	   o  If the parameter "maxaveragebitrate" is below the range specified
599	      in Section 3.1.1 the session MUST be rejected.
600	   o  The parameter "stereo" is a unidirectional receive-only parameter.
601	   o  The parameter "cbr" is a unidirectional receive-only parameter.
602	   o  The parameter "useinbandfec" is a unidirectional receive-only
603	      parameter.
604	   o  The parameter "usedtx" is a unidirectional receive-only parameter.
605	   o  Any unknown parameter in an offer MUST be ignored by the receiver
606	      and MUST be removed from the answer.
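
   The offer handling rules above can be sketched as follows.  This is
   an illustrative sketch only, not part of the payload format: the
   function names (parse_fmtp, understood_offer_params,
   capped_send_bitrate) are hypothetical, the set of known parameter
   names is assumed from the parameters named in this list, and the
   lower bitrate bound is passed in as an argument because Section
   3.1.1 defines the actual range.

```python
from typing import Dict

# Assumption: the fmtp parameter names discussed in the list above.
KNOWN_PARAMS = {
    "minptime", "maxcodedaudiobandwidth", "maxaveragebitrate",
    "stereo", "cbr", "useinbandfec", "usedtx",
}


def parse_fmtp(value: str) -> Dict[str, str]:
    """Split 'name=value; name=value' into a dictionary."""
    params = {}
    for item in value.split(";"):
        item = item.strip()
        if not item:
            continue
        name, _, val = item.partition("=")
        params[name.strip().lower()] = val.strip()
    return params


def understood_offer_params(offer: Dict[str, str],
                            min_bitrate: int) -> Dict[str, str]:
    """Drop unknown parameters; reject a too-low maxaveragebitrate.

    min_bitrate is a caller-supplied stand-in for the lower end of the
    range given in Section 3.1.1.
    """
    known = {k: v for k, v in offer.items() if k in KNOWN_PARAMS}
    if "maxaveragebitrate" in known:
        if int(known["maxaveragebitrate"]) < min_bitrate:
            raise ValueError("maxaveragebitrate below supported range")
    return known


def capped_send_bitrate(local_target: int,
                        remote: Dict[str, str]) -> int:
    """Never send above the remote receiver's maxaveragebitrate."""
    if "maxaveragebitrate" in remote:
        return min(local_target, int(remote["maxaveragebitrate"]))
    return local_target
```

   Because every parameter in this list is receive-only, the sketch
   treats the values in the offer purely as limits on what the
   answerer may send; the answerer states its own limits separately
   in the answer.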

608	6.2.2.  Declarative SDP Considerations for Opus

610	   For declarative use of SDP with Opus, such as in the Session
611	   Announcement Protocol (SAP) [RFC2974] and RTSP [RFC2326], the
612	   following needs to be considered:

614	   o  The values for "maxptime", "ptime", "minptime",
615	      "maxcodedaudiobandwidth", and "maxaveragebitrate" should be
616	      selected carefully to ensure that a reasonable performance can be
617	      achieved for the participants of a session.
618	   o  The values for "maxptime", "ptime", and "minptime" of the payload
619	      format configuration are recommendations by the decoding side to
620	      ensure the best performance for the decoder.  The decoder MUST be
621	      capable of accepting any allowed packet size to ensure maximum
622	      compatibility.
623	   o  All other parameters of the payload format configuration are
624	      declarative and a participant MUST use the configurations that are
625	      provided for the session.  More than one configuration may be
626	      provided if necessary by declaring multiple RTP payload types;
627	      however, the number of types should be kept small.
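
   As a rough illustration of declaring more than one configuration,
   the sketch below renders one "a=fmtp" line per RTP payload type.
   The helper name fmtp_lines, the payload type numbers 96 and 97, and
   the parameter values are all hypothetical examples, not values
   taken from this document.

```python
from typing import Dict, List


def fmtp_lines(configs: Dict[int, Dict[str, str]]) -> List[str]:
    """Render one a=fmtp line per declared payload type."""
    lines = []
    for payload_type in sorted(configs):
        params = "; ".join(
            f"{name}={value}"
            for name, value in configs[payload_type].items()
        )
        lines.append(f"a=fmtp:{payload_type} {params}")
    return lines


# Two hypothetical configurations on dynamic payload types.
example = {
    96: {"maxaveragebitrate": "20000", "stereo": "0"},
    97: {"maxaveragebitrate": "64000", "stereo": "1"},
}

for line in fmtp_lines(example):
    print(line)
```

   Each payload type carries exactly one configuration here, which
   keeps the number of declared types easy to audit.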

629	7.  Security Considerations

631	   All RTP packets using the payload format defined in this
632	   specification are subject to the general security considerations
633	   discussed in the RTP specification [RFC3550] and any applicable
634	   profile, e.g., [RFC3711] or [RFC3551].

636	   This payload format transports Opus-encoded speech or audio data;
637	   hence, security issues include confidentiality, integrity protection,
638	   and authentication of the speech or audio itself.  The Opus payload
639	   format does not have any built-in security mechanisms.  Any suitable
640	   external mechanisms, such as SRTP [RFC3711], MAY be used.

642	   This payload format and the Opus encoding do not exhibit any
643	   significant non-uniformity in the receiver-end computational load and
644	   thus are unlikely to pose a denial-of-service threat due to the
645	   receipt of pathological datagrams.

647	8.  Acknowledgements

649	   TBD

651	9.  Normative References

653	   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
654	              Requirement Levels", BCP 14, RFC 2119, March 1997.

656	   [RFC2326]  Schulzrinne, H., Rao, A., and R. Lanphier, "Real Time
657	              Streaming Protocol (RTSP)", RFC 2326, April 1998.

659	   [RFC2974]  Handley, M., Perkins, C., and E. Whelan, "Session
660	              Announcement Protocol", RFC 2974, October 2000.

662	   [RFC3264]  Rosenberg, J. and H. Schulzrinne, "An Offer/Answer Model
663	              with Session Description Protocol (SDP)", RFC 3264,
664	              June 2002.

666	   [RFC3550]  Schulzrinne, H., Casner, S., Frederick, R., and V.
667	              Jacobson, "RTP: A Transport Protocol for Real-Time
668	              Applications", STD 64, RFC 3550, July 2003.

670	   [RFC3551]  Schulzrinne, H. and S. Casner, "RTP Profile for Audio and
671	              Video Conferences with Minimal Control", STD 65, RFC 3551,
672	              July 2003.

674	   [RFC3711]  Baugher, M., McGrew, D., Naslund, M., Carrara, E., and K.
675	              Norrman, "The Secure Real-time Transport Protocol (SRTP)",
676	              RFC 3711, March 2004.

678	   [RFC4288]  Freed, N. and J. Klensin, "Media Type Specifications and
679	              Registration Procedures", BCP 13, RFC 4288, December 2005.

681	   [RFC4566]  Handley, M., Jacobson, V., and C. Perkins, "SDP: Session
682	              Description Protocol", RFC 4566, July 2006.

684	   [RFC4855]  Casner, S., "Media Type Registration of RTP Payload
685	              Formats", RFC 4855, February 2007.

687	   [RFC6562]  Perkins, C. and JM. Valin, "Guidelines for the Use of
688	              Variable Bit Rate Audio with Secure RTP", RFC 6562,
689	              March 2012.

691	Appendix A.  Informational References

693	      [codec] http://datatracker.ietf.org/wg/codec/
694	      [Opus] http://datatracker.ietf.org/doc/draft-ietf-codec-opus/

696	Authors' Addresses

698	   Julian Spittka
699	   Skype Technologies S.A.
700	   3210 Porter Drive
701	   Palo Alto, CA  94304
702	   USA

704	   Email: julian.spittka@skype.net

706	   Koen Vos
707	   Skype Technologies S.A.
708	   3210 Porter Drive
709	   Palo Alto, CA  94304
710	   USA

712	   Email: koen.vos@skype.net

714	   Jean-Marc Valin
715	   Mozilla
716	   650 Castro Street
717	   Mountain View, CA  94041
718	   USA

720	   Email: jmvalin@jmvalin.ca