idnits 2.17.1 

draft-spittka-payload-rtp-opus-00.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

  ** You're using the IETF Trust Provisions' Section 6.b License Notice from
     12 Sep 2009 rather than the newer Notice from 28 Dec 2009.  (See
     https://trustee.ietf.org/license-info/)


  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

     No issues found here.

  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == The copyright year in the IETF Trust and authors Copyright Line does not
     match the current year

  -- The document date (July 4, 2011) is 4679 days in the past.  Is this
     intentional?


  Checking references for intended status: Informational
  ----------------------------------------------------------------------------

  == Missing Reference: 'Opus' is mentioned on line 837, but not defined

  == Missing Reference: 'SILK' is mentioned on line 835, but not defined

  == Missing Reference: 'CELT' is mentioned on line 836, but not defined

  ** Obsolete normative reference: RFC 2326 (Obsoleted by RFC 7826)

  ** Obsolete normative reference: RFC 4288 (Obsoleted by RFC 6838)

  ** Obsolete normative reference: RFC 4566 (Obsoleted by RFC 8866)


     Summary: 4 errors (**), 0 flaws (~~), 4 warnings (==), 1 comment (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.

--------------------------------------------------------------------------------


2	Network Working Group                                         J. Spittka
3	Internet-Draft                                                    K. Vos
4	Intended status: Informational                   Skype Technologies S.A.
5	Expires: January 5, 2012                                       JM. Valin
6	                                                            Octasic Inc.
7	                                                            July 4, 2011

9	  RTP Payload Format and File Storage Format for Opus Speech and Audio
10	                                 Codec
11	                 draft-spittka-payload-rtp-opus-00

13	Abstract

15	   This document defines the Real-time Transport Protocol (RTP) payload
16	   format and a file storage format for Opus-encoded speech and audio
17	   data, as needed to integrate the codec in an interoperable way.  It
18	   also describes media type registrations for the RTP payload format
19	   and the file storage format.

21	Status of this Memo

23	   This Internet-Draft is submitted to IETF in full conformance with the
24	   provisions of BCP 78 and BCP 79.

26	   Internet-Drafts are working documents of the Internet Engineering
27	   Task Force (IETF), its areas, and its working groups.  Note that
28	   other groups may also distribute working documents as Internet-
29	   Drafts.

31	   Internet-Drafts are draft documents valid for a maximum of six months
32	   and may be updated, replaced, or obsoleted by other documents at any
33	   time.  It is inappropriate to use Internet-Drafts as reference
34	   material or to cite them other than as "work in progress."

36	   The list of current Internet-Drafts can be accessed at
37	   http://www.ietf.org/ietf/1id-abstracts.txt.

39	   The list of Internet-Draft Shadow Directories can be accessed at
40	   http://www.ietf.org/shadow.html.

42	   This Internet-Draft will expire on January 5, 2012.

44	Copyright Notice

46	   Copyright (c) 2011 IETF Trust and the persons identified as the
47	   document authors.  All rights reserved.

49	   This document is subject to BCP 78 and the IETF Trust's Legal
50	   Provisions Relating to IETF Documents
51	   (http://trustee.ietf.org/license-info) in effect on the date of
52	   publication of this document.  Please review these documents
53	   carefully, as they describe your rights and restrictions with respect
54	   to this document.  Code Components extracted from this document must
55	   include Simplified BSD License text as described in Section 4.e of
56	   the Trust Legal Provisions and are provided without warranty as
57	   described in the BSD License.

59	Table of Contents

61	   1.  Introduction
62	   2.  Conventions, Definitions and Acronyms used in this document
63	   3.  Opus Codec
64	     3.1.  Modes
65	       3.1.1.  Voice Mode
66	       3.1.2.  Audio Mode
67	     3.2.  Network Bandwidth
68	       3.2.1.  Variable versus Constant Bit Rate
69	       3.2.2.  Discontinuous Transmission (DTX)
70	     3.3.  Complexity
71	     3.4.  Forward Error Correction (FEC)
72	     3.5.  Stereo Operation
73	   4.  Opus RTP Payload Format
74	     4.1.  RTP Header Usage
75	     4.2.  Payload Structure
76	   5.  Opus Storage Format
77	     5.1.  Storage Header Structure
78	     5.2.  Storage Block Structure
79	   6.  Congestion Control
80	   7.  IANA Considerations
81	     7.1.  Opus Media Type Registration
82	     7.2.  Mapping to SDP Parameters
83	       7.2.1.  Offer-Answer Model Considerations for Opus
84	       7.2.2.  Declarative SDP Considerations for Opus
85	   8.  Security Considerations
86	   9.  Acknowledgements
87	   10. Normative References
88	   A.  Informational References
89	   Authors' Addresses

91	1.  Introduction

93	   The Opus codec is a speech and audio codec developed within the IETF
94	   Internet Wideband Audio Codec working group [codec].  The codec has a
95	   very low algorithmic delay and is highly scalable in terms of audio
96	   bandwidth, network bit rate, and complexity.  Further, it provides
97	   different modes to efficiently encode speech signals as well as
98	   music signals, making it well suited to various applications using
99	   the Internet or similar networks.

101	   This document defines the Real-time Transport Protocol (RTP)
102	   [RFC3550] payload format and a file storage format for Opus-encoded
103	   speech and audio data, as needed to integrate the Opus codec in an
104	   interoperable way.  It also describes media type registrations for
105	   the RTP payload format and the file storage format.  More
106	   information on the Opus codec can be found in the IETF draft [Opus].

109	2.  Conventions, Definitions and Acronyms used in this document

111	   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
112	   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
113	   document are to be interpreted as described in [RFC2119].

115	   CPU:  Central Processing Unit
116	   IP:  Internet Protocol
117	   PSTN:  Public Switched Telephone Network
118	   samples:  Speech or audio samples
119	   SDP:  Session Description Protocol

121	3.  Opus Codec

123	   The Opus speech and audio codec has been developed to encode speech
124	   signals as well as audio signals.  One of two modes, a voice mode or
125	   an audio mode, can be chosen to allow the most efficient coding
126	   depending on the type of input signal, the sampling frequency of the
127	   input signal, and the specific application.

129	   The voice mode efficiently encodes voice signals at lower bit rates,
130	   while the audio mode is optimized for audio signals at medium and
131	   higher bit rates.

133	   The Opus speech and audio codec is highly scalable in terms of audio
134	   bandwidth, network bit rate, and complexity.  Further, Opus supports
135	   the transmission of stereo signals.

137	   The Opus speech and audio codec is based on the SILK codec [SILK] and
138	   the CELT codec [CELT].  For more detailed information on how Opus
139	   operates, also refer to [Opus].

141	3.1.  Modes

143	   Opus supports five audio sampling frequencies in the voice mode
144	   (8000, 12000, 16000, 24000, and 48000 Hz) and four in the audio mode
145	   (8000, 16000, 24000, and 48000 Hz).

148	3.1.1.  Voice Mode

150	   For low bit rate applications transmitting mostly speech signals, the
151	   voice mode of Opus SHOULD be used.  The voice mode can encode voice
152	   signals at 8000, 12000, 16000, 24000, and 48000 Hz sampling
153	   frequency.

155	   A sampling rate of 8000 Hz SHOULD only be used to interface to the
156	   PSTN or on low-end devices that do not support more than 8000 Hz
157	   sampling frequency.  A sampling rate of 12000 Hz SHOULD be used for
158	   lower-end devices that do not support more than 12000 Hz sampling
159	   frequency or are under severe network bandwidth constraints (e.g.,
160	   wireless devices).  A sampling rate of 16000 Hz SHOULD be used for
161	   all-IP platforms that do not support more than 16000 Hz sampling
162	   frequency.  Higher sampling rates are recommended for all devices
163	   that support them and desire full-bandwidth speech at medium bit
164	   rates.

166	3.1.2.  Audio Mode

168	   For applications desiring very low delay speech transmission as well
169	   as music transmission, at the cost of a higher bit rate, the audio
170	   mode SHOULD be used.  This mode supports audio sampling rates of
171	   8000, 16000, 24000, and 48000 Hz.

173	3.2.  Network Bandwidth

175	   The network bit rate is adaptive within the range specified in
176	   Table 1 for corresponding modes and audio sampling rates.  The
177	   average target network bit rate can be defined and modified in real-
178	   time while the actual bit rate will be dependent on the settings of
179	   Opus and the input signal and may change over time.

181	                      +-------+---------+-----------+
182	                      |  Mode | fs (Hz) | BR (kbps) |
183	                      +-------+---------+-----------+
184	                      | voice |   8000  |   6 - 20  |
185	                      |       |         |           |
186	                      | voice |  12000  |   7 - 25  |
187	                      |       |         |           |
188	                      | voice |  16000  |   8 - 30  |
189	                      |       |         |           |
190	                      | voice |  24000  |  18 - 28  |
191	                      |       |         |           |
192	                      | voice |  48000  |  24 - 32  |
193	                      |       |         |           |
194	                      | audio |   8000  |  20 - 28  |
195	                      |       |         |           |
196	                      | audio |  16000  |  24 - 32  |
197	                      |       |         |           |
198	                      | audio |  24000  |  28 - 40  |
199	                      |       |         |           |
200	                      | audio |  48000  |  32 - 128 |
201	                      +-------+---------+-----------+

203	     Mode specifies the Opus mode of operation; fs specifies the audio
204	    sampling frequency in Hertz (Hz); BR specifies the network bit rate
205	                   range in kilobits per second (kbps).

207	                                  Table 1
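
   As an illustration only, the ranges in Table 1 can be held in a small
   lookup table.  The names BIT_RATE_RANGE_KBPS and target_in_range are
   hypothetical and not part of Opus; the sketch assumes the range
   limits in Table 1 are inclusive.

```python
# Bit rate ranges copied from Table 1, keyed by (mode, sampling rate in Hz).
# Each value is (minimum, maximum) in kbps.
BIT_RATE_RANGE_KBPS = {
    ("voice", 8000): (6, 20),
    ("voice", 12000): (7, 25),
    ("voice", 16000): (8, 30),
    ("voice", 24000): (18, 28),
    ("voice", 48000): (24, 32),
    ("audio", 8000): (20, 28),
    ("audio", 16000): (24, 32),
    ("audio", 24000): (28, 40),
    ("audio", 48000): (32, 128),
}


def target_in_range(mode, fs_hz, target_kbps):
    """Return True if target_kbps lies within the Table 1 range.

    Assumption: both ends of the range are inclusive.
    Raises KeyError for a (mode, fs_hz) pair not listed in Table 1.
    """
    low, high = BIT_RATE_RANGE_KBPS[(mode, fs_hz)]
    return low <= target_kbps <= high
```

   For example, target_in_range("voice", 8000, 12) is true, while a
   16 kbps target for the audio mode at 48000 Hz falls below the listed
   range.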

209	3.2.1.  Variable versus Constant Bit Rate

211	   The voice mode will always use a variable bit rate at audio sampling
212	   rates of 8000, 12000, and 16000 Hz.  The average target bit rate can
213	   be adjusted at any point in time.  To avoid congestion of the
214	   connection the average target bit rate SHOULD be adjusted to the
215	   available network bandwidth.  If no target bit rate is specified the
216	   average bit rate may go up to the highest bit rate specified in
217	   Table 1.

219	   In voice mode at audio sampling rates higher than 16000 Hz, i.e.,
220	   24000 and 48000 Hz, and in audio mode, Opus can be operated at either
221	   a variable or a constant bit rate.  The target bit rate can be
222	   adjusted at any point in time.

224	3.2.2.  Discontinuous Transmission (DTX)

226	   The Opus codec may, as described in Section 3.2.1, be operated with
227	   an adaptive bit rate.  In that case, the bit rate will automatically
228	   be reduced for certain input signals, such as periods of silence.
229	   During continuous transmission the bit rate will be reduced when the
230	   input signal allows it, but the transmission to the receiver itself
231	   will never be interrupted.  Therefore, the received signal will
232	   maintain the same high level of quality over the full duration of a
233	   transmission while minimizing the average bit rate over time.

235	   In cases where the bit rate of Opus needs to be reduced even further
236	   or in cases where only constant bit rate is available, the Opus
237	   encoder may be set to use discontinuous transmission (DTX), where
238	   parts of the encoded signal that correspond to periods of silence in
239	   the input speech or audio signal are not transmitted to the receiver.

241	   On the receiving side, the non-transmitted parts will be handled by a
242	   frame loss concealment unit in the Opus decoder, which generates a
243	   comfort noise signal to replace the non-transmitted parts of the
244	   speech or audio signal.

246	   The DTX mode of Opus will have a slightly lower speech or audio
247	   quality than the continuous mode.  Therefore, it is RECOMMENDED to
248	   use Opus in the continuous mode unless constraints on network
249	   bandwidth are severe.  The DTX mode can be engaged for operation at
250	   either an adaptive or a constant bit rate.

252	3.3.  Complexity

254	   Complexity can be scaled in real time to optimize for CPU resources,
255	   mostly as a trade-off against network bit rate.  Also, different
256	   modes of Opus have different complexity.

258	3.4.  Forward Error Correction (FEC)

260	   The voice mode of Opus allows for "in-band" forward error correction
261	   (FEC) data to be embedded into the bit stream of Opus.  This FEC
262	   scheme adds redundant information about the previous packet (n-1) to
263	   the current output packet n.  For each frame, the encoder decides
264	   whether to use FEC based on (1) an externally provided estimate of
265	   the channel's packet loss rate; (2) an externally provided estimate
266	   of the channel's capacity; (3) the sensitivity of the audio or speech
267	   signal to packet loss; and (4) whether the receiving decoder has
268	   indicated it can take advantage of "in-band" FEC information.  The
269	   decision to send "in-band" FEC information is entirely controlled by
270	   the encoder, and therefore no special precautions have to be taken
271	   for the payload or storage format.

273	   On the receiving side, the decoder can take advantage of this
274	   additional information when a packet is lost and the next packet is
275	   available.  In order to use the FEC data, the jitter buffer needs to
276	   provide access to payloads with the FEC data.  The decoder API
277	   function has a flag to indicate that a FEC frame rather than a
278	   regular frame should be decoded.  If no FEC data is available for the
279	   current frame, the decoder will consider the frame lost and invoke
280	   the frame loss concealment.

282	   If the FEC scheme is not implemented on the receiving side, FEC
283	   SHOULD NOT be used, as it leads to an inefficient usage of network
284	   bandwidth.  Decoder support for FEC SHOULD be indicated at the time a
285	   session is set up.

287	3.5.  Stereo Operation

289	   Opus allows for transmission of stereo audio signals.  This operation
290	   is signaled in the Opus payload, and no special arrangements have to
291	   be made in the payload format.  Any implementation of the Opus
292	   decoder MUST be capable of receiving stereo signals.

294	   If a decoder cannot take advantage of the benefits of a stereo
295	   signal, this SHOULD be indicated at the time a session is set up.  In
296	   that case the sending side SHOULD NOT send stereo signals, as doing
297	   so leads to an inefficient usage of network bandwidth.

299	4.  Opus RTP Payload Format

301	   The payload format for Opus consists of the RTP header and Opus
302	   payload data.

304	4.1.  RTP Header Usage

306	   The format of the RTP header is specified in [RFC3550].  The Opus
307	   payload format uses the fields of the RTP header consistent with this
308	   specification.

310	   The payload length of Opus is an integer number of octets and
311	   therefore no padding is required.  The payload MAY be padded by an
312	   integer number of octets according to [RFC3550].

314	   The marker bit (M) of the RTP header has no function in combination
315	   with Opus and MAY be ignored.

317	   The RTP payload type for Opus has not been assigned statically and is
318	   expected to be assigned dynamically.

320	   The receiving side MUST be prepared to receive duplicates of RTP
321	   packets.  Exactly one of those payloads MUST be provided to the Opus
322	   decoder for decoding, and the others MUST be discarded.
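
   One possible way to discard duplicates is sketched below.  It assumes
   duplicates are recognized by their RTP sequence number, which this
   document does not prescribe, and the name drop_duplicates is
   hypothetical.  The set of seen numbers grows without bound and
   ignores 16-bit sequence number wrap-around; a real receiver would
   limit it to a window.

```python
def drop_duplicates(packets):
    """Yield each (sequence_number, payload) pair at most once.

    packets is an iterable of (sequence_number, payload) tuples in
    arrival order.  Later copies of a sequence number already seen are
    skipped, so only one payload per packet reaches the decoder.
    """
    seen = set()
    for sequence_number, payload in packets:
        if sequence_number in seen:
            continue  # duplicate: discard
        seen.add(sequence_number)
        yield sequence_number, payload
```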

324	   Opus supports 5 different sampling rates, which may be adjusted
325	   during the duration of a call.  The RTP timestamp clock frequency is
326	   defined as the highest supported sampling frequency of Opus, i.e.,
327	   48000 Hz, for all modes and sampling rates of Opus.  The unit for the
328	   timestamp is samples.  The RTP timestamp corresponds to the sample
329	   time of the first encoded sample in the encoded frame.  For sampling
330	   rates lower than 48000 Hz, the number of samples has to be multiplied
331	   by the multiplier given in Table 2 to determine the RTP timestamp.

333	                         +---------+------------+
334	                         | fs (Hz) | Multiplier |
335	                         +---------+------------+
336	                         |   8000  |      6     |
337	                         |         |            |
338	                         |  12000  |      4     |
339	                         |         |            |
340	                         |  16000  |      3     |
341	                         |         |            |
342	                         |  24000  |      2     |
343	                         |         |            |
344	                         |  48000  |      1     |
345	                         +---------+------------+

347	    fs specifies the audio sampling frequency in Hertz (Hz); Multiplier
348	    is the value by which the number of samples has to be multiplied to
349	                       calculate the RTP timestamp.

351	                                  Table 2
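
   The multipliers in Table 2 are simply 48000 divided by the sampling
   frequency.  A minimal sketch of the conversion follows; the function
   names are hypothetical.

```python
RTP_CLOCK_HZ = 48000  # RTP timestamp clock rate used for all Opus modes
SUPPORTED_FS_HZ = (8000, 12000, 16000, 24000, 48000)


def rtp_timestamp_multiplier(fs_hz):
    """Return the Table 2 multiplier for a sampling frequency in Hz."""
    if fs_hz not in SUPPORTED_FS_HZ:
        raise ValueError("unsupported sampling frequency: %r" % (fs_hz,))
    return RTP_CLOCK_HZ // fs_hz


def samples_to_rtp_ticks(fs_hz, num_samples):
    """Convert a sample count at fs_hz into 48000 Hz RTP timestamp units."""
    return num_samples * rtp_timestamp_multiplier(fs_hz)
```

   For example, a 20 ms frame at 16000 Hz holds 320 samples, which
   corresponds to 320 * 3 = 960 timestamp units.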

353	4.2.  Payload Structure

355	   The Opus encoder can be set to output encoded frames representing
356	   2.5, 5, 10, 20, 40, or 60 ms of speech or audio data.  Further, an
357	   arbitrary number of frames can be combined into a packet, up to a
358	   maximum packet length of encoded data representing 120 ms of speech
359	   or audio data.  Packetization of encoded data is done entirely by
360	   the Opus encoder, and therefore exactly one packet output by the
361	   Opus encoder MUST be used as a payload.

363	   Figure 1 shows the structure combined with the RTP header.

365	   +----------+--------------+
366	   |RTP Header| Opus Payload |
367	   +----------+--------------+

369	                Figure 1: Payload Structure with RTP header

371	   Table 3 shows supported frame sizes for different modes and sampling
372	   rates of Opus and how the timestamp needs to be incremented for
373	   packetization.

375	   +---------+-------------------+-----+-----+-----+-----+------+------+
376	   |  Mode   |      fs (Hz)      | 2.5 |  5  |  10 |  20 |  40  |  60  |
377	   +---------+-------------------+-----+-----+-----+-----+------+------+
378	   | ts incr |        all        | 120 | 240 | 480 | 960 | 1920 | 2880 |
379	   |         |                   |     |     |     |     |      |      |
380	   |  voice  | 8000/12000/16000/ |     |     |  x  |  x  |   x  |   x  |
381	   |         |    24000/48000    |     |     |     |     |      |      |
382	   |         |                   |     |     |     |     |      |      |
383	   |  audio  | 8000/16000/24000/ |  x  |  x  |  x  |  x  |      |      |
384	   |         |       48000       |     |     |     |     |      |      |
385	   +---------+-------------------+-----+-----+-----+-----+------+------+

389	     Mode specifies the Opus mode of operation; fs specifies the audio
390	     sampling frequency in Hertz (Hz); 2.5, 5, 10, 20, 40, and 60 are
391	     the durations in milliseconds of encoded speech or audio data in a
392	     frame; ts incr specifies the value by which the timestamp needs to
393	     be incremented for the respective frame duration.  For multiple
394	     frames in a packet, these values have to be multiplied by the
395	     number of frames.

396	                                  Table 3
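
   The timestamp increment for a packet follows directly from Table 3
   and the 120 ms limit in this section.  The sketch below is
   illustrative; the names TS_INCR, SUPPORTED_FRAME_MS, and
   timestamp_increment are hypothetical.

```python
# Timestamp increment per frame at the 48000 Hz RTP clock (Table 3),
# keyed by frame duration in milliseconds.
TS_INCR = {2.5: 120, 5: 240, 10: 480, 20: 960, 40: 1920, 60: 2880}

# Frame durations marked with "x" in Table 3 for each mode.
SUPPORTED_FRAME_MS = {
    "voice": (10, 20, 40, 60),
    "audio": (2.5, 5, 10, 20),
}

MAX_PACKET_TICKS = 120 * 48  # 120 ms at 48 ticks per millisecond


def timestamp_increment(mode, frame_ms, frames_per_packet=1):
    """Return the RTP timestamp increment for one packet."""
    if frame_ms not in SUPPORTED_FRAME_MS[mode]:
        raise ValueError("frame size not supported in %s mode" % mode)
    if frames_per_packet < 1:
        raise ValueError("a packet holds at least one frame")
    increment = TS_INCR[frame_ms] * frames_per_packet
    if increment > MAX_PACKET_TICKS:
        raise ValueError("packet would exceed 120 ms")
    return increment
```

   For example, four 2.5 ms audio-mode frames in one packet advance the
   timestamp by 4 * 120 = 480, the same as a single 10 ms frame.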

398	5.  Opus Storage Format

400	   The Opus storage format allows Opus-encoded data to be stored in,
401	   e.g., a file or an email attachment.  The storage format consists of
402	   a header and a series of blocks containing encoded speech or audio
403	   frames.  The storage format closely mimics the real-time payload
404	   format, so packets, e.g., received by a voicemail system, can easily
405	   be converted into the storage format and vice versa, allowing
406	   maximum flexibility and low overhead.  Please note that this storage
407	   format is not meant to be a robust storage format, nor the most
408	   efficient storage format.  For a robust storage format that allows
409	   advanced functionality such as seeking, a more advanced container
410	   format should be used.

412	   Figure 2 shows an example of an Opus encoded file.  Note that due to
413	   the potentially adaptive bit rate the packet length may be variable
414	   and no fixed block size can be defined for blocks containing encoded
415	   data.

417	   +------------------+
418	   | Header           |
419	   +-----------+------+
420	   | block 1   |
421	   +-----------+--+
422	   | block 2      |
423	   +--------------+--+
424	   : ...             :
425	   +--------------+--+
426	   | block n         |
427	   +-----------------+

429	   Figure 2: Example of Opus file storage format showing different block
430	           lengths due to potentially adaptive bit rate of Opus

432	5.1.  Storage Header Structure

434	   An Opus storage header contains the following ASCII character string
435	   as a magic number:

437	   "#!opus\n" (hexadecimal: 0x23 0x21 0x6f 0x70 0x75 0x73 0x0A)
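
   A reader can recognize the format by comparing the leading octets
   with the magic number.  The sketch below assumes the magic number
   sits at the very start of the data; the name has_opus_magic is
   hypothetical.

```python
# The seven octets 0x23 0x21 0x6f 0x70 0x75 0x73 0x0A.
OPUS_MAGIC = b"#!opus\n"


def has_opus_magic(data):
    """Return True if data (a bytes object) begins with the magic number."""
    return data.startswith(OPUS_MAGIC)
```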

439	5.2.  Storage Block Structure

441	   Following the storage header, blocks of encoded data are stored in
442	   consecutive order in time according to Figure 2.  Each block contains
443	   a block header followed by a payload according to Figure 3.

445	   The block header contains information that, for an RTP-based session,
446	   can be derived from the IP and RTP headers: The number of octets
447	   contained in the subsequent payload and the RTP timestamp.

449	   The number of octets in the payload is represented by 16 bits and the
450	   timestamp is specified by 32 bits.  For the first block, the
451	   timestamp MAY be a random number.  For the following blocks, the
452	   timestamp MUST be incremented according to the way timestamps are
453	   incremented when Opus payloads are transmitted over RTP.

455	   0                   16                           48
456	   +-------------------+----------------------------+-----------------
457	   |    # of octets    |        Timestamp           |  Payload
458	   +-------------------+----------------------------+-----------------

460	                 Figure 3: Storage block header structure

462	   The payload of each block in Figure 2 represents one packet of Opus
463	   encoded data as originally produced by the Opus encoder.
464	   Information about the frame size (the duration of encoded speech or
465	   audio data), the number of encoded frames, stereo information, and
466	   DTX is embedded into the Opus payload and is not subject to the
467	   storage format.  It can be extracted from the payload during decoding
468	   of the encoded data.
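
   The header and block layout can be written and read with a few lines
   of code.  The sketch below is one possible reading of Figure 3, not a
   reference implementation.  It assumes that both header fields are
   unsigned and stored in network (big-endian) byte order, which this
   document does not state, and that the storage header consists of the
   magic number only.  The function names are hypothetical.

```python
import struct

OPUS_MAGIC = b"#!opus\n"
# Assumption: 16-bit octet count and 32-bit timestamp, both big-endian.
BLOCK_HEADER = struct.Struct(">HI")


def write_storage(stream, packets):
    """Write (timestamp, payload) pairs to a binary stream."""
    stream.write(OPUS_MAGIC)
    for timestamp, payload in packets:
        # The timestamp is reduced modulo 2**32 to fit the 32-bit field.
        stream.write(BLOCK_HEADER.pack(len(payload), timestamp & 0xFFFFFFFF))
        stream.write(payload)


def read_storage(stream):
    """Return the list of (timestamp, payload) pairs stored in a stream."""
    if stream.read(len(OPUS_MAGIC)) != OPUS_MAGIC:
        raise ValueError("missing Opus storage magic number")
    packets = []
    while True:
        header = stream.read(BLOCK_HEADER.size)
        if not header:
            break  # clean end of data
        if len(header) < BLOCK_HEADER.size:
            raise ValueError("truncated block header")
        length, timestamp = BLOCK_HEADER.unpack(header)
        payload = stream.read(length)
        if len(payload) < length:
            raise ValueError("truncated block payload")
        packets.append((timestamp, payload))
    return packets
```

   Because each block carries the RTP timestamp and the payload length,
   a stored stream can be turned back into RTP payloads without parsing
   the Opus data itself.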

470	   During the usage of DTX no blocks are stored when the channel is
471	   inactive.  Timestamps MUST be used to reassemble the decoded signal
472	   in a time-aligned way.
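
   Time alignment under DTX can be sketched by comparing each block's
   timestamp with the end of the previous block.  The sketch assumes the
   caller already knows each block's duration in 48000 Hz timestamp
   units (it is carried inside the Opus payload, not in the block
   header); the name silence_gaps is hypothetical.

```python
def silence_gaps(blocks):
    """Return the untransmitted gap, in timestamp units, before each block.

    blocks is a list of (timestamp, duration_ticks) pairs in stored
    order.  The result has one entry per block after the first.
    Differences are taken modulo 2**32 so that a timestamp wrapping
    around the 32-bit field is handled.
    """
    gaps = []
    for (prev_ts, prev_duration), (ts, _) in zip(blocks, blocks[1:]):
        elapsed = (ts - prev_ts) % (1 << 32)
        gaps.append(max(elapsed - prev_duration, 0))
    return gaps
```

   A gap of 2880, for instance, means 60 ms of signal was not stored and
   has to be filled, e.g. with comfort noise, before the next block is
   played.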

474	6.  Congestion Control

476	   The adaptive nature of the Opus codec allows for efficient congestion
477	   control.

479	   The voice mode of Opus at audio sampling rates of 8000, 12000, and
480	   16000 Hz always runs with a variable bit rate.  The average bit rate
481	   in that mode depends on the input signal and will decrease in
482	   particular during silent periods.  The voice mode at audio sampling
483	   rates of 24000 and 48000 Hz and the audio mode may run at a variable
484	   or constant bit rate.  Either way, the target bit rate of Opus can be
485	   adjusted at any point in time, thus allowing for efficient congestion
486	   control.

488	   Furthermore, the amount of speech or audio data encoded in a single
489	   packet can be used for congestion control, since the packet
490	   transmission rate is inversely proportional to the packet duration.
491	   A lower packet transmission rate reduces the amount of header
492	   overhead but at the same time increases latency and error
493	   sensitivity, so it should be used with care.

495	   It is RECOMMENDED that congestion control be applied during the
496	   transmission of Opus encoded data.

498	7.  IANA Considerations

500	   One media subtype (audio/opus) has been defined and registered as
501	   described in the following section.

503	7.1.  Opus Media Type Registration

505	   Media type registration is done according to [RFC4288] and [RFC4855].

507	   Type name: audio

509	   Subtype name: opus

511	   Required parameters:

513	   rate:  the RTP timestamp is incremented with a 48000 Hz clock rate
514	      for all modes of Opus and all sampling frequencies.  For audio
515	      sampling rates other than 48000 Hz, the sample count has to be
516	      scaled to the 48000 Hz clock according to Table 2.

518	   Optional parameters:

520	   maxcodedaudiobandwidth:  the decoder's maximum sampling frequency
521	      specified in Hertz (Hz) that the application can take advantage
522	      of.  The decoder MUST be capable of receiving any allowed sampling
523	      frequency, but due to hardware limitations only signals up to the
524	      specified sampling frequency can be processed.  Sending signals
525	      with a higher sampling frequency may result in higher than
526	      necessary network bandwidth and encoding complexity.  Possible
527	      values are 8000, 12000, 16000, 24000, and 48000.

529	   maxptime:  the maximum duration of media, in milliseconds rounded up
530	      to the next full integer value, that can be encapsulated in a
531	      received packet, according to Section 6 of [RFC4566].  Possible
532	      values are 3, 5, 10, 20, 40, and 60 or an arbitrary multiple of
533	      Opus frame sizes rounded up to the next full integer value, up to
534	      a maximum value of 120 as defined in Section 4 and Section 5 of
535	      this document.  If no value is specified, 120 is assumed as
536	      default.  This value is a recommendation by the decoding side to
537	      ensure the best performance for the decoder.  The decoder MUST be
538	      capable of accepting any allowed packet size to ensure maximum
539	      compatibility.

541	   ptime:  the decoder's recommended duration of media in a packet, in
542	      milliseconds rounded up to the next full integer value, according
543	      to Section 6 of [RFC4566].  Possible values are 3, 5, 10, 20, 40,
544	      and 60 or an arbitrary multiple of Opus frame sizes rounded up to
545	      the next full integer value, up to a maximum value of 120 as
546	      defined in Section 4 and Section 5 of this document.  If no value
547	      is specified, 20 is assumed as default.  If ptime is greater than
548	      maxptime, ptime MUST be ignored.  This parameter MAY be changed
549	      during a session.  This value is a recommendation by the decoding
550	      side to ensure the best performance for the decoder.  The decoder
551	      MUST be capable of accepting any allowed packet size to ensure
552	      maximum compatibility.

554	   minptime:  the minimum duration of media, in milliseconds rounded up
555	      to the next full integer value, that SHOULD be encapsulated in a
556	      received packet, according to Section 6 of [RFC4566].  Possible
557	      values are 3, 5, 10, 20, 40, and 60 or an arbitrary multiple of
558	      Opus frame sizes rounded up to the next full integer value, up to
559	      a maximum value of 120 as defined in Section 4 and Section 5 of
560	      this document.  If no value is specified, 3 is assumed as default.
561	      This value is a recommendation by the decoding side to ensure the
562	      best performance for the decoder.  The decoder MUST be capable of
563	      accepting any allowed packet size to ensure maximum compatibility.

566	   maxaveragebitrate:  specifies the maximum average receive bit rate of
567	      a session in bits per second (bps).  The actual value of the bit
568	      rate may vary as it is dependent on the characteristics of the
569	      media in a packet.  Note that the maximum average bit rate MAY be
570	      modified dynamically during a session.  Any positive integer is
571	      allowed but values outside the range between 6000 and 510000
572	      SHOULD be ignored.  If no value is specified, the maximum value
573	      specified in Table 1 for the corresponding mode of Opus and
574	      corresponding clock rate will be the default.

576	   stereo:  specifies if the decoder prefers to receive stereo signals
577	      versus mono signals.  Possible values are 1 and 0 where 1
578	      specifies that stereo signals are preferred and 0 specifies that
579	      mono signals are preferred.  Independent of the stereo parameter,
580	      every receiver MUST be able to receive and decode stereo signals,
581	      but sending stereo signals to a receiver that signaled a
582	      preference for mono signals may result in higher than necessary
583	      network bandwidth and encoding complexity.  If no value is
584	      specified, stereo is assumed to be 0.

586	   cbr:  specifies if the decoder prefers the use of a constant bit rate
587	      versus variable bit rate.  Possible values are 1 and 0 where 1
588	      specifies constant bit rate and 0 specifies variable bit rate.  If
589	      no value is specified, cbr is assumed to be 0.  Note that the
590	      maximum average bit rate may still be changed, e.g. to adapt to
591	      changing network conditions.

593	   useinbandfec:  specifies that Opus in-band FEC is supported by the
594	      decoder and MAY be used during a session.  Possible values are 1
595	      and 0.  It is RECOMMENDED to provide 0 in case FEC is not
596	      implemented on the receiving side.  If no value is specified,
597	      useinbandfec is assumed to be 1.

599	   usedtx:  specifies if the decoder prefers the use of DTX.  Possible
600	      values are 1 and 0.  If no value is specified, usedtx is assumed
601	      to be 0.

603	   Encoding considerations:

605	      Opus media type is framed and consists of binary data according to
606	      Section 4.8 in [RFC4288].

608	   Security considerations:

610	      See Section 8 of this document.

612	   Interoperability considerations: none

614	   Published specification: none

616	   Applications that use this media type:

618	      Any application that requires the transport or storage of speech
619	      or audio data may use this media type.  Examples include, but are
620	      not limited to, audio and video conferencing, Voice over IP, voice
621	      recording, media streaming, and voice messaging.

623	   Additional information:

625	      For storage transfer methods the following applies:

627	      Magic number: "#!opus\n" (hexadecimal: 0x23 0x21 0x6f 0x70 0x75
628	      0x73 0x0A)

630	      File extension(s): ops, OPS

632	      Macintosh file type code(s): "opus"
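A reader of a stored file could test for the magic number listed above with a check like the following. This is a minimal sketch: the names OPUS_MAGIC and has_opus_magic are hypothetical and are not defined by this document.

```python
# Magic number from the registration above: "#!opus\n".
OPUS_MAGIC = b"#!opus\n"


def has_opus_magic(data: bytes) -> bool:
    """Return True if the buffer starts with the Opus storage magic number."""
    return data.startswith(OPUS_MAGIC)
```

The check only inspects the first seven bytes; it does not validate anything that follows them.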

634	   Person & email address to contact for further information:

636	      SILK Support silksupport@skype.net
637	      Jean-Marc Valin jean-marc.valin@octasic.com

639	   Intended usage: COMMON

641	   Restrictions on usage:

643	      For transfer over RTP, the RTP payload format (Section 4 of this
644	      document) SHALL be used.  For storage usage, the storage format
645	      (Section 5 of this document) SHALL be used.

647	   Author:

649	      Julian Spittka julian.spittka@skype.net

651	      Koen Vos koen.vos@skype.net

653	      Jean-Marc Valin jean-marc.valin@octasic.com

655	   Change controller: TBD

657	7.2.  Mapping to SDP Parameters

659	   The information described in the media type specification has a
660	   specific mapping to fields in the Session Description Protocol (SDP)
661	   [RFC4566], which is commonly used to describe RTP sessions.  When SDP
662	   is used to specify sessions employing the Opus codec, the mapping is
663	   as follows:

665	   o  The media type ("audio") goes in SDP "m=" as the media name.
666	   o  The media subtype ("opus") goes in SDP "a=rtpmap" as the encoding
667	      name.  The RTP clock rate in "a=rtpmap" MUST be mapped to the
668	      required media type parameter "rate".
669	   o  The optional media type parameters "ptime" and "maxptime" are
670	      mapped to "a=ptime" and "a=maxptime" attributes, respectively, in
671	      the SDP.

673	   o  All remaining media type parameters are mapped to the "a=fmtp"
674	      attribute in the SDP by copying them directly from the media type
675	      parameter string as a semicolon-separated list of parameter=value
676	      pairs (e.g. maxaveragebitrate=20000).

678	   Below are some examples of SDP session descriptions for Opus:

680	   Example 1: Standard session with 48000 Hz clock rate

682	       m=audio 54312 RTP/AVP 101
683	       a=rtpmap:101 opus/48000

685	   Example 2: 16000 Hz clock rate, maximum packet size of 40 ms,
686	   recommended packet size of 40 ms, maximum average bit rate of 20000
687	   bps, stereo signals are preferred, FEC is allowed, DTX is not allowed

689	       m=audio 54312 RTP/AVP 101
690	       a=rtpmap:101 opus/48000
691	       a=fmtp:101 maxcodedaudiobandwidth=16000; maxaveragebitrate=20000;
692	       stereo=1; useinbandfec=1; usedtx=0
693	       a=ptime:40
694	       a=maxptime:40
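The mapping rules above can be sketched in a few lines of Python. This is an illustrative sketch only: the function name opus_sdp_lines and its arguments are hypothetical, and the code assumes that the parameters arrive as a dictionary that includes the required "rate" entry.

```python
def opus_sdp_lines(port, payload_type, params):
    """Map Opus media type parameters to SDP lines (hypothetical helper)."""
    params = dict(params)        # copy so the caller's dict is untouched
    rate = params.pop("rate")    # required parameter, goes into a=rtpmap
    ptime = params.pop("ptime", None)
    maxptime = params.pop("maxptime", None)

    lines = [
        f"m=audio {port} RTP/AVP {payload_type}",
        f"a=rtpmap:{payload_type} opus/{rate}",
    ]
    if params:
        # All remaining parameters become one semicolon-separated list.
        fmtp = "; ".join(f"{name}={value}" for name, value in params.items())
        lines.append(f"a=fmtp:{payload_type} {fmtp}")
    if ptime is not None:
        lines.append(f"a=ptime:{ptime}")
    if maxptime is not None:
        lines.append(f"a=maxptime:{maxptime}")
    return lines
```

Called with the values of Example 2, the helper yields the same attributes, with the "a=fmtp" value on a single line rather than wrapped.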

696	7.2.1.  Offer-Answer Model Considerations for Opus

698	   When using the offer-answer procedure described in [RFC3264] to
699	   negotiate the use of Opus, the following considerations apply:

701	   o  Opus supports several clock rates.  For signaling purposes only
702	      the highest, i.e. 48000, is used.  The actual clock rate of the
703	      corresponding media is signaled inside the payload and is not
704	      subject to this payload format description.  The decoder MUST be
705	      capable of decoding every received clock rate.  An example is shown
706	      below:

708	           m=audio 54312 RTP/AVP 100
709	           a=rtpmap:100 opus/48000

711	   o  The parameters "ptime" and "maxptime" are unidirectional receive-
712	      only parameters and typically will not compromise
713	      interoperability; however, depending on the values set for these
714	      parameters, the performance of the application may suffer.

716	      [RFC3264] defines the SDP offer-answer handling of the "ptime"
717	      parameter.  The "maxptime" parameter MUST be handled in the same
718	      way.
719	   o  The parameter "minptime" is a unidirectional receive-only
720	      parameter and typically will not compromise interoperability;
721	      however, depending on the value set for the parameter, the
722	      performance of the application may suffer, so the value should be
723	      set with care.
724	   o  The parameter "maxcodedaudiobandwidth" is a unidirectional
725	      receive-only parameter that reflects limitations of the local
726	      receiver.  The sender on the other side SHOULD NOT send with a
727	      sampling rate higher than "maxcodedaudiobandwidth", as doing so
728	      represents an inefficient use of network bandwidth resources and
729	      CPU cycles on the encoding side.  The parameter
730	      "maxcodedaudiobandwidth" typically will not compromise
731	      interoperability; however, depending on the value set for the
732	      parameter, the performance of the application may suffer, so the
733	      value should be set with care.
734	   o  The parameter "maxaveragebitrate" is a unidirectional receive-only
735	      parameter that reflects limitations of the local receiver.  The
736	      sender on the other side MUST NOT send with an average bit rate
737	      higher than "maxaveragebitrate", as doing so might overload the
738	      network and/or receiver.  The parameter "maxaveragebitrate"
739	      typically will not compromise interoperability; however,
740	      depending on the value set for the parameter, the performance of
741	      the application may suffer, so the value should be set with care.
742	   o  If the parameter "maxaveragebitrate" is below the range specified
743	      in Table 1 the session MUST be rejected.
744	   o  The parameter "stereo" is a unidirectional receive-only parameter.
745	   o  The parameter "cbr" is a unidirectional receive-only parameter.
746	   o  The parameter "useinbandfec" is a unidirectional receive-only
747	      parameter.
748	   o  The parameter "usedtx" is a unidirectional receive-only parameter.
749	   o  Any unknown parameter in an offer MUST be ignored by the receiver
750	      and MUST be removed from the answer.
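The receive-only parameter handling described in the list above might be sketched as follows. This is a hedged illustration, not a normative procedure: the names KNOWN_PARAMS, DEFAULTS, parse_fmtp, receiver_prefs, and sender_bitrate are hypothetical. The set of known parameters and the defaults are taken from the parameter names and default values that appear in Section 7.1 as quoted here, and all values are assumed to be integers. Defaults that depend on Table 1 or on text outside this section are left unset on purpose.

```python
# Assumption: fmtp parameter names as they appear in this document.
KNOWN_PARAMS = {
    "minptime", "maxcodedaudiobandwidth", "maxaveragebitrate",
    "stereo", "cbr", "useinbandfec", "usedtx",
}
# Defaults stated in the parameter descriptions of this document.
DEFAULTS = {"minptime": 3, "stereo": 0, "cbr": 0, "useinbandfec": 1, "usedtx": 0}


def parse_fmtp(value):
    """Parse 'name=value; name=value', ignoring unknown parameters."""
    params = {}
    for item in value.split(";"):
        name, sep, raw = item.strip().partition("=")
        name = name.strip()
        if not sep or name not in KNOWN_PARAMS:
            continue  # unknown parameters are ignored and not echoed back
        number = int(raw.strip())
        if name == "maxaveragebitrate" and not 6000 <= number <= 510000:
            continue  # out-of-range bit rates are ignored
        params[name] = number
    return params


def receiver_prefs(parsed):
    """Fill in the documented defaults for parameters that were omitted."""
    prefs = dict(DEFAULTS)
    prefs.update(parsed)
    return prefs


def sender_bitrate(desired_bps, prefs):
    """Cap the sender's average bit rate at the receiver's stated limit."""
    limit = prefs.get("maxaveragebitrate")
    return desired_bps if limit is None else min(desired_bps, limit)
```

Because parse_fmtp drops unknown names, an answer built from its result does not carry them forward.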

752	7.2.2.  Declarative SDP Considerations for Opus

754	   For declarative use of SDP with Opus, such as in the Session
755	   Announcement Protocol (SAP) [RFC2974] and RTSP [RFC2326], the
756	   following needs to be considered:

758	   o  The values for "maxptime", "ptime", "minptime",
759	      "maxcodedaudiobandwidth", and "maxaveragebitrate" should be
760	      selected carefully to ensure that a reasonable performance can be
761	      achieved for the participants of a session.

763	   o  The values for "maxptime", "ptime", and "minptime" of the payload
764	      format configuration are recommendations by the decoding side to
765	      ensure the best performance for the decoder.  The decoder MUST be
766	      capable of accepting any allowed packet size to ensure maximum
767	      compatibility.
768	   o  All other parameters of the payload format configuration are
769	      declarative and a participant MUST use the configurations that are
770	      provided for the session.  More than one configuration may be
771	      provided if necessary by declaring multiple RTP payload types;
772	      however, the number of types should be kept small.

774	8.  Security Considerations

776	   All RTP packets using the payload format defined in this
777	   specification are subject to the general security considerations
778	   discussed in the RTP specification [RFC3550] and in any applicable
779	   profile, e.g., [RFC3711] or [RFC3551].

781	   This payload format transports Opus-encoded speech or audio data;
782	   hence, security issues include confidentiality, integrity protection,
783	   and authentication of the speech or audio itself.  The Opus payload
784	   format does not have any built-in security mechanisms.  Any suitable
785	   external mechanisms, such as SRTP [RFC3711], MAY be used.

787	   This payload format and the Opus encoding do not exhibit any
788	   significant non-uniformity in the receiver-end computational load and
789	   thus are unlikely to pose a denial-of-service threat due to the
790	   receipt of pathological datagrams.

792	9.  Acknowledgements

794	   TBD

796	10.  Normative References

798	   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
799	              Requirement Levels", BCP 14, RFC 2119, March 1997.

801	   [RFC2326]  Schulzrinne, H., Rao, A., and R. Lanphier, "Real Time
802	              Streaming Protocol (RTSP)", RFC 2326, April 1998.

804	   [RFC2974]  Handley, M., Perkins, C., and E. Whelan, "Session
805	              Announcement Protocol", RFC 2974, October 2000.

807	   [RFC3264]  Rosenberg, J. and H. Schulzrinne, "An Offer/Answer Model
808	              with Session Description Protocol (SDP)", RFC 3264,
809	              June 2002.

811	   [RFC3550]  Schulzrinne, H., Casner, S., Frederick, R., and V.
812	              Jacobson, "RTP: A Transport Protocol for Real-Time
813	              Applications", STD 64, RFC 3550, July 2003.

815	   [RFC3551]  Schulzrinne, H. and S. Casner, "RTP Profile for Audio and
816	              Video Conferences with Minimal Control", STD 65, RFC 3551,
817	              July 2003.

819	   [RFC3711]  Baugher, M., McGrew, D., Naslund, M., Carrara, E., and K.
820	              Norrman, "The Secure Real-time Transport Protocol (SRTP)",
821	              RFC 3711, March 2004.

823	   [RFC4288]  Freed, N. and J. Klensin, "Media Type Specifications and
824	              Registration Procedures", BCP 13, RFC 4288, December 2005.

826	   [RFC4566]  Handley, M., Jacobson, V., and C. Perkins, "SDP: Session
827	              Description Protocol", RFC 4566, July 2006.

829	   [RFC4855]  Casner, S., "Media Type Registration of RTP Payload
830	              Formats", RFC 4855, February 2007.

832	Appendix A.  Informational References

834	      [codec] http://datatracker.ietf.org/wg/codec/
835	      [SILK] https://developer.skype.com/silk
836	      [CELT] http://www.celt-codec.org/
837	      [Opus] http://datatracker.ietf.org/doc/draft-ietf-codec-opus/

839	Authors' Addresses

841	   Julian Spittka
842	   Skype Technologies S.A.
843	   3210 Porter Drive
844	   Palo Alto, CA  94304
845	   USA

847	   Email: julian.spittka@skype.net

849	   Koen Vos
850	   Skype Technologies S.A.
851	   3210 Porter Drive
852	   Palo Alto, CA  94304
853	   USA

855	   Email: koen.vos@skype.net

857	   Jean-Marc Valin
858	   Octasic Inc.
859	   4101 Molson Street
860	   Montreal, Quebec
861	   Canada

863	   Email: jean-marc.valin@octasic.com