2	Network Working Group                                          JM. Valin
3	Internet-Draft                                                   Mozilla
4	Intended status: Standards Track                                 C. Bran
5	Expires: May 8, 2016                                         Plantronics
6	                                                        November 5, 2015

8	             WebRTC Audio Codec and Processing Requirements
9	                       draft-ietf-rtcweb-audio-09

11	Abstract

13	   This document outlines the audio codec and processing requirements
14	   for WebRTC endpoints.

16	Status of This Memo

18	   This Internet-Draft is submitted in full conformance with the
19	   provisions of BCP 78 and BCP 79.

21	   Internet-Drafts are working documents of the Internet Engineering
22	   Task Force (IETF).  Note that other groups may also distribute
23	   working documents as Internet-Drafts.  The list of current Internet-
24	   Drafts is at http://datatracker.ietf.org/drafts/current/.

26	   Internet-Drafts are draft documents valid for a maximum of six months
27	   and may be updated, replaced, or obsoleted by other documents at any
28	   time.  It is inappropriate to use Internet-Drafts as reference
29	   material or to cite them other than as "work in progress."

31	   This Internet-Draft will expire on May 8, 2016.

33	Copyright Notice

35	   Copyright (c) 2015 IETF Trust and the persons identified as the
36	   document authors.  All rights reserved.

38	   This document is subject to BCP 78 and the IETF Trust's Legal
39	   Provisions Relating to IETF Documents
40	   (http://trustee.ietf.org/license-info) in effect on the date of
41	   publication of this document.  Please review these documents
42	   carefully, as they describe your rights and restrictions with respect
43	   to this document.  Code Components extracted from this document must
44	   include Simplified BSD License text as described in Section 4.e of
45	   the Trust Legal Provisions and are provided without warranty as
46	   described in the Simplified BSD License.

48	Table of Contents

50	   1.  Introduction
51	   2.  Terminology
52	   3.  Codec Requirements
53	   4.  Audio Level
54	   5.  Acoustic Echo Cancellation (AEC)
55	   6.  Legacy VoIP Interoperability
56	   7.  IANA Considerations
57	   8.  Security Considerations
58	   9.  Acknowledgements
59	   10. References
60	     10.1.  Normative References
61	     10.2.  Informative References
62	   Authors' Addresses

64	1.  Introduction

66	   An integral part of the success and adoption of Web Real-Time
67	   Communications (WebRTC) will be the voice and video interoperability
68	   between WebRTC applications.  This specification outlines the
69	   audio processing and codec requirements for WebRTC endpoint
70	   implementations.

72	2.  Terminology

74	   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
75	   "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and
76	   "OPTIONAL" in this document are to be interpreted as described in RFC
77	   2119 [RFC2119].

79	3.  Codec Requirements

81	   To ensure a baseline level of interoperability between WebRTC
82	   endpoints, a minimum set of required codecs is specified below.  If
83	   other suitable audio codecs are available for the browser to use, it
84	   is RECOMMENDED that they also be included in the offer in order to
85	   maximize the possibility of establishing the session without the
86	   need for audio transcoding.

88	   WebRTC endpoints are REQUIRED to implement the following audio
89	   codecs:

91	   o  Opus [RFC6716] with the payload format specified in
92	      [I-D.ietf-payload-rtp-opus].

94	   o  G.711 PCMA and PCMU with the payload format specified in section
95	      4.5.14 of [RFC3551].

97	   o  [RFC3389] comfort noise (CN).  Receivers MUST support RFC3389 CN
98	      for streams encoded with G.711 or any other supported codec that
99	      does not provide its own CN.  Since Opus provides its own CN
100	      mechanism, the use of RFC3389 CN with Opus is NOT RECOMMENDED.
101	      Use of DTX/CN by senders is OPTIONAL.

103	   o  The audio/telephone-event media format as specified in [RFC4733].
104	      WebRTC endpoints are REQUIRED to be able to generate and consume
105	      the following events:

107	         +------------+--------------------------------+-----------+
108	         |Event Code  | Event Name                     | Reference |
109	         +------------+--------------------------------+-----------+
110	         | 0          | DTMF digit "0"                 |  RFC4733  |
111	         | 1          | DTMF digit "1"                 |  RFC4733  |
112	         | 2          | DTMF digit "2"                 |  RFC4733  |
113	         | 3          | DTMF digit "3"                 |  RFC4733  |
114	         | 4          | DTMF digit "4"                 |  RFC4733  |
115	         | 5          | DTMF digit "5"                 |  RFC4733  |
116	         | 6          | DTMF digit "6"                 |  RFC4733  |
117	         | 7          | DTMF digit "7"                 |  RFC4733  |
118	         | 8          | DTMF digit "8"                 |  RFC4733  |
119	         | 9          | DTMF digit "9"                 |  RFC4733  |
120	         | 10         | DTMF digit "*"                 |  RFC4733  |
121	         | 11         | DTMF digit "#"                 |  RFC4733  |
122	         +------------+--------------------------------+-----------+
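
   The event codes in the table above can be illustrated with a minimal
   Python sketch that packs and unpacks the 4-byte telephone-event
   payload of [RFC4733] (event, E bit, R bit, 6-bit volume, 16-bit
   duration).  The function names and the default volume are
   illustrative assumptions, not part of this document.

```python
import struct

# Event codes from the table above (DTMF digits 0-9, "*", "#").
DTMF_EVENT_CODES = {**{str(d): d for d in range(10)}, "*": 10, "#": 11}


def encode_telephone_event(digit, duration, volume=10, end=False):
    """Pack one RFC 4733 telephone-event payload (4 bytes).

    Layout: event (8 bits), E bit, R bit (reserved, sent as 0),
    volume (6 bits), duration (16 bits, in RTP timestamp units).
    The default volume of 10 is an arbitrary illustrative value.
    """
    event = DTMF_EVENT_CODES[digit]
    if not 0 <= volume <= 63:
        raise ValueError("volume must fit in 6 bits")
    if not 0 <= duration <= 0xFFFF:
        raise ValueError("duration must fit in 16 bits")
    flags_and_volume = (0x80 if end else 0x00) | volume
    return struct.pack("!BBH", event, flags_and_volume, duration)


def decode_telephone_event(payload):
    """Unpack a 4-byte telephone-event payload into a dict."""
    event, flags_and_volume, duration = struct.unpack("!BBH", payload)
    return {
        "event": event,
        "end": bool(flags_and_volume & 0x80),
        "volume": flags_and_volume & 0x3F,
        "duration": duration,
    }
```

   For example, encode_telephone_event("#", 800, end=True) yields the
   final packet of a "#" key press lasting 800 timestamp units.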

124	   For all cases where the endpoint is able to process audio at a
125	   sampling rate higher than 8 kHz, it is RECOMMENDED that Opus be
126	   offered before PCMA/PCMU.  For Opus, all modes MUST be supported on
127	   the decoder side.  The choice of encoder-side modes is left to the
128	   implementer.  Endpoints MAY use the offer/answer mechanism to signal
129	   a preference for a particular mode or ptime.
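
   The ordering recommendation above can be sketched as a small Python
   helper that builds the audio media section of an offer.  This is an
   illustration only: the dynamic payload types (111 and 101), the port
   value, and the function names are assumptions, and a real WebRTC
   offer carries many more attributes (ICE, DTLS, and so on).  The
   static payload types 0, 8, and 13 are those assigned to PCMU, PCMA,
   and CN.

```python
# name -> (payload type, rtpmap encoding).  Payload types 111 and 101
# are arbitrary dynamic values chosen for illustration.
CODECS = {
    "opus": (111, "opus/48000/2"),
    "PCMU": (0, "PCMU/8000"),
    "PCMA": (8, "PCMA/8000"),
    "CN": (13, "CN/8000"),
    "telephone-event": (101, "telephone-event/8000"),
}


def preferred_order(max_sampling_rate_hz):
    """Return codec names in offer order."""
    if max_sampling_rate_hz > 8000:
        # RECOMMENDED: offer Opus before PCMA/PCMU.
        audio = ["opus", "PCMU", "PCMA"]
    else:
        # No recommendation is made for this case; this order is an
        # arbitrary choice for the sketch.
        audio = ["PCMU", "PCMA", "opus"]
    return audio + ["CN", "telephone-event"]


def build_audio_section(max_sampling_rate_hz, port=9):
    """Build an m=audio section listing the required formats."""
    names = preferred_order(max_sampling_rate_hz)
    payload_types = " ".join(str(CODECS[n][0]) for n in names)
    lines = [f"m=audio {port} UDP/TLS/RTP/SAVPF {payload_types}"]
    lines += [f"a=rtpmap:{CODECS[n][0]} {CODECS[n][1]}" for n in names]
    # Advertise DTMF events 0-11, the set listed in the table above.
    lines.append(f"a=fmtp:{CODECS['telephone-event'][0]} 0-11")
    return "\r\n".join(lines) + "\r\n"
```

   Calling build_audio_section(48000) places payload type 111 (Opus)
   first on the m= line, ahead of 0 (PCMU) and 8 (PCMA).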

131	   For additional information on implementing codecs other than the
132	   mandatory-to-implement codecs listed above, refer to
133	   [I-D.ietf-rtcweb-audio-codecs-for-interop].

135	4.  Audio Level

137	   It is desirable to standardize the "on the wire" audio level for
138	   speech transmission to avoid users having to manually adjust the
139	   playback and to facilitate mixing in conferencing applications.  It
140	   is also desirable to be consistent with ITU-T recommendations G.169
141	   and G.115, which recommend an active audio level of -19 dBm0.
142	   However, unlike G.169 and G.115, the audio for WebRTC is not
143	   constrained to have a passband specified by G.712 and can in fact be
144	   sampled at any sampling rate from 8 kHz to 48 kHz and higher.  For this
145	   reason, the level SHOULD be normalized by only considering
146	   frequencies above 300 Hz, regardless of the sampling rate used.  The
147	   level SHOULD also be adapted to avoid clipping, either by lowering
148	   the gain to a level below -19 dBm0, or through the use of a
149	   compressor.

151	   Assuming 16-bit PCM with a value of +/-32767, -19 dBm0 corresponds to
152	   a root mean square (RMS) level of 2600.  Only active speech should be
153	   considered in the RMS calculation.  If the endpoint has control over
154	   the entire audio capture path, as is typically the case for a regular
155	   phone, then it is RECOMMENDED that the gain be adjusted in such a way
156	   that active speech has a level of 2600 (-19 dBm0) for an average
157	   speaker.  If the endpoint does not have control over the entire audio
158	   capture, as is typically the case for a software endpoint, then the
159	   endpoint SHOULD use automatic gain control (AGC) to dynamically
160	   adjust the level to 2600 (-19 dBm0) +/- 6 dB.  For music or desktop
161	   sharing applications, the level SHOULD NOT be automatically adjusted
162	   and the endpoint SHOULD allow the user to set the gain manually.
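
   The level check described above can be sketched in Python as
   follows.  The sketch assumes the input frame already contains only
   active speech (voice activity detection is out of scope here) and
   works on one frame at a time; a practical AGC would smooth the gain
   over time.  All names are illustrative.

```python
import math

TARGET_RMS = 2600.0   # -19 dBm0 for 16-bit PCM, as stated above
TOLERANCE_DB = 6.0    # AGC window of +/- 6 dB, as stated above


def rms(samples):
    """Root mean square of a sequence of 16-bit PCM sample values."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def level_error_db(samples):
    """Frame level relative to the target, in dB (positive = too loud)."""
    level = rms(samples)
    if level == 0.0:
        return float("-inf")
    return 20.0 * math.log10(level / TARGET_RMS)


def agc_gain(samples):
    """Linear gain that brings the frame to the target level.

    Returns 1.0 for silent frames and for frames already within the
    tolerance window around the target.
    """
    error_db = level_error_db(samples)
    if math.isinf(error_db) or abs(error_db) <= TOLERANCE_DB:
        return 1.0
    return 10.0 ** (-error_db / 20.0)
```

   A frame with an RMS of 650 is about 12 dB below the target, so the
   sketch returns a gain of 4; a frame at 3000 is within the window and
   is left unchanged.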

164	   The RECOMMENDED filter for normalizing the signal energy is a second-
165	   order Butterworth filter with a 300 Hz cutoff frequency.
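
   A minimal Python sketch of such a filter is given below.  It assumes
   a high-pass response (so that only frequencies above 300 Hz
   contribute to the measured energy) and derives the biquad
   coefficients with the bilinear transform, using Q = 1/sqrt(2) for a
   Butterworth response.  The function names are illustrative, and the
   filter is meant for the level measurement path only.

```python
import math


def butterworth_highpass_coeffs(sample_rate_hz, cutoff_hz=300.0):
    """Second-order Butterworth high-pass biquad (bilinear transform).

    Returns (b, a) with a[0] normalized to 1.
    """
    w0 = 2.0 * math.pi * cutoff_hz / sample_rate_hz
    cos_w0 = math.cos(w0)
    q = 1.0 / math.sqrt(2.0)          # Butterworth quality factor
    alpha = math.sin(w0) / (2.0 * q)
    a0 = 1.0 + alpha
    b = [(1.0 + cos_w0) / 2.0 / a0,
         -(1.0 + cos_w0) / a0,
         (1.0 + cos_w0) / 2.0 / a0]
    a = [1.0, -2.0 * cos_w0 / a0, (1.0 - alpha) / a0]
    return b, a


def filter_samples(samples, b, a):
    """Apply the biquad in direct form I and return the output list."""
    x1 = x2 = y1 = y2 = 0.0
    out = []
    for x in samples:
        y = b[0] * x + b[1] * x1 + b[2] * x2 - a[1] * y1 - a[2] * y2
        out.append(y)
        x2, x1 = x1, x
        y2, y1 = y1, y
    return out
```

   The RMS level would then be computed on the filtered samples rather
   than on the raw capture signal.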

167	   It is common for the audio output on some devices to be "calibrated"
168	   for playing back pre-recorded "commercial" music, which is typically
169	   around 12 dB louder than the level recommended in this section.
170	   Because of this, endpoints MAY increase the gain before playback.
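
   As an illustration of the playback adjustment, 12 dB corresponds to
   a linear factor of 10^(12/20), or roughly 3.98.  The following
   Python sketch applies such a gain to 16-bit samples and saturates
   the result; the function name and the choice of hard saturation are
   assumptions made for the example.

```python
INT16_MIN, INT16_MAX = -32768, 32767


def apply_playback_gain(samples, gain_db=12.0):
    """Scale 16-bit PCM samples by gain_db, saturating at the limits."""
    gain = 10.0 ** (gain_db / 20.0)
    return [max(INT16_MIN, min(INT16_MAX, round(s * gain)))
            for s in samples]
```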

172	5.  Acoustic Echo Cancellation (AEC)

174	   It is plausible that the dominant near- to mid-term WebRTC usage
175	   model will be people using the interactive audio and video
176	   capabilities to communicate with each other via web browsers running
177	   on a notebook computer that has a built-in microphone and speakers.
178	   The notebook-as-communication-device paradigm presents challenging
179	   echo cancellation problems, for which no specific remedy is mandated
180	   here.  However, while no specific algorithm or standard is required
181	   of WebRTC-compatible endpoints, echo cancellation will improve the
182	   user experience and should be implemented by the endpoint device.

184	   WebRTC endpoints SHOULD include an AEC or some other form of echo
185	   control.  On general-purpose platforms (e.g., PC), it is common for
186	   the audio capture ADC and the audio playback DAC to use different
187	   clocks.  In these cases, such as when a webcam is used for capture
188	   and a separate soundcard is used for playback, the sampling rates are
189	   likely to differ slightly.  Endpoint AECs SHOULD be robust to such
190	   conditions, unless they are shipped along with hardware that
191	   guarantees capture and playback to be sampled from the same clock.
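
   To give a sense of scale for the clock mismatch described above, the
   following Python sketch estimates the skew between the capture and
   playback clocks and the offset it accumulates.  It assumes both
   sample counts were taken over the same wall-clock interval and that
   both devices are nominally configured for the same rate; the
   function names are illustrative.

```python
def clock_skew_ppm(captured_samples, played_samples):
    """Relative rate difference between capture and playback, in ppm."""
    if played_samples <= 0:
        raise ValueError("played_samples must be positive")
    return (captured_samples / played_samples - 1.0) * 1e6


def drift_in_samples(seconds, skew_ppm, nominal_rate_hz=48000):
    """Offset, in samples, accumulated between the streams."""
    return nominal_rate_hz * seconds * skew_ppm / 1e6
```

   With a skew of 100 ppm at a nominal 48 kHz, the two streams slip by
   288 samples (6 ms) every minute, which an echo canceller that
   assumes a fixed delay would not track.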

193	   Endpoints SHOULD allow the entire AEC and/or the non-linear
194	   processing (NLP) to be turned off for applications, such as music,
195	   that do not behave well with the spectral attenuation methods
196	   typically used in NLPs.  Similarly, endpoints SHOULD have the ability
197	   to detect the presence of a headset and disable echo cancellation.

199	   For some applications where the remote endpoint may not have an echo
200	   canceller, the local endpoint MAY include a far-end echo canceller,
201	   but if that is the case, it SHOULD be disabled by default.
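
   The switches described in the two paragraphs above might be exposed
   along the following lines.  This Python sketch is hypothetical: the
   class, field, and parameter names are invented for illustration, and
   the choice to disable only the NLP (rather than the entire AEC) for
   music is one of the options the text allows.

```python
from dataclasses import dataclass


@dataclass
class EchoControlSettings:
    aec_enabled: bool = True
    nlp_enabled: bool = True
    # A far-end echo canceller, if present, is disabled by default.
    far_end_canceller_enabled: bool = False


def choose_echo_settings(headset_connected=False, music_mode=False,
                         remote_lacks_aec=False, allow_far_end=False):
    """Pick echo control settings for the current session."""
    settings = EchoControlSettings()
    if headset_connected:
        # No acoustic path from speaker to microphone is expected.
        settings.aec_enabled = False
        settings.nlp_enabled = False
    elif music_mode:
        # Spectral attenuation in the NLP degrades music.
        settings.nlp_enabled = False
    if remote_lacks_aec and allow_far_end:
        # Only enabled when the application explicitly opts in.
        settings.far_end_canceller_enabled = True
    return settings
```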

203	6.  Legacy VoIP Interoperability

205	   The codec requirements above will ensure, at a minimum, voice
206	   interoperability between WebRTC endpoints and legacy phone systems
207	   that support G.711.

209	7.  IANA Considerations

211	   This document makes no request of IANA.

213	   Note to RFC Editor: this section may be removed on publication as an
214	   RFC.

216	8.  Security Considerations

218	   For security considerations regarding the codecs themselves, please
219	   refer to their specifications, including [RFC6716],
220	   [I-D.ietf-payload-rtp-opus], [RFC3551], [RFC3389], and [RFC4733].
221	   Likewise, consult the RTP base specification for RTP-based security
222	   considerations.  WebRTC security is further discussed in
223	   [I-D.ietf-rtcweb-security], [I-D.ietf-rtcweb-security-arch], and
224	   [I-D.ietf-rtcweb-rtp-usage].

226	   Implementers should consider whether the use of VBR is appropriate
227	   for their application based on [RFC6562].  Encryption and
228	   authentication issues are beyond the scope of this document.

230	9.  Acknowledgements

232	   This draft incorporates ideas and text from various other drafts.  In
233	   particular, we would like to acknowledge, and say thanks for, work
234	   we incorporated from Harald Alvestrand and Cullen Jennings.

236	10.  References

237	10.1.  Normative References

239	   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
240	              Requirement Levels", BCP 14, RFC 2119,
241	              DOI 10.17487/RFC2119, March 1997,
242	              <http://www.rfc-editor.org/info/rfc2119>.

244	   [RFC3551]  Schulzrinne, H. and S. Casner, "RTP Profile for Audio and
245	              Video Conferences with Minimal Control", STD 65, RFC 3551,
246	              DOI 10.17487/RFC3551, July 2003,
247	              <http://www.rfc-editor.org/info/rfc3551>.

249	   [RFC3389]  Zopf, R., "Real-time Transport Protocol (RTP) Payload for
250	              Comfort Noise (CN)", RFC 3389, DOI 10.17487/RFC3389,
251	              September 2002, <http://www.rfc-editor.org/info/rfc3389>.

253	   [RFC4733]  Schulzrinne, H. and T. Taylor, "RTP Payload for DTMF
254	              Digits, Telephony Tones, and Telephony Signals", RFC 4733,
255	              DOI 10.17487/RFC4733, December 2006,
256	              <http://www.rfc-editor.org/info/rfc4733>.

258	   [RFC6716]  Valin, JM., Vos, K., and T. Terriberry, "Definition of the
259	              Opus Audio Codec", RFC 6716, DOI 10.17487/RFC6716,
260	              September 2012, <http://www.rfc-editor.org/info/rfc6716>.

262	   [RFC6562]  Perkins, C. and JM. Valin, "Guidelines for the Use of
263	              Variable Bit Rate Audio with Secure RTP", RFC 6562,
264	              DOI 10.17487/RFC6562, March 2012,
265	              <http://www.rfc-editor.org/info/rfc6562>.

267	   [I-D.ietf-payload-rtp-opus]
268	              Spittka, J., Vos, K., and J. Valin, "RTP Payload Format
269	              for the Opus Speech and Audio Codec", draft-ietf-payload-
270	              rtp-opus-11 (work in progress), April 2015.

272	10.2.  Informative References

274	   [I-D.ietf-rtcweb-security]
275	              Rescorla, E., "Security Considerations for WebRTC", draft-
276	              ietf-rtcweb-security-08 (work in progress), February 2015.

278	   [I-D.ietf-rtcweb-security-arch]
279	              Rescorla, E., "WebRTC Security Architecture", draft-ietf-
280	              rtcweb-security-arch-11 (work in progress), March 2015.

282	   [I-D.ietf-rtcweb-rtp-usage]
283	              Perkins, C., Westerlund, M., and J. Ott, "Web Real-Time
284	              Communication (WebRTC): Media Transport and Use of RTP",
285	              draft-ietf-rtcweb-rtp-usage-23 (work in progress), March
286	              2015.

288	   [I-D.ietf-rtcweb-audio-codecs-for-interop]
289	              Proust, S., Berger, E., Feiten, B., Burman, B., Bogineni,
290	              K., Lei, M., and E. Marocco, "Additional WebRTC audio
291	              codecs for interoperability.", draft-ietf-rtcweb-audio-
292	              codecs-for-interop-01 (work in progress), January 2015.

294	Authors' Addresses

296	   Jean-Marc Valin
297	   Mozilla
298	   331 E. Evelyn Avenue
299	   Mountain View, CA  94041
300	   USA

302	   Email: jmvalin@jmvalin.ca

304	   Cary Bran
305	   Plantronics
306	   345 Encinal Street
307	   Santa Cruz, CA  95060
308	   USA

310	   Phone: +1 206 661-2398
311	   Email: cary.bran@plantronics.com