idnits 2.17.1 

draft-ietf-rtcweb-audio-01.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

     No issues found here.

  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == The copyright year in the IETF Trust and authors Copyright Line does not
     match the current year

  -- The document date (November 23, 2012) is 4172 days in the past.  Is this
     intentional?


  Checking references for intended status: Proposed Standard
  ----------------------------------------------------------------------------

     (See RFCs 3967 and 4897 for information about using normative references
     to lower-maturity documents in RFCs)

  == Missing Reference: 'RFC6716' is mentioned on line 86, but not defined

  -- No information found for draft-ekr-security-considerations-for-rtc-web -
     is the name correct?

  -- Possible downref: Normative reference to a draft: ref.
     'I-D.ekr-security-considerations-for-rtc-web' 


     Summary: 0 errors (**), 0 flaws (~~), 2 warnings (==), 3 comments (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.
--------------------------------------------------------------------------------


2	Network Working Group                                          JM. Valin
3	Internet-Draft                                                   Mozilla
4	Intended status: Standards Track                                 C. Bran
5	Expires: May 27, 2013                                        Plantronics
6	                                                       November 23, 2012

8	             WebRTC Audio Codec and Processing Requirements
9	                       draft-ietf-rtcweb-audio-01

11	Abstract

13	   This document outlines the audio codec and processing requirements
14	   for WebRTC client applications and endpoint devices.

16	Status of this Memo

18	   This Internet-Draft is submitted in full conformance with the
19	   provisions of BCP 78 and BCP 79.

21	   Internet-Drafts are working documents of the Internet Engineering
22	   Task Force (IETF).  Note that other groups may also distribute
23	   working documents as Internet-Drafts.  The list of current Internet-
24	   Drafts is at http://datatracker.ietf.org/drafts/current/.

26	   Internet-Drafts are draft documents valid for a maximum of six months
27	   and may be updated, replaced, or obsoleted by other documents at any
28	   time.  It is inappropriate to use Internet-Drafts as reference
29	   material or to cite them other than as "work in progress."

31	   This Internet-Draft will expire on May 27, 2013.

33	Copyright Notice

35	   Copyright (c) 2012 IETF Trust and the persons identified as the
36	   document authors.  All rights reserved.

38	   This document is subject to BCP 78 and the IETF Trust's Legal
39	   Provisions Relating to IETF Documents
40	   (http://trustee.ietf.org/license-info) in effect on the date of
41	   publication of this document.  Please review these documents
42	   carefully, as they describe your rights and restrictions with respect
43	   to this document.  Code Components extracted from this document must
44	   include Simplified BSD License text as described in Section 4.e of
45	   the Trust Legal Provisions and are provided without warranty as
46	   described in the Simplified BSD License.

48	Table of Contents

50	   1.  Introduction
51	   2.  Terminology
52	   3.  Codec Requirements
53	   4.  Audio Level
54	   5.  Acoustic Echo Cancellation (AEC)
55	   6.  Legacy VoIP Interoperability
56	   7.  IANA Considerations
57	   8.  Security Considerations
58	   9.  Acknowledgements
59	   10. Normative References
60	   Authors' Addresses

62	1.  Introduction

64	   An integral part of the success and adoption of Web Real-Time
65	   Communications (WebRTC) will be the voice and video interoperability
66	   between WebRTC applications.  This specification outlines the
67	   audio processing and codec requirements for WebRTC client
68	   implementations.

70	2.  Terminology

72	   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
73	   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
74	   document are to be interpreted as described in RFC 2119 [RFC2119].

76	3.  Codec Requirements

78	   To ensure a baseline level of interoperability between WebRTC
79	   clients, a minimum set of required codecs is specified below.  While
80	   this section specifies the codecs that will be mandated for all
81	   WebRTC client implementations, it leaves the question of supporting
82	   additional codecs to the discretion of the implementer.

84	   WebRTC clients are REQUIRED to implement the following audio codecs.

86	   o  Opus [RFC6716], with any ptime value up to 120 ms

88	   o  G.711 PCMA and PCMU with one channel, a rate of 8000 Hz and a
89	      ptime of 20 ms - see section 4.5.14 of [RFC3551]

91	   o  Telephone Event - [RFC4733]

93	   For all cases where the client is able to process audio at a sampling
94	   rate higher than 8 kHz, it is RECOMMENDED that Opus be offered before
95	   PCMA/PCMU.  For Opus, all modes MUST be supported on the decoder
96	   side.  The choice of encoder-side modes is left to the implementer.
97	   Clients MAY use the offer/answer mechanism to signal a preference for
98	   a particular mode or ptime.
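
   As an illustration of the ordering recommendation above, the
   following minimal sketch builds the audio lines of an SDP offer.
   Several details are assumptions that do not come from this document:
   the dynamic payload types 111 and 101, the "opus/48000/2" rtpmap
   name (taken from the Opus RTP payload format), the RTP/SAVPF
   profile, the port value, and the choice to list G.711 first for a
   narrowband-only client.  The function name is hypothetical.

```python
# Illustrative values only: 111 and 101 are arbitrary dynamic payload types.
OPUS = (111, "opus/48000/2")
PCMU = (0, "PCMU/8000")   # static payload type from RFC 3551
PCMA = (8, "PCMA/8000")   # static payload type from RFC 3551
TELEPHONE_EVENT = (101, "telephone-event/8000")


def audio_offer_lines(max_sample_rate_hz, ptime_ms=None, port=9):
    """Return SDP lines for an audio section listing the required codecs."""
    if max_sample_rate_hz > 8000:
        # The client can process wideband audio: offer Opus before G.711.
        codecs = [OPUS, PCMU, PCMA]
    else:
        # Assumed ordering for a narrowband-only client (not specified here).
        codecs = [PCMU, PCMA, OPUS]
    codecs.append(TELEPHONE_EVENT)

    payload_types = " ".join(str(pt) for pt, _ in codecs)
    lines = [f"m=audio {port} RTP/SAVPF {payload_types}"]
    lines += [f"a=rtpmap:{pt} {name}" for pt, name in codecs]
    if ptime_ms is not None:
        # Optional preference for a packet duration, in milliseconds.
        lines.append(f"a=ptime:{ptime_ms}")
    return lines
```

   Calling audio_offer_lines(48000, ptime_ms=20) yields an "m=audio"
   line whose payload type list starts with the Opus entry, followed by
   one "a=rtpmap" line per codec and a final "a=ptime:20" line.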

100	4.  Audio Level

102	   It is desirable to standardize the "on the wire" audio level for
103	   speech transmission to avoid users having to manually adjust the
104	   playback and to facilitate mixing in conferencing applications.  It
105	   is also desirable to be consistent with ITU-T recommendations G.169
106	   and G.115, which recommend an active audio level of -19 dBm0.

108	   However, unlike G.169 and G.115, the audio for WebRTC is not
109	   constrained to have a passband specified by G.712 and can in fact be
110	   sampled at any sampling rate from 8 kHz to 48 kHz and up.  For this
111	   reason, the level SHOULD be normalized by only considering
112	   frequencies above 300 Hz, regardless of the sampling rate used.  The
113	   level SHOULD also be adapted to avoid clipping, either by lowering
114	   the gain to a level below -19 dBm0, or through the use of a
115	   compressor.
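
   The gain-lowering option can be sketched as follows.  This is only
   one possible approach, assuming 16-bit PCM samples held as Python
   integers; the function names are hypothetical, and a compressor
   (the other option mentioned above) is not shown.

```python
FULL_SCALE = 32767  # largest magnitude of a 16-bit PCM sample


def clip_safe_gain(samples, desired_gain):
    """Return desired_gain, lowered if needed so no sample exceeds full scale."""
    peak = max((abs(s) for s in samples), default=0)
    if peak == 0:
        # Silence (or an empty block) cannot clip at any gain.
        return desired_gain
    return min(desired_gain, FULL_SCALE / peak)


def apply_gain(samples, gain):
    """Scale samples to integers, saturating as a last-resort guard."""
    return [max(-FULL_SCALE, min(FULL_SCALE, round(s * gain))) for s in samples]
```

   A real client would usually smooth the gain over time rather than
   recompute it per block, so that the level does not pump audibly.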

117	   AUTHORS' NOTE: The idea of using the same level as what the ITU-T
118	   recommends is that it should improve inter-operability while at the
119	   same time maintaining sufficient dynamic range and reducing the risk
120	   of clipping.  The main drawbacks are that the resulting level is
121	   about 12 dB lower than typical "commercial music" levels and it
122	   leaves room for ill-behaved clients to be much louder than a normal
123	   client.  While using music-type levels is not really an option (it
124	   would require using the same compressor-limiters that studios use),
125	   it would be possible to have a level slightly higher (e.g. 3 dB) than
126	   what is recommended above without causing interoperability problems.

128	   Assuming 16-bit PCM with a range of +/-32767, -19 dBm0 corresponds to
129	   a root mean square (RMS) level of 2600.  Only active speech should be
130	   considered in the RMS calculation.  If the client has control over
131	   the entire audio capture path, as is typically the case for a regular
132	   phone, then it is RECOMMENDED that the gain be adjusted in such a way
133	   that active speech has a level of 2600 (-19 dBm0) for an average
134	   speaker.  If the client does not have control over the entire audio
135	   capture, as is typically the case for a software client, then the
136	   client SHOULD use automatic gain control (AGC) to dynamically adjust
137	   the level to 2600 (-19 dBm0) +/- 6 dB.  For music or desktop sharing
138	   applications, the level SHOULD NOT be automatically adjusted and the
139	   client SHOULD allow the user to set the gain manually.
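
   The arithmetic behind these numbers can be sketched as follows.  The
   sketch assumes that the 2600 figure comes from measuring the level
   relative to the RMS of a full-scale sine wave (32767 / sqrt(2)),
   which is one reading that reproduces it.  The gain policy shown
   (no correction inside the +/- 6 dB band, otherwise a correction back
   to the target) is likewise only an assumption, the function names
   are hypothetical, and selecting active speech is left to the caller.

```python
import math

FULL_SCALE = 32767
# Assumption: the reference level is the RMS of a full-scale sine wave.
REFERENCE_RMS = FULL_SCALE / math.sqrt(2)


def dbm0_to_rms(level_dbm0):
    """Convert a level in dBm0 to a 16-bit PCM RMS value."""
    return REFERENCE_RMS * 10 ** (level_dbm0 / 20)


def rms(samples):
    """RMS of a non-empty block of active-speech samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def agc_correction_db(measured_rms, target_rms=2600.0, tolerance_db=6.0):
    """Gain in dB to apply; 0.0 while the level is within the tolerance band.

    measured_rms must be greater than zero.
    """
    error_db = 20 * math.log10(measured_rms / target_rms)
    if abs(error_db) <= tolerance_db:
        return 0.0
    return -error_db
```

   Under these assumptions, dbm0_to_rms(-19) evaluates to just under
   2600, and a measured RMS of 650 (about 12 dB too quiet) produces a
   correction of about +12 dB.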

141	   The RECOMMENDED filter for normalizing the signal energy is a second-
142	   order Butterworth filter with a 300 Hz cutoff frequency.
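
   A filter of this kind can be sketched as a single biquad section.
   The sketch assumes a high-pass response, since the level is meant to
   consider only frequencies above 300 Hz, and derives the coefficients
   with the bilinear transform using Q = 1/sqrt(2), which gives a
   Butterworth response for a second-order section.  The function names
   are hypothetical.

```python
import cmath
import math


def butterworth_highpass(cutoff_hz, sample_rate_hz):
    """Second-order Butterworth high-pass biquad via the bilinear transform.

    Returns (b, a) with a[0] normalized to 1.
    """
    w0 = 2 * math.pi * cutoff_hz / sample_rate_hz
    q = 1 / math.sqrt(2)  # Butterworth response for a second-order section
    alpha = math.sin(w0) / (2 * q)
    cos_w0 = math.cos(w0)
    a0 = 1 + alpha
    b = [(1 + cos_w0) / 2 / a0, -(1 + cos_w0) / a0, (1 + cos_w0) / 2 / a0]
    a = [1.0, -2 * cos_w0 / a0, (1 - alpha) / a0]
    return b, a


def filter_signal(b, a, samples):
    """Apply the biquad (direct form I) to a sequence of samples."""
    x1 = x2 = y1 = y2 = 0.0
    out = []
    for x in samples:
        y = b[0] * x + b[1] * x1 + b[2] * x2 - a[1] * y1 - a[2] * y2
        x2, x1 = x1, x
        y2, y1 = y1, y
        out.append(y)
    return out


def magnitude(b, a, freq_hz, sample_rate_hz):
    """Magnitude response of the biquad at freq_hz."""
    z = cmath.exp(-1j * 2 * math.pi * freq_hz / sample_rate_hz)
    return abs((b[0] + b[1] * z + b[2] * z * z) / (1 + a[1] * z + a[2] * z * z))
```

   The coefficients depend on the sampling rate, so they need to be
   recomputed for each rate in use (for example 8 kHz and 48 kHz) to
   keep the cutoff at 300 Hz.  The level is then measured on the
   filtered signal.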

144	   It is common for the audio output on some devices to be "calibrated"
145	   for playing back pre-recorded "commercial" music, which is typically
146	   around 12 dB louder than the level recommended in this section.
147	   Because of this, clients MAY increase the gain before playback.

149	5.  Acoustic Echo Cancellation (AEC)

151	   It is plausible that the dominant near to mid-term WebRTC usage model
152	   will be people using the interactive audio and video capabilities to
153	   communicate with each other via web browsers running on a notebook
154	   computer that has built-in microphone and speakers.  The notebook-as-
155	   communication-device paradigm presents challenging echo cancellation
156	   problems, the specific remedy of which will not be mandated here.
157	   However, while no specific algorithm or standard will be required of
158	   WebRTC-compatible clients, echo cancellation will improve the user
159	   experience and should be implemented by the endpoint device.

161	   WebRTC clients SHOULD include an AEC; if that is not possible, the
162	   clients SHOULD ensure that the speaker-to-microphone gain is below
163	   unity at all frequencies to avoid instability when no client has
164	   echo cancellation.  For clients that do not control the audio
165	   capture and playback devices directly, it is RECOMMENDED to support
166	   echo cancellation between devices running at slightly different
167	   sampling rates, such as when a webcam is used as the microphone.

169	   The client SHOULD allow either the entire AEC or the non-linear
170	   processing (NLP) to be turned off for applications, such as music,
171	   that do not behave well with the spectral attenuation methods
172	   typically used in NLPs.  It SHOULD have the ability to detect the
173	   presence of a headset and disable echo cancellation.

175	   For some applications where the remote client may not have an echo
176	   canceller, the local client MAY include a far-end echo canceller, but
177	   if that is the case, it SHOULD be disabled by default.
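
   The switches described in this section can be sketched as a small
   settings object.  This is an illustration only: the class, field,
   and function names are hypothetical, and the policy of enabling the
   far-end canceller only on an explicit application request is an
   assumption that follows from the "disabled by default" guidance.

```python
from dataclasses import dataclass


@dataclass
class AecSettings:
    aec_enabled: bool = True           # near-end echo cancellation
    nlp_enabled: bool = True           # non-linear processing stage
    far_end_aec_enabled: bool = False  # optional, disabled by default


def choose_aec_settings(headset_detected=False, music_mode=False,
                        request_far_end_aec=False):
    """Pick echo cancellation switches for the current call conditions."""
    settings = AecSettings()
    if headset_detected:
        # With a headset there is no acoustic path from speaker to microphone.
        settings.aec_enabled = False
        settings.nlp_enabled = False
    elif music_mode:
        # Keep the linear canceller but drop the spectral attenuation stage.
        settings.nlp_enabled = False
    if request_far_end_aec:
        # Only enabled when the application asks for it explicitly.
        settings.far_end_aec_enabled = True
    return settings
```

   An application that prefers to turn off the entire AEC for music,
   which the text above also allows, would clear aec_enabled in the
   music branch as well.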

179	6.  Legacy VoIP Interoperability

181	   The codec requirements above will ensure, at a minimum, voice
182	   interoperability capabilities between WebRTC client applications and
183	   legacy phone systems.

185	7.  IANA Considerations

187	   This document makes no request of IANA.

189	   Note to RFC Editor: this section may be removed on publication as an
190	   RFC.

192	8.  Security Considerations

194	   The codec requirements have no additional security considerations
195	   other than those captured in
196	   [I-D.ekr-security-considerations-for-rtc-web].

198	9.  Acknowledgements

200	   This draft incorporates ideas and text from various other drafts.  In
201	   particular, we would like to acknowledge, and say thanks for, work
202	   we incorporated from Harald Alvestrand and Cullen Jennings.

204	10.  Normative References

206	   [I-D.ekr-security-considerations-for-rtc-web]
207	              Rescorla, E., "Security Considerations for RTC-Web",
208	              May 2011.

210	   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
211	              Requirement Levels", BCP 14, RFC 2119, March 1997.

213	   [RFC3551]  Schulzrinne, H. and S. Casner, "RTP Profile for Audio and
214	              Video Conferences with Minimal Control", STD 65, RFC 3551,
215	              July 2003.

217	   [RFC4733]  Schulzrinne, H. and T. Taylor, "RTP Payload for DTMF
218	              Digits, Telephony Tones, and Telephony Signals", RFC 4733,
219	              December 2006.

221	Authors' Addresses

223	   Jean-Marc Valin
224	   Mozilla
225	   650 Castro Street
226	   Mountain View, CA  94041
227	   USA

229	   Email: jmvalin@jmvalin.ca

231	   Cary Bran
232	   Plantronics
233	   345 Encinal Street
234	   Santa Cruz, CA  95060
235	   USA

237	   Phone: +1 206 661-2398
238	   Email: cary.bran@plantronics.com