idnits 2.17.1 

draft-ietf-avtext-client-to-mixer-audio-level-03.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

     No issues found here.

  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == The copyright year in the IETF Trust and authors Copyright Line does not
     match the current year

  == The document seems to contain a disclaimer for pre-RFC5378 work, but was
     first submitted on or after 10 November 2008.  The disclaimer is usually
     necessary only for documents that revise or obsolete older RFCs, and that
     take significant amounts of text from those RFCs.  If you can contact all
     authors of the source material and they are willing to grant the BCP78
     rights to the IETF Trust, you can and should remove the disclaimer. 
     Otherwise, the disclaimer is needed and you can ignore this comment. 
     (See the Legal Provisions document at
     https://trustee.ietf.org/license-info for more information.)

  -- The document date (July 5, 2011) is 4676 days in the past.  Is this
     intentional?


  Checking references for intended status: Proposed Standard
  ----------------------------------------------------------------------------

     (See RFCs 3967 and 4897 for information about using normative references
     to lower-maturity documents in RFCs)

  ** Obsolete normative reference: RFC 5285 (Obsoleted by RFC 8285)

  == Outdated reference: A later version (-06) exists of
     draft-ietf-avtext-mixer-to-client-audio-level-02


     Summary: 1 error (**), 0 flaws (~~), 3 warnings (==), 1 comment (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.

--------------------------------------------------------------------------------


2	AVT                                                       J. Lennox, Ed.
3	Internet-Draft                                                     Vidyo
4	Intended status: Standards Track                                 E. Ivov
5	Expires: January 6, 2012                                           Jitsi
6	                                                              E. Marocco
7	                                                          Telecom Italia
8	                                                            July 5, 2011

10	 A Real-Time Transport Protocol (RTP) Header Extension for  Client-to-
11	                      Mixer Audio Level Indication
12	            draft-ietf-avtext-client-to-mixer-audio-level-03

14	Abstract

16	   This document defines a mechanism by which packets of Real-Time
17	   Transport Protocol (RTP) audio streams can indicate, in an RTP header
18	   extension, the audio level of the audio sample carried in the RTP
19	   packet.  In large conferences, this can reduce the load on an audio
20	   mixer or other middlebox which wants to forward only a few of the
21	   loudest audio streams, without requiring it to decode and measure
22	   every stream that is received.

24	Status of this Memo

26	   This Internet-Draft is submitted in full conformance with the
27	   provisions of BCP 78 and BCP 79.

29	   Internet-Drafts are working documents of the Internet Engineering
30	   Task Force (IETF).  Note that other groups may also distribute
31	   working documents as Internet-Drafts.  The list of current Internet-
32	   Drafts is at http://datatracker.ietf.org/drafts/current/.

34	   Internet-Drafts are draft documents valid for a maximum of six months
35	   and may be updated, replaced, or obsoleted by other documents at any
36	   time.  It is inappropriate to use Internet-Drafts as reference
37	   material or to cite them other than as "work in progress."

39	   This Internet-Draft will expire on January 6, 2012.

41	Copyright Notice

43	   Copyright (c) 2011 IETF Trust and the persons identified as the
44	   document authors.  All rights reserved.

46	   This document is subject to BCP 78 and the IETF Trust's Legal
47	   Provisions Relating to IETF Documents
48	   (http://trustee.ietf.org/license-info) in effect on the date of
49	   publication of this document.  Please review these documents
50	   carefully, as they describe your rights and restrictions with respect
51	   to this document.  Code Components extracted from this document must
52	   include Simplified BSD License text as described in Section 4.e of
53	   the Trust Legal Provisions and are provided without warranty as
54	   described in the Simplified BSD License.

56	   This document may contain material from IETF Documents or IETF
57	   Contributions published or made publicly available before November
58	   10, 2008.  The person(s) controlling the copyright in some of this
59	   material may not have granted the IETF Trust the right to allow
60	   modifications of such material outside the IETF Standards Process.
61	   Without obtaining an adequate license from the person(s) controlling
62	   the copyright in such materials, this document may not be modified
63	   outside the IETF Standards Process, and derivative works of it may
64	   not be created outside the IETF Standards Process, except to format
65	   it for publication as an RFC or to translate it into languages other
66	   than English.

68	Table of Contents

70	   1.  Introduction
71	   2.  Terminology
72	   3.  Audio Levels
73	   4.  Signaling (Setup) Information
74	   5.  Considerations on Use
75	   6.  Security Considerations
76	   7.  IANA Considerations
77	   8.  References
78	     8.1.  Normative References
79	     8.2.  Informative References
80	   Appendix A.  Changes From Earlier Versions
81	     A.1.  Changes From Draft -02
82	     A.2.  Changes From Draft -01
83	     A.3.  Changes From Draft -00
84	     A.4.  Changes From Individual Submission Draft -01
85	     A.5.  Changes From Individual Submission Draft -00
86	   Authors' Addresses

88	1.  Introduction

90	   In a centralized Real-Time Transport Protocol (RTP) [RFC3550] audio
91	   conference, an audio mixer or forwarder receives audio streams from
92	   many or all of the conference participants.  It then selectively
93	   forwards some of them to other participants in the conference.  In
94	   large conferences, it is possible that such a server might be
95	   receiving a large number of streams, of which only a few should be
96	   forwarded to the other conference participants.

98	   In such a scenario, in order to pick the audio streams to forward, a
99	   centralized server needs to decode, measure audio levels, and
100	   possibly perform voice activity detection on audio data from a large
101	   number of streams.  The need for such processing limits the size or
102	   number of conferences such a server can support.

104	   As an alternative, this document defines an RTP header extension
105	   [RFC5285] through which senders of audio packets can indicate the
106	   audio level of the packets' payload, reducing the processing load for
107	   a server.

109	   The header extension in this draft is different from, but
110	   complementary to, the one defined in
111	   [I-D.ietf-avtext-mixer-to-client-audio-level], which defines a
112	   mechanism by which audio mixers can indicate to clients the levels of
113	   the contributing sources that made up the mixed audio.

115	2.  Terminology

117	   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
118	   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
119	   document are to be interpreted as described in RFC 2119 [RFC2119] and
120	   indicate requirement levels for compliant implementations.

122	3.  Audio Levels

124	   The audio level header extension carries the level of the audio in
125	   the RTP payload of the packet it is associated with.  This
126	   information is carried in an RTP header extension element as defined
127	   by the "General Mechanism for RTP Header Extensions" [RFC5285].

129	   The payload of the audio level header extension element can be
130	   encoded using the one or the two-byte header defined in [RFC5285].
131	   Figure 1 and Figure 2 show sample audio level encodings with each of
132	   them.

134	          0                   1
135	          0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5
136	         +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
137	         |  ID   | len=0 |V| level       |
138	         +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

140	   Sample audio level encoding using the one-byte header format

142	                                 Figure 1

144	        0                   1                   2                   3
145	        0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
146	       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
147	       |      ID       |     len=1     |V|    level    |    0 (pad)    |
148	       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

150	   Sample audio level encoding using the two-byte header format

152	                                 Figure 2

154	   Note that, as indicated in [RFC5285], the length field in the one-byte
155	   header format takes the value 0 to indicate that 1 byte follows.  In
156	   the two-byte header format, on the other hand, it takes the value 1.
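
   The two layouts in Figure 1 and Figure 2 can be sketched as follows.
   This is an illustrative sketch only: the function names are
   hypothetical, and the ID ranges (1-14 for the one-byte form, 1-255
   for the two-byte form) are assumptions taken from [RFC5285] rather
   than from this document.  The trailing zero byte in Figure 2 is
   treated here as padding of the overall extension block, not as part
   of the element.

```python
def one_byte_element(ext_id: int, data: int) -> bytes:
    """Build a one-byte-header element carrying a single data byte."""
    if not 1 <= ext_id <= 14:  # assumed valid ID range per RFC 5285
        raise ValueError("one-byte header IDs are assumed to be 1..14")
    if not 0 <= data <= 0xFF:
        raise ValueError("data must fit in one byte")
    length_field = 0  # 0 means "1 byte follows" in the one-byte format
    return bytes([(ext_id << 4) | length_field, data])


def two_byte_element(ext_id: int, data: int) -> bytes:
    """Build a two-byte-header element carrying a single data byte."""
    if not 1 <= ext_id <= 255:  # assumed valid ID range per RFC 5285
        raise ValueError("two-byte header IDs are assumed to be 1..255")
    if not 0 <= data <= 0xFF:
        raise ValueError("data must fit in one byte")
    length_field = 1  # the two-byte format carries the actual byte count
    return bytes([ext_id, length_field, data])
```

   For instance, one_byte_element(1, 0x85) yields the two bytes 0x10
   0x85, while two_byte_element(1, 0x85) yields 0x01 0x01 0x85.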

158	   The magnitude of the audio level itself is packed into the seven
159	   least significant bits of the single byte of the header extension,
160	   shown in Figure 1 and Figure 2.  The least significant bit of the
161	   audio level magnitude is packed into the least significant bit of the
162	   byte.  The most significant bit of the byte is used as a separate
163	   flag bit "V", defined below.

165	   The audio level is expressed in -dBov, with values from 0 to 127
166	   representing 0 to -127 dBov. dBov is the level, in decibels, relative
167	   to the overload point of the system, i.e. the maximum-amplitude
168	   signal that can be handled by the system without clipping.  (Note:
169	   Representation relative to the overload point of a system is
170	   particularly useful for digital implementations, since one does not
171	   need to know the relative calibration of the analog circuitry.)  For
172	   example, in the case of u-law (audio/pcmu) audio [ITU.G711.1988], the
173	   0 dBov reference would be a square wave with values +/- 8031.  (This
174	   translates to 6.18 dBm0, relative to u-law's dBm0 definition in Table
175	   6 of G.711.)

177	   The audio level for digital silence, for example for a muted audio
178	   source, MUST be represented as 127 (-127 dBov), regardless of the
179	   dynamic range of the encoded audio format.

181	   The audio level header extension only carries the level of the audio
182	   in the RTP payload of the packet it is associated with, with no long-
183	   term averaging or smoothing applied.  That level is measured as a
184	   root mean square of all the samples in the measured range.
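
   The following sketch shows one way the per-packet level might be
   computed.  It assumes 16-bit signed linear PCM input and takes 32767
   as the overload point (so a full-scale square wave measures 0 dBov);
   both are assumptions for illustration, as is the function name.
   Rounding to the nearest integer and clamping to 0..127 are also
   choices made here, not requirements stated in this document.

```python
import math
from typing import Sequence


def audio_level_dbov(samples: Sequence[int], overload: int = 32767) -> int:
    """Return the level of one packet's samples as a 0..127 value (-dBov)."""
    if not samples:
        return 127  # nothing to measure: treat as digital silence
    # Root mean square over all samples in the packet, with no smoothing.
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    if rms == 0:
        return 127  # digital silence is represented as 127
    db = 20 * math.log10(rms / overload)  # 0 at overload, negative below it
    level = round(-db)                    # the extension carries -dBov
    return max(0, min(127, level))
```

   A full-scale square wave gives 0, a square wave at half amplitude
   gives 6, and an all-zero packet gives 127.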

186	   To simplify implementation of the encoding procedures described here,
187	   the reference implementation section in
188	   [I-D.ietf-avtext-mixer-to-client-audio-level] provides a sample Java
189	   implementation of an audio level calculator that helps obtain such
190	   values from raw linear PCM audio samples.

192	   In addition, a flag bit (labeled V) indicates whether the encoder
193	   believes the audio packet contains voice activity (1) or does not
194	   (0).  The voice activity detection algorithm is unspecified and left
195	   implementation-specific.

197	   When this header extension is used with RTP data sent using the RTP
198	   Payload for Redundant Audio Data [RFC2198], the header's data
199	   describes the contents of the primary encoding.

201	   Note: This audio level is defined in the same manner as is audio
202	   noise level in the RTP Payload Comfort Noise specification [RFC3389].
203	   In the comfort noise specification, the overall magnitude of the
204	   noise level in comfort noise is encoded into the first byte of the
205	   payload, with spectral information about the noise in subsequent
206	   bytes.  This specification's audio level parameter is defined so as
207	   to be identical to the comfort noise payload's noise-level byte.

209	4.  Signaling (Setup) Information

211	   The URI for declaring this header extension in an extmap attribute is
212	   "urn:ietf:params:rtp-hdrext:ssrc-audio-level".  There is no
213	   additional setup information needed for this extension (i.e. no
214	   extensionattributes).
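
   As an illustration, an SDP extmap line for this extension could be
   generated as sketched below.  The "a=extmap:<value> <URI>" form is
   assumed from [RFC5285]; the helper name and the restriction to IDs
   1..14 (the one-byte header range) are choices made for this sketch.

```python
AUDIO_LEVEL_URI = "urn:ietf:params:rtp-hdrext:ssrc-audio-level"


def extmap_attribute(ext_id: int) -> str:
    """Format an SDP extmap attribute mapping ext_id to the audio level URI."""
    if not 1 <= ext_id <= 14:  # assumption: one-byte header IDs only
        raise ValueError("ext_id is assumed to be in the range 1..14")
    # No extensionattributes follow the URI for this extension.
    return f"a=extmap:{ext_id} {AUDIO_LEVEL_URI}"
```

   For example, extmap_attribute(1) produces
   "a=extmap:1 urn:ietf:params:rtp-hdrext:ssrc-audio-level".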

216	5.  Considerations on Use

218	   Mixers and forwarders generally should not base audio forwarding
219	   decisions directly on packet-by-packet audio level information, but
220	   rather should apply some analysis of the audio levels and trends.
221	   This general rule applies whether audio levels are provided by
222	   endpoints (as defined in this document), or are calculated at a
223	   server, as would be done in the absence of this information.  This
224	   section discusses several issues that mixers and forwarders may wish
225	   to take into account.  (Note that this section provides design
226	   guidance only, and is not normative.)

228	   First of all, audio levels should generally be measured over longer
229	   intervals than that of a single audio packet.  In order to avoid
230	   false-positives for short bursts of sound (such as a cough or a
231	   dropped microphone), it is often useful to require that a
232	   participant's audio level be maintained for some period of time
233	   before considering it to be "real", i.e. some type of low-pass filter
234	   should be applied to the audio levels.  Note, though, that such
235	   filtering must be balanced with the need to avoid clipping of the
236	   beginning of a speaker's speech.
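
   One simple low-pass filter is an exponential moving average over the
   received levels, sketched below.  The class name, the smoothing
   factor, and the activity threshold are all hypothetical values chosen
   for illustration; averaging directly in the dB domain is a further
   simplification.

```python
class LevelSmoother:
    """Exponential moving average over per-packet levels (in -dBov)."""

    def __init__(self, alpha: float = 0.2, initial: float = 127.0):
        if not 0.0 < alpha <= 1.0:
            raise ValueError("alpha must be in (0, 1]")
        self.alpha = alpha
        self.smoothed = initial  # start from silence

    def update(self, level: int) -> float:
        """Fold one received level into the average and return it."""
        self.smoothed += self.alpha * (level - self.smoothed)
        return self.smoothed

    def is_active(self, threshold: float = 60.0) -> bool:
        """Smaller values are louder, so 'active' means at or below threshold."""
        return self.smoothed <= threshold
```

   With alpha set to 0.2, a single loud packet (level 10) after silence
   moves the average only to 103.6, so a cough does not register, while
   four such packets in a row bring it below 60.  A larger alpha reacts
   faster and clips less of the start of speech, at the cost of letting
   more short bursts through.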

238	   Additionally, different participants may have their audio input set
239	   differently.  It may be useful to apply some sort of automatic gain
240	   control to the audio levels.  There are a number of possible
241	   approaches to achieving this, e.g. by measuring peak audio levels, by
242	   measuring average audio levels during speech, or by measuring
243	   background audio levels (average audio levels during non-speech).
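
   The background-level approach might look like the sketch below, which
   tracks each participant's average level during non-speech and reports
   how far a packet rises above it.  Using the V flag to separate speech
   from non-speech is an assumption of this sketch, as are the class
   name and parameter values.

```python
class BackgroundTracker:
    """Track a participant's background level and normalize against it."""

    def __init__(self, alpha: float = 0.05, initial: float = 127.0):
        self.alpha = alpha
        self.background = initial  # running average level during non-speech

    def update(self, level: int, voice_activity: bool) -> float:
        """Return how many dB this packet is above the background level."""
        if not voice_activity:
            # Only non-speech packets contribute to the background estimate.
            self.background += self.alpha * (level - self.background)
        # Levels are in -dBov, so a smaller value is a louder packet.
        return self.background - level
```

   Comparing participants by this relative value, rather than by the raw
   level, reduces the advantage of a participant whose input gain is
   simply set higher.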

245	6.  Security Considerations

247	   A malicious endpoint could choose to set the values in this header
248	   extension falsely, so as to falsely claim that audio or voice is or
249	   is not present.  It is not clear what could be gained by falsely
250	   claiming that audio is not present, but an endpoint falsely claiming
251	   that audio is present could perform a denial-of-service attack on an
252	   audio conference, sending silence so as to suppress other conference
253	   members' audio.  Thus, a device relying on audio level data from
254	   untrusted endpoints SHOULD periodically audit the level information
255	   transmitted, taking appropriate corrective action if endpoints appear
256	   to be sending incorrect data.  (Note that as it is valid for an
257	   endpoint to choose to measure audio levels prior to encoding, some
258	   degree of discrepancy SHOULD be tolerated.)
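
   A periodic audit could be sketched as follows: the device decodes an
   occasional packet, measures its level itself, and compares the result
   with the level the endpoint reported.  The function name, the 6 dB
   tolerance, and the 20% limit on mismatches are hypothetical values
   chosen for illustration only.

```python
from typing import Iterable, Tuple


def levels_look_consistent(
    pairs: Iterable[Tuple[int, int]],
    tolerance_db: int = 6,
    max_bad_fraction: float = 0.2,
) -> bool:
    """Check (reported, measured) level pairs from audited packets.

    Returns True if the endpoint's reports are plausibly honest.
    """
    checked = list(pairs)
    if not checked:
        return True  # nothing audited yet, so nothing to object to
    # Some discrepancy is expected, e.g. when levels are measured
    # before encoding, so only large differences count as mismatches.
    bad = sum(1 for reported, measured in checked
              if abs(reported - measured) > tolerance_db)
    return bad / len(checked) <= max_bad_fraction
```

   An endpoint that repeatedly reports loud audio (for example level 10)
   for packets that decode to silence (127) fails this check, and the
   device can then fall back to its own measurements for that endpoint.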

260	   In the Secure Real-Time Transport Protocol (SRTP) [RFC3711], RTP
261	   header extensions are authenticated but not encrypted.  When this
262	   header extension is used, audio levels are therefore visible on a
263	   packet-by-packet basis to an attacker passively observing the audio
264	   stream.  As discussed in [I-D.perkins-avt-srtp-vbr-audio], such an
265	   attacker might be able to infer information about the conversation,
266	   possibly with phoneme-level resolution.  In scenarios where this is a
267	   concern, additional mechanisms SHOULD be used to protect the
268	   confidentiality of the header extension.  One solution is header
269	   extension encryption [I-D.lennox-avtcore-srtp-encrypted-header-ext].

271	7.  IANA Considerations

273	   This document defines a new extension URI to the RTP Compact Header
274	   Extensions subregistry of the Real-Time Transport Protocol (RTP)
275	   Parameters registry, according to the following data:

277	   Extension URI:  urn:ietf:params:rtp-hdrext:ssrc-audio-level
278	   Description:  Audio Level
279	   Contact:  jonathan@vidyo.com
280	   Reference:  RFC XXXX

282	   Note to RFC Editor: please replace "RFC XXXX" with the number of this
283	   RFC.

285	8.  References

287	8.1.  Normative References

289	   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
290	              Requirement Levels", BCP 14, RFC 2119, March 1997.

292	   [RFC2198]  Perkins, C., Kouvelas, I., Hodson, O., Hardman, V.,
293	              Handley, M., Bolot, J., Vega-Garcia, A., and S. Fosse-
294	              Parisis, "RTP Payload for Redundant Audio Data", RFC 2198,
295	              September 1997.

297	   [RFC3550]  Schulzrinne, H., Casner, S., Frederick, R., and V.
298	              Jacobson, "RTP: A Transport Protocol for Real-Time
299	              Applications", STD 64, RFC 3550, July 2003.

301	   [RFC5285]  Singer, D. and H. Desineni, "A General Mechanism for RTP
302	              Header Extensions", RFC 5285, July 2008.

304	8.2.  Informative References

306	   [I-D.ietf-avtext-mixer-to-client-audio-level]
307	              Ivov, E., Marocco, E., and J. Lennox, "A Real-Time
308	              Transport Protocol (RTP) Header Extension for Mixer-to-
309	              Client Audio Level Indication",
310	              draft-ietf-avtext-mixer-to-client-audio-level-02 (work in
311	              progress), May 2011.

313	   [I-D.lennox-avtcore-srtp-encrypted-header-ext]
314	              Lennox, J., "Encryption of Header Extensions in the Secure
315	              Real-Time Transport Protocol (SRTP)",
316	              draft-lennox-avtcore-srtp-encrypted-header-ext-00 (work in
317	              progress), March 2011.

319	   [I-D.perkins-avt-srtp-vbr-audio]
320	              Perkins, C. and J. Valin, "Guidelines for the use of
321	              Variable Bit Rate Audio with Secure RTP",
322	              draft-perkins-avt-srtp-vbr-audio-05 (work in progress),
323	              December 2010.

325	   [ITU.G711.1988]
326	              International Telecommunications Union, "Pulse Code
327	              Modulation (PCM) of Voice Frequencies", ITU-
328	              T Recommendation G.711, November 1988.

330	   [RFC3389]  Zopf, R., "Real-time Transport Protocol (RTP) Payload for
331	              Comfort Noise (CN)", RFC 3389, September 2002.

333	   [RFC3711]  Baugher, M., McGrew, D., Naslund, M., Carrara, E., and K.
334	              Norrman, "The Secure Real-time Transport Protocol (SRTP)",
335	              RFC 3711, March 2004.

337	Appendix A.  Changes From Earlier Versions

339	   Note to the RFC-Editor: please remove this section prior to
340	   publication as an RFC.

342	A.1.  Changes From Draft -02

344	   o  Changed encoding related text so that it would cover both the one-
345	      byte and the two-byte header formats.
346	   o  Clarified use of root mean square for dBov calculation
347	   o  Other minor editorial changes.

349	A.2.  Changes From Draft -01

351	   o  Changed the URI for declaring this header extension from
352	      "urn:ietf:params:rtp-hdrext:audio-level" to
353	      "urn:ietf:params:rtp-hdrext:ssrc-audio-level" for consistency with
354	      [I-D.ietf-avtext-mixer-to-client-audio-level].
355	   o  Removed the "Limitations" section; it was discussing a potential
356	      extension that consensus indicated was out of scope of this
357	      document.
358	   o  Closed the P.56 open issue.  It was agreed at IETF 80 that P.56 is
359	      mostly about speech levels and the levels transported by the
360	      extension defined here should also be able to serve as an
361	      indication for noise.
362	   o  Closed the open issue about transmitting noise floor information.
363	      Noise floor is (loosely) inferrable by observing the per-packet
364	      level information over a period of time, so the additional
365	      complexity seemed unnecessary.

367	   o  Editorial changes for consistency with
368	      [I-D.ietf-avtext-mixer-to-client-audio-level].
369	   o  Moved several descriptions of normative items that previously had
370	      only been described in informative sections of the text.
371	   o  Other editorial clarifications.

373	A.3.  Changes From Draft -00

375	   o  Added references to the sample level calculator in
376	      [I-D.ietf-avtext-mixer-to-client-audio-level].
377	   o  Changed affiliation for Emil Ivov.

379	A.4.  Changes From Individual Submission Draft -01

381	   o  This version is primarily a document refresh.
382	   o  Emil Ivov and Enrico Marocco have been added as co-authors.
383	   o  Additional open issues listed.

385	A.5.  Changes From Individual Submission Draft -00

387	   o  The draft name has been changed to clarify that this document
388	      defines Client-To-Mixer Audio Levels, to more clearly distinguish
389	      it from [I-D.ietf-avtext-mixer-to-client-audio-level].
390	   o  The header extension format has been changed from a two-byte to a
391	      one-byte payload, eliminating the 7 reserved bits and the one
392	      must-be-zero bit.
393	   o  The sections Considerations on Use (Section 5) and Limitations
394	      have been added.
395	   o  It has been noted that senders MAY indicate -127 dBov for digital
396	      silence, and that level measurement MAY be done prior to encoding
397	      audio.
398	   o  A reference to [I-D.lennox-avtcore-srtp-encrypted-header-ext] has
399	      been added to the security considerations.
400	   o  The term "header extension" is now used consistently throughout
401	      the document (as opposed to "extension header").

403	Authors' Addresses

405	   Jonathan Lennox (editor)
406	   Vidyo, Inc.
407	   433 Hackensack Avenue
408	   Seventh Floor
409	   Hackensack, NJ  07601
410	   US

412	   Email: jonathan@vidyo.com

413	   Emil Ivov
414	   Jitsi
415	   Strasbourg  67000
416	   France

418	   Email: emcho@jitsi.org

420	   Enrico Marocco
421	   Telecom Italia
422	   Via G. Reiss Romoli, 274
423	   Turin  10148
424	   Italy

426	   Email: enrico.marocco@telecomitalia.it