idnits 2.17.1 

draft-ietf-avtext-client-to-mixer-audio-level-06.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

     No issues found here.

  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == The copyright year in the IETF Trust and authors Copyright Line does not
     match the current year

  -- The document date (November 14, 2011) is 4547 days in the past.  Is this
     intentional?


  Checking references for intended status: Proposed Standard
  ----------------------------------------------------------------------------

     (See RFCs 3967 and 4897 for information about using normative references
     to lower-maturity documents in RFCs)

  ** Obsolete normative reference: RFC 5285 (Obsoleted by RFC 8285)

  == Outdated reference: A later version (-05) exists of
     draft-ietf-avtcore-srtp-encrypted-header-ext-01

  == Outdated reference: A later version (-04) exists of
     draft-ietf-avtcore-srtp-vbr-audio-03

  == Outdated reference: A later version (-06) exists of
     draft-ietf-avtext-mixer-to-client-audio-level-05


     Summary: 1 error (**), 0 flaws (~~), 4 warnings (==), 1 comment (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.

--------------------------------------------------------------------------------


2	AVT                                                       J. Lennox, Ed.
3	Internet-Draft                                                     Vidyo
4	Intended status: Standards Track                                 E. Ivov
5	Expires: May 17, 2012                                              Jitsi
6	                                                              E. Marocco
7	                                                          Telecom Italia
8	                                                       November 14, 2011

10	 A Real-Time Transport Protocol (RTP) Header Extension for  Client-to-
11	                      Mixer Audio Level Indication
12	            draft-ietf-avtext-client-to-mixer-audio-level-06

14	Abstract

16	   This document defines a mechanism by which packets of Real-Time
17	   Transport Protocol (RTP) audio streams can indicate, in an RTP header
18	   extension, the audio level of the audio sample carried in the RTP
19	   packet.  In large conferences, this can reduce the load on an audio
20	   mixer or other middlebox which wants to forward only a few of the
21	   loudest audio streams, without requiring it to decode and measure
22	   every stream that is received.

24	Status of this Memo

26	   This Internet-Draft is submitted in full conformance with the
27	   provisions of BCP 78 and BCP 79.

29	   Internet-Drafts are working documents of the Internet Engineering
30	   Task Force (IETF).  Note that other groups may also distribute
31	   working documents as Internet-Drafts.  The list of current Internet-
32	   Drafts is at http://datatracker.ietf.org/drafts/current/.

34	   Internet-Drafts are draft documents valid for a maximum of six months
35	   and may be updated, replaced, or obsoleted by other documents at any
36	   time.  It is inappropriate to use Internet-Drafts as reference
37	   material or to cite them other than as "work in progress."

39	   This Internet-Draft will expire on May 17, 2012.

41	Copyright Notice

43	   Copyright (c) 2011 IETF Trust and the persons identified as the
44	   document authors.  All rights reserved.

46	   This document is subject to BCP 78 and the IETF Trust's Legal
47	   Provisions Relating to IETF Documents
48	   (http://trustee.ietf.org/license-info) in effect on the date of
49	   publication of this document.  Please review these documents
50	   carefully, as they describe your rights and restrictions with respect
51	   to this document.  Code Components extracted from this document must
52	   include Simplified BSD License text as described in Section 4.e of
53	   the Trust Legal Provisions and are provided without warranty as
54	   described in the Simplified BSD License.

56	Table of Contents

58	   1.  Introduction
59	   2.  Terminology
60	   3.  Audio Levels
61	   4.  Signaling (Setup) Information
62	   5.  Considerations on Use
63	   6.  Security Considerations
64	   7.  IANA Considerations
65	   8.  References
66	     8.1.  Normative References
67	     8.2.  Informative References
68	   Appendix A.  Changes From Earlier Versions
69	     A.1.  Changes From Draft -05
70	     A.2.  Changes From Draft -04
71	     A.3.  Changes From Draft -03
72	     A.4.  Changes From Draft -02
73	     A.5.  Changes From Draft -01
74	     A.6.  Changes From Individual Submission Draft -01
75	     A.7.  Changes From Individual Submission Draft -00
76	   Authors' Addresses

78	1.  Introduction

80	   In a centralized Real-Time Transport Protocol (RTP) [RFC3550] audio
81	   conference, an audio mixer or forwarder receives audio streams from
82	   many or all of the conference participants.  It then selectively
83	   forwards some of them to other participants in the conference.  In
84	   large conferences, it is possible that such a server might be
85	   receiving a large number of streams, of which only a few are intended
86	   to be forwarded to the other conference participants.

88	   In such a scenario, in order to pick the audio streams to forward, a
89	   centralized server needs to decode, measure audio levels, and
90	   possibly perform voice activity detection on audio data from a large
91	   number of streams.  The need for such processing limits the size or
92	   number of conferences such a server can support.

94	   As an alternative, this document defines an RTP header extension
95	   [RFC5285] through which senders of audio packets can indicate the
96	   audio level of the packets' payload, reducing the processing load for
97	   a server.

99	   The header extension in this draft is different from, but
100	   complementary to, the one defined in
101	   [I-D.ietf-avtext-mixer-to-client-audio-level], which defines a
102	   mechanism by which audio mixers can indicate to clients the levels of
103	   the contributing sources that made up the mixed audio.

105	2.  Terminology

107	   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
108	   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
109	   document are to be interpreted as described in RFC 2119 [RFC2119] and
110	   indicate requirement levels for compliant implementations.

112	3.  Audio Levels

114	   The audio level header extension carries the level of the audio in
115	   the RTP [RFC3550] payload of the packet it is associated with.  This
116	   information is carried in an RTP header extension element as defined
117	   by the "General Mechanism for RTP Header Extensions" [RFC5285].

119	   The payload of the audio level header extension element can be
120	   encoded using the one-byte or the two-byte header defined in
121	   [RFC5285].  Figure 1 and Figure 2 show sample audio level encodings
122	   with each of them.

124	          0                   1
125	          0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5
126	         +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
127	         |  ID   | len=0 |V| level       |
128	         +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

130	   Sample audio level encoding using the one-byte header format

132	                                 Figure 1

134	        0                   1                   2                   3
135	        0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
136	       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
137	       |      ID       |     len=1     |V|    level    |    0 (pad)    |
138	       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

140	   Sample audio level encoding using the two-byte header format

142	                                 Figure 2

144	   Note that, as indicated in [RFC5285], the length field in the
145	   one-byte header format takes the value 0 to indicate that 1 byte
146	   follows.  In the two-byte header format, on the other hand, it
147	   takes the value 1.
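
   As a non-normative illustration of the layouts in Figure 1 and
   Figure 2, the following Python sketch assembles the extension
   element bytes around a single, already-packed data byte.  The
   function names are hypothetical, and the ID ranges checked here
   (1-14 for the one-byte form, 1-255 for the two-byte form) are
   assumed from [RFC5285] rather than defined by this document.

```python
def one_byte_element(ext_id: int, data: int) -> bytes:
    """Figure 1 layout: 4-bit ID, 4-bit length (0 = one byte follows), data."""
    if not 1 <= ext_id <= 14:
        raise ValueError("one-byte header IDs are assumed to be 1..14")
    if not 0 <= data <= 0xFF:
        raise ValueError("data must fit in one byte")
    return bytes([(ext_id << 4) | 0, data])


def two_byte_element(ext_id: int, data: int) -> bytes:
    """Figure 2 layout without the pad byte: 8-bit ID, 8-bit length (1), data."""
    if not 1 <= ext_id <= 255:
        raise ValueError("two-byte header IDs are assumed to be 1..255")
    if not 0 <= data <= 0xFF:
        raise ValueError("data must fit in one byte")
    return bytes([ext_id, 1, data])
```

   The pad byte shown in Figure 2 is not part of the element itself;
   a caller would append zero bytes as needed to align the complete
   header extension block.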

148	   The magnitude of the audio level itself is packed into the seven
149	   least significant bits of the single byte of the header extension,
150	   shown in Figure 1 and Figure 2.  The least significant bit of the
151	   audio level magnitude is packed into the least significant bit of the
152	   byte.  The most significant bit of the byte is used as a separate
153	   flag bit "V", defined below.
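
   The bit packing described above can be sketched as follows.  This
   is a minimal, non-normative Python illustration; the function names
   are hypothetical.

```python
from typing import Tuple


def pack_audio_level(level: int, voice: bool = False) -> int:
    """Pack a level (0..127, meaning 0..-127 dBov) and the V flag into one byte."""
    if not 0 <= level <= 127:
        raise ValueError("level must be in the range 0..127")
    # V occupies the most significant bit; the level fills the low seven bits.
    return (0x80 if voice else 0x00) | level


def unpack_audio_level(byte: int) -> Tuple[bool, int]:
    """Return (V flag, level) from the single data byte of the extension."""
    return bool(byte & 0x80), byte & 0x7F
```

   A receiver for which the V bit is not in use would simply discard
   the first element of the returned pair.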

155	   The audio level is expressed in -dBov, with values from 0 to 127
156	   representing 0 to -127 dBov. dBov is the level, in decibels, relative
157	   to the overload point of the system, i.e. the highest-intensity
158	   signal encodable by the payload format.  (Note: Representation
159	   relative to the overload point of a system is particularly useful for
160	   digital implementations, since one does not need to know the relative
161	   calibration of the analog circuitry.)  For example, in the case of
162	   u-law (audio/pcmu) audio [ITU.G711.1988], the 0 dBov reference would
163	   be a square wave with values +/- 8031.  (This translates to 6.18
164	   dBm0, relative to u-law's dBm0 definition in Table 6 of G.711.)

166	   The audio level for digital silence, for example for a muted audio
167	   source, MUST be represented as 127 (-127 dBov), regardless of the
168	   dynamic range of the encoded audio format.

170	   The audio level header extension only carries the level of the audio
171	   in the RTP payload of the packet it is associated with, with no long-
172	   term averaging or smoothing applied.  For payload formats that
173	   contain extra error-correction bits or loss-concealment information,
174	   the level corresponds only to the data that would result from the
175	   payload's normal decoding process, not what it would produce under
176	   error or packet loss concealment.  The level is measured as a root
177	   mean square of all the samples in the audio encoded by the packet.

179	   To simplify implementation of the encoding procedures described here,
180	   the reference implementation section in
181	   [I-D.ietf-avtext-mixer-to-client-audio-level] provides a sample Java
182	   implementation of an audio level calculator that helps obtain such
183	   values from raw linear PCM audio samples.
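
   Independently of that sample implementation, the root mean square
   measurement described above can be sketched in Python as shown
   here.  This is an illustration only: the function name is
   hypothetical, and the default overload point of 32768 assumes
   16-bit linear PCM input, which this document does not mandate.

```python
import math
from typing import Sequence


def audio_level_dbov(samples: Sequence[int], overload: float = 32768.0) -> int:
    """Return 0..127, representing 0..-127 dBov, for one packet's samples."""
    if not samples:
        raise ValueError("at least one sample is required")
    # Root mean square over all samples carried by the packet.
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    if rms == 0:
        return 127  # digital silence is always reported as 127
    dbov = 20.0 * math.log10(rms / overload)
    # Express as a non-negative magnitude and clamp to the 7-bit range.
    return max(0, min(127, round(-dbov)))
```

   With these assumptions, a square wave at half of full scale
   (values +/- 16384) yields 6, that is, approximately -6 dBov.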

185	   In addition, a flag bit (labeled V) optionally indicates whether the
186	   encoder believes the audio packet contains voice activity.  If the V
187	   bit is in use, the value 1 indicates that the encoder believes the
188	   audio packet contains voice activity, and the value 0 indicates that
189	   the encoder believes it does not.  (The voice activity detection
190	   algorithm is unspecified and left implementation-specific.)  If the V
191	   bit is not in use, its value is unspecified and MUST be ignored by
192	   receivers.  The use of the V bit is signaled using the extension
193	   attribute "vad", discussed in Section 4.

195	   When this header extension is used with RTP data sent using the RTP
196	   Payload for Redundant Audio Data [RFC2198], the header's data
197	   describes the contents of the primary encoding.

199	   Note: This audio level is defined in the same manner as is audio
200	   noise level in the RTP Payload Comfort Noise specification [RFC3389].
201	   In the comfort noise specification, the overall magnitude of the
202	   noise level in comfort noise is encoded into the first byte of the
203	   payload, with spectral information about the noise in subsequent
204	   bytes.  This specification's audio level parameter is defined so as
205	   to be identical to the comfort noise payload's noise-level byte.

207	4.  Signaling (Setup) Information

209	   The URI for declaring this header extension in an extmap attribute is
210	   "urn:ietf:params:rtp-hdrext:ssrc-audio-level".

212	   It has a single extension attribute, named "vad".  It takes the form
213	   "vad=on" or "vad=off".  If the header extension element is signaled
214	   with "vad=on", the "V" bit described in Section 3 is in use, and MUST
215	   be set by senders.  If the header extension element is signaled with
216	   "vad=off", the "V" bit is not in use, and its value MUST be ignored
217	   by receivers.  If the "vad" extension attribute is not specified, the
218	   default is "vad=on".

220	   An example attribute line in the SDP for a conference might hence
221	   be:

223	       a=extmap:6 urn:ietf:params:rtp-hdrext:ssrc-audio-level vad=on

225	   The "vad" extension attribute only controls the semantics of this
226	   header extension attribute, and does not make any statement about
227	   whether the sender is using any other voice activity detection
228	   features such as discontinuous transmission, comfort noise, or
229	   silence suppression.

231	   Using the mechanisms of [RFC5285], an endpoint MAY signal multiple
232	   instances of the header extension element, with different values of
233	   the vad attribute, so long as these instances use different values
234	   for the extension identifier.  However, again following the rules of
235	   [RFC5285], the semantics chosen for a header extension element
236	   (including its vad setting) for a particular extension identifier
237	   value MUST NOT be changed within an RTP session.
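
   The signaling rules in this section can be illustrated with a small
   parser for a single extmap line.  This Python sketch is not
   normative; the names are hypothetical, and the handling of an
   optional "/direction" suffix on the identifier is assumed from the
   extmap syntax in [RFC5285].

```python
from typing import NamedTuple, Optional

AUDIO_LEVEL_URI = "urn:ietf:params:rtp-hdrext:ssrc-audio-level"


class AudioLevelExtmap(NamedTuple):
    ext_id: int  # local extension identifier
    vad: bool    # True when the V bit is in use


def parse_audio_level_extmap(line: str) -> Optional[AudioLevelExtmap]:
    """Parse one SDP line; return None if it does not declare this extension."""
    line = line.strip()
    prefix = "a=extmap:"
    if not line.startswith(prefix):
        return None
    fields = line[len(prefix):].split()
    if len(fields) < 2 or fields[1] != AUDIO_LEVEL_URI:
        return None
    ext_id = int(fields[0].split("/")[0])  # drop an optional direction suffix
    vad = True  # default when the "vad" attribute is absent
    for attr in fields[2:]:
        if attr == "vad=off":
            vad = False
        elif attr == "vad=on":
            vad = True
    return AudioLevelExtmap(ext_id, vad)
```

   An endpoint that signals several instances with different vad
   settings would call this once per line and keep one result per
   extension identifier.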

239	5.  Considerations on Use

241	   Mixers and forwarders generally ought not base audio forwarding
242	   decisions directly on packet-by-packet audio level information, but
243	   rather ought to apply some analysis of the audio levels and trends.
244	   This general rule applies whether audio levels are provided by
245	   endpoints (as defined in this document), or are calculated at a
246	   server, as would be done in the absence of this information.  This
247	   section discusses several issues that mixers and forwarders may wish
248	   to take into account.  (Note that this section provides design
249	   guidance only, and is not normative.)

251	   First of all, audio levels generally ought to be measured over longer
252	   intervals than that of a single audio packet.  In order to avoid
253	   false-positives for short bursts of sound (such as a cough or a
254	   dropped microphone), it is often useful to require that a
255	   participant's audio level be maintained for some period of time
256	   before considering it to be "real", i.e. some type of low-pass filter
257	   ought to be applied to the audio levels.  Note, though, that such
258	   filtering must be balanced with the need to avoid clipping of the
259	   beginning of a speaker's speech.
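
   One possible way to apply such filtering is sketched below in
   Python.  It is purely illustrative: the class name, the use of an
   exponential moving average as the low-pass filter, and every
   default parameter value are assumptions made for this example, not
   recommendations of this document.

```python
from typing import Optional


class LevelSmoother:
    """Smooth per-packet levels and require sustained loudness."""

    def __init__(self, alpha: float = 0.2, threshold: float = 40.0,
                 hold_packets: int = 5) -> None:
        self.alpha = alpha                # filter weight for each new packet
        self.threshold = threshold        # smoothed value at or below this is "loud"
        self.hold_packets = hold_packets  # consecutive loud packets required
        self.smoothed: Optional[float] = None
        self.run = 0

    def update(self, level: int) -> bool:
        """Feed one level (0..127, smaller is louder); True once active."""
        if self.smoothed is None:
            self.smoothed = float(level)
        else:
            self.smoothed += self.alpha * (level - self.smoothed)
        # Count consecutive packets whose smoothed level stays loud enough.
        self.run = self.run + 1 if self.smoothed <= self.threshold else 0
        return self.run >= self.hold_packets
```

   A larger alpha or a smaller hold_packets value reacts faster and so
   clips less of the start of speech, at the cost of more false
   positives from short bursts.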

261	   Additionally, different participants may have their audio input set
262	   differently.  It may be useful to apply some sort of automatic gain
263	   control to the audio levels.  There are a number of possible
264	   approaches to achieving this, e.g. by measuring peak audio levels,
265	   average audio levels during speech, or background audio levels
266	   (average audio levels during non-speech).

268	6.  Security Considerations

270	   A malicious endpoint could choose to set the values in this header
271	   extension falsely, so as to falsely claim that audio or voice is or
272	   is not present.  It is not clear what could be gained by falsely
273	   claiming that audio is not present, but an endpoint falsely claiming
274	   that audio is present could perform a denial-of-service attack on an
275	   audio conference, so as to send silence to suppress other conference
276	   members' audio, or could dominate a conference (by seizing its
277	   speaker-selection algorithm) without actually speaking.  Thus, if a
278	   device relies on audio level data from untrusted endpoints, it SHOULD
279	   periodically audit the level information transmitted, taking
280	   appropriate corrective action against endpoints that appear to be
281	   sending incorrect data.  (However, as it is valid for an endpoint to
282	   choose to measure audio levels prior to encoding, some degree of
283	   discrepancy could be present.  This would not indicate that an
284	   endpoint is malicious.)
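
   A periodic audit of this kind might be sketched as follows.  The
   Python function below is hypothetical, as are its tolerance and
   failure-fraction defaults; it assumes the device has decoded a
   sample of packets and measured their levels itself.

```python
from typing import Sequence


def levels_look_plausible(claimed: Sequence[int], measured: Sequence[int],
                          tolerance_db: int = 6,
                          max_bad_fraction: float = 0.2) -> bool:
    """Compare claimed levels with locally measured ones for sampled packets."""
    pairs = list(zip(claimed, measured))
    if not pairs:
        return True  # nothing was audited, so nothing to object to
    # A packet is suspicious when the two values differ by more than the tolerance.
    bad = sum(1 for c, m in pairs if abs(c - m) > tolerance_db)
    return bad / len(pairs) <= max_bad_fraction
```

   The tolerance leaves room for the legitimate discrepancy noted
   above, where an endpoint measures levels prior to encoding.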

286	   In the Secure Real-Time Transport Protocol (SRTP) [RFC3711], RTP
287	   header extensions are authenticated but not encrypted.  When this
288	   header extension is used, audio levels are therefore visible on a
289	   packet-by-packet basis to an attacker passively observing the audio
290	   stream.  As discussed in [I-D.ietf-avtcore-srtp-vbr-audio], such an
291	   attacker might be able to infer information about the conversation,
292	   possibly with phoneme-level resolution.  In scenarios where this is a
293	   concern, additional mechanisms MUST be used to protect the
294	   confidentiality of the header extension.  Such a mechanism could be
295	   header extension encryption
296	   [I-D.ietf-avtcore-srtp-encrypted-header-ext], or a lower-level
297	   security and authentication mechanism such as IPsec [RFC4301].

299	7.  IANA Considerations

301	   This document defines a new extension URI to the RTP Compact Header
302	   Extensions subregistry of the Real-Time Transport Protocol (RTP)
303	   Parameters registry, according to the following data:

305	   Extension URI:  urn:ietf:params:rtp-hdrext:ssrc-audio-level
306	   Description:  Audio Level
307	   Contact:  jonathan@vidyo.com
308	   Reference:  RFC XXXX

310	   Note to RFC Editor: please replace "RFC XXXX" with the number of this
311	   RFC.

313	8.  References

315	8.1.  Normative References

317	   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
318	              Requirement Levels", BCP 14, RFC 2119, March 1997.

320	   [RFC2198]  Perkins, C., Kouvelas, I., Hodson, O., Hardman, V.,
321	              Handley, M., Bolot, J., Vega-Garcia, A., and S. Fosse-
322	              Parisis, "RTP Payload for Redundant Audio Data", RFC 2198,
323	              September 1997.

325	   [RFC3550]  Schulzrinne, H., Casner, S., Frederick, R., and V.
326	              Jacobson, "RTP: A Transport Protocol for Real-Time
327	              Applications", STD 64, RFC 3550, July 2003.

329	   [RFC4301]  Kent, S. and K. Seo, "Security Architecture for the
330	              Internet Protocol", RFC 4301, December 2005.

332	   [RFC5285]  Singer, D. and H. Desineni, "A General Mechanism for RTP
333	              Header Extensions", RFC 5285, July 2008.

335	8.2.  Informative References

337	   [I-D.ietf-avtcore-srtp-encrypted-header-ext]
338	              Lennox, J., "Encryption of Header Extensions in the Secure
339	              Real-Time Transport Protocol (SRTP)",
340	              draft-ietf-avtcore-srtp-encrypted-header-ext-01 (work in
341	              progress), October 2011.

343	   [I-D.ietf-avtcore-srtp-vbr-audio]
344	              Perkins, C. and J. Valin, "Guidelines for the use of
345	              Variable Bit Rate Audio with Secure RTP",
346	              draft-ietf-avtcore-srtp-vbr-audio-03 (work in progress),
347	              July 2011.

349	   [I-D.ietf-avtext-mixer-to-client-audio-level]
350	              Ivov, E., Marocco, E., and J. Lennox, "A Real-Time
351	              Transport Protocol (RTP) Header Extension for Mixer-to-
352	              Client Audio Level Indication",
353	              draft-ietf-avtext-mixer-to-client-audio-level-05 (work in
354	              progress), September 2011.

356	   [ITU.G711.1988]
357	              International Telecommunications Union, "Pulse Code
358	              Modulation (PCM) of Voice Frequencies", ITU-
359	              T Recommendation G.711, November 1988.

361	   [RFC3389]  Zopf, R., "Real-time Transport Protocol (RTP) Payload for
362	              Comfort Noise (CN)", RFC 3389, September 2002.

364	   [RFC3711]  Baugher, M., McGrew, D., Naslund, M., Carrara, E., and K.
365	              Norrman, "The Secure Real-time Transport Protocol (SRTP)",
366	              RFC 3711, March 2004.

368	Appendix A.  Changes From Earlier Versions

370	   Note to the RFC-Editor: please remove this section prior to
371	   publication as an RFC.

373	A.1.  Changes From Draft -05

375	   o  Added an informative reference to RFC 4301 (IPsec).  (Brought up
376	      by Stephen Farrell)
377	   o  Clarified the meaning of "overload point of the system".  (Brought
378	      up by Robert Sparks).
379	   o  Clarified that levels correspond only to the audio carried in the
380	      normal decoding process, not error or packet loss concealment.
381	      (Brought up by Robert Sparks).
382	   o  Added security consideration that false audio levels could be used
383	      to seize a speaker-selection algorithm (Brought up by Robert
384	      Sparks and Stewart Bryant).
385	   o  Updated reference to [I-D.ietf-avtcore-srtp-vbr-audio].

387	A.2.  Changes From Draft -04

389	   o  Adjusted IPR header.

391	A.3.  Changes From Draft -03

393	   o  Added vad extension attribute to negotiate use of the V bit.
394	   o  Addressed editorial comments made on the mailing list.

396	A.4.  Changes From Draft -02

398	   o  Changed encoding related text so that it would cover both the one-
399	      byte and the two-byte header formats.
400	   o  Clarified use of root mean square for dBov calculation
401	   o  Added references to the sample level calculator in
402	      [I-D.ietf-avtext-mixer-to-client-audio-level].
403	   o  Changed affiliation for Emil Ivov.
404	   o  Other minor editorial changes.

406	A.5.  Changes From Draft -01

408	   o  Changed the URI for declaring this header extension from
409	      "urn:ietf:params:rtp-hdrext:audio-level" to
410	      "urn:ietf:params:rtp-hdrext:ssrc-audio-level" for consistency with
411	      [I-D.ietf-avtext-mixer-to-client-audio-level].
412	   o  Removed the "Limitations" section; it was discussing a potential
413	      extension that consensus indicated was out of scope of this
414	      document.
415	   o  Closed the P.56 open issue.  It was agreed at IETF 80 that P.56 is
416	      mostly about speech levels and the levels transported by the
417	      extension defined here should also be able to serve as an
418	      indication for noise.
419	   o  Closed the open issue about transmitting noise floor information.
420	      Noise floor is (loosely) inferrable by observing the per-packet
421	      level information over a period of time, so the additional
422	      complexity seemed unnecessary.
423	   o  Editorial changes for consistency with
424	      [I-D.ietf-avtext-mixer-to-client-audio-level].
425	   o  Moved several descriptions of normative items that previously had
426	      only been described in informative sections of the text.
427	   o  Other editorial clarifications.

429	A.6.  Changes From Individual Submission Draft -01

431	   o  This version is primarily a document refresh.
432	   o  Emil Ivov and Enrico Marocco have been added as co-authors.
433	   o  Additional open issues listed.

435	A.7.  Changes From Individual Submission Draft -00

437	   o  The draft name has been changed to clarify that this document
438	      defines Client-To-Mixer Audio Levels, to more clearly distinguish
439	      it from [I-D.ietf-avtext-mixer-to-client-audio-level].
440	   o  The header extension format has been changed from a two-byte to a
441	      one-byte payload, eliminating the 7 reserved bits and the one
442	      must-be-zero bit.
444	   o  The sections Considerations on Use (Section 5) and Limitations
445	      have been added.
446	   o  It has been noted that senders MAY indicate -127 dBov for digital
447	      silence, and that level measurement MAY be done prior to encoding
448	      audio.
449	   o  A reference to [I-D.ietf-avtcore-srtp-encrypted-header-ext] has
450	      been added to the security considerations.
451	   o  The term "header extension" is now used consistently throughout
452	      the document (as opposed to "extension header").

454	Authors' Addresses

456	   Jonathan Lennox (editor)
457	   Vidyo, Inc.
458	   433 Hackensack Avenue
459	   Seventh Floor
460	   Hackensack, NJ  07601
461	   US

463	   Email: jonathan@vidyo.com

465	   Emil Ivov
466	   Jitsi
467	   Strasbourg  67000
468	   France

470	   Email: emcho@jitsi.org

472	   Enrico Marocco
473	   Telecom Italia
474	   Via G. Reiss Romoli, 274
475	   Turin  10148
476	   Italy

478	   Email: enrico.marocco@telecomitalia.it