AVT                                                       J. Lennox, Ed.
Internet-Draft                                                     Vidyo
Intended status: Standards Track                                 E. Ivov
Expires: December 4, 2011                                          Jitsi
                                                              E. Marocco
                                                          Telecom Italia
                                                            June 2, 2011


       A Real-Time Transport Protocol (RTP) Header Extension for
               Client-to-Mixer Audio Level Indication
           draft-ietf-avtext-client-to-mixer-audio-level-02

Abstract

   This document defines a mechanism by which packets of Real-Time
   Transport Protocol (RTP) audio streams can indicate, in an RTP header
   extension, the audio level of the audio sample carried in the RTP
   packet. In large conferences, this can reduce the load on an audio
   mixer or other middlebox which wants to forward only a few of the
   loudest audio streams, without requiring it to decode and measure
   every stream that is received.

Status of this Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF). Note that other groups may also distribute
   working documents as Internet-Drafts. The list of current Internet-
   Drafts is at http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time. It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on December 4, 2011.

Copyright Notice

   Copyright (c) 2011 IETF Trust and the persons identified as the
   document authors. All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document. Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document. Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

   This document may contain material from IETF Documents or IETF
   Contributions published or made publicly available before November
   10, 2008. The person(s) controlling the copyright in some of this
   material may not have granted the IETF Trust the right to allow
   modifications of such material outside the IETF Standards Process.
   Without obtaining an adequate license from the person(s) controlling
   the copyright in such materials, this document may not be modified
   outside the IETF Standards Process, and derivative works of it may
   not be created outside the IETF Standards Process, except to format
   it for publication as an RFC or to translate it into languages other
   than English.

Table of Contents

   1.  Introduction
   2.  Terminology
   3.  Audio Levels
   4.  Signaling (Setup) Information
   5.  Considerations on Use
   6.  Security Considerations
   7.  IANA Considerations
   8.  References
     8.1.  Normative References
     8.2.  Informative References
   Appendix A.  Changes From Earlier Versions
     A.1.  Changes From Draft -01
     A.2.  Changes From Draft -00
     A.3.  Changes From Individual Submission Draft -01
     A.4.  Changes From Individual Submission Draft -00
   Authors' Addresses

1.  Introduction

   In a centralized Real-Time Transport Protocol (RTP) [RFC3550] audio
   conference, an audio mixer or forwarder receives audio streams from
   many or all of the conference participants. It then selectively
   forwards some of them to other participants in the conference. In
   large conferences, such a server may receive a large number of
   streams, of which only a few should be forwarded to the other
   conference participants.

   In such a scenario, in order to pick the audio streams to forward, a
   centralized server needs to decode the audio data from a large number
   of streams, measure its level, and possibly perform voice activity
   detection on it. The need for such processing limits the size or
   number of conferences such a server can support.

   As an alternative, this document defines an RTP header extension
   [RFC5285] through which senders of audio packets can indicate the
   audio level of the packets' payload, reducing the processing load for
   a server.

   The header extension in this draft is different from, but
   complementary to, the one defined in
   [I-D.ietf-avtext-mixer-to-client-audio-level], which defines a
   mechanism by which audio mixers can indicate to clients the levels of
   the contributing sources that make up the mixed audio.

2.  Terminology

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in RFC 2119 [RFC2119] and
   indicate requirement levels for compliant implementations.

3.  Audio Levels

   The audio level header extension element carries the level of the
   audio in the RTP payload of the packet it is associated with, and
   also an indication of whether voice activity has been detected in
   the packet. This information is carried in an RTP header extension
   element as defined by [RFC5285].

   The payload of the audio level header extension element is as
   follows:

    0                   1
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |  ID   | len=0 |V|    level    |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

                                Figure 1

   The length field takes the value 0 to indicate that 1 byte follows
   (in the one-byte header form of [RFC5285], the length field carries
   the number of data bytes minus one).

   The two-byte header defined in RFC 5285 [RFC5285] may also be used.

   The magnitude of the audio level is packed into the seven least
   significant bits of the single byte of the header extension, shown in
   Figure 1. The least significant bit of the audio level magnitude is
   packed into the least significant bit of the byte. The most
   significant bit of the byte is used as a separate flag bit "V",
   defined below.

   The audio level is expressed in -dBov, with values from 0 to 127
   representing 0 to -127 dBov. dBov is the level, in decibels, relative
   to the overload point of the system, i.e., the maximum-amplitude
   signal that can be handled by the system without clipping. (Note:
   Representation relative to the overload point of a system is
   particularly useful for digital implementations, since one does not
   need to know the relative calibration of the analog circuitry.)
   For example, in the case of u-law (audio/pcmu) audio [ITU.G711.1988],
   the 0 dBov reference would be a square wave with values +/- 8031.
   (This translates to 6.18 dBm0, relative to u-law's dBm0 definition in
   Table 6 of G.711.)

   The audio level for digital silence, for example for a muted audio
   source, MAY be represented as 127 (-127 dBov), regardless of the
   dynamic range of the encoded audio format.

   Implementations MAY choose to measure audio levels before the audio
   is encoded into the RTP payload, e.g., on raw linear PCM input.

   The audio level header extension carries only the level of the audio
   in the RTP payload of the packet with which it is associated; no
   long-term averaging or smoothing is applied.

   To simplify implementation of the encoding procedures described here,
   the reference implementation section in
   [I-D.ietf-avtext-mixer-to-client-audio-level] provides a sample Java
   implementation of an audio level calculator that helps obtain such
   values from raw linear PCM audio samples.

   In addition, a flag bit (labeled V) indicates whether the encoder
   believes the audio packet contains voice activity (1) or does not
   (0). The voice activity detection algorithm is not specified and is
   left to the implementation.

   When this header extension is used with RTP data sent using the RTP
   Payload for Redundant Audio Data [RFC2198], the header extension's
   data describes the contents of the primary encoding.

   Note: This audio level is defined in the same manner as the audio
   noise level in the RTP Payload for Comfort Noise specification
   [RFC3389]. In the comfort noise specification, the overall magnitude
   of the noise level in comfort noise is encoded into the first byte of
   the payload, with spectral information about the noise in subsequent
   bytes.
   This specification's audio level parameter is defined so as to be
   identical to the comfort noise payload's noise-level byte.

4.  Signaling (Setup) Information

   The URI for declaring this header extension in an extmap attribute
   [RFC5285] is "urn:ietf:params:rtp-hdrext:ssrc-audio-level". No
   additional setup information is needed for this extension (i.e., no
   extension attributes).

5.  Considerations on Use

   Mixers and forwarders generally should not base audio forwarding
   decisions directly on packet-by-packet audio level information, but
   rather should apply some analysis of the audio levels and trends.
   This general rule applies whether audio levels are provided by
   endpoints (as defined in this document) or are calculated by a
   server, as would be necessary in the absence of this information.
   This section discusses several issues that mixers and forwarders may
   wish to take into account. (Note that this section provides design
   guidance only, and is not normative.)

   First, audio levels should generally be evaluated over intervals
   longer than a single audio packet. To avoid false positives for short
   bursts of sound (such as a cough or a dropped microphone), it is
   often useful to require that a participant's audio level be sustained
   for some period of time before considering it "real"; i.e., some type
   of low-pass filter should be applied to the audio levels. Note,
   though, that such filtering must be balanced against the need to
   avoid clipping the beginning of a speaker's speech.

   Additionally, different participants may have their audio input gain
   set differently, so it may be useful to apply some form of automatic
   gain control to the audio levels. There are a number of possible
   approaches to achieving this, e.g.,
   by measuring peak audio levels, by measuring average audio levels
   during speech, or by measuring background audio levels (average
   audio levels during non-speech).

6.  Security Considerations

   A malicious endpoint could set the values in this header extension
   falsely, so as to claim that audio or voice is or is not present. It
   is not clear what could be gained by falsely claiming that audio is
   not present, but an endpoint falsely claiming that audio is present
   could mount a denial-of-service attack on an audio conference,
   causing its silence to be forwarded in place of other conference
   members' audio. Thus, a device relying on audio level data from
   untrusted endpoints SHOULD periodically audit the level information
   transmitted, taking appropriate corrective action if endpoints appear
   to be sending incorrect data. (Note that because it is valid for an
   endpoint to measure audio levels prior to encoding, some degree of
   discrepancy SHOULD be tolerated.)

   In the Secure Real-Time Transport Protocol (SRTP) [RFC3711], RTP
   header extensions are authenticated but not encrypted. When this
   header extension is used, audio levels are therefore visible on a
   packet-by-packet basis to an attacker passively observing the audio
   stream. As discussed in [I-D.perkins-avt-srtp-vbr-audio], such an
   attacker might be able to infer information about the conversation,
   possibly with phoneme-level resolution. In scenarios where this is a
   concern, additional mechanisms SHOULD be used to protect the
   confidentiality of the header extension. One such mechanism is header
   extension encryption [I-D.lennox-avtcore-srtp-encrypted-header-ext].

7.  IANA Considerations

   This document defines a new extension URI in the RTP Compact Header
   Extensions subregistry of the Real-Time Transport Protocol (RTP)
   Parameters registry, according to the following data:

      Extension URI: urn:ietf:params:rtp-hdrext:ssrc-audio-level
      Description:   Audio Level
      Contact:       jonathan@vidyo.com
      Reference:     RFC XXXX

   Note to RFC Editor: please replace "RFC XXXX" with the number of this
   RFC.

8.  References

8.1.  Normative References

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119, March 1997.

   [RFC2198]  Perkins, C., Kouvelas, I., Hodson, O., Hardman, V.,
              Handley, M., Bolot, J., Vega-Garcia, A., and
              S. Fosse-Parisis, "RTP Payload for Redundant Audio Data",
              RFC 2198, September 1997.

   [RFC3550]  Schulzrinne, H., Casner, S., Frederick, R., and V.
              Jacobson, "RTP: A Transport Protocol for Real-Time
              Applications", STD 64, RFC 3550, July 2003.

   [RFC5285]  Singer, D. and H. Desineni, "A General Mechanism for RTP
              Header Extensions", RFC 5285, July 2008.

8.2.  Informative References

   [I-D.ietf-avtext-mixer-to-client-audio-level]
              Ivov, E., Marocco, E., and J. Lennox, "A Real-Time
              Transport Protocol (RTP) Header Extension for
              Mixer-to-Client Audio Level Indication",
              draft-ietf-avtext-mixer-to-client-audio-level-02 (work in
              progress), May 2011.

   [I-D.lennox-avtcore-srtp-encrypted-header-ext]
              Lennox, J., "Encryption of Header Extensions in the Secure
              Real-Time Transport Protocol (SRTP)",
              draft-lennox-avtcore-srtp-encrypted-header-ext-00 (work in
              progress), March 2011.

   [I-D.perkins-avt-srtp-vbr-audio]
              Perkins, C. and J. Valin, "Guidelines for the use of
              Variable Bit Rate Audio with Secure RTP",
              draft-perkins-avt-srtp-vbr-audio-05 (work in progress),
              December 2010.

   [ITU.G711.1988]
              International Telecommunication Union, "Pulse Code
              Modulation (PCM) of Voice Frequencies", ITU-T
              Recommendation G.711, November 1988.

   [RFC3389]  Zopf, R., "Real-time Transport Protocol (RTP) Payload for
              Comfort Noise (CN)", RFC 3389, September 2002.

   [RFC3711]  Baugher, M., McGrew, D., Naslund, M., Carrara, E., and K.
              Norrman, "The Secure Real-time Transport Protocol (SRTP)",
              RFC 3711, March 2004.

Appendix A.  Changes From Earlier Versions

   Note to the RFC Editor: please remove this section prior to
   publication as an RFC.

A.1.  Changes From Draft -01

   o  Changed the URI for declaring this header extension from
      "urn:ietf:params:rtp-hdrext:audio-level" to
      "urn:ietf:params:rtp-hdrext:ssrc-audio-level" for consistency with
      [I-D.ietf-avtext-mixer-to-client-audio-level].
   o  Removed the "Limitations" section; it discussed a potential
      extension that consensus indicated was out of scope for this
      document.
   o  Closed the P.56 open issue. It was agreed at IETF 80 that P.56 is
      mostly about speech levels, and that the levels transported by the
      extension defined here should also be able to serve as an
      indication for noise.
   o  Closed the open issue about transmitting noise floor information.
      The noise floor is (loosely) inferable by observing the per-packet
      level information over a period of time, so the additional
      complexity seemed unnecessary.
   o  Editorial changes for consistency with
      [I-D.ietf-avtext-mixer-to-client-audio-level].
   o  Moved descriptions of several normative items that had previously
      been described only in informative sections of the text.
   o  Other editorial clarifications.

A.2.  Changes From Draft -00

   o  Added references to the sample level calculator in
      [I-D.ietf-avtext-mixer-to-client-audio-level].
   o  Changed affiliation for Emil Ivov.

A.3.  Changes From Individual Submission Draft -01

   o  This version is primarily a document refresh.
   o  Emil Ivov and Enrico Marocco have been added as co-authors.
   o  Additional open issues listed.

A.4.  Changes From Individual Submission Draft -00

   o  The draft name has been changed to clarify that this document
      defines Client-to-Mixer Audio Levels, to more clearly distinguish
      it from [I-D.ietf-avtext-mixer-to-client-audio-level].
   o  The header extension format has been changed from a two-byte to a
      one-byte payload, eliminating the 7 reserved bits and the one
      must-be-zero bit.
   o  The sections Considerations on Use (Section 5) and Limitations
      have been added.
   o  It has been noted that senders MAY indicate -127 dBov for digital
      silence, and that level measurement MAY be done prior to encoding
      audio.
   o  A reference to [I-D.lennox-avtcore-srtp-encrypted-header-ext] has
      been added to the security considerations.
   o  The term "header extension" is now used consistently throughout
      the document (as opposed to "extension header").

Authors' Addresses

   Jonathan Lennox (editor)
   Vidyo, Inc.
   433 Hackensack Avenue
   Seventh Floor
   Hackensack, NJ  07601
   US

   Email: jonathan@vidyo.com


   Emil Ivov
   Jitsi
   Strasbourg  67000
   France

   Email: emcho@jitsi.org


   Enrico Marocco
   Telecom Italia
   Via G. Reiss Romoli, 274
   Turin  10148
   Italy

   Email: enrico.marocco@telecomitalia.it