AVT                                                       J. Lennox, Ed.
Internet-Draft                                                     Vidyo
Intended status: Standards Track                                 E. Ivov
Expires: February 28, 2012                                         Jitsi
                                                              E. Marocco
                                                          Telecom Italia
                                                         August 27, 2011


        A Real-Time Transport Protocol (RTP) Header Extension for
                 Client-to-Mixer Audio Level Indication
            draft-ietf-avtext-client-to-mixer-audio-level-04

Abstract

   This document defines a mechanism by which packets of Real-Time
   Transport Protocol (RTP) audio streams can indicate, in an RTP header
   extension, the audio level of the audio sample carried in the RTP
   packet. In large conferences, this can reduce the load on an audio
   mixer or other middlebox which wants to forward only a few of the
   loudest audio streams, without requiring it to decode and measure
   every stream that is received.

Status of this Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF). Note that other groups may also distribute
   working documents as Internet-Drafts. The list of current
   Internet-Drafts is at http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time. It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on February 28, 2012.

Copyright Notice

   Copyright (c) 2011 IETF Trust and the persons identified as the
   document authors. All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document. Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document. Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

   This document may contain material from IETF Documents or IETF
   Contributions published or made publicly available before November
   10, 2008. The person(s) controlling the copyright in some of this
   material may not have granted the IETF Trust the right to allow
   modifications of such material outside the IETF Standards Process.
   Without obtaining an adequate license from the person(s) controlling
   the copyright in such materials, this document may not be modified
   outside the IETF Standards Process, and derivative works of it may
   not be created outside the IETF Standards Process, except to format
   it for publication as an RFC or to translate it into languages other
   than English.

Table of Contents

   1.  Introduction
   2.  Terminology
   3.  Audio Levels
   4.  Signaling (Setup) Information
   5.  Considerations on Use
   6.  Security Considerations
   7.  IANA Considerations
   8.  References
     8.1.  Normative References
     8.2.  Informative References
   Appendix A.  Changes From Earlier Versions
     A.1.  Changes From Draft -03
     A.2.  Changes From Draft -02
     A.3.  Changes From Draft -01
     A.4.  Changes From Individual Submission Draft -01
     A.5.  Changes From Individual Submission Draft -00
   Authors' Addresses

1.  Introduction

   In a centralized Real-Time Transport Protocol (RTP) [RFC3550] audio
   conference, an audio mixer or forwarder receives audio streams from
   many or all of the conference participants. It then selectively
   forwards some of them to other participants in the conference. In
   large conferences, it is possible that such a server might be
   receiving a large number of streams, of which only a few are intended
   to be forwarded to the other conference participants.

   In such a scenario, in order to pick the audio streams to forward, a
   centralized server needs to decode, measure audio levels, and
   possibly perform voice activity detection on audio data from a large
   number of streams. The need for such processing limits the size or
   number of conferences such a server can support.

   As an alternative, this document defines an RTP header extension
   [RFC5285] through which senders of audio packets can indicate the
   audio level of the packets' payload, reducing the processing load for
   a server.

   The header extension in this draft is different than, but
   complementary with, the one defined in
   [I-D.ietf-avtext-mixer-to-client-audio-level], which defines a
   mechanism by which audio mixers can indicate to clients the levels of
   the contributing sources that made up the mixed audio.

2.  Terminology

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in RFC 2119 [RFC2119] and
   indicate requirement levels for compliant implementations.

3.  Audio Levels

   The audio level header extension carries the level of the audio in
   the RTP [RFC3550] payload of the packet it is associated with. This
   information is carried in an RTP header extension element as defined
   by the "General Mechanism for RTP Header Extensions" [RFC5285].

   The payload of the audio level header extension element can be
   encoded using the one-byte or the two-byte header defined in
   [RFC5285]. Figure 1 and Figure 2 show sample audio level encodings
   with each of them.

       0                   1
       0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |  ID   | len=0 |V|   level     |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

       Sample audio level encoding using the one-byte header format

                                Figure 1

       0                   1                   2                   3
       0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |      ID       |     len=1     |V|    level    |    0 (pad)    |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

       Sample audio level encoding using the two-byte header format

                                Figure 2

   Note that, as indicated in [RFC5285], the length field in the
   one-byte header format takes the value 0 to indicate that 1 byte
   follows. In the two-byte header format, on the other hand, it takes
   the value 1.

   The magnitude of the audio level itself is packed into the seven
   least significant bits of the single byte of the header extension,
   shown in Figure 1 and Figure 2. The least significant bit of the
   audio level magnitude is packed into the least significant bit of the
   byte. The most significant bit of the byte is used as a separate
   flag bit "V", defined below.

   The audio level is expressed in -dBov, with values from 0 to 127
   representing 0 to -127 dBov. dBov is the level, in decibels, relative
   to the overload point of the system, i.e. the maximum-amplitude
   signal that can be handled by the system without clipping. (Note:
   Representation relative to the overload point of a system is
   particularly useful for digital implementations, since one does not
   need to know the relative calibration of the analog circuitry.) For
   example, in the case of u-law (audio/pcmu) audio [ITU.G711.1988], the
   0 dBov reference would be a square wave with values +/- 8031. (This
   translates to 6.18 dBm0, relative to u-law's dBm0 definition in Table
   6 of G.711.)

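   As an illustration of the encoding described above, the following
   is a minimal Python sketch, not a reference implementation. All
   function names are hypothetical. It assumes the caller has already
   measured a signal amplitude for the packet and knows the overload
   point of its audio format; how that amplitude is measured is outside
   the scope of the sketch. The identifier ranges checked (1 to 14 for
   the one-byte format, 1 to 255 for the two-byte format) follow
   [RFC5285].

```python
import math
from typing import Tuple


def dbov_from_amplitude(amplitude: float, overload: float) -> float:
    """Level in decibels relative to the overload point (0 dBov)."""
    if amplitude <= 0:
        return float("-inf")  # no signal at all
    return 20.0 * math.log10(amplitude / overload)


def level_from_dbov(dbov: float) -> int:
    """Map dBov onto the 7-bit magnitude: 0..127 means 0..-127 dBov."""
    clamped = min(0.0, max(-127.0, dbov))
    return int(round(-clamped))


def pack_level_byte(level: int, voice: bool = False) -> int:
    """V flag in the most significant bit, magnitude in the low 7 bits."""
    if not 0 <= level <= 127:
        raise ValueError("level must fit in 7 bits")
    return (0x80 if voice else 0x00) | level


def unpack_level_byte(value: int) -> Tuple[bool, int]:
    """Return (V flag, level magnitude) from the extension data byte."""
    return bool(value & 0x80), value & 0x7F


def one_byte_element(ext_id: int, level: int, voice: bool = False) -> bytes:
    """One-byte header format: 4-bit ID, 4-bit len=0, then the data byte."""
    if not 1 <= ext_id <= 14:
        raise ValueError("one-byte header IDs are 1..14")
    return bytes([(ext_id << 4) | 0x0, pack_level_byte(level, voice)])


def two_byte_element(ext_id: int, level: int, voice: bool = False) -> bytes:
    """Two-byte header format: 8-bit ID, 8-bit len=1, then the data byte.

    Padding to a 32-bit boundary (the pad byte in Figure 2) is left to
    whatever code assembles the complete header extension block.
    """
    if not 1 <= ext_id <= 255:
        raise ValueError("two-byte header IDs are 1..255")
    return bytes([ext_id, 1, pack_level_byte(level, voice)])


if __name__ == "__main__":
    # A u-law square wave at +/- 8031 sits exactly at the overload point.
    level = level_from_dbov(dbov_from_amplitude(8031, 8031))
    print(one_byte_element(6, level, voice=True).hex())  # prints 6080
```

   The extension identifier 6 used in the example is arbitrary; in
   practice it is whatever value was negotiated for this extension.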
   The audio level for digital silence, for example for a muted audio
   source, MUST be represented as 127 (-127 dBov), regardless of the
   dynamic range of the encoded audio format.

   The audio level header extension only carries the level of the audio
   in the RTP payload of the packet it is associated with, with no
   long-term averaging or smoothing applied. That level is measured as
   a root mean square of all the samples in the measured range.

   To simplify implementation of the encoding procedures described here,
   the reference implementation section in
   [I-D.ietf-avtext-mixer-to-client-audio-level] provides a sample Java
   implementation of an audio level calculator that helps obtain such
   values from raw linear PCM audio samples.

   In addition, a flag bit (labeled V) optionally indicates whether the
   encoder believes the audio packet contains voice activity. If the V
   bit is in use, the value 1 indicates that the encoder believes the
   audio packet contains voice activity, and the value 0 indicates that
   the encoder believes it does not. (The voice activity detection
   algorithm is unspecified and left implementation-specific.) If the V
   bit is not in use, its value is unspecified and MUST be ignored by
   receivers. The use of the V bit is signaled using the extension
   attribute "vad", discussed in Section 4.

   When this header extension is used with RTP data sent using the RTP
   Payload for Redundant Audio Data [RFC2198], the header's data
   describes the contents of the primary encoding.

   Note: This audio level is defined in the same manner as is audio
   noise level in the RTP Payload Comfort Noise specification [RFC3389].
   In the comfort noise specification, the overall magnitude of the
   noise level in comfort noise is encoded into the first byte of the
   payload, with spectral information about the noise in subsequent
   bytes.
   This specification's audio level parameter is defined so as
   to be identical to the comfort noise payload's noise-level byte.

4.  Signaling (Setup) Information

   The URI for declaring this header extension in an extmap attribute is
   "urn:ietf:params:rtp-hdrext:ssrc-audio-level".

   It has a single extension attribute, named "vad". It takes the form
   "vad=on" or "vad=off". If the header extension element is signaled
   with "vad=on", the "V" bit described in Section 3 is in use, and MUST
   be set by senders. If the header extension element is signaled with
   "vad=off", the "V" bit is not in use, and its value MUST be ignored
   by receivers. If the "vad" extension attribute is not specified, the
   default is "vad=on".

   An example attribute line in the SDP for a conference might hence
   be:

      a=extmap:6 urn:ietf:params:rtp-hdrext:ssrc-audio-level vad=on

   The "vad" extension attribute only controls the semantics of this
   header extension attribute, and does not make any statement about
   whether the sender is using any other voice activity detection
   features such as discontinuous transmission, comfort noise, or
   silence suppression.

   Using the mechanisms of [RFC5285], an endpoint MAY signal multiple
   instances of the header extension element, with different values of
   the vad attribute, so long as these instances use different values
   for the extension identifier. However, again following the rules of
   [RFC5285], the semantics chosen for a header extension element
   (including its vad setting) for a particular extension identifier
   value MUST NOT be changed within an RTP session.

5.  Considerations on Use

   Mixers and forwarders generally ought not base audio forwarding
   decisions directly on packet-by-packet audio level information, but
   rather ought to apply some analysis of the audio levels and trends.
   This general rule applies whether audio levels are provided by
   endpoints (as defined in this document), or are calculated at a
   server, as would be done in the absence of this information. This
   section discusses several issues that mixers and forwarders may wish
   to take into account. (Note that this section provides design
   guidance only, and is not normative.)

   First of all, audio levels generally ought to be measured over longer
   intervals than that of a single audio packet. In order to avoid
   false positives for short bursts of sound (such as a cough or a
   dropped microphone), it is often useful to require that a
   participant's audio level be maintained for some period of time
   before considering it to be "real", i.e. some type of low-pass filter
   ought to be applied to the audio levels. Note, though, that such
   filtering must be balanced with the need to avoid clipping of the
   beginning of a speaker's speech.

   Additionally, different participants may have their audio input set
   differently. It may be useful to apply some sort of automatic gain
   control to the audio levels. There are a number of possible
   approaches to achieving this, e.g. by measuring peak audio levels,
   average audio levels during speech, or background audio levels
   (average audio levels during non-speech).

6.  Security Considerations

   A malicious endpoint could choose to set the values in this header
   extension falsely, so as to falsely claim that audio or voice is or
   is not present. It is not clear what could be gained by falsely
   claiming that audio is not present, but an endpoint falsely claiming
   that audio is present could perform a denial-of-service attack on an
   audio conference, so as to send silence to suppress other conference
   members' audio.
   Thus, if a device relies on audio level data from
   untrusted endpoints, it SHOULD periodically audit the level
   information transmitted, taking appropriate corrective action against
   endpoints that appear to be sending incorrect data. (However, as it
   is valid for an endpoint to choose to measure audio levels prior to
   encoding, some degree of discrepancy could be present. This would
   not indicate that an endpoint is malicious.)

   In the Secure Real-Time Transport Protocol (SRTP) [RFC3711], RTP
   header extensions are authenticated but not encrypted. When this
   header extension is used, audio levels are therefore visible on a
   packet-by-packet basis to an attacker passively observing the audio
   stream. As discussed in [I-D.perkins-avt-srtp-vbr-audio], such an
   attacker might be able to infer information about the conversation,
   possibly with phoneme-level resolution. In scenarios where this is a
   concern, additional mechanisms SHOULD be used to protect the
   confidentiality of the header extension. Such a mechanism could be
   header extension encryption
   [I-D.ietf-avtcore-srtp-encrypted-header-ext], or a lower-level
   security and authentication mechanism.

7.  IANA Considerations

   This document defines a new extension URI to the RTP Compact Header
   Extensions subregistry of the Real-Time Transport Protocol (RTP)
   Parameters registry, according to the following data:

      Extension URI:  urn:ietf:params:rtp-hdrext:ssrc-audio-level
      Description:    Audio Level
      Contact:        jonathan@vidyo.com
      Reference:      RFC XXXX

   Note to RFC Editor: please replace "RFC XXXX" with the number of this
   RFC.

8.  References

8.1.  Normative References

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119, March 1997.

   [RFC2198]  Perkins, C., Kouvelas, I., Hodson, O., Hardman, V.,
              Handley, M., Bolot, J., Vega-Garcia, A., and S.
              Fosse-Parisis, "RTP Payload for Redundant Audio Data",
              RFC 2198, September 1997.

   [RFC3550]  Schulzrinne, H., Casner, S., Frederick, R., and V.
              Jacobson, "RTP: A Transport Protocol for Real-Time
              Applications", STD 64, RFC 3550, July 2003.

   [RFC5285]  Singer, D. and H. Desineni, "A General Mechanism for RTP
              Header Extensions", RFC 5285, July 2008.

8.2.  Informative References

   [I-D.ietf-avtcore-srtp-encrypted-header-ext]
              Lennox, J., "Encryption of Header Extensions in the Secure
              Real-Time Transport Protocol (SRTP)",
              draft-ietf-avtcore-srtp-encrypted-header-ext-00 (work in
              progress), June 2011.

   [I-D.ietf-avtext-mixer-to-client-audio-level]
              Ivov, E., Marocco, E., and J. Lennox, "A Real-Time
              Transport Protocol (RTP) Header Extension for
              Mixer-to-Client Audio Level Indication",
              draft-ietf-avtext-mixer-to-client-audio-level-03 (work in
              progress), July 2011.

   [I-D.perkins-avt-srtp-vbr-audio]
              Perkins, C. and J. Valin, "Guidelines for the use of
              Variable Bit Rate Audio with Secure RTP",
              draft-perkins-avt-srtp-vbr-audio-05 (work in progress),
              December 2010.

   [ITU.G711.1988]
              International Telecommunications Union, "Pulse Code
              Modulation (PCM) of Voice Frequencies",
              ITU-T Recommendation G.711, November 1988.

   [RFC3389]  Zopf, R., "Real-time Transport Protocol (RTP) Payload for
              Comfort Noise (CN)", RFC 3389, September 2002.

   [RFC3711]  Baugher, M., McGrew, D., Naslund, M., Carrara, E., and K.
              Norrman, "The Secure Real-time Transport Protocol (SRTP)",
              RFC 3711, March 2004.

Appendix A.  Changes From Earlier Versions

   Note to the RFC-Editor: please remove this section prior to
   publication as an RFC.

A.1.  Changes From Draft -03

   o  Added vad extension attribute to negotiate use of the V bit.
   o  Addressed editorial comments made on the mailing list.

A.2.  Changes From Draft -02

   o  Changed encoding related text so that it would cover both the
      one-byte and the two-byte header formats.
   o  Clarified use of root mean square for dBov calculation.
   o  Added references to the sample level calculator in
      [I-D.ietf-avtext-mixer-to-client-audio-level].
   o  Changed affiliation for Emil Ivov.
   o  Other minor editorial changes.

A.3.  Changes From Draft -01

   o  Changed the URI for declaring this header extension from
      "urn:ietf:params:rtp-hdrext:audio-level" to
      "urn:ietf:params:rtp-hdrext:ssrc-audio-level" for consistency with
      [I-D.ietf-avtext-mixer-to-client-audio-level].
   o  Removed the "Limitations" section; it was discussing a potential
      extension that consensus indicated was out of scope of this
      document.
   o  Closed the P.56 open issue. It was agreed at IETF 80 that P.56 is
      mostly about speech levels and the levels transported by the
      extension defined here should also be able to serve as an
      indication for noise.
   o  Closed the open issue about transmitting noise floor information.
      Noise floor is (loosely) inferrable by observing the per-packet
      level information over a period of time, so the additional
      complexity seemed unnecessary.
   o  Editorial changes for consistency with
      [I-D.ietf-avtext-mixer-to-client-audio-level].
   o  Moved several descriptions of normative items that previously had
      only been described in informative sections of the text.
   o  Other editorial clarifications.

A.4.  Changes From Individual Submission Draft -01

   o  This version is primarily a document refresh.
   o  Emil Ivov and Enrico Marocco have been added as co-authors.
   o  Additional open issues listed.

A.5.  Changes From Individual Submission Draft -00

   o  The draft name has been changed to clarify that this document
      defines Client-To-Mixer Audio Levels, to more clearly distinguish
      it from [I-D.ietf-avtext-mixer-to-client-audio-level].
   o  The header extension format has been changed from a two-byte to a
      one-byte payload, eliminating the 7 reserved bits and the one
      must-be-zero bit.
   o  The sections Considerations on Use (Section 5) and Limitations
      have been added.
   o  It has been noted that senders MAY indicate -127 dBov for digital
      silence, and that level measurement MAY be done prior to encoding
      audio.
   o  A reference to [I-D.ietf-avtcore-srtp-encrypted-header-ext] has
      been added to the security considerations.
   o  The term "header extension" is now used consistently throughout
      the document (as opposed to "extension header").

Authors' Addresses

   Jonathan Lennox (editor)
   Vidyo, Inc.
   433 Hackensack Avenue
   Seventh Floor
   Hackensack, NJ 07601
   US

   Email: jonathan@vidyo.com


   Emil Ivov
   Jitsi
   Strasbourg 67000
   France

   Email: emcho@jitsi.org


   Enrico Marocco
   Telecom Italia
   Via G. Reiss Romoli, 274
   Turin 10148
   Italy

   Email: enrico.marocco@telecomitalia.it