Network Working Group                                       E. Ivov, Ed.
Internet-Draft                                                     Jitsi
Intended status: Standards Track                         E. Marocco, Ed.
Expires: March 8, 2012                                    Telecom Italia
                                                               J. Lennox
                                                             Vidyo, Inc.
                                                       September 5, 2011


       A Real-Time Transport Protocol (RTP) Header Extension for
                Mixer-to-Client Audio Level Indication
           draft-ietf-avtext-mixer-to-client-audio-level-05

Abstract

   This document describes a mechanism for RTP-level mixers in audio
   conferences to deliver information about the audio level of
   individual participants.  Such audio level indicators are transported
   in the same RTP packets as the audio data they pertain to.

Status of this Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current Internet-
   Drafts is at http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on March 8, 2012.

Copyright Notice

   Copyright (c) 2011 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.  Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1.  Introduction
   2.  Terminology
   3.  Protocol Operation
   4.  Audio Levels
   5.  Signaling Information
   6.  Security Considerations
   7.  IANA Considerations
   8.  Acknowledgments
   9.  Changes From Earlier Versions
     9.1.  Changes From Draft -04
     9.2.  Changes From Draft -03
     9.3.  Changes From Draft -02
     9.4.  Changes From Draft -01
     9.5.  Changes From Draft -00
   10. References
     10.1. Normative References
     10.2. Informative References
   Appendix A.  Reference Implementation
     A.1.  AudioLevelCalculator.java
   Authors' Addresses

1.  Introduction

   The Framework for Conferencing with the Session Initiation Protocol
   (SIP) defined in RFC 4353 [RFC4353] presents an overall architecture
   for multi-party conferencing.  Among other things, the framework
   borrows from RTP [RFC3550] and extends the concept of a mixer entity
   "responsible for combining the media streams that make up a
   conference, and generating one or more output streams that are
   delivered to recipients".  Every participant would hence receive, in
   a flat single stream, media originating from all the others.

   Using such centralized mixer-based architectures simplifies support
   for conference calls on the client side, since they hardly differ
   from one-to-one conversations.  However, the method also introduces a
   few limitations.  The flat nature of the streams that a mixer outputs
   and sends to participants makes it difficult for users to identify
   the original source of what they are hearing.

   Mechanisms that allow the mixer to send participants cues about
   current speakers (e.g., the CSRC fields in RTP [RFC3550]) only
   provide binary speaking/silent indications.  There are, however, a
   number of use cases where more detailed information is required.
   Possible examples include the presence of background chat/noise/
   music/typing, someone breathing noisily into their microphone, or
   other cases where identifying the source of the disturbance would
   make it easy to remove it (e.g., by sending a private IM to the
   concerned party asking them to mute their microphone).  A more
   advanced scenario could involve an intense discussion between
   multiple participants that the user does not personally know.  Audio
   level information would help the user recognize the speakers by
   associating them with complex (but still human-readable)
   characteristics such as loudness and speed.

   One way of presenting such information in a user-friendly manner
   would be for a conferencing client to attach audio level indicators
   to the corresponding participant-related components in the user
   interface, as shown in Figure 1.

                   ________________________
                  |                        |
                  |  00:42 | Weekly Call   |
                  |________________________|
                  |                        |
                  |                        |
                  | Alice  |======  | (S)  |
                  |                        |
                  | Bob    |=       |      |
                  |                        |
                  | Carol  |        | (M)  |
                  |                        |
                  | Dave   |===     |      |
                  |                        |
                  |________________________|

      Figure 1: Displaying detailed speaker information to the user by
                including audio level for every participant.

   Implementing a user interface like the above requires analysis of the
   media sent from other participants.  In a conventional audio
   conference this is only possible for the mixer, since all other
   conference participants generally receive a single, flat audio
   stream and therefore have no immediate way of determining individual
   audio levels.

   This document specifies an RTP header extension that allows such
   mixers to deliver audio level information to conference participants
   by including it directly in the RTP packets transporting the
   corresponding audio data.

2.  Terminology

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in RFC 2119 [RFC2119].

3.  Protocol Operation

   According to RFC 3550 [RFC3550], a mixer is expected to include in
   outgoing RTP packets a list of identifiers (CSRC IDs) indicating the
   sources that contributed to the resulting stream.  The presence of
   such CSRC IDs allows RTP clients to determine, in a binary way, the
   active speaker(s) at any given moment.  RTCP also provides a basic
   mechanism to map the CSRC IDs to user identities through the CNAME
   field.  More advanced mechanisms can exist depending on the signaling
   protocol used to establish and control a conference.  For example, in
   the case of the Session Initiation Protocol [RFC3261], the Event
   Package for Conference State [RFC4575] defines a <src-id> tag that
   binds CSRC IDs to media streams and SIP URIs.

   This document describes an RTP header extension that allows mixers to
   indicate the audio level of every contributing conference participant
   (CSRC) in addition to simply indicating their on/off status.  This
   new header extension uses the "General Mechanism for RTP Header
   Extensions" described in [RFC5285].
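
   The CSRC list that audio level values are matched against is carried
   in the RTP fixed header defined in RFC 3550.  As an illustration
   only, the following Java sketch shows how a receiving client might
   extract that list from a packet held in a byte array.  The class and
   method names (CsrcListReader, readCsrcs) are hypothetical, and the
   sketch ignores everything after the CSRC list, including any header
   extension.

```java
import java.nio.ByteBuffer;

/**
 * Hypothetical sketch: reads the CSRC list from the fixed header of an
 * RTP packet (RFC 3550, Section 5.1).
 */
final class CsrcListReader
{
    /** Size of the RTP fixed header without CSRCs, in bytes. */
    private static final int FIXED_HEADER_LENGTH = 12;

    private CsrcListReader() {}

    /**
     * Returns the CSRC identifiers in the order in which they appear in
     * the packet, as unsigned 32-bit values stored in longs.
     */
    static long[] readCsrcs(byte[] packet)
    {
        if (packet.length < FIXED_HEADER_LENGTH)
            throw new IllegalArgumentException("shorter than the RTP fixed header");

        // The first octet holds V (2 bits), P (1 bit), X (1 bit) and CC (4 bits).
        int version = (packet[0] & 0xC0) >>> 6;
        if (version != 2)
            throw new IllegalArgumentException("not an RTP version 2 packet");
        int csrcCount = packet[0] & 0x0F;

        if (packet.length < FIXED_HEADER_LENGTH + 4 * csrcCount)
            throw new IllegalArgumentException("truncated CSRC list");

        // The CSRC list directly follows the SSRC field; ByteBuffer reads
        // big-endian (network byte order) by default.
        ByteBuffer buffer =
            ByteBuffer.wrap(packet, FIXED_HEADER_LENGTH, 4 * csrcCount);
        long[] csrcs = new long[csrcCount];
        for (int i = 0; i < csrcCount; i++)
            csrcs[i] = buffer.getInt() & 0xFFFFFFFFL;
        return csrcs;
    }
}
```

   The index of each entry in the returned array is the position that
   an audio level value in the same packet refers to.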

   Each instance of this header contains a list of one-octet audio
   levels expressed in -dBov, with values from 0 to 127 representing 0
   to -127 dBov (see Figure 2 and Figure 3).  Appendix A provides a
   reference implementation indicating one way of obtaining such values
   from raw audio samples.

   Every audio level value pertains to the CSRC identifier located at
   the corresponding position in the CSRC list.  In other words, the
   first value indicates the audio level of the conference participant
   represented by the first CSRC identifier in that packet, and so
   forth.  The number and order of these values MUST therefore match
   the number and order of the CSRC IDs present in the same packet.

   When encoding audio level information, a mixer SHOULD include in a
   packet information that corresponds to the audio data being
   transported in that same packet.  It is important that these values
   follow the actual stream as closely as possible.  Therefore, a mixer
   SHOULD also calculate the values after the original contributing
   stream has undergone possible processing such as level normalization
   and noise reduction.

   Sometimes a conference involves more than one mixer.  In such cases,
   each of the mixers MAY choose to relay the CSRC list and audio level
   information they receive from peer mixers (as long as the total CSRC
   count does not exceed 15).  Given that the maximum audio level is not
   precisely defined by this specification, it is likely that in such
   situations average audio levels would be perceptibly different for
   the participants located behind the different mixers.

4.  Audio Levels

   The audio level header extension carries the level of the audio in
   the RTP payload of the packet it is associated with.
   This information is carried in an RTP header extension element as
   defined by the "General Mechanism for RTP Header Extensions"
   [RFC5285].

   The payload of the audio level header extension element can be
   encoded using either the one-byte or the two-byte header defined in
   [RFC5285].  Figure 2 and Figure 3 show sample audio level encodings
   with each of them.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |  ID   | len=2 |0|   level 1   |0|   level 2   |0|   level 3   |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

       Sample audio level encoding using the one-byte header format

                                 Figure 2

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |      ID       |     len=3     |0|   level 1   |0|   level 2   |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |0|   level 3   |    0 (pad)    |               ...             |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

       Sample audio level encoding using the two-byte header format

                                 Figure 3

   In the case of the one-byte header format, the 4-bit len field
   contains the number of data bytes (i.e., audio level values)
   following the one-byte header in this header extension element,
   minus one.  Therefore, the value zero in this field indicates that
   one byte of data follows.  In the case of the two-byte header
   format, the 8-bit len field contains the exact number of audio
   levels carried in the extension.  RFC 3550 [RFC3550] only allows RTP
   packets to carry a maximum of 15 CSRC IDs.  Given that audio levels
   directly refer to CSRC IDs, implementations MUST NOT include more
   than 15 audio level values.  The maximum value allowed in the len
   field is therefore 14 for the one-byte header format and 15 for the
   two-byte header format.
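
   To make the len field rules concrete, the following hedged Java
   sketch shows how a mixer might build a single audio level element in
   either header form.  The class and method names are hypothetical.
   The sketch assumes the levels have already been computed in -dBov (0
   to 127) and ordered like the packet's CSRC list, and it leaves the
   surrounding RFC 5285 extension block to the caller: the 16-bit
   profile value (0xBEDE for one-byte elements, 0x100 followed by four
   application bits for two-byte elements), the length in 32-bit words,
   and the zero padding to a 32-bit boundary.

```java
/**
 * Hypothetical sketch: encodes mixer-to-client audio levels (-dBov,
 * 0 to 127) into one RFC 5285 header extension element.
 */
final class AudioLevelElementEncoder
{
    /** RFC 3550 allows at most 15 CSRCs, hence at most 15 levels. */
    static final int MAX_LEVELS = 15;

    private AudioLevelElementEncoder() {}

    /**
     * One-byte form: 4-bit ID (1-14; 0 is padding and 15 is reserved)
     * and a 4-bit len field holding the number of level bytes minus one.
     */
    static byte[] encodeOneByte(int id, int[] levels)
    {
        if (id < 1 || id > 14)
            throw new IllegalArgumentException("one-byte ID must be 1-14");
        if (levels.length < 1 || levels.length > MAX_LEVELS)
            throw new IllegalArgumentException("one-byte form needs 1-15 levels");
        byte[] element = new byte[1 + levels.length];
        element[0] = (byte) ((id << 4) | (levels.length - 1));
        writeLevels(levels, element, 1);
        return element;
    }

    /**
     * Two-byte form: 8-bit ID (1-255) and an 8-bit len field holding the
     * exact number of level bytes.
     */
    static byte[] encodeTwoByte(int id, int[] levels)
    {
        if (id < 1 || id > 255)
            throw new IllegalArgumentException("two-byte ID must be 1-255");
        if (levels.length > MAX_LEVELS)
            throw new IllegalArgumentException("at most 15 levels");
        byte[] element = new byte[2 + levels.length];
        element[0] = (byte) id;
        element[1] = (byte) levels.length;
        writeLevels(levels, element, 2);
        return element;
    }

    /** Writes one byte per level; the most significant bit stays 0. */
    private static void writeLevels(int[] levels, byte[] out, int pos)
    {
        for (int level : levels)
        {
            if (level < 0 || level > 127)
                throw new IllegalArgumentException("level out of range: " + level);
            out[pos++] = (byte) level;
        }
    }
}
```

   For instance, three levels encoded with the one-byte form and ID 7
   yield a four-byte element whose first byte is 0x72, i.e., ID 7 and
   len=2, matching the layout of Figure 2.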

   Audio levels in this document are defined in the same manner as the
   audio noise level in the RTP Payload for Comfort Noise specification
   [RFC3389].  In the comfort noise specification, the overall magnitude
   of the noise level in comfort noise is encoded into the first byte of
   the payload, with spectral information about the noise in subsequent
   bytes.  This specification's audio level parameter is defined so as
   to be identical to the comfort noise payload's noise-level byte.

   The magnitude of each audio level is packed into the seven least
   significant bits of its byte in the header extension element, as
   shown in Figure 2 and Figure 3.  The least significant bit of the
   audio level magnitude is packed into the least significant bit of
   the byte.  The most significant bit of the byte is unused and always
   set to 0.

   The audio level is expressed in -dBov, with values from 0 to 127
   representing 0 to -127 dBov.  dBov is the level, in decibels,
   relative to the overload point of the system, i.e., the
   maximum-amplitude signal that can be handled by the system without
   clipping.  (Note: Representation relative to the overload point of a
   system is particularly useful for digital implementations, since one
   does not need to know the relative calibration of the analog
   circuitry.)  For example, in the case of u-law (audio/pcmu) audio
   [ITU.G.711], the 0 dBov reference would be a square wave with values
   +/- 8031.  (This translates to 6.18 dBm0, relative to u-law's dBm0
   definition in Table 6 of G.711.)

   The audio level for digital silence, for example for a muted audio
   source, MUST be represented as 127 (-127 dBov), regardless of the
   dynamic range of the encoded audio format.

   The audio level header extension only carries the level of the audio
   in the RTP payload of the packet it is associated with, with no
   long-term averaging or smoothing applied.
   That level is measured as the root mean square (RMS) of all the
   samples in the measured range.

   To simplify implementation of the encoding procedures described here,
   this specification provides a sample Java implementation (Appendix A)
   of an audio level calculator that helps obtain such values from raw
   linear PCM audio samples.

5.  Signaling Information

   The URI for declaring the audio level header extension in an SDP
   extmap attribute and mapping it to a local extension header
   identifier is "urn:ietf:params:rtp-hdrext:csrc-audio-level".  No
   additional setup information is needed for this extension (i.e., no
   extensionattributes).

   An example attribute line in the SDP for a conference might be:

      a=extmap:7 urn:ietf:params:rtp-hdrext:csrc-audio-level

   The above mapping will most often be provided per media stream (in
   the media-level section(s) of SDP, i.e., after an "m=" line) or
   globally if there is more than one stream containing audio level
   indicators in a session.

   The presence of the above attribute in the SDP description of a media
   stream indicates that RTP packets in that stream that contain the
   level extension defined in this document will carry it with an ID of
   7.

   Conferencing clients that support audio level indicators but have no
   mixing capabilities cannot provide content for this audio level
   extension and therefore always have to include the direction
   parameter in the "extmap" attribute with a value of "recvonly".
   Conference focus entities with mixing capabilities can omit the
   direction or set it to "sendrecv" in SDP offers.  Such entities need
   to set it to "sendonly" in SDP answers to offers with a "recvonly"
   parameter, and to "sendrecv" when answering other "sendrecv" offers.

   This specification only defines use of the audio level extensions in
   audio streams.
   They MUST NOT be advertised with other media types, such as video or
   text.

   Figure 4 and Figure 5 show two example offer/answer exchanges: one
   between a conferencing client and a focus, and one between two
   conference focus entities.

      v=0
      o=alice 2890844526 2890844526 IN IP6 host.example.com
      s=-
      c=IN IP6 host.example.com
      t=0 0
      m=audio 49170 RTP/AVP 0 4
      a=rtpmap:0 PCMU/8000
      a=rtpmap:4 G723/8000
      a=extmap:1/recvonly urn:ietf:params:rtp-hdrext:csrc-audio-level

      v=0
      o=conf-focus 2890844730 2890844730 IN IP6 focus.example.net
      s=-
      i=A Seminar on the session description protocol
      c=IN IP6 focus.example.net
      t=0 0
      m=audio 52544 RTP/AVP 0
      a=rtpmap:0 PCMU/8000
      a=extmap:1/sendonly urn:ietf:params:rtp-hdrext:csrc-audio-level

     A client-initiated example SDP offer/answer exchange negotiating an
           audio stream with one-way flow of audio level information.

                                 Figure 4

      v=0
      o=fr-focus 2890844730 2890844730 IN IP6 focus.fr.example.net
      s=-
      i=Un seminaire sur le protocole de description des sessions
      c=IN IP6 focus.fr.example.net
      t=0 0
      m=audio 49170 RTP/AVP 0
      a=rtpmap:0 PCMU/8000
      a=extmap:1/sendrecv urn:ietf:params:rtp-hdrext:csrc-audio-level

      v=0
      o=us-focus 2890844526 2890844526 IN IP6 focus.us.example.net
      s=-
      i=A Seminar on the session description protocol
      c=IN IP6 focus.us.example.net
      t=0 0
      m=audio 52544 RTP/AVP 0
      a=rtpmap:0 PCMU/8000
      a=extmap:1/sendrecv urn:ietf:params:rtp-hdrext:csrc-audio-level

      An example SDP offer/answer exchange between two conference focus
       entities with mixing capabilities negotiating an audio stream with
                bidirectional flow of audio level information.

                                 Figure 5

6.  Security Considerations

   1.  This document defines a means of attributing audio level to a
       particular participant in a conference.
       An attacker may try to modify the content of RTP packets in a way
       that would make audio activity from one participant appear as
       coming from another.

   2.  Furthermore, the fact that audio level values are not protected
       even in an SRTP session (SRTP encrypts only the RTP payload, not
       the header extensions) might be of concern in some cases where
       the activity of a particular participant in a conference is
       confidential.  Also, as discussed in
       [I-D.perkins-avt-srtp-vbr-audio], an attacker might be able to
       infer information about the conversation, possibly with
       phoneme-level resolution.

   3.  Both of the above are concerns that stem from the design of the
       RTP protocol itself, and they would probably also apply when
       using CSRC identifiers the way they were specified in RFC 3550
       [RFC3550].  It is therefore important that, according to the
       needs of a particular scenario, implementors and deployers
       consider use of header extension encryption
       [I-D.ietf-avtcore-srtp-encrypted-header-ext] or a lower-level
       security and authentication mechanism.

7.  IANA Considerations

   This document defines a new extension URI that, if approved, would
   need to be added to the RTP Compact Header Extensions sub-registry of
   the Real-Time Transport Protocol (RTP) Parameters registry, according
   to the following data:

      Extension URI: urn:ietf:params:rtp-hdrext:csrc-audio-level
      Description:   Mixer-to-client audio level indicators
      Contact:       emcho@jitsi.org
      Reference:     RFC XXXX

   Note to the RFC-Editor: please replace "RFC XXXX" with the number of
   this RFC.

8.  Acknowledgments

   Lyubomir Marinov contributed level measurement and rendering code.

   Keith Drage, Roni Even, Miguel A. Garcia, John Elwell, Kevin P.
   Fleming, Ingemar Johansson, Michael Ramalho, Magnus Westerlund, and
   several others provided helpful feedback over the avt and avtext
   mailing lists.

   Jitsi's participation in this specification is funded by the NLnet
   Foundation.

9.  Changes From Earlier Versions

   Note to the RFC-Editor: please remove this section prior to
   publication as an RFC.

9.1.  Changes From Draft -04

   o  Fixed problems with missing "s=" attributes and odd RTP port
      numbers in the SDP examples.

9.2.  Changes From Draft -03

   o  Addressed editorial comments made on the mailing list.

9.3.  Changes From Draft -02

   o  Removed the no-data use case that allowed sending levels in RTP
      packets.  Choosing the right RTP payload type for this use case
      would have incurred complexity without bringing any real value.
   o  Merged the "Header Format" and the "Audio level encoding" sections
      into a single "Audio Levels" section.
   o  Changed encoding-related text so that it covers both the one-byte
      and the two-byte header formats.
   o  Clarified use of root mean square for dBov calculation.
   o  Added a reference to [I-D.perkins-avt-srtp-vbr-audio] to better
      explain some "Security Considerations".
   o  Other minor editorial changes.

9.4.  Changes From Draft -01

   o  Removed code related to the AudioLevelRenderer from "Appendix A.
      Reference Implementation", as it was considered an implementation
      matter by the working group.
   o  Modified the AudioLevelCalculator in "Appendix A. Reference
      Implementation" to take overload as a parameter.
   o  Clarified non-use of audio levels in video streams.
   o  Closed the P.56 open issue.  It was agreed at IETF 80 that P.56 is
      mostly about speech levels, and the levels transported by the
      extension defined here should also be able to serve as an
      indication for noise.
   o  The Open Issues section has been removed, as all issues that were
      in there are now resolved or clarified.
   o  Editorial changes for consistency with
      [I-D.ietf-avtext-client-to-mixer-audio-level].

9.5.  Changes From Draft -00

   o  Added code for sound pressure calculation and measurement in
      "Appendix A. Reference Implementation".
   o  Changed affiliation for Emil Ivov.
   o  Removed "Appendix: Design choices".

10.  References

10.1.  Normative References

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119, March 1997.

   [RFC3550]  Schulzrinne, H., Casner, S., Frederick, R., and V.
              Jacobson, "RTP: A Transport Protocol for Real-Time
              Applications", STD 64, RFC 3550, July 2003.

   [RFC5285]  Singer, D. and H. Desineni, "A General Mechanism for RTP
              Header Extensions", RFC 5285, July 2008.

10.2.  Informative References

   [I-D.ietf-avtcore-srtp-encrypted-header-ext]
              Lennox, J., "Encryption of Header Extensions in the Secure
              Real-Time Transport Protocol (SRTP)",
              draft-ietf-avtcore-srtp-encrypted-header-ext-00 (work in
              progress), June 2011.

   [I-D.ietf-avtext-client-to-mixer-audio-level]
              Lennox, J., Ivov, E., and E. Marocco, "A Real-Time
              Transport Protocol (RTP) Header Extension for
              Client-to-Mixer Audio Level Indication",
              draft-ietf-avtext-client-to-mixer-audio-level-04 (work in
              progress), August 2011.

   [I-D.perkins-avt-srtp-vbr-audio]
              Perkins, C. and J. Valin, "Guidelines for the use of
              Variable Bit Rate Audio with Secure RTP",
              draft-perkins-avt-srtp-vbr-audio-05 (work in progress),
              December 2010.

   [ITU.G.711]
              International Telecommunication Union, "Pulse Code
              Modulation (PCM) of Voice Frequencies", ITU-T
              Recommendation G.711, November 1988.

   [ITU.P56.1993]
              International Telecommunication Union, "Objective
              Measurement of Active Speech Level", ITU-T Recommendation
              P.56, March 1993.

   [RFC3261]  Rosenberg, J., Schulzrinne, H., Camarillo, G., Johnston,
              A., Peterson, J., Sparks, R., Handley, M., and E.
              Schooler, "SIP: Session Initiation Protocol", RFC 3261,
              June 2002.

   [RFC3389]  Zopf, R., "Real-time Transport Protocol (RTP) Payload for
              Comfort Noise (CN)", RFC 3389, September 2002.

   [RFC4353]  Rosenberg, J., "A Framework for Conferencing with the
              Session Initiation Protocol (SIP)", RFC 4353,
              February 2006.

   [RFC4575]  Rosenberg, J., Schulzrinne, H., and O. Levin, "A Session
              Initiation Protocol (SIP) Event Package for Conference
              State", RFC 4575, August 2006.

Appendix A.  Reference Implementation

   This appendix contains Java code for a reference implementation of
   the level calculation method.  The code is not normative and is by
   no means the only possible implementation.  Its purpose is to help
   implementors add audio level support to mixers and clients.

   The Java code contains an AudioLevelCalculator class that calculates
   the audio level of a signal from a range of its samples.  It can be
   used in mixers to generate values suitable for the level extension
   headers.

   The implementation is provided in Java but does not rely on any
   language-specific features and can easily be ported to other
   languages.

A.1.  AudioLevelCalculator.java

   /**
    * Calculates the audio level of specific samples of a signal,
    * expressed in -dBov as carried in the audio level header extension.
    */
   public class AudioLevelCalculator
   {
       /** Level value for 0 dBov, i.e., the overload point. */
       private static final int LEVEL_AT_OVERLOAD = 0;

       /** Level value for -127 dBov, also used for digital silence. */
       private static final int LEVEL_AT_SILENCE = 127;

       /**
        * Calculates the audio level of a signal with specific samples.
        *
        * @param samples the samples of the signal to calculate the audio
        * level of. The samples are specified as an int array starting at
        * offset, extending length number of elements, and each int
        * element in the specified range represents a sample of the
        * signal. Though a sample is provided in the form of an int
        * value, the sample size in bits is determined by the caller via
        * overload.
        *
        * @param offset the offset in samples at which the samples start
        *
        * @param length the length of the signal specified in samples
        * starting at offset
        *
        * @param overload the overload (point) of the signal.
        * For example, overload can be {@link Byte#MAX_VALUE}
        * for 8-bit signed samples or {@link Short#MAX_VALUE} for
        * 16-bit signed samples.
        *
        * @return the audio level of the specified signal in -dBov, from
        * 0 (0 dBov) to 127 (-127 dBov, also used for digital silence)
        */
       public static int calculateAudioLevel(
           int[] samples, int offset, int length,
           int overload)
       {
           /*
            * Calculate the root mean square (RMS) of the signal, with
            * every sample normalized relative to the overload point.
            */
           double rms = 0;

           for (int i = offset; i < offset + length; i++)
           {
               double sample = samples[i];

               sample /= overload;
               rms += sample * sample;
           }
           rms = (length == 0) ? 0 : Math.sqrt(rms / length);

           /*
            * Digital silence MUST be represented as 127 (-127 dBov).
            */
           if (rms == 0)
               return LEVEL_AT_SILENCE;

           /*
            * Because the samples are normalized, the overload point
            * corresponds to an RMS of 1.0, so 20 * log10(rms) is the
            * level in dBov. Negating it yields the -dBov value carried
            * in the header extension.
            */
           double level = -20 * Math.log10(rms);

           /*
            * Ensure that the calculated level is within the range that
            * can be represented, i.e., 0 to 127.
            */
           if (level < LEVEL_AT_OVERLOAD)
               level = LEVEL_AT_OVERLOAD;
           else if (level > LEVEL_AT_SILENCE)
               level = LEVEL_AT_SILENCE;

           return (int) Math.round(level);
       }
   }

                        AudioLevelCalculator.java

Authors' Addresses

   Emil Ivov (editor)
   Jitsi
   Strasbourg  67000
   France

   Email: emcho@jitsi.org


   Enrico Marocco (editor)
   Telecom Italia
   Via G. Reiss Romoli, 274
   Turin  10148
   Italy

   Email: enrico.marocco@telecomitalia.it


   Jonathan Lennox
   Vidyo, Inc.
   433 Hackensack Avenue
   Seventh Floor
   Hackensack, NJ  07601
   US

   Email: jonathan@vidyo.com