[AVT] Acoustic Echo cancellation memo
Andre.Adrian@dfs.de Thu, 14 October 2004 08:45 UTC
From: Andre.Adrian@dfs.de
To: avt@ietf.org
Message-ID: <OF733D58B0.BC382DDD-ONC1256F2A.00270B69@dfs.de>
Date: Thu, 14 Oct 2004 09:25:31 +0200
Dear Members of the Audio/Video Transport group,

attached you will find a memo about "Acoustic Echo Cancellation". This
memo was created while developing a Voice-over-IP prototype for
intercom communication between air traffic controllers by the German
air traffic control agency DFS.

Mr. Colin Perkins wrote me:

>We're grateful that you considered the IETF AVT working group as a
>venue for this work. Unfortunately, we don't have sufficient expertise
>to effectively review it, and so cannot accept it as an AVT work item.
>If you have a paper on this subject, you're welcome to post a pointer
>to the AVT mailing list to encourage uptake, though.
>I'm not sure what an appropriate venue for publication might be,
>although the ITU-T has done related work in the past.
>Regards,
>Colin

As you can read in the memo, the algorithm and the implementation are
royalty free and should not be monopolized as intellectual property by
DFS or by others. The software is currently implemented in kphone - a
SIP softphone running on Linux. You can find the memo and the Kphone
source file patches on
http://home.arcor.de/andreadrian/

With best regards,
Andre Adrian
Senior engineer

email work: <Andre.Adrian@dfs.de>
email home: <adrianandre@compuserve.de>
snail-mail:
DFS
Flughafen Frankfurt
Gebaeude 501
60549 Frankfurt
Germany
Tel: (++49) 69 69766 176
FAX: (++49) 69 69766 175

##################################################################################
Attachment:

Draft                                                     Andre Adrian
Document: draft-avt-aec-01.txt              DFS Deutsche Flugsicherung
Category: Experimental                              October 11th, 2004
Expires: ?

            Voice over Internet Acoustic Echo Cancellation

Status of this Memo

This document specifies an Acoustic Echo Cancellation implementation
for hands-free Voice over Internet telephony and requests discussion
and suggestions for improvements. Distribution of this memo is
unlimited.

Copyright Notice

Copyright (C) DFS Deutsche Flugsicherung (2004). All Rights Reserved.
You are allowed to use this source code in any open source or closed
source software you want. You are allowed to use the algorithms for a
hardware solution. You are allowed to modify the source code. You are
not allowed to remove the name of the author from this memo or from
the source code files. You are not allowed to monopolize the source
code or the algorithms behind the source code as your intellectual
property. This source code is free of royalty and comes with no
warranty.

Abstract

This document specifies an acoustic echo cancellation (AEC) for voice
over IP. Because of the large latency in VoIP communication (tens to
hundreds of milliseconds), AEC is necessary. The presented
implementation is based on the well-known Normalized Least Mean
Squares (NLMS) and Geigel double talk detector (DTD) algorithms. To
improve performance, a pre-whitening filter is used. The presented
algorithm therefore belongs to the NLMS-pw family. The NLMS-pw family
is known to give good echo cancellation for moderate processing
resources. This algorithm is of complexity O(3*L), with L the number
of taps in the NLMS filter.

Table of Contents

1. INTRODUCTION
2. AEC PRINCIPLES
3. AEC algorithms
3.1. Infinite Impulse Response (IIR) Highpass Filter
3.2. Geigel Double Talk Detector
3.3. Normalized Least Mean Squares - Pre-Whitening Filter
4. References
A. The C++ Source Code
A.1 aec.h
A.2 aec.cxx
A.3 aec_test.cxx
A.4 Compile source code
A.5 Test source code

1. INTRODUCTION

A hands-free telephone or full-duplex intercom system has a feedback
or echo problem because the output from the loudspeaker feeds into the
microphone. Several methods can be used to reduce or eliminate the
problem:

1.) Reduce the overall amplification. If the system amplification is
    less than 1, feedback dies away. This solution leads to poor
    volume.

2.) Use Acoustic Echo Suppression. Echo suppression is realized with
    speech activated switches. Suppression reduces the full-duplex
    telephone to half-duplex. The switches can even "switch away" the
    beginnings of words.

3.) Use Acoustic Echo Cancellation. This is realized with an adaptive
    or learning filter. First the filter learns the acoustics from
    given microphone and speaker signals. After learning, the filter
    can calculate an estimated microphone signal from the loudspeaker
    signal. This estimated mic signal is subtracted from the real mic
    signal. The difference signal no longer contains the loudspeaker
    signal - the feedback loop is broken.

The Least Mean Squares (LMS) algorithm from Widrow and Hoff has been
known since 1960. Unfortunately the LMS is a slow learner. The
learning speed or convergence rate is controlled by a constant value.
In the LMS this value can only be optimized either for loud signals or
for weak signals. Optimizing for loud signals produces slow
convergence with weak signals. Optimizing for weak signals gives
divergence with loud signals. Divergence can be defined as "the filter
does not reduce the echo but increases it" and is very unpleasant.

The Normalized LMS (NLMS) has a constant convergence rate for loud and
weak signals, because the parameter that controls the convergence rate
is derived from the signal energy. For a white noise signal, where all
frequencies have the same energy, the NLMS performs well. But human
speech has more energy in low frequencies than in high frequencies.
Therefore, an NLMS gives good echo cancellation for low frequencies
and poor echo cancellation for high frequencies.

A pre-whitening filter in front of the echo cancellation filter
transforms human speech into something more "white noise" like - the
energy of high frequency signals becomes similar to the energy of low
frequency signals. The presented algorithm uses the simplest
pre-whitening filter possible, a first order or one pole highpass
filter with a transfer frequency equal to half of the sample frequency
(4kHz for the narrowband sample frequency of 8kHz). Because the
pre-whitening filter is fixed, the complexity of this NLMS-pw filter
is still the same as for the NLMS filter.
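
The level independence of the normalized step size described above can
be tried out with a few lines of C++. This is a minimal sketch and not
part of the memo's source code: the names Noise, nlms_step and
max_weight_error, the 4-tap echo path and the pseudo-random noise
generator are assumptions chosen for illustration only. The sketch
learns the same known echo path once from a loud and once from a weak
white noise signal, using the tuning constant 0.5 in both cases.

```cpp
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: reproducible pseudo-random noise in [-1, 1),
// produced by a simple linear congruential generator.
struct Noise {
    std::uint32_t state;
    explicit Noise(std::uint32_t seed) : state(seed) {}
    double next() {
        state = state * 1664525u + 1013904223u;
        return (state >> 8) / 8388608.0 - 1.0;  // upper 24 bits -> [-1, 1)
    }
};

// One NLMS iteration. x holds the latest loudspeaker samples (x[0] is
// the newest), d is the microphone sample, w are the filter weights.
// Returns the error e = d - x'w computed before the weight update.
double nlms_step(std::vector<double>& w, const std::vector<double>& x,
                 double d, double tune) {
    double estimate = 0.0, energy = 0.0;
    for (std::size_t i = 0; i < w.size(); ++i) {
        estimate += w[i] * x[i];
        energy += x[i] * x[i];
    }
    const double e = d - estimate;
    if (energy > 0.0) {                      // skip the update on silence
        const double mikro = tune / energy;  // step size from signal energy
        for (std::size_t i = 0; i < w.size(); ++i) {
            w[i] += mikro * e * x[i];
        }
    }
    return e;
}

// Learn an assumed 4-tap echo path from white noise with the given
// amplitude; return the largest weight error left after `samples` steps.
double max_weight_error(double amplitude, int samples) {
    const std::vector<double> echo = {0.5, -0.3, 0.2, 0.1};
    std::vector<double> w(echo.size(), 0.0), x(echo.size(), 0.0);
    Noise noise(12345u);
    for (int n = 0; n < samples; ++n) {
        for (std::size_t i = x.size() - 1; i > 0; --i) {
            x[i] = x[i - 1];                 // shift the delay line
        }
        x[0] = amplitude * noise.next();
        double d = 0.0;                      // microphone = pure echo
        for (std::size_t i = 0; i < echo.size(); ++i) {
            d += echo[i] * x[i];
        }
        nlms_step(w, x, d, 0.5);
    }
    double worst = 0.0;
    for (std::size_t i = 0; i < echo.size(); ++i) {
        const double err = std::fabs(w[i] - echo[i]);
        if (err > worst) worst = err;
    }
    return worst;
}
```

Calling max_weight_error(30000.0, 2000) and max_weight_error(30.0,
2000) exercises the same filter with a loud and a weak signal. Because
the update term is divided by the signal energy, scaling the input
leaves the weight updates essentially unchanged, which is the property
a fixed LMS step size lacks.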
One important point should be remembered: The AEC in your telephony
device helps your telephony partner to hear no echo. Therefore AEC is
an altruistic algorithm.

2. AEC PRINCIPLES

The core of the acoustic echo cancellation is described in the
introduction. Next to the NLMS-pw, three more blocks are used:

1.) A highpass filter for the microphone signal. Telephone users are
    used to a frequency range between 300Hz and 3400Hz. Narrowband
    VoIP can give 0Hz to 4000Hz. After hearing a VoIP signal with
    frequencies below 300Hz, testers complained about the bad quality.
    With a 300Hz cut-off filter the sound is limited as in a
    telephone. The highpass filter in use is a 6th order infinite
    impulse response (IIR) filter. An IIR filter was used because of
    its simplicity and low processing demand.

2.) A double talk detector. The AEC filter should only learn if the
    signal from the microphone is determined by the loudspeaker signal
    only. If the local or near-end user is talking, the filter can no
    longer learn successfully. Detection of user talking is done by
    comparing the volume levels of loudspeaker and microphone. This
    implementation uses the well-known Geigel DTD.

3.) An Acoustic Echo Suppressor (AES) or Non Linear Processor (NLP).
    If the double talk detector (DTD) detects "no talking", the
    microphone signal gets attenuated by 6dB. This is done to suppress
    echo artefacts.

AEC block diagram. Sin is the microphone signal, Rout and Rin are the
loudspeaker signal. Sout is the echo-cancelled microphone signal:

          +--+         +            +---+
Sin -->---|HP|--+------->(+)----+-->|NLP|--->-- Sout
          +--+  |        /|\    |   +---+
                |        -|     |
               \|/        |     |
              +---+     +----+  |
              |DTD|---->|NLMS|<-+
              +---+     +----+
               /|\       /|\
                |         |
                |         |
Rout -<---------+---------+-----------------<-- Rin

Figure 1.) AEC block diagram

3. AEC algorithms

This chapter gives the mathematical background to the source code.
This document will not give derivations of the algorithms or proofs.
See the references for more information.

3.1. Infinite Impulse Response (IIR) Highpass Filter

IIR lowpass filters are also known as "exponential smoothing". The
traditional form of exponential smoothing is:

    y[n+1] = (1-alpha) * y[n] + alpha * x[n+1]

where
    x[n+1] is the current measurement value,
    y[n]   is the previous smoothed or lowpass-filtered value,
    y[n+1] is the current smoothed value,
    alpha  is the smoothing constant or slowly changing variable,
           determining the transfer frequency.

After a little algebra the exponential smoothing formula looks like
this:

    y[n+1] = y[n] + alpha * (x[n+1] - y[n])

To move from lowpass to highpass we use the following assumption:

    highpass = signal - lowpass

In this formula "highpass", "signal" and "lowpass" are rather abstract
things. The implementation uses the following formulas, written as
in-place updates like in the source code:

    lowpassf[i+1] += AlphaHp*(highpassf[i] - lowpassf[i+1])
    highpassf[i+1] = highpassf[i] - lowpassf[i+1]

where
    highpassf[i]   is the "highpassed" value from the previous filter
                   stage,
    highpassf[i+1] is the "highpassed" value of this filter stage,
    lowpassf[i+1]  is the "lowpassed" value of this filter stage,
    AlphaHp        is a constant that determines the transfer
                   frequency.

Attention: The index i refers to the filter stage and should not be
confused with the index n above, which refers to time.

The above two formulas give an attenuation of 3dB below the transfer
frequency. To get steeper filters, we use 12 stages. The signal to be
"highpassed" is fed in as highpassf[0]. The result is in
highpassf[12]. Because the filter attenuates the signal above the
transfer frequency, an amplification constant of 1.45 or 3.2dB is
used. The value of AlphaHp for a 300Hz highpass filter was found
empirically. Needing only one AlphaHp constant for all stages of the
6th order filter is a nice feature of this approach.

3.2. Geigel Double Talk Detector

Talk detection can be done with a threshold for the microphone signal
only. This approach is very sensitive to the threshold level. A more
robust approach is to compare microphone level with loudspeaker level.
The threshold in this solution will be a relative one. Because we deal
with echo, it is not sufficient to compare only the current levels; we
have to consider previous levels, too.

The Geigel DTD combines these ideas in one simple formula: The last L
levels (index 0 for now and index L-1 for L-1 samples ago) of the
loudspeaker signal are compared to the current microphone signal. To
avoid problems with phase, the absolute values are used. Double talk
is declared if:

    |d| >= c * max(|x[0]|, |x[1]|, .., |x[L-1]|)

where
    |d|      is the absolute level of the current microphone signal,
    c        is a threshold value (typical value 0.5 for -6dB or 0.71
             for -3dB),
    |x[0]|   is the absolute level of the current loudspeaker signal,
    |x[L-1]| is the absolute level of the loudspeaker signal L-1
             samples ago.

See references 3, 7, 9.

3.3. Normalized Least Mean Squares - Pre-Whitening Filter

The NLMS-pw, NLMS and LMS belong to the family of gradient descent
based algorithms. The good features of gradient descent based
algorithms are simplicity and robustness.

First we look at the "echo cancelling" formula, the convolution. This
formula subtracts the microphone signal estimated from the loudspeaker
signal from the real microphone signal:

    e = d - X' * W

where
    e  is the linear error signal or echo-cancelled microphone signal,
    d  is the desired signal or the microphone signal with echo,
    X' is the transpose of the loudspeaker signals vector,
    W  is the adaptive weights vector.

With a matching vector W the echo cancellation can be perfect.
Unfortunately, learning the vector W has limitations. The loudspeaker
is not the only audio source during filter learning. Ambient sounds
and noises, system internal amplifier and converter noises and
non-linearities of loudspeaker and microphone have a negative impact
on learning. Due to the simplicity of the LMS, all elements of W are
updated with the same "mikro * e" term.
This simple approach makes the LMS robust and keeps its demand for
processing resources moderate, but this "one term fits all" approach
prevents "perfect" learning, too.

The LMS algorithm has the update formula:

    W[n+1] = W[n] + 2*mikro*e*X[n]

where
    W[n+1] is the new adaptive weights vector,
    W[n]   is the previous adaptive weights vector,
    mikro  is the step size constant or variable,
    e      is the error signal,
    X[n]   is the loudspeaker signals vector.

The constant scalar mikro becomes a variable in the NLMS. This
variable is calculated from the loudspeaker signals vector with:

                 1
    mikro = ----------
              X' * X

where
    X' is the transpose of the loudspeaker signals vector,
    X  is the loudspeaker signals vector.

Note: The vector dot product is a scalar. It is the sum of the
element-wise multiplication of both vectors.

The constant value 2 in the LMS formula changes into a stability
"tuning" constant. For stable adaptation this constant should be
between 0 and 2; this NLMS-pw uses a value of 0.5.

For the weights vector update and the calculation of mikro, the
NLMS-pw uses highpass-filtered values of e and X. The filtered values
are used because the NLMS converges best with white noise signals, and
human voice is not white noise. The fixed highpass filter approach
used in this NLMS-pw does not increase the overall complexity. With

    ef = highpass(e)
    Xf = highpass(X)

we get our NLMS-pw weights vector update formulas:

                0.5
    mikro = -----------
             Xf' * Xf

    W[n+1] = W[n] + mikro*ef*Xf[n]

where
    ef is the highpass-filtered value of e,
    Xf is the highpass-filtered value of X,
    and the other values are as above.

Both filters are 1st order (named FIR1 in the source code, although
the implementation there is recursive) with a transfer frequency of
4000Hz. For other pre-whitening algorithms see references 6, 8, 9. For
non-LMS echo cancellation algorithms see references 6 and 9.

4. References

[1]  B. Widrow, M. E. Hoff Jr., "Adaptive switching circuits",
     Western Electric Show and Convention Record, Part 4,
     pp. 96-104, Aug. 1960

[2]  B. Widrow, et al, "Stationary and Nonstationary Learning
     Characteristics of the LMS Adaptive Filter", Proc. of the IEEE,
     vol. 64, No. 8, pp. 1151-1162, Aug. 1976

[3]  D. L. Duttweiler, "A twelve-channel digital echo canceller",
     IEEE Trans. Commun., vol. 26, pp. 647-653, May 1978

[4]  B. Widrow, S. D. Stearns, Adaptive Signal Processing,
     Prentice-Hall, 1985

[5]  D. Messerschmitt, D. Hedberg, C. Cole, A. Haoui, P. Winship,
     "Digital Voice Echo Canceller with a TMS32020", Application
     report SPRA129, Texas Instruments, 1989

[6]  R. Storn, "Echo Cancellation Techniques for Multimedia
     Applications - a Survey", TR-96-046, International Computer
     Science Institute, Berkeley, Nov. 1996

[7]  J. Nikolic, "Implementing a Line Echo Canceller using the block
     update and NLMS algorithms on the TMS320C54x DSP", Application
     report SPRA188, Texas Instruments, Apr. 1997

[8]  M. G. Siqueira, "Adaptive Filtering Algorithms in Acoustic Echo
     Cancellation and Feedback Reduction", Ph.D. thesis, University
     of California, Los Angeles, 1998

[9]  T. Gaensler, S. L. Gay, M. M. Sondhi, J. Benesty, "Double-Talk
     robust fast converging algorithms for network echo
     cancellation", IEEE Trans. on Speech and Audio Processing,
     vol. 8, No. 6, Nov. 2000

[10] M. Hutson, "Acoustic Echo Cancellation using Digital Signal
     Processing", Bachelor of Engineering (Honours) thesis, The
     School of Information Technology and Electrical Engineering,
     The University of Queensland, Nov. 2003

[11] A. Adrian, "Audio Echo Cancellation", Free Software/Open Source
     Telephony Summit 2004, German Unix User Group, Geilenkirchen,
     Germany, Jan. 16-20, 2004

Appendix A.
The C++ Source Code

/***************************************************************
A.1 aec.h
***************************************************************/
#ifndef _AEC_H                  /* include only once */
#define _AEC_H

/* aec.h
 * Acoustic Echo Cancellation NLMS-pw algorithm
 * Author: Andre Adrian, DFS Deutsche Flugsicherung
 * <Andre.Adrian@dfs.de>
 *
 * Version 1.1
 */

/* dB Values */
const float M0dB = 1.0f;
const float M3dB = 0.71f;
const float M6dB = 0.50f;

/* dB values for 16bit PCM */
const float M10dB_PCM = 10362.0f;
const float M20dB_PCM = 3277.0f;
const float M25dB_PCM = 1843.0f;
const float M30dB_PCM = 1036.0f;   /* 32767 * 10^(-30/20) */
const float M35dB_PCM = 583.0f;
const float M40dB_PCM = 328.0f;
const float M45dB_PCM = 184.0f;
const float M50dB_PCM = 104.0f;
const float M55dB_PCM = 58.0f;
const float M60dB_PCM = 33.0f;

const float MAXPCM = 32767.0f;

/* Design constants (change to fine tune the algorithms) */

/* For Normalized Least Mean Squares - Pre-whitening */
#define NLMS_LEN (240*8)        /* maximum NLMS filter length in taps */
const float PreWhiteAlphaTF = (4000.0f/8000.0f);
                                /* controls Transfer Frequency */

/* for Geigel Double Talk Detector */
const float GeigelThreshold = M3dB;
const int Thold = 30*8;         /* DTD hangover in taps */
const float UpdateThreshold = M30dB_PCM;

/* for Non Linear Processor */
const float NLPAttenuation = M0dB;

/* Below this line there are no more design constants */

/* Exponential Smoothing or IIR Infinite Impulse Response Filter */
class IIR_HP {
  float lowpassf;
  float alphaTF;                /* controls Transfer Frequency */

public:
  IIR_HP() {
    lowpassf = 0.0f;
    alphaTF = 0.0f;
  }

  void init(float alphaTF_) {
    alphaTF = alphaTF_;
  }

  float highpass(float in) {
    /* Highpass = Signal - Lowpass. Lowpass = Exponential Smoothing */
    lowpassf += alphaTF*(in - lowpassf);
    return in - lowpassf;
  }
};

#define POL 6                   /* -6dB attenuation per octave per pole */

class IIR_HP6 {
  float lowpassf[2*POL+1];
  float highpassf[2*POL+1];

public:
  IIR_HP6();

  float highpass(float in) {
    const float AlphaHp6 = 0.075f;  /* controls Transfer Frequency */
    const float Gain6 = 1.45f;      /* gain to undo filter attenuation */

    highpassf[0] = in;
    int i;
    for (i = 0; i < 2*POL; ++i) {
      /* Highpass = Signal - Lowpass. Lowpass = Exponential Smoothing */
      lowpassf[i+1] += AlphaHp6*(highpassf[i] - lowpassf[i+1]);
      highpassf[i+1] = highpassf[i] - lowpassf[i+1];
    }
    return Gain6*highpassf[2*POL];
  }
};

/* Recursive single pole filter (one feed-forward delay, one feedback
 * delay), used as pre-whitening highpass */
class FIR1 {
  float a0, a1, b1;
  float last_in, last_out;

public:
  FIR1();
  void init(float preWhiteTransferAlpha);

  float highpass(float in) {
    float out = a0 * in + a1 * last_in + b1 * last_out;
    last_in = in;
    last_out = out;
    return out;
  }
};

#define NLMS_EXT (10*8)   // Extension in taps to reduce mem copies
#define DTD_LEN 16        // block size in taps to optimize DTD calculation

class AEC {
  // Time domain Filters
  IIR_HP6 hp0;            // 300Hz cut-off Highpass
  IIR_HP hp1;             // DC-level remove Highpass
  FIR1 Fx, Fe;            // pre-whitening Highpass for x, e

  // Geigel DTD (Double Talk Detector)
  float max_max_x;        // max(|x[0]|, .. |x[L-1]|)
  int hangover;
  float max_x[NLMS_LEN/DTD_LEN];  // optimize: less calculations for max()
  int dtdCnt;
  int dtdNdx;

  // NLMS-pw
  float x[NLMS_LEN+NLMS_EXT];     // tap delayed loudspeaker signal
  float xf[NLMS_LEN+NLMS_EXT];    // pre-whitening tap delayed signal
  float w[NLMS_LEN];              // tap weights
  int j;                          // optimize: less memory copies
  int lastupdate;                 // optimize: iterative dotp(x,x)
  double dotp_xf_xf;              // double to avoid loss of precision

public:
  AEC();

  /* Geigel Double-Talk Detector
   *
   * in d: microphone sample (PCM as floating point value)
   * in x: loudspeaker sample (PCM as floating point value)
   * return: 0 for no talking, 1 for talking
   */
  int dtd(float d, float x);

  /* Normalized Least Mean Squares Algorithm pre-whitening (NLMS-pw)
   * The LMS algorithm was developed by Bernard Widrow
   * book: Widrow/Stearns, Adaptive Signal Processing,
   * Prentice-Hall, 1985
   *
   * in mic: microphone sample (PCM as floating point value)
   * in spk: loudspeaker sample (PCM as floating point value)
   * in update: 0 for convolve only, 1 for convolve and update
   * return: echo cancelled microphone sample
   */
  float nlms_pw(float mic, float spk, int update);

  /* Acoustic Echo Cancellation and Suppression of one sample
   * in d: microphone signal with echo
   * in x: loudspeaker signal
   * return: echo cancelled microphone signal
   */
  int doAEC(int d, int x);
};

#endif

/***************************************************************
A.2 aec.cxx
***************************************************************/
/* aec.cxx
 * Acoustic Echo Cancellation NLMS-pw algorithm
 * Author: Andre Adrian, DFS Deutsche Flugsicherung
 * <Andre.Adrian@dfs.de>
 *
 * Version 1.1
 */
#include <stdio.h>
#include <stdlib.h>
#include <math.h>
#include <string.h>
#include "aec.h"

IIR_HP6::IIR_HP6()
{
  /* start with all filter stages at rest */
  memset(lowpassf, 0, sizeof(lowpassf));
  memset(highpassf, 0, sizeof(highpassf));
}

/* Vector Dot Product over NLMS_LEN elements */
float dotp(float a[], float b[])
{
  float sum0 = 0.0f, sum1 = 0.0f;
  int j;

  for (j = 0; j < NLMS_LEN; j += 2) {
    // optimize: partial loop unrolling
    sum0 += a[j] * b[j];
    sum1 += a[j+1] * b[j+1];
  }
  return sum0+sum1;
}

/*
 * Algorithm: Recursive single pole high-pass filter
 *
 * Reference: The Scientist and Engineer's Guide to Digital Signal
 * Processing
 */
FIR1::FIR1()
{
}

void FIR1::init(float preWhiteTransferAlpha)
{
  float x = exp(-2.0 * M_PI * preWhiteTransferAlpha);

  a0 = (1.0f + x) / 2.0f;
  a1 = -(1.0f + x) / 2.0f;
  b1 = x;
  last_in = 0.0f;
  last_out = 0.0f;
}

AEC::AEC()
{
  hp1.init(0.01f);              /* 10Hz */
  Fx.init(PreWhiteAlphaTF);
  Fe.init(PreWhiteAlphaTF);

  max_max_x = 0.0f;
  hangover = 0;
  memset(max_x, 0, sizeof(max_x));
  dtdCnt = dtdNdx = 0;

  memset(x, 0, sizeof(x));
  memset(xf, 0, sizeof(xf));
  memset(w, 0, sizeof(w));
  j = NLMS_EXT;
  lastupdate = 0;
  dotp_xf_xf = 0.0f;
}

float AEC::nlms_pw(float mic, float spk, int update)
{
  float d = mic;                // desired signal
  x[j] = spk;
  xf[j] = Fx.highpass(spk);     // pre-whitening of x

  // calculate error value
  // (mic signal - estimated mic signal from spk signal)
  float e = d - dotp(w, x + j);
  float ef = Fe.highpass(e);    // pre-whitening of e

  if (update) {
    if (lastupdate) {
      // optimize: iterative dotp(xf, xf)
      dotp_xf_xf += (xf[j]*xf[j]
                     - xf[j+NLMS_LEN-1]*xf[j+NLMS_LEN-1]);
    } else {
      dotp_xf_xf = dotp(xf+j, xf+j);
    }

    // calculate variable step size
    float mikro_ef = 0.5f * ef / dotp_xf_xf;

    // update tap weights (filter learning)
    int i;
    for (i = 0; i < NLMS_LEN; i += 2) {
      // optimize: partial loop unrolling
      w[i] += mikro_ef*xf[i+j];
      w[i+1] += mikro_ef*xf[i+j+1];
    }
  }
  lastupdate = update;

  if (--j < 0) {
    // optimize: decrease number of memory copies
    j = NLMS_EXT;
    memmove(x+j+1, x, (NLMS_LEN-1)*sizeof(float));
    memmove(xf+j+1, xf, (NLMS_LEN-1)*sizeof(float));
  }

  return e;
}

int AEC::dtd(float d, float x)
{
  // optimized implementation of max(|x[0]|, |x[1]|, .., |x[L-1]|):
  // calculate max of block (DTD_LEN values)
  x = fabsf(x);
  if (x > max_x[dtdNdx]) {
    max_x[dtdNdx] = x;
    if (x > max_max_x) {
      max_max_x = x;
    }
  }
  if (++dtdCnt >= DTD_LEN) {
    dtdCnt = 0;
    // calculate max of max
    max_max_x = 0.0f;
    for (int i = 0; i < NLMS_LEN/DTD_LEN; ++i) {
      if (max_x[i] > max_max_x) {
        max_max_x = max_x[i];
      }
    }
    // rotate Ndx
    if (++dtdNdx >= NLMS_LEN/DTD_LEN) dtdNdx = 0;
    max_x[dtdNdx] = 0.0f;
  }

  // The Geigel DTD algorithm with Hangover timer Thold
  if (fabsf(d) >= GeigelThreshold * max_max_x) {
    hangover = Thold;
  }

  if (hangover) --hangover;

  if (max_max_x < UpdateThreshold) {
    // avoid update with silence or noise
    return 1;
  } else {
    return (hangover > 0);
  }
}

int AEC::doAEC(int d, int x)
{
  float s0 = (float)d;
  float s1 = (float)x;

  // Mic Highpass Filter - telephone users are used to 300Hz cut-off
  s0 = hp0.highpass(s0);

  // Spk Highpass Filter - to remove DC
  s1 = hp1.highpass(s1);

  // Double Talk Detector
  int update = !dtd(s0, s1);

  // Acoustic Echo Cancellation
  s0 = nlms_pw(s0, s1, update);

  // Acoustic Echo Suppression
  if (update) {
    // Non Linear Processor (NLP): attenuate low volumes
    s0 *= NLPAttenuation;
  }

  // Saturation
  if (s0 > MAXPCM) {
    return (int)MAXPCM;
  } else if (s0 < -MAXPCM) {
    return (int)-MAXPCM;
  } else {
    return (int)roundf(s0);
  }
}

/***************************************************************
A.3 aec_test.cxx
***************************************************************/
/* aec_test.cxx
 * Test stub for Acoustic Echo Cancellation NLMS-pw algorithm
 * Author: Andre Adrian, DFS Deutsche Flugsicherung
 * <Andre.Adrian@dfs.de>
 *
 * Version 1.1
 */
#include <stdio.h>
#include <stdlib.h>
#include <math.h>
#include <string.h>
#include "aec.h"

#define TAPS (80*8)

typedef signed short MONO;

typedef struct {
  signed short l;
  signed short r;
} STEREO;

/* Read a raw audio file (8KHz sample frequency, 16bit PCM, stereo)
 * from stdin, echo cancel it and write it to stdout
 */
int main(int argc, char *argv[])
{
  STEREO inbuf[TAPS], outbuf[TAPS];

  fprintf(stderr, "usage: aec_test <in.raw >out.raw\n");

  AEC aec;

  int taps;
  /* fread returns the number of complete stereo frames read */
  while ((taps = (int)fread(inbuf, sizeof(STEREO), TAPS, stdin)) > 0) {
    int i;
    for (i = 0; i < taps; ++i) {
      int s0 = inbuf[i].l;      /* left channel microphone */
      int s1 = inbuf[i].r;      /* right channel speaker */

      /* and do NLMS */
      s0 = aec.doAEC(s0, s1);

      /* copy back */
      outbuf[i].l = 0;          /* left channel silence */
      outbuf[i].r = (MONO)(s0); /* right channel echo cancelled mic */
    }

    fwrite(outbuf, sizeof(STEREO), taps, stdout);
  }

  fflush(NULL);
  return 0;
}

/***************************************************************
A.4 Compile source code
***************************************************************/

On a Linux system with the GNU C++ compiler enter:

  g++ aec_test.cxx aec.cxx -o aec_test -lm

/***************************************************************
A.5 Test source code
***************************************************************/

The microphone and loudspeaker signals have to be synchronized on a
sample-to-sample basis to make acoustic echo cancellation work. An
AC97 compliant on-board soundcard in a Personal Computer can be set to
a special stereo mode: The left channel records the microphone signal
and the right channel records the loudspeaker signal.

To set up a Linux PC with ALSA sound system, microphone connected to
Mic in and loudspeaker connected to right Line out, enter:

  amixer -q set 'Master',0 50% unmute
  amixer -q set 'PCM',0 80% unmute
  amixer -q set 'Line',0 0% mute
  amixer -q set 'CD',0 0% mute
  amixer -q set 'Mic',0 0% mute
  amixer -q set 'Video',0 0% mute
  amixer -q set 'Phone',0 0% mute
  amixer -q set 'PC Speaker',0 0% mute
  amixer -q set 'Aux',0 0% mute
  amixer -q set 'Capture',0 50%,0%
  amixer -q set 'Mic Boost (+20dB)',0 1
  amixer -q cset iface=MIXER,name='Capture Source' 0,5
  amixer -q cset iface=MIXER,name='Capture Switch' 1

To test the acoustic echo cancellation we simulate a real telephone
conversation in 5 steps:

(1) record far-end speaker
(2) perform acoustic echo cancellation (this should change nothing)
(3) playback far-end speaker and at the same time record near-end
    speaker
(4) perform acoustic echo cancellation
(5) playback near-end speaker (far-end speech should be cancelled)

To record 10 seconds of speech into the file b.raw enter:

  arecord -D plug:hw:0 -c 2 -t raw -f S16_LE -r 8000 -d 10 >b.raw

To perform AEC at the far-end enter:

  ./aec_test <b.raw >b1.raw

To playback file b1.raw and simultaneously record b2.raw enter both
commands in one go:

  aplay -D plug:hw:0 -c 2 -t raw -f S16_LE -r 8000 b1.raw &
  arecord -D plug:hw:0 -c 2 -t raw -f S16_LE -r 8000 -d 10 >b2.raw

To perform AEC at the near-end enter:

  ./aec_test <b2.raw >b3.raw

To playback the echo-cancelled near-end enter:

  aplay -D plug:hw:0 -c 2 -t raw -f S16_LE -r 8000 b3.raw
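
Where no suitable soundcard is at hand, an input file in the format
expected by aec_test (raw 16 bit PCM, stereo, left channel microphone,
right channel loudspeaker) can also be synthesized. The following is a
minimal sketch under stated assumptions and not part of the memo's
source code: the function names make_echo_test_signal and write_raw,
the noise generator, and the model of the room as one delayed and
attenuated copy of the loudspeaker signal are all hypothetical choices
for illustration. A real room has a much longer and more complex echo
path.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <vector>

// Hypothetical helper: build interleaved stereo frames of 16 bit PCM.
// Right channel: pseudo-random loudspeaker noise.
// Left channel: the loudspeaker noise delayed by `delay` samples and
// scaled by `echo_gain`, standing in for the microphone signal.
std::vector<std::int16_t> make_echo_test_signal(std::size_t frames,
                                                std::size_t delay,
                                                double echo_gain) {
    std::vector<std::int16_t> spk(frames);
    std::uint32_t state = 1u;                 // fixed seed: reproducible
    for (std::size_t n = 0; n < frames; ++n) {
        state = state * 1664525u + 1013904223u;
        // upper 14 bits, centred on zero: range -8192 .. 8191
        spk[n] = static_cast<std::int16_t>(
            static_cast<int>(state >> 18) - 8192);
    }

    std::vector<std::int16_t> pcm(2 * frames);
    for (std::size_t n = 0; n < frames; ++n) {
        const double echo =
            (n >= delay) ? echo_gain * spk[n - delay] : 0.0;
        pcm[2 * n] = static_cast<std::int16_t>(echo);  // left: microphone
        pcm[2 * n + 1] = spk[n];                       // right: speaker
    }
    return pcm;
}

// Write the samples in the byte order of the machine; on a
// little-endian PC this matches the S16_LE format used above.
bool write_raw(const std::vector<std::int16_t>& pcm, std::FILE* fp) {
    return std::fwrite(pcm.data(), sizeof(std::int16_t), pcm.size(), fp)
           == pcm.size();
}
```

A small main() that calls write_raw(make_echo_test_signal(80000, 40,
0.5), stdout) would produce 10 seconds of test input at 8kHz, which
can be redirected into a file and fed to ./aec_test like b.raw above.
The values 40 samples (5ms) and 0.5 are arbitrary example parameters.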