
Signal Processing 93 (2013) 511–516


Signal Processing journal homepage: www.elsevier.com/locate/sigpro

Fast communication

A widely linear model for stereophonic acoustic echo cancellation

Cristian Stanciu (a), Jacob Benesty (b), Constantin Paleologu (a, corresponding author), Tomas Gänsler (c), Silviu Ciochină (a)

(a) University Politehnica of Bucharest, 1-3, Iuliu Maniu Blvd., 061071 Bucharest, Romania
(b) INRS-EMT, University of Quebec, Montreal, QC, Canada H5A 1K6
(c) mh Acoustics, 25-A Summit Avenue, Summit, NJ 07901, USA

Article info

Article history: Received 8 March 2012; received in revised form 4 June 2012; accepted 17 August 2012; available online 29 August 2012.

Keywords: Stereophonic acoustic echo cancellation; Widely linear (WL) model; Nonlinear distortion; Recursive least-squares (RLS)-dichotomous coordinate descent (DCD) algorithm

Abstract

The stereophonic acoustic echo, due to the coupling between two loudspeakers and two microphones, can be modelled by a two-input/two-output system with real random variables. In this paper, we recast the problem as a single-input/single-output system with complex random variables, by using the widely linear (WL) model, and propose a new distortion method that fits well in this context. In order to illustrate the behavior of this scheme, the recursive least-squares (RLS)-dichotomous coordinate descent (DCD) algorithm is used. Experimental results indicate that the RLS-DCD algorithm represents an attractive choice for this application since it has good numerical features in terms of stability and complexity. © 2012 Elsevier B.V. All rights reserved.

Corresponding author: C. Paleologu. 0165-1684/$ - see front matter © 2012 Elsevier B.V. All rights reserved. http://dx.doi.org/10.1016/j.sigpro.2012.08.017

1. Introduction

Research and development of stereophonic acoustic echo cancellation (SAEC) systems have been a subject of interest over the last two decades [1,2]. Stereo transmission, which works together with our binaural hearing system to provide telepresence, is becoming very popular in hands-free teleconferencing systems. In the usual approach, an SAEC system consists of four adaptive filters aiming at identifying four echo paths from two loudspeakers to two microphones. For each microphone in the receiving (i.e., near-end) location, SAEC amounts to the identification of a two-input unknown system, consisting of the parallel combination of two acoustic echo paths (from the two loudspeakers to the microphone).

The main challenge of SAEC is that the two channels may carry linearly related signals, which in turn may make the normal equation, to be solved by the adaptive algorithm, singular. This implies that there is no unique solution to the equation (as in the single-channel case) but an infinite number of solutions [3]. This nonuniqueness problem can be solved by using a preprocessor on the loudspeaker signals in order to reduce their coherence and thereby remove the singularity [4]. Of course, this distortion should not noticeably degrade the stereo perception or the sound quality.

In this paper, we present a different approach for SAEC by using the widely linear (WL) model [5,6]; in this framework, the classical two-input/two-output scheme with real random variables is recast as a single-input/single-output system with complex random variables. Also, we develop a new nonlinear distortion method which is more suitable for the WL model of SAEC. Besides, we propose to use the recursive least-squares (RLS)-dichotomous coordinate descent (DCD) algorithm [7,8] as an alternative to the classical choice for SAEC, i.e., the fast RLS (FRLS) algorithm [1,2]. Simulation results indicate that the RLS-DCD algorithm represents an attractive choice for SAEC, in terms of convergence rate, stability, and complexity.


C. Stanciu et al. / Signal Processing 93 (2013) 511–516

2. The WL model for SAEC

Let us consider the stereophonic setup, where we have two input (loudspeaker) signals denoted by $x_{\mathrm{L}}(n)$ and $x_{\mathrm{R}}(n)$ (i.e., "left" and "right"), and two output (microphone) signals denoted by $d_{\mathrm{L}}(n)$ and $d_{\mathrm{R}}(n)$, where $n$ is the time index. In the receiving location, the microphone signals are obtained as

$$d_{\mathrm{L}}(n) = y_{\mathrm{L}}(n) + v_{\mathrm{L}}(n), \tag{1}$$

$$d_{\mathrm{R}}(n) = y_{\mathrm{R}}(n) + v_{\mathrm{R}}(n), \tag{2}$$

where $y_{\mathrm{L}}(n)$ and $y_{\mathrm{R}}(n)$ denote the stereo echo signals, and $v_{\mathrm{L}}(n)$ and $v_{\mathrm{R}}(n)$ are the near-end signals (i.e., noise or a combination of noise and near-end speech). The echo signals can be modelled as [3,4]

$$y_{\mathrm{L}}(n) = \mathbf{h}_{t,\mathrm{LL}}^T \mathbf{x}_{\mathrm{L}}(n) + \mathbf{h}_{t,\mathrm{RL}}^T \mathbf{x}_{\mathrm{R}}(n), \tag{3}$$

$$y_{\mathrm{R}}(n) = \mathbf{h}_{t,\mathrm{LR}}^T \mathbf{x}_{\mathrm{L}}(n) + \mathbf{h}_{t,\mathrm{RR}}^T \mathbf{x}_{\mathrm{R}}(n), \tag{4}$$

where $\mathbf{h}_{t,\mathrm{LL}}$, $\mathbf{h}_{t,\mathrm{RL}}$, $\mathbf{h}_{t,\mathrm{LR}}$, $\mathbf{h}_{t,\mathrm{RR}}$ are $L$-dimensional vectors of the loudspeaker-to-microphone (true) acoustic impulse responses, the superscript $T$ denotes transposition, and

$$\mathbf{x}_{\mathrm{L}}(n) = [x_{\mathrm{L}}(n)\ \ x_{\mathrm{L}}(n-1)\ \ \cdots\ \ x_{\mathrm{L}}(n-L+1)]^T,$$

$$\mathbf{x}_{\mathrm{R}}(n) = [x_{\mathrm{R}}(n)\ \ x_{\mathrm{R}}(n-1)\ \ \cdots\ \ x_{\mathrm{R}}(n-L+1)]^T$$

comprise the $L$ most recent loudspeaker signal samples. In order to cancel the echo, we need to estimate the four acoustic impulse responses, $\mathbf{h}_{t,\mathrm{LL}}$, $\mathbf{h}_{t,\mathrm{RL}}$, $\mathbf{h}_{t,\mathrm{LR}}$, $\mathbf{h}_{t,\mathrm{RR}}$, from the microphone signals $d_{\mathrm{L}}(n)$ and $d_{\mathrm{R}}(n)$.

We propose to recast the classical two-input/two-output scheme (with real random variables) as a single-input/single-output system with complex random variables (CRVs). First, we can form the CRV

$$d(n) = d_{\mathrm{L}}(n) + j d_{\mathrm{R}}(n) = y(n) + v(n), \tag{5}$$

where $j = \sqrt{-1}$, $y(n) = y_{\mathrm{L}}(n) + j y_{\mathrm{R}}(n)$, and $v(n) = v_{\mathrm{L}}(n) + j v_{\mathrm{R}}(n)$. Next, let us define the complex random vector

$$\mathbf{x}(n) = \mathbf{x}_{\mathrm{L}}(n) + j \mathbf{x}_{\mathrm{R}}(n). \tag{6}$$

In this context, the (complex) echo signal can be obtained as

$$y(n) = \mathbf{h}_t^H \mathbf{x}(n) + \mathbf{h}_t'^{H} \mathbf{x}^*(n), \tag{7}$$

where the superscripts $H$ and $*$ denote transpose-conjugate and conjugate, respectively, and

$$\mathbf{h}_t = \mathbf{h}_{t,1} + j \mathbf{h}_{t,2}, \tag{8}$$

$$\mathbf{h}'_t = \mathbf{h}'_{t,1} + j \mathbf{h}'_{t,2}, \tag{9}$$

with $\mathbf{h}_{t,1} = (\mathbf{h}_{t,\mathrm{LL}} + \mathbf{h}_{t,\mathrm{RR}})/2$, $\mathbf{h}_{t,2} = (\mathbf{h}_{t,\mathrm{RL}} - \mathbf{h}_{t,\mathrm{LR}})/2$, $\mathbf{h}'_{t,1} = (\mathbf{h}_{t,\mathrm{LL}} - \mathbf{h}_{t,\mathrm{RR}})/2$, and $\mathbf{h}'_{t,2} = -(\mathbf{h}_{t,\mathrm{RL}} + \mathbf{h}_{t,\mathrm{LR}})/2$. Alternatively, we can express (7) as

$$y(n) = \tilde{\mathbf{h}}_t^H \tilde{\mathbf{x}}(n), \tag{10}$$

where $\tilde{\mathbf{h}}_t = [\mathbf{h}_t^T\ \ \mathbf{h}_t'^{T}]^T$ and $\tilde{\mathbf{x}}(n) = [\mathbf{x}^T(n)\ \ \mathbf{x}^H(n)]^T$. Therefore, the complex observation is

$$d(n) = \tilde{\mathbf{h}}_t^H \tilde{\mathbf{x}}(n) + v(n). \tag{11}$$

It can be noticed that we are now dealing with a complex acoustic impulse response of length $2L$, i.e., $\tilde{\mathbf{h}}_t$, whose complex input and output are, respectively, $x(n)$ and $d(n)$. From (7) or (10), we recognize the widely linear (WL) model for CRVs proposed in [5]. Thanks to the WL model, the two-input/two-output system with real random variables was converted to a single-input/single-output system with CRVs. This approach is in line with the duality principle [9].

Finally, the new goal is to estimate the system $\tilde{\mathbf{h}}_t$ in order to cancel the echo. Let $\tilde{\mathbf{h}}(n)$ be an adaptive filter (which is an estimate of $\tilde{\mathbf{h}}_t$) and let

$$e(n) = d(n) - \tilde{\mathbf{h}}^H(n-1) \tilde{\mathbf{x}}(n) \tag{12}$$

be the error signal at time $n$. In this context, Fig. 1 depicts the proposed WL model for SAEC.

As compared to the classical SAEC approach [3,4], which requires four adaptive filters of length $L$, the proposed model involves only one filter of length $2L$. On the other hand, we are now dealing with CRVs having both real and imaginary parts, so the overall complexity of the proposed model is similar to that of the classical approach. However, there are many other aspects that should be taken into account in practice, e.g., numerical effects (since SAEC generally relies on RLS-based algorithms, which encounter specific numerical problems in finite precision), implementation issues, memory usage, etc.; consequently, it could be more convenient to handle only one adaptive filter instead of four. Moreover, as we will show in the next section, there are other specific features of the CRVs which could be exploited in order to improve the overall performance of the SAEC scheme.

To end this section, let us redefine in the context of the WL model one of the most widely used performance measures in echo cancellation, the so-called normalized misalignment [1]. It quantifies directly how "well" (in terms of convergence, tracking, and accuracy to the solution) an adaptive filter converges to the impulse response of the system that needs to be identified. The normalized misalignment (in dB) in the WL context is defined as

$$\mathrm{Mis}(n) = 20 \log_{10} \frac{\| \tilde{\mathbf{h}}_t - \tilde{\mathbf{h}}(n) \|_2}{\| \tilde{\mathbf{h}}_t \|_2} \ \ (\mathrm{dB}), \tag{13}$$

where $\| \cdot \|_2$ denotes the $\ell_2$ norm.

3. A new distortion for the WL model
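Because the distortion discussed in this section acts on the complex input $x(n)$ of the WL model, it can be useful to first confirm numerically that the complex formulation (7)-(10) reproduces the real stereo echoes (3)-(4). The following NumPy sketch does this for randomly drawn impulse responses and inputs. The length `L = 8`, the random data, and all variable names are assumptions made only for this illustration; they are not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
L = 8  # illustrative filter length (assumption; the simulations use L = 512)

# Hypothetical true echo paths h_{t,LL}, h_{t,RL}, h_{t,LR}, h_{t,RR}
h_LL, h_RL, h_LR, h_RR = rng.standard_normal((4, L))

# Real input vectors x_L(n), x_R(n): the L most recent samples of each channel
x_L, x_R = rng.standard_normal((2, L))

# Classical two-input/two-output model, Eqs. (3)-(4)
y_L = h_LL @ x_L + h_RL @ x_R
y_R = h_LR @ x_L + h_RR @ x_R

# Widely linear model, Eqs. (6)-(9)
x = x_L + 1j * x_R
h_t = (h_LL + h_RR) / 2 + 1j * (h_RL - h_LR) / 2
h_t_prime = (h_LL - h_RR) / 2 - 1j * (h_RL + h_LR) / 2

# np.vdot conjugates its first argument, so vdot(h, x) = h^H x
y = np.vdot(h_t, x) + np.vdot(h_t_prime, np.conj(x))   # Eq. (7)

# Stacked form of length 2L, Eq. (10)
h_tilde = np.concatenate([h_t, h_t_prime])
x_tilde = np.concatenate([x, np.conj(x)])
y_stacked = np.vdot(h_tilde, x_tilde)

print(np.isclose(y, y_L + 1j * y_R), np.isclose(y_stacked, y))
```

The real part of `y` matches the left echo and the imaginary part matches the right echo, which is the content of (5) and (7) for the noise-free case.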

Fig. 1. The WL model for SAEC.

As discussed in Section 1, it may be necessary to distort the input signals $x_{\mathrm{L}}(n)$ and $x_{\mathrm{R}}(n)$ in order to have a unique solution to the SAEC problem. Reducing the coherence between these two signals will lead to a better estimate of the true acoustic impulse responses [3,4,10]. Obviously, this distortion should be performed without noticeably affecting the quality of the signals and the stereo effect. A simple but efficient nonlinear method uses positive and negative half-wave rectifiers on the two channels, respectively [4]. The nonlinearly transformed signals become

$$x'_{\mathrm{L}}(n) = x_{\mathrm{L}}(n) + \alpha_r \frac{x_{\mathrm{L}}(n) + |x_{\mathrm{L}}(n)|}{2}, \tag{14}$$

$$x'_{\mathrm{R}}(n) = x_{\mathrm{R}}(n) + \alpha_r \frac{x_{\mathrm{R}}(n) - |x_{\mathrm{R}}(n)|}{2}, \tag{15}$$

where $\alpha_r$ is a parameter used to control the amount of nonlinearity. Experiments show that stereo perception is not affected by this method even with $\alpha_r$ as large as 0.5. Also, the audible distortion introduced for speech is small because of the nature of the speech signal and psychoacoustic masking effects [11].

In the following, we propose a new distortion that fits well with the WL model. The complex input signal can be expressed as

$$x(n) = x_{\mathrm{L}}(n) + j x_{\mathrm{R}}(n) = e^{j\theta_r(n)} |x(n)|, \tag{16}$$

where $\theta_r(n)$ [with $\tan\theta_r(n) = x_{\mathrm{R}}(n)/x_{\mathrm{L}}(n)$] and $|x(n)| = \sqrt{x_{\mathrm{L}}^2(n) + x_{\mathrm{R}}^2(n)}$ are the phase and modulus of $x(n)$, respectively. In this formulation, we represent the stereo perception with $\theta_r(n)$ and the quality of the stereo signals with $|x(n)|$. A modification of $\theta_r(n)$ only will mostly affect the stereo effect of $x(n)$, while a modification of $|x(n)|$ will mostly affect the quality of the stereo signals. With the complex notation, (14) and (15) can be expressed as

$$x'(n) = x'_{\mathrm{L}}(n) + j x'_{\mathrm{R}}(n) = e^{j\theta'_r(n)} |x'(n)|, \tag{17}$$

where

$$\tan\theta'_r(n) = \frac{x'_{\mathrm{R}}(n)}{x'_{\mathrm{L}}(n)} = \frac{\alpha_r + 2 - \alpha_r\,\mathrm{sgn}[x_{\mathrm{R}}(n)]}{\alpha_r + 2 + \alpha_r\,\mathrm{sgn}[x_{\mathrm{L}}(n)]} \tan\theta_r(n) \tag{18}$$

and

$$|x'(n)| = \sqrt{(1 + \alpha_r + 0.5\alpha_r^2)\,|x(n)|^2 + \alpha_r (1 + 0.5\alpha_r)\,[x_{\mathrm{L}}(n)|x_{\mathrm{L}}(n)| - x_{\mathrm{R}}(n)|x_{\mathrm{R}}(n)|]}.$$

From the two previous expressions, we observe that both the phase and the modulus are modified by a nonlinear distortion. Remarkably, even with a value of $\alpha_r$ as large as 0.5, the stereo effect is not affected. This is likely due to the fact that the phase is not changed randomly, as in some other approaches, but according to the changes of the stereo signals.

The specific SAEC problem of nonuniqueness arises because the signals $x_{\mathrm{L}}(n)$ and $x_{\mathrm{R}}(n)$ are linearly related. Let us consider the worst-case scenario, where $x_{\mathrm{L}}(n)$ is equal to $x_{\mathrm{R}}(n)$, i.e.,

$$x_{\mathrm{L}}(n) = x_{\mathrm{R}}(n), \quad \forall n. \tag{19}$$

In this situation, (17) becomes

$$x'(n) = \sqrt{1 + \alpha_r + 0.5\alpha_r^2}\; e^{j\theta'_r(n)} |x(n)|, \tag{20}$$

where

$$\tan\theta'_r(n) = \frac{1}{\alpha_r + 1} \tan\theta_r(n) \quad \text{if } x_{\mathrm{L}}(n) > 0, \tag{21}$$

$$\tan\theta'_r(n) = (\alpha_r + 1) \tan\theta_r(n) \quad \text{if } x_{\mathrm{L}}(n) < 0. \tag{22}$$
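The worst-case behavior can be checked numerically by applying the half-wave rectifiers (14)-(15) to two identical channels. The sketch below is an illustration under assumed test data (white Gaussian samples, `alpha_r = 0.3`); the function name `half_wave_distortion` is hypothetical. It confirms that the modulus is scaled by the constant factor of (20), and that the ratio between the distorted and original phase tangents takes one of two constant values, $1/(\alpha_r + 1)$ or $\alpha_r + 1$, depending on the sign of $x_{\mathrm{L}}(n)$.

```python
import numpy as np

def half_wave_distortion(x_L, x_R, alpha_r):
    """Positive/negative half-wave rectifier distortion, Eqs. (14)-(15)."""
    xp_L = x_L + alpha_r * (x_L + np.abs(x_L)) / 2
    xp_R = x_R + alpha_r * (x_R - np.abs(x_R)) / 2
    return xp_L, xp_R

rng = np.random.default_rng(1)
alpha_r = 0.3
x_L = rng.standard_normal(1000)   # assumed test signal
x_R = x_L.copy()                  # worst case of Eq. (19): identical channels

xp_L, xp_R = half_wave_distortion(x_L, x_R, alpha_r)
x = x_L + 1j * x_R
xp = xp_L + 1j * xp_R

# Modulus: scaled by a constant, as in Eq. (20)
scale = np.sqrt(1 + alpha_r + 0.5 * alpha_r**2)
modulus_ok = np.allclose(np.abs(xp), scale * np.abs(x))

# Phase: ratio tan(theta'_r) / tan(theta_r), split by the sign of x_L(n)
ratio = (xp_R / xp_L) / (x_R / x_L)
pos = x_L > 0

print(modulus_ok, ratio[pos][0], ratio[~pos][0])
```

Since the two channels are scaled differently for positive and negative samples, the distorted channels are no longer identical even though the originals are.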


We see that the modulus is only scaled by a constant factor, since $\alpha_r$ is constant across time, whereas $\theta'_r(n)$ depends on $x_{\mathrm{L}}(n) = x_{\mathrm{R}}(n)$. As a result, only the phase is effectively changed. Although $x_{\mathrm{L}}(n) = x_{\mathrm{R}}(n)$, we have $x'_{\mathrm{L}}(n) \neq x'_{\mathrm{R}}(n)$, and the transformed signals are no longer linearly related. We know by experience that, even in this difficult scenario, the misalignment is improved with the nonlinear transformations. This suggests that we may not really need to modify the modulus of the complex signal $x(n)$. Therefore, we propose the following new transformations:

$$x''_{\mathrm{L}}(n) = \cos\theta'_r(n)\,|x(n)|, \tag{23}$$

$$x''_{\mathrm{R}}(n) = \sin\theta'_r(n)\,|x(n)|. \tag{24}$$
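The transformations (23) and (24) can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation: the function name `new_distortion` is hypothetical, and the use of the four-quadrant `arctan2` to obtain $\theta'_r(n)$ is an assumption (the text defines the phase only through its tangent). The test data are arbitrary.

```python
import numpy as np

def new_distortion(x_L, x_R, alpha_r):
    """Phase from the half-wave-rectified signals, modulus from the originals.

    Sketch of Eqs. (23)-(24); theta'_r(n) is taken as a four-quadrant angle
    (assumption), which keeps the signs of both channels unchanged.
    """
    xp_L = x_L + alpha_r * (x_L + np.abs(x_L)) / 2   # Eq. (14)
    xp_R = x_R + alpha_r * (x_R - np.abs(x_R)) / 2   # Eq. (15)
    theta_p = np.arctan2(xp_R, xp_L)                 # phase of x'(n)
    modulus = np.hypot(x_L, x_R)                     # |x(n)| of the original
    return np.cos(theta_p) * modulus, np.sin(theta_p) * modulus

rng = np.random.default_rng(3)
x_L = rng.standard_normal(500)    # assumed test signal
x_R = x_L.copy()                  # identical channels (worst case)

xpp_L, xpp_R = new_distortion(x_L, x_R, alpha_r=0.3)

# The modulus of the complex signal is preserved; only the phase changes
print(np.allclose(np.hypot(xpp_L, xpp_R), np.hypot(x_L, x_R)))
# The two output channels are no longer identical
print(np.allclose(xpp_L, xpp_R))
```

With `alpha_r = 0` the function returns the original signals, which is a convenient sanity check when experimenting with larger values of the parameter.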

Clearly, the phase is computed from the half-wave rectifiers [see (18)], while the modulus corresponds to that of the original signals. As a consequence, with (23) and (24) we may have the same misalignment as with (14) and (15) but with the advantage of little distortion. So we can even increase the value of $\alpha_r$ to obtain better performance, as long as the stereo effect is not much affected.

4. The RLS-DCD algorithm

Thanks to their fast convergence rate, RLS-based algorithms are very popular in SAEC. In the following, we slightly change the notation for convenience. Let us redefine the input signal vector (of length $2L$) as

$$\tilde{\mathbf{x}}(n) = [\mathbf{v}^T(n)\ \ \mathbf{v}^T(n-1)\ \ \cdots\ \ \mathbf{v}^T(n-L+1)]^T, \tag{25}$$

where $\mathbf{v}(n) = [x(n)\ \ x^*(n)]^T$. Thus, the new definitions of the true impulse response and the adaptive filter are

$$\tilde{\mathbf{h}}_t = [h_{t,0}\ \ h'_{t,0}\ \ \cdots\ \ h_{t,L-1}\ \ h'_{t,L-1}]^T,$$

$$\tilde{\mathbf{h}}(n) = [h_0(n)\ \ h'_0(n)\ \ \cdots\ \ h_{L-1}(n)\ \ h'_{L-1}(n)]^T,$$

where $h_{t,l}$, $h'_{t,l}$, $h_l(n)$, and $h'_l(n)$, with $l = 0, 1, \ldots, L-1$, are the elements of the vectors $\mathbf{h}_t$, $\mathbf{h}'_t$, $\mathbf{h}(n)$, and $\mathbf{h}'(n)$, respectively (i.e., the vectors are now interleaved instead of being concatenated). Obviously, these new definitions do not change the definition of $d(n)$ [see (11)]. Next, we define the least-squares (LS) error criterion as [2]

$$J[\tilde{\mathbf{h}}(n)] = \sum_{i=1}^{n} \lambda^{n-i} \left| d(i) - \tilde{\mathbf{h}}^H(n) \tilde{\mathbf{x}}(i) \right|^2, \tag{26}$$

where $\lambda$ ($0 \ll \lambda < 1$) is the forgetting factor, which influences the memory of the data in the different statistics estimates. The minimization of $J[\tilde{\mathbf{h}}(n)]$ with respect to $\tilde{\mathbf{h}}(n)$ leads to the normal equation [2]:

$$\mathbf{R}_{\tilde{x}}(n) \tilde{\mathbf{h}}(n) = \mathbf{p}_{\tilde{x}d}(n), \tag{27}$$

where $\mathbf{R}_{\tilde{x}}(n) = \sum_{i=1}^{n} \lambda^{n-i} \tilde{\mathbf{x}}(i) \tilde{\mathbf{x}}^H(i)$ and $\mathbf{p}_{\tilde{x}d}(n) = \sum_{i=1}^{n} \lambda^{n-i} \tilde{\mathbf{x}}(i) d^*(i)$.

The classical RLS algorithm [2] was developed in order to recursively solve (27); unfortunately, its arithmetic complexity is proportional to $(2L)^2$, which is prohibitively high for SAEC implementations. The FRLS algorithm [2] further reduces the computational load; the arithmetic complexity of this algorithm is proportional to $2L$, which is realizable in practice. However, with nonstationary signals like speech, the FRLS algorithm is not stable and needs to be restarted when instability is detected (by using a specific variable), which is not always easy. In order to overcome this issue, we propose to use the RLS-DCD algorithm [7,8] in the context of the WL model for SAEC.

Following the steps presented in [7], the normal equation (27) can be recursively solved as shown in Table 1, where Step 4 involves the DCD iterations [12]. In this table, $\mathbf{r}(n)$ is the so-called residual vector of the solution [7], $\Delta\tilde{\mathbf{h}}(n)$ is the increment of the filter weights (or the solution vector of the DCD algorithm), $\delta$ denotes the initialization constant, and $\mathbf{I}_{2L}$ is the $2L \times 2L$ identity matrix.

Table 1. RLS-DCD algorithm.
Initialization: $\tilde{\mathbf{h}}(0) = \mathbf{0}$, $\mathbf{r}(0) = \mathbf{0}$, $\mathbf{R}_{\tilde{x}}(0) = \delta \mathbf{I}_{2L}$
For $n = 1, 2, \ldots$
Step 1: $\mathbf{R}_{\tilde{x}}(n) = \lambda \mathbf{R}_{\tilde{x}}(n-1) + \tilde{\mathbf{x}}(n) \tilde{\mathbf{x}}^H(n)$
Step 2: $e(n) = d(n) - \tilde{\mathbf{h}}^H(n-1) \tilde{\mathbf{x}}(n)$
Step 3: $\mathbf{p}(n) = \lambda \mathbf{r}(n-1) + \tilde{\mathbf{x}}(n) e^*(n)$
Step 4: $\mathbf{R}_{\tilde{x}}(n) \Delta\tilde{\mathbf{h}}(n) = \mathbf{p}(n) \Rightarrow \Delta\tilde{\mathbf{h}}(n), \mathbf{r}(n)$ (to be solved with DCD iterations)
Step 5: $\tilde{\mathbf{h}}(n) = \tilde{\mathbf{h}}(n-1) + \Delta\tilde{\mathbf{h}}(n)$

The arithmetic complexity can be greatly reduced in the first step of the algorithm, taking into account that the vector $\tilde{\mathbf{x}}(n)$ has the time shift property [see (25)] and the matrix $\mathbf{R}_{\tilde{x}}(n)$ is Hermitian. Thus, only the first two columns of this matrix have to be computed, i.e.,

$$\mathbf{R}_{\tilde{x}}^{(1)}(n) = \lambda \mathbf{R}_{\tilde{x}}^{(1)}(n-1) + x^*(n) \tilde{\mathbf{x}}(n), \tag{28}$$

$$\mathbf{R}_{\tilde{x}}^{(2)}(n) = \lambda \mathbf{R}_{\tilde{x}}^{(2)}(n-1) + x(n) \tilde{\mathbf{x}}(n), \tag{29}$$

since the lower-right $(2L-2) \times (2L-2)$ block of $\mathbf{R}_{\tilde{x}}(n)$ can be obtained by copying the $(2L-2) \times (2L-2)$ upper-left block of $\mathbf{R}_{\tilde{x}}(n-1)$.

The DCD algorithm [12] is based on coordinate descent iterations with a power-of-two variable step-size. It does not need multiplications or divisions (these operations are simply replaced by bit-shifts), but only additions, so that it is well suited to hardware implementation. In our case, the auxiliary normal equation from Step 4 is solved by using the complex-valued cyclic DCD algorithm [8], where the solution vector $\Delta\tilde{\mathbf{h}}(n)$ is updated in directions of Euclidean coordinates in a cyclic manner. There are two pre-defined parameters that have to be selected within the DCD algorithm. First, we need to fix $M_b$, i.e., the number of bits used for a fixed-point representation of elements of the solution vector. The second parameter to be chosen is $N_u$, i.e., the maximum number of allowed updates or "successful" iterations [7,8]. Due to the lack of space we do not further detail the DCD algorithm. Insightful analysis and detailed implementation of this algorithm can be found in [7,8], respectively. Most importantly, the arithmetic complexity of this algorithm is proportional to $2LN_u$ (where $N_u \ll L$) and involves only additions, which is very attractive in practice.
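To make the five steps of Table 1 concrete, the following floating-point sketch implements the recursion together with a simple cyclic coordinate-descent solver in the spirit of the DCD iterations. It is an illustration under stated assumptions, not the implementation of [7,8]: the function names (`dcd_complex`, `rls_dcd`, `misalignment_db`), the amplitude range `H = 1.0`, the small test system (`L = 4`, white complex input, noise-free observation) are all assumptions, and ordinary multiplications are used where a hardware design would use bit-shifts. The full matrix is updated in Step 1 for clarity, rather than only its first two columns as in (28)-(29).

```python
import numpy as np

def dcd_complex(R, p, Nu=8, Mb=16, H=1.0):
    """Approximately solve R @ dh = p by cyclic coordinate descent with
    power-of-two steps along +/-1 and +/-j. Returns dh and r = p - R @ dh."""
    N = len(p)
    dh = np.zeros(N, dtype=complex)
    r = p.astype(complex).copy()
    alpha, updates = H, 0
    for _ in range(Mb):                      # one pass per bit of resolution
        alpha /= 2.0
        improved = True
        while improved:
            improved = False
            for k in range(N):
                thr = 0.5 * alpha * R[k, k].real
                for direction in (1.0, 1j):
                    c = r[k].real if direction == 1.0 else r[k].imag
                    if abs(c) > thr:         # update lowers the LS cost
                        step = np.sign(c) * alpha * direction
                        dh[k] += step
                        r -= step * R[:, k]
                        improved = True
                        updates += 1
                        if updates >= Nu:    # at most Nu successful updates
                            return dh, r
    return dh, r

def rls_dcd(X, d, lam, delta=1e-3, Nu=8, Mb=16, H=1.0):
    """Table 1: X[n] is the input vector x~(n), d[n] the complex observation."""
    N = X.shape[1]
    h = np.zeros(N, dtype=complex)
    r = np.zeros(N, dtype=complex)
    R = delta * np.eye(N, dtype=complex)
    for x_n, d_n in zip(X, d):
        R = lam * R + np.outer(x_n, np.conj(x_n))   # Step 1
        e = d_n - np.vdot(h, x_n)                   # Step 2: d - h^H x
        p = lam * r + x_n * np.conj(e)              # Step 3
        dh, r = dcd_complex(R, p, Nu, Mb, H)        # Step 4
        h = h + dh                                  # Step 5
    return h

def misalignment_db(h_true, h_est):
    """Normalized misalignment of Eq. (13), in dB."""
    return 20 * np.log10(np.linalg.norm(h_true - h_est) / np.linalg.norm(h_true))

# Small noise-free test system (assumed values, for illustration only)
rng = np.random.default_rng(2)
L, n_samples = 4, 1000
x = rng.standard_normal(n_samples + L) + 1j * rng.standard_normal(n_samples + L)
# Eq. (25): interleave x(n-l) and x*(n-l) for l = 0, ..., L-1
X = np.array([np.ravel([[x[n - l], np.conj(x[n - l])] for l in range(L)])
              for n in range(L, n_samples + L)])
h_true = 0.3 * (rng.standard_normal(2 * L) + 1j * rng.standard_normal(2 * L))
d = X @ np.conj(h_true)                              # d(n) = h~_t^H x~(n)

h_est = rls_dcd(X, d, lam=1 - 1 / (14 * L))
mis = misalignment_db(h_true, h_est)
print(f"misalignment: {mis:.1f} dB")
```

Each accepted update in `dcd_complex` is chosen so that it strictly decreases the quadratic cost associated with the auxiliary normal equation, which is why the residual returned in Step 4 can be carried over to Step 3 of the next time index.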


5. Simulation results

Fig. 3. MSE of the RLS algorithm and RLS-DCD algorithm using different values for $N_u$ and $M_b = 16$. The forgetting factor is $\lambda = 1 - 1/(14L)$, with $L = 512$. The source signal is speech; preprocessing with (23) and (24) and $\alpha_r = 0.3$. [Plot: MSE (dB) versus time (seconds); curves for RLS and RLS-DCD with $N_u = 4$, 8, and 16.]

Fig. 4. Misalignment of the RLS-DCD algorithm (using $N_u = 8$ and $M_b = 16$) for different types of distortion with $\alpha_r = 0.3$. The forgetting factor is $\lambda = 1 - 1/(14L)$, with $L = 512$. The source signal is speech. [Plot: misalignment (dB) versus time (seconds); curves: without distortion, positive and negative half-wave rectifiers, new distortion.]

Simulations are performed in the context of the proposed WL model for SAEC (Fig. 1). The acoustic impulse responses in the far-end location [i.e., $g_{\mathrm{L}}(n)$ and $g_{\mathrm{R}}(n)$] have 2048 coefficients, while the length of the impulse responses in the near-end location [i.e., $h_{t,\mathrm{LL}}(n)$, $h_{t,\mathrm{RL}}(n)$, $h_{t,\mathrm{LR}}(n)$, and $h_{t,\mathrm{RR}}(n)$] is $L = 512$. The length of the adaptive filter $\tilde{\mathbf{h}}(n)$ is $2L = 1024$ and the sampling rate is 8 kHz. The source signal in the far-end location is a speech sequence. All simulations are performed in the single-talk scenario, i.e., in the absence of a near-end talker. In this case, the near-end signal $v(n)$ consists only of the background noise. We can define the stereo echo-to-noise ratio (SENR) [which is equivalent to the signal-to-noise ratio (SNR)] as $\mathrm{SENR} = \sigma_y^2/\sigma_v^2$, where $\sigma_y^2 = E[|y(n)|^2]$ and $\sigma_v^2 = E[|v(n)|^2]$ are the variances of $y(n)$ and $v(n)$, respectively. In our simulations, the background noise in the near-end is an independent white Gaussian signal and its level is set such that SENR = 30 dB. Two performance measures are used, i.e., (a) the normalized misalignment (in dB) as defined in (13) and (b) the mean-square error (MSE), averaged over 256 points for the purpose of smoothing the results.

In the first experiment, the classical RLS algorithm with the WL model [2] is compared with the RLS-DCD using different values of $N_u$. The proposed distortion [see (23)–(24)] (with $\alpha_r = 0.3$) is used to preprocess the far-end microphone signals. The forgetting factor for all the algorithms is $\lambda = 1 - 1/(14L)$ and the initialization constant is $\delta = 0.001$. For the RLS-DCD algorithm we used $M_b = 16$. For this first simulation, the misalignment plots are given in Fig. 2 and the corresponding MSE curves are depicted in Fig. 3. As we can notice from Fig. 2, the convergence rate of the RLS-DCD algorithm improves as $N_u$ increases, but only up to a certain value, i.e., $N_u = 8$. In this case, the RLS-DCD outperforms the classical RLS algorithm in terms of misalignment level.
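The SENR setting and the MSE smoothing described above can be sketched as follows. This is only one plausible reading of the setup: the function names are hypothetical, the noise is modelled as complex white Gaussian with independent real and imaginary parts, and the smoothing is assumed to use non-overlapping blocks of 256 samples (the text states only that the MSE is averaged over 256 points). The stand-in echo signal is random data, not speech.

```python
import numpy as np

def scale_noise_for_senr(y, v, senr_db):
    """Scale the noise v so that E|y|^2 / E|v|^2 matches the target SENR (dB)."""
    sigma2_y = np.mean(np.abs(y) ** 2)
    sigma2_v = np.mean(np.abs(v) ** 2)
    gain = np.sqrt(sigma2_y / (sigma2_v * 10 ** (senr_db / 10)))
    return gain * v

def smoothed_mse_db(e, block=256):
    """MSE in dB, averaged over non-overlapping blocks (assumed smoothing)."""
    n_blocks = len(e) // block
    power = np.abs(e[: n_blocks * block]) ** 2
    return 10 * np.log10(power.reshape(n_blocks, block).mean(axis=1))

rng = np.random.default_rng(4)
n = 4096
y = rng.standard_normal(n) + 1j * rng.standard_normal(n)   # stand-in echo y(n)
v = rng.standard_normal(n) + 1j * rng.standard_normal(n)   # unscaled noise

v_scaled = scale_noise_for_senr(y, v, senr_db=30.0)
senr = 10 * np.log10(np.mean(np.abs(y) ** 2) / np.mean(np.abs(v_scaled) ** 2))
mse_curve = smoothed_mse_db(v_scaled)    # noise floor an ideal canceller leaves

print(round(senr, 2), mse_curve.shape)
```

When the adaptive filter has converged, the error signal reduces to the near-end noise, so the smoothed MSE curve of `v_scaled` indicates the floor that the curves of an MSE plot would approach under these assumptions.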

Fig. 2. Misalignment of the RLS algorithm and RLS-DCD algorithm using different values for $N_u$ and $M_b = 16$. The forgetting factor is $\lambda = 1 - 1/(14L)$, with $L = 512$. The source signal is speech; preprocessing with (23) and (24) and $\alpha_r = 0.3$. [Plot: misalignment (dB) versus time (seconds); curves for RLS and RLS-DCD with $N_u = 4$, 8, and 16.]

Besides, according to Fig. 3, the RLS-DCD algorithm also outperforms the classical RLS in terms of the MSE.

In the second experiment, we compare the performance of the RLS-DCD algorithm (with $N_u = 8$) using positive and negative half-wave rectifiers [see (14)–(15)] versus the new proposed distortion [see (23)–(24)]; also, the case without distortion is shown as a reference. The distortion parameter is set to $\alpha_r = 0.3$. Other parameters are the same as in the previous simulation. It can be noticed from Fig. 4 that the misalignment is reduced by the new distortion. Also, according to Fig. 5, the new distortion leads to a better performance in terms of the MSE as compared to the positive and negative half-wave rectifiers method. To justify this behavior, we depicted in Fig. 6 the coherence function between the two channels (estimated using the Welch method). We can see that the new distortion leads to a weaker coherence between the channels compared to the positive and negative half-wave rectifiers. This difference is visible especially at higher frequencies. Several informal subjective tests were conducted in order to evaluate the speech quality in the case of the new distortion. The results indicate that the audio quality and the stereo effects are well preserved.

Fig. 5. MSE of the RLS-DCD algorithm (using $N_u = 8$ and $M_b = 16$) for different types of distortion with $\alpha_r = 0.3$. The forgetting factor is $\lambda = 1 - 1/(14L)$, with $L = 512$. The source signal is speech. [Plot: MSE (dB) versus time (seconds); curves: without distortion, positive and negative half-wave rectifiers, new distortion.]

Fig. 6. Magnitude squared coherence function for different types of distortion with $\alpha_r = 0.3$. The source signal is a speech sequence. [Plot: coherence versus frequency (kHz), 0 to 4 kHz; curves: without distortion, positive and negative half-wave rectifiers, new distortion.]

6. Conclusions

In this paper, we have recast the SAEC problem as a single-input/single-output system with complex random variables by using the WL model. As a consequence, the four real-valued acoustic impulse responses are converted to one complex-valued impulse response. The main advantage of this approach is that instead of handling two (real) output signals separately, we only handle one (complex) output signal. In this framework, we proposed a new distortion that fits well with the WL model and leads to good performance for speech signals. Finally, the RLS-DCD algorithm was implemented in the context of the WL model for SAEC. Simulation results indicate that this algorithm represents a reliable choice for practical SAEC applications, since it achieves a fast convergence rate but also has good numerical features.

Acknowledgments

The work of the first author was supported under the Grant POSDRU/107/1.5/S/76903. This work was also supported under the Grant UEFISCDI PN-II-RU-TE no. 7/5.08.2010 and the Grant UEFISCDI PN-II-ID-PCE2011-3-0097.

References

[1] J. Benesty, T. Gaensler, D.R. Morgan, M.M. Sondhi, S.L. Gay, Advances in Network and Acoustic Echo Cancellation, Springer-Verlag, Berlin, Germany, 2001.
[2] J. Benesty, C. Paleologu, T. Gänsler, S. Ciochină, A Perspective on Stereophonic Acoustic Echo Cancellation, Springer-Verlag, Berlin, Germany, 2011.
[3] M.M. Sondhi, D.R. Morgan, J.L. Hall, Stereophonic acoustic echo cancellation - an overview of the fundamental problem, IEEE Signal Processing Letters 2 (8) (1995) 148–151.
[4] J. Benesty, D.R. Morgan, M.M. Sondhi, A better understanding and an improved solution to the specific problems of stereophonic acoustic echo cancellation, IEEE Transactions on Speech and Audio Processing 6 (3) (1998) 156–165.
[5] B. Picinbono, P. Chevalier, Widely linear estimation with complex data, IEEE Transactions on Signal Processing 43 (8) (1995) 2030–2033.
[6] C. Stanciu, J. Benesty, C. Paleologu, T. Gänsler, S. Ciochină, A novel perspective on stereophonic acoustic echo cancellation, in: Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing, 2012, pp. 25–28.
[7] Y.V. Zakharov, G.P. White, J. Liu, Low-complexity RLS algorithms using dichotomous coordinate descent iterations, IEEE Transactions on Signal Processing 56 (7) (2008) 3150–3161.
[8] J. Liu, Y.V. Zakharov, B. Weaver, Architecture and FPGA design of dichotomous coordinate descent algorithms, IEEE Transactions on Circuits and Systems I: Regular Papers 56 (11) (2009) 2425–2438.
[9] D.P. Mandic, S. Still, S.C. Douglas, Duality between widely linear and dual channel adaptive filtering, in: Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing, 2009, pp. 1729–1732.
[10] S. Emura, Y. Haneda, A. Kataoka, S. Makino, Stereo echo cancellation algorithm using adaptive update on the basis of enhanced input-signal vector, Signal Processing 86 (2006) 1157–1167.
[11] B.C.J. Moore, An Introduction to the Psychology of Hearing, Academic Press, London, UK, 1989.
[12] Y.V. Zakharov, T.C. Tozer, Multiplication-free iterative algorithm for LS problem, IEE Electronics Letters 40 (4) (2004) 567–569.
