
ROBUST INTERFERENCE SUPPRESSION AND BLIND SPEECH BEAMFORMING IN ROOM REVERBERANT ENVIRONMENTS

Wing-Kin Ma and P.C. Ching

Department of Electronic Engineering, The Chinese University of Hong Kong, Shatin, N.T., Hong Kong. E-mail: {wkma,pcching}@ee.cuhk.edu.hk

ABSTRACT

In a microphone array system where references of the interference are additionally available, the interference in the received signals can be removed using Widrow's interference-cancelling (IC) approach. However, in the presence of crosstalk, IC can severely cancel the desired speech signal. In this paper, we propose a crosstalk-resistant method for joint interference suppression and blind speech beamforming. The proposed method is based on the Capon blind estimation principle and is implemented using a powerful approximation tool, namely semidefinite relaxation. Simulation results show that the proposed method achieves a lower mean squared error than the IC-based method.

1. INTRODUCTION

In applications such as hands-free communication and teleconferencing, microphone array processing shows great potential for noise and interference mitigation compared with single-microphone methods. A challenging task in microphone array processing is to beamform the desired speech source blindly under adverse conditions such as room reverberation (i.e., multipath propagation) and interference from other acoustic sources. When only reverberation and uncorrelated noise are present, blind beamforming can be achieved using various methods, such as the principal eigenvector method [1] and the nonstationary second-order statistics method [2]. When interfering sources are also present, the blind beamforming problem becomes much more difficult. To separate the desired signal from the interfering signals, various blind signal separation methods have been studied (see [3] and the references therein).

In this paper, we consider a system that employs an additional microphone array, which substantially simplifies the blind beamforming problem and improves interference rejection. Fig. 1 depicts such a system operating in a room environment. A set of microphones, called the primary array, is located in the vicinity of the desired speaker. Another microphone array, called the reference array, is placed close to the interfering sources to obtain references of the interfering signals. Using Widrow's reference-assisted interference-cancelling (IC) approach [4], we can use the reference to remove the interference from the primary array. This reduces the blind beamforming problem to the simpler single-source problem, for which effective blind beamforming algorithms (such as [1, 2]) are available. Though conceptually simple, this IC-based blind beamforming method is susceptible to crosstalk, i.e., the desired signal also reaches the reference array. This commonly encountered crosstalk can cause the IC process to cancel the desired signal, leading to a loss of performance.

Figure 1: Reference-assisted microphone array system (labels: desired speaker, primary array, reference array, unwanted sources).

This paper focuses on developing a crosstalk-resistant blind beamforming method for the reference-assisted microphone array system. Unlike the IC method, our proposed method uses the Capon minimum variance beamforming method [5, 6] to provide joint interference suppression and speech enhancement. The proposed method contains a blind channel estimator, which identifies the multipath channel responses of the desired source by exploiting the "near-far" property that the desired signal is weak in the reference array compared with the primary array. We will show that this robust blind channel estimator is the solution of a nonconvex constrained optimization problem, which is not easily solved. To overcome this difficulty, we employ an accurate approximation method called semidefinite relaxation (SDR) [7, 8] to implement the robust blind channel estimator suboptimally. It is worth noting that SDR has shown good approximation accuracy in certain engineering applications [7, 10]. Simulations will show that the proposed robust beamformer achieves better mean squared error (MSE) performance than the IC-based beamforming method.

This work was partially supported by a research grant awarded by the Hong Kong Research Grant Council.

2. BACKGROUND

In this section, we present the signal model for the microphone array system described above, and then describe the conceptually straightforward IC blind beamforming method. Let $P_1$ and $P_2$ be the numbers of microphones in the primary and reference arrays, respectively, and let $\mathbf{y}_1(n) \in \mathbb{R}^{P_1}$ and $\mathbf{y}_2(n) \in \mathbb{R}^{P_2}$ be the sampled signal vectors of the primary and reference arrays, respectively. The signal $\mathbf{y}_i(n)$, $i = 1, 2$, can be modeled as

$$\mathbf{y}_i(n) = \sum_{q=0}^{Q-1} \mathbf{g}_i(q)\, s(n-q) + \sum_{k=1}^{K} \sum_{q=0}^{Q-1} \mathbf{h}_{ik}(q)\, u_k(n-q) + \mathbf{v}_i(n) \qquad (1)$$

where $s(n)$ is the desired speech signal, $\mathbf{g}_i(n)$ is the room impulse response from the desired source to the primary ($i = 1$) or reference ($i = 2$) array, $u_k(n)$ is the $k$th interfering source signal, $\mathbf{h}_{ik}(n)$ is the room impulse response from the $k$th interfering source to the primary/reference array, $\mathbf{v}_i(n)$ is additive white noise, $K$ is the number of interfering sources, and $Q$ is the length of the room impulse responses. It is assumed that $\mathbf{g}_1(n)$ has a stronger magnitude than $\mathbf{g}_2(n)$.

In the reverberant (or convolutive) signal scenario, it is common to perform blind beamforming in the short-time Fourier transform (STFT) domain (see [1–3, 5] for details). To formulate the received signal model in the STFT domain, let

$$X(\omega; l) = \sum_{n=0}^{N-1} x(n + lN)\, e^{-j\omega n}$$

define the STFT of a signal $x(n)$, where $N$ is the frame length and $l$ is the frame index, and let $\mathbf{x}(\omega; l)$ denote the element-wise STFT of a vector signal $\mathbf{x}(n)$. For a sufficiently large $N$, the element-wise STFT of $\mathbf{y}_i(n)$ is well approximated by [1, 2, 5]

$$\mathbf{y}_i(\omega; l) \approx \mathbf{g}_i(\omega)\, S(\omega; l) + \sum_{k=1}^{K} \mathbf{h}_{ik}(\omega)\, U_k(\omega; l) + \mathbf{v}_i(\omega; l) \qquad (2)$$

where $\mathbf{g}_i(\omega)$ and $\mathbf{h}_{ik}(\omega)$ are the element-wise Fourier transforms of $\mathbf{g}_i(n)$ and $\mathbf{h}_{ik}(n)$, respectively.
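To make model (2) and the crosstalk problem raised in the introduction concrete, the following is a minimal NumPy sketch at a single frequency bin $\omega$. It assumes, for illustration only, randomly drawn channel vectors, mutually uncorrelated sources with equal interferer power, spatially white noise, and exact (population) correlation matrices in place of frame-by-frame averages. The MMSE canceller is written as the standard Wiener solution $\boldsymbol{\Gamma} = \mathbf{R}_{22}^{-1}\mathbf{R}_{21}$ with $\mathbf{R}_{22} = E\{\mathbf{y}_2\mathbf{y}_2^H\}$ and $\mathbf{R}_{21} = E\{\mathbf{y}_2\mathbf{y}_1^H\}$. The helper names `crandn`, `block_correlation`, `mmse_ic_weights` and `desired_gain_after_ic` are hypothetical; the last one reports how much of the desired component $\mathbf{g}_1(\omega)S(\omega;l)$ survives the cancellation.

```python
import numpy as np

# Hypothetical single-bin sketch of model (2); all names are illustrative.


def crandn(rng, *shape):
    """Circularly symmetric complex Gaussian samples with unit variance."""
    return (rng.standard_normal(shape) + 1j * rng.standard_normal(shape)) / np.sqrt(2)


def block_correlation(g, H, sigma_s2, sigma_u2, sigma_v2):
    """Exact R_yy = sigma_s2 g g^H + sigma_u2 H H^H + sigma_v2 I at one bin.

    g: stacked desired channel [g1; g2] (length P = P1 + P2).
    H: P x K matrix whose columns are the interferer channels h_k.
    Assumes uncorrelated sources, equal interferer power, white noise.
    """
    P = g.shape[0]
    return (sigma_s2 * np.outer(g, g.conj())
            + sigma_u2 * H @ H.conj().T
            + sigma_v2 * np.eye(P))


def mmse_ic_weights(R, P1):
    """Wiener solution of min E||y1 - Gamma^H y2||^2: Gamma = R22^{-1} R21."""
    R22 = R[P1:, P1:]   # E{y2 y2^H}
    R21 = R[P1:, :P1]   # E{y2 y1^H}
    return np.linalg.solve(R22, R21)   # shape (P2, P1)


def desired_gain_after_ic(g1, g2, Gamma):
    """Fraction of desired-signal power kept in y_IC = y1 - Gamma^H y2."""
    residual = g1 - Gamma.conj().T @ g2
    return np.linalg.norm(residual) ** 2 / np.linalg.norm(g1) ** 2


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    P1, P2, K = 4, 3, 2
    g1 = crandn(rng, P1)
    H = crandn(rng, P1 + P2, K)
    for crosstalk in (0.0, 0.3):
        g2 = crosstalk * crandn(rng, P2)   # leakage of s(n) into the reference array
        R = block_correlation(np.concatenate([g1, g2]), H, 1.0, 1.0, 0.01)
        Gamma = mmse_ic_weights(R, P1)
        print(f"crosstalk={crosstalk}: kept fraction = "
              f"{desired_gain_after_ic(g1, g2, Gamma):.3f}")
```

With no crosstalk the desired component passes through unchanged (a kept fraction of exactly 1). In the scalar case $P_1 = P_2 = 1$ without interferers, the kept fraction works out to $(\sigma_v^2 / (\sigma_s^2|g_2(\omega)|^2 + \sigma_v^2))^2$, which is below 1 whenever the crosstalk gain $g_2(\omega)$ is nonzero.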

Figure 2: Interference-cancelling blind beamformer (block diagram: $\mathbf{y}_1(\omega;l)$ and $\mathbf{y}_2(\omega;l)$ enter an MMSE interference canceller with weight $\boldsymbol{\Gamma}^H_{\mathrm{MMSE}}(\omega)$, whose output $\mathbf{y}_{\mathrm{IC}}(\omega;l)$ feeds a single-source blind beamformer that produces $Z(\omega;l)$).

An STFT-domain IC blind beamforming method for the received signal in (2) is depicted in Fig. 2. It consists of two stages, namely interference cancellation and single-source blind beamforming. First, the minimum MSE (MMSE) IC method [4] is applied to $\mathbf{y}_1(\omega;l)$ with $\mathbf{y}_2(\omega;l)$ as the reference, producing the interference-cancelled output

$$\mathbf{y}_{\mathrm{IC}}(\omega;l) = \mathbf{y}_1(\omega;l) - \boldsymbol{\Gamma}^H_{\mathrm{MMSE}}(\omega)\, \mathbf{y}_2(\omega;l) \qquad (3)$$

where

$$\boldsymbol{\Gamma}_{\mathrm{MMSE}}(\omega) = \arg\min_{\boldsymbol{\Gamma}(\omega)} E\{\|\mathbf{y}_1(\omega;l) - \boldsymbol{\Gamma}^H(\omega)\,\mathbf{y}_2(\omega;l)\|^2\}$$

is the MMSE IC weight matrix (its closed-form solution can be found in [4]). Assuming that the desired signal vector $\mathbf{g}_1(\omega)S(\omega;l)$ remains intact during the IC process (which holds if $\mathbf{g}_2(\omega) = \mathbf{0}$), a single-source blind beamformer is then applied to $\mathbf{y}_{\mathrm{IC}}(\omega;l)$ to estimate $S(\omega;l)$. In this work, the single-source blind beamforming algorithm is the principal eigenvector algorithm [1]. As mentioned earlier, the major problem with this IC method is that, in the presence of crosstalk, the IC process in (3) may remove the desired signal as well.

3. ROBUST CAPON BLIND BEAMFORMING

This section presents our proposed crosstalk-resistant blind beamformer. It differs from the IC method described above in that it uses the minimum variance (MV) beamforming method to provide interference suppression and signal enhancement jointly. The MV beamforming framework is described in Section 3.1. In Section 3.2, we develop a crosstalk-resistant blind channel estimator for the MV beamformer, using the Capon blind identification principle [6] and the semidefinite relaxation approximation method [7, 8]. Like many blind estimation methods, the resulting blind beamformer is subject to a scaling ambiguity; a remedy for this problem is suggested in Section 3.3.

3.1. Minimum Variance Beamforming

Let $\mathbf{y}(\omega;l) = [\mathbf{y}_1^T(\omega;l)\ \ \mathbf{y}_2^T(\omega;l)]^T$ denote the block received signal vector. From (2), $\mathbf{y}(\omega;l)$ is expressed as

$$\mathbf{y}(\omega;l) = \mathbf{g}(\omega)\, S(\omega;l) + \sum_{k=1}^{K} \mathbf{h}_k(\omega)\, U_k(\omega;l) + \mathbf{v}(\omega;l) \qquad (4)$$

where $\mathbf{g}(\omega) = [\mathbf{g}_1^T(\omega)\ \ \mathbf{g}_2^T(\omega)]^T$, $\mathbf{h}_k(\omega) = [\mathbf{h}_{1k}^T(\omega)\ \ \mathbf{h}_{2k}^T(\omega)]^T$, and $\mathbf{v}(\omega;l) = [\mathbf{v}_1^T(\omega;l)\ \ \mathbf{v}_2^T(\omega;l)]^T$. In MV beamforming, we seek a weight vector $\mathbf{f}(\omega) \in \mathbb{C}^P$, with $P = P_1 + P_2$, such that in the beamformer output $Z(\omega;l) = \mathbf{f}^H(\omega)\mathbf{y}(\omega;l)$ the desired signal is enhanced and the interference is suppressed. Assume that, for a given $\omega$, (i) $S(\omega;l)$ and $U_k(\omega;l)$ are wide-sense stationary sequences, and (ii) $S(\omega;l)$ is uncorrelated with $U_k(\omega;l)$ and $\mathbf{v}(\omega;l)$. The MV weight vector is determined by [2]

$$\mathbf{f}_{\mathrm{MV}}(\omega) = \arg\min_{\mathbf{f}(\omega)} E\{|\mathbf{f}^H(\omega)\mathbf{y}(\omega;l)|^2\} \quad \text{s.t.} \quad \mathbf{f}^H(\omega)\mathbf{g}(\omega) = D(\omega) \qquad (5)$$

where $D(\omega)$ is the desired output frequency response, a common choice of which is $D(\omega) = e^{-j\omega\tau}$ for some delay $\tau$. As seen in (5), the MV beamformer passes the signal component coherent with $\mathbf{g}(\omega)$ (i.e., $S(\omega;l)$) with the prescribed response $D(\omega)$ and suppresses the components not coherent with $\mathbf{g}(\omega)$, thereby achieving joint signal enhancement and interference suppression. The solution to (5) is given by [2, 5]

$$\mathbf{f}_{\mathrm{MV}}(\omega) = \frac{D^*(\omega)\, \mathbf{R}_{yy}^{-1}(\omega)\mathbf{g}(\omega)}{\mathbf{g}^H(\omega)\mathbf{R}_{yy}^{-1}(\omega)\mathbf{g}(\omega)} \qquad (6)$$

where $\mathbf{R}_{yy}(\omega) = E\{\mathbf{y}(\omega;l)\mathbf{y}^H(\omega;l)\}$. Implementing the MV beamformer in (6) requires knowledge of $\mathbf{R}_{yy}(\omega)$ and $\mathbf{g}(\omega)$. The correlation matrix $\mathbf{R}_{yy}(\omega)$ can be estimated by frame-by-frame averaging. In the next subsection, a blind estimator for $\mathbf{g}(\omega)$ is developed.

3.2. Robust Capon Blind Channel Estimator

In this subsection, we propose a blind estimator for the desired channel response vector $\mathbf{g}(\omega)$, based on the Capon principle [6]. Unlike the IC method, the proposed blind channel estimator takes the presence of $\mathbf{g}_2(\omega)$ into account and is thus more robust against crosstalk. To illustrate this, let

$$P_{\mathrm{MV}}(\mathbf{g}(\omega)) = \min_{\mathbf{f}(\omega):\ \mathbf{f}^H(\omega)\mathbf{g}(\omega) = D(\omega)} E\{|\mathbf{f}^H(\omega)\mathbf{y}(\omega;l)|^2\} = \frac{|D(\omega)|^2}{\mathbf{g}^H(\omega)\mathbf{R}_{yy}^{-1}(\omega)\mathbf{g}(\omega)} \qquad (7)$$

define the output power of the MV beamformer in (5) for a fixed $\mathbf{g}(\omega)$. The idea of Capon blind estimation is to maximize the output power $P_{\mathrm{MV}}(\mathbf{g}(\omega))$ with respect to $\mathbf{g}(\omega)$, subject to certain constraints that the true $\mathbf{g}(\omega)$ is expected to satisfy. To prevent the Capon blind estimator from mistakenly estimating an unwanted channel response vector $\mathbf{h}_k(\omega)$, these constraints should not be satisfied by $\mathbf{h}_k(\omega)$. In most practical situations, the desired speaker is closer to the primary array than to the reference array. Hence, the average magnitude of $\mathbf{g}_2(\omega)$ is expected to be smaller than that of $\mathbf{g}_1(\omega)$, i.e.,

$$\frac{1}{P_2}\|\mathbf{g}_2(\omega)\|^2 \le \alpha_g \frac{1}{P_1}\|\mathbf{g}_1(\omega)\|^2 \qquad (8)$$

for some small constant $\alpha_g > 0$. The near-far condition in (8) is unlikely to be satisfied by $\mathbf{h}_k(\omega)$, because the interfering sources are close to the reference array. By enforcing (8) for a given $\alpha_g$, a crosstalk-resistant Capon blind channel estimator is formulated as follows:

$$\hat{\mathbf{g}}(\omega) = \arg\min_{\mathbf{g}(\omega)}\ \mathbf{g}^H(\omega)\mathbf{R}_{yy}^{-1}(\omega)\mathbf{g}(\omega) \qquad (9a)$$

$$\text{s.t.} \quad \|\mathbf{g}(\omega)\|^2 = 1 \qquad (9b)$$

$$\|\mathbf{g}_2(\omega)\|^2 \le \frac{P_2\alpha_g}{P_1}\|\mathbf{g}_1(\omega)\|^2 \qquad (9c)$$

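where we maximize $P_{\mathrm{MV}}(\mathbf{g}(\omega))$ in (7) by minimizing its denominator, and the equality constraint in (9b) rules out the trivial solution $\mathbf{g}(\omega) = \mathbf{0}$. The robust Capon estimation problem in (9) is nonconvex, owing to the indefinite quadratic inequality constraint in (9c) and the (positive definite) quadratic equality constraint in (9b). Since the globally optimal solution of (9) may not be easily obtained, we consider a suboptimal solution of (9) using the accurate semidefinite relaxation (SDR) approximation method [7–10]. To apply SDR to (9), we use the identity $\mathbf{g}^H(\omega)\mathbf{R}_{yy}^{-1}(\omega)\mathbf{g}(\omega) = \mathrm{Trace}(\mathbf{g}(\omega)\mathbf{g}^H(\omega)\mathbf{R}_{yy}^{-1}(\omega))$ to reformulate (9) in the following form:

$$\min_{\mathbf{G}(\omega),\ \mathbf{g}(\omega) \in \mathbb{C}^P}\ \mathrm{Trace}(\mathbf{G}(\omega)\mathbf{R}_{yy}^{-1}(\omega)) \qquad (10a)$$

$$\text{s.t.} \quad \mathbf{G}(\omega) = \mathbf{g}(\omega)\mathbf{g}^H(\omega) \qquad (10b)$$

$$\mathrm{Trace}(\mathbf{G}(\omega)) = 1 \qquad (10c)$$

$$\sum_{p=P_1+1}^{P} G_{pp}(\omega) \le \frac{P_2\alpha_g}{P_1}\sum_{p=1}^{P_1} G_{pp}(\omega) \qquad (10d)$$

where $G_{pp}(\omega)$ is the $p$th diagonal element of $\mathbf{G}(\omega)$, (10c) follows directly from (9b), and (10d) from (9c). The constraints in (10c) and (10d) are linear (and thus convex) in $\mathbf{G}(\omega)$, unlike their original counterparts in (9b) and (9c). Similarly, the objective in (10a) is linear in $\mathbf{G}(\omega)$. However, problem (10) remains nonconvex because of the rank-1 constraint in (10b). If (10b) is replaced by the convex constraint that $\mathbf{G}(\omega)$ is Hermitian positive semidefinite (PSD), i.e., $\mathbf{G}(\omega) \succeq \mathbf{0}$, we obtain the following relaxed problem:

$$\min_{\mathbf{G}(\omega)}\ \mathrm{Trace}(\mathbf{G}(\omega)\mathbf{R}_{yy}^{-1}(\omega))$$

$$\text{s.t.} \quad \mathbf{G}(\omega) \succeq \mathbf{0}, \quad \mathrm{Trace}(\mathbf{G}(\omega)) = 1, \quad \sum_{p=P_1+1}^{P} G_{pp}(\omega) \le \frac{P_2\alpha_g}{P_1}\sum_{p=1}^{P_1} G_{pp}(\omega) \qquad (11)$$

The SDR problem in (11) is a semidefinite programming (SDP) problem [9], which is convex and thus does not suffer from local minima. Another advantage of SDR is that efficient SDP algorithms based on interior-point methods [9] are readily available. Once the SDR problem in (11) is solved, its solution is used to approximate the solution to (9); here we use the dominant eigenvector approximation method. Let $\hat{\mathbf{G}}(\omega)$ denote the solution of the SDR problem in (11), let $\mathbf{q}_{\max}(\omega)$ be the dominant eigenvector of $\hat{\mathbf{G}}(\omega)$, and partition $\mathbf{q}_{\max}(\omega) = [\mathbf{q}_{\max 1}^T(\omega)\ \ \mathbf{q}_{\max 2}^T(\omega)]^T$, where $\mathbf{q}_{\max i}(\omega) \in \mathbb{C}^{P_i}$. Let $\hat{\mathbf{g}}_{\mathrm{SDR}}(\omega)$ denote the SDR approximate solution of (9). We choose $\hat{\mathbf{g}}_{\mathrm{SDR}}(\omega) = \mathbf{q}_{\max}(\omega)$ if $\mathbf{q}_{\max}(\omega)$ satisfies the inequality constraint in (9c); otherwise, we choose $\hat{\mathbf{g}}_{\mathrm{SDR}}(\omega)$ to be an amplitude-rescaled version of $\mathbf{q}_{\max}(\omega)$:

$$\hat{\mathbf{g}}_{\mathrm{SDR}}(\omega) = \frac{\breve{\mathbf{g}}(\omega)}{\|\breve{\mathbf{g}}(\omega)\|} \qquad (12)$$

$$\breve{\mathbf{g}}(\omega) = \left[\mathbf{q}_{\max 1}^T(\omega),\ \ \sqrt{\frac{P_2\alpha_g}{P_1}}\, \frac{\|\mathbf{q}_{\max 1}(\omega)\|}{\|\mathbf{q}_{\max 2}(\omega)\|}\, \mathbf{q}_{\max 2}^T(\omega)\right]^T \qquad (13)$$

such that equality in (9c) is enforced.

3.3. Remedy for Scaling Ambiguity

Like many blind estimation methods, the robust Capon channel estimator suffers from a scaling ambiguity; i.e., even without estimation error, the estimate $\hat{\mathbf{g}}(\omega)$ is a scaled version of the true $\mathbf{g}(\omega)$:

$$\hat{\mathbf{g}}(\omega) = c(\omega)\, \mathbf{g}(\omega) \qquad (14)$$

where the scaling factor $c(\omega) \in \mathbb{C}$ is unknown. To compensate for this, we consider a simple method [2] that eliminates $c(\omega)$ by allowing the output desired signal to be reverberated. This method is suitable for speech applications because speech intelligibility is resistant to reverberation effects. Suppose that, for the MV beamformer in (5), the desired output response is chosen as

$$D(\omega) = G_{11}(\omega)\, e^{-j\omega\tau} \qquad (15)$$

where $G_{11}(\omega)$ is the first element of $\mathbf{g}_1(\omega)$ (equivalently, of $\mathbf{g}(\omega)$), and $\tau$ is a prespecified delay. In the absence of interference and noise, the output of the MV beamformer in (5) is

$$Z(\omega;l) = \mathbf{f}_{\mathrm{MV}}^H(\omega)\mathbf{y}(\omega;l) = G_{11}(\omega)\, e^{-j\omega\tau} S(\omega;l) \qquad (16)$$

whose time-domain counterpart is $z(n) = g_{11}(n) \ast s(n-\tau)$, where $g_{11}(n)$ is the impulse response corresponding to $G_{11}(\omega)$ and $\ast$ denotes convolution. The output desired signal is therefore a delayed and reverberated version of the original desired signal. An interesting property of this reverberated-output MV beamformer lies in its weight vector. Substituting (15) into (6) and letting $\hat{G}_{11}(\omega)$ denote the first element of $\hat{\mathbf{g}}(\omega)$, we can show that, for any $\hat{\mathbf{g}}(\omega) = c(\omega)\mathbf{g}(\omega)$, the MV weight vector can be rewritten as

$$\mathbf{f}_{\mathrm{MV}}(\omega) = \frac{\hat{G}_{11}^*(\omega)\, e^{j\omega\tau}\, \mathbf{R}_{yy}^{-1}(\omega)\hat{\mathbf{g}}(\omega)}{\hat{\mathbf{g}}^H(\omega)\mathbf{R}_{yy}^{-1}(\omega)\hat{\mathbf{g}}(\omega)} \qquad (17)$$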

and that (17) is invariant to $c(\omega)$. Hence, (17) can be used to provide scaling-ambiguity-free MV beamforming for the proposed robust Capon method. We also point out that the IC blind beamforming method in Section 2 suffers from the same scaling ambiguity, and that the remedy in (17) can also be applied to the IC method.

4. SIMULATION RESULTS

Two examples are used to compare the performance of the robust Capon blind beamformer and the IC blind beamformer. In the first example, we use real speech signals to test the two blind beamformers. Fig. 3 shows the layout of the simulated room environment. The room impulse responses were simulated using the image method [11]. At a sampling frequency of 8 kHz, the effective lengths of the room impulse responses were found to be no less than 500 samples. The interfering source signals are speech. As in the beamformer of [2], time-domain truncation of the beamformer weights and overlap-add processing are additionally applied to prevent cyclic convolution effects. The simulation parameters are as follows: the sampling frequency is 8 kHz, the frame length is $N = 512$, the number of sampled frequency points for the STFT is 1024, the number of frames is 92, and $\tau = 256$. The average input signal-to-interference ratio (SIR) and signal-to-noise ratio (SNR) of the primary array are 0 dB and 6 dB, respectively. The outputs of the robust Capon beamformer (with $\alpha_g = 0.1$) and the IC beamformer are shown in Fig. 4. Comparing the beamformer outputs with the output desired signal, we observe that the robust Capon beamformer enhances the desired speech better than the IC beamformer.

Figure 3: Layout of the simulated room environment (x and y in metres; room height = 3 m). Primary microphones (height = 2.8 m), secondary microphones (height = 2.8 m), desired speaker (height = 1.8 m), interfering sources (height = 1 m).

Figure 4: Beamformed speech signals (amplitude versus time in seconds). (a) Output desired signal $g_{11}(n) \ast s(n-\tau)$; (b) IC beamformer; (c) robust Capon beamformer ($\alpha_g = 0.1$). Average input SIR of the primary array = 0 dB.

In the second example, the MSE performance of the two blind beamformers is evaluated. The simulation parameters are the same as in the previous example, except that the number of frames is 100. The desired signal and the interference signals are autoregressive Gaussian random signals. Fig. 5 plots the normalized MSEs of the beamformers, defined by

$$\mathrm{NMSE} = \frac{E\{|z(n) - s_d(n)|^2\}}{E\{|s_d(n)|^2\}} \qquad (18)$$

where $z(n)$ is the beamformer output and $s_d(n) = g_{11}(n) \ast s(n-\tau)$ is the output desired signal. Fig. 5 shows that the robust Capon beamformer outperforms the IC beamformer, especially at high input SNRs. Moreover, choosing $\alpha_g$ either too large or too small degrades the performance of the robust Capon beamformer, which indicates that $\alpha_g$ should be chosen judiciously.

Figure 5: Normalized MSE (dB) of the beamformers versus the average input SNR of the primary array (dB), for the interference-cancelling beamformer and the robust Capon beamformer with $\alpha_g = 0.01$, $0.05$, $0.2$ and $0.5$.

5. CONCLUSION

In this paper, a crosstalk-resistant Capon blind beamforming method has been proposed for the reference-assisted microphone array system. An important part of the proposed Capon beamformer is the solution of a nonconvex constrained optimization problem, for which we have developed an accurate approximation using the semidefinite relaxation method. Simulation results in a room environment have shown that the proposed robust Capon blind beamforming method yields better MSE performance than the conventional interference-cancelling blind beamforming method.

6. REFERENCES

[1] S. Affes and Y. Grenier, "A signal subspace tracking algorithm for microphone array processing of speech," IEEE Trans. Speech and Audio Processing, vol. 5, no. 5, pp. 425–437, 1997.

[2] S. Gannot, D. Burshtein, and E. Weinstein, "Signal enhancement using beamforming and nonstationarity with applications to speech," IEEE Trans. Signal Processing, vol. 49, no. 8, pp. 1614–1626, 2001.

[3] L. Parra and C. Spence, "Convolutive blind separation of nonstationary sources," IEEE Trans. Speech and Audio Processing, vol. 8, no. 3, pp. 320–327, 2000.

[4] S. Haykin, Adaptive Filter Theory, 4th ed., Prentice Hall, 2002.

[5] B.D. Van Veen and K.M. Buckley, "Beamforming: a versatile approach to spatial filtering," IEEE Acoust., Speech, and Signal Processing Magazine, vol. 5, no. 2, pp. 4–24, 1988.

[6] M.K. Tsatsanis and Z. Xu, "Performance analysis of minimum variance CDMA receivers," IEEE Trans. Signal Processing, vol. 46, no. 11, pp. 3014–3022, 1998.

[7] M.X. Goemans and D.P. Williamson, "Improved approximation algorithms for maximum cut and satisfiability problems using semidefinite programming," J. ACM, vol. 42, pp. 1115–1145, 1995.

[8] Y.E. Nesterov, "Semidefinite relaxation and nonconvex quadratic optimization," Optim. Methods Software, vol. 9, pp. 140–160, 1998.

[9] L. Vandenberghe and S. Boyd, "Semidefinite programming," SIAM Rev., vol. 38, pp. 49–95, 1996.

[10] W.-K. Ma, T.N. Davidson, K.M. Wong, Z.-Q. Luo, and P.C. Ching, "Quasi-maximum-likelihood multiuser detection using semi-definite relaxation with applications to synchronous CDMA," IEEE Trans. Signal Processing, vol. 50, no. 4, pp. 912–922, 2002.

[11] J.B. Allen and D.A. Berkley, "Image method for efficiently simulating small-room acoustics," JASA, vol. 65, pp. 943–950, 1979.