ON 2D LOCALIZATION OF REFLECTORS USING ROBUST BEAMFORMING TECHNIQUES

Edwin Mabande¹, Haohai Sun², Konrad Kowalczyk¹, and Walter Kellermann¹

¹ University of Erlangen-Nuremberg, Multimedia Communications and Signal Processing, Erlangen, Germany
² Norwegian Uni. of Sci. and Tech., Acoustics Research Center, Dept. of Elec. and Telecom., Trondheim, Norway

{mabande,kowalczyk,wk}@LNT.de, [email protected]

ABSTRACT

This paper presents a method for the localization of reflectors in an acoustic environment, using robust beamforming techniques and a cylindrical microphone array, for which an intuitive and highly efficient three-step procedure is proposed. First, the directions of arrival (DOAs) corresponding to the sound source and reflectors are estimated by a robust Minimum Variance Distortionless Response (MVDR) beamformer. Next, signals originating from the estimated DOAs are extracted by a robust superdirective beamformer, from which time differences of arrival (TDOAs) of major reflections are estimated by crosscorrelation analysis. Finally, by using additional information about the position of the direct sound source relative to the array, the positions of the reflective boundaries of the room can be inferred. Experiments based on real measurements carried out in a moderately reverberant room show the effectiveness of this method.

Index Terms: Source Localization, Beamforming, TDOA Estimation, Room Inference

1. INTRODUCTION

When microphone arrays are employed for sampling wavefields, signal processing of the microphone data may be used for extracting parameters characterizing an acoustic environment. Typically, this involves the measurement and processing of room impulse responses [1, 2, 3]. In this work, we propose to localize major reflectors in a room using robust beamforming techniques based solely on the recorded microphone signals. Thus, the proposed approach does not involve measuring room impulse responses and can generally be applied for any source signals which sufficiently excite all room modes.

Our method consists of the three-step procedure depicted in Fig. 1, where first the directions of arrival (DOAs), $\hat{\phi}$, corresponding to all sources and reflections are determined, then the signals originating from these DOAs are extracted, and finally, the time differences of arrival (TDOAs), $\hat{\tau}$, are estimated from crosscorrelation analysis of the extracted direct sound and its reflections, from which the distances from the circular array to the reflectors are inferred. The knowledge of the location of early reflections is of interest, e.g., for signal enhancement methods such as dereverberation [4] and matched filtering [5].

This work was partially supported by the Future and Emerging Technologies (FET) programme within the Seventh Framework Programme for Research of the European Commission, under FET-Open grant number: 226007 SCENIC and by the QUEVIRCO project, financed by the Research Council of Norway, TANDBERG (now part of CISCO), and Statoil AS.

978-1-4577-0539-7/11/$26.00 ©2011 IEEE


Fig. 1. Block diagram of the three-step procedure for localization of reflectors: DOA Estimation (output $\hat{\phi}$), Signal Extraction, TDOA Estimation (output $\hat{\tau}$).

ICASSP 2011

Localization and extraction of early reflections is challenging due to relatively low energy in comparison to the direct sound and high coherence. To obtain accurate DOA estimates, robust and high resolution source localization algorithms are needed which are capable of localizing coherent sources. Extraction of the localized source signals may be achieved by using broadband beamforming techniques which offer high directivity and arbitrary null placement capabilities. Such beamformer designs typically have very small white noise gain (WNG) and are therefore highly sensitive to noise [6]. Consequently these beamformers are highly sensitive to small errors in the array characteristics and they amplify spatially white noise such as microphone self-noise. Direct control of the robustness of these beamformers is required for their application in practice.

In this paper, convex-optimized beamformers are used for both localization and extraction. The advantage of using beamformers based on convex optimization is that the robustness of the beamformers can be controlled easily, e.g., [7]. Having performed the localization and extraction, statistical analysis of crosscorrelations between the extracted signals can then be performed to obtain the TDOA estimates between the direct sound and its respective reflections. Using this information in combination with the distance between the direct sound source and the array, which is assumed to be known, the locations of the reflective boundaries in the room can be found.

2. DOA ESTIMATION

Robust and high resolution acoustic source localization of coherent and non-coherent sources is very important for ensuring accurate room inference. In particular, the former is crucial in localizing reflecting surfaces since the reflected signal is strongly coherent with the original source signal. For estimation of DOAs corresponding to the original source and reflected signals, the room is scanned using the Minimum Variance Distortionless Response (MVDR) beamformer [8] and the output power for each look-direction is plotted to form an acoustic map of the environment. The locations of the peaks of the acoustic map determine the estimated DOAs.

The idea behind the MVDR is to minimize the output variance or power subject to a distortionless constraint on the response of the beamformer in the direction of interest, which may result in a noise-sensitive beamformer [8]. In practice, it is desirable to control the robustness of such a design. One method to increase robustness is diagonal loading with a frequency-dependent loading factor obtained via iterative design schemes [8]. The Robust MVDR (RMVDR) beamformer may also be achieved by incorporating a WNG constraint into the conventional MVDR, as proposed in [6], resulting in the following problem formulation for the k-th frequency bin

$$
\min_{\mathbf{w}_f(\omega_k)} \; \mathbf{w}_f^H(\omega_k)\,\mathbf{R}(\omega_k)\,\mathbf{w}_f(\omega_k)
\quad \text{subject to} \quad
\frac{|\mathbf{w}_f^H(\omega_k)\,\mathbf{d}(\omega_k,\phi_l)|^2}{\mathbf{w}_f^H(\omega_k)\,\mathbf{w}_f(\omega_k)} \ge \gamma,
\qquad
\mathbf{w}_f^H(\omega_k)\,\mathbf{d}(\omega_k,\phi_l) = 1,
\tag{1}
$$

where $\mathbf{w}_f(\omega_k)$ denotes the frequency responses of the beamforming filters, $\mathbf{d}(\omega_k,\phi_l)$ denotes the so-called steering vector, $\phi_l$ is the desired look direction, $\mathbf{R}(\omega_k)$ is the covariance matrix of the microphone signals, $(\cdot)^H$ denotes the Hermitian transpose, and $\gamma$ is the user-defined lower bound for the WNG which enables direct control of the robustness of the beamforming design. It should be noted that unlike in [6], where the WNG is required to be equal to a specified value, here we restrict it to lie above a lower limit $\gamma$. The resulting constraint set is larger (and therefore less restrictive) as it covers the volume of a hypersphere [7] and not only its surface, which may lead to better performance due to the resulting larger set of admissible solutions.

An inherent property of the MVDR beamformer is that it suffers from severe performance degradation in environments where interference sources are highly correlated with the desired source, which is prevalent in our scenario. To achieve its goal of minimum output power, the beamformer tends to cancel the portion of the desired source that is correlated with the interference signals. For such signals the reduction of the correlation between the desired signals and the interference signals can be achieved by using focusing matrices and frequency smoothing [9, 10]. The focused and frequency-smoothed covariance matrix is given by [10]

$$
\tilde{\mathbf{R}}(\omega_0) = \sum_{k=1}^{K} \mathbf{T}(\omega_k)\,\mathbf{R}(\omega_k)\,\mathbf{T}^H(\omega_k),
\tag{2}
$$

where $\mathbf{T}(\omega_k)$ are the focusing matrices, $K$ is the total number of frequency bins, and $\omega_0$ is a specified reference frequency. The covariance matrix $\tilde{\mathbf{R}}(\omega_0)$ is used in (1) and the problem is only solved for a single frequency $\omega_0$. The focusing matrices are given by [10]

$$
\mathbf{T}(\omega_k) = \mathbf{J}(\omega_0)\left[\mathbf{J}^H(\omega_k)\,\mathbf{J}(\omega_k)\right]^{-1}\mathbf{J}^H(\omega_k),
\tag{3}
$$

where entries of the matrix $\mathbf{J}(\omega_k)$ are obtained from the angle-independent part of the array response, i.e., they only depend on the frequency and the microphone positions. This is due to the fact that an array response can be given as the product of an angle-independent part and a frequency-independent part (see [10, 11] for more details). These matrices can be calculated beforehand for a given array geometry and frequency band of interest.

3. SIGNAL EXTRACTION

For determining the distances from the array to the reflecting surfaces, accurate TDOA estimates of the reflected signals are needed, which can be estimated using signals extracted from the localized directions. To this end, beamformers with high directivity are steered towards the sound sources based on the DOA information obtained by the localizer. The robust least squares frequency-invariant beamformer (RLSFIB) proposed in [7], which is based on convex optimization, was chosen for this purpose. The one-step robust data-independent broadband beamforming method is optimum in the least squares sense and simultaneously constrains the WNG to remain above a given lower limit by directly solving a constrained optimization problem which is convex. The constraints are similar to those in (1). This beamformer design also allows for placement of nulls in the directions of undesired sources. By using a data-independent beamformer, we also avoid problems typically encountered by adaptive data-dependent beamformers when the desired source signals are correlated with the interference signals, as in our case.
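The scanning step of Section 2 can be illustrated with a minimal single-frequency sketch. It is not the convex-optimization design used in this paper: the WNG lower bound of (1) is met here by diagonal loading, with the loading factor doubled until the bound holds, and the focusing and frequency smoothing of (2) and (3) are omitted. The steering vector assumes a free-field circular array rather than the rigid cylindrical baffle, the covariance matrix is synthetic (one source at 93° plus white noise), and all function and variable names are hypothetical.

```python
import numpy as np

C = 340.0                 # speed of sound in m/s, as assumed in the experiments
M, RADIUS = 10, 0.04      # ten microphones on a circle of radius 0.04 m
FREQ = 4500.0             # single analysis frequency in Hz
GAMMA = 10 ** (5 / 10)    # WNG lower bound of 5 dB, i.e. about 3.16

MIC_ANGLES = 2 * np.pi * np.arange(M) / M


def steering_vector(phi_deg):
    """Free-field plane-wave steering vector (assumption: no rigid baffle)."""
    k = 2 * np.pi * FREQ / C
    phi = np.deg2rad(phi_deg)
    return np.exp(1j * k * RADIUS * np.cos(phi - MIC_ANGLES))


def robust_mvdr_weights(R, d, gamma=GAMMA, mu0=1e-6, max_doublings=60):
    """Distortionless MVDR weights; the loading mu is doubled until WNG >= gamma."""
    mu = 0.0
    for _ in range(max_doublings + 1):
        x = np.linalg.solve(R + mu * np.eye(len(d)), d)
        w = x / (d.conj() @ x)               # enforces w^H d = 1
        wng = 1.0 / np.real(w.conj() @ w)    # |w^H d|^2 / (w^H w) with w^H d = 1
        if wng >= gamma:
            return w, wng
        mu = mu0 if mu == 0.0 else 2 * mu
    raise RuntimeError("WNG bound not reached")


def acoustic_map(R, angles_deg):
    """Output power w^H R w of the loaded MVDR beamformer for every look direction."""
    powers = []
    for phi in angles_deg:
        w, _ = robust_mvdr_weights(R, steering_vector(phi))
        powers.append(np.real(w.conj() @ R @ w))
    return np.array(powers)


# Synthetic single-frequency covariance: one source at 93 degrees plus white noise.
d_src = steering_vector(93.0)
R = np.outer(d_src, d_src.conj()) + 0.01 * np.eye(M)

angles = np.arange(360)                      # 1 degree angular resolution
power_map = acoustic_map(R, angles)
doa_estimate = int(angles[np.argmax(power_map)])
print(doa_estimate)
```

With coherent reflections present, this plain scan would suffer from the signal cancellation described in Section 2, which is what the focused and frequency-smoothed covariance matrix in (2) is meant to counteract.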


4. TDOA ESTIMATION

Once all the directional signals of interest have been extracted, the TDOA estimates are obtained by using crosscorrelation analysis. In this step, crosscorrelations between the reference (direct sound) beamformer output $b_{\phi_r}$ and all other beamformer outputs $b_{\hat{\phi}_i}$, $i = 1, \ldots, I$, are determined. $I$ is the total number of distinct peaks in the acoustic map (from the first step) excluding the peak representing the direct sound, i.e., the number of DOAs of the reflections. The DOA of the reference (direct) sound source, $\phi_r$, is either assumed to be known a priori or, alternatively, the highest peak in the map is selected as such. Statistical analysis of crosscorrelations is used to obtain the TDOAs of the reflections. The crosscorrelations are given by

$$
C_{b_{\phi_r},\, b_{\hat{\phi}_i}}[m] = \frac{1}{N} \sum_{n=0}^{N-m-1} b_{\phi_r}[n+m]\; b_{\hat{\phi}_i}[n],
\tag{4}
$$

where $N$ is the length of the extracted signals, $m$ is the lag index, and $n$ is the sample index. By searching for maxima in the crosscorrelation functions, the TDOAs of first-order reflections can be determined as $\hat{\tau}_i = m_{i,\mathrm{peak}}/f_s$, where $f_s$ is the sampling frequency and $m_{i,\mathrm{peak}}$ is the time lag of the highest peak in the crosscorrelation function excluding the zeroth and neighboring lags, which correspond to the direct-path signal. Assuming that the distance from the direct sound source to the center of the array, $d_r$, is known, the distance from the array center to the i-th reflecting surface can be estimated by

$$
\hat{d}_i = \frac{d_r^2 - \hat{d}_{i,\mathrm{Total}}^2}{2\left(d_r \cos(|\phi_r - \hat{\phi}_i|) - \hat{d}_{i,\mathrm{Total}}\right)},
\tag{5}
$$

where $\hat{d}_{i,\mathrm{Total}} = c\hat{\tau}_i + d_r$ is the distance traveled by the i-th reflection, i.e., from the sound source to the reflector and then to the array, and $c$ is the speed of sound. Equation (5) follows from the law of cosines applied to the triangle formed by the sound source, the array center, and the reflection point.

5. EXPERIMENTAL RESULTS

In order to verify the performance of the proposed procedure, experimental measurements were carried out in a room with a T60 of about 400 ms. Fig. 2 shows the dimensions of the room and the measurement setup. The dotted lines depict the expected paths of three specular reflections. The loudspeaker was placed at 90° relative to the microphone array. A circular microphone array with a radius of 0.04 m consisting of ten omnidirectional microphones mounted into a rigid cylindrical baffle [11], as depicted in Fig. 3, was used.

Fig. 2. Experimental setup. (Labels recoverable from the figure: loudspeaker and array positions; dimensions 2 m, 0.55 m, 2.78 m, 5.8 m, and 5.9 m; reflection directions φ = 20°, 162°, and 270°.)

Fig. 3. Cylindrical ten-element microphone array with radius 0.04 m.

The aim of the experiment was to localize (by estimating DOAs and TDOAs) the points of the first-order specular reflections from three sides of the room. Since a cylindrical microphone array was used, localizing ceiling and floor reflections was excluded from the analysis, as well as that from the wall behind the source. In the experiment, a white Gaussian noise signal with a duration of five seconds was played back via a loudspeaker and the microphone signals were recorded. However, note that any source signal which sufficiently excites the room could be used instead. Microphone signals were then processed offline according to the proposed procedure. An SNR of 35 dB and a sampling frequency of 48 kHz were used. Lower and upper cut-off frequencies of 1 kHz and 6 kHz, respectively, were chosen in order to ensure good spatial selectivity and avoid spatial aliasing at higher frequencies. The sound speed was assumed to be 340 m/s and the WNG lower limit was set to 5 dB, i.e., γ = 3.16.

The focusing frequency of the RMVDR beamformer was set to ω0 = 4.5 kHz, and scanning was performed in 1D with an angular resolution of 1°. For comparison an acoustic map was also computed using (1) for a single frequency ω = 4.5 kHz, i.e., without focusing and frequency-smoothing (FFS). The resulting 1D acoustic maps are depicted in Fig. 4, where multiple peaks corresponding to the DOAs of the sound source and reflections are clearly visible. The four highest peaks in the FFS acoustic map, i.e., at 21°, 93°, 163°, and 269°, correspond to the DOAs of the direct sound source and three first order reflections from the boundaries of the room (see Fig. 2). The other peaks correspond to other high order reflections. The four highest peaks in the other acoustic map (without FFS) are at 92°, 174°, 268°, and 342°. Although the two peaks at 92° and 268° are good estimates of the DOAs of the direct sound and a first order reflection, 174° corresponds to an error of about 12 degrees and 342° is not due to a first order reflection (this claim may be justified by considering Fig. 2), i.e., the first order reflection from 20° is not among the four highest peaks.

Fig. 4. 1D acoustic map for RMVDR (WNGC) with FFS and without FFS. (Axes: φ [degrees] from 0 to 360; Magnitude [dB].)

In the second step an RLSFIB beamformer, with a 3 dB beamwidth of 20 degrees and γ = 3.16, was used to extract the signals arriving from the estimated directions. Thus the main beams of four beamformers were steered to 21°, 93°, 163°, and 269°, respectively. As illustrated in Fig. 4, the direction of the reference (direct) source is at 93° (as expected, it also results in the highest peak in the acoustic map).

Finally, crosscorrelations between the reference beamformer output, i.e., b93, and the beamformer outputs for the other three look directions were computed, and all results were normalized to the autocorrelation of b93. As depicted in Fig. 5, distinct peaks are clearly present in the crosscorrelation functions. The peak which is situated at the zeroth lag (for each crosscorrelation) is due to the direct sound present in all the beamformer outputs. This is because the direct sound has significantly more energy than the reflections and the beamformers are not able to completely attenuate the direct sound signal. In all figures, the second highest peak (the highest peak excluding the zeroth and neighboring lags) corresponds to the TDOA of the investigated reflection. Although other (smaller) peaks in crosscorrelation functions may provide additional information, we will restrict the discussion here to the dominant peaks, which correspond to the first-order reflections. Comparing the height of the strongest peaks, the largest reflection peak occurs for the crosscorrelation between the outputs of b93 and b269, and it corresponds to the reflection from the window.
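The third step, i.e., picking the reflection peak of the crosscorrelation and converting the TDOA to a distance with (5), can be sketched as follows. The signals are synthetic stand-ins for the beamformer outputs (white noise, a residual direct-sound component at lag zero, and one delayed copy), and the lag convention is a choice of this sketch: a positive lag means that the reflection beam lags the reference beam. The value d_r = 2 m is an assumption read off the dimension labels of Fig. 2; it is not stated explicitly in the text. The function names and the guard width of 20 samples are hypothetical.

```python
import math
import numpy as np

FS = 48000      # sampling frequency in Hz, as in the experiment
C = 340.0       # assumed speed of sound in m/s


def estimate_tdoa_lag(b_ref, b_refl, max_lag, guard):
    """Lag (in samples) of the highest crosscorrelation peak with lag > guard.

    Convention of this sketch: a positive lag means b_refl lags b_ref.
    """
    n = len(b_ref)
    corr = np.array([np.dot(b_refl[m:], b_ref[:n - m]) / n
                     for m in range(max_lag + 1)])
    corr[:guard + 1] = -np.inf   # exclude the zeroth and neighboring lags
    return int(np.argmax(corr))


def reflector_distance(tau, d_r, phi_r_deg, phi_i_deg, c=C):
    """Eq. (5): distance from the array center to the reflection point."""
    d_total = c * tau + d_r
    cos_term = math.cos(math.radians(abs(phi_r_deg - phi_i_deg)))
    return (d_r ** 2 - d_total ** 2) / (2 * (d_r * cos_term - d_total))


# Synthetic beamformer outputs: 0.5 s of white noise and one reflection.
rng = np.random.default_rng(0)
s = rng.standard_normal(24000)
delay = 562                             # about 11.7 ms at 48 kHz
b_ref = s
b_refl = 0.3 * s                        # residual direct sound in the reflection beam
b_refl[delay:] += 0.5 * s[:-delay]      # delayed, attenuated reflection

lag = estimate_tdoa_lag(b_ref, b_refl, max_lag=2000, guard=20)
tau_hat = lag / FS                      # TDOA in seconds

# Estimated DOAs and TDOAs as reported in Table 1, with phi_r = 93 degrees
# and d_r = 2 m (assumed, see the note above).
distances = {phi_i: reflector_distance(tau_ms * 1e-3, 2.0, 93, phi_i)
             for phi_i, tau_ms in [(21, 11.7), (163, 13.1), (269, 19.3)]}
for phi_i, d_i in distances.items():
    print(phi_i, round(d_i, 2))
```

Under these assumptions the three distances come out at about 2.96 m, 3.26 m, and 3.28 m, which agrees with the experimental column of Table 1.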

The window is known to be highly reflective.

Fig. 5. Computed crosscorrelations. (Four panels showing the normalized crosscorrelations of b93 with b21, b93, b163, and b269, plotted against lags in samples from −2000 to 2000.)

To verify the usefulness of the results obtained by the proposed procedure for room boundary inference, the results were compared to the 'ground truth'. The 'ground truth' is based on the manually measured dimensions of the room and the positions of the loudspeaker and cylindrical array, and should thus deviate from the exact values by less than 20 mm. Geometric identities were then used to obtain the distance of travel and TDOA of the reflections. In (5), φr = 93°, i.e., the DOA corresponding to the highest peak in the acoustic map, is used. Table 1 shows that the results obtained by the proposed algorithm are very similar to the measured 'ground truth', which confirms the accuracy and applicability of the proposed method to localization of major reflectors in an acoustic environment. Note that although in our experiment the loudspeaker is the only active sound source present in the room, if another sound source is present simultaneously we can still distinguish it from a reflection using crosscorrelation analysis.

Table 1. Results for inference

              Ground truth                           Experimental results
DOA [deg]   TDOA [ms]   Distance [m]      DOA [deg]   TDOA [ms]   Distance [m]
φi          τi          di                φ̂i          τ̂i          d̂i
20          11.5        2.96              21          11.7        2.96
90          0           -                 93          0           -
162         13.4        3.28              163         13.1        3.26
270         19.1        3.25              269         19.3        3.28

6. CONCLUSIONS

In this paper, a highly efficient and robust method for localizing major reflectors in an acoustic environment has been proposed. Having determined the directions of arrival of the first-order reflections, the signals are extracted and the respective TDOAs are estimated from crosscorrelation analysis. Experiments in a real room showed that the method is capable of accurately finding the origin of a reflection caused by the room boundaries. Work is currently being carried out to infer complete boundaries of a room by measuring and processing the microphone signals from a set of different loudspeaker positions.

7. REFERENCES

[1] M. Kuster, D. de Vries, E.M. Hulsebos, and A. Gisolf, “Acoustic imaging in enclosed spaces: Analysis of room geometry modifications on the impulse response,” Journal of the Acoustical Society of America, vol. 116 (4), pp. 2126–2137, 2004.

[2] S. Tervo, T. Korhonen, and T. Lokki, “Estimation of reflections from impulse responses,” Proc. of the Int. Symposium on Room Acoustics, pp. 1–7, Aug. 2010.

[3] F. Antonacci, A. Sarti, and S. Tubaro, “Geometric reconstruction of the environment from its response to multiple acoustic emissions,” Proc. IEEE ICASSP, pp. 2822–2825, March 2010.

[4] Y. Peled and B. Rafaely, “Method for dereverberation and noise reduction using spherical microphone arrays,” Proc. IEEE Int. Conf. on Acoustics, Speech, and Signal Processing (ICASSP), pp. 113–116, 2010.

[5] A. O’Donovan, R. Duraiswami, and D. Zotkin, “Automatic matched filter recovery via the audio camera,” Proc. IEEE ICASSP, pp. 2826–2829, March 2010.

[6] H. Cox, R. M. Zeskind, and M. M. Owen, “Robust adaptive beamforming,” IEEE Transactions on Acoustics, Speech, and Signal Processing, vol. ASSP-35, no. 10, pp. 1365–1376, Oct. 1987.

[7] E. Mabande, A. Schad, and W. Kellermann, “Design of robust superdirective beamformers as a convex optimization problem,” Proc. IEEE Int. Conf. on Acoustics, Speech, and Signal Processing (ICASSP), pp. 77–80, 2009.

[8] J. Bitzer and K. U. Simmer, “Superdirective microphone arrays,” in Microphone Arrays: Signal Processing Techniques and Applications, M.S. Brandstein and D.B. Ward, Eds. Springer-Verlag, Berlin, Germany, 2001.

[9] M. R. Azimi-Sadjadi, A. Pezeshki, L. L. Scharf, and M. Hohil, “Wideband DOA estimation algorithms for multiple target detection and tracking using unattended acoustic sensors,” Proc. of the SPIE’04 Defense and Security Symposium, vol. 5417, pp. 1–11, Apr. 2004.

[10] T.D. Abhayapala and H. Bhatta, “Coherent broadband source localization by modal space processing,” Proc. IEEE 10th Int. Conf. on Telecommunications (ICT2003), vol. 2, pp. 1617–1623, Feb. 2003.

[11] H. Teutsch and W. Kellermann, “Acoustic source detection and localization based on wavefield decomposition using circular microphone arrays,” Journal of the Acoustical Society of America, no. 5, pp. 2724–2736, 2006.
