SEQUENTIAL MONTE CARLO SIMULATION OF DYNAMICAL MODELS WITH SLOWLY VARYING PARAMETERS: APPLICATION TO AUDIO

William Fong and Simon Godsill
Signal Processing Group, University of Cambridge, Cambridge, CB2 1PZ, U.K.
[email protected], [email protected]

ABSTRACT

In this paper, we propose a slow time-varying partial correlation (STV-PARCOR) model for dynamical models with slowly varying parameters. Based on it, we develop an on-line joint parameter and signal estimation algorithm under the sequential Monte Carlo framework. The proposed model and algorithm are expected to improve on the standard Monte Carlo filter, since the standard filter is known to become highly degenerate for models with slowly varying parameters. The suggested algorithm is tested with real speech data and the results are compared with those generated using existing approaches.

1. INTRODUCTION

Many real-world data analysis problems involve sequential estimation of the filtering distribution p(x_t | y_{1:t}), where x_t is the unobserved state of the system at time t and y_{1:t} ≜ {y_1, ..., y_t} are observations made over some time interval t ∈ {1, ..., T}. In most cases, the data structures can be very complex, typically involving elements of non-Gaussianity, high dimensionality and non-linearity, and the problem may not be solvable analytically. Sequential Monte Carlo methods, also known as particle filters (PF), have been proposed to overcome these problems; refer to [1] for an up-to-date survey of the field. Within the particle filter framework, the filtering distribution is approximated with an empirical distribution formed from point masses, or particles,

p(x_t | y_{1:t}) ≈ Σ_{i=1}^N w_t^(i) δ(x_t − x_t^(i)),    Σ_{i=1}^N w_t^(i) = 1,    w_t^(i) ≥ 0

where δ(·) is the Dirac delta function and w_t^(i) is a weight attached to particle x_t^(i). In recent years, various approaches have been developed to apply sequential Monte Carlo filtering strategies for the purpose of audio signal enhancement (see, for example, [2] and citations therein). These approaches assume a Gaussian random walk model for the system parameter evolution at every time step, which may not give a sufficiently slow or smooth variation with time. This makes the standard PF inefficient, as it is known that the filter becomes highly degenerate for random walks with very low variance. To solve this problem, we propose a slow time-varying partial correlation (STV-PARCOR) model, which considers the system coefficients to evolve stochastically on a block-to-block basis; all coefficients in between are found by deterministic interpolation. In speech, for example, this reflects our belief that the shape of the human vocal tract is slowly time-varying owing to physical constraints. Based on the STV-PARCOR model and under the particle filtering framework, we develop methods for on-line joint estimation of the system parameters (PARCOR coefficients and model order) and the underlying clean audio signal, given the noise-corrupted observations. The suggested algorithm is then tested against standard approaches.

2. STATE-SPACE REPRESENTATION AND AUDIO MODEL

In this section, we describe the model adopted in this paper, the STV-PARCOR model. A lengthy time series is divided into R non-overlapping blocks. If Δ is the block size, we define x̃_{τ+1} ≜ {x_{τΔ+1}, ..., x_{(τ+1)Δ}} as a group of unobserved states of the system and ỹ_{τ+1} ≜ {y_{τΔ+1}, ..., y_{(τ+1)Δ}} as observations made over the blocks τ ∈ {0, ..., R−1}. Assuming a Markovian structure for the model, the problem can then be formulated in state-space form as follows,

x̃_{τ+1} ∼ f(x̃_{τ+1} | x̃_τ)        (state evolution density)
ỹ_{τ+1} ∼ g(ỹ_{τ+1} | x̃_{τ+1})     (observation density)        (1)

where f(·|·) and g(·|·) are pre-specified state evolution and observation densities. It should be noted that the state-space model adopted here is different from the standard one [4], which relates the unobserved states {x_t} and the observations {y_t} made over a time interval t ∈ {1, ..., T}; (1), however, defines the state evolution between different blocks. The full specification of the state-space model adopted

is as follows. At any time t, the state vector x_t is partitioned as [z_t, θ_t]′, with z_t and θ_t being the signal state and the parameter state respectively. The signal is assumed to be submerged in white Gaussian noise (WGN) with known variance σ_v², i.e. g(y_t | x_t) = N(y_t; z_t, σ_v²). The parameter vector θ_t is further partitioned as [ρ_t, e_t, p_t]′, where ρ_τΔ = [ρ_τΔ,1, ..., ρ_τΔ,p_τ]′ is the vector of partial correlation (PARCOR) coefficients, e_t is the log-excitation variance and p_t is the time-varying model order. Refer to [2] for a detailed description of the audio model adopted here. In the setup of the particle filter, we have chosen the proposal distribution [1, 2] as the parameter state evolution density, which can be factorised as follows,

f(θ̃_τ | θ̃_{τ−1}) = f(ρ_τΔ | ρ_{(τ−1)Δ}) f(p_τ | p_{τ−1}) f(e_τ | e_{τ−1})        (2)

Each term of (2) will be considered separately. First of all, a Gaussian random walk model is assumed for the block variation of the log-excitation variance,

f(e_τ | e_{τ−1}) = N(e_τ; e_{τ−1}, σ_e²)        (3)

For the block evolution of the model order, p_t, a first-order Markov process with discrete probabilities is assumed. In order to ensure a slow and smooth evolution, it is assumed that |p_τ − p_{τ−1}| ≤ 1. Hence,

f(p_τ | p_{τ−1}) = Σ_{k=−1}^{+1} P_k δ(p_τ − (p_{τ−1} + k))        (4)

In our proposed model, it is assumed that both e_t and p_t are fixed within a block, i.e. {e_t, p_t} = {e_τ, p_τ} for t ∈ {(τ−1)Δ+1, ..., τΔ}. Finally, for the block variation of the PARCOR coefficients, a constrained random walk model is assumed [2]:

f(ρ_τΔ | ρ_{(τ−1)Δ}) ∝ { N(ρ_τΔ; ρ_{(τ−1)Δ}, σ_ρ² I)   if max_i {|ρ_τΔ,i|} < 1
                      { 0                              otherwise        (5)

A constrained random walk model of this form will ensure approximate stability provided that the PARCOR coefficients vary sufficiently slowly. Having sampled the last PARCOR coefficient of block τ, ρ_τΔ, all the intermediate values, {ρ_t, (τ−1)Δ < t < τΔ}, are found by a deterministic function ρ_t = h(ρ_{(τ−1)Δ}, ρ_τΔ). For example, we implement a linear interpolator in our simulations:

ρ_t = ρ_{(τ−1)Δ} + ((t − (τ−1)Δ)/Δ) (ρ_τΔ − ρ_{(τ−1)Δ})        (6)

In future work, we will consider the use of ρ_t = h(ρ_{(τ−k)Δ}, ..., ρ_{(τ+k)Δ}). The advantage of such an approach is that it allows the use of smoother interpolators, such as Legendre polynomials, Fourier basis functions and B-splines, which will ensure a slow and smooth evolution of the reflection coefficients [3].
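As a concrete illustration of the block-level parameter evolution in (2)-(6), the following sketch samples one block transition and interpolates the PARCOR coefficients across the block. It is only an illustrative reading of the model under the stated assumptions: the function names, the hyperparameter values and the rejection-sampling treatment of the constraint in (5) are our own choices, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def propagate_block(rho_prev, e_prev, p_prev,
                    sigma_rho=0.05, sigma_e=0.1, P=(0.1, 0.8, 0.1)):
    """One block-to-block move of (e, p, rho) following (2)-(5)."""
    # (3): Gaussian random walk on the log-excitation variance.
    e_new = e_prev + sigma_e * rng.standard_normal()
    # (4): the model order moves by at most one, with probabilities P_k.
    p_new = max(1, p_prev + rng.choice([-1, 0, 1], p=P))
    # Dimension adjustment when the order changes (cf. section 4):
    # pad with a zero if the order grows, truncate if it shrinks.
    if p_new > rho_prev.size:
        anchor = np.append(rho_prev, 0.0)
    else:
        anchor = rho_prev[:p_new]
    # (5): constrained Gaussian random walk on the PARCOR endpoint,
    # implemented here by rejection until max|rho_i| < 1.
    while True:
        rho_new = anchor + sigma_rho * rng.standard_normal(p_new)
        if np.max(np.abs(rho_new)) < 1.0:
            break
    return rho_new, e_new, p_new, anchor

def interpolate_block(rho_start, rho_end, delta):
    """(6): linear interpolation of the PARCOR coefficients over a block."""
    frac = np.arange(1, delta + 1)[:, None] / delta   # (t - (tau-1)D)/D
    return rho_start + frac * (rho_end - rho_start)

# One block transition with block size delta = 250.
rho_new, e_new, p_new, anchor = propagate_block(
    np.array([0.5, -0.3]), e_prev=-2.0, p_prev=2)
path = interpolate_block(anchor, rho_new, delta=250)  # shape (250, p_new)
```

The last row of the interpolated path equals the newly sampled endpoint ρ_τΔ, so consecutive blocks join continuously.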

3. SEQUENTIAL AUDIO SIGNAL AND PARAMETER ESTIMATION

Based on the STV-PARCOR model, we describe in this section a new way for joint estimation of the signal and parameter states under the sequential Monte Carlo framework. We want to draw random samples from the filtering distribution p(z̃_τ, θ̃_τ | ỹ_{1:τ}), which can be factorised as follows,

p(z̃_τ, θ̃_τ | ỹ_{1:τ}) = ∫ p(z̃_τ | θ̃_{1:τ}, ỹ_{1:τ}) p(θ̃_{1:τ} | ỹ_{1:τ}) dθ̃_{1:τ−1}        (7)

Assume there exists a particulate approximation for the marginal parameter filtering distribution,

p(θ̃_{1:τ} | ỹ_{1:τ}) ≈ Σ_{i=1}^N w_τ^(i) δ(θ̃_{1:τ} − θ̃_{1:τ}^(i))        (8)

As mentioned earlier, we have chosen the proposal distribution as the prior importance function (2), so the importance weight will simply take the form,

w_τ^(i) ∝ Π_{t=(τ−1)Δ+1}^{τΔ} p(y_t | θ̃_{1:t}^(i), y_{1:t−1})        (9)

The joint filtering distribution (7) can then be approximated by

p(z̃_τ, θ̃_τ | ỹ_{1:τ}) ≈ ∫ p(z̃_τ | θ̃_{1:τ}, ỹ_{1:τ}) Σ_{i=1}^N w_τ^(i) δ(θ̃_{1:τ} − θ̃_{1:τ}^(i)) dθ̃_{1:τ−1}
                      = Σ_{i=1}^N p(z̃_τ | θ̃_{1:τ}^(i), ỹ_{1:τ}) w_τ^(i) δ(θ̃_τ − θ̃_τ^(i))

Now, given θ̃_{1:τ}^(i), signal realisations can be drawn from z̃_τ^(i) ∼ p(z̃_τ | θ̃_{1:τ}^(i), ỹ_{1:τ}).

We now prove the correctness of the particulate approximation (8) for the marginal parameter filtering distribution. Proof: consider the marginal parameter filtering distribution p(θ̃_{1:τ} | ỹ_{1:τ}), which can be rearranged as follows,

p(θ̃_{1:τ} | ỹ_{1:τ}) ∝ p(ỹ_τ | θ̃_{1:τ}, ỹ_{1:τ−1}) f(θ̃_τ | θ̃_{τ−1}) p(θ̃_{1:τ−1} | ỹ_{1:τ−1})        (10)

Assuming that a particle approximation to p(θ̃_{1:τ−1} | ỹ_{1:τ−1}) has already been generated,

p(θ̃_{1:τ−1} | ỹ_{1:τ−1}) ≈ Σ_{i=1}^N δ(θ̃_{1:τ−1} − θ̃_{1:τ−1}^(i))

and for each parameter trajectory θ̃_{1:τ−1}^(i) we generate a random sample from θ̃_τ^(i) ∼ f(θ̃_τ | θ̃_{τ−1}^(i)), as described in section 2. Hence, (10) can be approximated by

p(θ̃_{1:τ} | ỹ_{1:τ}) ≈ p(ỹ_τ | θ̃_{1:τ}, ỹ_{1:τ−1}) f(θ̃_τ | θ̃_{τ−1}) Σ_{i=1}^N δ(θ̃_{1:τ−1} − θ̃_{1:τ−1}^(i))
                     ≈ Σ_{i=1}^N p(ỹ_τ | θ̃_{1:τ}^(i), ỹ_{1:τ−1}) δ(θ̃_{1:τ} − θ̃_{1:τ}^(i))
                     = Σ_{i=1}^N [Π_{t=(τ−1)Δ+1}^{τΔ} p(y_t | θ̃_{1:t}^(i), y_{1:t−1})] δ(θ̃_{1:τ} − θ̃_{1:τ}^(i))
                     = Σ_{i=1}^N w_τ^(i) δ(θ̃_{1:τ} − θ̃_{1:τ}^(i))

with w_τ^(i) ∝ Π_{t=(τ−1)Δ+1}^{τΔ} p(y_t | θ̃_{1:t}^(i), y_{1:t−1}), as required.
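To make the weight recursion above concrete, the sketch below computes the block importance weights (9) in log space from per-sample predictive log-likelihoods (which a real implementation would obtain from the prediction error decomposition) and resamples the particle indices. All names are our own, and the log-likelihoods are simulated stand-ins rather than values computed from the audio model.

```python
import numpy as np

rng = np.random.default_rng(1)

def block_weights(loglik):
    """(9): w^(i) proportional to the product over the block of
    p(y_t | ...), computed stably in log space.
    loglik has shape (N, delta): per-particle, per-sample log-likelihoods."""
    logw = loglik.sum(axis=1)      # log of the product over the block
    logw -= logw.max()             # avoid underflow before exponentiating
    w = np.exp(logw)
    return w / w.sum()

def resample(w):
    """Multinomial resampling: draw N indices with replacement."""
    return rng.choice(w.size, size=w.size, p=w)

# Toy run: N = 100 particles, block size delta = 250.
loglik = rng.standard_normal((100, 250))   # stand-in log-likelihoods
w = block_weights(loglik)
idx = resample(w)                          # indices of surviving particles
```

Working in log space matters here because a block of 250 likelihood factors would underflow to zero in direct multiplication.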

For instance, if we assume a conditionally Gaussian state-space model, then all the computations can be done under the framework of the Kalman filter and smoother [2] (see [4] for a general discussion of the Kalman filter and smoother). For example, for t ∈ {(τ−1)Δ+1, ..., τΔ}, p(y_t | θ̃_{1:t}^(i), y_{1:t−1}) can be found by the prediction error decomposition. The marginal signal filtering distribution is given by:

p(z̃_τ | θ̃_{1:τ}^(i), ỹ_{1:τ}) = Π_{t=(τ−1)Δ+1}^{τΔ} p(z_t | θ̃_{1:t}^(i), ỹ_{1:τ})        (11)

where

p(z_t | θ̃_{1:t}^(i), ỹ_{1:τ}) = N(z_t; μ_{t|τΔ}^(i), P_{t|τΔ}^(i))        (12)

with μ_{t|τΔ}^(i) and P_{t|τΔ}^(i) being sufficient statistics, which can be found efficiently using the Kalman filter and smoother.

4. IMPLEMENTATION

Given i = 1, ..., N particles of the signal sufficient statistics (μ_{τΔ|τΔ}^(i) and P_{τΔ|τΔ}^(i)) and parameter realisations θ̃_{τ−1}^(i), random samples can be drawn from the filtering distribution p(z̃_τ, θ̃_τ | ỹ_{1:τ}) as follows:

- For i = 1, ..., N, generate random samples from p_τ^(i) ∼ f(p_τ | p_{τ−1}^(i)) and e_τ^(i) ∼ f(e_τ | e_{τ−1}^(i)). Then, for each p_τ^(i), sample ρ_τΔ^(i) from f(ρ_τΔ | ρ′_{(τ−1)Δ}^(i)), where

    ρ′_{(τ−1)Δ}^(i) = [ρ_{(τ−1)Δ}^(i), 0]′                               if p_τ^(i) > p_{τ−1}^(i)
    ρ′_{(τ−1)Δ}^(i) = [ρ_{1,(τ−1)Δ}^(i), ..., ρ_{p_τ^(i),(τ−1)Δ}^(i)]′   if p_τ^(i) < p_{τ−1}^(i)
    ρ′_{(τ−1)Δ}^(i) = ρ_{(τ−1)Δ}^(i)                                     otherwise

- Given ρ′_{(τ−1)Δ}^(i) and ρ_τΔ^(i), all intermediate reflection coefficients {ρ_t^(i), (τ−1)Δ < t < τΔ} are found by the linear interpolator (6).

- For i = 1, ..., N, evaluate the importance weight w_τ^(i) using (9) and the prediction error decomposition. We then resample (see [1] for details) the parameter set {θ̃_τ^(i), i = 1, ..., N} N times with replacement according to w_τ^(i).

- For the resampled parameter set {θ̃_τ^(i), i = 1, ..., N}, compute the sufficient statistics {μ_{t|τΔ}^(i), P_{t|τΔ}^(i), t = (τ−1)Δ+1, ..., τΔ} using the Kalman smoother. Signal realisations {z_t^(i), i = 1, ..., N, t = (τ−1)Δ+1, ..., τΔ} can then be generated using (12).

In the sampling of the signal state vector z_t^(i), there is only one free variable owing to the overlap with the previous signal state z_{t−1}^(i). Hence, we can ensure continuity by taking z_{(τ−1)Δ}^(i) into account in the sampling of z_{(τ−1)Δ+1}^(i).
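Putting the four steps together, one block of the filter might look like the following control-flow sketch. The Kalman computations are stubbed out with a placeholder log-likelihood, the |ρ| < 1 constraint of (5) is omitted for brevity, and every name and hyperparameter value here is our own illustrative choice rather than the paper's implementation.

```python
import numpy as np

rng = np.random.default_rng(2)

def loglik_block(rho_path, e, y_block):
    """Placeholder: a real implementation would run the Kalman filter
    with these parameters and use the prediction error decomposition."""
    return -0.5 * np.exp(-e) * np.sum(y_block ** 2) / y_block.size

def filter_block(particles, y_block, delta=250,
                 sigma_rho=0.05, sigma_e=0.1, P=(0.1, 0.8, 0.1)):
    """One block update; particles is a list of (rho, e, p) tuples."""
    N = len(particles)
    logw = np.empty(N)
    proposed = []
    for i, (rho, e, p) in enumerate(particles):
        # Step 1: propagate order, log-excitation and PARCOR endpoint,
        # with the pad/truncate dimension adjustment from the text.
        p_new = max(1, p + rng.choice([-1, 0, 1], p=P))
        anchor = np.append(rho, 0.0) if p_new > rho.size else rho[:p_new]
        rho_new = anchor + sigma_rho * rng.standard_normal(p_new)
        e_new = e + sigma_e * rng.standard_normal()
        # Step 2: deterministic linear interpolation across the block (6).
        frac = np.arange(1, delta + 1)[:, None] / delta
        rho_path = anchor + frac * (rho_new - anchor)
        # Step 3: importance weight via the (stubbed) likelihood (9).
        logw[i] = loglik_block(rho_path, e_new, y_block)
        proposed.append((rho_new, e_new, p_new))
    w = np.exp(logw - logw.max())
    w /= w.sum()
    # Step 4: resample N times with replacement according to w.
    idx = rng.choice(N, size=N, p=w)
    return [proposed[j] for j in idx]

particles = [(np.array([0.5, -0.3]), -2.0, 2) for _ in range(50)]
particles = filter_block(particles, rng.standard_normal(250))
```

In a full implementation, step 4 would also carry forward the resampled Kalman sufficient statistics so that signal realisations can be drawn via (12).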

5. EXPERIMENTAL RESULTS

A section of clean speech data, as shown in Figure 1, is used to verify our suggestion that the STV-PARCOR model is a better model for slowly time-varying processes than the TV-PARCOR model. Simulation results are compared with those generated using the Rao-Blackwellised PF with the TV-PARCOR model (as shown in [2]) and the extended Kalman smoother (eKS) [4]. In our simulations, we have chosen a value of N = 500 and the block size Δ is fixed to 250. The hyperparameters (σ_ρ², σ_e² and {P_k, k = −1, 0, +1}) adopted here are assumed to be known and fixed. For the TV-PARCOR PF, a Gaussian random walk is assumed on the PARCOR coefficients and the model order at every time step; the hyperparameters adopted are adjusted so that the system parameters cover the same range as in the STV-PARCOR model over Δ time steps. For the eKS, a Gaussian random walk is assumed directly on the AR coefficients and the model order is fixed to 10. This model is employed because the TV-PARCOR model and a time-varying model order are not straightforward to implement with the extended Kalman smoother.

Figure 2 and Figure 3 show the 3D histogram plots of the first reflection coefficients (ρ_{1,t}) and the model order (p_t) for the word "service" using the STV-PARCOR PF. We then compare the performance of the suggested algorithm with the TV-PARCOR PF and the eKS at different input SNRs. The simulation results are summarised below:

    SNRin    STV-PARCOR PF    TV-PARCOR PF    eKS
    0 dB     5.8 dB           1.9 dB          3.9 dB
    5 dB     9.0 dB           4.3 dB          7.8 dB
    10 dB    12.8 dB          9.2 dB          12.3 dB

Audio outputs can be found at http://www-sigproc.eng.cam.ac.uk/wnwf2. The simulation results for the TV-PARCOR PF verify the fact that the PF in its original form becomes highly degenerate for models with slowly varying parameters. The proposed STV-PARCOR PF, however, performs well in the suggested scenario, and significant improvements in SNR and sound quality are observed. Comparing the output SNRs of the STV-PARCOR PF and the eKS, it is found that our proposed STV-PARCOR PF performs significantly better than the eKS at low SNR, while both algorithms perform equally well at high SNR. Comparing the sound quality, the eKS leaves considerably more audible artifacts and sibilant sounds than the STV-PARCOR PF, and at low SNR the artifacts are so loud that we regard the eKS as having failed.
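The output SNR figures in the table are of the usual signal-to-residual form. As a small illustration (our own helper function, not code from the paper), such a figure could be computed as:

```python
import numpy as np

def snr_db(clean, estimate):
    """Output SNR in dB: clean-signal power over residual power."""
    residual = clean - estimate
    return 10.0 * np.log10(np.sum(clean ** 2) / np.sum(residual ** 2))

clean = np.ones(1000)
estimate = clean + 0.1          # constant residual of 0.1
print(snr_db(clean, estimate))  # → 20.0
```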

Figure 1: Speech signal showing the words "Good service should be rewarded by big tips".

Figure 2: 3D histogram plot of ρ_{1,t} for the word "service" using the STV-PARCOR particle filter, for σ_v² = 0 and N = 1000.

6. CONCLUSION

We propose the STV-PARCOR model for audio signals and develop an on-line algorithm for joint signal and parameter estimation. The algorithm is tested on real speech signals and compared with other standard approaches, and encouraging results are obtained. We suggest an extension to our coefficient model (6) which will ensure smoother evolution of the coefficients. In addition, we will develop methods for incorporating the STV-PARCOR model under the Monte Carlo smoothing [1, 2] framework.

7. REFERENCES

[1] A. Doucet, N. de Freitas, and N. J. Gordon, editors. Sequential Monte Carlo Methods in Practice. New York: Springer-Verlag, 2001.

[2] W. Fong, S. J. Godsill, A. Doucet, and M. West. Monte Carlo smoothing with application to audio signal enhancement. IEEE Trans. on Signal Processing, Special Issue. To appear.

[3] Y. Grenier. Time-dependent ARMA modeling of nonstationary signals. IEEE Transactions on Acoustics, Speech and Signal Processing, 31(4):899–911, 1983.

[4] A. C. Harvey. Forecasting, structural time series models and the Kalman filter. Cambridge University Press, 1989.

Figure 3: 3D histogram plot of p_t for the word "service" using the STV-PARCOR particle filter, for σ_v² = 0 and N = 1000.