Noise Adaptive Training for Subspace Gaussian Mixture Models

Liang Lu, Arnab Ghoshal, Steve Renals
Centre for Speech Technology Research (CSTR), University of Edinburgh

Interspeech, August 2013

Outline

- Introduction
  - Subspace GMM (SGMM) acoustic model
  - Motivation
- Noise adaptive training
  - Adaptive training method
  - Experimental results
- Conclusion

Subspace Gaussian Mixture Models [Povey et al., 2010]

[Figure: HMM states ..., j-1, j, j+1, ... with sub-state vectors v_jk tied to the globally shared M_i, w_i, Σ_i, i = 1, ..., I]

- Globally shared parameters:
  - M_i is the projection matrix for means
  - w_i is the projection vector for weights
  - Σ_i is the covariance matrix
  - i is the subspace component index, i = 1, ..., I
- State-dependent parameters:
  - v_jk are low-dimensional sub-state vectors (e.g. 40-dim)
- Gaussian mean: µ_jki = M_i v_jk

Subspace Gaussian Mixture Models

Each subspace component has its own projection matrix; stacking the projections and multiplying by the same sub-state vector v yields all of that sub-state's means (illustrated with a 2-dimensional v):

\[
\begin{bmatrix} m_{11} & m_{12} \\ \vdots & \vdots \\ m_{i1} & m_{i2} \end{bmatrix}
\begin{bmatrix} v_1 \\ v_2 \end{bmatrix}
=
\begin{bmatrix} \mu_1 \\ \vdots \\ \mu_i \end{bmatrix}
\]
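To make the projection concrete, here is a minimal NumPy sketch of the mean computation µ_jki = M_i v_jk; the dimensions and random parameter values are illustrative assumptions, not values from the paper.

```python
# Minimal sketch of the SGMM mean projection: mu_jki = M_i @ v_jk.
import numpy as np

D = 39   # feature dimension (assumed)
S = 40   # sub-state vector dimension (slide: e.g. 40-dim)
I = 400  # number of subspace components (assumed)

rng = np.random.default_rng(0)
M = rng.standard_normal((I, D, S))   # globally shared projection matrices M_i
v_jk = rng.standard_normal(S)        # one state-dependent sub-state vector

# All I component means for sub-state (j, k) at once, shape (I, D).
mu = np.einsum('ids,s->id', M, v_jk)

# Equivalent stacked-matrix view from the slide: stack the M_i and
# multiply by the same v.
mu_stacked = (M.reshape(I * D, S) @ v_jk).reshape(I, D)
assert np.allclose(mu, mu_stacked)
```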

Subspace Gaussian Mixture Models

- µ = Mv         (factorisation)
- µ = Mv + Ns    (speaker subspace: N projects a per-speaker vector s)
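As a sketch under the same assumptions as the NumPy example above (M, v_jk, mu, rng, I and D defined there), the speaker subspace adds a second projection; N and s are illustrative placeholders, not values from the paper:

```python
# Speaker-factorised means: mu_jki = M_i v_jk + N_i s (continues the
# sketch above; N_i and s are assumed placeholders).
T = 40                              # speaker-vector dimension (assumed)
N = rng.standard_normal((I, D, T))  # shared speaker projection matrices N_i
s = rng.standard_normal(T)          # one speaker vector

mu_spk = mu + np.einsum('idt,t->id', N, s)
```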

Subspace Gaussian Mixture Models

- Typical features:
  - Re-structures the HMM-GMM model parameters
  - Smaller number of free model parameters
  - Large number of Gaussian components
  - Factorises the phonetic and speaker variability
- Outperforms GMM-based systems on several tasks, e.g. [D. Povey 2010, L. Burget 2010, L. Lu 2011]

Noise robustness

- Aurora 4 dataset
- GMM with 50K components
- SGMM with 6.4M components

[Figure: a clean acoustic model trained on clean speech (x) is used to decode noisy test speech (y). Bar chart of WER (%), axis 0-60, for GMM-clean, SGMM-clean, GMM-noisy and SGMM-noisy.]

Noise robustness

- SGMM with joint uncertainty decoding (JUD) [H. Liao, 2005]

[Figure: VTS estimates a transform T per Gaussian component, whereas JUD shares a transform T across each regression class.]
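As a hedged sketch of the idea: the compensation form below follows the standard JUD formulation (one affine transform per regression class plus an uncertainty bias on the model covariance) rather than anything stated on the slide, and the per-class parameters are random placeholders.

```python
# Hedged sketch of JUD-style decoding: one affine transform per regression
# class r, plus an uncertainty bias added to the model covariance, instead
# of compensating every Gaussian individually as VTS does. A, b and
# Sigma_b are placeholder values; in practice they come from a noise model.
import numpy as np
from scipy.stats import multivariate_normal

D, R = 39, 4                                    # feature dim, regression classes (assumed)
rng = np.random.default_rng(1)
A = [np.eye(D) for _ in range(R)]               # class transforms A^(r)
b = [0.1 * rng.standard_normal(D) for _ in range(R)]
Sigma_b = [0.5 * np.eye(D) for _ in range(R)]   # uncertainty biases

def jud_log_likelihood(y, mu, Sigma, r):
    """log |A^(r)| + log N(A^(r) y + b^(r); mu, Sigma + Sigma_b^(r))."""
    x_hat = A[r] @ y + b[r]
    _, logdet = np.linalg.slogdet(A[r])
    return logdet + multivariate_normal.logpdf(x_hat, mu, Sigma + Sigma_b[r])
```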

Noise adaptation of clean speech model

- Adaptation with a noise-dependent transform for a specific noise condition

Noise adaptation of clean speech model

- Aurora 4 dataset
- A, B, C and D denote different noise conditions (WER %):

  Methods       A     B     C     D     Avg
  Clean model   5.2   58.2  50.7  72.1  59.9
  +JUD          5.1   13.1  12.0  23.2  16.8

Noise adaptation of multi-condition model

- What if the training data comes from the same types of noise condition as the test data?

Noise adaptation of multi-condition model

- We obtain a better baseline system with multi-style training (MST)
- However, we obtain worse results with adaptation than when adapting the clean model (WER %):

  Methods       A     B     C     D     Avg
  Clean model   5.2   58.2  50.7  72.1  59.9
  +JUD          5.1   13.1  12.0  23.2  16.8
  MST model     6.8   15.2  18.6  32.3  22.2
  +JUD          7.4   13.3  14.7  24.1  17.6

Noise adaptive training scheme

- Iterative update of the acoustic model M and the noise transforms T
- Optimization of Q(T; T̂): in [Lu et al., 2013]
- Optimization of Q(M; M̂): in this paper

  {M, T} --Q(T; T̂)--> {M, T̂} --Q(M; M̂)--> {M̂, T̂}

Lu et al., "Joint Uncertainty Decoding for Noise-robust Subspace Gaussian Mixture Models", IEEE TASLP, 2013.
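A high-level sketch of this alternating scheme; the two update functions are placeholder stubs standing in for the actual EM auxiliary-function maximisations, not the paper's implementation.

```python
def update_noise_transforms(model, transforms, data):
    # Placeholder: maximise Q(T; T_hat) per noise condition with the
    # acoustic model held fixed [Lu et al., 2013].
    return transforms

def update_acoustic_model(model, transforms, data):
    # Placeholder: maximise Q(M; M_hat) with the noise transforms held
    # fixed (the update studied in this paper).
    return model

def noise_adaptive_training(model, transforms, data, num_iters=4):
    """Alternate {M, T} -> {M, T_hat} -> {M_hat, T_hat}."""
    for _ in range(num_iters):
        transforms = update_noise_transforms(model, transforms, data)
        model = update_acoustic_model(model, transforms, data)
    return model, transforms
```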

Noise adaptive training - optimization

Optimization of Q(M; M̂):

- Gradient-based approach, for θ in M [Liao et al., 2007; Kalinli et al., 2010]:

  \[ \theta = \tilde{\theta} - \zeta \left[ \frac{\partial^2 Q(\cdot)}{\partial \theta^2} \right]^{-1} \left. \frac{\partial Q(\cdot)}{\partial \theta} \right|_{\theta = \tilde{\theta}} \tag{1} \]

- EM-based approach, e.g. noisy-CMLLR [Kim et al., 2011]:

  \[ y_t = H^{(r)} x_t + g^{(r)} + e_t^{(r)} \;\longrightarrow\; P(x_t \mid y_t, r) \tag{2} \]

- We used the EM-based approach for simplicity
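For intuition, here is a minimal NumPy sketch of the Newton-style update in Eq. (1); the quadratic Q below is a toy stand-in for the real EM auxiliary function over the SGMM parameters.

```python
# One damped Newton step: theta = theta_tilde - zeta * H^{-1} g, where
# g and H are the gradient and Hessian of Q at theta_tilde.
import numpy as np

def newton_step(grad, hess, theta_tilde, zeta=1.0):
    """Eq. (1); zeta is the step-size/damping factor."""
    return theta_tilde - zeta * np.linalg.solve(hess, grad)

# Toy example: Q(theta) = -0.5 (theta - t)^T H (theta - t) peaks at t,
# which a full Newton step (zeta = 1) recovers exactly.
H = np.array([[2.0, 0.3], [0.3, 1.0]])
t = np.array([1.0, -2.0])
theta = np.zeros(2)
grad = -H @ (theta - t)        # dQ/dtheta at theta
hess = -H                      # d^2Q/dtheta^2
print(newton_step(grad, hess, theta))   # -> [ 1. -2.]
```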

Experiments - noise adaptive training

- With adaptive training (NAT), we obtained better results than with the clean acoustic model (WER %):

  Methods       A     B     C     D     Avg
  Clean model   5.2   58.2  50.7  72.1  59.9
  +JUD          5.1   13.1  12.0  23.2  16.8
  MST model     6.8   15.2  18.6  32.3  22.2
  +JUD          7.4   13.3  14.7  24.1  17.6
  NAT model     6.5   20.3  19.8  39.7  27.6
  +JUD          6.1   11.3  11.9  22.4  15.7

Experiments - noise adaptive training

- Effect of the phase factor α in the extended mismatch function y = f(x, n, h, α) [Deng et al., 2004]

[Figure: Word Error Rate (%) of JUD-SGMM with the clean model against the value of the phase factor (0 to 2.5); WER axis 15-21.]
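For reference, a hedged sketch of one common form of the phase-sensitive mismatch function in the log-spectral domain, per channel; this specific expression is an assumption based on the phase-sensitive model of [Deng et al., 2004], not quoted from the slides.

```python
# Phase-sensitive mismatch function y = f(x, n, h, alpha), derived from
#   exp(y) = exp(x + h) + exp(n) + 2*alpha*exp((x + h + n) / 2)
# x: clean speech, n: additive noise, h: channel, alpha: phase factor.
import numpy as np

def mismatch(x, n, h, alpha):
    d = n - x - h   # log ratio of noise to channel-filtered speech
    return x + h + np.log1p(np.exp(d) + 2.0 * alpha * np.exp(d / 2.0))

# Sweep alpha over the same range as the WER plots (0 to 2.5).
x, n, h = 1.0, 0.5, 0.1
for alpha in (0.0, 0.5, 1.0, 1.5, 2.0, 2.5):
    print(alpha, mismatch(x, n, h, alpha))
```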

Experiments - noise adaptive training

[Figure: Word Error Rate (%) against the value of the phase factor (0 to 2.5), comparing JUD-SGMM with the clean model, the MST model and the NAT model; WER axis 15-21.]

Summary

- Overview of subspace Gaussian mixture models
- Joint uncertainty decoding for noise robustness
- Adaptive training for multi-condition training data
- Experimental results demonstrate the effectiveness of this approach
- Future work: integrate the noise-robustness techniques with more advanced systems

Thanks for your attention!
