Noise Adaptive Training for Subspace Gaussian Mixture Models
Liang Lu, Arnab Ghoshal, Steve Renals
Centre for Speech Technology Research (CSTR), University of Edinburgh
Interspeech, August 2013
Outline

- Introduction
  - Subspace GMM (SGMM) acoustic model
  - Motivation
- Noise adaptive training
  - Adaptive training method
  - Experimental results
- Conclusion
Subspace Gaussian Mixture Models [Povey, et al. 2010]

[Figure: graphical model linking the shared parameters M_i, w_i, Sigma_i to the sub-state vectors v_jk of HMM states j-1, j, j+1]

Globally shared parameters:
- M_i: the projection matrix for the means
- w_i: the projection vector for the weights
- Sigma_i: the covariance matrix
- i = 1, ..., I: the subspace component index

State-dependent parameters:
- v_jk: a low-dimensional sub-state vector (e.g. 40-dim)

Gaussian mean: mu_jki = M_i v_jk
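The mean expansion mu_jki = M_i v_jk (together with the SGMM's log-linear weights, obtained by a softmax over w_i^T v_jk) can be sketched in a few lines of NumPy. All dimensions and values below are illustrative toy numbers, not the paper's configuration:

```python
import numpy as np

# Toy dimensions: D = feature dimension, S = subspace dimension (e.g. 40),
# I = number of globally shared components. Values are illustrative only.
rng = np.random.default_rng(0)
D, S, I = 39, 40, 4

M = rng.standard_normal((I, D, S))   # shared mean-projection matrices M_i
w = rng.standard_normal((I, S))      # shared weight-projection vectors w_i
v_jk = rng.standard_normal(S)        # sub-state vector for state j, sub-state k

# Gaussian means: mu_jki = M_i v_jk for each shared component i
mu = M @ v_jk                        # stacked matmul -> shape (I, D)

# Mixture weights: softmax of w_i^T v_jk over the I components
logits = w @ v_jk
weights = np.exp(logits - logits.max())
weights /= weights.sum()
```

Note that the sub-state vector v_jk is the only state-dependent quantity here; everything else is shared across all HMM states, which is what keeps the number of free parameters small despite the large effective number of Gaussians.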
Subspace Gaussian Mixture Models

The globally shared projections expand each sub-state vector into the full set of Gaussian means:

\[
\begin{bmatrix} m_{11} & m_{12} \\ \vdots & \vdots \\ m_{i1} & m_{i2} \end{bmatrix}
\cdot
\begin{bmatrix} v_1 \\ v_2 \end{bmatrix}
=
\begin{bmatrix} \mu_1 \\ \vdots \\ \mu_i \end{bmatrix}
\]
Subspace Gaussian Mixture Models

- Factorisation: mu = M v
- With a speaker subspace: mu = M v + N s
Subspace Gaussian Mixture Models

Typical features:
- Re-structures the HMM-GMM model parameters
- Smaller number of free model parameters
- Large number of Gaussian components
- Factorises the phonetic and speaker variability
- Outperforms GMM-based systems on several tasks, e.g. [D. Povey 2010, L. Burget 2010, L. Lu 2011]
Noise robustness

- Aurora 4 dataset
- GMM with 50K components
- SGMM with 6.4M components

[Diagram: a clean acoustic model is trained on clean speech x but decoded on noisy test speech y]
[Figure: WER (0-60%) of the GMM-clean, SGMM-clean, GMM-noisy and SGMM-noisy conditions]
Noise robustness

- SGMM with joint uncertainty decoding (JUD) [H. Liao, 2005]

[Diagram: comparison of the noise transforms T used by VTS and by JUD]
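As a rough illustration of JUD-style decoding (the general form in [H. Liao, 2005], not the exact SGMM formulation in this paper), each regression class r carries an affine feature transform (A_r, b_r) plus an uncertainty bias variance Sigma_b_r that is added to the model covariance. The function name and all values below are assumptions for illustration:

```python
import numpy as np

# Toy per-regression-class JUD parameters (illustrative values only)
D = 2
A_r = np.array([[0.9, 0.0], [0.1, 1.1]])   # feature transform
b_r = np.array([0.5, -0.2])                # bias
Sigma_b_r = 0.1 * np.eye(D)                # uncertainty bias variance

mu_m = np.zeros(D)                         # clean-model mean
Sigma_m = np.eye(D)                        # clean-model covariance
y = np.array([1.0, 0.3])                   # noisy observation

def jud_log_likelihood(y, A, b, Sigma_b, mu, Sigma):
    """log|A| + log N(A y + b; mu, Sigma + Sigma_b)."""
    x_hat = A @ y + b                      # transform noisy feature back
    S = Sigma + Sigma_b                    # inflate model variance
    diff = x_hat - mu
    _, logdet_S = np.linalg.slogdet(S)
    _, logdet_A = np.linalg.slogdet(A)
    quad = diff @ np.linalg.solve(S, diff)
    return logdet_A - 0.5 * (len(y) * np.log(2 * np.pi) + logdet_S + quad)

ll = jud_log_likelihood(y, A_r, b_r, Sigma_b_r, mu_m, Sigma_m)
```

Because the transform and the added variance are shared per regression class rather than per Gaussian, JUD is much cheaper than full per-component VTS compensation.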
Noise adaptation of clean speech model

- Adaptation with a noise-dependent transform for a specific noise condition
Noise adaptation of clean speech model

- Aurora 4 dataset (WER, %)
- A, B, C and D denote different noise conditions

Methods        A     B     C     D     Avg
Clean model    5.2   58.2  50.7  72.1  59.9
+JUD           5.1   13.1  12.0  23.2  16.8
Noise adaptation of multi-condition model

- What if the training data itself comes from the same types of noise condition (multi-condition training)?
Noise adaptation of multi-condition model

- The multi-condition (MST) model gives a better baseline
- However, with adaptation the MST model gives worse results than the adapted clean model

Methods        A     B     C     D     Avg
Clean model    5.2   58.2  50.7  72.1  59.9
+JUD           5.1   13.1  12.0  23.2  16.8
MST model      6.8   15.2  18.6  32.3  22.2
+JUD           7.4   13.3  14.7  24.1  17.6
Noise adaptive training scheme

- Iterative update of the acoustic model M and the noise transforms T
- Optimisation of Q(T; T_hat): in [Lu, et al, 2013]
- Optimisation of Q(M; M_hat): in this paper

{M, T}  --Q(T; T_hat)-->  {M, T_hat}  --Q(M; M_hat)-->  {M_hat, T_hat}

Lu, et al. "Joint Uncertainty Decoding for Noise-robust Subspace Gaussian Mixture Models", IEEE TASLP 2013.
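The alternating scheme above is coordinate ascent on the two sets of parameters. The quadratic objective below is a toy stand-in for the real auxiliary functions Q(T; T_hat) and Q(M; M_hat), used only to show the update structure:

```python
# Toy stand-in objective for the alternating NAT updates: hold the model m
# fixed while re-estimating the noise transform t, then hold t fixed while
# re-estimating m. A concave quadratic keeps each sub-step in closed form.
def objective(m, t):
    return -(m - t) ** 2 - (t - 2.0) ** 2

m, t = 0.0, 0.0
history = [objective(m, t)]
for _ in range(10):
    # Update noise transform with model fixed: argmax_t of the objective
    t = (m + 2.0) / 2.0
    # Update acoustic model with transform fixed: argmax_m of the objective
    m = t
    history.append(objective(m, t))

# Each alternating pass cannot decrease the objective (EM-style guarantee)
assert all(b >= a - 1e-12 for a, b in zip(history, history[1:]))
```

The same monotonicity argument is what justifies interleaving the transform and model updates in adaptive training: each step maximises its own auxiliary function with the other parameter set held fixed.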
Noise adaptive training - optimization

Optimisation of Q(M; M_hat):

- Gradient-based approach: for theta in M [Liao, et al 2007, Kalinli et al, 2010]

\[
\theta = \tilde{\theta} - \zeta \left[ \frac{\partial^2 Q(\cdot)}{\partial \theta^2} \right]^{-1} \left. \frac{\partial Q(\cdot)}{\partial \theta} \right|_{\theta = \tilde{\theta}} \quad (1)
\]

- EM-based approach, e.g. noisy-CMLLR [Kim, et al 2011]

\[
y_t = H^{(r)} x_t + g^{(r)} + e_t^{(r)} \longrightarrow P(x_t \mid y_t, r) \quad (2)
\]

- We used the EM-based approach for simplicity
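The Newton-like update in Eq. (1) can be sketched on a scalar toy objective. The real update applies to the SGMM projection parameters; here zeta is the step size and Q is a concave quadratic chosen only to make the update transparent:

```python
# Newton-like update of Eq. (1): theta <- theta~ - zeta * (d2Q)^{-1} * dQ,
# demonstrated on the toy concave objective Q(theta) = -(theta - 3)^2.

def dQ(theta):
    """Gradient of Q(theta) = -(theta - 3)^2."""
    return -2.0 * (theta - 3.0)

def d2Q(theta):
    """Second derivative of Q (constant for a quadratic)."""
    return -2.0

theta, zeta = 0.0, 1.0
for _ in range(5):
    theta = theta - zeta * dQ(theta) / d2Q(theta)

# For a quadratic Q and zeta = 1, the update reaches the optimum in one step
```

For the non-quadratic auxiliary functions that arise in practice, zeta < 1 damps the step, which is one reason the slide contrasts this scheme with the EM-based alternative that the authors adopted.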
Experiments - noise adaptive training

- With adaptive training, we obtained better results compared to the clean acoustic model

Methods        A     B     C     D     Avg
Clean model    5.2   58.2  50.7  72.1  59.9
+JUD           5.1   13.1  12.0  23.2  16.8
MST model      6.8   15.2  18.6  32.3  22.2
+JUD           7.4   13.3  14.7  24.1  17.6
NAT model      6.5   20.3  19.8  39.7  27.6
+JUD           6.1   11.3  11.9  22.4  15.7
Experiments - noise adaptive training

- Effect of the phase factor in the extended mismatch function y = f{x, n, h, alpha} [Deng, et al, 2004]

[Figure: Word Error Rate (%) (15-21) of JUD-SGMM with the clean model, as the phase factor varies from 0 to 2.5]
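The extended mismatch function with a phase factor alpha is commonly written in the log-spectral domain as exp(y) = exp(x + h) + exp(n) + 2 * alpha * exp((x + h + n) / 2), with alpha weighting the speech-noise cross term. A minimal sketch, assuming this form (the paper's exact cepstral-domain formulation may differ):

```python
import numpy as np

# Log-spectral-domain mismatch function with phase factor alpha, after the
# phase-sensitive model of [Deng, et al, 2004]. x = clean speech,
# n = additive noise, h = channel, all in the log domain (toy scalars).
def mismatch(x, n, h, alpha):
    d = n - x - h
    return x + h + np.log1p(np.exp(d) + 2.0 * alpha * np.exp(d / 2.0))

x, h, n = 1.0, 0.0, 0.5
# alpha = 0 recovers the standard phase-insensitive mismatch function
y0 = mismatch(x, n, h, 0.0)
y1 = mismatch(x, n, h, 1.0)
```

Varying alpha (as on the figure's horizontal axis) changes how much of the cross-term energy the model attributes to the noisy observation, which is why the WER curves are sensitive to its value.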
Experiments - noise adaptive training

[Figure: Word Error Rate (%) (15-21) against the phase factor (0 to 2.5) for JUD-SGMM with the clean, MST and NAT models]
Summary

- Overview of subspace Gaussian mixture models
- Joint uncertainty decoding for noise robustness
- Adaptive training for multi-condition training data
- Experimental results demonstrate the effectiveness of this approach
- Future work: integrate the noise-robustness technique with more advanced systems
Thanks for your attention!