A Bayesian Approach to Hidden Semi-Markov Model Based Speech Synthesis
Kei Hashimoto, Yoshihiko Nankaku, Keiichi Tokuda (Nagoya Institute of Technology)

1. Introduction
Bayesian approach to hidden Markov model (HMM) based speech synthesis [Hashimoto '09]:
・Reliable predictive distributions can be estimated by treating model parameters as random variables
・Appropriate model structures can be selected by maximizing the marginal likelihood
・Outperforms HMM-based speech synthesis based on the ML criterion

Problem: inconsistency between training and synthesis
Although speech is synthesized from HMMs with explicit state-duration probability distributions, the HMMs are trained without them.

Proposal: a Bayesian approach to hidden semi-Markov model (HSMM) based speech synthesis, in which state durations are modeled explicitly in both training and synthesis.

3. Bayesian approach to HSMM-based speech synthesis
HMM vs. HSMM: an HSMM is an HMM in which self-transitions are replaced by explicit state-duration probability distributions.
・$a_{ij}$ : state transition probability from the $i$-th to the $j$-th state
・$b_i(\mathbf{o}_t)$ : output probability of observation $\mathbf{o}_t$ from the $i$-th state
・$p_i(d)$ : duration probability of staying $d$ frames in the $i$-th state

Likelihood function of the HSMM (sum over state sequences $\mathbf{q}$ and duration sequences $\mathbf{d}$):
$P(\mathbf{O} \mid \Lambda) = \sum_{\mathbf{q}, \mathbf{d}} \prod_{k} a_{q_{k-1} q_k}\, p_{q_k}(d_k) \prod_{t = t_{k-1}+1}^{t_k} b_{q_k}(\mathbf{o}_t)$
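As a concrete illustration (a sketch added here, not part of the poster), the HSMM likelihood above can be computed by a duration-explicit forward recursion; the array layout and the maximum-duration cap `D` are assumptions of this toy implementation:

```python
# Toy duration-explicit forward recursion for the HSMM likelihood
# (illustrative sketch; variable names and the duration cap are assumptions).
import numpy as np

def hsmm_log_likelihood(obs_lp, trans_lp, dur_lp, init_lp):
    """obs_lp[t, i] = log b_i(o_t); trans_lp[i, j] = log a_ij (no self-loops);
    dur_lp[i, d-1] = log p_i(d); init_lp[i] = log initial state probability."""
    T, N = obs_lp.shape
    D = dur_lp.shape[1]  # maximum modeled duration
    # cum[t, i] = sum of obs_lp[0:t, i]; lets us score a whole segment at once
    cum = np.vstack([np.zeros((1, N)), np.cumsum(obs_lp, axis=0)])
    alpha = np.full((T, N), -np.inf)  # alpha[t, j]: a segment in state j ends at t
    for t in range(T):
        for j in range(N):
            terms = []
            for d in range(1, min(D, t + 1) + 1):
                seg = cum[t + 1, j] - cum[t + 1 - d, j]
                if t - d < 0:  # the segment starts the utterance
                    terms.append(init_lp[j] + dur_lp[j, d - 1] + seg)
                else:          # enter state j from some previous state
                    enter = np.logaddexp.reduce(alpha[t - d] + trans_lp[:, j])
                    terms.append(enter + dur_lp[j, d - 1] + seg)
            alpha[t, j] = np.logaddexp.reduce(terms)
    return np.logaddexp.reduce(alpha[T - 1])  # log P(O | Lambda)
```

Each candidate segment of length $d$ contributes its duration probability once and its output probabilities frame by frame, exactly as in the product above.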
Overview of HMM-based speech synthesis
[Figure: block diagram — training part: speech database → spectral/excitation parameter extraction → training of HMMs/HSMMs; synthesis part: text → text analysis → parameter generation from HMM/HSMM → excitation generation → synthesis filter → synthesized speech]
ML approach (training and synthesis):
・$\mathbf{q}$ : state sequence for training data ・$P(\mathbf{O} \mid \Lambda)$ : likelihood of training data
Bayesian approach:
・$\mathbf{z}$ : state sequence for synthesis data ・$P(\mathbf{x} \mid \Lambda)$ : likelihood of synthesis data ・$P(\Lambda)$ : prior distribution of model parameters
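To make the contrast explicit, the two criteria can be written as follows (a standard formulation, written out here for clarity; the symbols follow the definitions above):

```latex
% ML: point estimate of the parameters, then generation from the plug-in model
\hat{\Lambda}_{\mathrm{ML}} = \mathop{\mathrm{argmax}}_{\Lambda} P(\mathbf{O} \mid \Lambda),
\qquad
\hat{\mathbf{x}} = \mathop{\mathrm{argmax}}_{\mathbf{x}} P(\mathbf{x} \mid \hat{\Lambda}_{\mathrm{ML}})
% Bayes: marginalize the model parameters out of the predictive distribution
\hat{\mathbf{x}} = \mathop{\mathrm{argmax}}_{\mathbf{x}}
  \int P(\mathbf{x} \mid \Lambda)\, P(\Lambda \mid \mathbf{O})\, d\Lambda
```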
6. Experiments

Experimental conditions:
  Analysis window : 25 ms Hamming window / 5 ms shift

Compared models:
  Model        Topology   Model training   Model selection   Number of pdfs (duration pdfs)
  ML-HMM       HMM        ML               MDL                 87,267  (1,375)
  ML-HSMM      HSMM       ML               MDL                 88,287  (1,415)
  Bayes-HMM    HMM        Bayes            Bayes              745,969 (15,025)
  Bayes-HSMM   HSMM       Bayes            Bayes              744,955 (17,450)
[Figure: Subjective evaluation — mean opinion scores (5-point scale, 95% confidence intervals): ML-HMM 3.180, ML-HSMM 3.225, Bayes-HMM 3.355, Bayes-HSMM 3.630]

・$Q(\Lambda)$ : approximate distribution of the true posterior distribution
・$\langle \cdot \rangle_{Q}$ : calculation of expectation with respect to $Q$
・Speech quality was improved by using HSMMs
・Bayes-HSMM outperformed ML-HSMM

Subjective evaluation comparing model structures:
・The model structures of ML-HSMM and Bayes-HSMM were swapped to isolate the effect of the model structure
Bayesian approach:
・$\mathbf{x}$ : synthesis data sequence ・$\mathbf{l}$ : label sequence for synthesis
・Optimization can be performed effectively by iterative updates, as in the EM algorithm
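The "iterative updates as in the EM algorithm" can be sketched on a toy problem (an illustrative example added here, not the poster's actual variational updates): EM for a two-component 1-D Gaussian mixture, whose log-likelihood is non-decreasing across iterations.

```python
# Generic EM sketch on a toy two-component 1-D Gaussian mixture
# (illustration only; not the poster's HSMM/VB update equations).
import numpy as np

def em_gmm(x, iters=50):
    mu = np.array([x.min(), x.max()])            # crude initialization
    pi, var = np.array([0.5, 0.5]), np.array([x.var(), x.var()])
    ll_hist = []
    for _ in range(iters):
        # E-step: responsibility of each component for each sample
        dens = pi * np.exp(-0.5 * (x[:, None] - mu) ** 2 / var) / np.sqrt(2 * np.pi * var)
        ll_hist.append(np.log(dens.sum(axis=1)).sum())   # current log-likelihood
        r = dens / dens.sum(axis=1, keepdims=True)
        # M-step: re-estimate weights, means, variances from responsibilities
        nk = r.sum(axis=0)
        pi = nk / len(x)
        mu = (r * x[:, None]).sum(axis=0) / nk
        var = (r * (x[:, None] - mu) ** 2).sum(axis=0) / nk
    return mu, ll_hist
```

Each iteration provably does not decrease the likelihood, which is the property the bullet above relies on.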
Feature vector: 24 mel-cepstrum coefficients + Δ + ΔΔ, F0 + Δ + ΔΔ
Topology: 5-state left-to-right MSD-HMM and MSD-HSMM with single-Gaussian state output pdfs
Subjective evaluation: 5-point mean opinion score (20 sentences × 10 subjects)

・Estimation of the posterior distribution is based on maximizing the lower bound of the log marginal likelihood
・$\mathbf{O}$ : training data sequence ・$\mathbf{L}$ : label sequence for training
Predictive distribution of the Bayesian approach:
$P(\mathbf{x} \mid \mathbf{O}) = \int P(\mathbf{x} \mid \Lambda)\, P(\Lambda \mid \mathbf{O})\, d\Lambda$
・$\Lambda$ : model parameters ・$\mathbf{z}$ : state sequence
・The predictive distribution is used for both model training and speech parameter generation
・Solving the expectation calculation directly is difficult ⇒ Variational Bayesian method

Variational Bayesian method
・Estimation of approximated posterior distributions
・Define a lower bound $\mathcal{F}$ of the log marginal likelihood using Jensen's inequality:
$\log P(\mathbf{O} \mid \mathbf{L}) \ge \left\langle \log \frac{P(\mathbf{O}, \mathbf{z}, \Lambda \mid \mathbf{L})}{Q(\mathbf{z})\, Q(\Lambda)} \right\rangle_{Q(\mathbf{z})\, Q(\Lambda)} \equiv \mathcal{F}$

Generalized forward-backward algorithm
・The expectations over state sequences of the HSMM can be computed efficiently by the generalized forward-backward algorithm
・Partial forward and partial backward likelihoods can be computed recursively
・The Bayesian approach requires almost the same computational cost as the ML criterion

Experimental conditions:
  Database      : ATR Japanese speech database b-set
  Sampling rate : 16 kHz
  Training data : 450 utterances
  Test data     : 53 utterances
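To illustrate why treating the parameters as random variables yields more reliable predictive distributions (a toy example added here, not from the poster): for a 1-D Gaussian with unknown mean and variance under the standard noninformative prior, the Bayesian posterior predictive is a Student-t, which is less overconfident away from the sample mean than the ML plug-in Gaussian when training data are scarce.

```python
# Toy comparison of an ML plug-in density and a Bayesian posterior predictive
# (illustration; the Student-t form assumes the standard noninformative prior).
import math

def ml_plugin_logpdf(x, data):
    # Plug-in: Gaussian with ML mean/variance estimated from the data
    n = len(data)
    m = sum(data) / n
    v = sum((d - m) ** 2 for d in data) / n
    return -0.5 * math.log(2 * math.pi * v) - (x - m) ** 2 / (2 * v)

def bayes_predictive_logpdf(x, data):
    # Posterior predictive: Student-t with n-1 degrees of freedom,
    # location = sample mean, scale^2 = s^2 * (1 + 1/n)
    n = len(data)
    m = sum(data) / n
    s2 = sum((d - m) ** 2 for d in data) / (n - 1)
    df, scale2 = n - 1, s2 * (1 + 1 / n)
    z2 = (x - m) ** 2 / (df * scale2)
    return (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
            - 0.5 * math.log(df * math.pi * scale2)
            - (df + 1) / 2 * math.log(1 + z2))
```

With only a few samples, the heavier-tailed predictive hedges against parameter uncertainty instead of committing to a single over-fit point estimate.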
・$\alpha$ : partial forward likelihood ・$\beta$ : partial backward likelihood

[Figure: Subjective evaluation of model structures — mean opinion scores (5-point scale, 95% confidence intervals) for ML-MDL, ML-Bayes, Bayes-MDL, and Bayes-Bayes; reported scores: 2.990, 2.995, 3.040, 3.225]

Conclusions
・All processes are derived from a single predictive distribution
・Model parameters are treated as random variables
・The Bayesian approach overcomes the inconsistency between training and synthesis and estimates reliable predictive distributions
・The Bayesian approach performed better than the ML criterion and overcame the over-fitting problem

Acknowledgement: the EMIME project (http://www.emime.org)