Cross validation and Minimum Generation Error for improved model clustering in HMM-based TTS
ISCSLP(2012)
Abstract
In HMM-based speech synthesis, context-dependent hidden Markov models (HMMs) are widely used for their ability to synthesize highly intelligible and fairly smooth speech. However, training HMMs well for all possible contexts is difficult, or even impossible, because the training data inherently cannot cover every context. As a result, the trained models may overfit, and their ability to predict unseen contexts at test time is highly restricted. Recently, cross-validation (CV) has been applied to decision tree-based clustering with the maximum-likelihood (ML) criterion and has shown improved robustness in TTS synthesis. In this paper we generalize CV-based decision tree clustering to a different criterion, minimum generation error (MGE). Experimental results show that the generalization to MGE yields better TTS synthesis performance than the baseline systems.
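To make the idea of cross-validated tree clustering concrete, the following is a minimal sketch of how a single yes/no context question might be scored with K-fold CV under the ML criterion. It is not the authors' implementation: it assumes one-dimensional features and a single Gaussian per node, and the names `cv_split_gain`, `fit_gaussian`, and `gaussian_loglik` are hypothetical. An MGE variant would presumably replace the held-out log-likelihood with a generation-error measure, which is not shown here.

```python
import math
import random


def gaussian_loglik(data, mean, var):
    """Total log-likelihood of 1-D data under a Gaussian N(mean, var)."""
    return sum(
        -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)
        for x in data
    )


def fit_gaussian(data, var_floor=1e-3):
    """ML estimate of mean and variance, with a variance floor (assumed value)."""
    mean = sum(data) / len(data)
    var = sum((x - mean) ** 2 for x in data) / len(data)
    return mean, max(var, var_floor)


def cv_split_gain(samples, question, k=5, seed=0):
    """Held-out log-likelihood gain of splitting a node by `question`.

    samples  : list of (context, value) pairs
    question : function mapping a context to True/False
    Models are estimated on k-1 folds and evaluated on the remaining fold;
    the gain (split minus no split) is summed over all folds.
    """
    rng = random.Random(seed)
    idx = list(range(len(samples)))
    rng.shuffle(idx)
    folds = [idx[i::k] for i in range(k)]

    gain = 0.0
    for held in folds:
        held_set = set(held)
        train = [samples[i] for i in idx if i not in held_set]
        test = [samples[i] for i in held]

        parent = fit_gaussian([v for _, v in train])
        yes_train = [v for c, v in train if question(c)]
        no_train = [v for c, v in train if not question(c)]
        if not yes_train or not no_train:
            continue  # question does not split this fold; it adds no gain

        yes_model = fit_gaussian(yes_train)
        no_model = fit_gaussian(no_train)
        for c, v in test:
            child = yes_model if question(c) else no_model
            gain += gaussian_loglik([v], *child) - gaussian_loglik([v], *parent)
    return gain


# Toy data: the phone label determines the value, the index does not.
toy = []
for i in range(50):
    toy.append((("a", i), 0.0 + (i % 5) * 0.1))
    toy.append((("b", i), 10.0 + (i % 5) * 0.1))

informative = cv_split_gain(toy, lambda c: c[0] == "a")
uninformative = cv_split_gain(toy, lambda c: c[1] % 2 == 0)
```

In this sketch a tree builder would pick the question with the largest held-out gain and stop splitting when no question gives a positive gain, so the stopping point is decided by data the models were not trained on rather than by a hand-tuned threshold.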
Keywords
minimum generation error,pattern clustering,hmm-based synthesis,maximum likelihood estimation,speech synthesis,mge,cross validation,context-dependent hidden markov model,decision tree-based clustering,hmm-based tts,hidden markov models,decision trees,context clustering,maximum-likelihood criterion,hmm-based speech synthesis