Formant-Based Frequency Warping for Improving Speaker Adaptation in HMM TTS

11th Annual Conference of the International Speech Communication Association 2010 (INTERSPEECH 2010), Vols. 1-2 (2010)

Abstract
Vocal Tract Length Normalization (VTLN), usually implemented as a frequency warping procedure (e.g., a bilinear transformation), has been used successfully in speech recognition to adapt spectral characteristics to a target speaker. In this study we exploit the same concept of frequency warping but concentrate explicitly on mapping the first four formant frequencies of five long vowels between the source and target speakers. A universal warping function is thus constructed to improve the performance of MLLR-based speaker adaptation in TTS. The function first warps the frequency scale of the source speaker's speech data toward that of the target speaker, and an HMM is trained on the warped features. MLLR-based speaker adaptation is then applied to the trained HMM to synthesize the target speaker's speech. When tested on a database of 4,000 sentences from the source speaker and 100 sentences from a male and a female target speaker, formant-based frequency warping was found to be very effective in reducing the objective log spectral distortion relative to the system without formant frequency warping. The improvement is also confirmed subjectively in AB preference and ABX speaker similarity listening tests.
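The abstract does not give the functional form of the formant-based warping function, so the following is only a minimal sketch under stated assumptions, not the authors' method. It assumes that F1 to F4 are averaged over the measured vowels for each speaker, that these averages plus 0 Hz and the Nyquist frequency serve as anchor points, and that the warp is piecewise-linear between anchors. Unlike a single-parameter bilinear warp, a piecewise-linear map through several anchors can move each formant region separately. The helper names `formant_anchors`, `warp_frequency`, and `warp_spectrum` are hypothetical, and the formant values in the usage lines are illustrative, not data from the paper. The paper warps features before HMM training; operating directly on a linear-frequency magnitude spectrum, as done here, is a simplification.

```python
import numpy as np


def formant_anchors(src_formants, tgt_formants, nyquist):
    """Return (x, y) anchor frequencies in Hz for a piecewise-linear warp.

    src_formants / tgt_formants: shape (n_vowels, 4), F1..F4 measured on the
    same vowels for the source and target speakers. Averaging over vowels to
    obtain one "universal" anchor per formant is an assumption of this sketch.
    0 Hz and the Nyquist frequency are pinned so the map covers the full band.
    """
    src = np.asarray(src_formants, dtype=float).mean(axis=0)
    tgt = np.asarray(tgt_formants, dtype=float).mean(axis=0)
    x = np.concatenate(([0.0], src, [nyquist]))
    y = np.concatenate(([0.0], tgt, [nyquist]))
    # The map must be strictly increasing on both axes to be invertible.
    if np.any(np.diff(x) <= 0) or np.any(np.diff(y) <= 0):
        raise ValueError("anchor frequencies must be strictly increasing")
    return x, y


def warp_frequency(f, x, y):
    """Map source-speaker frequencies f (Hz) onto the target speaker's scale."""
    return np.interp(f, x, y)


def warp_spectrum(magnitude, x, y, nyquist):
    """Resample a source magnitude spectrum onto the warped frequency axis.

    The output bin at frequency f takes the source value at w^{-1}(f), so a
    spectral peak at a source formant F_s moves to the target position w(F_s).
    """
    bins = np.linspace(0.0, nyquist, len(magnitude))
    source_freqs = np.interp(bins, y, x)  # inverse of the piecewise-linear map
    return np.interp(source_freqs, bins, magnitude)


# Usage with hypothetical F1..F4 values (Hz) for two vowels per speaker.
src = [[700, 1200, 2500, 3500], [300, 2200, 3000, 3700]]
tgt = [[800, 1400, 2800, 3900], [360, 2500, 3300, 4100]]
x, y = formant_anchors(src, tgt, nyquist=8000.0)
print(warp_frequency(500.0, x, y))  # 580.0: mean source F1 maps to mean target F1
```

Linear interpolation keeps the map monotonic and trivially invertible, which is what allows the spectrum to be resampled through the inverse map; a smoother monotone interpolant could be substituted without changing the overall structure.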
Keywords: speech synthesis, speaker adaptation, frequency warping