The Tencent speech synthesis system for Blizzard Challenge 2020

Joint Workshop for the Blizzard Challenge and Voice Conversion Challenge 2020 (2020)

Abstract
This paper presents the Tencent speech synthesis system for Blizzard Challenge 2019. The corpus released to participants this year consists of about 8 hours of speech data from an internet talk show hosted by a well-known Chinese character. We built an end-to-end speech synthesis system for this task. First, a multi-speaker Tacotron-like acoustic model, fed with non-aligned linguistic features and BERT sentence embeddings, was employed for mel-spectrogram modeling. The model was then retrained on the provided corpus alone. Finally, in place of a conventional vocoder, a modified multi-speaker WaveNet model conditioned on the predicted mel features was trained to generate 16-bit speech waveforms at 24 kHz. To achieve higher quality, a channel embedding was incorporated into WaveNet. The evaluation results show that the submitted system performs well on various criteria, which indicates the superiority of our system.