Implementing Prosodic Phrasing In Chinese End-To-End Speech Synthesis

2019 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP)(2019)

Abstract
Text-to-speech (TTS) systems have evolved rapidly in recent years. With the modelling power of deep neural networks, researchers have achieved end-to-end conversion from raw text to speech. Various research projects have shown that end-to-end TTS systems can generate speech that sounds close to a human voice for English and other languages. For languages like Chinese, however, there are two problems to deal with. First, because the Chinese character set is so large, the end-to-end solution needs a small input set comparable in size to the English character set. Second, serious prosodic phrasing mistakes occur when the end-to-end method is applied to Chinese. In this paper, we propose a solution for an end-to-end Chinese TTS system based on Tacotron 2 and a WaveNet vocoder. We then add extra contextual information to improve prosodic phrasing performance. Our experiments demonstrate the effectiveness of this proposal.
Keywords
Chinese speech synthesis, Tacotron 2, WaveNet vocoder, end-to-end TTS, prosodic phrasing