Universal Grapheme-Based Speech Synthesis

16TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2015), VOLS 1-5(2015)

Abstract
Grapheme-to-phoneme conversion follows the text processing step in speech synthesis. Typically, lexicons or Letter-to-Sound rules are used to map graphemes to phonemes. However, in some languages, such resources may not be readily available. In this paper, we describe a universal front end that supports using grapheme information alone to build usable speech synthesis systems. This work takes advantage of an explicit mapping of Unicode characters from a wide range of scripts to a single phoneset to create support for building speech synthesizers for most languages in the world. We build voices for twelve languages across several language families and compare the efficacy of this front end to the baseline approach of treating every single grapheme as a separate phoneme for synthesis, and, in languages with higher resources, to front ends with linguistic knowledge. In addition, we improve our models by using Random Forests as opposed to using single Classification and Regression Trees. We find that the common universal front end performs better than the raw graphemes in general. We also find that using Random Forests leads to a significant improvement in synthesis quality, which is better than the quality of the knowledge-based front end in many cases.
Keywords
speech synthesis, lexicons, pronunciation, low resources
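
The contrast the abstract draws between the raw-grapheme baseline and a shared-phoneset front end can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration: the table `CHAR_TO_PHONES`, the function names, and the fallback behavior are hypothetical and are not taken from the paper, whose actual mapping covers far more scripts and characters.

```python
import unicodedata

# Hypothetical toy table (NOT the paper's mapping): a few Latin, Cyrillic,
# and Greek letters mapped onto one shared set of phone symbols.
CHAR_TO_PHONES = {
    "k": ["k"], "a": ["a"], "t": ["t"],   # Latin
    "к": ["k"], "а": ["a"], "т": ["t"],   # Cyrillic
    "κ": ["k"], "α": ["a"], "τ": ["t"],   # Greek
}


def _letters(word):
    """Yield the lowercased letters of a word after NFC normalization."""
    for ch in unicodedata.normalize("NFC", word).lower():
        if unicodedata.category(ch).startswith("L"):
            yield ch


def raw_grapheme_front_end(word):
    """Baseline: treat every grapheme as its own separate unit."""
    return list(_letters(word))


def universal_front_end(word, table=CHAR_TO_PHONES):
    """Map each grapheme to symbols from a single shared phoneset.

    Assumption: characters missing from the table fall back to themselves.
    """
    phones = []
    for ch in _letters(word):
        phones.extend(table.get(ch, [ch]))
    return phones


if __name__ == "__main__":
    for word in ("kat", "кат", "κατ"):
        print(word, raw_grapheme_front_end(word), universal_front_end(word))
```

In this toy setting the baseline produces a different symbol inventory for each script, while the table-based front end yields the same symbols for all three spellings, which is the kind of sharing across scripts that a single phoneset makes possible.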