Expressive multilingual speech synthesizer

2023 31st Telecommunications Forum (TELFOR) (2023)

Abstract
The research presented in the paper addresses the challenge of multilingual text-to-speech, with the specific aim of generating speech for a combination of speaker and language that is absent from the training dataset. This scenario is commonly referred to as cross-lingual speech synthesis. The models introduced in the paper accomplish cross-lingual speech synthesis by applying neural network embeddings not only to speaker and speaking-style identifiers but also to context-dependent phonemes and various prosodic events. This approach enables the model to capture relationships between phonemes and prosodic events across different languages, which makes it possible to synthesize speech in the voice of an individual who has never spoken the target language. Subjective and objective evaluations of various aspects of the synthesized speech have confirmed that the proposed models synthesize high-quality speech and maintain the properties of the original speaker's voice in a cross-lingual scenario.
Keywords
cross-lingual, embedding, neural networks, prosody, speech synthesis
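
The embedding idea described in the abstract can be illustrated with a minimal sketch. It is not the paper's architecture: the table names, the symbol inventories, the embedding size, and the use of random vectors in place of learned ones are all assumptions made for illustration, and context-dependent phonemes are simplified to plain phoneme labels. The sketch only shows why a shared phoneme and prosody embedding space lets any speaker be paired with any phoneme sequence at the input, including pairings never seen together in training.

```python
import random

EMBED_DIM = 4  # hypothetical embedding size, chosen only for readability


def make_table(symbols, dim, rng):
    """Map each symbol to a random vector (a stand-in for a learned embedding)."""
    return {s: [rng.uniform(-1.0, 1.0) for _ in range(dim)] for s in symbols}


rng = random.Random(0)

# Hypothetical inventories; a real system would use far larger ones.
speaker_table = make_table(["spk_a", "spk_b"], EMBED_DIM, rng)
style_table = make_table(["neutral", "happy"], EMBED_DIM, rng)
# One phoneme table shared by all languages, so related phonemes from
# different languages live in the same vector space.
phoneme_table = make_table(["a", "t", "s"], EMBED_DIM, rng)
prosody_table = make_table(["none", "stress", "phrase_break"], EMBED_DIM, rng)


def encode_unit(speaker, style, phoneme, prosody):
    """Concatenate the four embeddings into one input vector."""
    return (
        speaker_table[speaker]
        + style_table[style]
        + phoneme_table[phoneme]
        + prosody_table[prosody]
    )


def encode_utterance(speaker, style, units):
    """Encode a sequence of (phoneme, prosodic event) pairs for one speaker."""
    return [encode_unit(speaker, style, ph, pr) for ph, pr in units]


units = [("t", "none"), ("a", "stress")]
features_a = encode_utterance("spk_a", "happy", units)
features_b = encode_utterance("spk_b", "happy", units)
```

Swapping the speaker changes only the speaker slice of each vector, while the phoneme and prosody slices stay identical. Under these assumptions, that separation is what would allow a downstream acoustic model to combine a voice with linguistic content from a language the speaker never recorded.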