The 2006 TC-STAR Evaluation of the IBM Text-to-Speech Synthesis System

msra(2006)

Abstract
In this paper we present the evaluation of the IBM TTS system completed within the 2006 TC-STAR evaluation. Speech corpora from three speakers and two languages (two Castilian Spanish speakers and one UK English speaker) are used to develop concatenative TTS systems, and subjective measures are gathered to assess the performance on three major tasks: evaluation of the prosody generation component of the TTS system, full-system evaluation using clean-text inputs, and full-system evaluation using the outputs of ASR and SLT systems. We report very encouraging results for what is the first evaluation using the datasets provided. In general, the Spanish systems outperform the English system, with the most optimistic evaluation figure finding the performance of the synthetic male Spanish voice to be very close to the performance attained by natural speech. The evaluation also supports the feasibility of using TTS to convey word-level understanding of recognized and translated input, with the best system, based on recognized English input translated to Spanish, achieving a 3.0% human-transcription word error rate on synthesized speech.
Keywords
word error rate