Comparing the performance of individual articulatory flesh points for articulation-to-speech synthesis

Beiming Cao, Brian Y. Tsang, Jun Wang

Semantic Scholar (2019)

Abstract
Articulation-to-speech (ATS) synthesis has recently shown potential for silent speech interfaces (SSIs). SSIs are devices that assist oral communication for individuals who have lost their voice by mapping their articulatory movements to audible speech. The electromagnetic articulograph (EMA) is one of the articulator motion-tracking technologies currently used in SSIs; it captures the movement of flesh points on the articulators. Understanding how much individual flesh points contribute to ATS performance may help optimize the SSI setup. To our knowledge, this study is the first to explore the contribution of individual flesh points to ATS: we compared ATS performance using EMA data from different flesh-point combinations with a deep neural network (DNN)-based ATS model. Experimental results indicated that more flesh points generally lead to higher performance. However, our perception-based evaluation suggests that more than one tongue (tip) flesh point may be unnecessary for ATS.