Audiovisual-To-Articulatory Speech Inversion Using Active Appearance Models For The Face And Hidden Markov Models For The Dynamics

ICASSP (2008)

Abstract
We are interested in recovering aspects of the vocal tract's geometry and dynamics from auditory and visual speech cues. We approach the problem in a statistical framework based on Hidden Markov Models (HMMs) and demonstrate effective estimation of the trajectories followed by certain points of interest in the speech production system. Alternative fusion schemes are investigated to account for asynchrony between the modalities and to allow independent modeling of the dynamics of the involved streams. Visual cues are extracted from the speaker's face by means of Active Appearance Modeling. We report experiments on the QSMT database, which contains audio, video, and electromagnetic articulography data recorded in parallel. The results show that exploiting both audio and visual modalities in a multistream HMM-based scheme clearly improves performance relative to either audio-only or visual-only estimation.
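
To make the idea of multistream fusion concrete, the following is a minimal sketch of state-synchronous fusion, one common form of multistream HMM in which each state scores the audio and visual observations separately and combines the log-likelihoods with stream weights. It is not taken from the paper: the function names, the dictionary layout of a state, the diagonal-Gaussian emission model, and the default weight are all assumptions chosen for illustration, and the abstract does not say which fusion variant performed best.

```python
import numpy as np

def gaussian_logpdf(x, mean, var):
    """Log density of a diagonal-covariance Gaussian evaluated at x."""
    x, mean, var = (np.asarray(a, dtype=float) for a in (x, mean, var))
    return -0.5 * np.sum(np.log(2.0 * np.pi * var) + (x - mean) ** 2 / var)

def multistream_loglik(audio_obs, visual_obs, state, audio_weight=0.5):
    """Weighted sum of per-stream log-likelihoods for a single HMM state.

    `state` is a hypothetical dict holding per-stream Gaussian parameters.
    `audio_weight` lies in [0, 1]; the visual stream gets the remainder.
    """
    visual_weight = 1.0 - audio_weight
    log_audio = gaussian_logpdf(audio_obs, state["audio_mean"], state["audio_var"])
    log_visual = gaussian_logpdf(visual_obs, state["visual_mean"], state["visual_var"])
    return audio_weight * log_audio + visual_weight * log_visual

# Illustrative state with a 2-D audio stream and a 3-D visual stream.
example_state = {
    "audio_mean": [0.0, 0.0], "audio_var": [1.0, 1.0],
    "visual_mean": [0.0, 0.0, 0.0], "visual_var": [1.0, 1.0, 1.0],
}
score = multistream_loglik([0.0, 0.0], [0.0, 0.0, 0.0], example_state)
```

Setting `audio_weight` to 1 or 0 reduces the score to audio-only or visual-only modeling, which is the kind of baseline the abstract compares against. Schemes that tolerate asynchrony between the modalities relax the assumption made here that both streams share the same state at every frame.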
Keywords
speech inversion,Hidden Markov Models,audiovisual,articulatory,fusion