Hush-Hush Speak: Speech Reconstruction Using Silent Videos

INTERSPEECH(2019)

引用 12|浏览35
暂无评分
摘要
Speech Reconstruction is the task of recreation of speech using silent videos as input. In the literature, it is also referred to as lipreading. In this paper, we design an encoder-decoder architecture which takes silent videos as input and outputs an audio spectrogram of the reconstructed speech. The model, despite being a speaker-independent model, achieves comparable results on speech reconstruction to the current state-of-the-art speaker-dependent model. We also perform user studies to infer speech intelligibility. Additionally, we test the usability of the trained model using bilingual speech.
更多
查看译文
关键词
speech reconstruction, human-computer interaction, speech recognition, multi-view
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要