Single sensor audiovisual speech source separation

2017 Hands-free Speech Communications and Microphone Arrays (HSCMA), 2017

Abstract
Kernel Additive Modeling (KAM) is a recent and promising framework for separating underdetermined convolutive mixtures of audio signals. The method estimates the short-term Power Spectral Densities (PSDs) of the sources directly from the mixture by exploiting redundant features in each source's PSD, such as periodicity or smoothness. The separation itself is then performed with a generalized Wiener filter. This preliminary study evaluates the benefit of using video of the speaker's face to directly detect such redundancies in the speech, which could then be used within the KAM framework to extract the speech signal.
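
As a minimal sketch of the final separation step described in the abstract, assuming the source PSDs have already been estimated (by KAM or any other means), a single-channel Wiener filter can be written as a soft mask applied to the mixture STFT. The function name `wiener_separate`, the array shapes, and the random test data are assumptions for illustration and are not taken from the paper. The paper's generalized filter for convolutive mixtures is not reproduced here.

```python
import numpy as np


def wiener_separate(mix_stft, psds, eps=1e-12):
    """Single-channel Wiener filtering (illustrative, hypothetical helper).

    mix_stft: complex array of shape (F, T), the STFT of the mixture.
    psds:     nonnegative array of shape (J, F, T), one PSD estimate per source.
    Returns a complex array of shape (J, F, T) with one STFT estimate per source.
    """
    psds = np.asarray(psds, dtype=float)
    # Total power in each time-frequency bin; eps avoids division by zero.
    total = psds.sum(axis=0) + eps
    # Each source receives the fraction of the mixture given by its share of the power.
    masks = psds / total
    return masks * mix_stft[None, :, :]


# Toy data (assumed): 2 sources, 4 frequency bins, 5 time frames.
rng = np.random.default_rng(0)
F, T = 4, 5
mix = rng.standard_normal((F, T)) + 1j * rng.standard_normal((F, T))
psds = rng.uniform(0.1, 1.0, size=(2, F, T))
estimates = wiener_separate(mix, psds)
```

Because the masks sum to one in every time-frequency bin, the source estimates add back up to the mixture, which is a convenient sanity check on any PSD estimator plugged into this step.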
Keywords
Multimodality,Convolutive Informed Source Separation,Audiovisual,Wiener Filtering