Multi-Microphone Neural Speech Separation For Far-Field Multi-Talker Speech Recognition

2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2018

Cited by 123 | Views 88
Abstract
This paper describes a neural network approach to far-field speech separation using multiple microphones. The proposed approach is speaker-independent and implicitly learns to determine the number of speakers in an input speech mixture. This is realized with the permutation invariant training (PIT) framework, which was recently proposed for single-microphone speech separation. In this paper, PIT is extended to effectively leverage multi-microphone input, and it is combined with beamforming for better recognition accuracy. The effectiveness of the proposed approach is investigated through multi-talker speech recognition experiments that use a large quantity of training data and encompass a range of mixing conditions. The multi-microphone speech separation system significantly outperforms single-microphone PIT. Several aspects of the proposed approach are experimentally investigated.
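The core idea of PIT, as referenced in the abstract, is to resolve the speaker-label ambiguity by evaluating the training loss under every possible assignment of network outputs to reference speakers and back-propagating only the minimum. A minimal sketch of that loss computation is shown below; the array shapes, the use of a mean-squared-error criterion, and the function name are illustrative assumptions, not details taken from the paper.

```python
import itertools
import numpy as np

def pit_mse_loss(estimates, references):
    """Permutation invariant training (PIT) loss sketch.

    `estimates` and `references` are arrays of shape
    (num_speakers, ...), e.g. stacked spectrogram masks or signals
    (an assumed layout for illustration). The loss is the MSE under
    the best assignment of estimated sources to reference sources.
    """
    n = len(references)
    best = None
    for perm in itertools.permutations(range(n)):
        # Average per-speaker MSE under this particular assignment.
        loss = np.mean([np.mean((estimates[p] - references[i]) ** 2)
                        for i, p in enumerate(perm)])
        if best is None or loss < best:
            best = loss
    return best
```

Because the minimum is taken over all assignments, the network is free to emit the separated speakers in any order; the exhaustive search over permutations is factorial in the number of speakers, which is acceptable for the two- or three-talker mixtures typical of this task.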
Keywords
Speech separation, multi-talker speech recognition, far-field audio, cocktail party problem, neural networks, acoustic beamforming