Speech separation of a target speaker based on deep neural networks

Signal Processing (2014)

Abstract
This paper proposes a novel data-driven approach based on deep neural networks (DNNs) for single-channel speech separation. A DNN is adopted to directly model the highly non-linear relationship between the speech features of a target speaker and those of the mixed signals. Both supervised and semi-supervised scenarios are investigated. In the supervised mode, the identities of both the target speaker and the interfering speaker are provided, whereas in the semi-supervised mode only the target speaker is given. We propose mixing multiple speakers with the target speaker to train the DNN, which is shown to predict an unseen interferer well in the separation stage. Experimental results demonstrate that the proposed framework achieves better separation results than a GMM-based approach in the supervised mode. More significantly, in the semi-supervised mode, which is believed to be the preferred mode in real-world operations, the DNN-based approach even outperforms the GMM-based approach in the supervised mode.
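The semi-supervised training idea described in the abstract can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the log-magnitude spectral features, the noisy sinusoids standing in for speakers, the one-hidden-layer network, and all helper names (`log_spectrum`, `toy_speaker`, `build_pairs`, `train_mlp`) are hypothetical. The sketch mixes one target with several interferers, trains a regression network from mixed features to target features, and then applies it to a mixture containing an interferer not seen in training.

```python
import numpy as np

rng = np.random.default_rng(0)

def log_spectrum(signal, frame_len=64, hop=32):
    """Frame a 1-D signal and return log-magnitude spectra, one row per frame."""
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack([signal[i * hop:i * hop + frame_len] for i in range(n_frames)])
    spectrum = np.fft.rfft(frames * np.hanning(frame_len), axis=1)
    return np.log(np.abs(spectrum) + 1e-8)

def toy_speaker(freq, n=4000):
    """Hypothetical stand-in for a speaker: a noisy sinusoid, not real speech."""
    t = np.arange(n)
    return np.sin(2 * np.pi * freq * t) + 0.05 * rng.standard_normal(n)

def build_pairs(target, interferers):
    """Mix the target with each interferer; return (mixed, clean target) features."""
    clean = log_spectrum(target)
    mixed = [log_spectrum(target + s) for s in interferers]
    return np.vstack(mixed), np.vstack([clean] * len(interferers))

def train_mlp(x, y, hidden=32, lr=0.1, epochs=200):
    """One-hidden-layer regression net, full-batch gradient descent on MSE."""
    w1 = 0.1 * rng.standard_normal((x.shape[1], hidden))
    b1 = np.zeros(hidden)
    w2 = 0.1 * rng.standard_normal((hidden, y.shape[1]))
    b2 = np.zeros(y.shape[1])
    losses = []
    for _ in range(epochs):
        h = np.tanh(x @ w1 + b1)                 # hidden activations
        err = h @ w2 + b2 - y                    # prediction error
        losses.append(float(np.mean(err ** 2)))
        g_out = 2.0 * err / err.size             # dLoss/dPrediction
        g_hid = (g_out @ w2.T) * (1.0 - h ** 2)  # backprop through tanh
        w2 -= lr * (h.T @ g_out)
        b2 -= lr * g_out.sum(axis=0)
        w1 -= lr * (x.T @ g_hid)
        b1 -= lr * g_hid.sum(axis=0)
    return (w1, b1, w2, b2), losses

# Training: one known target mixed with several different interferers.
target = toy_speaker(0.05)
train_interferers = [toy_speaker(f) for f in (0.11, 0.17, 0.23)]
x_train, y_train = build_pairs(target, train_interferers)

# Standardize features so plain gradient descent behaves well (an assumption).
x_mean, x_std = x_train.mean(axis=0), x_train.std(axis=0) + 1e-8
y_mean, y_std = y_train.mean(axis=0), y_train.std(axis=0) + 1e-8
params, losses = train_mlp((x_train - x_mean) / x_std, (y_train - y_mean) / y_std)

# Separation: the interferer below never appeared during training.
unseen = toy_speaker(0.31)
x_test = (log_spectrum(target + unseen) - x_mean) / x_std
w1, b1, w2, b2 = params
estimate = (np.tanh(x_test @ w1 + b1) @ w2 + b2) * y_std + y_mean

print("first and last training loss:", losses[0], losses[-1])
print("estimated target feature shape:", estimate.shape)
```

In the supervised mode described above, the training mixtures would instead use the one known interfering speaker. This sketch makes no claim about the separation quality or the network configuration reported in the paper.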
Keywords
Gaussian processes, feature extraction, mixture models, neural nets, speaker recognition, speech processing, DNN, GMM-based approach, data-driven approach, deep neural networks, interfering speaker, mixed signals, semisupervised mode, speech features, speech separation, target speaker, semi-supervised mode, single-channel speech separation, supervised mode