Speaker-Aware Monaural Speech Separation.

INTERSPEECH (2020)

Abstract
Predicting Time-Frequency (T-F) masks and applying them to mixture signals has been used successfully for speech separation. However, existing studies have not made good use of a speaker's identity context when inferring masks. In this paper, we propose a novel speaker-aware monaural speech separation model. We first devise an encoder that disentangles speaker identity information, supervised by an auxiliary speaker verification task. Then, we develop a spectrogram masking network to predict speaker masks, which are applied to the mixture signal to reconstruct the source signals. Experimental results on two WSJ0 mixed datasets demonstrate that our proposed model outperforms existing models in different separation scenarios.
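As a rough illustration of the T-F masking step mentioned in the abstract, the sketch below multiplies a mixture spectrogram element-wise by one mask per speaker. It is not the paper's model: the masks here are oracle ratio masks computed from known toy sources rather than predicted by a network, and the names `ratio_masks`, `apply_mask`, `speaker_a`, and `speaker_b` are hypothetical. It also assumes the sources add in phase, so the mixture is simply their sum.

```python
def ratio_masks(source_mags, eps=1e-8):
    """Return one mask per source: |S_i| / (sum_j |S_j| + eps) for each T-F bin."""
    n_frames = len(source_mags[0])
    n_bins = len(source_mags[0][0])
    masks = []
    for mags in source_mags:
        mask = [
            [mags[t][f] / (sum(s[t][f] for s in source_mags) + eps)
             for f in range(n_bins)]
            for t in range(n_frames)
        ]
        masks.append(mask)
    return masks


def apply_mask(mask, mixture):
    """Element-wise product of a real-valued mask and a complex mixture spectrogram."""
    return [[m * x for m, x in zip(mask_row, mix_row)]
            for mask_row, mix_row in zip(mask, mixture)]


# Toy magnitudes: 2 frames x 3 frequency bins for two hypothetical speakers.
speaker_a = [[1.0, 0.0, 2.0], [0.5, 0.5, 0.0]]
speaker_b = [[1.0, 3.0, 0.0], [0.5, 1.5, 4.0]]

# Assumption: sources add in phase, so the mixture is the sum of the magnitudes.
mixture = [[complex(a + b, 0.0) for a, b in zip(row_a, row_b)]
           for row_a, row_b in zip(speaker_a, speaker_b)]

# Oracle masks stand in for the masks a separation network would predict.
masks = ratio_masks([speaker_a, speaker_b])
estimate_a = apply_mask(masks[0], mixture)
estimate_b = apply_mask(masks[1], mixture)
```

In a real system the masked spectrograms would be passed through an inverse short-time Fourier transform to obtain time-domain waveforms; that step is omitted here.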
Keywords
Speech separation, disentangled representations, speaker identity, Time-Frequency mask