FurcaNeXt: End-to-End Monaural Speech Separation with Dynamic Gated Dilated Temporal Convolutional Networks

Multimedia Modeling (MMM 2020), Part I (2020)

Abstract
Deep dilated temporal convolutional networks (TCNs) have proven to be very effective in sequence modeling. In this paper we propose several improvements to TCNs for end-to-end monaural speech separation: (1) a multi-scale dynamic weighted gated TCN with a pyramidal structure (FurcaPy), (2) a gated TCN with intra-parallel convolutional components (FurcaPa), (3) a weight-shared multi-scale gated TCN (FurcaSh), and (4) a dilated TCN with a gated subtractive-convolutional component (FurcaSu). All of these networks take the mixed utterance of two speakers and map it to two separated utterances, each containing only one speaker's voice. For the objective, we propose to train the networks by directly optimizing the utterance-level signal-to-distortion ratio (SDR) in a permutation invariant training (PIT) style. Our experiments on the public WSJ0-2mix corpus yield an 18.4 dB SDR improvement, showing that the proposed networks lead to performance gains on the speaker separation task.
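As a rough illustration of the two ingredients named in the abstract, the sketch below shows a gated dilated 1-D convolution block and an utterance-level SDR objective trained with two-speaker permutation invariant training. This is a minimal PyTorch sketch under our own assumptions, not the authors' implementation; the names GatedDilatedConvBlock, sdr, and pit_sdr_loss are hypothetical.

```python
# Minimal sketch (not the paper's code): a gated dilated 1-D conv block and a
# two-speaker PIT loss on utterance-level SDR. All names are hypothetical.
import torch
import torch.nn as nn


class GatedDilatedConvBlock(nn.Module):
    """One dilated TCN block with a multiplicative (sigmoid) gate and a residual path."""

    def __init__(self, channels: int, kernel_size: int = 3, dilation: int = 1):
        super().__init__()
        pad = (kernel_size - 1) * dilation // 2  # keep sequence length with odd kernels
        self.filter_conv = nn.Conv1d(channels, channels, kernel_size,
                                     dilation=dilation, padding=pad)
        self.gate_conv = nn.Conv1d(channels, channels, kernel_size,
                                   dilation=dilation, padding=pad)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Gated activation: tanh(filter) * sigmoid(gate), added back to the input.
        out = torch.tanh(self.filter_conv(x)) * torch.sigmoid(self.gate_conv(x))
        return x + out


def sdr(est: torch.Tensor, ref: torch.Tensor, eps: float = 1e-8) -> torch.Tensor:
    """Utterance-level SDR in dB between estimated and reference waveforms of shape [B, T]."""
    num = torch.sum(ref ** 2, dim=-1)
    den = torch.sum((ref - est) ** 2, dim=-1)
    return 10.0 * torch.log10((num + eps) / (den + eps))


def pit_sdr_loss(est1, est2, ref1, ref2):
    """Negative SDR under the better of the two speaker assignments (2-speaker PIT)."""
    perm_a = 0.5 * (sdr(est1, ref1) + sdr(est2, ref2))
    perm_b = 0.5 * (sdr(est1, ref2) + sdr(est2, ref1))
    best = torch.max(perm_a, perm_b)  # elementwise best permutation per utterance
    return -best.mean()
```

Maximizing SDR is done here by minimizing its negative under the best speaker permutation, which is the standard way utterance-level PIT objectives are formulated for two-speaker mixtures.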
Keywords
Speech separation, Cocktail party problem, Sequence modeling, Temporal convolutional networks, Permutation invariant training