On Training Speech Separation Models With Various Numbers of Speakers.

Hyeonseung Kim, Jong Won Shin

IEEE Signal Process. Lett. (2023)

Abstract
Many monaural speech separation models assume that the exact number of speakers is known in advance, an assumption that does not hold in many real-world scenarios. To deal with an unknown number of speakers, previous approaches either iteratively separate one speech signal at a time, or employ a more relaxed assumption that the maximum number of speakers is known a priori and set the number of outputs accordingly. In the latter case, when the number of speakers in the mixture is smaller than the number of outputs, the extra outputs that are not mapped onto signals in the input mixture are trained to produce predefined target signals such as silence or the input mixture. In this letter, for separation models with a fixed number of output channels, we propose to ignore the extra outputs during training instead of evaluating the cost against a predefined target. We also introduce a method to select valid output signals. Experimental results showed that assigning any type of predefined target degraded separation performance compared with ignoring the extra outputs.
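The idea of ignoring the extra outputs can be illustrated with a minimal sketch of a permutation-invariant training loss in which only the outputs matched to real speakers contribute to the cost. This is not the authors' implementation: the function names `mse` and `partial_pit_loss` are hypothetical, mean squared error stands in for whatever separation cost the letter actually uses, and the exhaustive search over assignments is chosen for clarity rather than efficiency. The output selection method mentioned in the abstract is not covered here.

```python
from itertools import permutations


def mse(estimate, target):
    """Mean squared error between two equal-length signals.

    Assumption: a stand-in pair loss, not necessarily the cost used in the letter.
    """
    return sum((e - t) ** 2 for e, t in zip(estimate, target)) / len(target)


def partial_pit_loss(outputs, targets, pair_loss=mse):
    """Permutation-invariant loss that ignores unmatched output channels.

    outputs: K estimated signals (the fixed number of model outputs).
    targets: N reference signals present in the mixture, with 1 <= N <= K.
    Returns (loss, assignment), where assignment[i] is the index of the
    output matched to targets[i]. The K - N remaining outputs add no cost.
    """
    num_outputs, num_targets = len(outputs), len(targets)
    if not 1 <= num_targets <= num_outputs:
        raise ValueError("need between 1 and len(outputs) targets")
    best_loss, best_assignment = float("inf"), None
    # Enumerate every ordered choice of N distinct outputs out of K.
    for assignment in permutations(range(num_outputs), num_targets):
        loss = sum(
            pair_loss(outputs[o], targets[i]) for i, o in enumerate(assignment)
        ) / num_targets
        if loss < best_loss:
            best_loss, best_assignment = loss, assignment
    return best_loss, best_assignment


# Three output channels, but only two speakers in the mixture.
outputs = [[0.9, 0.1, 0.0], [5.0, -3.0, 2.0], [0.0, 1.0, 1.0]]
targets = [[0.0, 1.0, 1.0], [1.0, 0.0, 0.0]]
loss, assignment = partial_pit_loss(outputs, targets)
print(assignment, loss)
```

In this toy case the second output channel is left unmatched, so its content has no effect on the loss. A predefined-target variant would instead add a cost term comparing that channel with silence or with the input mixture.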
Keywords
Speaker counting, speech separation, unknown number of speakers