Blind Language Separation: Disentangling Multilingual Cocktail Party Voices by Language

Conference of the International Speech Communication Association (INTERSPEECH)(2022)

引用 0|浏览2
暂无评分
摘要
We introduce blind language separation (BLS) as novel research task, in which we seek to disentangle overlapping voices of multiple languages by language. BLS is expected to separate seen as well as unseen languages, which is different from the target language extraction task that works for one seen target language at a time. To develop a BLS model, we simulate a multilingual cocktail party database, of which each scene consists of two randomly selected languages, each represented by two randomly selected speakers. The database follows the recently proposed GlobalPhoneMCP database design concept that uses the audio data of the GlobalPhone 2000 Speaker Package. We show that a BLS model is able to learn the language characteristics so as to disentangle overlapping voices by language. We achieve a mean SI-SDR improvement of 12.63 dB over 231 test sets. The performance on the individual test sets varies depending on the language combination. Finally, we show that BLS can generalize well to unseen speakers and languages in the mixture.
更多
查看译文
关键词
Blind language separation, selective auditory, attention, multilingual, GlobalPhone, cocktail party problem
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要