GlobalPhone Mix-To-Separate Out of 2 - A Multilingual 2000 Speakers Mixtures Database for Speech Separation.

Interspeech(2021)

引用 3|浏览14
暂无评分
摘要
Monaural speech separation has been well studied on various databases. However, these databases mostly concern English speech. Research in multi-speaker scenarios, such as speech recognition, speaker recognition, speaker diarization, and speech separation calls for speaker mixtures databases comprising multiple languages. In this paper, we propose a new extensive multilingual database for speech separation tasks derived from the GlobalPhone 2000 Speaker Package, called "GlobalPhone Mix-to-Separate out of 2" (GlobalPhoneMS2). We describe the construction of the database and conduct speech separation experiments in monolingual and multilingual as well as seen and unseen languages settings. When trained on a multilingual dataset, the networks improve their performances for unseen languages, and across almost all seen languages. We show that replacing a monolingual dataset with a trilingual one, while keeping the data size roughly the same, helps to improve the performance in most cases. We attribute this to a larger diversity in speech, language, speaker, and recording characteristics. Based on the GlobalPhoneMS2 database, speech separation results for two-speaker mixing scenarios are reported in 22 spoken languages for the first time.
更多
查看译文
关键词
GlobalPhone,speech separation,cocktail party problem,multilingual,crosslingual
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要