Experts Versus All-Rounders: Target Language Extraction for Multiple Target Languages

IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2022

Abstract
Target language extraction (TLE) is a novel task in the field of selective auditory attention that seeks to extract, from a multilingual cocktail party, all speech signals spoken in a target language while suppressing the other sources. In our prior studies, a TLE model was trained to extract a single, predefined target language; we refer to this as Single-TLE. In this paper, we extend the Single-TLE framework to Multi-TLE. A Multi-TLE model likewise extracts all speech signals of one specific target language, but it is optimized on a set of multiple target languages during training. As a result, it learns the characteristics of several target languages and can replace multiple Single-TLE models without retraining. We perform experiments on the GlobalPhoneMCP database and incorporate a dynamic language mixing scheme for training. The Multi-TLE model not only outperforms Single-TLE models but, when given a language ID as an additional input, can also extract the speech of a specific target language from a mixture that contains multiple learned target languages.
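The abstract does not spell out how the dynamic language mixing scheme works. As a rough illustration only, the Python sketch below assumes one plausible reading of the Multi-TLE training setup: each example is generated on the fly by choosing a target language from the learned set, summing utterances in that language to form the reference, adding interfering utterances from other languages (which may include other learned targets), and encoding the chosen target as a one-hot language ID. The function names, the one-hot encoding, and the `n_target`/`n_interf` parameters are hypothetical and are not taken from the paper.

```python
import random
from typing import Dict, List, Tuple

Signal = List[float]


def mix_signals(signals: List[Signal], length: int) -> Signal:
    """Sum signals sample by sample, truncating or zero-padding each to `length`."""
    out = [0.0] * length
    for s in signals:
        for i, x in enumerate(s[:length]):
            out[i] += x
    return out


def sample_multi_tle_example(
    corpus: Dict[str, List[Signal]],  # hypothetical: language -> list of utterances
    target_langs: List[str],          # the set of learned target languages
    n_target: int,                    # hypothetical: target-language utterances per mixture
    n_interf: int,                    # hypothetical: interfering utterances per mixture
    length: int,
    rng: random.Random,
) -> Tuple[Signal, Signal, List[float]]:
    """Assumed on-the-fly mixing: returns (mixture, reference, one-hot language ID)."""
    # Pick which learned target language this example asks the model to extract.
    target = rng.choice(target_langs)
    # The reference is the sum of ALL target-language speech in the mixture.
    tgt_utts = [rng.choice(corpus[target]) for _ in range(n_target)]
    # Interferers come from any other language, including other learned targets.
    others = [lang for lang in corpus if lang != target]
    interf_utts = [rng.choice(corpus[rng.choice(others)]) for _ in range(n_interf)]
    mixture = mix_signals(tgt_utts + interf_utts, length)
    reference = mix_signals(tgt_utts, length)
    # Assumed conditioning input: a one-hot vector over the learned target languages.
    lang_id = [1.0 if lang == target else 0.0 for lang in target_langs]
    return mixture, reference, lang_id
```

Because the target language is redrawn for every example, the same mixture-generation code serves all learned target languages; under this assumption, a Single-TLE setup corresponds to `target_langs` containing just one language.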
Keywords
Target language extraction, selective auditory attention, multilingual, GlobalPhone, cocktail party problem