Adaptive-SpEx: Local and Global Perceptual Modeling with Speaker Adaptation for Target Speaker Extraction.

Xianbo Xu, Diqun Yan, Li Dong

2023 IEEE International Conference on Systems, Man, and Cybernetics (SMC) (2023)

Abstract
Target speaker extraction aims to extract a target speaker's speech from a multi-talker environment with the help of the target speaker's reference speech. However, simple fusion of different features and purely local perceptual modeling limit extraction performance. In this work, we propose a new speaker extraction model called Adaptive-SpEx. It fully exploits the correlation between the mixed speech features and the speaker embedding, and it uses a dual-path structure for local and global perceptual modeling. We evaluate the model on the WSJ0-2mix-extr dataset in terms of the quality of the reconstructed signal. Experimental results show that the proposed model outperforms the baseline systems on WSJ0-2mix-extr and generalizes better to the Libri-2talker dataset. Furthermore, the proposed model significantly reduces the word error rate of speech recognition on mixed speech, from 79.49% to 32.73%.
Keywords
Target speaker extraction, dual-path structure, speaker embedding, speech recognition
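
The abstract reports word error rates but does not say which recognizer or scoring tool produced them. As a rough illustration only, the sketch below uses the standard definition of word error rate (word-level edit distance divided by the number of reference words) and computes the relative reduction implied by the two reported figures. The function names `word_error_rate` and `relative_reduction` are hypothetical and are not taken from the paper.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words.

    Assumption: whitespace tokenization and unit cost for substitutions,
    insertions, and deletions. The paper's scoring setup may differ.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1      # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


def relative_reduction(before: float, after: float) -> float:
    """Fraction by which a rate dropped, relative to its starting value."""
    return (before - after) / before


if __name__ == "__main__":
    # Toy sentences, not data from the paper.
    print(word_error_rate("the cat sat down", "the dog sat"))
    # Figures reported in the abstract: 79.49% before, 32.73% after.
    print(relative_reduction(79.49, 32.73))
```

With the reported figures, the drop is 46.76 percentage points in absolute terms, which is a relative reduction of roughly 59%.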