Barlow Twins self-supervised learning for robust speaker recognition

Conference of the International Speech Communication Association (INTERSPEECH), 2022

Abstract
Acoustic noise is a major challenge for speaker recognition systems. State-of-the-art speaker recognition systems rely on deep neural network speaker embedding extractors, known as x-vector extractors, so a noise-robust x-vector extractor is in high demand. In this paper, we introduce the Barlow Twins self-supervised loss function to the area of speaker recognition. The Barlow Twins objective function optimizes two criteria. First, it increases the similarity between two versions of the same signal (i.e., the clean signal and its augmented noisy version) to make the speaker embedding invariant to acoustic noise. Second, it reduces the redundancy between the dimensions of the x-vectors, which improves the overall quality of the speaker embeddings. In our research, the Barlow Twins objective function is integrated into a ResNet-based speaker embedding system. In the proposed system, the Barlow Twins objective function is calculated at the embedding layer and is optimized jointly with the speaker classifier loss function. Experimental results on the Fabiole corpus show a 22% relative gain in equal error rate (EER) in clean environments and an 18% improvement in the presence of noise with low SNR and reverberation.
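
The two criteria described in the abstract can be illustrated with a minimal NumPy sketch. It follows the commonly used Barlow Twins formulation (a cross-correlation matrix between the two views, pushed toward the identity); the abstract does not give the exact formula or any weights, so the function names, `off_diag_weight`, and `bt_weight` below are assumptions chosen for illustration, not the authors' settings.

```python
import numpy as np


def barlow_twins_loss(z_clean, z_noisy, off_diag_weight=5e-3, eps=1e-8):
    """Barlow Twins loss for two batches of embeddings with shape (batch, dim).

    z_clean / z_noisy stand for x-vectors of the clean signal and of its
    noisy augmentation. off_diag_weight is a hypothetical trade-off value.
    """
    n = z_clean.shape[0]
    # Standardize every embedding dimension over the batch.
    za = (z_clean - z_clean.mean(axis=0)) / (z_clean.std(axis=0) + eps)
    zb = (z_noisy - z_noisy.mean(axis=0)) / (z_noisy.std(axis=0) + eps)
    # Cross-correlation matrix between the two views, shape (dim, dim).
    c = za.T @ zb / n
    diag = np.diag(c)
    # Criterion 1 (invariance): diagonal entries should be close to 1.
    invariance = np.sum((diag - 1.0) ** 2)
    # Criterion 2 (redundancy reduction): off-diagonal entries close to 0.
    redundancy = np.sum(c ** 2) - np.sum(diag ** 2)
    return float(invariance + off_diag_weight * redundancy)


def joint_loss(classifier_loss, bt_loss, bt_weight=1.0):
    """Combine the speaker classifier loss with the Barlow Twins term.

    A weighted sum is one plausible way to optimize the two jointly;
    bt_weight is a hypothetical hyperparameter.
    """
    return classifier_loss + bt_weight * bt_loss
```

In an actual training setup the same computation would be written in an automatic-differentiation framework so that gradients flow back into the embedding network; NumPy is used here only to keep the arithmetic visible.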
Keywords
Speaker recognition, ResNet, Barlow Twins, Robustness