A Comprehensive Study on Self-Supervised Distillation for Speaker Representation Learning

2022 IEEE Spoken Language Technology Workshop (SLT), 2023

Abstract
In real application scenarios, it is often challenging to obtain a large amount of labeled data for speaker representation learning due to speaker privacy concerns. Self-supervised learning, which requires no labels, has become an increasingly promising way to address this problem. Compared with contrastive learning, self-distillation approaches use only positive samples in the loss function and are thus more attractive. In this paper, we present a comprehensive study on self-distilled self-supervised speaker representation learning, with a particular focus on the critical role of data augmentation. Our proposed audio perturbation augmentation strategy pushes speaker representation performance to a new limit. The experimental results show that our model achieves a new SoTA on the VoxCeleb1 speaker verification evaluation benchmark (i.e., equal error rates (EER) of 2.505%, 2.473%, and 4.791% on the Vox1-O, Vox1-E, and Vox1-H trials, respectively), without using any speaker labels in the training phase.
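The audio perturbation the abstract refers to is commonly realized as speed perturbation: resampling an utterance by a small factor changes both tempo and pitch, so the perturbed copy can serve as a pseudo-speaker during self-supervised training. The paper does not specify its implementation; below is a minimal NumPy sketch under that assumption, where `speed_perturb` is a hypothetical helper, not code from the paper.

```python
import numpy as np

def speed_perturb(wave: np.ndarray, factor: float) -> np.ndarray:
    """Resample a mono waveform by `factor` via linear interpolation.

    factor > 1.0 speeds the audio up (shorter output); factor < 1.0
    slows it down. The accompanying pitch shift is what makes the
    perturbed utterance resemble a different speaker.
    """
    n_out = int(round(len(wave) / factor))
    # Fractional sample positions in the original signal.
    positions = np.linspace(0, len(wave) - 1, num=n_out)
    return np.interp(positions, np.arange(len(wave)), wave)

# Example: a 1-second 440 Hz tone at 16 kHz, perturbed by a 1.1x factor.
sr = 16000
t = np.arange(sr) / sr
tone = np.sin(2 * np.pi * 440 * t)
faster = speed_perturb(tone, 1.1)
```

In practice the perturbation factor is drawn from a small set (e.g., around 0.9–1.1) per training sample, and production pipelines typically use a polyphase resampler (such as the one in torchaudio or SoX) rather than plain linear interpolation.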
Keywords
Speaker representation learning, self-supervised learning, self-distillation, audio perturbation