Channel Invariant Speaker Embedding Learning With Joint Multi-Task And Adversarial Training

Zhengyang Chen,Shuai Wang,Yanmin Qian,Kai Yu

2020 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING（2020）

引用 17|浏览109

暂无评分

摘要

Using deep neural network to extract speaker embedding has significantly improved the speaker verification task. However, such embeddings are still vulnerable to channel variability. Previous works have used adversarial training to suppress channel information to extract channel-invariant embedding and achieved a significant improvement. Inspired by the successful joint multi-task and adversarial training with phonetic information for phonetic-invariant speaker embedding learning, in this paper, a similar methodology is developed to suppress the channel variability. By treating the recording devices or environments as the channel variability, two individual experiments are carried out, and consistent performance improvement is observed in both cases. The best performance is obtained by sequentially applying multi-task training at the statistics pooling layer and adversarial training at the embedding layer, achieving 10.77% and 9.37% relative improvements in terms of EER compared to the baselines, for the recording environments or devices level, respectively.

查看译文

关键词

channel information, adversarial training, multi-task learning, text-dependent speaker verification

AI 理解论文

溯源树

样例

生成溯源树，研究论文发展脉络

Chat Paper

正在生成论文摘要