Channel Invariant Speaker Embedding Learning With Joint Multi-Task And Adversarial Training

2020 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING(2020)

引用 17|浏览109
暂无评分
摘要
Using deep neural network to extract speaker embedding has significantly improved the speaker verification task. However, such embeddings are still vulnerable to channel variability. Previous works have used adversarial training to suppress channel information to extract channel-invariant embedding and achieved a significant improvement. Inspired by the successful joint multi-task and adversarial training with phonetic information for phonetic-invariant speaker embedding learning, in this paper, a similar methodology is developed to suppress the channel variability. By treating the recording devices or environments as the channel variability, two individual experiments are carried out, and consistent performance improvement is observed in both cases. The best performance is obtained by sequentially applying multi-task training at the statistics pooling layer and adversarial training at the embedding layer, achieving 10.77% and 9.37% relative improvements in terms of EER compared to the baselines, for the recording environments or devices level, respectively.
更多
查看译文
关键词
channel information, adversarial training, multi-task learning, text-dependent speaker verification
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要