Multi-query multi-head attention pooling and Inter-topK penalty for speaker verification

Miao Zhao, Yanxing Ma, Yuanyuan Ding, Yu Zheng, Min Liu, Min Xu

arXiv (Cornell University), 2021

Abstract
This paper describes the multi-query multi-head attention (MQMHA) pooling and inter-topK penalty methods, which were first proposed in our submitted system description for the VoxCeleb Speaker Recognition Challenge (VoxSRC) 2021. Most multi-head attention pooling mechanisms either attend to the whole feature through multiple heads or attend to several split parts of the whole feature. Our proposed MQMHA combines these two mechanisms and gains more diversified information. Margin-based softmax loss functions are commonly adopted to obtain discriminative speaker representations. To further enhance inter-class discriminability, we propose a method that adds an extra inter-topK penalty on easily confused speakers. By adopting both MQMHA and the inter-topK penalty, we achieved state-of-the-art performance on all of the public VoxCeleb test sets.
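To make the pooling idea concrete, the following is a minimal PyTorch sketch of an MQMHA-style pooling layer, based only on the abstract's description: the frame-level feature is split channel-wise into several heads, and each head is attended by multiple query-specific attention maps, with all weighted means concatenated. The names `num_heads`, `num_queries`, and the use of weighted means alone (rather than mean plus standard deviation statistics) are assumptions for illustration and may differ from the paper's exact formulation.

```python
import torch
import torch.nn as nn


class MQMHAPooling(nn.Module):
    """Sketch of multi-query multi-head attention pooling (illustrative only)."""

    def __init__(self, feat_dim: int, num_heads: int = 4, num_queries: int = 2):
        super().__init__()
        assert feat_dim % num_heads == 0, "feat_dim must be divisible by num_heads"
        self.num_heads = num_heads
        self.head_dim = feat_dim // num_heads
        # One scorer per head producing `num_queries` attention logits per frame
        # for that head's channel slice.
        self.scorers = nn.ModuleList(
            [nn.Linear(self.head_dim, num_queries) for _ in range(num_heads)]
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, time, feat_dim) frame-level features from the encoder.
        chunks = x.chunk(self.num_heads, dim=-1)  # split channels into heads
        pooled = []
        for head, scorer in zip(chunks, self.scorers):
            # Attention weights over frames: (batch, time, num_queries).
            attn = torch.softmax(scorer(head), dim=1)
            # Weighted mean per query: (batch, num_queries, head_dim).
            mean = torch.einsum("btq,btd->bqd", attn, head)
            pooled.append(mean.flatten(start_dim=1))
        # Concatenate all heads and queries into one utterance-level vector.
        return torch.cat(pooled, dim=-1)


if __name__ == "__main__":
    feats = torch.randn(8, 200, 256)                      # 8 utterances, 200 frames
    pool = MQMHAPooling(feat_dim=256, num_heads=4, num_queries=2)
    print(pool(feats).shape)                              # torch.Size([8, 512])
```

Similarly, the inter-topK penalty can be sketched as an extra additive term on the K most confusable non-target speakers, applied on top of a margin-based (here, additive angular margin) softmax. The hyperparameters `m`, `m_prime`, `k`, and `scale` below are placeholders, not the paper's reported values, and the exact form of the penalty in the paper may differ.

```python
import torch
import torch.nn.functional as F


def inter_topk_aam_logits(cosine: torch.Tensor, labels: torch.Tensor,
                          m: float = 0.2, m_prime: float = 0.06,
                          k: int = 5, scale: float = 32.0) -> torch.Tensor:
    # cosine: (batch, num_speakers) similarities between embeddings and speaker
    # prototypes; labels: (batch,) target speaker indices.
    theta = torch.acos(cosine.clamp(-1 + 1e-7, 1 - 1e-7))
    target_mask = F.one_hot(labels, cosine.size(1)).bool()

    # Standard additive angular margin on the target class.
    logits = torch.where(target_mask, torch.cos(theta + m), cosine)

    # Extra penalty on the K most confusable non-target speakers:
    # raising their logits makes the loss push them further away.
    non_target = logits.masked_fill(target_mask, float("-inf"))
    topk_idx = non_target.topk(k, dim=1).indices
    penalty = torch.zeros_like(logits).scatter(1, topk_idx, m_prime)
    return scale * (logits + penalty)


# Usage: loss = F.cross_entropy(inter_topk_aam_logits(cosine, labels), labels)
```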
Keywords
speaker, multi-query, multi-head, inter-topK