State-of-the-art Speaker Recognition with Neural Network Embeddings in NIST SRE18 and Speakers In The Wild Evaluations

Computer Speech & Language (2020)

Abstract
• Neural network embeddings have become the new state of the art in speaker recognition evaluations, improving i-vector performance by 2 in some conditions.
• Comparing network architectures for x-vectors, factorized TDNN performed the best in a moderately large setup. However, E-TDNN can also be competitive with a larger training setup.
• Comparing pooling methods, the learnable dictionary encoder performed the best, indicating that we can take advantage of multi-modal frame-level hidden representations.
• Angular-margin based training objectives performed better in in-domain conditions but not in domain-mismatched conditions.
• Calibration in a new domain can be achieved by MAP adaptation of the out-of-domain score distribution to the new domain using a very limited number of in-domain recordings.
Keywords
Speaker recognition, Embeddings, X-Vectors, NIST SRE18, SITW, Domain adaptation, Evaluations, Calibration