VoxCeleb: Large-scale Speaker Verification in the Wild

Computer Speech & Language (2020)

Abstract
• We introduce the VoxCeleb dataset, the largest audio-visual dataset for speaker recognition, containing over a million real-world utterances from over 6,000 speakers.
• We develop a fully scalable, computer-vision-based pipeline that automatically creates this dataset from open-source media.
• We demonstrate that deep ResNet architectures trained on large datasets, with NetVLAD as the aggregation strategy, achieve state-of-the-art performance.
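The third highlight names NetVLAD as the aggregation strategy, which turns a variable number of frame-level features into one fixed-length utterance embedding. The following is a minimal NumPy sketch of NetVLAD-style aggregation (soft assignment to K clusters, summed residuals, intra-normalisation, final L2 normalisation) as described in the original NetVLAD work. It is not the authors' implementation: the function name `netvlad`, the sizes N, D and K, and the parameters are all assumptions for illustration. In a real network the cluster centres and soft-assignment weights are learned end to end; here they are random stand-ins.

```python
import numpy as np


def netvlad(features, centers, weights, biases):
    """Hypothetical helper: NetVLAD-style aggregation of frame-level features.

    features: (N, D) array of N frame-level descriptors of dimension D.
    centers:  (K, D) array of K cluster centres c_k.
    weights:  (K, D) array of soft-assignment weights w_k.
    biases:   (K,)   array of soft-assignment biases b_k.
    Returns a (K * D,) L2-normalised vector, independent of N.
    """
    # Soft assignment of each frame to each cluster: softmax over k of w_k . x_i + b_k.
    logits = features @ weights.T + biases          # (N, K)
    logits -= logits.max(axis=1, keepdims=True)     # subtract row max for numerical stability
    assign = np.exp(logits)
    assign /= assign.sum(axis=1, keepdims=True)     # each row now sums to 1

    # Residuals x_i - c_k, weighted by the assignments and summed over frames.
    residuals = features[:, None, :] - centers[None, :, :]   # (N, K, D)
    vlad = (assign[:, :, None] * residuals).sum(axis=0)      # (K, D)

    # Intra-normalisation (per cluster), then flatten and L2-normalise the whole vector.
    vlad /= np.linalg.norm(vlad, axis=1, keepdims=True) + 1e-12
    vlad = vlad.reshape(-1)
    return vlad / (np.linalg.norm(vlad) + 1e-12)


# Illustrative usage with random stand-ins for learned parameters (assumed sizes).
rng = np.random.default_rng(0)
N, D, K = 200, 512, 8
frames = rng.standard_normal((N, D))
centers = rng.standard_normal((K, D))
# NetVLAD-style initialisation with alpha = 1: w_k = 2 c_k, b_k = -||c_k||^2.
weights, biases = 2.0 * centers, -np.sum(centers ** 2, axis=1)
emb = netvlad(frames, centers, weights, biases)
print(emb.shape)  # (4096,)
```

The output length is K·D regardless of how many frames an utterance has. Utterances of different durations can therefore be compared directly, for example by the cosine similarity of their embeddings in a verification setting.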
Keywords
Speaker identification, Speaker verification, Deep learning, Convolutional neural network