Large Scale Self-Supervised Pretraining for Active Speaker Detection

Otavio Braga, Wei Xia, Keith Johnson, Alice Chuang, Yunfan Ye, Olivier Siohan, Tuan Anh Nguyen

ICASSP 2024 - 2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (2024)

Abstract
In this work, we investigate the impact of a large-scale self-supervised pretraining strategy for active speaker detection (ASD) on an unlabeled dataset consisting of over 125k hours of YouTube videos. Compared to a baseline trained from scratch on much smaller in-domain labeled datasets, we show that pretraining not only makes supervised training more stable, owing to better audio-visual features used for initialization, but also improves ASD mean average precision by 23% on a challenging dataset collected with Google Nest Hub Max devices capturing real user interactions.
Keywords
Active Speaker Detection, Self Supervised Learning, ASD, SSL