Self-Motion As Supervision For Egocentric Audiovisual Localization

ICASSP 2024 - 2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

Abstract
Sound source localization is a key requirement for many assistive applications of augmented reality, such as speech enhancement. In conversational settings, potential sources of interest may be approximated by active speaker detection. However, localizing speakers in crowded, noisy environments is challenging, particularly without extensive ground truth annotations. Still, people are often able to communicate effectively in these scenarios through orienting behavioral responses, such as head motion and eye gaze, which have been shown to correlate with directions of auditory sources. In the absence of ground truth annotations, we propose joint training of egocentric audiovisual localization with behavioral pseudolabels to relate audiovisual stimuli with directional information extracted from future behavior. We evaluate this method as a technique for unsupervised egocentric active speaker localization and compare pseudolabels derived from head and gaze directions against fully-supervised alternatives.
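
As a rough illustration of the pseudolabeling idea in the abstract, the sketch below shows how a future head-yaw angle could be discretized into azimuth bins and used as a classification target for an audiovisual localizer. This is a minimal, hypothetical sketch, not the authors' method: the names (`AVLocalizer`, `yaw_to_bin`), the bin count, and the use of a single future yaw sample are all illustrative assumptions; the paper's actual architecture and label construction are not specified here.

```python
# Hypothetical sketch: supervising an audiovisual localizer with a pseudolabel
# derived from future head orientation. Names and dimensions are illustrative.
import torch
import torch.nn as nn
import torch.nn.functional as F

N_BINS = 36  # assumed: discretize azimuth into 10-degree bins

def yaw_to_bin(yaw_rad: torch.Tensor) -> torch.Tensor:
    """Map a head-yaw angle in radians, in (-pi, pi], to an azimuth-bin index."""
    frac = (yaw_rad + torch.pi) / (2 * torch.pi)        # normalize to [0, 1)
    return (frac * N_BINS).long().clamp_(0, N_BINS - 1)

class AVLocalizer(nn.Module):
    """Toy stand-in for an egocentric audiovisual localization network."""
    def __init__(self, audio_dim=128, video_dim=256):
        super().__init__()
        self.head = nn.Sequential(
            nn.Linear(audio_dim + video_dim, 256), nn.ReLU(),
            nn.Linear(256, N_BINS),                     # logits over azimuth bins
        )

    def forward(self, audio_feat, video_feat):
        return self.head(torch.cat([audio_feat, video_feat], dim=-1))

model = AVLocalizer()
opt = torch.optim.Adam(model.parameters(), lr=1e-4)

# Dummy batch: current audiovisual features plus the head yaw observed shortly
# after the stimulus, serving as the behavioral pseudolabel.
audio_feat = torch.randn(8, 128)
video_feat = torch.randn(8, 256)
future_yaw = torch.empty(8).uniform_(-torch.pi, torch.pi)

logits = model(audio_feat, video_feat)
loss = F.cross_entropy(logits, yaw_to_bin(future_yaw))  # pseudolabel supervision
loss.backward()
opt.step()
```

In this toy setup the "supervision" is simply where the head points a moment later; the abstract's framing suggests richer behavioral signals (head motion and eye gaze over time) and joint training, which this sketch does not attempt to reproduce.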
Keywords
active speaker localization,conversational understanding,audiovisual learning,egocentric learning,eye tracking