Exploiting Visual Context Semantics for Sound Source Localization

WACV(2023)

引用 0|浏览77
暂无评分
摘要
Self-supervised sound source localization in unconstrained visual scenes is an important task of audio-visual learning. In this paper, we propose a visual reasoning module to explicitly exploit the rich visual context semantics, which alleviates the issue of insufficient utilization of visual information in previous works. The learning objectives are carefully designed to provide stronger supervision signals for the extracted visual semantics while enhancing the audio-visual interactions, which lead to more robust feature representations. Extensive experimental results demonstrate that our approach significantly boosts the localization performances on various datasets, even without initializations pretrained on ImageNet. Moreover, with the visual context exploitation, our framework can accomplish both the audio-visual and purely visual inference, which expands the application scope of the sound source localization task and further raises the competitiveness of our approach.
更多
查看译文
关键词
visual context semantics,localization,sound
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要