Visual Sound Source Separation with Partial Supervision Learning.

Huasen Wang, Lingling Gao, Qianchao Tan, Luping Ji

ICIP (2022)

Abstract
Recent deep learning approaches have achieved impressive performance on visually-guided sound source separation tasks. However, due to the lack of real-world mixed/separated audio sample pairs, most methods rely heavily on the "Mix-and-Separate" paradigm to learn sound source separation, which is often unsuitable for real-world mixtures. To address this issue, we adopt a semi-supervised learning technique, preserving audio-visual consistency, to improve separation performance in real-world scenarios. In this way, our network is trained jointly on artificial and real-world mixtures. To the best of our knowledge, this could be the first attempt to improve real-world generalization in this way. We also design a category-guided audio-visual fusion module to learn audio-visual matching. Comparative experiments are performed on two publicly available datasets, MUSIC and AudioSet. Experimental results demonstrate that our method often outperforms other state-of-the-art approaches in visual sound separation.
Keywords
Visual Sound Separation, Partial Supervision Learning, Real-life Scenarios, Audio-visual Matching
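
Since the abstract refers to the "Mix-and-Separate" training paradigm, a minimal sketch of that setup may help illustrate it. The module names, tensor shapes, ratio-mask targets, and L1 loss below are illustrative assumptions written in PyTorch, not the authors' implementation; the paper's audio-visual consistency objective and category-guided fusion module are not reproduced here.

    # Minimal sketch of a "Mix-and-Separate" training step: two single-source
    # spectrograms are mixed, and the network must predict a separation mask
    # for each source conditioned on that source's visual feature.
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class MaskPredictor(nn.Module):
        """Predicts a ratio mask for one source from the mixture spectrogram,
        conditioned on a visual feature of that source (assumed shapes)."""
        def __init__(self, visual_dim=512, hidden=64):
            super().__init__()
            self.audio_enc = nn.Sequential(
                nn.Conv2d(1, hidden, kernel_size=3, padding=1), nn.ReLU(),
                nn.Conv2d(hidden, hidden, kernel_size=3, padding=1), nn.ReLU(),
            )
            self.visual_proj = nn.Linear(visual_dim, hidden)
            self.mask_head = nn.Conv2d(hidden, 1, kernel_size=1)

        def forward(self, mix_spec, visual_feat):
            # mix_spec: (B, 1, F, T), visual_feat: (B, visual_dim)
            a = self.audio_enc(mix_spec)                          # (B, H, F, T)
            v = self.visual_proj(visual_feat)[:, :, None, None]   # (B, H, 1, 1)
            fused = a * v                                         # feature-wise modulation
            return torch.sigmoid(self.mask_head(fused))           # mask in [0, 1]

    def mix_and_separate_step(model, spec_a, spec_b, vis_a, vis_b):
        """One artificial-mixture step: mix two clips, supervise with the
        ideal ratio masks recoverable from the known single sources."""
        mix = spec_a + spec_b
        target_a = spec_a / mix.clamp(min=1e-8)
        target_b = spec_b / mix.clamp(min=1e-8)
        pred_a = model(mix, vis_a)
        pred_b = model(mix, vis_b)
        return F.l1_loss(pred_a, target_a) + F.l1_loss(pred_b, target_b)

    if __name__ == "__main__":
        model = MaskPredictor()
        spec_a = torch.rand(2, 1, 128, 64)   # magnitude spectrograms of two clips
        spec_b = torch.rand(2, 1, 128, 64)
        vis_a, vis_b = torch.randn(2, 512), torch.randn(2, 512)
        loss = mix_and_separate_step(model, spec_a, spec_b, vis_a, vis_b)
        loss.backward()
        print(float(loss))

In the paper's setting, such artificial-mixture supervision would be complemented by training on real-world mixtures, for which no ground-truth separated sources exist; that unsupervised part is what the abstract's audio-visual consistency term is meant to cover.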