Visual Sound Source Separation with Partial Supervision Learning.

Huasen Wang, Lingling Gao, Qianchao Tan, Luping Ji

ICIP (2022)

Abstract
Recent deep learning approaches have achieved impressive performance on visually-guided sound source separation tasks. However, due to the lack of real-world mixed/separated audio sample pairs, most methods rely heavily on the "Mix-and-Separate" paradigm to learn sound source separation, which is often unsuitable for real-world mixtures. To address this issue, we adopt a semi-supervised learning technique, preserving audio-visual consistency, to improve separation performance in real-world scenarios. In this way, our network is trained jointly on artificial and real-world mixtures. To the best of our knowledge, this could be the first attempt to improve real-world generalization in this way. We also design a category-guided audio-visual fusion module to learn audio-visual matching. Comparative experiments are performed on two publicly available datasets, MUSIC and AudioSet. Experimental results demonstrate that our method often outperforms other state-of-the-art approaches in visual sound separation.
Keywords
Visual Sound Separation, Partial Supervision Learning, Real-life Scenarios, Audio-visual Matching
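
Since the abstract refers to the "Mix-and-Separate" training paradigm, a minimal sketch of that setup may help illustrate it. The module names, tensor shapes, ratio-mask targets, and L1 loss below are illustrative assumptions written in PyTorch, not the authors' implementation; the paper's audio-visual consistency objective and category-guided fusion module are not reproduced here.

    # Minimal sketch of a "Mix-and-Separate" training step: two single-source
    # spectrograms are mixed, and the network must predict a separation mask
    # for each source conditioned on that source's visual feature.
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class MaskPredictor(nn.Module):
        """Predicts a ratio mask for one source from the mixture spectrogram,
        conditioned on a visual feature of that source (assumed shapes)."""
        def __init__(self, visual_dim=512, hidden=64):
            super().__init__()
            self.audio_enc = nn.Sequential(
                nn.Conv2d(1, hidden, kernel_size=3, padding=1), nn.ReLU(),
                nn.Conv2d(hidden, hidden, kernel_size=3, padding=1), nn.ReLU(),
            )
            self.visual_proj = nn.Linear(visual_dim, hidden)
            self.mask_head = nn.Conv2d(hidden, 1, kernel_size=1)

        def forward(self, mix_spec, visual_feat):
            # mix_spec: (B, 1, F, T), visual_feat: (B, visual_dim)
            a = self.audio_enc(mix_spec)                          # (B, H, F, T)
            v = self.visual_proj(visual_feat)[:, :, None, None]   # (B, H, 1, 1)
            fused = a * v                                         # feature-wise modulation
            return torch.sigmoid(self.mask_head(fused))           # mask in [0, 1]

    def mix_and_separate_step(model, spec_a, spec_b, vis_a, vis_b):
        """One artificial-mixture step: mix two clips, supervise with the
        ideal ratio masks recoverable from the known single sources."""
        mix = spec_a + spec_b
        target_a = spec_a / mix.clamp(min=1e-8)
        target_b = spec_b / mix.clamp(min=1e-8)
        pred_a = model(mix, vis_a)
        pred_b = model(mix, vis_b)
        return F.l1_loss(pred_a, target_a) + F.l1_loss(pred_b, target_b)

    if __name__ == "__main__":
        model = MaskPredictor()
        spec_a = torch.rand(2, 1, 128, 64)   # magnitude spectrograms of two clips
        spec_b = torch.rand(2, 1, 128, 64)
        vis_a, vis_b = torch.randn(2, 512), torch.randn(2, 512)
        loss = mix_and_separate_step(model, spec_a, spec_b, vis_a, vis_b)
        loss.backward()
        print(float(loss))

In the paper's setting, such artificial-mixture supervision would be complemented by training on real-world mixtures, for which no ground-truth separated sources exist; that unsupervised part is what the abstract's audio-visual consistency term is meant to cover.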