Waveform-Logmel Dual Stream Fusion Network for Sound Event Detection

Yun Liang, Shitong Weng, Shenlong Zheng, Shaojian Qiu, Hai Lin, Liping Chen, Jiefeng Zuo, Jingru Huang

2022 9th International Conference on Digital Home (ICDH) (2022)

Abstract
Although convolutional recurrent neural networks (CRNNs) have proven effective for sound event detection, problems remain. The feature extractor cannot effectively extract and fuse deep audio features with different semantics, which makes it hard to cope with the complex sound scenes found in real environments. To address these issues, we present a novel framework consisting of a dual-stream network and a feature fusion module. More precisely, we first extract deep features from the Logmel spectrogram and the raw waveform separately. Then, we use a Selective Feature module to fuse the features of the two streams. The module uses a channel attention mechanism to assign a weight to each channel of the fused feature map, forcing the model to focus on important sound features while ignoring unrelated ones. Experiments on the DCASE 2022 Task 4 validation set demonstrate the effectiveness of our model. With ensembling, our system achieves PSDS-scenario1 and PSDS-scenario2 scores of 37.9% and 62.3%, outperforming the baseline system's 33.3% and 53.6%.
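The abstract does not specify the internals of the Selective Feature module, so the following is only a minimal sketch of one common form of channel attention (a squeeze-and-excitation-style gate) applied to two concatenated feature streams. The function name `channel_attention_fusion`, the weight matrices `w1` and `w2`, the bottleneck size, and the toy feature shapes are all assumptions made for illustration and are not taken from the paper.

```python
import math
import random


def channel_attention_fusion(logmel_feat, wave_feat, w1, w2):
    """Fuse two feature maps (lists of channels, each a list over time)
    and reweight every channel of the result.

    Hypothetical sketch: squeeze-and-excitation-style gating, not the
    paper's actual Selective Feature module.
    """
    # Concatenate the two streams along the channel axis.
    fused = logmel_feat + wave_feat
    # Squeeze: average each channel over time to get one value per channel.
    squeezed = [sum(ch) / len(ch) for ch in fused]
    # Excitation, step 1: linear bottleneck followed by ReLU.
    hidden = [max(0.0, sum(w * s for w, s in zip(row, squeezed))) for row in w1]
    # Excitation, step 2: linear layer followed by a sigmoid, one weight per channel.
    weights = [
        1.0 / (1.0 + math.exp(-sum(w * h for w, h in zip(row, hidden))))
        for row in w2
    ]
    # Scale every channel of the fused map by its attention weight.
    out = [[wt * v for v in ch] for wt, ch in zip(weights, fused)]
    return out, weights


# Toy inputs: 4 channels per stream, 10 time frames (assumed sizes).
rng = random.Random(0)
logmel_feat = [[rng.gauss(0, 1) for _ in range(10)] for _ in range(4)]
wave_feat = [[rng.gauss(0, 1) for _ in range(10)] for _ in range(4)]
# Randomly initialised weights stand in for learned parameters.
w1 = [[rng.gauss(0, 0.1) for _ in range(8)] for _ in range(2)]  # 8 -> 2
w2 = [[rng.gauss(0, 0.1) for _ in range(2)] for _ in range(8)]  # 2 -> 8

out, weights = channel_attention_fusion(logmel_feat, wave_feat, w1, w2)
print(len(out), len(out[0]))
```

In a trained model the weights would be learned jointly with the two feature extractors, so channels that carry little information about the target sound events would receive gates close to zero.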
Keywords
Sound event detection, feature fusion, feature representation, convolutional recurrent neural network