BCINet: Bilateral cross-modal interaction network for indoor scene understanding in RGB-D images.

Inf. Fusion (2023)

Abstract
Depth cues have proven to be useful in the indoor scene understanding of RGB-D images, providing a geometric counterpart to the RGB representation. However, because of the differences between RGB and depth modalities, effectively exploiting cross-modal data remains a key issue. Most methods leverage depth data only to unilaterally complement RGB data for better feature representation, ignoring the fact that RGB and depth data can bilaterally complement each other. Herein, a novel RGB-D scene-understanding network called BCINet is presented, in which RGB and depth data bilaterally complement each other via a proposed bilateral cross-modal interaction module (BCIM). The BCIM captures cross-modal complementary cues by cross-fusing enhanced features from one modality into the counterpart modality through a feature-enhancement module. Meanwhile, exploiting the long-range dependencies of RGB-D features is also significant for accurate scene understanding. Specifically, we design a hybrid pyramid dilated convolution module to enlarge the receptive fields along both the vertical and horizontal spatial directions, adaptively capturing diverse contexts with different shapes. Additionally, we propose a context-guided module that aggregates these diverse higher-level contexts with lower-level encoder features to guide the information flow and progressively refine the segmentation map. Experimental results on two indoor scene datasets demonstrate the superiority and effectiveness of the proposed BCINet over several state-of-the-art approaches.
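The bilateral cross-fusion idea behind the BCIM can be sketched as follows. This is a minimal illustrative sketch, not the paper's exact operators: the `enhance` gating (a sigmoid self-gate) and the element-wise addition used for fusion are assumptions standing in for the feature-enhancement module and the cross-modal fusion described in the abstract, and features are flattened to 1-D lists for simplicity.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def enhance(feat):
    # Hypothetical feature-enhancement module: gate each activation by its
    # own sigmoid response, emphasizing strong responses (an assumption).
    return [v * sigmoid(v) for v in feat]

def bcim(rgb, depth):
    # Bilateral cross-modal interaction (sketch): each modality receives the
    # *enhanced* features of the other modality, fused by element-wise
    # addition, so RGB and depth complement each other in both directions.
    rgb_out = [r + e for r, e in zip(rgb, enhance(depth))]
    depth_out = [d + e for d, e in zip(depth, enhance(rgb))]
    return rgb_out, depth_out
```

The key point the sketch captures is the symmetry: unlike unilateral schemes where only RGB is complemented by depth, both output streams are updated with cues from the counterpart modality.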
Keywords
Scene understanding, RGB-D, Bilateral cross-modal interaction, Hybrid pyramid dilated convolution, Deep learning