AMCFNet: Asymmetric multiscale and crossmodal fusion network for RGB-D semantic segmentation in indoor service robots

JOURNAL OF VISUAL COMMUNICATION AND IMAGE REPRESENTATION (2023)

Abstract
Red-green-blue and depth (RGB-D) semantic segmentation is essential for indoor service robots to accurately understand their surroundings. Various RGB-D indoor semantic segmentation methods have been proposed since depth maps became widely available. These methods have focused mainly on integrating the multiscale and crossmodal features extracted from RGB images and depth maps in the encoder, while using unified strategies to progressively recover local details in the decoder. However, by emphasizing crossmodal fusion in the encoder, these methods neglect the distinguishability between RGB and depth features during decoding, which undermines segmentation performance. To adequately exploit the features, we propose an efficient encoder-decoder architecture called the asymmetric multiscale and crossmodal fusion network (AMCFNet), endowed with a differential feature integration strategy. Unlike existing methods, we use simple crossmodal fusion in the encoder and design an elaborate decoder to improve semantic segmentation performance. Specifically, considering high- and low-level features, we propose a semantic aggregation module (SAM) that processes the multiscale and crossmodal features of the last three network layers to aggregate high-level semantic information through a cascaded pyramid structure. Moreover, we design a spatial detail supplement module that uses low-level spatial details from depth maps and adaptively fuses these details with the information obtained from the SAM. Extensive experiments demonstrate that the proposed AMCFNet outperforms state-of-the-art approaches.
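The asymmetric strategy described in the abstract can be illustrated with a minimal toy sketch: simple element-wise crossmodal fusion in the encoder, a cascaded top-down aggregation over the three deepest fused features (standing in for the SAM), and an adaptive blend of low-level depth detail into the decoder output. This is a pure-Python simplification under assumed operations; the function names, the element-wise sum as "simple fusion," the averaging merge, and the fixed blending weight `alpha` are all illustrative assumptions, not the paper's actual modules.

```python
def fuse(rgb, depth):
    """Simple crossmodal fusion at the encoder (assumed: element-wise sum)."""
    return [r + d for r, d in zip(rgb, depth)]

def semantic_aggregation(levels):
    """Cascaded pyramid over the high-level fused features: fold from the
    deepest level toward the shallower ones (assumed: pairwise averaging)."""
    out = levels[-1]
    for feat in reversed(levels[:-1]):
        out = [0.5 * (o + f) for o, f in zip(out, feat)]
    return out

def spatial_detail_supplement(semantic, depth_low, alpha=0.8):
    """Adaptively blend decoder semantics with low-level depth spatial detail
    (assumed: fixed convex combination in place of a learned gate)."""
    return [alpha * s + (1 - alpha) * d for s, d in zip(semantic, depth_low)]

# Toy features: four encoder layers, each a 4-dim vector per modality.
rgb_feats   = [[1.0] * 4, [2.0] * 4, [3.0] * 4, [4.0] * 4]
depth_feats = [[0.5] * 4, [1.0] * 4, [1.5] * 4, [2.0] * 4]

fused = [fuse(r, d) for r, d in zip(rgb_feats, depth_feats)]   # encoder side
high = semantic_aggregation(fused[1:])            # last three layers -> "SAM"
seg = spatial_detail_supplement(high, depth_feats[0])  # add low-level depth detail
print(seg)
```

The point of the sketch is the asymmetry: the encoder-side `fuse` is deliberately trivial, while the decoder path (`semantic_aggregation` followed by `spatial_detail_supplement`) carries the differential integration of high-level semantics and low-level depth detail.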
Keywords
Multiscale feature, Crossmodal fusion, Differential feature integration, RGB-D information, Semantic segmentation