Fusion-Mamba for Cross-modality Object Detection
arXiv (2024)
Abstract
Fusing complementary information across modalities effectively improves object
detection performance, making detectors more useful and robust for a wider
range of applications. Existing fusion strategies combine different types of
images or merge different backbone features through elaborately designed
neural network modules. However, these methods neglect that modality
disparities degrade cross-modality fusion: images captured with different
camera focal lengths, placements, and viewing angles are hard to fuse directly.
In this paper, we investigate cross-modality fusion by associating cross-modal
features in a hidden state space based on an improved Mamba with a gating
mechanism. We design a Fusion-Mamba block (FMB) to map cross-modal features
into a hidden state space for interaction, thereby reducing disparities between
cross-modal features and enhancing the representation consistency of fused
features. FMB contains two modules: a State Space Channel Swapping (SSCS)
module that facilitates shallow feature fusion, and a Dual State Space Fusion
(DSSF) module that enables deep fusion in a hidden state space. Through
extensive experiments on public datasets, our proposed approach outperforms
state-of-the-art methods in mAP by 5.9% on M3FD and 4.9% on FLIR-Aligned,
demonstrating superior object detection performance. To the best of our
knowledge, this is the first work to explore the potential of Mamba for
cross-modal fusion and to establish a new baseline for cross-modality object
detection.
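
The abstract names the pieces of the design (an improved Mamba with a gating mechanism, SSCS for shallow fusion, DSSF for deep fusion) but not their internals. The PyTorch sketch below is therefore a minimal illustration under stated assumptions, not the authors' implementation: SimpleSSM is a toy diagonal state-space scan standing in for the improved Mamba layer, the half-channel swap and the sigmoid gate are guesses at the SSCS/DSSF mechanics, and cross-modal feature maps are assumed to be flattened into (batch, tokens, channels) sequences.

```python
# Illustrative sketch of a Fusion-Mamba block (FMB). Module names follow the
# abstract (SSCS, DSSF); all internals are assumptions for demonstration only.
import torch
import torch.nn as nn


class SimpleSSM(nn.Module):
    """Toy diagonal state-space scan: h_t = decay * h_{t-1} + B x_t, y_t = C h_t.
    Stands in for the paper's improved Mamba layer (assumption)."""

    def __init__(self, dim: int, state_dim: int = 16):
        super().__init__()
        self.A = nn.Parameter(-torch.rand(state_dim))  # negative -> stable decay
        self.B = nn.Linear(dim, state_dim)
        self.C = nn.Linear(state_dim, dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, tokens, dim) -- flattened spatial features
        b, n, _ = x.shape
        h = x.new_zeros(b, self.A.shape[0])
        decay = torch.exp(self.A)          # values in (0, 1)
        u = self.B(x)                      # (b, n, state_dim)
        ys = []
        for t in range(n):                 # sequential scan, illustrative only
            h = decay * h + u[:, t]
            ys.append(self.C(h))
        return torch.stack(ys, dim=1)      # (b, n, dim)


class SSCS(nn.Module):
    """State Space Channel Swapping: shallow fusion by exchanging half of the
    channels between modalities before a shared state-space pass (assumed)."""

    def __init__(self, dim: int):
        super().__init__()
        self.ssm = SimpleSSM(dim)

    def forward(self, rgb, ir):
        c = rgb.shape[-1] // 2
        rgb_sw = torch.cat([ir[..., :c], rgb[..., c:]], dim=-1)
        ir_sw = torch.cat([rgb[..., :c], ir[..., c:]], dim=-1)
        return self.ssm(rgb_sw), self.ssm(ir_sw)


class DSSF(nn.Module):
    """Dual State Space Fusion: deep fusion in a hidden state space, combined
    by a gating mechanism (a plain sigmoid gate here; an assumption)."""

    def __init__(self, dim: int):
        super().__init__()
        self.ssm_rgb = SimpleSSM(dim)
        self.ssm_ir = SimpleSSM(dim)
        self.gate = nn.Linear(2 * dim, dim)

    def forward(self, rgb, ir):
        h_rgb, h_ir = self.ssm_rgb(rgb), self.ssm_ir(ir)
        g = torch.sigmoid(self.gate(torch.cat([h_rgb, h_ir], dim=-1)))
        return g * h_rgb + (1 - g) * h_ir  # gated fused representation


class FusionMambaBlock(nn.Module):
    """FMB = SSCS (shallow fusion) followed by DSSF (deep fusion)."""

    def __init__(self, dim: int):
        super().__init__()
        self.sscs = SSCS(dim)
        self.dssf = DSSF(dim)

    def forward(self, rgb, ir):
        rgb, ir = self.sscs(rgb, ir)
        return self.dssf(rgb, ir)


if __name__ == "__main__":
    fmb = FusionMambaBlock(dim=32)
    rgb = torch.randn(2, 64, 32)   # visible-light features (batch, tokens, dim)
    ir = torch.randn(2, 64, 32)    # infrared features, same shape
    print(fmb(rgb, ir).shape)      # torch.Size([2, 64, 32])
```

The two-stage layout mirrors the abstract's description: channel swapping lets the modalities exchange information cheaply at shallow depth, while the dual scan plus gate performs the interaction in the hidden state space, which is where the paper locates the reduction of cross-modal disparity.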