M2DA: Multi-Modal Fusion Transformer Incorporating Driver Attention for Autonomous Driving
CoRR (2024)
Abstract
End-to-end autonomous driving has witnessed remarkable progress. However, the
extensive deployment of autonomous vehicles has yet to be realized, primarily
due to two challenges: 1) inefficient multi-modal environment perception, i.e.,
how to integrate data from multi-modal sensors more efficiently; and 2)
non-human-like scene understanding, i.e., how to effectively locate and predict
critical risky agents in traffic scenarios like an experienced driver. To
overcome these challenges, in
this paper, we propose a Multi-Modal fusion transformer incorporating Driver
Attention (M2DA) for autonomous driving. To better fuse multi-modal data and
achieve higher alignment between different modalities, a novel
Lidar-Vision-Attention-based Fusion (LVAFusion) module is proposed. By
incorporating driver attention, we give autonomous vehicles a human-like scene
understanding ability, allowing them to precisely identify crucial areas within
complex scenarios and ensure safety. We conduct experiments on the CARLA
simulator and achieve state-of-the-art performance with less data in
closed-loop benchmarks. Source code is available at
https://anonymous.4open.science/r/M2DA-4772.