Bring Event into RGB and LiDAR: Hierarchical Visual-Motion Fusion for Scene Flow
CVPR 2024
Abstract
A single RGB camera or LiDAR is the mainstream sensor for the challenging task of
scene flow estimation, which relies heavily on visual features to match motion
features. Compared with single-modality approaches, existing methods adopt a
fusion strategy that directly fuses the cross-modal complementary knowledge in
motion space. However, such direct fusion may suffer from the modality gap caused
by the intrinsic heterogeneity of visual appearance between RGB and LiDAR, which
degrades the motion features.
We find that event data are homogeneous with both RGB and LiDAR in the visual and
motion spaces. In this work, we use events as a bridge between RGB and LiDAR and
propose a novel hierarchical visual-motion fusion framework for scene flow, which
explores a homogeneous space in which to fuse the cross-modal complementary
knowledge for physical interpretation. In visual fusion, we find that events
complement RGB in luminance space (relative vs. absolute luminance) for
high-dynamic-range imaging, and complement LiDAR in scene-structure space (local
boundaries vs. global shape) for structural integrity. In motion fusion, we find
that RGB, event, and LiDAR data complement one another in correlation space
(spatially dense, temporally dense, and spatiotemporally sparse, respectively),
which motivates us to fuse their motion correlations for motion continuity. The
proposed hierarchical fusion explicitly fuses the multimodal knowledge to
progressively improve scene flow from visual space to motion space. Extensive
experiments verify the superiority of the proposed method.
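The abstract gives no implementation details of the motion-fusion stage. Purely as an illustration of what "fusing motion correlations" can mean in general, the NumPy sketch below builds an all-pairs correlation volume per modality, blends the volumes with per-pixel softmax weights over modalities, and reads out a displacement by soft-argmax. This is not the authors' module. The function names, the confidence logits, and the assumption that all modalities have already been resampled onto one shared 2D pixel grid (a simplification of 3D scene flow) are hypothetical.

```python
import numpy as np


def correlation_volume(feat1: np.ndarray, feat2: np.ndarray) -> np.ndarray:
    """All-pairs correlation between two (N, C) per-pixel feature maps.

    Entry [i, j] scores how well source pixel i matches target pixel j,
    computed as a dot product scaled by sqrt(C).
    """
    c = feat1.shape[1]
    return feat1 @ feat2.T / np.sqrt(c)


def fuse_correlations(corrs: list, logits: np.ndarray) -> np.ndarray:
    """Blend M per-modality (N, N) correlation volumes into one (N, N) volume.

    logits: (M, N) hypothetical confidence score per modality and source pixel;
    a softmax over the modality axis turns them into blending weights.
    """
    stacked = np.stack(corrs)                              # (M, N, N)
    w = np.exp(logits - logits.max(axis=0, keepdims=True))
    w = w / w.sum(axis=0, keepdims=True)                   # weights sum to 1 per pixel
    return (w[:, :, None] * stacked).sum(axis=0)           # weight each source row


def soft_argmax_flow(corr: np.ndarray, coords: np.ndarray) -> np.ndarray:
    """Expected displacement of each source pixel under a softmax over targets.

    coords: (N, 2) pixel coordinates shared by the source and target grids.
    """
    p = np.exp(corr - corr.max(axis=1, keepdims=True))
    p = p / p.sum(axis=1, keepdims=True)                   # match distribution per row
    return p @ coords - coords                             # expected target minus source


# Toy usage: four pixels on a line whose content moves one pixel to the right.
feat1 = 10.0 * np.eye(4)
feat2 = np.roll(feat1, 1, axis=0)  # target pixel j carries source pixel j-1's feature
coords = np.stack([np.arange(4.0), np.zeros(4)], axis=1)
corr = correlation_volume(feat1, feat2)
fused = fuse_correlations([corr, corr, corr], np.zeros((3, 4)))
flow = soft_argmax_flow(fused, coords)
```

The softmax over modalities lets each pixel lean on whichever correlation volume is assumed to be more reliable there, which is one simple way to combine spatially dense, temporally dense, and sparse cues; a learned network would normally predict the logits.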