UniScene: Multi-Camera Unified Pre-training via 3D Scene Reconstruction for Autonomous Driving
arXiv (2023)
Abstract
Multi-camera 3D perception has emerged as a prominent research field in
autonomous driving, offering a viable and cost-effective alternative to
LiDAR-based solutions. Existing multi-camera algorithms primarily rely on
monocular 2D pre-training, which overlooks the spatial and temporal
correlations within the multi-camera system. To address
this limitation, we propose the first multi-camera unified pre-training
framework, called UniScene, which involves initially reconstructing the 3D
scene as the foundational stage and subsequently fine-tuning the model on
downstream tasks. Specifically, we employ Occupancy as the general
representation for the 3D scene, enabling the model to grasp geometric priors
of the surrounding world through pre-training. A significant benefit of
UniScene is its capability to utilize a considerable volume of unlabeled
image-LiDAR pairs for pre-training purposes. The proposed multi-camera unified
pre-training framework demonstrates promising results in key tasks such as
multi-camera 3D object detection and surrounding semantic scene completion.
When compared to monocular pre-training methods on the nuScenes dataset,
UniScene shows a significant improvement of about 2.0% for multi-camera 3D
object detection, as well as a 3% gain in
surrounding semantic scene completion. By adopting our unified pre-training
method, a 25% reduction in costly 3D annotation can be achieved,
offering significant practical value for the implementation of real-world
autonomous driving. Codes are publicly available at
https://github.com/chaytonmin/UniScene.
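The abstract's core idea is to use occupancy as the general 3D scene representation, supervised by unlabeled image-LiDAR pairs. As a rough illustration of what such a pre-training target looks like, the sketch below voxelizes a LiDAR point cloud into a binary occupancy grid. This is a minimal assumption-laden example, not the paper's implementation: the function name, grid resolution, and range are all hypothetical, and UniScene's actual pipeline (see the linked repository) involves additional steps such as multi-frame aggregation.

```python
import numpy as np

def points_to_occupancy(points, pc_range, voxel_size):
    """Convert a LiDAR point cloud (N, 3) into a binary occupancy grid.

    points: array of (x, y, z) coordinates.
    pc_range: (x_min, y_min, z_min, x_max, y_max, z_max) bounds of the grid.
    voxel_size: (dx, dy, dz) edge lengths of one voxel.
    """
    points = np.asarray(points, dtype=np.float32)
    lo = np.array(pc_range[:3], dtype=np.float32)
    hi = np.array(pc_range[3:], dtype=np.float32)
    size = np.array(voxel_size, dtype=np.float32)
    dims = np.round((hi - lo) / size).astype(int)

    # Discard points outside the grid bounds.
    mask = np.all((points >= lo) & (points < hi), axis=1)
    # Map each remaining point to its voxel index.
    idx = ((points[mask] - lo) / size).astype(int)

    # Mark every voxel containing at least one point as occupied.
    grid = np.zeros(dims, dtype=np.uint8)
    grid[idx[:, 0], idx[:, 1], idx[:, 2]] = 1
    return grid

# Toy example: a 4 m cube at 1 m resolution; the third point lies outside.
pts = np.array([[0.5, 0.5, 0.5], [3.9, 3.9, 3.9], [10.0, 0.0, 0.0]])
occ = points_to_occupancy(pts, (0, 0, 0, 4, 4, 4), (1, 1, 1))
print(occ.shape, int(occ.sum()))  # (4, 4, 4) 2
```

A grid like this can serve as a dense geometric target: a camera-only network is pre-trained to predict it from surround-view images, giving the model geometric priors before fine-tuning on detection or scene completion.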