Improving Bird's Eye View Semantic Segmentation by Task Decomposition
arXiv (2024)
Abstract
Semantic segmentation in bird's eye view (BEV) plays a crucial role in
autonomous driving. Previous methods usually follow an end-to-end pipeline,
directly predicting the BEV segmentation map from monocular RGB inputs.
However, a challenge arises when the RGB inputs and BEV targets come from distinct
perspectives, making direct point-to-point prediction hard to optimize. In
this paper, we decompose the original BEV segmentation task into two stages,
namely BEV map reconstruction and RGB-BEV feature alignment. In the first
stage, we train a BEV autoencoder to reconstruct the BEV segmentation maps
from corrupted, noisy latent representations, which pushes the decoder to learn
fundamental knowledge of typical BEV patterns. The second stage involves
mapping RGB input images into the BEV latent space of the first stage, directly
optimizing the correlations between the two views at the feature level. Our
approach reduces complexity by separating perception and generation into
distinct steps, equipping the model to handle intricate and challenging scenes
effectively. In addition, we propose transforming the BEV segmentation map from the
Cartesian to the polar coordinate system to establish the column-wise
correspondence between RGB images and BEV maps. Moreover, our method requires
neither multi-scale features nor camera intrinsic parameters for depth
estimation, saving computational overhead. Extensive experiments on nuScenes
and Argoverse show the effectiveness and efficiency of our method. Code is
available at https://github.com/happytianhao/TaDe.
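
The Cartesian-to-polar transform mentioned above can be illustrated with a minimal sketch. It is not taken from the TaDe code: the function name `bev_cartesian_to_polar`, the camera position (bottom-centre of the map, looking toward row 0), the field-of-view parameter, and the nearest-neighbour sampling are all assumptions made for illustration. The point it shows is that, after resampling, each column of the polar map corresponds to one viewing ray from the camera, which is what makes a column-wise correspondence with the RGB image plausible.

```python
import numpy as np


def bev_cartesian_to_polar(bev, num_radii, num_angles, fov_deg=90.0):
    """Resample a Cartesian BEV label map onto a polar grid (nearest neighbour).

    Assumptions (hypothetical, not from the paper): the camera sits at the
    bottom-centre of `bev`, looks toward row 0, and has a horizontal field
    of view of `fov_deg` degrees.
    """
    h, w = bev.shape
    cam_row, cam_col = h - 1, (w - 1) / 2.0

    # Sample radii from the camera out to the far edge, and angles across the FOV.
    radii = np.linspace(0.0, h - 1, num_radii)
    half_fov = np.deg2rad(fov_deg) / 2.0
    angles = np.linspace(-half_fov, half_fov, num_angles)
    r, a = np.meshgrid(radii, angles, indexing="ij")  # both (num_radii, num_angles)

    # Polar sample -> Cartesian pixel: forward is "up" (decreasing row index).
    rows = np.rint(cam_row - r * np.cos(a)).astype(int)
    cols = np.rint(cam_col + r * np.sin(a)).astype(int)

    # Samples falling outside the Cartesian map keep the background label 0.
    valid = (rows >= 0) & (rows < h) & (cols >= 0) & (cols < w)
    polar = np.zeros((num_radii, num_angles), dtype=bev.dtype)
    polar[valid] = bev[rows[valid], cols[valid]]
    return polar


# Toy example: a lane running straight ahead of the camera.
bev = np.zeros((5, 5), dtype=np.int64)
bev[:, 2] = 1
polar = bev_cartesian_to_polar(bev, num_radii=5, num_angles=3)
```

In this toy map the straight-ahead lane ends up entirely in the central polar column, while the two outer columns (the rays at the edges of the assumed field of view) see it only at the camera cell. Nearest-neighbour sampling is used because the map holds class labels, which should not be interpolated.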