Monocular Road Scene Bird's Eye View Prediction via Big Kernel-Size Encoder and Spatial-Channel Transform Module

IEEE TRANSACTIONS ON INTELLIGENT TRANSPORTATION SYSTEMS (2023)

Abstract
A detailed representation of the surrounding road scene is crucial for an autonomous driving system. The camera-based Bird's Eye View (BEV) map has been a popular way to present this surrounding information, due to its low cost and rich spatial context. Most existing methods predict the BEV map from depth estimation or a simple homography, which can cause error propagation and missing content. To overcome these drawbacks, we propose a novel end-to-end framework that uses a front monocular image to predict road layout and vehicle occupancy. In particular, to capture long-range features, we redesign a CNN encoder with a large kernel size to extract image features. To bridge the large gap between front-image features and top-down features, we propose a novel Spatial-Channel projection module that converts the front-view map into the top-down space. Additionally, to exploit the correlation between the front view and the top-down view, we propose a Dual Cross-view Transformer module that refines the top-down feature maps and strengthens the transformation. Extensive evaluations on the KITTI and Argoverse datasets show that the proposed model achieves state-of-the-art results on both. Furthermore, the model runs at 37 FPS on a single GPU, demonstrating real-time BEV map generation. The code will be published at https://github.com/raozhongyu/BEV_LKA.
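The spatial-channel projection idea sketched in the abstract (mapping front-view features into top-down space) resembles MLP-style view transforms used in earlier BEV work: the vertical image axis is collapsed into the channel dimension and a learned matrix maps it onto depth bins. The NumPy sketch below illustrates only that general mechanism; all shapes, the random weight matrix, and the function name are illustrative assumptions, not the paper's actual module:

```python
import numpy as np

def spatial_channel_transform(front_feat, weight):
    """Illustrative view transform: map a front-view feature map
    (C, H, W) to a top-down one (C, Z, W) by folding the vertical
    image axis H into channels and applying a learned projection.
    `weight` has shape (C*Z, C*H); random here for demonstration.
    """
    C, H, W = front_feat.shape
    Z = weight.shape[0] // C
    # (C, H, W) -> (C*H, W): each image column becomes one vector
    columns = front_feat.reshape(C * H, W)
    # project every image column onto the top-down depth axis
    topdown = weight @ columns           # (C*Z, W)
    return topdown.reshape(C, Z, W)

rng = np.random.default_rng(0)
feat = rng.standard_normal((64, 16, 32))      # toy front-view features
w = rng.standard_normal((64 * 24, 64 * 16))   # toy projection weights
bev = spatial_channel_transform(feat, w)
print(bev.shape)  # (64, 24, 32)
```

In a trained model the projection would be learned per feature scale; the point of the sketch is only that the view change is a linear map over image columns rather than an explicit depth estimate, which is consistent with the abstract's goal of avoiding depth-estimation error propagation.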
Keywords
Transformers, Feature extraction, Kernel, Transforms, Task analysis, Roads, Predictive models, Autonomous vehicles, bird's eye view map, deep learning