WidthFormer: Toward Efficient Transformer-based BEV View Transformation
CoRR (2024)
Abstract
In this work, we present WidthFormer, a novel transformer-based
Bird's-Eye-View (BEV) 3D detection method tailored for real-time
autonomous-driving applications. WidthFormer is computationally efficient,
robust, and does not require any special engineering effort to deploy. We
propose a novel 3D positional encoding mechanism capable of accurately
encapsulating 3D geometric information, which enables our model to generate
high-quality BEV representations with only a single transformer decoder layer.
This mechanism is also beneficial for existing sparse 3D object detectors.
Inspired by recently proposed works, we further improve our model's
efficiency by vertically compressing the image features when serving as
attention keys and values. We also introduce two modules to compensate for
potential information loss due to feature compression. Experimental evaluation
on the widely-used nuScenes 3D object detection benchmark demonstrates that our
method outperforms previous approaches across different 3D detection
architectures. More importantly, our model is highly efficient. For example,
when using 256×704 input images, it achieves 1.5 ms latency on an NVIDIA
3090 GPU. Furthermore, WidthFormer exhibits strong robustness to different
degrees of camera perturbations. Our study offers valuable insights into the
deployment of BEV transformation methods in real-world, complex road
environments. Code is available at
https://github.com/ChenhongyiYang/WidthFormer .
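
The vertical compression of attention keys and values described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. The function names, the max-pooling choice for collapsing the height axis, the stride-16 feature-map size (16×44 for a 256×704 image), and the BEV grid size are all assumptions made for illustration. The 3D positional encoding, the learned projections, and the two compensation modules are omitted.

```python
import numpy as np

def compress_vertically(feat):
    """Collapse the height axis of a (C, H, W) feature map into W tokens.

    Max pooling over height is an illustrative assumption, not
    necessarily the operator used in the paper.
    """
    return feat.max(axis=1).T  # shape (W, C)

def cross_attention(queries, keys, values):
    """Single-head scaled dot-product attention.

    queries: (Nq, C); keys and values: (Nk, C).
    Returns the attended features (Nq, C) and the weights (Nq, Nk).
    """
    scale = 1.0 / np.sqrt(queries.shape[-1])
    logits = (queries @ keys.T) * scale
    logits = logits - logits.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(logits)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ values, weights

rng = np.random.default_rng(0)
C, H, W = 64, 16, 44                 # assumed stride-16 features of a 256x704 image
feat = rng.standard_normal((C, H, W))
bev_queries = rng.standard_normal((32 * 32, C))  # hypothetical 32x32 BEV grid

width_tokens = compress_vertically(feat)         # 44 tokens instead of 16 * 44 = 704
bev, weights = cross_attention(bev_queries, width_tokens, width_tokens)
print(width_tokens.shape, bev.shape)
```

Under these assumptions each BEV query attends to 44 width tokens rather than 704 image tokens, so the attention matrix is 16 times smaller. That reduction is the efficiency argument the abstract makes, and it is also why some mechanism for recovering the discarded height information is needed.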