Unified Spatio-Temporal Tri-Perspective View Representation for 3D Semantic Occupancy Prediction

Sathira Silva, Savindu Bhashitha Wannigama,Gihan Jayatilaka, Muhammad Haris Khan,Roshan Ragel

arxiv(2024)

引用 0|浏览2
暂无评分
摘要
Holistic understanding and reasoning in 3D scenes play a vital role in the success of autonomous driving systems. The evolution of 3D semantic occupancy prediction as a pretraining task for autonomous driving and robotic downstream tasks capture finer 3D details compared to methods like 3D detection. Existing approaches predominantly focus on spatial cues such as tri-perspective view embeddings (TPV), often overlooking temporal cues. This study introduces a spatiotemporal transformer architecture S2TPVFormer for temporally coherent 3D semantic occupancy prediction. We enrich the prior process by including temporal cues using a novel temporal cross-view hybrid attention mechanism (TCVHA) and generate spatiotemporal TPV embeddings (i.e. S2TPV embeddings). Experimental evaluations on the nuScenes dataset demonstrate a substantial 4.1 improvement in mean Intersection over Union (mIoU) for 3D Semantic Occupancy compared to TPVFormer, confirming the effectiveness of the proposed S2TPVFormer in enhancing 3D scene perception.
更多
查看译文
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要