NeuralFloors: Conditional Street-Level Scene Generation From BEV Semantic Maps via Neural Fields

IEEE Robotics and Automation Letters (2024)

Abstract
Semantic Bird's Eye View (BEV) representations are a popular format, being easily interpretable and editable. However, synthesising ground-view images from BEVs is a difficult task: the system must learn both the mapping from BEV to Front View (FV) structure and how to synthesise highly photo-realistic imagery, reasoning simultaneously about the geometry and the appearance of the scene. We therefore present a factorised approach that tackles the problem in two stages: a first stage that learns a BEV-to-FV transformation in the semantic space through a Neural Field, and a second stage that leverages a Latent Diffusion Model (LDM) to synthesise images conditioned on the output of the first stage. Our experiments show that this approach produces RGB images of high perceptual quality that are also well aligned with their corresponding FV ground truth.
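The abstract describes a factorised two-stage pipeline: a Neural Field that transforms BEV semantics into FV semantics, followed by an LDM that synthesises RGB images conditioned on that FV semantic map. Below is a minimal PyTorch sketch of this structure under stated assumptions: the class count, module sizes, coordinate-based field parameterisation, and channel-concatenation conditioning are all hypothetical illustration choices, not the paper's actual design, which the abstract does not specify.

```python
# Hypothetical sketch of the two-stage pipeline described in the abstract.
# Stage 1 maps BEV semantics to FV semantics with a coordinate-based neural
# field; stage 2 denoises latents conditioned on the predicted FV semantics.
# Module names, sizes, and the conditioning scheme are all assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

NUM_CLASSES = 12  # assumed number of semantic classes


class BEVToFVNeuralField(nn.Module):
    """Stage 1: query FV pixel coordinates against an encoded BEV map."""

    def __init__(self, feat_dim=64):
        super().__init__()
        # Encode the BEV semantic map into a global feature (assumption:
        # the real model likely uses an explicit geometric BEV-to-FV link).
        self.bev_encoder = nn.Sequential(
            nn.Conv2d(NUM_CLASSES, feat_dim, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(feat_dim, feat_dim, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        # MLP field: (x, y) FV coordinate + BEV feature -> FV class logits.
        self.field = nn.Sequential(
            nn.Linear(2 + feat_dim, 256), nn.ReLU(),
            nn.Linear(256, 256), nn.ReLU(),
            nn.Linear(256, NUM_CLASSES),
        )

    def forward(self, bev, coords):
        # bev: (B, NUM_CLASSES, Hb, Wb) one-hot BEV semantics
        # coords: (B, N, 2) normalised FV pixel coordinates in [-1, 1]
        feat = self.bev_encoder(bev).flatten(1)           # (B, feat_dim)
        feat = feat.unsqueeze(1).expand(-1, coords.size(1), -1)
        return self.field(torch.cat([coords, feat], -1))  # (B, N, NUM_CLASSES)


class ConditionalDenoiser(nn.Module):
    """Stage 2: toy stand-in for the LDM's denoiser, conditioned on FV
    semantics by channel concatenation (the paper's scheme may differ)."""

    def __init__(self, latent_ch=4):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(latent_ch + NUM_CLASSES, 64, 3, padding=1), nn.ReLU(),
            nn.Conv2d(64, 64, 3, padding=1), nn.ReLU(),
            nn.Conv2d(64, latent_ch, 3, padding=1),
        )

    def forward(self, noisy_latent, fv_semantics):
        cond = F.interpolate(fv_semantics, size=noisy_latent.shape[-2:])
        return self.net(torch.cat([noisy_latent, cond], dim=1))


if __name__ == "__main__":
    stage1 = BEVToFVNeuralField()
    stage2 = ConditionalDenoiser()

    bev = torch.zeros(1, NUM_CLASSES, 64, 64)             # dummy BEV map
    ys, xs = torch.meshgrid(torch.linspace(-1, 1, 32),
                            torch.linspace(-1, 1, 32), indexing="ij")
    coords = torch.stack([xs, ys], -1).reshape(1, -1, 2)

    logits = stage1(bev, coords)                          # (1, 1024, C)
    fv_sem = logits.transpose(1, 2).reshape(1, NUM_CLASSES, 32, 32)
    fv_sem = fv_sem.softmax(dim=1)                        # soft FV semantics

    noisy = torch.randn(1, 4, 32, 32)                     # dummy latent
    pred = stage2(noisy, fv_sem)                          # one denoiser call
    print(pred.shape)                                     # (1, 4, 32, 32)
```

In this sketch the two stages are trained and run independently, which mirrors the factorisation argued for in the abstract: the field only has to get the FV semantic layout right, while the diffusion stage only has to map a semantic layout to photo-realistic appearance.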
Keywords
Deep learning for visual perception, computer vision for transportation, neural rendering, cross-view transformation, data-driven simulation