SceneSense: Diffusion Models for 3D Occupancy Synthesis from Partial Observation
CoRR (2024)
Abstract
When exploring new areas, robotic systems generally plan and execute
controls only over geometry that has been directly measured. When entering
space that was previously obstructed from view, such as when turning corners in
hallways or entering new rooms, robots often pause to plan over the newly
observed space. To address this, we present SceneSense, a real-time 3D diffusion
model for synthesizing 3D occupancy information from partial observations that
effectively predicts these occluded or out of view geometries for use in future
planning and control frameworks. SceneSense uses a running occupancy map and a
single RGB-D camera to generate predicted geometry around the platform at
runtime, even when the geometry is occluded or out of view. Our architecture
ensures that SceneSense never overwrites observed free or occupied space. By
preserving the integrity of the observed map, SceneSense mitigates the risk of
corrupting the observed space with generative predictions. While SceneSense is
shown to operate well using a single RGB-D camera, the framework is flexible
enough to extend to additional modalities. SceneSense operates as part of any
system that generates a running occupancy map "out of the box" by removing the
conditioning from the framework. Alternatively, for maximum performance in new
modalities, the perception backbone can be replaced and the model retrained for
inference in new applications. Unlike existing models that necessitate multiple
views and offline scene synthesis, or are focused on filling gaps in observed
data, our findings demonstrate that SceneSense is an effective approach to
estimating unobserved local occupancy information at runtime. Local occupancy
predictions from SceneSense are shown to better represent the ground truth
occupancy distribution during the test exploration trajectories than the
running occupancy map.
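The abstract's guarantee that observed space is never overwritten suggests an inpainting-style sampling loop, where measured voxels are clamped back to their known values at every denoising step. The sketch below is a minimal illustration of that idea, not the authors' implementation: it assumes a diffusers-style scheduler interface, and `denoiser`, `observed_occ`, and `observed_mask` are hypothetical inputs.

```python
import torch

@torch.no_grad()
def sample_local_occupancy(denoiser, scheduler, observed_occ, observed_mask, shape):
    # Hypothetical DDPM-style reverse loop over a local 3D occupancy grid.
    # observed_occ:  measured occupancy values, scaled to [-1, 1]
    # observed_mask: 1 where a voxel was directly observed, 0 where unknown
    x = torch.randn(shape)  # start from pure noise over the local grid
    for t in scheduler.timesteps:
        eps = denoiser(x, t)                       # predict the noise residual
        x = scheduler.step(eps, t, x).prev_sample  # one reverse-diffusion step
        # Clamp measured voxels to their observed values so generative
        # predictions only ever fill unknown space.
        x = observed_mask * observed_occ + (1 - observed_mask) * x
    return x
```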