ALSTER: A Local Spatio-Temporal Expert for Online 3D Semantic Reconstruction
CoRR(2023)
摘要
We propose an online 3D semantic segmentation method that incrementally
reconstructs a 3D semantic map from a stream of RGB-D frames. Unlike offline
methods, ours is directly applicable to scenarios with real-time constraints,
such as robotics or mixed reality. To overcome the inherent challenges of
online methods, we make two main contributions. First, to effectively extract
information from the input RGB-D video stream, we jointly estimate geometry and
semantic labels per frame in 3D. A key focus of our approach is to reason about
semantic entities both in the 2D input and the local 3D domain to leverage
differences in spatial context and network architectures. Our method predicts
2D features using an off-the-shelf segmentation network. The extracted 2D
features are refined by a lightweight 3D network to enable reasoning about the
local 3D structure. Second, to efficiently deal with an infinite stream of
input RGB-D frames, a subsequent network serves as a temporal expert predicting
the incremental scene updates by leveraging 2D, 3D, and past information in a
learned manner. These updates are then integrated into a global scene
representation. Using these main contributions, our method can enable scenarios
with real-time constraints and can scale to arbitrary scene sizes by processing
and updating the scene only in a local region defined by the new measurement.
Our experiments demonstrate improved results compared to existing online
methods that purely operate in local regions and show that complementary
sources of information can boost the performance. We provide a thorough
ablation study on the benefits of different architectural as well as
algorithmic design decisions. Our method yields competitive results on the
popular ScanNet benchmark and SceneNN dataset.
更多查看译文
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要