Paved2Paradise: Cost-Effective and Scalable LiDAR Simulation by Factoring the Real World
arXiv (2023)
Abstract
To achieve strong real-world performance, neural networks must be trained on
large, diverse datasets; however, obtaining and annotating such datasets is
costly and time-consuming, particularly for 3D point clouds. In this paper, we
describe Paved2Paradise, a simple, cost-effective approach for generating fully
labeled, diverse, and realistic lidar datasets from scratch, all while
requiring minimal human annotation. Our key insight is that, by deliberately
collecting separate "background" and "object" datasets (i.e., "factoring the
real world"), we can intelligently combine them to produce a combinatorially
large and diverse training set. The Paved2Paradise pipeline thus consists of
four steps: (1) collecting copious background data, (2) recording individuals
from the desired object class(es) performing different behaviors in an isolated
environment (like a parking lot), (3) bootstrapping labels for the object
dataset, and (4) generating samples by placing objects at arbitrary locations
in backgrounds. To demonstrate the utility of Paved2Paradise, we generated
synthetic datasets for two tasks: (1) human detection in orchards (a task for
which no public data exists) and (2) pedestrian detection in urban
environments. Qualitatively, we find that a model trained exclusively on
Paved2Paradise synthetic data is highly effective at detecting humans in
orchards, including when individuals are heavily occluded by tree branches.
Quantitatively, a model trained on Paved2Paradise data that sources backgrounds
from KITTI performs comparably to a model trained on the actual KITTI dataset. These
results suggest the Paved2Paradise synthetic data pipeline can help accelerate
point cloud model development in sectors where acquiring lidar datasets has
previously been cost-prohibitive.
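
Step (4) of the pipeline, placing recorded objects at arbitrary locations in recorded backgrounds, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `place_object` and `generate_samples`, the `xy_range` parameter, and the per-point label convention are all hypothetical, and the sketch deliberately ignores occlusion and sensor geometry, which a realistic pipeline would have to account for. Point clouds are assumed to be NumPy arrays of shape (N, 3) holding x, y, z coordinates.

```python
import numpy as np


def place_object(background, obj, center_xy, yaw):
    """Rotate `obj` about the z-axis by `yaw`, move it to `center_xy`,
    and merge it with `background`. Hypothetical helper, not from the paper."""
    c, s = np.cos(yaw), np.sin(yaw)
    rot = np.array([[c, -s, 0.0],
                    [s, c, 0.0],
                    [0.0, 0.0, 1.0]])
    # Center the object in x/y so the rotation is about its own vertical axis.
    centered = obj.copy()
    centered[:, :2] -= obj[:, :2].mean(axis=0)
    moved = centered @ rot.T
    moved[:, :2] += np.asarray(center_xy, dtype=float)
    points = np.vstack([background, moved])
    # Per-point labels come for free: 0 = background, 1 = object.
    labels = np.concatenate([
        np.zeros(len(background), dtype=np.int64),
        np.ones(len(moved), dtype=np.int64),
    ])
    return points, labels


def generate_samples(backgrounds, objects, n_samples, xy_range=10.0, seed=0):
    """Draw random (background, object, location, heading) combinations.
    `xy_range` is an assumed placement bound in meters."""
    rng = np.random.default_rng(seed)
    samples = []
    for _ in range(n_samples):
        bg = backgrounds[rng.integers(len(backgrounds))]
        obj = objects[rng.integers(len(objects))]
        center_xy = rng.uniform(-xy_range, xy_range, size=2)
        yaw = rng.uniform(0.0, 2.0 * np.pi)
        samples.append(place_object(bg, obj, center_xy, yaw))
    return samples
```

Because every sample is an independent draw of a background, an object, a location, and a heading, even modest background and object collections yield a very large number of distinct labeled scenes, which is the sense in which the training set is combinatorially large.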