Equivariant Spatio-Temporal Self-Supervision for LiDAR Object Detection
arXiv (2024)
Abstract
Popular representation learning methods encourage feature invariance under
transformations applied at the input. However, in 3D perception tasks like
object localization and segmentation, outputs are naturally equivariant to some
transformations, such as rotation. Using pre-training loss functions that
encourage equivariance of features under certain transformations provides a
strong self-supervision signal while also retaining information of geometric
relationships between transformed feature representations. This can enable
improved performance in downstream tasks that are equivariant to such
transformations. In this paper, we propose a spatio-temporal equivariant
learning framework by considering both spatial and temporal augmentations
jointly. Our experiments show that the best performance arises with a
pre-training approach that encourages equivariance to translation, scaling, and
flip, rotation and scene flow. For spatial augmentations, we find that
depending on the transformation, either a contrastive objective or an
equivariance-by-classification objective yields best results. To leverage
real-world object deformations and motion, we consider sequential LiDAR scene
pairs and develop a novel 3D scene flow-based equivariance objective that leads
to improved performance overall. We demonstrate our pre-training method on 3D
object detection, where it outperforms existing equivariant and invariant
approaches in many settings.
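The core idea of equivariant pre-training described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the `encoder` here is a hypothetical stand-in (a linear map with mean pooling) for a real point-cloud backbone, and the loss simply penalizes the discrepancy between encoding a transformed scene and transforming the encoded features.

```python
import numpy as np

def rotate_z(points, angle):
    """Rotate an (N, 3) point cloud about the z-axis."""
    c, s = np.cos(angle), np.sin(angle)
    R = np.array([[c, -s, 0.0],
                  [s,  c, 0.0],
                  [0.0, 0.0, 1.0]])
    return points @ R.T

def encoder(points):
    """Hypothetical stand-in for a point-cloud backbone:
    per-point linear map followed by mean pooling."""
    W = np.eye(3)  # identity weights, for illustration only
    return (points @ W).mean(axis=0)

def equivariance_loss(points, angle):
    """Encourage f(T(x)) ~ T(f(x)): features of the rotated scene
    should match the rotated features of the original scene."""
    feat_of_rotated = encoder(rotate_z(points, angle))
    rotated_feat = rotate_z(encoder(points)[None, :], angle)[0]
    return float(np.mean((feat_of_rotated - rotated_feat) ** 2))
```

Because the toy encoder is linear, it commutes with rotation and the loss is already zero; a real backbone is nonlinear, and minimizing this loss during pre-training is what pushes its features toward equivariance.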