BootsTAP: Bootstrapped Training for Tracking-Any-Point
CoRR(2024)
摘要
To endow models with greater understanding of physics and motion, it is
useful to enable them to perceive how solid surfaces move and deform in real
scenes. This can be formalized as Tracking-Any-Point (TAP), which requires the
algorithm to be able to track any point corresponding to a solid surface in a
video, potentially densely in space and time. Large-scale ground-truth training
data for TAP is only available in simulation, which currently has limited
variety of objects and motion. In this work, we demonstrate how large-scale,
unlabeled, uncurated real-world data can improve a TAP model with minimal
architectural changes, using a self-supervised student-teacher setup. We
demonstrate state-of-the-art performance on the TAP-Vid benchmark surpassing
previous results by a wide margin: for example, TAP-Vid-DAVIS performance
improves from 61.3
更多查看译文
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要