LiFT: A Surprisingly Simple Lightweight Feature Transform for Dense ViT Descriptors
arXiv (2024)
Abstract
We present a simple self-supervised method to enhance the performance of ViT
features for dense downstream tasks. Our Lightweight Feature Transform (LiFT)
is a straightforward and compact postprocessing network that can be applied to
enhance the features of any pre-trained ViT backbone. LiFT is fast and easy to
train with a self-supervised objective, and it boosts the density of ViT
features for minimal extra inference cost. Furthermore, we demonstrate that
LiFT can be applied with approaches that use additional task-specific
downstream modules, as we integrate LiFT with ViTDet for COCO detection and
segmentation. Despite the simplicity of LiFT, we find that it is not simply
learning a more complex version of bilinear interpolation. Instead, our LiFT
training protocol leads to several desirable emergent properties that benefit
ViT features in dense downstream tasks. This includes greater scale invariance
for features, and better object boundary maps. By simply training LiFT for a
few epochs, we show improved performance on keypoint correspondence, detection,
segmentation, and object discovery tasks. Overall, LiFT provides an easy way to
unlock the benefits of denser feature arrays for a fraction of the
computational cost. For more details, refer to our project page at
https://www.cs.umd.edu/~sakshams/LiFT/.
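To make the idea concrete: LiFT is applied as a postprocessing step that turns a coarse grid of ViT patch features into a denser grid. The sketch below is a minimal, hypothetical stand-in, not the authors' implementation — it pairs a plain bilinear 2x upsample with a placeholder channel-mixing matrix `mix` where the real LiFT uses a small trained network (and, as the abstract notes, LiFT learns more than interpolation alone).

```python
import numpy as np

def bilinear_upsample_2x(feats):
    """Bilinearly upsample a (H, W, D) patch-feature grid to (2H, 2W, D)."""
    H, W, D = feats.shape
    # Map each output pixel center back to input coordinates, clamped to the grid.
    ys = np.clip((np.arange(2 * H) + 0.5) / 2 - 0.5, 0, H - 1)
    xs = np.clip((np.arange(2 * W) + 0.5) / 2 - 0.5, 0, W - 1)
    out = np.empty((2 * H, 2 * W, D))
    for i, y in enumerate(ys):
        y0 = int(y); y1 = min(y0 + 1, H - 1); wy = y - y0
        for j, x in enumerate(xs):
            x0 = int(x); x1 = min(x0 + 1, W - 1); wx = x - x0
            out[i, j] = ((1 - wy) * (1 - wx) * feats[y0, x0]
                         + (1 - wy) * wx * feats[y0, x1]
                         + wy * (1 - wx) * feats[y1, x0]
                         + wy * wx * feats[y1, x1])
    return out

def lift_like_transform(feats, mix):
    """Stand-in for a LiFT-style transform: 2x upsample, then channel mixing.
    `mix` (a D x D matrix) is a placeholder for learned weights; it is NOT
    the actual LiFT architecture, which is a small trained network."""
    return bilinear_upsample_2x(feats) @ mix

# Hypothetical example: a 14x14 grid of 8-dim ViT patch features
# (e.g. a 224x224 image with 16x16 patches) -> a 28x28 dense grid.
feats = np.random.default_rng(0).standard_normal((14, 14, 8))
dense = lift_like_transform(feats, np.eye(8))
print(dense.shape)  # (28, 28, 8)
```

The point of the sketch is the interface, not the internals: the transform consumes backbone features of any pre-trained ViT and emits a denser feature array, so downstream dense-task heads (detection, segmentation, correspondence) can consume it without changing the backbone.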