Matcher: Segment Anything with One Shot Using All-Purpose Feature Matching

ICLR 2024 (2023)

Abstract
Powered by large-scale pre-training, vision foundation models exhibit significant potential in open-world image understanding. However, unlike large language models that excel at directly tackling various language tasks, vision foundation models require a task-specific model structure followed by fine-tuning on specific tasks. In this work, we present Matcher, a novel perception paradigm that utilizes off-the-shelf vision foundation models to address various perception tasks. Matcher can segment anything by using an in-context example without training. Additionally, we design three effective components within the Matcher framework to collaborate with these foundation models and unleash their full potential in diverse perception tasks. Matcher demonstrates impressive generalization performance across various segmentation tasks, all without training. For example, it achieves 52.7% mIoU on COCO-20i with one example, surpassing the state-of-the-art specialist model by 1.6%. In addition, Matcher achieves 33.0% mIoU on the proposed LVIS-92i for one-shot semantic segmentation, outperforming the state-of-the-art generalist model by 14.4%. Our visualization results further showcase the open-vocabulary generality and flexibility of Matcher when applied to images in the wild. Our code can be found at https://github.com/aim-uofa/Matcher.
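
As a concrete illustration of the paradigm the abstract describes (not Matcher's actual implementation), the sketch below reduces one-shot, in-context segmentation to feature matching: patch features are extracted for both the reference and target images with a frozen encoder (DINOv2 in the paper), the reference features inside the given mask are compared to all target patches by cosine similarity, and the best-matching target locations become point prompts for a promptable segmenter such as SAM. The function name `match_reference_to_target`, the toy random features, and the fixed `top_k` are illustrative assumptions; Matcher's real components (e.g. bidirectional matching and robust prompt sampling) are detailed in the paper.

```python
# Minimal sketch: one-shot segmentation as training-free feature matching.
import torch
import torch.nn.functional as F


def match_reference_to_target(
    ref_feats: torch.Tensor,   # (H*W, C) patch features of the reference image
    tgt_feats: torch.Tensor,   # (H*W, C) patch features of the target image
    ref_mask: torch.Tensor,    # (H*W,) boolean in-context mask on the reference grid
    grid_size: int,            # H == W == grid_size for a square patch grid
    top_k: int = 8,
) -> torch.Tensor:
    """Return (top_k, 2) point prompts (x, y) on the target patch grid."""
    assert ref_mask.any(), "the in-context mask must cover at least one patch"
    ref = F.normalize(ref_feats[ref_mask], dim=-1)   # keep only in-mask patches
    tgt = F.normalize(tgt_feats, dim=-1)
    sim = ref @ tgt.T                                # cosine similarity, (M, N)
    # Score each target patch by its best match to any masked reference patch,
    # then keep the top-k locations as candidate point prompts.
    scores = sim.max(dim=0).values                   # (N,)
    idx = scores.topk(top_k).indices
    ys, xs = idx // grid_size, idx % grid_size
    return torch.stack([xs, ys], dim=-1)             # (x, y) grid coordinates


# Toy usage with random features; in practice these would be DINOv2 patch
# tokens, e.g. forward_features(x)["x_norm_patchtokens"].
g, c = 16, 256
ref_f = torch.randn(g * g, c)
tgt_f = torch.randn(g * g, c)
mask = torch.zeros(g * g, dtype=torch.bool)
mask[:32] = True                                     # pretend the object covers 32 patches
points = match_reference_to_target(ref_f, tgt_f, mask, grid_size=g)
print(points.shape)  # torch.Size([8, 2])
```

In a full pipeline, these grid coordinates would be scaled to pixel coordinates and passed as point prompts to a promptable segmenter, for instance `SamPredictor.predict(point_coords=..., point_labels=...)` from the official segment-anything package, to produce the final mask.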
Keywords
Vision Foundation Models, Segment Anything, Training-Free Generalist, Matcher