RAP-SAM: Towards Real-Time All-Purpose Segment Anything
CoRR (2024)
Abstract
Advanced by transformer architecture, vision foundation models (VFMs) achieve
remarkable progress in performance and generalization ability. Segment Anything
Model (SAM) is one remarkable model that can achieve generalized segmentation.
However, most VFMs cannot run in real-time, which makes it difficult to deploy
them in products. On the other hand, current real-time segmentation methods
mainly serve a single purpose, such as semantic segmentation of driving scenes.
We argue that diverse outputs are needed for real applications. Thus, this work
explores a new real-time segmentation setting, named all-purpose segmentation
in real-time, to transfer VFMs in real-time deployment. It contains three
different tasks, including interactive segmentation, panoptic segmentation, and
video segmentation. We aim to use one model to achieve the above tasks in
real-time. We first benchmark several strong baselines. Then, we present
Real-Time All-Purpose SAM (RAP-SAM). It contains an efficient encoder and an
efficient decoupled decoder that performs prompt-driven decoding. Moreover, we
explore different training strategies and tuning methods to further boost
co-training performance. Our code and model are available at
https://github.com/xushilin1/RAP-SAM/.
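The abstract describes a single decoder that serves both object queries (for panoptic/video segmentation) and prompt queries (for interactive segmentation) via prompt-driven decoding. As a rough illustration of that shared-query idea only, here is a minimal numpy sketch of a query-based mask decoder; the function name, shapes, and update rule are hypothetical simplifications, not the paper's actual architecture.

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable softmax
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def prompt_driven_decode(feats, queries, n_rounds=3):
    """Toy shared decoder: every query (object or prompt) cross-attends to
    the image features, then mask logits come from dot products between the
    refined queries and per-pixel embeddings.

    feats:   (HW, C) flattened per-pixel features from the encoder
    queries: (Q, C)  learned object queries and/or encoded visual prompts
    returns: (Q, HW) mask logits, one mask per query
    """
    scale = np.sqrt(feats.shape[1])
    for _ in range(n_rounds):
        attn = softmax(queries @ feats.T / scale)  # (Q, HW) attention weights
        queries = queries + attn @ feats           # residual query update
    return queries @ feats.T                       # (Q, HW) mask logits

# toy usage: 3 panoptic queries + 1 interactive prompt query
# over an 8x8 feature map with 16 channels
rng = np.random.default_rng(0)
feats = rng.standard_normal((64, 16))
obj_q = rng.standard_normal((3, 16))     # "thing/stuff" object queries
prompt_q = rng.standard_normal((1, 16))  # encoded point/box prompt
masks = prompt_driven_decode(feats, np.vstack([obj_q, prompt_q]))
print(masks.shape)  # (4, 64)
```

The point of the sketch is that object and prompt queries share one decoding path, so one model can emit panoptic, video, and interactive masks; in the actual model the decoder is decoupled and far more elaborate.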