VLKP: Video Instance Segmentation with Visual-Linguistic Knowledge Prompts

ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

Abstract
Most video instance segmentation (VIS) models focus only on visual knowledge and ignore intrinsic linguistic knowledge. Based on the observation that incorporating linguistic knowledge can significantly improve a model's contextual understanding of video, in this paper we present a Video Instance Segmentation approach with Visual-Linguistic Knowledge Prompts (VLKP), a novel paradigm for offline video instance segmentation. Specifically, we propose a visual-linguistic knowledge prompt training strategy that incorporates linguistic features with visual features to obtain visual-linguistic features, which the model processes in place of traditional visual features. In addition, we design a new temporal shift encoder to convey information between frames and enhance the temporal sensitivity of the model. On two widely adopted VIS benchmarks, i.e., YouTube-VIS-2019 and YouTube-VIS-2021, VLKP with ResNet-50 obtains state-of-the-art results, e.g., 47.7 AP on YouTube-VIS-2019 and 42.0 AP on YouTube-VIS-2021. Code is available at https://github.com/ruixiangC/VLKP.
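The abstract's two components can be sketched in a few lines. This is a minimal illustrative sketch, not the authors' implementation: the function names, tensor shapes, the 1x1-projection fusion, and the TSM-style channel shift are all assumptions made for clarity (the paper's actual encoder and fusion design may differ; see the linked repository).

```python
import numpy as np

def fuse_visual_linguistic(visual, text, proj):
    """Hypothetical visual-linguistic fusion: tile a linguistic embedding
    over the spatial grid, concatenate it with the visual channels, and
    project back to C channels with 1x1-conv-style weights.

    visual: (T, C, H, W) frame features; text: (D,) linguistic embedding;
    proj:   (C, C + D) projection weights (an assumed parameterisation).
    """
    T, C, H, W = visual.shape
    D = text.shape[0]
    tiled = np.broadcast_to(text[None, :, None, None], (T, D, H, W))
    cat = np.concatenate([visual, tiled], axis=1)  # (T, C + D, H, W)
    # A 1x1 convolution is a matmul over the channel dimension.
    return np.einsum('oc,tchw->tohw', proj, cat)   # (T, C, H, W)

def temporal_shift(features, shift_frac=0.25):
    """TSM-style temporal shift (one plausible reading of a "temporal
    shift encoder"): move one fraction of channels to the previous frame
    and another to the next, so each frame mixes neighbouring-frame
    information at zero parameter cost.

    features: (T, C, H, W) with T frames and C channels.
    """
    T, C, H, W = features.shape
    fold = int(C * shift_frac)
    out = np.zeros_like(features)
    # Channels [0, fold): pull from the next frame (shift backward in time).
    out[:-1, :fold] = features[1:, :fold]
    # Channels [fold, 2*fold): pull from the previous frame (shift forward).
    out[1:, fold:2 * fold] = features[:-1, fold:2 * fold]
    # Remaining channels are left untouched.
    out[:, 2 * fold:] = features[:, 2 * fold:]
    return out
```

The shift itself is parameter-free; in a full model it would typically sit in front of per-frame attention or convolution layers so that subsequent processing sees temporally mixed features.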
Keywords
Video instance segmentation, end-to-end, visual-linguistic knowledge, temporal shift encoder