VLKP: Video Instance Segmentation with Visual-Linguistic Knowledge Prompts

ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

Abstract
Most video instance segmentation (VIS) models focus only on visual knowledge and ignore intrinsic linguistic knowledge. Based on the observation that incorporating linguistic knowledge can significantly improve a model's contextual understanding of video, in this paper we present a Video Instance Segmentation approach with Visual-Linguistic Knowledge Prompts (VLKP), a novel paradigm for offline video instance segmentation. Specifically, we propose a visual-linguistic knowledge prompt training strategy that incorporates linguistic features with visual features to obtain visual-linguistic features, which the model processes in place of traditional visual features. In addition, we design a new temporal shift encoder to convey information between frames and enhance the temporal sensitivity of the model. On two widely adopted VIS benchmarks, i.e., YouTube-VIS-2019 and YouTube-VIS-2021, VLKP with ResNet-50 obtains state-of-the-art results, e.g., 47.7 AP on YouTube-VIS-2019 and 42.0 AP on YouTube-VIS-2021. Code is available at https://github.com/ruixiangC/VLKP.
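The abstract's two components can be sketched in a few lines. This is a minimal illustrative sketch, not the authors' implementation: the function names, tensor shapes, the 1x1-projection fusion, and the TSM-style channel shift are all assumptions made for clarity (the paper's actual encoder and fusion design may differ; see the linked repository).

```python
import numpy as np

def fuse_visual_linguistic(visual, text, proj):
    """Hypothetical visual-linguistic fusion: tile a linguistic embedding
    over the spatial grid, concatenate it with the visual channels, and
    project back to C channels with 1x1-conv-style weights.

    visual: (T, C, H, W) frame features; text: (D,) linguistic embedding;
    proj:   (C, C + D) projection weights (an assumed parameterisation).
    """
    T, C, H, W = visual.shape
    D = text.shape[0]
    tiled = np.broadcast_to(text[None, :, None, None], (T, D, H, W))
    cat = np.concatenate([visual, tiled], axis=1)  # (T, C + D, H, W)
    # A 1x1 convolution is a matmul over the channel dimension.
    return np.einsum('oc,tchw->tohw', proj, cat)   # (T, C, H, W)

def temporal_shift(features, shift_frac=0.25):
    """TSM-style temporal shift (one plausible reading of a "temporal
    shift encoder"): move one fraction of channels to the previous frame
    and another to the next, so each frame mixes neighbouring-frame
    information at zero parameter cost.

    features: (T, C, H, W) with T frames and C channels.
    """
    T, C, H, W = features.shape
    fold = int(C * shift_frac)
    out = np.zeros_like(features)
    # Channels [0, fold): pull from the next frame (shift backward in time).
    out[:-1, :fold] = features[1:, :fold]
    # Channels [fold, 2*fold): pull from the previous frame (shift forward).
    out[1:, fold:2 * fold] = features[:-1, fold:2 * fold]
    # Remaining channels are left untouched.
    out[:, 2 * fold:] = features[:, 2 * fold:]
    return out
```

The shift itself is parameter-free; in a full model it would typically sit in front of per-frame attention or convolution layers so that subsequent processing sees temporally mixed features.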
Keywords
Video instance segmentation, end-to-end, visual-linguistic knowledge, temporal shift encoder