Linguistic Query-Guided Mask Generation for Referring Image Segmentation

Zhichao Wei,Xiaohao Chen, Mingqiang Chen,Siyu Zhu

arxiv(2023)

引用 0|浏览36
暂无评分
摘要
Referring image segmentation aims to segment the image region of interest according to the given language expression, which is a typical multi-modal task. Existing methods either adopt the pixel classification-based or the learnable query-based framework for mask generation, both of which are insufficient to deal with various text-image pairs with a fix number of parametric prototypes. In this work, we propose an end-to-end framework built on transformer to perform Linguistic query-Guided mask generation, dubbed LGFormer. It views the linguistic features as query to generate a specialized prototype for arbitrary input image-text pair, thus generating more consistent segmentation results. Moreover, we design several cross-modal interaction modules (\eg, vision-language bidirectional attention module, VLBA) in both encoder and decoder to achieve better cross-modal alignment.
更多
查看译文
关键词
mask generation,image,query-guided
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要