Bi-Directional Image-Text Retrieval With Position Attention and Similarity Filtering

2022 7th International Conference on Image, Vision and Computing (ICIVC), 2022

Abstract
In recent years, cross-modal retrieval has become a frontier and hotspot of academic research and an important direction for the future development of information retrieval. Some current methods improve image-text retrieval performance by exploring more comprehensive global image-text alignment or by capturing fine-grained region-word local alignment. However, previous methods do not mine additional useful information to obtain more accurate matching scores. In this paper, we propose a location attention and similarity filtering network for image-text retrieval. Specifically, we enhance visual-text joint embedding learning with both global and local alignments. We then establish more reliable relationships between images and sentences by exploiting the location information of objects in images through location attention. In addition, we use a similarity filtering mechanism to selectively focus on important and representative alignments while setting aside the distraction of meaningless ones, so that these alignments are integrated effectively. Experiments on the public Flickr30K and MS-COCO datasets validate the effectiveness and superiority of our method.
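The abstract describes two mechanisms: location (position) attention that injects object-location cues into region features, and a similarity filter that keeps only the strongest region-word alignments before scoring. Below is a minimal, hypothetical PyTorch sketch of how such components could look; the class and parameter names (`PositionAttention`, `filtered_similarity`, `keep_ratio`) are illustrative assumptions, not the paper's actual implementation.

```python
# Hypothetical sketch (not the paper's code):
# (1) position attention that re-weights region features using bounding-box
#     location cues, and (2) a similarity filter that keeps only the top
#     region-word alignments before pooling a matching score.
import torch
import torch.nn as nn
import torch.nn.functional as F


class PositionAttention(nn.Module):
    """Re-weight region features using their normalized box coordinates."""

    def __init__(self, feat_dim: int, pos_dim: int = 5):
        super().__init__()
        self.pos_proj = nn.Linear(pos_dim, feat_dim)   # embed (x1, y1, x2, y2, area)
        self.attn = nn.Linear(feat_dim, 1)

    def forward(self, regions: torch.Tensor, boxes: torch.Tensor) -> torch.Tensor:
        # regions: (B, R, D) region features; boxes: (B, R, 5) normalized coords + area
        pos = torch.tanh(self.pos_proj(boxes))                      # (B, R, D)
        weights = torch.softmax(self.attn(regions + pos), dim=1)   # (B, R, 1)
        return regions * (1.0 + weights)                            # location-aware features


def filtered_similarity(img_regions: torch.Tensor,
                        txt_words: torch.Tensor,
                        keep_ratio: float = 0.5) -> torch.Tensor:
    """Cosine region-word similarities, keeping only the strongest alignments."""
    img = F.normalize(img_regions, dim=-1)               # (B, R, D)
    txt = F.normalize(txt_words, dim=-1)                 # (B, W, D)
    sim = torch.bmm(img, txt.transpose(1, 2))            # (B, R, W) alignment matrix
    flat = sim.flatten(1)                                # (B, R*W)
    k = max(1, int(keep_ratio * flat.size(1)))
    top_vals, _ = flat.topk(k, dim=1)                    # discard weak/noisy alignments
    return top_vals.mean(dim=1)                          # (B,) matching scores


if __name__ == "__main__":
    B, R, W, D = 2, 36, 12, 256
    pa = PositionAttention(D)
    regions = pa(torch.randn(B, R, D), torch.rand(B, R, 5))
    words = torch.randn(B, W, D)
    print(filtered_similarity(regions, words))           # one score per image-text pair
```

The `keep_ratio` threshold stands in for whatever selection rule the paper's filtering mechanism actually uses; the point of the sketch is only that weak alignments are excluded before the similarities are aggregated into a matching score.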
Keywords
Image-text retrieval,location attention,similarity representation learning,filtering mechanism