Learning Hierarchical Visual-Semantic Representation with Phrase Alignment

Baoming Yan, Qingheng Zhang, Liyu Chen, Lin Wang, Leihao Pei, Jiang Yang, Enyun Yu, Xiaobo Li, Binqiang Zhao

International Multimedia Conference (2021)

Abstract
Effective visual-semantic representation is critical to the image-text matching task. Various methods have been proposed to build image representations that carry more semantic concepts, and considerable progress has been made. However, the internal hierarchical structure of both image and text, which could effectively enhance the semantic representation, is rarely explored in image-text matching. In this work, we propose a Hierarchical Visual-Semantic Network (HVSN) with fine-grained semantic alignment to exploit this hierarchical structure. Specifically, we first model the spatial or semantic relationships between objects and aggregate the objects into visual semantic concepts with the Local Relational Attention (LRA) module. We then employ a Gated Recurrent Unit (GRU) to learn relationships between the visual semantic concepts and generate the global image representation. For the text part, we build phrase features from related words, then generate the text representation by learning relationships between these phrases. In addition, the model is trained by jointly optimizing the image-text retrieval task and a phrase alignment task to capture the fine-grained interplay between vision and language. Our approach achieves state-of-the-art performance on the Flickr30K and MS-COCO datasets. On Flickr30K, it outperforms the current state-of-the-art method by a relative 3.9% in text retrieval with an image query and by a relative 1.3% in image retrieval with a text query (based on [email protected]). On MS-COCO, HVSN improves image retrieval by a relative 2.3% and text retrieval by a relative 1.2%. Both quantitative and visual ablation studies are provided to verify the effectiveness of the proposed modules.
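The gains in the abstract are stated as relative improvements in a recall-style retrieval metric. The following is a minimal sketch of how such numbers are commonly computed, assuming the standard Recall@K definition and a single correct candidate per query (the benchmark datasets pair each image with several captions, so this is a simplification). The function names `recall_at_k` and `relative_improvement` and all numbers are hypothetical and used only for illustration; none of them come from the paper.

```python
def recall_at_k(similarity, ground_truth, k):
    """Fraction of queries whose correct candidate ranks in the top k.

    similarity[i][j] is the score between query i and candidate j;
    ground_truth[i] is the index of the correct candidate for query i.
    Assumption: exactly one correct candidate per query.
    """
    hits = 0
    for scores, target in zip(similarity, ground_truth):
        # Rank candidate indices by descending similarity score.
        ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
        if target in ranked[:k]:
            hits += 1
    return hits / len(similarity)


def relative_improvement(new, baseline):
    """Relative gain of `new` over `baseline`, in percent."""
    return 100.0 * (new - baseline) / baseline


# Toy similarity matrix (illustrative values only, not results from the paper).
toy_scores = [
    [0.9, 0.1, 0.3],
    [0.2, 0.4, 0.8],
    [0.5, 0.6, 0.7],
]
toy_truth = [0, 1, 2]

r1 = recall_at_k(toy_scores, toy_truth, k=1)
r2 = recall_at_k(toy_scores, toy_truth, k=2)
gain = relative_improvement(new=52.0, baseline=50.0)
```

Under this definition, a relative improvement divides the absolute gain by the baseline score, so the same absolute gain yields a larger relative figure when the baseline is lower.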
Keywords
Visual-Semantic Representation, Image-Text Matching, Phrase Alignment, Multi-Modal Retrieval