Stacked Squeeze-and-Excitation Recurrent Residual Network for Visual-Semantic Matching

Pattern Recognition (2020)

Abstract
• This paper proposes a novel stacked Squeeze-and-Excitation Recurrent Residual Network (SER2-Net) for visual-semantic matching.
• This paper develops an effective and efficient cross-modal representation learning module that generates semantically complementary multi-level features for both modalities.
• This paper presents a novel objective function for aligning cross-modal data, which captures the interdependency among multiple semantic levels to alleviate the distribution inconsistency between the visual and textual modalities.
• Extensive experiments on two benchmark datasets demonstrate the superiority of the proposed model over state-of-the-art approaches.
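The highlights name Squeeze-and-Excitation (SE) as a building block but do not describe how SER2-Net uses it. For orientation only, here is a minimal NumPy sketch of the generic SE channel gating from SENet (Hu et al.): squeeze each channel to a scalar by global average pooling, pass the result through a small bottleneck MLP with a sigmoid, and rescale the channels by the resulting gates. This is not the SER2-Net module; the function name, the (C, H, W) layout, and the reduction ratio r are illustrative assumptions.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def squeeze_excitation(features, w1, b1, w2, b2):
    """Generic SE gating (hypothetical helper, not the paper's code).

    features: (C, H, W) feature map
    w1: (C // r, C), b1: (C // r,)  bottleneck layer
    w2: (C, C // r), b2: (C,)       expansion layer
    """
    # Squeeze: global average pooling over the spatial dims -> (C,)
    z = features.mean(axis=(1, 2))
    # Excitation: bottleneck MLP, ReLU, then sigmoid gates in (0, 1) -> (C,)
    gates = sigmoid(w2 @ relu(w1 @ z + b1) + b2)
    # Recalibration: scale every channel by its gate
    return features * gates[:, None, None], gates

rng = np.random.default_rng(0)
C, r = 8, 4
x = rng.standard_normal((C, 5, 5))
w1, b1 = rng.standard_normal((C // r, C)), np.zeros(C // r)
w2, b2 = rng.standard_normal((C, C // r)), np.zeros(C)
y, gates = squeeze_excitation(x, w1, b1, w2, b2)
print(y.shape, gates.shape)  # (8, 5, 5) (8,)
```

The gates reweight channels without changing the tensor shape, which is why SE blocks can be dropped into residual stacks without altering the surrounding layers.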
Keywords
Vision and language, Cross-modal retrieval, Visual-Semantic embedding
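Visual-semantic embedding models for cross-modal retrieval are commonly trained with a bidirectional hinge-based triplet ranking loss over cosine similarities between image and caption embeddings. The sketch below shows that generic baseline only; it is not the multi-level objective proposed in this paper, whose form the abstract does not give. The function names and the default margin of 0.2 are illustrative assumptions.

```python
import numpy as np

def l2_normalize(x, axis=1, eps=1e-8):
    # Scale each row to unit length so dot products become cosine similarities
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)

def bidirectional_hinge_loss(img, txt, margin=0.2):
    """Generic VSE ranking loss (hypothetical helper, not the paper's objective).

    img, txt: (N, D) embeddings; row i of img matches row i of txt.
    """
    scores = l2_normalize(img) @ l2_normalize(txt).T  # (N, N) similarity matrix
    pos = np.diag(scores)                             # matched-pair similarities
    # Image-to-text: image i should score its caption above every caption j != i
    cost_txt = np.maximum(0.0, margin + scores - pos[:, None])
    # Text-to-image: caption j should score its image above every image i != j
    cost_img = np.maximum(0.0, margin + scores - pos[None, :])
    # Positive pairs are not negatives, so zero out the diagonal
    mask = np.eye(scores.shape[0], dtype=bool)
    cost_txt[mask] = 0.0
    cost_img[mask] = 0.0
    return cost_txt.sum() + cost_img.sum()

rng = np.random.default_rng(0)
img = rng.standard_normal((4, 16))
txt = img + 0.1 * rng.standard_normal((4, 16))  # captions close to their images
print(bidirectional_hinge_loss(img, txt))
```

Summing over all violating negatives is one common choice; variants instead keep only the hardest negative per query.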