Semantic-Preserving Augmentation for Robust Image-Text Retrieval

arXiv (2023)

Abstract
Image-text retrieval is the task of searching for the proper textual description of visual content and vice versa. One challenge of this task is its vulnerability to corruptions of the input image or text. Such corruptions are often not observed during training and substantially degrade the retrieval model’s decision quality. In this paper, we propose a novel image-text retrieval technique, referred to as robust visual semantic embedding (RVSE), which consists of image-based and text-based augmentation techniques called semantic-preserving augmentation for image (SPAug-I) and text (SPAug-T). Because SPAug-I and SPAug-T change the original data in a way that preserves its semantic information, we force the feature extractors to generate semantic-aware embedding vectors regardless of the corruption, which significantly improves the model’s robustness. In extensive experiments on benchmark datasets, we show that RVSE outperforms conventional retrieval schemes in image-text retrieval performance.
Keywords
retrieval, semantic-preserving, image-text