SPIRIT: Style-guided Patch Interaction for Fashion Image Retrieval with Text Feedback (Just Accepted)

ACM Transactions on Multimedia Computing, Communications, and Applications (2023)

Abstract
Fashion image retrieval with text feedback aims to find the target image according to the reference image and the modification from the user. This is a challenging task as it requires not only the synergistic understanding of both visual and textual modalities, but also the ability to model a wide variety of styles that fashion images contain. Hence, the crucial aspect of addressing this problem lies in exploiting the abundant semantic information inherent in fashion images and correlating it with the textual description of style. Recognizing that style is generally situated at the local level, we explicitly define style as the commonalities and differences between local areas of fashion images. Building upon this, we propose a Style-guided Patch InteRaction approach for fashion Image retrieval with Text feedback (SPIRIT), which focuses on the decisive influence of local details of fashion images on their style. Three corresponding networks are designed pertinently. The Patch-level Style Commonality (PSC) network is introduced to fully leverage the semantic information among patches and compute their average as the style commonality. Subsequently, the Patch-level Style Difference (PSD) network employs a graph reasoning network to model the patch-level difference and filter out insignificant patches. By considering the above two networks, mutual information about style is obtained from the interaction between patches. Finally, the Visual Textual Fusion (VTF) network is utilized to integrate visual features with rich semantic information and textual features. Experimental results on four benchmark datasets demonstrate that our proposed SPIRIT achieves state-of-the-art performance. Source code is available at https://github.com/PKU-ICST-MIPL/SPIRIT_TOMM2024.
Keywords
Fashion image retrieval with text feedback, Style modeling, Multimodal fusion
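
The abstract describes style commonality as the average over patches and style difference as a patch-level signal used to filter out insignificant patches. The toy sketch below illustrates only that general idea and is not the authors' implementation: the function names are hypothetical, the paper's graph reasoning network is replaced by a plain Euclidean distance from the mean, and treating "insignificant" as "closest to the mean" is an assumption made for illustration.

```python
import math

def mean_vector(patches):
    """Element-wise average of equal-length patch feature vectors
    (stands in for the 'style commonality' idea; hypothetical helper)."""
    n = len(patches)
    dim = len(patches[0])
    return [sum(p[d] for p in patches) / n for d in range(dim)]

def patch_differences(patches, commonality):
    """Euclidean distance of each patch from the commonality vector.
    Assumption: a simple distance replaces the paper's graph reasoning."""
    return [
        math.sqrt(sum((x - c) ** 2 for x, c in zip(p, commonality)))
        for p in patches
    ]

def keep_top_k(scores, k):
    """Indices of the k highest-scoring patches, returned in original order."""
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
    return sorted(ranked)

# Four made-up 2-D patch features; real features would come from an image encoder.
patches = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [4.0, 4.0]]
commonality = mean_vector(patches)
scores = patch_differences(patches, commonality)
kept = keep_top_k(scores, k=2)
```

In this toy input the two patches farthest from the mean are retained; the released source code linked in the abstract is the place to look for how PSC and PSD are actually defined.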