OpenFashionCLIP: Vision-and-Language Contrastive Learning with Open-Source Fashion Data

Image Analysis and Processing, ICIAP 2023, Part I (2023)

Abstract
The inexorable growth of online shopping and e-commerce demands scalable and robust machine learning-based solutions to accommodate customer requirements. In the context of automatic tagging classification and multimodal retrieval, prior works either defined supervised learning approaches with limited generalization capabilities or more reusable CLIP-based techniques that are, however, trained on closed-source data. In this work, we propose OpenFashionCLIP, a vision-and-language contrastive learning method that adopts only open-source fashion data stemming from diverse domains and characterized by varying degrees of specificity. Our approach is extensively validated across several tasks and benchmarks, and experimental results highlight a significant out-of-domain generalization capability and consistent improvements over state-of-the-art methods, both in terms of accuracy and recall. Source code and trained models are publicly available at: https://github.com/aimagelab/open-fashion-clip.
Keywords
Fashion Domain, Vision-and-Language Pre-Training, Open-Source Datasets
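
For context on the kind of vision-and-language contrastive learning the abstract refers to, below is a minimal sketch of the standard CLIP-style symmetric contrastive objective that such pre-training builds on. This is not the authors' implementation; the function name, temperature value, and embedding dimensions are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def clip_contrastive_loss(image_emb, text_emb, temperature=0.07):
    """Symmetric InfoNCE loss over a batch of paired image/text embeddings.

    image_emb, text_emb: tensors of shape (batch, dim), where row i of each
    tensor comes from the same image-caption pair.
    """
    # L2-normalize so dot products become cosine similarities.
    image_emb = F.normalize(image_emb, dim=-1)
    text_emb = F.normalize(text_emb, dim=-1)
    # Pairwise similarity logits between every image and every caption,
    # scaled by the temperature (0.07 is a common CLIP default).
    logits = image_emb @ text_emb.t() / temperature
    # The i-th image matches the i-th caption: targets are the diagonal.
    targets = torch.arange(logits.size(0), device=logits.device)
    loss_i2t = F.cross_entropy(logits, targets)      # image-to-text direction
    loss_t2i = F.cross_entropy(logits.t(), targets)  # text-to-image direction
    return (loss_i2t + loss_t2i) / 2

# Usage: a batch of 8 paired 512-d embeddings from any CLIP-like encoder pair.
loss = clip_contrastive_loss(torch.randn(8, 512), torch.randn(8, 512))
```

The symmetry of the two cross-entropy terms is what makes the learned embedding space usable for both directions of multimodal retrieval (image-to-text and text-to-image), the setting the paper evaluates.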