Partial visual-semantic embedding: Fine-grained outfit image representation with massive volumes of tags via angular-based contrastive learning.

Knowl. Based Syst. (2023)

Abstract
A novel technology named the fashion intelligence system, which quantifies ambiguous expressions unique to fashion, such as "casual," "adult-casual," and "office-casual," was previously proposed to help users understand fashion. However, the existing visual-semantic embedding (VSE) model, which forms the basis of the system, does not support images that are composed of multiple parts, such as those containing hair, tops, trousers, skirts, and shoes. Therefore, we propose a partial VSE (PVSE) model, which enables fine-grained learning of each part of the fashion outfit. The proposed model learns embedded representations via angular-based contrastive learning. This retains the three existing practical functionalities and additionally enables image-retrieval tasks in which changes are made only to specified parts, as well as image-reordering tasks that focus on specified parts. In other words, the proposed model enables five types of practical functionality, even with a simple structure. Through qualitative and quantitative experiments, we demonstrate that the proposed model is superior to conventional models without increasing computational complexity. (c) 2023 The Authors. Published by Elsevier B.V. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).
Keywords
outfit image representation, tags, contrastive learning, visual-semantic, fine-grained, angular-based
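
The abstract names angular-based contrastive learning but does not give the loss. As a rough illustration of the general idea only, the sketch below computes a generic triplet-style loss on the angle between embedding vectors. It is not the paper's formulation: the function names, the margin value, and the toy vectors are all assumptions made for this example.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def angle(u, v):
    """Angle in radians between u and v (clamped for numerical safety)."""
    cos = max(-1.0, min(1.0, cosine_similarity(u, v)))
    return math.acos(cos)


def angular_triplet_loss(anchor, positive, negative, margin=0.2):
    """Hypothetical angular contrastive loss (illustrative, not the paper's).

    The loss is zero once the anchor is closer in angle to the positive
    than to the negative by at least `margin` radians.
    """
    return max(0.0, margin + angle(anchor, positive) - angle(anchor, negative))


# Toy 2-D embeddings (made up): one outfit-part embedding and two tag embeddings.
image_part = [1.0, 0.0]
matching_tag = [2.0, 0.0]    # same direction as the image part
unrelated_tag = [0.0, 3.0]   # orthogonal to the image part

loss_good = angular_triplet_loss(image_part, matching_tag, unrelated_tag)
loss_bad = angular_triplet_loss(image_part, unrelated_tag, matching_tag)
```

Because only angles enter the loss, vector length is ignored: `matching_tag` has twice the norm of `image_part` yet contributes zero angular distance. With these toy vectors the well-ordered triplet yields a loss of zero, while the swapped triplet is penalized by the margin plus a right angle.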