An Aligning and Training Framework for Multimodal Recommendations
arXiv (2024)
Abstract
With the development of multimedia applications, multimodal recommendations
are playing an essential role, as they can leverage rich contexts beyond user
interactions. Existing methods mainly treat multimodal information as auxiliary, using it to help learn ID features. However, a semantic gap exists between multimodal content features and ID features, so directly using multimodal information as auxiliary leads to misaligned representations of users and items. In this paper, we first systematically
investigate the misalignment issue in multimodal recommendations, and propose a
solution named AlignRec. In AlignRec, the recommendation objective is
decomposed into three alignments, namely alignment within contents, alignment
between content and categorical ID, and alignment between users and items. Each
alignment is characterized by a specific objective function and is integrated
into our multimodal recommendation framework. To effectively train our
AlignRec, we propose starting from pre-training the first alignment to obtain
unified multimodal features and subsequently training the following two
alignments together with these features as input. As it is essential to analyze
whether each multimodal feature helps in training, we design three new classes
of metrics to evaluate intermediate performance. Our extensive experiments on
three real-world datasets consistently verify the superiority of AlignRec
compared to nine baselines. We also find that the multimodal features generated by AlignRec outperform those currently in use, and we will open-source them.
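The abstract states that each alignment (within contents, content-to-ID, and user-to-item) is characterized by its own objective function, but does not spell the objectives out. A common choice for aligning two feature spaces, such as content features and ID embeddings, is an InfoNCE-style contrastive loss that pulls matched pairs together and pushes mismatched pairs apart. The sketch below is an illustrative assumption, not AlignRec's actual objective; the function name `info_nce_alignment` and all parameters are hypothetical.

```python
import numpy as np

def info_nce_alignment(content: np.ndarray, ids: np.ndarray, tau: float = 0.1) -> float:
    """Contrastive alignment between two batches of paired features.

    content, ids: arrays of shape (B, d); row i of each is a matched pair.
    tau: temperature scaling the cosine similarities.
    """
    # L2-normalize both views so the dot product is cosine similarity
    c = content / np.linalg.norm(content, axis=1, keepdims=True)
    e = ids / np.linalg.norm(ids, axis=1, keepdims=True)
    logits = c @ e.T / tau                       # (B, B): pair i matches column i
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability before exp
    exp = np.exp(logits)
    probs = exp / exp.sum(axis=1, keepdims=True)
    # Matched pairs lie on the diagonal; minimize their negative log-probability
    return float(-np.log(np.diag(probs)).mean())
```

Under this formulation, well-aligned content/ID pairs yield a lower loss than mismatched ones, which is the behavior the paper's content-to-ID alignment is meant to enforce.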