Towards Unified Multi-Modal Personalization: Large Vision-Language Models for Generative Recommendation and Beyond
arXiv (2024)
Abstract
Developing a universal model that can effectively harness heterogeneous
resources and respond to a wide range of personalized needs has been a
longstanding community aspiration. Our daily choices, especially in domains
like fashion and retail, are substantially shaped by multi-modal data, such as
pictures and textual descriptions. These modalities not only offer intuitive
guidance but also cater to personalized user preferences. However, the
predominant personalization approaches focus mainly on ID- or text-based
recommendation, failing to capture information that spans different tasks or
modalities. In this paper, our goal is to establish a Unified paradigm
for Multi-modal Personalization systems (UniMP), which effectively leverages
multi-modal data while eliminating the complexities associated with task- and
modality-specific customization. We argue that the advancements in foundational
generative modeling have provided the flexibility and effectiveness necessary
to achieve this objective. In light of this, we develop a generic and extensible
generative personalization framework that can handle a wide range of
personalized needs, including item recommendation, product search, preference
prediction, explanation generation, and even user-guided image generation.
Our methodology enhances the capabilities of foundational language models for
personalized tasks by seamlessly ingesting interleaved cross-modal user history
information, ensuring a more precise and customized experience for users. To
train and evaluate the proposed multi-modal personalized tasks, we also
introduce a novel and comprehensive benchmark covering a variety of user
requirements. Our experiments on this real-world benchmark showcase the model's
potential, outperforming competitive methods specialized for each task.