PromptMM: Multi-Modal Knowledge Distillation for Recommendation with Prompt-Tuning
CoRR (2024)
Abstract
Multimedia online platforms (e.g., Amazon, TikTok) have greatly benefited
from the incorporation of multimedia (e.g., visual, textual, and acoustic)
content into their personalized recommender systems. These modalities provide
intuitive semantics that facilitate modality-aware user preference modeling.
However, two key challenges in multi-modal recommenders remain unresolved: i)
The introduction of multi-modal encoders with a large number of additional
parameters causes overfitting, given high-dimensional multi-modal features
provided by extractors (e.g., ViT, BERT). ii) Side information inevitably
introduces inaccuracies and redundancies, which skew the modality-interaction
dependencies away from true user preferences. To tackle these problems, we
propose to simplify and empower recommenders through Multi-modal Knowledge
Distillation (PromptMM) with prompt-tuning, which enables adaptive quality
distillation. Specifically, PromptMM performs model compression by distilling
user-item (u-i) edge relationships and multi-modal node content from
cumbersome teachers, relieving students of the additional feature-reduction
parameters. To bridge the semantic gap between multi-modal context and
collaborative signals, and thereby empower the overfitting teacher, soft
prompt-tuning is introduced to adapt the teacher to the student's task.
Additionally, to adjust for the impact of inaccuracies in multimedia data, a
disentangled multi-modal list-wise distillation is developed with a
modality-aware re-weighting mechanism.
Experiments on real-world data demonstrate PromptMM's superiority over existing
techniques. Ablation tests confirm the effectiveness of key components.
Additional experiments demonstrate the model's efficiency and effectiveness.
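
The list-wise distillation with modality-aware re-weighting mentioned above
can be illustrated with a minimal sketch. This is not the authors'
formulation: it assumes each modality-specific teacher produces ranking
scores over the same candidate item list, uses a temperature-scaled softmax
with KL divergence as the list-wise loss, and takes the modality weights as
given inputs. The function names (softmax, listwise_kd_loss,
reweighted_multimodal_kd) and all parameters are hypothetical.

```python
import math


def softmax(scores, temperature=1.0):
    """Turn a list of ranking scores into a probability distribution."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp((s - m) / temperature) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def listwise_kd_loss(teacher_scores, student_scores, temperature=1.0):
    """KL(teacher || student) over one candidate list (assumed loss form)."""
    p = softmax(teacher_scores, temperature)
    q = softmax(student_scores, temperature)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def reweighted_multimodal_kd(teacher_by_modality, student_scores,
                             modality_weights, temperature=1.0):
    """Weighted sum of per-modality list-wise losses.

    modality_weights is a hypothetical input here; weights are normalized
    so that a less reliable modality contributes less to the total loss.
    """
    total_w = sum(modality_weights.values())
    loss = 0.0
    for name, t_scores in teacher_by_modality.items():
        w = modality_weights[name] / total_w
        loss += w * listwise_kd_loss(t_scores, student_scores, temperature)
    return loss


# Illustrative scores for three candidate items (made-up values).
teachers = {"visual": [3.0, 1.0, 0.0], "textual": [0.0, 1.0, 3.0]}
student = [3.0, 1.0, 0.0]
loss = reweighted_multimodal_kd(
    teachers, student, {"visual": 0.9, "textual": 0.1}
)
```

In this sketch, lowering the weight of a modality whose teacher ranking
disagrees with the student reduces that modality's contribution to the
loss, which is the intuition behind re-weighting noisy modalities.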