Discrete Semantic Tokenization for Deep CTR Prediction
arxiv(2024)
摘要
Incorporating item content information into click-through rate (CTR)
prediction models remains a challenge, especially with the time and space
constraints of industrial scenarios. The content-encoding paradigm, which
integrates user and item encoders directly into CTR models, prioritizes space
over time. In contrast, the embedding-based paradigm transforms item and user
semantics into latent embeddings and then caches them, prioritizes space over
time. In this paper, we introduce a new semantic-token paradigm and propose a
discrete semantic tokenization approach, namely UIST, for user and item
representation. UIST facilitates swift training and inference while maintaining
a conservative memory footprint. Specifically, UIST quantizes dense embedding
vectors into discrete tokens with shorter lengths and employs a hierarchical
mixture inference module to weigh the contribution of each user–item token
pair. Our experimental results on news recommendation showcase the
effectiveness and efficiency (about 200-fold space compression) of UIST for CTR
prediction.
更多查看译文
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要