Multi-Objective Recommendation via Multivariate Policy Learning
arXiv (2024)
Abstract
Real-world recommender systems often need to balance multiple objectives when
deciding which recommendations to present to users. These include behavioural
signals (e.g. clicks, shares, dwell time), as well as broader objectives (e.g.
diversity, fairness). Scalarisation methods are commonly used to handle this
balancing task, where a weighted average of per-objective reward signals
determines the final score used for ranking. Naturally, exactly how these weights are
computed is key to the success of any online platform. We frame this as a
decision-making task, where the scalarisation weights are actions taken to
maximise an overall North Star reward (e.g. long-term user retention or
growth). We extend existing policy learning methods to the continuous
multivariate action domain, proposing to maximise a pessimistic lower bound on
the North Star reward that the learnt policy will yield. Typical lower bounds
based on normal approximations suffer from insufficient coverage, and we
propose an efficient and effective policy-dependent correction for this. We
provide guidance to design stochastic data collection policies, as well as
highly sensitive reward signals. Empirical observations from simulations,
offline and online experiments highlight the efficacy of our deployed approach.
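The pipeline in the abstract can be illustrated with a minimal sketch. It assumes the following: scalarisation is a plain weighted sum of per-objective scores; the stochastic data collection policy is a diagonal Gaussian over the weight vector; the North Star reward of a candidate policy is estimated offline with standard inverse propensity scoring (IPS); and the pessimistic bound is the textbook one-sided normal approximation. That last bound is exactly the one the abstract says lacks coverage. The paper's policy-dependent correction is not shown here. All function names, the reward function `north_star`, and the numeric settings are hypothetical and for illustration only.

```python
import math
import random
from statistics import NormalDist

def scalarise(objective_scores, weights):
    """Weighted sum of per-objective scores (e.g. click, share, dwell) for one item."""
    return sum(w * s for w, s in zip(weights, objective_scores))

def rank(candidates, weights):
    """Candidate indices sorted by scalarised score, highest first."""
    scores = [scalarise(c, weights) for c in candidates]
    return sorted(range(len(candidates)), key=lambda i: scores[i], reverse=True)

def gaussian_log_density(action, mean, std):
    """Log-density of a diagonal Gaussian policy over the scalarisation weights."""
    return sum(
        -0.5 * ((a - m) / s) ** 2 - math.log(s * math.sqrt(2 * math.pi))
        for a, m, s in zip(action, mean, std)
    )

def ips_lower_bound(logs, target_mean, target_std, log_mean, log_std, alpha=0.05):
    """IPS estimate of a target policy's reward plus an uncorrected one-sided
    normal-approximation lower bound at level 1 - alpha."""
    values = []
    for action, reward in logs:
        log_ratio = (gaussian_log_density(action, target_mean, target_std)
                     - gaussian_log_density(action, log_mean, log_std))
        values.append(math.exp(log_ratio) * reward)
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / (n - 1)
    z = NormalDist().inv_cdf(1 - alpha)
    return mean, mean - z * math.sqrt(var / n)

# Usage on simulated logs (hypothetical reward that peaks at weights (1.2, 0.4)).
random.seed(0)
log_mean, log_std = [1.0, 0.5], [0.3, 0.3]

def north_star(weights):
    return 1.0 - (weights[0] - 1.2) ** 2 - (weights[1] - 0.4) ** 2 + random.gauss(0, 0.1)

logs = []
for _ in range(5000):
    a = [random.gauss(m, s) for m, s in zip(log_mean, log_std)]
    logs.append((a, north_star(a)))

# The target std is kept below the logging std so the importance weights stay bounded.
est, lb = ips_lower_bound(logs, [1.2, 0.4], [0.2, 0.2], log_mean, log_std)
print(f"IPS estimate {est:.3f}, 95% lower bound {lb:.3f}")
```

In this sketch, each logged action is a full weight vector. A learner would pick the target `(mean, std)` that maximises `lb` rather than `est`, which is the pessimistic principle the abstract describes. The choice of a Gaussian logging policy is only one way to keep every weight vector reachable with non-zero density.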