Future Impact Decomposition in Request-level Recommendations
CoRR (2024)
Abstract
In recommender systems, reinforcement learning solutions have shown promising
results in optimizing the sequence of interactions between users and the system
for long-term performance. For practical reasons, the policy's actions are
typically designed as recommending a list of items to handle users' frequent
and continuous browsing requests more efficiently. In this list-wise
recommendation scenario, the user state is updated upon every request in the
corresponding MDP formulation. However, this request-level formulation is
essentially inconsistent with the user's item-level behavior. In this study, we
demonstrate that an item-level optimization approach can better utilize item
characteristics and optimize the policy's performance even under the
request-level MDP. We support this claim by comparing the performance of
standard request-level methods with the proposed item-level actor-critic
framework in both simulation and online experiments. Furthermore, we show that
a reward-based future decomposition strategy can better express the item-wise
future impact and improve the recommendation accuracy in the long term. To
achieve a more thorough understanding of the decomposition strategy, we propose
a model-based re-weighting framework with adversarial learning that further
boosts performance, and we investigate its correlation with the reward-based
strategy.