Generative Inverse Deep Reinforcement Learning for Online Recommendation

Conference on Information and Knowledge Management (2021)

Citations: 12 | Views: 91
Abstract
Deep reinforcement learning enables an agent to capture users' interests through dynamic interactions with the environment. It relies on a reward function to learn users' interests and to control the learning process, and it has attracted great interest in recommendation research. However, most reward functions are manually designed; they are either too unrealistic or too imprecise to reflect the variety, dimensionality, and non-linearity of the recommendation problem. This prevents the agent from learning an optimal policy in highly dynamic online recommendation scenarios. To address this issue, we propose a generative inverse reinforcement learning approach that avoids the need to define an elaborate reward function. In particular, we model the recommendation problem as an automatic policy learning problem. We first generate policies based on observed users' preferences and then evaluate the learned policy with a measure based on a discriminative actor-critic network. We conduct experiments on an online platform, VirtualTB, and demonstrate the feasibility and effectiveness of our proposed approach via comparisons with several state-of-the-art methods.
Keywords
online recommendation, reinforcement learning, deep