Top-K Off-Policy Correction for a REINFORCE Recommender System
Proceedings of the Twelfth ACM International Conference on Web Search and Data Mining, 2019. arXiv:1812.02353.
counterfactual learning, exploration, off-policy correction, REINFORCE, set recommendation
Industrial recommender systems deal with extremely large action spaces -- many millions of items to recommend. Moreover, they need to serve billions of users, who are unique at any point in time, making a complex user state space. Luckily, huge quantities of logged implicit feedback (e.g., user clicks, dwell time) are available for learni...