Top-K Off-Policy Correction for a REINFORCE Recommender System

    Paul Covington
    Sagar Jain

    Proceedings of the Twelfth ACM International Conference on Web Search and Data Mining, 2019. arXiv:1812.02353.

    Keywords:
    counterfactual learning, exploration, off-policy correction, REINFORCE, set recommendation

    Abstract:

    Industrial recommender systems deal with extremely large action spaces -- many millions of items to recommend. Moreover, they need to serve billions of users, who are unique at any point in time, making a complex user state space. Luckily, huge quantities of logged implicit feedback (e.g., user clicks, dwell time) are available for learni...
