DRN: A Deep Reinforcement Learning Framework for News Recommendation

    WWW '18: The Web Conference 2018, Lyon, France, April 2018, pp. 167–176.

    Keywords:
    Upper Confidence Bound, Deep Q-Learning, user feedback, Dueling Bandit Gradient Descent, SIGIR
    TL;DR:
    We propose a Deep Q-Learning-based reinforcement learning framework for online personalized news recommendation.

    Abstract:

    In this paper, we propose a novel Deep Reinforcement Learning framework for news recommendation. Online personalized news recommendation is a highly challenging problem due to the dynamic nature of news features and user preferences. Although some online recommendation models have been proposed to address the dynamic nature of news recommendation, …

    Introduction
    • The explosive growth of online content and services has provided users with an overwhelming number of choices.
    Highlights
    • The explosive growth of online content and services has provided users with an overwhelming number of choices.
    • We propose a reinforcement learning framework for online personalized news recommendation.
    • Although we focus on news recommendation, our framework can be generalized to many other recommendation problems.
    • Traditional recommendation methods tend to recommend similar items, which narrows users' reading choices, bores them, and decreases satisfaction in the long run. To address these three challenges (dynamic news features and user preferences, the neglect of future reward, and declining recommendation diversity), we propose a Deep Q-Learning-based Deep Reinforcement Learning framework for online personalized news recommendation.
    • We propose to apply the Dueling Bandit Gradient Descent exploration strategy [16, 49] in our algorithm, which can improve recommendation diversity while avoiding the harm to recommendation accuracy induced by classical exploration strategies such as ε-greedy [31] and Upper Confidence Bound [23] (a minimal sketch of this exploration step follows this list).
    • We propose a Deep Q-Learning-based reinforcement learning framework for online personalized news recommendation.
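    The Dueling Bandit Gradient Descent step can be pictured as perturbing the exploitation network's weights and keeping the perturbation only when the explore list earns better feedback. Below is a minimal Python sketch under that reading; the perturbation scale `alpha`, step size `eta`, and the click-count inputs are illustrative assumptions, not the paper's exact formulation.

    ```python
    import numpy as np

    rng = np.random.default_rng(0)

    def perturb(weights, alpha=0.1):
        # Explore network: W~ = W + alpha * U(-1, 1) * W (elementwise noise).
        noise = rng.uniform(-1.0, 1.0, size=weights.shape)
        return weights + alpha * noise * weights

    def dbgd_update(weights, explore_weights, clicks_exploit, clicks_explore, eta=0.05):
        # Keep the perturbation only if the explore list earned more clicks.
        if clicks_explore > clicks_exploit:
            return weights + eta * explore_weights
        return weights

    # One interaction round (click counts here are made-up stand-ins for
    # feedback attributed to each network after interleaving their lists).
    W = rng.normal(size=8)   # stand-in for the Q-network parameters
    W_tilde = perturb(W)
    W = dbgd_update(W, W_tilde, clicks_exploit=3, clicks_explore=5)
    print(W)
    ```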
    Methods
    • Variations of the model.
    • The authors' basic model, named "DN", uses a dueling-structure [47] Double Deep Q-network [41] without considering future reward.
    • Taking future reward into consideration turns this into "DDQN" (the sketch after this list illustrates both ingredients).
    • The authors then add further components on top of "DDQN".
    • Baseline algorithms.
    • The authors compared their algorithms with the following five baseline methods.
    • All five baselines perform online updates during the testing stage.
    • Some state-of-the-art methods, such as [43], [45], and [48], cannot be applied because they do not fit this problem setting.
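    For context on the "DN" vs. "DDQN" variants, the sketch below shows the two ingredients they name: the dueling decomposition Q(s, a) = V(s) + (A(s, a) − mean_a A(s, a)) [47], and the Double DQN target [41] that adds discounted future reward. The toy numpy arrays and the gamma value are illustrative, not the paper's architecture or tuned parameters.

    ```python
    import numpy as np

    def dueling_q(value, advantages):
        # Q(s, a) = V(s) + (A(s, a) - mean_a A(s, a)), per the dueling
        # architecture [47]; V scores the state, A ranks the actions.
        return value + (advantages - advantages.mean())

    def double_dqn_target(reward, next_q_online, next_q_target, gamma=0.4):
        # y = r + gamma * Q_target(s', argmax_a Q_online(s', a)) [41]:
        # the online net picks the action, the target net scores it,
        # which is what adds (discounted) future reward to plain "DN".
        best_action = int(np.argmax(next_q_online))
        return reward + gamma * next_q_target[best_action]

    # Toy example with 3 candidate news items as actions.
    q_now = dueling_q(value=0.2, advantages=np.array([0.1, -0.05, 0.3]))
    y = double_dqn_target(reward=1.0,
                          next_q_online=np.array([0.4, 0.6, 0.1]),
                          next_q_target=np.array([0.5, 0.3, 0.2]))
    print(q_now, y)   # target y = 1.0 + 0.4 * 0.3 = 1.12
    ```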
    Results
    • Experiments show that the method significantly improves both recommendation accuracy and recommendation diversity.
    Conclusion
    • The authors propose a DQN-based reinforcement learning framework for online personalized news recommendation.
    • Unlike previous methods, this method can effectively model dynamic news features and user preferences, and explicitly plan for the future in order to achieve a higher long-run reward (e.g., CTR).
    • The authors apply an effective exploration strategy in the framework to improve recommendation diversity and to search for potentially more rewarding recommendations.
    • Experiments show that the method significantly improves both recommendation accuracy and recommendation diversity.
    • The authors' method can be generalized to many other recommendation problems.
    Summary
    • The explosive growth of online content and services has provided users with an overwhelming number of choices.
    • Although some online recommendation methods [11, 24] can capture the dynamic change of news features and user preferences through online model updates, they only try to optimize the current reward (e.g., click-through rate) and ignore the effect the current recommendation may have on the future.
    • To better model the dynamic nature of news characteristics and user preferences, we propose to use the Deep Q-Learning (DQN) [31] framework.
    • Unlike work that models the complex interaction between user and item, our algorithm focuses on handling the dynamic nature of online news recommendation and on modeling future reward.
    • To address these three challenges, we propose a DQN-based Deep Reinforcement Learning framework for online personalized news recommendation.
    • (3) MINOR UPDATE: After each timestamp, given the feature representations of the previous user u and news list L together with the feedback B, agent G updates the model by comparing the recommendation performance of the exploitation network Q and the exploration network Q̃.
    • Considering the previously mentioned dynamic features of news recommendation and the need to estimate future reward, we apply a Deep Q-Network (DQN) [31] to model the probability that a given user will click on a specific piece of news.
    • Under the reinforcement learning setting, the probability that a user clicks on a piece of news is essentially the reward our agent can get.
    • Due to the long-tail distribution of news request and click counts, we apply the same set of parameters to different news, which performs better on our dataset than the original setting in [23]. (HLinUCB, an improved version of the original LinUCB, will also be compared.)
    • It is possible that our agent G wants to recommend news i to user u for user-activeness or exploration reasons, but whether user u would click on news i is simply not recorded in the offline log. In addition, naive random exploration such as ε-greedy will harm recommendation accuracy.
    • In the online evaluation stage, we deployed our models and compared algorithms on a commercial news recommendation application.
    • We propose a DQN-based reinforcement learning framework for online personalized news recommendation.
    • Unlike previous methods, our method can effectively model dynamic news features and user preferences, and explicitly plan for the future in order to achieve a higher long-run reward (e.g., CTR); see the reward sketch after this list.
    • Our method can be generalized to many other recommendation problems.
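    As a concrete reading of "planning for the future", the sketch below combines an immediate click reward with a user-activeness term and folds a sequence of such rewards into a discounted return. The mixing weight `beta` and discount `gamma` are illustrative assumptions, not the paper's tuned values.

    ```python
    def total_reward(clicked, activeness, beta=0.05):
        # Immediate reward: click signal plus a weighted activeness term.
        r_click = 1.0 if clicked else 0.0
        return r_click + beta * activeness   # activeness assumed in [0, 1]

    def discounted_return(rewards, gamma=0.4):
        # G_t = r_t + gamma * r_{t+1} + gamma^2 * r_{t+2} + ...
        g = 0.0
        for r in reversed(rewards):
            g = r + gamma * g
        return g

    print(discounted_return([total_reward(True, 0.5),
                             total_reward(False, 0.5),
                             total_reward(True, 0.9)]))
    ```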
    Tables
    • Table1: Notations
    • Table2: Statistics of the sampled dataset
    • Table3: Parameter setting
    • Table4: Offline recommendation accuracy
    • Table5: Online recommendation accuracy
    • Table6: Diversity of user-clicked news in the online experiment. Smaller ILS indicates better diversity; similarity between news items is measured by the cosine similarity between their bag-of-words vectors.
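    The ILS metric in Table 6 can be computed as the pairwise cosine similarity between the bag-of-words vectors of a user's clicked news. Below is a minimal sketch with a toy 4-word vocabulary; averaging over pairs is used here as one common normalization.

    ```python
    from itertools import combinations
    import numpy as np

    def cosine(a, b):
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

    def ils(bow_vectors):
        # Average pairwise cosine similarity of the clicked-news vectors;
        # smaller values mean the user clicked a more diverse set of news.
        pairs = list(combinations(bow_vectors, 2))
        return sum(cosine(a, b) for a, b in pairs) / len(pairs)

    # Three clicked news items over a toy 4-word vocabulary.
    news = [np.array([1.0, 2.0, 0.0, 0.0]),
            np.array([0.0, 1.0, 1.0, 0.0]),
            np.array([0.0, 0.0, 2.0, 1.0])]
    print(ils(news))
    ```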
    Related work
    • 2.1 News recommendation algorithms

      Recommender systems [3, 4] have been investigated extensively because of their direct connection to product profits. Recently, owing to the explosive growth of online content, increasing attention has been drawn to a special application of recommendation: online personalized news recommendation. Conventional news recommendation methods fall into three categories. Content-based methods [19, 22, 33] maintain news term-frequency features (e.g., TF-IDF) and user profiles (based on historically read news); the recommender then selects the news most similar to the user profile (a toy sketch of this pipeline appears after this paragraph). In contrast, collaborative filtering methods [11] usually make rating predictions using the past ratings of the current user, of similar users [28, 34], or a combination of the two [11]. To combine the advantages of these two groups of methods, hybrid methods [12, 24, 25] have been further proposed to improve user profile modeling. Recently, as an extension and integration of the previous methods, deep learning models [8, 45, 52] have shown far superior performance to the previous three categories of models, owing to their capability of modeling complex user-item relationships. Unlike this line of work on modeling the complex interaction between user and item, our algorithm focuses on handling the dynamic nature of online news recommendation and on modeling future reward. These feature-construction and user-item modeling techniques, however, can easily be integrated into our method.
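    As a toy illustration of the content-based pipeline described above, the sketch below builds TF-IDF vectors with scikit-learn, averages a user's clicked articles into a profile, and ranks candidates by cosine similarity. The corpus and click history are made up for illustration.

    ```python
    import numpy as np
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.metrics.pairwise import cosine_similarity

    corpus = ["stocks rally as markets open",
              "team wins championship final",
              "markets slide on rate fears"]
    clicked = [0]   # hypothetical user clicked only the first article

    vec = TfidfVectorizer()
    X = vec.fit_transform(corpus)                  # TF-IDF features per article
    profile = np.asarray(X[clicked].mean(axis=0))  # user profile = mean of clicks
    scores = cosine_similarity(profile, X).ravel() # similarity to every article
    print(scores.argsort()[::-1])                  # most profile-like news first
    ```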
    Funding
    • The work was supported in part by NSF awards #1639150, #1544455, #1652525, and #1618448
    • The views and conclusions contained in this paper are those of the authors and should not be interpreted as representing the views of any funding agency
    Study subjects and analysis
    S0 is set to 0.5 to represent the random initial state of a user (i.e., he or she may be either active or inactive). Figure 6 shows a histogram of the time interval between every two consecutive requests of a user. We observe that, besides reading news multiple times a day, people usually return to the application on a regular daily basis.
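    One simple reading of this activeness signal, consistent with S0 = 0.5 and regular daily returns, is an exponential decay between requests plus a fixed bump on each return, clipped to [0, 1]. The decay rate and bump size below are illustrative assumptions, not the paper's fitted survival model.

    ```python
    import math

    def activeness_trace(request_gaps_hours, s0=0.5, s_a=0.32, rate=0.01):
        # Decay while the user is away, bump on each return, clip to [0, 1].
        s = s0
        trace = [s]
        for gap in request_gaps_hours:
            s *= math.exp(-rate * gap)
            s = min(1.0, s + s_a)
            trace.append(s)
        return trace

    # Daily returns (24 h apart) keep activeness high; a 10-day absence
    # lets it decay back toward inactive before the bump.
    print(activeness_trace([24, 24, 240]))
    ```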

    Reference
    • 2007. Lecture Notes on Generalized Linear Models. http://data.princeton.edu/wws509/notes/.
    • Gediminas Adomavicius and YoungOk Kwon. 2012. Improving aggregate recommendation diversity using ranking-based techniques. IEEE Transactions on Knowledge and Data Engineering 24, 5 (2012), 896–911.
    • Gediminas Adomavicius and Alexander Tuzhilin. 2005. Toward the next generation of recommender systems: A survey of the state-of-the-art and possible extensions. IEEE Transactions on Knowledge and Data Engineering 17, 6 (2005), 734–749.
    • Jesús Bobadilla, Fernando Ortega, Antonio Hernando, and Abraham Gutiérrez. 2013. Recommender systems survey. Knowledge-Based Systems 46 (2013), 109–132.
    • Djallel Bouneffouf, Amel Bouzeghoub, and Alda Gançarski. 2012. A contextual-bandit algorithm for mobile context-aware recommender system. In Neural Information Processing. Springer, 324–331.
    • Nicolò Cesa-Bianchi, Claudio Gentile, and Giovanni Zappella. 2013. A gang of bandits. In Advances in Neural Information Processing Systems. 737–745.
    • Olivier Chapelle and Lihong Li. 2011. An empirical evaluation of Thompson sampling. In Advances in Neural Information Processing Systems. 2249–2257.
    • Heng-Tze Cheng, Levent Koc, Jeremiah Harmsen, Tal Shaked, Tushar Chandra, Hrishi Aradhye, Glen Anderson, Greg Corrado, Wei Chai, Mustafa Ispir, et al. 2016. Wide & deep learning for recommender systems. In Proceedings of the 1st Workshop on Deep Learning for Recommender Systems. ACM, 7–10.
    • François Chollet et al. 2015. Keras. https://github.com/fchollet/keras.
    • Christopher D. Manning, Prabhakar Raghavan, and Hinrich Schütze. 2008. Introduction to Information Retrieval. Cambridge University Press.
    • Abhinandan S. Das, Mayur Datar, Ashutosh Garg, and Shyam Rajaram. 2007. Google news personalization: scalable online collaborative filtering. In Proceedings of the 16th International Conference on World Wide Web. ACM, 271–280.
    • Gianmarco De Francisci Morales, Aristides Gionis, and Claudio Lucchese. 2012. From chatter to headlines: harnessing the real-time web for personalized news recommendation. In Proceedings of the Fifth ACM International Conference on Web Search and Data Mining. ACM, 153–162.
    • Nan Du, Yichen Wang, Niao He, Jimeng Sun, and Le Song. 2015. Time-sensitive recommendation from recurrent user activities. In Advances in Neural Information Processing Systems. 3492–3500.
    • Claudio Gentile, Shuai Li, and Giovanni Zappella. 2014. Online Clustering of Bandits. In ICML. 757–765.
    • Google. 2017. Google News. https://news.google.com/.
    • Artem Grotov and Maarten de Rijke. 2016. Online learning to rank for information retrieval: SIGIR 2016 tutorial. In Proceedings of the 39th International ACM SIGIR Conference on Research and Development in Information Retrieval. ACM, 1215–1218.
    • Katja Hofmann, Anne Schuth, Shimon Whiteson, and Maarten de Rijke. 2013. Reusing historical interaction data for faster online learning to rank for IR. In Proceedings of the Sixth ACM International Conference on Web Search and Data Mining. ACM, 183–192.
    • Joseph G. Ibrahim, Ming-Hui Chen, and Debajyoti Sinha. 2005. Bayesian Survival Analysis. Wiley Online Library.
    • Wouter IJntema, Frank Goossen, Flavius Frasincar, and Frederik Hogenboom. 2010. Ontology-based news recommendation. In Proceedings of the 2010 EDBT/ICDT Workshops. ACM, 16.
    • How Jing and Alexander J. Smola. 2017. Neural survival recommender. In Proceedings of the Tenth ACM International Conference on Web Search and Data Mining. ACM, 515–524.
    • Jaya Kawale, Hung H. Bui, Branislav Kveton, Long Tran-Thanh, and Sanjay Chawla. 2015. Efficient Thompson Sampling for Online Matrix-Factorization Recommendation. In Advances in Neural Information Processing Systems. 1297–1305.
    • Michal Kompan and Mária Bieliková. 2010. Content-Based News Recommendation. In EC-Web, Vol. 61.
    • Lihong Li, Wei Chu, John Langford, and Robert E. Schapire. 2010. A contextual-bandit approach to personalized news article recommendation. In Proceedings of the 19th International Conference on World Wide Web. ACM, 661–670.
    • Lei Li, Dingding Wang, Tao Li, Daniel Knox, and Balaji Padmanabhan. 2011. SCENE: a scalable two-stage personalized news recommendation system. In Proceedings of the 34th International ACM SIGIR Conference on Research and Development in Information Retrieval. ACM, 125–134.
    • Jiahui Liu, Peter Dolan, and Elin Rønby Pedersen. 2010. Personalized news recommendation based on click behavior. In Proceedings of the 15th International Conference on Intelligent User Interfaces. ACM, 31–40.
    • Zhongqi Lu and Qiang Yang. 2016. Partially Observable Markov Decision Process for Recommender Systems. arXiv preprint arXiv:1608.07793 (2016).
    • Tariq Mahmood and Francesco Ricci. 2007. Learning and adaptivity in interactive recommender systems. In Proceedings of the Ninth International Conference on Electronic Commerce. ACM.
    • Benjamin Marlin and Richard S. Zemel. 2004. The multiple multiplicative factor model for collaborative filtering. In Proceedings of the Twenty-First International Conference on Machine Learning. ACM, 73.
    • Mikhail Trofimov and Alexander Novikov. 2016. tffm: TensorFlow implementation of an arbitrary order Factorization Machine. https://github.com/geffy/tffm.
    • Rupert G. Miller Jr. 2011. Survival Analysis. Vol. 66. John Wiley & Sons.
    • Volodymyr Mnih, Koray Kavukcuoglu, David Silver, Andrei A. Rusu, Joel Veness, Marc G. Bellemare, Alex Graves, Martin Riedmiller, Andreas K. Fidjeland, Georg Ostrovski, et al. 2015. Human-level control through deep reinforcement learning. Nature 518, 7540 (2015), 529–533.
    • Atsuyoshi Nakamura. 2015. A UCB-like strategy of collaborative filtering. In Asian Conference on Machine Learning. 315–329.
    • Owen Phelan, Kevin McCarthy, Mike Bennett, and Barry Smyth. 2011. Terms of a feather: Content-based news recommendation and discovery using Twitter. Advances in Information Retrieval (2011), 448–459.
    • Steffen Rendle. 2010. Factorization machines. In 2010 IEEE 10th International Conference on Data Mining (ICDM). IEEE, 995–1000.
    • Pornthep Rojanavasu, Phaitoon Srinil, and Ouen Pinngern. 2005. New recommendation system using reinforcement learning. Special Issue of the International Journal of Computer, the Internet and Management 13, SP 3 (2005).
    • Guy Shani, David Heckerman, and Ronen I. Brafman. 2005. An MDP-based recommender system. Journal of Machine Learning Research 6 (2005), 1265–1295.
    • Richard S. Sutton and Andrew G. Barto. 1998. Reinforcement Learning: An Introduction. Vol. 1. MIT Press, Cambridge.
    • Nima Taghipour, Ahmad Kardan, and Saeed Shiry Ghidary. 2007. Usage-based web recommendations: a reinforcement learning approach. In Proceedings of the 2007 ACM Conference on Recommender Systems. ACM, 113–120.
    • Liang Tang, Yexi Jiang, Lei Li, and Tao Li. 2014. Ensemble contextual bandits for personalized recommendation. In Proceedings of the 8th ACM Conference on Recommender Systems. ACM, 73–80.
    • Liang Tang, Yexi Jiang, Lei Li, Chunqiu Zeng, and Tao Li. 2015. Personalized recommendation via parameter-free contextual bandits. In Proceedings of the 38th International ACM SIGIR Conference on Research and Development in Information Retrieval. ACM, 323–332.
    • Hado Van Hasselt, Arthur Guez, and David Silver. 2016. Deep Reinforcement Learning with Double Q-Learning. In AAAI. 2094–2100.
    • Huazheng Wang, Qingyun Wu, and Hongning Wang. 2016. Learning Hidden Features for Contextual Bandits. In Proceedings of the 25th ACM International Conference on Information and Knowledge Management. ACM, 1633–1642.
    • Huazheng Wang, Qingyun Wu, and Hongning Wang. 2017. Factorization Bandits for Interactive Recommendation. In AAAI. 2695–2702.
    • Xinxi Wang, Yi Wang, David Hsu, and Ye Wang. 2014. Exploration in interactive personalized music recommendation: a reinforcement learning approach. ACM Transactions on Multimedia Computing, Communications, and Applications (TOMM) 11, 1 (2014), 7.
    • Xuejian Wang, Lantao Yu, Kan Ren, Guanyu Tao, Weinan Zhang, Yong Yu, and Jun Wang. 2017. Dynamic Attention Deep Model for Article Recommendation by Learning Human Editors' Demonstration. In Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 2051–2059.
    • Yining Wang, Liwei Wang, Yuanzhi Li, Di He, and Tie-Yan Liu. 2013. A theoretical analysis of NDCG type ranking measures. In Conference on Learning Theory. 25–54.
    • Ziyu Wang, Tom Schaul, Matteo Hessel, Hado Van Hasselt, Marc Lanctot, and Nando De Freitas. 2015. Dueling network architectures for deep reinforcement learning. arXiv preprint arXiv:1511.06581 (2015).
    • Qingyun Wu, Hongning Wang, Liangjie Hong, and Yue Shi. 2017. Returning is Believing: Optimizing Long-term User Engagement in Recommender Systems. (2017).
    • Yisong Yue and Thorsten Joachims. 2009. Interactively optimizing information retrieval systems as a dueling bandits problem. In Proceedings of the 26th Annual International Conference on Machine Learning. ACM, 1201–1208.
    • Chunqiu Zeng, Qing Wang, Shekoofeh Mokhtari, and Tao Li. 2016. Online Context-Aware Recommendation with Time Varying Multi-Armed Bandit. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 2025–2034.
    • Xiaoxue Zhao, Weinan Zhang, and Jun Wang. 2013. Interactive collaborative filtering. In Proceedings of the 22nd ACM International Conference on Information & Knowledge Management. ACM, 1411–1420.
    • Lei Zheng, Vahid Noroozi, and Philip S. Yu. 2017. Joint deep modeling of users and items using reviews for recommendation. In Proceedings of the Tenth ACM International Conference on Web Search and Data Mining. ACM, 425–434.
    • Cai-Nicolas Ziegler, Sean M. McNee, Joseph A. Konstan, and Georg Lausen. 2005. Improving recommendation lists through topic diversification. In Proceedings of the 14th International Conference on World Wide Web. ACM, 22–32.