Latent Cross: Making Use of Context in Recurrent Recommender Systems

    Paul Covington
    Sagar Jain
    Jia Li
    Vince Gatto

    WSDM 2018: The Eleventh ACM International Conference on Web Search and Data Mining, Marina Del Rey, CA, USA, February 2018, pp. 46–54.

    Keywords: matrix factorization, contextual datum, neural network, Paragraph Vector, recommender system

    Abstract:

    The success of recommender systems often depends on their ability to understand and make use of the context of the recommendation request. Significant research has focused on how time, location, interfaces, and a plethora of other contextual features affect recommendations. However, in using deep neural networks for recommender systems, r…

    Introduction
    • Recommender systems have long been used for predicting what content a user would enjoy.
    • Increasingly, there is an understanding of the importance of modeling the context of a recommendation: not just the user who is looking for a video to watch, but also the time of day, the location, the user's device, etc.
    • Many of these models have been proposed in the factorization setting, such as with tensor factorization for location [17], unfolding tensors for different types of user actions [46], or hand-crafted features about the effect of time [29]
    Highlights
    • Recommender systems have long been used for predicting what content a user would enjoy
    • We explore the ability to make use of contextual data in a recurrent neural network (RNN)-based recommender system used at YouTube
    • As can be seen there, our model, an RNN in which ∆t has a latent cross with the watch embedding, gives the best result for both Precision@1 and MAP@20
    • Challenges of first-order DNNs: We found feed-forward neural networks to be inefficient at modeling multiplicative relations between features
    • Production Model: We offer a detailed description of our recurrent neural networks-based recommender system used at YouTube
    • Empirical Results: We demonstrate in multiple settings and with different context features that latent crosses improve recommendation accuracy, even on top of complex, state-of-the-art recurrent neural networks recommenders
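The latent cross highlighted above can be sketched in a few lines of NumPy. This is an illustrative reconstruction, not the production model: the function and variable names (latent_cross, w_time) are our own, and the paper's form — multiplying the hidden state elementwise by (1 + context embedding) — is the only part taken from the source.

```python
import numpy as np

def latent_cross(h, context_embeddings):
    """Latent cross: elementwise-multiply a hidden state h by
    (1 + sum of context embeddings). The context acts as a mask over
    the hidden state; a zero context embedding leaves h unchanged."""
    w = np.sum(context_embeddings, axis=0)  # combine multiple context features
    return (1.0 + w) * h

rng = np.random.default_rng(0)
h = rng.normal(size=8)             # hidden state from the RNN
w_time = 0.1 * rng.normal(size=8)  # embedding of a context feature, e.g. the time gap Δt
crossed = latent_cross(h, [w_time])
```

Because the zero embedding is the identity, the model can learn to ignore a context feature entirely, which is one reason this form trains stably on top of an existing RNN.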
    Methods
    • Methods compared: RNN (plain, no time), Bag of Words, and Bag of Words with time.
    • The task is to predict the last 5 watches in the user’s sequence.
    • For this set of experiments the authors use an RNN with an LSTM recurrent unit.
    • The authors use no ReLU layers before or after the recurrent unit, and use a pre-determined hierarchical softmax (HSM) to predict the videos.
    • The authors use all but the first watch in the sequence as supervision during training.
    • The model is trained using back-propagation with Adam [26].
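One recurrent step of the setup above can be sketched as a standard LSTM cell [23] with a pre-fusion latent cross applied to the input embedding. The dimensions, gate layout, and parameter names are our own toy assumptions; only the LSTM equations and the (1 + w) multiplicative cross come from the source.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h, c, W, U, b):
    """One LSTM step; the four gates [input, forget, cell, output] are
    stacked along the first axis of W, U, and b."""
    z = W @ x + U @ h + b
    i, f, g, o = np.split(z, 4)
    i, f, o = sigmoid(i), sigmoid(f), sigmoid(o)
    c_new = f * c + i * np.tanh(g)     # update the cell state
    h_new = o * np.tanh(c_new)         # emit the new hidden state
    return h_new, c_new

d, k = 8, 8  # illustrative input/hidden sizes
rng = np.random.default_rng(1)
W = rng.normal(size=(4 * k, d))
U = rng.normal(size=(4 * k, k))
b = np.zeros(4 * k)

x = rng.normal(size=d)          # embedding of the watched video
w_dt = 0.1 * rng.normal(size=d) # embedding of the time gap Δt
h, c = np.zeros(k), np.zeros(k)
# Pre-fusion latent cross: the context multiplies the input before the cell.
h, c = lstm_step((1.0 + w_dt) * x, h, c, W, U, b)
```

In the paper's production system the cross can also be applied post-fusion, i.e. to the hidden state after the recurrent unit, just before the softmax.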
    Results
    • The authors report the results for this experiment in Table 4.
    • As can be seen there, the model, using the RNN with ∆t having a latent cross with the watch, gives the best result for both Precision@1 and MAP@20.
    • Even more interesting is the relative performance of the models.
    • In both the bag-of-words models and the RNN models, the authors observe the critical importance of modeling time.
    • The authors use a production dataset of user watches, which is less restrictive than the above setting.
    • The authors use a larger vocabulary, on the order of millions of recently popular uploaded videos and uploaders.
    Conclusion
    • The authors explore below a number of questions raised by this work and implications for future work.

      8.1 Discrete Relations in DNNs

      While much of this paper has focused on enabling multiplicative interactions between features, the authors found that neural networks can approximate discrete interactions, an area where factorization models have more difficulty.
    Tables
    • Table 1: Relationship with related recommenders: We bridge the intuition and insights from contextual collaborative filtering with the power of recurrent recommender systems
    • Table 2: Notation. From a machine learning perspective, we can split our tuple e into features x and label y such that x = (i, j) and y = R
    • Table 3: Pearson correlation for different width models when fitting low-rank data
    • Table 4: Results for Comparative Study: RNN with a latent cross performs the best
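The experiment behind Table 3 — generating low-rank data and measuring how well a model of a given width recovers it by Pearson correlation — can be illustrated with a toy setup. The sizes, rank, and scaling here are our own assumptions, and the "model" shown is the exact multiplicative form (the form a latent cross can express) rather than a trained ReLU network.

```python
import numpy as np

rng = np.random.default_rng(2)
n, m, r = 100, 100, 3
U = rng.normal(size=(n, r))  # latent user factors
V = rng.normal(size=(m, r))  # latent item factors
R = U @ V.T                  # rank-r synthetic "ratings" to be fit

def pearson(a, b):
    """Pearson correlation between two matrices, flattened."""
    return np.corrcoef(a.ravel(), b.ravel())[0, 1]

# A width-r multiplicative model (sum over r of U_ik * V_jk) reproduces R
# exactly; the paper's point is that a first-order feed-forward network
# needs far more width to approximate the same product.
pred_mult = np.sum(U[:, None, :] * V[None, :, :], axis=-1)
print(pearson(R, pred_mult))  # ≈ 1.0
```

Replacing pred_mult with the output of a trained feed-forward network of varying width reproduces the comparison the table reports.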
    Related work
    • We begin with a survey of the various related research. An overview can be seen in Table 1.

      Contextual Recommendation. A significant amount of research has focused on using contextual data during recommendation. In particular, certain types of contextual data have been explored in depth, whereas others have been treated abstractly. For example, temporal dynamics in recommendation have been explored widely [6]. During the Netflix Prize [4], Koren [29] discovered significant long-ranging temporal dynamics in the Netflix data set and added temporal features to his collaborative filtering (CF) model to account for these effects. Researchers have also explored how preferences evolve on shorter time-scales, e.g., sessions [39]. More general abstractions, such as point processes [15] and recurrent neural networks [43], have been used to model preference evolution for recommendation. Similarly, modeling user actions along with geographical data has been widely explored with probabilistic models [2, 8], matrix factorization [32], and tensor factorization [17]. A variety of methods have built on matrix and tensor factorization for cross-domain learning [45, 46]. Methods like factorization machines [34] and other contextual recommenders [22, 37, 48] have provided generalizations of these collaborative filtering approaches.
    References
    • [1] Martín Abadi, Paul Barham, Jianmin Chen, Zhifeng Chen, Andy Davis, Jeffrey Dean, Matthieu Devin, Sanjay Ghemawat, Geoffrey Irving, Michael Isard, and others. 2016. TensorFlow: A system for large-scale machine learning. In Proceedings of the 12th USENIX Symposium on Operating Systems Design and Implementation (OSDI). Savannah, Georgia, USA.
    • [2] Amr Ahmed, Liangjie Hong, and Alexander J Smola. 2013. Hierarchical geographical modeling of user locations from social media posts. In Proceedings of the 22nd International Conference on World Wide Web (WWW). ACM, 25–36.
    • [3] Dzmitry Bahdanau, Kyunghyun Cho, and Yoshua Bengio. 2014. Neural machine translation by jointly learning to align and translate. arXiv preprint arXiv:1409.0473 (2014).
    • [4] James Bennett, Stan Lanning, and others. 2007. The Netflix Prize. In Proceedings of KDD Cup and Workshop, Vol. 2007. New York, NY, USA, 35.
    • [5] Alex Beutel, Ed H Chi, Zhiyuan Cheng, Hubert Pham, and John Anderson. 2017. Beyond Globally Optimal: Focused Learning for Improved Recommendations. In Proceedings of the 26th International Conference on World Wide Web (WWW). ACM.
    • [6] Pedro G Campos, Fernando Díez, and Iván Cantador. 2014. Time-aware recommender systems: a comprehensive survey and analysis of existing evaluation protocols. User Modeling and User-Adapted Interaction 24, 1-2 (2014), 67–119.
    • [7] Heng-Tze Cheng, Levent Koc, Jeremiah Harmsen, Tal Shaked, Tushar Chandra, Hrishi Aradhye, Glen Anderson, Greg Corrado, Wei Chai, Mustafa Ispir, and others. 2016. Wide & deep learning for recommender systems. In Proceedings of the 1st Workshop on Deep Learning for Recommender Systems. ACM, 7–10.
    • [8] Zhiyuan Cheng, James Caverlee, and Kyumin Lee. 2010. You are where you tweet: a content-based approach to geo-locating twitter users. In Proceedings of the 19th ACM International Conference on Information and Knowledge Management. ACM, 759–768.
    • [9] Evangelia Christakopoulou and George Karypis. 2016. Local Item-Item Models For Top-N Recommendation. In Proceedings of the 10th ACM Conference on Recommender Systems (RecSys). ACM, 67–74.
    • [10] Junyoung Chung, Caglar Gülçehre, Kyunghyun Cho, and Yoshua Bengio. 2015. Gated Feedback Recurrent Neural Networks. In ICML. 2067–2075.
    • [11] Paul Covington, Jay Adams, and Emre Sargin. 2016. Deep Neural Networks for YouTube Recommendations. In Proceedings of the 10th ACM Conference on Recommender Systems (RecSys). ACM, 191–198.
    • [12] Bin Cui, Anthony KH Tung, Ce Zhang, and Zhe Zhao. 2010. Multiple feature fusion for social media applications. In Proceedings of the 2010 ACM SIGMOD International Conference on Management of Data. ACM, 435–446.
    • [13] Andrew M Dai, Christopher Olah, and Quoc V Le. 2015. Document embedding with paragraph vectors. arXiv preprint arXiv:1507.07998 (2015).
    • [14] Yann N Dauphin, Angela Fan, Michael Auli, and David Grangier. 2016. Language modeling with gated convolutional networks. arXiv preprint arXiv:1612.08083 (2016).
    • [15] Nan Du, Yichen Wang, Niao He, Jimeng Sun, and Le Song. 2015. Time-sensitive recommendation from recurrent user activities. In Advances in Neural Information Processing Systems. 3492–3500.
    • [16] John Duchi, Elad Hazan, and Yoram Singer. 2011. Adaptive subgradient methods for online learning and stochastic optimization. Journal of Machine Learning Research 12, Jul (2011), 2121–2159.
    • [17] Hancheng Ge, James Caverlee, and Haokai Lu. 2016. TAPER: A contextual tensor-based approach for personalized expert recommendation. (2016).
    • [18] Alex Graves. 2013. Generating sequences with recurrent neural networks. arXiv preprint arXiv:1308.0850 (2013).
    • [19] Xiangnan He and Tat-Seng Chua. 2017. Neural Factorization Machines for Sparse Predictive Analytics. In Proceedings of the 40th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR '17). ACM, New York, NY, USA, 355–364. DOI: https://doi.org/10.1145/3077136.3080777
    • [20] Xiangnan He, Lizi Liao, Hanwang Zhang, Liqiang Nie, Xia Hu, and Tat-Seng Chua. 2017. Neural collaborative filtering. In Proceedings of the 26th International Conference on World Wide Web (WWW). International World Wide Web Conferences Steering Committee, 173–182.
    • [21] Balázs Hidasi, Alexandros Karatzoglou, Linas Baltrunas, and Domonkos Tikk. 2015. Session-based recommendations with recurrent neural networks. arXiv preprint arXiv:1511.06939 (2015).
    • [22] Balázs Hidasi and Domonkos Tikk. 2016. General factorization framework for context-aware recommendations. Data Mining and Knowledge Discovery 30, 2 (2016), 342–371.
    • [23] Sepp Hochreiter and Jürgen Schmidhuber. 1997. Long short-term memory. Neural Computation 9, 8 (1997), 1735–1780.
    • [24] Yifan Hu, Yehuda Koren, and Chris Volinsky. 2008. Collaborative filtering for implicit feedback datasets. In ICDM.
    • [25] How Jing and Alexander J. Smola. 2017. Neural Survival Recommender. In Proceedings of the Tenth ACM International Conference on Web Search and Data Mining (WSDM). 515–524.
    • [26] Diederik Kingma and Jimmy Ba. 2014. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014).
    • [27] Ryan Kiros, Richard Zemel, and Ruslan R Salakhutdinov. 2014. A multiplicative model for learning distributed text-based attribute representations. In Advances in Neural Information Processing Systems. 2348–2356.
    • [28] Yehuda Koren. 2008. Factorization meets the neighborhood: a multifaceted collaborative filtering model. In KDD. ACM, 426–434.
    • [29] Yehuda Koren. 2010. Collaborative filtering with temporal dynamics. Commun. ACM 53, 4 (2010), 89–97.
    • [30] Yehuda Koren, Robert Bell, and Chris Volinsky. 2009. Matrix Factorization Techniques for Recommender Systems. Computer 42, 8 (Aug. 2009), 30–37. DOI: https://doi.org/10.1109/MC.2009.263
    • [31] Joonseok Lee, Seungyeon Kim, Guy Lebanon, and Yoram Singer. 2013. Local Low-Rank Matrix Approximation. In Proceedings of the 30th International Conference on Machine Learning (ICML). 82–90. http://jmlr.org/proceedings/papers/v28/lee13.html
    • [32] Haokai Lu and James Caverlee. 2015. Exploiting geo-spatial preference for personalized expert recommendation. In Proceedings of the 9th ACM Conference on Recommender Systems (RecSys). ACM, 67–74.
    • [33] Yanru Qu, Han Cai, Kan Ren, Weinan Zhang, Yong Yu, Ying Wen, and Jun Wang. 2016. Product-based neural networks for user response prediction. In Data Mining (ICDM), 2016 IEEE 16th International Conference on. IEEE, 1149–1154.
    • [34] Steffen Rendle. 2012. Factorization Machines with libFM. ACM TIST 3, 3, Article 57 (May 2012), 22 pages.
    • [35] Ruslan Salakhutdinov and Andriy Mnih. 2008. Bayesian probabilistic matrix factorization using Markov chain Monte Carlo. In ICML. ACM, 880–887.
    • [36] Suvash Sedhain, Aditya Krishna Menon, Scott Sanner, and Lexing Xie. 2015. AutoRec: Autoencoders meet collaborative filtering. In Proceedings of the 24th International Conference on World Wide Web (WWW). ACM, 111–112.
    • [37] Yue Shi, Alexandros Karatzoglou, Linas Baltrunas, Martha Larson, Alan Hanjalic, and Nuria Oliver. 2012. TFMAP: optimizing MAP for top-n context-aware recommendation. In Proceedings of the 35th International ACM SIGIR Conference on Research and Development in Information Retrieval. ACM, 155–164.
    • [38] Sainbayar Sukhbaatar, Jason Weston, Rob Fergus, and others. 2015. End-to-end memory networks. In Advances in Neural Information Processing Systems. 2440–2448.
    • [39] Yong Kiam Tan, Xinxing Xu, and Yong Liu. 2016. Improved recurrent neural networks for session-based recommendations. In Proceedings of the 1st Workshop on Deep Learning for Recommender Systems. ACM, 17–22.
    • [40] Duyu Tang, Bing Qin, Ting Liu, and Yuekui Yang. 2015. User Modeling with Neural Network for Review Rating Prediction. In IJCAI. 1340–1346.
    • [41] Bartlomiej Twardowski. 2016. Modelling Contextual Information in Session-Aware Recommender Systems with Neural Networks. In RecSys. 273–276.
    • [42] Manasi Vartak, Hugo Larochelle, and Arvind Thiagarajan. 2017. A Meta-Learning Perspective on Cold-Start Recommendations for Items. In Advances in Neural Information Processing Systems. 6888–6898.
    • [43] Chao-Yuan Wu, Amr Ahmed, Alex Beutel, Alexander J. Smola, and How Jing. 2017. Recurrent Recommender Networks. In Proceedings of the Tenth ACM International Conference on Web Search and Data Mining (WSDM). 495–503.
    • [44] Yuhuai Wu, Saizheng Zhang, Ying Zhang, Yoshua Bengio, and Ruslan R Salakhutdinov. 2016. On multiplicative integration with recurrent neural networks. In Advances in Neural Information Processing Systems. 2856–2864.
    • [45] Chunfeng Yang, Huan Yan, Donghan Yu, Yong Li, and Dah Ming Chiu. 2017. Multi-site User Behavior Modeling and Its Application in Video Recommendation. In Proceedings of the 40th International ACM SIGIR Conference on Research and Development in Information Retrieval. ACM, 175–184.
    • [46] Zhe Zhao, Zhiyuan Cheng, Lichan Hong, and Ed H Chi. 2015. Improving User Topic Interest Profiles by Behavior Factorization. In Proceedings of the 24th International Conference on World Wide Web (WWW). 1406–1416.
    • [47] Lei Zheng, Vahid Noroozi, and Philip S Yu. 2017. Joint deep modeling of users and items using reviews for recommendation. In Proceedings of the Tenth ACM International Conference on Web Search and Data Mining (WSDM). ACM, 425–434.
    • [48] Yong Zheng, Bamshad Mobasher, and Robin Burke. 2014. CSLIM: Contextual SLIM recommendation algorithms. In Proceedings of the 8th ACM Conference on Recommender Systems. ACM, 301–304.
    • [49] Yu Zhu, Hao Li, Yikang Liao, Beidou Wang, Ziyu Guan, Haifeng Liu, and Deng Cai. 2017. What to Do Next: Modeling User Behaviors by Time-LSTM. In Proceedings of the Twenty-Sixth International Joint Conference on Artificial Intelligence (IJCAI-17). 3602–3608. DOI: https://doi.org/10.24963/ijcai.2017/504