Recurrent Recommender Networks

    WSDM 2017, pp. 495–503.

    Keywords:
    root-mean-square error, stochastic gradient descent, Gated Recurrent Unit, probabilistic matrix factorization, recurrent recommender networks
    We present Recurrent Recommender Networks (RRN), a novel recommender system based on recurrent neural networks that can accurately model user and movie dynamics

    Abstract:

    Recommender systems traditionally assume that user profiles and movie attributes are static. Temporal dynamics are purely reactive, that is, they are inferred after they are observed, e.g. after a user's taste has changed or based on hand-engineered temporal bias corrections for movies. We propose Recurrent Recommender Networks (RRN) that...

    Introduction
    • The design of practical recommender systems is a well-established and well-studied subject.
    • A common approach is to study problems of the form introduced in the Netflix contest [4].
    • Performance is measured by the deviation of the prediction from the actual rating.
    • This formulation is easy to understand and it has led to numerous highly successful approaches, such as Probabilistic Matrix Factorization [19], nearest neighbor based approaches [20], and clustering [5].
    • It is easy to define appropriate performance measures by selecting a random subset of the tuples for training and the rest for testing purposes
    Highlights
    • The design of practical recommender systems is a well-established and well-studied subject
    • We demonstrate RRN’s ability to automatically model a variety of temporal effects and make accurate predictions of future ratings
    • We show that the resulting predictions are the best currently available; they accurately reproduce the temporal effects that are usually hand-engineered in temporal recommender systems
    • We evaluate rating-prediction performance using the standard root-mean-square error (RMSE)
    • We report the RMSE on the testing set for the model that gives the best results on the validation set
    • We present RRN, a novel recommender system based on recurrent neural networks that can accurately model user and movie dynamics
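    The RMSE criterion used for evaluation above is standard; a minimal sketch in plain Python:

```python
import math

def rmse(predictions, ratings):
    """Root-mean-square error between predicted and observed ratings."""
    assert predictions and len(predictions) == len(ratings)
    se = sum((p - r) ** 2 for p, r in zip(predictions, ratings))
    return math.sqrt(se / len(predictions))

# Two predictions, each off by one star, give an RMSE of 1.0.
print(rmse([3.0, 5.0], [4.0, 4.0]))  # → 1.0
```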
    Methods
    • The authors demonstrate RRN’s ability to automatically model a variety of temporal effects and make accurate predictions of future ratings.
    • The authors show that the predicted ratings are the best currently available; they accurately reproduce the temporal effects that are usually hand-engineered in temporal recommender systems.
    • Note that the authors split the data based on time, to simulate the real-world setting in which future ratings must be predicted rather than past ratings interpolated.
    • Ratings from the testing period are evenly split into validation and testing sets
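    A chronological split of this kind can be sketched in a few lines (a minimal illustration with a hypothetical tuple layout, not the authors' exact pipeline):

```python
# Train on all ratings before a cutoff time, then split later ratings evenly
# into validation and test sets, as opposed to a random train/test split.
def time_based_split(ratings, cutoff):
    """ratings: list of (user, item, rating, timestamp) tuples."""
    train = [r for r in ratings if r[3] < cutoff]
    future = sorted((r for r in ratings if r[3] >= cutoff), key=lambda r: r[3])
    valid, test = future[0::2], future[1::2]  # even split of the test period
    return train, valid, test

ratings = [("u1", "m1", 4, 1), ("u1", "m2", 3, 5),
           ("u2", "m1", 5, 6), ("u2", "m2", 2, 7)]
train, valid, test = time_based_split(ratings, cutoff=5)
print(len(train), len(valid), len(test))  # → 1 2 1
```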
    Results
    • The authors' model achieves the best accuracy on all datasets among all compared methods, including the best available neural network model and the best available temporal model.
    Conclusion
    • In summary, the authors have provided RRN, a novel recommender system based on recurrent neural networks that can accurately model user and movie dynamics.
    • Nonparametric, Dynamic Recommender: the authors offer the first recommender system, to the best of their knowledge, that jointly models the evolution of both user and item states and focuses on extrapolating predictions into the future without hand-crafted features.
    • They accomplish this by adapting recurrent neural network architectures (LSTMs) to recommendation data in order to learn dynamic embeddings of users and movies.
    • This enables the model to scale to over 100 million ratings spanning more than 6 years
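    The core idea, predicting a rating from both stationary factors and recurrently updated user/movie states, can be caricatured as follows (a plain tanh-RNN update stands in for the paper's LSTM; all dimensions, weights, and event encodings are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
d = 4  # state dimension (illustrative)

# Stationary factors, as in classical matrix factorization.
u_stat, m_stat = rng.normal(size=d), rng.normal(size=d)

# Recurrent update: a user's state evolves with each rating event.
W, U = 0.1 * rng.normal(size=(d, d)), 0.1 * rng.normal(size=(d, d))

def step(state, event):
    """One plain tanh-RNN step; the paper uses an LSTM here."""
    return np.tanh(W @ state + U @ event)

u_t = np.zeros(d)
for event in rng.normal(size=(3, d)):  # a short sequence of rating events
    u_t = step(u_t, event)
m_t = step(np.zeros(d), rng.normal(size=d))  # movie state evolves analogously

# The predicted rating combines dynamic and stationary terms.
r_hat = float(u_t @ m_t + u_stat @ m_stat)
```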
    Tables
    • Table 1: IMDb and the different splits of the Netflix dataset used in the experiments. To evaluate the ability to model state-transition dynamics, users and items not seen in the training set are removed from the testing set
    • Table 2: RRN outperforms competing models in terms of RMSE. On the Netflix datasets, an RRN with only 20-dimensional stationary factors and a 40-dimensional embedding is enough to outperform PMF and TimeSVD++ of dimensionality 160 and AutoRec with 500-dimensional embeddings. The dimensionality and regularization parameter of PMF and TimeSVD++ are selected by cross-validation. (I-AR: I-AutoRec, U-AR: U-AutoRec, T-SVD: TimeSVD++.)
    • Table 3: RMSE and training-time trade-off for different time-step granularities. Training time (in seconds) is measured for one epoch of user/item sequence updates
    Related work
    • 2.1 Recommender Systems

      Basic recommender systems ignore temporal information. This is a reasonable approximation, in particular for the Netflix contest, since opinions about movies and users do not change too rapidly and too dramatically in most cases.

      Probably one of the most popular variants is Probabilistic Matrix Factorization (PMF) [19]. Albeit simple, PMF achieves robust and strong results in rating prediction. It is the prototypical factorization model, in which preferences are attributed to users and movies alike. Furthermore, implementations such as LIBPMF [24] are readily available, which makes it easy to replicate experiments. Our proposed model uses the same factorization as PMF to model stationary effects. In this sense, it is a strict generalization.
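    For reference, PMF fits user and item factors to the observed entries of the rating matrix with l2 (Gaussian-prior) regularization; a compact SGD sketch (dimensions, step size, and regularization strength are illustrative, and this is not the LIBPMF implementation):

```python
import numpy as np

rng = np.random.default_rng(42)
n_users, n_items, k = 5, 4, 3
# Observed (user, item, rating) triples; everything else is missing.
obs = [(0, 1, 4.0), (1, 1, 3.0), (2, 3, 5.0), (0, 3, 4.0), (3, 0, 2.0)]

U = 0.1 * rng.normal(size=(n_users, k))  # user factors
V = 0.1 * rng.normal(size=(n_items, k))  # item factors
lr, lam = 0.05, 0.02                     # step size, l2 regularization

for _ in range(300):  # SGD passes over the observed ratings only
    for i, j, r in obs:
        err = r - U[i] @ V[j]
        U[i] += lr * (err * V[j] - lam * U[i])
        V[j] += lr * (err * U[i] - lam * V[j])

train_rmse = float(np.sqrt(np.mean([(r - U[i] @ V[j]) ** 2 for i, j, r in obs])))
print(round(train_rmse, 3))  # should be small after a few hundred passes
```

    RRN keeps exactly this stationary factorization and adds the recurrent user/movie states on top of it.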
    Funding
    • This research was supported by funds from Google, Bloomberg, Adobe, a National Science Foundation Graduate Research Fellowship (Grant No DGE-1252522), and the National Science Foundation under Grant No IIS-1408924
    • Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation, or other funding parties
    References
    • A. Anandkumar, R. Ge, D. Hsu, S. M. Kakade, and M. Telgarsky. Tensor decompositions for learning latent variable models. arXiv preprint arXiv:1210.7559, 2012.
    • C. Andrieu, N. de Freitas, A. Doucet, and M. I. Jordan. An introduction to MCMC for machine learning. Machine Learning, 50:5–43, 2003.
    • Y. Bengio, P. Lamblin, D. Popovici, H. Larochelle, et al. Greedy layer-wise training of deep networks. NIPS, 19:153, 2007.
    • J. Bennett and S. Lanning. The Netflix prize. In Proceedings of KDD Cup and Workshop, volume 2007, page 35, 2007.
    • A. Beutel, A. Ahmed, and A. J. Smola. ACCAMS: Additive co-clustering to approximate matrices succinctly. In WWW, pages 119–129. ACM, 2015.
    • T. Chen, M. Li, Y. Li, M. Lin, N. Wang, M. Wang, T. Xiao, B. Xu, C. Zhang, and Z. Zhang. MXNet: A flexible and efficient machine learning library for heterogeneous distributed systems. arXiv preprint arXiv:1512.01274, 2015.
    • J. Chung, C. Gulcehre, K. Cho, and Y. Bengio. Gated feedback recurrent neural networks. arXiv preprint arXiv:1502.02367, 2015.
    • C. Danescu-Niculescu-Mizil, R. West, D. Jurafsky, J. Leskovec, and C. Potts. No country for old members: User lifecycle and linguistic change in online communities. In WWW, pages 307–318, 2013.
    • M. Deshpande and G. Karypis. Selective Markov models for predicting web page accesses. ACM Transactions on Internet Technology (TOIT), 4(2):163–184, 2004.
    • Q. Diao, M. Qiu, C.-Y. Wu, A. J. Smola, J. Jiang, and C. Wang. Jointly modeling aspects, ratings and sentiments for movie recommendation (JMARS). In SIGKDD, pages 193–202. ACM, 2014.
    • S. Frederick and G. Loewenstein. Hedonic adaptation. In D. Kahneman, E. Diener, and N. Schwarz (Eds.), Well-Being: The Foundations of Hedonic Psychology, 1999.
    • A. Graves. Generating sequences with recurrent neural networks. arXiv preprint arXiv:1308.0850, 2013.
    • B. Hidasi, A. Karatzoglou, L. Baltrunas, and D. Tikk. Session-based recommendations with recurrent neural networks. arXiv preprint arXiv:1511.06939, 2015.
    • S. Hochreiter and J. Schmidhuber. Long short-term memory. Neural Computation, 9(8):1735–1780, 1997.
    • D. Kingma and J. Ba. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980, 2014.
    • Y. Koren. Collaborative filtering with temporal dynamics. In KDD, pages 447–456, 2009.
    • A. Kyrola, G. Blelloch, and C. Guestrin. GraphChi: Large-scale graph computation on just a PC. In OSDI, 2012.
    • S. Roweis and Z. Ghahramani. A unifying review of linear Gaussian models. Neural Computation, 11(2), 1999.
    • R. Salakhutdinov and A. Mnih. Probabilistic matrix factorization. In NIPS, volume 20, 2008.
    • B. Sarwar, G. Karypis, J. Konstan, and J. Riedl. Item-based collaborative filtering recommendation algorithms. In WWW, pages 285–295. ACM, 2001.
    • S. Sedhain, A. K. Menon, S. Sanner, and L. Xie. AutoRec: Autoencoders meet collaborative filtering. In WWW Companion, pages 111–112, 2015.
    • L. Song, B. Boots, S. Siddiqi, G. Gordon, and A. J. Smola. Hilbert space embeddings of hidden Markov models. In ICML, 2010.
    • C.-Y. Wu, C. V. Alvino, A. J. Smola, and J. Basilico. Using navigation to improve recommendations in real-time. In RecSys, pages 341–348. ACM, 2016.
    • H.-F. Yu, C.-J. Hsieh, S. Si, and I. S. Dhillon. Scalable coordinate descent approaches to parallel matrix factorization for recommender systems. In ICDM, 2012.