Recurrent Neural Networks with Top-k Gains for Session-based Recommendations.

CIKM (2018): 843-852

Cited by 249 | Viewed 246 | EI

Abstract

RNNs have been shown to be excellent models for sequential data and in particular for data that is generated by users in a session-based manner. The use of RNNs provides impressive performance benefits over classical methods in session-based recommendations. In this work we introduce novel ranking loss functions tailored to RNNs in the recommendation setting. […]

Introduction
  • Session-based recommendation is a very common recommendation problem that is encountered in many domains such as e-commerce, classified sites, music and video recommendation.
  • In the session-based setting, past user history logs are typically not available and recommender systems have to rely only on the actions of the user in the current session to provide accurate recommendations.
  • Recurrent Neural Networks (RNNs) have emerged from the deep learning literature as powerful methods for modeling sequential data.
  • These models have been successfully applied in speech recognition, translation, time series forecasting and signal processing.
  • In recommender systems, RNNs have recently been applied to the session-based recommendation setting with impressive results (Hidasi et al, 2016a).
Highlights
  • Session-based recommendation is a very common recommendation problem that is encountered in many domains such as e-commerce, classified sites, music and video recommendation
  • In this work we analyze ranking loss functions used in Recurrent Neural Networks (RNNs) for session-based recommendations; this analysis leads to a new set of ranking loss functions that increase the performance of the RNN by up to 30% over previously used losses without incurring significant computational overhead
  • We propose two ways to stabilize the numerically unstable cross-entropy loss, show how learning with the TOP1 and Bayesian Personalized Ranking (BPR) pairwise losses degrades as more samples are added to the output, and propose a family of loss functions based on pairwise losses that alleviates this problem (sketched after this list)
  • The increase with sampling and the proper loss function is stunning, as the best results exceed the accuracy of the original GRU4Rec by 15-35% and that of item-kNN by up to 52%
  • We introduced a new class of loss functions that, together with an improved sampling strategy, provide impressive top-k gains for RNNs for session-based recommendations
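For reference, a hedged sketch of the losses referred to in the highlights, following the notation of the GRU4Rec line of work: $r_i$ is the score of the target (next) item, $r_j$ ($j = 1, \dots, N_S$) are the scores of the sampled negative items, $\sigma$ is the sigmoid, and $s_j$ are softmax weights over the negative scores; the paper's additional regularization terms are omitted here.

\[
\begin{aligned}
L_{\text{BPR}} &= -\frac{1}{N_S}\sum_{j=1}^{N_S} \log \sigma(r_i - r_j) \\
L_{\text{TOP1}} &= \frac{1}{N_S}\sum_{j=1}^{N_S} \left( \sigma(r_j - r_i) + \sigma(r_j^2) \right) \\
L_{\text{BPR-max}} &= -\log \sum_{j=1}^{N_S} s_j \, \sigma(r_i - r_j) \\
L_{\text{TOP1-max}} &= \sum_{j=1}^{N_S} s_j \left( \sigma(r_j - r_i) + \sigma(r_j^2) \right),
\qquad s_j = \frac{e^{r_j}}{\sum_{k=1}^{N_S} e^{r_k}}
\end{aligned}
\]

The two cross-entropy stabilizations mentioned above amount to either adding a small $\varepsilon$ inside the logarithm, $-\log(s_i + \varepsilon)$, or computing the loss directly in log-sum-exp form, $-r_i + \log \sum_j e^{r_j}$, so that a softmax value rounded to zero never reaches the logarithm.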
Methods
  • Experimental setup: the authors evaluated the proposed improvements – fixed cross-entropy loss, ranking-max loss functions and additional samples – on four datasets.
  • RSC15 is based on the dataset of the RecSys Challenge 2015, which contains click and buy events from an online webshop.
  • VIDEO and VIDXL are proprietary datasets containing watch events from an online video service.
  • CLASS is a proprietary dataset containing item page view events from an online classified site.
  • Datasets were subjected to minor preprocessing and then split into train and test sets so that a whole session belongs either to the train or to the test set.
  • The split is based on the time of the first event of the sessions (a minimal sketch of such a split follows this list).
  • The datasets and the split are exactly the same for RSC15 as in Hidasi et al (2016a), and for VIDXL and CLASS as in Hidasi et al (2016b).
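A minimal sketch of this time-based, session-level split, assuming sessions are already grouped into time-ordered event lists; the function and variable names are illustrative, not the paper's preprocessing code.

```python
# Assign whole sessions to the train or test set based on the time of the session's first event.
# `sessions` maps session_id -> list of (timestamp, item_id) tuples sorted by timestamp.
def split_sessions_by_time(sessions, test_start_time):
    train, test = {}, {}
    for session_id, events in sessions.items():
        if events[0][0] < test_start_time:  # the time of the first event decides the split
            train[session_id] = events
        else:
            test[session_id] = events
    return train, test
```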
Results
  • The better performance of such losses over alternatives, along with further tricks and improvements described in this work, allows achieving an overall improvement of up to 35% in terms of MRR and Recall@20 over previous session-based RNN solutions and up to 51% over classical collaborative filtering approaches.
  • In this work the authors analyze ranking loss functions used in RNNs for session-based recommendations; this analysis leads to a new set of ranking loss functions that increase the performance of the RNN by up to 30% over previously used losses without incurring significant computational overhead.
  • In this section the authors revisit how GRU4Rec (code: https://github.com/hidasib/GRU4Rec) samples negative feedback on the output and discuss its importance.
  • The authors extend this sampling with an option for additional samples and argue that this is crucial for the increased recommendation accuracy they achieve (a sampling sketch follows this list).
  • The increase with sampling and the proper loss function is stunning, as the best results exceed the accuracy of the original GRU4Rec by 15-35% and that of item-kNN by up to 52%
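A hedged sketch of the output sampling described above: negatives are the target items of the other examples in the mini-batch, plus additional negatives drawn with probability proportional to item popularity raised to a parameter alpha (the alpha sweep appears in the analysis section below). The names and the exact distribution are illustrative assumptions, not GRU4Rec's implementation.

```python
import numpy as np

def sample_negatives(in_batch_item_ids, item_popularity, n_extra, alpha=0.5, rng=None):
    """Return item ids to be scored as negatives for the current mini-batch.

    in_batch_item_ids: target items of the other examples in the mini-batch,
                       reused as "free" negative samples.
    item_popularity:   array of per-item support counts.
    n_extra:           number of additional negative samples to draw.
    alpha:             0 gives uniform sampling, 1 gives popularity-proportional sampling.
    """
    rng = rng or np.random.default_rng()
    p = item_popularity.astype(np.float64) ** alpha
    p /= p.sum()
    extra = rng.choice(len(item_popularity), size=n_extra, p=p)
    return np.concatenate([in_batch_item_ids, extra])
```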
Conclusion
  • The authors introduced a new class of loss functions that, together with an improved sampling strategy, provide impressive top-k gains for RNNs for session-based recommendations.
  • The authors believe that these new losses could be more generally applicable and, along with the corresponding sampling strategies, provide top-k gains for different recommendation settings and algorithms, such as matrix factorization or autoencoders.
  • It is conceivable that these techniques could provide similar benefits in Natural Language Processing, a domain that shares significant similarities with the recommendation domain in terms of machine learning and data structure.
Tables
  • Table 1: Properties of the datasets
  • Table 2: Recommendation accuracy with additional samples and different loss functions, compared to item-kNN and the original GRU4Rec. Improvements over item-kNN and the original GRU4Rec (with TOP1 loss) are shown in parentheses. Best results are typeset in bold
  • Table 3: Results with unified embeddings
Related Work
  • One of the main approaches employed in session-based recommendation, and a natural solution to the problem of a missing user profile, is the item-to-item recommendation approach (Sarwar et al, 2001; Linden et al, 2003). In this setting, an item-to-item similarity matrix is precomputed from the available session data, that is, items that are often clicked together in sessions are deemed to be similar. This similarity matrix is then simply used during the session to recommend the most similar items to the one the user has currently clicked (a sketch of this precomputation is given after the related-work paragraphs).

    Long Short-Term Memory (LSTM) networks (Hochreiter & Schmidhuber, 1997) are a type of RNN that has been shown to solve the optimization issues that plague vanilla RNNs. LSTMs include additional gates that regulate when and how much of the input to take into account and when to reset the hidden state. A slightly simplified version of the LSTM – one that still maintains all of its properties – is the Gated Recurrent Unit (GRU) (Cho et al, 2014), which we use in this work (the standard GRU update is sketched below). Recurrent Neural Networks have been used with success in the area of session-based recommendations: Hidasi et al (2016a) proposed a Recurrent Neural Network with a pairwise ranking loss for this task, and Tan et al (2016) proposed data augmentation techniques to improve the performance of the RNN for session-based recommendations; these techniques, however, have the side effect of increasing training times, as a single session is split into several sub-sessions for training. Session-based RNNs have also been augmented with feature information, such as text and images from the clicked/consumed items, showing improved performance over the plain models (Hidasi et al, 2016b). RNNs have further been used in more standard user-item collaborative filtering settings where the aim is to model the evolution of user and item factors (Wu et al, 2017; Devooght & Bersini, 2016), where the results are less striking, with the proposed methods barely outperforming standard matrix factorization methods. This is to be expected, as there is no strong evidence of major user taste evolution in a single domain within the timeframes of the available datasets, and sequential modeling of items that are not 'consumed' in sessions, such as movies, might not bring major benefits.
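A minimal sketch of the item-to-item precomputation described in the first related-work paragraph: co-occurrence counts within sessions with a cosine-style normalization. The data structures and the normalization are illustrative assumptions, not the exact item-kNN baseline used in the paper.

```python
from collections import defaultdict
from itertools import combinations
import math

def item_to_item_similarity(sessions):
    """Precompute an item-to-item similarity matrix from session co-occurrences.

    sessions: iterable of item-id lists, one list per session.
    Returns a dict of dicts: sim[a][b] is the cosine-normalized co-occurrence of a and b.
    """
    cooccurrence = defaultdict(lambda: defaultdict(int))
    support = defaultdict(int)
    for session in sessions:
        items = set(session)
        for item in items:
            support[item] += 1
        for a, b in combinations(items, 2):
            cooccurrence[a][b] += 1
            cooccurrence[b][a] += 1
    return {
        a: {b: count / math.sqrt(support[a] * support[b]) for b, count in neighbors.items()}
        for a, neighbors in cooccurrence.items()
    }

# At recommendation time, the items most similar to the currently clicked item are recommended:
# top_k = sorted(sim[current_item], key=sim[current_item].get, reverse=True)[:k]
```

For reference, the standard GRU update (Cho et al, 2014), with update gate $z_t$ deciding how much of the candidate state replaces the previous state and reset gate $r_t$ deciding how much of the previous hidden state feeds the candidate; bias terms are omitted:

\[
\begin{aligned}
z_t &= \sigma(W_z x_t + U_z h_{t-1}) \\
r_t &= \sigma(W_r x_t + U_r h_{t-1}) \\
\tilde{h}_t &= \tanh\left( W x_t + U (r_t \odot h_{t-1}) \right) \\
h_t &= (1 - z_t) \odot h_{t-1} + z_t \odot \tilde{h}_t
\end{aligned}
\]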
Study Data and Analysis
  • Gradient figure (mini-batch sampling, 2048 additional negative samples): median negative gradients of BPR and BPR-max with respect to the target score, plotted against the rank of the target item, measured on a sample of the CLASS dataset during the 1st and 10th epochs (i.e. the beginning and the end of training). Left: only the other examples of a mini-batch of size 32 are used as negatives; center: 2048 additional negative samples are added; right: same setting as the center, focusing on ranks 0-200. Lower rank means that there are fewer relevant negative samples; "ALL" means no sampling of items. (A short derivation of why this gradient vanishes follows this list.)
  • Additional-samples figure (32 to 32768 samples, and ALL): (a) recommendation accuracy and (b) training times as a function of the number of additional samples. Adding extra samples increases the computational cost, yet due to easy parallelization on modern GPUs most of this cost is alleviated.
  • Training times: the trend is similar for all losses. For example, full training of the network takes around 10 minutes (with the settings for cross-entropy or TOP1-max), which does not increase with up to 512 extra samples. At the point of diminishing returns, i.e. at 2048 extra samples, training time is around 15 minutes, which is also entirely acceptable. After that, training times grow quickly because the parallelization capabilities of the GPU used are exceeded. The trend is similar on the VIDEO dataset, with training times starting at around 50 minutes, increasing at 2048 extra samples (to 80 minutes) and growing quickly thereafter. This means that the proposed method can be used at little to no additional cost in practice, unlike data augmentation methods.
  • Sampling parameter alpha (popularity-based additional sampling, 128 to 32768 additional samples): the ranking-max losses seem to prefer intermediate values of alpha, with a slight preference towards higher values, while the extremes perform the worst.
  • Results on the 4 datasets: the increase with sampling and the proper loss function is stunning, as the best results exceed the accuracy of the original GRU4Rec by 15-35% and that of item-kNN by up to 52%. BPR-max even performs slightly better (+1-7%) than cross-entropy on 3 of the 4 datasets and achieves similar results on the remaining one. On RSC15, Tan et al (2016) reported approximately 0.685 recall@20 and approximately 0.29 MRR@20 using data augmentation.
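A hedged derivation of why the gradient in the figure above vanishes, based on the BPR loss sketched earlier rather than quoted from the paper: the negative gradient of BPR with respect to the target score is an average of sigmoids of score differences,

\[
-\frac{\partial L_{\text{BPR}}}{\partial r_i}
= \frac{1}{N_S} \sum_{j=1}^{N_S} \left( 1 - \sigma(r_i - r_j) \right)
= \frac{1}{N_S} \sum_{j=1}^{N_S} \sigma(r_j - r_i),
\]

so once most sampled negatives already score far below the target ($r_j \ll r_i$), almost all terms are near zero and the average vanishes. Adding many negative samples keeps some high-scoring, relevant negatives in the sum, and the softmax weighting in BPR-max concentrates the loss on exactly those samples.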

References
  • Rami Al-Rfou, Guillaume Alain, Amjad Almahairi, Christof Angermueller, Dzmitry Bahdanau, Nicolas Ballas, Frederic Bastien, Justin Bayer, Anatoly Belikov, Alexander Belopolsky, et al. Theano: A Python framework for fast computation of mathematical expressions. arXiv preprint arXiv:1605.02688, 2016.
  • Alejandro Bellogin, Pablo Castells, and Ivan Cantador. Precision-oriented evaluation of recommender systems: An algorithmic comparison. In RecSys '11: 5th ACM Conf. on Recommender Systems, pp. 333–336, 2011. ISBN 978-1-4503-0683-6. doi: 10.1145/2043932.2043996. URL http://doi.acm.org/10.1145/2043932.2043996.
  • Chris J.C. Burges. From RankNet to LambdaRank to LambdaMART: An overview. Technical report, June 2010. URL https://www.microsoft.com/en-us/research/publication/from-ranknet-to-lambdarank-to-lambdamart-an-overview/.
  • Sotirios Chatzis, Panayiotis Christodoulou, and Andreas S. Andreou. Recurrent latent variable networks for session-based recommendation. arXiv preprint arXiv:1706.04026, 2017.
  • Kyunghyun Cho, Bart van Merrienboer, Dzmitry Bahdanau, and Yoshua Bengio. On the properties of neural machine translation: Encoder–decoder approaches. In SSST-8: 8th Workshop on Syntax, Semantics and Structure in Statistical Translation, pp. 103–111, 2014.
  • Robin Devooght and Hugues Bersini. Collaborative filtering with recurrent neural networks. arXiv preprint arXiv:1608.07400, 2016.
  • Balazs Hidasi, Alexandros Karatzoglou, Linas Baltrunas, and Domonkos Tikk. Session-based recommendations with recurrent neural networks. International Conference on Learning Representations, 2016a. URL http://arxiv.org/abs/1511.06939.
  • Balazs Hidasi, Massimo Quadrana, Alexandros Karatzoglou, and Domonkos Tikk. Parallel recurrent neural network architectures for feature-rich session-based recommendations. In Proceedings of the 10th ACM Conference on Recommender Systems, RecSys '16, pp. 241–248, New York, NY, USA, 2016b. ACM. ISBN 978-1-4503-4035-9. doi: 10.1145/2959100.2959167. URL http://doi.acm.org/10.1145/2959100.2959167.
  • Sepp Hochreiter and Jurgen Schmidhuber. Long short-term memory. Neural Computation, 9(8):1735–1780, 1997.
  • Shihao Ji, S.V.N. Vishwanathan, Nadathur Satish, Michael J. Anderson, and Pradeep Dubey. BlackOut: Speeding up recurrent neural network language models with very large vocabularies. ICLR, 2016.
  • Yehuda Koren and Joe Sill. OrdRec: An ordinal model for predicting personalized item rating distributions. In Proceedings of the Fifth ACM Conference on Recommender Systems, RecSys '11, pp. 117–124, New York, NY, USA, 2011. ACM.
  • G. Linden, B. Smith, and J. York. Amazon.com recommendations: Item-to-item collaborative filtering. IEEE Internet Computing, 7(1):76–80, 2003.
  • Qiwen Liu, Tianjian Chen, Jing Cai, and Dianhai Yu. Enlister: Baidu's recommender system for the biggest Chinese Q&A website. In RecSys '12: Proc. of the 6th ACM Conf. on Recommender Systems, pp. 285–288, 2012.
  • S. Rendle, C. Freudenthaler, Z. Gantner, and L. Schmidt-Thieme. BPR: Bayesian personalized ranking from implicit feedback. In UAI '09: 25th Conf. on Uncertainty in Artificial Intelligence, pp. 452–461, 2009a. ISBN 978-0-9749039-5-8.
  • Steffen Rendle, Christoph Freudenthaler, Zeno Gantner, and Lars Schmidt-Thieme. BPR: Bayesian personalized ranking from implicit feedback. In Proceedings of the Twenty-Fifth Conference on Uncertainty in Artificial Intelligence, UAI '09, pp. 452–461, 2009b.
  • Badrul Sarwar, George Karypis, Joseph Konstan, and John Riedl. Item-based collaborative filtering recommendation algorithms. In WWW '01: 10th Int. Conf. on World Wide Web, pp. 285–295, 2001.
  • Yue Shi, Alexandros Karatzoglou, Linas Baltrunas, Martha Larson, Nuria Oliver, and Alan Hanjalic. CLiMF: Learning to maximize reciprocal rank with collaborative less-is-more filtering. In Proceedings of the Sixth ACM Conference on Recommender Systems, RecSys '12, pp. 139–146, 2012. ISBN 978-1-4503-1270-7. doi: 10.1145/2365952.2365981. URL http://doi.acm.org/10.1145/2365952.2365981.
  • Yong Kiam Tan, Xinxing Xu, and Yong Liu. Improved recurrent neural networks for session-based recommendations. In Proceedings of the 1st Workshop on Deep Learning for Recommender Systems, DLRS 2016, pp. 17–22, New York, NY, USA, 2016. ACM. ISBN 978-1-4503-4795-2. doi: 10.1145/2988450.2988452. URL http://doi.acm.org/10.1145/2988450.2988452.
  • Markus Weimer, Alexandros Karatzoglou, Quoc Viet Le, and Alex Smola. CofiRank: Maximum margin matrix factorization for collaborative ranking. In Proceedings of the 20th International Conference on Neural Information Processing Systems, NIPS '07, pp. 1593–1600, 2007.
  • Chao-Yuan Wu, Amr Ahmed, Alex Beutel, Alexander J. Smola, and How Jing. Recurrent recommender networks. In Proceedings of the Tenth ACM International Conference on Web Search and Data Mining, WSDM '17, pp. 495–503, New York, NY, USA, 2017. ACM.