Sequential Recommender System based on Hierarchical Attention Networks

    IJCAI, pp. 3926-3932, 2018.

    Keywords:
    maximizing a posterior; recurrent neural network; Sequential Hierarchical Attention Network; hierarchical attention network; item user
    Wei bo:
    We propose a hierarchical attention network for the item recommendation problem.

    Abstract:

    With a large amount of user activity data accumulated, it is crucial to exploit user sequential behavior for sequential recommendations. Conventionally, user general taste and recent demand are combined to improve recommendation performance. However, existing methods often neglect that users' long-term preferences keep evolving over time, a...

    Introduction
    • With the emergence of the platform economy, many companies like Amazon, Yelp, and Uber are creating self-ecosystems to retain users through interaction with products and services.
    • For instance, 62 million user trips were accumulated at Uber in July 2016, and more than 10 billion check-ins have been generated by over 50 million users at Foursquare.
    • With such massive user sequential behavior data, sequential recommendation, which is to recommend the next item a user might be interested in, has become a critical task for improving user experience and driving new value for platforms.
    • Previous methods mainly focus on user general taste and rarely consider sequential information, which leads to repeated recommendations [Hu et al, 2017; Ying et al, 2016; Zhang et al, 2016].
    Highlights
    • We propose a novel approach, namely Sequential Hierarchical Attention Network (SHAN), to solve the item recommendation problem
    • We introduce the attention mechanism to model user dynamics and personal preferences for sequential recommendations
    • We propose a novel approach based on hierarchical attention network, as shown in Figure 1, according to the following characteristics of user preference
    Methods
    • Taking AUC as an example, the relative performance improvements remain consistently high.
    • This demonstrates that sequential information, which BPR ignores, is very important for the task.
    • Surprisingly, the TOP method surpasses BPR in terms of recall once N increases beyond 50, and even performs better than FPMC under AUC.
    • A possible explanation is that users tend to buy popular items in online shopping.
    • The authors' model outperforms all the baselines across the different values of N.
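
The two metrics referred to above can be illustrated with a small sketch. It assumes the common next-item setup in which one held-out item per test session is ranked against all candidates: Recall@N checks whether that item lands in the top N, and AUC is the fraction of other items scored below it. The function names, scores, and target index are hypothetical and not taken from the paper.

```python
def recall_at_n(scores, target, n):
    """Return 1.0 if the held-out item is among the n highest-scored items, else 0.0."""
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return 1.0 if target in ranked[:n] else 0.0


def auc_single(scores, target):
    """Fraction of the other items scored strictly below the held-out item."""
    others = [s for i, s in enumerate(scores) if i != target]
    return sum(s < scores[target] for s in others) / len(others)


# Hypothetical model scores for five candidate items; item 3 is the held-out one.
scores = [0.1, 0.9, 0.4, 0.7, 0.2]
target = 3

recall_top1 = recall_at_n(scores, target, n=1)  # item 1 ranks first, so this is a miss
recall_top2 = recall_at_n(scores, target, n=2)  # item 3 ranks second, so this is a hit
auc = auc_single(scores, target)                # 3 of the 4 other items score lower
```

Averaging these per-session values over all test sessions would give the dataset-level figures; the exact averaging used in the paper is not stated in this summary.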
    Results
    • The authors observe that the model outperforms state-of-the-art algorithms on two datasets.
    • The authors perform experiments on two datasets which show the model consistently outperforms state-of-the-art methods in terms of Recall and Area Under Curve.
    Conclusion
    • In this paper, the authors proposed a hierarchical attention network for the item recommendation problem.
    • The authors first embedded users and items into low-dimensional spaces, and then employed a two-layer attention network to model users' dynamic long-term taste and sequential behavior.
    • The authors' model considered not only the dynamic properties of users' long- and short-term preferences, but also high-level complex interactions between user and item factors and between item and item factors.
    Summary
    • Earlier methods factorize the observed user-item matrix to learn users' long-term preferences, use item-item transitions to model sequential information, and linearly add the two to get final scores.
    • The attention mechanism automatically assigns different weights to items for each user to capture the dynamic property, while the hierarchical structure combines the user's long- and short-term preferences.
    • Through the hierarchical structure, we combine the user's long- and short-term preferences to generate a high-level hybrid representation of the user.
    • Our work follows this pipeline but contributes in that our model is built on hierarchical attention networks, which can capture dynamic long- and short-term preferences.
    • Given users and their sequential transactions L, we aim to recommend the items users will purchase based on long- and short-term preferences learned from L.
    • The long-term user representation is a weighted sum of the embeddings of items in the long-term item set $L^u_{t-1}$, where the weights are inferred by an attention-based pooling layer guided by the user embedding.
    • To further incorporate the short-term preference, the final hybrid user representation combines the long-term user representation with the embeddings of items in the short-term item set, where the weights are learned by another attention-based pooling layer.
    • We compute the long-term user representation $u^{\text{long}}_{t-1}$ as a sum of the item embeddings weighted by the attention scores.
    • As with the long-term preference, we use an attention network that assigns weights to the long-term representation and to the embeddings of items in the short-term item set, capturing a high-level representation of user $u$.
    • FPMC models user preference through matrix factorization and sequential information through a first-order Markov chain simultaneously, and combines them linearly for basket recommendation [Rendle et al, 2010].
    • Fossil integrates factored item similarity with a Markov chain to model users' long- and short-term preferences [He and McAuley, 2016].
    • We show the performance of our simplified version, i.e., SAN, which ignores the hierarchical construction and computes the weights of items from long- and short-term sets through a single attention network.
    • This indicates that our model captures more high-level, complicated nonlinear information for long- and short-term representations through the attention network, while HRM may lose much information through its hierarchical max pooling operation.
    • The reason may be that the basic user embedding vector in SHAN-S, which encodes the user's basic preference, is learned to compute the weight of each item in the short-term set.
    • We first embedded users and items into low-dimensional spaces, and then employed a two-layer attention network to model users' dynamic long-term taste and sequential behavior.
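
The two-layer attention structure summarized above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the summary does not give the form of the attention scores, so the sketch assumes a one-layer ReLU projection of each item embedding followed by an inner product with the user embedding and a softmax, and it assumes the final preference score is an inner product between the hybrid representation and a candidate item. All names, sizes, and random values are hypothetical, and nothing here is trained.

```python
import math
import random


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def softmax(values):
    peak = max(values)  # subtract the max for numerical stability
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(user, vectors, W, b):
    """Return (weighted sum of vectors, attention weights), guided by the user embedding."""
    scores = []
    for vec in vectors:
        # Assumed scoring form: hidden = ReLU(W vec + b), score = user . hidden
        hidden = [max(0.0, dot(row, vec) + bias) for row, bias in zip(W, b)]
        scores.append(dot(user, hidden))
    weights = softmax(scores)
    pooled = [sum(w * vec[k] for w, vec in zip(weights, vectors))
              for k in range(len(user))]
    return pooled, weights


def hybrid_user_representation(user, long_items, short_items, params):
    # Layer 1: pool the long-term item set into a long-term user representation.
    u_long, alpha = attention_pool(user, long_items, params["W1"], params["b1"])
    # Layer 2: pool that representation together with the short-term item embeddings.
    u_hybrid, beta = attention_pool(user, [u_long] + short_items,
                                    params["W2"], params["b2"])
    return u_hybrid, alpha, beta


# Toy, randomly initialised embeddings (hypothetical sizes, untrained).
rng = random.Random(0)
dim = 4


def rand_vec():
    return [rng.uniform(-1.0, 1.0) for _ in range(dim)]


params = {"W1": [rand_vec() for _ in range(dim)], "b1": [0.0] * dim,
          "W2": [rand_vec() for _ in range(dim)], "b2": [0.0] * dim}
user = rand_vec()
long_items = [rand_vec() for _ in range(5)]
short_items = [rand_vec() for _ in range(2)]

u_hybrid, alpha, beta = hybrid_user_representation(user, long_items, short_items, params)
candidate = rand_vec()
score = dot(u_hybrid, candidate)  # assumed inner-product preference score
```

Because both layers reuse the same user embedding to compute the weights, two users with identical item histories can still end up with different attention weights, which is the per-user dynamic the summary describes.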
    Tables
    • Table 1: Statistics of datasets. The Tmall dataset accumulates user behavior logs from the largest online shopping site in China (i.e., Tmall.com), while the Gowalla dataset records the time and point-of-interest information of check-ins from users of the location-based social networking site Gowalla. We focus on the data generated in the last seven months on both datasets. Items observed by fewer than 20 users during this period are removed. After that, a user's records in one day are treated as a session (i.e., a transaction) to represent the short-term preference, and all singleton sessions (i.e., those containing only one item) are removed. Similar to [Hu et al, 2017], we randomly select 20% of the sessions in the last month for testing, and the rest are used for training. We also randomly hold out one item in each session as the next item to be predicted. After preprocessing, basic statistics of both datasets are summarized in Table 1.
    • Table 2: Influence of components on AUC and Recall@20
    • Table 3: Influence of different regularization on Recall@20
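
The preprocessing described for Table 1 can be sketched roughly as follows. The record layout `(user, item, day)`, the function names, and the choice to keep each item at most once per session are assumptions made for illustration; the 20% test split over the last month is left out for brevity.

```python
import random
from collections import defaultdict


def build_sessions(records, min_users=20):
    """Group a list of (user, item, day) records into per-day sessions.

    Items seen by fewer than `min_users` distinct users are dropped, and
    sessions left with a single item are removed.
    """
    users_per_item = defaultdict(set)
    for user, item, _day in records:
        users_per_item[item].add(user)
    kept = {item for item, users in users_per_item.items() if len(users) >= min_users}

    sessions = defaultdict(list)
    for user, item, day in records:
        # Assumption: a session is treated as a set of items, so repeats are skipped.
        if item in kept and item not in sessions[(user, day)]:
            sessions[(user, day)].append(item)
    return {key: items for key, items in sessions.items() if len(items) > 1}


def hold_out_next_item(items, rng):
    """Randomly pick one item of a session as the prediction target."""
    target = rng.choice(items)
    context = [item for item in items if item != target]
    return context, target


# Toy log with a lowered threshold so the filtering is visible.
records = [("u1", "a", 1), ("u1", "b", 1), ("u2", "a", 1),
           ("u2", "b", 1), ("u2", "c", 1), ("u1", "a", 2)]
sessions = build_sessions(records, min_users=2)
context, target = hold_out_next_item(sessions[("u1", 1)], random.Random(0))
```

In this toy log, item `c` is dropped because only one user interacted with it, and the second-day session of `u1` is removed because it contains a single item.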
    Related work
    • To model users' individual and sequential information jointly, Markov chains have been introduced by previous work for traditional recommendations. [Rendle et al, 2010] combined a factorization method to model users' general taste with Markov chains to mine users' sequential patterns. Following this idea, researchers have utilized different methods to extract these two different user preferences. [Chen et al, 2012] and [Feng et al, 2015] used metric embedding to project items into points in a low-dimensional Euclidean space for playlist prediction and successive location recommendation. [Liang et al, 2016] utilized word embedding to extract information from item-item co-occurrence to improve matrix factorization performance. However, these methods have limited capacity in capturing high-level user-item interactions, because the weights of different components are fixed.

      Recently, researchers have turned to graphical models and neural networks in recommender systems. [Liu et al, 2016] proposed a bi-weighted low-rank graph construction model, which integrates users’ interests and sequential preferences with temporal interval assessment. [Cheng et al, 2016] combined wide linear models with cross-product feature transformations and employed a deep neural network to learn highly nonlinear interactions between feature embeddings. However, this model needs feature engineering to design cross features, which are rarely observed in real data with high sparsity. To deal with this problem, [He and Chua, 2017] and [Xiao et al, 2017] designed Bi-Interaction and attentional pooling layers, respectively, to automatically learn second-order feature interactions based on traditional factorization machine technology. [Hidasi et al, 2015] and [Wu et al, 2017] employed recurrent neural networks (RNN) to mine dynamic user and item preferences in trajectory data. However, items in a session may not follow a rigid sequential order in many real scenarios, e.g., transactions in online shopping, where RNN is not applicable. Beyond that, [Wang et al, 2015] and [Hu et al, 2017] learned hierarchical user representations to combine users' long- and short-term preferences. Our work follows this pipeline but contributes in that: (1) Our model is built on hierarchical attention networks, which can capture dynamic long- and short-term preferences. (2) Our model utilizes nonlinear modeling of user-item interactions. It is able to learn different item influences (weights) of different users for the same item.
    Funding
    • This research was supported by the Ministry of Education of China under grant of No.2017PT18, the Natural Science Foundation of China under grant of No 61672453, 61773361, 61473273, the WE-DOCTOR company under grant of No 124000-11110, the Zhejiang University Education Foundation under grant of No K17-511120-017, and the National Science Foundation under grant of IIS-1648664
    Study subjects and analysis
    million users: 50
    Users can easily access these platforms through mobile devices in daily life, and as a result large amounts of behavior logs have been generated. For instance, 62 million user trips were accumulated at Uber in July 2016, and more than 10 billion check-ins have been generated by over 50 million users at Foursquare. With such massive user sequential behavior data, sequential recommendation, which is to recommend the next item a user might be interested in, has become a critical task for improving user experience and meanwhile driving new value for platforms.

    datasets: 2
    To learn the parameters, we employ the Bayesian personalized ranking optimization criterion to generate a pair-wise loss function [Rendle et al, 2009]. From the experiments, we can observe that our model outperforms state-of-the-art algorithms on two datasets.
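
As a rough illustration of the pair-wise criterion mentioned in this passage, the sketch below computes the standard BPR objective of [Rendle et al, 2009]: for each pair of an observed item and an unobserved item, the loss is the negative log-sigmoid of the score difference. The function name and the toy scores are hypothetical, the regularization terms are omitted, and averaging rather than summing over pairs is a choice made here for readability.

```python
import math


def bpr_loss(pos_scores, neg_scores):
    """Mean of -log(sigmoid(pos - neg)) over paired positive/negative scores."""
    total = 0.0
    for pos, neg in zip(pos_scores, neg_scores):
        diff = pos - neg
        # -log(sigmoid(diff)) equals log(1 + exp(-diff)); this form avoids overflow.
        total += max(-diff, 0.0) + math.log1p(math.exp(-abs(diff)))
    return total / len(pos_scores)


# Hypothetical scores: observed items should be ranked above sampled unobserved ones.
loss_well_ranked = bpr_loss([2.0, 1.5], [0.0, -0.5])
loss_badly_ranked = bpr_loss([0.0, -0.5], [2.0, 1.5])
loss_tied = bpr_loss([0.3], [0.3])
```

The loss shrinks as observed items are scored further above unobserved ones, which is why minimizing it pushes the model toward the desired ranking.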

    datasets: 2
    • Through the hierarchical structure, we combine user’s long- and short-term preferences to generate a high-level hybrid representation of user.
    • We perform experiments on two datasets which show our model consistently outperforms state-of-the-art methods in terms of Recall and Area Under Curve.

    real-world datasets: 2
    Datasets. We perform experiments on two real-world datasets, Tmall [Hu et al, 2017] and Gowalla [Cho et al, 2011], to demonstrate the effectiveness of our model. Tmall

    real-world datasets: 2
    Our model considered not only the dynamic properties of users' long- and short-term preferences, but also high-level complex interactions between user and item factors and between item and item factors. From the experiments, we observed that our model outperformed the state-of-the-art methods on two real-world datasets in terms of Recall and AUC.

    Reference
    • [Bayer et al., 2017] Immanuel Bayer, Xiangnan He, Bhargav Kanagal, and Steffen Rendle. A generic coordinate descent framework for learning from implicit feedback. In Proceedings of the 26th International Conference on World Wide Web, 2017.
    • [Chen et al., 2012] Shuo Chen, Josh L Moore, Douglas Turnbull, and Thorsten Joachims. Playlist prediction via metric embedding. In SIGKDD. ACM, 2012.
    • [Cheng et al., 2016] Heng-Tze Cheng, Levent Koc, Jeremiah Harmsen, Tal Shaked, Tushar Chandra, Hrishi Aradhye, Glen Anderson, Greg Corrado, Wei Chai, Mustafa Ispir, et al. Wide & deep learning for recommender systems. In Proceedings of the 1st Workshop on Deep Learning for Recommender Systems. ACM, 2016.
    • [Cho et al., 2011] Eunjoon Cho, Seth A Myers, and Jure Leskovec. Friendship and mobility: user movement in location-based social networks. In SIGKDD. ACM, 2011.
    • [Feng et al., 2015] Shanshan Feng, Xutao Li, Yifeng Zeng, Gao Cong, Yeow Meng Chee, and Quan Yuan. Personalized ranking metric embedding for next new POI recommendation. In IJCAI, 2015.
    • [He and Chua, 2017] Xiangnan He and Tat-Seng Chua. Neural factorization machines for sparse predictive analytics. In SIGIR, 2017.
    • [He and McAuley, 2016] Ruining He and Julian McAuley. Fusing similarity models with Markov chains for sparse sequential recommendation. In Proceedings of the 16th International Conference on Data Mining. IEEE, 2016.
    • [Hidasi et al., 2015] Balazs Hidasi, Alexandros Karatzoglou, Linas Baltrunas, and Domonkos Tikk. Session-based recommendations with recurrent neural networks. In Proceedings of the fourth International Conference on Learning Representations, 2015.
    • [Hu et al., 2017] Liang Hu, Longbing Cao, Shoujin Wang, Guandong Xu, Jian Cao, and Zhiping Gu. Diversifying personalized recommendation with user-session context. In IJCAI, 2017.
    • [Liang et al., 2016] Dawen Liang, Jaan Altosaar, Laurent Charlin, and David M Blei. Factorization meets the item embedding: Regularizing matrix factorization with item co-occurrence. In Proceedings of the 10th conference on recommender systems. ACM, 2016.
    • [Liu et al., 2016] Yanchi Liu, Chuanren Liu, Bin Liu, Meng Qu, and Hui Xiong. Unified point-of-interest recommendation with temporal interval assessment. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 1015–1024. ACM, 2016.
    • [Pan et al., 2008] Rong Pan, Yunhong Zhou, Bin Cao, Nathan N Liu, Rajan Lukose, Martin Scholz, and Qiang Yang. One-class collaborative filtering. In Proceedings of International Conference on Data Mining. IEEE, 2008.
    • [Rendle et al., 2009] Steffen Rendle, Christoph Freudenthaler, Zeno Gantner, and Lars Schmidt-Thieme. BPR: Bayesian personalized ranking from implicit feedback. In Proceedings of the twenty-fifth conference on uncertainty in artificial intelligence. AUAI Press, 2009.
    • [Rendle et al., 2010] Steffen Rendle, Christoph Freudenthaler, and Lars Schmidt-Thieme. Factorizing personalized Markov chains for next-basket recommendation. In Proceedings of the 19th international conference on World Wide Web. ACM, 2010.
    • [Wang et al., 2015] Pengfei Wang, Jiafeng Guo, Yanyan Lan, Jun Xu, Shengxian Wan, and Xueqi Cheng. Learning hierarchical representation model for next-basket recommendation. In Proceedings of the 38th International SIGIR conference on Research and Development in Information Retrieval. ACM, 2015.
    • [Wu et al., 2017] Chao-Yuan Wu, Amr Ahmed, Alex Beutel, Alexander J Smola, and How Jing. Recurrent recommender networks. In Proceedings of International Conference on Web Search and Data Mining. ACM, 2017.
    • [Xiao et al., 2017] Jun Xiao, Hao Ye, Xiangnan He, Hanwang Zhang, Fei Wu, and Tat-Seng Chua. Attentional factorization machines: Learning the weight of feature interactions via attention networks. In IJCAI, 2017.
    • [Yang et al., 2016a] Zichao Yang, Xiaodong He, Jianfeng Gao, Li Deng, and Alex Smola. Stacked attention networks for image question answering. In Proceedings of the Conference on Computer Vision and Pattern Recognition. IEEE, 2016.
    • [Yang et al., 2016b] Zichao Yang, Diyi Yang, Chris Dyer, Xiaodong He, Alexander J Smola, and Eduard H Hovy. Hierarchical attention networks for document classification. In HLT-NAACL, 2016.
    • [Ying et al., 2016] Haochao Ying, Liang Chen, Yuwen Xiong, and Jian Wu. Collaborative deep ranking: a hybrid pair-wise recommendation algorithm with implicit feedback. In Proceedings of the Pacific-Asia Conference on Knowledge Discovery and Data Mining. Springer, 2016.
    • [Zhang et al., 2016] Fuzheng Zhang, Nicholas Jing Yuan, Defu Lian, Xing Xie, and Wei-Ying Ma. Collaborative knowledge base embedding for recommender systems. In SIGKDD. ACM, 2016.