Learning Tree-based Deep Model for Recommender Systems

KDD 2018, pp. 1079–1088.

Keywords:
Tree-based Learning, Recommender Systems, Implicit Feedback
TL;DR:
We focus on the problem of introducing arbitrary advanced models to recommender systems with large corpus.

Abstract:

Model-based methods for recommender systems have been studied extensively in recent years. In systems with large corpus, however, the calculation cost for the learnt model to predict all user-item preferences is tremendous, which makes full corpus retrieval extremely difficult. To overcome the calculation barriers, models such as matrix factorization …

Introduction
  • Recommendation has been widely used by various kinds of content providers. Personalized recommendation, based on the intuition that users’ interests can be inferred from their historical behaviors or from other users with similar preferences, has been proven effective at YouTube [7] and Amazon [22].
  • Designing such a recommendation model to predict the best candidate set from the entire corpus for each user poses many challenges.
  • Results that contain only items homogeneous with the user’s historical behaviors are not desired, since novelty also matters.
Highlights
  • Recommendation has been widely used by various kinds of content providers
  • To address the retrieval problem, we propose a max-heap-like tree formulation and introduce deep neural networks to model the tree, which forms an efficient method for large-scale recommendation (see the retrieval sketch after this list)
  • Compared to the second-best approach, YouTube product-DNN, the tree-based deep recommendation model (TDM) attention-DNN achieves 21.1% and 42.6% improvements on the recall metric in the two datasets respectively, without filtering. This result demonstrates the effectiveness of the advanced neural network and the hierarchical tree search adopted by TDM attention-DNN
  • We evaluate the proposed TDM method in Taobao display advertising platform with real traffic
  • A tree structure learning approach is used, which shows that a better tree structure can lead to significantly better results
  • Experimental evaluations with two large-scale real-world datasets show that the proposed method significantly outperforms traditional methods
  • In Taobao display advertising platform, the proposed TDM method has been deployed in production, which improves both business benefits and user experience
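
    As a concrete illustration of the hierarchical tree search above, below is a minimal sketch of layer-wise, coarse-to-fine beam-search retrieval over a max-heap-like tree. It assumes a trained model that can score any tree node for the current user; Node, score, and retrieve_top_k are illustrative names, not the paper's actual implementation.

    from typing import Callable, List, Optional

    class Node:
        def __init__(self, item_id: Optional[int] = None,
                     children: Optional[List["Node"]] = None):
            self.item_id = item_id          # set for leaf nodes only
            self.children = children or []  # empty for leaves

    def retrieve_top_k(root: Node, score: Callable[[Node], float],
                       k: int) -> List[int]:
        # Layer-wise beam search: at each level keep only the k
        # highest-scoring nodes and expand just their children, so the
        # number of model calls grows with k times the tree depth
        # rather than with the corpus size.
        candidates, leaves = [root], []
        while candidates:
            leaves.extend(n for n in candidates if not n.children)
            frontier = [c for n in candidates for c in n.children]
            candidates = sorted(frontier, key=score, reverse=True)[:k]
        leaves.sort(key=score, reverse=True)
        return [n.item_id for n in leaves[:k]]

    Because the tree is built so that a parent's preference upper-bounds its children's (the max-heap-like property), this level-wise pruning can retrieve the top leaves without scanning the full corpus.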
Methods
  • Compared methods: FM, BPR-MF, Item-CF, YouTube product-DNN, TDM attention-DNN, TDM product-DNN, TDM DNN, and TDM attention-DNN-HS.
  • Metrics: Precision, Recall, and F-Measure, evaluated @10 on MovieLens-20M and @200 on UserBehavior (a per-user metric sketch follows below).
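
    For reference, a minimal sketch of the per-user metrics listed above, assuming the standard definitions of Precision@M, Recall@M and F-Measure@M (the function name and signature are illustrative):

    def metrics_at_m(predicted, ground_truth, m):
        # Per-user Precision@M, Recall@M and F-Measure@M.
        # predicted: ranked list of recommended item ids;
        # ground_truth: set of item ids the user actually interacted with.
        hits = len(set(predicted[:m]) & set(ground_truth))
        precision = hits / m
        recall = hits / len(ground_truth) if ground_truth else 0.0
        f_measure = (2 * precision * recall / (precision + recall)
                     if hits else 0.0)
        return precision, recall, f_measure

    MovieLens-20M is evaluated with M = 10 and UserBehavior with M = 200; dataset-level numbers average these per-user values.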
Results
  • The comparison results of different methods are shown in Table 2, above the dashed line.
  • The results indicate that the proposed TDM attention-DNN significantly outperforms all baselines on most metrics in both datasets.
  • Compared to the second-best approach, YouTube product-DNN, TDM attention-DNN achieves 21.1% and 42.6% improvements on the recall metric in the two datasets respectively, without filtering.
  • This result demonstrates the effectiveness of the advanced neural network and the hierarchical tree search adopted by TDM attention-DNN.
Conclusion
  • The authors identify the main challenge for model-based methods in generating recommendations from a large-scale corpus: the computation cost of making predictions.
  • A tree-based approach is proposed in which arbitrary advanced models can be employed for large-scale recommendation, inferring user interests coarse-to-fine along the tree.
  • The authors conduct extensive experiments that validate the effectiveness of the proposed method in both recommendation accuracy and novelty.
  • In the Taobao display advertising platform, the proposed TDM method has been deployed in production, where it improves both business benefits and user experience.
Tables
  • Table1: Dimensions of the two datasets after preprocessing. One record is a user-item pair that represents user feedback
  • Table2: The comparison results of different methods on the MovieLens-20M and UserBehavior datasets. According to the different corpus sizes, metrics are evaluated @10 on MovieLens-20M and @200 on UserBehavior. In the experiments that filter interacted items, the recommendation results and ground truth contain only items the user has not interacted with before
  • Table3: Results on the UserBehavior dataset. Items belonging to interacted categories are excluded from the recommendation results and ground truth
  • Table4: Comparison results of different tree structures in UserBehavior dataset using TDM attention-DNN model
  • Table5: Online results from Jan 22 to Jan 28, 2018 in Guess What You Like column of Taobao App Homepage
Related work
  • With the tree structure, we first introduce the related hierarchical softmax technique, to help understand its difference from our TDM. In hierarchical softmax, each leaf node n in the tree has a unique encoding from the root down to the node. For example, if we encode choosing the left branch as 1 and choosing the right branch as 0, n9’s encoding in the tree in Figure 2 is 110 and n15’s encoding is 000.

    Denote $b_j(n)$ as the encoding of node $n$ in level $j$. In hierarchical softmax's formulation, the next-word's probability given the context is derived as

    $$P(n \mid \text{context}) = \prod_{j=1}^{w} P\big(b = b_j(n) \mid l_j(n), \text{context}\big), \qquad (1)$$

    where $w$ is the length of leaf node $n$'s encoding, and $l_j(n)$ is $n$'s ancestor node in level $j$.
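
    To make Equation 1 concrete with the encoding convention above: $n_9$'s encoding is 110, so $w = 3$ and the product expands to

    $$P(n_9 \mid \text{context}) = P\big(b = 1 \mid l_1(n_9), \text{context}\big) \cdot P\big(b = 1 \mid l_2(n_9), \text{context}\big) \cdot P\big(b = 0 \mid l_3(n_9), \text{context}\big),$$

    i.e., one binary decision per level along the root-to-leaf path.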

    In such a way, hierarchical softmax avoids the normalization term of the conventional softmax (which requires traversing every word in the corpus) when computing a single probability. However, to find the most probable leaf, the model still has to traverse the entire corpus: traversing each level's most probable node top-down along the tree path cannot guarantee that the optimal leaf is retrieved. Therefore, hierarchical softmax's formulation is not suitable for large-scale retrieval problems. In addition, according to Equation 1, each non-leaf node in the tree is trained as a binary classifier to discriminate between its two children. But if two nodes are neighbors in the tree, they are likely to be similar, and in a recommendation scenario the user may well be interested in both children. Hierarchical softmax focuses on distinguishing the optimal choice from the suboptimal one, which may cost it the ability to discriminate from a global view. If greedy beam search is used to retrieve the most probable leaf nodes, then once bad decisions are made in the upper levels of the tree, the model may fail to find relatively better results among the low-quality candidates in the lower levels; the toy example below illustrates this failure mode. YouTube's work [7] also reports that they tried hierarchical softmax to learn user and item embeddings, but it performed worse than the sampled-softmax [16] approach.
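
    The following toy numbers, chosen purely for illustration, show why greedy top-down traversal can miss the optimal leaf when node probabilities are conditional as in Equation 1:

    # Conditional branch probabilities P(b | parent, context) of a toy
    # two-level tree; a leaf's probability is the product along its
    # root-to-leaf path (Equation 1).
    branch_prob = {
        "root": {"L": 0.6, "R": 0.4},
        "L": {"LL": 0.5, "LR": 0.5},
        "R": {"RL": 0.9, "RR": 0.1},
    }

    greedy = branch_prob["root"]["L"] * branch_prob["L"]["LL"]   # 0.6 * 0.5 = 0.30
    optimal = branch_prob["root"]["R"] * branch_prob["R"]["RL"]  # 0.4 * 0.9 = 0.36
    assert optimal > greedy  # the locally best first step loses globally

    Once the greedy walk commits to the left branch at the root, no leaf beneath it can beat the 0.36 leaf on the right, which is exactly the failure mode described above.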
Funding
  • Experimental evaluations with two large-scale real-world datasets show that the proposed method significantly outperforms traditional methods
  • For YouTube product-DNN and TDM attention-DNN, the node embeddings’ dimension is set to 24, because a higher dimension doesn’t perform significantly better in our experiments
  • The proposed TDM attention-DNN performs 34.3% better in recall than YouTube’s inner-product approach
  • From the results, we can observe that the model trained with the learnt tree structure significantly outperforms the one with the initial tree
  • Under brute-force search, recall with the learnt tree increases from 4.15% to 4.82% compared to the initial tree in the experiments that filter interacted categories, surpassing YouTube product-DNN’s 3.09% and item-CF’s 1.06% by a very large margin
  • As shown in Table 5, the CTR of the TDM method increases by 2.1%
  • On the other hand, the RPM metric increases by 6.4%, which means the TDM method can also bring more revenue to the Taobao advertising platform
Study subjects and analysis
negative samples: 100
According to the timestamp, user behaviors are divided into 10 time windows. In YouTube product-DNN and TDM attention-DNN, for each implicit feedback we randomly select 100 negative samples in MovieLens-20M and 600 negative samples in UserBehavior (a sampling sketch follows below). Note that the negative sample number of TDM is the sum over all tree levels.
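
A minimal sketch of this negative-sampling step, assuming uniform sampling over the corpus (function and argument names are illustrative):

    import random

    def sample_negatives(corpus_items, positive_item, n_neg):
        # Pair one implicit-feedback positive with n_neg uniformly
        # sampled negatives (100 for MovieLens-20M, 600 for
        # UserBehavior in the setup described above).
        negatives = set()
        while len(negatives) < n_neg:
            item = random.choice(corpus_items)
            if item != positive_item:  # never sample the positive itself
                negatives.add(item)
        return list(negatives)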

users: 1000
Besides, only the users who have watched at least 10 movies are kept. To create the training, validation and testing sets, we randomly sample 1,000 users as the testing set and another 1,000 users as the validation set, while the remaining users constitute the training set [8] (a splitting sketch follows below). For the validation and testing sets, the first half of each user’s movie views along the timeline is regarded as known behavior used to predict the latter half.
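
A sketch of this splitting protocol, under the assumption that each user's views are already sorted by timestamp (names are illustrative):

    import random

    def split_users(user_ids, n_test=1000, n_val=1000, seed=0):
        # Hold out n_test users for testing and n_val for validation;
        # the remaining users form the training set.
        shuffled = random.Random(seed).sample(list(user_ids), len(user_ids))
        test = set(shuffled[:n_test])
        val = set(shuffled[n_test:n_test + n_val])
        train = set(shuffled[n_test + n_val:])
        return train, val, test

    def temporal_half_split(views_sorted_by_time):
        # For a held-out user, the first half of the timeline is the
        # known behavior and the second half is the prediction target.
        mid = len(views_sorted_by_time) // 2
        return views_sorted_by_time[:mid], views_sorted_by_time[mid:]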

users: 10000
The data is organized in a form very similar to MovieLens-20M, i.e., a user-item behavior consists of user ID, item ID, item’s category ID, behavior type and timestamp. As in MovieLens-20M, only the users who have at least 10 behaviors are kept. 10,000 users are randomly selected as the testing set and another 10,000 randomly selected users form the validation set. Items’ categories are taken from the bottom level of Taobao’s current commodity taxonomy.

Reference
  • Rahul Agrawal, Archit Gupta, Yashoteja Prabhu, and Manik Varma. 2013. Multilabel learning with millions of labels: Recommending advertiser bid phrases for web pages. In Proceedings of the 22nd international conference on World Wide Web. ACM, 13–24.
  • Samy Bengio, Jason Weston, and David Grangier. 2010. Label embedding trees for large multi-class tasks. In International Conference on Neural Information Processing Systems. 163–171.
  • Alina Beygelzimer, John Langford, and Pradeep Ravikumar. 2007. Multiclass classification with filter trees. Gynecologic Oncology 105, 2 (2007), 312–320.
  • Pablo Castells, Saúl Vargas, and Jun Wang. 2011. Novelty and Diversity Metrics for Recommender Systems: Choice, Discovery and Relevance. In Proceedings of the International Workshop on Diversity in Document Retrieval. 29–37.
  • Heng-Tze Cheng, Levent Koc, Jeremiah Harmsen, Tal Shaked, Tushar Chandra, Hrishi Aradhye, Glen Anderson, Greg Corrado, Wei Chai, Mustafa Ispir, et al. 2016. Wide & deep learning for recommender systems. In Proceedings of the 1st Workshop on Deep Learning for Recommender Systems. ACM, 7–10.
  • Anna E Choromanska and John Langford. 2015. Logarithmic time online multiclass prediction. In Advances in Neural Information Processing Systems. 55–63.
  • Paul Covington, Jay Adams, and Emre Sargin. 2016. Deep Neural Networks for YouTube Recommendations. In ACM Conference on Recommender Systems. 191–198.
  • Robin Devooght and Hugues Bersini. 2016. Collaborative filtering with recurrent neural networks. arXiv preprint arXiv:1608.07400 (2016).
  • Kun Gai, Xiaoqiang Zhu, Han Li, Kai Liu, and Zhe Wang. 2017. Learning Piecewise Linear Models from Large Scale Data for Ad Click Prediction. arXiv preprint arXiv:1704.05194 (2017).
  • Zeno Gantner, Steffen Rendle, Christoph Freudenthaler, and Lars Schmidt-Thieme. 2011. MyMediaLite: A free recommender system library. In Proceedings of the fifth ACM conference on Recommender systems. ACM, 305–308.
  • Tiezheng Ge, Liqin Zhao, Guorui Zhou, Keyu Chen, Shuying Liu, Huiming Yi, Zelin Hu, Bochao Liu, Peng Sun, Haoyu Liu, et al. 2017. Image Matters: Jointly Train Advertising CTR Model with Image Representation of Ad and User Behavior. arXiv preprint arXiv:1711.06505 (2017).
  • F Maxwell Harper and Joseph A Konstan. 2016. The movielens datasets: History and context. ACM Transactions on Interactive Intelligent Systems 5, 4 (2016), 19.
  • Xiangnan He, Lizi Liao, Hanwang Zhang, Liqiang Nie, Xia Hu, and Tat-Seng Chua. 2017. Neural collaborative filtering. In Proceedings of the 26th International Conference on World Wide Web. 173–182.
  • Sergey Ioffe and Christian Szegedy. 2015. Batch normalization: Accelerating deep network training by reducing internal covariate shift. In International conference on machine learning. 448–456.
  • Himanshu Jain, Yashoteja Prabhu, and Manik Varma. 2016. Extreme multilabel loss functions for recommendation, tagging, ranking & other missing label applications. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 935–944.
  • Sébastien Jean, Kyunghyun Cho, Roland Memisevic, and Yoshua Bengio. 2014. On using very large target vocabulary for neural machine translation. arXiv preprint arXiv:1412.2007 (2014).
  • Junqi Jin, Chengru Song, Han Li, Kun Gai, Jun Wang, and Weinan Zhang. 2018. Real-Time Bidding with Multi-Agent Reinforcement Learning in Display Advertising. arXiv preprint arXiv:1802.09756 (2018).
  • Jeff Johnson, Matthijs Douze, and Hervé Jégou. 2017. Billion-scale similarity search with GPUs. arXiv preprint arXiv:1702.08734 (2017).
  • Yehuda Koren, Robert Bell, and Chris Volinsky. 2009. Matrix Factorization Techniques for Recommender Systems. Computer 42, 8 (2009), 30–37.
  • Dawen Liang, Jaan Altosaar, Laurent Charlin, and David M. Blei. 2016. Factorization Meets the Item Embedding: Regularizing Matrix Factorization with Item Co-occurrence. In ACM Conference on Recommender Systems. 59–66.
  • D. Lin. 1999. WordNet: An Electronic Lexical Database. Computational Linguistics 25, 2 (1999), 292–296.
  • Greg Linden, Brent Smith, and Jeremy York. 2003. Amazon.com recommendations: Item-to-item collaborative filtering. IEEE Internet computing 7, 1 (2003), 76–80.
  • Tomas Mikolov, Ilya Sutskever, Kai Chen, Greg Corrado, and Jeffrey Dean. 2013. Distributed representations of words and phrases and their compositionality. In International Conference on Neural Information Processing Systems. 3111–3119.
  • Frederic Morin and Yoshua Bengio. 2005. Hierarchical probabilistic neural network language model. Aistats (2005).
  • Andrew Y. Ng, Michael I. Jordan, and Yair Weiss. 2001. On spectral clustering: analysis and an algorithm. In International Conference on Neural Information Processing Systems: Natural and Synthetic. 849–856.
  • Yashoteja Prabhu and Manik Varma. 2014. Fastxml: A fast, accurate and stable tree-classifier for extreme multi-label learning. In Proceedings of the 20th ACM SIGKDD international conference on Knowledge discovery and data mining. ACM, 263–272.
  • Yanru Qu, Han Cai, Kan Ren, Weinan Zhang, Yong Yu, Ying Wen, and Jun Wang. 2016. Product-based neural networks for user response prediction. In IEEE 16th International Conference on Data Mining. IEEE, 1149–1154.
  • Steffen Rendle. 2010. Factorization Machines. In IEEE International Conference on Data Mining. 995–1000.
  • Steffen Rendle, Christoph Freudenthaler, Zeno Gantner, and Lars Schmidt-Thieme. 2009. BPR: Bayesian personalized ranking from implicit feedback. In Proceedings of the 25th conference on uncertainty in artificial intelligence. AUAI Press, 452–461.
  • Ruslan Salakhutdinov and Andriy Mnih. 2007. Probabilistic Matrix Factorization. In International Conference on Neural Information Processing Systems. 1257–1264.
  • Badrul Sarwar, George Karypis, Joseph Konstan, and John Riedl. 2001. Item-based collaborative filtering recommendation algorithms. In International Conference on World Wide Web. 285–295.
  • J. Weston, A. Makadia, and H. Yee. 2013. Label partitioning for sublinear ranking. In International Conference on Machine Learning. 181–189.
  • Bing Xu, Naiyan Wang, Tianqi Chen, and Mu Li. 2015. Empirical evaluation of rectified activations in convolutional network. arXiv:1505.00853 (2015).
  • Guorui Zhou, Chengru Song, Xiaoqiang Zhu, Xiao Ma, Yanghui Yan, Xingya Dai, Han Zhu, Junqi Jin, Han Li, and Kun Gai. 2018. Deep interest network for click-through rate prediction. In Proceedings of the 24th ACM SIGKDD Conference. ACM.
  • Han Zhu, Junqi Jin, Chang Tan, Fei Pan, Yifan Zeng, Han Li, and Kun Gai. 2017. Optimized Cost Per Click in Taobao Display Advertising. In Proceedings of the 23rd ACM SIGKDD Conference. ACM, 2191–2200.