# Knowledge-aware Graph Neural Networks with Label Smoothness Regularization for Recommender Systems

pp. 968-977, 2019.


Keywords:

graph neural networks, knowledge-aware recommendation, label propagation


Abstract:

Knowledge graphs capture structured information and relations between a set of entities or items. As such, knowledge graphs represent an attractive source of information that could help improve recommender systems. However, existing approaches in this domain rely on manual feature engineering and do not allow for end-to-end training. …


Introduction

- Recommender systems are widely used in Internet applications to meet users’ personalized interests and alleviate information overload [4, 29, 32].
- The sparsity issue can be addressed by introducing additional sources of information such as user/item profiles [23] or social networks [22].
- Knowledge graphs (KGs) capture structured information and relations between a set of entities [8, 9, 18, 24,25,26,27,28, 33, 34, 36].
- KGs provide connectivity information between items via different types of relations and capture semantic relatedness between the items
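This connectivity information is what a knowledge-aware GNN turns into a computation graph: each edge can be weighted by how relevant its relation is to a given user. Below is a minimal numpy sketch of building a user-specific adjacency matrix from KG triples, assuming an inner-product scoring function g(u, r) between user and relation embeddings; the embeddings, dimensions, and function names are illustrative, not the paper's exact implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
n_entities, n_relations, dim = 5, 2, 4

# Toy KG as (head, relation, tail) triples.
triples = [(0, 0, 1), (1, 1, 2), (2, 0, 3), (3, 1, 4)]

relation_emb = rng.normal(size=(n_relations, dim))
user_emb = rng.normal(size=dim)

def user_adjacency(user, triples, relation_emb, n_entities):
    """Build a user-specific adjacency: the edge (h, t) with relation r
    is weighted by how much this user cares about r, here g(u, r) = <u, r>."""
    A = np.zeros((n_entities, n_entities))
    for h, r, t in triples:
        w = float(user @ relation_emb[r])
        A[h, t] = A[t, h] = w  # treat the KG as undirected
    return A

A_u = user_adjacency(user_emb, triples, relation_emb, n_entities)
print(A_u.shape)  # (5, 5)
```

Two users with different embeddings get two different adjacency matrices over the same KG, which is how personalization enters before any graph convolution happens.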

Highlights

- Recommender systems are widely used in Internet applications to meet users’ personalized interests and alleviate information overload [4, 29, 32]
- We develop Knowledge-aware Graph Neural Networks with Label Smoothness regularization (KGNN-LS) that extends Graph Neural Networks architecture to knowledge graphs to simultaneously capture semantic relationships between the items as well as personalized user preferences and interests
- We show that the knowledge-aware graph neural networks and label smoothness regularization can be unified under the same framework, where label smoothness can be seen as a natural choice of regularization on knowledge-aware graph neural networks
- (2) In click-through rate (CTR) prediction, we apply the trained model to predict the click probability of each user-item pair in the test set
- The results of top-K recommendation and click-through rate prediction are presented in Tables 3 and 4, respectively, which show that KGNN-LS outperforms baselines by a significant margin
- We propose knowledge-aware graph neural networks with label smoothness regularization for recommendation
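The label smoothness idea above can be illustrated with plain label propagation: hold out each labeled item, let the remaining labels propagate over the item graph, and penalize the disagreement at the held-out item. The toy numpy sketch below follows that reading; the graph, labels, and squared-error form are illustrative, while the paper combines this with a learned, user-specific adjacency and cross-entropy terms.

```python
import numpy as np

# Toy item-item graph and observed labels (1 = the user interacted
# with the item). All values are made up for illustration.
A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
y = np.array([1.0, 1.0, 0.0, 0.0])

def propagate(A, y, clamp_mask, n_iter=20):
    """Iterative label propagation: each node takes the degree-normalized
    average of its neighbors' labels, while clamped nodes keep y."""
    d_inv = 1.0 / A.sum(axis=1)
    f = y.copy()
    for _ in range(n_iter):
        f = d_inv * (A @ f)
        f[clamp_mask] = y[clamp_mask]
    return f

def ls_regularizer(A, y):
    """Leave-one-out label smoothness: hold out each labeled item,
    predict it from the rest, and accumulate the squared error."""
    loss = 0.0
    for i in range(len(y)):
        clamp = np.ones(len(y), dtype=bool)
        clamp[i] = False  # the held-out item is free to move
        f = propagate(A, y, clamp)
        loss += (f[i] - y[i]) ** 2
    return loss

print(round(ls_regularizer(A, y), 4))  # → 0.9444
```

Intuitively, an adjacency matrix under which held-out labels are easy to recover from their neighbors is "smooth", so minimizing this term regularizes the learned edge weights.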

Methods

- The authors evaluate the proposed KGNN-LS model and present its performance in four real-world scenarios: movie, book, music, and restaurant recommendation (Section 5.1, Datasets).
- The experiments use four datasets for movie, book, music, and restaurant recommendation, respectively; the first three are public, and the last one is from Meituan-Dianping Group.
- [Figure: probability that two items have a common rater vs. their shortest distance in the KG, with and without common rater(s), on (a) MovieLens-20M and (b) Last.FM]
- The KG for the Dianping-Food dataset is constructed by the internal toolkit of Meituan-Dianping Group.
- MovieLens-20M is a widely used benchmark dataset for movie recommendation, consisting of approximately 20 million explicit ratings from the MovieLens website.
- The Book-Crossing KG contains 25,787 entities, 60,787 edges, and 18 relation types

Results

- The authors evaluate the method in two experiment scenarios: (1) in top-K recommendation, they use the trained model to select the K items with the highest predicted click probability for each user in the test set, and use Recall@K to evaluate the recommended sets.
- (2) In click-through rate (CTR) prediction, they apply the trained model to predict the click probability of each user-item pair in the test set.
- The AUC of KGNN-LS surpasses baselines by 5.1%, 6.9%, 8.3%, and 4.3% on average on the MovieLens-20M, Book-Crossing, Last.FM, and Dianping-Food datasets, respectively.
- The authors notice that the curve of KGNN-LS stays consistently above the baselines (SVD, LibFM, LibFM+TransE, PER, CKE, RippleNet) over the test period, and that its performance has low variance, which suggests that KGNN-LS is robust and stable in practice
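Both evaluation metrics are standard and easy to state precisely. Below is a small numpy sketch of Recall@K and of AUC computed as the pairwise ranking probability; the scores and labels are made up for illustration.

```python
import numpy as np

def recall_at_k(scores, relevant, k):
    """Recall@K: fraction of a user's relevant items that appear
    among the K items with the highest predicted scores."""
    top_k = np.argsort(scores)[::-1][:k]
    hits = len(set(top_k.tolist()) & relevant)
    return hits / len(relevant)

def auc(scores, labels):
    """AUC as the probability that a random positive item is ranked
    above a random negative one (ties count as half)."""
    pos = scores[labels == 1]
    neg = scores[labels == 0]
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (greater + 0.5 * ties) / (len(pos) * len(neg))

scores = np.array([0.9, 0.8, 0.3, 0.2, 0.7])
labels = np.array([1, 0, 1, 0, 1])
print(recall_at_k(scores, relevant={0, 2, 4}, k=2))  # 1 of 3 relevant items in the top 2
print(auc(scores, labels))                           # 4 of 6 positive-negative pairs ranked correctly
```

In practice both metrics are computed per user and averaged over the test set.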

Conclusion

- How can the knowledge graph help find users’ interests? To intuitively understand the role of the KG, the authors make an analogy with a physical equilibrium model, as shown in Figure 2.
- The upward force goes deeper in the KG as L increases (Figure 2c), which helps explore users’ long-distance interests and pull up more positive items.

Dataset statistics (partial, from Table 2; values not recovered from the source are left blank):

| Dataset | # users | # items | # interactions | # entities | # relations | # KG triples |
|---|---|---|---|---|---|---|
| Movie | 138,159 | 16,954 | 13,501,622 | 102,569 | | |
| Book | 19,676 | 20,003 | 172,576 | 25,787 | | |
| Music | 1,872 | 3,846 | 42,346 | 9,366 | | |
| Restaurant | 2,298,698 | | | | | |
- The authors propose knowledge-aware graph neural networks with label smoothness regularization for recommendation.
- LS regularization is proposed for the recommendation task with KGs. It would be interesting to examine the LS assumption on other graph tasks such as link prediction and node classification


- Table1: List of key symbols
- Table2: Statistics of the four datasets: MovieLens-20M (movie), Book-Crossing (book), Last.FM (music), and Dianping-Food (restaurant)
- Table3: The results of Recall@K in top-K recommendation
- Table4: The results of AUC in CTR prediction
- Table5: AUC of all methods w.r.t. the ratio of training set r
- Table6: R@10 w.r.t. the number of layers L
- Table7: R@10 w.r.t. the dimension of hidden layers d
- Table8: Hyper-parameter settings for the four datasets (S: number of sampled neighbors for each entity; d: dimension of hidden layers, L: number of layers, λ: label smoothness regularizer weight, γ : L2 regularizer weight, η: learning rate)

Related work

2.1 Graph Neural Networks

Graph Neural Networks (or Graph Convolutional Neural Networks, GCNs) aim to generalize convolutional neural networks to non-Euclidean domains (such as graphs) for robust feature learning. Bruna et al. [3] define the convolution in the Fourier domain and calculate the eigendecomposition of the graph Laplacian, Defferrard et al. [5] approximate the convolutional filters by a Chebyshev expansion of the graph Laplacian, and Kipf and Welling [11] propose a convolutional architecture via a first-order approximation. In contrast to these spectral GCNs, non-spectral GCNs operate on the graph directly and apply “convolution” (i.e., weighted average) to the local neighbors of a node [6, 7, 15].
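The first-order approximation of [11] reduces the spectral convolution to the propagation rule H' = σ(D̃^{-1/2}(A + I)D̃^{-1/2} H W), where A + I adds self-loops and D̃ is its degree matrix. A self-contained numpy sketch of one such layer:

```python
import numpy as np

rng = np.random.default_rng(0)

def gcn_layer(A, H, W):
    """One GCN layer in the first-order approximation of Kipf & Welling:
    H' = ReLU(D^{-1/2} (A + I) D^{-1/2} H W)."""
    A_hat = A + np.eye(A.shape[0])             # add self-loops
    d_inv_sqrt = 1.0 / np.sqrt(A_hat.sum(axis=1))
    A_norm = d_inv_sqrt[:, None] * A_hat * d_inv_sqrt[None, :]
    return np.maximum(A_norm @ H @ W, 0.0)     # ReLU

A = np.array([[0, 1, 0],
              [1, 0, 1],
              [0, 1, 0]], dtype=float)
H = rng.normal(size=(3, 4))  # node features
W = rng.normal(size=(4, 2))  # layer weights
print(gcn_layer(A, H, W).shape)  # → (3, 2)
```

Stacking L such layers lets each node aggregate information from its L-hop neighborhood, which is the same mechanism KGNN-LS applies to a (user-specific) knowledge graph.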

Recently, researchers have also deployed GCNs in recommender systems: PinSage [32] applies GCNs to the pin-board bipartite graph in Pinterest. Monti et al. [14] and van den Berg et al. [19] model recommender systems as matrix completion and design GCNs for representation learning on user-item bipartite graphs. Wu et al. [31] use GCNs on user/item structure graphs to learn user/item representations. The difference between these works and ours is that they are all designed for homogeneous bipartite graphs or user/item-similarity graphs where GCNs can be used directly, while here we investigate GCNs for heterogeneous KGs. Wang et al. [28] use GCNs in KGs for recommendation, but simply applying GCNs to KGs without proper regularization is prone to overfitting and leads to performance degradation, as we will show later. Schlichtkrull et al. [17] also propose using GCNs to model KGs, but not for the purpose of recommendation.

Funding

- This research has been supported in part by NSF OAC-1835598, DARPA MCS, ARO MURI, Boeing, Docomo, Hitachi, Huawei, JD, Siemens, and Stanford Data Science Initiative

Reference

- [1] Shumeet Baluja, Rohan Seth, D Sivakumar, Yushi Jing, Jay Yagnik, Shankar Kumar, Deepak Ravichandran, and Mohamed Aly. 2008. Video suggestion and discovery for youtube: taking random walks through the view graph. In Proceedings of the 17th International Conference on World Wide Web. ACM, 895–904.
- [2] Antoine Bordes, Nicolas Usunier, Alberto Garcia-Duran, Jason Weston, and Oksana Yakhnenko. 2013. Translating embeddings for modeling multi-relational data. In Advances in Neural Information Processing Systems. 2787–2795.
- [3] Joan Bruna, Wojciech Zaremba, Arthur Szlam, and Yann Lecun. 2014. Spectral networks and locally connected networks on graphs. In the 2nd International Conference on Learning Representations.
- [4] Paul Covington, Jay Adams, and Emre Sargin. 2016. Deep neural networks for youtube recommendations. In Proceedings of the 10th ACM Conference on Recommender Systems. ACM, 191–198.
- [5] Michaël Defferrard, Xavier Bresson, and Pierre Vandergheynst. 2016. Convolutional neural networks on graphs with fast localized spectral filtering. In Advances in Neural Information Processing Systems. 3844–3852.
- [6] David K Duvenaud, Dougal Maclaurin, Jorge Iparraguirre, Rafael Bombarell, Timothy Hirzel, Alán Aspuru-Guzik, and Ryan P Adams. 2015. Convolutional networks on graphs for learning molecular fingerprints. In Advances in Neural Information Processing Systems. 2224–2232.
- [7] Will Hamilton, Zhitao Ying, and Jure Leskovec. 2017. Inductive representation learning on large graphs. In Advances in Neural Information Processing Systems. 1024–1034.
- [8] Binbin Hu, Chuan Shi, Wayne Xin Zhao, and Philip S Yu. 2018. Leveraging meta-path based context for top-n recommendation with a neural co-attention model. In Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining. ACM, 1531–1540.
- [9] Jin Huang, Wayne Xin Zhao, Hongjian Dou, Ji-Rong Wen, and Edward Y Chang. 2018. Improving sequential recommendation with knowledge-enhanced memory networks. In the 41st International ACM SIGIR Conference on Research & Development in Information Retrieval. ACM, 505–514.
- [10] Masayuki Karasuyama and Hiroshi Mamitsuka. 2013. Manifold-based similarity adaptation for label propagation. In Advances in Neural Information Processing Systems. 1547–1555.
- [11] Thomas N Kipf and Max Welling. 2017. Semi-supervised classification with graph convolutional networks. In the 5th International Conference on Learning Representations.
- [12] Yehuda Koren. 2008. Factorization meets the neighborhood: a multifaceted collaborative filtering model. In Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 426–434.
- [13] Yehuda Koren, Robert Bell, and Chris Volinsky. 2009. Matrix factorization techniques for recommender systems. Computer 8 (2009), 30–37.
- [14] Federico Monti, Michael Bronstein, and Xavier Bresson. 2017. Geometric matrix completion with recurrent multi-graph neural networks. In Advances in Neural Information Processing Systems. 3697–3707.
- [15] Mathias Niepert, Mohamed Ahmed, and Konstantin Kutzkov. 2016. Learning convolutional neural networks for graphs. In International Conference on Machine Learning. 2014–2023.
- [16] Steffen Rendle. 2012. Factorization machines with libfm. ACM Transactions on Intelligent Systems and Technology (TIST) 3, 3 (2012), 57.
- [17] Michael Schlichtkrull, Thomas N Kipf, Peter Bloem, Rianne van den Berg, Ivan Titov, and Max Welling. 2018. Modeling relational data with graph convolutional networks. In European Semantic Web Conference. Springer, 593–607.
- [18] Zhu Sun, Jie Yang, Jie Zhang, Alessandro Bozzon, Long-Kai Huang, and Chi Xu. 2018. Recurrent knowledge graph embedding for effective recommendation. In Proceedings of the 12th ACM Conference on Recommender Systems. ACM, 297–305.
- [19] Rianne van den Berg, Thomas N Kipf, and Max Welling. 2017. Graph Convolutional Matrix Completion. stat 1050 (2017), 7.
- [20] Petar Velickovic, Guillem Cucurull, Arantxa Casanova, Adriana Romero, Pietro Lio, and Yoshua Bengio. 2018. Graph attention networks. In Proceedings of the 6th International Conferences on Learning Representations.
- [21] Fei Wang and Changshui Zhang. 2008. Label propagation through linear neighborhoods. IEEE Transactions on Knowledge and Data Engineering 20, 1 (2008), 55–67.
- [22] Hongwei Wang, Jia Wang, Miao Zhao, Jiannong Cao, and Minyi Guo. 2017. Joint topic-semantic-aware social recommendation for online voting. In Proceedings of the 2017 ACM on Conference on Information and Knowledge Management. ACM, 347–356.
- [23] Hongwei Wang, Fuzheng Zhang, Min Hou, Xing Xie, Minyi Guo, and Qi Liu. 2018. Shine: Signed heterogeneous information network embedding for sentiment link prediction. In Proceedings of the Eleventh ACM International Conference on Web Search and Data Mining. ACM, 592–600.
- [24] Hongwei Wang, Fuzheng Zhang, Jialin Wang, Miao Zhao, Wenjie Li, Xing Xie, and Minyi Guo. 2018. RippleNet: Propagating User Preferences on the Knowledge Graph for Recommender Systems. In Proceedings of the 27th ACM International Conference on Information and Knowledge Management. ACM, 417–426.
- [25] Hongwei Wang, Fuzheng Zhang, Jialin Wang, Miao Zhao, Wenjie Li, Xing Xie, and Minyi Guo. 2019. Exploring High-Order User Preference on the Knowledge Graph for Recommender Systems. ACM Transactions on Information Systems (TOIS) 37, 3 (2019), 32.
- [26] Hongwei Wang, Fuzheng Zhang, Xing Xie, and Minyi Guo. 2018. DKN: Deep Knowledge-Aware Network for News Recommendation. In Proceedings of the 2018 World Wide Web Conference on World Wide Web. 1835–1844.
- [27] Hongwei Wang, Fuzheng Zhang, Miao Zhao, Wenjie Li, Xing Xie, and Minyi Guo. 2019. Multi-Task Feature Learning for Knowledge Graph Enhanced Recommendation. In Proceedings of the 2019 World Wide Web Conference on World Wide Web.
- [28] Hongwei Wang, Miao Zhao, Xing Xie, Wenjie Li, and Minyi Guo. 2019. Knowledge graph convolutional networks for recommender systems. In Proceedings of the 2019 World Wide Web Conference on World Wide Web.
- [29] Jizhe Wang, Pipei Huang, Huan Zhao, Zhibo Zhang, Binqiang Zhao, and Dik Lun Lee. 2018. Billion-scale commodity embedding for e-commerce recommendation in alibaba. In Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining. ACM, 839–848.
- [30] Quan Wang, Zhendong Mao, Bin Wang, and Li Guo. 2017. Knowledge graph embedding: A survey of approaches and applications. IEEE Transactions on Knowledge and Data Engineering 29, 12 (2017), 2724–2743.
- [31] Yuexin Wu, Hanxiao Liu, and Yiming Yang. 2018. Graph Convolutional Matrix Completion for Bipartite Edge Prediction. In the 10th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management.
- [32] Rex Ying, Ruining He, Kaifeng Chen, Pong Eksombatchai, William L Hamilton, and Jure Leskovec. 2018. Graph Convolutional Neural Networks for Web-Scale Recommender Systems. In Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. 974–983.
- [33] Xiao Yu, Xiang Ren, Yizhou Sun, Quanquan Gu, Bradley Sturt, Urvashi Khandelwal, Brandon Norick, and Jiawei Han. 2014. Personalized entity recommendation: A heterogeneous information network approach. In Proceedings of the 7th ACM International Conference on Web Search and Data Mining. ACM, 283–292.
- [34] Fuzheng Zhang, Nicholas Jing Yuan, Defu Lian, Xing Xie, and Wei-Ying Ma. 2016. Collaborative knowledge base embedding for recommender systems. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 353–362.
- [35] Xinhua Zhang and Wee S Lee. 2007. Hyperparameter learning for graph based semi-supervised learning algorithms. In Advances in neural information processing systems. 1585–1592.
- [36] Huan Zhao, Quanming Yao, Jianda Li, Yangqiu Song, and Dik Lun Lee. 2017. Metagraph based recommendation fusion over heterogeneous information networks. In Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 635–644.
- [37] Dengyong Zhou, Olivier Bousquet, Thomas N Lal, Jason Weston, and Bernhard Schölkopf. 2004. Learning with local and global consistency. In Advances in neural information processing systems. 321–328.
- [38] Xiaojin Zhu, Zoubin Ghahramani, and John D Lafferty. 2003. Semi-supervised learning using gaussian fields and harmonic functions. In Proceedings of the 20th International conference on Machine learning. 912–919.
- [39] Xiaojin Zhu, John Lafferty, and Ronald Rosenfeld. 2005. Semi-supervised learning with graphs. Ph.D. Dissertation. Carnegie Mellon University, language technologies institute, school of computer science.
