HIN2Vec: Explore Meta-paths in Heterogeneous Information Networks for Representation Learning.
CIKM, pp.1797-1806, (2017)
In this paper, we propose a novel representation learning framework, namely HIN2Vec, for heterogeneous information networks (HINs). The core of the proposed framework is a neural network model, also called HIN2Vec, designed to capture the rich semantics embedded in HINs by exploiting different types of relationships among nodes. Given a ...
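The sketch below is a rough illustration of the kind of model the abstract describes. It assumes a HIN2Vec-style formulation in which training examples are (x, y, r) triples: two nodes and a relationship between them. The model acts as a binary classifier that predicts whether r holds between x and y. Here the score is a sigmoid over the sum of the element-wise product of the two node vectors and the relationship vector. Each relationship component is first squashed into [0, 1] by a sigmoid so that it behaves like a soft mask. The function names, the sigmoid squashing and the loss are assumptions for illustration. Embedding lookup tables, negative sampling and gradient updates are omitted.

```python
import math
from typing import List

Vector = List[float]

def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def relationship_probability(x: Vector, y: Vector, r: Vector) -> float:
    """Score how likely relationship r holds between nodes x and y (assumed formulation).

    Each component of r is squashed into [0, 1], so r acts as a soft
    per-dimension mask over the element-wise product of x and y.
    """
    if not (len(x) == len(y) == len(r)):
        raise ValueError("x, y and r must have the same dimension")
    return sigmoid(sum(a * b * sigmoid(c) for a, b, c in zip(x, y, r)))

def log_loss(prob: float, label: int) -> float:
    """Binary cross-entropy for one triple: label 1 = observed, 0 = negative sample."""
    eps = 1e-12  # guards against log(0)
    return -(label * math.log(prob + eps) + (1 - label) * math.log(1.0 - prob + eps))

# Illustrative vectors only; in practice these would be learned embeddings
x, y = [0.8, -0.3, 0.5], [0.6, 0.4, -0.2]
r = [2.0, -1.0, 0.5]
p = relationship_probability(x, y, r)
print(p, log_loss(p, 1))
```

The relationship vector only rescales dimensions, so the same pair of node vectors can score high under one relationship and low under another. This is one way such a model can separate the different semantics of relationships.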
- Network data analysis and mining is an important research field because network data, capturing phenomena in various networks, such as social networks, paper citation networks, and World Wide Web, are ubiquitous in the real world [6, 15, 29].
- The HIN2Vec model aims to capture the rich information in an HIN by exploiting various types of relationships among nodes and the network structure.
- Consider a DBLP collaboration network, which consists of three node types: Author, Paper and Venue, and two edge types: an author writes a paper, and a paper is published in a venue.
- The authors claim that encoding the rich information embedded in meta-paths and the whole network structure would help learn meaningful representations that are useful for various applications, because the different semantics of relationships are better captured.
- We propose a new neural network (NN) model, namely Heterogeneous Information Network to Vector (HIN2Vec), for representation learning of nodes in heterogeneous information networks (HINs).
- Heterogeneous information networks, such as the Yelp social network, the DBLP collaboration network, and the U.S. patent citation network, are networks whose nodes and edges belong to different types.
- We argue that the superiority of HIN2Vec is due to a better model design that precisely captures the relationships between nodes.
- Capturing meta-paths of longer length is useful for link prediction in complex heterogeneous information networks.
- This study focuses on representation learning in heterogeneous information networks.
- The authors conduct a comprehensive evaluation on HIN2Vec. The authors first introduce four real-world HINs used for experiments and six models for representation learning on networks.
- The authors evaluate HIN2Vec and those models on two applications: multi-label classification of nodes and link prediction of edges.
- The authors use all bloggers (U) and their groups (G) as nodes to form a social network, which contains friendships (U-U) and users’ groups (U-G) as edges.
- The authors extract data of the top 10 cities with the most businesses to form a network, which includes users (U), businesses (B), cities (C) and categories (T) as nodes, and friendships (U-U), users’ reviews (B-U), businesses’ cities (B-C) and businesses’ categories (B-T) as edges.
- The U.S. patent network contains patents (P), inventors (I), assignees (A) and patent classes (C) as nodes, and inventorships (P-I), patents’ assignees (P-A), patents’ classes (P-C) and citations (P→P) as edges.
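The DBLP example above suggests how relationships between node pairs can be read off a typed network. The minimal sketch below assumes that training triples are harvested from uniform random walks. It also assumes that a meta-path is represented as the node-type sequence between two nodes at most `window` steps apart, so two co-authors connected through a paper yield 'A-P-A'. The toy graph, its node names and the helper functions are hypothetical, and this is not necessarily how the authors sample their data.

```python
import random
from typing import Dict, List, Tuple

# Hypothetical toy DBLP-style HIN: node -> type (A = Author, P = Paper, V = Venue)
NODE_TYPE: Dict[str, str] = {"a1": "A", "a2": "A", "p1": "P", "p2": "P", "v1": "V"}
# Undirected edges: author writes paper, paper is published in venue
EDGES: List[Tuple[str, str]] = [("a1", "p1"), ("a2", "p1"), ("a2", "p2"), ("p1", "v1"), ("p2", "v1")]

def build_adjacency(edges: List[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Undirected adjacency list; neighbors sorted so seeded walks are reproducible."""
    adj: Dict[str, List[str]] = {}
    for u, v in edges:
        adj.setdefault(u, []).append(v)
        adj.setdefault(v, []).append(u)
    return {n: sorted(ns) for n, ns in adj.items()}

def random_walk(adj: Dict[str, List[str]], start: str, length: int, rng: random.Random) -> List[str]:
    """Uniform random walk visiting `length` nodes, starting at `start`."""
    walk = [start]
    while len(walk) < length:
        walk.append(rng.choice(adj[walk[-1]]))
    return walk

def meta_path_triples(walk: List[str], node_type: Dict[str, str], window: int) -> List[Tuple[str, str, str]]:
    """Emit (x, y, meta_path) for node pairs at most `window` steps apart on the walk.

    The meta-path is the node-type sequence of the walk segment from x to y.
    Pairs where the walk returns to x itself are skipped.
    """
    triples = []
    for i, x in enumerate(walk):
        for j in range(i + 1, min(i + window + 1, len(walk))):
            if walk[j] == x:
                continue
            path = "-".join(node_type[n] for n in walk[i:j + 1])
            triples.append((x, walk[j], path))
    return triples

rng = random.Random(42)
adj = build_adjacency(EDGES)
walk = random_walk(adj, "a1", 6, rng)
for triple in meta_path_triples(walk, NODE_TYPE, window=2):
    print(triple)
```

With `window=2` the sketch only produces meta-paths of up to two edges. A larger window yields longer meta-paths, which the highlights above associate with better link prediction.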
- Evaluation of models
The performance of node classification by all evaluated models is summarized in Table 2.
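This excerpt does not say which metrics Table 2 reports. Multi-label node classification is commonly scored with Micro-F1 and Macro-F1, so the sketch below assumes those two. Micro-F1 pools true positives, false positives and false negatives over all labels, while Macro-F1 averages the per-label F1 scores. Each node's labels are a set, and the function names are illustrative.

```python
from typing import Iterable, Sequence, Set, Tuple

def f1(tp: int, fp: int, fn: int) -> float:
    """F1 = 2TP / (2TP + FP + FN); defined as 0.0 when all counts are zero."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

def micro_macro_f1(true_labels: Sequence[Set[str]], pred_labels: Sequence[Set[str]],
                   classes: Iterable[str]) -> Tuple[float, float]:
    """Return (Micro-F1, Macro-F1) for multi-label predictions, one label set per node."""
    classes = list(classes)
    counts = {c: [0, 0, 0] for c in classes}  # per-class [TP, FP, FN]
    for truth, pred in zip(true_labels, pred_labels):
        for c in classes:
            if c in pred and c in truth:
                counts[c][0] += 1
            elif c in pred:
                counts[c][1] += 1
            elif c in truth:
                counts[c][2] += 1
    # Micro: pool counts over all classes, then compute a single F1
    tp, fp, fn = (sum(v[k] for v in counts.values()) for k in range(3))
    micro = f1(tp, fp, fn)
    # Macro: unweighted mean of per-class F1
    macro = sum(f1(*v) for v in counts.values()) / len(classes)
    return micro, macro

print(micro_macro_f1([{"g1"}, {"g1", "g2"}, {"g2"}], [{"g1"}, {"g1"}, {"g1"}], ["g1", "g2"]))
```

Macro-F1 weights rare labels as heavily as common ones, so the two scores can diverge noticeably on skewed label distributions.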
- Compared with ESim, which also captures meta-path relationships between nodes, HIN2Vec performs better on all four networks.
- Capturing meta-paths of longer length is useful for link prediction in complex HINs. Compared with LINE and PTE, which capture only the 1-hop or 2-hop neighborhoods of nodes, the other models usually perform better on most of the networks, perhaps because they capture relationships between nodes that are more hops apart.
- This is the reason why HIN2Vec outperforms LINE on BlogCatalog.
- This study focuses on representation learning in HINs. Prior works in representation learning in networks only consider limited types of relationships among nodes, or only capture aggregated information of relationships.
- The proposed model learns representations of meta-paths.
- Table 1: Statistics of Datasets
- Table 2: Performance Evaluation of Node Classification
- Table 3: Clusters of Meta-paths
- Table 4: Vector Functions of Node Pairs
- Table 5: Performance Evaluation of Vector Functions
- Table 6: Performance Evaluation of Link Prediction
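Tables 4 and 5 concern vector functions that turn the two node vectors of a node pair into a single feature vector, for example as input to a link-prediction classifier. The definitions are not given in this excerpt. The sketch assumes four element-wise functions: Hadamard (product), Average, Minus (difference) and Abs. minus (absolute difference). The dictionary and function names are hypothetical.

```python
from typing import Callable, Dict, List

Vector = List[float]

# Assumed element-wise definitions of the node-pair vector functions
VECTOR_FUNCTIONS: Dict[str, Callable[[float, float], float]] = {
    "hadamard": lambda a, b: a * b,
    "average": lambda a, b: (a + b) / 2.0,
    "minus": lambda a, b: a - b,
    "abs_minus": lambda a, b: abs(a - b),
}

def pair_features(u: Vector, v: Vector, name: str) -> Vector:
    """Combine two node embeddings into one feature vector for the pair (u, v)."""
    if len(u) != len(v):
        raise ValueError("embeddings must have the same dimension")
    fn = VECTOR_FUNCTIONS[name]
    return [fn(a, b) for a, b in zip(u, v)]

u, v = [1.0, 2.0, -1.0], [3.0, 0.5, -1.0]
for name in VECTOR_FUNCTIONS:
    print(name, pair_features(u, v, name))
```

Hadamard, Average and Abs. minus are symmetric in the two nodes, whereas Minus depends on their order. This may matter for directed edges such as patent citations (P→P).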
Recent developments in representation learning have shed light on reducing the dependence of feature engineering on human knowledge and labor [7, 24, 28]. Representation learning aims to automatically learn useful latent representations of data that are effective and discriminative as input features to supervised machine learning algorithms for various prediction tasks. Among the various approaches to representation learning, neural-network-based models have received significant attention in recent years and have achieved success in empirical studies across several domains, including speech recognition [12, 22], computer vision [9, 16], and natural language processing (NLP).
Recently, research on representation learning has been extended to network data [8, 10, 11, 13, 24, 25, 27, 28]. However, some prior works focus only on learning node vectors in homogeneous information networks [10, 24, 28], rather than the more complex heterogeneous information networks (HINs) targeted in this paper. Moreover, although they all claim to capture the embedded structures of information networks, these models tend to consider only aggregated information among nodes or limited types of relationships. For instance, DeepWalk and node2vec learn feature vectors of nodes by capturing the neighborhood near each node through uniform and parameterized random walks, respectively. LINE captures 1-hop and 2-hop neighborhood relationships separately to learn two representations of nodes.
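The contrast between DeepWalk and node2vec can be made concrete with node2vec-style second-order transition weights on an unweighted, undirected graph. From the current node `cur`, reached from `prev`, stepping back to `prev` gets weight 1/p. Stepping to a node that is also a neighbor of `prev` gets weight 1, and moving farther away gets 1/q. With p = q = 1 every neighbor gets equal weight, which gives a uniform, DeepWalk-style walk. The graph and function names are illustrative. `random.choices` stands in for the alias-table sampling typically used for speed.

```python
import random
from typing import Dict, List, Optional, Set

Graph = Dict[str, Set[str]]  # undirected, unweighted adjacency sets

def transition_weights(g: Graph, prev: Optional[str], cur: str,
                       p: float, q: float) -> Dict[str, float]:
    """Unnormalized node2vec-style weights for the next step out of `cur`."""
    weights: Dict[str, float] = {}
    for x in sorted(g[cur]):
        if prev is None:
            weights[x] = 1.0          # first step: uniform over neighbors
        elif x == prev:
            weights[x] = 1.0 / p      # return to the previous node
        elif x in g[prev]:
            weights[x] = 1.0          # stay at distance 1 from prev
        else:
            weights[x] = 1.0 / q      # move outward, distance 2 from prev
    return weights

def biased_walk(g: Graph, start: str, length: int, p: float, q: float,
                rng: random.Random) -> List[str]:
    """Second-order random walk; p = q = 1 gives a uniform (DeepWalk-style) walk."""
    walk = [start]
    while len(walk) < length:
        prev = walk[-2] if len(walk) > 1 else None
        w = transition_weights(g, prev, walk[-1], p, q)
        nodes = list(w)
        walk.append(rng.choices(nodes, weights=[w[n] for n in nodes], k=1)[0])
    return walk

# Toy graph for illustration only
toy = {"a": {"b", "c"}, "b": {"a", "c", "d"}, "c": {"a", "b"}, "d": {"b"}}
print(biased_walk(toy, "a", 8, p=1.0, q=0.5, rng=random.Random(7)))
```

A small q pushes the walk outward (closer to depth-first exploration), while a small p keeps it near its starting neighborhood.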
- This work is supported in part by the National Science Foundation under Grant Nos. IIS-1717084 and SMA-1360205.
- 2009. BlogCatalog3. http://socialcomputing.asu.edu/datasets/BlogCatalog3. (2009).
- 2014. USPTO PatentView. http://www.dev.patentsview.org/workshop/participants.html. (2014).
- 2017. How can I download the whole dblp dataset. http://dblp.uni-trier.de/faq/How+can+I+download+the+whole+dblp+dataset. (2017).
- 2017. Yelp dataset. https://www.yelp.com/dataset_challenge. (2017).
- Joachim H Ahrens and Ulrich Dieter. 1989. An alias method for sampling from the normal distribution. Computing 42, 2-3 (1989), 159–170.
- Albert-László Barabási and Réka Albert. 1999. Emergence of scaling in random networks. Science 286, 5439 (1999), 509–512.
- Yoshua Bengio, Aaron Courville, and Pascal Vincent. 2013. Representation learning: A review and new perspectives. IEEE Transactions on Pattern Analysis and Machine Intelligence 35, 8 (2013), 1798–1828.
- Shiyu Chang, Wei Han, Jiliang Tang, Guo-Jun Qi, Charu C Aggarwal, and Thomas S Huang. 2015. Heterogeneous network embedding via deep architectures. In Proceedings of the 21st ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 119–128.
- Dan Ciregan, Ueli Meier, and Jürgen Schmidhuber. 2012. Multi-column deep neural networks for image classification. In Proceedings of the IEEE International Conference on Computer Vision and Pattern Recognition (CVPR 2012). IEEE.
- Aditya Grover and Jure Leskovec. 2016. node2vec: Scalable Feature Learning for Networks. (2016).
- Huan Gui, Jialu Liu, Fangbo Tao, Meng Jiang, Brandon Norick, and Jiawei Han. 2016. Large-Scale Embedding Learning in Heterogeneous Event Data. (2016).
- Geoffrey Hinton, Li Deng, Dong Yu, George E Dahl, Abdel-rahman Mohamed, Navdeep Jaitly, Andrew Senior, Vincent Vanhoucke, Patrick Nguyen, and Tara N Sainath. 2012. Deep neural networks for acoustic modeling in speech recognition: The shared views of four research groups. IEEE Signal Processing Magazine 29, 6 (2012), 82–97.
- Zhipeng Huang and Nikos Mamoulis. Heterogeneous Information Network Embedding for Meta Path based Proximity. arXiv preprint arXiv:1701.05291 (????).
- Ming Ji, Jiawei Han, and Marina Danilevsky. 2011. Ranking-based classification of heterogeneous information networks. In Proceedings of the ACM International Conference on Knowledge Discovery and Data Mining (SIGKDD 2011).
- Jon M Kleinberg. 2000. Navigation in a small world. Nature 406, 6798 (2000).
- Alex Krizhevsky, Ilya Sutskever, and Geoffrey E Hinton. 2012. Imagenet classification with deep convolutional neural networks. In Proceedings of the Annual Conference on Neural Information Processing Systems (NIPS 2012). 1097–1105.
- Richard A Kronmal and Arthur V Peterson Jr. 1979. On the alias method for generating random variables from a discrete distribution. The American Statistician 33, 4 (1979), 214–218.
- Ni Lao and William W. Cohen. 2010. Relational Retrieval Using a Combination of Path-constrained Random Walks. Machine Learning 81 (Oct. 2010), 53–67.
- Ni Lao and William W Cohen. 2010. Relational retrieval using a combination of path-constrained random walks. Machine learning 81, 1 (2010), 53–67.
- David Liben-Nowell and Jon Kleinberg. 2007. The link-prediction problem for social networks. Journal of the American society for information science and technology 58, 7 (2007), 1019–1031.
- Tomas Mikolov, Ilya Sutskever, Kai Chen, Greg S Corrado, and Jeff Dean. 2013. Distributed representations of words and phrases and their compositionality. In Proceedings of the Annual Conference on Neural Information Processing Systems (NIPS 2013). 3111–3119.
- Abdel-rahman Mohamed, George E Dahl, and Geoffrey Hinton. 2012. Acoustic modeling using deep belief networks. IEEE Transactions on Audio, Speech, and Language Processing 20, 1 (2012), 14–22.
- Tore Opsahl and Pietro Panzarasa. 2009. Clustering in weighted networks. Social networks 31, 2 (2009), 155–163.
- Bryan Perozzi, Rami Al-Rfou, and Steven Skiena. 2014. DeepWalk: Online learning of social representations. In Proceedings of the ACM International Conference on Knowledge Discovery and Data Mining (SIGKDD 2014). ACM, 701–710.
- Jingbo Shang, Meng Qu, Jialu Liu, Lance M Kaplan, Jiawei Han, and Jian Peng. 2016. Meta-Path Guided Embedding for Similarity Search in Large-Scale Heterogeneous Information Networks. arXiv preprint arXiv:1610.09769 (2016).
- Yizhou Sun, Rick Barber, Manish Gupta, Charu C. Aggarwal, and Jiawei Han. 2011. Co-author Relationship Prediction in Heterogeneous Bibliographic Networks. In Proc. of the 2011 International Conference on Advances in Social Networks Analysis and Mining (ASONAM 2011). Kaohsiung, Taiwan, 121–128.
- Jian Tang, Meng Qu, and Qiaozhu Mei. 2015. PTE: Predictive text embedding through large-scale heterogeneous text networks. In Proceedings of the ACM International Conference on Knowledge Discovery and Data Mining (SIGKDD 2015).
- Jian Tang, Meng Qu, Mingzhe Wang, Ming Zhang, Jun Yan, and Qiaozhu Mei. 2015. LINE: Large-scale information network embedding. In Proceedings of the International Conference on World Wide Web (WWW 2015). ACM, 1067–1077.
- Duncan J Watts and Steven H Strogatz. 1998. Collective dynamics of small-world networks. Nature 393, 6684 (1998), 440–442.