HIN2Vec: Explore Meta-paths in Heterogeneous Information Networks for Representation Learning.

CIKM, pp. 1797-1806 (2017)

Abstract

In this paper, we propose a novel representation learning framework, namely HIN2Vec, for heterogeneous information networks (HINs). The core of the proposed framework is a neural network model, also called HIN2Vec, designed to capture the rich semantics embedded in HINs by exploiting different types of relationships among nodes. Given a ...

Introduction
  • Network data analysis and mining is an important research field because network data, capturing phenomena in various networks such as social networks, paper citation networks, and the World Wide Web, are ubiquitous in the real world [6, 15, 29].
  • The HIN2Vec model aims to capture the rich information in an HIN by exploiting various types of relationships among nodes and the network structure.
  • Consider a DBLP collaboration network, which consists of three node types: Author, Paper and Venue, and two edge types: an author writes a paper, and a paper is published in a venue.
  • The authors claim that encoding the rich information embedded in meta-paths and in the whole network structure helps learn meaningful representations that are useful for various applications, because the different semantics of relationships are better captured.
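
The DBLP example above can be made concrete with a small sketch. It is not the HIN2Vec model itself; it only illustrates, on a made-up toy network, how a meta-path (a sequence of node types such as Author-Paper-Author) selects related nodes. The node names, the `meta_path_neighbors` helper, and the single-letter type codes are assumptions introduced for illustration.

```python
from collections import defaultdict

# Toy DBLP-style HIN (hypothetical data): node id -> node type.
NODE_TYPE = {
    "a1": "A", "a2": "A", "a3": "A",  # authors
    "p1": "P", "p2": "P",             # papers
    "v1": "V",                        # venue
}

# Two edge types, stored as undirected pairs:
# an author writes a paper, and a paper is published in a venue.
EDGES = [
    ("a1", "p1"), ("a2", "p1"),  # a1 and a2 co-wrote p1
    ("a3", "p2"),                # a3 wrote p2
    ("p1", "v1"), ("p2", "v1"),  # both papers appeared in v1
]


def build_adjacency(edges):
    """Build an undirected adjacency map from a list of node pairs."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return adj


def meta_path_neighbors(adj, start, meta_path):
    """Nodes reachable from `start` by following the node-type sequence
    `meta_path` (for example ["A", "P", "A"]), excluding `start` itself."""
    if NODE_TYPE[start] != meta_path[0]:
        return set()
    frontier = {start}
    for wanted_type in meta_path[1:]:
        frontier = {
            n for u in frontier for n in adj[u] if NODE_TYPE[n] == wanted_type
        }
    return frontier - {start}


adj = build_adjacency(EDGES)
coauthors = meta_path_neighbors(adj, "a1", ["A", "P", "A"])
same_venue = meta_path_neighbors(adj, "a1", ["A", "P", "V", "P", "A"])
```

Here `A-P-A` yields the co-authors of `a1`, while the longer `A-P-V-P-A` yields authors who published in the same venue, which is a different relationship between the same pair of node types.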
Highlights
  • Network data analysis and mining is an important research field because network data, capturing phenomena in various networks such as social networks, paper citation networks, and the World Wide Web, are ubiquitous in the real world [6, 15, 29].
  • We propose a new neural network (NN) model, namely Heterogeneous Information Network to Vector (HIN2Vec), for representation learning of nodes in heterogeneous information networks (HINs).
  • Heterogeneous information networks, such as the Yelp social network [4], the DBLP collaboration network [3], and the U.S. patent citation network [2], are networks with nodes and edges belonging to different types.
  • We argue that the superiority of HIN2Vec is due to a better model design that precisely captures the relationships between nodes.
  • Capturing meta-paths of longer length is useful for link prediction in complex heterogeneous information networks.
  • This study focuses on representation learning in heterogeneous information networks.
Methods
  • The authors conduct a comprehensive evaluation of HIN2Vec. They first introduce the four real-world HINs used for the experiments and six models for representation learning on networks.
  • The authors evaluate HIN2Vec and those models on two applications: multi-label classification for nodes and link prediction for edges.
  • For BlogCatalog, the authors use all bloggers (U) and their groups (G) as nodes to form a social network, which contains friendships (U-U) and users’ groups (U-G) as edges.
  • For Yelp, the authors extract data of the top 10 cities with the most businesses to form a network, which includes users (U), businesses (B), cities (C) and categories (T) as nodes, and friendships (U-U), users’ reviews (B-U), businesses’ cities (B-C) and businesses’ categories (B-T) as edges.
  • The U.S. patent network contains patents (P), inventors (I), assignees (A) and patent classes (C) as nodes, and inventorships (P-I), patents’ assignees (P-A), patents’ classes (P-C) and citations (P→P) as edges.
Results
  • Evaluation of models

    The performance of node classification by all evaluated models is summarized in Table 2.
  • Compared with ESim, which captures meta-path relationships between nodes, HIN2Vec performs better in all four networks.
  • Capturing meta-paths of longer length is useful for link prediction in complex HINs. LINE and PTE capture only the 1-hop or 2-hop neighborhood of nodes; the other models usually perform better in most of the networks, perhaps because they capture relationships between nodes over a larger number of hops.
  • This is the reason why HIN2Vec outperforms LINE in BlogCatalog.
Conclusion
  • This study focuses on representation learning in HINs. Prior work on representation learning in networks considers only limited types of relationships among nodes, or captures only aggregated information about relationships.
  • The proposed model learns representations of meta-paths.
  • [Table fragment: vector functions Hadamard, Average, Minus, Abs. Minus; models node2vec, PTE, HINE, ESim, MPE]
Tables
  • Table 1: Statistics of Datasets
  • Table 2: Performance Evaluation of Node Classification
  • Table 3: Clusters of Meta-paths
  • Table 4: Vector Functions of Node Pairs
  • Table 5: Performance Evaluation of Vector Functions
  • Table 6: Performance Evaluation of Link Prediction
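
Table 4 concerns vector functions that combine the representations of two nodes into one feature vector for a node pair. The function names that appear on this page are Hadamard, Average, Minus, and Abs. Minus; the element-wise definitions below are the conventional ones and are an assumption here, since the table's contents are not reproduced on this page. A minimal sketch in plain Python:

```python
def hadamard(x, y):
    """Element-wise product of two equal-length vectors."""
    return [a * b for a, b in zip(x, y)]


def average(x, y):
    """Element-wise mean of two equal-length vectors."""
    return [(a + b) / 2.0 for a, b in zip(x, y)]


def minus(x, y):
    """Element-wise difference x - y (order-sensitive)."""
    return [a - b for a, b in zip(x, y)]


def abs_minus(x, y):
    """Element-wise absolute difference |x - y| (order-insensitive)."""
    return [abs(a - b) for a, b in zip(x, y)]


# Assumed name -> function mapping, mirroring the names listed on this page.
VECTOR_FUNCTIONS = {
    "Hadamard": hadamard,
    "Average": average,
    "Minus": minus,
    "Abs. Minus": abs_minus,
}

# Hypothetical 3-dimensional node vectors, used only to exercise the functions.
x = [1.0, -2.0, 0.5]
y = [3.0, 1.0, 0.5]
pair_features = {name: f(x, y) for name, f in VECTOR_FUNCTIONS.items()}
```

Of these, only Minus depends on the order of the two nodes, which can matter for directed edges such as patent citations (P→P).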
Related work
  • Recent developments in representation learning have shed light on reducing the dependence of feature engineering on human knowledge and labor [7, 24, 28]. The goal of representation learning is to automatically learn useful latent representations of data that are effective and discriminative as input features to supervised machine learning algorithms for various prediction tasks. Among the various approaches to representation learning, neural network based learning models have received significant attention in recent years and have achieved success in empirical studies across several domains, including speech recognition [12, 22], computer vision [9, 16], and natural language processing (NLP) [21].

    Recently, research on representation learning has been extended to network data [8, 10, 11, 13, 24, 25, 27, 28]. However, instead of the complex heterogeneous information networks (HINs) targeted in this paper, some prior works focus only on learning node vectors in homogeneous information networks [10, 24, 28]. Moreover, while they all claim that their approaches are able to capture the embedded structures of information networks, these models tend to consider only aggregated information among nodes or limited types of relationships. For instance, DeepWalk [24] and node2vec [10] learn feature vectors of nodes by capturing the nearby neighborhood of each node, simulating uniform and parameterized random walks, respectively. LINE [28] captures 1-hop and 2-hop neighborhood relationships separately to learn two representations of nodes.
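
As a rough illustration of the uniform random walks attributed to DeepWalk above, the sketch below samples a fixed-length walk in which each step picks a neighbor uniformly at random. The toy adjacency list, the function name, and the seed are assumptions; node2vec's parameterized (biased) walks are not shown.

```python
import random


def uniform_random_walk(adj, start, length, rng):
    """Sample a walk of up to `length` nodes from `start`.

    Each step moves to a neighbor chosen uniformly at random. The walk
    stops early only if it reaches a node without neighbors.
    """
    walk = [start]
    while len(walk) < length:
        neighbors = adj[walk[-1]]
        if not neighbors:
            break
        walk.append(rng.choice(neighbors))
    return walk


# Hypothetical homogeneous toy graph: node id -> list of neighbor ids.
adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}

rng = random.Random(0)  # fixed seed so repeated runs give the same walk
walk = uniform_random_walk(adj, start=0, length=6, rng=rng)
```

Nodes that co-occur within such walks are treated as each other's context, which is why this family of models captures only an aggregated notion of neighborhood rather than typed relationships.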
Funding
  • This work is supported in part by the National Science Foundation under Grant Nos. IIS-1717084 and SMA-1360205.
References
  • 2009. BlogCatalog3. http://socialcomputing.asu.edu/datasets/BlogCatalog3. (2009).
  • 2014. USPTO PatentView. http://www.dev.patentsview.org/workshop/participants.html. (2014).
  • 2017. How can I download the whole dblp dataset. http://dblp.uni-trier.de/faq/How+can+I+download+the+whole+dblp+dataset. (2017).
  • 2017. Yelp dataset. https://www.yelp.com/dataset_challenge. (2017).
  • Joachim H Ahrens and Ulrich Dieter. 1989. An alias method for sampling from the normal distribution. Computing 42, 2-3 (1989), 159–170.
  • Albert-László Barabási and Réka Albert. 1999. Emergence of scaling in random networks. Science 286, 5439 (1999), 509–512.
  • Yoshua Bengio, Aaron Courville, and Pascal Vincent. 2013. Representation learning: A review and new perspectives. IEEE Transactions on Pattern Analysis and Machine Intelligence 35, 8 (2013), 1798–1828.
  • Shiyu Chang, Wei Han, Jiliang Tang, Guo-Jun Qi, Charu C Aggarwal, and Thomas S Huang. 2015. Heterogeneous network embedding via deep architectures. In Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 119–128.
  • Dan Ciregan, Ueli Meier, and Jürgen Schmidhuber. 2012. Multi-column deep neural networks for image classification. In Proceedings of the IEEE International Conference on Computer Vision and Pattern Recognition (CVPR 2012). IEEE.
  • Aditya Grover and Jure Leskovec. 2016. node2vec: Scalable Feature Learning for Networks. (2016).
  • Huan Gui, Jialu Liu, Fangbo Tao, Meng Jiang, Brandon Norick, and Jiawei Han. 2016. Large-Scale Embedding Learning in Heterogeneous Event Data. (2016).
  • Geoffrey Hinton, Li Deng, Dong Yu, George E Dahl, Abdel-rahman Mohamed, Navdeep Jaitly, Andrew Senior, Vincent Vanhoucke, Patrick Nguyen, and Tara N Sainath. 2012. Deep neural networks for acoustic modeling in speech recognition: The shared views of four research groups. IEEE Signal Processing Magazine 29, 6 (2012), 82–97.
  • Zhipeng Huang and Nikos Mamoulis. Heterogeneous Information Network Embedding for Meta Path based Proximity. arXiv preprint arXiv:1701.05291 (????).
  • Ming Ji, Jiawei Han, and Marina Danilevsky. 2011. Ranking-based classification of heterogeneous information networks. In Proceedings of the ACM International Conference on Knowledge Discovery and Data Mining (SIGKDD 2011).
  • Jon M Kleinberg. 2000. Navigation in a small world. Nature 406, 6798 (2000).
  • Alex Krizhevsky, Ilya Sutskever, and Geoffrey E Hinton. 2012. Imagenet classification with deep convolutional neural networks. In Proceedings of the Annual Conference on Neural Information Processing Systems (NIPS 2012). 1097–1105.
  • Richard A Kronmal and Arthur V Peterson Jr. 1979. On the alias method for generating random variables from a discrete distribution. The American Statistician 33, 4 (1979), 214–218.
  • Ni Lao and William W. Cohen. 2010. Relational Retrieval Using a Combination of Path-constrained Random Walks. Machine Learning 81 (Oct. 2010), 53–67.
  • Ni Lao and William W Cohen. 2010. Relational retrieval using a combination of path-constrained random walks. Machine learning 81, 1 (2010), 53–67.
  • David Liben-Nowell and Jon Kleinberg. 2007. The link-prediction problem for social networks. Journal of the American society for information science and technology 58, 7 (2007), 1019–1031.
  • Tomas Mikolov, Ilya Sutskever, Kai Chen, Greg S Corrado, and Jeff Dean. 2013. Distributed representations of words and phrases and their compositionality. In Proceedings of the Annual Conference on Neural Information Processing Systems (NIPS 2013). 3111–3119.
  • Abdel-rahman Mohamed, George E Dahl, and Geoffrey Hinton. 2012. Acoustic modeling using deep belief networks. IEEE Transactions on Audio, Speech, and Language Processing 20, 1 (2012), 14–22.
  • Tore Opsahl and Pietro Panzarasa. 2009. Clustering in weighted networks. Social networks 31, 2 (2009), 155–163.
  • Bryan Perozzi, Rami Al-Rfou, and Steven Skiena. 2014. Deepwalk: Online learning of social representations. In Proceedings of the ACM International Conference on Knowledge Discovery and Data Mining (SIGKDD 2014). ACM, 701–710.
  • Jingbo Shang, Meng Qu, Jialu Liu, Lance M Kaplan, Jiawei Han, and Jian Peng. 2016. Meta-Path Guided Embedding for Similarity Search in Large-Scale Heterogeneous Information Networks. arXiv preprint arXiv:1610.09769 (2016).
  • Yizhou Sun, Rick Barber, Manish Gupta, Charu C. Aggarwal, and Jiawei Han. 2011. Co-author Relationship Prediction in Heterogeneous Bibliographic Networks. In Proc. of the 2011 International Conference on Advances in Social Networks Analysis and Mining (ASONAM 2011). Kaohsiung, Taiwan, 121–128.
  • Jian Tang, Meng Qu, and Qiaozhu Mei. 2015. Pte: Predictive text embedding through large-scale heterogeneous text networks. In Proceedings of the ACM International Conference on Knowledge Discovery and Data Mining (SIGKDD 2015).
  • Jian Tang, Meng Qu, Mingzhe Wang, Ming Zhang, Jun Yan, and Qiaozhu Mei. 2015. LINE: Large-scale information network embedding. In Proceedings of the International Conference on World Wide Web (WWW 2015). ACM, 1067–1077.
  • Duncan J Watts and Steven H Strogatz. 1998. Collective dynamics of small-world networks. Nature 393, 6684 (1998), 440–442.