Representation Learning for Attributed Multiplex Heterogeneous Network

KDD, 2019.

Keywords:
heterogeneous network, multiplex network, network embedding
Weibo:
We focus on embedding learning for Attributed MHEN, where different types of nodes might be linked with multiple different types of edges, and each node is associated with a set of different attributes

Abstract:

Network embedding (or graph embedding) has been widely used in many real-world applications. However, existing methods mainly focus on networks with single-typed nodes/edges and cannot scale well to handle large networks. Many real-world networks consist of billions of nodes and edges of multiple types, and each node is associated with different attributes.

Introduction
  • Network embedding [4], or network representation learning, is a promising method to project the nodes of a network into a low-dimensional continuous space while preserving network structure and inherent properties.
  • NetMF [29] gives a theoretical analysis showing the equivalence of several network embedding algorithms as matrix factorization (see the closed form sketched after this list), and NetSMF [28] later gives a scalable solution via sparsification.
  • These methods were designed to handle only homogeneous networks with single-typed nodes and edges.
  • Among these network types, the attributed multiplex heterogeneous network (AMHEN) has been the least studied.
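    For context, the closed form reported by NetMF (recalled here from [29]; notation may differ slightly from the paper) is that DeepWalk with window size T and b negative samples implicitly factorizes the matrix

        \log\left( \frac{\mathrm{vol}(G)}{bT} \left( \sum_{r=1}^{T} (D^{-1}A)^{r} \right) D^{-1} \right),

    where A is the adjacency matrix, D the diagonal degree matrix, and \mathrm{vol}(G) = \sum_{i,j} A_{ij}. NetSMF [28] sparsifies this dense matrix before factorizing it.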
Highlights
  • Network embedding [4], or network representation learning, is a promising method to project the nodes of a network into a low-dimensional continuous space while preserving network structure and inherent properties.
  • We focus on embedding learning for the Attributed MHEN (AMHEN), where different types of nodes might be linked by multiple different types of edges, and each node is associated with a set of different attributes.
  • To address the above challenges, we propose a novel approach that captures rich attribute information and exploits the multiplex topological structures between different node types, namely GATNE (General Attributed Multiplex HeTerogeneous Network Embedding).
  • We focus on the representative work metapath2vec [7], which is designed to deal with node heterogeneity.
  • On the YouTube and Twitter datasets, GATNE-I performs comparably to GATNE-T, as the node attributes of these two datasets are the DeepWalk node embeddings generated from the network structure.
  • The base embedding and attribute embedding are shared among edges of different types, while the edge embedding is computed by aggregating neighborhood information with a self-attention mechanism (sketched schematically below).
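    Schematically (reconstructed here with simplified notation; the exact parameterization is given in the paper), the overall embedding of node v on edge type r combines the three parts as

        v_{v,r} = b_v + \alpha_r\, M_r^{\top} U_v\, a_{v,r} + \beta_r\, D_z^{\top} x_v,
        \qquad a_{v,r} = \mathrm{softmax}\!\left( w_r^{\top} \tanh(W_r U_v) \right)^{\top},

    where b_v is the base embedding, U_v stacks the edge embeddings of v over all edge types, a_{v,r} are the self-attention weights, and x_v are the node attributes; the attribute term and the attribute-based parameterization of b_v apply only to the inductive variant GATNE-I.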
Methods
  • GATNE-T considers only the network structure and uses base embeddings and edge embeddings to capture the influential factors between different edge types.
  • GATNE-I considers both the network structure and the node attributes, and learns inductive transformation functions instead of learning the base embeddings and edge embeddings for each node directly.
  • For the Alibaba dataset, the authors use the same meta-path schemes as metapath2vec (a minimal walk generator is sketched after this list).
  • Because the Alibaba dataset contains more than 40 million nodes and 500 million edges and the other competitors do not scale to this size, the authors only compare GATNE with DeepWalk, MVE, and MNE.
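    As an illustration of such meta-path-guided walks, here is a minimal sketch (not the authors' implementation; the toy graph, schema, and function name are hypothetical):

    import random

    def metapath_walk(neighbors, node_type, start, schema, walk_length):
        # Follow a meta-path schema such as ['user', 'item', 'user'] (the last type
        # equals the first, so the schema is cycled), restricting every step to
        # neighbors whose type matches the next position in the schema.
        walk, cur = [start], start
        for i in range(1, walk_length):
            wanted = schema[i % (len(schema) - 1)]
            candidates = [n for n in neighbors[cur] if node_type[n] == wanted]
            if not candidates:
                break
            cur = random.choice(candidates)
            walk.append(cur)
        return walk

    # Toy usage on a hypothetical user-item graph.
    neighbors = {'u1': ['i1', 'i2'], 'u2': ['i1'], 'i1': ['u1', 'u2'], 'i2': ['u1']}
    node_type = {'u1': 'user', 'u2': 'user', 'i1': 'item', 'i2': 'item'}
    print(metapath_walk(neighbors, node_type, 'u1', ['user', 'item', 'user'], 6))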
Results
  • Results on Alibaba dataset

    In the four datasets used in the paper, 20.3% (Twitter), 21.6% (YouTube), 15.1% (Amazon), and 16.3% (Alibaba) of the linked node pairs have more than one type of edge.
  • In an e-commerce system, users may have several types of interactions with items, such as click, conversion, add-to-cart, and add-to-preference.
  • GATNE-T obtains better performance than GATNE-I on the Amazon dataset, as the node attributes there are limited.
  • The node attributes of the Alibaba dataset are abundant, so GATNE-I obtains the best performance (the inductive use of attributes is sketched after this list).
  • ANRL is very sensitive to weak node attributes and obtains the worst result on the Amazon dataset.
  • The differing node attributes of users and items limit the performance of ANRL on the Alibaba-S dataset.
  • On the YouTube and Twitter datasets, GATNE-I performs comparably to GATNE-T, as the node attributes of these two datasets are the DeepWalk node embeddings generated from the network structure.
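    To make the role of attributes concrete, the following is a minimal sketch of the inductive idea (shapes, the linear map, and the function name are illustrative assumptions, not the authors' code): a transductive model keeps one trainable vector per observed node, whereas an inductive model derives the base embedding from the node attributes, so unseen nodes can be embedded but weak attributes carry little signal.

    import numpy as np

    rng = np.random.default_rng(0)
    attr_dim, embed_dim, num_nodes = 16, 8, 100

    # Transductive style: a lookup table with one trainable vector per observed node.
    base_table = rng.normal(size=(num_nodes, embed_dim))

    # Inductive style: a parameterized transformation of node attributes
    # (here a single linear map; the paper's transformation can be any such function).
    W = rng.normal(size=(attr_dim, embed_dim))

    def base_embedding_inductive(x):
        # x: attribute vector of a node, possibly unseen during training.
        return np.tanh(x @ W)

    x_new = rng.normal(size=attr_dim)
    print(base_embedding_inductive(x_new).shape)  # (8,)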
Conclusion
  • The authors formalized the attributed multiplex heterogeneous network embedding problem and proposed GATNE to solve it with both transductive and inductive settings.
  • The authors split the overall node embedding of GATNE-I into three parts: base embedding, edge embedding, and attribute embedding.
  • The base embedding and attribute embedding are shared among edges of different types, while the edge embedding is computed by aggregating neighborhood information with a self-attention mechanism.
  • The authors' proposed methods achieve significantly better performance than previous state-of-the-art methods on link prediction tasks across multiple challenging datasets (a minimal evaluation sketch follows this list).
  • The approach has been successfully deployed and evaluated on Alibaba's recommendation system with excellent scalability and effectiveness.
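    As a rough illustration of link-prediction evaluation with learned embeddings (the cosine scoring, threshold, and metrics below are assumptions for the sketch, not necessarily the paper's exact protocol):

    import numpy as np
    from sklearn.metrics import roc_auc_score, f1_score

    def cosine(u, v, emb):
        a, b = emb[u], emb[v]
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

    def evaluate(pos_edges, neg_edges, emb, threshold=0.5):
        # Score held-out true edges against sampled non-edges.
        y_true = [1] * len(pos_edges) + [0] * len(neg_edges)
        y_score = [cosine(u, v, emb) for u, v in pos_edges + neg_edges]
        y_pred = [int(s >= threshold) for s in y_score]
        return roc_auc_score(y_true, y_score), f1_score(y_true, y_pred)

    # Toy usage with random embeddings for four hypothetical nodes.
    rng = np.random.default_rng(0)
    emb = {n: rng.random(8) for n in ['a', 'b', 'c', 'd']}
    print(evaluate([('a', 'b')], [('c', 'd')], emb))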
Tables
  • Table 1: The network types handled by different methods
  • Table 2: Notations
  • Table 3: Statistics of Datasets
  • Table 4: Performance comparison of different methods on four datasets
  • Table 5: The experimental results on Alibaba dataset
  • Table 6: Statistics of Original Datasets
Related work
  • In this section, we review related state-of-the-art work on network embedding, heterogeneous network embedding, multiplex heterogeneous network embedding, and attributed network embedding.

    Network Embedding. Work on network embedding falls mainly into two categories: graph embedding (GE) and graph neural networks (GNN). Representative GE methods include DeepWalk [27], which generates a corpus on graphs by random walks and then trains a skip-gram model on that corpus. LINE [35] learns node representations on large-scale networks while preserving both first-order and second-order proximities. node2vec [10] designs a biased random walk procedure to efficiently explore diverse neighborhoods. NetMF [29] is a unified matrix factorization framework for theoretically understanding and improving DeepWalk and LINE. Among popular GNN methods, GCN [19] incorporates neighbors' feature representations into the node feature representation using convolutional operations. GraphSAGE [11] provides an inductive approach to combine structural information with node features; it learns functional representations instead of direct embeddings for each node, which helps it work inductively on nodes unobserved during training.
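    As a brief illustration of the DeepWalk-style pipeline described above, here is a minimal sketch assuming gensim and networkx are available (the toy graph and hyperparameters are arbitrary choices, not those of any cited paper):

    import random
    import networkx as nx
    from gensim.models import Word2Vec

    G = nx.karate_club_graph()  # small toy graph

    def random_walks(graph, num_walks=10, walk_length=20):
        walks, nodes = [], list(graph.nodes())
        for _ in range(num_walks):
            random.shuffle(nodes)
            for start in nodes:
                walk, cur = [start], start
                for _ in range(walk_length - 1):
                    nbrs = list(graph.neighbors(cur))
                    if not nbrs:
                        break
                    cur = random.choice(nbrs)
                    walk.append(cur)
                walks.append([str(n) for n in walk])
        return walks

    # Train a skip-gram model on the walk "corpus" (sg=1 selects skip-gram).
    model = Word2Vec(random_walks(G), vector_size=64, window=5, min_count=0, sg=1, workers=2)
    print(model.wv['0'].shape)  # 64-dimensional embedding of node 0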
Funding
  • The work is supported by the NSFC for Distinguished Young Scholars (61825602), NSFC (61836013), and a research fund from Alibaba Group.
Reference
  • [1] Smriti Bhagat, Graham Cormode, and S Muthukrishnan. 2011. Node classification in social networks. In Social Network Data Analytics. Springer, 115–148.
  • [2] Aleksandar Bojchevski and Stephan Günnemann. 2018. Deep Gaussian Embedding of Graphs: Unsupervised Inductive Learning via Ranking. In ICLR'18.
  • [3] Shiyu Chang, Wei Han, Jiliang Tang, Guo-Jun Qi, Charu C Aggarwal, and Thomas S Huang. 2015. Heterogeneous network embedding via deep architectures. In KDD'15. ACM, 119–128.
  • [4] Peng Cui, Xiao Wang, Jian Pei, and Wenwu Zhu. 2018. A survey on network embedding. TKDE (2018).
  • [5] Jesse Davis and Mark Goadrich. 2006. The relationship between Precision-Recall and ROC curves. In ICML'06. ACM, 233–240.
  • [6] Manlio De Domenico, Antonio Lima, Paul Mougel, and Mirco Musolesi. 2013. The anatomy of a scientific rumor. Scientific Reports 3 (2013), 2980.
  • [7] Yuxiao Dong, Nitesh V Chawla, and Ananthram Swami. 2017. metapath2vec: Scalable representation learning for heterogeneous networks. In KDD'17. ACM, 135–144.
  • [8] Santo Fortunato. 2010. Community detection in graphs. Physics Reports 486, 3-5 (2010), 75–174.
  • [9] Hongchang Gao and Heng Huang. 2018. Deep Attributed Network Embedding. In IJCAI'18. 3364–3370.
  • [10] Aditya Grover and Jure Leskovec. 2016. node2vec: Scalable feature learning for networks. In KDD'16. ACM, 855–864.
  • [11] Will Hamilton, Zhitao Ying, and Jure Leskovec. 2017. Inductive representation learning on large graphs. In NIPS'17. 1024–1034.
  • [12] James A Hanley and Barbara J McNeil. 1982. The meaning and use of the area under a receiver operating characteristic (ROC) curve. Radiology 143, 1 (1982), 29–36.
  • [13] Ruining He and Julian McAuley. 2016. Ups and downs: Modeling the visual evolution of fashion trends with one-class collaborative filtering. In WWW'16. 507–517.
  • [14] Sepp Hochreiter and Jürgen Schmidhuber. 1997. Long short-term memory. Neural Computation 9, 8 (1997), 1735–1780.
  • [15] Xiao Huang, Jundong Li, and Xia Hu. 2017. Accelerated attributed network embedding. In SDM'17. SIAM, 633–641.
  • [16] Xiao Huang, Jundong Li, and Xia Hu. 2017. Label informed attributed network embedding. In WSDM'17. ACM, 731–739.
  • [17] Diederik P Kingma and Jimmy Ba. 2014. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014).
  • [18] Thomas N Kipf and Max Welling. 2016. Variational graph auto-encoders. arXiv preprint arXiv:1611.07308 (2016).
  • [19] Thomas N Kipf and Max Welling. 2017. Semi-Supervised Classification with Graph Convolutional Networks. In ICLR'17.
  • [20] Lizi Liao, Xiangnan He, Hanwang Zhang, and Tat-Seng Chua. 2018. Attributed social network embedding. TKDE 30, 12 (2018), 2257–2270.
  • [21] Zhouhan Lin, Minwei Feng, Cicero Nogueira dos Santos, Mo Yu, Bing Xiang, Bowen Zhou, and Yoshua Bengio. 2017. A structured self-attentive sentence embedding. In ICLR'17.
  • [22] Weiyi Liu, Pin-Yu Chen, Sailung Yeung, Toyotaro Suzumura, and Lingli Chen. 2017. Principled multilayer network embedding. In ICDMW'17. IEEE, 134–141.
  • [23] Julian McAuley, Christopher Targett, Qinfeng Shi, and Anton Van Den Hengel. 2015. Image-based recommendations on styles and substitutes. In SIGIR'15. ACM, 43–52.
  • [24] Tomas Mikolov, Kai Chen, Greg Corrado, and Jeffrey Dean. 2013. Efficient estimation of word representations in vector space. In ICLR'13.
  • [25] Tomas Mikolov, Ilya Sutskever, Kai Chen, Greg S Corrado, and Jeff Dean. 2013. Distributed representations of words and phrases and their compositionality. In NIPS'13.
  • [26] Sankar K Pal and Sushmita Mitra. 1992. Multilayer Perceptron, Fuzzy Sets, and Classification. (1992).
  • [27] Bryan Perozzi, Rami Al-Rfou, and Steven Skiena. 2014. DeepWalk: Online learning of social representations. In KDD'14. ACM, 701–710.
  • [28] Jiezhong Qiu, Yuxiao Dong, Hao Ma, Jian Li, Chi Wang, Kuansan Wang, and Jie Tang. 2019. NetSMF: Large-Scale Network Embedding as Sparse Matrix Factorization. In WWW'19.
  • [29] Jiezhong Qiu, Yuxiao Dong, Hao Ma, Jian Li, Kuansan Wang, and Jie Tang. 2018. Network Embedding as Matrix Factorization: Unifying DeepWalk, LINE, PTE, and node2vec. In WSDM'18.
  • [30] Meng Qu, Jian Tang, Jingbo Shang, Xiang Ren, Ming Zhang, and Jiawei Han. 2017. An Attention-based Collaboration Framework for Multi-View Network Representation Learning. In CIKM'17. ACM, 1767–1776.
  • [31] Chuan Shi, Binbin Hu, Xin Zhao, and Philip Yu. 2018. Heterogeneous Information Network Embedding for Recommendation. TKDE (2018).
  • [32] Yu Shi, Fangqiu Han, Xinran He, Carl Yang, Jie Luo, and Jiawei Han. 2018. mvn2vec: Preservation and Collaboration in Multi-View Network Embedding. arXiv preprint arXiv:1801.06597 (2018).
  • [33] Yizhou Sun, Brandon Norick, Jiawei Han, Xifeng Yan, Philip S Yu, and Xiao Yu. 2013. PathSelClus: Integrating meta-path selection with user-guided object clustering in heterogeneous information networks. TKDD 7, 3 (2013), 11.
  • [34] Jian Tang, Meng Qu, and Qiaozhu Mei. 2015. PTE: Predictive text embedding through large-scale heterogeneous text networks. In KDD'15. ACM, 1165–1174.
  • [35] Jian Tang, Meng Qu, Mingzhe Wang, Ming Zhang, Jun Yan, and Qiaozhu Mei. 2015. LINE: Large-scale information network embedding. In WWW'15. 1067–1077.
  • [36] Lei Tang and Huan Liu. 2009. Uncovering cross-dimension group structures in multi-dimensional networks. In SDM Workshop on Analysis of Dynamic Networks. ACM, 568–575.
  • [37] Lei Tang, Suju Rajan, and Vijay K Narayanan. 2009. Large scale multi-label classification via metalabeler. In WWW'09. ACM, 211–220.
  • [38] Lei Tang, Xufei Wang, and Huan Liu. 2009. Uncovering groups via heterogeneous interaction analysis. In ICDM'09. IEEE, 503–512.
  • [39] Ben Taskar, Ming-Fai Wong, Pieter Abbeel, and Daphne Koller. 2004. Link prediction in relational data. In NIPS'04. 659–666.
  • [40] Jizhe Wang, Pipei Huang, Huan Zhao, Zhibo Zhang, Binqiang Zhao, and Dik Lun Lee. 2018. Billion-scale Commodity Embedding for E-commerce Recommendation in Alibaba. In KDD'18. 839–848.
  • [41] Cheng Yang, Zhiyuan Liu, Deli Zhao, Maosong Sun, and Edward Y Chang. 2015. Network representation learning with rich text information. In IJCAI'15. 2111–2117.
  • [42] Zhilin Yang, William W Cohen, and Ruslan Salakhutdinov. 2016. Revisiting semi-supervised learning with graph embeddings. In ICML'16. 40–48.
  • [43] Hongming Zhang, Liwei Qiu, Lingling Yi, and Yangqiu Song. 2018. Scalable Multiplex Network Embedding. In IJCAI'18. 3082–3088.
  • [44] Zhen Zhang, Hongxia Yang, Jiajun Bu, Sheng Zhou, Pinggang Yu, Jianwei Zhang, Martin Ester, and Can Wang. 2018. ANRL: Attributed Network Representation Learning via Deep Neural Networks. In IJCAI'18. 3155–3161.