
Network Embedding as Matrix Factorization: Unifying DeepWalk, LINE, PTE, and node2vec.

WSDM 2018: The Eleventh ACM International Conference on Web Search and Data Mining, Marina De..., (2018), 459–467


Abstract

Since the invention of word2vec, the skip-gram model has significantly advanced the research of network embedding, such as the recent emergence of the DeepWalk, LINE, PTE, and node2vec approaches. In this work, we show that all of the aforementioned models with negative sampling can be unified into the matrix factorization framework with closed forms.

Introduction
  • The conventional paradigm of mining and learning with networks usually starts from the explicit exploration of their structural properties [13, 32].
  • Many such properties, betweenness centrality among them, have been extensively studied.
  • The closed-form matrices (Table 1): DeepWalk implicitly factorizes $\log\left(\frac{\operatorname{vol}(G)}{T}\left(\sum_{r=1}^{T}(D^{-1}A)^{r}\right)D^{-1}\right)-\log b$; LINE factorizes $\log\left(\operatorname{vol}(G)\,D^{-1}AD^{-1}\right)-\log b$; PTE factorizes a block matrix, of the same log-minus-$\log b$ form, whose word-word block is $\alpha\,\operatorname{vol}(G_{ww})\,(D_{\mathrm{row}}^{ww})^{-1}A_{ww}(D_{\mathrm{col}}^{ww})^{-1}$. A runnable sketch of the DeepWalk case follows this list.
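To make the DeepWalk closed form concrete, here is a minimal NumPy sketch (ours, not the paper's released code) that builds the matrix above for a small dense adjacency matrix and embeds it by truncated SVD; the elementwise log(max(M, 1)) follows NetMF's truncation of the logarithm.

```python
import numpy as np

def deepwalk_matrix(A, T=10, b=1.0):
    """Closed-form matrix DeepWalk implicitly factorizes (Table 1):
    log( vol(G)/(b*T) * (sum_{r=1}^T (D^-1 A)^r) * D^-1 ),
    with NetMF's elementwise truncation max(., 1) before the log."""
    vol = A.sum()                          # vol(G): sum of all degrees
    d = A.sum(axis=1)                      # degree vector
    P = A / d[:, None]                     # transition matrix D^-1 A
    S, Pr = np.zeros_like(P), np.eye(len(A))
    for _ in range(T):                     # accumulate sum_{r=1}^T P^r
        Pr = Pr @ P
        S += Pr
    M = vol / (b * T) * S / d[None, :]     # right-multiply by D^-1
    return np.log(np.maximum(M, 1.0))

def embed(A, dim=2, T=10, b=1.0):
    """NetMF-style embedding: rank-dim SVD of the closed-form matrix."""
    U, s, _ = np.linalg.svd(deepwalk_matrix(A, T, b))
    return U[:, :dim] * np.sqrt(s[:dim])
```

Setting T = 1 reduces the matrix to vol(G)/b * D^-1 A D^-1, matching the paper's observation that LINE is the special case of DeepWalk with a context window of size one.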
Highlights
  • The conventional paradigm of mining and learning with networks usually starts from the explicit exploration of their structural properties [13, 32]
  • We focus on skip-gram with negative sampling (SGNS)
  • Due to space limitations, the rest of the paper mainly focuses on the matrix factorization framework built on the 1st-order random walk (DeepWalk)
  • We provide a theoretical analysis of four impactful network embedding methods—DeepWalk, LINE, PTE, and node2vec—that were recently proposed between the years 2014 and 2016
  • We show that all of the four methods are essentially performing implicit matrix factorizations, and the closed forms of their matrices offer not only the relationships between those methods but also their intrinsic connections with graph Laplacian
  • It would be exciting to study the nature of skip-gram based dynamic and heterogeneous network embedding.
Methods
  • Baseline methods: the authors compare NetMF (T = 1) and NetMF (T = 10) with LINE (2nd) [37] and DeepWalk [31], which were introduced in previous sections.
  • For NetMF (T = 10), the authors choose h = 16384 for Flickr, and h = 256 for BlogCatalog, PPI, and Wikipedia.
  • [Figure: Micro-F1 (%) of NetMF (T = 1), NetMF (T = 10), LINE, and DeepWalk on BlogCatalog, PPI, Wikipedia, and Flickr.] (Footnote 3: http://mattmahoney.net/dc/text.html)
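Here h is the number of eigenpairs retained when approximating the large-window matrix. Below is a dense sketch (ours) of that approximation, following the paper's described eigen-decomposition approach, with scipy's eigsh standing in for an Arnoldi solver; variable names are our own.

```python
import numpy as np
from scipy.sparse.linalg import eigsh

def netmf_large_window(A, h=256, T=10, b=1.0, dim=128):
    """Approximate NetMF for a large window T: keep the top-h eigenpairs
    of the normalized adjacency D^-1/2 A D^-1/2, apply the spectral
    filter (1/T) * sum_{r=1}^T lambda^r, then SVD the truncated log.
    Requires h < number of vertices and dim <= number of vertices."""
    vol = A.sum()
    dinv = 1.0 / np.sqrt(A.sum(axis=1))
    N = A * dinv[:, None] * dinv[None, :]       # D^-1/2 A D^-1/2
    lam, U = eigsh(N, k=h, which='LA')          # top-h (largest) eigenpairs
    filt = sum(lam ** r for r in range(1, T + 1)) / T
    M = (vol / b) * ((U * filt) @ U.T)          # U diag(filt) U^T, rescaled
    M = M * dinv[:, None] * dinv[None, :]       # wrap with D^-1/2 on both sides
    logM = np.log(np.maximum(M, 1.0))           # truncated logarithm
    Ud, s, _ = np.linalg.svd(logM)
    return Ud[:, :dim] * np.sqrt(s[:dim])
```

This mirrors the reported setup (h = 16384 for Flickr, h = 256 for the smaller datasets), although the paper's actual implementation operates on sparse matrices.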
Results
  • The authors list the main results of node2vec without proofs; the idea is similar to the analysis of DeepWalk. Writing $X$ for the stationary distribution and $P$ for the transition probabilities of the underlying 2nd-order random walk, the co-occurrence statistics converge in probability as
    $$\frac{\#(w,c,u)_{\to r}}{|\mathcal{D}|}\ \xrightarrow{p}\ X_{w,u}\,(P^{r})_{c,w,u} \qquad\text{and}\qquad \frac{\#(w,c,u)_{\leftarrow r}}{|\mathcal{D}|}\ \xrightarrow{p}\ X_{c,u}\,(P^{r})_{w,c,u},$$
    $$\frac{\#(w,c)_{\to r}}{|\mathcal{D}|}=\sum_{u}\frac{\#(w,c,u)_{\to r}}{|\mathcal{D}|}\ \xrightarrow{p}\ \sum_{u}X_{w,u}\,(P^{r})_{c,w,u}, \qquad \frac{\#(w,c)_{\leftarrow r}}{|\mathcal{D}|}\ \xrightarrow{p}\ \sum_{u}X_{c,u}\,(P^{r})_{w,c,u},$$
    $$\frac{\#(w,c)}{|\mathcal{D}|}\ \xrightarrow{p}\ \frac{1}{2T}\sum_{r=1}^{T}\left(\sum_{u}X_{w,u}\,(P^{r})_{c,w,u}+\sum_{u}X_{c,u}\,(P^{r})_{w,c,u}\right).$$
  • (2) In Wikipedia, NetMF (T = 1) shows better performance than the other methods in terms of Micro-F1, while LINE outperforms the other methods in terms of Macro-F1.
  • This observation implies that short-term dependence is enough to model Wikipedia's network structure.
  • Take the PPI dataset with 10% training data as an example: NetMF (T = 1) achieves relative gains of 46.34% and 33.85% over LINE (2nd) in Micro-F1 and Macro-F1, respectively; more impressively, NetMF (T = 10) outperforms DeepWalk by a relative 50.71% and 39.16% on the two metrics.
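The convergence claims in the node2vec bullet can be sanity-checked empirically. Below is a toy simulation (ours) of the simpler 1st-order (DeepWalk) analogue: sample random walks, count (word, context) pairs within a window of size T, and compare against the paper's closed form. Walks start from uniformly random vertices, which for long walks on a connected non-bipartite graph approximates the stationary start assumed by the analysis.

```python
import numpy as np

rng = np.random.default_rng(0)

def empirical_ratio(A, w, c, T=2, n_walks=5000, walk_len=40):
    """Estimate #(w,c)/|D| by sampling 1st-order random walks and
    collecting (word, context) pairs within window T, as in DeepWalk."""
    n = len(A)
    P = A / A.sum(axis=1, keepdims=True)
    hits = total = 0
    for _ in range(n_walks):
        v = rng.integers(n)
        walk = [v]
        for _ in range(walk_len - 1):
            v = rng.choice(n, p=P[v])
            walk.append(v)
        for i in range(len(walk)):
            for r in range(1, T + 1):
                if i + r < len(walk):
                    # each positioned pair enters D in both directions
                    for pair in ((walk[i], walk[i + r]),
                                 (walk[i + r], walk[i])):
                        total += 1
                        hits += pair == (w, c)
    return hits / total

def closed_form_ratio(A, w, c, T=2):
    """Paper's DeepWalk limit:
    #(w,c)/|D| -> (1/2T) sum_r ( d_w/vol(G)*(P^r)_wc + d_c/vol(G)*(P^r)_cw )."""
    d, vol = A.sum(axis=1), A.sum()
    P, Pr, acc = A / d[:, None], np.eye(len(A)), 0.0
    for _ in range(T):
        Pr = Pr @ P
        acc += d[w] / vol * Pr[w, c] + d[c] / vol * Pr[c, w]
    return acc / (2 * T)
```

On a small connected graph the two quantities agree to within sampling noise as n_walks grows.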
Conclusion
  • The authors provide a theoretical analysis of four impactful network embedding methods—DeepWalk, LINE, PTE, and node2vec—that were recently proposed between the years 2014 and 2016.
  • The authors show that all of the four methods are essentially performing implicit matrix factorizations, and the closed forms of their matrices offer not only the relationships between those methods but also their intrinsic connections with graph Laplacian.
  • The authors' extensive experiments suggest that NetMF’s direct factorization achieves consistent performance improvements over the implicit approximation models—DeepWalk and LINE.
  • It would be necessary to investigate whether and how the development in random-walk polynomials [9] can support fast approximations of the closed-form matrices.
  • It would be exciting to study the nature of skip-gram based dynamic and heterogeneous network embedding.
Tables
  • Table 1: The matrices that are implicitly approximated and factorized by DeepWalk, LINE, PTE, and node2vec
  • Table 2: Statistics of datasets
  • Table 3: Micro/Macro-F1 score (%) for multi-label classification on the BlogCatalog, PPI, Wikipedia, and Flickr datasets. In Flickr, 1% of vertices are labeled for training [31]; in the other three datasets, 10% of vertices are labeled for training
Related work
  • The story of network embedding stems from Spectral Clustering [5, 45], a data clustering technique that selects eigenvalues/eigenvectors of a data affinity matrix to obtain representations that can be clustered or embedded in a low-dimensional space. Spectral Clustering has been widely used in fields such as community detection [23] and image segmentation [33]. In recent years, there has been increasing interest in network embedding. Following a few pioneering works such as SocDim [38] and DeepWalk [31], a growing body of literature has addressed the problem from various perspectives, such as heterogeneous network embedding [8, 12, 20, 36], semi-supervised network embedding [17, 21, 44, 48], network embedding with rich vertex attributes [43, 47, 49], network embedding with high-order structure [6, 16], signed network embedding [10], directed network embedding [30], network embedding via deep neural networks [7, 25, 46], etc.

    Among the above research, a commonly used technique is to define the “context” for each vertex and then train a predictive model to perform context prediction. For example, DeepWalk [31], node2vec [16], and metapath2vec [12] define vertices' contexts by 1st-order, 2nd-order, and meta-path based random walks, respectively. The idea of leveraging context information is largely motivated by the skip-gram model with negative sampling (SGNS) [29]. Recently, there have been efforts to understand this model. For example, Levy and Goldberg [24] prove that SGNS is actually conducting an implicit matrix factorization, which provides a tool to analyze the above network embedding models; Arora et al. [1] propose a generative model, RAND-WALK, to explain word embedding models; and Hashimoto et al. [18] frame word embedding as a metric learning problem. Built upon the work in [24], we theoretically analyze popular skip-gram based network embedding models and connect them with spectral graph theory.
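For context, the identity from Levy and Goldberg [24] that this analysis builds on can be stated compactly; notation follows the paper, with b the number of negative samples:

```latex
% SGNS with b negative samples implicitly factorizes the shifted PMI matrix
% whose (w, c) entry is given below (Levy and Goldberg [24]); the paper's
% analysis instantiates the counts #(w,c), #(w), #(c), and |D| with
% random-walk co-occurrence statistics over the network.
\[
  \log\!\left(\frac{\#(w,c)\,\lvert\mathcal{D}\rvert}{\#(w)\,\#(c)}\right) - \log b
\]
```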
Funding
  • Jiezhong Qiu and Jie Tang are supported by NSFC 61561130160
  • Jian Li is supported in part by the National Basic Research Program of China Grant 2015CB358700, and NSFC 61772297 & 61632016
References
  • Sanjeev Arora, Yuanzhi Li, Yingyu Liang, Tengyu Ma, and Andrej Risteski. 2016. A latent variable model approach to PMI-based word embeddings. TACL 4 (2016), 385–399.
  • Yoshua Bengio, Aaron Courville, and Pascal Vincent. 2013. Representation learning: A review and new perspectives. IEEE TPAMI 35, 8 (2013), 1798–1828.
  • Austin R Benson, David F Gleich, and Jure Leskovec. 2015. Tensor spectral clustering for partitioning higher-order network structures. In SDM. SIAM, 118–126.
  • Austin R Benson, David F Gleich, and Lek-Heng Lim. 2017. The Spacey Random Walk: A Stochastic Process for Higher-Order Data. SIAM Rev. 59, 2 (2017), 321–345.
  • Matthew Brand and Kun Huang. 2003. A unifying theorem for spectral embedding and clustering. In AISTATS.
  • Shaosheng Cao, Wei Lu, and Qiongkai Xu. 2015. GraRep: Learning graph representations with global structural information. In CIKM. ACM, 891–900.
  • Shaosheng Cao, Wei Lu, and Qiongkai Xu. 2016. Deep Neural Networks for Learning Graph Representations. In AAAI. 1145–1152.
  • Shiyu Chang, Wei Han, Jiliang Tang, Guo-Jun Qi, Charu C Aggarwal, and Thomas S Huang. 2015. Heterogeneous network embedding via deep architectures. In KDD. ACM, 119–128.
  • Dehua Cheng, Yu Cheng, Yan Liu, Richard Peng, and Shang-Hua Teng. 2015. Spectral Sparsification of Random-Walk Matrix Polynomials. arXiv preprint arXiv:1502.03496 (2015).
  • Kewei Cheng, Jundong Li, and Huan Liu. 2017. Unsupervised Feature Selection in Signed Social Networks. In KDD. ACM.
  • Fan RK Chung. 1997. Spectral Graph Theory. Number 92. American Mathematical Soc.
  • Yuxiao Dong, Nitesh V Chawla, and Ananthram Swami. 2017. metapath2vec: Scalable Representation Learning for Heterogeneous Networks. In KDD.
  • David Easley and Jon Kleinberg. 2010. Networks, Crowds, and Markets: Reasoning About a Highly Connected World. Cambridge University Press.
  • Rong-En Fan, Kai-Wei Chang, Cho-Jui Hsieh, Xiang-Rui Wang, and Chih-Jen Lin. 2008. LIBLINEAR: A library for large linear classification. JMLR 9, Aug (2008), 1871–1874.
  • David F Gleich, Lek-Heng Lim, and Yongyang Yu. 2015. Multilinear PageRank. SIAM J. Matrix Anal. Appl. 36, 4 (2015), 1507–1541.
  • Aditya Grover and Jure Leskovec. 2016. node2vec: Scalable feature learning for networks. In KDD. ACM, 855–864.
  • William L. Hamilton, Rex Ying, and Jure Leskovec. 2017. Inductive Representation Learning on Large Graphs. In NIPS.
  • Tatsunori B Hashimoto, David Alvarez-Melis, and Tommi S Jaakkola. 2016. Word embeddings as metric recovery in semantic spaces. TACL 4 (2016), 273–286.
  • Roger A. Horn and Charles R. Johnson. 1991. Topics in Matrix Analysis. Cambridge University Press. https://doi.org/10.1017/CBO9780511840371
  • Yann Jacob, Ludovic Denoyer, and Patrick Gallinari. 2014. Learning latent representations of nodes for classifying in heterogeneous social networks. In WSDM. ACM, 373–382.
  • Thomas N Kipf and Max Welling. 2016. Semi-Supervised Classification with Graph Convolutional Networks. arXiv preprint arXiv:1609.02907 (2016).
  • Richard B Lehoucq, Danny C Sorensen, and Chao Yang. 1998. ARPACK Users' Guide: Solution of Large-Scale Eigenvalue Problems with Implicitly Restarted Arnoldi Methods. SIAM.
  • Jure Leskovec, Kevin J Lang, and Michael Mahoney. 2010. Empirical comparison of algorithms for network community detection. In WWW. ACM, 631–640.
  • Omer Levy and Yoav Goldberg. 2014. Neural Word Embedding as Implicit Matrix Factorization. In NIPS. 2177–2185.
  • Hang Li, Haozheng Wang, Zhenglu Yang, and Masato Odagaki. 2017. Variation Autoencoder Based Network Representation Learning for Classification. In ACL. 56.
  • László Lovász. 1993. Random walks on graphs. Combinatorics, Paul Erdős is Eighty 2 (1993), 1–46.
  • Qing Lu and Lise Getoor. 2003. Link-based Classification. In ICML.
  • Tomas Mikolov, Kai Chen, Greg Corrado, and Jeffrey Dean. 2013. Efficient estimation of word representations in vector space. arXiv preprint arXiv:1301.3781 (2013).
  • Tomas Mikolov, Ilya Sutskever, Kai Chen, Greg S Corrado, and Jeff Dean. 2013. Distributed representations of words and phrases and their compositionality. In NIPS. 3111–3119.
  • Mingdong Ou, Peng Cui, Jian Pei, Ziwei Zhang, and Wenwu Zhu. 2016. Asymmetric Transitivity Preserving Graph Embedding. In KDD. 1105–1114.
  • Bryan Perozzi, Rami Al-Rfou, and Steven Skiena. 2014. DeepWalk: Online learning of social representations. In KDD. ACM, 701–710.
  • Philip S Yu, Jiawei Han, and Christos Faloutsos. 2010. Link Mining: Models, Algorithms, and Applications. Springer.
  • Jianbo Shi and Jitendra Malik. 2000. Normalized cuts and image segmentation. IEEE TPAMI 22, 8 (2000), 888–905.
  • A.N. Shiryaev and A. Lyasoff. 2012. Problems in Probability. Springer New York.
  • Chris Stark, Bobby-Joe Breitkreutz, Andrew Chatr-Aryamontri, Lorrie Boucher, Rose Oughtred, Michael S Livstone, Julie Nixon, Kimberly Van Auken, Xiaodong Wang, Xiaoqi Shi, et al. 2010. The BioGRID interaction database: 2011 update. Nucleic Acids Research 39, suppl_1 (2010), D698–D704.
  • Jian Tang, Meng Qu, and Qiaozhu Mei. 2015. PTE: Predictive text embedding through large-scale heterogeneous text networks. In KDD. ACM, 1165–1174.
  • Jian Tang, Meng Qu, Mingzhe Wang, Ming Zhang, Jun Yan, and Qiaozhu Mei. 2015. LINE: Large-scale information network embedding. In WWW. 1067–1077.
  • Lei Tang and Huan Liu. 2009. Relational learning via latent social dimensions. In KDD. ACM, 817–826.
  • Lei Tang, Suju Rajan, and Vijay K Narayanan. 2009. Large scale multi-label classification via metalabeler. In WWW. ACM, 211–220.
  • Kristina Toutanova, Dan Klein, Christopher D Manning, and Yoram Singer. 2003. Feature-rich part-of-speech tagging with a cyclic dependency network. In NAACL. Association for Computational Linguistics, 173–180.
  • Lloyd N Trefethen and David Bau III. 1997. Numerical Linear Algebra. Vol. 50. SIAM.
  • Grigorios Tsoumakas, Ioannis Katakis, and Ioannis Vlahavas. 2009. Mining multi-label data. In Data Mining and Knowledge Discovery Handbook. Springer, 667–685.
  • Cunchao Tu, Han Liu, Zhiyuan Liu, and Maosong Sun. 2017. CANE: Context-aware network embedding for relation modeling. In ACL.
  • Cunchao Tu, Weicheng Zhang, Zhiyuan Liu, and Maosong Sun. 2016. Max-Margin DeepWalk: Discriminative Learning of Network Representation. In IJCAI. 3889–3895.
  • Ulrike Von Luxburg. 2007. A tutorial on spectral clustering. Statistics and Computing 17, 4 (2007), 395–416.
  • Daixin Wang, Peng Cui, and Wenwu Zhu. 2016. Structural deep network embedding. In KDD. ACM, 1225–1234.
  • Cheng Yang, Zhiyuan Liu, Deli Zhao, Maosong Sun, and Edward Y Chang. 2015. Network Representation Learning with Rich Text Information. In IJCAI. 2111–2117.
  • Zhilin Yang, William W. Cohen, and Ruslan Salakhutdinov. 2016. Revisiting Semi-Supervised Learning with Graph Embeddings. In ICML. 40–48.
  • Zhilin Yang, Jie Tang, and William W Cohen. 2016. Multi-Modal Bayesian Embeddings for Learning Social Knowledge Graphs. In IJCAI. 2287–2293.