
# Network Embedding as Matrix Factorization: Unifying DeepWalk, LINE, PTE, and node2vec

WSDM 2018: The Eleventh ACM International Conference on Web Search and Data Mining, Marina De... (2018), 459–467


Abstract

Since the invention of word2vec, the skip-gram model has significantly advanced the research of network embedding, such as the recent emergence of the DeepWalk, LINE, PTE, and node2vec approaches. In this work, we show that all of the aforementioned models with negative sampling can be unified into the matrix factorization framework with …


Introduction

- The conventional paradigm of mining and learning with networks usually starts from the explicit exploration of their structural properties [13, 32].
- Many such properties, such as betweenness centrality, characterize the structure of a network.
- The matrices implicitly factorized by the four methods (cf. Table 1) take closed forms; for instance, DeepWalk's matrix is $\log\left(\frac{\operatorname{vol}(G)}{T}\left(\sum_{r=1}^{T}(D^{-1}A)^{r}\right)D^{-1}\right)-\log b$, LINE's is $\log\left(\operatorname{vol}(G)\,D^{-1}AD^{-1}\right)-\log b$, and PTE's contains blocks of the form $\alpha\operatorname{vol}(G_{ww})(D_{\mathrm{row}}^{ww})^{-1}A_{ww}(D_{\mathrm{col}}^{ww})^{-1}$.
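To make these closed forms concrete, here is a minimal numpy sketch (illustrative only; the toy graph and the choices b = 1, T = 10 are ours, not the authors'), folding the − log b shift into the matrix before taking the element-wise log:

```python
import numpy as np

# Toy undirected graph: A = adjacency matrix, D = degree matrix,
# vol(G) = sum of all degrees, T = window size, b = negative samples.
A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 1],
              [1, 1, 0, 1],
              [0, 1, 1, 0]], dtype=float)
d = A.sum(axis=1)
vol = d.sum()
D_inv = np.diag(1.0 / d)
P = D_inv @ A                              # random-walk matrix D^{-1} A
T, b = 10, 1

# DeepWalk's matrix: vol(G)/(bT) * (sum_{r=1}^T P^r) * D^{-1}
S = sum(np.linalg.matrix_power(P, r) for r in range(1, T + 1))
M_deepwalk = (vol / (b * T)) * S @ D_inv

# LINE's matrix: vol(G)/b * D^{-1} A D^{-1}
M_line = (vol / b) * D_inv @ A @ D_inv

# NetMF factorizes the element-wise truncated logarithm of these matrices.
log_M = np.log(np.maximum(M_deepwalk, 1.0))
```

Note that DeepWalk's matrix at T = 1 coincides with LINE's, which is why the experiments pair NetMF (T = 1) with LINE as baselines.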

Highlights

- The conventional paradigm of mining and learning with networks usually starts from the explicit exploration of their structural properties [13, 32]
- We focus on skip-gram with negative sampling (SGNS)
- Due to the page limit, in the rest of this paper we mainly focus on the matrix factorization framework based on 1st-order random walks (DeepWalk)
- We provide a theoretical analysis of four impactful network embedding methods (DeepWalk, LINE, PTE, and node2vec) that were proposed between 2014 and 2016
- We show that all four methods essentially perform implicit matrix factorizations, and that the closed forms of their matrices reveal the relationships among these methods and their intrinsic connections with the graph Laplacian
- It would be exciting to study the nature of skip-gram-based dynamic and heterogeneous network embedding

Methods

- Baseline methods: the authors compare NetMF (T = 1) and NetMF (T = 10), introduced in the previous sections, with LINE (2nd) [37] and DeepWalk [31].
- For NetMF (T = 10), the authors choose h = 16384 for Flickr, and h = 256 for BlogCatalog, PPI, and Wikipedia.
- Wikipedia corpus (footnote 3): http://mattmahoney.net/dc/text.html
- [Figure: Micro-F1 (%) of NetMF (T = 1), NetMF (T = 10), LINE, and DeepWalk on the BlogCatalog, PPI, Wikipedia, and Flickr datasets]
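The NetMF recipe compared above (factorize the closed-form DeepWalk matrix, approximating it via the top-h eigenpairs of the normalized adjacency for larger T) might look like the following simplified reconstruction; the toy graph and the small h and dim values are illustrative assumptions, not the paper's settings:

```python
import numpy as np

def netmf_embed(A, T=10, b=1, h=2, dim=2):
    """Simplified NetMF sketch: approximate the DeepWalk matrix with the
    top-h eigenpairs of the normalized adjacency, then SVD its truncated log."""
    deg = A.sum(axis=1)
    vol = deg.sum()
    D_rt_inv = np.diag(deg ** -0.5)
    S = D_rt_inv @ A @ D_rt_inv            # symmetric normalized adjacency
    evals, evecs = np.linalg.eigh(S)
    top = np.argsort(-np.abs(evals))[:h]   # keep top-h eigenpairs by magnitude
    lam, U = evals[top], evecs[:, top]
    # sum_{r=1}^T S^r is approximated as U diag(sum_r lam^r) U^T
    lam_sum = sum(lam ** r for r in range(1, T + 1))
    S_sum = U @ np.diag(lam_sum) @ U.T
    M = (vol / (b * T)) * D_rt_inv @ S_sum @ D_rt_inv
    log_M = np.log(np.maximum(M, 1.0))     # element-wise truncated log
    u, s, _ = np.linalg.svd(log_M)
    return u[:, :dim] * np.sqrt(s[:dim])   # embeddings U_d * sqrt(Sigma_d)

A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 1],
              [1, 1, 0, 1],
              [0, 1, 1, 0]], dtype=float)
emb = netmf_embed(A)
```

The identity $(D^{-1}A)^{r}D^{-1} = D^{-1/2}S^{r}D^{-1/2}$ is what lets the eigendecomposition of S stand in for the sum of walk matrices; the choice of h (16384 vs. 256 in the experiments) trades approximation accuracy for memory.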

Results

- The authors list the main results of node2vec without proofs; the idea parallels the analysis of DeepWalk. The co-occurrence statistics converge in probability: $\frac{\#(w,c,u)_{\overrightarrow{r}}}{|\mathcal{D}|}\xrightarrow{p}X_{w,u}(P^{r})_{c,w,u}$ and $\frac{\#(w,c,u)_{\overleftarrow{r}}}{|\mathcal{D}|}\xrightarrow{p}X_{c,u}(P^{r})_{w,c,u}$; summing over $u$, $\frac{\#(w,c)_{\overrightarrow{r}}}{|\mathcal{D}|}\xrightarrow{p}\sum_{u}X_{w,u}(P^{r})_{c,w,u}$ and $\frac{\#(w,c)_{\overleftarrow{r}}}{|\mathcal{D}|}\xrightarrow{p}\sum_{u}X_{c,u}(P^{r})_{w,c,u}$; averaging over directions and window positions, $\frac{\#(w,c)}{|\mathcal{D}|}\xrightarrow{p}\frac{1}{2T}\sum_{r=1}^{T}\left(\sum_{u}X_{w,u}(P^{r})_{c,w,u}+\sum_{u}X_{c,u}(P^{r})_{w,c,u}\right)$.
- In Wikipedia, NetMF (T = 1) shows better performance than the other methods in terms of Micro-F1, while LINE outperforms the other methods in Macro-F1
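These statistics are collected over node2vec's 2nd-order random walk. As background, a small hypothetical sketch of its transition rule (the helper name and toy adjacency list are invented; the p/q weighting follows the node2vec paper):

```python
def second_order_probs(adj, t, v, p=1.0, q=1.0):
    """Probability of stepping from v to each neighbor x, given that the
    walk arrived at v from t (node2vec's 2nd-order rule with return
    parameter p and in-out parameter q)."""
    weights = {}
    for x in adj[v]:
        if x == t:              # stepping back to the previous node
            weights[x] = 1.0 / p
        elif t in adj[x]:       # x is also a neighbor of t (distance 1)
            weights[x] = 1.0
        else:                   # x is two hops away from t
            weights[x] = 1.0 / q
    z = sum(weights.values())
    return {x: w / z for x, w in weights.items()}

# toy undirected graph as an adjacency list
adj = {0: [1, 2], 1: [0, 2, 3], 2: [0, 1, 3], 3: [1, 2]}
probs = second_order_probs(adj, t=0, v=1, p=2.0, q=0.5)
```

Here p = 2 discourages an immediate return to node 0, and q = 0.5 encourages moving outward to node 3; the stationary distribution X of this walk over edges is what appears in the limits above.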
- This observation implies that short-term dependence is enough to model Wikipedia's network structure.
- Taking the PPI dataset with 10% training data as an example, NetMF (T = 1) achieves relative gains of 46.34% and 33.85% over LINE (2nd) in Micro-F1 and Macro-F1, respectively; more impressively, NetMF (T = 10) outperforms DeepWalk by 50.71% and 39.16% relatively on the two metrics

Conclusion

- The authors provide a theoretical analysis of four impactful network embedding methods (DeepWalk, LINE, PTE, and node2vec) that were proposed between 2014 and 2016.
- The authors show that all four methods essentially perform implicit matrix factorizations, and that the closed forms of their matrices reveal the relationships among these methods and their intrinsic connections with the graph Laplacian.
- The authors' extensive experiments suggest that NetMF's direct factorization achieves consistent performance improvements over the implicit approximation models, DeepWalk and LINE.
- It would be necessary to investigate whether and how the development in random-walk polynomials [9] can support fast approximations of the closed-form matrices.
- It would be exciting to study the nature of skip-gram-based dynamic and heterogeneous network embedding.

- Table 1: The matrices that are implicitly approximated and factorized by DeepWalk, LINE, PTE, and node2vec
- Table 2: Statistics of datasets
- Table 3: Micro/Macro-F1 score (%) for multi-label classification on the BlogCatalog, PPI, Wikipedia, and Flickr datasets. In Flickr, 1% of vertices are labeled for training [31], and in the other three datasets, 10% of vertices are labeled for training

Related work

- The story of network embedding stems from Spectral Clustering [5, 45], a data clustering technique that selects eigenvalues/eigenvectors of a data affinity matrix to obtain representations that can be clustered or embedded in a low-dimensional space. Spectral Clustering has been widely used in fields such as community detection [23] and image segmentation [33]. In recent years, there has been increasing interest in network embedding. Following a few pioneering works such as SocDim [38] and DeepWalk [31], a growing body of literature has addressed the problem from a variety of perspectives, such as heterogeneous network embedding [8, 12, 20, 36], semi-supervised network embedding [17, 21, 44, 48], network embedding with rich vertex attributes [43, 47, 49], network embedding with high-order structure [6, 16], signed network embedding [10], direct network embedding [30], network embedding via deep neural networks [7, 25, 46], etc.

Among the above research, a commonly used technique is to define the "context" for each vertex, and then to train a predictive model to perform context prediction. For example, DeepWalk [31], node2vec [16], and metapath2vec [12] define vertices' contexts by 1st-order, 2nd-order, and meta-path based random walks, respectively. The idea of leveraging context information is largely motivated by the skip-gram model with negative sampling (SGNS) [29]. Recently, there have been efforts to understand this model. For example, Levy and Goldberg [24] prove that SGNS is implicitly conducting a matrix factorization, which provides a tool to analyze the above network embedding models; Arora et al. [1] propose a generative model, RAND-WALK, to explain word embedding models; and Hashimoto et al. [18] frame word embedding as a metric learning problem. Built upon the work in [24], we theoretically analyze popular skip-gram based network embedding models and connect them with spectral graph theory.
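The Levy-Goldberg result [24] that this analysis builds on can be sketched on toy counts (the co-occurrence matrix and dimensions below are invented for illustration): SGNS with b negative samples implicitly factorizes the word-context PMI matrix shifted by log b:

```python
import numpy as np

# toy word-context co-occurrence counts #(w, c)
C = np.array([[10, 2, 0],
              [2, 8, 3],
              [0, 3, 6]], dtype=float)
b = 5                                    # number of negative samples
D = C.sum()                              # |D|: total co-occurrence pairs
w = C.sum(axis=1, keepdims=True)         # marginal counts #(w)
c = C.sum(axis=0, keepdims=True)         # marginal counts #(c)

with np.errstate(divide="ignore"):       # log(0) -> -inf for unseen pairs
    pmi = np.log(C * D / (w * c))
spmi = pmi - np.log(b)                   # shifted PMI, SGNS's implicit target
sppmi = np.maximum(spmi, 0.0)            # positive variant used in practice

# A low-rank factorization (here a plain SVD) yields SGNS-style embeddings.
U, S, Vt = np.linalg.svd(sppmi)
emb = U[:, :2] * np.sqrt(S[:2])
```

NetMF transfers exactly this construction from word-context counts to the network co-occurrence matrices in Table 1.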

Funding

- Jiezhong Qiu and Jie Tang are supported by NSFC 61561130160
- Jian Li is supported in part by the National Basic Research Program of China Grant 2015CB358700, and NSFC 61772297 & 61632016

References

- Sanjeev Arora, Yuanzhi Li, Yingyu Liang, Tengyu Ma, and Andrej Risteski. 2016. A latent variable model approach to pmi-based word embeddings. TACL 4 (2016), 385–399.
- Yoshua Bengio, Aaron Courville, and Pierre Vincent. 2013. Representation learning: A review and new perspectives. IEEE TPAMI 35, 8 (2013), 1798–1828.
- Austin R Benson, David F Gleich, and Jure Leskovec. 2015. Tensor spectral clustering for partitioning higher-order network structures. In SDM. SIAM, 118–126.
- Austin R Benson, David F Gleich, and Lek-Heng Lim. 2017. The Spacey Random Walk: A Stochastic Process for Higher-Order Data. SIAM Rev. 59, 2 (2017), 321–345.
- Matthew Brand and Kun Huang. 2003. A unifying theorem for spectral embedding and clustering.. In AISTATS.
- Shaosheng Cao, Wei Lu, and Qiongkai Xu. 2015. GraRep: Learning graph representations with global structural information. In CIKM. ACM, 891–900.
- Shaosheng Cao, Wei Lu, and Qiongkai Xu. 2016. Deep Neural Networks for Learning Graph Representations.. In AAAI. 1145–1152.
- Shiyu Chang, Wei Han, Jiliang Tang, Guo-Jun Qi, Charu C Aggarwal, and Thomas S Huang. 2015. Heterogeneous network embedding via deep architectures. In KDD. ACM, 119–128.
- Dehua Cheng, Yu Cheng, Yan Liu, Richard Peng, and Shang-Hua Teng. 2015. Spectral Sparsification of Random-Walk Matrix Polynomials. arXiv preprint arXiv:1502.03496 (2015).
- Kewei Cheng, Jundong Li, and Huan Liu. 2017. Unsupervised Feature Selection in Signed Social Networks. In KDD. ACM.
- Fan RK Chung. 1997. Spectral graph theory. Number 92. American Mathematical Soc.
- Yuxiao Dong, Nitesh V Chawla, and Ananthram Swami. 2017. metapath2vec: Scalable Representation Learning for Heterogeneous Networks. In KDD.
- David Easley and Jon Kleinberg. 2010. Networks, Crowds, and Markets: Reasoning About a Highly Connected World. Cambridge University Press.
- Rong-En Fan, Kai-Wei Chang, Cho-Jui Hsieh, Xiang-Rui Wang, and Chih-Jen Lin. 2008. LIBLINEAR: A library for large linear classification. JMLR 9, Aug (2008), 1871–1874.
- David F Gleich, Lek-Heng Lim, and Yongyang Yu. 2015. Multilinear PageRank. SIAM J. Matrix Anal. Appl. 36, 4 (2015), 1507–1541.
- Aditya Grover and Jure Leskovec. 2016. node2vec: Scalable feature learning for networks. In KDD. ACM, 855–864.
- William L. Hamilton, Rex Ying, and Jure Leskovec. 2017. Inductive Representation Learning on Large Graphs. In NIPS.
- Tatsunori B Hashimoto, David Alvarez-Melis, and Tommi S Jaakkola. 2016. Word embeddings as metric recovery in semantic spaces. TACL 4 (2016), 273–286.
- Roger A. Horn and Charles R. Johnson. 1991. Topics in Matrix Analysis. Cambridge University Press. https://doi.org/10.1017/CBO9780511840371
- Yann Jacob, Ludovic Denoyer, and Patrick Gallinari. 2014. Learning latent representations of nodes for classifying in heterogeneous social networks. In WSDM. ACM, 373–382.
- Thomas N Kipf and Max Welling. 2016. Semi-Supervised Classification with Graph Convolutional Networks. arXiv preprint arXiv:1609.02907 (2016).
- Richard B Lehoucq, Danny C Sorensen, and Chao Yang. 1998. ARPACK users’ guide: solution of large-scale eigenvalue problems with implicitly restarted Arnoldi methods. SIAM.
- Jure Leskovec, Kevin J Lang, and Michael Mahoney. 2010. Empirical comparison of algorithms for network community detection. In WWW. ACM, 631–640.
- Omer Levy and Yoav Goldberg. 2014. Neural Word Embedding as Implicit Matrix Factorization. In NIPS. 2177–2185.
- Hang Li, Haozheng Wang, Zhenglu Yang, and Masato Odagaki. 2017. Variation Autoencoder Based Network Representation Learning for Classification. In ACL. 56.
- László Lovász. 1993. Random walks on graphs. Combinatorics, Paul erdos is eighty 2 (1993), 1–46.
- Qing Lu and Lise Getoor. 2003. Link-based Classification. In ICML.
- Tomas Mikolov, Kai Chen, Greg Corrado, and Jeffrey Dean. 2013. Efficient estimation of word representations in vector space. arXiv preprint arXiv:1301.3781 (2013).
- Tomas Mikolov, Ilya Sutskever, Kai Chen, Greg S Corrado, and Jeff Dean. 2013. Distributed representations of words and phrases and their compositionality. In NIPS. 3111–3119.
- Mingdong Ou, Peng Cui, Jian Pei, Ziwei Zhang, and Wenwu Zhu. 2016. Asymmetric Transitivity Preserving Graph Embedding.. In KDD. 1105–1114.
- Bryan Perozzi, Rami Al-Rfou, and Steven Skiena. 2014. DeepWalk: Online learning of social representations. In KDD. ACM, 701–710.
- S Yu Philip, Jiawei Han, and Christos Faloutsos. 2010. Link mining: Models, algorithms, and applications. Springer.
- Jianbo Shi and Jitendra Malik. 2000. Normalized cuts and image segmentation. IEEE PAMI 22, 8 (2000), 888–905.
- A.N. Shiryaev and A. Lyasoff. 2012. Problems in Probability. Springer New York.
- Chris Stark, Bobby-Joe Breitkreutz, Andrew Chatr-Aryamontri, Lorrie Boucher, Rose Oughtred, Michael S Livstone, Julie Nixon, Kimberly Van Auken, Xiaodong Wang, Xiaoqi Shi, et al. 2010. The BioGRID interaction database: 2011 update. Nucleic acids research 39, suppl_1 (2010), D698–D704.
- Jian Tang, Meng Qu, and Qiaozhu Mei. 2015. PTE: Predictive text embedding through large-scale heterogeneous text networks. In KDD. ACM, 1165–1174.
- Jian Tang, Meng Qu, Mingzhe Wang, Ming Zhang, Jun Yan, and Qiaozhu Mei. 2015. LINE: Large-scale information network embedding. In WWW. 1067–1077.
- Lei Tang and Huan Liu. 2009. Relational learning via latent social dimensions. In KDD. ACM, 817–826.
- Lei Tang, Suju Rajan, and Vijay K Narayanan. 2009. Large scale multi-label classification via metalabeler. In WWW. ACM, 211–220.
- Kristina Toutanova, Dan Klein, Christopher D Manning, and Yoram Singer. 2003. Feature-rich part-of-speech tagging with a cyclic dependency network. In NAACL. Association for Computational Linguistics, 173–180.
- Lloyd N Trefethen and David Bau III. 1997. Numerical linear algebra. Vol. 50. Siam.
- Grigorios Tsoumakas, Ioannis Katakis, and Ioannis Vlahavas. 2009. Mining multi-label data. In Data mining and knowledge discovery handbook. Springer, 667–685.
- Cunchao Tu, Han Liu, Zhiyuan Liu, and Maosong Sun. 2017. CANE: Contextaware network embedding for relation modeling. In ACL.
- Cunchao Tu, Weicheng Zhang, Zhiyuan Liu, and Maosong Sun. 2016. Max-Margin DeepWalk: Discriminative Learning of Network Representation. In IJCAI. 3889–3895.
- Ulrike Von Luxburg. 2007. A tutorial on spectral clustering. Statistics and computing 17, 4 (2007), 395–416.
- Daixin Wang, Peng Cui, and Wenwu Zhu. 2016. Structural deep network embedding. In KDD. ACM, 1225–1234.
- Cheng Yang, Zhiyuan Liu, Deli Zhao, Maosong Sun, and Edward Y Chang. 2015. Network Representation Learning with Rich Text Information. In IJCAI. 2111–2117.
- Zhilin Yang, William W. Cohen, and Ruslan Salakhutdinov. 2016. Revisiting Semi-Supervised Learning with Graph Embeddings. In ICML. 40–48.
- Zhilin Yang, Jie Tang, and William W Cohen. 2016. Multi-Modal Bayesian Embeddings for Learning Social Knowledge Graphs.. In IJCAI. 2287–2293.
