# NetSMF: Large-Scale Network Embedding as Sparse Matrix Factorization

WWW '19: The World Wide Web Conference, pp. 1509–1520, 2019.


Keywords:

large scale network, academic collaboration, latent representation, protein-protein interactions, network embedding as sparse matrix factorization


Abstract:

We study the problem of large-scale network embedding, which aims to learn latent representations for network mining applications. Previous research shows that 1) popular network embedding benchmarks, such as DeepWalk, are in essence implicitly factorizing a matrix with a closed form, and 2) the explicit factorization of such a matrix generates more powerful embeddings than existing methods.


Introduction

- Recent years have witnessed the emergence of network embedding, which offers a revolutionary paradigm for modeling graphs and networks [16].
- The goal of network embedding is to automatically learn latent representations for objects in networks, such as vertices and edges.
- DeepWalk and node2vec, in contrast, leverage random walks on graphs and skip-gram [24] with large context sizes to model vertices farther away.
- As a result, DeepWalk and node2vec are computationally more expensive on large-scale networks.
- node2vec, which performs second-order random walks, takes even more time than DeepWalk to learn embeddings.

Highlights

- Recent years have witnessed the emergence of network embedding, which offers a revolutionary paradigm for modeling graphs and networks [16]
- We present the solution to network embedding learning as sparse matrix factorization (NetSMF)
- From Table 5, we observe that for YouTube and Open Academic Graph, both of which contain more than one million vertices, NetMF fails to complete because of excessive space and memory consumption, while NetSMF is able to finish in four hours and one day, respectively
- We study network embedding with the goal of achieving both efficiency and effectiveness
- To address the scalability challenges faced by the NetMF model, we propose to study large-scale network embedding as sparse matrix factorization
- NetSMF is capable of learning embeddings that maintain the same representation power as the dense matrix factorization solution, enabling it to consistently outperform DeepWalk and node2vec by up to 34% and LINE by up to 100% on the multi-label vertex classification task
- Both the construction and the factorization of the sparsified matrix are fast enough to support very large-scale network embedding learning, empowering NetSMF to embed the Open Academic Graph, whose size is computationally intractable for the dense matrix factorization solution (NetMF), in 24 hours

Methods

- The authors compare NetSMF with NetMF [28], LINE [33], DeepWalk [27], and node2vec [14].
- For NetSMF, NetMF, DeepWalk, and node2vec, which allow multi-hop structural dependencies, the context window size T is set to 10, the default setting in both DeepWalk and node2vec.
- For LINE, the authors use its default hyper-parameters: 10 billion edge samples and a negative sample size of 5.
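The matrix that NetSMF sparsifies is the closed-form matrix that DeepWalk implicitly factorizes, as derived in NetMF. The following is a minimal toy sketch of that dense construction and its factorization, not the authors' code: the 4-vertex graph and the embedding dimension `d = 2` are made up for illustration, while `T = 10` follows the experimental setting above.

```python
import numpy as np

# Toy adjacency matrix of a small undirected graph (illustrative only).
A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 1],
              [1, 1, 0, 1],
              [0, 1, 1, 0]], dtype=float)

T = 10   # context window size, as in the experiments
b = 1    # number of negative samples
d = 2    # embedding dimension (made up for this toy example)

vol = A.sum()                      # volume of the graph
deg = A.sum(axis=1)                # vertex degrees
D_inv = np.diag(1.0 / deg)
P = D_inv @ A                      # random-walk transition matrix

# Closed-form matrix implicitly factorized by DeepWalk (per NetMF):
# vol(G)/(bT) * (sum_{r=1}^T P^r) @ D^{-1}
S = sum(np.linalg.matrix_power(P, r) for r in range(1, T + 1))
M = vol / (b * T) * S @ D_inv

# Truncated element-wise logarithm, then rank-d factorization via SVD.
logM = np.log(np.maximum(M, 1.0))
U, sigma, _ = np.linalg.svd(logM)
embedding = U[:, :d] * np.sqrt(sigma[:d])   # one d-dim vector per vertex
print(embedding.shape)  # (4, 2)
```

NetMF builds this matrix densely, which is exactly the O(n^2) space cost that makes it infeasible beyond millions of vertices; NetSMF instead constructs a sparse approximation of it.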

Results

- The authors summarize the prediction performance in Figure 2.
- For small-scale networks, NetMF is faster than NetSMF on BlogCatalog and comparable to NetSMF on PPI in terms of running time.
- This is because, when the input network contains only thousands of vertices, the advantage of sparse matrix construction and factorization over the dense alternative can be marginalized by other components of the workflow.

Conclusion

**Discussion on the Approximation Error**

The above bound is achieved without making assumptions about the input network.
- The authors organize the sparsifier in row-major format.
- This format allows efficient multiplication between a sparse and a dense matrix (Alg. 3, Lines 3 and 5).
- The authors use OpenMP [10] to parallelize NetSMF in their implementation.

In this work, the authors study network embedding with the goal of achieving both efficiency and effectiveness.
- They present the NetSMF algorithm, which sparsifies the NetMF matrix.
- Both the construction and the factorization of the sparsified matrix are fast enough to support very large-scale network embedding learning.
- Among matrix factorization based methods (NetMF and NetSMF) and skip-gram based benchmarks (DeepWalk, LINE, and node2vec), NetSMF is the only model that achieves both efficiency and effectiveness.
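The row-major storage and sparse-times-dense products described above can be sketched with SciPy's CSR format and a truncated sparse SVD. This is a hedged illustration, not the authors' OpenMP implementation: the random sparse matrix below merely stands in for the output of the sparsifier.

```python
import scipy.sparse as sp
import numpy as np
from scipy.sparse.linalg import svds

n, d = 1000, 32  # illustrative sizes, not from the paper

# Stand-in for an already-sparsified matrix; symmetrize and keep it
# in CSR, the row-major layout that makes sparse x dense products fast.
M = sp.random(n, n, density=0.01, random_state=0, format="csr")
M = ((M + M.T) / 2).tocsr()

# Truncated SVD of the sparse matrix; internally svds repeatedly
# multiplies M (and M.T) against dense vectors, where CSR pays off.
U, sigma, _ = svds(M, k=d)
embedding = U * np.sqrt(sigma)   # one d-dim vector per vertex
print(embedding.shape)  # (1000, 32)
```

The 1% density here is arbitrary; the point is that both memory and the per-iteration cost of the factorization scale with the number of nonzeros rather than with n^2.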

Summary

## Objectives:

The authors aim to address the efficiency and scalability limitations of NetMF while maintaining its superior effectiveness.

- Table1: The comparison between NetSMF and other popular network embedding algorithms
- Table2: Notations
- Table3: Time and Space Complexity of NetSMF
- Table4: Statistics of Datasets
- Table5: Efficiency comparison. The running time includes filesystem IO and computation time. “–” indicates that the corresponding algorithm fails to complete within one week. “×” indicates that the corresponding algorithm is unable to handle the computation due to excessive space and memory consumption

Related work

- In this section, we review the related work of network embedding, large-scale embedding algorithms, and spectral graph sparsification.

5.1 Network Embedding

Network embedding has been extensively studied over the past years [16]. Its success has driven many downstream network applications, such as recommender systems [44]. Briefly, recent work on network embedding falls into three genres: (1) skip-gram based methods inspired by word2vec [24], such as LINE [33], DeepWalk [27], node2vec [14], metapath2vec [12], and VERSE [40]; (2) deep learning based methods such as [21, 44]; and (3) matrix factorization based methods such as GraRep [4] and NetMF [28]. Among them, NetMF bridges the first and the third categories by unifying a collection of skip-gram based network embedding methods into a matrix factorization framework. In this work, we leverage the merits of NetMF and address its limitation in efficiency. Notably, PinSage [44] is a network embedding framework for billion-scale networks. The difference between NetSMF and PinSage is that PinSage is a deep learning based framework that leverages vertex features, whereas NetSMF learns embeddings from the network structure alone.

Funding

- Jian Li is supported in part by the National Basic Research Program of China Grant 2015CB358700, National Natural Science Foundation of China Grants 61822203, 61772297, 61632016, and 61761146003, and a grant from Microsoft Research Asia

Reference

- [1] Nitin Agarwal, Huan Liu, Sudheendra Murthy, Arunabha Sen, and Xufei Wang. 2009. A Social Identity Approach to Identify Familiar Strangers in a Social Network. In ICWSM '09.
- [2] Lars Backstrom, Paolo Boldi, Marco Rosa, Johan Ugander, and Sebastiano Vigna. 2012. Four degrees of separation. In WebSci ’12. ACM, 33–42.
- [3] Daniele Calandriello, Ioannis Koutis, Alessandro Lazaric, and Michal Valko. 2018. Improved Large-Scale Graph Learning through Ridge Spectral Sparsification. In ICML '18.
- [4] Shaosheng Cao, Wei Lu, and Qiongkai Xu. 2015. GraRep: Learning graph representations with global structural information. In CIKM '15. ACM, 891–900.
- [5] Raymond B Cattell. 1966. The scree test for the number of factors. Multivariate behavioral research 1, 2 (1966), 245–276.
- [6] Kamalika Chaudhuri, Fan Chung, and Alexander Tsiatas. 2012. Spectral clustering of graphs with general degrees in the extended planted partition model. In COLT ’12. 35–1.
- [7] Dehua Cheng, Yu Cheng, Yan Liu, Richard Peng, and Shang-Hua Teng. 2015. Efficient Sampling for Gaussian Graphical Models via Spectral Sparsification. In COLT '15.
- [8] Dehua Cheng, Yu Cheng, Yan Liu, Richard Peng, and Shang-Hua Teng. 2015. Spectral sparsification of random-walk matrix polynomials. arXiv preprint arXiv:1502.03496 (2015).
- [9] Michael B Cohen, Jonathan Kelner, John Peebles, Richard Peng, Aaron Sidford, and Adrian Vladu. 2016. Faster algorithms for computing the stationary distribution, simulating random walks, and more. In FOCS ’16. IEEE, 583–592.
- [10] Leonardo Dagum and Ramesh Menon. 1998. OpenMP: an industry standard API for shared-memory programming. IEEE computational science and engineering 5, 1 (1998), 46–55.
- [11] Anirban Dasgupta, John E Hopcroft, and Frank McSherry. 2004. Spectral analysis of random graphs with skewed degree distributions. In FOCS ’04. 602–610.
- [12] Yuxiao Dong, Nitesh V Chawla, and Ananthram Swami. 2017. metapath2vec: Scalable Representation Learning for Heterogeneous Networks. In KDD ’17.
- [13] Rong-En Fan, Kai-Wei Chang, Cho-Jui Hsieh, Xiang-Rui Wang, and Chih-Jen Lin. 2008. LIBLINEAR: A library for large linear classification. JMLR ’08 9, Aug (2008), 1871–1874.
- [14] Aditya Grover and Jure Leskovec. 2016. node2vec: Scalable feature learning for networks. In KDD ’16. ACM, 855–864.
- [15] Nathan Halko, Per-Gunnar Martinsson, and Joel A Tropp. 2011. Finding structure with randomness: Probabilistic algorithms for constructing approximate matrix decompositions. SIAM review 53, 2 (2011), 217–288.
- [16] William L. Hamilton, Rex Ying, and Jure Leskovec. 2017. Representation Learning on Graphs: Methods and Applications. IEEE Data(base) Engineering Bulletin 40 (2017), 52–74.
- [17] Nicholas J Higham and Lijing Lin. 2011. On pth roots of stochastic matrices. Linear Algebra Appl. 435, 3 (2011), 448–463.
- [18] Roger A. Horn and Charles R. Johnson. 1991. Topics in Matrix Analysis. Cambridge University Press. https://doi.org/10.1017/CBO9780511840371
- [19] Shihao Ji, Nadathur Satish, Sheng Li, and Pradeep Dubey. 2016. Parallelizing word2vec in shared and distributed memory. arXiv preprint arXiv:1604.04661 (2016).
- [20] Michael Kapralov, Yin Tat Lee, CN Musco, CP Musco, and Aaron Sidford. 2017. Single pass spectral sparsification in dynamic streams. SIAM J. Comput. 46, 1 (2017), 456–477.
- [21] Thomas N. Kipf and Max Welling. 2017. Semi-Supervised Classification with Graph Convolutional Networks. In ICLR ’17.
- [22] Omer Levy and Yoav Goldberg. 2014. Neural Word Embedding as Implicit Matrix Factorization. In NIPS ’14. 2177–2185.
- [23] Mu Li, David G Andersen, Jun Woo Park, Alexander J Smola, Amr Ahmed, Vanja Josifovski, James Long, Eugene J Shekita, and Bor-Yiing Su. 2014. Scaling Distributed Machine Learning with the Parameter Server.. In OSDI ’14, Vol. 14.
- [24] Tomas Mikolov, Kai Chen, Greg Corrado, and Jeffrey Dean. 2013. Efficient Estimation of Word Representations in Vector Space. In ICLR Workshop ’13.
- [25] Tomas Mikolov, Ilya Sutskever, Kai Chen, Greg S Corrado, and Jeff Dean. 2013. Distributed Representations of Words and Phrases and their Compositionality. In NIPS '13.
- [26] Erik Ordentlich, Lee Yang, Andy Feng, Peter Cnudde, Mihajlo Grbovic, Nemanja Djuric, Vladan Radosavljevic, and Gavin Owens. 2016. Network-efficient distributed word2vec training system for large vocabularies. In CIKM ’16. ACM, 1139–1148.
- [27] Bryan Perozzi, Rami Al-Rfou, and Steven Skiena. 2014. Deepwalk: Online learning of social representations. In KDD ’14. ACM, 701–710.
- [28] Jiezhong Qiu, Yuxiao Dong, Hao Ma, Jian Li, Kuansan Wang, and Jie Tang. 2018. Network Embedding as Matrix Factorization: Unifying DeepWalk, LINE, PTE, and node2vec. In WSDM '18.
- [29] Arnab Sinha, Zhihong Shen, Yang Song, Hao Ma, Darrin Eide, Bo-june Paul Hsu, and Kuansan Wang. 2015. An overview of microsoft academic service (mas) and applications. In WWW ’15. ACM, 243–246.
- [30] Daniel A Spielman and Nikhil Srivastava. 2011. Graph sparsification by effective resistances. SIAM J. Comput. 40, 6 (2011), 1913–1926.
- [31] Chris Stark, Bobby-Joe Breitkreutz, Andrew Chatr-Aryamontri, Lorrie Boucher, Rose Oughtred, Michael S Livstone, Julie Nixon, Kimberly Van Auken, Xiaodong Wang, Xiaoqi Shi, et al. 2010. The BioGRID interaction database: 2011 update. Nucleic acids research 39, suppl_1 (2010), D698–D704.
- [32] Stergios Stergiou, Zygimantas Straznickas, Rolina Wu, and Kostas Tsioutsiouliklis. 2017. Distributed Negative Sampling for Word Embeddings.. In AAAI ’17. 2569– 2575.
- [33] Jian Tang, Meng Qu, Mingzhe Wang, Ming Zhang, Jun Yan, and Qiaozhu Mei. 2015. Line: Large-scale information network embedding. In WWW ’15. 1067–1077.
- [34] Jie Tang, Jing Zhang, Limin Yao, Juanzi Li, Li Zhang, and Zhong Su. 2008. Arnetminer: extraction and mining of academic social networks. In KDD ’08. 990–998.
- [35] Lei Tang and Huan Liu. 2009. Relational learning via latent social dimensions. In KDD ’09. ACM, 817–826.
- [36] Lei Tang and Huan Liu. 2009. Scalable learning of collective behavior based on sparse social dimensions. In CIKM ’09. ACM, 1107–1116.
- [37] Lei Tang, Suju Rajan, and Vijay K Narayanan. 2009. Large scale multi-label classification via metalabeler. In WWW ’09. ACM, 211–220.
- [38] Shang-Hua Teng et al. 2016. Scalable algorithms for data and network analysis. Foundations and Trends® in Theoretical Computer Science 12, 1–2 (2016), 1–274.
- [39] Lloyd N Trefethen and David Bau III. 1997. Numerical linear algebra. Vol. 50. Siam.
- [40] Anton Tsitsulin, Davide Mottin, Panagiotis Karras, and Emmanuel Müller. 2018. VERSE: Versatile Graph Embeddings from Similarity Measures. In WWW ’18. 539–548.
- [41] Grigorios Tsoumakas, Ioannis Katakis, and Ioannis Vlahavas. 2009. Mining multi-label data. In Data mining and knowledge discovery handbook. Springer, 667–685.
- [42] Ulrike Von Luxburg. 2007. A tutorial on spectral clustering. Statistics and computing 17, 4 (2007), 395–416.
- [43] Jizhe Wang, Pipei Huang, Huan Zhao, Zhibo Zhang, Binqiang Zhao, and Dik Lun Lee. 2018. Billion-scale Commodity Embedding for E-commerce Recommendation in Alibaba. In KDD ’18. ACM.
- [44] Rex Ying, Ruining He, Kaifeng Chen, Pong Eksombatchai, William L Hamilton, and Jure Leskovec. 2018. Graph Convolutional Neural Networks for Web-Scale Recommender Systems. KDD ’18.
- [45] Matei Zaharia, Mosharaf Chowdhury, Michael J Franklin, Scott Shenker, and Ion Stoica. 2010. Spark: Cluster computing with working sets. HotCloud ’10 10, 10-10 (2010), 95.
- [46] Peixiang Zhao. 2015. gSparsify: Graph Motif Based Sparsification for Graph Clustering. In CIKM ’15. ACM, 373–382.
