NetSMF: Large-Scale Network Embedding as Sparse Matrix Factorization

WWW '19: The World Wide Web Conference, pp. 1509–1520, 2019.

Keywords:
large scale network, academic collaboration, latent representation, Protein-Protein Interactions, network embedding as sparse matrix factorization

Abstract:

We study the problem of large-scale network embedding, which aims to learn latent representations for network mining applications. Previous research shows that 1) popular network embedding benchmarks, such as DeepWalk, are in essence implicitly factorizing a matrix with a closed form, and 2) the explicit factorization of such matrix gener…

Introduction
  • Recent years have witnessed the emergence of network embedding, which offers a revolutionary paradigm for modeling graphs and networks [16].
  • The goal of network embedding is to automatically learn latent representations for objects in networks, such as vertices and edges.
  • DeepWalk and node2vec, in contrast, leverage random walks on graphs and skip-gram [24] with large context sizes to model vertices farther away (the closed-form matrix behind this view is recalled after this list).
  • It is computationally more expensive for DeepWalk and node2vec to handle large-scale networks.
  • The node2vec model, which performs high-order random walks, takes more time than DeepWalk to learn embeddings
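As a reminder of what the "matrix with a closed form" mentioned in the abstract is, the expression below restates the result from the NetMF paper [28] (it is not spelled out in this summary): with adjacency matrix A, degree matrix D, graph volume vol(G), context window size T, and b negative samples, DeepWalk implicitly factorizes

```latex
% Closed form of the matrix that DeepWalk implicitly factorizes (per NetMF [28]).
% A: adjacency matrix, D: degree matrix, vol(G): graph volume,
% T: context window size, b: number of negative samples.
\[
  M \;=\; \log\!\Bigl(\max\Bigl(\tfrac{\operatorname{vol}(G)}{bT}
      \Bigl(\textstyle\sum_{r=1}^{T} (D^{-1}A)^{r}\Bigr) D^{-1},\; 1\Bigr)\Bigr),
  \qquad \text{log and max taken element-wise.}
\]
```

NetSMF's key step is to replace the dense random-walk matrix polynomial sum_{r=1..T} (D^{-1}A)^r inside this expression with a spectrally similar sparse matrix before the log transformation and factorization.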
Highlights
  • Recent years have witnessed the emergence of network embedding, which offers a revolutionary paradigm for modeling graphs and networks [16]
  • We present NetSMF, a solution to network embedding learning as sparse matrix factorization
  • From Table 5, we observe that for YouTube and Open Academic Graph, both of which contain more than one million vertices, NetMF fails to complete because of excessive space and memory consumption, while NetSMF is able to finish in four hours and one day, respectively
  • We study network embedding with the goal of achieving both efficiency and effectiveness
  • To address the scalability challenges faced by the NetMF model, we propose to study large-scale network embedding as sparse matrix factorization
  • NetSMF is capable of learning embeddings that maintain the same representation power as the dense matrix factorization solution (NetMF), allowing it to consistently outperform DeepWalk and node2vec by up to 34% and LINE by up to 100% on the multi-label vertex classification task
  • Both the construction and factorization of the sparsified matrix are fast enough to support very large-scale network embedding learning. This enables NetSMF to efficiently embed the Open Academic Graph within 24 hours, a graph whose size is computationally intractable for the dense matrix factorization solution (NetMF); a simplified sketch of the sparse construction follows this list
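To make the sparsification idea concrete, here is a minimal Monte-Carlo sketch in Python. It is illustrative only: it is not the PathSampling routine analyzed in the paper, and the function and parameter names are made up. The point it demonstrates is that, instead of densely computing the random-walk matrix polynomial sum_{r=1..T} P^r with P = D^{-1}A, one can sample short random walks and record only the (source, destination) pairs that are actually visited, so the number of nonzeros is bounded by the sampling budget rather than by n^2.

```python
import numpy as np
from scipy import sparse


def sampled_walk_polynomial(adj: sparse.csr_matrix, T: int = 10,
                            num_samples: int = 100_000, seed: int = 0) -> sparse.csr_matrix:
    """Unbiased Monte-Carlo estimate of sum_{r=1..T} (D^{-1} A)^r for an unweighted graph.

    Simplified illustration of NetSMF's sparse construction; the actual algorithm
    uses edge-based path sampling with spectral-sparsification guarantees.
    """
    rng = np.random.default_rng(seed)
    n = adj.shape[0]
    rows, cols, vals = [], [], []
    for _ in range(num_samples):
        u = rng.integers(n)            # uniform random source vertex
        r = rng.integers(1, T + 1)     # uniform random walk length in {1, ..., T}
        v = u
        for _ in range(r):             # simulate an r-step uniform random walk
            nbrs = adj.indices[adj.indptr[v]:adj.indptr[v + 1]]
            if nbrs.size == 0:         # dangling vertex: abandon this sample
                break
            v = int(rng.choice(nbrs))
        else:
            # A completed walk (u -> v) is recorded with weight n*T/num_samples,
            # which makes the expectation of the assembled matrix equal the target.
            rows.append(u)
            cols.append(v)
            vals.append(n * T / num_samples)
    est = sparse.csr_matrix((vals, (rows, cols)), shape=(n, n))
    est.sum_duplicates()               # merge repeated (u, v) pairs into one nonzero
    return est
```

In NetSMF itself, the sampled matrix is further scaled (by vol(G)/(bT), following the closed form recalled above), right-multiplied by D^{-1}, passed through the element-wise truncated logarithm, and then factorized; a sketch of that last step appears after the Conclusion list.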
Methods
  • The authors compare NetSMF with NetMF [28], LINE [33], DeepWalk [27], and node2vec [14].
  • For NetSMF, NetMF, DeepWalk, and node2vec that allow multi-hop structural dependencies, the context window size T is set to be 10, which is the default setting used in both DeepWalk and node2vec.
  • The authors use LINE's default hyper-parameters: 10 billion edge samples and a negative sample size of 5 (collected in the configuration sketch below)
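For concreteness, the comparison settings described above could be collected as follows; this is a hypothetical configuration object, and only values stated in this summary are included.

```python
# Hyperparameter settings used in the comparison, as described above.
COMPARISON_SETTINGS = {
    "context_window_T": 10,                # NetSMF, NetMF, DeepWalk, node2vec (their default)
    "line_edge_samples": 10_000_000_000,   # LINE default: 10 billion edge samples
    "line_negative_sample_size": 5,        # LINE default negative sample size
}
```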
Results
  • The authors summarize the prediction performance in Figure 2.
  • For small-scale networks, NetMF is faster than NetSMF in BlogCatalog and is comparable to NetSMF in PPI in terms of running time.
  • This is because when the input networks contain only thousands of vertices, the advantage of sparse matrix construction and factorization over its dense alternative can be outweighed by other components of the workflow.
Conclusion
  • Discussion on the Approximation Error

    The approximation error bound derived in the paper is achieved without making assumptions about the input network.
  • The authors organize the sparsifier in row-major format.
  • This format allows efficient multiplication between a sparse and a dense matrix (Alg. 3, Lines 3 and 5); a sketch of this step appears after this list.
  • The authors use OpenMP [10] to parallelize NetSMF in their implementation. In this work, the authors study network embedding with the goal of achieving both efficiency and effectiveness.
  • The authors present the NetSMF algorithm, which achieves a sparsification of the NetMF matrix
  • Both the construction and factorization of the sparsified matrix are fast enough to support very large-scale network embedding learning.
  • Among both matrix factorization based methods (NetMF and NetSMF) and common skip-gram based benchmarks (DeepWalk, LINE, and node2vec), NetSMF is the only model that achieves both efficiency and performance superiority
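The bullets above about the row-major layout and the factorization can be illustrated with a short sketch, assuming SciPy and scikit-learn (which the paper does not necessarily use); M_sparse stands for the already truncated-log-transformed sparsifier, and all names here are illustrative. The randomized SVD repeatedly multiplies the sparse matrix by dense test matrices, which is exactly the sparse-dense product that a row-major (CSR) layout makes fast.

```python
import numpy as np
from scipy import sparse
from sklearn.utils.extmath import randomized_svd


def embeddings_from_sparsifier(M_sparse: sparse.csr_matrix, dim: int = 128,
                               seed: int = 0) -> np.ndarray:
    """Factorize a sparsified (log-transformed) NetMF-style matrix into vertex embeddings.

    Sketch only: the paper uses a randomized SVD [15] of its own; here we substitute
    scikit-learn's randomized_svd, whose inner loop is dominated by CSR-sparse x dense
    products (the step referenced as Alg. 3, Lines 3 and 5).
    dim = 128 is a common choice, not a value stated in this summary.
    """
    U, Sigma, _ = randomized_svd(M_sparse, n_components=dim, n_iter=5, random_state=seed)
    # NetMF/NetSMF-style embedding: scale the left singular vectors by the square root
    # of the singular values, giving one dim-dimensional vector per vertex.
    return U * np.sqrt(Sigma)
```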
Summary
  • Objectives:

    The authors aim to address the efficiency and scalability limitations of NetMF while maintaining its superiority in effectiveness.
Tables
  • Table 1: The comparison between NetSMF and other popular network embedding algorithms
  • Table 2: Notations
  • Table 3: Time and Space Complexity of NetSMF
  • Table 4: Statistics of Datasets
  • Table 5: Efficiency comparison. The running time includes filesystem IO and computation time. “–” indicates that the corresponding algorithm fails to complete within one week. “×” indicates that the corresponding algorithm is unable to handle the computation due to excessive space and memory consumption.
Related work
  • In this section, we review the related work of network embedding, large-scale embedding algorithms, and spectral graph sparsification.

    5.1 Network Embedding

    Network embedding has been extensively studied over the past years [16]. Its success has driven many downstream network applications, such as recommendation systems [44]. Briefly, recent work on network embedding falls into three genres: (1) skip-gram based methods inspired by word2vec [24], such as LINE [33], DeepWalk [27], node2vec [14], metapath2vec [12], and VERSE [40]; (2) deep learning based methods such as [21, 44]; and (3) matrix factorization based methods such as GraRep [4] and NetMF [28]. Among them, NetMF bridges the first and the third categories by unifying a collection of skip-gram based network embedding methods into a matrix factorization framework. In this work, we leverage the merit of NetMF and address its limitation in efficiency. In the literature, PinSage is notably a network embedding framework for billion-scale networks [44]. The difference between NetSMF and PinSage…
Funding
  • Jian Li is supported in part by the National Basic Research Program of China Grant 2015CB358700, the National Natural Science Foundation of China Grants 61822203, 61772297, 61632016, and 61761146003, and a grant from Microsoft Research Asia.
Reference
  • [1] Nitin Agarwal, Huan Liu, Sudheendra Murthy, Arunabha Sen, and Xufei Wang. 2009. A Social Identity Approach to Identify Familiar Strangers in a Social Network. In ICWSM '09.
  • [2] Lars Backstrom, Paolo Boldi, Marco Rosa, Johan Ugander, and Sebastiano Vigna. 2012. Four degrees of separation. In WebSci '12. ACM, 33–42.
  • [3] Daniele Calandriello, Ioannis Koutis, Alessandro Lazaric, and Michal Valko. 2018.
  • [4] Shaosheng Cao, Wei Lu, and Qiongkai Xu. 2015. GraRep: Learning graph representations with global structural information. In CIKM '15. ACM, 891–900.
  • [5] Raymond B Cattell. 1966. The scree test for the number of factors. Multivariate Behavioral Research 1, 2 (1966), 245–276.
  • [6] Kamalika Chaudhuri, Fan Chung, and Alexander Tsiatas. 2012. Spectral clustering of graphs with general degrees in the extended planted partition model. In COLT '12.
  • [7] Dehua Cheng, Yu Cheng, Yan Liu, Richard Peng, and Shang-Hua Teng. 2015.
  • [8] Dehua Cheng, Yu Cheng, Yan Liu, Richard Peng, and Shang-Hua Teng. 2015. Spectral sparsification of random-walk matrix polynomials. arXiv preprint arXiv:1502.03496 (2015).
  • [9] Michael B Cohen, Jonathan Kelner, John Peebles, Richard Peng, Aaron Sidford, and Adrian Vladu. 2016. Faster algorithms for computing the stationary distribution, simulating random walks, and more. In FOCS '16. IEEE, 583–592.
  • [10] Leonardo Dagum and Ramesh Menon. 1998. OpenMP: An industry standard API for shared-memory programming. IEEE Computational Science and Engineering 5, 1 (1998), 46–55.
  • [11] Anirban Dasgupta, John E Hopcroft, and Frank McSherry. 2004. Spectral analysis of random graphs with skewed degree distributions. In FOCS '04. 602–610.
  • [12] Yuxiao Dong, Nitesh V Chawla, and Ananthram Swami. 2017. metapath2vec: Scalable Representation Learning for Heterogeneous Networks. In KDD '17.
  • [13] Rong-En Fan, Kai-Wei Chang, Cho-Jui Hsieh, Xiang-Rui Wang, and Chih-Jen Lin. 2008. LIBLINEAR: A library for large linear classification. JMLR 9 (2008), 1871–1874.
  • [14] Aditya Grover and Jure Leskovec. 2016. node2vec: Scalable feature learning for networks. In KDD '16. ACM, 855–864.
  • [15] Nathan Halko, Per-Gunnar Martinsson, and Joel A Tropp. 2011. Finding structure with randomness: Probabilistic algorithms for constructing approximate matrix decompositions. SIAM Review 53, 2 (2011), 217–288.
  • [16] William L. Hamilton, Rex Ying, and Jure Leskovec. 2017. Representation Learning on Graphs: Methods and Applications. IEEE Data Engineering Bulletin 40 (2017), 52–74.
  • [17] Nicholas J Higham and Lijing Lin. 2011. On pth roots of stochastic matrices. Linear Algebra and its Applications 435, 3 (2011), 448–463.
  • [18] Roger A. Horn and Charles R. Johnson. 1991. Topics in Matrix Analysis. Cambridge University Press. https://doi.org/10.1017/CBO9780511840371
  • [19] Shihao Ji, Nadathur Satish, Sheng Li, and Pradeep Dubey. 2016. Parallelizing word2vec in shared and distributed memory. arXiv preprint arXiv:1604.04661 (2016).
  • [20] Michael Kapralov, Yin Tat Lee, CN Musco, CP Musco, and Aaron Sidford. 2017. Single pass spectral sparsification in dynamic streams. SIAM J. Comput. 46, 1 (2017), 456–477.
  • [21] Thomas N. Kipf and Max Welling. 2017. Semi-Supervised Classification with Graph Convolutional Networks. In ICLR '17.
  • [22] Omer Levy and Yoav Goldberg. 2014. Neural Word Embedding as Implicit Matrix Factorization. In NIPS '14. 2177–2185.
  • [23] Mu Li, David G Andersen, Jun Woo Park, Alexander J Smola, Amr Ahmed, Vanja Josifovski, James Long, Eugene J Shekita, and Bor-Yiing Su. 2014. Scaling Distributed Machine Learning with the Parameter Server. In OSDI '14, Vol. 14.
  • [24] Tomas Mikolov, Kai Chen, Greg Corrado, and Jeffrey Dean. 2013. Efficient Estimation of Word Representations in Vector Space. In ICLR Workshop '13.
  • [25] Tomas Mikolov, Ilya Sutskever, Kai Chen, Greg S Corrado, and Jeff Dean. 2013. Distributed Representations of Words and Phrases and their Compositionality. In NIPS '13.
  • [26] Erik Ordentlich, Lee Yang, Andy Feng, Peter Cnudde, Mihajlo Grbovic, Nemanja Djuric, Vladan Radosavljevic, and Gavin Owens. 2016. Network-efficient distributed word2vec training system for large vocabularies. In CIKM '16. ACM, 1139–1148.
  • [27] Bryan Perozzi, Rami Al-Rfou, and Steven Skiena. 2014. DeepWalk: Online learning of social representations. In KDD '14. ACM, 701–710.
  • [28] Jiezhong Qiu, Yuxiao Dong, Hao Ma, Jian Li, Kuansan Wang, and Jie Tang. 2018. Network Embedding as Matrix Factorization: Unifying DeepWalk, LINE, PTE, and node2vec. In WSDM '18.
  • [29] Arnab Sinha, Zhihong Shen, Yang Song, Hao Ma, Darrin Eide, Bo-June Paul Hsu, and Kuansan Wang. 2015. An overview of Microsoft Academic Service (MAS) and applications. In WWW '15. ACM, 243–246.
  • [30] Daniel A Spielman and Nikhil Srivastava. 2011. Graph sparsification by effective resistances. SIAM J. Comput. 40, 6 (2011), 1913–1926.
  • [31] Chris Stark, Bobby-Joe Breitkreutz, Andrew Chatr-Aryamontri, Lorrie Boucher, Rose Oughtred, Michael S Livstone, Julie Nixon, Kimberly Van Auken, Xiaodong Wang, Xiaoqi Shi, et al. 2010. The BioGRID interaction database: 2011 update. Nucleic Acids Research 39, suppl_1 (2010), D698–D704.
  • [32] Stergios Stergiou, Zygimantas Straznickas, Rolina Wu, and Kostas Tsioutsiouliklis. 2017. Distributed Negative Sampling for Word Embeddings. In AAAI '17. 2569–2575.
  • [33] Jian Tang, Meng Qu, Mingzhe Wang, Ming Zhang, Jun Yan, and Qiaozhu Mei. 2015. LINE: Large-scale information network embedding. In WWW '15. 1067–1077.
  • [34] Jie Tang, Jing Zhang, Limin Yao, Juanzi Li, Li Zhang, and Zhong Su. 2008. ArnetMiner: Extraction and mining of academic social networks. In KDD '08. 990–998.
  • [35] Lei Tang and Huan Liu. 2009. Relational learning via latent social dimensions. In KDD '09. ACM, 817–826.
  • [36] Lei Tang and Huan Liu. 2009. Scalable learning of collective behavior based on sparse social dimensions. In CIKM '09. ACM, 1107–1116.
  • [37] Lei Tang, Suju Rajan, and Vijay K Narayanan. 2009. Large scale multi-label classification via metalabeler. In WWW '09. ACM, 211–220.
  • [38] Shang-Hua Teng et al. 2016. Scalable algorithms for data and network analysis. Foundations and Trends in Theoretical Computer Science 12, 1–2 (2016), 1–274.
  • [39] Lloyd N Trefethen and David Bau III. 1997. Numerical Linear Algebra. Vol. 50. SIAM.
  • [40] Anton Tsitsulin, Davide Mottin, Panagiotis Karras, and Emmanuel Müller. 2018. VERSE: Versatile Graph Embeddings from Similarity Measures. In WWW '18. 539–548.
  • [41] Grigorios Tsoumakas, Ioannis Katakis, and Ioannis Vlahavas. 2009. Mining multi-label data. In Data Mining and Knowledge Discovery Handbook. Springer, 667–685.
  • [42] Ulrike Von Luxburg. 2007. A tutorial on spectral clustering. Statistics and Computing 17, 4 (2007), 395–416.
  • [43] Jizhe Wang, Pipei Huang, Huan Zhao, Zhibo Zhang, Binqiang Zhao, and Dik Lun Lee. 2018. Billion-scale Commodity Embedding for E-commerce Recommendation in Alibaba. In KDD '18. ACM.
  • [44] Rex Ying, Ruining He, Kaifeng Chen, Pong Eksombatchai, William L Hamilton, and Jure Leskovec. 2018. Graph Convolutional Neural Networks for Web-Scale Recommender Systems. In KDD '18.
  • [45] Matei Zaharia, Mosharaf Chowdhury, Michael J Franklin, Scott Shenker, and Ion Stoica. 2010. Spark: Cluster computing with working sets. In HotCloud '10.
  • [46] Peixiang Zhao. 2015. gSparsify: Graph Motif Based Sparsification for Graph Clustering. In CIKM '15. ACM, 373–382.