# ProNE - Fast and Scalable Network Representation Learning

IJCAI, pp. 4278-4284, 2019.

Abstract:

Recent advances in network embedding have revolutionized the field of graph and network mining. However, (pre-)training embeddings for very large-scale networks is computationally challenging for most existing methods. In this work, we present ProNE---a fast, scalable, and effective model, whose single-thread version is 10--400x faster than competitive baselines running on 20 threads.

Introduction

- Representation learning has offered a new paradigm for network mining and analysis [Hamilton et al., 2017b].
- The recent advances in network embedding roughly fall into three categories: matrix factorization based methods, such as SocDim [Tang and Liu, 2009], GraRep [Cao et al., 2015], HOPE [Ou et al., 2016], and NetMF [Qiu et al., 2018]; skip-gram based models, such as DeepWalk [Perozzi et al., 2014], LINE [Tang et al., 2015], and node2vec [Grover and Leskovec, 2016]; and graph neural networks (GNNs), such as Graph Convolution [Kipf and Welling, 2017], GraphSAGE [Hamilton et al., 2017a], and Graph Attention [Velickovic et al., 2018].
- Among factorization based models, the time complexity of GraRep is O(n^3), where n is the number of nodes in a network, making it prohibitively expensive to compute for large networks. Among skip-gram based models, with the default parameter settings, it would cost LINE weeks and DeepWalk/node2vec months to learn embeddings for a network of 100,000,000 nodes and 500,000,000 edges using 20 threads on a modern server.
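The scale argument above rests on the gap between a dense n x n matrix and a sparse adjacency matrix with O(|E|) nonzeros. A minimal sketch with a hypothetical toy graph (the edge list is illustrative, not from the paper):

```python
import numpy as np
import scipy.sparse as sp

# Hypothetical toy graph: 6 nodes, 6 undirected edges.
edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (4, 5)]
n = 6

rows, cols = zip(*edges)
A = sp.coo_matrix((np.ones(len(edges)), (rows, cols)), shape=(n, n))
# Symmetrize so A is the adjacency matrix of an undirected graph.
A = (A + A.T).tocsr()

# A sparse matrix stores O(|E|) nonzeros (two per undirected edge),
# while a dense n x n matrix needs n^2 entries -- the gap that makes
# O(n^3) dense factorization infeasible at 10^8 nodes.
print(A.nnz)   # 12 nonzeros
print(n * n)   # 36 dense entries
```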

Highlights

- Sparse randomized tSVD for fast embedding: we show how general distributional similarity-based network embedding can be understood as matrix factorization.
- In addition to ProNE, we report the interim embedding results generated by its sparse matrix factorization (SMF) step.
- We propose ProNE—a fast and scalable network embedding approach.
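The SMF step above can be sketched as a randomized truncated SVD of a sparse matrix. This is a minimal illustration, not the paper's implementation: the toy ring graph and the use of the row-normalized adjacency as the matrix to factorize are assumptions made here for brevity.

```python
import numpy as np
import scipy.sparse as sp
from sklearn.decomposition import TruncatedSVD

# Toy ring graph of 8 nodes (an assumption for illustration).
n, d = 8, 4
rows = np.arange(n)
cols = (rows + 1) % n
A = sp.coo_matrix((np.ones(n), (rows, cols)), shape=(n, n))
A = (A + A.T).tocsr()

# Row-normalized adjacency as a stand-in sparse proximity matrix.
deg = np.asarray(A.sum(axis=1)).ravel()
P = sp.diags(1.0 / deg) @ A

# Randomized truncated SVD: its cost scales with nnz(P) = O(|E|),
# not with n^2, which is what keeps the factorization step fast.
svd = TruncatedSVD(n_components=d, algorithm="randomized", random_state=0)
embedding = svd.fit_transform(P)   # one d-dimensional vector per node
print(embedding.shape)             # (8, 4)
```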

Methods

- The authors evaluate the efficiency and effectiveness of the ProNE method on multi-label node classification—a commonly used task for network embedding evaluation [Perozzi et al., 2014; Tang et al., 2015; Grover and Leskovec, 2016].

Results

- The authors follow the same experimental settings used in the baseline works [Perozzi et al., 2014; Grover and Leskovec, 2016; Tang et al., 2015; Cao et al., 2015].
- The authors randomly sample different percentages of labeled nodes to train a liblinear classifier and use the remaining nodes for testing.
- The authors repeat training and prediction ten times and report the average Micro-F1 for all methods.
- The authors follow the common practice of evaluating efficiency by wall-clock time, and ProNE's scalability is analyzed by its time cost on networks of multiple scales [Tang et al., 2015].
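The evaluation protocol above can be sketched as follows. The embeddings and labels here are synthetic stand-ins, and scikit-learn's LogisticRegression with the liblinear solver plays the role of the liblinear classifier; the 10% training ratio is one example percentage, not the paper's only setting.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
# Stand-ins for learned node embeddings and node labels.
X = rng.normal(size=(200, 16))
y = (X[:, 0] + 0.1 * rng.normal(size=200) > 0).astype(int)

scores = []
for seed in range(10):  # repeat training and prediction ten times
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, train_size=0.1, random_state=seed, stratify=y
    )
    clf = LogisticRegression(solver="liblinear").fit(X_tr, y_tr)
    scores.append(f1_score(y_te, clf.predict(X_te), average="micro"))

# Report the average Micro-F1 over the ten runs.
print(round(float(np.mean(scores)), 2))
```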

Conclusion

- The authors propose ProNE—a fast and scalable network embedding approach.
- It outperforms recent powerful network embedding baselines, such as DeepWalk, LINE, node2vec, GraRep, and HOPE, in both efficiency and effectiveness.
- The single-thread ProNE model is ∼10–400× faster than the aforementioned baselines that are accelerated by using 20 threads.
- The authors would like to apply parallel sparse matrix multiplication techniques to further speed up ProNE, as discussed in Section 3.4.
- The authors are also interested in exploring the connection between graph spectral factorization models and graph convolution and graph attention networks.


- Table 1: Statistics of the datasets
- Table 2: Efficiency comparison based on running time (seconds)
- Table 3: Classification performance in terms of Micro-F1 (%)

Related work

- The recent emergence of network embedding was largely triggered by skip-gram's applications in natural language and network mining [Mikolov et al., 2013b; Perozzi et al., 2014]. Its history dates back to spectral clustering [Chung, 1997; Yan et al., 2009] and social dimension learning [Tang and Liu, 2009]. Over the course of its development, most network embedding methods aim to model distributional similarities between nodes, either implicitly or explicitly.

Inspired by the word2vec model, a line of skip-gram based embedding models has been presented to encode network structures into continuous spaces, such as DeepWalk [Perozzi et al., 2014], LINE [Tang et al., 2015], and node2vec [Grover and Leskovec, 2016]. Recently, building on [Levy and Goldberg, 2014], a study showed that skip-gram based network embedding can be understood as implicit matrix factorization, and it also presented the NetMF model, which performs explicit matrix factorization to learn network embeddings [Qiu et al., 2018]. The difference between NetMF and our model is that the matrix factorized by NetMF is dense, and its construction and factorization take O(|V|^3) time, while our ProNE model formalizes network embedding as sparse matrix factorization in O(|E|) time.
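A back-of-the-envelope comparison at the scales quoted in the introduction illustrates the dense-versus-sparse gap; the factor of two for storing both directions of each edge is an assumption about the representation, not a detail from the paper.

```python
# Scales quoted in the paper: 10^8 nodes, 5 * 10^8 edges.
n, m = 100_000_000, 500_000_000

dense_entries = n * n       # what a dense |V| x |V| matrix must store
sparse_entries = 2 * m      # O(|E|) nonzeros, one per edge direction

# The dense formulation stores ten million times more entries,
# before even paying the O(|V|^3) factorization cost.
print(dense_entries // sparse_entries)
```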

References

- [Andrews and Andrews, 1992] Larry C Andrews and Larry C Andrews. Special functions of mathematics for engineers. McGrawHill New York, 1992.
- [Bandeira et al., 2013] Afonso S Bandeira, Amit Singer, and Daniel A Spielman. A cheeger inequality for the graph connection laplacian. SIAM Journal on Matrix Analysis and Applications, 34(4):1611–1630, 2013.
- [Belkin and Niyogi, 2001] Mikhail Belkin and Partha Niyogi. Laplacian eigenmaps and spectral techniques for embedding and clustering. In NIPS, pages 585–591, 2001.
- [Breitkreutz et al., 2008] Bobby-Joe Breitkreutz, Chris Stark, et al. The biogrid interaction database: 2008 update. Nucleic acids research, 36(suppl 1):D637–D640, 2008.
- [Buluç and Gilbert, 2012] Aydın Buluç and John R Gilbert. Parallel sparse matrix-matrix multiplication and indexing: Implementation and experiments. SIAM Journal on Scientific Computing, 34(4):C170–C191, 2012.
- [Cao et al., 2015] Shaosheng Cao, Wei Lu, and Qiongkai Xu. Grarep: Learning graph representations with global structural information. In CIKM, pages 891–900, 2015.
- [Chung, 1997] Fan RK Chung. Spectral graph theory. Number 92. American Mathematical Soc., 1997.
- [Defferrard et al., 2016] Michael Defferrard, Xavier Bresson, and Pierre Vandergheynst. Convolutional neural networks on graphs with fast localized spectral filtering. In NIPS, pages 3844–3852, 2016.
- [Grover and Leskovec, 2016] Aditya Grover and Jure Leskovec. node2vec: Scalable feature learning for networks. In KDD, pages 855–864, 2016.
- [Hamilton et al., 2017a] Will Hamilton, Zhitao Ying, and Jure Leskovec. Inductive representation learning on large graphs. In NIPS, pages 1025–1035, 2017.
- [Hamilton et al., 2017b] William L. Hamilton, Rex Ying, and Jure Leskovec. Representation learning on graphs: Methods and applications. IEEE Data(base) Engineering Bulletin, 40:52–74, 2017.
- [Hammond et al., 2011] David K Hammond, Pierre Vandergheynst, and Remi Gribonval. Wavelets on graphs via spectral graph theory. Applied and Computational Harmonic Analysis, 30(2):129–150, 2011.
- [Harris, 1954] Zellig S Harris. Distributional structure. Word, 10(23):146–162, 1954.
- [Henaff et al., 2015] Mikael Henaff, Joan Bruna, and Yann LeCun. Deep convolutional networks on graph-structured data. arXiv preprint arXiv:1506.05163, 2015.
- [Kipf and Welling, 2017] Thomas N Kipf and Max Welling. Semisupervised classification with graph convolutional networks. In ICLR, 2017.
- [Lee et al., 2014] James R Lee, Shayan Oveis Gharan, and Luca Trevisan. Multiway spectral partitioning and higher-order cheeger inequalities. JACM, 61(6):37, 2014.
- [Levy and Goldberg, 2014] Omer Levy and Yoav Goldberg. Neural word embedding as implicit matrix factorization. In NIPS, pages 2177–2185, 2014.
- [Mahoney, 2009] Matt Mahoney. Large text compression benchmark. URL: http://www.mattmahoney.net/text/text.html, 2009.
- [Mikolov et al., 2013a] Tomas Mikolov, Kai Chen, Greg Corrado, and Jeffrey Dean. Efficient estimation of word representations in vector space. In ICLR Workshop, 2013.
- [Mikolov et al., 2013b] Tomas Mikolov, Ilya Sutskever, Kai Chen, Greg S Corrado, and Jeff Dean. Distributed representations of words and phrases and their compositionality. In NIPS, pages 3111–3119, 2013.
- [Ou et al., 2016] Mingdong Ou, Peng Cui, Jian Pei, Ziwei Zhang, and Wenwu Zhu. Asymmetric transitivity preserving graph embedding. In KDD, pages 1105–1114, 2016.
- [Perozzi et al., 2014] Bryan Perozzi, Rami Al-Rfou, and Steven Skiena. Deepwalk: Online learning of social representations. In KDD, pages 701–710, 2014.
- [Qiu et al., 2018] Jiezhong Qiu, Yuxiao Dong, Hao Ma, Jian Li, Kuansan Wang, and Jie Tang. Network embedding as matrix factorization: Unifying deepwalk, line, pte, and node2vec. In WSDM, pages 459–467, 2018.
- [Shuman et al., 2016] David I Shuman, Benjamin Ricaud, and Pierre Vandergheynst. Vertex-frequency analysis on graphs. Applied and Computational Harmonic Analysis, 40(2):260–291, 2016.
- [Smith et al., 2015] Shaden Smith, Niranjay Ravindran, Nicholas D Sidiropoulos, and George Karypis. Splatt: Efficient and parallel sparse tensor-matrix multiplication. In 2015 IEEE International Parallel and Distributed Processing Symposium, pages 61–70. IEEE, 2015.
- [Tang and Liu, 2009] Lei Tang and Huan Liu. Relational learning via latent social dimensions. In KDD, pages 817–826, 2009.
- [Tang et al., 2008] Jie Tang, Jing Zhang, Limin Yao, Juanzi Li, Li Zhang, and Zhong Su. Arnetminer: extraction and mining of academic social networks. In KDD, pages 990–998, 2008.
- [Tang et al., 2015] Jian Tang, Meng Qu, Mingzhe Wang, Ming Zhang, Jun Yan, and Qiaozhu Mei. Line: Large-scale information network embedding. In WWW, pages 1067–1077, 2015.
- [Tao, 2012] Terence Tao. Topics in random matrix theory, volume 132. American Mathematical Soc., 2012.
- [Tenenbaum et al., 2000] Joshua B Tenenbaum, Vin De Silva, and John C Langford. A global geometric framework for nonlinear dimensionality reduction. science, 290(5500):2319–2323, 2000.
- [Velickovic et al., 2018] Petar Velickovic, Guillem Cucurull, Arantxa Casanova, Adriana Romero, Pietro Lio, and Yoshua Bengio. Graph attention networks. In ICLR, 2018.
- [Von Luxburg, 2007] Ulrike Von Luxburg. A tutorial on spectral clustering. Statistics and computing, 17(4):395–416, 2007.
- [Yan et al., 2009] Donghui Yan, Ling Huang, and Michael I Jordan. Fast approximate spectral clustering. In KDD, pages 907–916, 2009.
- [Zafarani and Liu, 2009] Reza Zafarani and Huan Liu. Social computing data repository at asu, 2009.
