
# DeepWalk: Online Learning of Social Representations

KDD (2014): 701–710


Abstract

We present DeepWalk, a novel approach for learning latent representations of vertices in a network. These latent representations encode social relations in a continuous vector space, which is easily exploited by statistical models. DeepWalk generalizes recent advancements in language modeling and unsupervised feature learning (or deep learning) from sequences of words to graphs.


Introduction

- The sparsity of a network representation is both a strength and a weakness. Sparsity enables the design of efficient discrete algorithms, but can make it harder to generalize in statistical learning.
- Social representations are latent features of the vertices that capture neighborhood similarity and community membership.
- DeepWalk generalizes neural language models to process a special language composed of a set of randomly-generated walks.
- These neural language models have been used to capture the semantic and syntactic structure of human language [7], and even logical analogies [29].
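The first step of this "walks as sentences" idea can be sketched in a few lines of Python. This is an illustrative sketch, not the paper's reference implementation; the function name and the adjacency-dict input format are our own.

```python
import random

def random_walks(adj, num_walks, walk_len, seed=0):
    """Generate a corpus of truncated random walks over a graph given
    as an adjacency dict {vertex: [neighbors]}. Each walk plays the
    role of a sentence whose words are vertices, so a language model
    such as SkipGram can be trained on the result."""
    rng = random.Random(seed)
    walks = []
    for _ in range(num_walks):      # several passes over the graph
        order = list(adj)
        rng.shuffle(order)          # the paper shuffles vertex order each pass
        for start in order:
            walk = [start]
            while len(walk) < walk_len:
                nbrs = adj[walk[-1]]
                if not nbrs:        # dead end: truncate the walk early
                    break
                walk.append(rng.choice(nbrs))
            walks.append(walk)
    return walks
```

Each vertex starts `num_walks` walks of at most `walk_len` steps, mirroring the walks-per-vertex and walk-length parameters the paper describes.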

Highlights

- The sparsity of a network representation is both a strength and a weakness
- In this paper we introduce deep learning [3] techniques, which have proven successful in natural language processing, into network analysis for the first time
- We introduce deep learning as a tool to analyze graphs, to build robust representations that are suitable for statistical modeling
- We present a generalization of language modeling to explore the graph through a stream of short random walks
- We propose DeepWalk, a novel approach for learning latent social representations of vertices
- DeepWalk’s representations are able to outperform all baseline methods while using 60% less training data
- Our results show that we can create meaningful representations for graphs which are too large for standard spectral methods
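To make the "language modeling over a stream of random walks" analogy concrete, here is a small sketch of how a walk corpus becomes SkipGram-style training pairs. The function name and `window` parameter are our own, and the full model (a hierarchical softmax trained on these pairs) is omitted.

```python
def skipgram_pairs(walks, window):
    """Turn a corpus of random walks into (target, context) training
    pairs: for each vertex in a walk, every vertex within `window`
    positions on either side becomes a context to predict."""
    pairs = []
    for walk in walks:
        for i, target in enumerate(walk):
            lo, hi = max(0, i - window), min(len(walk), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    pairs.append((target, walk[j]))
    return pairs
```

For example, `skipgram_pairs([[1, 2, 3]], window=1)` yields `[(1, 2), (2, 1), (2, 3), (3, 2)]`.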

Methods

- To validate the performance of the approach, the authors compare it against a number of baselines:
- SpectralClustering [41]: generates a representation in R^d from the d smallest eigenvectors of the normalized graph Laplacian of G.
- wvRN [25]: given the neighborhood N_i of vertex v_i, the weighted-vote relational neighbor classifier estimates Pr(y_i | N_i) as the weighted mean of its neighbors' label probabilities.
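The SpectralClustering baseline can be sketched directly from its description, assuming a small dense adjacency matrix with no isolated vertices. This is a toy version: real spectral methods need sparse eigensolvers, and that eigendecomposition is exactly the scalability bottleneck DeepWalk avoids.

```python
import numpy as np

def spectral_embedding(A, d):
    """Embed vertices with the d smallest eigenvectors of the
    normalized graph Laplacian L = I - D^{-1/2} A D^{-1/2}.
    A: dense symmetric adjacency matrix (numpy array)."""
    deg = A.sum(axis=1)
    d_inv_sqrt = np.diag(1.0 / np.sqrt(deg))
    L = np.eye(len(A)) - d_inv_sqrt @ A @ d_inv_sqrt
    _, vecs = np.linalg.eigh(L)   # eigh returns eigenvalues in ascending order
    return vecs[:, :d]            # columns = d smallest eigenvectors
```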

Results

- DeepWalk’s representations can provide F1 scores up to 10% higher than competing methods when labeled data is sparse.
- DeepWalk’s representations are able to outperform all baseline methods while using 60% less training data.
- The authors' results show that meaningful representations can be created for graphs which are too large for standard spectral methods.
- On such large graphs, the method significantly outperforms other methods designed to operate when data is sparse.

Conclusion

- The authors propose DeepWalk, a novel approach for learning latent social representations of vertices.
- Using local information from truncated random walks as input, the method learns a representation which encodes structural regularities.
- The authors' results show that meaningful representations can be created for graphs which are too large for standard spectral methods.
- On such large graphs, the method significantly outperforms other methods designed to operate when data is sparse.
- The authors show that the approach is parallelizable, allowing workers to update different parts of the model concurrently.
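The concurrent-update claim follows the HogWild-style [36] pattern: because each SGD step touches only a couple of embedding rows, workers can share one model without locks. The toy objective below (pulling each pair's vectors together) is our own illustration, not the paper's SkipGram loss.

```python
import random
import threading

def parallel_sgd(pairs, n_vertices, dim=4, lr=0.05, workers=2, seed=0):
    """Several workers apply SGD updates to one shared embedding table
    concurrently and lock-free; because updates are sparse, the rare
    lost write barely hurts convergence."""
    rng = random.Random(seed)
    emb = [[rng.random() for _ in range(dim)] for _ in range(n_vertices)]

    def worker(chunk):
        for u, v in chunk:
            for k in range(dim):             # gradient of 0.5 * ||e_u - e_v||^2
                diff = emb[u][k] - emb[v][k]
                emb[u][k] -= lr * diff
                emb[v][k] += lr * diff

    step = (len(pairs) + workers - 1) // workers
    threads = [threading.Thread(target=worker,
                                args=(pairs[i * step:(i + 1) * step],))
               for i in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return emb
```

After training, vertices that frequently co-occur in pairs end up with nearby embeddings, even though no worker ever took a lock.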

- Table1: Graphs used in our experiments
- Table2: Multi-label classification results in BlogCatalog
- Table3: Multi-label classification results in Flickr
- Table4: Multi-label classification results in YouTube

Related work

- The main differences between our proposed method and previous work can be summarized as follows: 1. We learn our latent social representations, instead of computing statistics related to centrality [13] or partitioning [41].

2. We do not attempt to extend the classification procedure itself (through collective inference [37] or graph kernels [21]).

3. We propose a scalable online method which uses only local information. Most methods require global information and are offline [17, 39,40,41].

4. We apply unsupervised representation learning to graphs.

The section then surveys related work in network classification and in unsupervised feature learning.

Funding

- This research was partially supported by NSF Grants DBI-1060572 and IIS-1017181, and a Google Faculty Research Award.

Reference

- R. Al-Rfou, B. Perozzi, and S. Skiena. Polyglot: Distributed word representations for multilingual nlp. In Proceedings of the Seventeenth Conference on Computational Natural Language Learning, pages 183–192, Sofia, Bulgaria, August 2013. ACL.
- R. Andersen, F. Chung, and K. Lang. Local graph partitioning using pagerank vectors. In Foundations of Computer Science, 2006. FOCS’06. 47th Annual IEEE Symposium on, pages 475–486. IEEE, 2006.
- Y. Bengio, A. Courville, and P. Vincent. Representation learning: A review and new perspectives. 2013.
- Y. Bengio, R. Ducharme, and P. Vincent. A neural probabilistic language model. Journal of Machine Learning Research, 3:1137–1155, 2003.
- L. Bottou. Stochastic gradient learning in neural networks. In Proceedings of Neuro-Nımes 91, Nimes, France, 1991. EC2.
- V. Chandola, A. Banerjee, and V. Kumar. Anomaly detection: A survey. ACM Computing Surveys (CSUR), 41(3):15, 2009.
- R. Collobert and J. Weston. A unified architecture for natural language processing: Deep neural networks with multitask learning. In Proceedings of the 25th ICML, ICML ’08, pages 160–167. ACM, 2008.
- G. E. Dahl, D. Yu, L. Deng, and A. Acero. Context-dependent pre-trained deep neural networks for large-vocabulary speech recognition. Audio, Speech, and Language Processing, IEEE Transactions on, 20(1):30–42, 2012.
- J. Dean, G. Corrado, R. Monga, K. Chen, M. Devin, Q. Le, M. Mao, M. Ranzato, A. Senior, P. Tucker, K. Yang, and A. Ng. Large scale distributed deep networks. In P. Bartlett, F. Pereira, C. Burges, L. Bottou, and K. Weinberger, editors, Advances in Neural Information Processing Systems 25, pages 1232–1240. 2012.
- D. Erhan, Y. Bengio, A. Courville, P.-A. Manzagol, P. Vincent, and S. Bengio. Why does unsupervised pre-training help deep learning? The Journal of Machine Learning Research, 11:625–660, 2010.
- R.-E. Fan, K.-W. Chang, C.-J. Hsieh, X.-R. Wang, and C.-J. Lin. LIBLINEAR: A library for large linear classification. Journal of Machine Learning Research, 9:1871–1874, 2008.
- F. Fouss, A. Pirotte, J.-M. Renders, and M. Saerens. Random-walk computation of similarities between nodes of a graph with application to collaborative recommendation. Knowledge and Data Engineering, IEEE Transactions on, 19(3):355–369, 2007.
- B. Gallagher and T. Eliassi-Rad. Leveraging label-independent features for classification in sparsely labeled networks: An empirical study. In Advances in Social Network Mining and Analysis, pages 1–19.
- B. Gallagher, H. Tong, T. Eliassi-Rad, and C. Faloutsos. Using ghost edges for classification in sparsely labeled networks. In Proceedings of the 14th ACM SIGKDD, KDD ’08, pages 256–264, New York, NY, USA, 2008. ACM.
- S. Geman and D. Geman. Stochastic relaxation, gibbs distributions, and the bayesian restoration of images. Pattern Analysis and Machine Intelligence, IEEE Transactions on, (6):721–741, 1984.
- L. Getoor and B. Taskar. Introduction to statistical relational learning. MIT press, 2007.
- K. Henderson, B. Gallagher, L. Li, L. Akoglu, T. Eliassi-Rad, H. Tong, and C. Faloutsos. It’s who you know: Graph mining using recursive structural features. In Proceedings of the 17th ACM SIGKDD, KDD ’11, pages 663–671, New York, NY, USA, 2011. ACM.
- G. E. Hinton. Learning distributed representations of concepts. In Proceedings of the eighth annual conference of the cognitive science society, pages 1–12. Amherst, MA, 1986.
- R. A. Hummel and S. W. Zucker. On the foundations of relaxation labeling processes. Pattern Analysis and Machine Intelligence, IEEE Transactions on, (3):267–287, 1983.
- U. Kang, H. Tong, and J. Sun. Fast random walk graph kernel. In SDM, pages 828–838, 2012.
- R. I. Kondor and J. Lafferty. Diffusion kernels on graphs and other discrete input spaces. In ICML, volume 2, pages 315–322, 2002.
- A. Krizhevsky, I. Sutskever, and G. E. Hinton. Imagenet classification with deep convolutional neural networks. In NIPS, volume 1, page 4, 2012.
- D. Liben-Nowell and J. Kleinberg. The link-prediction problem for social networks. Journal of the American society for information science and technology, 58(7):1019–1031, 2007.
- F. Lin and W. Cohen. Semi-supervised classification of network data using very few labels. In Advances in Social Networks Analysis and Mining (ASONAM), 2010 International Conference on, pages 192–199, Aug 2010.
- S. A. Macskassy and F. Provost. A simple relational classifier. In Proceedings of the Second Workshop on Multi-Relational Data Mining (MRDM-2003) at KDD-2003, pages 64–76, 2003.
- S. A. Macskassy and F. Provost. Classification in networked data: A toolkit and a univariate case study. The Journal of Machine Learning Research, 8:935–983, 2007.
- T. Mikolov, K. Chen, G. Corrado, and J. Dean. Efficient estimation of word representations in vector space. CoRR, abs/1301.3781, 2013.
- T. Mikolov, I. Sutskever, K. Chen, G. S. Corrado, and J. Dean. Distributed representations of words and phrases and their compositionality. In Advances in Neural Information Processing Systems 26, pages 3111–3119. 2013.
- T. Mikolov, W.-t. Yih, and G. Zweig. Linguistic regularities in continuous space word representations. In Proceedings of NAACL-HLT, pages 746–751, 2013.
- A. Mnih and G. E. Hinton. A scalable hierarchical distributed language model. Advances in neural information processing systems, 21:1081–1088, 2009.
- F. Morin and Y. Bengio. Hierarchical probabilistic neural network language model. In Proceedings of the international workshop on artificial intelligence and statistics, pages 246–252, 2005.
- J. Neville and D. Jensen. Iterative classification in relational data. In Proc. AAAI-2000 Workshop on Learning Statistical Models from Relational Data, pages 13–20, 2000.
- J. Neville and D. Jensen. Leveraging relational autocorrelation with latent group models. In Proceedings of the 4th International Workshop on Multi-relational Mining, MRDM ’05, pages 49–55, New York, NY, USA, 2005. ACM.
- J. Neville and D. Jensen. A bias/variance decomposition for models using collective inference. Machine Learning, 73(1):87–106, 2008.
- M. E. Newman. Modularity and community structure in networks. Proceedings of the National Academy of Sciences, 103(23):8577–8582, 2006.
- B. Recht, C. Re, S. Wright, and F. Niu. Hogwild: A lock-free approach to parallelizing stochastic gradient descent. In Advances in Neural Information Processing Systems 24, pages 693–701. 2011.
- P. Sen, G. Namata, M. Bilgic, L. Getoor, B. Galligher, and T. Eliassi-Rad. Collective classification in network data. AI magazine, 29(3):93, 2008.
- D. A. Spielman and S.-H. Teng. Nearly-linear time algorithms for graph partitioning, graph sparsification, and solving linear systems. In Proceedings of the thirty-sixth annual ACM symposium on Theory of computing, pages 81–90. ACM, 2004.
- L. Tang and H. Liu. Relational learning via latent social dimensions. In Proceedings of the 15th ACM SIGKDD, KDD ’09, pages 817–826, New York, NY, USA, 2009. ACM.
- L. Tang and H. Liu. Scalable learning of collective behavior based on sparse social dimensions. In Proceedings of the 18th ACM conference on Information and knowledge management, pages 1107–1116. ACM, 2009.
- L. Tang and H. Liu. Leveraging social media networks for classification. Data Mining and Knowledge Discovery, 23(3):447–478, 2011.
- S. Vishwanathan, N. N. Schraudolph, R. Kondor, and K. M. Borgwardt. Graph kernels. The Journal of Machine Learning Research, 99:1201–1242, 2010.
- X. Wang and G. Sukthankar. Multi-label relational neighbor classification using social context features. In Proceedings of the 19th ACM SIGKDD, pages 464–472. ACM, 2013.
- W. Zachary. An information flow model for conflict and fission in small groups. Journal of Anthropological Research, 33(4):452–473, 1977.
