LINE: Large-scale Information Network Embedding

WWW, 2015.

Cited by: 2315
Keywords:
information network, real world information network, objective function, dimension reduction, low dimensional
Weibo:
An efficient and effective edge-sampling method is proposed for model inference, which addresses the limitation of stochastic gradient descent on weighted edges without compromising the efficiency

Abstract:

This paper studies the problem of embedding very large information networks into low-dimensional vector spaces, which is useful in many tasks such as visualization, node classification, and link prediction. Most existing graph embedding methods do not scale for real world information networks which usually contain millions of nodes. In ...

Introduction
  • Information networks are ubiquitous in the real world with examples such as airline networks, publication networks, social and communication networks, and the World Wide Web.
  • This paper studies the problem of embedding information networks into low-dimensional spaces, in which every vertex is represented as a low-dimensional vector.
  • Such a low-dimensional embedding is very useful in a variety of applications such as visualization [21], node classification [3], link prediction [10], and recommendation [23].
  • The authors anticipate that a new model with a carefully designed objective function that preserves properties of the graph and an efficient optimization technique should effectively find the embedding of millions of nodes
Highlights
  • Information networks are ubiquitous in the real world with examples such as airline networks, publication networks, social and communication networks, and the World Wide Web
  • This paper studies the problem of embedding information networks into low-dimensional spaces, in which every vertex is represented as a low-dimensional vector
  • A few very recent studies approach the embedding of large-scale networks, but these methods either use an indirect approach that is not designed for networks (e.g., [1]) or lack a clear objective function tailored for network embedding (e.g., [16])
  • We show that directly deploying the stochastic gradient descent is problematic for real world information networks
  • As the importance of the vertices in the network may be different, we introduce λi in the objective function to represent the prestige of vertex i in the network, which can be measured by the degree or estimated through algorithms such as PageRank [15]
  • An efficient and effective edge-sampling method is proposed for model inference, which addresses the limitation of stochastic gradient descent on weighted edges without compromising the efficiency
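The edge-sampling trick draws each training edge with probability proportional to its weight and then treats the drawn edge as binary, so SGD never multiplies gradients by raw weights. A minimal sketch of the O(1) alias-method sampler typically used for this (function names and the toy edge list are illustrative, not the authors' code):

```python
import random

def build_alias_table(weights):
    """Walker's alias method: O(n) setup, O(1) per sample."""
    n = len(weights)
    total = sum(weights)
    prob = [w * n / total for w in weights]   # probabilities scaled by n
    small = [i for i, p in enumerate(prob) if p < 1.0]
    large = [i for i, p in enumerate(prob) if p >= 1.0]
    alias = [0] * n
    while small and large:
        s, l = small.pop(), large.pop()
        alias[s] = l                          # s "borrows" excess mass from l
        prob[l] -= 1.0 - prob[s]
        (small if prob[l] < 1.0 else large).append(l)
    for i in small + large:                   # leftovers are exactly 1 up to float error
        prob[i] = 1.0
    return prob, alias

def sample_edge(prob, alias):
    """Draw an edge index with probability proportional to its weight."""
    i = random.randrange(len(prob))
    return i if random.random() < prob[i] else alias[i]

# Toy example: three edges with skewed weights; each sampled edge would
# then be treated as an unweighted (binary) edge in the SGD update.
edges = [("a", "b"), ("b", "c"), ("a", "c")]
weights = [10.0, 1.0, 1.0]
prob, alias = build_alias_table(weights)
random.seed(42)
counts = [0, 0, 0]
for _ in range(12000):
    counts[sample_edge(prob, alias)] += 1
# counts[0] should be roughly 10x counts[1] and counts[2]
```

This removes the variance problem of multiplying gradients by edge weights: heavy edges are simply visited more often.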
Methods
  • The authors empirically evaluated the effectiveness and efficiency of LINE by applying the method to several large-scale real-world networks of different types, including a language network, two social networks, and two citation networks.

    5.1 Experiment Setup: Data Sets.

    (1) Language network.
  • The authors constructed a word co-occurrence network from the entire set of English Wikipedia pages.
  • The authors use two social networks: Flickr and YouTube.
  • The authors use the DBLP data set [19] to construct the citation networks between authors and between papers.
  • The detailed statistics of these networks are summarized in Table 1.
  • They represent a variety of information networks: directed and undirected, binary and weighted.
  • Each network contains at least half a million nodes and millions of edges, with the largest network containing around two million nodes and a billion edges
Results
  • 5.2.1 Language Network

    The authors start with the results on the language network, which contains two million nodes and a billion edges.
  • The compared methods are GF, DeepWalk, SkipGram, LINE-SGD(1st), LINE-SGD(2nd), LINE(1st), and LINE(2nd); they are evaluated on semantic (%), syntactic (%), and overall (%) analogy accuracy and on running time (Table 2).
  • Word Analogy
  • This task is introduced by Mikolov et al. [12].
  • Given a word pair (a, b) and a word c, the task aims to find a word d such that the relation between c and d is similar to the relation between a and b, denoted as a : b → c : ?
  • Two categories of word analogy are used in this task: semantic and syntactic
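The analogy task is usually answered by vector arithmetic over the learned embeddings: pick the word d whose vector is most similar (by cosine) to v_b − v_a + v_c, excluding a, b, and c. A minimal sketch with hand-crafted toy vectors (the actual evaluation uses the trained LINE embeddings):

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    nu = math.sqrt(sum(x * x for x in u))
    nv = math.sqrt(sum(x * x for x in v))
    return dot / (nu * nv)

def solve_analogy(emb, a, b, c):
    """a : b -> c : ?  Return the word closest to v_b - v_a + v_c,
    excluding the three query words themselves."""
    target = [vb - va + vc for va, vb, vc in zip(emb[a], emb[b], emb[c])]
    best, best_sim = None, float("-inf")
    for word, vec in emb.items():
        if word in (a, b, c):
            continue
        sim = cosine(vec, target)
        if sim > best_sim:
            best, best_sim = word, sim
    return best

# Toy embeddings constructed so the analogy holds exactly.
emb = {
    "king":  [1.0, 1.0],
    "queen": [1.0, -1.0],
    "man":   [2.0, 1.0],
    "woman": [2.0, -1.0],
}
print(solve_analogy(emb, "man", "king", "woman"))  # prints "queen"
```

Semantic questions test relations like country–capital; syntactic ones test inflections like adjective–adverb.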
Conclusion
  • The authors discuss several practical issues of the LINE model. Low degree vertices.
  • One practical issue is how to accurately embed vertices with small degrees.
  • As the number of neighbors of such a node is very small, it is very hard to accurately infer its representation, especially with the second-order proximity based methods which heavily rely on the number of “contexts.” An intuitive solution to this is expanding the neighbors of those vertices by adding higher order neighbors, such as neighbors of neighbors.
  • The authors plan to investigate the embedding of heterogeneous information networks, e.g., vertices with multiple types
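The neighbor-expansion idea for low-degree vertices can be sketched as follows; the specific weighting of the added edges (w_vj · w_jk / deg(j)) is an illustrative assumption, as are the function and variable names:

```python
def expand_low_degree(adj, min_degree=2):
    """For vertices with fewer than min_degree neighbors, add their
    neighbors-of-neighbors as direct neighbors. Hypothetical weighting:
    the new edge (v, k) reached via j gets weight w_vj * w_jk / deg(j),
    where deg(j) is j's weighted degree."""
    new_adj = {v: dict(nbrs) for v, nbrs in adj.items()}
    for v, nbrs in adj.items():
        if len(nbrs) >= min_degree:
            continue                              # degree is already sufficient
        for j, w_vj in nbrs.items():
            deg_j = sum(adj[j].values())          # weighted degree of intermediate j
            for k, w_jk in adj[j].items():
                if k == v or k in nbrs:
                    continue                      # skip self and existing neighbors
                new_adj[v][k] = new_adj[v].get(k, 0.0) + w_vj * w_jk / deg_j
    return new_adj

# Example: vertex "a" has a single neighbor "b"; after expansion it also
# connects to "c" and "d" through "b", each with weight 1.0 * 1.0 / 3.0.
adj = {
    "a": {"b": 1.0},
    "b": {"a": 1.0, "c": 1.0, "d": 1.0},
    "c": {"b": 1.0},
    "d": {"b": 1.0},
}
expanded = expand_low_degree(adj)
```

This gives second-order methods more "contexts" to work with for sparse vertices, at the cost of densifying the graph.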
Tables
  • Table 1: Statistics of the real-world information networks. They represent a variety of information networks: directed and undirected, binary and weighted. Each network contains at least half a million nodes and millions of edges, with the largest network containing around two million nodes and a billion edges
  • Table 2: Results of word analogy on Wikipedia data
  • Table 3: Results of Wikipedia page classification on Wikipedia data set
  • Table 4: Comparison of most similar words using 1st-order and 2nd-order proximity
  • Table 5: Results of multi-label classification on the Flickr network
  • Table 6: Results of multi-label classification on the YouTube network. The results in the brackets are on the reconstructed network, which adds second-order neighbors (i.e., neighbors of neighbors) as neighbors for vertices with a low degree
  • Table 7: Results of multi-label classification on the DBLP (AuthorCitation) network
  • Table 8: Results of multi-label classification on the DBLP (PaperCitation) network
Related work
  • Our work is related to classical methods of graph embedding or dimension reduction in general, such as multidimensional scaling (MDS) [4], IsoMap [20], LLE [18] and Laplacian Eigenmap [2]. These approaches typically first construct the affinity graph using the feature vectors of the data points, e.g., the K-nearest neighbor graph of data, and then embed the affinity graph [22] into a low dimensional space. However, these algorithms usually rely on solving the leading eigenvectors of the affinity matrices, the complexity of which is at least quadratic to the number of nodes, making them inefficient to handle large-scale networks.

    Among the most recent literature is a technique called graph factorization [1]. It finds the low-dimensional embedding of a large graph through matrix factorization, which is optimized using stochastic gradient descent. This is possible because a graph can be represented as an affinity matrix. However, the objective of matrix factorization is not designed for networks and therefore does not necessarily preserve the global network structure. Intuitively, graph factorization expects nodes with higher first-order proximity to be represented close to each other. Instead, the LINE model uses an objective that is specifically designed for networks, which preserves both the first-order and the second-order proximities. Practically, the graph factorization method applies only to undirected graphs, while the proposed model is applicable to both undirected and directed graphs.
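By contrast, a first-order proximity objective of the LINE flavor models the probability of an edge directly as σ(u_i · u_j) and fits it by SGD with negative sampling. A rough sketch of a single update (hyperparameters and names are simplifications, and negatives are drawn uniformly here, whereas the paper samples them from a noise distribution proportional to degree^(3/4)):

```python
import math
import random

def sgd_step_first_order(emb, edge, vertices, lr=0.025, num_neg=5):
    """One stochastic update: push sigmoid(u_i . u_j) toward 1 for the
    observed edge (i, j) and toward 0 for sampled negative vertices."""
    def sigmoid(x):
        return 1.0 / (1.0 + math.exp(-x))

    i, j = edge
    neg_pool = [v for v in vertices if v not in edge]   # simplification: uniform negatives
    targets = [(j, 1.0)] + [(random.choice(neg_pool), 0.0) for _ in range(num_neg)]
    dim = len(emb[i])
    grad_i = [0.0] * dim
    for t, label in targets:
        score = sigmoid(sum(a * b for a, b in zip(emb[i], emb[t])))
        g = lr * (label - score)          # gradient of the log-likelihood term
        for d in range(dim):
            grad_i[d] += g * emb[t][d]    # accumulate the update for u_i
            emb[t][d] += g * emb[i][d]    # update the target vector in place
    for d in range(dim):
        emb[i][d] += grad_i[d]

# Toy run: repeatedly updating one observed edge should increase u_a . u_b.
random.seed(0)
emb = {v: [random.uniform(-0.5, 0.5) for _ in range(4)] for v in "abc"}
before = sum(x * y for x, y in zip(emb["a"], emb["b"]))
for _ in range(200):
    sgd_step_first_order(emb, ("a", "b"), list("abc"), lr=0.05, num_neg=2)
after = sum(x * y for x, y in zip(emb["a"], emb["b"]))
```

Because the objective is over edges (not matrix entries), what is preserved is exactly the proximity structure of the network rather than a generic reconstruction error.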
Funding
  • The co-author Ming Zhang is supported by the National Natural Science Foundation of China (NSFC Grant No. 61472006); Qiaozhu Mei is supported by the National Science Foundation under grant numbers IIS-1054199 and CCF-1048168.
Reference
  • A. Ahmed, N. Shervashidze, S. Narayanamurthy, V. Josifovski, and A. J. Smola. Distributed large-scale natural graph factorization. In Proceedings of the 22nd International Conference on World Wide Web, pages 37–48, 2013.
  • M. Belkin and P. Niyogi. Laplacian eigenmaps and spectral techniques for embedding and clustering. In NIPS, volume 14, pages 585–591, 2001.
  • S. Bhagat, G. Cormode, and S. Muthukrishnan. Node classification in social networks. In Social Network Data Analytics, pages 115–148.
  • T. F. Cox and M. A. Cox. Multidimensional Scaling. CRC Press, 2000.
  • J. R. Firth. A synopsis of linguistic theory, 1930–1955. In J. R. Firth (Ed.), Studies in Linguistic Analysis, pages 1–32.
  • M. S. Granovetter. The strength of weak ties. American Journal of Sociology, pages 1360–1380, 1973.
  • Q. Le and T. Mikolov. Distributed representations of sentences and documents. In Proceedings of the 31st International Conference on Machine Learning, pages 1188–1196, 2014.
  • O. Levy and Y. Goldberg. Neural word embedding as implicit matrix factorization. In Advances in Neural Information Processing Systems, pages 2177–2185, 2014.
  • A. Q. Li, A. Ahmed, S. Ravi, and A. J. Smola. Reducing the sampling complexity of topic models. In Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 891–900, 2014.
  • D. Liben-Nowell and J. Kleinberg. The link-prediction problem for social networks. Journal of the American Society for Information Science and Technology, 58(7):1019–1031, 2007.
  • C. D. Manning, P. Raghavan, and H. Schutze. Introduction to Information Retrieval, volume 1. Cambridge University Press, 2008.
  • T. Mikolov, K. Chen, G. Corrado, and J. Dean. Efficient estimation of word representations in vector space. arXiv preprint arXiv:1301.3781, 2013.
  • T. Mikolov, I. Sutskever, K. Chen, G. S. Corrado, and J. Dean. Distributed representations of words and phrases and their compositionality. In Advances in Neural Information Processing Systems, pages 3111–3119, 2013.
  • S. A. Myers, A. Sharma, P. Gupta, and J. Lin. Information network or social network?: the structure of the Twitter follow graph. In Proceedings of the 23rd International Conference on World Wide Web (Companion Volume), pages 493–498, 2014.
  • L. Page, S. Brin, R. Motwani, and T. Winograd. The PageRank citation ranking: Bringing order to the web. 1999.
  • B. Perozzi, R. Al-Rfou, and S. Skiena. DeepWalk: Online learning of social representations. In Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 701–710, 2014.
  • B. Recht, C. Re, S. Wright, and F. Niu. Hogwild: A lock-free approach to parallelizing stochastic gradient descent. In Advances in Neural Information Processing Systems, pages 693–701, 2011.
  • S. T. Roweis and L. K. Saul. Nonlinear dimensionality reduction by locally linear embedding. Science, 290(5500):2323–2326, 2000.
  • J. Tang, J. Zhang, L. Yao, J. Li, L. Zhang, and Z. Su. ArnetMiner: extraction and mining of academic social networks. In Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 990–998, 2008.
  • J. B. Tenenbaum, V. De Silva, and J. C. Langford. A global geometric framework for nonlinear dimensionality reduction. Science, 290(5500):2319–2323, 2000.
  • L. Van der Maaten and G. Hinton. Visualizing data using t-SNE. Journal of Machine Learning Research, 9:2579–2605, 2008.
  • S. Yan, D. Xu, B. Zhang, H.-J. Zhang, Q. Yang, and S. Lin. Graph embedding and extensions: a general framework for dimensionality reduction. IEEE Transactions on Pattern Analysis and Machine Intelligence, 29(1):40–51, 2007.
  • X. Yu, X. Ren, Y. Sun, Q. Gu, B. Sturt, U. Khandelwal, B. Norick, and J. Han. Personalized entity recommendation: A heterogeneous information network approach. In Proceedings of the 7th ACM International Conference on Web Search and Data Mining, pages 283–292, 2014.