Incomplete Network Alignment: Problem Definitions and Fast Solutions

Si Zhang
Jiejun Xu

ACM Transactions on Knowledge Discovery from Data, pp. 1-26, 2020.

Keywords:
Incomplete network alignment, low rank, network completion

Abstract:

Networks are prevalent in many areas and are often collected from multiple sources. However, due to the veracity characteristics, more often than not, networks are incomplete. Network alignment and network completion have become two fundamental cornerstones behind a wealth of high-impact graph mining applications. The state-of-the-art hav...

Introduction
  • Networks are prevalent and naturally appear in many areas. More often than not, in the big data era, networks in many high-impact applications are collected from multiple sources (i.e., variety), such as social networks from different social platforms, protein–protein interaction (PPI) networks from multiple tissues, and transaction networks from multiple financial institutes.
  • In order to integrate the considerable information associated with multiple networks, network alignment is of key importance to find the node correspondence across networks.
  • By aligning the same users across different transaction networks, users' transaction patterns can be better understood, which improves financial fraud detection.
  • Network completion has become another key task which benefits many graph mining applications by providing higher-quality networks if handled properly
Highlights
  • Networks are prevalent and naturally appear in many areas
  • In the big data era, networks in many high-impact applications are collected from multiple sources, such as social networks from different social platforms, protein–protein interaction (PPI) networks from multiple tissues, and transaction networks from multiple financial institutes
  • In order to integrate the considerable information associated with multiple networks, network alignment is of key importance to find the node correspondence across networks
  • Although the multi-sourced and incomplete characteristics often co-exist in many real networks, state-of-the-art methods have largely addressed the network alignment and network completion problems in parallel
  • The empirical evaluations demonstrate the effectiveness and efficiency of the proposed iNeAt algorithm. It (1) improves the alignment accuracy by up to 30% over the existing network alignment methods, while also leading to a better imputation outcome; and (2) achieves a good quality-speed balance and scales linearly w.r.t. the number of nodes in the networks
  • The proposed algorithm is proved to converge to the KKT fixed point with a linear complexity in both time and space
Methods
  • The authors evaluate the effectiveness and efficiency of the proposed algorithm by extensive experiments.
  • The rest of the article is organized as follows.
  • Section 2 defines the incomplete network alignment problem and provides some preliminaries of the article.
  • Section 3 presents the proposed optimization formulation of iNeAt and Section 4 gives an effective optimization algorithm, followed by some analyses.
  • Section 5 presents the experimental results.
  • Related work and conclusion are given in Sections 6 and 7
Results
  • The authors present the experimental results of the proposed algorithm iNeAt, evaluating it in the following two aspects:
  • Effectiveness: How accurate is the algorithm for aligning incomplete networks? How effective is the algorithm at recovering missing edges by leveraging the alignment result?
  • Efficiency: How fast and scalable is the algorithm?
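The effectiveness question above is typically answered by comparing predicted node pairs against known ground-truth pairs. The following is a minimal sketch of one way to do that, not the paper's evaluation code: the function name `alignment_accuracy`, the dictionary-based inputs, and the greedy one-to-one matching step are all assumptions made for illustration, and the exact metric used in the article is not stated on this page.

```python
def alignment_accuracy(sim, ground_truth):
    """Fraction of ground-truth pairs recovered by greedy one-to-one matching.

    sim: dict mapping (node_in_g1, node_in_g2) -> similarity score.
    ground_truth: dict mapping node_in_g1 -> its true counterpart in g2.
    Both structures are hypothetical inputs chosen for this sketch.
    """
    # Visit candidate pairs from the highest to the lowest similarity.
    ranked = sorted(sim, key=sim.get, reverse=True)
    used1, used2, predicted = set(), set(), {}
    for i, j in ranked:
        # Accept a pair only if neither endpoint is already matched.
        if i not in used1 and j not in used2:
            predicted[i] = j
            used1.add(i)
            used2.add(j)
    correct = sum(1 for i, j in ground_truth.items() if predicted.get(i) == j)
    return correct / len(ground_truth)


# Toy similarity scores between nodes {a, b} and {x, y}.
scores = {("a", "x"): 0.9, ("a", "y"): 0.2, ("b", "x"): 0.8, ("b", "y"): 0.1}
print(alignment_accuracy(scores, {"a": "x", "b": "y"}))
```

Greedy matching is only one possible decoding step; taking the per-row maximum or solving an assignment problem would give different predictions on the same similarity scores.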
Conclusion
  • In the era of big data, the multi-sourced and incomplete characteristics often co-exist in many real networks.
  • The empirical evaluations demonstrate the effectiveness and efficiency of the proposed iNeAt algorithm.
  • It (1) improves the alignment accuracy by up to 30% over the existing network alignment methods, while also leading to a better imputation outcome; and (2) achieves a good quality-speed balance and scales linearly w.r.t. the number of nodes in the networks.
  • Future work includes extending the algorithm to handle attributed networks and exploring other ways to leverage prior knowledge.
Tables
  • Table 1: Symbols and Notations
  • Table 2: Statistics of Datasets
Related work
  • Network Alignment. In general, network alignment falls into two categories: local alignment and global alignment. Local network alignment aims to uncover the alignment among small regions across multiple networks, such as motifs and small subgraphs. Some recent works include [4, 26, 30]. Nevertheless, local network alignment might be too restrictive to effectively find the node correspondence. On the other hand, many global network alignment algorithms, which target node-level alignment, are based on topology consistency. For example, one early well-known approach, IsoRank, computes the cross-network pairwise topology similarities by propagating the similarities of neighboring node pairs, and it is shown that this can be formulated as a random-walk propagation procedure in the Kronecker product graph [34]. In addition, IsoRankN [20] extends the original IsoRank algorithm by using PageRank-Nibble [1] to align multiple networks. BigAlign [16] and UMA [42] both assume that one network is a noisy permutation of the other network, whereas UMA [42] is generalized to align multiple networks by adding transitivity constraints. NetAlign formulates network alignment as an optimization problem that maximizes the number of aligned neighboring node pairs [3] and solves it with a belief-propagation heuristic.
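The neighbor-pair propagation attributed to IsoRank above can be sketched in a few lines. This is an illustrative sketch of the IsoRank-style baseline only, not the iNeAt algorithm proposed in the article: the function name `isorank_similarity`, the adjacency-dictionary inputs, the uniform default prior, and the values of `alpha` and `iters` are assumptions chosen for the example.

```python
def isorank_similarity(adj1, adj2, prior=None, alpha=0.8, iters=50):
    """IsoRank-style similarity propagation between two undirected graphs.

    adj1, adj2: dict mapping each node to a list of its neighbors.
    prior: optional dict (i, j) -> prior similarity; uniform if omitted.
    Returns a dict (i, j) -> similarity score.
    """
    nodes1, nodes2 = list(adj1), list(adj2)
    if prior is None:
        uniform = 1.0 / (len(nodes1) * len(nodes2))
        prior = {(i, j): uniform for i in nodes1 for j in nodes2}
    sim = dict(prior)
    for _ in range(iters):
        updated = {}
        for i in nodes1:
            for j in nodes2:
                # Each neighboring pair (u, v) spreads its similarity evenly
                # over its deg(u) * deg(v) neighboring pairs, including (i, j).
                total = 0.0
                for u in adj1[i]:
                    for v in adj2[j]:
                        total += sim[(u, v)] / (len(adj1[u]) * len(adj2[v]))
                # Mix the propagated score with the prior (random-walk restart).
                updated[(i, j)] = alpha * total + (1 - alpha) * prior[(i, j)]
        sim = updated
    return sim


# Two 3-node paths: 0 - 1 - 2 and x - y - z.
g1 = {0: [1], 1: [0, 2], 2: [1]}
g2 = {"x": ["y"], "y": ["x", "z"], "z": ["y"]}
sim = isorank_similarity(g1, g2)
print(max(sim, key=sim.get))  # the highest-scoring pair matches the two middle nodes
```

Each update is one step of a random walk with restart over pairs of nodes, which is the Kronecker product graph view mentioned in the text. The nested loops are written for readability; practical implementations use sparse matrix products instead.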
Funding
  • This work is supported by the National Science Foundation under grants No. 1947135 and 1715385, by the NSF Program on Fairness in AI in collaboration with Amazon under award No. 1939725, by the United States Air Force and DARPA under contract number FA8750-17-C-0153, and by the Department of Homeland Security under Grant Award Number 2017-ST061-QA0001
Reference
  • Reid Andersen, Fan Chung, and Kevin Lang. 2006. Local graph partitioning using pagerank vectors. In Proceedings of the 47th Annual IEEE Symposium on Foundations of Computer Science (FOCS’06). IEEE, 475–486.
  • Nicola Barbieri, Francesco Bonchi, and Giuseppe Manco. 2014. Who to follow and why: Link prediction with explanations. In Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 1266–1275.
  • Mohsen Bayati, David F. Gleich, Amin Saberi, and Ying Wang. 2013. Message-passing algorithms for sparse network alignment. ACM Transactions on Knowledge Discovery from Data 7, 1 (2013), 3.
  • Johannes Berg and Michael Lässig. 2004. Local graph alignment and motif search in biological networks. Proceedings of the National Academy of Sciences of the United States of America 101, 41 (2004), 14689–14694.
  • Nicolas Boumal and Pierre-Antoine Absil. 2011. RTRMC: A Riemannian trust-region method for low-rank matrix completion. In Proceedings of the Advances in Neural Information Processing Systems. 406–414.
  • Ulrik Brandes. 2008. On variants of shortest-path betweenness centrality and their generic computation. Social Networks 30, 2 (2008), 136–145.
  • Jian-Feng Cai, Emmanuel J. Candès, and Zuowei Shen. 2010. A singular value thresholding algorithm for matrix completion. SIAM Journal on Optimization 20, 4 (2010), 1956–1982.
  • Zheng Chen, Xinli Yu, Bo Song, Jianliang Gao, Xiaohua Hu, and Wei-Shih Yang. 2017. Community-based network alignment for large attributed network. In Proceedings of the 2017 ACM on Conference on Information and Knowledge Management. ACM, 587–596.
  • Chris Ding, Xiaofeng He, and Horst D. Simon. 2005. On the equivalence of nonnegative matrix factorization and spectral clustering. In Proceedings of the 2005 SIAM International Conference on Data Mining. SIAM, 606–610.
  • Boxin Du and Hanghang Tong. 2018. FASTEN: Fast sylvester equation solver for graph mining. In Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining. 1339–1347.
  • Boxin Du and Hanghang Tong. 2019. MrMine: Multi-resolution Multi-network embedding. In Proceedings of the 28th ACM International Conference on Information and Knowledge Management. 479–488.
  • Mohammed El-Kebir, Jaap Heringa, and Gunnar W. Klau. 2015. Natalie 2.0: Sparse global network alignment as a special case of quadratic assignment. Algorithms 8, 4 (2015), 1035–1051.
  • Somaye Hashemifar and Jinbo Xu. 2014. HubAlign: An accurate and efficient method for global alignment of protein–protein interaction networks. Bioinformatics 30, 17 (2014), i438–i444.
  • Mark Heimann, Haoming Shen, and Danai Koutra. 2018. Node representation learning for multiple networks: The case of graph alignment. arXiv preprint arXiv:1802.06257 (2018).
  • Xiangnan Kong, Jiawei Zhang, and Philip S. Yu. 2013. Inferring anchor links across multiple heterogeneous social networks. In Proceedings of the 22nd ACM International Conference on Information & Knowledge Management. ACM, 179–188.
  • Danai Koutra, Hanghang Tong, and David Lubensky. 2013. Big-align: Fast bipartite graph alignment. In Proceedings of the 2013 IEEE 13th International Conference on Data Mining. IEEE, 389–398.
  • Jure Leskovec, Deepayan Chakrabarti, Jon Kleinberg, Christos Faloutsos, and Zoubin Ghahramani. 2010. Kronecker graphs: An approach to modeling networks. Journal of Machine Learning Research 11, Feb (2010), 985–1042.
  • Jure Leskovec, Jon Kleinberg, and Christos Faloutsos. 2007. Graph evolution: Densification and shrinking diameters. ACM Transactions on Knowledge Discovery from Data 1, 1 (2007), 2.
  • Jure Leskovec and Julian J. McAuley. 2012. Learning to discover social circles in ego networks. In Proceedings of the Advances in Neural Information Processing Systems. 539–547.
  • Chung-Shou Liao, Kanghao Lu, Michael Baym, Rohit Singh, and Bonnie Berger. 2009. IsoRankN: Spectral methods for global alignment of multiple protein networks. Bioinformatics 25, 12 (2009), i253–i258.
  • David Liben-Nowell and Jon Kleinberg. 2007. The link-prediction problem for social networks. Journal of the Association for Information Science and Technology 58, 7 (2007), 1019–1031.
  • Ji Liu, Przemyslaw Musialski, Peter Wonka, and Jieping Ye. 2013. Tensor completion for estimating missing values in visual data. IEEE Transactions on Pattern Analysis and Machine Intelligence 35, 1 (2013), 208–220.
  • Li Liu, William K. Cheung, Xin Li, and Lejian Liao. 2016. Aligning users across social networks using network embedding. In Proceedings of the 25th International Joint Conference on Artificial Intelligence. 1774–1780.
  • Yuanyuan Liu, Fanhua Shang, Hong Cheng, James Cheng, and Hanghang Tong. 2014. Factor matrix trace norm minimization for low-rank tensor completion. In Proceedings of the 2014 SIAM International Conference on Data Mining. SIAM, 866–874.
  • Noël Malod-Dognin and Nataša Pržulj. 2015. L-GRAAL: Lagrangian graphlet-based network aligner. Bioinformatics 31, 13 (2015), 2182–2189.
  • Hazel N. Manners, Ahed Elmsallati, Pietro H. Guzzi, Swarup Roy, and Jugal K. Kalita. 2017. Performing local network alignment by ensembling global aligners. In Proceedings of the 2017 IEEE International Conference on Bioinformatics and Biomedicine (BIBM’17). IEEE, 1316–1323.
  • Farzan Masrour, Iman Barjesteh, Rana Forsati, Abdol-Hossein Esfahanian, and Hayder Radha. 2015. Network completion with node similarity: A matrix completion approach with provable guarantees. In Proceedings of the 2015 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining (ASONAM’15). IEEE, 302–307.
  • Aditya Krishna Menon and Charles Elkan. 2011. Link prediction via matrix factorization. In Proceedings of the Joint European Conference on Machine Learning and Knowledge Discovery in Databases. Springer, 437–452.
  • Kurt Miller, Michael I. Jordan, and Thomas L. Griffiths. 2009. Nonparametric latent feature models for link prediction. In Proceedings of the Advances in Neural Information Processing Systems. 1276–1284.
  • Marco Mina and Pietro Hiram Guzzi. 2014. Improving the robustness of local network alignment: Design and extensive assessment of a Markov clustering-based approach. IEEE/ACM Transactions on Computational Biology and Bioinformatics 11, 3 (2014), 561–572.
  • K. B. Petersen, M. S. Pedersen, and others. 2008. The matrix cookbook, vol 7. Technical University of Denmark 15 (2008).
  • Benjamin Recht. 2011. A simpler approach to matrix completion. Journal of Machine Learning Research 12, Dec (2011), 3413–3430.
  • Jason D. M. Rennie and Nathan Srebro. 2005. Fast maximum margin matrix factorization for collaborative prediction. In Proceedings of the 22nd International Conference on Machine Learning. ACM, 713–719.
  • Rohit Singh, Jinbo Xu, and Bonnie Berger. 2008. Global alignment of multiple protein interaction networks with application to functional orthology detection. Proceedings of the National Academy of Sciences 105, 35 (2008), 12763–12768.
  • Sucheta Soundarajan, Tina Eliassi-Rad, Brian Gallagher, and Ali Pinar. 2016. MaxReach: Reducing network incompleteness through node probes. In Proceedings of the 2016 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining (ASONAM’16). IEEE, 152–157.
  • Kim-Chuan Toh and Sangwoon Yun. 2010. An accelerated proximal gradient algorithm for nuclear norm regularized linear least squares problems. Pacific Journal of Optimization 6 (2010), 615–640.
  • Rianne van den Berg, Thomas N. Kipf, and Max Welling. 2017. Graph convolutional matrix completion. arXiv preprint arXiv:1706.02263.
  • Vipin Vijayan, Vikram Saraph, and T. Milenković. 2015. MAGNA++: Maximizing accuracy in global network alignment via both node and edge conservation. Bioinformatics 31, 14 (2015), 2409–2411.
  • Jaewon Yang and Jure Leskovec. 2015. Defining and evaluating network communities based on ground-truth. Knowledge and Information Systems 42, 1 (2015), 181–213.
  • Reza Zafarani and Huan Liu. 2013. Connecting users across social media sites: A behavioral-modeling approach. In Proceedings of the 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 41–49.
  • Jiawei Zhang, Jianhui Chen, Shi Zhi, Yi Chang, Philip S. Yu, and Jiawei Han. 2017. Link prediction across aligned networks with sparse and low rank matrix estimation. In Proceedings of the 2017 IEEE 33rd International Conference on Data Engineering (ICDE’17). IEEE, 971–982.
  • Jiawei Zhang and Philip S. Yu. 2015. Multiple anonymized social networks alignment. In Proceedings of the 2015 IEEE International Conference on Data Mining (ICDM’15). IEEE, 599–608.
  • Si Zhang and Hanghang Tong. 2016. FINAL: Fast attributed network alignment. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM.
  • Si Zhang and Hanghang Tong. 2018. Attributed network alignment: Problem definitions and fast solutions. IEEE Transactions on Knowledge and Data Engineering 31, 9 (2018), 1680–1692.
  • Si Zhang, Hanghang Tong, Jiejun Xu, Yifan Hu, and Ross Maciejewski. 2019. Origin: Non-rigid network alignment. In Proceedings of the 2019 IEEE International Conference on Big Data. IEEE.
  • Yutao Zhang, Jie Tang, Zhilin Yang, Jian Pei, and Philip S. Yu. 2015. COSNET: Connecting heterogeneous social networks with local and global consistency. In Proceedings of the 21st ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 1485–1494.

Received May 2018; revised November 2019; accepted February 2020