
Prioritized Restreaming Algorithms for Balanced Graph Partitioning

KDD '20: The 26th ACM SIGKDD Conference on Knowledge Discovery and Data Mining Virtual Event..., pp.1877-1887, (2020)


Abstract

Balanced graph partitioning is a critical step for many large-scale distributed computations with relational data. As graph datasets have grown in size and density, a range of highly-scalable balanced partitioning algorithms have appeared to meet varied demands across different domains. As the starting point for the present work, we obser...

Introduction
  • Graphs are ubiquitous structures in computer science for representing a host of real-world systems, including social and information networks, biological networks, and meshed domains in physics simulations.
  • Twitter sees hundreds of millions of monthly active users interact by sharing and liking each other's content.
  • In all these examples, graph-wide computations, most notably in the service of ranking and recommendation problems, are central to the core functions of many products and services.
Highlights
  • Graphs are ubiquitous structures in computer science for representing a host of real-world systems, including social and information networks, biological networks, and meshed domains in physics simulations
  • The contribution of this work can be summarized in three points: (1) We provide benchmarking that has been absent from the literature, showing that the existing restreaming algorithm Restreamed Linear Deterministic Greedy (reLDG) outperforms Balanced Label Propagation (BLP) and SHP on a range of real-world graphs
  • (3) We introduce both static and dynamic stream orderings, where the latter can vary between stream iterations, as a way to inject priority into streaming algorithms for balanced graph partitioning
  • To study question (1), we report the partition qualities of all methods (BLP, KL-SHP and its restricted forms SHP-I and SHP-II, and reLDG with six stream orders) on all networks in Table 2
  • We dissect the design decisions involved in recent highly-scalable iterative algorithms for balanced partitioning
  • When tested on various social and web graphs, we find that streaming algorithms do not suffer from observed pathologies of the synchronous assignment process used by BLP or SHP-based algorithms—namely moving or swapping neighboring nodes away from or past each other
Methods
  • The authors present three existing iterative algorithms, Balanced Label Propagation (BLP), Social Hash partitioner (SHP), and Restreaming Linear Deterministic Greedy (reLDG), as they are published in the literature.
  • The BLP algorithm makes iterative, balanced improvements to an initial feasible partitioning of the node set until an equilibrium is achieved.
  • The authors use the simplest initialization—random balanced assignment—for comparison with other methods, though careful initialization has been shown to achieve a better equilibrium cut, depending on both context and available metadata [39].
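To make the restreaming idea concrete, here is a minimal Python sketch of restreamed LDG as it is usually described in the streaming-partitioning literature: each node goes to the shard holding the most of its neighbours, discounted by how full that shard already is, and later passes reuse the previous pass's assignment for neighbours not yet seen in the current pass. The function name `restream_ldg`, the `order` and `seed` parameters, the capacity rule ⌈n/k⌉, and the random tie-break are assumptions made for illustration, not details taken from the paper.

```python
import random


def restream_ldg(adj, k, passes=10, order=None, seed=0):
    """Sketch of restreamed LDG. `adj` maps each node to a list of neighbours."""
    rng = random.Random(seed)
    nodes = list(order) if order is not None else list(adj)
    capacity = -(-len(nodes) // k)  # ceil(n / k); assumed capacity rule
    prev = {}                       # assignment from the previous pass
    for _ in range(passes):
        sizes = [0] * k             # shard loads restart at zero every pass
        curr = {}
        for v in nodes:
            counts = [0] * k
            for u in adj[v]:
                # Use u's placement from this pass if it exists, else last pass's.
                shard = curr.get(u, prev.get(u))
                if shard is not None:
                    counts[shard] += 1
            # LDG score: neighbour count scaled by the shard's remaining room.
            scores = [counts[i] * (1 - sizes[i] / capacity) for i in range(k)]
            best = max(scores)
            # A full shard always scores 0, so at least one open shard ties for best.
            open_best = [i for i in range(k)
                         if sizes[i] < capacity and scores[i] == best]
            choice = rng.choice(open_best)  # assumed tie-break: uniform at random
            curr[v] = choice
            sizes[choice] += 1
        prev = curr
    return prev


# Two 4-cliques joined by a single edge, split into k = 2 shards.
example_adj = {
    0: [1, 2, 3], 1: [0, 2, 3], 2: [0, 1, 3], 3: [0, 1, 2, 4],
    4: [3, 5, 6, 7], 5: [4, 6, 7], 6: [4, 5, 7], 7: [4, 5, 6],
}
example_assignment = restream_ldg(example_adj, k=2, passes=5)
```

Because every node is placed one at a time against the shard loads of the current pass, two neighbours cannot simultaneously move away from each other within a pass, which is the kind of behaviour the summary attributes to synchronous methods.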
Results
  • (1) How do the presented algorithms for balanced graph partitioning, which previously haven’t been well-benchmarked, compare in terms of cut quality?
  • (2) What role do the modules in Section 4 play in the performance of these methods?
  • The authors focus the tests of balanced partitioning algorithms on a fixed number of shards (k = 16) and number of iterations (t = 10), studying varied social and web networks described in Table 1.
  • All methods are presented under exact balance, ε = 0 in the problem formulation in Section 2.
  • Given that all methods are to some extent random, if only in the handling of tie-breaks, all tabulated results were averaged over ten trials
Conclusion
  • The authors dissect the design decisions involved in recent highly-scalable iterative algorithms for balanced partitioning.
  • Based on this dissection, the authors introduce a new class, prioritized streaming algorithms, that leverages prioritization ideas from synchronous algorithms within the streaming setting.
  • When tested on various social and web graphs, the authors find that streaming algorithms do not suffer from observed pathologies of the synchronous assignment process used by BLP or SHP-based algorithms—namely moving or swapping neighboring nodes away from or past each other.
  • Though initially proposed in the online setting—moving graphs between clusters—the results clarify that restreaming algorithms are major contenders as highly scalable offline partitioners
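The difference between a static and a dynamic stream order can be sketched in a few lines of Python. A static order is computed once from the graph, while a dynamic order is recomputed before each pass from the current assignment. Both functions below are hypothetical illustrations: `degree_order` and `external_fraction_order` are names invented here, and the external-fraction priority is a stand-in, not the paper's ambivalence ordering or any of its six stream orders.

```python
def degree_order(adj):
    """Static order: highest-degree nodes first (ties keep input order)."""
    return sorted(adj, key=lambda v: -len(adj[v]))


def external_fraction_order(adj, assign):
    """Dynamic order: recomputed each pass from the current assignment.

    Hypothetical priority: nodes with the smallest fraction of neighbours
    in a different shard are streamed first.
    """
    def external_fraction(v):
        nbrs = adj[v]
        if not nbrs:
            return 0.0
        return sum(assign[u] != assign[v] for u in nbrs) / len(nbrs)

    return sorted(adj, key=external_fraction)


# A 3-node path 0 - 1 - 2 with node 2 placed on a different shard.
path_adj = {0: [1], 1: [0, 2], 2: [1]}
path_assign = {0: 0, 1: 0, 2: 1}
static_order = degree_order(path_adj)
dynamic_order = external_fraction_order(path_adj, path_assign)
```

In a restreaming loop, a static order would be passed in once, whereas a dynamic order would be rebuilt from the latest assignment at the start of every stream iteration.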
Tables
  • Table 1: Test networks, all from the SNAP repository [18]. Here d is the average degree and LCC denotes the percent of nodes in the largest connected component
  • Table2: Internal edge fractions of 16-shard partitioning after 10 iterations of each method under exact balance (ε = 0). Highest quality, excluding METIS (0.001), in bold. As a family, reLDG and its various stream orderings outperform the top performer of the synchronous class, with the best performance coming from ambivalence (4 of 7 networks). Of the synchronous methods, SHP-I and SHP-II show superior results over their more advanced counterparts on all graphs
  • Table 3: Results from varying the number of shards, k. All results on LiveJournal network with ε given under each method name. Bold denotes most performant method (excluding METIS). Ambivalence-sorted reLDG (reLDG-a) consistently yields a higher quality partition than these previously benchmarked methods [3]
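As a reading aid for the tables, the following Python sketch shows one way to compute an internal edge fraction (the share of edges whose endpoints land on the same shard) and to check a balance constraint. The helper names and the specific balance rule, shard size at most (1 + ε)·⌈n/k⌉, are assumptions for illustration; the paper's exact formulation is in its Section 2 and is not reproduced on this page.

```python
from collections import Counter
from math import ceil


def internal_edge_fraction(edges, assign):
    """Fraction of edges (u, v) with both endpoints on the same shard."""
    edges = list(edges)
    internal = sum(1 for u, v in edges if assign[u] == assign[v])
    return internal / len(edges)


def is_balanced(assign, k, eps=0.0):
    """Assumed balance rule: every shard holds at most (1 + eps) * ceil(n / k) nodes."""
    limit = (1 + eps) * ceil(len(assign) / k)
    sizes = Counter(assign.values())
    return all(size <= limit for size in sizes.values())


# A 4-cycle split into two shards of two nodes each.
cycle_edges = [(0, 1), (1, 2), (2, 3), (3, 0)]
cycle_assign = {0: 0, 1: 0, 2: 1, 3: 1}
cycle_quality = internal_edge_fraction(cycle_edges, cycle_assign)
cycle_balanced = is_balanced(cycle_assign, k=2)
```

Higher values mean fewer cut edges, so a method with a larger internal edge fraction under the same balance constraint has produced the better partition.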
Related work
  • Graph partitioning and its balanced variation are well-studied problems, with major results dating back to at least 1970. Many classes of algorithms for balanced graph partitioning were omitted from this work, primarily because of their poor scaling properties when considering truly massive graphs, though we highlight some notable algorithms in this section. Borrowing nomenclature from [6], the class of “global” balanced partitioners considers the entire graph in some capacity and strives to achieve a solution to adjacent problems with some version of theoretical guarantees, e.g. spectral partitioning or max-flow/min-cut-based algorithms [2, 5] for bipartitioning.

    Given a bipartitioning algorithm, one can achieve a k-way partition by recursively cutting the graph log k times.

    The earliest iterative algorithms for k-way partitioning were based on recursive schemes for bisection [10, 15]. However, these methods are less than ideal in our context for a few reasons: (1) spectral algorithms become impractical to compute for extremely large graphs, and in this work we focus on the frontier of truly massive graphs and (2) recursive bisection greatly restricts the k-way partition. Hence, we focus our work on direct k-way partitioning algorithms.
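The recursive-bisection route to a k-way partition can be illustrated with a short Python sketch. It assumes k is a power of two and takes the bisection routine as a parameter; `split_in_half` is a trivial placeholder standing in for a real bisection algorithm (spectral, flow-based, or otherwise), and both names are invented for this example.

```python
def recursive_bisection(nodes, k, bisect):
    """Build a k-way partition by bisecting recursively (k must be a power of two)."""
    nodes = list(nodes)
    if k == 1:
        return [nodes]
    left, right = bisect(nodes)
    # Each level halves k, so the recursion is log2(k) levels deep.
    return (recursive_bisection(left, k // 2, bisect)
            + recursive_bisection(right, k // 2, bisect))


def split_in_half(nodes):
    """Placeholder bisector: cuts the list in the middle, ignoring edges."""
    mid = len(nodes) // 2
    return nodes[:mid], nodes[mid:]


parts = recursive_bisection(range(16), 4, split_in_half)
```

The sketch also makes the restriction visible: once the first cut separates two nodes, no later level can bring them back onto the same shard, which is one reason to prefer direct k-way methods.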
Funding
  • This work is funded in part by a Young Investigator Award from the Army Research Office (JU, 73348-NS-YIP) and a National Science Foundation Graduate Research Fellowship (AA, 2017237604)
Reference
  • K. Andreev and H. Racke. 2006. Balanced Graph Partitioning. Theory of Computing Systems 39, 6 (2006), 929 – 939.
  • M. Armbruster, M. Fügenschuh, C. Helmberg, and A. Martin. 2008. A Comparative Study of Linear and Semidefinite Branch-and-Cut Methods for Solving the Minimum Graph Bisection Problem. In IPCO. 112–124.
  • Kevin Aydin, MohammadHossein Bateni, and Vahab Mirrokni. 2019. Distributed balanced partitioning via linear embedding. Algorithms 12, 8 (2019), 162.
  • MohammadHossein Bateni, Soheil Behnezhad, Mahsa Derakhshan, MohammadTaghi Hajiaghayi, Raimondas Kiveris, Silvio Lattanzi, and Vahab Mirrokni. 2017. Affinity clustering: Hierarchical clustering at scale. In NIPS. 6864–6874.
  • L. Brunetta, M. Conforti, and G. Rinaldi. 1997. A branch-and-cut algorithm for the equicut problem. Mathematical Programming 78, 2 (1997), 243–263.
  • A. Buluç, H. Meyerhenke, I. Safro, P. Sanders, and C. Schulz. 2013. Recent Advances in Graph Partitioning. (2013). arXiv:1311.3144
  • U. V. Catalyurek and C. Aykanat. 1999. Hypergraph-partitioning-based decomposition for parallel sparse-matrix vector multiplication. IEEE Transactions on Parallel and Distributed Systems 10, 7 (July 1999), 673–693.
  • C. Chevalier and I. Safro. 2009. Comparison of Coarsening Schemes for Multilevel Graph Partitioning. In Learning and Intelligent Optimization. 191–205.
  • Q. Duong, S. Goel, J. Hofman, and S. Vassilvitskii. 2013. Sharding Social Networks. In WSDM. New York, NY, USA, 223–232.
  • C. M. Fiduccia and R. M. Mattheyses. 1982. A Linear-Time Heuristic for Improving Network Partitions. In 19th Design Automation Conference. 175–181.
  • J. E Gonzalez, Y. Low, H. Gu, D. Bickson, and C. Guestrin. 2012. Powergraph: Distributed graph-parallel computation on natural graphs. In USENIX OSDI.
  • I. Kabiljo, B. Karrer, M. Pundir, S. Pupyrev, and A. Shalita. 2017. Social hash partitioner: a scalable distributed hypergraph partitioner. VLDB 10, 11 (2017).
  • George Karypis and Vipin Kumar. 1998. A fast and high quality multilevel scheme for partitioning irregular graphs. SIAM J. Sci. Comput. 20, 1 (1998), 359–392.
  • G. Karypis and V. Kumar. 1998. Multilevel k-way Partitioning Scheme for Irregular Graphs. J. Parallel and Distrib. Comput. 48, 1 (1998), 96 – 129.
  • B. W. Kernighan and S. Lin. 1970. An efficient heuristic procedure for partitioning graphs. The Bell System Technical Journal 49, 2 (Feb 1970), 291–307.
  • D. Lasalle and G. Karypis. 2013. Multi-threaded Graph Partitioning. In 2013 IEEE 27th International Symposium on Parallel and Distributed Processing. 225–236.
  • Jure Leskovec and Eric Horvitz. 2008. Planetary-Scale Views on an Instant-Messaging Network. In WWW. 915–924.
  • Jure Leskovec and Andrej Krevl. 2014. SNAP Datasets: Stanford Large Network Dataset Collection. http://snap.stanford.edu/data.
  • Jure Leskovec, Kevin J Lang, Anirban Dasgupta, and Michael W Mahoney. 2009. Community structure in large networks: Natural cluster sizes and the absence of large well-defined clusters. Internet Mathematics 6, 1 (2009), 29–123.
  • G. Malewicz, M. H. Austern, A. J.C Bik, J. C. Dehnert, I. Horn, N. Leiser, and G. Czajkowski. 2010. Pregel: A System for Large-Scale Graph Processing. In SIGMOD.
  • C. Martella, D. Logothetis, A. Loukas, and G. Siganos. 2017. Spinner: Scalable Graph Partitioning in the Cloud. In ICDE. 1083–1094.
  • H. Meyerhenke, B. Monien, and S. Schamberger. 2006. Accelerating shape optimizing load balancing for parallel FEM simulations by algebraic multigrid. In IPDPS, Vol. 2006. 10 pp.
  • H. Meyerhenke, B. Monien, and S. Schamberger. 2009. Graph partitioning and disturbed diffusion. Parallel Comput. 35, 10 (2009), 544 – 569.
  • J. Nishimura and J. Ugander. 2013. Restreaming Graph Partitioning: Simple Versatile Algorithms for Advanced Balancing. In KDD. 1106–1114.
  • V. Osipov and P. Sanders. 2010. n-Level Graph Partitioning. CoRR (2010). arXiv:1004.4024
  • Anil Pacaci and M. Tamer Özsu. 2019. Experimental Analysis of Streaming Algorithms for Graph Partitioning. In SIGMOD. 1375–1392.
  • F. Pellegrini. 2007. A Parallelisable Multi-level Banded Diffusion Scheme for Computing Balanced Partitions with Smooth Boundaries. In Euro-Par 2007 Parallel Processing. 195–204.
  • U. Raghavan, R. Albert, and S. Kumara. 2007. Near linear time algorithm to detect community structures in large-scale networks. Physical Review E (2007), 11.
  • Erzsébet Ravasz and Albert-László Barabási. 2003. Hierarchical Organization in Complex Networks. Physical Review E 67 (03 2003), 026112.
  • Peter Sanders and Christian Schulz. 2011. Engineering multilevel graph partitioning algorithms. In European Symposium on Algorithms. Springer, 469–480.
  • Mohamed Sarwat, Sameh Elnikety, Yuxiong He, and Gabriel Kliot. 2012. Horton: Online Query Execution Engine for Large Distributed Graphs. In ICDE.
  • Venu Satuluri, Srinivasan Parthasarathy, and Yiye Ruan. 2011. Local Graph Sparsification for Scalable Clustering. In SIGMOD. 721–732.
  • M. Saveski, J. Pouget-Abadie, G. Saint-Jacques, W. Duan, S. Ghosh, Y. Xu, and E. Airoldi. 2017. Detecting network effects: Randomizing over randomized experiments. In KDD. 1027–1035.
  • A. Shalita, B. Karrer, I. Kabiljo, A. Sharma, A. Presta, A. Adcock, H. Kllapi, and M. Stumm. 2016. Social Hash: An Assignment Framework for Optimizing Distributed Systems Operations on Social Networks. In USENIX NSDI. 455–468.
  • Bin Shao, Haixun Wang, and Yatao Li. 2013. Trinity: A Distributed Graph Engine on a Memory Cloud. In SIGMOD.
  • Isabelle Stanton. 2014. Streaming balanced graph partitioning algorithms for random graphs. In SODA. 1287–1301.
  • Isabelle Stanton and Gabriel Kliot. 2012. Streaming Graph Partitioning for Large Distributed Graphs. In KDD. 1222–1230.
  • C. E. Tsourakakis, C. Skantsidis, B. Radunovic, and M. Vojnovic. 2014. FENNEL: Streaming Graph Partitioning for Massive Scale Graphs. In WSDM.
  • J. Ugander and L. Backstrom. 2013. Balanced Label Propagation for Partitioning Massive Graphs. In WSDM. 507–516.
  • J. Ugander, B. Karrer, L. Backstrom, and J. Kleinberg. 2013. Graph cluster randomization: Network exposure to multiple universes. In KDD. 329–337.
  • J. Ugander, B. Karrer, L. Backstrom, and C. Marlow. 2011. The Anatomy of the Facebook Social Graph. (2011). arXiv:1111.4503
  • S. Vigna. 2015. A weighted correlation index for rankings with ties. In WWW.
  • E. Voorhees. 2002. Evaluation by Highly Relevant Documents. In SIGIR Forum.
  • Duncan J. Watts and Steven H. Strogatz. 1998. Collective dynamics of ‘small-world’ networks. Nature 393, 6684 (1998), 440–442.
  • Xiaojin Zhu and Zoubin Ghahramani. 2002. Learning from Labeled and