# Prioritized Restreaming Algorithms for Balanced Graph Partitioning

KDD '20: The 26th ACM SIGKDD Conference on Knowledge Discovery and Data Mining Virtual Event..., pp. 1877-1887 (2020)

Abstract

Balanced graph partitioning is a critical step for many large-scale distributed computations with relational data. As graph datasets have grown in size and density, a range of highly-scalable balanced partitioning algorithms have appeared to meet varied demands across different domains. As the starting point for the present work, we obser...

Introduction

- Graphs are ubiquitous structures in computer science for representing a host of real-world systems, including social and information networks, biological networks, and meshed domains in physics simulations.
- Twitter sees hundreds of millions of monthly active users interact by sharing and liking each other's content.
- In all these examples, graph-wide computations, most notably in the service of ranking and recommendation problems, are central to the core functions of many products and services.

Highlights

- The contribution of this work can be summarized in three points: (1) We provide benchmarking that has been absent from the literature, showing that the existing restreaming algorithm Restreamed Linear Deterministic Greedy (reLDG) outperforms Balanced Label Propagation (BLP) and SHP on a range of real-world graphs.
- (3) We introduce both static and dynamic stream orderings, where the latter can vary between stream iterations, as a way to inject priority into streaming algorithms for balanced graph partitioning.
- To study question (1), we report the partition qualities of all methods on all networks in Table 2: BLP, KL-Social Hash partitioner (SHP) and its restricted forms (SHP-I, SHP-II), and reLDG with six stream orders.
- We dissect the design decisions involved in recent highly-scalable iterative algorithms for balanced partitioning.
- When tested on various social and web graphs, we find that streaming algorithms do not suffer from observed pathologies of the synchronous assignment process used by BLP or SHP-based algorithms, namely moving or swapping neighboring nodes away from or past each other.

Methods

- The authors present three existing iterative algorithms as they are published in the literature: Balanced Label Propagation (BLP), Social Hash partitioner (SHP), and Restreaming Linear Deterministic Greedy (reLDG).
- The BLP algorithm makes iterative, balanced improvements to an initial feasible partitioning of the node set until an equilibrium is achieved.
- The authors use the simplest initialization—random balanced assignment—for comparison with other methods, though careful initialization has been shown to achieve a better equilibrium cut, depending on both context and available metadata [39].
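The two ideas above, a random balanced initialization and a restreaming pass in the spirit of Linear Deterministic Greedy, can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, the capacity formula `ceil((1 + eps) * n / k)`, the random stream order, and the tie-breaking rule (least-loaded shard, then random) are all assumptions made for the example. The scoring rule, neighbor count times remaining capacity fraction, follows the commonly described LDG heuristic.

```python
import random
from math import ceil


def random_balanced_assignment(nodes, k, seed=0):
    """Random balanced initialization: shuffle, then deal round-robin."""
    rng = random.Random(seed)
    shuffled = list(nodes)
    rng.shuffle(shuffled)
    return {v: i % k for i, v in enumerate(shuffled)}


def restream_ldg(adj, k, iterations=10, eps=0.0, seed=0):
    """Sketch of restreamed LDG; adj maps each node to its neighbors."""
    rng = random.Random(seed)
    nodes = list(adj)
    capacity = ceil((1 + eps) * len(nodes) / k)  # assumed balance bound
    previous = {}  # assignment from the previous pass (empty at first)
    for _ in range(iterations):
        current, sizes = {}, [0] * k
        order = nodes[:]
        rng.shuffle(order)  # random stream order (an assumption here)
        for v in order:
            # Count neighbors per shard, preferring this pass's placement
            # and falling back to the previous pass for unseen neighbors.
            counts = [0] * k
            for u in adj[v]:
                shard = current.get(u, previous.get(u))
                if shard is not None:
                    counts[shard] += 1
            # LDG-style score: neighbor count damped by shard fullness.
            # Ties go to the least-loaded shard, then to a random one.
            open_shards = [i for i in range(k) if sizes[i] < capacity]
            best = max(
                open_shards,
                key=lambda i: (counts[i] * (1 - sizes[i] / capacity),
                               -sizes[i], rng.random()),
            )
            current[v] = best
            sizes[best] += 1
        previous = current
    return previous
```

Because shard sizes are reset at the start of every pass and full shards are skipped, each pass yields an assignment that respects the capacity bound, while neighbor information from the previous pass lets later passes refine the cut.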

Results

- (1) How do the presented algorithms for balanced graph partitioning, which previously haven’t been well-benchmarked, compare in terms of cut quality?
- (2) What role do the modules in Section 4 play in the performance of these methods?
- The authors focus the tests of balanced partitioning algorithms on a fixed number of shards (k = 16) and number of iterations (t = 10), studying varied social and web networks described in Table 1.
- All methods are presented under exact balance, ε = 0 in the problem formulation in Section 2.
- Given that all methods are to some extent random, if only in the handling of tie-breaks, all tabulated results were averaged over ten trials.
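To make the balance setting concrete, the sketch below checks whether an assignment satisfies a balance constraint for a given ε. The Section 2 formulation is not reproduced in this summary, so the capacity bound `ceil((1 + eps) * n / k)` and the helper names are assumptions chosen for illustration.

```python
from collections import Counter
from math import ceil


def shard_capacity(n, k, eps=0.0):
    """Assumed per-shard capacity for n nodes, k shards, imbalance eps."""
    return ceil((1 + eps) * n / k)


def is_balanced(assignment, k, eps=0.0):
    """True if no shard holds more nodes than the assumed capacity."""
    sizes = Counter(assignment.values())
    cap = shard_capacity(len(assignment), k, eps)
    return all(sizes.get(i, 0) <= cap for i in range(k))
```

Under this assumption, exact balance (ε = 0) with k = 16 means each shard holds at most ⌈n/16⌉ nodes.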

Conclusion

- The authors dissect the design decisions involved in recent highly-scalable iterative algorithms for balanced partitioning.
- Based on this dissection, the authors introduce a new class, prioritized streaming algorithms, that leverages prioritization ideas from synchronous algorithms within the streaming setting.
- When tested on various social and web graphs, the authors find that streaming algorithms do not suffer from observed pathologies of the synchronous assignment process used by BLP or SHP-based algorithms—namely moving or swapping neighboring nodes away from or past each other.
- Though restreaming algorithms were initially proposed for the online setting (moving graphs between clusters), the results clarify that they are major contenders as highly scalable offline partitioners.

- Table 1: Test networks, all from the SNAP repository [18]. Here d is the average degree and LCC denotes the percent of nodes in the largest connected component.
- Table 2: Internal edge fractions of 16-shard partitioning after 10 iterations of each method under exact balance (ε = 0). Highest quality, excluding METIS (0.001), in bold. As a family, reLDG and its various stream orderings outperform the top performer of the synchronous class, with the best performance coming from ambivalence (4 of 7 networks). Of the synchronous methods, SHP-I and SHP-II show superior results over their more advanced counterparts on all graphs.
- Table 3: Results from varying the number of shards, k. All results on LiveJournal network with ε given under each method name. Bold denotes most performant method (excluding METIS). Ambivalence-sorted reLDG (reLDG-a) consistently yields a higher quality partition than these previously benchmarked methods [3].
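The quality measure named in the Table 2 caption, the internal edge fraction, can be read as the share of edges whose endpoints land in the same shard. The following is a small sketch of that reading; the function name and the edge-list input format are assumptions for illustration, and the paper's exact handling of details such as duplicate or directed edges is not specified in this summary.

```python
def internal_edge_fraction(edges, assignment):
    """Fraction of edges (u, v) whose endpoints share a shard."""
    edges = list(edges)
    if not edges:
        raise ValueError("graph has no edges")
    internal = sum(1 for u, v in edges if assignment[u] == assignment[v])
    return internal / len(edges)
```

A value of 1.0 would mean no edge is cut, so higher is better; the fraction of cut edges is one minus this value.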

Related work

- Graph partitioning and its balanced variation are well-studied problems, with major results dating back to at least 1970. Many classes of algorithms for balanced graph partitioning were omitted from this work, primarily because of their poor scaling properties when considering truly massive graphs, though we highlight some notable algorithms in this section. Borrowing nomenclature from [6], the class of “global” balanced partitioners considers the entire graph in some capacity and strives to achieve a solution to adjacent problems with some version of theoretical guarantees, e.g. spectral partitioning or max-flow/min-cut-based algorithms [2, 5] for bipartitioning.

Given a bipartitioning algorithm, one can achieve a k-way partition by recursively cutting the graph log k times.
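As a worked example of the recursion count, assuming k is a power of two and every bisection splits each part in half, "log k times" refers to the depth of the recursion (base 2), while the total number of bisection calls is larger:

```latex
\[
\text{levels} = \log_2 k, \qquad
\text{bisection calls} = \sum_{j=0}^{\log_2 k - 1} 2^{j} = k - 1 .
\]
For $k = 16$: $\log_2 16 = 4$ levels and $1 + 2 + 4 + 8 = 15$ bisection calls.
```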

The earliest iterative algorithms for k-way partitioning were based on recursive schemes for bisection [10, 15]. However, these methods are less than ideal in our context for a few reasons: (1) spectral algorithms become impractical to compute for extremely large graphs, and in this work we focus on the frontier of truly massive graphs and (2) recursive bisection greatly restricts the k-way partition. Hence, we focus our work on direct k-way partitioning algorithms.

Funding

- This work is funded in part by a Young Investigator Award from the Army Research Office (JU, 73348-NS-YIP) and a National Science Foundation Graduate Research Fellowship (AA, 2017237604).

Reference

- K. Andreev and H. Racke. 2006. Balanced Graph Partitioning. Theory of Computing Systems 39, 6 (2006), 929 – 939.
- M. Armbruster, M. Fügenschuh, C. Helmberg, and A. Martin. 2008. A Comparative Study of Linear and Semidefinite Branch-and-Cut Methods for Solving the Minimum Graph Bisection Problem. In IPCO. 112–124.
- Kevin Aydin, MohammadHossein Bateni, and Vahab Mirrokni. 2019. Distributed balanced partitioning via linear embedding. Algorithms 12, 8 (2019), 162.
- MohammadHossein Bateni, Soheil Behnezhad, Mahsa Derakhshan, MohammadTaghi Hajiaghayi, Raimondas Kiveris, Silvio Lattanzi, and Vahab Mirrokni. 2017. Affinity clustering: Hierarchical clustering at scale. In NIPS. 6864–6874.
- L. Brunetta, M. Conforti, and G. Rinaldi. 1997. A branch-and-cut algorithm for the equicut problem. Mathematical Programming 78, 2 (1997), 243–263.
- A. Buluç, H. Meyerhenke, I. Safro, P. Sanders, and C. Schulz. 2013. Recent Advances in Graph Partitioning. (2013). arXiv:1311.3144
- U. V. Catalyurek and C. Aykanat. 1999. Hypergraph-partitioning-based decomposition for parallel sparse-matrix vector multiplication. IEEE Transactions on Parallel and Distributed Systems 10, 7 (July 1999), 673–693.
- C. Chevalier and I. Safro. 2009. Comparison of Coarsening Schemes for Multilevel Graph Partitioning. In Learning and Intelligent Optimization. 191–205.
- Q. Duong, S. Goel, J. Hofman, and S. Vassilvitskii. 2013. Sharding Social Networks. In WSDM. New York, NY, USA, 223–232.
- C. M. Fiduccia and R. M. Mattheyses. 1982. A Linear-Time Heuristic for Improving Network Partitions. In 19th Design Automation Conference. 175–181.
- J. E Gonzalez, Y. Low, H. Gu, D. Bickson, and C. Guestrin. 2012. Powergraph: Distributed graph-parallel computation on natural graphs. In USENIX OSDI.
- I. Kabiljo, B. Karrer, M. Pundir, S. Pupyrev, and A. Shalita. 2017. Social hash partitioner: a scalable distributed hypergraph partitioner. VLDB 10, 11 (2017).
- George Karypis and Vipin Kumar. 1998. A fast and high quality multilevel scheme for partitioning irregular graphs. SIAM J. Sci. Comput. 20, 1 (1998), 359–392.
- G. Karypis and V. Kumar. 1998. Multilevel k-way Partitioning Scheme for Irregular Graphs. J. Parallel and Distrib. Comput. 48, 1 (1998), 96 – 129.
- B. W. Kernighan and S. Lin. 1970. An efficient heuristic procedure for partitioning graphs. The Bell System Technical Journal 49, 2 (Feb 1970), 291–307.
- D. Lasalle and G. Karypis. 2013. Multi-threaded Graph Partitioning. In 2013 IEEE 27th International Symposium on Parallel and Distributed Processing. 225–236.
- Jure Leskovec and Eric Horvitz. 2008. Planetary-Scale Views on an Instant-Messaging Network. In WWW. 915–924.
- Jure Leskovec and Andrej Krevl. 2014. SNAP Datasets: Stanford Large Network Dataset Collection. http://snap.stanford.edu/data.
- Jure Leskovec, Kevin J Lang, Anirban Dasgupta, and Michael W Mahoney. 2009. Community structure in large networks: Natural cluster sizes and the absence of large well-defined clusters. Internet Mathematics 6, 1 (2009), 29–123.
- G. Malewicz, M. H. Austern, A. J.C Bik, J. C. Dehnert, I. Horn, N. Leiser, and G. Czajkowski. 2010. Pregel: A System for Large-Scale Graph Processing. In SIGMOD.
- C. Martella, D. Logothetis, A. Loukas, and G. Siganos. 2017. Spinner: Scalable Graph Partitioning in the Cloud. In ICDE. 1083–1094.
- H. Meyerhenke, B. Monien, and S. Schamberger. 2006. Accelerating shape optimizing load balancing for parallel FEM simulations by algebraic multigrid. In IPDPS, Vol. 2006. 10 pp.
- H. Meyerhenke, B. Monien, and S. Schamberger. 2009. Graph partitioning and disturbed diffusion. Parallel Comput. 35, 10 (2009), 544 – 569.
- J. Nishimura and J. Ugander. 2013. Restreaming Graph Partitioning: Simple Versatile Algorithms for Advanced Balancing. In KDD. 1106–1114.
- V. Osipov and P. Sanders. 2010. n-Level Graph Partitioning. CoRR (2010). arXiv:1004.4024
- Anil Pacaci and M. Tamer Özsu. 2019. Experimental Analysis of Streaming Algorithms for Graph Partitioning. In SIGMOD. 1375–1392.
- F. Pellegrini. 2007. A Parallelisable Multi-level Banded Diffusion Scheme for Computing Balanced Partitions with Smooth Boundaries. In Euro-Par 2007 Parallel Processing. 195–204.
- U. Raghavan, R. Albert, and S. Kumara. 2007. Near linear time algorithm to detect community structures in large-scale networks. Physical Review E (2007), 11.
- Erzsébet Regan and Albert-Laszlo Barabasi. 2003. Hierarchical Organization in Complex Networks. Physical Review E 67 (03 2003), 026112.
- Peter Sanders and Christian Schulz. 2011. Engineering multilevel graph partitioning algorithms. In European Symposium on Algorithms. Springer, 469–480.
- Mohamed Sarwat, Sameh Elnikety, Yuxiong He, and Gabriel Kliot. 2012. Horton: Online Query Execution Engine for Large Distributed Graphs. In ICDE.
- Venu Satuluri, Srinivasan Parthasarathy, and Yiye Ruan. 2011. Local Graph Sparsification for Scalable Clustering. In SIGMOD. 721–732.
- M. Saveski, J. Pouget-Abadie, G. Saint-Jacques, W. Duan, S. Ghosh, Y. Xu, and E. Airoldi. 2017. Detecting network effects: Randomizing over randomized experiments. In KDD. 1027–1035.
- A. Shalita, B. Karrer, I. Kabiljo, A. Sharma, A. Presta, A. Adcock, H. Kllapi, and M. Stumm. 2016. Social Hash: An Assignment Framework for Optimizing Distributed Systems Operations on Social Networks. In USENIX NSDI. 455–468.
- Bin Shao, Haixun Wang, and Yatao Li. 2013. Trinity: A Distributed Graph Engine on a Memory Cloud. In SIGMOD.
- Isabelle Stanton. 2014. Streaming balanced graph partitioning algorithms for random graphs. In SODA. 1287–1301.
- Isabelle Stanton and Gabriel Kliot. 2012. Streaming Graph Partitioning for Large Distributed Graphs. In KDD. 1222–1230.
- C. E. Tsourakakis, C. Skantsidis, B. Radunovic, and M. Vojnovic. 2014. FENNEL: Streaming Graph Partitioning for Massive Scale Graphs. In WSDM.
- J. Ugander and L. Backstrom. 2013. Balanced Label Propagation for Partitioning Massive Graphs. In WSDM. 507–516.
- J. Ugander, B. Karrer, L. Backstrom, and J. Kleinberg. 2013. Graph cluster randomization: Network exposure to multiple universes. In KDD. 329–337.
- J. Ugander, B. Karrer, L. Backstrom, and C. Marlow. 2011. The Anatomy of the Facebook Social Graph. (2011). arXiv:1111.4503
- S. Vigna. 2015. A weighted correlation index for rankings with ties. In WWW.
- E. Voorhees. 2002. Evaluation by Highly Relevant Documents. In SIGIR Forum.
- Duncan J. Watts and Steven H. Strogatz. 1998. Collective dynamics of ‘small-world’ networks. Nature 393, 6684 (1998), 440–442.
- Xiaojin Zhu and Zoubin Ghahramani. 2002. Learning from Labeled and
