# Slim Fly: A Cost Effective Low-Diameter Network Topology

New Orleans, LA, pp.348-359, (2014)

Abstract

We introduce a high-performance, cost-effective network topology called Slim Fly that approaches the theoretically optimal network diameter. Slim Fly is based on graphs that approximate the solution to the degree-diameter problem. We analyze Slim Fly and compare it to both traditional and state-of-the-art networks. Our analysis shows that ...

Introduction

- Interconnection networks play an important role in today’s large-scale computing systems.
- The importance of the network grows with ever increasing per-node performance and memory bandwidth.
- Large networks with tens of thousands of nodes are deployed in warehouse-sized HPC and data centers [8].
- Key properties of such networks are determined by their topologies: the arrangement of nodes and cables.
- High bandwidth is indispensable as many applications perform all-to-all communication [38].

Highlights

- We propose a new topology, called Slim Fly, which further reduces the diameter, cost, energy consumption, and latency of the network while maintaining high bandwidth and resiliency
- If the number of racks Nrck cannot be written as a product x · y, we find z such that Nrck = x · y + z and place the remaining z racks along an arbitrary side
- N cannot be identical for each topology due to the limited number of networks in their balanced configurations
- We demonstrated the Slim Fly topology which allows the construction of low-latency, full-bandwidth, and resilient networks at a lower cost than existing topologies
- Under the current technology constraints, we achieve a 25% cost and power benefit over Dragonfly
- We propose a new class of topologies called Slim Fly networks to implement large datacenter and HPC network architectures
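
The rack-placement rule in the highlights (Nrck = x · y + z) can be sketched as follows. This is only an illustration under an assumption that this summary does not state: that the x × y grid is chosen as close to a square as possible. The function name `rack_grid` and its parameter are hypothetical.

```python
import math


def rack_grid(n_racks: int) -> tuple:
    """Split n_racks into an x-by-y grid plus z leftover racks.

    Assumption (not from the source): x is the side of the largest
    square that fits, so the grid is roughly square.
    The result satisfies n_racks == x * y + z with 0 <= z < x.
    """
    if n_racks < 1:
        raise ValueError("n_racks must be positive")
    x = math.isqrt(n_racks)   # side of the largest square that fits
    y = n_racks // x          # number of full rows of x racks
    z = n_racks - x * y       # leftover racks, placed along one side
    return x, y, z


if __name__ == "__main__":
    for n in (7, 10, 12):
        print(n, rack_grid(n))
```

A near-square grid is a natural choice because it keeps the longest rack-to-rack distance short, but other layouts satisfy the same equation.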

Results

- [...], all with diameter three in the examples, are very resilient: up to 75% of the links can be removed before the network is disconnected.
- For a network size N = 2^13, SF can withstand up to 40% link failures before the diameter grows beyond four.
- DLN is most resilient and can sustain up to 60% link failures for a network with N = 2^13.
- SF is ≈25% more cost-effective than DF, and almost 30%, 40%, and 50% less expensive than FBF-3, DLN, and FT-3, respectively.
- Under the current technology constraints, the authors achieve a 25% cost and power benefit over Dragonfly
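
The resilience figures above concern how many links can fail before a network disconnects or its diameter grows. A minimal sketch of that kind of experiment is shown here, assuming an undirected graph given as an edge list. The helper names (`diameter`, `remove_random_links`) and the toy complete graph are hypothetical and do not reproduce the paper's topologies or methodology.

```python
import random
from collections import deque


def build_adjacency(n, edges):
    """Adjacency sets for an undirected graph on vertices 0..n-1."""
    adj = {v: set() for v in range(n)}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return adj


def eccentricity(adj, src):
    """Largest BFS hop count from src, or None if a vertex is unreachable."""
    dist = {src: 0}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    if len(dist) < len(adj):
        return None
    return max(dist.values())


def diameter(n, edges):
    """Diameter in hops, or None if the graph is disconnected."""
    adj = build_adjacency(n, edges)
    worst = 0
    for v in range(n):
        ecc = eccentricity(adj, v)
        if ecc is None:
            return None
        worst = max(worst, ecc)
    return worst


def remove_random_links(edges, fraction, rng):
    """Return the links that survive after removing `fraction` of them at random."""
    shuffled = list(edges)
    rng.shuffle(shuffled)
    keep = len(shuffled) - int(round(fraction * len(shuffled)))
    return shuffled[:keep]


if __name__ == "__main__":
    n = 16
    # Toy example only: a complete graph on 16 routers.
    links = [(u, v) for u in range(n) for v in range(u + 1, n)]
    rng = random.Random(0)
    for fraction in (0.0, 0.2, 0.4, 0.6, 0.8):
        survivors = remove_random_links(links, fraction, rng)
        print(fraction, diameter(n, survivors))
```

Repeating the removal over many random samples and averaging would give a more stable estimate than a single run.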

Conclusion

**Discussion of the Results**

Figure 11c presents the total cost of balanced networks. A detailed case-study showing cost per endpoint for an SF with ≈10K endpoints and radix 43 can be found in Table IV.

- For network sizes up to 20,000, there are 11 balanced SF variants with full global bandwidth; DF offers only 8 such designs
- Many of these variants can be directly constructed using readily available Mellanox routers with 18, 36, or 108 ports.
- The authors build on the observation that lowering the network diameter reduces the amount of expensive network resources used by packets traversing the network, while maintaining high bandwidth
- The authors frame the construction of such a low-diameter topology as an optimization problem and optimize towards the Moore Bound.
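
To make the Moore Bound concrete: it is the standard upper limit on the number of vertices in a graph with maximum degree d and diameter D, namely 1 + d · Σ_{i=0}^{D-1} (d − 1)^i. The sketch below simply evaluates that formula. The function name `moore_bound` and its parameter names are illustrative and not taken from the paper.

```python
def moore_bound(degree: int, diameter: int) -> int:
    """Upper bound on the number of vertices of a graph with the given
    maximum degree and diameter: 1 + d * sum_{i=0}^{D-1} (d - 1)**i."""
    if degree < 1 or diameter < 1:
        raise ValueError("degree and diameter must be positive")
    return 1 + degree * sum((degree - 1) ** i for i in range(diameter))


if __name__ == "__main__":
    # For diameter 2 the bound simplifies to 1 + d**2.
    for d in (3, 7, 16):
        print(d, moore_bound(d, 2), moore_bound(d, 3))
```

For a fixed router degree, a topology whose router count is closer to this bound connects more routers at the same diameter.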

- Table 1: Symbols used in the paper
- Table 2: Topologies compared in the paper, their diameters (§ III-A), and example existing HPC systems that use respective topologies
- Table 3: Cost and power comparison between a Slim Fly (N = 10830, k = 43) and other networks (§ VI-B4 and § VI-C). We select low-radix networks with N comparable to that of Slim Fly. N cannot be identical due to the limited number of existing network configurations. For high-radix topologies, we select comparable N and we also compare to topologies with fixed radix k. We also construct and analyze one additional variant of a DF that has both comparable N and identical k as the analyzed SF. Each of these groups of topologies is indicated with a bolded parameter
- Table 4: Disconnection Resiliency (§ III-D1): the maximum number of cables that can be removed before the network is disconnected. Missing values indicate the inadequacy of a balanced topology variant for a given N

Funding

- MB is supported by the 2013 Google European Doctoral Fellowship in Parallel Computing

Reference

- D. Abts. Cray XT4 and Seastar 3-D Torus Interconnect. Encyclopedia of Parallel Computing, pages 470–477, 2011.
- D. Abts, M. R. Marty, P. M. Wells, P. Klausler, and H. Liu. Energy Proportional Datacenter Networks. In Proceedings of the 37th Annual International Symposium on Computer Architecture, ISCA ’10, pages 338–347, New York, NY, USA, 2010. ACM.
- R. Alverson, D. Roweth, and L. Kaplan. The Gemini System Interconnect. In Proceedings of the 2010 18th IEEE Symposium on High Performance Interconnects, HOTI ’10, pages 83–87, Washington, DC, USA, 2010. IEEE Computer Society.
- B. Arimilli et al. The PERCS High-Performance Interconnect. In Proceedings of the 2010 18th IEEE Symposium on High Performance Interconnects, HOTI ’10, pages 75–82, Washington, DC, USA, 2010. IEEE Computer Society.
- R. Barriuso and A. Knies. 108-Port InfiniBand FDR SwitchX Switch Platform Hardware User Manual, 2014.
- J. Bermond, C. Delorme, and G. Farhi. Large graphs with given degree and diameter III. Annals of Discrete Mathematics, 13:23–32, 1982.
- B. Bollobas. Random Graphs. Cambridge University Press, 2001.
- D. Chen, N. Eisley, P. Heidelberger, S. Kumar, A. Mamidala, F. Petrini, R. Senger, Y. Sugawara, R. Walkup, B. Steinmacher-Burow, A. Choudhury, Y. Sabharwal, S. Singhal, and J. J. Parker. Looking Under the Hood of the IBM Blue Gene/Q Network. In Proceedings of the ACM/IEEE Supercomputing, SC ’12, pages 69:1–69:12, Los Alamitos, CA, USA, 2012. IEEE Computer Society Press.
- D. Chen, N. A. Eisley, P. Heidelberger, R. M. Senger, Y. Sugawara, S. Kumar, V. Salapura, D. L. Satterfield, B. Steinmacher-Burow, and J. J. Parker. The IBM Blue Gene/Q Interconnection Network and Message Unit. In Proceedings of 2011 ACM/IEEE Supercomputing, SC ’11, pages 26:1–26:10, New York, NY, USA, 2011. ACM.
- W. Dally and B. Towles. Principles and Practices of Interconnection Networks. Morgan Kaufmann Publishers Inc., San Francisco, CA, USA, 2003.
- W. J. Dally. Performance Analysis of k-ary n-cube Interconnection Networks. IEEE Transactions on Computers, 39:775–785, 1990.
- W. J. Dally and C. L. Seitz. Deadlock-Free Message Routing in Multiprocessor Interconnection Networks. IEEE Trans. Comput., 36(5):547–553, May 1987.
- C. Delorme. Grands Graphes de Degree et Diametre Donnes. Europ. J. Combinatorics, 6:291–302, 1985.
- J. Domke, T. Hoefler, and W. Nagel. Deadlock-Free Oblivious Routing for Arbitrary Topologies. In Proceedings of the 25th IEEE International Parallel and Distributed Processing Symposium (IPDPS), pages 613–624. IEEE Computer Society, May 2011.
- J. Dongarra. Visit to the National University for Defense Technology Changsha, China. Oak Ridge National Laboratory, Tech. Rep., June, 2013.
- J. Duato. A Necessary and Sufficient Condition for Deadlock-Free Adaptive Routing in Wormhole Networks. IEEE Trans. Parallel Distrib. Syst., 6(10):1055–1067, Oct. 1995.
- J. Duato, S. Yalamanchili, and N. Lionel. Interconnection Networks: An Engineering Approach. Morgan Kaufmann Publishers Inc., San Francisco, CA, USA, 2002.
- G. Faanes, A. Bataineh, D. Roweth, T. Court, E. Froese, R. Alverson, T. Johnson, J. Kopnick, M. Higgins, and J. Reinhard. Cray cascade: a scalable HPC system based on a Dragonfly network. In SC, page 103. IEEE/ACM, 2012.
- J. Flich, T. Skeie, A. Mejia, O. Lysne, P. Lopez, A. Robles, J. Duato, M. Koibuchi, T. Rokicki, and J. C. Sancho. A Survey and Evaluation of Topology-Agnostic Deterministic Routing Algorithms. IEEE Trans. Parallel Distrib. Syst., 23(3):405–425, Mar. 2012.
- C. Gomez, F. Gilabert, M. Gomez, P. Lopez, and J. Duato. Deterministic versus adaptive routing in fat-trees. In Parallel and Distributed Processing Symposium, 2007. IPDPS 2007. IEEE International, pages 1–8, March 2007.
- [...] Computer Networks, pages 338–344. IEEE Computer Society Press, Los Alamitos, CA, USA, 1994.
- [...] McKay–Miller–Siran. Journal of Combinatorial Theory, Series B, 90(2):223–232, 2004.
- N. Jiang, D. U. Becker, G. Michelogiannakis, J. Balfour, B. Towles, D. E. Shaw, J. Kim, and W. J. Dally. A detailed and flexible cycle-accurate network-on-chip simulator. In Performance Analysis of Systems and Software (ISPASS), 2013 IEEE International Symposium on, pages 86–96. IEEE, 2013.
- N. Jiang, J. Kim, and W. J. Dally. Indirect Adaptive Routing on Large Scale Interconnection Networks. In Proceedings of the 36th Annual International Symposium on Computer Architecture, ISCA ’09, pages 220–231, New York, NY, USA, 2009. ACM.
- G. Karypis and V. Kumar. A Fast and High Quality Multilevel Scheme for Partitioning Irregular Graphs. SIAM Journal on Scientific Computing, 20:359–392, 1999.
- J. Kim, J. Balfour, and W. Dally. Flattened Butterfly Topology for On-Chip Networks. In Proceedings of the 40th Annual IEEE/ACM International Symposium on Microarchitecture, MICRO 40, pages 172–182, Washington, DC, USA, 2007. IEEE Computer Society.
- J. Kim, W. J. Dally, and D. Abts. Flattened Butterfly: A Cost-efficient Topology for High-radix Networks. In Proceedings of the 34th Annual International Symposium on Computer Architecture, ISCA ’07, pages 126–137, New York, NY, USA, 2007. ACM.
- J. Kim, W. J. Dally, S. Scott, and D. Abts. Technology-Driven, Highly-Scalable Dragonfly Topology. In Proceedings of the 35th Annual International Symposium on Computer Architecture, ISCA ’08, pages 77–88, Washington, DC, USA, 2008. IEEE Computer Society.
- M. Koibuchi, H. Matsutani, H. Amano, D. F. Hsu, and H. Casanova. A case for random shortcut topologies for HPC interconnects. In ISCA’12, pages 177–188. IEEE, 2012.
- C. E. Leiserson. Fat-trees: universal networks for hardware-efficient supercomputing. IEEE Trans. Comput., 34(10):892–901, Oct. 1985.
- R. Lidl and H. Niederreiter. Finite Fields: Encyclopedia of Mathematics and Its Applications. Computers & Mathematics with Applications, 33(7):136–136, 1997.
- B. D. McKay, M. Miller, and J. Siran. A note on large graphs of diameter two and given maximum degree. Journal of Combinatorial Theory, Series B, 74(1):110–118, 1998.
- M. Miller and J. Siran. Moore graphs and beyond: A survey of the degree/diameter problem. Electronic Journal of Combinatorics, 61:1–63, 2005.
- N. Pippenger and G. Lin. Fault-tolerant circuit-switching networks. In Proceedings of the Fourth Annual ACM Symposium on Parallel Algorithms and Architectures, SPAA ’92, pages 229–235, New York, NY, USA, 1992. ACM.
- S. Scott, D. Abts, J. Kim, and W. J. Dally. The BlackWidow High-Radix Clos Network. In Proceedings of the 33rd annual International Symposium on Computer Architecture, ISCA ’06, pages 16–28, Washington, DC, USA, 2006. IEEE Computer Society.
- A. Singh. Load-Balanced Routing in Interconnection Networks. PhD thesis, Stanford University, 2005.
- A. Singla, C.-Y. Hong, L. Popa, and P. B. Godfrey. Jellyfish: networking data centers randomly. In Proceedings of the 9th USENIX conference on Networked Systems Design and Implementation, NSDI’12, pages 17–17, Berkeley, CA, USA, 2012. USENIX Association.
- S. Tiyyagura, P. Adamidis, R. Rabenseifner, P. Lammers, S. Borowski, F. Lippold, F. Svensson, O. Marxen, S. Haberhauer, A. Seitsonen, J. Furthmuller, K. Benkert, M. Galle, T. Bonisch, U. Kuster, and M. Resch. Teraflops Sustained Performance With Real World Applications. Int. J. High Perform. Comput. Appl., 22(2):131–148, May 2008.
- R. V. Tomic. Network Throughput Optimization via Error Correcting Codes. ArXiv e-prints, Jan. 2013.
- L. Valiant. A scheme for fast parallel communication. SIAM journal on computing, 11(2):350–361, 1982.
- J. Siagiova. A Note on the McKay-Miller-Siran Graphs. Journal of Combinatorial Theory, Series B, 81:205–208, 2001.
- R. Wolf. Nasa Pleiades Infiniband Communications Network, 2009. Intl. ACM Symposium on High Performance Distributed Computing.
- X. Yuan, S. Mahapatra, W. Nienaber, S. Pakin, and M. Lang. A New Routing Scheme for Jellyfish and Its Performance with HPC Workloads. In Proceedings of 2013 ACM/IEEE Supercomputing, SC ’13, pages 36:1–36:11, 2013.
