Slim Fly: A Cost Effective Low-Diameter Network Topology

New Orleans, LA, pp. 348–359, 2014


Abstract

We introduce a high-performance, cost-effective network topology called Slim Fly that approaches the theoretically optimal network diameter. Slim Fly is based on graphs that approximate the solution to the degree-diameter problem. We analyze Slim Fly and compare it to both traditional and state-of-the-art networks. Our analysis shows that ...
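The abstract frames Slim Fly in terms of the degree-diameter problem: fit as many routers as possible into a graph whose maximum degree (router radix) and diameter are fixed. As a minimal illustration, the sketch below measures both quantities with breadth-first search. The example graph is an assumption chosen for illustration, not taken from the paper: the Petersen graph, a classical optimal solution with 10 vertices, degree 3, and diameter 2. The helper names are hypothetical.

```python
from collections import deque


def petersen_edges():
    """Edges of the Petersen graph (illustrative example): 10 vertices, degree 3."""
    outer = [(i, (i + 1) % 5) for i in range(5)]          # outer 5-cycle
    spokes = [(i, i + 5) for i in range(5)]               # outer-to-inner spokes
    inner = [(5 + i, 5 + (i + 2) % 5) for i in range(5)]  # inner pentagram
    return outer + spokes + inner


def adjacency(n, edges):
    """Undirected adjacency sets for nodes 0..n-1."""
    adj = [set() for _ in range(n)]
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return adj


def eccentricity(adj, src):
    """Largest hop distance from src found by BFS; None if some node is unreachable."""
    dist = [-1] * len(adj)
    dist[src] = 0
    queue = deque([src])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if dist[v] < 0:
                dist[v] = dist[u] + 1
                queue.append(v)
    return None if -1 in dist else max(dist)


def diameter(adj):
    """Maximum eccentricity over all nodes; None if the graph is disconnected."""
    eccs = [eccentricity(adj, s) for s in range(len(adj))]
    return None if None in eccs else max(eccs)


adj = adjacency(10, petersen_edges())
print(max(len(a) for a in adj), diameter(adj))  # 3 2
```

The same two measurements (maximum degree and diameter) apply unchanged to any candidate router graph, which is how a construction can be checked against the degree-diameter target.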

Introduction
  • Interconnection networks play an important role in today’s large-scale computing systems.
  • The importance of the network grows with ever-increasing per-node performance and memory bandwidth.
  • Large networks with tens of thousands of nodes are deployed in warehouse-sized HPC and data centers [8].
  • Key properties of such networks are determined by their topologies: the arrangement of nodes and cables.
  • High bandwidth is indispensable as many applications perform all-to-all communication [38].
Highlights
  • We propose a new topology, called Slim Fly, which further reduces the network's diameter, cost, energy consumption, and latency while maintaining high bandwidth and resiliency
  • If the number of racks Nrck cannot be expressed as a product x · y of suitable grid dimensions, we find z such that Nrck = x · y + z and place the remaining z racks along an arbitrary side
  • The network size N cannot be identical across all topologies because each topology admits only a limited number of balanced configurations
  • We demonstrated the Slim Fly topology which allows the construction of low-latency, full-bandwidth, and resilient networks at a lower cost than existing topologies
  • Under the current technology constraints, we achieve a 25% cost and power benefit over Dragonfly
  • We propose a new class of topologies called Slim Fly networks to implement large datacenter and HPC network architectures
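The rack-placement rule in the highlights can be sketched as follows. This is a minimal sketch under an assumption of our own: the grid is chosen near-square, with x = ⌊√Nrck⌋ and y = ⌊Nrck / x⌋, which leaves z = Nrck − x · y racks over; the paper's exact choice of x and y is not given here. The function names are hypothetical.

```python
import math


def rack_grid(n_racks):
    """Split n_racks into an x-by-y grid plus z leftover racks, n_racks = x*y + z.

    Assumption (illustrative, not from the paper): x = floor(sqrt(n_racks)),
    which keeps the grid close to square.
    """
    if n_racks < 1:
        raise ValueError("need at least one rack")
    x = math.isqrt(n_racks)
    y = n_racks // x
    z = n_racks - x * y  # equals n_racks % x, so z < x
    return x, y, z


def rack_positions(n_racks):
    """Floor coordinates (row, col) for every rack.

    The z leftover racks go into one extra column along the side of the grid;
    since z < x, they always fit within the existing rows.
    """
    x, y, z = rack_grid(n_racks)
    positions = [(r, c) for r in range(x) for c in range(y)]
    positions += [(r, y) for r in range(z)]
    return positions


print(rack_grid(50))  # (7, 7, 1)
```

Keeping the grid near-square is one way to limit the longest Manhattan distance between racks, and therefore the longest cable; other layouts are equally compatible with the rule as stated.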
Results
  • The topologies with diameter three in the examples are very resilient: up to 75% of the links can be removed before the network is disconnected.
  • For a network size N = 2^13, SF can withstand up to 40% link failures before the diameter grows beyond four.
  • DLN is the most resilient and can sustain up to 60% link failures for a network with N = 2^13.
  • SF is ≈25% more cost-effective than DF, and almost 30%, 40%, and 50% less expensive than FBF-3, DLN, and FT-3, respectively.
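The resilience figures above come from removing links and checking whether the network stays connected and how far its diameter grows. The sketch below shows one way to run such an experiment. It assumes uniformly random link failures and uses a 4 × 4 2D torus as a stand-in graph; both are illustrative choices, not the paper's exact methodology or topologies, and the function names are hypothetical.

```python
import random
from collections import deque


def diameter(n, edges):
    """Hop diameter of an undirected graph on nodes 0..n-1, or None if disconnected."""
    adj = [[] for _ in range(n)]
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    worst = 0
    for src in range(n):
        dist = [-1] * n
        dist[src] = 0
        queue = deque([src])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if dist[v] < 0:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        if -1 in dist:
            return None  # some node unreachable: the network is disconnected
        worst = max(worst, max(dist))
    return worst


def fail_links(edges, fraction, seed=0):
    """Copy of edges with round(fraction * len(edges)) links removed uniformly at random."""
    rng = random.Random(seed)
    survivors = list(edges)
    rng.shuffle(survivors)
    return survivors[round(fraction * len(edges)):]


def torus_2d(side):
    """Edges of a side x side 2D torus (node id = row * side + col), side >= 3."""
    edges = []
    for r in range(side):
        for c in range(side):
            node = r * side + c
            edges.append((node, ((r + 1) % side) * side + c))  # link to next row
            edges.append((node, r * side + (c + 1) % side))    # link to next column
    return edges


edges = torus_2d(4)
for fraction in (0.0, 0.1, 0.2, 0.3, 0.4):
    print(fraction, diameter(16, fail_links(edges, fraction)))
```

A single random draw is noisy; to estimate a threshold such as the largest failure fraction before the diameter exceeds four, one would sweep the fraction and aggregate over many seeds.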
Conclusion
  • Discussion of the results: Figure 11c presents the total cost of balanced networks. A detailed case study of the cost per endpoint for an SF with ≈10K endpoints and radix 43 can be found in Table IV.
  • For network sizes up to 20,000, there are 11 balanced SF variants with full global bandwidth; DF offers only 8 such designs
  • Many of these variants can be directly constructed using readily available Mellanox routers with 18, 36, or 108 ports.
  • The authors build on the observation that lowering the network diameter, while maintaining high bandwidth, reduces the amount of expensive network resources used by packets traversing the network
  • They formulate this as an optimization problem and optimize toward the Moore bound.
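The Moore bound caps how many vertices a graph can contain for a given maximum degree d and diameter D: N ≤ 1 + d · Σ_{i=0}^{D−1} (d − 1)^i, which for D = 2 reduces to d^2 + 1. A minimal sketch that evaluates the bound and the fraction of it a concrete design reaches follows; the helper names are hypothetical, and treating d as the router-to-router degree is an assumption about how the bound is applied to a network.

```python
def moore_bound(degree: int, diameter: int) -> int:
    """Maximum number of vertices in a graph with maximum degree `degree`
    and diameter `diameter`: 1 + d * sum_{i=0}^{D-1} (d - 1)**i."""
    if degree < 1 or diameter < 1:
        raise ValueError("degree and diameter must be positive")
    return 1 + degree * sum((degree - 1) ** i for i in range(diameter))


def moore_efficiency(num_routers: int, degree: int, diameter: int) -> float:
    """Fraction of the Moore bound that a concrete graph achieves."""
    return num_routers / moore_bound(degree, diameter)


# For diameter 2 the bound simplifies to degree**2 + 1.
for d in (3, 7, 57):
    print(d, moore_bound(d, 2))  # prints: 3 10 / 7 50 / 57 3250
```

For degree 3 and 7 the bound is met exactly by the Petersen and Hoffman–Singleton graphs; for most other degrees no graph reaches it, which is why practical constructions aim to approach the bound rather than hit it.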
Tables
  • Table 1: Symbols used in the paper
  • Table 2: Topologies compared in the paper, their diameters (§ III-A), and example existing HPC systems that use respective topologies
  • Table 3: Cost and power comparison between a Slim Fly (N = 10830, k = 43) and other networks (§ VI-B4 and § VI-C). We select low-radix networks with N comparable to that of Slim Fly. N cannot be identical due to the limited number of existing network configurations. For high-radix topologies, we select comparable N and we also compare to topologies with fixed radix k. We also construct and analyze one additional variant of a DF that has both comparable N and identical k as the analyzed SF. Each of these groups of topologies is indicated with a bolded parameter
  • Table 4: Disconnection Resiliency (§ III-D1): the maximum number of cables that can be removed before the network is disconnected. Missing values indicate the inadequacy of a balanced topology variant for a given N
Funding
  • MB is supported by the 2013 Google European Doctoral Fellowship in Parallel Computing
Reference
  • D. Abts. Cray XT4 and Seastar 3-D Torus Interconnect. Encyclopedia of Parallel Computing, pages 470–477, 2011.
  • D. Abts, M. R. Marty, P. M. Wells, P. Klausler, and H. Liu. Energy Proportional Datacenter Networks. In Proceedings of the 37th Annual International Symposium on Computer Architecture, ISCA ’10, pages 338–347, New York, NY, USA, 2010. ACM.
  • R. Alverson, D. Roweth, and L. Kaplan. The Gemini System Interconnect. In Proceedings of the 2010 18th IEEE Symposium on High Performance Interconnects, HOTI ’10, pages 83–87, Washington, DC, USA, 2010. IEEE Computer Society.
  • B. Arimilli et al. The PERCS High-Performance Interconnect. In Proceedings of the 2010 18th IEEE Symposium on High Performance Interconnects, HOTI ’10, pages 75–82, Washington, DC, USA, 2010. IEEE Computer Society.
  • R. Barriuso and A. Knies. 108-Port InfiniBand FDR SwitchX Switch Platform Hardware User Manual, 2014.
  • J. Bermond, C. Delorme, and G. Farhi. Large graphs with given degree and diameter III. Annals of Discrete Mathematics, 13:23–32, 1982.
  • B. Bollobas. Random Graphs. Cambridge University Press, 2001.
  • D. Chen, N. Eisley, P. Heidelberger, S. Kumar, A. Mamidala, F. Petrini, R. Senger, Y. Sugawara, R. Walkup, B. Steinmacher-Burow, A. Choudhury, Y. Sabharwal, S. Singhal, and J. J. Parker. Looking Under the Hood of the IBM Blue Gene/Q Network. In Proceedings of the ACM/IEEE Supercomputing, SC ’12, pages 69:1–69:12, Los Alamitos, CA, USA, 2012. IEEE Computer Society Press.
  • D. Chen, N. A. Eisley, P. Heidelberger, R. M. Senger, Y. Sugawara, S. Kumar, V. Salapura, D. L. Satterfield, B. Steinmacher-Burow, and J. J. Parker. The IBM Blue Gene/Q Interconnection Network and Message Unit. In Proceedings of 2011 ACM/IEEE Supercomputing, SC ’11, pages 26:1–26:10, New York, NY, USA, 2011. ACM.
  • W. Dally and B. Towles. Principles and Practices of Interconnection Networks. Morgan Kaufmann Publishers Inc., San Francisco, CA, USA, 2003.
  • W. J. Dally. Performance Analysis of k-ary n-cube Interconnection Networks. IEEE Transactions on Computers, 39:775–785, 1990.
  • W. J. Dally and C. L. Seitz. Deadlock-Free Message Routing in Multiprocessor Interconnection Networks. IEEE Trans. Comput., 36(5):547–553, May 1987.
  • C. Delorme. Grands Graphes de Degree et Diametre Donnes. Europ. J. Combinatorics, 6:291–302, 1985.
  • J. Domke, T. Hoefler, and W. Nagel. Deadlock-Free Oblivious Routing for Arbitrary Topologies. In Proceedings of the 25th IEEE International Parallel and Distributed Processing Symposium (IPDPS), pages 613–624. IEEE Computer Society, May 2011.
  • J. Dongarra. Visit to the National University for Defense Technology Changsha, China. Oak Ridge National Laboratory, Tech. Rep., June 2013.
  • J. Duato. A Necessary and Sufficient Condition for Deadlock-Free Adaptive Routing in Wormhole Networks. IEEE Trans. Parallel Distrib. Syst., 6(10):1055–1067, Oct. 1995.
  • J. Duato, S. Yalamanchili, and L. Ni. Interconnection Networks: An Engineering Approach. Morgan Kaufmann Publishers Inc., San Francisco, CA, USA, 2002.
  • G. Faanes, A. Bataineh, D. Roweth, T. Court, E. Froese, R. Alverson, T. Johnson, J. Kopnick, M. Higgins, and J. Reinhard. Cray Cascade: a scalable HPC system based on a Dragonfly network. In SC, page 103. IEEE/ACM, 2012.
  • J. Flich, T. Skeie, A. Mejia, O. Lysne, P. Lopez, A. Robles, J. Duato, M. Koibuchi, T. Rokicki, and J. C. Sancho. A Survey and Evaluation of Topology-Agnostic Deterministic Routing Algorithms. IEEE Trans. Parallel Distrib. Syst., 23(3):405–425, Mar. 2012.
  • C. Gomez, F. Gilabert, M. Gomez, P. Lopez, and J. Duato. Deterministic versus adaptive routing in fat-trees. In Parallel and Distributed Processing Symposium, 2007. IPDPS 2007. IEEE International, pages 1–8, March 2007.
  • … Computer Networks, pages 338–344. IEEE Computer Society Press, Los Alamitos, CA, USA, 1994.
  • McKay–Miller–Siran. Journal of Combinatorial Theory, Series B, 90(2):223–232, 2004.
  • N. Jiang, D. U. Becker, G. Michelogiannakis, J. Balfour, B. Towles, D. E. Shaw, J. Kim, and W. J. Dally. A detailed and flexible cycle-accurate network-on-chip simulator. In Performance Analysis of Systems and Software (ISPASS), 2013 IEEE International Symposium on, pages 86–96. IEEE, 2013.
  • N. Jiang, J. Kim, and W. J. Dally. Indirect Adaptive Routing on Large Scale Interconnection Networks. In Proceedings of the 36th Annual International Symposium on Computer Architecture, ISCA ’09, pages 220–231, New York, NY, USA, 2009. ACM.
  • G. Karypis and V. Kumar. A Fast and High Quality Multilevel Scheme for Partitioning Irregular Graphs. SIAM Journal on Scientific Computing, 20:359–392, 1999.
  • J. Kim, J. Balfour, and W. Dally. Flattened Butterfly Topology for On-Chip Networks. In Proceedings of the 40th Annual IEEE/ACM International Symposium on Microarchitecture, MICRO 40, pages 172–182, Washington, DC, USA, 2007. IEEE Computer Society.
  • J. Kim, W. J. Dally, and D. Abts. Flattened Butterfly: A Cost-efficient Topology for High-radix Networks. In Proceedings of the 34th Annual International Symposium on Computer Architecture, ISCA ’07, pages 126–137, New York, NY, USA, 2007. ACM.
  • J. Kim, W. J. Dally, S. Scott, and D. Abts. Technology-Driven, Highly-Scalable Dragonfly Topology. In Proceedings of the 35th Annual International Symposium on Computer Architecture, ISCA ’08, pages 77–88, Washington, DC, USA, 2008. IEEE Computer Society.
  • M. Koibuchi, H. Matsutani, H. Amano, D. F. Hsu, and H. Casanova. A case for random shortcut topologies for HPC interconnects. In ISCA ’12, pages 177–188. IEEE, 2012.
  • C. E. Leiserson. Fat-trees: universal networks for hardware-efficient supercomputing. IEEE Trans. Comput., 34(10):892–901, Oct. 1985.
  • R. Lidl and H. Niederreiter. Finite Fields: Encyclopedia of Mathematics and Its Applications. Computers & Mathematics with Applications, 33(7):136–136, 1997.
  • B. D. McKay, M. Miller, and J. Siran. A note on large graphs of diameter two and given maximum degree. Journal of Combinatorial Theory, Series B, 74(1):110–118, 1998.
  • M. Miller and J. Siran. Moore graphs and beyond: A survey of the degree/diameter problem. Electronic Journal of Combinatorics, 61:1–63, 2005.
  • N. Pippenger and G. Lin. Fault-tolerant circuit-switching networks. In Proceedings of the Fourth Annual ACM Symposium on Parallel Algorithms and Architectures, SPAA ’92, pages 229–235, New York, NY, USA, 1992. ACM.
  • S. Scott, D. Abts, J. Kim, and W. J. Dally. The BlackWidow High-Radix Clos Network. In Proceedings of the 33rd Annual International Symposium on Computer Architecture, ISCA ’06, pages 16–28, Washington, DC, USA, 2006. IEEE Computer Society.
  • A. Singh. Load-Balanced Routing in Interconnection Networks. PhD thesis, Stanford University, 2005.
  • A. Singla, C.-Y. Hong, L. Popa, and P. B. Godfrey. Jellyfish: networking data centers randomly. In Proceedings of the 9th USENIX Conference on Networked Systems Design and Implementation, NSDI ’12, pages 17–17, Berkeley, CA, USA, 2012. USENIX Association.
  • S. Tiyyagura, P. Adamidis, R. Rabenseifner, P. Lammers, S. Borowski, F. Lippold, F. Svensson, O. Marxen, S. Haberhauer, A. Seitsonen, J. Furthmuller, K. Benkert, M. Galle, T. Bonisch, U. Kuster, and M. Resch. Teraflops Sustained Performance With Real World Applications. Int. J. High Perform. Comput. Appl., 22(2):131–148, May 2008.
  • R. V. Tomic. Network Throughput Optimization via Error Correcting Codes. ArXiv e-prints, Jan. 2013.
  • L. Valiant. A scheme for fast parallel communication. SIAM Journal on Computing, 11(2):350–361, 1982.
  • J. Siagiova. A Note on the McKay-Miller-Siran Graphs. Journal of Combinatorial Theory, Series B, 81:205–208, 2001.
  • R. Wolf. Nasa Pleiades Infiniband Communications Network, 2009. Intl. ACM Symposium on High Performance Distributed Computing.
  • X. Yuan, S. Mahapatra, W. Nienaber, S. Pakin, and M. Lang. A New Routing Scheme for Jellyfish and Its Performance with HPC Workloads. In Proceedings of 2013 ACM/IEEE Supercomputing, SC ’13, pages 36:1–36:11, 2013.