# 2HOT: an improved parallel hashed oct-tree n-body algorithm for cosmological simulation

High Performance Computing, Networking, Storage and Analysis, no. 2 (2014): 1-12

Abstract

We report on improvements made over the past two decades to our adaptive treecode N-body method (HOT). A mathematical and computational approach to the cosmological N-body problem is described, with performance and scalability measured up to 256k (2^18) processors. We present error analysis and scientific application results from a series ...

Introduction

- The authors first reported on the parallel N-body algorithm (HOT) 20 years ago [67]. Over the same timescale, cosmology has been transformed from a qualitative to a quantitative science.
- At early times when the authors calculate the acceleration from a 100 Mpc cell in one direction, 99% of that value will cancel with a cell in the opposite direction, leaving a small remainder.
- This implies that the error tolerance needed for these large cells is 100 times stricter than for the short-range interactions.
- This suggests that eliminating the background contribution from the partial acceleration terms would be beneficial.
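The cancellation argument in the bullets above can be made concrete with a toy calculation (illustrative only; the cell masses, distance, and 1% overdensity below are hypothetical numbers, not the paper's):

```python
# Illustrative sketch (not the paper's implementation): in a near-uniform
# universe, the pull from a distant cell on one side is almost exactly
# cancelled by the mirror cell on the opposite side, so the net force is
# a tiny residual of two large terms.

def accel(mass, dx):
    """1D Newtonian acceleration from a point mass at signed distance dx (G = 1)."""
    return mass / dx**2 * (1 if dx > 0 else -1)

# Two distant cells on opposite sides, one with a 1% density fluctuation.
a_right = accel(1.00, +100.0)   # cell at +100 (arbitrary units)
a_left  = accel(1.01, -100.0)   # slightly overdense cell at -100

net = a_right + a_left              # small residual of two large terms
cancelled = abs(a_left) / abs(net)  # how much larger each term is than the net

print(f"net acceleration       : {net:+.2e}")
print(f"each term exceeds net by ~{cancelled:.0f}x")

# Background subtraction: remove the uniform (mean-density) contribution
# from each cell before summing, so only the fluctuation is ever computed
# and the large cancelling terms never enter the sum.
a_right_sub = accel(1.00 - 1.00, +100.0)  # uniform part subtracted -> 0
a_left_sub  = accel(1.01 - 1.00, -100.0)  # only the 1% overdensity remains
net_sub = a_right_sub + a_left_sub
print(f"net after subtraction  : {net_sub:+.2e}")  # same physics, no cancellation
```

The residual is roughly 100 times smaller than either term, which is why the error tolerance on the large cells must be correspondingly stricter unless the background is removed.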

Highlights

- We first reported on our parallel N-body algorithm (HOT) 20 years ago [67]
- We describe an improved version of our code (2HOT), and present a suite of simulations which probe the finest details of our current understanding of cosmology
- Modeling the mass function at these scales is an enormous challenge for numerical simulations, since both statistical and systematic errors conspire to prevent the emergence of an accurate theoretical model
- We provide the first mass function calculated from a suite of simulations using the new standard Planck 2013 cosmology
- Using the background subtraction technique described in Section 2.2.1 improved the efficiency of our treecode algorithm for cosmological simulations by about a factor of three when using a relatively strict tolerance (10^-5), resulting in a total absolute force error of about 0.1% of the typical force
- By updating the particles in an order which takes advantage of their spatial proximity, we improved the performance of the memory hierarchy
- We have evidence that accuracy at this level is required for high-precision scientific results, and we have used that tolerance for the results presented here
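The spatial-proximity update order mentioned in the highlights is in the spirit of the space-filling-curve keys a hashed oct-tree is built on: sorting particles by key keeps spatial neighbors nearby in memory. A minimal sketch of Z-order (Morton) keys, assuming a simple bit-interleaving scheme rather than the paper's exact hashing:

```python
# Minimal Z-order (Morton) key sketch: interleaving the bits of the
# quantized x, y, z coordinates yields a 1-D key whose sort order keeps
# spatially nearby particles close together in memory. This is an
# illustration, not the paper's exact key construction.

def morton_key(x, y, z, bits=10):
    """Interleave `bits` bits of each integer coordinate into one key."""
    key = 0
    for i in range(bits):
        key |= ((x >> i) & 1) << (3 * i)
        key |= ((y >> i) & 1) << (3 * i + 1)
        key |= ((z >> i) & 1) << (3 * i + 2)
    return key

def spatial_order(positions, bits=10):
    """Sort unit-box positions by Morton key on a 2^bits grid."""
    scale = (1 << bits) - 1
    def key(p):
        x, y, z = (int(c * scale) for c in p)
        return morton_key(x, y, z, bits)
    return sorted(positions, key=key)

particles = [(0.9, 0.9, 0.9), (0.1, 0.1, 0.1), (0.11, 0.1, 0.12), (0.9, 0.88, 0.9)]
for p in spatial_order(particles):
    print(p)
# The two clusters near (0.1, ...) and (0.9, ...) come out contiguous.
```

Updating particles in this order turns scattered memory accesses into mostly sequential ones, which is the memory-hierarchy benefit the highlight describes.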

Methods

- Using N particles to represent the Universe, treecodes and fast multipole methods reduce the O(N^2) scaling of the right-hand side of equation (2) to O(N) or O(N log N), a significant savings for current cosmological simulations, which use N in the range of 10^10 to 10^12.
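For a sense of scale, the reduction from O(N^2) to roughly O(N log N) interactions can be tallied directly (the N log2 N count is a crude estimate; real treecode constants depend on the accuracy parameter and multipole order):

```python
import math

# Back-of-envelope interaction counts for the N quoted in the text.
# Illustrative only: prefactors vary widely between implementations.
for N in (1e10, 1e11, 1e12):
    direct = N * (N - 1) / 2    # brute-force pair count, O(N^2)
    tree = N * math.log2(N)     # treecode-style O(N log N) estimate
    print(f"N = {N:.0e}: direct ~ {direct:.1e} pairs, "
          f"tree ~ {tree:.1e}, ratio ~ {direct / tree:.1e}")
```

Even for the smallest N above, the brute-force sum is out of reach by many orders of magnitude, which is why all large cosmological codes use some hierarchical method.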

Results

- The number of objects in the Universe of a given mass is a fundamental statistic called the mass function.
- For very massive clusters the mass function is a sensitive probe of cosmology.
- For these reasons, the mass function is a major target of current observational programs [10].
- Modeling the mass function at these scales is an enormous challenge for numerical simulations, since both statistical and systematic errors conspire to prevent the emergence of an accurate theoretical model.
- The dynamic range in mass and convergence tests necessary to model systematic errors require multiple simulations at different resolutions, since even a 10^12 particle simulation does not have sufficient statistical power by itself
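Operationally, a mass function measurement of the kind described above boils down to counting halos in logarithmic mass bins per unit volume. A minimal sketch with a synthetic halo catalog (the box volume, mass range, and catalog below are hypothetical; in practice the masses come from a halo finder run on the simulation):

```python
import math
import random

# Synthetic halo catalog: 10,000 halos with log-uniform masses (illustrative).
random.seed(1)
box_volume = 1.0e9                                                  # (Mpc/h)^3, hypothetical
halo_masses = [10 ** random.uniform(13, 15) for _ in range(10000)]  # Msun/h

def mass_function(masses, volume, nbins=8, lo=13.0, hi=15.0):
    """Return (bin centers in log10 M, dn/dlog10M per unit volume)."""
    dlog = (hi - lo) / nbins
    counts = [0] * nbins
    for m in masses:
        i = int((math.log10(m) - lo) / dlog)  # logarithmic bin index
        if 0 <= i < nbins:
            counts[i] += 1
    centers = [lo + (i + 0.5) * dlog for i in range(nbins)]
    dn = [c / (dlog * volume) for c in counts]
    return centers, dn

centers, dn = mass_function(halo_masses, box_volume)
for c, n in zip(centers, dn):
    print(f"log10 M = {c:.2f}: dn/dlog10M = {n:.3e}")
```

The statistical-power problem in the bullet above shows up directly here: the highest-mass bins of a real catalog contain only a handful of rare clusters, so their Poisson error bars dominate unless the simulated volume (or the number of simulations) is increased.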

Conclusion

- Using the background subtraction technique described in Section 2.2.1 improved the efficiency of the treecode algorithm for cosmological simulations by about a factor of three when using a relatively strict tolerance (10^-5), resulting in a total absolute force error of about 0.1% of the typical force.
- The authors can compare the computational efficiency with the 2012 Gordon Bell Prize winning TreePM N-body application [26] which used 140,000 floating point operations per particle.
- Modulo being able to precisely compare codes at the same accuracy, this work demonstrates that a pure treecode can be competitive with TreePM codes in large periodic cosmological volumes.
- The advantage of pure treecodes grows significantly as applications move to higher resolutions in smaller volumes, use simulations with multiple hierarchical resolutions, and require non-periodic boundary conditions

- Table 1: Performance of HOT on a variety of parallel supercomputers spanning 20 years of time and five decades of performance
- Table 2: Breakdown of computation stages in a single timestep from a recent 4096^3 particle simulation using 2HOT on 12,288 processors of Mustang at LANL. The force evaluation consisted of 1.05 × 10^15 hexadecapole interactions, 1.46 × 10^15 quadrupole interactions and 4.68 × 10^14 monopole interactions, for a total of 582,000 floating point operations per particle. Reducing the accuracy parameter to a value consistent with other methods would reduce the operation count by more than a factor of three
- Table 3: Single core/GPU performance in Gflop/s obtained with our gravitational micro-kernel benchmark for the monopole interaction. All numbers are for single-precision calculations, calculated using 28 flops per interaction
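The monopole micro-kernel benchmarked in Table 3 evaluates softened point-mass gravity. A plain-Python sketch of such a kernel (the Plummer softening form and the eps value are illustrative assumptions; the 28-flop tally depends on how the reciprocal square root is scored and is not reproduced here):

```python
import math

# Sketch of a softened monopole (point-mass) gravity kernel of the kind
# benchmarked in Table 3. Plummer softening eps keeps the force finite at
# zero separation. Production kernels vectorize this inner loop; counting
# its multiplies, adds, and the reciprocal square root per source is what
# yields a flops-per-interaction figure.

def monopole_accel(pos, sources, eps=1e-2):
    """Acceleration at `pos` from (mass, x, y, z) sources, with G = 1."""
    ax = ay = az = 0.0
    x, y, z = pos
    for m, sx, sy, sz in sources:
        dx, dy, dz = sx - x, sy - y, sz - z
        r2 = dx * dx + dy * dy + dz * dz + eps * eps  # softened distance^2
        inv_r3 = 1.0 / (r2 * math.sqrt(r2))           # 1 / r^3
        ax += m * dx * inv_r3
        ay += m * dy * inv_r3
        az += m * dz * inv_r3
    return ax, ay, az

# Sanity check: a symmetric pair of unit masses pulls equally in opposite
# directions, so the net acceleration at the midpoint vanishes.
a = monopole_accel((0.0, 0.0, 0.0),
                   [(1.0, 1.0, 0.0, 0.0), (1.0, -1.0, 0.0, 0.0)])
print(a)  # ~ (0, 0, 0)
```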

Funding

- This research used resources of the Oak Ridge Leadership Computing Facility at Oak Ridge National Laboratory, which is supported by the Office of Science of the Department of Energy under Contract DE-AC05-00OR22725
- This research also used resources of the National Energy Research Scientific Computing Center, which is supported by the Office of Science of the U.S. Department of Energy under Contract No. DE-AC02-05CH11231

References

- R. E. Angulo, V. Springel, S. D. M. White, A. Jenkins, C. M. Baugh, and C. S. Frenk. Scaling relations for galaxy clusters in the Millennium-XXL simulation. arXiv:1203.3216, 2012.
- R. E. Angulo and S. D. M. White. One simulation to fit them all – changing the background parameters of a cosmological N-body simulation. Monthly Notices of the Royal Astronomical Society, 405(1):143–154, 2010.
- P. Balaji, et al. MPI on millions of cores. Parallel Processing Letters, 21(01):45–60, 2011.
- J. Bedorf, E. Gaburov, and S. Portegies Zwart. A sparse octree gravitational N-body code that runs entirely on the GPU processor. Journal of Computational Physics, 231(7):2825–2839, 2012.
- S. Behnel, R. Bradshaw, C. Citro, L. Dalcin, D. Seljebotn, and K. Smith. Cython: The best of both worlds. Computing in Science Engineering, 13(2):31–39, 2011.
- P. S. Behroozi, R. H. Wechsler, and H. Wu. The ROCKSTAR phase-space temporal halo finder and the velocity offsets of cluster cores. The Astrophysical Journal, 762(2):109, 2013.
- D. Blas, J. Lesgourgues, and T. Tram. The cosmic linear anisotropy solving system (CLASS). part II: approximation schemes. Journal of Cosmology and Astroparticle Physics, 2011(07):034, 2011.
- M. Challacombe, C. White, and M. Head-Gordon. Periodic boundary conditions and the fast multipole method. The Journal of Chemical Physics, 107(23):10131–10140, 1997.
- Planck Collaboration. Planck 2013 results. XVI. cosmological parameters. arXiv:1303.5076, 2013.
- Planck Collaboration. Planck 2013 results. XX. cosmology from Sunyaev-Zeldovich cluster counts. arXiv:1303.5080, 2013.
- National Research Council. New worlds, new horizons in astronomy and astrophysics. National Academies Press, 2010.
- M. Crocce, S. Pueblas, and R. Scoccimarro. Transients from initial conditions in cosmological simulations. Monthly Notices of the Royal Astronomical Society, 373(1):369–381, 2006.
- W. Dehnen. Towards optimal softening in threedimensional N-body codes – I. minimizing the force error. Monthly Notices of the Royal Astronomical Society, 324(2):273–291, 2001.
- G. Efstathiou, M. Davis, S. D. M. White, and C. S. Frenk. Numerical techniques for large cosmological N-body simulations. The Astrophysical Journal Supplement Series, 57:241–260, 1985.
- C. I. Ellinger, P. A. Young, C. L. Fryer, and G. Rockefeller. A case study of small scale structure formation in 3D supernova simulations. arXiv:1206.1834, 2012.
- W. Feng. Making a case for efficient supercomputing. Queue, 1(7):54–64, 2003.
- M. Frigo and S. G. Johnson. FFTW: an adaptive software architecture for the FFT. In Acoustics, Speech and Signal Processing, 1998. Proceedings of the 1998 IEEE International Conference on, volume 3, page 1381–1384. 1998.
- C. L. Fryer, G. Rockefeller, and M. S. Warren. SNSPH: a parallel three-dimensional smoothed particle radiation hydrodynamics code. The Astrophysical Journal, 643(1):292, 2006.
- C. L. Fryer and M. S. Warren. Modeling Core-Collapse supernovae in three dimensions. The Astrophysical Journal Letters, 574(1):L65, 2002.
- M. Galassi, et al. GNU scientific library. Network Theory, 2007.
- K. M. Gorski, E. Hivon, A. J. Banday, B. D. Wandelt, F. K. Hansen, M. Reinecke, and M. Bartelmann. HEALPix: a framework for high-resolution discretization and fast analysis of data distributed on the sphere. The Astrophysical Journal, 622(2):759, 2005.
- A. G. Gray and A. W. Moore. N-Body problems in statistical learning. Advances in neural information processing systems, page 521–527, 2001.
- M. Griebel and G. Zumbusch. Parallel multigrid in an adaptive PDE solver based on hashing and space-filling curves. Parallel Computing, 25(7):827–843, 1999.
- J. Harnois-Deraps, U. Pen, I. T. Iliev, H. Merz, J. D. Emberson, and V. Desjacques. High performance P3M N-body code: CUBEP3M. arXiv:1208.5098, 2012.
- L. Hernquist, F. R. Bouchet, and Y. Suto. Application of the Ewald method to cosmological N-body simulations. The Astrophysical Journal Supplement Series, 75:231–240, 1991.
- T. Ishiyama, K. Nitadori, and J. Makino. 4.45 Pflops astrophysical N-body simulation on K computer – the gravitational trillion-body problem. arXiv:1211.4406, 2012.
- P. Jetley, F. Gioachin, C. Mendes, L. Kale, and T. Quinn. Massively parallel cosmological simulations with ChaNGa. In IEEE International Symposium on Parallel and Distributed Processing, 2008. IPDPS 2008, pages 1–12. 2008.
- A. Kawai and J. Makino. Pseudoparticle multipole method: A simple method to implement a high-accuracy tree code. The Astrophysical Journal Letters, 550(2):L143, 2001.
- M. Kuhlen, M. Vogelsberger, and R. Angulo. Numerical simulations of the dark universe: State of the art and the next decade. arXiv:1209.5745, 2012.
- J. Lesgourgues. The cosmic linear anisotropy solving system (CLASS) I: Overview. arXiv:1104.2932, 2011.
- Z. Lukic, K. Heitmann, S. Habib, S. Bashinsky, and P. M. Ricker. The halo mass function: High-Redshift evolution and universality. The Astrophysical Journal, 671(2):1160, 2007.
- P. MacNeice, K. M. Olson, C. Mobarry, R. de Fainchtein, and C. Packer. PARAMESH: a parallel adaptive mesh refinement community toolkit. Computer physics communications, 126(3):330–354, 2000.
- P. M. McIlroy, K. Bostic, and M. D. McIlroy. Engineering radix sort. Computing Systems, 6(1):5–27, 1993.
- J. Mellor-Crummey, D. Whalley, and K. Kennedy. Improving memory hierarchy performance for irregular applications. In Proceedings of the 13th international conference on Supercomputing, ICS ’99, page 425–433. ACM, New York, NY, USA, 1999.
- M. Metchnik. A Fast N-Body Scheme for Computational Cosmology. Ph.D. thesis, U. Arizona., 2009.
- B. Nijboer and F. De Wette. On the calculation of lattice sums. Physica, 23(1–5):309–321, 1957.
- J. Ousterhout. Why threads are a bad idea (for most purposes). In Presentation given at the 1996 Usenix Annual Technical Conference, volume 5. 1996.
- M. Parashar and J. Browne. On partitioning dynamic adaptive grid hierarchies. In System Sciences, 1996. Proceedings of the Twenty-Ninth Hawaii International Conference on, volume 1, pages 604–613. 1996.
- P. J. E. Peebles. The Large-Scale Structure of the Universe. Princeton University Press, 1980.
- D. W. Pfitzner, J. K. Salmon, T. Sterling, P. Stolorz, and R. Musick. Halo world: Tools for parallel cluster finding in astrophysical N-body simulations. In P. Stolorz and R. Musick, editors, Scalable High Performance Computing for Knowledge Discovery and Data Mining. Springer US, 1998.
- P. Ploumhans, G. Winckelmans, J. Salmon, A. Leonard, and M. Warren. Vortex methods for direct numerical simulation of three-dimensional bluff body flows: application to the sphere at Re = 300, 500, and 1000. Journal of Computational Physics, 178(2):427–463, 2002.
- T. Quinn, N. Katz, J. Stadel, and G. Lake. Time stepping N-body simulations. arXiv:astro-ph/9710043, 1997.
- D. S. Reed, R. E. Smith, D. Potter, A. Schneider, J. Stadel, and B. Moore. Toward an accurate mass function for precision cosmology. arXiv:1206.5302, 2012.
- A. G. Riess, et al. Type Ia supernova discoveries at z > 1 from the Hubble Space Telescope: Evidence for past deceleration and constraints on dark energy evolution. The Astrophysical Journal, 607(2):665, 2004.
- J. K. Salmon and M. S. Warren. Skeletons from the treecode closet. Journal of Computational Physics, 111(1):136–155, 1994.
- Z. F. Seidov and P. I. Skvirsky. Gravitational potential and energy of homogeneous rectangular parallelepiped. arXiv:astro-ph/0002496, 2000.
- G. F. Smoot, et al. Structure in the COBE differential microwave radiometer first-year maps. The Astrophysical Journal, 396:L1–L5, 1992.
- E. Solomonik and L. Kale. Highly scalable parallel sorting. In 2010 IEEE International Symposium on Parallel Distributed Processing (IPDPS), pages 1–12. 2010.
- D. N. Spergel, et al. First-year Wilkinson Microwave Anisotropy Probe (WMAP) observations: determination of cosmological parameters. The Astrophysical Journal Supplement Series, 148(1):175, 2003.
- V. Springel. The cosmological simulation code GADGET-2. Monthly Notices of the Royal Astronomical Society, 364(4):1105–1134, 2005.
- R. M. Stallman. Using and porting the GNU Compiler Collection. Free Software Foundation, 1989.
- A. Taruya, F. Bernardeau, T. Nishimichi, and S. Codis. RegPT: direct and fast calculation of regularized cosmological power spectrum at two-loop order. arXiv:1208.1191, 2012.
- M. Tegmark, et al. Cosmological parameters from SDSS and WMAP. Physical Review D, 69(10):103501, 2004.
- R. Thakur, et al. MPI at exascale. Proceedings of SciDAC, 2010.
- J. Tinker, A. V. Kravtsov, A. Klypin, K. Abazajian, M. Warren, G. Yepes, S. Gottlober, and D. E. Holz. Toward a halo mass function for precision cosmology: The limits of universality. The Astrophysical Journal, 688(2):709, 2008.
- L. Torvalds and J. Hamano. GIT-fast version control system. 2005.
- M. J. Turk, B. D. Smith, J. S. Oishi, S. Skory, S. W. Skillman, T. Abel, and M. L. Norman. yt: A multi-code analysis toolkit for astrophysical simulation data. The Astrophysical Journal Supplement Series, 192:9, 2011.
- G. Van Rossum and F. L. Drake Jr. Python reference manual. Centrum voor Wiskunde en Informatica, 1995.
- J. Waldvogel. The Newtonian potential of a homogeneous cube. Zeitschrift für angewandte Mathematik und Physik ZAMP, 27(6):867–871, 1976.
- M. Warren, J. Salmon, D. Becker, M. Goda, T. Sterling, and W. Winckelmans. Pentium pro inside: I. a treecode at 430 gigaflops on ASCI red, II. Price/Performance of $50/mflop on Loki and Hyglac. In Supercomputing, ACM/IEEE 1997 Conference, pages 61–61. 1997.
- M. S. Warren, K. Abazajian, D. E. Holz, and L. Teodoro. Precision determination of the mass function of dark matter halos. The Astrophysical Journal, 646(2):881, 2006.
- M. S. Warren, D. J. Becker, M. P. Goda, J. K. Salmon, and T. Sterling. Parallel supercomputing with commodity components. In Proceedings of the International Conference on Parallel and Distributed Processing Techniques and Applications (PDPTA’97), page 1372–1381. 1997.
- M. S. Warren and B. Bergen. Poster: The hashed Oct-Tree N-body algorithm at a petaflop. In High Performance Computing, Networking, Storage and Analysis (SCC), 2012 SC Companion, page 1442. 2012.
- M. S. Warren, C. L. Fryer, and M. P. Goda. The space simulator: Modeling the universe from supernovae to cosmology. In Proceedings of the 2003 ACM/IEEE conference on Supercomputing, SC ’03, page 30–. ACM, New York, NY, USA, 2003.
- M. S. Warren, T. C. Germann, P. S. Lomdahl, D. M. Beazley, and J. K. Salmon. Avalon: an Alpha/Linux cluster achieves 10 gflops for $150k. In Proceedings of the 1998 ACM/IEEE conference on Supercomputing, Supercomputing ’98, page 1–11. IEEE Computer Society, Washington, DC, USA, 1998.
- M. S. Warren and J. K. Salmon. Astrophysical N-body simulations using hierarchical tree data structures. In Supercomputing ’92. Proceedings, page 570–576. 1992.
- M. S. Warren and J. K. Salmon. A parallel hashed Oct-Tree N-body algorithm. In Proceedings of the 1993 ACM/IEEE conference on Supercomputing, Supercomputing ’93, page 12–21. ACM, New York, NY, USA, 1993.
- M. S. Warren and J. K. Salmon. A portable parallel particle program. Computer Physics Communications, 87(1–2):266–290, 1995.
- M. S. Warren, E. H. Weigle, and W. Feng. High-density computing: a 240-processor beowulf in one cubic meter. In Supercomputing, ACM/IEEE 2002 Conference, page 61–61. 2002.
- M. S. Warren, W. Zurek, P. Quinn, and J. Salmon. The shape of the invisible halo: N-body simulations on parallel supercomputers. AIP Conference Proceedings, 222:216, 1991.
- W. A. Watson, I. T. Iliev, A. D’Aloisio, A. Knebe, P. R. Shapiro, and G. Yepes. The halo mass function through the cosmic ages. arXiv:1212.0095, 2012.
- W. A. Watson, I. T. Iliev, J. M. Diego, S. Gottlober, A. Knebe, E. Martínez-González, and G. Yepes. Statistics of extreme objects in the Juropa Hubble volume simulation. arXiv:1305.1976, 2013.
- J. J. Willcock, T. Hoefler, N. G. Edmonds, and A. Lumsdaine. AM++: a generalized active message framework. In Proceedings of the 19th international conference on Parallel architectures and compilation techniques, PACT ’10, page 401–410. 2010.
- S. Wolfram. The Mathematica Book. Cambridge University Press, 4th edition, 1999.
- J. Wu, Z. Lan, X. Xiong, N. Y. Gnedin, and A. V. Kravtsov. Hierarchical task mapping of cell-based AMR cosmology simulations. In SC ’12, page 75:1–75:10. IEEE Computer Society Press, Los Alamitos, CA, USA, 2012.
- L. Ying, G. Biros, and D. Zorin. A kernel-independent adaptive fast multipole algorithm in two and three dimensions. Journal of Computational Physics, 196(2):591–626, 2004.
- R. Yokota. An FMM based on dual tree traversal for many-core architectures. arXiv:1209.3516, 2012.
