2HOT: an improved parallel hashed oct-tree N-body algorithm for cosmological simulation

High Performance Computing, Networking, Storage and Analysis, no. 2 (2014): 1-12

Cited by: 40 | Views: 160 | Indexed in: EI, WOS

Abstract

We report on improvements made over the past two decades to our adaptive treecode N-body method (HOT). A mathematical and computational approach to the cosmological N-body problem is described, with performance and scalability measured up to 256k (2¹⁸) processors. We present error analysis and scientific application results from a series ...

Introduction
  • The authors first reported on the parallel N-body algorithm (HOT) 20 years ago [67]. Over the same timescale, cosmology has been transformed from a qualitative to a quantitative science.
  • At early times, when the authors calculate the acceleration from a 100 Mpc cell in one direction, 99% of that value cancels against a cell in the opposite direction, leaving a small remainder.
  • This implies that the error tolerance needed for these large cells is 100 times stricter than for the short-range interactions.
  • This suggests that eliminating the background contribution from the partial acceleration terms would be beneficial, as illustrated by the sketch below.
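To make the cancellation argument above concrete, the toy sketch below (Python, with arbitrary units and made-up distances, not code from the paper) evaluates the pull of two equal-mass cells on opposite sides of a particle: the two terms nearly cancel, so the surviving remainder is a small fraction of either term and inherits a correspondingly tighter relative error requirement.

```python
# Toy illustration (not the paper's code): accelerations from two equal-mass cells
# on opposite sides of a particle nearly cancel, so the small net value must be
# computed to a much tighter relative tolerance than either individual term.
G, M = 1.0, 1.0          # arbitrary units
R, d = 100.0, 0.25       # cell distance and particle offset from the midpoint

a_right = +G * M / (R - d) ** 2   # pull toward the cell on the +x side
a_left  = -G * M / (R + d) ** 2   # pull toward the cell on the -x side
a_net   = a_right + a_left        # ~ 4*G*M*d/R**3 for small d/R

print(f"single term ~ {a_right:.3e}, net = {a_net:.3e}")
print(f"surviving fraction = {a_net / a_right:.2%}")   # roughly 1% for these numbers
```

Subtracting the uniform background contribution removes most of this cancelling component before the tree interactions are summed, which is the motivation for the background-subtraction technique discussed in the Highlights and Conclusion.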
Highlights
  • We first reported on our parallel N-body algorithm (HOT) 20 years ago [67]
  • We describe an improved version of our code (2HOT), and present a suite of simulations which probe the finest details of our current understanding of cosmology
  • Modeling the mass function at these scales is an enormous challenge for numerical simulations, since both statistical and systematic errors conspire to prevent the emergence of an accurate theoretical model
  • We provide the first mass function calculated from a suite of simulations using the new standard Planck 2013 cosmology
  • Using the background subtraction technique described in Section 2.2.1 improved the efficiency of our treecode algorithm for cosmological simulations by about a factor of three when using a relatively strict tolerance (10⁻⁵), resulting in a total absolute force error of about 0.1% of the typical force.
  • By updating the particles in an order which takes advantage of their spatial proximity, we improved the performance of the memory hierarchy (see the ordering sketch after this list).
  • We have evidence that accuracy at this level is required for high-precision scientific results, and we have used that tolerance for the results presented here
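One common way to realize the spatially coherent update order mentioned above is to sort particles along a space-filling curve; hashed oct-tree codes key particles by interleaved coordinate bits. The sketch below is a minimal, generic Morton (Z-order) keying and sort, not the paper's implementation; the helper names are illustrative.

```python
# Minimal sketch (not the paper's code): order particles along a Morton (Z-order)
# space-filling curve so that particles close in space end up close in memory.
def part1by2(n):
    """Spread the low 10 bits of n so that they occupy every third bit."""
    n &= 0x000003FF
    n = (n ^ (n << 16)) & 0xFF0000FF
    n = (n ^ (n << 8))  & 0x0300F00F
    n = (n ^ (n << 4))  & 0x030C30C3
    n = (n ^ (n << 2))  & 0x09249249
    return n

def morton_key(x, y, z, bits=10):
    """Interleave the top `bits` (<= 10) bits of unit-cube coordinates in [0, 1)."""
    scale = 1 << bits
    ix, iy, iz = (min(int(c * scale), scale - 1) for c in (x, y, z))
    return part1by2(ix) | (part1by2(iy) << 1) | (part1by2(iz) << 2)

# Sorting by key clusters spatially nearby particles together before the force loop.
particles = [(0.12, 0.80, 0.33), (0.13, 0.79, 0.34), (0.90, 0.05, 0.61)]
particles.sort(key=lambda p: morton_key(*p))
```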
Methods
  • Using N particles to represent the Universe, treecodes and fast multipole methods reduce the N² scaling of the right-hand side of equation (2) (see the reconstruction below) to O(N) or O(N log N), a significant savings for current cosmological simulations, which use N in the range of 10¹⁰ to 10¹².
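For reference, equation (2) is cited above but not reproduced on this page; the right-hand side in question is presumably the standard Newtonian pairwise acceleration, which costs O(N²) when evaluated directly:

```latex
% Hedged reconstruction (the page cites "equation (2)" without reproducing it):
\ddot{\mathbf{x}}_i \;=\; -G \sum_{j \ne i}^{N} m_j\,
  \frac{\mathbf{x}_i - \mathbf{x}_j}{\left|\mathbf{x}_i - \mathbf{x}_j\right|^{3}},
  \qquad i = 1, \dots, N .
```

Treecodes replace the far-field part of this sum with multipole expansions over groups of particles, which is what reduces the cost to O(N log N), or to O(N) for fast multipole methods.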
Results
  • The number of objects in the Universe of a given mass is a fundamental statistic called the mass function (a minimal way to estimate it from a halo catalogue is sketched after this list).
  • For very massive clusters the mass function is a sensitive probe of cosmology.
  • For these reasons, the mass function is a major target of current observational programs [10].
  • Modeling the mass function at these scales is an enormous challenge for numerical simulations, since both statistical and systematic errors conspire to prevent the emergence of an accurate theoretical model.
  • The dynamic range in mass and convergence tests necessary to model systematic errors require multiple simulations at different resolutions, since even a 10¹² particle simulation does not have sufficient statistical power by itself.
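As a concrete illustration of the statistic being discussed, the sketch below estimates dn/dlnM from a catalogue of halo masses in a periodic box. The catalogue, box size, and binning are hypothetical; this is not the analysis pipeline used in the paper.

```python
# Illustrative sketch only (not the paper's pipeline): estimate the halo mass
# function dn/dlnM from a catalogue of halo masses in a periodic simulation box.
import numpy as np

def mass_function(masses, box_size, bins_per_decade=5):
    """Number density of haloes per logarithmic mass interval, dn/dlnM."""
    volume = box_size ** 3                          # comoving volume of the box
    log_m = np.log(masses)
    nbins = int((log_m.max() - log_m.min()) / np.log(10) * bins_per_decade) + 1
    counts, edges = np.histogram(log_m, bins=nbins)
    dln_m = np.diff(edges)
    centers = np.exp(0.5 * (edges[1:] + edges[:-1]))
    return centers, counts / (volume * dln_m)       # dn/dlnM at each bin centre

# Hypothetical inputs: 10^5 halo masses (Msun/h) in a (1000 Mpc/h)^3 box.
rng = np.random.default_rng(0)
masses = 10 ** rng.uniform(13, 15, size=100_000)
centers, dndlnm = mass_function(masses, box_size=1000.0)
```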
Conclusion
  • Using the background subtraction technique described in Section 2.2.1 improved the efficiency of the treecode algorithm for cosmological simulations by about a factor of three when using a relatively strict tolerance (10⁻⁵), resulting in a total absolute force error of about 0.1% of the typical force.
  • The authors can compare the computational efficiency with the 2012 Gordon Bell Prize winning TreePM N-body application [26] which used 140,000 floating point operations per particle.
  • Modulo being able to precisely compare codes at the same accuracy, this work demonstrates that a pure treecode can be competitive with TreePM codes in large periodic cosmological volumes.
  • The advantage of pure treecodes grows significantly as applications move to higher resolutions in smaller volumes, use simulations with multiple hierarchical resolutions, and require non-periodic boundary conditions
Tables
  • Table 1: Performance of HOT on a variety of parallel supercomputers, spanning 20 years of time and five decades (orders of magnitude) of performance.
  • Table 2: Breakdown of computation stages in a single timestep from a recent 4096³ particle simulation using 2HOT on 12288 processors of Mustang at LANL. The force evaluation consisted of 1.05e15 hexadecapole interactions, 1.46e15 quadrupole interactions and 4.68e14 monopole interactions, for a total of 582,000 floating point operations per particle. Reducing the accuracy parameter to a value consistent with other methods would reduce the operation count by more than a factor of three.
  • Table 3: Single core/GPU performance in Gflop/s obtained with our gravitational micro-kernel benchmark for the monopole interaction. All numbers are for single-precision calculations, calculated using 28 flops per interaction (a schematic monopole kernel is sketched below).
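For orientation, the micro-kernel benchmark in Table 3 times the innermost monopole (point-mass) interaction. The sketch below shows a schematic, softened version of such an interaction loop in plain Python; the variable names, the Plummer-style softening, and the structure are assumptions for illustration, not the benchmarked kernel.

```python
# Schematic monopole interaction only (not the benchmarked kernel): accumulate the
# softened Newtonian acceleration on one target from a list of source masses
# (particles or cell monopoles).
import math

def monopole_accel(target, sources, eps=0.01):
    """target: (x, y, z); sources: iterable of (mass, x, y, z); returns (ax, ay, az)."""
    ax = ay = az = 0.0
    tx, ty, tz = target
    for m, sx, sy, sz in sources:
        dx, dy, dz = sx - tx, sy - ty, sz - tz
        r2 = dx * dx + dy * dy + dz * dz + eps * eps   # Plummer-softened distance^2
        inv_r3 = 1.0 / (r2 * math.sqrt(r2))
        ax += m * dx * inv_r3
        ay += m * dy * inv_r3
        az += m * dz * inv_r3
    return ax, ay, az

# Example: ax, ay, az = monopole_accel((0.0, 0.0, 0.0), [(1.0, 1.0, 0.0, 0.0)])
```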
Funding
  • This research used resources of the Oak Ridge Leadership Computing Facility at Oak Ridge National Laboratory, which is supported by the Office of Science of the Department of Energy under Contract DE-AC05-00OR22725.
  • This research also used resources of the National Energy Research Scientific Computing Center, which is supported by the Office of Science of the U.S. Department of Energy under Contract No. DE-AC02-05CH11231.
References
  • R. E. Angulo, V. Springel, S. D. M. White, A. Jenkins, C. M. Baugh, and C. S. Frenk. Scaling relations for galaxy clusters in the Millennium-XXL simulation. arXiv:1203.3216, 2012.
  • R. E. Angulo and S. D. M. White. One simulation to fit them all – changing the background parameters of a cosmological N-body simulation. Monthly Notices of the Royal Astronomical Society, 405(1):143–154, 2010.
  • P. Balaji, et al. MPI on millions of cores. Parallel Processing Letters, 21(01):45–60, 2011.
  • J. Bedorf, E. Gaburov, and S. Portegies Zwart. A sparse octree gravitational N-body code that runs entirely on the GPU processor. Journal of Computational Physics, 231(7):2825–2839, 2012.
  • S. Behnel, R. Bradshaw, C. Citro, L. Dalcin, D. Seljebotn, and K. Smith. Cython: The best of both worlds. Computing in Science Engineering, 13(2):31–39, 2011.
  • P. S. Behroozi, R. H. Wechsler, and H. Wu. The ROCKSTAR phase-space temporal halo finder and the velocity offsets of cluster cores. The Astrophysical Journal, 762(2):109, 2013.
  • D. Blas, J. Lesgourgues, and T. Tram. The cosmic linear anisotropy solving system (CLASS). part II: approximation schemes. Journal of Cosmology and Astroparticle Physics, 2011(07):034, 2011.
  • M. Challacombe, C. White, and M. Head-Gordon. Periodic boundary conditions and the fast multipole method. The Journal of Chemical Physics, 107(23):10131–10140, 1997.
  • Planck Collaboration. Planck 2013 results. XVI. cosmological parameters. arXiv:1303.5076, 2013.
  • Planck Collaboration. Planck 2013 results. XX. cosmology from Sunyaev-Zeldovich cluster counts. arXiv:1303.5080, 2013.
  • National Research Council. New worlds, new horizons in astronomy and astrophysics. National Academies Press, 2010.
  • M. Crocce, S. Pueblas, and R. Scoccimarro. Transients from initial conditions in cosmological simulations. Monthly Notices of the Royal Astronomical Society, 373(1):369–381, 2006.
  • W. Dehnen. Towards optimal softening in three-dimensional N-body codes – I. minimizing the force error. Monthly Notices of the Royal Astronomical Society, 324(2):273–291, 2001.
  • G. Efstathiou, M. Davis, S. D. M. White, and C. S. Frenk. Numerical techniques for large cosmological N-body simulations. The Astrophysical Journal Supplement Series, 57:241–260, 1985.
  • C. I. Ellinger, P. A. Young, C. L. Fryer, and G. Rockefeller. A case study of small scale structure formation in 3D supernova simulations. arXiv:1206.1834, 2012.
  • W. Feng. Making a case for efficient supercomputing. Queue, 1(7):54–64, 2003.
  • M. Frigo and S. G. Johnson. FFTW: an adaptive software architecture for the FFT. In Acoustics, Speech and Signal Processing, 1998. Proceedings of the 1998 IEEE International Conference on, volume 3, page 1381–1384. 1998.
  • C. L. Fryer, G. Rockefeller, and M. S. Warren. SNSPH: a parallel three-dimensional smoothed particle radiation hydrodynamics code. The Astrophysical Journal, 643(1):292, 2006.
  • C. L. Fryer and M. S. Warren. Modeling Core-Collapse supernovae in three dimensions. The Astrophysical Journal Letters, 574(1):L65, 2002.
  • M. Galassi, et al. GNU scientific library. Network Theory, 2007.
  • K. M. Gorski, E. Hivon, A. J. Banday, B. D. Wandelt, F. K. Hansen, M. Reinecke, and M. Bartelmann. HEALPix: a framework for high-resolution discretization and fast analysis of data distributed on the sphere. The Astrophysical Journal, 622(2):759, 2005.
  • A. G. Gray and A. W. Moore. N-Body problems in statistical learning. Advances in neural information processing systems, page 521–527, 2001.
  • M. Griebel and G. Zumbusch. Parallel multigrid in an adaptive PDE solver based on hashing and space-filling curves. Parallel Computing, 25(7):827–843, 1999.
  • J. Harnois-Deraps, U. Pen, I. T. Iliev, H. Merz, J. D. Emberson, and V. Desjacques. High performance P3M N-body code: CUBEP3M. arXiv:1208.5098, 2012.
  • L. Hernquist, F. R. Bouchet, and Y. Suto. Application of the Ewald method to cosmological N-body simulations. The Astrophysical Journal Supplement Series, 75:231–240, 1991.
  • T. Ishiyama, K. Nitadori, and J. Makino. 4.45 pflops astrophysical N-Body simulation on K computer – the gravitational Trillion-Body problem. arXiv:1211.4406, 2012.
  • P. Jetley, F. Gioachin, C. Mendes, L. Kale, and T. Quinn. Massively parallel cosmological simulations with ChaNGa. In IEEE International Symposium on Parallel and Distributed Processing, 2008. IPDPS 2008, pages 1–12. 2008.
  • A. Kawai and J. Makino. Pseudoparticle multipole method: A simple method to implement a high-accuracy tree code. The Astrophysical Journal Letters, 550(2):L143, 2001.
  • M. Kuhlen, M. Vogelsberger, and R. Angulo. Numerical simulations of the dark universe: State of the art and the next decade. arXiv:1209.5745, 2012.
  • J. Lesgourgues. The cosmic linear anisotropy solving system (CLASS) I: Overview. arXiv:1104.2932, 2011.
  • Z. Lukic, K. Heitmann, S. Habib, S. Bashinsky, and P. M. Ricker. The halo mass function: High-Redshift evolution and universality. The Astrophysical Journal, 671(2):1160, 2007.
  • P. MacNeice, K. M. Olson, C. Mobarry, R. de Fainchtein, and C. Packer. PARAMESH: a parallel adaptive mesh refinement community toolkit. Computer physics communications, 126(3):330–354, 2000.
  • P. M. McIlroy, K. Bostic, and M. D. McIlroy. Engineering radix sort. Computing Systems, 6(1):5–27, 1993.
  • J. Mellor-Crummey, D. Whalley, and K. Kennedy. Improving memory hierarchy performance for irregular applications. In Proceedings of the 13th international conference on Supercomputing, ICS ’99, page 425–433. ACM, New York, NY, USA, 1999.
  • M. Metchnik. A Fast N-Body Scheme for Computational Cosmology. Ph.D. thesis, U. Arizona., 2009.
  • B. Nijboer and F. De Wette. On the calculation of lattice sums. Physica, 23(1–5):309–321, 1957.
  • J. Ousterhout. Why threads are a bad idea (for most purposes). In Presentation given at the 1996 Usenix Annual Technical Conference, volume 5. 1996.
  • M. Parashar and J. Browne. On partitioning dynamic adaptive grid hierarchies. In System Sciences, 1996. Proceedings of the Twenty-Ninth Hawaii International Conference on, volume 1, pages 604–613. 1996.
  • Princeton University Press, 1980.
  • D. W. Pfitzner, J. K. Salmon, T. Sterling, P. Stolorz, and R. Musick. Halo world: Tools for parallel cluster finding in astrophysical N-body simulations. In P. Stolorz and R. Musick, editors, Scalable High Performance Computing for Knowledge Discovery and Data Mining. Springer US, 1998.
  • P. Ploumhans, G. Winckelmans, J. Salmon, A. Leonard, and M. Warren. Vortex methods for direct numerical simulation of Three-Dimensional bluff body flows: Application to the sphere at re=300, 500, and 1000. Journal of Computational Physics, 178(2):427–463, 2002.
  • T. Quinn, N. Katz, J. Stadel, and G. Lake. Time stepping N-body simulations. arXiv:astro-ph/9710043, 1997.
  • D. S. Reed, R. E. Smith, D. Potter, A. Schneider, J. Stadel, and B. Moore. Toward an accurate mass function for precision cosmology. arXiv:1206.5302, 2012.
  • A. G. Riess, et al. Type Ia supernova discoveries at z > 1 from the Hubble Space Telescope: Evidence for past deceleration and constraints on dark energy evolution. The Astrophysical Journal, 607(2):665, 2004.
  • J. K. Salmon and M. S. Warren. Skeletons from the treecode closet. Journal of Computational Physics, 111(1):136–155, 1994.
  • Z. F. Seidov and P. I. Skvirsky. Gravitational potential and energy of homogeneous rectangular parallelepiped. arXiv:astro-ph/0002496, 2000.
  • G. F. Smoot, et al. Structure in the COBE Differential Microwave Radiometer first-year maps. The Astrophysical Journal, 396:L1–L5, 1992.
  • E. Solomonik and L. Kale. Highly scalable parallel sorting. In 2010 IEEE International Symposium on Parallel Distributed Processing (IPDPS), pages 1–12. 2010.
  • D. N. Spergel, et al. First-year Wilkinson Microwave Anisotropy Probe (WMAP) observations: Determination of cosmological parameters. The Astrophysical Journal Supplement Series, 148(1):175, 2003.
  • V. Springel. The cosmological simulation code GADGET-2. Monthly Notices of the Royal Astronomical Society, 364(4):1105–1134, 2005.
  • R. M. Stallman. Using and porting the GNU Compiler Collection. Free Software Foundation, 1989.
  • A. Taruya, F. Bernardeau, T. Nishimichi, and S. Codis. RegPT: direct and fast calculation of regularized cosmological power spectrum at two-loop order. arXiv:1208.1191, 2012.
  • M. Tegmark, et al. Cosmological parameters from SDSS and WMAP. Physical Review D, 69(10):103501, 2004.
  • R. Thakur, et al. MPI at exascale. Proceedings of SciDAC, 2010.
  • J. Tinker, A. V. Kravtsov, A. Klypin, K. Abazajian, M. Warren, G. Yepes, S. Gottlober, and D. E. Holz. Toward a halo mass function for precision cosmology: The limits of universality. The Astrophysical Journal, 688(2):709, 2008.
  • L. Torvalds and J. Hamano. GIT-fast version control system. 2005.
  • M. J. Turk, B. D. Smith, J. S. Oishi, S. Skory, S. W. Skillman, T. Abel, and M. L. Norman. yt: A multi-code analysis toolkit for astrophysical simulation data. The Astrophysical Journal Supplement Series, 192:9, 2011.
  • G. Van Rossum and F. L. Drake Jr. Python reference manual. Centrum voor Wiskunde en Informatica, 1995.
  • J. Waldvogel. The Newtonian potential of a homogeneous cube. Zeitschrift für angewandte Mathematik und Physik ZAMP, 27(6):867–871, 1976.
  • M. Warren, J. Salmon, D. Becker, M. Goda, T. Sterling, and W. Winckelmans. Pentium Pro inside: I. A treecode at 430 Gigaflops on ASCI Red, II. Price/Performance of $50/Mflop on Loki and Hyglac. In Supercomputing, ACM/IEEE 1997 Conference, pages 61–61. 1997.
  • M. S. Warren, K. Abazajian, D. E. Holz, and L. Teodoro. Precision determination of the mass function of dark matter halos. The Astrophysical Journal, 646(2):881, 2006.
  • M. S. Warren, D. J. Becker, M. P. Goda, J. K. Salmon, and T. Sterling. Parallel supercomputing with commodity components. In Proceedings of the International Conference on Parallel and Distributed Processing Techniques and Applications (PDPTA’97), page 1372–1381. 1997.
  • M. S. Warren and B. Bergen. Poster: The hashed oct-tree N-body algorithm at a petaflop. In High Performance Computing, Networking, Storage and Analysis (SCC), 2012 SC Companion, page 1442. 2012.
  • M. S. Warren, C. L. Fryer, and M. P. Goda. The space simulator: Modeling the universe from supernovae to cosmology. In Proceedings of the 2003 ACM/IEEE conference on Supercomputing, SC ’03, page 30–. ACM, New York, NY, USA, 2003.
  • M. S. Warren, T. C. Germann, P. S. Lomdahl, D. M. Beazley, and J. K. Salmon. Avalon: an Alpha/Linux cluster achieves 10 gflops for $150k. In Proceedings of the 1998 ACM/IEEE conference on Supercomputing, Supercomputing ’98, page 1–11. IEEE Computer Society, Washington, DC, USA, 1998.
  • M. S. Warren and J. K. Salmon. Astrophysical N-body simulations using hierarchical tree data structures. In Supercomputing ’92. Proceedings, page 570–576. 1992.
  • M. S. Warren and J. K. Salmon. A parallel hashed Oct-Tree N-body algorithm. In Proceedings of the 1993 ACM/IEEE conference on Supercomputing, Supercomputing ’93, page 12–21. ACM, New York, NY, USA, 1993.
  • M. S. Warren and J. K. Salmon. A portable parallel particle program. Computer Physics Communications, 87(1–2):266–290, 1995.
  • M. S. Warren, E. H. Weigle, and W. Feng. High-density computing: A 240-processor Beowulf in one cubic meter. In Supercomputing, ACM/IEEE 2002 Conference, page 61–61. 2002.
  • M. S. Warren, W. Zurek, P. Quinn, and J. Salmon. The shape of the invisible halo: N-body simulations on parallel supercomputers. AIP Conference Proceedings, 222:216, 1991.
  • W. A. Watson, I. T. Iliev, A. D’Aloisio, A. Knebe, P. R. Shapiro, and G. Yepes. The halo mass function through the cosmic ages. arXiv:1212.0095, 2012.
  • W. A. Watson, I. T. Iliev, J. M. Diego, S. Gottlober, A. Knebe, E. Martínez-González, and G. Yepes. Statistics of extreme objects in the Juropa Hubble Volume simulation. arXiv:1305.1976, 2013.
  • J. J. Willcock, T. Hoefler, N. G. Edmonds, and A. Lumsdaine. AM++: a generalized active message framework. In Proceedings of the 19th international conference on Parallel architectures and compilation techniques, PACT ’10, page 401–410. 2010.
  • 4. Cambridge University Press, 4 edition, 1999.
  • J. Wu, Z. Lan, X. Xiong, N. Y. Gnedin, and A. V. Kravtsov. Hierarchical task mapping of cell-based AMR cosmology simulations. In SC ’12, page 75:1–75:10. IEEE Computer Society Press, Los Alamitos, CA, USA, 2012.
  • L. Ying, G. Biros, and D. Zorin. A kernel-independent adaptive fast multipole algorithm in two and three dimensions. Journal of Computational Physics, 196(2):591–626, 2004.
  • R. Yokota. An FMM based on dual tree traversal for many-core architectures. arXiv:1209.3516, 2012.