A Study on Balancing Parallelism, Data Locality, and Recomputation in Existing PDE Solvers

SC(2014)

Abstract
Structured-grid PDE solver frameworks parallelize over boxes, which are rectangular domains of cells or faces in a structured grid. In the Chombo framework, box sizes are typically 16^3 or 32^3, but larger box sizes such as 128^3 would have less surface area relative to their volume and therefore incur less storage, copying, and/or ghost-cell communication overhead. Unfortunately, current on-node parallelization schemes perform poorly for these larger box sizes. In this paper, we investigate 30 different inter-loop optimization strategies and demonstrate the parallel scaling advantages of some of these variants on NUMA multicore nodes. Shifted, fused, and communication-avoiding variants for 128^3 boxes achieve close to ideal parallel scaling and come close to matching the performance of 16^3 boxes on three different multicore systems, for a benchmark that is a proxy for program idioms found in Computational Fluid Dynamics (CFD) codes.
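The surface-area argument in the abstract can be made concrete with a small calculation. The sketch below is not from the paper: the function name `ghost_overhead` and the ghost-layer width of 2 cells per side are illustrative assumptions. It computes the ratio of ghost cells to interior cells for a cubic box of n^3 cells.

```python
def ghost_overhead(n: int, ghost: int = 2) -> float:
    """Ratio of ghost cells to interior cells for an n^3 box.

    `ghost` is the ghost-layer width per side; the default of 2 is an
    illustrative assumption, not a value taken from the paper.
    """
    interior = n ** 3
    padded = (n + 2 * ghost) ** 3  # box plus ghost layers on every side
    return (padded - interior) / interior


if __name__ == "__main__":
    for n in (16, 32, 128):
        print(f"{n}^3 box: ghost/interior = {ghost_overhead(n):.3f}")
```

Under this assumed ghost width, the ratio falls from about 0.95 for a 16^3 box to about 0.42 for 32^3 and about 0.10 for 128^3, which is the kind of reduction in storage and copying overhead that motivates larger boxes.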
Keywords
grid computing, multiprocessing systems, optimisation, parallel processing, partial differential equations, CFD codes, Chombo framework, NUMA multicore nodes, PDE solvers, communication-avoiding variants, computational fluid dynamic codes, data locality, inter-loop optimization strategies, multicore systems, node parallelization schemes, parallel scaling, parallelism balancing, partial differential equation, program idioms, structured-grid PDE solver frameworks