Autotuning divide-and-conquer stencil computations.

Ekanathan Palamadai Natarajan, Maryam Mehri Dehnavi, Charles E. Leiserson

CONCURRENCY AND COMPUTATION-PRACTICE & EXPERIENCE (2017)

Abstract
This paper explores autotuning strategies for serial divide-and-conquer stencil computations, comparing the efficacy of traditional heuristic autotuning with that of pruned-exhaustive autotuning. We present a pruned-exhaustive autotuner called Ztune that searches for optimal divide-and-conquer trees for stencil computations. Ztune uses three pruning properties (space-time equivalence, divide subsumption, and favored dimension) that greatly reduce the size of the search domain without significantly sacrificing the quality of the autotuned code. We compared the performance of Ztune with that of a state-of-the-art heuristic autotuner called OpenTuner in tuning the divide-and-conquer algorithm used in the Pochoir stencil compiler. Over a nightly run on ten application benchmarks across two machines with different hardware configurations, the Ztuned code ran 5% to 12% faster on average than Pochoir's default code, whereas the OpenTuner-tuned code ran from 9% slower to 2% faster on average. In the best case, the Ztuned code ran 40% faster, and the OpenTuner-tuned code ran 33% faster, than Pochoir's code. Whereas the autotuning time of Ztune for each benchmark could be measured in minutes, OpenTuner typically needed hours or days of autotuning to achieve comparable results. Surprisingly, for some benchmarks, Ztune autotuned in less time than it takes to perform the stencil computation once.
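The abstract does not spell out how the search over divide-and-conquer trees works, so the following is only a rough intuition and not Ztune's actual algorithm. It sketches an exhaustive search that, for each subproblem size, decides whether to run a base case or to divide further, and that reuses the decision for any subproblem of the same size. The cost model and its constants (`CACHE_SIZE`, `MISS_PENALTY`, `DIVIDE_OVERHEAD`), the one-dimensional halving rule, and the function names are all hypothetical stand-ins; a real autotuner would measure running time rather than use a formula.

```python
from functools import lru_cache

# Hypothetical cost-model constants, chosen only for illustration.
CACHE_SIZE = 64       # largest subproblem assumed to fit in cache
MISS_PENALTY = 4      # slowdown factor for an out-of-cache base case
DIVIDE_OVERHEAD = 8   # fixed cost charged for one divide step


def base_cost(n):
    """Synthetic cost of running the base case directly on n points."""
    return n if n <= CACHE_SIZE else n * MISS_PENALTY


@lru_cache(maxsize=None)
def tune(n):
    """Return (cost, plan) for a subproblem of size n.

    plan is either "base" or ("divide", left_plan, right_plan).
    Memoizing on n means each distinct size is tuned only once,
    no matter how often it appears in the recursion tree.
    """
    best = (base_cost(n), "base")
    if n >= 2:
        left, right = n // 2, n - n // 2
        left_cost, left_plan = tune(left)
        right_cost, right_plan = tune(right)
        cost = DIVIDE_OVERHEAD + left_cost + right_cost
        if cost < best[0]:
            best = (cost, ("divide", left_plan, right_plan))
    return best


if __name__ == "__main__":
    cost, plan = tune(256)
    print(cost, plan)
```

Under this toy model the search divides a problem of size 256 twice and then runs four base cases of size 64, because further division would only add overhead. Reusing results for equal-sized subproblems is what keeps the search small here; it is meant only to suggest why pruning can make exhaustive search affordable.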
Keywords
autotuning, divide-and-conquer, stencil computations, trapezoidal decomposition