Towards Compiler-Agnostic Performance In Finite-Difference Codes

PARALLEL COMPUTING: ON THE ROAD TO EXASCALE (2015)

Abstract
In this paper we evaluate the performance implications of applying a technique, which we call PSyKAl, to finite-difference ocean models. In PSyKAl, the code related to the underlying science is formally separated from the code related to parallelisation and single-core optimisations. This separation of concerns allows scientists to code their science independently of the underlying hardware architecture (thereby keeping a single code base) and allows optimisation specialists to tailor the code for a particular machine independently of the science code. A finite-difference shallow-water benchmark optimised for cache-based architectures is taken as the starting point. A vanilla PSyKAl version is written and the performance of the two is compared. The optimisations that were applied to the original benchmark (loop fusion etc.) are then manually applied to the PSyKAl version as a set of code modifications to the optimisation layer. Performance results are presented for the Cray, Intel and GNU compilers on Intel Ivy Bridge and Haswell processors and for the IBM compiler on Power8. Results show that the combined set of code modifications obtains performance that is within a few percent of the original code for all compiler and architecture combinations on all tested problem sizes. The only exception to this (other than where we see a performance improvement) is the GNU compiler on Haswell for one problem size. Our tests indicate that this may be due to immature support for that architecture in the GNU compiler; no such problem is seen on the Ivy Bridge system. Further, the original code performed poorly using the IBM compiler on Power8 and needed to be modified to obtain performant code. Therefore, the PSyKAl approach can be used with negligible performance loss, and sometimes small performance gains, compared to the original optimised code. We also find that there is no single best hand-optimised implementation of the code for all of the compilers tested.
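
The separation of concerns described in the abstract can be pictured with a minimal sketch. This is an illustration under stated assumptions, not the benchmark code: the original is not written in Python, and every name here (`mass_flux_kernel`, `energy_kernel`, `psy_vanilla`, `psy_fused`, `algorithm`) as well as the formulas are hypothetical. The kernels hold the per-point science, the optimisation layer owns the loops, and loop fusion is applied by changing only the optimisation layer.

```python
# Illustrative sketch only: all names and formulas below are hypothetical
# and are not taken from the benchmark described in the abstract.

# Kernel layer: science for a single grid point; no loops, no parallelism.
def mass_flux_kernel(i, p, u):
    """Flux at point i from a two-point average of p on a periodic 1D grid."""
    n = len(p)
    return 0.5 * (p[i] + p[(i - 1) % n]) * u[i]


def energy_kernel(i, p, u):
    """Pointwise energy-like quantity at point i."""
    return p[i] + 0.5 * u[i] * u[i]


# Optimisation layer, vanilla version: one loop per kernel.
def psy_vanilla(p, u):
    n = len(p)
    cu = [0.0] * n
    h = [0.0] * n
    for i in range(n):
        cu[i] = mass_flux_kernel(i, p, u)
    for i in range(n):
        h[i] = energy_kernel(i, p, u)
    return cu, h


# Optimisation layer, tuned version: the two loops are fused.
# Fusion is safe here because both kernels only read p and u
# and write to different output fields.
def psy_fused(p, u):
    n = len(p)
    cu = [0.0] * n
    h = [0.0] * n
    for i in range(n):
        cu[i] = mass_flux_kernel(i, p, u)
        h[i] = energy_kernel(i, p, u)
    return cu, h


# Algorithm layer: what the scientist writes. It does not change
# when a different optimisation-layer variant is selected.
def algorithm(p, u, psy=psy_vanilla):
    return psy(p, u)
```

Calling `algorithm(p, u, psy=psy_fused)` instead of `algorithm(p, u)` swaps the optimisation strategy while leaving the kernels and the algorithm untouched, which is the property the abstract relies on when comparing per-compiler variants.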
Keywords
Performance, Code-generation, Finite-difference