An Empirical Roofline Methodology for Quantitatively Assessing Performance Portability

2018 IEEE/ACM International Workshop on Performance, Portability and Productivity in HPC (P3HPC)(2018)

引用 36|浏览39
暂无评分
摘要
System and node architectures continue to diversify to better balance on-node computation, memory capacity, memory bandwidth, interconnect bandwidth, power, and cost for specific computational workloads. For many application developers, achieving performance portability (effectively exploiting the capabilities of multiple architectures) is a desired goal. Unfortunately, dramatically different per-node performance coupled with differences in machine balance can lead to developers being unable to determine whether they have attained performance portability or simply written portable code. The Roofline model provides a means of quantitatively assessing how well a given application makes use of a target platform’s computational capabilities. In this paper, we extend the Roofline model so that it 1) empirically captures a more realistic set of performance bounds for CPUs and GPUs, 2) factors in the true cost of different floating-point instructions when counting FLOPs, 3) incorporates the effects of different memory access patterns, and 4) with appropriate pairing of code performance and Roofline ceiling, facilitates the performance portability analysis.
更多
查看译文
关键词
Computer architecture,Bandwidth,Kernel,Graphics processing units,Computational modeling,Measurement,Instruction sets
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要