An Empirical Evaluation of Design Abstraction and Performance of Thrust Framework

2017 46th International Conference on Parallel Processing Workshops (ICPPW) (2017)

Abstract
High-performance computing applications are difficult to write, so practitioners expect well-tuned software to last a long time and to keep delivering optimized performance even when the hardware is upgraded. It may also be necessary to write software with sufficient abstraction over the hardware that it can run on heterogeneous architectures. This calls for a programming abstraction paradigm that balances abstraction against visibility of the hardware, so that programmers can write a program without understanding the hardware's nuances and still exploit its compute power optimally. In this paper we analyze the power of design abstraction and the performance of a popular design abstraction framework called Thrust. We show quantitatively that, while it is easier to write an application in Thrust than to write the same application natively for the CUDA or OpenMP backends, the framework does not provide the programmer with any abstraction over the memory hierarchy of the underlying backend. We compare the performance of three Thrust applications with their corresponding native versions on the CUDA, OpenMP, Xeon-Phi and CPP backends, and demonstrate that the current Thrust version performs poorly in most cases when the application is compute intensive. However, the framework provides close to native performance for non-compute-intensive applications. We analyze the reasons for these performance results and highlight the improvements the framework needs.
Keywords
Design Abstraction,Thrust,Shared memory,Cyclomatic complexity,CUDA,OpenMP,Xeon-Phi