A Performance Model for GPU Architectures That Considers On-Chip Resources: Application to Medical Image Registration

IEEE Transactions on Parallel and Distributed Systems (2019)

Abstract
Graphics processing units (GPUs) have become important devices for accelerating many applications. However, few accurate models exist for estimating the performance of such applications on modern GPUs. In this paper, we propose a performance model that estimates the execution time of massively parallel programs running on NVIDIA GPUs and takes into account both on-chip resources and the cost of data transfer between the CPU and the GPU. We evaluated the model on four GPUs with different architectures and demonstrated its effectiveness by applying it to various tasks in medical image registration. The experiments show that, by capturing on-chip GPU resources and data transfer time, our model predicts the actual running time more accurately than the traditional model. Moreover, by using the optimal block size estimated by our model to accelerate the landmark tracking task, speedups of approximately 80×, 100×, 200×, and 800× can be achieved on the C2050, K20c, M5000, and P100, respectively, making it possible to track massive numbers of landmarks and thereby improve registration accuracy.
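To illustrate why on-chip resources make the block size matter, the following is a minimal sketch of the standard occupancy calculation: how many thread blocks can reside on one streaming multiprocessor given its thread, register, and shared-memory limits. It is not the model proposed in the paper. The class `SMLimits`, the functions `resident_blocks` and `occupancy`, and all numeric limits are assumptions chosen for illustration; real limits differ across GPUs, and real register allocation granularity is ignored here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SMLimits:
    # Hypothetical per-multiprocessor limits, for illustration only.
    max_threads: int = 2048
    max_blocks: int = 32
    registers: int = 65536
    shared_mem_bytes: int = 49152
    warp_size: int = 32


def resident_blocks(block_size, regs_per_thread, smem_per_block, sm=SMLimits()):
    """Number of blocks that fit on one multiprocessor under every limit."""
    # Threads are scheduled in whole warps, so round the block up to warps.
    warps_per_block = -(-block_size // sm.warp_size)
    threads_alloc = warps_per_block * sm.warp_size
    by_threads = sm.max_threads // threads_alloc
    by_regs = sm.registers // (regs_per_thread * threads_alloc)
    # A block that uses no shared memory is not limited by it.
    by_smem = sm.shared_mem_bytes // smem_per_block if smem_per_block else sm.max_blocks
    return min(sm.max_blocks, by_threads, by_regs, by_smem)


def occupancy(block_size, regs_per_thread, smem_per_block, sm=SMLimits()):
    """Fraction of the multiprocessor's warp slots that are occupied."""
    warps_per_block = -(-block_size // sm.warp_size)
    blocks = resident_blocks(block_size, regs_per_thread, smem_per_block, sm)
    return blocks * warps_per_block / (sm.max_threads // sm.warp_size)


if __name__ == "__main__":
    # Sweep candidate block sizes for a kernel assumed to use 64 registers
    # per thread and 16 KiB of shared memory per block.
    for block_size in (64, 128, 256, 512, 1024):
        print(block_size, occupancy(block_size, 64, 16384))
```

Under these assumed limits, the same kernel reaches different occupancy at different block sizes because a different resource becomes the bottleneck, which is one reason a model that accounts for on-chip resources can be used to pick a block size.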
Keywords
Graphics processing units, Computational modeling, Predictive models, System-on-chip, Computer architecture, Image registration, Data transfer