A Performance Model for GPU Architectures That Considers On-Chip Resources: Application to Medical Image Registration
IEEE Transactions on Parallel and Distributed Systems (2019)
Abstract
Graphics processing units (GPUs) have become extremely important devices for accelerating computing performance in many applications. However, few accurate models exist for estimating the performance of such applications on modern GPUs. In this paper, we propose a performance model that estimates the execution times of massively parallel programs running on NVIDIA GPUs and that takes both on-chip resources and the cost of data transfer between CPU and GPU into consideration. Four GPUs with different architectures were used to evaluate our model. We demonstrated the effectiveness of the proposed model by applying it to various tasks in medical image registration. Experiments showed that, by capturing on-chip GPU resources and data transfer time, our model predicts the actual running time more accurately than the traditional model. Moreover, by using the optimal block size estimated by our model to accelerate the landmark tracking task on GPU devices, speedups of approximately 80×, 100×, 200×, and 800× can be achieved on the C2050, K20c, M5000, and P100, respectively, making it possible to track massive numbers of landmarks and thereby improve registration accuracy.
Keywords
Graphics processing units, Computational modeling, Predictive models, System-on-chip, Computer architecture, Image registration, Data transfer