Automatic block dimensioning on GPU-accelerated programs through particle swarm optimization

Information and Software Technology (2020)

Abstract
Context: GPUs are now widely used to improve the performance of computationally expensive systems. In GPU-accelerated programs, performance depends on how the problem is partitioned into blocks of threads, so that the parallel tasks to be executed fit the GPU architecture well. Although some general guidelines exist to help define block dimensions, finding the optimum partition remains a complex, problem-dependent task. This work investigates the use of particle swarm optimization (PSO) to optimize block dimensions with the aim of minimizing program execution time. The approach was evaluated on a GPU-accelerated wind field calculation program whose block dimensioning had been based on literature guidelines and empirical adjustments. Before PSO optimization, the program was about 25 times faster than the sequential program. After applying PSO, the speedup increased to about 60 times. Unexpected optimized configurations were observed, confirming that finding the optimum dimensioning is a complex task. A robust optimization tool such as PSO therefore proved very profitable, allowing automatic optimization of block dimensions without a priori knowledge of the problem, the program's peculiarities, or the GPU architecture.

Objective: Improve the speedup of GPU-accelerated programs by automatically defining optimized block dimensions using PSO.

Method: The study focused on a GPU-accelerated wind field calculation problem. A PSO was interfaced to the program to find the block dimensions that lead to the minimum execution time. The results were compared with results from the literature.

Results: The speedup obtained with the proposed approach is more than 2 times the original speedup.

Conclusion: PSO proved very profitable, allowing automatic optimization of block dimensions without a priori knowledge of the problem, the program's peculiarities, or the GPU architecture.
Keywords
Performance optimization,GPU,CUDA,Particle swarm optimization
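
The idea in the abstract, a PSO searching for the block dimensions that minimize execution time, can be illustrated with a minimal sketch. This is not the authors' implementation: `measure_time` is a hypothetical, synthetic stand-in for running and timing the GPU program, and the swarm size, iteration count, and weights `w`, `c1`, `c2` are assumed values. The search bounds of 1 to 32 per dimension are also an assumption, chosen so that a two-dimensional block never exceeds 1024 threads, the per-block limit on current CUDA GPUs.

```python
import random


def measure_time(bx, by):
    """Hypothetical stand-in for timing the GPU program with a (bx, by) block.

    A real setup would launch the program with this block shape and return the
    measured wall-clock time. The formula is synthetic, with its minimum
    placed arbitrarily at (16, 8); it is not data from the paper.
    """
    return 1.0 + ((bx - 16) / 16.0) ** 2 + ((by - 8) / 8.0) ** 2


def pso_block_dims(cost, lo=1, hi=32, particles=20, iters=60, seed=0):
    """Minimize cost(bx, by) over integer block dimensions in [lo, hi]."""
    rng = random.Random(seed)
    w, c1, c2 = 0.7, 1.5, 1.5  # inertia, cognitive, social weights (assumed)

    def to_dims(p):
        # Round the continuous position to integers and clamp to the bounds.
        return tuple(min(hi, max(lo, round(x))) for x in p)

    # Random initial positions, zero initial velocities.
    pos = [[rng.uniform(lo, hi), rng.uniform(lo, hi)] for _ in range(particles)]
    vel = [[0.0, 0.0] for _ in range(particles)]
    pbest = [p[:] for p in pos]
    pbest_cost = [cost(*to_dims(p)) for p in pos]
    g = min(range(particles), key=pbest_cost.__getitem__)
    gbest, gbest_cost = pbest[g][:], pbest_cost[g]

    for _ in range(iters):
        for i in range(particles):
            for d in range(2):
                r1, r2 = rng.random(), rng.random()
                # Standard velocity update: inertia + pull toward the
                # particle's own best + pull toward the swarm's best.
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                pos[i][d] = min(hi, max(lo, pos[i][d] + vel[i][d]))
            c = cost(*to_dims(pos[i]))
            if c < pbest_cost[i]:
                pbest[i], pbest_cost[i] = pos[i][:], c
                if c < gbest_cost:
                    gbest, gbest_cost = pos[i][:], c

    return to_dims(gbest), gbest_cost


best_dims, best_time = pso_block_dims(measure_time)
print(best_dims, best_time)
```

In a real setting each call to `measure_time` would be one full timed run of the program, so the swarm size and iteration count directly set the tuning cost (here 20 initial runs plus 20 × 60 further runs). Caching the timings of block shapes that have already been tried would be a natural refinement.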