Towards Optimal Fast Matrix Multiplication on CPU-GPU Platforms

PARALLEL AND DISTRIBUTED COMPUTING, APPLICATIONS AND TECHNOLOGIES, PDCAT 2021(2022)

引用 0|浏览8
暂无评分
摘要
Increasing computing power has become available through the use of GPUs, bringing new opportunities to accelerate fast matrix multiplication using GPUs. Although researchers have proposed several optimization schemes for the Strassen algorithm on the GPU, they have not fully utilized the computing resources of CPU. In this paper, we propose a CPU-GPU heterogeneous implementation for the Winograd algorithm based on task graph scheduling. It uses work-stealing scheduler to achieve balanced load. We also propose two recursive task graph extension strategies: homogeneous and heterogeneous extension. We invoke different execution strategies in different recursive levels and design a predictor based on the random forest regression model to make a decision. Finally, the experimental evaluations are performed on a CPU-GPU heterogeneous platform. It shows that the improved Winograd algorithm achieves an average speedup of 1.6x, 1.5x and 1.4x against to cuBLAS, Winograd on CPU, and Winograd on GPU for matrices with matrix dimension greater than 5000, respectively.
更多
查看译文
关键词
Winograd algorithm, Matrix multiplication, Random forest regression, CPU-GPU heterogeneous architecture
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要