Tailoring CUTLASS GEMM using Supervised Learning

2023 IEEE 41st International Conference on Computer Design (ICCD 2023)

Abstract
General matrix multiplication (GEMM) is a core computation kernel for deep neural networks. CUTLASS, a state-of-the-art open-source CUDA-based linear-algebra template library, provides a highly optimized tiling-based GEMM. However, CUTLASS GEMM often fails to achieve optimal performance when its tiling configuration is not chosen appropriately, because performance varies significantly with factors such as the tile size and shape as well as the target graphics processing unit (GPU) architecture. Determining the optimal tiling configuration is therefore a major challenge in getting the best performance out of a tiling-based GEMM. To address this problem, we propose CUTLASS-tailor, a novel end-to-end framework that uses a neural network model to predict the best tile parameters for target CUTLASS GEMM operations and the underlying GPU. We trained the prediction model on a synthetic dataset that covers various input matrix combinations with different sizes and structures. Furthermore, to cover various GPUs with a single universal model, we also included the number of GPU cores and the amount of shared memory as GPU hardware features in the input of the CUTLASS-tailor network. On a test dataset drawn from several real-world GEMMs, CUTLASS-tailor-based GEMM operations outperformed cuBLAS-based GEMM operations by up to 1.94x on an NVIDIA TitanXp GPU, and CUTLASS-tailor found better tile parameters than well-known search algorithms.
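To make the tuning target concrete, the sketch below shows the kind of tiling configuration the abstract refers to: in the CUTLASS 2.x device-level API, the threadblock, warp, and instruction tile shapes are compile-time template parameters of the GEMM kernel, and it also queries the two GPU hardware features mentioned above (SM count and shared memory) via the standard CUDA runtime. This is a minimal illustration under those assumptions, not code from the paper; the concrete tile values (128x128x8 / 32x64x8) are placeholders for whatever CUTLASS-tailor would predict.

```cpp
// Minimal sketch (not from the paper): a single fixed CUTLASS 2.x tiling
// configuration for an SM 6.1 GPU such as the TitanXp. CUTLASS-tailor's job
// would be to pick the three GemmShape parameters below; the values used
// here are illustrative defaults, not the predicted ones.
#include <cuda_runtime.h>
#include <cutlass/gemm/device/gemm.h>
#include <iostream>

using Gemm = cutlass::gemm::device::Gemm<
    float, cutlass::layout::RowMajor,        // A: element type and layout
    float, cutlass::layout::ColumnMajor,     // B
    float, cutlass::layout::RowMajor,        // C
    float,                                   // accumulator type
    cutlass::arch::OpClassSimt,              // CUDA-core (SIMT) math
    cutlass::arch::Sm61,                     // target architecture (TitanXp)
    cutlass::gemm::GemmShape<128, 128, 8>,   // threadblock tile (tunable)
    cutlass::gemm::GemmShape<32, 64, 8>,     // warp tile        (tunable)
    cutlass::gemm::GemmShape<1, 1, 1>>;      // instruction shape for SIMT

int main() {
  // GPU hardware features of the kind the abstract lists as model inputs.
  cudaDeviceProp prop;
  cudaGetDeviceProperties(&prop, 0);
  std::cout << "SMs: " << prop.multiProcessorCount
            << ", shared memory per block: " << prop.sharedMemPerBlock << " B\n";

  int M = 1024, N = 1024, K = 1024;
  float *A, *B, *C;
  cudaMalloc(&A, sizeof(float) * M * K);
  cudaMalloc(&B, sizeof(float) * K * N);
  cudaMalloc(&C, sizeof(float) * M * N);

  // Launch the GEMM; the tile configuration is baked into the Gemm type.
  Gemm gemm_op;
  Gemm::Arguments args({M, N, K},
                       {A, K},        // A is MxK, row-major: ld = K
                       {B, K},        // B is KxN, column-major: ld = K
                       {C, N},        // C source
                       {C, N},        // D destination
                       {1.0f, 0.0f}); // alpha, beta
  cutlass::Status status = gemm_op(args);
  if (status != cutlass::Status::kSuccess) {
    std::cerr << "CUTLASS GEMM failed\n";
  }

  cudaFree(A); cudaFree(B); cudaFree(C);
  return 0;
}
```

Because the tile shapes are template parameters, each candidate configuration is a separate kernel instantiation; a predictor that narrows the choice to one configuration per problem avoids compiling and benchmarking the whole search space.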