A Comprehensive Performance Model of Sparse Matrix-Vector Multiplication to Guide Kernel Optimization

IEEE Transactions on Parallel and Distributed Systems (2023)

Abstract
Sparse Matrix-Vector Multiplication (SpMV) is important in scientific and industrial applications and remains a well-known challenge for modern CPUs due to high sparsity and irregularity. Many researchers have tried to improve SpMV performance by designing dedicated data formats and computation patterns. However, out-of-order superscalar CPUs have complex micro-architectures in which software and hardware factors interact with and constrain one another in complicated ways. It is therefore hard to systematically study the effect of an optimization method on overall performance, as its benefits may be undermined by other factors. In this paper, we thoroughly study the execution of SpMV on modern CPUs and propose a comprehensive performance model that reveals the critical factors and their relationships. Specifically, we first study the coding characteristics of SpMV kernels to identify the key factors worthy of attention. We then model the execution of SpMV as two overlapped parts: the CPU pipeline and memory latency. Both are carefully modeled with the related hardware and software factors. We also model SIMD performance in terms of the usage of specific SIMD instructions and vector registers. Experiments show that our model matches the actual execution on real-world processors. Guided by the model, we propose SpV8, a novel SpMV kernel that optimizes the critical factors to improve computation efficiency and memory bandwidth. Experiments on Intel/AMD x86 and ARM AArch64 platforms show that SpV8 outperforms several state-of-the-art approaches by large margins, achieving average speedups of $3.4\times$ over the Intel Math Kernel Library and $1.4\times$ over the best existing approach. These results indicate that the proposed model can provide valuable guidance for efficient SpMV optimization.
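For readers unfamiliar with the operation, the following minimal sketch shows a baseline SpMV kernel over the compressed sparse row (CSR) layout. CSR is assumed here purely for illustration; the abstract does not state which formats the paper studies, and this is not the SpV8 kernel. The names `spmv_csr`, `row_ptr`, `col_idx`, and `values` are hypothetical. The indirect load `x[col_idx[k]]` is the kind of irregular memory access that the abstract refers to.

```python
def spmv_csr(row_ptr, col_idx, values, x):
    """Compute y = A @ x for a sparse matrix A stored in CSR form.

    row_ptr: row i's nonzeros occupy positions row_ptr[i]..row_ptr[i+1]-1
    col_idx: column index of each stored nonzero
    values:  value of each stored nonzero
    x:       dense input vector
    """
    n_rows = len(row_ptr) - 1
    y = [0.0] * n_rows
    for i in range(n_rows):
        acc = 0.0
        for k in range(row_ptr[i], row_ptr[i + 1]):
            # Indirect, data-dependent load from x: the irregular access.
            acc += values[k] * x[col_idx[k]]
        y[i] = acc
    return y


# Illustrative 3x3 matrix (made up for this sketch):
# [[1, 0, 2],
#  [0, 3, 0],
#  [4, 0, 5]]
row_ptr = [0, 2, 3, 5]
col_idx = [0, 2, 1, 0, 2]
values = [1.0, 2.0, 3.0, 4.0, 5.0]
x = [1.0, 2.0, 3.0]
y = spmv_csr(row_ptr, col_idx, values, x)  # [7.0, 6.0, 19.0]
```

The inner loop performs one multiply-add per stored nonzero, so the work per row depends on the sparsity pattern rather than on the matrix dimensions.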
Keywords
Optimization, performance model, sparse matrix-vector multiplication, SIMD