Optimizing and auto-tuning scale-free sparse matrix-vector multiplication on Intel Xeon Phi

CGO 2015

Cited by 72 | Views 79
Abstract
Recently, the Intel Xeon Phi coprocessor has received increasing attention in high performance computing due to its simple programming model and highly parallel architecture. In this paper, we implement sparse matrix-vector multiplication (SpMV) for scale-free matrices on the Xeon Phi architecture and optimize its performance. Scale-free sparse matrices are widely used in various application domains, such as in the study of social networks, gene networks, and web graphs. We propose a novel SpMV format called vectorized hybrid COO+CSR (VHCC). Our SpMV implementation employs 2D jagged partitioning, tiling, and vectorized prefix sum computations to improve hardware resource utilization and thus overall performance. Because the achieved performance depends on the number of vertical panels, we also develop a performance tuning method to guide its selection. Experimental results demonstrate that our SpMV implementation achieves an average 3x speedup over Intel MKL for a wide range of scale-free matrices.
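For reference, the sketch below shows a plain scalar CSR SpMV kernel computing y = A*x; it is only a baseline for illustration and is not the paper's VHCC format, and the struct and field names are hypothetical. With scale-free matrices, a few rows hold very many nonzeros while most rows hold few, so the inner loop length varies wildly, which is precisely what makes such a straightforward kernel hard to vectorize and load-balance on the Xeon Phi.

```c
/* Minimal reference CSR SpMV: y = A*x.
 * Baseline sketch only; the paper's VHCC format (hybrid COO+CSR with
 * 2D jagged partitioning, tiling, and vectorized prefix sums) is not
 * reproduced here. All identifiers are hypothetical. */
#include <stddef.h>

typedef struct {
    size_t n_rows;         /* number of rows */
    const size_t *row_ptr; /* n_rows+1 entries: start of each row in col_idx/vals */
    const size_t *col_idx; /* column index of each nonzero */
    const double *vals;    /* value of each nonzero */
} csr_matrix;

void spmv_csr(const csr_matrix *A, const double *x, double *y)
{
    for (size_t i = 0; i < A->n_rows; ++i) {
        double sum = 0.0;
        /* Inner loop length = nonzeros in row i; highly irregular for
         * scale-free matrices, causing load imbalance and poor SIMD use. */
        for (size_t k = A->row_ptr[i]; k < A->row_ptr[i + 1]; ++k)
            sum += A->vals[k] * x[A->col_idx[k]];
        y[i] = sum;
    }
}
```

The paper's approach instead partitions the nonzeros into 2D panels and tiles, and accumulates partial row sums with vectorized prefix sums, so work is distributed across SIMD lanes largely independently of individual row lengths.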
Keywords
matrix multiplication,intel xeon phi coprocessor,vhcc,sparse matrices,parallel architectures,hardware resource utilization,vectorized hybrid coo+csr,mathematics computing,high performance computing,auto-tuning,spmv,performance optimization,programming model,coprocessors,sparse matrix-vector multiplication,parallel architecture,vectors,coalescing,tuning,instruction sets,dynamic analysis