HETEROGENEOUS ARCHITECTURE FOR SPARSE DATA PROCESSING

2022 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW), 2022

Abstract
Sparse matrices are a common data representation in scientific and machine learning applications, including deep neural networks. Sparse representations improve storage efficiency by avoiding the storage of zero values. However, they incur metadata computation overheads: software must first find the row/column locations of nonzero values before performing the useful computation. These metadata accesses involve indirect memory accesses (of the form a[b[i]], where a[.] and b[.] are large arrays) that are cache- and prefetch-unfriendly, resulting in frequent load stalls. In this paper, we explore dedicated hardware for a memory-side accelerator called the Hardware Helper Thread (HHT), which performs all the index computations needed to fetch only the nonzero elements from a sparse matrix and a sparse vector and supplies those values to the primary core, creating heterogeneity within a single CPU core. We show both performance gains and energy savings of the HHT for sparse matrix-dense vector multiplication (SpMV) and sparse matrix-sparse vector multiplication (SpMSpV). The ASIC HHT achieves average performance gains between 1.7x and 3.5x, depending on the sparsity level, the vector widths used by the RISC-V vector instructions, and whether the vector (in matrix-vector multiplication) is sparse or dense. We also show energy savings of 19% on average when the ASIC HHT is used instead of the baseline (for SpMV), and the HHT requires 38.9% of a RISC-V core's area.
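
For concreteness, the indirect-access pattern the abstract describes appears in any CSR-format SpMV kernel. The following C sketch is illustrative only (a standard CSR formulation, not code from the paper; all identifiers are hypothetical): the core must traverse the row_ptr and col_idx metadata and issue the gather x[col_idx[j]], exactly the a[b[i]] loads that stall the primary core.

#include <stdio.h>

/* Sketch: SpMV y = A*x with A in CSR format. The gather x[col_idx[j]]
 * is the cache- and prefetch-unfriendly a[b[i]] access pattern; names
 * are illustrative, not taken from the paper. */
static void spmv_csr(int n_rows, const int *row_ptr, const int *col_idx,
                     const double *vals, const double *x, double *y)
{
    for (int i = 0; i < n_rows; i++) {
        double sum = 0.0;
        /* Metadata reads (row_ptr, col_idx) must complete before the
         * useful multiply-accumulate can issue. */
        for (int j = row_ptr[i]; j < row_ptr[i + 1]; j++)
            sum += vals[j] * x[col_idx[j]];
        y[i] = sum;
    }
}

int main(void)
{
    /* 3x3 example: [[1 0 2], [0 3 0], [4 0 5]] stored in CSR. */
    const int row_ptr[] = {0, 2, 3, 5};
    const int col_idx[] = {0, 2, 1, 0, 2};
    const double vals[] = {1, 2, 3, 4, 5};
    const double x[] = {1, 1, 1};
    double y[3];

    spmv_csr(3, row_ptr, col_idx, vals, x, y);
    for (int i = 0; i < 3; i++)
        printf("y[%d] = %g\n", i, y[i]);   /* expect 3, 3, 9 */
    return 0;
}

In the paper's design, the HHT performs this metadata traversal on the memory side and supplies only the matched nonzero values to the primary core, removing the indirect loads from the core's critical path.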
Keywords
Sparse matrices, DNN, Hardware Accelerators, RISC-V