SAVE: Sparsity-Aware Vector Engine for Accelerating DNN Training and Inference on CPUs

53rd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO), 2020

Abstract
General Matrix Multiplication (GEMM) is the key operation in Deep Neural Networks (DNNs). While dense GEMM uses SIMD CPUs efficiently, sparse GEMM is much less efficient, especially at the modest levels of unstructured sparsity common in DNN inference/training. Thus, most DNNs use dense GEMM. In this paper, we propose SAVE, a novel vector engine for CPUs that efficiently skips ineffectual computation due to sparsity in dense DNN implementations. SAVE's hardware extensions to the vector pipeline are transparent to software. SAVE accelerates FP32 and mixed-precision kernels with unstructured sparsity from both weights and activations. Further, SAVE is not DNN-specific and can potentially speed up any vector workload with sparsity. To evaluate SAVE, we use simulations of a 28-core machine and run VGG16, ResNet-50, and GNMT, with and without pruning. With realistic sparsity, SAVE accelerates inference by 1.37x-1.68x and end-to-end training by 1.28x-1.64x.
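The sketch below is only a conceptual, scalar illustration of the "ineffectual computation" the abstract refers to: in a dense GEMM, a multiply-accumulate whose weight or activation operand is zero contributes nothing to the output. It is not SAVE's mechanism (SAVE skips such work inside the CPU's vector pipeline, transparently to software); the function name, layouts, and sizes are illustrative assumptions.

```c
/* Conceptual sketch of ineffectual work in dense GEMM with sparse operands.
 * NOT SAVE's hardware mechanism; names and row-major layouts are assumed
 * for illustration only. */
#include <stddef.h>

/* C[MxN] += A[MxK] * B[KxN], skipping zero-operand multiply-accumulates. */
void dense_gemm_skip_zeros(const float *A, const float *B, float *C,
                           size_t M, size_t N, size_t K)
{
    for (size_t i = 0; i < M; i++) {
        for (size_t j = 0; j < N; j++) {
            float acc = C[i * N + j];
            for (size_t k = 0; k < K; k++) {
                float a = A[i * K + k];   /* e.g. a pruned weight      */
                float b = B[k * N + j];   /* e.g. a ReLU'd activation  */
                if (a == 0.0f || b == 0.0f)
                    continue;             /* ineffectual MAC: skipped  */
                acc += a * b;             /* effectual MAC             */
            }
            C[i * N + j] = acc;
        }
    }
}
```

In software, this per-element branch defeats SIMD vectorization, which is why unstructured sparsity at modest levels is hard to exploit on CPUs and why the paper proposes doing the skipping in hardware instead.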
Keywords
deep neural network, CPU, training, inference, sparsity, vector processing unit