Winols: A Large-Tiling Sparse Winograd CNN Accelerator on FPGAs

Kunpeng Xie, Ye Lu, Xinyu He, Dezhi Yi, Huijuan Dong, Yao Chen

ACM Transactions on Architecture and Code Optimization (2023)

Abstract
Convolutional Neural Networks (CNNs) can benefit from the computational reductions provided by the Winograd minimal filtering algorithm and by weight pruning. However, harnessing the potential of both methods simultaneously introduces complexity into the design of pruning algorithms and accelerators. Prior studies aimed to establish regular sparsity patterns in the Winograd domain, but they were primarily suited to small tiles, with the domain transformation dictating the sparsity ratio. The irregularities in data access and domain transformation pose challenges for accelerator design, especially with larger Winograd tiles. This paper introduces "Winols," an algorithm-hardware co-design strategy that emphasizes the strengths of the large-tiling Winograd algorithm. Through a spatial-to-Winograd relevance degree evaluation, we extensively explore the domain transformation and propose a cross-domain pruning technique that retains sparsity across both the spatial and Winograd domains. To compress pruned weight matrices, we devise a relative column encoding scheme. We further design an FPGA-based accelerator for CNN models with large Winograd tiles and sparse matrix-vector operations. Evaluations indicate that our pruning method achieves up to 80% weight tile sparsity in the Winograd domain without compromising accuracy. Our Winols accelerator outperforms a dense accelerator by a factor of 31.7× in inference latency. Compared with prevailing sparse Winograd accelerators, Winols reduces latency by an average of 10.9× and improves DSP and energy efficiency by over 5.6× and 5.7×, respectively. Compared with CPU and GPU platforms, the Winols accelerator with tile size 8 × 8 achieves 24.6× and 2.84× energy efficiency improvements, respectively.
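For readers unfamiliar with the transform the abstract builds on, the sketch below shows the smallest 1D Winograd minimal filtering case, F(2, 3): two outputs of a 3-tap filter are computed with 4 multiplies instead of 6 by moving data and weights into the Winograd domain. This is an illustrative NumPy sketch, not the paper's implementation; the paper's cross-domain pruning targets the transformed weight tile (U below) at much larger 2D tile sizes such as 8 × 8.

```python
import numpy as np

# Standard F(2, 3) transform matrices (Winograd minimal filtering).
BT = np.array([[1,  0, -1,  0],
               [0,  1,  1,  0],
               [0, -1,  1,  0],
               [0,  1,  0, -1]], dtype=float)   # input transform
G = np.array([[1.0,  0.0, 0.0],
              [0.5,  0.5, 0.5],
              [0.5, -0.5, 0.5],
              [0.0,  0.0, 1.0]])                 # weight transform
AT = np.array([[1, 1,  1,  0],
               [0, 1, -1, -1]], dtype=float)    # output transform

def winograd_f23(d, g):
    """y = A^T [(G g) * (B^T d)]: elementwise product in the Winograd domain."""
    U = G @ g    # transformed weights (precomputable; sparsity here is what
                 # Winograd-domain pruning exploits)
    V = BT @ d   # transformed input tile
    return AT @ (U * V)

d = np.array([1.0, 2.0, 3.0, 4.0])  # 4-element input tile
g = np.array([1.0, 1.0, 1.0])       # 3-tap filter
y = winograd_f23(d, g)              # equals the direct sliding-window result
```

Zeroing entries of U after the G-transform is the essence of Winograd-domain pruning; the paper's contribution is keeping such sparsity consistent across both the spatial and Winograd domains.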
Keywords
Large-Tiling, Sparse CNNs, Cross-Domain Pruning, Winograd Accelerator