Bit-pragmatic Deep Neural Network Computing.

MICRO (2017)

Citations: 262 | Views: 181
Abstract
Deep Neural Networks expose a high degree of parallelism, making them amenable to highly data parallel architectures. However, data-parallel architectures often accept inefficiency in individual computations for the sake of overall efficiency. We show that on average, activation values of convolutional layers during inference in modern Deep Convolutional Neural Networks (CNNs) contain 92% zero bits. Processing these zero bits entails ineffectual computations that could be skipped. We propose Pragmatic (PRA), a massively data-parallel architecture that eliminates most of the ineffectual computations on-the-fly, improving performance and energy efficiency compared to state-of-the-art high-performance accelerators [5]. The idea behind PRA is deceptively simple: use serial-parallel shift-and-add multiplication while skipping the zero bits of the serial input. However, a straightforward implementation based on shift-and-add multiplication yields unacceptable area, power and memory access overheads compared to a conventional bit-parallel design. PRA incorporates a set of design decisions to yield a practical, area- and energy-efficient design. Measurements demonstrate that for convolutional layers, PRA is 4.31× faster than DaDianNao [5] (DaDN) using a 16-bit fixed-point representation. While PRA requires 1.68× more area than DaDN, the performance gains yield a 1.70× increase in energy efficiency in a 65nm technology. With 8-bit quantized activations, PRA is 2.25× faster and 1.31× more energy efficient than an 8-bit version of DaDN.

CCS Concepts: • Computing methodologies → Machine learning; Neural networks; • Computer systems organization → Single instruction, multiple data; • Hardware → Arithmetic and datapath circuits
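
The mechanism the abstract sketches — serial-parallel shift-and-add multiplication that skips the zero bits of the bit-serial activation — can be illustrated with a minimal software sketch. This is only a functional model of the arithmetic, not the PRA datapath; the function names are illustrative, and the model ignores the area, power, and memory-access concerns the paper addresses.

```python
def essential_bits(activation):
    """Yield the bit positions (powers of two) of the 1 bits of an activation.

    Zero bits contribute nothing to the product, so only the essential
    (non-zero) bits are emitted.
    """
    offset = 0
    while activation:
        if activation & 1:
            yield offset
        activation >>= 1
        offset += 1


def pragmatic_multiply(activation, weight):
    """Shift-and-add multiply driven only by the essential bits of the
    bit-serial activation; the bit-parallel weight is shifted and
    accumulated once per essential bit."""
    acc = 0
    for offset in essential_bits(activation):
        acc += weight << offset  # one shift-and-add step per essential bit
    return acc


# The activation 0b0000000000010100 has only two essential bits, so the
# product takes two shift-and-add steps rather than one per bit position.
assert pragmatic_multiply(0b0000000000010100, 7) == 0b0000000000010100 * 7
```
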
Keywords
Hardware Accelerators, Machine Learning, Neural Networks