CRANIA: Unlocking Data and Value Reuse in Iterative Neural Network Architectures

2020 25th Asia and South Pacific Design Automation Conference (ASP-DAC)(2020)

Abstract
A common inefficiency in traditional Convolutional Neural Network (CNN) architectures is that they do not adapt to variations in inputs. Not all inputs require the same amount of computation to be correctly classified, and not all of the weights in the network contribute equally to the output. Recent work introduces the concept of iterative inference, enabling per-input approximation. Such an iterative CNN architecture clusters weights based on their importance and saves significant power by incrementally fetching weights from off-chip memory until the classification result is accurate enough. Unfortunately, this comes at the cost of increased execution time, since some inputs must go through multiple rounds of inference, negating the energy savings. We propose Cache Reuse Approximation for Neural Iterative Architectures (CRANIA) to overcome this inefficiency. We recognize that the re-execution and clustering built into these iterative CNN architectures unlock significant temporal data reuse and spatial value reuse, respectively. CRANIA introduces a lightweight cache+compression architecture customized to the iterative clustering algorithm, enabling up to 9× energy savings and speeding up inference by 5.8× with only 0.3% area overhead.
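The iterative-inference idea the abstract describes can be illustrated with a minimal Python sketch. This is not the paper's hardware design: the weight clusters, the toy single-layer classifier, and the confidence threshold below are all illustrative assumptions, standing in for the importance-ordered weight clusters that the architecture fetches incrementally from off-chip memory.

```python
import numpy as np

def iterative_inference(x, weight_clusters, threshold=0.9):
    """Sketch of per-input iterative inference (illustrative, not the
    paper's implementation).

    weight_clusters: hypothetical list of weight matrices ordered by
    importance. Each round "fetches" the next cluster, accumulates it
    into the working weights, and re-runs a toy linear classifier until
    the softmax confidence clears the threshold (early exit).
    """
    w_accum = np.zeros_like(weight_clusters[0])
    for rounds, cluster in enumerate(weight_clusters, start=1):
        w_accum += cluster                    # fetch next weight cluster
        logits = w_accum @ x                  # toy single-layer "network"
        probs = np.exp(logits - logits.max())
        probs /= probs.sum()                  # softmax confidence
        if probs.max() >= threshold:          # confident enough: stop early
            break
    return int(probs.argmax()), rounds
```

Easy inputs exit after fetching only the most important cluster, which is where the power savings come from; hard inputs trigger further rounds, and it is this re-execution over largely identical weights that creates the temporal data reuse CRANIA's cache exploits.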
Keywords
traditional Convolutional Neural Network architectures, iterative inference, per-input approximation, iterative CNN architecture clusters weights, increased execution time, CRANIA, spatial value reuse, lightweight cache+compression architecture, iterative clustering algorithm, energy savings, unlocking data, Iterative Neural Network Architectures, temporal data reuse, Cache Reuse Approximation for Neural Iterative Architectures