NASA: Accelerating Neural Network Design with a NAS Processor

2021 ACM/IEEE 48th Annual International Symposium on Computer Architecture (ISCA)

Abstract
Neural architecture search (NAS) is a promising direction for automating the design of efficient and powerful neural network architectures. However, NAS techniques must dynamically generate a large number of candidate neural networks and iteratively train and evaluate these on-line generated architectures, so they are extremely time-consuming even when deployed on large GPU clusters, which dramatically hinders the adoption of NAS. Although many specialized architectures have recently been proposed to accelerate the training or inference of neural networks, we observe that existing neural network accelerators typically target static network architectures; the dynamically evolving network candidates produced during the NAS process cannot be deployed onto current accelerators via off-line compilation, so these accelerators are ill-suited to accelerating their evaluation. To enable rapid and energy-efficient NAS in a compact single-chip solution, we propose NASA, a specialized architecture for one-shot NAS acceleration. NASA can generate, schedule, and evaluate candidate neural network architectures for a target machine learning workload at high speed, significantly alleviating the processing bottleneck of one-shot NAS. Motivated by the observation that there are considerable computation-sharing opportunities among the different candidates generated in one-shot NAS, NASA is equipped with an on-chip network fusion unit that removes redundant computation during the network mapping stage. In addition, the NASA accelerator can partition and re-schedule candidate network architectures at fine granularity to maximize the chance of data reuse and improve the utilization of the accelerator arrays integrated to accelerate network evaluation. In our experiments on multiple one-shot NAS tasks, NASA achieves a 33.52× performance speedup and a 214.33× reduction in energy consumption on average compared to a CPU-GPU system.
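To make the computation-sharing observation concrete, below is a minimal Python sketch, not taken from the paper: in one-shot NAS, candidates are paths through a shared supernet, so two candidates that agree on their first k layer choices share the activations after layer k. Caching those shared prefixes is a software analogue of the redundancy removal the abstract attributes to NASA's network fusion unit. All names here (FusedEvaluator, the toy OPS table) are hypothetical illustrations.

```python
import itertools

# Toy supernet: each of NUM_LAYERS layers offers a few candidate ops.
# A candidate architecture is a tuple of op names, one per layer.
OPS = {
    "conv3x3": lambda x: 2 * x + 1,
    "conv5x5": lambda x: 3 * x + 1,
    "skip":    lambda x: x,
}
NUM_LAYERS = 3

class FusedEvaluator:
    """Evaluate many candidates while caching shared prefix activations.

    Candidates that agree on their first k layer choices reuse the
    cached output of layer k instead of recomputing it.
    """

    def __init__(self, x0):
        self.cache = {(): x0}  # prefix of op choices -> activation
        self.op_evals = 0      # number of op executions actually performed

    def evaluate(self, candidate):
        for k in range(1, len(candidate) + 1):
            prefix = candidate[:k]
            if prefix not in self.cache:
                # Extend the cached (k-1)-prefix by one op; computed once
                # no matter how many candidates share this prefix.
                self.cache[prefix] = OPS[prefix[-1]](self.cache[prefix[:-1]])
                self.op_evals += 1
        return self.cache[candidate]

if __name__ == "__main__":
    ev = FusedEvaluator(x0=1.0)
    candidates = list(itertools.product(OPS, repeat=NUM_LAYERS))
    for c in candidates:
        ev.evaluate(c)
    naive = len(candidates) * NUM_LAYERS  # op count without sharing
    print(f"ops with sharing: {ev.op_evals}, naive: {naive}")
```

With 3 ops and 3 layers, the 27 candidates need only 39 op executions (one per distinct prefix) instead of 81, which is the kind of redundancy a fusion mechanism can eliminate; the paper's hardware unit operates on mapped networks rather than Python closures, so this sketch only illustrates the sharing structure.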
Key words
one-shot NAS tasks, network mapping stage, on-chip network fusion unit, candidate neural network architectures, NAS acceleration, energy-efficient NAS, dynamical neural network candidates, static neural network architectures, on-line generated network architectures, NAS processor, accelerating neural network design