LRMP: Layer Replication with Mixed Precision for Spatial In-memory DNN Accelerators
CoRR (2023)
Abstract
In-memory computing (IMC) with non-volatile memories (NVMs) has emerged as a
promising approach to address the rapidly growing computational demands of Deep
Neural Networks (DNNs). Mapping DNN layers spatially onto NVM-based IMC
accelerators achieves high degrees of parallelism. However, two challenges that
arise in this approach are the highly non-uniform distribution of layer
processing times and high area requirements. We propose LRMP, a method to
jointly apply layer replication and mixed precision quantization to improve the
performance of DNNs when mapped to area-constrained NVM-based IMC accelerators.
LRMP uses a combination of reinforcement learning and integer linear
programming to search the replication-quantization design space using a model
that is closely informed by the target hardware architecture. Across five DNN
benchmarks, LRMP achieves 2.8-9$\times$ latency and 11.8-19$\times$ throughput
improvement at iso-accuracy.
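To make the design-space search concrete, below is a minimal illustrative sketch of how a joint replication/precision selection under an area budget can be cast as an integer linear program. This is not the paper's formulation: all layer data, candidate configurations, and the cost model are hypothetical placeholders, and the PuLP-based encoding (one binary selector per layer/configuration pair, with the bottleneck latency minimized) is just one standard linearization assumed for illustration.

```python
# Illustrative sketch only -- NOT LRMP's actual formulation.
# All numbers below are hypothetical placeholders.
import pulp

# Hypothetical per-layer candidates: (bit-width, replication factor) ->
# (latency of the layer under that config, crossbar area it occupies).
layers = {
    "conv1": {(8, 1): (100.0, 4.0), (8, 2): (50.0, 8.0), (4, 2): (55.0, 4.0)},
    "conv2": {(8, 1): (300.0, 8.0), (8, 4): (75.0, 32.0), (4, 4): (80.0, 16.0)},
    "fc":    {(8, 1): (40.0, 2.0),  (4, 1): (45.0, 1.0)},
}
AREA_BUDGET = 40.0  # hypothetical total crossbar-area budget

prob = pulp.LpProblem("replication_precision_ilp", pulp.LpMinimize)

# One binary selector per (layer, config); exactly one config per layer.
x = {(l, c): pulp.LpVariable(f"x_{l}_{c[0]}b_{c[1]}r", cat="Binary")
     for l, cfgs in layers.items() for c in cfgs}
for l, cfgs in layers.items():
    prob += pulp.lpSum(x[l, c] for c in cfgs) == 1

# T upper-bounds every layer's latency, so minimizing T minimizes the
# pipeline bottleneck (the slowest layer) in this toy model.
T = pulp.LpVariable("bottleneck_latency", lowBound=0)
for l, cfgs in layers.items():
    prob += T >= pulp.lpSum(lat * x[l, c] for c, (lat, _) in cfgs.items())

# Total area of the selected configurations must fit within the budget.
prob += pulp.lpSum(area * x[l, c]
                   for l, cfgs in layers.items()
                   for c, (_, area) in cfgs.items()) <= AREA_BUDGET

prob += T  # objective: minimize the bottleneck latency
prob.solve(pulp.PULP_CBC_CMD(msg=False))

for (l, c), var in x.items():
    if var.value() == 1:
        print(f"{l}: {c[0]}-bit, replicated x{c[1]}")
print("bottleneck latency:", T.value())
```

In this toy encoding, replicating a layer divides its latency but multiplies its area, while lowering its precision shrinks area at some latency/accuracy cost; the solver trades the two off under the area constraint, which is the kind of trade-off LRMP's replication-quantization search navigates.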