Optimizing Memory-Access Patterns for Deep Learning Accelerators
arXiv (2020)
Abstract
Deep learning (DL) workloads are moving towards accelerators for faster processing and lower cost. Modern DL accelerators are good at handling the large-scale multiply-accumulate operations that dominate DL workloads; however, it is challenging to make full use of an accelerator's compute power, since data must be properly staged in a software-managed scratchpad memory. Failing to do so can result in significant performance loss. This paper proposes a systematic approach that leverages the polyhedral model to analyze all operators of a DL model together to minimize the number of memory accesses. Experiments show that our approach can substantially reduce the memory-access overhead of common neural-network models on Inferentia, AWS's homegrown machine-learning inference chip, which is available through Amazon EC2 Inf1 instances.
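To make the staging problem concrete, here is a minimal sketch (not from the paper, which uses a polyhedral formulation) of the arithmetic behind scratchpad staging: it counts main-memory element transfers for a matrix multiply C = A @ B with and without keeping tiles resident in a scratchpad. The function names, tile sizes Tm, Tn, Tk, and scratchpad capacity are illustrative assumptions.

```python
# Illustrative sketch (assumptions, not the paper's method): DRAM element
# transfers for C = A @ B with A (M x K) and B (K x N), untiled vs. tiled
# into a software-managed scratchpad.

def untiled_dram_accesses(M: int, N: int, K: int) -> int:
    # Without staging, every multiply-accumulate fetches one element of A
    # and one of B from main memory; each C element is stored once.
    return 2 * M * N * K + M * N

def tiled_dram_accesses(M: int, N: int, K: int, Tm: int, Tn: int, Tk: int) -> int:
    # With a (Tm x Tk) tile of A, a (Tk x Tn) tile of B, and a (Tm x Tn)
    # accumulator tile of C resident in the scratchpad, each element of A
    # is re-fetched once per column tile of B, each element of B once per
    # row tile of A, and each C tile is spilled and refilled once per K slice.
    a_loads = M * K * (N // Tn)
    b_loads = K * N * (M // Tm)
    c_moves = 2 * M * N * (K // Tk)
    return a_loads + b_loads + c_moves

def fits_scratchpad(Tm: int, Tn: int, Tk: int, capacity_elems: int) -> bool:
    # All three resident tiles must fit in the scratchpad at once.
    return Tm * Tk + Tk * Tn + Tm * Tn <= capacity_elems

if __name__ == "__main__":
    M = N = K = 1024
    Tm = Tn = Tk = 64                      # assumed tile sizes
    assert fits_scratchpad(Tm, Tn, Tk, capacity_elems=16 * 1024)
    print("untiled:", untiled_dram_accesses(M, N, K))             # ~2.1e9
    print("tiled:  ", tiled_dram_accesses(M, N, K, Tm, Tn, Tk))   # ~6.7e7
```

With these (assumed) sizes, staging cuts main-memory traffic by roughly 30x for a single operator; the paper's contribution is choosing such a staging jointly across all operators of a model rather than one operator at a time.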
Keywords
deep learning accelerators, deep learning, patterns, memory-access