Optimizing Memory-Access Patterns for Deep Learning Accelerators

Hongbin Zheng, Sejong Oh, Huiqing Wang, Preston Briggs, Jiading Gai, Animesh Jain, Yizhi Liu, Rich Heaton, Randy Huang, Yida Wang

arXiv (2020)

Abstract
Deep learning (DL) workloads are moving towards accelerators for faster processing and lower cost. Modern DL accelerators are good at handling the large-scale multiply-accumulate operations that dominate DL workloads; however, it is challenging to make full use of the compute power of an accelerator since the data must be properly staged in a software-managed scratchpad memory. Failing to do so can result in significant performance loss. This paper proposes a systematic approach which leverages the polyhedral model to analyze all operators of a DL model together to minimize the number of memory accesses. Experiments show that our approach can substantially reduce the impact of memory accesses required by common neural-network models on a homegrown AWS machine-learning inference chip named Inferentia, which is available through Amazon EC2 Inf1 instances.
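The core idea, staging data in the scratchpad across operator boundaries instead of per operator, can be illustrated with a toy traffic model. The sketch below is not the paper's polyhedral analysis or the Inferentia toolchain; it is a minimal Python estimate, with assumed sizes N and TILE, of how many intermediate-tensor elements cross the DRAM/scratchpad boundary when a matmul and a following elementwise ReLU are scheduled separately versus together.

```python
# Hypothetical sketch: count DRAM <-> scratchpad transfers of the intermediate
# tensor for a matmul -> ReLU chain under two schedules. Input reads of the
# matmul operands are identical in both cases and therefore omitted.

N = 512      # assumed square output size
TILE = 64    # assumed tile edge that fits in the scratchpad

def intermediate_traffic_unfused(n: int, tile: int) -> int:
    """Operators staged independently: the matmul result is written back to
    DRAM in full, then reloaded tile by tile for the ReLU."""
    tiles = (n // tile) ** 2
    per_tile = tile * tile
    matmul_writes = tiles * per_tile   # store the full intermediate
    relu_reads    = tiles * per_tile   # reload it for the next operator
    relu_writes   = tiles * per_tile   # store the final output
    return matmul_writes + relu_reads + relu_writes

def intermediate_traffic_fused(n: int, tile: int) -> int:
    """Operators analyzed together: each output tile stays in the scratchpad
    between the matmul and the ReLU, so the intermediate never touches DRAM."""
    tiles = (n // tile) ** 2
    per_tile = tile * tile
    return tiles * per_tile            # only the final output is written out

if __name__ == "__main__":
    print("unfused DRAM elements:", intermediate_traffic_unfused(N, TILE))
    print("fused   DRAM elements:", intermediate_traffic_fused(N, TILE))
```

Under these assumptions the fused schedule moves one third of the elements of the unfused one; the paper's contribution is to find such cross-operator schedules automatically via the polyhedral model rather than by hand.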
Keywords
deep learning accelerators, deep learning, patterns, memory-access