Overcoming Data Transfer Bottlenecks in DNN Accelerators via Layer-Conscious Memory Management

FPGA (2019)

Abstract
Deep Neural Networks (DNNs) are rapidly evolving to satisfy the performance and accuracy requirements of many real-world applications. This evolution renders DNNs increasingly complex in terms of network topology, data sizes, and layer types. Most state-of-the-art DNN accelerators currently adopt a uniform memory hierarchy (UMH) design methodology, in which the data transfers of all convolutional and fully connected layers must pass through the same memory levels. Unfortunately, the performance of some layers is always bounded by off-chip memory transfers. This is caused by the saturation of data reuse in on-chip buffers, which results in underutilization of on-chip memory. To address this issue, we propose a layer-conscious memory hierarchy (LCMH) methodology for DNN accelerators. LCMH determines the memory levels of all layers according to their requirements for off-chip memory bandwidth and on-chip buffer size for their data sources. As a result, the off-chip memory footprint of memory-bound layers can be avoided by keeping their data on chip. In addition, we provide architectural support for accelerators equipped with LCMH. Experimental results show that designs with layer-conscious memory management achieve up to a 36% speedup over designs with UMH and a 5% improvement over state-of-the-art designs.
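
To make the core idea concrete, the following is a minimal Python sketch of how a layer-conscious memory-level assignment might work under the assumptions the abstract describes: layers whose data reuse has saturated are memory bound, so their working sets are pinned on chip while capacity allows, and all other layers fall back to the default UMH path through DRAM. The layer fields, the `reuse_saturated` flag, and the buffer budget are illustrative assumptions, not the paper's actual algorithm or data structures.

```python
# Hypothetical sketch of layer-conscious memory-level assignment (LCMH).
# All names, fields, and capacity values are illustrative assumptions.
from dataclasses import dataclass

@dataclass
class Layer:
    name: str
    ifmap_bytes: int        # input feature-map size
    ofmap_bytes: int        # output feature-map size
    weight_bytes: int       # filter/weight size
    reuse_saturated: bool   # True if on-chip reuse can no longer hide DRAM transfers

ON_CHIP_BUDGET = 4 * 1024 * 1024  # assumed 4 MB of on-chip buffer

def assign_memory_levels(layers):
    """Keep a layer's data on chip when it is memory bound (reuse saturated)
    and its working set fits the remaining on-chip budget; otherwise route it
    through the uniform off-chip path."""
    budget = ON_CHIP_BUDGET
    levels = {}
    for layer in layers:
        working_set = layer.ifmap_bytes + layer.ofmap_bytes + layer.weight_bytes
        if layer.reuse_saturated and working_set <= budget:
            levels[layer.name] = "on-chip"   # avoid off-chip transfers entirely
            budget -= working_set
        else:
            levels[layer.name] = "off-chip"  # default UMH path through DRAM
    return levels

if __name__ == "__main__":
    net = [
        Layer("conv1", 1 << 20, 1 << 20, 1 << 18, reuse_saturated=False),
        Layer("fc1",   1 << 16, 1 << 14, 1 << 21, reuse_saturated=True),
    ]
    print(assign_memory_levels(net))
```

In this sketch, a compute-bound convolutional layer keeps streaming through the uniform hierarchy, while a reuse-saturated fully connected layer is pinned on chip, mirroring the abstract's claim that off-chip traffic for memory-bound layers can be avoided.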