Overcoming Data Transfer Bottlenecks in FPGA-based DNN Accelerators via Layer Conscious Memory Management

Proceedings of the 56th Annual Design Automation Conference (DAC), 2019

Abstract
Deep Neural Networks (DNNs) are becoming increasingly complex. Previous hardware accelerator designs neglect layer diversity in terms of computation and communication behavior. On-chip memory resources are underutilized for memory-bound layers, leading to suboptimal performance. In addition, the growing complexity of DNN structures makes on-chip memory allocation difficult. To address these issues, we propose a layer-conscious memory management framework for FPGA-based DNN hardware accelerators. Our framework exploits layer diversity and the disjoint lifespans of memory buffers to use on-chip memory efficiently, improving the performance of memory-bound layers and thus the overall performance of DNNs. It consists of four key techniques that work in coordination. We first devise a memory allocation algorithm to allocate on-chip buffers for memory-bound layers. In addition, buffer sharing between different layers is applied to improve on-chip memory utilization. Finally, buffer prefetching and splitting are used to further reduce latency. Experiments show that our techniques achieve a 1.36X performance improvement over previous designs.
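To illustrate the buffer-sharing idea described in the abstract, the sketch below applies lifetime-based sharing in the style of linear-scan allocation: two buffers may occupy the same on-chip region when their layer lifespans do not overlap. This is a minimal illustration under assumed semantics, not the paper's actual algorithm; the buffer names, sizes, and layer indices are hypothetical.

```python
# Hypothetical sketch of lifespan-based buffer sharing (not the paper's
# exact algorithm). A buffer lives from its producing layer to its last
# consuming layer; disjoint lifespans allow reuse of the same region.

from typing import Dict, List, Tuple

Buffer = Tuple[str, int, int, int]  # (name, size_bytes, first_layer, last_layer)

def share_buffers(buffers: List[Buffer]) -> Dict[str, int]:
    """Greedily assign each buffer to an on-chip region index; a region
    is reused only when its previous occupant's lifespan has ended."""
    regions: List[Tuple[int, int]] = []  # (capacity, busy_until_layer)
    assignment: Dict[str, int] = {}
    # Process buffers in order of first use, as in linear-scan allocation.
    for name, size, first, last in sorted(buffers, key=lambda b: b[2]):
        placed = False
        for i, (cap, busy_until) in enumerate(regions):
            # Reuse requires enough capacity and a disjoint lifespan.
            if cap >= size and busy_until < first:
                regions[i] = (cap, last)
                assignment[name] = i
                placed = True
                break
        if not placed:
            regions.append((size, last))
            assignment[name] = len(regions) - 1
    return assignment

# Example: conv1_out dies at layer 2, so conv3_w can reuse its region.
bufs = [("conv1_out", 4096, 0, 2), ("conv2_out", 4096, 1, 3),
        ("conv3_w", 2048, 3, 4)]
print(share_buffers(bufs))  # {'conv1_out': 0, 'conv2_out': 1, 'conv3_w': 0}
```

The greedy scan is conservative: it never places two buffers with overlapping lifespans in one region, so correctness does not depend on layer scheduling details, at the cost of possibly using more regions than an optimal interval-graph coloring.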
Keywords
data transfer bottlenecks, FPGA-based DNN accelerators, layer diversity, communication behavior, suboptimal performance, DNN structures, on-chip memory allocation, layer-conscious memory management framework, FPGA-based DNN hardware accelerators, disjoint lifespan information, memory buffers, overall performance, memory allocation algorithm, on-chip buffers, on-chip memory utilization, deep neural networks, hardware accelerator designs, on-chip memory resources