High performance deep neural network on low cost mobile GPU

2016 IEEE International Conference on Consumer Electronics (ICCE)

Cited 13 | Viewed 9
Abstract
In recent years, machine learning based on deep neural networks (DNNs) has played an increasingly important role, and artificial intelligence applications built on DNNs are reaching ever higher accuracy levels. However, the multi-layer structure of a DNN imposes enormous computational demands. To make DNN applications feasible on mobile devices, an efficient DNN flow optimized for mobile GPUs is desired. In this paper, a mobile-GPU-accelerated DNN flow is proposed. Through the proposed input buffer address remapping scheme, shader assembly code optimization, and kernel merging between computing nodes, 10.6 FPS is achieved on a 35.2 GFLOPS mobile GPU at 94.9 mJ per frame, which is 58x faster and 104x more energy efficient than a pure mobile CPU solution. Compared with state-of-the-art GPU accelerator devices and libraries, the proposed scheme provides 226%~1000% higher computing efficiency.
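The abstract names kernel merging between computing nodes as one of the key optimizations. As an illustration only (the paper's shader-level implementation is not reproduced here), the C sketch below contrasts a two-pass convolution-then-ReLU pipeline, which writes an intermediate buffer to memory, with a merged single pass that applies the activation in the same loop; the function names (conv1x1_pass, relu_pass, conv1x1_relu_merged) are hypothetical and stand in for the paper's GPU compute nodes.

#include <stdio.h>
#include <stddef.h>

/* Two-pass flow: the convolution result is written to an intermediate
 * buffer, then a second pass reads it back to apply ReLU. On a GPU each
 * pass is a separate kernel, so the intermediate tensor makes a full
 * round trip through memory. */
static void conv1x1_pass(const float *in, float w, float *tmp, size_t n) {
    for (size_t i = 0; i < n; ++i)
        tmp[i] = in[i] * w;                 /* toy 1x1 "convolution" */
}

static void relu_pass(const float *tmp, float *out, size_t n) {
    for (size_t i = 0; i < n; ++i)
        out[i] = tmp[i] > 0.0f ? tmp[i] : 0.0f;
}

/* Merged flow: the activation is fused into the same loop, so the
 * intermediate value never leaves a register and the extra memory
 * round trip (and kernel launch) is eliminated. */
static void conv1x1_relu_merged(const float *in, float w, float *out, size_t n) {
    for (size_t i = 0; i < n; ++i) {
        float v = in[i] * w;
        out[i] = v > 0.0f ? v : 0.0f;
    }
}

int main(void) {
    const float in[4] = {-1.0f, 0.5f, 2.0f, -3.0f};
    float tmp[4], out_two_pass[4], out_merged[4];

    conv1x1_pass(in, 2.0f, tmp, 4);
    relu_pass(tmp, out_two_pass, 4);
    conv1x1_relu_merged(in, 2.0f, out_merged, 4);

    for (size_t i = 0; i < 4; ++i)          /* both columns should match */
        printf("%.1f %.1f\n", out_two_pass[i], out_merged[i]);
    return 0;
}

On the actual mobile GPU, merging corresponds to fusing adjacent computing nodes into a single shader program; together with the buffer address remapping and assembly-level tuning, this is what the abstract credits for the reported 58x speedup over the CPU baseline.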
Keywords
high performance deep neural network,machine learning,deep neural network,multilayer characteristic,computational complexity,mobile device,mobile GPU,mobile-GPU-accelerated DNN flow,buffer address remapping scheme,shader assembly code optimization,10.6 FPS,35.2 GFLOPS mobile GPU,GPU accelerator device,GPU accelerator library