Architecture-Centric Bottleneck Analysis for Deep Neural Network Applications

2019 IEEE 26th International Conference on High Performance Computing, Data, and Analytics (HiPC)(2019)

Abstract
The ever-growing complexity and popularity of machine learning and deep learning applications have created an urgent need for effective and efficient support for these applications on contemporary computing systems. In this paper, we thoroughly analyze four DNN algorithms on three widely used architectures (CPU, GPU, and Xeon Phi). The DNN algorithms we evaluate are: i) Unet, a Convolutional Neural Network (CNN) for biomedical image segmentation; ii) NMT, a Recurrent Neural Network (RNN) for neural machine translation; iii) ResNet-50 and iv) DenseNet, both CNNs for image processing. The ultimate goal of this paper is to answer four fundamental questions: i) do different DNNs exhibit similar behavior on a given execution platform? ii) does a given DNN behave differently across platforms? iii) for the same execution platform and the same DNN, do different execution phases behave differently? and iv) are the current major general-purpose platforms tuned sufficiently well for different DNN algorithms? Motivated by these questions, we conduct an in-depth investigation of running DNN applications on modern systems. Specifically, we first identify the most time-consuming functions (hotspot functions) across different networks and platforms. Next, we characterize the performance bottlenecks and discuss them in detail. Finally, we port selected hotspot functions to a cycle-accurate simulator and use the results to guide architectural optimizations that better support DNN applications.
Keywords
DNN,CPU,GPU,Xeon Phi,Characterization