Demystifying the TensorFlow Eager Execution of Deep Learning Inference on a CPU-GPU Tandem

Paul Delestrac, Lionel Torres, David Novo

2022 25th Euromicro Conference on Digital System Design (DSD), 2022

Abstract
Machine Learning (ML) frameworks are tools that facilitate the development and deployment of ML models. These tools are major catalysts of the recent explosion in ML models and hardware accelerators thanks to their high programming abstraction. However, such an abstraction also obfuscates the run-time execution of the model and complicates the understanding and identification of performance bottlenecks. In this paper, we demystify how a modern ML framework manages code execution from a high-level programming language. We focus our work on TensorFlow eager execution, which remains obscure to many users despite being the simplest mode of execution in TensorFlow. We describe in detail the process followed by the runtime to run code on a CPU-GPU tandem. We propose new metrics to analyze the framework's runtime performance overhead. We use our metrics to conduct an in-depth analysis of the inference process of two Convolutional Neural Networks (CNNs) (LeNet-5 and ResNet-50) and a transformer (BERT) for different batch sizes. Our results show that GPU kernel executions need to be long enough to exploit thread parallelism and effectively hide the runtime overhead of the ML framework.
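The abstract's final claim can be illustrated with a simple analytical sketch (not taken from the paper): if eager execution adds a fixed per-operation runtime cost that is serialized with each GPU kernel, the overhead fraction shrinks as kernels get longer. The 0.1 ms runtime cost and the kernel durations below are hypothetical numbers chosen for illustration only.

```python
# Illustrative model (not from the paper): per-op framework runtime
# overhead is hidden when GPU kernel time dominates total latency.

def overhead_fraction(kernel_ms: float, runtime_ms: float) -> float:
    """Fraction of total op latency spent in framework runtime overhead,
    assuming the runtime cost is serialized with the kernel execution."""
    return runtime_ms / (kernel_ms + runtime_ms)

# Hypothetical fixed per-op runtime cost of 0.1 ms.
RUNTIME_MS = 0.1

for kernel_ms in (0.05, 0.5, 5.0):  # short vs. long GPU kernels
    frac = overhead_fraction(kernel_ms, RUNTIME_MS)
    print(f"kernel={kernel_ms:5.2f} ms -> overhead {frac:.1%}")
# Longer kernels (e.g. from larger batch sizes) push the overhead
# fraction toward zero, consistent with the abstract's conclusion.
```

Under this toy model, the 0.05 ms kernel spends about two thirds of its latency in framework overhead, while the 5 ms kernel spends about 2%, which mirrors the paper's observation that larger batch sizes amortize the framework's runtime cost.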
Keywords
ML frameworks,TensorFlow eager execution,profiling,CPU-GPU tandem