Understand Data Preprocessing for Effective End-to-End Training of Deep Neural Networks

Peng Gong, Yanhe Ma, Cheng Li, Xiaoyan Ma, Sam H. Noh

arXiv (Cornell University) (2023)

Abstract
In this paper, we primarily focus on understanding the data preprocessing pipeline for DNN training in the public cloud. First, we run experiments to test the performance implications of the two major data preprocessing methods, which use either raw data or record files. The preliminary results show that data preprocessing is a clear bottleneck, even with the most efficient software and hardware configuration enabled by NVIDIA DALI, a highly optimized data preprocessing library. Second, we identify the potential causes, exercise a variety of optimization methods, and present their pros and cons. We hope this work will shed light on the co-design of "data storage, loading pipeline" and "training framework", and on flexible resource configurations between them, so that the resources can be fully exploited and performance can be maximized.
Keywords
neural networks,training,deep