Accelerating Deep Learning Training Through Transparent Storage Tiering

2022 22nd IEEE International Symposium on Cluster, Cloud and Internet Computing (CCGrid), 2022

Abstract
We present Monarch, a framework-agnostic storage middleware that transparently employs storage tiering to accelerate Deep Learning (DL) training. It leverages the existing storage tiers of modern supercomputers (i.e., the compute nodes' local storage and the shared parallel file system (PFS)), while taking the I/O patterns of DL frameworks into account to improve data placement across tiers. Monarch aims at accelerating DL training and decreasing the I/O pressure imposed on the PFS. We apply Monarch to TensorFlow and PyTorch, validating its performance and applicability under different models and dataset sizes. Results show that, even when the training dataset can only be partially stored on local storage, Monarch reduces TensorFlow's and PyTorch's training time by up to 28% and 37%, respectively, for I/O-intensive models. Furthermore, Monarch decreases the number of I/O operations submitted to the PFS by up to 56%.
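To make the tiering idea in the abstract concrete, the Python sketch below serves training-data reads from fast node-local storage when possible, falling back to the shared PFS and opportunistically promoting files while local capacity lasts. Everything here is an illustrative assumption: the class name TieredReader, the paths, and the simple capacity-based, eviction-free placement policy are hypothetical and do not reflect Monarch's actual API or placement logic.

```python
import os

class TieredReader:
    """Minimal sketch of transparent storage tiering for DL training reads.

    Reads hit node-local storage when the file has already been cached;
    misses go to the PFS and the file is promoted to local storage if it
    fits. With a dataset larger than local capacity, only part of it is
    cached, mirroring the partial-caching scenario in the paper's results.
    """

    def __init__(self, pfs_root, local_root, local_capacity_bytes):
        self.pfs_root = pfs_root
        self.local_root = local_root
        self.capacity = local_capacity_bytes
        self.used = 0

    def read(self, rel_path):
        local_path = os.path.join(self.local_root, rel_path)
        pfs_path = os.path.join(self.pfs_root, rel_path)
        if os.path.exists(local_path):
            # Tier hit: repeat reads (e.g., later epochs) avoid PFS I/O.
            with open(local_path, "rb") as f:
                return f.read()
        # Tier miss: read from the PFS.
        with open(pfs_path, "rb") as f:
            data = f.read()
        # Promote to local storage only while capacity allows.
        if self.used + len(data) <= self.capacity:
            os.makedirs(os.path.dirname(local_path), exist_ok=True)
            with open(local_path, "wb") as f:
                f.write(data)
            self.used += len(data)
        return data

# Hypothetical usage inside a data-loading loop:
# reader = TieredReader("/pfs/imagenet", "/local/scratch", 200 * 2**30)
# sample = reader.read("train/shard-00042.tfrecord")
```

Since repeat reads across epochs are served locally, both training time and the number of I/O operations reaching the PFS drop, which is the effect the paper quantifies.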
Keywords
I/O optimization,storage tiering,deep learning