Omnivore: An Optimizer for Multi-device Deep Learning on CPUs and GPUs
arXiv preprint arXiv:1606.04487 [cs.DC] (Distributed, Parallel, and Cluster Computing), 2016.
We study the factors affecting training time in multi-device deep learning systems. Given a specification of a convolutional neural network, our goal is to minimize the time to train this model on a cluster of commodity CPUs and GPUs. We first focus on the single-node setting and show that by using standard batching and data-parallel techniques, …