High Performance Parallel Stochastic Gradient Descent In Shared Memory

2016 IEEE International Parallel and Distributed Processing Symposium (IPDPS), 2016

Cited by 38 | Viewed 199
Abstract
Stochastic Gradient Descent (SGD) is a popular optimization method used to train a variety of machine learning models. Most SGD work to date has concentrated on improving its statistical efficiency, in terms of the rate of convergence to the optimal solution. At the same time, as the parallelism of modern CPUs continues to increase through progressively higher core counts, it is imperative to understand the parallel hardware efficiency of SGD, which is often at odds with its statistical efficiency. In this paper, we explore several modern parallelization methods for SGD on a shared-memory system, in the context of sparse and convex optimization problems. Specifically, we develop optimized parallel implementations of several SGD algorithms and show that their parallel efficiency is severely limited by inter-core communication. We propose a new, scalable, communication-avoiding, many-core-friendly implementation of SGD, called HogBatch, which exposes parallelism on several levels, minimizes the impact on statistical efficiency, and, as a result, significantly outperforms the other methods. On a variety of datasets, HogBatch demonstrates near-linear scalability on a system with 14 cores and delivers up to a 20X speedup over previous methods.
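The abstract characterizes HogBatch as a communication-avoiding parallel SGD in which threads touch the shared model far less often than per-sample asynchronous schemes. The sketch below is only an illustration of that batched-update idea on a shared-memory system, not the authors' HogBatch implementation; the dense least-squares model, the synthetic data, and the batch size B and learning rate lr are illustrative assumptions.

```cpp
// Minimal OpenMP sketch of communication-avoiding mini-batch SGD (not HogBatch itself).
// Threads accumulate gradients in private buffers; the shared model is updated
// once per mini-batch, limiting inter-core traffic on the model vector.
#include <omp.h>
#include <cstdio>
#include <cstdlib>
#include <vector>

int main() {
    const int n = 100000, d = 64, B = 256, epochs = 5;  // illustrative sizes
    const double lr = 0.05;                             // illustrative step size

    // Synthetic data: y = x . w_true with w_true = (1, ..., 1).
    std::vector<double> X(n * (size_t)d), y(n);
    for (int i = 0; i < n; ++i) {
        double dot = 0.0;
        for (int j = 0; j < d; ++j) {
            X[i * (size_t)d + j] = (double)rand() / RAND_MAX - 0.5;
            dot += X[i * (size_t)d + j];
        }
        y[i] = dot;
    }

    std::vector<double> w(d, 0.0);  // shared model vector

    for (int e = 0; e < epochs; ++e) {
        for (int start = 0; start + B <= n; start += B) {
            std::vector<double> g(d, 0.0);  // mini-batch gradient
            #pragma omp parallel
            {
                std::vector<double> g_local(d, 0.0);  // thread-private accumulator
                #pragma omp for nowait
                for (int i = start; i < start + B; ++i) {
                    double pred = 0.0;
                    for (int j = 0; j < d; ++j) pred += X[i * (size_t)d + j] * w[j];
                    const double err = pred - y[i];  // squared-loss residual
                    for (int j = 0; j < d; ++j) g_local[j] += err * X[i * (size_t)d + j];
                }
                // One reduction per thread per batch, instead of one shared
                // write per sample.
                #pragma omp critical
                for (int j = 0; j < d; ++j) g[j] += g_local[j];
            }
            // Single update of the shared model per mini-batch.
            for (int j = 0; j < d; ++j) w[j] -= lr * g[j] / B;
        }
    }

    std::printf("w[0] after training: %f (target 1.0)\n", w[0]);
    return 0;
}
```

Compiled with, e.g., `g++ -O2 -fopenmp sketch.cpp`, this converges toward the synthetic ground truth; the point of the structure is that per-sample work is thread-local and the shared model is written only once per batch.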
Keywords
linear scalability, HogBatch, scalable communication, inter-core communication, optimized parallel implementations, convex optimization problem, sparse optimization problem, shared memory system, parallel hardware efficiency, CPU parallelism, optimal solution, convergence rate, statistical efficiency improvement, SGD, optimization method, high-performance parallel stochastic gradient descent