Accelerating deep neural network learning for speech recognition on a cluster of GPUs

SC '17: The International Conference for High Performance Computing, Networking, Storage and Analysis Denver CO USA November, 2017(2017)

Abstract
We train deep neural networks to solve the acoustic modeling problem for large-vocabulary continuous speech recognition, employing distributed processing on a cluster of GPUs. On modern GPUs, the sequential implementation takes over a day to train, and efficient parallelization without losing accuracy is notoriously hard. We show that asynchronous SGD (ASGD) methods are not efficient for this application: even with 4 GPUs, the overhead is significant and the accuracies achieved are poor. We adapt a P-learner K-step model averaging algorithm that, with 4 GPUs, achieves accuracies comparable to those of the sequential implementation. We further introduce adaptive measures that make our parallel implementation scale to the full cluster of 20 GPUs. Ultimately, our parallel implementation achieves better accuracies than the sequential implementation with a 6.1 times speedup.
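The core idea of P-learner K-step model averaging can be illustrated with a minimal sketch: each of P learners takes K local SGD steps on its own data shard, and the learners then synchronize by averaging their parameters. The toy least-squares objective, the function name, and all hyperparameter values below are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def k_step_model_averaging(shards, w0, lr=0.1, k=5, rounds=20):
    """Hypothetical sketch: P learners, K local SGD steps, then averaging.

    Each shard is an (X, y) pair for a toy least-squares problem.
    """
    w = w0.copy()
    for _ in range(rounds):
        local_models = []
        for X, y in shards:
            wl = w.copy()
            for _ in range(k):  # K local SGD steps on this learner's shard
                grad = X.T @ (X @ wl - y) / len(y)
                wl -= lr * grad
            local_models.append(wl)
        w = np.mean(local_models, axis=0)  # synchronize: average P models
    return w

# Toy data: 4 learners (P = 4), each with its own shard of a shared
# linear regression problem with ground-truth weights w_true.
rng = np.random.default_rng(0)
w_true = np.array([1.0, -2.0])
shards = []
for _ in range(4):
    X = rng.normal(size=(50, 2))
    shards.append((X, X @ w_true))

w = k_step_model_averaging(shards, np.zeros(2))
```

Compared with ASGD, this scheme communicates only once every K steps, which cuts synchronization overhead; the adaptive measures described in the abstract would then tune this trade-off as the cluster grows.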