Big Data Analytics Performance For Large Out-Of-Core Matrix Solvers On Advanced Hybrid Architectures

Raghavendra Shruti Rao, Milton Halem, John Dorband

INTERNATIONAL CONFERENCE ON COMPUTATIONAL SCIENCE, ICCS 2015 COMPUTATIONAL SCIENCE AT THE GATES OF NATURE(2015)

Abstract
This paper examines the performance of advanced computer architectures on large Out-Of-Core matrices in order to assess optimal Big Data system configurations. The performance evaluation is based on a large dense Lower-Upper Matrix Decomposition (LUD) employing a highly tuned, I/O managed, slab based LUD software package developed by the Lockheed Martin Corporation. We present extensive benchmark studies conducted with this package on UMBC's Bluegrit and Bluewave clusters and on NASA-GSFC's Discover cluster system. Our results show that the single-node speedup achieved by Phi Co-Processors relative to the host Sandy Bridge CPU is about 1.5X, a smaller relative performance gain than the 2-2.5X obtained in the studies by F. Masci. Surprisingly, the Westmere with the Tesla GPU scales comparably with the Sandy Bridge and the Phi Co-Processor up to 12 processes and then fails to continue to scale. Across 20 CPU nodes, Sandy Bridge obtains a uniform speedup of 0.5X over Westmere for problem sizes of 10K, 20K and 40K unknowns. With an InfiniBand DDR interconnect, the performance of Nehalem processors is comparable to Westmere without the interconnect.
Keywords
Matrix Multiplication, Out-Of-Core Matrices, Hybrid Architectures, Phi Co Processors, Tesla GPU