Scalable Hierarchical Aggregation Protocol (SHArP): A Hardware Architecture for Efficient Data Reduction.

Richard L. Graham, Devendar Bureddy, Pak Lui, Hal Rosenstock, Gilad Shainer, Gil Bloch, Dror Goldenberg, Mike Dubman, Sasha Kotchubievsky, Vladimir Koushnir, Lion Levi, Alex Margolin, Tamir Ronen, Alexander Shpiner, Oded Wertheim, Eitan Zahavi

COMHPC@SC(2016)

Abstract
Increased system size and a greater reliance on system parallelism to meet computational needs require innovative system architectures to address simulation challenges. As a step towards a new network class of co-processors (intelligent network devices that manipulate data traversing the data-center network), this paper describes the SHArP technology, designed to offload collective operation processing to the network. It is implemented in Mellanox's SwitchIB-2 ASIC, using in-network trees to reduce data from a group of sources and to distribute the result. Multiple parallel jobs with several partially overlapping groups are supported, each with several reduction operations in flight. Large performance enhancements are obtained, with an improvement of a factor of 2.1 for an eight-byte MPI_Allreduce() operation on 128 hosts, going from 6.01 to 2.83 microseconds. Pipelining is used for an improvement of a factor of 3.24 in the latency of a 4096-byte MPI_Allreduce() operation, declining from 46.93 to 14.48 microseconds.
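The reduce-then-distribute pattern the abstract describes can be illustrated with a small software model. The sketch below is not the SHArP protocol or its hardware implementation: the function name `tree_allreduce`, the `radix` parameter, and the idea of grouping hosts in fixed-size blocks are all assumptions made for illustration. It only shows how partial results might be combined level by level up an aggregation tree and how the final value is then handed back to every host.

```python
from operator import add
from typing import Callable, List, Sequence, Tuple


def tree_allreduce(host_values: Sequence[float],
                   radix: int = 4,
                   op: Callable[[float, float], float] = add
                   ) -> Tuple[List[float], int]:
    """Toy model of an allreduce over an aggregation tree.

    Hypothetical illustration only: each tree node combines up to
    `radix` children with `op`, and the root's result is copied back
    to every host. Returns (per-host results, number of tree levels).
    """
    if not host_values:
        raise ValueError("need at least one host")
    if radix < 2:
        raise ValueError("radix must be at least 2")

    level = list(host_values)
    depth = 0
    # Reduction phase: combine groups of `radix` values until one remains.
    while len(level) > 1:
        next_level = []
        for start in range(0, len(level), radix):
            group = level[start:start + radix]
            partial = group[0]
            for value in group[1:]:
                partial = op(partial, value)
            next_level.append(partial)
        level = next_level
        depth += 1

    # Distribution phase: every host receives the root's result.
    result = level[0]
    return [result] * len(host_values), depth


if __name__ == "__main__":
    # 128 hosts, matching the host count quoted in the abstract;
    # the radix of 4 is an arbitrary choice for this sketch.
    results, depth = tree_allreduce(list(range(128)), radix=4)
    print(results[0], depth)
```

In this model the number of combining steps grows with the depth of the tree rather than with the number of hosts, which is the general motivation for tree-based reduction; the latencies reported in the abstract come from the authors' hardware measurements and are not reproduced by this sketch.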
Keywords
scalable hierarchical aggregation protocol,hardware architecture,data reduction,system parallelism,innovative system architectures,coprocessors,intelligent network devices,data center network,SHArP technology,Mellanox SwitchIB-2 ASIC,network trees,parallel jobs,pipelining