Efficient High Performance Collective Communication For The Cell Blade

ICS (2009)

Abstract
This paper presents high-performance collective communication algorithms and implementations that exploit the unique architectural features of the Cell heterogeneous multicore processor. It describes novel algorithms for the barrier, broadcast, reduce, all-reduce, and all-gather collective operations and demonstrates their efficiency by comparing them with the fastest previously known implementations of these operations for the Cell. The new implementations outperform the published state of the art, achieving up to 19.21 times the performance (a 95% reduction in latency) of previously published collective communication work for the Cell [19, 25]. Results are presented both within a single chip and across the two Cell chips on a Cell blade [10].
Keywords
Collective communication, algorithms, reductions, Cell processor