The Importance of Non-Data-Communication Overheads in MPI
International Journal of High Performance Computing Applications (2010)
Abstract
With processor speeds no longer doubling every 18 to 24 months owing to the exponential increase in power consumption and heat dissipation, modern high-end computing systems tend to rely less on the performance of single processing units and instead rely on achieving high performance by using the parallelism of a massive number of low-frequency/low-power processing cores. Using such low-frequency cores, however, puts a premium on end-host pre- and post-communication processing required within communication stacks, such as the Message Passing Interface (MPI) implementation. Similarly, small amounts of serialization within the communication stack that were acceptable on small/medium systems can be brutal on massively parallel systems. Thus, in this paper, we study the different non-data-communication overheads within the MPI implementation on the IBM Blue Gene/P system. Specifically, we analyze various aspects of MPI, including the MPI stack overhead itself, overhead of allocating and queueing requests, queue searches within the MPI stack, multi-request operations, and various others. Our experiments, which scale up to 131,072 cores of the largest Blue Gene/P system in the world (80% of the total system size), reveal several insights into overheads in the MPI stack, which were not previously considered significant, but can have a substantial impact on such massive systems.
Keywords
modern high-end computing system, communication stack, parallel system, non-data-communication overheads, single processing unit, low-power processing core, p system, massive system, mpi implementation, post-communication processing, medium system, low frequency, message passing interface