Performance Benefits of Heterogeneous Computing in HPC Workloads

Parallel and Distributed Processing Symposium Workshops & PhD Forum(2012)

Chip multi-processors (CMPs) with an increasing number of processor cores are becoming widely available. To take advantage of many-core CMPs, applications must be parallelized. However, because of the nature of the algorithm or programming model, some parts of an application remain serial. According to Amdahl's law, the speedup of a parallel application is limited by the fraction of its execution that is serial. For a CMP with many cores, this can be a serious limitation. To take full advantage of the increasing number of cores, one must reduce the execution time of the serial portion of a parallel program. However, rewriting an application takes time, and the return on the effort invested often does not justify parallelizing every part of the program. A heterogeneous many-core CMP design is one possible solution that supports massively parallel execution while still providing reasonable single-thread performance. In this paper, we use a simple spreadsheet model to evaluate homogeneous and heterogeneous CMP designs using execution profiles of real HPC applications. Evaluated on 12 parallel HPC applications, heterogeneous CMPs outperform homogeneous CMPs by up to 1.35x, with an average speedup of 1.06x, when both designs are constrained to the same power budget. Our study found that heterogeneous CMPs can exploit a serial portion of execution as small as 2% of total run time to provide a performance benefit. This suggests that heterogeneous computing can help mitigate the cost of leaving some portions of an application unparallelized when the programming effort would not yield a sufficient return on investment.
hpc workloads,execution profile,execution time,heterogeneous many-core cmp design,heterogeneous computing,many-core cmps,homogeneous cmps,serial portion,massive parallel execution,heterogeneous cmp,heterogeneous cmps,performance benefits,instruction sets,amdahl law,computational modeling,hpc,processor cores,servers,benchmark testing,transistors,throughput,parallel processing
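
The Amdahl's law argument in the abstract can be made concrete with a small sketch. The code below is not the paper's spreadsheet model, which the abstract does not describe in detail. It is a textbook-style illustration under stated assumptions: the function names, the core counts, and the big core's cost and speed are all hypothetical, and the equal budget is expressed in abstract "small-core equivalents" rather than in watts.

```python
def amdahl_speedup(serial_fraction: float, n_cores: int) -> float:
    """Classic Amdahl's law for n identical cores of unit performance."""
    if not 0.0 <= serial_fraction <= 1.0:
        raise ValueError("serial_fraction must be in [0, 1]")
    if n_cores < 1:
        raise ValueError("n_cores must be at least 1")
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / n_cores)


def heterogeneous_speedup(serial_fraction: float,
                          n_small_cores: int,
                          big_core_perf: float) -> float:
    """Speedup for one big core plus n_small_cores unit-performance cores.

    Assumption (not from the paper): the serial portion runs on the big
    core at big_core_perf times unit speed, and the parallel portion is
    spread perfectly over the small cores and the big core together.
    """
    if not 0.0 <= serial_fraction <= 1.0:
        raise ValueError("serial_fraction must be in [0, 1]")
    if n_small_cores < 0 or big_core_perf <= 0.0:
        raise ValueError("invalid core configuration")
    parallel_throughput = n_small_cores + big_core_perf
    serial_time = serial_fraction / big_core_perf
    parallel_time = (1.0 - serial_fraction) / parallel_throughput
    return 1.0 / (serial_time + parallel_time)


if __name__ == "__main__":
    # Hypothetical budget of 64 small-core equivalents. The big core is
    # assumed to cost 4 of them and to run serial code 2x faster.
    budget, big_cost, big_perf = 64, 4, 2.0
    for s in (0.0, 0.02, 0.10):
        homo = amdahl_speedup(s, budget)
        hetero = heterogeneous_speedup(s, budget - big_cost, big_perf)
        print(f"serial={s:.2f}  homogeneous={homo:.2f}  "
              f"heterogeneous={hetero:.2f}  ratio={hetero / homo:.2f}")
```

Under these assumed parameters, the homogeneous design wins only when the program is perfectly parallel, because the big core gives up some parallel throughput; even a small serial fraction tips the balance toward the heterogeneous design. The numbers printed here illustrate the trade-off only and should not be read as reproducing the paper's measured results.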