Mapping Streaming Applications onto GPU Systems

IEEE Trans. Parallel Distrib. Syst. (2014)

Cited by 11 | Views 40
Abstract
Graphics processing units leverage a large array of parallel processing cores to boost the performance of the streaming computation patterns frequently found in graphics applications. Unfortunately, while many other general-purpose applications also exhibit streaming behavior, they possess unfavorable data layouts and poor computation-to-communication ratios that may penalize any straightforward GPU implementation. In this paper we describe a performance-driven code generation framework that maps general-purpose streaming applications onto GPU systems. This automated framework takes into account the idiosyncrasies of the GPU pipeline and its unique memory hierarchy. The framework has been implemented as a back end to the StreamIt programming language compiler. Several key features of this framework ensure maximized performance and scalability. First, the generated code increases the effectiveness of the on-chip memory hierarchy by employing a heterogeneous mix of compute and memory-access threads. Our scheme goes against the conventional wisdom of GPU programming, which is to use a large number of homogeneous threads. Second, we utilize an efficient stream graph partitioning algorithm to handle larger applications and achieve the best performance under the given on-chip memory constraints. Lastly, the framework maps complex applications onto multiple GPUs using a highly effective parallel execution scheme. Our comprehensive experiments show its scalability and significant speedup compared to a previous state-of-the-art solution.
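To make the partitioning idea concrete: the abstract describes splitting a stream graph into pieces that fit within on-chip memory constraints. The sketch below is not the paper's algorithm; it is a minimal greedy illustration, assuming a linear pipeline of filters with known per-filter memory footprints and a hypothetical 48 KB shared-memory budget per partition.

```python
# Hypothetical sketch: greedily pack a linear pipeline of stream filters
# into contiguous segments, each fitting within an on-chip memory budget.
# This illustrates the *kind* of constraint involved; the paper's actual
# partitioner is performance-driven and more sophisticated.

def partition_pipeline(filter_sizes, budget):
    """Split a pipeline (list of per-filter memory footprints, in bytes)
    into contiguous segments whose total footprint fits the budget.
    Returns a list of segments, each a list of filter indices."""
    segments, current, used = [], [], 0
    for i, size in enumerate(filter_sizes):
        if size > budget:
            raise ValueError(f"filter {i} alone exceeds the budget")
        if used + size > budget:
            # Current segment is full; start a new one.
            segments.append(current)
            current, used = [], 0
        current.append(i)
        used += size
    if current:
        segments.append(current)
    return segments

# Example: five filters against a 48 KB budget (an illustrative figure).
print(partition_pipeline([16_384, 20_480, 8_192, 32_768, 4_096], 49_152))
# → [[0, 1, 2], [3, 4]]
```

A real partitioner would also weigh the communication cost introduced at each segment boundary, since cutting the graph forces intermediate data through slower off-chip memory.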
Keywords
gpu pipeline, gpu systems, gpu, streamit, streaming application, graphics processing units, gpu programming, streamit programming language compiler, general purpose streaming applications, streaming application mapping, scalable, memory access threads, multi-gpu, stream graph partitioning algorithm, on-chip memory hierarchy, performance-driven code generation framework, program compilers, pipeline processing, memory management, steady state, schedules, layout, parallel processing, instruction sets