Data-Aware Scheduling Of Legacy Kernels On Heterogeneous Platforms With Distributed Memory

SPAA (2010)

Abstract
In this paper, we describe a runtime that automatically enhances the performance of applications running on heterogeneous platforms consisting of a multi-core CPU and a throughput-oriented many-core GPU. The CPU and GPU are connected by a non-coherent interconnect such as PCI-E and therefore do not share memory. Heterogeneous platforms available today, such as [9], are of this type. Our goal is to enable the programmer to use such a system seamlessly, without rewriting the application and with minimal knowledge of the underlying architectural details. Assuming that applications perform function calls to computational kernels with available CPU and GPU implementations, our runtime achieves this goal by automatically scheduling the kernels and managing data placement. In particular, it intercepts function calls to well-known computational kernels and schedules them on the CPU or GPU based on their argument size and location. To improve performance, it defers all data transfers between the CPU and the GPU until they are necessary. By managing data placement transparently to the programmer, it provides a unified memory view despite the underlying separate memory subsystems. We experimentally evaluate our runtime on a heterogeneous platform consisting of a 2.5 GHz quad-core Xeon CPU and an NVIDIA C870 GPU. Given array sorting, parallel reduction, dense and sparse matrix operations, and ranking as computational kernels, we use our runtime to automatically retarget SSI [25], K-means [32], and two synthetic applications to this platform with no code changes. We find that, in most cases, performance improves if the computation is moved to the data, and not vice versa.
For instance, even if a particular invocation of a kernel is slower on the GPU than on the CPU, the overall application may be faster if the kernel is scheduled on the GPU anyway, especially if the kernel's data already resides in GPU memory because of prior scheduling decisions. Our results show that data-aware CPU/GPU scheduling improves performance by up to 25% over the best data-agnostic scheduling on the same platform.
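The scheduling idea described above can be illustrated with a minimal Python sketch. It assumes a simple greedy cost model (estimated kernel time on each device plus the time to copy any argument not already resident there) and a hypothetical per-kernel time estimator. The names `Buffer`, `Kernel`, `Runtime`, `est_time`, and `PCIE_BYTES_PER_S`, as well as all timing constants, are illustrative placeholders, not the paper's API or measured values. Transfers are deferred: a buffer is copied only when a kernel scheduled on the other device, or a host read, actually needs it.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Set

# Hypothetical interconnect bandwidth; a placeholder, not a measured PCI-E figure.
PCIE_BYTES_PER_S = 2e9


@dataclass
class Buffer:
    name: str
    nbytes: int
    # Memories ("cpu" / "gpu") currently holding an up-to-date copy of this array.
    valid: Set[str] = field(default_factory=lambda: {"cpu"})


@dataclass
class Kernel:
    impls: Dict[str, Callable]              # device -> implementation (hypothetical)
    est_time: Callable[[int, str], float]   # (argument bytes, device) -> estimated seconds


class Runtime:
    def __init__(self, bandwidth: float = PCIE_BYTES_PER_S):
        self.bandwidth = bandwidth
        self.kernels: Dict[str, Kernel] = {}
        self.transfers: List[str] = []       # log of copies actually performed

    def register(self, name: str, kernel: Kernel) -> None:
        self.kernels[name] = kernel

    def _transfer_cost(self, args: List[Buffer], device: str) -> float:
        # Only arguments without a valid copy on `device` would need to move.
        missing = sum(b.nbytes for b in args if device not in b.valid)
        return missing / self.bandwidth

    def _choose_device(self, kernel: Kernel, args: List[Buffer]) -> str:
        # Data-aware choice: kernel time plus the cost of bringing data to the device.
        total = sum(b.nbytes for b in args)
        cost = {d: kernel.est_time(total, d) + self._transfer_cost(args, d)
                for d in kernel.impls}
        return min(cost, key=cost.get)

    def _ensure_on(self, buf: Buffer, device: str) -> None:
        # Deferred transfer: copy only at the moment the data is needed on `device`.
        if device not in buf.valid:
            src = next(iter(buf.valid))
            self.transfers.append(f"{buf.name}: {src}->{device}")
            buf.valid.add(device)

    def call(self, name: str, inputs: List[Buffer], output: Buffer) -> str:
        # Intercepted kernel call: pick a device, stage inputs lazily, run, track output.
        kernel = self.kernels[name]
        device = self._choose_device(kernel, inputs)
        for b in inputs:
            self._ensure_on(b, device)
        kernel.impls[device](inputs, output)
        output.valid = {device}  # the result is valid only where it was produced
        return device

    def read_on_host(self, buf: Buffer) -> None:
        # A host access is where a deferred result must finally be copied back.
        self._ensure_on(buf, "cpu")


# Usage with no-op kernels and hypothetical per-byte time estimates.
noop = lambda inputs, out: None
rt = Runtime()
# Assumed: sort runs 2x faster on the GPU; reduce runs 1.5x slower on the GPU.
rt.register("sort", Kernel({"cpu": noop, "gpu": noop},
                           lambda n, d: n * (2e-9 if d == "cpu" else 1e-9)))
rt.register("reduce", Kernel({"cpu": noop, "gpu": noop},
                             lambda n, d: n * (1e-10 if d == "cpu" else 1.5e-10)))

a = Buffer("a", 64 << 20)          # 64 MiB array, initially in host memory
s = Buffer("s", 4)                 # scalar result of the reduction
d_sort = rt.call("sort", [a], a)   # in-place sort; a is copied to the GPU
d_reduce = rt.call("reduce", [a], s)
rt.read_on_host(s)                 # only the 4-byte result comes back
```

In the usage example, the reduction is estimated to be slower on the GPU in isolation, yet the sketch schedules it there because its input already resides in GPU memory after the sort; a choice based on kernel time alone would pick the CPU and pay for copying the 64 MiB array back. The per-call greedy decision is a simplification and does not look ahead to future kernel calls.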
Keywords
Heterogeneous platforms, multi-core processors, accelerators, distributed memory, runtime