Multi-GPU work sharing in a task-based dataflow programming model

Future Generation Computer Systems (2024)

Abstract
Today, multi-GPU computing nodes are the mainstay of most high-performance computing systems. Despite significant progress in programmability, building an application that efficiently utilizes all the GPUs in a computing node remains a major challenge, especially with the existing shared-memory and message-passing paradigms. In this context, the task-based dataflow programming model has emerged as an alternative for multi-GPU computing nodes. Most task-based dataflow runtimes perform dynamic task mapping, where tasks are mapped to different GPUs based on the current load, but once the mapping has been established there is no rebalancing of tasks even if an imbalance is detected. In this paper, we examine how automatic dynamic work sharing between GPUs within a compute node can improve application performance through better workload distribution. We demonstrate the performance improvement from dynamic work sharing using a Block-Sparse GEneral Matrix Multiplication (BSpGEMM) benchmark. Although we use PaRSEC, a task-based dataflow runtime, as the vehicle for this research, the ideas discussed here are transferable to any task-based dataflow runtime.
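
To make the idea concrete, the following is a minimal conceptual sketch of GPU-level work sharing, not PaRSEC's actual API or the paper's implementation: each GPU is modeled as a worker thread draining its own task queue, and an idle worker pulls one task from the most heavily loaded peer instead of sitting idle after the initial mapping. The GPU count, task costs, and deliberately skewed initial distribution below are hypothetical illustration values.

```cpp
#include <chrono>
#include <cstdio>
#include <deque>
#include <mutex>
#include <thread>
#include <vector>

struct Task {
    int id;
    std::chrono::microseconds cost;  // stand-in for a GEMM on one sparse block
};

struct GpuQueue {
    std::mutex m;
    std::deque<Task> tasks;

    size_t size() {
        std::lock_guard<std::mutex> g(m);
        return tasks.size();
    }
    bool pop(Task& t) {
        std::lock_guard<std::mutex> g(m);
        if (tasks.empty()) return false;
        t = tasks.front();
        tasks.pop_front();
        return true;
    }
    void push(const Task& t) {
        std::lock_guard<std::mutex> g(m);
        tasks.push_back(t);
    }
};

// Worker loop for one (simulated) GPU: execute local tasks first; when the
// local queue is empty, share work by taking one task from the busiest peer.
void gpu_worker(int gpu, std::vector<GpuQueue>& queues) {
    int executed = 0;
    for (;;) {
        Task t;
        if (!queues[gpu].pop(t)) {
            int victim = -1;
            size_t best = 0;
            for (int g = 0; g < (int)queues.size(); ++g) {
                size_t n = queues[g].size();
                if (g != gpu && n > best) { best = n; victim = g; }
            }
            if (victim < 0) break;                 // no pending work anywhere
            if (!queues[victim].pop(t)) continue;  // lost the race, rescan
        }
        std::this_thread::sleep_for(t.cost);       // simulate kernel execution
        ++executed;
    }
    std::printf("GPU %d executed %d tasks\n", gpu, executed);
}

int main() {
    const int num_gpus = 4;  // hypothetical multi-GPU node
    std::vector<GpuQueue> queues(num_gpus);

    // Deliberately imbalanced initial mapping: GPU 0 receives most of the
    // work, mimicking a skewed block-sparse structure.
    int id = 0;
    for (int i = 0; i < 400; ++i)
        queues[0].push({id++, std::chrono::microseconds(200)});
    for (int g = 1; g < num_gpus; ++g)
        for (int i = 0; i < 50; ++i)
            queues[g].push({id++, std::chrono::microseconds(200)});

    std::vector<std::thread> workers;
    for (int g = 0; g < num_gpus; ++g)
        workers.emplace_back(gpu_worker, g, std::ref(queues));
    for (auto& w : workers) w.join();
    return 0;
}
```

Built with, for example, `g++ -std=c++11 -pthread`, the per-GPU task counts printed at the end come out roughly even despite the skewed initial mapping, which is the effect the paper targets with runtime-level work sharing.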
Keywords
Tasks, Runtime, Work sharing, PaRSEC, GPU, Load balancing