A gem5 Implementation of the Sequential Codelet Model: Reducing Overhead and Expanding the Software Memory Interface.

SC-W '23: Proceedings of the SC '23 Workshops of The International Conference on High Performance Computing, Network, Storage, and Analysis (2023)

Abstract
Modern tasking models define applications in a fine-grained manner that necessitates lower overhead per segment of computation. Fine-grained tasks, if done right, enable higher utilization of many-core systems. While previous work has produced multiple implementations of hardware support for a variety of tasking models, many lack the support required by the rise of heterogeneity in high performance computing. Moreover, previously proposed hardware support falls short of supporting the expanding memory interfaces needed for data-centric workloads and memory utilization. In this paper, we propose and implement a hardware support scheme for the Sequential Codelet Model (SCM). The hardware support makes it possible to demonstrate SCM's potential advantage on heterogeneous workloads and SCM's capability of supporting the expanding software memory interface. This hardware support can be simulated at the desired scale in gem5 by designing and adding custom hardware modules to a conventional simulated system, thereby avoiding unnecessary development effort. The gem5 implementation of the Sequential Codelet Model serves as a foundation for demonstrating the benefits that the SCM program execution model (PXM) offers by moving hardware support closer to program semantics. The implementation uses conventional Intel Skylake CPU cores and caches while maintaining an open path to heterogeneity, streaming, and data recoding. Minimal hardware modules per core effectively reduce the overhead of the PXM runtime without removing the possibility of conventional program execution on the system. We compare the overhead against that of DARTS, a software implementation of the base Codelet Model that has been shown to be an effective vehicle for fine-grained execution, and show a 20x reduction in overhead. The increased efficiency allows even more finely grained programs to run effectively on the system.