LLC Buffer for Arbitrary Data Sharing in Heterogeneous Systems

2016 IEEE 18th International Conference on High Performance Computing and Communications; IEEE 14th International Conference on Smart City; IEEE 2nd International Conference on Data Science and Systems (HPCC/SmartCity/DSS), 2016

Abstract
A closely coupled CPU-GPGPU system with a shared last-level cache (LLC) enables fine-grained data exchange. However, the traditional cache-based data exchange causes unnecessary LLC misses and degrades overall system performance. We believe that the cache organization is not suited to temporary data exchange in such a closely coupled system. We analyze the memory access pattern and identify the inefficiency of this data exchange. When the exchanged data cannot fit in the LLC, the low LLC hit rate exacerbates core stalls and memory contention. We also show that these stalls cannot be entirely hidden by increasing the compute load or parallelism. In previous work, a simple LLC buffer was introduced that replaces the cache with an architecture-supported data queue. However, that simple design limits the data element size and requires potentially very large storage for pending requests. In this paper, we propose an improved LLC buffer. It adopts an element-atom data organization to enable data exchange of arbitrary size, together with a simple hardware-software collaborative protocol that eliminates pending requests. Experimental results show an average speedup of 48.2% over the traditional cache-based exchange, but a 7.5% slowdown relative to the simple LLC buffer due to the overhead of the protocol. We also compare it with the fine-grain task approach, which implements a data exchange channel between the CPU and GPGPU; the results show that the improved LLC buffer has lower storage overhead and higher access efficiency than the fine-grain task.
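To make the element-atom idea concrete, the sketch below packs a variable-size element into fixed-size atoms and publishes them into a shared ring buffer, with the producer checking free space itself so that no pending requests accumulate in hardware. This is a minimal software-only illustration: the atom size, queue layout, and names (llc_buffer_t, llc_buffer_push) are assumptions for this sketch, not the hardware interface described in the paper.

/* Hypothetical illustration of element-atom packing for an LLC-buffer-style
 * queue. ATOM_BYTES, QUEUE_ATOMS, and the helper names are assumptions. */
#include <stdint.h>
#include <string.h>
#include <stdbool.h>

#define ATOM_BYTES   64          /* assumed atom size: one LLC line          */
#define QUEUE_ATOMS  1024        /* assumed queue capacity in atoms          */

typedef struct {
    uint8_t  atoms[QUEUE_ATOMS][ATOM_BYTES];
    volatile uint32_t head;      /* consumer (GPGPU) read index              */
    volatile uint32_t tail;      /* producer (CPU) write index               */
} llc_buffer_t;

static uint32_t atoms_free(const llc_buffer_t *q)
{
    return QUEUE_ATOMS - (q->tail - q->head);
}

/* Software side of the collaborative protocol: the producer checks for free
 * space before writing, so a full queue never creates a pending request
 * that hardware would have to store. */
static bool llc_buffer_push(llc_buffer_t *q, const void *elem, size_t bytes)
{
    uint32_t need = (uint32_t)((bytes + ATOM_BYTES - 1) / ATOM_BYTES);
    if (atoms_free(q) < need)
        return false;            /* caller retries later; nothing is queued  */

    const uint8_t *src = elem;
    for (uint32_t i = 0; i < need; i++) {
        size_t chunk = (bytes > ATOM_BYTES) ? ATOM_BYTES : bytes;
        memcpy(q->atoms[(q->tail + i) % QUEUE_ATOMS], src, chunk);
        src   += chunk;
        bytes -= chunk;
    }
    q->tail += need;             /* publish all atoms of the element at once */
    return true;
}

In this sketch the availability check happens entirely in software, which is one plausible reading of how a hardware-software collaborative protocol can avoid the large pending-request storage of the simple LLC buffer design.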
Keywords
LLC buffer,arbitrary data sharing,heterogeneous systems,closely coupled CPU,GPGPU,shared last level cache,fine-grained data exchange,LLC misses,system performance,cache organization,temporary data exchange,closely coupled system,memory access pattern,LLC hit rate,memory contention,parallelism,architecture-supported data queue,data element size,hardware-software,fine-grain task,data exchange channel