Hybrid memory cube performance characterization on data-centric workloads.

SC(2015)

引用 42|浏览10
暂无评分
摘要
ABSTRACTThe Hybrid Memory Cube is an early commercial product embodying attributes of future stacked DRAM architectures, namely large capacity, high bandwidth, on-package memory controller, and high speed serial interface. We study the performance and energy of a Gen2 HMC on data-centric workloads through a combination of emulation and execution on an HMC FPGA board. An in-house FPGA emulator has been used to obtain memory traces for a small collection of data-centric benchmarks. Our FPGA emulator is based on a 32-bit ARM processor and non-intrusively captures complete memory access traces at only 20X slowdown from real time. We have developed tools to run combined trace fragments from multiple benchmarks on the HMC board, giving a unique capability to characterize HMC performance and power usage under a data parallel workload. We find that the HMC's separate read and write channels are not well exploited by read-dominated data-centric workloads. Our benchmarks achieve between 66% -- 80% of peak bandwidth (80 GB/s for 32-byte packets with 50--50 read/write mix) on the HMC, suggesting that combined read/write channels might show higher utilization on these access patterns. Bandwidth scales linearly up to saturation with increased demand on highly concurrent application workloads with many independent memory requests. There is a corresponding increase in latency, ranging from 80 ns on an extremely light load to 130 ns at high bandwidth.
更多
查看译文
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要