Accelerating Graph and Machine Learning Workloads Using a Shared Memory Multicore Architecture with Auxiliary Support for In-hardware Explicit Messaging

2017 IEEE International Parallel and Distributed Processing Symposium (IPDPS), 2017

Abstract
Shared memory stands out as a sine qua non for parallel programming of many commercial and emerging multicore processors. It optimizes the patterns of communication that benefit common programming styles. Now that parallel programming is mainstream, those programming styles are challenged by emerging applications that communicate frequently and involve large amounts of data. Such applications include graph analytics and machine learning, the focus of this paper. We retain the shared memory model and introduce a set of lightweight in-hardware explicit messaging instructions in the instruction set architecture (ISA). A set of auxiliary communication models is proposed that uses explicit messages to accelerate synchronization primitives and to efficiently move computation toward data. Results on a simulated 256-core multicore show that the proposed communication models improve performance by an average of 4× and reduce dynamic energy by an average of 42% over traditional shared memory.
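The idea of moving computation toward data via explicit messages can be illustrated with a small software analogy (this is a hedged sketch, not the paper's ISA or simulator): instead of every core acquiring a lock and performing a read-modify-write on a shared datum over cache coherence, each worker sends a lightweight message to the datum's "home core", which applies the update locally. The `home_core`, `worker`, and queue-based messaging below are all illustrative stand-ins for the in-hardware send/receive instructions described in the paper.

```python
# Illustrative sketch (not the paper's ISA): workers send update messages to
# the core that owns the data, so the computation executes at the data's home
# and no lock or coherence traffic on the datum itself is required.
import queue
import threading

NUM_WORKERS = 4
UPDATES_PER_WORKER = 1000

def home_core(inbox, result):
    """Owns the counter; applies remote update requests locally."""
    count = 0
    done = 0
    while done < NUM_WORKERS:
        msg = inbox.get()          # analogous to a hardware receive instruction
        if msg == "done":
            done += 1
        else:
            count += msg           # computation happens at the data's home core
    result.append(count)

def worker(inbox):
    """Sends increments as messages instead of locking shared state."""
    for _ in range(UPDATES_PER_WORKER):
        inbox.put(1)               # analogous to a lightweight send instruction
    inbox.put("done")

def run():
    inbox, result = queue.Queue(), []
    home = threading.Thread(target=home_core, args=(inbox, result))
    home.start()
    workers = [threading.Thread(target=worker, args=(inbox,))
               for _ in range(NUM_WORKERS)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    home.join()
    return result[0]
```

In hardware, the same pattern avoids lock contention and the coherence-protocol round trips of a shared counter; here the single consumer thread plays the role of the home core's message handler.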
Keywords
parallel programming, programming styles, graph analytics, machine learning workloads, shared memory multicore architecture, lightweight in-hardware explicit messaging instructions, instruction set architecture, auxiliary communication models, synchronization primitives, simulated multicore, performance improvement, dynamic energy