A Highly Efficient SGEMM Implementation using DMA on the Intel/Movidius Myriad-2

2020 IEEE 32nd International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD), 2020

Abstract
Reducing energy consumption and achieving high energy efficiency in computation has become the top priority in High Performance Computing. High energy efficiency generally requires high resource utilization, since the energy demand of any application and architecture depends on active time. We show that, by using DMA, the 28 nm CMOS node Myriad-2 Vision Processing Unit can achieve 25 GFLOPs/W for FP32 matrix multiplication. Our main contributions are: (i) an analysis of data transfer needs for inner- and outer-product formulations of matrix multiplication with respect to the Myriad-2 memory hierarchy, (ii) an efficient use of DMA for managing matrix block transfers between on-chip and main memory, and (iii) a detailed analysis of the effects of matrix block shapes and DRAM page faults on performance and energy efficiency.
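For readers unfamiliar with the two formulations named in contribution (i), the following is a minimal sketch in plain Python. It is not the paper's implementation: it performs no blocking, no DMA transfers, and does not model the Myriad-2 memory hierarchy. The function names `matmul_inner` and `matmul_outer` are hypothetical and chosen only for illustration. The sketch shows that both loop orders compute the same product while touching the operands differently: the inner-product form finishes one output element at a time, whereas the outer-product form applies one rank-1 update to the whole output per step.

```python
def matmul_inner(A, B):
    """Inner-product form: C[i][j] is the dot product of row i of A and column j of B."""
    m, k, n = len(A), len(B), len(B[0])
    C = [[0.0] * n for _ in range(m)]
    for i in range(m):
        for j in range(n):
            acc = 0.0  # one output element is accumulated in a scalar
            for p in range(k):
                acc += A[i][p] * B[p][j]
            C[i][j] = acc  # each C element is written exactly once
    return C


def matmul_outer(A, B):
    """Outer-product form: C is the sum over p of (column p of A) times (row p of B)."""
    m, k, n = len(A), len(B), len(B[0])
    C = [[0.0] * n for _ in range(m)]
    for p in range(k):
        # One rank-1 update: every C element is read and updated once per p.
        for i in range(m):
            a_ip = A[i][p]
            for j in range(n):
                C[i][j] += a_ip * B[p][j]
    return C


if __name__ == "__main__":
    A = [[1.0, 2.0], [3.0, 4.0]]
    B = [[5.0, 6.0], [7.0, 8.0]]
    print(matmul_inner(A, B))
    print(matmul_outer(A, B))
```

In the inner-product loop order the accumulator stays in a scalar and each output element is stored once; in the outer-product order each step reads one column of A and one row of B and updates all of C. Which order moves less data on a given memory hierarchy is the kind of question the abstract's analysis addresses; this sketch makes no claim about that result.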
Keywords
SGEMM, Matrix multiplication, DMA, Myriad 2, Energy efficiency, High performance, Outer product formulation, DRAM page faults, High bandwidth