Micro-Kernels for Portable and Efficient Matrix Multiplication in Deep Learning

Research Square (2022)

Abstract

We demonstrate that it is possible to rapidly assemble an ample variety of high-performance micro-kernels for the general matrix multiplication (gemm) using vector intrinsics to exploit the SIMD (single instruction, multiple data) units in current general-purpose processors. For the particular type of applications arising in deep learning, our experiments expose that the intrinsics-based micro-kernels can deliver efficiency on par with or even higher than the conventional, carefully tuned implementations of gemm in current linear algebra libraries for ARM-based processors equipped with 128-bit SIMD units and, to a lesser extent, in processors with 512-bit SIMD units.
Keywords

efficient matrix multiplication, deep learning, portable, micro-kernels