Micro-Kernels for Portable and Efficient Matrix Multiplication in Deep Learning
Research Square (2022)
Abstract
We demonstrate that it is possible to rapidly assemble an ample variety of high-performance micro-kernels for the general matrix multiplication (gemm) using vector intrinsics to exploit the SIMD (single instruction, multiple data) units in current general-purpose processors. For the particular type of applications arising in deep learning, our experiments show that the intrinsics-based micro-kernels can deliver efficiency on par with, or even higher than, the conventional, carefully tuned implementations of gemm in current linear algebra libraries for ARM-based processors equipped with 128-bit SIMD units and, to a lesser extent, in processors with 512-bit SIMD units.
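To illustrate the kind of micro-kernel the abstract describes, below is a minimal sketch of a 4x4 gemm micro-kernel written with ARM NEON intrinsics for a 128-bit SIMD unit (AArch64). It assumes BLIS-style packing, with A stored as 4-row column-major micro-panels and B as 4-column row-major micro-panels; the function name and packing layout are illustrative assumptions, not the authors' actual code.

// Minimal 4x4 gemm micro-kernel sketch using ARM NEON intrinsics
// (AArch64, 128-bit SIMD). Assumed packing: A holds kc columns of
// 4 contiguous elements, B holds kc rows of 4 contiguous elements.
#include <arm_neon.h>

// C (4x4, column-major, leading dimension ldc) += A_panel * B_panel.
void ukernel_4x4(int kc, const float *A, const float *B,
                 float *C, int ldc) {
    // One accumulator vector register per column of the 4x4 block.
    float32x4_t c0 = vld1q_f32(&C[0 * ldc]);
    float32x4_t c1 = vld1q_f32(&C[1 * ldc]);
    float32x4_t c2 = vld1q_f32(&C[2 * ldc]);
    float32x4_t c3 = vld1q_f32(&C[3 * ldc]);

    for (int k = 0; k < kc; k++) {
        float32x4_t a = vld1q_f32(&A[4 * k]);  // column k of A panel
        float32x4_t b = vld1q_f32(&B[4 * k]);  // row k of B panel
        // Rank-1 update of the block: c_j += a * b[j], using fused
        // multiply-add with a broadcast lane of b.
        c0 = vfmaq_laneq_f32(c0, a, b, 0);
        c1 = vfmaq_laneq_f32(c1, a, b, 1);
        c2 = vfmaq_laneq_f32(c2, a, b, 2);
        c3 = vfmaq_laneq_f32(c3, a, b, 3);
    }

    vst1q_f32(&C[0 * ldc], c0);
    vst1q_f32(&C[1 * ldc], c1);
    vst1q_f32(&C[2 * ldc], c2);
    vst1q_f32(&C[3 * ldc], c3);
}

The key design point, reflected in the paper's approach, is that the kc-loop body is a short sequence of loads and fused multiply-adds over vector registers, so the same pattern can be re-instantiated quickly for other micro-kernel shapes (e.g., 8x4 or 4x8) or other SIMD widths.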
Keywords
efficient matrix multiplication, deep learning, portable, micro-kernels