Micro-Kernels for Portable and Efficient Matrix Multiplication in Deep Learning
Research Square (2022)
Abstract
We demonstrate that it is possible to rapidly assemble an ample variety of high-performance micro-kernels for the general matrix multiplication (gemm) using vector intrinsics to exploit the SIMD (single instruction, multiple data) units in current general-purpose processors. For the particular type of applications arising in deep learning, our experiments show that the intrinsics-based micro-kernels can deliver efficiency on par with, or even higher than, the conventional, carefully tuned implementations of gemm in current linear algebra libraries for ARM-based processors equipped with 128-bit SIMD units and, to a lesser extent, in processors with 512-bit SIMD units.
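To illustrate the kind of micro-kernel the abstract describes, below is a minimal sketch of a 4x4 gemm micro-kernel written with ARM NEON intrinsics for a 128-bit SIMD unit (AArch64). It assumes BLIS-style packing, with A stored as 4-row column-major micro-panels and B as 4-column row-major micro-panels; the function name and packing layout are illustrative assumptions, not the authors' actual code.

// Minimal 4x4 gemm micro-kernel sketch using ARM NEON intrinsics
// (AArch64, 128-bit SIMD). Assumed packing: A holds kc columns of
// 4 contiguous elements, B holds kc rows of 4 contiguous elements.
#include <arm_neon.h>

// C (4x4, column-major, leading dimension ldc) += A_panel * B_panel.
void ukernel_4x4(int kc, const float *A, const float *B,
                 float *C, int ldc) {
    // One accumulator vector register per column of the 4x4 block.
    float32x4_t c0 = vld1q_f32(&C[0 * ldc]);
    float32x4_t c1 = vld1q_f32(&C[1 * ldc]);
    float32x4_t c2 = vld1q_f32(&C[2 * ldc]);
    float32x4_t c3 = vld1q_f32(&C[3 * ldc]);

    for (int k = 0; k < kc; k++) {
        float32x4_t a = vld1q_f32(&A[4 * k]);  // column k of A panel
        float32x4_t b = vld1q_f32(&B[4 * k]);  // row k of B panel
        // Rank-1 update of the block: c_j += a * b[j], using fused
        // multiply-add with a broadcast lane of b.
        c0 = vfmaq_laneq_f32(c0, a, b, 0);
        c1 = vfmaq_laneq_f32(c1, a, b, 1);
        c2 = vfmaq_laneq_f32(c2, a, b, 2);
        c3 = vfmaq_laneq_f32(c3, a, b, 3);
    }

    vst1q_f32(&C[0 * ldc], c0);
    vst1q_f32(&C[1 * ldc], c1);
    vst1q_f32(&C[2 * ldc], c2);
    vst1q_f32(&C[3 * ldc], c3);
}

The key design point, reflected in the paper's approach, is that the kc-loop body is a short sequence of loads and fused multiply-adds over vector registers, so the same pattern can be re-instantiated quickly for other micro-kernel shapes (e.g., 8x4 or 4x8) or other SIMD widths.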
Keywords
efficient matrix multiplication, deep learning, portable, micro-kernels