SWIRL++: Evaluating Performance Models to Guide Code Transformation in Convolutional Neural Networks

Languages and Compilers for Parallel Computing, LCPC 2019 (2021)

Abstract
Convolutional Neural Networks (CNNs) are ubiquitous in applications ranging from self-driving cars to various branches of health care. CPUs with large core counts and wide SIMD support are used in HPC clusters and supercomputers; therefore, high-performance CPU implementations of CNNs are valuable, in addition to the more prevalent GPU implementations. In this paper, we describe SWIRL++, an optimization approach for CNNs that incorporates an analytical performance model to identify optimization strategies that minimize the data movement overheads of CNN execution. We integrate the model with the SWIRL DSL compiler to automatically generate high-performance implementations of CNNs, optimized for cache hierarchies and for both thread-level and SIMD parallelism. We compare the performance of the generated code with TensorFlow integrated with Intel's MKL-DNN library (TF-MKL), and with PyTorch, on an Intel Xeon 8280 Cascade Lake platform. Performance exceeds PyTorch by 2x on average, and is comparable on average to both TF-MKL and the SWIRL compiler, showing that an automated code optimization approach achieves performance comparable to hand-tuned libraries and DSL compiler techniques.
Keywords
Optimizing compilers, Convolutional neural networks, Performance models, Autotuning