Predictive data locality optimization for higher-order tensor computations

PLDI(2021)

引用 5|浏览15
暂无评分
摘要
ABSTRACTAutomating locality optimization is still an open problem for compiler writers. Compiler-based approaches, guided by analytical cost models have achieved some success in matching high performance libraries on a restricted set of computations such as general matrix multiply (GEMM). On the other hand, library-based approaches may present some open scalability concerns. Recent developments in convolutional neural networks has seen an explosion of models, each with differing combinations of parameters. Manually tuning each of these configurations can take many development months. Further, these operations are called multiple times during machine learning training, which necessitates highly optimized implementations. 2D convolutional operators are unique in that they consist of 7-deep loop nests with different loops carrying reuse for different tensors, making the problem of identifying an optimal loop ordering hard. We devise a machine learning-based compiler which learns a regression model, correlating performance with the loop order. We integrate this model with other traditional compiler analysis for transformations such as loop unrolling and vectorization, relying on the MultiLevel Intermediate Representation (MLIR) compiler framework. We achieve an average speedup of 1.67x and 1.41x against oneDNN for 2D convolution forward and weight update kernels respectively. We are also at 0.88x and 0.96x the performance of oneDNN’s best performing implementation which applies additional data layout transformations.
更多
查看译文
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要