Breaking SIMD shackles with an exposed flexible microarchitecture and the access execute PDG

Parallel Architectures and Compilation Techniques (2013)

Abstract
Modern microprocessors exploit data-level parallelism through in-core data-parallel accelerators in the form of short vector ISA extensions such as SSE/AVX and NEON. Although these ISA extensions have existed for decades, compilers do not generate good-quality, high-performance vectorized code without significant programmer intervention and manual optimization. The fundamental problem is that the architecture is too rigid, which overly complicates the compiler's role and simultaneously restricts the types of codes that the compiler can profitably map to these data-parallel accelerators. We take a fundamentally new approach that first makes the architecture more flexible and exposes this flexibility to the compiler. Counter-intuitively, increasing the complexity of the accelerator's interface to the compiler enables a more robust and efficient system that supports many types of codes. This system also enables the performance of auto-acceleration to be comparable to that of manually optimized implementations. To address the challenges of compiling for flexible accelerators, we propose a variant of the Program Dependence Graph, called the Access Execute Program Dependence Graph, to capture spatio-temporal aspects of memory accesses and computations. We implement a compiler that uses this representation and evaluate it on both a suite of kernels developed and tuned for SSE and a set of “challenge” data-parallel applications, the Parboil benchmarks. We show that our compiler, which targets the DySER accelerator, provides high-quality code for the kernels and full applications, commonly reaching within 30% of manually optimized implementations and outperforming compiler-produced SSE code by 1.8×.
Keywords
graph theory, microprocessor chips, optimisation, parallel processing, program compilers, DySER accelerator, PDG, SIMD shackles, access execute program dependence graph, compilers, data level parallelism, data-parallel accelerators, flexible microarchitecture, microprocessors, optimization, spatio-temporal aspects, vector ISA extensions, Accelerators, DySER, SIMD, Vectorization