Compiling C/C++ SIMD Extensions for Function and Loop Vectorizaion on Multicore-SIMD Processors

Parallel and Distributed Processing Symposium Workshops & PhD Forum (2012)

SIMD vectorization has received significant attention over the past decade as an important method for accelerating scientific, media, and embedded applications on SIMD architectures such as Intel® SSE and AVX and IBM AltiVec. However, most of that attention has been directed at loops: their iterations are executed concurrently on multiple SIMD lanes, relying on program hints and compiler analysis. This paper presents a set of new high-level C/C++ vector extensions for SIMD programming, together with the Intel® C++ product compiler, which has been extended to translate these extensions and produce optimized SIMD instruction sequences for vectorized functions and loops. For a function, the main idea is to vectorize the entire function for its callers rather than only the loops (if any) inside it. This poses two challenges: handling complicated control flow in the function body, and matching caller and callee for SIMD vector calls while vectorizing both the caller functions (or loops) and the callee functions. We describe our methods for automatically compiling the vector extensions. We present performance results for several non-trivial visual computing, computational, and simulation workloads that use the SIMD units through the vector extensions on Intel® multicore processors with 128-bit SIMD, and show significant SIMD speedups (3.07x to 4.69x) over serial execution.
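The abstract does not show the syntax of the extensions, so the following is only a rough sketch of the whole-function vectorization idea, written with the standard OpenMP `declare simd` and `simd` pragmas as a stand-in rather than the paper's own notation. The names `clamp_scale` and `apply` are hypothetical. Without an OpenMP-SIMD compiler flag, the pragmas are ignored and the code runs serially with the same results.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// A scalar-looking function with control flow in its body. The pragma asks
// the compiler to also generate a SIMD variant that handles several values
// of x per call; `threshold` is declared uniform (same value in every lane).
// How the branch is vectorized (e.g. masking or blending) is up to the compiler.
#pragma omp declare simd uniform(threshold) notinbranch
float clamp_scale(float x, float threshold) {
    if (x > threshold) {
        return threshold;   // inputs above the threshold are clamped
    }
    return 2.0f * x;        // all other inputs are doubled
}

// Caller loop. With the simd hint, a vectorizing compiler may replace the
// scalar call with a call to the SIMD variant of clamp_scale, which is the
// caller/callee matching problem mentioned in the abstract.
void apply(const std::vector<float>& in, std::vector<float>& out,
           float threshold) {
    assert(in.size() == out.size());
    const float* src = in.data();
    float* dst = out.data();
    const std::size_t n = in.size();
    #pragma omp simd
    for (std::size_t i = 0; i < n; ++i) {
        dst[i] = clamp_scale(src[i], threshold);
    }
}
```

Calling `apply(in, out, 5.0f)` on an input vector fills `out` element by element. Whether the compiler actually emits and calls a vector variant depends on the compiler, its flags, and the target instruction set.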
simd vectorization,utilizing simd unit,simd vector call,simd instruction sequence,multicore-simd processors,compiling c,loop vectorizaion,high-level vector extension,multiple simd lane,significant simd speedup,multicore 128-bit simd processor,simd programming,simd extensions,vector extension,compiler,vectors,embedded systems,multicore,iterations,hardware,iterative methods,parallel processing,cloning,programming,simd,vectorization