Efficient parallelization of the Discrete Wavelet Transform algorithm using memory-oblivious optimizations

2015 25th International Workshop on Power and Timing Modeling, Optimization and Simulation (PATMOS), 2015

Abstract
As the rate of single-thread CPU performance improvement per generation has diminished due to slower transistor-speed scaling and energy-related issues, researchers and industry have shifted their interest towards multi-core and many-core architectures for improving performance. Comparisons between optimized applications for parallel architectures have been reported many times in the literature, but the results are contradictory, mainly because of biased methods of evaluating and comparing these architectures. In this paper, we present memory-oblivious optimizations of the widely used Discrete Wavelet Transform (DWT), and provide detailed comparisons of the algorithm on Intel and AMD multi-core CPUs, Nvidia many-core GPUs, and Intel's Xeon Phi many-core coprocessor. Our results indicate that, compared to their respective non-optimized single-thread implementations, memory-oblivious optimization delivers performance improvements ranging from 17.9× to 197.2× across the architectures examined. Furthermore, the presented CPU and GPU memory-oblivious implementations are 2.6× and 1.3× faster, respectively, than the fastest implementations of the DWT currently available in the literature. No comparison to the state of the art can be made for the Xeon Phi because, to the best of our knowledge, this is the first study that optimizes the DWT for this newfangled architecture.
Keywords
discrete wavelet transform algorithm, memory-oblivious optimizations, single-thread CPU performance improvement, transistor-speed scaling, energy related issues, multicore architectures, many-core architectures, parallel architectures, DWT, AMD multicore CPU, Intel multicore CPU, Intel Xeon Phi many-core coprocessor, nonoptimized single thread implementations, GPU memory-oblivious implementations, CPU memory-oblivious implementations