Impact of Vectorization and Multithreading on Performance and Energy Consumption on Jetson Boards

Sylvain Jubertie,Emmanuel Melin, Naly Raliravaka, Emmanuel Bodèle, Pablo Escot Bocanegra

2018 International Conference on High Performance Computing & Simulation (HPCS)(2018)

引用 1|浏览4
暂无评分
摘要
ARM processors are well known for their energy efficiency and are consequently widely used in embedded platforms. Like other processor architectures, they are built with different levels of parallelism, from Instruction Level Parallelism (out-of- order and superscalar capabilities) to Thread Level Parallelism (multicore), to increase their performance levels. These processors are now also targeting the HPC domain and will equip the Fujitsu Post-K supercomputer. Some ARM processors from the Cortex-A series, which equip smartphones and tablets, also provide Data Level Parallelism through SIMD units called NEON. These units are able to process 128-bit of data at a time, for example four 32bit floating point values. Taking advantage of these units requires code vectorization which may be performed automatically by the compiler or explicitly by using NEON intrinsics. Exploiting all these levels of parallelism may lead to better performance as well as a higher energy consumption. This is not an issue in the HPC domain where application development is driven by finding the best performance. However, developing for embedded applications is driven by finding the best trade-off between energy consumption and performance. In this paper, we propose to study the impact of vectorization and multithreading on both performance and energy consumption on some Nvidia Jetson boards. Results show that depending on the algorithm and on its implementation, vectorization may bring a similar speedup as an OpenMP scalar implementation but with a lower energy consumption. However, combining vectorization and multithreading may lead close to both the best performance level and the lowest energy consumption but not when running cores at their maximum frequencies.
更多
查看译文
关键词
digital signal processor,NEON intrinsics,multithreading,Fujitsu post-k supercomputer,thread level parallelism,instruction level parallelism,data level parallelism,smartphones,processor architectures,embedded platforms,energy efficiency,ARM processors,lower energy consumption,Nvidia Jetson boards,embedded applications,HPC domain,higher energy consumption,code vectorization,SIMD units,tablets
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要