AVX overhead profiling: How much does your fast code slow you down?

APSys (2020)

Abstract
The AVX2 and AVX-512 instructions found in recent Intel CPUs can increase the performance of vectorized code. Their complexity and increased power consumption, however, cause the CPU to reduce its frequency. This frequency reduction can also affect parts of the workload that do not use AVX2 or AVX-512, and previous work reports an overall slowdown of more than 10% for various workloads with AVX-512-enabled parts. Countermeasures against this frequency reduction overhead exist, but they cause additional overhead of their own and are therefore only viable if the gains outweigh that cost. It is, however, often unclear how much AVX2/AVX-512 frequency reduction overhead is present. In this paper, we describe a sampling profiler that determines the magnitude of the overhead as an aid during software development or during the selection of countermeasures. Our profiler temporarily stops individual CPU cores to let them recover their maximum (non-AVX) frequency. It then observes whether the frequency is immediately reduced again once the workload is resumed, which indicates whether the previous frequency reduction was actually necessary. The resulting information is used to calculate the approximate AVX2/AVX-512 frequency reduction overhead. In the case of AVX-512, our prototype estimates the overhead with an average error of 1.2 percentage points for various benchmarks. We describe potential improvements to our design as well as a novel hardware-software interface that would allow more accurate measurement of the overhead.
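The sampling idea in the abstract can be illustrated with a small Python sketch. It is not the paper's implementation: the `Sample` record, its field names, and the `estimate_overhead` function are hypothetical, and the cost model is an assumption made here for illustration (each sample stands for an equal slice of runtime, and the time lost in a slice is proportional to the relative frequency drop). A sample counts toward the overhead only if the core was running below its maximum frequency and the frequency did not drop again right after the workload resumed, that is, the earlier reduction was no longer necessary.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Sample:
    # All fields are hypothetical names chosen for this sketch.
    freq_before_ghz: float       # frequency observed when the core was stopped
    max_freq_ghz: float          # maximum (non-AVX) frequency of the core
    dropped_after_resume: bool   # True if the frequency fell again right after resuming


def estimate_overhead(samples: List[Sample]) -> float:
    """Return the estimated fraction of runtime lost to unnecessary
    frequency reduction, under the simplified model described above."""
    if not samples:
        return 0.0
    lost = 0.0
    for s in samples:
        was_reduced = s.freq_before_ghz < s.max_freq_ghz
        # If the frequency drops again immediately, the reduction was still
        # needed (the code was using AVX), so it is not counted as overhead.
        if was_reduced and not s.dropped_after_resume:
            lost += 1.0 - s.freq_before_ghz / s.max_freq_ghz
    return lost / len(samples)


if __name__ == "__main__":
    samples = [
        Sample(3.0, 4.0, False),  # reduced, not needed: counts
        Sample(3.0, 4.0, True),   # reduced, still needed: ignored
        Sample(4.0, 4.0, False),  # running at maximum frequency: ignored
        Sample(2.0, 4.0, False),  # reduced, not needed: counts
    ]
    print(f"estimated overhead: {estimate_overhead(samples):.2%}")
```

A real profiler would also have to account for the cost of stopping the cores and for code whose speed does not scale linearly with frequency; this sketch ignores both.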