Parallel Prefix Sum with SIMD
ADMS@VLDB(2023)
Abstract
The prefix sum operation is a useful primitive with a broad range of
applications. For database systems, it is a building block of many important
operators including join, sort and filter queries. In this paper, we study
different methods of computing prefix sums with SIMD instructions and multiple
threads. For SIMD, we implement and compare horizontal and vertical
computations, as well as a theoretically work-efficient balanced tree version
using gather/scatter instructions. With multithreading, the memory bandwidth
can become the bottleneck of prefix sum computations. We propose a new method
that splits the data into cache-sized partitions to achieve better data
locality and reduce the bandwidth demanded from RAM. We also investigate four
different ways of organizing the computation sub-procedures, which have
different performance and usability characteristics. In the experiments we find
that the most efficient prefix sum computation using our partitioning technique
is up to 3x faster than two standard library implementations that already use
SIMD and multithreading.
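
To make the partitioning idea concrete, here is a minimal single-threaded sketch of a two-pass prefix sum processed one cache-sized partition at a time. It is an illustration under stated assumptions, not the authors' implementation: the function name `partitioned_prefix_sum` and the parameters `partition_size` and `workers` are hypothetical, the chunks that a real implementation would hand to threads are simply looped over sequentially, and no SIMD instructions are used. The point it tries to show is that the second pass re-reads a partition shortly after the first pass touched it, so that re-read can plausibly be served from cache instead of RAM.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not from the paper): in-place inclusive prefix sum,
// computed one partition at a time. Each partition is split into `workers`
// chunks; a real implementation could assign one chunk per thread.
void partitioned_prefix_sum(std::vector<int>& data,
                            std::size_t partition_size,
                            std::size_t workers) {
    assert(partition_size > 0 && workers > 0);
    int carry = 0;  // sum of all elements in earlier partitions

    for (std::size_t begin = 0; begin < data.size(); begin += partition_size) {
        const std::size_t end = std::min(begin + partition_size, data.size());
        const std::size_t chunk = (end - begin + workers - 1) / workers;

        // Pass 1: total of each chunk. This is the first read of the
        // partition, which is assumed to fit in cache afterwards.
        std::vector<int> offsets(workers, 0);
        for (std::size_t w = 0; w < workers; ++w) {
            const std::size_t lo = std::min(begin + w * chunk, end);
            const std::size_t hi = std::min(lo + chunk, end);
            for (std::size_t i = lo; i < hi; ++i) offsets[w] += data[i];
        }

        // Exclusive scan over the chunk totals, seeded with the carry,
        // gives each chunk its starting offset.
        int running = carry;
        for (std::size_t w = 0; w < workers; ++w) {
            const int total = offsets[w];
            offsets[w] = running;
            running += total;
        }

        // Pass 2: local prefix sums plus the chunk offset. This is the
        // second read of the same partition.
        for (std::size_t w = 0; w < workers; ++w) {
            const std::size_t lo = std::min(begin + w * chunk, end);
            const std::size_t hi = std::min(lo + chunk, end);
            int sum = offsets[w];
            for (std::size_t i = lo; i < hi; ++i) {
                sum += data[i];
                data[i] = sum;
            }
        }
        carry = running;
    }
}
```

In a sketch like this, `partition_size` would be chosen so that one partition fits in the cache level being targeted; the right value depends on the machine and is not specified here.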