Computing Group-By and Aggregates on Massively Parallel Systems.

2023 IEEE International Conference on Big Data (BigData) (2023)

Abstract
The Group-By/Aggregate operation stands as a pivotal element in query processing within database management systems. Commonly, this operation is executed through two primary methods: sorting the tuples and hashing the keys. While prior research has often favored hash-based implementations for GPU acceleration, sort-based approaches remain pertinent, particularly for datasets with high cardinality. Hash-based methods can become inefficient in managing memory under such conditions. In this study, we introduce a meticulously designed implementation based on radix-hashing, optimized specifically for GPUs, with a key focus on ensuring consistent performance, especially in scenarios involving high cardinalities. Empirical evaluations showcase that our design not only maintains performance consistency across varying levels of cardinality but also outperforms the current state-of-the-art hash-based GPU implementation by up to 10 times in terms of throughput. This performance improvement is most pronounced when dealing with high-cardinality datasets. Furthermore, our implementation’s performance is highly competitive with the state-of-the-art when processing datasets with low cardinality. These findings position our implementation as an excellent choice for executing the Group-By/Aggregate operation on high-cardinality datasets. Its consistent performance also makes it a robust candidate in scenarios where cardinality is uncertain or varies.
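The two strategies named in the abstract, hashing the keys and sorting the tuples, can be illustrated with a minimal sequential sketch. This is plain CPU Python for exposition only and is not the paper's GPU implementation; the function names `group_sum_hash` and `group_sum_sort` and the sample rows are hypothetical, and SUM is assumed as the aggregate.

```python
from itertools import groupby
from operator import itemgetter


def group_sum_hash(rows):
    """Hash-based Group-By/SUM: one pass over the input,
    one hash-table entry per distinct key."""
    totals = {}
    for key, value in rows:
        totals[key] = totals.get(key, 0) + value
    return totals


def group_sum_sort(rows):
    """Sort-based Group-By/SUM: sort tuples by key, then
    reduce each run of consecutive equal keys."""
    ordered = sorted(rows, key=itemgetter(0))
    return {
        key: sum(value for _, value in run)
        for key, run in groupby(ordered, key=itemgetter(0))
    }


# Illustrative input: (group key, value) pairs.
rows = [("a", 1), ("b", 2), ("a", 3), ("c", 4), ("b", 5)]

# Both strategies must produce the same aggregates.
assert group_sum_hash(rows) == group_sum_sort(rows)
```

In the hash-based version the table holds one entry per distinct key, so its size grows with cardinality. The sort-based version needs no per-key table, only a sorted copy of the input, which is one reason sorting is often considered when the number of groups is large.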
Keywords
Query Processing,Group-By/Aggregate,GPU