Enabling Efficient Random Access to Hierarchically Compressed Text Data on Diverse GPU Platforms

IEEE Transactions on Parallel and Distributed Systems (2023)

Abstract
The tremendous computing capacity of GPUs offers significant potential for processing hierarchically compressed text data without decompression. However, current GPU techniques support only traversal-based text analytics; random access is exceedingly inefficient, which severely limits their utility. To address this issue, we develop a novel and widely applicable solution that enables random access to hierarchically compressed text data in GPU memory without decompression. We address three main challenges. The first is designing GPU data structures that facilitate random access. The second is generating these data structures efficiently on the GPU: the CPU is inefficient at building random-access structures, and this inefficiency grows considerably once PCIe transfers are involved. The third is query processing over compressed text data in GPU memory, where random accesses such as data updates cause massive conflicts among threads. For the first challenge, we develop several compressed GPU data structures, including indexes tailored to the intricate GPU memory hierarchy. For the second challenge, we propose a two-phase process for producing these data structures on the GPU. For the third challenge, we propose a double-parsing design that avoids conflicts. We evaluate our solution on three platforms, two server-grade GPU platforms and one edge-grade GPU platform, using five real-world datasets. Experimental results show that random access operations on the GPU achieve an average speedup of 52.98x over the state-of-the-art solution.
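The abstract does not give implementation details, but the two-phase generation and double-parsing ideas resemble a common GPU pattern: a first parsing pass counts entries per chunk, a prefix sum turns counts into non-overlapping write offsets, and a second parsing pass writes results without any contention among threads. The CUDA sketch below illustrates only that generic pattern; the kernel names, chunking scheme, and index layout (count_records, write_offsets, newline-delimited records) are illustrative assumptions, not the paper's actual data structures or API.

```cuda
// Hypothetical two-pass sketch: count, prefix-sum, then write conflict-free.
// Names and layout are assumptions for illustration only.
#include <cstdio>
#include <cuda_runtime.h>
#include <thrust/device_vector.h>
#include <thrust/scan.h>

// Pass 1: each thread counts the newline-delimited records in its chunk.
__global__ void count_records(const char* text, const int* chunk_begin,
                              const int* chunk_end, int* counts, int n_chunks) {
    int c = blockIdx.x * blockDim.x + threadIdx.x;
    if (c >= n_chunks) return;
    int n = 0;
    for (int i = chunk_begin[c]; i < chunk_end[c]; ++i)
        if (text[i] == '\n') ++n;
    counts[c] = n;
}

// Pass 2: each thread re-parses its chunk and writes record offsets into the
// slots reserved for it by the prefix sum, so no two threads collide.
__global__ void write_offsets(const char* text, const int* chunk_begin,
                              const int* chunk_end, const int* base,
                              int* record_offsets, int n_chunks) {
    int c = blockIdx.x * blockDim.x + threadIdx.x;
    if (c >= n_chunks) return;
    int out = base[c];
    int start = chunk_begin[c];
    for (int i = chunk_begin[c]; i < chunk_end[c]; ++i)
        if (text[i] == '\n') { record_offsets[out++] = start; start = i + 1; }
}

int main() {
    const char h_text[] = "alpha\nbeta\ngamma\ndelta\n";
    int len = sizeof(h_text) - 1;
    // Two fixed chunks for illustration; real chunking would align to records.
    int h_begin[2] = {0, 11}, h_end[2] = {11, len};
    thrust::device_vector<char> text(h_text, h_text + len);
    thrust::device_vector<int> begin(h_begin, h_begin + 2), end(h_end, h_end + 2);
    thrust::device_vector<int> counts(2), base(2);

    count_records<<<1, 2>>>(thrust::raw_pointer_cast(text.data()),
                            thrust::raw_pointer_cast(begin.data()),
                            thrust::raw_pointer_cast(end.data()),
                            thrust::raw_pointer_cast(counts.data()), 2);
    // Exclusive scan converts per-chunk counts into per-chunk write bases.
    thrust::exclusive_scan(counts.begin(), counts.end(), base.begin());

    int total = counts[0] + counts[1];
    thrust::device_vector<int> offsets(total);
    write_offsets<<<1, 2>>>(thrust::raw_pointer_cast(text.data()),
                            thrust::raw_pointer_cast(begin.data()),
                            thrust::raw_pointer_cast(end.data()),
                            thrust::raw_pointer_cast(base.data()),
                            thrust::raw_pointer_cast(offsets.data()), 2);
    cudaDeviceSynchronize();
    for (int i = 0; i < total; ++i)
        printf("record %d starts at offset %d\n", i, (int)offsets[i]);
    return 0;
}
```

Because each thread's output range is fixed by the scan before the second pass begins, no atomics are needed during index construction; the paper's double-parsing design presumably targets the same goal of eliminating inter-thread write conflicts, though its concrete mechanism may differ.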
Keywords
Big data applications, data compression, parallel architectures, query processing, text analysis