MulRF: A Multi-dimensional Range Filter for Sublinear Time Range Query Processing
IEEE Transactions on Knowledge and Data Engineering (2024)
Abstract
Range queries are an important operation on big multi-dimensional data. This paper studies multi-dimensional range query filtering, which speeds up range query processing by avoiding reads of useless data. Existing one-dimensional range filters cannot filter such queries efficiently, so a novel multi-dimensional range filter is proposed for this purpose. Based on this filter, an efficient range query processing algorithm is presented. It directly returns the locations of the I/O units that contain the data in the query result, without accessing the input dataset. The time complexity of the algorithm is $O(3^{m}h)$, where $h$ is the number of I/O units partially overlapping a range query and $m$ is the number of dimensions. Since $m$ is usually $o(\sqrt{\log n})$, the algorithm runs in sublinear time if $V=O(n)$, where $n$ is the size of the input dataset, $V=\prod_{i=1}^{m} d_{i}$, and $d_{i}$ is the number of distinct values in the $i$-th dimension of the dataset for $1 \leq i \leq m$. Experimental results show that the multi-dimensional range filter has a low false positive rate and good filtering efficiency. The proposed range query processing algorithm achieves at least a 3 to 7 times improvement over one-dimensional-filter-based algorithms on different datasets.
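The role of the condition on $m$ can be checked with a short calculation. This is our own illustration, not a step taken from the paper: rewriting $3^{m}$ as a power of 2 shows that the $3^{m}$ factor grows more slowly than any fixed power of $n$. The parameter $\delta$ below is introduced only for this illustration.

```latex
\begin{aligned}
3^{m} &= 2^{m\log_2 3}, \\
m = o\big(\sqrt{\log n}\big) &\implies m\log_2 3 = o\big(\sqrt{\log n}\big) = o(\log n)
  \implies 3^{m} = 2^{o(\log n)} = n^{o(1)}, \\
h = O\big(n^{1-\delta}\big) \text{ for a fixed } \delta > 0 &\implies 3^{m}h = O\big(n^{1-\delta+o(1)}\big) = o(n).
\end{aligned}
```

For example, $m = 3$ gives $3^{3} = 27$, a constant factor per partially overlapping I/O unit. The abstract's sufficient condition $V=O(n)$ presumably enters through a bound on $h$, which the abstract does not state, so that part of the argument is not reproduced here.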
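The filtering idea can be sketched in Python with a generic multi-dimensional range filter attached to each I/O unit. This is not MulRF's construction, which the abstract does not describe. The coarse-grid occupancy set, the `cell_width` parameter, and the helper names `GridRangeFilter`, `build_filters`, `candidate_units`, and `exact_units` are all hypothetical. The sketch shows the contract any range filter must satisfy: it never rules out a unit that holds a matching point (no false negatives), but coarse cells can let through units that hold no match (false positives).

```python
from dataclasses import dataclass, field
from typing import List, Sequence, Set, Tuple

Point = Tuple[int, ...]            # one record: m non-negative integer coordinates
Box = Tuple[Tuple[int, int], ...]  # query: inclusive (lo, hi) bounds per dimension


@dataclass
class GridRangeFilter:
    """Hypothetical filter for one I/O unit: the coarse grid cells its points occupy."""
    cell_width: int
    cells: Set[Point] = field(default_factory=set)

    def add(self, p: Point) -> None:
        # Quantize every coordinate to the index of its coarse cell.
        self.cells.add(tuple(x // self.cell_width for x in p))

    def may_intersect(self, box: Box) -> bool:
        # Range of cell indices that the query box touches in each dimension.
        ranges = [(lo // self.cell_width, hi // self.cell_width) for lo, hi in box]
        # A point inside the box always lies in a cell within these ranges,
        # so there are no false negatives; a touched cell may still hold only
        # points outside the box, which is where false positives come from.
        return any(
            all(a <= c <= b for c, (a, b) in zip(cell, ranges))
            for cell in self.cells
        )


def build_filters(units: Sequence[Sequence[Point]], cell_width: int) -> List[GridRangeFilter]:
    """Build one filter per I/O unit (hypothetical helper)."""
    filters = []
    for unit in units:
        f = GridRangeFilter(cell_width)
        for p in unit:
            f.add(p)
        filters.append(f)
    return filters


def candidate_units(filters: Sequence[GridRangeFilter], box: Box) -> List[int]:
    """Indices of units the filters cannot rule out; only these would be read."""
    return [i for i, f in enumerate(filters) if f.may_intersect(box)]


def exact_units(units: Sequence[Sequence[Point]], box: Box) -> List[int]:
    """Ground truth by scanning the data, for comparison only."""
    return [
        i for i, unit in enumerate(units)
        if any(all(lo <= x <= hi for x, (lo, hi) in zip(p, box)) for p in unit)
    ]


# Usage: three I/O units of 2-D points, filtered on a grid of cells of width 4.
units = [
    [(0, 0), (1, 2), (3, 3)],  # unit 0
    [(8, 8), (9, 12)],         # unit 1
    [(4, 5), (6, 7)],          # unit 2
]
filters = build_filters(units, cell_width=4)
print(candidate_units(filters, ((0, 3), (0, 3))))  # [0]
print(candidate_units(filters, ((4, 5), (0, 4))))  # [2], a false positive
print(exact_units(units, ((4, 5), (0, 4))))        # []
```

Scanning every unit's filter costs time proportional to the number of units, whereas the $O(3^{m}h)$ bound in the abstract depends only on $h$; this sketch makes no attempt to reproduce that bound. A smaller `cell_width` lowers the false positive rate at the cost of storing more cells per unit.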
Keywords
Multi-dimensional Data, Range Query, Range Filter, Sublinear Time