High-Dimensional Similarity Searches Using A Metric Pseudo-Grid

ICDE Workshops(2005)

引用 20|浏览0
暂无评分
摘要
Despite the proposal of numerous tree-based access structures for high dimensional similarity searches, techniques based on a sequential scan have been shown to be simple yet quite efficient alternatives. Given that random accesses to disk are expensive, a linear scan of the (smaller) pre-processed dataset is often much more efficient than even a relatively small number of random disk accesses yielded by tree-based indices. In this paper we present a technique which uses a pseudo-partition of a general metric space analog to the VA-file's partition of the vector space. The rationale is to use a number of pivot objects in the metric space, each one determining a number of hyper-rings in this space. The intersection of those rings, determine pseudo-cells analog to the VA-file cells in the vector space. In order to speedup query processing the data set is clustered (using any applicable clustering technique). Clusters not intersecting cells intersected by the query region cannot contribute to the answer set. Thus, only a few clusters are searched using an I/O efficient linear scan of the cluster's data. The proposed technique, which we call the M-GRID, is, by construction, applicable to both general metric spaces and to traditional vector spaces as long as a metric distance function is used. The M-GRID is robust to several parameters and experiments with synthetic and real data sets show that it is able to perform nearest neighbor queries up to 10 times faster than the VA-File.
更多
查看译文
关键词
high-dimensional similarity,general metric space analog,traditional vector space,metric pseudo-grid,proposed technique,metric space,efficient alternative,vector space,applicable clustering technique,metric distance function,general metric space,indexing,similarity search,random access,vectors,clustering algorithms,protein sequence,distance function,robustness,information retrieval
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要