LIST: Learning to Index Spatio-Textual Data for Embedding based Spatial Keyword Queries
arXiv (2024)
Abstract
With the proliferation of spatio-textual data, Top-k KNN spatial keyword
queries (TkQs), which return a list of objects based on a ranking function that
evaluates both spatial and textual relevance, have found many real-life
applications. Existing geo-textual indexes for TkQs use traditional retrieval
models like BM25 to compute text relevance and usually exploit a simple linear
function to compute spatial relevance, but their effectiveness is limited. To
improve effectiveness, several deep learning models have recently been
proposed, but they suffer from severe efficiency issues. To the best of our
knowledge, there are no efficient indexes specifically designed to accelerate
the top-k search process for these deep learning models.
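
To make the traditional ranking scheme described above concrete, here is a minimal sketch of a brute-force TkQ that scores each object by a weighted sum of a linear spatial relevance and a text relevance. The function names, the `alpha` weight, and the `dist_max` normalizer are illustrative assumptions, and the term-overlap text score is only a simple stand-in for BM25, not the scoring used by any particular index.

```python
import heapq
import math

def spatial_relevance(q_loc, o_loc, dist_max):
    """Linear spatial relevance: 1 at distance 0, 0 at dist_max or farther."""
    d = math.dist(q_loc, o_loc)
    return max(0.0, 1.0 - d / dist_max)

def text_relevance(q_terms, o_terms):
    """Stand-in for BM25 (assumption): fraction of query terms in the object."""
    if not q_terms:
        return 0.0
    o_set = set(o_terms)
    return sum(t in o_set for t in q_terms) / len(q_terms)

def top_k(query, objects, k, alpha=0.5, dist_max=10.0):
    """Brute-force TkQ: rank objects by a weighted sum of both relevances."""
    q_loc, q_terms = query
    scored = []
    for obj_id, (o_loc, o_terms) in objects.items():
        score = (alpha * spatial_relevance(q_loc, o_loc, dist_max)
                 + (1 - alpha) * text_relevance(q_terms, o_terms))
        scored.append((score, obj_id))
    # Keep the k highest-scoring objects, best first.
    return [obj_id for _, obj_id in heapq.nlargest(k, scored)]

# Hypothetical toy data: id -> (location, terms)
objects = {
    "a": ((0.0, 0.0), ["coffee", "shop"]),
    "b": ((5.0, 0.0), ["coffee"]),
    "c": ((0.0, 1.0), ["book", "store"]),
}
result = top_k(((0.0, 0.0), ["coffee"]), objects, k=2)
```

Scanning every object like this is what an index is meant to avoid; the sketch only fixes what "ranking function" means in this setting.
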
To tackle these issues, we propose a novel technique, which Learns to Index
the Spatio-Textual data for answering embedding based spatial keyword queries
(called LIST). LIST features two novel components. First, we propose
a lightweight and effective relevance model that is capable of learning both
textual and spatial relevance. Second, we introduce a novel machine-learning-based
Approximate Nearest Neighbor Search (ANNS) index, which utilizes a new
learning-to-cluster technique to group relevant queries and objects together
while separating irrelevant queries and objects. Two key challenges in building
an effective and efficient index are the absence of high-quality labels and
unbalanced clustering results. We develop a novel pseudo-label generation
technique to address the two challenges. Experimental results show that LIST
significantly outperforms state-of-the-art methods on effectiveness, with
improvements of up to 19.21%, and is
three orders of magnitude faster than the most effective baseline.
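
The general idea of a cluster-based ANNS index over embeddings can be sketched as follows: objects are grouped into clusters, a query is routed to its most promising clusters, and only the objects in those clusters are scored. This is a generic illustration under stated assumptions, not LIST itself: the abstract says the cluster assignment is learned with pseudo-labels, whereas this sketch assigns objects to fixed, hand-picked centroids by inner product, and all names (`build_index`, `search`, `n_probe`) are hypothetical.

```python
def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def build_index(embeddings, centroids):
    """Assign each object to the centroid with the highest inner product.
    (Assumption: LIST learns this assignment; here it is a fixed rule.)"""
    clusters = {c: [] for c in range(len(centroids))}
    for obj_id, emb in embeddings.items():
        best = max(range(len(centroids)), key=lambda c: dot(emb, centroids[c]))
        clusters[best].append(obj_id)
    return clusters

def search(q_emb, embeddings, centroids, clusters, k, n_probe=1):
    """Route the query to its n_probe best clusters, then rank only
    the objects stored in those clusters."""
    order = sorted(range(len(centroids)),
                   key=lambda c: dot(q_emb, centroids[c]), reverse=True)
    candidates = [o for c in order[:n_probe] for o in clusters[c]]
    candidates.sort(key=lambda o: dot(q_emb, embeddings[o]), reverse=True)
    return candidates[:k]

# Hypothetical toy embeddings and centroids
centroids = [(1.0, 0.0), (0.0, 1.0)]
embeddings = {"a": (0.9, 0.1), "b": (0.8, 0.3), "c": (0.1, 0.9)}
clusters = build_index(embeddings, centroids)
hits = search((1.0, 0.0), embeddings, centroids, clusters, k=2)
```

The sketch also shows why balanced clusters matter: if nearly all objects fall into one cluster, probing that cluster costs almost as much as a full scan.
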