Continuous top-k approximated join of streaming and evolving distributed data

SEMANTIC WEB(2020)

引用 1|浏览48
暂无评分
摘要
Continuously finding the most relevant (shortly, top-k) answer of a query that joins streaming and distributed data is getting a growing attention. In recent years, this is in particular happening in Social Media and IoT. It is well known that, in those settings, remaining reactive can be challenging, because accessing the distributed data can be highly time consuming as well as rate-limited. In this paper, we investigate the problem of continuous top-k query evaluation over a data stream joined with a distributed dataset in even a more extreme situation: the distributed data evolves. We propose the Topk+N algorithm and the AcquaTop framework. They keep up to date a local replica of the distributed dataset and guarantees reactiveness by construction, but to do so they may need to approximate the result. Therefore, we propose two maintenance policies to update the replica: the Top Selection Maintenance (AT-TSM) policy maximizes the relevancy, while the Border Selection Maintenance (AT-BSM) policy maximizes the accuracy of the top-k result. We contribute a theoretical proof of the correctness of Topk+N algorithm and we study its complexity. Moreover, we provide empirical evidence that the proposed policies within AcquaTop framework produce more relevant and accurate results than the state of the art.
更多
查看译文
关键词
Continuous top-k join,RDF data stream,distributed dataset,RSP engine
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要