Efficient Join Synopsis Maintenance for Data Warehouse

SIGMOD/PODS '20: International Conference on Management of Data, Portland, OR, USA, June 2020

Abstract
Sources such as daily business operations and sensors in IoT applications constantly generate large volumes of data, which are often loaded into a data warehouse system for complex analysis. Such analysis, however, can be extremely costly if a query involves joins, especially many-to-many joins over multiple large tables. A join synopsis, i.e., a small uniform random sample over the join result, often suffices as a representative alternative to the full join result for many applications, such as histogram construction and model training. To that end, we propose a novel algorithm, SJoin, that maintains a join synopsis over a pre-specified general θ-join query in a dynamic database with a continuous inflow of updates. Central to SJoin is a weighted join graph index, which makes it efficient to replace join results in the synopsis upon an update. We conduct extensive experiments using TPC-DS and simulated road sensor data over several complex join queries; the results demonstrate a clear advantage of SJoin over the best available baseline.
Keywords
join synopsis, random sampling
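
To make the notion of a join synopsis concrete, the following is a minimal sketch of a naive, insert-only approach, not the SJoin algorithm itself. It assumes a two-table equi-join (R JOIN S ON R.key = S.key) rather than a general θ-join, and it enumerates every new join result produced by an insertion and feeds it to classic reservoir sampling. The class name `NaiveJoinSynopsis` and its methods are hypothetical names chosen for illustration. Enumerating all new join results is exactly the cost that the abstract says SJoin avoids via its weighted join graph index.

```python
import random
from collections import defaultdict


class NaiveJoinSynopsis:
    """Insert-only uniform sample of R JOIN S ON R.key = S.key.

    Illustrative baseline only (an assumption of this sketch): it
    enumerates every new join result, which is expensive for
    many-to-many joins.
    """

    def __init__(self, capacity, seed=None):
        self.capacity = capacity            # synopsis size
        self.rng = random.Random(seed)
        self.r_index = defaultdict(list)    # join key -> R payloads
        self.s_index = defaultdict(list)    # join key -> S payloads
        self.sample = []                    # the join synopsis
        self.join_size = 0                  # join results seen so far

    def _offer(self, result):
        # Reservoir sampling: after n offers, each offered result is
        # in the sample with probability capacity / n.
        self.join_size += 1
        if len(self.sample) < self.capacity:
            self.sample.append(result)
        else:
            j = self.rng.randrange(self.join_size)
            if j < self.capacity:
                self.sample[j] = result

    def insert_r(self, key, payload):
        # A new R tuple joins with every S tuple sharing its key.
        self.r_index[key].append(payload)
        for s_payload in self.s_index.get(key, ()):
            self._offer((key, payload, s_payload))

    def insert_s(self, key, payload):
        # Symmetric case for a new S tuple.
        self.s_index[key].append(payload)
        for r_payload in self.r_index.get(key, ()):
            self._offer((key, r_payload, payload))
```

A usage example under the same assumptions: inserting four R tuples and then five S tuples with the same key yields 20 join results, of which the synopsis retains `capacity` chosen uniformly at random.

```python
syn = NaiveJoinSynopsis(capacity=3, seed=0)
for i in range(4):
    syn.insert_r("a", f"r{i}")
for j in range(5):
    syn.insert_s("a", f"s{j}")
print(syn.join_size, len(syn.sample))
```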