BETZE: Benchmarking Data Exploration Tools with (Almost) Zero Effort

2022 IEEE 38th International Conference on Data Engineering (ICDE 2022)

Abstract
In this paper, we propose BETZE, a benchmark generator for evaluating the performance of data exploration solutions for semi-structured data. It is tailored to the typical query capabilities of modern JSON document stores and can be extended to support additional ones. At its core, the query generator mimics the behavior of a data scientist through a model similar to the random surfer idea known from PageRank. We propose preset parameters that place different query loads on the system, intended to reflect novice, intermediate, and expert users. The approach analyzes a given JSON dataset and generates queries in an intermediate representation, which is then translated into system-specific query syntax. We have implemented support for MongoDB, PostgreSQL, jq, and our own JSON processor JODA, and we describe how additional tools can be supported. As a first step, we report on an experimental study that demonstrates the versatility of the benchmark generator on the NoBench dataset and on real-world data from Twitter and Reddit.
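To make the two central ideas of the abstract concrete (a random-surfer-style session of queries, and an intermediate representation translated into system-specific syntax), here is a minimal Python sketch. It is not BETZE's implementation. All names are hypothetical: the `stats` format (attribute path mapped to sample values), `Predicate`, `Query`, `surf`, `jump_prob`, `to_mongodb`, and `to_jq`. The model is simplified to "refine the current query or jump to a fresh one".

```python
import json
import random
from dataclasses import dataclass, field

# Hypothetical IR: a query is a conjunction of simple predicates.
@dataclass(frozen=True)
class Predicate:
    path: str      # dotted attribute path, e.g. "user.lang"
    op: str        # "eq" or "gt" in this sketch
    value: object

@dataclass
class Query:
    predicates: list = field(default_factory=list)

def random_predicate(stats, rng):
    """Pick a random attribute and one of its sample values (assumed dataset analysis output)."""
    path = rng.choice(sorted(stats))
    value = rng.choice(stats[path])
    is_number = isinstance(value, (int, float)) and not isinstance(value, bool)
    return Predicate(path, "gt" if is_number else "eq", value)

def surf(stats, steps, jump_prob, seed=0):
    """Random-surfer-style session: usually refine the current query by adding
    a predicate; with probability jump_prob, 'teleport' to a fresh random query."""
    rng = random.Random(seed)
    current = Query([random_predicate(stats, rng)])
    session = [current]
    for _ in range(steps - 1):
        if rng.random() < jump_prob:
            current = Query([random_predicate(stats, rng)])
        else:
            current = Query(current.predicates + [random_predicate(stats, rng)])
        session.append(current)
    return session

def to_mongodb(q):
    """Translate the IR into a MongoDB find() filter document."""
    ops = {"eq": "$eq", "gt": "$gt"}
    return {"$and": [{p.path: {ops[p.op]: p.value}} for p in q.predicates]}

def to_jq(q):
    """Translate the IR into a jq select() filter (simple object keys only)."""
    ops = {"eq": "==", "gt": ">"}
    conds = [f".{p.path} {ops[p.op]} {json.dumps(p.value)}" for p in q.predicates]
    return "select(" + " and ".join(conds) + ")"
```

For example, `to_jq(Query([Predicate("user.lang", "eq", "en")]))` yields `select(.user.lang == "en")`. Keeping a single IR and one small translator per system mirrors the abstract's point that supporting a new tool should only require a new translation step. A parameter such as the hypothetical `jump_prob` is the kind of knob that could distinguish user presets, but the paper's actual parameters are not shown here.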
Keywords
JSON, benchmarking, semi-structured, exploring