Separation or Not: On Handing Out-of-Order Time-Series Data in Leveled LSM-Tree

2022 IEEE 38th International Conference on Data Engineering (ICDE 2022)

Abstract
LSM-Tree is widely adopted for storing time-series data in the Internet of Things. Under the conventional policy (denoted by pi(c)), incoming data is first buffered in a MemTable in memory. When the MemTable is full, the data is written to disk to form SSTables, and compaction is triggered to sort the data in each level of the on-disk LSM-Tree. However, data may arrive out of order due to reasons such as transmission delay. Apache IoTDB uses in-order and out-of-order MemTables to separately buffer in-order and out-of-order data to accelerate queries, namely the separation policy (denoted by pi(s)). However, given a specific memory budget for buffering the data, the write amplification (WA) of the leveled LSM-Tree is influenced by pi(s). Whether this influence is positive or negative, and how strong it is, depend on the properties of the workload and the capacities of the in-order and out-of-order MemTables. It is therefore highly desirable to build robust models for estimating the expected amount of data rewritten in each compaction and predicting the WA under pi(c) and pi(s). Note that as an industrial paper, rather than proposing novel techniques for research problems, we focus on the practical question of whether to separate or not for lower write amplification. Experiments on synthetic and real-world datasets show that the models for estimating WA are accurate under various delay distributions. In addition, based on the estimation models, we implement an analyzer module in the open-source Apache IoTDB for choosing the policy with lower WA. We apply the method in the use case of our industrial partner, a service provider of engineering machinery. The use case verifies the effectiveness of deciding whether to separate or not by WA estimation.
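The abstract describes the separation policy pi(s) only in prose. The following is a minimal Java sketch of the routing idea, assuming a point counts as out-of-order when its timestamp is smaller than the largest timestamp already buffered or flushed; the class, method, and field names are illustrative assumptions and do not reproduce Apache IoTDB's actual implementation.

```java
import java.util.TreeMap;

/**
 * Minimal sketch of the separation policy pi(s): in-order points are buffered
 * in one MemTable, late-arriving (out-of-order) points in another, and each
 * MemTable is flushed to its own SSTable when its memory budget is exhausted.
 * Hypothetical names throughout; not the Apache IoTDB API.
 */
public class SeparationPolicySketch {

    // Each MemTable keeps points sorted by timestamp (TreeMap as a stand-in).
    private final TreeMap<Long, Double> inOrderMemTable = new TreeMap<>();
    private final TreeMap<Long, Double> outOfOrderMemTable = new TreeMap<>();

    private final int inOrderCapacity;     // memory budget for in-order data (in points)
    private final int outOfOrderCapacity;  // memory budget for out-of-order data (in points)
    private long maxSeenTimestamp = Long.MIN_VALUE;

    public SeparationPolicySketch(int inOrderCapacity, int outOfOrderCapacity) {
        this.inOrderCapacity = inOrderCapacity;
        this.outOfOrderCapacity = outOfOrderCapacity;
    }

    /** Route an arriving point to the in-order or out-of-order MemTable. */
    public void write(long timestamp, double value) {
        if (timestamp >= maxSeenTimestamp) {
            // Point arrives in timestamp order: buffer it in the in-order MemTable.
            maxSeenTimestamp = timestamp;
            inOrderMemTable.put(timestamp, value);
            if (inOrderMemTable.size() >= inOrderCapacity) {
                flush(inOrderMemTable, "in-order");
            }
        } else {
            // Point is late (e.g. delayed in transmission): buffer it separately.
            outOfOrderMemTable.put(timestamp, value);
            if (outOfOrderMemTable.size() >= outOfOrderCapacity) {
                flush(outOfOrderMemTable, "out-of-order");
            }
        }
    }

    /** Stand-in for writing a full MemTable to disk as an SSTable. */
    private void flush(TreeMap<Long, Double> memTable, String kind) {
        System.out.printf("flushing %s MemTable with %d points%n", kind, memTable.size());
        memTable.clear();
    }
}
```

Under the conventional policy pi(c), both branches above would feed a single MemTable; the paper's models estimate which choice yields lower write amplification for a given workload and split of the memory budget.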
Keywords
Leveled LSM-Tree, Write Amplification