Lineage Checkpoint Approach for Long-lineage Problem in Apache Spark

Minhyeok Kweun, Woo-Yeon Lee, Goeun Kim, Jisoo Hwang, Yoonkyong Lee

2020 IEEE International Conference on Big Data (Big Data), 2020

Abstract
In distributed data processing frameworks, data lineage is widely used to achieve fault tolerance efficiently. However, as data analytics applications become more complex, long lineages cause performance and reliability issues. Existing data checkpoint solutions can alleviate the problem, but they either introduce new, heavy, and unavoidable overheads or lack fault tolerance. To overcome these limitations, we propose a solution that checkpoints the lineage graph instead of the data itself, preserving the lineage information and reconciling the full lineage graph when needed. In our evaluations, we show that our lineage checkpoint solution outperforms data checkpoint solutions in terms of performance and fault tolerance.
Keywords
checkpoints lineage graph, lineage information, long-lineage problem, Apache Spark, distributed data processing frameworks, data lineage, data analytics apps, reliability issues, lineage checkpoint solution