Concept-Driven Load Shedding: Reducing Size and Error of Voluminous and Variable Data Streams
2018 IEEE International Conference on Big Data (Big Data), 2018
Abstract
Load shedding is a technique that aims to ameliorate the consequences of the Velocity and the Volume of Big Data stream processing. When temporary input spikes occur, tuples are shed until the Stream Processing Engine's (SPE's) processing capacity is no longer overwhelmed and results are produced in a timely fashion. Existing load shedding techniques have become obsolete and are not applicable to modern use cases that require extracting patterns from continuously evolving (i.e., Variable) voluminous streams. In this work, we identify the shortcomings of existing load shedding techniques when applied to streams with concept drift. We propose Concept-Driven load shedding (CoD), which aims to limit the data volume imposed on the SPE while producing highly accurate results. In addition, we design CoD for modern SPEs and keep its overhead negligible. Our experiments indicate that CoD can deliver results more than 10× more accurate than the state of the art in load shedding. CoD can also offer up to 2.25× better performance than normal processing and significantly reduce the processed data volume.
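The abstract describes shedding only at a high level: during an input spike, tuples are dropped until the SPE's capacity is no longer exceeded. The sketch below illustrates that generic idea. It makes a few assumptions. Tuples arrive in tumbling windows, each modeled as a Python list. `capacity` is a hypothetical parameter for how many tuples the engine can process per window. Shedding uses uniform random dropping, a common baseline. It is not CoD, whose tuple-selection criterion is not described here.

```python
import random
from typing import Iterable, List, TypeVar

T = TypeVar("T")


def shed_window(tuples: List[T], capacity: int, rng: random.Random) -> List[T]:
    """Drop tuples from one window so that at most `capacity` remain.

    Generic uniform random shedding (a baseline), NOT Concept-Driven shedding.
    `capacity` is a hypothetical per-window processing budget.
    """
    if capacity < 0:
        raise ValueError("capacity must be non-negative")
    if len(tuples) <= capacity:
        # No overload: the engine can process every tuple in this window.
        return list(tuples)
    # Overload: keep a uniform random sample of exactly `capacity` tuples,
    # preserving arrival order among the surviving tuples.
    keep = set(rng.sample(range(len(tuples)), capacity))
    return [t for i, t in enumerate(tuples) if i in keep]


def shed_stream(windows: Iterable[List[T]], capacity: int, seed: int = 0) -> List[List[T]]:
    """Apply per-window shedding to a sequence of windows (seeded for reproducibility)."""
    rng = random.Random(seed)
    return [shed_window(w, capacity, rng) for w in windows]


if __name__ == "__main__":
    # A quiet window (3 tuples) followed by a spike (10 tuples), capacity 5.
    result = shed_stream([[1, 2, 3], list(range(10))], capacity=5, seed=42)
    print(result[0])       # quiet window passes through untouched
    print(len(result[1]))  # spike window is cut down to the capacity
```

Windows under the budget pass through unchanged. Only overloaded windows are thinned. Because the choice of which tuples to drop ignores tuple contents, this baseline treats every window alike, regardless of how the data distribution within it is evolving.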
Keywords
CoD, Big Data stream processing, load shedding, variable data streams, stream processing engine processing capacity, SPE processing capacity, data volume limit, real-time data analysis