Storage Optimization for Large-Scale Distributed Stream Processing Systems
ACM Transactions on Storage (TOS), 2008
Abstract
We consider storage in an extremely large-scale distributed computer system designed for stream processing applications. In such systems, incoming data and intermediate results may need to be stored to enable future analyses. The quantity of such data would dominate even the largest storage system. Thus, a mechanism is needed to keep the most useful data. One recently introduced approach is to employ retention value functions, which effectively assign each data object a value that changes over time (5). Storage space is then reclaimed automatically by deleting data of lowest current value. In such large systems, there will naturally be multiple file systems available, each with different properties. Choosing the right file system for a given incoming data stream presents a challenge. In this paper we provide a novel and effective scheme for optimizing the placement of data within a distributed storage subsystem employing retention value functions. The goal is to keep the data of highest overall value, while simultaneously balancing the read load to the file system.
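The reclamation idea described in the abstract (each object carries a time-varying value, and space is freed by deleting the objects of lowest current value) can be illustrated with a minimal sketch. This is not the paper's scheme: the exponential decay form, the `half_life` parameter, and the names `DataObject` and `reclaim` are assumptions made purely for illustration, and the sketch covers a single store only, not the placement optimization or read-load balancing across file systems.

```python
from dataclasses import dataclass


@dataclass
class DataObject:
    name: str
    size: int             # storage units occupied
    initial_value: float  # value at creation time (hypothetical parameter)
    half_life: float      # time for the value to halve (assumed decay form)
    created_at: float

    def value_at(self, now: float) -> float:
        """Retention value at time `now`, assuming exponential decay."""
        age = max(0.0, now - self.created_at)
        return self.initial_value * 0.5 ** (age / self.half_life)


def reclaim(objects, capacity, now):
    """Delete lowest-current-value objects until total size fits in capacity.

    Returns (kept, deleted). Objects are considered in ascending order of
    their value at time `now`, so the least valuable data goes first.
    """
    kept = sorted(objects, key=lambda o: o.value_at(now))
    used = sum(o.size for o in kept)
    deleted = []
    while kept and used > capacity:
        victim = kept.pop(0)  # lowest current value among remaining objects
        used -= victim.size
        deleted.append(victim)
    return kept, deleted


if __name__ == "__main__":
    objs = [
        DataObject("a", 4, 10.0, 1.0, 0.0),    # high value, decays quickly
        DataObject("b", 4, 5.0, 100.0, 0.0),   # low value, decays slowly
        DataObject("c", 4, 8.0, 10.0, 0.0),
    ]
    kept, deleted = reclaim(objs, capacity=8, now=10.0)
    print("kept:", [o.name for o in kept])
    print("deleted:", [o.name for o in deleted])
```

Because values change over time, the same set of objects can yield a different deletion order depending on when reclamation runs: an object that starts out most valuable may be the first to go once its value has decayed.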
Keywords
system theory, optimization problem, theory, assignment problem, load balancing, simulation experiment, value function, optimization, distributed storage, storage system, load balance, stream processing