Mining block correlations to improve storage performance
TOS(2005)
摘要
Block correlations are common semantic patterns in storage systems. They can be exploited for improving the effectiveness of storage caching, prefetching, data layout, and disk scheduling. Unfortunately, information about block correlations is unavailable at the storage system level. Previous approaches for discovering file correlations in file systems do not scale well enough for discovering block correlations in storage systems.In this article, we propose two algorithms, C-Miner and C-Miner*, that use a data mining technique called frequent sequence mining to discover block correlations in storage systems. Both algorithms run reasonably fast with feasible space requirement, indicating that they are practical for dynamically inferring correlations in a storage system. C-Miner is a direct application of a frequent-sequence mining algorithm with a few modifications; compared with C-Miner, C-Miner* is redesigned for mining block correlations by making concessions for the specific problem of long sequences in storage system traces. Therefore, C-Miner* can discover 7--109% more correlation rules within 2--15 times shorter time than C-Miner. Moreover, we have also evaluated the benefits of block correlation-directed prefetching and data layout through experiments. Our results using real system workloads show that correlation-directed prefetching and data layout can reduce average I/O response time by 12--30% compared to the base case, and 7--25% compared to the commonly used sequential prefetching scheme for most workloads.
更多查看译文
关键词
storage caching,file system,mining methods and algorithms,file system management,correlation-directed prefetching,storage system,storage system trace,block correlations,block correlation,data mining technique,mining block correlation,storage performance,storage system level,data layout,storage management,disk scheduling,data mining,sequence mining
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络