MixApart: Decoupled Analytics for Shared Storage Systems.

HotStorage'12: Proceedings of the 4th USENIX Conference on Hot Topics in Storage and File Systems (2012)

Abstract
Distributed file systems built for data analytics and enterprise storage systems have very different functionality requirements. For this reason, enabling analytics on enterprise data commonly introduces a separate analytics storage silo. This silo adds cost and makes data management inefficient, e.g., whenever data needs to be archived, copied, or migrated across silos. MixApart uses an integrated data caching and scheduling solution to allow MapReduce computations to analyze data stored on enterprise storage systems. The front-end caching layer provides the local storage performance that data analytics requires. The shared storage back-end simplifies data management. We evaluate MixApart on a 100-core Amazon EC2 cluster with micro-benchmarks and production workload traces. Our evaluation shows that MixApart (i) runs up to 28% faster than the traditional ingest-then-compute workflows used in enterprise IT analytics, and (ii) performs comparably to an ideal Hadoop setup without data ingest, at similar cluster sizes.
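The combination of a front-end cache and cache-aware scheduling described in the abstract can be illustrated with a toy sketch. This is not MixApart's implementation: the class names `SharedStore` and `CacheLayer`, the LRU eviction policy, and the "cached tasks first" ordering rule are all assumptions made for illustration, and the word count stands in for a MapReduce job.

```python
from collections import OrderedDict


class SharedStore:
    """Hypothetical stand-in for the enterprise storage back-end."""

    def __init__(self, blocks):
        self._blocks = dict(blocks)
        self.reads = 0  # number of fetches that reach shared storage

    def read(self, block_id):
        self.reads += 1
        return self._blocks[block_id]


class CacheLayer:
    """Hypothetical read-through cache on compute-local storage (LRU assumed)."""

    def __init__(self, store, capacity):
        self.store = store
        self.capacity = capacity
        self._cache = OrderedDict()

    def is_cached(self, block_id):
        return block_id in self._cache

    def read(self, block_id):
        if block_id in self._cache:
            # Cache hit: serve locally and mark as most recently used.
            self._cache.move_to_end(block_id)
            return self._cache[block_id]
        # Cache miss: fetch from shared storage and keep a local copy.
        data = self.store.read(block_id)
        self._cache[block_id] = data
        if len(self._cache) > self.capacity:
            self._cache.popitem(last=False)  # evict least recently used
        return data


def schedule(tasks, cache):
    """Run tasks whose input block is already cached first (assumed policy)."""
    # sorted() is stable, so ties keep their submission order.
    return sorted(tasks, key=lambda block_id: not cache.is_cached(block_id))


def word_count(tasks, cache):
    """Toy analytics job: count words across the input blocks."""
    counts = {}
    for block_id in schedule(tasks, cache):
        for word in cache.read(block_id).split():
            counts[word] = counts.get(word, 0) + 1
    return counts
```

In this sketch the data always lives in the shared store, so no ingest step is needed; repeated jobs over the same blocks are served from the cache and do not touch the back-end again.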
Keywords
data analytics, enterprise storage system, back-end simplifies data management, data management, enterprise data, integrated data, enabling analytics, local storage performance, separate analytics storage silo, shared storage, decoupled analytics, shared storage system