MLOC: Multi-level Layout Optimization Framework for Compressed Scientific Data Exploration with Heterogeneous Access Patterns

Parallel Processing (2012)

Abstract
The size and scope of cutting-edge scientific simulations are growing much faster than the I/O and storage capabilities of their runtime environments. This growing gap is exacerbated by exploratory data-intensive analytics, such as querying simulation data for regions of interest with multivariate, spatio-temporal constraints. Query-driven data exploration induces heterogeneous access patterns that further stress the performance of the underlying storage system. To partially alleviate the problem, data reduction via compression and multi-resolution data extraction are becoming an integral part of I/O systems. While addressing the data size issue, these techniques add yet another mix of access patterns to an already heterogeneous set of possibilities. Moreover, the way extreme-scale datasets are partitioned into multiple files and organized on a parallel file system enlarges an already combinatorial space of possible access patterns. To address this challenge, we present MLOC, a parallel Multi-level Layout Optimization framework for Compressed scientific spatio-temporal data at extreme scale. MLOC proposes multiple fine-grained data layout optimization kernels that form a generic core from which a broader constellation of such kernels can be organically consolidated to enable effective data exploration with various combinations of access patterns. Specifically, the kernels are optimized for access patterns induced by (a) query-driven multivariate, spatio-temporal constraints, (b) precision-driven data analytics, (c) compression-driven data reduction, (d) multi-resolution data sampling, and (e) multi-file data partitioning and organization on a parallel file system. MLOC organizes these optimization kernels within a multi-level architecture, in which the levels can be flexibly re-ordered by user-defined priorities.
When tested on query-driven exploration of compressed data, MLOC demonstrates superior performance compared to any state-of-the-art scientific database management technologies.
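The idea of levels that can be re-ordered by user-defined priorities can be illustrated with a small sketch. This is not MLOC's implementation: the level names (`variable`, `resolution`, `block`), the helper functions `build_layout` and `count_runs`, and the use of contiguous runs as a stand-in for seek cost are all assumptions made for illustration. The sketch only shows that the same set of chunks, linearized under different level priorities, serves a given access pattern with more or fewer contiguous reads.

```python
from itertools import product

# Hypothetical chunk attributes; the names are illustrative, not taken from the paper.
LEVELS = {
    "variable": ["temp", "pressure"],
    "resolution": [0, 1],
    "block": [0, 1, 2, 3],
}

def build_layout(priority):
    """Return all chunks in linear order; the first level in `priority` varies slowest."""
    chunks = [dict(zip(LEVELS, combo)) for combo in product(*LEVELS.values())]
    return sorted(
        chunks,
        key=lambda c: tuple(LEVELS[name].index(c[name]) for name in priority),
    )

def count_runs(layout, predicate):
    """Count contiguous runs of matching chunks (a rough proxy for seeks)."""
    runs, in_run = 0, False
    for chunk in layout:
        hit = predicate(chunk)
        if hit and not in_run:
            runs += 1
        in_run = hit
    return runs

def wants_temp(chunk):
    # Access pattern 1: read every chunk of one variable.
    return chunk["variable"] == "temp"

def wants_block_2(chunk):
    # Access pattern 2: read every chunk of one spatial block.
    return chunk["block"] == 2

# Two user-defined priority orders over the same levels.
variable_first = build_layout(["variable", "resolution", "block"])
block_first = build_layout(["block", "resolution", "variable"])

print(count_runs(variable_first, wants_temp))     # 1
print(count_runs(block_first, wants_temp))        # 8
print(count_runs(variable_first, wants_block_2))  # 4
print(count_runs(block_first, wants_block_2))     # 1
```

Under these assumptions, neither ordering is best for both access patterns, which is why letting the user choose the level priority matters.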
Keywords
multi-resolution data,compressed scientific spatio-temporal data,heterogeneous access patterns,driven data analytics,multi-level layout optimization framework,exploratory data,data size issue,driven data reduction,access pattern,query-driven data exploration,data reduction,compressed scientific data exploration,effective data exploration,data compression,distributed databases,kernel,optimization,organizations,sampling methods,layout,throughput,data models