Parallel Data Layout Optimization of Scientific Data through Access-driven Replication

semanticscholar(2014)

Abstract
Efficient I/O on large-scale spatio-temporal scientific data requires scrutiny of both the logical layout of the data (e.g., row-major vs. column-major) and the physical layout (e.g., distribution on parallel filesystems). For increasingly complex datasets, hand optimization is difficult, error-prone, and not scalable to the increasing heterogeneity of analysis workloads. Given these factors, we present a partial data replication system called RADAR. We capture datatype- and collective-aware I/O access patterns (indicating logical access) via MPI-IO tracing and use a combination of coarse-grained and fine-grained performance modeling to evaluate and select optimized physical data distributions for the task at hand. In contrast to existing methods, we store all replica data and metadata, along with the original, untouched data, under a single file container using the object abstraction in parallel filesystems. Our system can produce up to many-fold improvements in commonly used subvolume decomposition access patterns, while the modeling approach is capable of determining whether such optimizations should be undertaken in the first place.
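To illustrate why the logical layout matters for subvolume access, the following is a minimal sketch that counts how many contiguous runs a reader must fetch to load a rectangular subvolume from a linearized array. It is not part of RADAR and does not reproduce its performance model; the function name `contiguous_runs` and its parameters are hypothetical, chosen only for this example.

```python
def contiguous_runs(shape, extent, order="C"):
    """Count contiguous element runs needed to read a rectangular subvolume.

    shape  : full array size per dimension
    extent : subvolume size per dimension (each value <= matching shape value)
    order  : "C" for row-major (last dimension varies fastest),
             "F" for column-major (first dimension varies fastest)

    Hypothetical helper for illustration only; the run count does not
    depend on where the subvolume starts, so no offset is taken.
    """
    dims = list(zip(shape, extent))
    if order == "C":
        dims.reverse()  # visit dimensions from fastest-varying to slowest

    runs = 1
    merging = True  # True while every faster dimension is selected in full
    for full, ext in dims:
        if merging:
            # The selection along this dimension is still one contiguous
            # range; it stays mergeable only if it spans the whole dimension.
            if ext != full:
                merging = False
        else:
            # Each index along this slower dimension starts a separate run.
            runs *= ext
    return runs


if __name__ == "__main__":
    shape = (4, 4, 4)
    slab = (2, 4, 4)  # half of the first dimension, all of the others
    print(contiguous_runs(shape, slab, "C"))  # row-major
    print(contiguous_runs(shape, slab, "F"))  # column-major
```

Under these assumptions, the same slab is a single contiguous read in row-major order but splits into many short reads in column-major order, which is the kind of mismatch between access pattern and layout that a replica with a different layout is meant to avoid.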