SAGA: array storage as a DB with support for structural aggregations

SSDBM (2014)

Abstract
In recent years, many Array DBMSs, including SciDB and RasDaMan, have emerged to meet the needs of data management applications in which arrays are the natural structure. Like their relational counterparts, these systems involve an expensive data ingestion phase. The paradigm of using native storage as a DB and providing database-like support over it (e.g., the NoDB approach) has recently been shown to be effective for infrequently queried data, where data ingestion costs cannot be justified, but only in the context of relational data. Applications that generate massive arrays, such as scientific simulations, often store the data in one of a small number of array storage formats, like NetCDF or HDF5. A natural question, therefore, is: "Can database-like functionality be supported over native array storage?" In this paper, we present algorithms, different partitioning strategies, and an analytical model for supporting structural (grid, sliding, hierarchical, and circular) aggregations over native array storage, and we describe the implementation of this approach in a system we refer to as Structural AGgregations over Array storage (SAGA). We show how the relative performance of different partitioning strategies changes with varying amounts of computation in the aggregation function and different levels of data skew, and that our model is effective in choosing the best partitioning strategy. A performance comparison with SciDB shows that, despite working on native array storage, our system achieves lower aggregation costs. Finally, we show that our structural aggregation implementations achieve high parallel efficiency.
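The abstract names four structural aggregation types without defining them. The sketch below is a minimal illustration in Python, assuming a small in-memory 2-D array (a list of lists) rather than NetCDF or HDF5 storage; it is not SAGA's implementation. Grid and sliding aggregation follow their usual meanings (non-overlapping tiles vs. overlapping windows around each element). The hierarchical and circular variants are hypothetical interpretations: hierarchical is modeled as repeated grid aggregation over the previous level's results, and circular as aggregation over concentric rings of Euclidean distance from a chosen center. All function names and parameters are illustrative.

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple

Array2D = List[List[float]]
AggFn = Callable[[Sequence[float]], float]


def grid_aggregate(a: Array2D, gh: int, gw: int, agg: AggFn) -> Array2D:
    """Grid aggregation: split the array into non-overlapping gh x gw tiles
    and apply `agg` to each tile (tiles at the right/bottom edge may be smaller)."""
    rows, cols = len(a), len(a[0])
    out: Array2D = []
    for r0 in range(0, rows, gh):
        out_row = []
        for c0 in range(0, cols, gw):
            cells = [a[r][c]
                     for r in range(r0, min(r0 + gh, rows))
                     for c in range(c0, min(c0 + gw, cols))]
            out_row.append(agg(cells))
        out.append(out_row)
    return out


def sliding_aggregate(a: Array2D, radius: int, agg: AggFn) -> Array2D:
    """Sliding aggregation: for every element, apply `agg` to the
    (2*radius+1) x (2*radius+1) window centred on it, clipped at the borders.
    Unlike grid tiles, neighbouring windows overlap."""
    rows, cols = len(a), len(a[0])
    return [[agg([a[r][c]
                  for r in range(max(0, i - radius), min(rows, i + radius + 1))
                  for c in range(max(0, j - radius), min(cols, j + radius + 1))])
             for j in range(cols)]
            for i in range(rows)]


def hierarchical_aggregate(a: Array2D, factor: int, levels: int,
                           agg: AggFn) -> List[Array2D]:
    """Hypothetical interpretation: each level grid-aggregates the previous
    level's results with factor x factor tiles. Re-aggregating partial results
    is only valid for decomposable functions such as sum, min, max, or count."""
    results: List[Array2D] = []
    current = a
    for _ in range(levels):
        current = grid_aggregate(current, factor, factor, agg)
        results.append(current)
    return results


def circular_aggregate(a: Array2D, center: Tuple[int, int], ring_width: float,
                       agg: AggFn) -> Dict[int, float]:
    """Hypothetical interpretation: bucket elements into concentric rings by
    Euclidean distance from `center` and aggregate each non-empty ring."""
    ci, cj = center
    rings: Dict[int, List[float]] = {}
    for i, row in enumerate(a):
        for j, v in enumerate(row):
            ring = int(math.hypot(i - ci, j - cj) // ring_width)
            rings.setdefault(ring, []).append(v)
    return {k: agg(vals) for k, vals in sorted(rings.items())}


if __name__ == "__main__":
    a = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
    print(grid_aggregate(a, 2, 2, sum))           # [[14, 22], [46, 54]]
    print(sliding_aggregate(a, 1, sum)[1][1])     # 54 (3x3 window around a[1][1])
    print(hierarchical_aggregate(a, 2, 2, sum))   # [[[14, 22], [46, 54]], [[136]]]
    print(circular_aggregate(a, (0, 0), 2.0, sum))  # {0: 14, 1: 106, 2: 16}
```

Under these assumptions, grid tiles are disjoint, so each tile can be aggregated by a different worker without exchanging data, whereas sliding windows overlap, so a worker responsible for a block of rows would also need `radius` extra rows on each side. Differences of this kind help explain why the choice of partitioning strategy matters for structural aggregations.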
Keywords
array databases, design, experimentation, measurement, performance, scientific databases, statistical databases, structural aggregations, systems