ArrayBridge: Interweaving Declarative Array Processing in SciDB with Imperative HDF5-Based Programs

2018 IEEE 34th International Conference on Data Engineering (ICDE)

Abstract
Scientists are increasingly turning to datacenter-scale computers to analyze massive arrays. Despite decades of database research that extols the virtues of declarative query processing, scientists still write, debug and parallelize imperative HPC programs even for the most mundane queries. This impedance mismatch is due to the cumbersome and costly data format conversions that are needed to use scientific data management tools, such as SciDB, in an HPC setting. Our goal is to make declarative array manipulations from SciDB interoperable with imperative, file-centric analyses from HDF5-based programs. This paper describes ArrayBridge, a bi-directional array view mechanism for the HDF5 file format that allows scientists to use SciDB, TensorFlow and HDF5-based analysis code in the same file-centric pipeline without converting between file formats. In addition to fast querying over HDF5 array objects, ArrayBridge produces arrays in the HDF5 file format as easily as it reads them. ArrayBridge also supports time travel queries from imperative codes through the unmodified HDF5 API, and automatically deduplicates between versions for space efficiency. Our performance evaluation in a large scientific computing facility shows that ArrayBridge exhibits performance and I/O scalability that are statistically indistinguishable from those of the native SciDB storage engine, and that it is 3× faster than TileDB.
Keywords
array processing,in situ processing,SciDB,HDF5