Multi-Temperate Logical Data Warehouse Design For Large-Scale Healthcare Data

Bryan Martin, Karen C. Davis

Big Data Research (2021)

Abstract
Modern hardware architectures and advances in database technology are driving increased adoption of logical data warehouses (LDWs) that complement traditional physical data warehousing (PDW) approaches. In contrast to PDW design methodologies that emphasize physical consolidation of all data of interest on a single (perhaps distributed) computing platform, along with early-binding approaches that pre-materialize transformations and changes to the source data, LDW techniques allow for the integration and transformation of data at run-time and typically physically move or modify much less data in advance. In an environment with premium hardware such as multi-temperate storage, the successful design of LDWs depends on replication of high value data to their physical core to maximize spatial locality. Identifying and collocating high value data is a non-trivial task that has not been adequately explored in the context of LDWs in multi-temperate storage systems. In this paper, we gather queries to construct an OLAP workload for use in supporting and evaluating LDW design algorithms for a large healthcare organization. We introduce new algorithms to address the preprocessing of the workload, identification of data clusters to support OLAP queries, and assignment of clusters to appropriate (hot, warm, and cold) storage tiers, allowing the LDW to deliver results more efficiently by covering a higher percentage of its query workload using the fastest storage devices. Any use case involving copying data from sources to tiered storage targets for analytic querying could benefit from the techniques and solutions presented here. (C) 2021 Elsevier Inc. All rights reserved.
Keywords
Data warehouse design, OLAP workloads, Healthcare data management, Data partitioning algorithms, Logical data warehouses, Columnar databases