Data Provenance Hybridization Supporting Extreme-Scale Scientific Workflow Applications

2016 New York Scientific Data Summit (NYSDS) (2016)

Abstract
As high-performance computing (HPC) infrastructures continue to grow in capability and complexity, so do the applications that they serve. Users of HPC and distributed-area computing (DAC), such as grid and cloud, are increasingly looking to workflow solutions to orchestrate their complex application coupling and their pre- and post-processing needs. To that end, the US Department of Energy Integrated end-to-end Performance Prediction and Diagnosis for Extreme Scientific Workflows (IPPD) project is currently investigating an integrated approach to prediction and diagnosis of these extreme-scale scientific workflows. To gain insight into, and a more quantitative understanding of, a workflow's performance, our method captures not only traditional provenance information but also system environment metrics, and integrates the two to give context and explanation for a workflow's execution. In this paper, we describe IPPD's provenance management solution (ProvEn) and its hybrid data store, which combines both of these data provenance perspectives. We discuss design and implementation details, including provenance disclosure, scalability, and data integration, as well as query and analysis capabilities. We also present use case examples for the climate modeling and thermal modeling application domains.
Keywords
provenance, scientific workflow, performance, ontology, real-time, scalability