From Program Chains to Exploratory Workflows: PopinSnake for Genomic Insertion Detection

2023 IEEE 19th International Conference on e-Science (e-Science) (2023)

Abstract
Scientific data analysis is often an exploratory process, so it is imperative to trace the precise sequence and configuration of analysis steps to ensure that scientific results are reproducible. In addition, responsible use of computational resources and the avoidance of redundant computation are crucial when dealing with large datasets and complex analysis steps. A workflow engine addresses the need for reproducibility and resource efficiency and also increases portability across hardware infrastructures and software environments. However, traditional analysis pipelines implemented as chains of program calls do not meet the prerequisites of an exploratory workflow. In this paper, we focus on the question of how to transform an existing data analysis program into an exploratory workflow that can be executed with a common workflow engine and that also includes means to explore intermediate results and act upon them interactively. Specifically, we exemplify the transformation process using the genomic variation detection program PopIns, which we transform into an exploratory workflow by modularizing its functionality, migrating it to the Snakemake workflow engine, and integrating user interaction features. By combining a highly automated workflow with possibilities for guided user exploration, we are able to reduce the time-to-insight of the analysis. We further report on lessons learned from this transformation and generalize from the specific case. In this way, we provide general guidance on transforming static analysis programs into exploratory workflows, making data analysis more accessible and more efficient for users.
Keywords
Exploratory Workflow, Interactive Data Analysis, Structural Variant Detection, Non-Reference Sequence Variants