Facilitating the sharing of electrophysiology data analysis results through in-depth provenance capture
arXiv (Cornell University), 2023
Abstract
Scientific research demands reproducibility and transparency, particularly in
data-intensive fields like electrophysiology. Electrophysiology data is
typically analyzed using scripts that generate output files, including figures.
Handling these results poses several challenges due to the complexity and
interactivity of the analysis process. These challenges stem from the difficulty of
discerning the analysis steps, parameters, and data flow from the results, which
hampers knowledge transfer and findability in collaborative settings.
Provenance information tracks data lineage and the processes applied to the data, and
provenance capture during the execution of an analysis script can address those
challenges. We present Alpaca (Automated Lightweight Provenance Capture), a
tool that captures fine-grained provenance information with minimal user
intervention when running data analysis pipelines implemented in Python
scripts. Alpaca records inputs, outputs, and function parameters and structures
information according to the W3C PROV standard. We demonstrate the tool using a
realistic use case involving multichannel local field potential recordings of a
neurophysiological experiment, highlighting how the tool makes result details
known in a standardized manner in order to address the challenges of the
analysis process. Ultimately, using Alpaca will help to represent results
according to the FAIR principles, which will improve research reproducibility
and facilitate sharing the results of data analyses.