Creating a Tools Ecosystem for Cross-Discipline Environmental Data Reuse

IEEE BigData (2021)

Abstract
Reusing data is difficult even within well-defined science communities and only gets worse when combining data from multiple communities and disciplines. Through the lens of current work on constructing an environmental epidemiological data set from multiple disciplinary sources, we demonstrate the need for a new tool ecosystem to support heterogeneous Big Data science. Extending existing community standards for schemas and/or data formats through human auditing and wrangling of the data is not feasible at scale. This work therefore suggests new approaches for the multi-disciplinary communities to build a shared tool ecosystem for big data. We discuss both the larger context of data wrangling of epidemiological data sets for novel artificial intelligence algorithms and the specific lessons from working with these multi-disciplinary data sets. Adopting a more model-driven, automatable approach not only promises better efficiency but also removes key sources of human-generated errors and promotes reuse and reproducibility of science data.
Keywords
spatial time-series data, domain-specific modeling, data wrangling