plantR: An R package and workflow for managing species records from biological collections

bioRxiv (2023)

Abstract
Species records from biological collections are becoming increasingly available online. This unprecedented availability of records has largely supported recent studies in taxonomy, biogeography, macroecology and biodiversity conservation. Biological collections vary in their documentation and notation standards, which have changed through time. For different reasons, neither collections nor data repositories perform the editing, formatting and standardisation of the data, leaving these tasks to the final users of the species records (e.g. taxonomists, ecologists and conservationists). These tasks are challenging, particularly when working with millions of records from hundreds of biological collections. To help collection curators and final users perform those tasks, we introduce plantR, an open-source package that provides a comprehensive toolbox to manage species records from biological collections. The package is accompanied by the proposal of a reproducible workflow to manage this type of data in taxonomy, ecology and biodiversity conservation. It is implemented in R and designed to handle relatively large datasets as fast as possible. Initially designed to handle plant species records, many of the plantR features also apply to other groups of organisms, given that the data structure is similar. The plantR workflow includes tools to (a) download records from different data repositories, (b) standardise typical fields associated with species records, (c) validate the locality, geographical coordinates, taxonomic nomenclature and species identifications, including the retrieval of duplicates across collections, and (d) summarise and export records, including the construction of species lists with vouchers. Other R packages provide tools to tackle some of the workflow steps described above. But in addition to the new tools and resources related to data standardisation and validation, the greatest strength of plantR is to provide a comprehensive and user-friendly workflow in one single environment, performing all tasks from data retrieval to export. Thus, plantR can help researchers better assess data quality and avoid data leakage in a wide variety of studies using species records.
Keywords
biodiversity, data cleaning, data download, duplicate records, gazetteer, GBIF, herbarium, taxonomic validation