“Garbage Bags Full of Files”: Exploring Sociotechnical Perceptions of Formats Within the Recovery and Reuse of Scientific Data

Proceedings of the Association for Information Science and Technology (2023)

University of Maryland USA

Abstract
This paper explores the social and technical perceptions of physical and digital formats as they relate to work in the recovery and reuse of scientific data, specifically historical, archival, and defunct data sources. Proprietary and obsolete formats, or formats that need significant transformation work, stand out as central challenges for scientists and data curators who are recovering reusable data from archival or legacy data sources. The challenges confronting data sharing and reuse of contemporary scientific data are already known to be myriad; formats often pose a major, compounding challenge to retrospective data curation research and practice. Based on 23 qualitative interviews with practitioners conducting data recovery and reuse, ranging from marine biologists to data librarians, we study how they understand, engage with, and utilize formats within their data curation work. This paper enumerates the formats deployed throughout the scientific data curation process and explores how practitioners creating and curating scientific data based on historical and archival materials encounter, make sense of, and utilize formats. The paper focuses on practitioner perceptions of formats around the following themes: how practitioners' historical relationships to certain challenging formats inform their ongoing curation practices; the importance of contexts in prioritizing or ignoring formats within scientific curation work; and how formats reveal larger sociotechnical issues. The paper concludes with practical and theoretical implications of navigating formats within the recovery and reuse of scientific data and offers suggestions for reconfiguring formats within broader data curation lifecycles.
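Key points: This paper examines the social and technical perceptions that researchers and data curators hold of physical and digital formats during the recovery and reuse of scientific data, and how those perceptions affect data preservation and use.

Methods: Drawing on 23 qualitative interviews with practitioners engaged in data recovery and reuse, the authors analyze how these practitioners understand, handle, and utilize various data formats.

Study: The interviews cover practitioners from varied backgrounds, ranging from marine biologists to data librarians. No specific datasets are named; the focus is on how practitioners deal with format problems in historical and archival data, from which the paper draws practical and theoretical implications of formats for the preservation and reuse of scientific data.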