“Garbage Bags Full of Files”: Exploring Sociotechnical Perceptions of Formats Within the Recovery and Reuse of Scientific Data
Proceedings of the Association for Information Science and Technology (2023)
University of Maryland, USA
Abstract
This paper explores the social and technical perceptions of physical and digital formats as they relate to work in the recovery and reuse of scientific data, specifically historical, archival, and defunct data sources. Proprietary and obsolete formats, or formats that need significant transformation work, stand out as central challenges for scientists and data curators who are recovering reusable data from archival or legacy data sources. The challenges confronting data sharing and reuse of contemporary scientific data are already known to be myriad; formats often pose a major, compounding challenge to retrospective data curation research and practice. Based on 23 qualitative interviews with practitioners conducting data recovery and reuse, ranging from marine biologists to data librarians, we study how they understand, engage with, and utilize formats within their data curation work. This paper enumerates the formats deployed throughout the scientific data curation process and explores how practitioners creating and curating scientific data based on historical and archival materials encounter, make sense of, and utilize formats. The paper focuses on practitioner perceptions of formats around the following themes: how practitioners' historical relationships to certain challenging formats inform their ongoing curation practices; the importance of contexts in prioritizing or ignoring formats within scientific curation work; and how formats reveal larger sociotechnical issues. The paper concludes with practical and theoretical implications of navigating formats within the recovery and reuse of scientific data and offers suggestions for reconfiguring formats within broader data curation lifecycles.