Compounded Mediation: A Data Archaeology of the Newspaper Navigator Dataset

DIGITAL HUMANITIES QUARTERLY (2021)

Abstract
The increasing roles of machine learning and artificial intelligence in the construction of cultural heritage and humanities datasets necessitate critical examination of the myriad biases introduced by machines, algorithms, and the humans who build and deploy them. From image classification to optical character recognition, the effects of decisions ostensibly made by machines compound through the digitization pipeline and redouble in each step, mediating our interactions with digitally rendered artifacts through the search and discovery process. As a result, scholars within the digital humanities community have begun advocating for the proper contextualization of cultural heritage datasets within the socio-technical systems in which they are created and utilized. One such approach to this contextualization is the data archaeology, a form of humanistic excavation of a dataset that Paul Fyfe defines as "recover[ing] and reconstitut[ing] media objects within their changing ecologies" [Fyfe 2016]. Within critical data studies, this excavation of a dataset, including its construction and mediation via machine learning, has proven to be a capacious approach. However, the data archaeology has yet to be adopted as standard practice among cultural heritage practitioners who produce such datasets with machine learning. In this article, I present a data archaeology of the Library of Congress's Newspaper Navigator dataset, which I created as part of the Library of Congress's Innovator in Residence program [Lee et al. 2020]. The dataset consists of visual content extracted from 16 million historic newspaper pages in the Chronicling America database using machine learning techniques.
In this case study, I examine the manifold ways in which a Chronicling America newspaper page is transmuted and decontextualized during its journey from a physical artifact to a series of probabilistic photographs, illustrations, maps, comics, cartoons, headlines, and advertisements in the Newspaper Navigator dataset [Fyfe 2016]. Accordingly, I draw from fields of scholarship including media archaeology, critical data studies, science and technology studies, and the autoethnography throughout. To excavate the Newspaper Navigator dataset, I consider the digitization journeys of four different pages in Black newspapers included in Chronicling America, all of which reproduce the same photograph of W.E.B. Du Bois in an article announcing the launch of The Crisis, the official magazine of the NAACP. In tracing the newspaper pages' journeys, I unpack how each step in the Chronicling America and Newspaper Navigator pipelines, such as the imaging process and the construction of training data, not only imprints bias on the resulting Newspaper Navigator dataset but also propagates the bias through the pipeline via the machine learning algorithms employed. Along the way, I investigate the limitations of the Newspaper Navigator dataset and machine learning techniques more generally as they relate to cultural heritage, with a particular focus on marginalization and erasure via algorithmic bias, which implicitly rewrites the archive itself. In presenting this case study, I argue for the value of the data archaeology as a mechanism for contextualizing and critically examining cultural heritage datasets within the communities that create, release, and utilize them. I offer this autoethnographic investigation of the Newspaper Navigator dataset in the hope that it will be considered not only by users of this dataset in particular but also by digital humanities practitioners and end users of cultural heritage datasets writ large.