EduceLab-Scrolls: Verifiable Recovery of Text from Herculaneum Papyri using X-ray CT
arXiv (2023)
Abstract
We present a complete software pipeline for revealing the hidden texts of the
Herculaneum papyri using X-ray CT images. This enhanced virtual unwrapping
pipeline combines machine learning with a novel geometric framework linking 3D
and 2D images. We also present EduceLab-Scrolls, a comprehensive open dataset
representing two decades of research effort on this problem. EduceLab-Scrolls
contains a set of volumetric X-ray CT images of both small fragments and
intact, rolled scrolls. The dataset also contains 2D image labels that are used
in the supervised training of an ink detection model. Labeling is enabled by
aligning spectral photography of scroll fragments with X-ray CT images of the
same fragments, thus creating a machine-learnable mapping between image spaces
and modalities. This alignment permits supervised learning for the detection of
"invisible" carbon ink in X-ray CT, a task that is "impossible" even for human
expert labelers. To our knowledge, this is the first aligned dataset of its
kind and is the largest dataset ever released in the heritage domain. Our
method is capable of revealing accurate lines of text on scroll fragments with
known ground truth. Revealed text is verified using visual confirmation,
quantitative image metrics, and scholarly review. EduceLab-Scrolls has also
enabled the discovery, for the first time, of hidden texts from the Herculaneum
papyri, which we present here. We anticipate that the EduceLab-Scrolls dataset
will generate more textual discovery as research continues.
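The abstract does not say how the photo-to-CT alignment is represented. As an illustration only, the sketch below makes three assumptions. First, the registration between a fragment photograph and the CT-derived surface image has already been reduced to a 2D affine transform. Second, the CT data around the papyrus surface has been resampled into a (depth, height, width) stack. Third, samples are small patches centred on each surface pixel. The names `warp_label_to_surface` and `extract_training_pairs` are hypothetical and do not describe the authors' pipeline. The sketch only shows how an aligned photo label becomes supervised (CT patch, ink/no-ink) training pairs.

```python
import numpy as np


def warp_label_to_surface(photo_label, affine, out_shape):
    """Resample a 2D ink label drawn on a photograph into surface-image coordinates.

    photo_label: (Hp, Wp) int array of 0/1 ink labels in photo space (hypothetical input).
    affine: (2, 3) matrix mapping surface (x, y, 1) -> photo (x, y); ASSUMED to come
            from a prior registration step, which is not shown here.
    out_shape: (H, W) of the CT-derived surface image.
    Returns an (H, W) int array: 1 = ink, 0 = no ink, -1 = not covered by the photo.
    """
    h, w = out_shape
    ys, xs = np.mgrid[0:h, 0:w]
    coords = np.stack([xs.ravel(), ys.ravel(), np.ones(h * w)])  # (3, H*W)
    px, py = affine @ coords  # photo-space coordinates, each of shape (H*W,)
    px = np.rint(px).astype(int)  # nearest-neighbour sampling keeps labels binary
    py = np.rint(py).astype(int)
    hp, wp = photo_label.shape
    inside = (px >= 0) & (px < wp) & (py >= 0) & (py < hp)
    out = np.full(h * w, -1, dtype=int)
    out[inside] = photo_label[py[inside], px[inside]]
    return out.reshape(h, w)


def extract_training_pairs(surface_volume, surface_label, half=2):
    """Pair each labelled surface pixel with the CT subvolume around it.

    surface_volume: (D, H, W) CT intensities sampled at D depths along the
                    surface normal (an ASSUMED data layout).
    surface_label:  (H, W) output of warp_label_to_surface.
    half: patch half-width, so each sample has shape (D, 2*half+1, 2*half+1).
    """
    _, h, w = surface_volume.shape
    samples, labels = [], []
    for row in range(half, h - half):
        for col in range(half, w - half):
            lab = surface_label[row, col]
            if lab < 0:
                continue  # no ground truth from the photograph at this pixel
            patch = surface_volume[:, row - half:row + half + 1, col - half:col + half + 1]
            samples.append(patch)
            labels.append(lab)
    return np.array(samples), np.array(labels)


# Toy usage with synthetic data (not real scroll data).
rng = np.random.default_rng(0)
photo = np.zeros((20, 20), dtype=int)
photo[5:10, :] = 1  # a horizontal band of "ink" in the photograph
# Surface pixel (x, y) maps to photo pixel (x + 2, y + 2): a pure translation.
A = np.array([[1.0, 0.0, 2.0],
              [0.0, 1.0, 2.0]])
label = warp_label_to_surface(photo, A, (16, 16))
volume = rng.normal(size=(8, 16, 16))
samples, labels = extract_training_pairs(volume, label, half=2)
print(samples.shape, labels.mean())
```

Pixels that the photograph does not cover are marked -1 and skipped instead of being treated as "no ink". This keeps the unlabeled regions from biasing a classifier trained on the resulting pairs.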