Establishing the provenance of historical manuscripts with a novel distance measure
Pattern Analysis and Applications (2013)
Abstract
The recent digitization of more than 20 million books has been led by initiatives from countries wishing to preserve their cultural heritage and by several commercial endeavors, including the Google Print Library Project. It is expected that within a few years a significant fraction of the world’s books will be online. However, for millions of complete books and tens of millions of loose pages, the provenance of the manuscripts may be completely unknown or disputed, thus denying historians an understanding of the context in which the content was created. In a handful of cases, it may be possible for experts to regain the provenance by examining linguistic, cultural and/or stylistic clues. However, such experts are a rarity and these investigations are time-consuming and expensive. One technique used by experts to establish provenance is the examination of the ornate initial letters appearing in the questioned manuscript. By comparing the initial letters in the manuscript to annotated initial letters whose origin is known, the provenance can be determined. In this work, we show for the first time that we can reproduce this ability with a computer algorithm. We use a recently introduced technique to measure texture similarity and show that it can recognize initial letters with an accuracy that rivals or exceeds human performance. A brute force implementation of this measure would require several months to process a single large book; however, we introduce a novel lower bound that allows us to process the books in hours or minutes.
Keywords
Image similarity, Classification, Clustering, Texture