Approximate Search for Keywords in Handwritten Text Images

DOCUMENT ANALYSIS SYSTEMS, DAS 2022 (2022)

Abstract
Thanks to its ability to deal with the intrinsic uncertainty of handwritten text in historical documents, Probabilistic Indexing (PrIx) has emerged as an alternative to traditional automatic transcription for retrieving information from such documents. Search techniques developed for PrIx support not only typical single-word queries but also complex multi-word boolean and word-sequence queries, which are commonly used in many free-text document search applications today. Here we focus on another type of PrIx-based text-image query, namely approximate (or "fuzzy") word spelling, which is also commonly provided by many conventional plain-text search tools. When handwritten historical documents are considered, approximate spelling has proved to be a remarkably useful search asset in practice. However, its performance had not been formally assessed so far. We explain how approximate-spelling search has been developed for large-scale PrIxs and provide an empirical analysis of its precision-recall performance and computational efficiency. Experiments with the well-known "Bentham Papers" large manuscript collection show that the proposed approximate-spelling search techniques generally improve on the already good search accuracy of exact-spelling queries, while computing performance scales gracefully to very large collections of handwritten text images.
Keywords
Handwritten text processing, Keyword spotting, Approximate search, Levenshtein distance, Information search and retrieval
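
To make the idea of approximate-spelling queries concrete, the following is a minimal sketch, not the authors' implementation (the abstract does not describe it). It assumes a toy in-memory index mapping each indexed word to (page, relevance probability) pairs, and expands a query to every indexed word within a given Levenshtein distance. The names `TOY_INDEX`, `fuzzy_search`, `max_dist`, and `min_prob`, as well as the sample words and probabilities, are hypothetical and chosen only for illustration.

```python
from typing import Dict, List, Tuple


def levenshtein(a: str, b: str) -> int:
    """Edit distance counting insertions, deletions, and substitutions."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,           # delete ca
                            curr[j - 1] + 1,       # insert cb
                            prev[j - 1] + cost))   # match or substitute
        prev = curr
    return prev[-1]


# Hypothetical toy index: word -> list of (page_id, relevance probability).
TOY_INDEX: Dict[str, List[Tuple[str, float]]] = {
    "government": [("p1", 0.92)],
    "goverment": [("p2", 0.71)],
    "governor": [("p3", 0.85)],
    "punishment": [("p1", 0.64)],
}


def fuzzy_search(query: str,
                 index: Dict[str, List[Tuple[str, float]]],
                 max_dist: int = 1,
                 min_prob: float = 0.5) -> List[Tuple[str, str, int, float]]:
    """Return (page, matched word, distance, probability) hits, best first."""
    hits = []
    for word, entries in index.items():  # linear scan over the vocabulary
        dist = levenshtein(query, word)
        if dist <= max_dist:
            for page, prob in entries:
                if prob >= min_prob:
                    hits.append((page, word, dist, prob))
    # Rank by decreasing probability, then by page id for a stable order.
    return sorted(hits, key=lambda h: (-h[3], h[0]))


if __name__ == "__main__":
    for hit in fuzzy_search("government", TOY_INDEX, max_dist=1):
        print(hit)
```

With `max_dist=0` the search reduces to an exact-spelling query. The linear scan over the vocabulary is used here only for brevity; a large collection would presumably need a more scalable way to enumerate candidate spellings.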