Indexing for a Digital Library of George Washington's Manuscripts: A Study of Word Matching Techniques

msra(2002)

Abstract
In a multimedia world, one would like electronic access to all kinds of information. But a lot of important information still exists only on paper, and it is a challenge to efficiently access or navigate this information even if it is scanned in. The previously proposed "word spotting" idea is an approach for accessing and navigating a collection of handwritten documents available as images, using an index automatically generated by matching words as pictures. The most difficult task in solving this problem is the matching of word images. The quality of the aged documents and the variations in handwriting make this a challenging problem. Here we present a number of word matching techniques along with new normalization methods that are crucial for their success. Efficient pruning techniques, which quickly reduce the set of possible matches for a given word, are also discussed. Our results show that the best of the discussed matching algorithms achieves an average precision of 73% for documents of reasonable quality.
Keywords
multimedia content analysis,audio/image/video processing