Read-Write-Learn: Self-Learning for Handwriting Recognition

Adrian Boteanu, Du Cheng, Serdar Kadioglu

Proceedings of the 2023 ACM Symposium on Document Engineering, DocEng 2023 (2023)

Abstract
Handwriting recognition relies on supervised data for training. Annotations typically include both the written text and the author's identity to facilitate the recognition of a particular style. A large annotation set is required for robust recognition, which is not always available in historical texts and low-annotation languages. To mitigate this challenge, we propose the Read-Write-Learn framework. In this setting, we augment the training process of handwriting recognition with a language model and a handwriting generator. Specifically, in the first reading step, we employ a language model to identify text that is likely detected correctly by the recognition model. Then, in the writing step, we generate more training data in the same writing style. Finally, in the learning step, we use the newly generated data in the same writing style to finetune the recognition model. Our Read-Write-Learn framework allows the recognition model to incrementally converge on the new style. Our experiments on historical handwritten documents demonstrate the benefits of the approach, and we present several examples to showcase improved recognition.
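The three steps described in the abstract can be sketched as a single loop iteration. This is only an illustrative outline, not the authors' implementation: the class name `ReadWriteLearn`, the four callables, the `threshold` parameter, and the string-based `Image` placeholder are all hypothetical, and the abstract does not say which texts the generator is asked to write, so the sketch assumes the caller supplies them.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple

Image = str  # hypothetical placeholder; a real system would use pixel arrays


@dataclass
class ReadWriteLearn:
    # All four components are assumed interfaces, not the paper's API.
    recognize: Callable[[Image], str]                 # recognition model
    lm_score: Callable[[str], float]                  # language model plausibility
    generate: Callable[[str, Sequence[Image]], Image]  # handwriting generator
    finetune: Callable[[List[Tuple[Image, str]]], None]
    threshold: float = 0.5                            # assumed acceptance cutoff

    def step(self, line_images: Sequence[Image], new_texts: Sequence[str]) -> int:
        # Read: keep images whose transcription the language model finds plausible.
        style_refs = [
            img for img in line_images
            if self.lm_score(self.recognize(img)) >= self.threshold
        ]
        if not style_refs:
            return 0  # nothing trustworthy to imitate in this round
        # Write: synthesize new labeled samples in the style of the trusted images.
        synthetic = [(self.generate(text, style_refs), text) for text in new_texts]
        # Learn: finetune the recognizer on the newly generated data.
        self.finetune(synthetic)
        return len(synthetic)


def toy_demo() -> List[Tuple[Image, str]]:
    """Run one iteration with trivial stand-ins for every component."""
    vocab = {"the", "quick", "fox"}
    seen: List[Tuple[Image, str]] = []

    def toy_lm(text: str) -> float:
        words = text.split()
        return sum(w in vocab for w in words) / len(words) if words else 0.0

    rwl = ReadWriteLearn(
        recognize=lambda img: img.split(":", 1)[1],
        lm_score=toy_lm,
        generate=lambda text, refs: f"synthetic:{text}",
        finetune=seen.extend,
    )
    rwl.step(["img:the quick fox", "img:tbe qvick f0x"], ["the fox", "quick fox"])
    return seen
```

Calling `step` repeatedly on the same collection would correspond to the incremental convergence on a new style that the abstract mentions, since a better recognizer should pass more lines through the reading filter in the next round.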
Keywords
handwriting recognition, handwriting generation, self-learning