Distilling knowledge of bidirectional language model for scene text recognition

2023 IEEE International Conference on Image Processing (ICIP 2023)

Abstract
This paper proposes a knowledge distillation method for an external bidirectional language model trained by masked language modeling to achieve high accuracy in scene text recognition. In Asian languages such as Japanese, words are not separated by spaces, so text must be recognized in units of multiple words or sentences rather than individual words, and high-level linguistic knowledge is needed to recognize text correctly. To enhance linguistic knowledge, several methods that use an external language model have been proposed, but these methods fail to adequately consider future context because they revise the text candidates yielded by autoregressive text recognition models, which consider mainly past context. To overcome this deficiency, our key idea is to enhance a text recognition model by utilizing knowledge from an external bidirectional language model trained by masked language modeling, which reflects not only past but also future context. To actively consider future context in text recognition, the proposed method introduces a distillation loss term that pulls the output probability of the text recognition model closer to that of the bidirectional language model. Experiments on Japanese scene text recognition demonstrate the effectiveness of the proposed method.
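The abstract describes the core mechanism only at a high level: a distillation loss that pulls the recognizer's per-character output distribution toward that of the bidirectional language model. The sketch below is a minimal illustration of that idea, not the paper's actual formulation; the function name, the temperature parameter, and the weighting coefficient `lambda_kd` are assumptions introduced here for illustration.

```python
import torch
import torch.nn.functional as F


def distillation_loss(rec_logits: torch.Tensor,
                      blm_logits: torch.Tensor,
                      temperature: float = 1.0) -> torch.Tensor:
    """KL-divergence term pulling the recognizer's per-character output
    distribution toward the bidirectional language model's (the teacher).

    Both inputs are assumed to have shape (batch, seq_len, vocab_size).
    """
    # Soften both distributions; `temperature` is an assumed hyperparameter.
    student = F.log_softmax(rec_logits / temperature, dim=-1).flatten(0, 1)
    teacher = F.softmax(blm_logits / temperature, dim=-1).flatten(0, 1)
    # KL(teacher || student), averaged over all character positions.
    return F.kl_div(student, teacher, reduction="batchmean") * temperature ** 2


# Hypothetical training objective: standard recognition (cross-entropy)
# loss plus the weighted distillation term. The BLM is treated as a
# frozen teacher, so gradients flow only into the recognizer.
rec_logits = torch.randn(2, 10, 3000)   # recognizer output (toy shapes)
blm_logits = torch.randn(2, 10, 3000)   # frozen bidirectional LM output
labels = torch.randint(0, 3000, (2, 10))
ce = F.cross_entropy(rec_logits.flatten(0, 1), labels.flatten())
lambda_kd = 0.5                         # assumed weighting coefficient
loss = ce + lambda_kd * distillation_loss(rec_logits, blm_logits)
```

In this reading, the recognition model learns from the ground-truth labels while the distillation term injects the bidirectional (past-and-future) context captured by the masked language model; the actual loss definition and hyperparameters would come from the full paper.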
Keywords
Scene text recognition, language model, knowledge distillation