Are Attention blocks better than BiLSTM for text recognition?

ICMLT (2023)

Abstract
This paper studies the impact of using sequential Attention blocks versus Bidirectional Long Short-Term Memory (BiLSTM) layers for Optical Character Recognition (OCR). The main goal is to improve the inference time of state-of-the-art OCR models, specifically on CPU, under the additional constraint of being trainable with only a limited amount of data. While OCR research often focuses on improving recognition accuracy, comparatively little emphasis has been placed on optimizing processing speed and model size. In this context, the experimental results presented in this paper show the superiority of Attention blocks over BiLSTM layers. Attention blocks are up to 5x faster on CPU, while achieving better decoding rates on a typical industrial dataset of identity-document text fields and comparable decoding rates on publicly available Scene Text Recognition (STR) datasets. In addition to being faster while remaining accurate, which was the primary goal, Attention blocks also lead to lighter models.
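To make the comparison concrete, the sketch below contrasts the two kinds of sequence-modeling heads the abstract refers to, applied to a CNN feature sequence in a typical CRNN-style recognizer. This is a minimal illustrative assumption, not the authors' implementation: module names, layer counts, and dimensions (e.g. `in_dim=256`, two layers, four heads) are hypothetical, and the attention head is approximated with a standard Transformer-encoder block.

```python
# Hypothetical sketch (not the paper's code): BiLSTM vs. self-attention
# sequence-modeling heads on top of a CNN feature sequence of shape (B, T, C).
import torch
import torch.nn as nn


class BiLSTMHead(nn.Module):
    """Baseline: bidirectional LSTM over the feature sequence."""
    def __init__(self, in_dim=256, hidden=256, layers=2):
        super().__init__()
        self.rnn = nn.LSTM(in_dim, hidden, num_layers=layers,
                           batch_first=True, bidirectional=True)
        self.proj = nn.Linear(2 * hidden, in_dim)  # back to in_dim channels

    def forward(self, x):          # x: (B, T, C)
        out, _ = self.rnn(x)
        return self.proj(out)


class AttentionHead(nn.Module):
    """Alternative: stacked self-attention (Transformer-encoder style) blocks."""
    def __init__(self, in_dim=256, heads=4, layers=2):
        super().__init__()
        block = nn.TransformerEncoderLayer(d_model=in_dim, nhead=heads,
                                           dim_feedforward=4 * in_dim,
                                           batch_first=True)
        self.encoder = nn.TransformerEncoder(block, num_layers=layers)

    def forward(self, x):          # x: (B, T, C)
        return self.encoder(x)


if __name__ == "__main__":
    feats = torch.randn(8, 32, 256)   # dummy batch of CNN feature sequences
    for head in (BiLSTMHead(), AttentionHead()):
        params = sum(p.numel() for p in head.parameters())
        print(type(head).__name__, head(feats).shape, f"{params} params")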
Keywords
Optical character recognition, Attention blocks, LSTM, sequence modeling