Effective Multi-Hot Encoding and Classifier for Lightweight Scene Text Recognition with a Large Character Set

IEEE Transactions on Circuits and Systems for Video Technology (2022)

Abstract
Deploying a lightweight deep model for the scene text recognition task on mobile devices has great commercial value. However, the conventional softmax-based one-hot classification module becomes a cumbersome obstacle when handling multiple languages or languages with a large character set (e.g., Chinese), because the number of model parameters grows rapidly with the number of classes. To this end, we propose an Effective Multi-hot encoding and classification modUle (EMU) for scene text recognition in the scenario of multiple languages or languages with a large character set. Specifically, EMU generates a binary multi-hot label for each class with a real-valued sub-network in the training stage and produces the prediction by calculating the inner product between the multi-hot code and the multi-hot label. Compared to the softmax-based one-hot classifier, EMU significantly reduces the storage requirement and the time cost in the inference stage while retaining similar performance. Furthermore, we design a convolution-feature-based Lightweight TransFormer to learn effective features for EMU and consequently develop a lightweight scene text recognition framework, termed Light-Former-EMU. We conduct extensive experiments on seven public English benchmarks and two real-world Chinese challenge benchmarks. Experimental results verify the effectiveness of the proposed EMU and demonstrate the promising performance of the proposed Light-Former-EMU.
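
The inner-product decoding step described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the codebook here is random rather than learned by a sub-network, the mapping of label bits from {0, 1} to {-1, +1} before the inner product is an assumption made so that mismatched bits are penalised, and the names `make_codebook`, `predict`, and `classifier_weights` as well as the sizes (256 features, 7000 classes, 32 code bits) are hypothetical values chosen only for illustration.

```python
import random


def make_codebook(num_classes, code_len, seed=0):
    """Hypothetical stand-in for learned labels: one distinct random binary code per class."""
    rng = random.Random(seed)
    seen = set()
    codebook = []
    while len(codebook) < num_classes:
        code = tuple(rng.randint(0, 1) for _ in range(code_len))
        if code not in seen:  # keep codes unique so decoding is unambiguous
            seen.add(code)
            codebook.append(code)
    return codebook


def predict(code_scores, codebook):
    """Return the class whose label has the largest inner product with the predicted code.

    Assumption: label bits are mapped from {0, 1} to {-1, +1} before the inner product.
    """
    def score(label):
        return sum(s * (2 * b - 1) for s, b in zip(code_scores, label))

    return max(range(len(codebook)), key=lambda c: score(codebook[c]))


def classifier_weights(feature_dim, num_outputs):
    """Weight count of a single linear output layer (bias ignored)."""
    return feature_dim * num_outputs


# Illustrative sizes only; none of these numbers come from the paper.
codebook = make_codebook(num_classes=7000, code_len=32)
ideal_scores = [2 * b - 1 for b in codebook[123]]  # a perfect prediction for class 123
predicted_class = predict(ideal_scores, codebook)

one_hot_weights = classifier_weights(256, 7000)  # one output per class
multi_hot_weights = classifier_weights(256, 32)  # one output per code bit
```

Under these assumed sizes, the output layer shrinks from one unit per class to one unit per code bit, and the only additional storage is the binary codebook itself (7000 × 32 bits), which is the kind of saving the abstract attributes to multi-hot encoding.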