IPAD: Iterative, Parallel, and Diffusion-based Network for Scene Text Recognition
CoRR(2023)
摘要
Nowadays, scene text recognition has attracted more and more attention due to
its diverse applications. Most state-of-the-art methods adopt an
encoder-decoder framework with the attention mechanism, autoregressively
generating text from left to right. Despite the convincing performance, this
sequential decoding strategy constrains inference speed. Conversely,
non-autoregressive models provide faster, simultaneous predictions but often
sacrifice accuracy. Although utilizing an explicit language model can improve
performance, it burdens the computational load. Besides, separating linguistic
knowledge from vision information may harm the final prediction. In this paper,
we propose an alternative solution, using a parallel and iterative decoder that
adopts an easy-first decoding strategy. Furthermore, we regard text recognition
as an image-based conditional text generation task and utilize the discrete
diffusion strategy, ensuring exhaustive exploration of bidirectional contextual
information. Extensive experiments demonstrate that the proposed approach
achieves superior results on the benchmark datasets, including both Chinese and
English text images.
更多查看译文
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要