ODM: A Text-Image Further Alignment Pre-training Approach for Scene Text Detection and Spotting
CVPR 2024
Abstract
In recent years, text-image joint pre-training techniques have shown
promising results in various tasks. However, in Optical Character Recognition
(OCR) tasks, aligning text instances with their corresponding text regions in
images poses a challenge: it requires effective alignment between text and
OCR-Text (we refer to text appearing in images as OCR-Text to distinguish it
from natural-language text) rather than a holistic understanding of the overall
image content. In this paper, we propose a new pre-training method called
OCR-Text Destylization Modeling (ODM) that transfers diverse styles of text
found in images to a uniform style based on the text prompt. With ODM, we
achieve better alignment between text and OCR-Text and enable pre-trained
models to adapt to the complex and diverse styles of scene text detection and
spotting tasks. Additionally, we design a new label generation method
specifically for ODM and combine it with our proposed Text-Controller module
to address the challenge of annotation costs in OCR tasks, allowing a larger
amount of unlabeled data to participate in pre-training. Extensive experiments
on multiple public datasets demonstrate that our method significantly improves
performance and outperforms current pre-training methods in scene text
detection and spotting tasks. Code is available at
https://github.com/PriNing/ODM.