ViSA: Visual and Semantic Alignment for Robust Scene Text Recognition

ICDAR (2), 2023

Abstract
Unsupervised domain adaptation has been studied to meet the challenge of scene text recognition in diverse scenarios. Existing methods try to align the source and target domains at the image or character level. However, these approaches are somewhat coarse-grained, as they involve irrelevant information or ignore category attributes. To address these issues, we propose a novel Visual and Semantic Alignment (ViSA) method to reduce domain shifts in the high-frequency domain and the category space. Specifically, high-frequency domain alignment extracts the high-frequency components of global visual features, which allows the domain classifier to focus on text-relevant features. Furthermore, category space alignment is introduced to align character features at the category level; within it, cross-domain contrastive learning and prototype-consistency matching are adopted to minimize the distance between domains. ViSA can be flexibly plugged into various existing recognizers. In addition, ViSA is applied only during training and dropped at evaluation, so it has no impact on inference speed. Extensive experiments verify the contribution of each module of ViSA, and our method achieves state-of-the-art results on several benchmarks, such as SVT, IC13 and IC15.
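The two alignment ideas in the abstract can be illustrated with a minimal sketch. The code below is not the authors' implementation; it assumes a simple FFT high-pass filter for extracting high-frequency components of a feature map, and an InfoNCE-style loss (with an assumed temperature `tau`) for the cross-domain contrastive part of category space alignment, where source and target character features sharing a category label are treated as positives.

```python
import numpy as np

def high_frequency_components(feat, cutoff=0.25):
    """Keep only the high-frequency components of a 2-D feature map
    via an FFT high-pass filter (frequencies within normalized radius
    `cutoff` of the spectrum centre are zeroed out)."""
    f = np.fft.fftshift(np.fft.fft2(feat))
    h, w = feat.shape
    yy, xx = np.mgrid[0:h, 0:w]
    radius = np.sqrt(((yy - h / 2) / h) ** 2 + ((xx - w / 2) / w) ** 2)
    f[radius < cutoff] = 0  # suppress low frequencies (incl. the DC term)
    return np.real(np.fft.ifft2(np.fft.ifftshift(f)))

def cross_domain_contrastive_loss(src, src_y, tgt, tgt_y, tau=0.1):
    """InfoNCE-style loss pulling same-category character features from
    the two domains together and pushing other categories apart.
    `src`/`tgt` are (N, d) feature matrices; `src_y`/`tgt_y` are labels."""
    src = src / np.linalg.norm(src, axis=1, keepdims=True)
    tgt = tgt / np.linalg.norm(tgt, axis=1, keepdims=True)
    sims = src @ tgt.T / tau                          # (Ns, Nt) similarities
    logp = sims - np.log(np.exp(sims).sum(axis=1, keepdims=True))
    pos = src_y[:, None] == tgt_y[None, :]            # same-category mask
    return -(logp * pos).sum() / pos.sum()
```

In this sketch, features of the same character class that are already aligned across domains yield a lower loss than misaligned ones, which is the gradient signal the category space alignment would exploit; the prototype-consistency matching described in the abstract would additionally compare per-class feature centroids between domains.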
Keywords
recognition, semantic alignment, text, scene