Scene Text Segmentation via Multi-Task Cascade Transformer With Paired Data Synthesis.

Quang-Vinh Dang,Guee-Sang Lee

IEEE Access(2023)

引用 1|浏览0
暂无评分
摘要
The scene text segmentation task provides a wide range of practical applications. However, the number of images in the available datasets for scene text segmentation is not large enough to effectively train deep learning-based models, leading to limited performance. To solve this problem, we employ paired data generation to secure sufficient data samples for text segmentation via Text Image-conditional GANs. Furthermore, existing models implicitly model text attributes such as size, layout, font, and structure, which hinders their performance. To remedy this, we propose a Multi-task Cascade Transformer network that explicitly learns these attributes using large volumes of generated synthetic data. The transformer-based network includes two auxiliary tasks and one main task for text segmentation. The auxiliary tasks help the network learn text regions to focus on, as well as the structure of the text through different words and fonts, to support the main task. To bridge the gap between different datasets, we train the proposed network on paired synthetic data before fine-tuning it on real data. Our experiments on publicly available scene text segmentation datasets show that our method outperforms existing methods.
更多
查看译文
关键词
segmentation,text,data synthesis,multi-task
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要