Improving CTC-based Handwritten Chinese Text Recognition with Cross-Modality Knowledge Distillation and Feature Aggregation

2023 IEEE International Conference on Multimedia and Expo (ICME 2023)

Abstract
Offline handwritten Chinese text recognition (HCTR) models based on connectionist temporal classification (CTC) have recently achieved impressive results. However, because of the conditional independence assumption and per-frame prediction, CTC-based models can neither capture semantic relationships between output tokens nor exploit global visual features of characters. To address these issues, we propose a cross-modality knowledge distillation approach that leverages a pretrained language model (BERT) to transfer contextual semantic information, and we design a feature aggregation module that dynamically aggregates local and global features. Experimental results on the HCTR datasets (CASIA-HWDB, ICDAR2013, HCCDOC) show that our proposed method significantly improves the model's performance.
Keywords
Offline handwritten Chinese text recognition, CTC, Language model, Knowledge Distillation
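
To make the two ideas in the abstract concrete, the following is a minimal, illustrative PyTorch sketch, not the authors' implementation: a gated aggregation of local (per-frame) and global (sequence-level) visual features, and a training loss that adds an MSE-style cross-modality distillation term against precomputed BERT hidden states to the usual CTC loss. All module names, tensor shapes, the gating design, and the loss weight alpha are assumptions made for illustration.

```python
# Illustrative sketch only (not the authors' code). Assumed shapes:
#   local_feats : (T, B, D)  per-frame visual features from the CTC recognizer
#   student_ctx : (B, L, H)  student context features projected to BERT's hidden size
#   teacher_ctx : (B, L, H)  precomputed BERT hidden states of the label text
import torch
import torch.nn as nn
import torch.nn.functional as F


class FeatureAggregation(nn.Module):
    """Gated fusion of local (per-frame) and global (sequence-level) features."""

    def __init__(self, dim: int):
        super().__init__()
        self.gate = nn.Linear(2 * dim, dim)

    def forward(self, local_feats: torch.Tensor) -> torch.Tensor:
        # Global descriptor: mean over the time axis, broadcast back to all T frames.
        global_feat = local_feats.mean(dim=0, keepdim=True).expand_as(local_feats)
        gate = torch.sigmoid(self.gate(torch.cat([local_feats, global_feat], dim=-1)))
        return gate * local_feats + (1.0 - gate) * global_feat


def total_loss(log_probs, targets, input_lens, target_lens,
               student_ctx, teacher_ctx, alpha: float = 0.1):
    """CTC loss plus an MSE-style cross-modality distillation term."""
    ctc = F.ctc_loss(log_probs, targets, input_lens, target_lens, blank=0)
    kd = F.mse_loss(student_ctx, teacher_ctx)  # assumes features are pre-aligned
    return ctc + alpha * kd


if __name__ == "__main__":
    T, B, D, C, L, H = 40, 2, 256, 7000, 10, 768
    fused = FeatureAggregation(D)(torch.randn(T, B, D))        # (T, B, D)
    log_probs = torch.randn(T, B, C).log_softmax(dim=-1)
    targets = torch.randint(1, C, (B, L))
    loss = total_loss(log_probs, targets,
                      torch.full((B,), T, dtype=torch.long),
                      torch.full((B,), L, dtype=torch.long),
                      torch.randn(B, L, H), torch.randn(B, L, H))
    print(fused.shape, loss.item())
```

In practice, the distillation target, the alignment between visual frames and BERT tokens, and the exact aggregation design follow the paper rather than this sketch.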