Self-Supervised Pre-Training for Table Structure Recognition Transformer
CoRR (2024)
Abstract
Table structure recognition (TSR) aims to convert tabular images into a
machine-readable format. Although the hybrid convolutional neural network
(CNN)-transformer architecture is widely used in existing approaches, the linear
projection transformer has outperformed the hybrid architecture in numerous
vision tasks due to its simplicity and efficiency. However, existing research
has demonstrated that directly replacing the CNN backbone with a linear
projection leads to a marked performance drop. In this work, we resolve the
issue by proposing a self-supervised pre-training (SSP) method for TSR
transformers. We discover that the performance gap between the linear
projection transformer and the hybrid CNN-transformer can be mitigated by SSP
of the visual encoder in the TSR model. We conducted reproducible ablation
studies and open-sourced our code at https://github.com/poloclub/unitable to
enhance transparency, inspire innovation, and facilitate fair comparisons in
our domain, as tables are a promising modality for representation learning.