RanLayNet: A Dataset for Document Layout Detection used for Domain Adaptation and Generalization
arXiv (2024)
Abstract
Large ground-truth datasets and recent advances in deep learning techniques
have been useful for layout detection. However, because of the restricted
layout diversity of these datasets, training on them requires a sizable number
of annotated instances, which is both expensive and time-consuming. As a
result, differences between the source and target domains may significantly
impact how well these models function. To solve this problem, domain adaptation
approaches have been developed that use a small quantity of labeled data to
adjust the model to the target domain. In this work, we introduce a
synthetic document dataset called RanLayNet, enriched with automatically
assigned labels denoting spatial positions, ranges, and types of layout
elements. The primary aim of this endeavor is to develop a versatile dataset
capable of training models with robustness and adaptability to diverse document
formats. Through empirical experimentation, we demonstrate that a deep layout
identification model trained on our dataset exhibits enhanced performance
compared to a model trained solely on real documents. Moreover, we conduct a
comparative analysis by fine-tuning inference models using both the PubLayNet
and IIIT-AR-13K datasets on the DocLayNet dataset. Our findings show that
models enriched with our dataset perform best on such tasks, achieving mAP95
scores of 0.398 and 0.588 in the scientific document domain for the TABLE class.
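
To make the idea of automatically labeled synthetic layouts concrete, the following is a minimal sketch and not the authors' generation pipeline. It places randomly sized, non-overlapping boxes on a blank page and records a label for each one. The class names, page size, size ranges, and the function name `random_layout` are all assumptions made for illustration; the abstract only states that labels denote spatial positions, ranges, and types of layout elements.

```python
import random

# Hypothetical class names; the abstract does not list the actual label set.
CLASSES = ["text", "title", "table", "figure", "list"]


def overlaps(a, b):
    """Return True if two (x, y, w, h) boxes intersect."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    return ax < bx + bw and bx < ax + aw and ay < by + bh and by < ay + ah


def random_layout(page_w=800, page_h=1000, n_elements=5, seed=None, max_tries=1000):
    """Sample up to n_elements non-overlapping boxes with random class labels.

    Boxes use the (x, y, width, height) convention. Rejection sampling is an
    assumed placement strategy, chosen here only for simplicity.
    """
    rng = random.Random(seed)
    annotations = []
    tries = 0
    while len(annotations) < n_elements and tries < max_tries:
        tries += 1
        w = rng.randint(page_w // 8, page_w // 2)
        h = rng.randint(page_h // 10, page_h // 3)
        x = rng.randint(0, page_w - w)
        y = rng.randint(0, page_h - h)
        box = (x, y, w, h)
        # Reject candidates that collide with an already placed element.
        if any(overlaps(box, a["bbox"]) for a in annotations):
            continue
        annotations.append(
            {"id": len(annotations), "category": rng.choice(CLASSES), "bbox": box}
        )
    return {"width": page_w, "height": page_h, "annotations": annotations}


if __name__ == "__main__":
    page = random_layout(seed=0)
    for ann in page["annotations"]:
        print(ann["category"], ann["bbox"])
```

Because the label is created at the same moment as the box, no manual annotation step is needed; a real pipeline would additionally render content into each box to produce the page image.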