The YOLO model that still excels in document layout analysis

Qilin Deng, Mayire Ibrayim, Askar Hamdulla, Chunhu Zhang

Signal, Image and Video Processing (2023)

Abstract
Document layout analysis helps people understand and use the information in a document. However, the diversity of document layouts and the considerable variation in aspect ratios among document objects pose significant challenges. In this study, using the YOLO model as a baseline, we design the multi-convolutional deformable separation (MCDS) module as the main structure of the network. Integrating this module into the backbone and neck layers significantly enhances image feature extraction. We also incorporate ParNet-Attention, which uses parallel networks to direct the network's focus toward document objects and thereby supports more thorough feature extraction. To improve prediction, a decouple fusion head is employed in the head layer; it builds on the decoupled head and leverages multi-scale features to increase prediction accuracy. The proposed model performs strongly on three public datasets with differing characteristics: ICDAR-POD, PubLayNet, and IIIT-AR-13K. Notably, on ICDAR-POD it achieves the best mean Average Precision (mAP) at IoU thresholds of 0.6 and 0.8: 96.2 and 94.4, respectively.
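
The abstract reports mAP at IoU thresholds of 0.6 and 0.8. As a minimal sketch of what such a threshold means, the Python below computes the intersection over union (IoU) of two axis-aligned boxes and checks whether a prediction would count as a match at a given threshold. The `(x1, y1, x2, y2)` box format and the function names `iou` and `is_match` are assumptions made for illustration; they are not taken from the paper, and a full mAP computation (ranking predictions by confidence and averaging precision over recall) is not shown.

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b

    # Width and height of the overlap; clamp to 0 when the boxes are disjoint.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h

    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    union = area_a + area_b - inter

    return inter / union if union > 0 else 0.0


def is_match(pred_box, gt_box, threshold):
    """True if the predicted box overlaps the ground truth by at least `threshold`."""
    return iou(pred_box, gt_box) >= threshold


# Hypothetical boxes: the prediction covers the ground truth plus extra area.
pred = (0, 0, 10, 10)
gt = (0, 0, 10, 7)
print(iou(pred, gt))            # overlap ratio between the two boxes
print(is_match(pred, gt, 0.6))  # looser threshold
print(is_match(pred, gt, 0.8))  # stricter threshold
```

A stricter threshold demands tighter localization, which is consistent with the lower score the abstract reports at 0.8 than at 0.6.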
Keywords
Document layout analysis, Document object detection, Deep learning, Feature extraction, Multi-scale features