Mathematical Variable Detection in PDF Scientific Documents

INTELLIGENT INFORMATION AND DATABASE SYSTEMS, ACIIDS 2019, PT II (2019)

Abstract
The detection of mathematical expressions in PDF documents has been studied and has advanced in recent years. Within this process, detecting the variables of inline expressions, which are represented by alphabetical characters, remains a challenge. Compared with other components of inline expressions, variables are subject to many factors that cause ambiguity in detection. This paper presents an analysis of the errors made in detecting variables in PDF scientific documents and proposes novel rules to improve detection accuracy. Experimental results on benchmark datasets containing English and Vietnamese documents show the effectiveness of the proposed method, and a comparison with existing methods shows that it outperforms them. Furthermore, pre-trained deep convolutional neural networks are employed and optimized to automatically extract visual features of the components extracted from PDFs, and machine learning algorithms are used to improve detection accuracy.
Keywords
PDF document analysis, Mathematical expression extraction, Machine learning, Rule-based classification, Deep learning