Experimental Evidence Extraction System in Data Science with Hybrid Table Features and Ensemble Learning

WWW '20: The Web Conference 2020 Taipei Taiwan April, 2020(2020)

引用 10|浏览123
暂无评分
摘要
Data Science has been one of the most popular fields in higher education and research activities. It takes tons of time to read the experimental section of thousands of papers and figure out the performance of the data science techniques. In this work, we build an experimental evidence extraction system to automate the integration of tables (in the paper PDFs) into a database of experimental results. First, it crops the tables and recognizes the templates. Second, it classifies the column names and row names into “method”, “dataset”, or “evaluation metric”, and then unified all the table cells into (method, dataset, metric, score)-quadruples. We propose hybrid features including structural and semantic table features as well as an ensemble learning approach for column/row name classification and table unification. SQL statements can be used to answer questions such as whether a method is the state-of-the-art or whether the reported numbers are conflicting.
更多
查看译文
关键词
Data Science Education, PDF Tables, Information Extraction
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要