A Hybrid Probabilistic Approach For Table Understanding

THIRTY-FIFTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, THIRTY-THIRD CONFERENCE ON INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE AND THE ELEVENTH SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE(2021)

引用 14|浏览170
暂无评分
摘要
Tables of data are used to record vast amounts of socioeconomic, scientific, and governmental information. Although humans create tables using underlying organizational principles, unfortunately AI systems struggle to understand the contents of these tables. This paper introduces an end-to-end system for table understanding, the process of capturing the relational structure of data in tables. We introduce models that identify cell types, group these cells into blocks of data that serve a similar functional role, and predict the relationships between these blocks. We introduce a hybrid, neuro-symbolic approach, combining embedded representations learned from thousands of tables with probabilistic constraints that capture regularities in how humans organize tables. Our neurosymbolic model is better able to capture positional invariants of headers and enforce homogeneity of data types. One limitation in this research area is the lack of rich datasets for evaluating end-to-end table understanding, so we introduce a new benchmark dataset comprised of 431 diverse tables from data.gov. The evaluation results show that our system achieves the state-of-the-art performance on cell type classification, block identification, and relationship prediction, improving over prior efforts by up to 7% of macro F1 score.
更多
查看译文
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要