Learning table extraction from examples

COLING '04: Proceedings of the 20th International Conference on Computational Linguistics (2004)

Abstract
Information extraction from tables in web pages is challenging because table formats are diverse and attribute names vary in vocabulary. This paper presents a new approach to automated table extraction that exploits formatting cues in semi-structured HTML tables, learns lexical variants from training examples, and uses a vector space model to handle non-exact matches among labels. We conducted experiments with this method on a set of tables collected from 157 university web sites and obtained an information extraction performance of 91.4% in the F1-measure, showing the effectiveness of combining structural table parsing with example-based label learning.
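To illustrate what handling non-exact label matches with a vector space model could look like, here is a minimal sketch. It is not the paper's implementation: the function names (`tokenize`, `cosine`, `best_match`), the raw term-frequency weighting, the 0.5 threshold, and the sample labels are all assumptions made for illustration.

```python
import math
import re
from collections import Counter


def tokenize(label):
    # Lowercase the label and split on runs of non-alphanumeric characters.
    return [t for t in re.split(r"[^a-z0-9]+", label.lower()) if t]


def cosine(a, b):
    # Cosine similarity between raw term-frequency vectors of two labels.
    # (Assumption: plain term counts; the paper may use a different weighting.)
    va, vb = Counter(tokenize(a)), Counter(tokenize(b))
    dot = sum(va[t] * vb[t] for t in va)
    norm_a = math.sqrt(sum(c * c for c in va.values()))
    norm_b = math.sqrt(sum(c * c for c in vb.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def best_match(label, known_labels, threshold=0.5):
    # Return the known label most similar to `label`, or None when
    # nothing reaches the (hypothetical) similarity threshold.
    scored = [(cosine(label, k), k) for k in known_labels]
    score, match = max(scored, default=(0.0, None))
    return match if score >= threshold else None


if __name__ == "__main__":
    # Hypothetical attribute names learned from training examples.
    known = ["course title", "instructor name", "meeting time"]
    print(best_match("Instructor", known))
    print(best_match("Room", known))
```

In this sketch, a table header that shares only some tokens with a learned attribute name can still be mapped to it, while a header with no overlap is left unmatched.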
Keywords
automated table extraction, web page, attribute name, table format, structural table parsing, university web site, information extraction performance, challenging problem, semi-structured HTML table, information extraction