Automatic Web Information Extraction in the Roadrunner System
Revised Papers from the HUMACS, DASWIS, ECOMO, and DAMA on ER 2001 Workshops (2002)
Abstract
This paper presents ROADRUNNER, a research project that aims at developing solutions for automatically extracting data from large HTML data sources. The targets of our research are data-intensive Web sites, i.e., HTML-based sites with a fairly complex structure that publish large amounts of data. The paper describes the top-level software architecture of the ROADRUNNER system and the novel research challenges posed by the attempt to automate the information extraction process.
Keywords
Automatic Web Information Extraction, ROADRUNNER System