Sub Node Extraction with Tree Based Wrappers

ECAI(2008)

Abstract
Both string-based and tree-based methods have been used to learn wrappers for extraction from semi-structured documents (e.g., HTML documents). Previous work has shown that tree-based approaches perform better and need fewer examples than string-based approaches. Their disadvantage is that they can only extract complete text nodes, whereas string-based approaches can extract within text nodes. This paper proposes a hybrid approach that combines the advantages of both systems and compares it experimentally with a string-based approach on several sub node extraction tasks.
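
The division of labour described in the abstract can be illustrated with a minimal sketch. It is not the paper's algorithm: the paper learns its wrappers from examples, whereas here both steps are written by hand. The sample page, the path expression, the regular expression, and the function names (`select_text_nodes`, `extract_within`, `hybrid_extract`) are all assumptions made for illustration. A tree-based step selects complete text nodes, and a string-based step then extracts a substring inside each selected node.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical, well-formed HTML fragment (illustrative only, not from the paper).
PAGE = """
<html><body>
  <div class="item"><b>Widget</b><span>Price: 19.99 EUR (incl. VAT)</span></div>
  <div class="item"><b>Gadget</b><span>Price: 5.00 EUR (incl. VAT)</span></div>
</body></html>
"""


def select_text_nodes(root, path):
    """Tree-based step: return the complete text of every node matching path."""
    return [el.text or "" for el in root.findall(path)]


def extract_within(text, pattern):
    """String-based step: extract a substring from inside a single text node."""
    match = pattern.search(text)
    return match.group(1) if match else None


def hybrid_extract(page, path, pattern):
    """Apply the tree-based step first, then the string-based step per node."""
    root = ET.fromstring(page.strip())
    return [extract_within(text, pattern) for text in select_text_nodes(root, path)]


# Hand-written stand-ins for what a learned wrapper would provide.
PRICE = re.compile(r"Price:\s*([0-9]+\.[0-9]{2})")
prices = hybrid_extract(PAGE, ".//div[@class='item']/span", PRICE)
print(prices)  # ['19.99', '5.00']
```

The tree-based step alone would return the full strings such as "Price: 19.99 EUR (incl. VAT)"; the string-based step narrows each one to the price itself, which is the kind of sub node extraction task the abstract refers to.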
Keywords
complete text node,hybrid approach,html document,sub node extraction,sub node extraction task,semi-structured document,previous work,text node