Reformatting web documents via header trees

ACL (2005)

Abstract
We propose a new method for reformatting web documents by extracting semantic structures from web pages. Our approach is to extract trees that describe hierarchical relations in documents. We developed an algorithm for this task by employing the EM algorithm and clustering techniques. Preliminary experiments showed that our approach was more effective than baseline methods.
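To make the idea of a header tree concrete, here is a minimal sketch of one possible representation and of how a document could be reformatted from it. The abstract does not specify the data structure or the extraction algorithm, so the `HeaderNode` class, the `render_outline` function, and the sample data are all hypothetical. The EM and clustering steps are not shown; the sketch assumes the tree has already been extracted.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class HeaderNode:
    """One node of a hypothetical header tree: a header plus the items it governs."""
    text: str
    children: List["HeaderNode"] = field(default_factory=list)


def render_outline(node: HeaderNode, depth: int = 0) -> List[str]:
    """Reformat a header tree as an indented outline, one line per node."""
    lines = ["  " * depth + node.text]
    for child in node.children:
        lines.extend(render_outline(child, depth + 1))
    return lines


# Illustrative data only (an assumption, not taken from the paper).
tree = HeaderNode("Products", [
    HeaderNode("Laptops", [HeaderNode("Model A"), HeaderNode("Model B")]),
    HeaderNode("Phones", [HeaderNode("Model C")]),
])

outline = render_outline(tree)
print("\n".join(outline))
```

Once the hierarchical relations are explicit in this way, the same tree could be rendered in other layouts (for example, a flat list for a small screen) without re-parsing the original page.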
Keywords
header tree, web page, reformatting web document, preliminary experiment, clustering technique, new method, hierarchical relation, EM algorithm, semantic structure, web pages