A Quantitative Comparison of Semantic Web Page Segmentation Approaches

ICWE 2015: Proceedings of the 15th International Conference on Engineering the Web in the Big Data Era, Volume 9114 (2015)

Abstract
We compare three known semantic web page segmentation algorithms, each representative of a particular approach to the problem, and one self-developed algorithm, WebTerrain, that combines two of these approaches. We evaluate the four algorithms on a large benchmark of modern websites that we constructed, examining each algorithm in a total of eight configurations. On average, all algorithms performed better on random pages than on popular pages, and results were better when the algorithms ran on the HTML obtained from the DOM rather than on the plain HTML. Overall, there is much room for improvement: the best average F-score we found is 0.49, indicating that currently available algorithms are not yet of practical use for modern websites.
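The abstract reports results as an average F-score. The sketch below is a minimal illustration of how such an average could be computed. It assumes that "F-score" means the standard F1 measure (the harmonic mean of precision and recall), computed per page and then averaged over pages. The abstract does not say how predicted segments are matched to ground-truth segments to obtain precision and recall. For that reason, `f_score` and `average_f_score` are hypothetical helpers that take per-page precision and recall as given inputs.

```python
from statistics import mean


def f_score(precision: float, recall: float) -> float:
    """F1: harmonic mean of precision and recall (assumed definition).

    Returns 0.0 when both precision and recall are zero, to avoid
    division by zero.
    """
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def average_f_score(per_page: list[tuple[float, float]]) -> float:
    """Hypothetical macro-average: F1 per page, then the mean over pages.

    `per_page` holds one (precision, recall) pair per benchmark page.
    How those values are derived from segment matching is not specified
    here and is left to the caller.
    """
    return mean(f_score(p, r) for p, r in per_page)


# Illustrative values only, not results from the paper.
print(f_score(0.5, 0.5))                            # 0.5
print(average_f_score([(0.5, 0.5), (1.0, 0.0)]))    # 0.25
```

Averaging per-page scores (a macro-average) weights every page equally, whatever the number of segments it contains. Pooling segment counts across all pages before computing F1 would weight segment-heavy pages more.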
Keywords
semantic web