DOM-based keyword extraction from web pages

Proceedings of the International Conference on Artificial Intelligence, Information Processing and Cloud Computing (2019)

Abstract
We present D-rank, an unsupervised, language- and domain-independent method for automatically extracting keywords from a single web page. The method uses no external corpus and relies only on information and features available on the page itself, including the page URL, word frequency, title, hyperlinks, and headers, all of which are extracted from the DOM tree of the page. Each word receives a score reflecting its importance, which is determined by the positions in which it appears on the page. Experimental results on web pages in three different languages show the effectiveness of the proposed method.
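The general idea of position-based scoring can be illustrated with a minimal sketch. This is not the D-rank implementation: the abstract does not give the actual scores, so the weights in `WEIGHTS`, the tokenizer, and the names `PositionScorer` and `extract_keywords` are all hypothetical choices made for illustration. The sketch uses only the Python standard library and, as one possible design, boosts URL words only when they also occur in the page text.

```python
import re
from collections import Counter
from html.parser import HTMLParser
from urllib.parse import urlparse

# Hypothetical position weights; the abstract does not state D-rank's values.
WEIGHTS = {"url": 3.0, "title": 5.0, "header": 4.0, "link": 2.0, "body": 1.0}
HEADER_TAGS = {"h1", "h2", "h3", "h4", "h5", "h6"}
SKIP_TAGS = {"script", "style"}


def tokenize(text):
    # \w+ is Unicode-aware in Python 3, so no language-specific resources are used.
    # Dropping very short tokens and pure numbers is an assumption of this sketch.
    return [w.lower() for w in re.findall(r"\w+", text)
            if len(w) > 2 and not w.isdigit()]


class PositionScorer(HTMLParser):
    """Accumulates a weighted frequency for each word based on its enclosing tags."""

    def __init__(self):
        super().__init__()
        self.stack = []          # currently open tags, outermost first
        self.scores = Counter()  # word -> accumulated score

    def handle_starttag(self, tag, attrs):
        self.stack.append(tag)

    def handle_endtag(self, tag):
        # Pop back to the nearest matching open tag; tolerates unclosed tags.
        if tag in self.stack:
            while self.stack and self.stack.pop() != tag:
                pass

    def handle_data(self, data):
        open_tags = set(self.stack)
        if open_tags & SKIP_TAGS:
            return
        if "title" in open_tags:
            weight = WEIGHTS["title"]
        elif open_tags & HEADER_TAGS:
            weight = WEIGHTS["header"]
        elif "a" in open_tags:
            weight = WEIGHTS["link"]
        else:
            weight = WEIGHTS["body"]
        for word in tokenize(data):
            self.scores[word] += weight


def extract_keywords(html, url="", top_k=5):
    scorer = PositionScorer()
    scorer.feed(html)
    scorer.close()
    # Assumption: only boost URL words that also appear on the page,
    # which avoids promoting path debris such as file extensions.
    for word in tokenize(urlparse(url).path):
        if word in scorer.scores:
            scorer.scores[word] += WEIGHTS["url"]
    return [word for word, _ in scorer.scores.most_common(top_k)]
```

A call such as `extract_keywords(page_html, page_url, top_k=10)` returns the highest-scoring words for a single page, with no corpus statistics involved. Because word frequency enters through repeated additions, a word that appears often in body text can still outrank one that appears once in a header.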
Keywords
DOM structure, keyword extraction, language independent, unsupervised, web page