Publication Date Prediction Through Reverse Engineering Of The Web

WSDM (2016)

Abstract
In this paper, we focus on one of the most challenging tasks in temporal information retrieval: detecting the publication date of a web page. The natural approach to this problem is to find the publication date in the HTML body of the page. However, this approach has two fundamental problems. First, not all web pages contain a publication date in their text. Second, it is hard to distinguish the publication date from all the other dates found in the page's text.

The approach we suggest in this paper supplements methods of date extraction from the page's text with novel link-based methods of dating. Some of our link-based methods are based on a probabilistic model of the evolution of the Web graph structure, which takes the publication dates of web pages as its parameters. We use this model to estimate the publication dates of web pages: based only on the link structure currently observed, we perform a "reverse engineering" to reveal the whole process of the Web's evolution.
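
The second problem above, telling the publication date apart from the other dates in a page's text, can be illustrated with a minimal sketch. This is not the paper's extraction method: the function name `candidate_dates`, the regular expression, the restriction to ISO-style dates (YYYY-MM-DD), and the sample text are all assumptions made for illustration.

```python
import re
from datetime import date

# Hypothetical pattern: only ISO-style dates (YYYY-MM-DD) are recognized here.
ISO_DATE = re.compile(r"\b(\d{4})-(\d{2})-(\d{2})\b")


def candidate_dates(text):
    """Return every valid ISO-style date found in text, in order of appearance."""
    found = []
    for year, month, day in ISO_DATE.findall(text):
        try:
            found.append(date(int(year), int(month), int(day)))
        except ValueError:
            # Skip strings such as 2015-13-40 that look like dates but are not.
            continue
    return found


# Invented sample text: three dates appear, but at most one is the publication date.
page_text = (
    "Posted 2015-03-02. The event took place on 2014-11-20. "
    "Last comment: 2015-06-17."
)
print(candidate_dates(page_text))
```

Text extraction alone returns several candidates and gives no indication of which one is the publication date, and it returns nothing for a page that mentions no date at all. These are the gaps that the link-based methods described in the abstract are meant to fill.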
Keywords
Publication dates, web pages, link-based method, likelihood optimization