Modelling Anchor Text Retrieval in Book Search based on Back-of-Book Index

msra(2008)

Abstract
This paper proposes a probabilistic logic abstraction for modelling tf-boosting approaches to anchor text retrieval, adapted for the task of page-search in books. The underlying idea is to view the back-of-book index (BoBI) as a list of anchors pointing to pages in the book. First, we model the direct application of hypertext-based tf-boosting to books and show that this naive method of propagating anchor-text from the BoBI does not deliver the desired tf-boosting effect. To address this, we then propose a revised anchor-text retrieval model based on a novel voter approach. In this approach, each page of the book, where a given term occurs, acts as a virtual voter to the pages referenced by the BoBI for that term. The tf-boosting effect is achieved by propagating term weights from the voter pages to the pages in the BoBI. We use probabilistic Datalog for the high-level abstract modelling of retrieval strategies, which allows for the evolution and transfer of successful techniques from one domain, such as anchor-text retrieval in Web IR, to a similar domain, such as book search.
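
The voter idea described in the abstract can be illustrated with a minimal Python sketch. This is not the paper's probabilistic Datalog formulation: the function name `voter_boost`, the input layout, the use of raw term counts as the propagated weight, and the choice to skip a page voting for itself are all assumptions made here for illustration.

```python
from collections import Counter, defaultdict


def voter_boost(pages, bobi):
    """Illustrative voter-style tf-boosting (hypothetical helper).

    pages: dict mapping page id -> list of tokens on that page
    bobi:  dict mapping index term -> list of page ids the index points to
    Returns a dict mapping (page id, term) -> boosted term frequency.
    """
    # Plain within-page term frequencies.
    tf = {pid: Counter(tokens) for pid, tokens in pages.items()}

    # Start from each page's own term counts.
    boosted = defaultdict(float)
    for pid, counts in tf.items():
        for term, n in counts.items():
            boosted[(pid, term)] += n

    # Every page containing the term acts as a voter for the pages
    # that the back-of-book index references for that term.
    for term, targets in bobi.items():
        voters = [pid for pid, counts in tf.items() if counts[term] > 0]
        for target in targets:
            for voter in voters:
                if voter == target:
                    continue  # assumption: a page does not vote for itself
                # Assumption: the propagated weight is the voter's raw count.
                boosted[(target, term)] += tf[voter][term]

    return dict(boosted)


# Toy book: page 2 is the page listed in the index for "index".
pages = {1: ["index", "index", "term"], 2: ["index"], 3: ["other"]}
bobi = {"index": [2]}
scores = voter_boost(pages, bobi)
```

In this toy book, page 2 mentions "index" once, yet it receives additional weight from page 1, which also contains the term. A real model would normalise these weights into probabilities rather than summing raw counts.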
Keywords
probabilistic logic, anchor text, indexation, information retrieval