Modelling Anchor Text Retrieval in Book Search based on Back-of-Book Index


This paper proposes a probabilistic logic abstraction for modelling tf-boosting approaches to anchor-text retrieval, adapted for the task of page search in books. The underlying idea is to view the back-of-book index (BoBI) as a list of anchors pointing to pages in the book. First, we model the direct application of hypertext-based tf-boosting to books and show that this naive method of propagating anchor text from the BoBI does not deliver the desired tf-boosting effect. To address this, we then propose a revised anchor-text retrieval model based on a novel voter approach. In this approach, each page of the book on which a given term occurs acts as a virtual voter for the pages referenced by the BoBI for that term. The tf-boosting effect is achieved by propagating term weights from the voter pages to the pages in the BoBI. We use probabilistic Datalog for the high-level abstract modelling of retrieval strategies, which allows for the evolution and transfer of successful techniques from one domain, such as anchor-text retrieval in Web IR, to a similar domain, such as book search.
probabilistic logic, anchor text, indexation, information retrieval
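
The voter idea described in the abstract can be illustrated with a small sketch. This is not the authors' probabilistic Datalog model. It is a plain Python approximation under stated assumptions: the function name `voter_boosted_tf`, the input layout (`pages` as page number to token list, `bobi` as term to indexed page numbers), the even split of each voter's term count across the indexed pages, and the rule that a page does not vote for itself are all hypothetical choices made for illustration.

```python
from collections import Counter, defaultdict


def voter_boosted_tf(pages, bobi):
    """Return {(term, page): boosted term frequency}.

    pages: dict mapping page number -> list of tokens on that page.
    bobi:  dict mapping index term -> list of page numbers the
           back-of-book index references for that term.

    Assumed weighting (not taken from the paper): every page containing
    the term is a voter and splits its raw term count evenly among the
    pages that the BoBI lists for that term.
    """
    # Raw within-page term frequencies.
    tf = {page: Counter(tokens) for page, tokens in pages.items()}

    # Each page starts with its own term frequencies.
    boosted = defaultdict(float)
    for page, counts in tf.items():
        for term, count in counts.items():
            boosted[(term, page)] += count

    # Propagate term weight from voter pages to BoBI-referenced pages.
    for term, targets in bobi.items():
        targets = [p for p in targets if p in pages]
        if not targets:
            continue
        for voter, counts in tf.items():
            count = counts.get(term, 0)
            if count == 0:
                continue  # pages without the term cast no vote
            for target in targets:
                if target != voter:  # assumed: no self-voting
                    boosted[(term, target)] += count / len(targets)

    return dict(boosted)


if __name__ == "__main__":
    pages = {
        1: ["anchor", "text"],
        2: ["anchor", "anchor", "index"],
        3: ["index"],
    }
    bobi = {"anchor": [2]}
    for key, weight in sorted(voter_boosted_tf(pages, bobi).items()):
        print(key, weight)
```

In this toy input, page 2 is the only page the index lists for "anchor", so the occurrence on page 1 is added to page 2's weight for that term, while pages not referenced by the index keep their raw counts.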