Improving Retrieval Performance using World-Knowledge Generated Features


Information retrieval (IR) is the task of retrieving the information items (documents, images, videos, etc.) most relevant to a given user query. The common approach in textual IR systems is to index and retrieve documents by selecting representative keywords and phrases within them, using various statistical, linguistic, and semantic methods, and to view each document as a vector in the vector space defined by these keywords. Once a user submits a query, the system retrieves and ranks result documents by representing the query as another vector in the same space and measuring its similarity to the document vectors. Over the years, many techniques have been developed to improve the performance of IR systems. Among these are document indexing and query reformulation methods, which extract lexical and statistical information from documents to better anticipate and act on user queries. Those queries, in turn, have grown shorter and more ambiguous as IR systems reached larger and less professional audiences. Such methods were mostly based on analyzing the target document corpus and were therefore limited to the information contained in those documents. The use of external knowledge in these techniques has been sparse and usually limited to domain-specific problems, because very few knowledge resources were available for this purpose, and those that existed were either general but shallow or in-depth but focused on a specific domain.
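The vector-space ranking described above can be illustrated with a minimal sketch. It is not the method of this work: the whitespace tokenizer, the smoothed TF-IDF weighting, the cosine similarity measure, and all function names (`tokenize`, `build_idf`, `vectorize`, `cosine`, `rank`) are assumptions chosen for illustration only.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: lowercase whitespace tokenization stands in for real keyword selection.
    return text.lower().split()


def build_idf(docs_tokens):
    # Smoothed inverse document frequency computed from the corpus alone.
    n = len(docs_tokens)
    df = Counter()
    for tokens in docs_tokens:
        df.update(set(tokens))
    return {term: math.log((1 + n) / (1 + count)) + 1.0 for term, count in df.items()}


def vectorize(tokens, idf):
    # Sparse TF-IDF vector; terms unseen in the corpus are dropped.
    tf = Counter(t for t in tokens if t in idf)
    return {term: freq * idf[term] for term, freq in tf.items()}


def cosine(u, v):
    # Cosine similarity between two sparse vectors stored as dicts.
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank(query, docs):
    # Returns (score, document index) pairs, best match first.
    docs_tokens = [tokenize(d) for d in docs]
    idf = build_idf(docs_tokens)
    doc_vectors = [vectorize(tokens, idf) for tokens in docs_tokens]
    query_vector = vectorize(tokenize(query), idf)
    scores = [(cosine(query_vector, dv), i) for i, dv in enumerate(doc_vectors)]
    return sorted(scores, key=lambda pair: (-pair[0], pair[1]))


if __name__ == "__main__":
    corpus = [
        "jaguar speed in the wild",
        "jaguar car engine repair",
        "cooking pasta at home",
    ]
    for score, index in rank("jaguar car", corpus):
        print(f"{score:.3f}  {corpus[index]}")
```

In this sketch a query term that never occurs in the corpus contributes nothing to the query vector, which mirrors the limitation noted above: ranking that relies only on the target documents cannot use information those documents do not contain.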