Reliable Information Access Final Workshop Report

msra(2004)

Abstract
For many years the standard approach to question answering, or searching for information, has involved information retrieval systems. These systems were initially Boolean, requiring considerable effort from users, but newer systems accept natural-language questions and return ranked lists of documents. Although the question answering (QA) task has evolved beyond the ranked-list approach to answers, QA systems depend on information retrieval (IR) technology at two different stages. First, IR is needed to make the initial cut at finding information in many gigabytes of text: most QA systems rely heavily on NLP and therefore use IR to narrow the pool of candidate documents to 100 or 200 before doing more intensive processing. Second, and equally important, IR systems provide a fallback for questions that are beyond the current abilities of QA systems; having a ranked list of documents is clearly better than having nothing.

Current IR systems generally provide reasonable results for QA systems to work with. However, their results are unpredictable in that there are occasional failures. Even more important for the AQUAINT program, these systems are unpredictable to the researchers who build on them; that is, the systems cannot reliably customize their output on a per-question basis. This leads to lower precision in the top-ranked documents for some questions, and to radical variance in performance across different retrieval technologies for the same input question.

Even so, current statistical approaches to IR have proven effective and reliable in both research and commercial settings. Almost by definition, statistical IR focuses on matching word occurrences between topics and documents rather than on semantically understanding the texts. Given that, improvements must come either from re-weighting the importance of existing word matches or from expanding the texts so that new words can be matched. Query expansion has therefore been a central focus of statistical IR throughout its research history.

Experimental environments such as TREC show that retrieval results vary widely by both topic and system. This holds for basic IR systems as well as for more advanced implementations using, for example, query expansion. Some retrieval approaches work well on one topic but poorly on a second, while others fail on the first topic but succeed on the second. If we could determine in advance which approach would work well, a dual approach could strongly improve performance; unfortunately, despite many efforts, no one knows how to choose good approaches on a per-topic basis.

The major obstacle to understanding retrieval variability is that it stems from several factors at once: topic factors, arising from the topic statement itself and from the topic's relationship to the document collection as a whole, and system-dependent factors, including the retrieval algorithm and its implementation details. Because any particular researcher typically works with only one system, it is very difficult to separate the topic factors from the system factors, and so the causes of retrieval variability remain poorly understood.
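To make the potential of per-topic selection concrete, the sketch below works through the arithmetic of an oracle "dual approach". The two systems and their per-topic scores are invented for illustration only; they are not results from the workshop.

```python
# Illustrative sketch only: the two "systems" and their per-topic average
# precision scores below are invented numbers, not results from the report.
# The point is the arithmetic behind the "dual approach": if we could predict,
# per topic, which system will do better, the oracle combination beats both.

per_topic_ap = {
    # topic id: (system A score, system B score)
    301: (0.62, 0.18),
    302: (0.10, 0.55),
    303: (0.48, 0.51),
    304: (0.05, 0.40),
    305: (0.70, 0.22),
}

n_topics = len(per_topic_ap)
mean_a = sum(a for a, _ in per_topic_ap.values()) / n_topics
mean_b = sum(b for _, b in per_topic_ap.values()) / n_topics
# Oracle: pick whichever system scored higher on each topic.
mean_oracle = sum(max(a, b) for a, b in per_topic_ap.values()) / n_topics

print(f"System A MAP: {mean_a:.3f}")      # 0.390
print(f"System B MAP: {mean_b:.3f}")      # 0.372
print(f"Oracle MAP:   {mean_oracle:.3f}")  # 0.556
```

Even though the two hypothetical systems are close on average, the oracle gains substantially because their failures fall on different topics, which is exactly the variability the report describes.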
For most IR algorithms, we do not understand the reasons for an algorithm's retrieval variability well enough to predict whether it will succeed or fail on a given topic. This in turn means we do not understand the basic characteristics of our algorithms. For example, query expansion works well on average, but is it working because the expansion adds …
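The truncated question above, whether expansion helps because it adds new matchable terms or because it re-weights terms already present, can be made concrete with a minimal sketch of one standard expansion technique, Rocchio-style pseudo-relevance feedback. This is a generic illustration under assumed parameter values (alpha, beta, k) and toy documents, not the specific expansion methods examined in the report.

```python
from collections import Counter

def expand_query(query_terms, top_doc_term_counts, alpha=1.0, beta=0.75, k=5):
    """Rocchio-style pseudo-relevance feedback (illustrative sketch).

    query_terms: list of terms in the original query.
    top_doc_term_counts: list of Counter objects, one per top-ranked document
        assumed relevant (the "pseudo-relevant" set).
    Returns a dict of term -> weight. The result both re-weights terms already
    in the query and adds new terms drawn from the feedback documents.
    """
    weights = Counter({t: alpha for t in query_terms})
    if not top_doc_term_counts:
        return dict(weights)
    # Average term frequency across the pseudo-relevant documents.
    centroid = Counter()
    for doc in top_doc_term_counts:
        centroid.update(doc)
    n = len(top_doc_term_counts)
    # Keep only the k strongest feedback terms to limit query drift.
    for term, freq in centroid.most_common(k):
        weights[term] += beta * freq / n
    return dict(weights)

# Toy usage with invented documents:
docs = [Counter({"osteoporosis": 3, "bone": 2, "density": 1}),
        Counter({"bone": 2, "calcium": 2, "fracture": 1})]
print(expand_query(["osteoporosis", "treatment"], docs))
```

In this form both effects are visible in the same function: feedback terms that were not in the original query enter with new weights, while query terms that also occur in the feedback documents have their weights increased.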