Contextual information retrieval based on algorithmic information theory and statistical outlier detection

Porto (2008)

Abstract
This work presents an information retrieval technique based on algorithmic information theory (using the normalized compression distance), statistical outlier detection, and a novel database structure. The paper shows how these can be integrated to retrieve information from generic databases using long text-based queries. Two important problems are addressed. First, we analyze and try to solve the detection of a particular case of false positives: when the distance between two documents is outlyingly low but there is no actual similarity. Second, we propose a way to structure the database such that the similarity distance estimation scales well with the length of the query. All design choices are justified with an experimental evaluation.
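For readers unfamiliar with the normalized compression distance (NCD), the following is a minimal sketch of how it is commonly computed, paired with a simple check for outlyingly low distances. The abstract does not state which compressor or outlier test the paper uses, so the choice of zlib, the z-score rule, and the names `compressed_size`, `ncd`, `low_outliers`, and `z_threshold` are assumptions made purely for illustration.

```python
import zlib
from statistics import mean, stdev


def compressed_size(data: bytes) -> int:
    """Length of the zlib-compressed input (assumed compressor, not the paper's)."""
    return len(zlib.compress(data, 9))


def ncd(x: bytes, y: bytes) -> float:
    """Normalized compression distance:
    NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y)),
    where C is the compressed size and xy is the concatenation of x and y.
    """
    cx, cy = compressed_size(x), compressed_size(y)
    cxy = compressed_size(x + y)
    return (cxy - min(cx, cy)) / max(cx, cy)


def low_outliers(distances, z_threshold=2.0):
    """Indices of distances that lie far below the mean (hypothetical z-score rule).

    This only flags candidates; deciding whether a flagged document is a
    genuine match or a false positive needs further inspection.
    """
    if len(distances) < 2:
        return []
    mu, sigma = mean(distances), stdev(distances)
    if sigma == 0:
        return []
    return [i for i, d in enumerate(distances) if (mu - d) / sigma > z_threshold]


if __name__ == "__main__":
    query = b"the quick brown fox jumps over the lazy dog " * 20
    docs = [query, bytes(range(256)) * 4]
    print([round(ncd(query, d), 3) for d in docs])
```

A document identical to the query compresses almost for free when concatenated with it, so its NCD is close to 0, while unrelated data yields a value near 1. The outlier check is a screening step under the stated assumptions, not a reconstruction of the paper's method.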
Keywords
information retrieval,information theory,text analysis,algorithmic information theory,contextual information retrieval,generic databases,long text-based queries,statistical data outlier detection