Mining pure high-order word associations via information geometry for information retrieval

ACM Trans. Inf. Syst.(2013)

Abstract
The classical bag-of-words models for information retrieval (IR) fail to capture contextual associations between words. In this article, we investigate pure high-order dependence among a number of words forming an inseparable semantic entity, that is, high-order dependence that cannot be reduced to the random coincidence of lower-order dependencies. We believe that identifying these pure high-order dependence patterns would lead to a better representation of documents and to novel retrieval models. Specifically, we formally define two kinds of pure dependence: unconditional pure dependence (UPD) and conditional pure dependence (CPD). Deciding UPD and CPD exactly, however, is NP-hard in general. We therefore derive and prove sufficient criteria that entail UPD and CPD within the well-principled information geometry (IG) framework, leading to a more feasible UPD/CPD identification procedure. We further develop novel methods for extracting word patterns with pure high-order dependence. Our methods are applied to and extensively evaluated on three typical IR tasks: text classification, text retrieval without query expansion, and text retrieval with query expansion.
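The idea of a high-order dependence that is not explained by lower-order ones can be illustrated with a small sketch. This is not the article's UPD/CPD criterion, which the abstract does not spell out. It is only the standard third-order interaction coefficient of a log-linear model over three binary word-occurrence variables, with strictly positive cell probabilities assumed. The function names (`third_order_theta`, `independent_joint`, `noisy_parity_joint`) and the toy distributions are hypothetical and chosen purely for illustration.

```python
import itertools
import math


def third_order_theta(p):
    """Third-order log-linear interaction for three binary variables.

    `p` maps each (x1, x2, x3) in {0,1}^3 to a strictly positive probability.
    A value of 0 means the joint has no third-order interaction term.
    """
    num = p[(1, 1, 1)] * p[(1, 0, 0)] * p[(0, 1, 0)] * p[(0, 0, 1)]
    den = p[(1, 1, 0)] * p[(1, 0, 1)] * p[(0, 1, 1)] * p[(0, 0, 0)]
    return math.log(num / den)


def independent_joint(m1, m2, m3):
    """Joint distribution of three independent words with occurrence rates m1..m3."""
    p = {}
    for x in itertools.product((0, 1), repeat=3):
        prob = 1.0
        for xi, mi in zip(x, (m1, m2, m3)):
            prob *= mi if xi else 1.0 - mi
        p[x] = prob
    return p


def noisy_parity_joint(eps):
    """Toy joint (an assumption for illustration): odd-parity cells are likely.

    Every pair of variables is exactly independent here, yet the three
    variables together are strongly dependent.
    """
    p = {}
    for x in itertools.product((0, 1), repeat=3):
        p[x] = (1.0 - eps) / 4.0 if sum(x) % 2 == 1 else eps / 4.0
    return p


theta_indep = third_order_theta(independent_joint(0.2, 0.5, 0.7))
theta_parity = third_order_theta(noisy_parity_joint(0.1))
print(f"independent words: theta_123 = {theta_indep:.6f}")
print(f"noisy parity:      theta_123 = {theta_parity:.6f}")
```

In the independent case the coefficient vanishes up to floating-point error, while in the noisy-parity case it equals 4·log((1 − eps)/eps) even though no pair of words is dependent, which is the kind of association a pairwise model cannot see.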
Keywords
conditional pure dependence, information retrieval, pure high-order dependence pattern, pure dependence, pure high-order dependence, high-order dependence, CPD identification procedure, pure high-order word association, feasible UPD, information geometry, unconditional pure dependence, novel retrieval model