Statistical Relational Learning at U Penn

msra(2008)

Abstract
We do statistical relational learning by incrementally extracting data from a relational database and computing features of that data, which are then used in a classical discriminative statistical model component. Candidate features for the model are generated by a structured search in the space of relational database queries and selected using statistical information criteria. The structuring of the search space is inspired by techniques in inductive logic programming (ILP), but the use of statistical modeling relaxes the requirement that the search space be limited to logical expressions. We use a rich feature space that includes clusters, which can be generated incrementally and used to augment the basic relational schema. Current areas of research include determining optimal model selection criteria for this setting, where an infinite sequence of features can be incrementally generated, and using intelligent search heuristics to focus the search on more promising subspaces.

A growing number of high-interest machine learning applications involve the analysis of data that is both noisy and of complex relational structure. This dictates a natural choice in such domains: the use of statistical rather than deterministic modeling, and of relational rather than propositional representation (Popescul et al., 2002). Classical statistical learners provide a powerful modeling component but are often limited to a "flat" file propositional domain representation, where potential features are fixed-size attribute vectors. Preparing such attributes manually is often costly, and how to do so is not obvious when more complex regularities are involved. We are developing a methodology that combines the strengths of classical statistical models with the higher expressivity of features automatically generated from a relational database.
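The pipeline described in the abstract (generate query-based candidate features, evaluate each one against the database, keep it only if an information criterion improves) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the toy schema (`paper`, `cites`, `word`), the refinement scheme in `candidate_features`, logistic regression as the discriminative model, and BIC as the information criterion are all hypothetical choices. Because candidates come from a generator, each feature is scored once as it arrives, which loosely mirrors the setting where features are generated incrementally from a potentially unbounded stream.

```python
import math
import sqlite3
from itertools import islice


def build_toy_db():
    """Tiny in-memory citation database (hypothetical schema: paper, cites, word)."""
    db = sqlite3.connect(":memory:")
    db.executescript("""
        CREATE TABLE paper(id INTEGER PRIMARY KEY, label INTEGER);
        CREATE TABLE cites(src INTEGER, dst INTEGER);
        CREATE TABLE word(paper INTEGER, w TEXT);
    """)
    # Target papers 1-4 are positives (label 1), 5-8 are negatives (label 0).
    db.executemany("INSERT INTO paper VALUES (?, ?)",
                   [(i, 1 if i <= 4 else 0) for i in range(1, 9)])
    # Positives cite papers 101/102; negatives cite papers 103/104.
    db.executemany("INSERT INTO cites VALUES (?, ?)",
                   [(i, 101 + (i - 1) % 2) for i in range(1, 5)]
                   + [(i, 103 + (i - 1) % 2) for i in range(5, 9)])
    # 101/102 contain "learning", 103/104 contain "compiler"; all contain "the".
    db.executemany("INSERT INTO word VALUES (?, ?)",
                   [(101, "learning"), (102, "learning"),
                    (103, "compiler"), (104, "compiler")]
                   + [(p, "the") for p in (101, 102, 103, 104)])
    return db


def candidate_features(db):
    """Lazily yield (name, sql, extra_params) query features.

    Hypothetical refinement scheme: start from "papers cited by ?" and add one
    word predicate on the cited paper. A real search would keep refining
    (more joins, aggregates, cluster tables), so the stream is open-ended.
    """
    yield "n_cited", "SELECT COUNT(*) FROM cites WHERE src = ?", ()
    words = [w for (w,) in db.execute("SELECT DISTINCT w FROM word ORDER BY w")]
    for w in words:
        yield (f"n_cited_with_{w}",
               "SELECT COUNT(*) FROM cites c JOIN word x ON x.paper = c.dst "
               "WHERE c.src = ? AND x.w = ?", (w,))


def feature_column(db, sql, extra, ids):
    """Evaluate one query feature for every target paper id."""
    return [db.execute(sql, (pid, *extra)).fetchone()[0] for pid in ids]


def log_sigmoid(z):
    """Numerically stable log(1 / (1 + exp(-z)))."""
    return -(max(-z, 0.0) + math.log1p(math.exp(-abs(z))))


def fit_loglik(columns, y, steps=1000, lr=1.0):
    """Fit logistic regression (intercept + columns) by batch gradient ascent
    and return the log-likelihood of the final weights."""
    n = len(y)
    X = [[1.0] + [col[i] for col in columns] for i in range(n)]
    w = [0.0] * len(X[0])
    for _ in range(steps):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            p = math.exp(log_sigmoid(sum(a * b for a, b in zip(w, xi))))
            for j, xij in enumerate(xi):
                grad[j] += (yi - p) * xij
        w = [wj + lr * gj / n for wj, gj in zip(w, grad)]
    ll = 0.0
    for xi, yi in zip(X, y):
        z = sum(a * b for a, b in zip(w, xi))
        # log(1 - sigmoid(z)) == log(sigmoid(-z))
        ll += log_sigmoid(z) if yi else log_sigmoid(-z)
    return ll


def bic(loglik, k, n):
    """Bayesian information criterion with k parameters; lower is better."""
    return -2.0 * loglik + k * math.log(n)


def select_features(db, max_candidates=50):
    """Streaming forward selection: score each candidate once, on arrival,
    and keep it only if adding it lowers BIC (an assumed criterion)."""
    ids, y = zip(*db.execute("SELECT id, label FROM paper ORDER BY id"))
    n = len(y)
    chosen, columns = [], []
    best = bic(fit_loglik(columns, y), 1, n)  # intercept-only baseline
    for name, sql, extra in islice(candidate_features(db), max_candidates):
        col = feature_column(db, sql, extra, ids)
        score = bic(fit_loglik(columns + [col], y), len(columns) + 2, n)
        if score < best:
            best, chosen, columns = score, chosen + [name], columns + [col]
    return chosen


if __name__ == "__main__":
    print(select_features(build_toy_db()))
```

On this toy data, `select_features(build_toy_db())` returns `['n_cited_with_compiler']`. The constant `n_cited` feature adds a parameter without improving the likelihood, and once the compiler feature separates the classes, no later word feature can lower -2 times the log-likelihood by the ln n penalty. Because acceptance is decided in stream order, `n_cited_with_learning`, which is equally informative, is rejected only because it arrives second; the order in which the search proposes features therefore matters, which is one reason search heuristics are worth studying.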
Our interest in statistical relational learning developed while working on modeling in CiteSeer, an online digital library of computer science papers. CiteSeer contains a rich set of relational data, including citation information, the
Keywords
relational data, machine learning, feature space, search space, relational database, statistical model, statistical relational learning