Factor graphs for relational regression (2008)

Abstract
Traditional methods for supervised learning treat the input data as a set of independent and identically distributed points in a high-dimensional space. These methods completely ignore the rich underlying relational structure that might be inherent in many important problems. For instance, the data samples may be related to each other in such a way that the unknown variables associated with any sample depend not only on its individual attributes but also on the variables associated with related samples. One regression problem of this nature, whose importance is emphasized by the present economic crises, is understanding real estate prices. The price of a house clearly depends on its individual attributes, such as the number of bedrooms. However, the price also depends on the neighborhood in which the house lies and on the time period in which it was sold. This effect of neighborhood and time on the price is not directly measurable. It is merely reflected in the prices of other houses in the vicinity that were sold around the same time period. Uncovering and using these spatio-temporal dependencies can help explain house prices while also improving prediction accuracy. The models used to achieve this task fall into the class of Statistical Relational Learning. The underlying probabilistic graphical model takes as input a single instance of the entire collection of samples along with their relationship structure. The dependencies among samples are learnt with the help of parameter sharing and collective inference. The drawback of most such models proposed so far is that they cater only to classification problems. To address this, we propose a relational factor graph framework for regression on relational data. A single factor graph is used to capture both (1) dependencies among the individual variables of each data point and (2) dependencies among variables associated with multiple data points.
The proposed models are capable of capturing hidden inter-sample dependencies via latent variables. They also allow for log-likelihood functions that are non-linear in parameter space, thereby allowing for considerably more complex architectures. Efficient inference and learning algorithms are proposed. The models are applied to predicting the prices of real estate properties. A by-product of this application is a house price index. The relational aspect of the model accounts for the hidden spatio-temporal influences on the price of every house. The experiments show that one can achieve considerably superior performance by identifying and using the underlying spatio-temporal structure associated with the problem. To the best of our knowledge, this is the first work in the direction of relational regression, especially in the frame-based class of statistical relational learning models. Furthermore, this is also the first work to construct house price indices by simultaneously accounting for the spatio-temporal effects on house prices using a large-scale, industry-standard data set.
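
The core idea of the abstract, that a price combines an effect of individual attributes with a hidden neighborhood effect visible only through nearby sales, can be illustrated with a toy sketch. The code below is not the factor graph model of the paper: it swaps in a simple alternating procedure (a least-squares slope on bedrooms, then a Gaussian kernel average of the neighbors' unexplained residuals as the latent effect), uses one-dimensional locations, and ignores time. All function names, the bandwidth, and the data are hypothetical and chosen only for illustration.

```python
import math

def gaussian_weight(a, b, bandwidth):
    """Similarity between two 1-D locations (assumed kernel, not from the paper)."""
    return math.exp(-((a - b) ** 2) / (2.0 * bandwidth ** 2))

def smooth(x, locations, values, bandwidth, skip=None):
    """Kernel-weighted mean of `values` around `x`, optionally skipping one index."""
    num = den = 0.0
    for j, (loc, val) in enumerate(zip(locations, values)):
        if j == skip:
            continue
        w = gaussian_weight(x, loc, bandwidth)
        num += w * val
        den += w
    return num / den if den > 0.0 else 0.0

def fit(bedrooms, locations, prices, bandwidth=1.0, n_iters=200):
    """Alternate between the attribute slope and the latent neighborhood effects."""
    n = len(prices)
    latent = [0.0] * n
    residuals = list(prices)
    slope = 0.0
    for _ in range(n_iters):
        # Step 1: least-squares slope on bedrooms, given the current latent effects.
        num = sum(b * (p - z) for b, p, z in zip(bedrooms, prices, latent))
        slope = num / sum(b * b for b in bedrooms)
        # Step 2: each house's latent effect is inferred from the OTHER houses'
        # unexplained residuals, weighted by proximity.
        residuals = [p - slope * b for b, p in zip(bedrooms, prices)]
        latent = [smooth(locations[i], locations, residuals, bandwidth, skip=i)
                  for i in range(n)]
    return slope, latent, residuals

def predict(n_bedrooms, location, slope, locations, residuals, bandwidth=1.0):
    """Attribute effect plus the neighborhood effect smoothed from training residuals."""
    return slope * n_bedrooms + smooth(location, locations, residuals, bandwidth)

# Synthetic data: price = 50 * bedrooms + hidden effect (100 near x=0, 300 near x=10).
bedrooms = [2, 3, 4, 2, 3, 4]
locations = [0.0, 0.5, 1.0, 10.0, 10.5, 11.0]
prices = [200.0, 250.0, 300.0, 400.0, 450.0, 500.0]

slope, latent, residuals = fit(bedrooms, locations, prices)
predicted = predict(3, 10.2, slope, locations, residuals)
print(round(slope, 3), [round(z, 1) for z in latent], round(predicted, 1))
```

On this synthetic data the hidden effect is never given to the procedure; it is recovered only because houses in the same neighborhood share unexplained residuals, which is the kind of inter-sample dependency the abstract describes. A model in the spirit of the paper would learn both parts jointly and would also account for the time of sale.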
Keywords
large-scale industry standard data, relational regression, input data, multiple data point, data sample, time period, factor graph, data point, house lie, individual attribute, house price index, house price