Combining Active Sampling with Parameter Estimation and Prediction in Single Networks

Joel J. Pfeiffer, Jennifer Neville, Paul N. Bennett

(2013)

Abstract

A typical assumption in network classification methods is that the full network is available both to learn the model and to apply it for prediction. This assumption is often appropriate (e.g., publicly visible friendship links in social networks); in other domains, however, the underlying relational structure exists but acquiring the edges carries a cost. In this preliminary work we explore the problem domain of active sampling, where the goal is to maximize the number of positive (e.g., fraudulent) nodes identified while simultaneously querying for network structure that is likely to improve estimates. We outline the problem domain formally and discuss five subdomains that are likely to be observed in real-world scenarios. As our key finding, we show that when parameter estimates are learned from the distribution of labeled samples, they are biased with respect to the parameters of the distribution of unlabeled samples, which negatively impacts the number of positive instances recalled. Additionally, we demonstrate that the estimate of the generative distribution obtained from the labeled samples is also biased.
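The bias described in the abstract can be illustrated with a toy simulation. This is only a sketch under simplifying assumptions, not the authors' experimental setup: it ignores network structure entirely, gives each node a hypothetical score in [0, 1] that equals its probability of being positive, and uses a hypothetical greedy policy that labels the highest-scoring nodes first. The function name `simulate_selection_bias` and its parameters are invented for illustration.

```python
import random


def simulate_selection_bias(n_nodes=10000, budget=1000, seed=0):
    """Compare the positive rate among actively labeled vs. unlabeled nodes.

    Assumption (not from the paper): each node's score is its true
    probability of being positive, and sampling is purely greedy.
    """
    rng = random.Random(seed)
    # Each node gets a score in [0, 1]; it is positive with that probability.
    scores = [rng.random() for _ in range(n_nodes)]
    labels = [1 if rng.random() < s else 0 for s in scores]

    # Greedy active sampling: query the `budget` highest-scoring nodes.
    order = sorted(range(n_nodes), key=lambda i: scores[i], reverse=True)
    labeled, unlabeled = order[:budget], order[budget:]

    # A class prior estimated from the labeled set alone reflects the
    # sampling policy, not the remaining unlabeled population.
    labeled_rate = sum(labels[i] for i in labeled) / len(labeled)
    unlabeled_rate = sum(labels[i] for i in unlabeled) / len(unlabeled)
    return labeled_rate, unlabeled_rate


labeled_rate, unlabeled_rate = simulate_selection_bias()
print(f"positive rate among labeled nodes:   {labeled_rate:.3f}")
print(f"positive rate among unlabeled nodes: {unlabeled_rate:.3f}")
```

Because the labeled set is chosen to be rich in positives, a prior estimated from it overstates the positive rate in the unlabeled pool, which is the kind of mismatch between labeled and unlabeled distributions the abstract refers to.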

Keywords

Problem domain, Sampling (statistics), Estimation theory, Social network, Data mining, Computer science, Generative grammar, Friendship, Network classification, Network structure
