Quantification in Data Streams: Initial Results

André Gustavo Maletzke, Denis Moreira dos Reis, Gustavo E. A. P. A. Batista

2017 Brazilian Conference on Intelligent Systems (BRACIS) (2017)

Abstract

In recent decades, learning from data streams has attracted the attention of researchers and practitioners because of its many applications. These applications have motivated the research community to propose a large number of methods for diverse tasks, most prominently classification, prediction, and clustering. However, a relevant task known as quantification has remained largely unexplored. The goal of quantification is to estimate the class prevalence in an unlabeled set. In this paper, we discuss the relevance and challenges of quantification for data streams and examine how it differs from the batch setting, in which quantification has received more attention from the research community. We propose an algorithm to estimate the class distribution in a data stream and frame it within the active learning framework. In addition, we define two other approaches as baseline and topline strategies for this problem. The experimental results demonstrate that our algorithm achieves significantly higher quantification accuracy than the baseline and nearly matches the topline, while requesting only a fraction of the true labels that the topline requires.
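
To make the notion of class prevalence estimation concrete, the following is a minimal sketch of two standard batch quantifiers, Classify and Count and Adjusted Count, for a binary problem. It is not the algorithm proposed in the paper. The function names and the `tpr`/`fpr` parameters (true and false positive rates, assumed to be measured on held-out labeled data) are illustrative assumptions.

```python
def classify_and_count(predictions):
    """Estimate positive-class prevalence as the fraction of items predicted positive.

    `predictions` is a non-empty sequence of 0/1 labels produced by some classifier.
    """
    return sum(predictions) / len(predictions)


def adjusted_count(predictions, tpr, fpr):
    """Correct the Classify and Count estimate for classifier error.

    `tpr` and `fpr` are assumed to come from held-out labeled data.
    The result is clipped to [0, 1] because the correction can overshoot.
    """
    cc = classify_and_count(predictions)
    if tpr == fpr:
        # The correction is undefined; fall back to the raw count.
        return cc
    estimate = (cc - fpr) / (tpr - fpr)
    return min(1.0, max(0.0, estimate))


# Hypothetical classifier outputs for an unlabeled window of ten items.
window = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
raw_estimate = classify_and_count(window)
corrected_estimate = adjusted_count(window, tpr=0.8, fpr=0.1)
```

In a stream, an estimate of this kind would be recomputed over successive windows. Under concept drift, the rates `tpr` and `fpr` may themselves become stale, which is one reason the batch approach does not carry over directly.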

Keywords

data stream, quantification, concept drift, verification latency
