Noisy Data Set Identification
HYBRID ARTIFICIAL INTELLIGENT SYSTEMS (2013)
Abstract
Real data are often corrupted by noise, which can arise from errors in data collection, storage, and processing. Noise hampers the induction of Machine Learning models from data: it can impair their predictive or descriptive performance and lengthen training time. Moreover, the induced models can become overly complex in order to accommodate such errors. Thus, identifying and reducing noise in a data set may benefit the learning process. In this paper, we investigate the use of data complexity measures to identify the presence of noise in a data set. This identification can support the decision on whether noise reduction techniques need to be applied.
Keywords
Noisy data, Noise identification, Data Complexity Measures
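As a rough illustration of the idea in the abstract, the sketch below computes one well-known data complexity measure, the leave-one-out 1-nearest-neighbor error rate (often called N3), on a synthetic two-class data set before and after label noise is injected. The abstract does not say which measures the paper uses, so the choice of N3, the synthetic data, the 20% flip rate, and all function names here are assumptions made for illustration only.

```python
import random


def n3_measure(points, labels):
    """Leave-one-out 1-NN error rate, a common data complexity measure (N3)."""
    errors = 0
    for i, p in enumerate(points):
        best_j, best_d = None, float("inf")
        for j, q in enumerate(points):
            if i == j:
                continue  # leave the query point itself out
            d = sum((a - b) ** 2 for a, b in zip(p, q))
            if d < best_d:
                best_j, best_d = j, d
        if labels[best_j] != labels[i]:
            errors += 1
    return errors / len(points)


def make_two_blobs(n_per_class, rng):
    """Hypothetical toy data: two well-separated 2-D Gaussian classes."""
    points, labels = [], []
    for label, center in ((0, (0.0, 0.0)), (1, (10.0, 10.0))):
        for _ in range(n_per_class):
            x = rng.gauss(center[0], 1.0)
            y = rng.gauss(center[1], 1.0)
            points.append((x, y))
            labels.append(label)
    return points, labels


def flip_labels(labels, rate, rng):
    """Simulate class noise by flipping a fraction `rate` of binary labels."""
    noisy = list(labels)
    k = int(round(rate * len(labels)))
    for i in rng.sample(range(len(labels)), k):
        noisy[i] = 1 - noisy[i]
    return noisy


rng = random.Random(0)
points, labels = make_two_blobs(50, rng)
clean_score = n3_measure(points, labels)
noisy_score = n3_measure(points, flip_labels(labels, 0.2, rng))
print(f"N3 on clean labels: {clean_score:.2f}")
print(f"N3 on noisy labels: {noisy_score:.2f}")
```

On data of this kind the measure is expected to rise once labels are flipped, which is the sort of signal that could flag a data set as a candidate for noise reduction. A single measure on toy data is only suggestive, and how such scores should be thresholded or combined is not something the abstract specifies.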