Instance selection for regression by discretization.

Expert Syst. Appl. (2016)

Abstract
A new technique for instance selection and noise filtering for regression is proposed. The method uses instance selection for classification after discretizing the output value. The method is much simpler and more robust to noise than others specifically designed for regression.

An important step in building expert and intelligent systems is obtaining the knowledge that they will use. This knowledge can be obtained from experts or, nowadays more often, from machine learning processes applied to large volumes of data. However, for some of these learning processes, if the volume of data is large, the knowledge extraction phase is very slow (or even impossible). Moreover, the data sets used for learning often originate from measurement processes in which the collected data can contain errors, so the presence of noise in the data is inevitable. It is in such environments that an initial step of noise filtering and data set size reduction plays a fundamental role. For both tasks, instance selection emerges as a possible solution that has proved useful in various fields. In this paper we focus mainly on instance selection for noise removal. In addition, in contrast to most existing methods, which apply instance selection to classification tasks (discrete prediction), the proposed approach is used to obtain instance selection methods for regression tasks (prediction of continuous values). The different nature of the value to predict poses an extra difficulty, which explains the low number of articles on instance selection for regression. More specifically, the idea used in this article to adapt "classic" instance selection algorithms for classification to regression problems is as simple as discretizing the numerical output variable.

In the experiments, the proposed method is compared with much more sophisticated methods specifically designed for regression, and it proves to be very competitive. The main contributions of the paper include: (i) a simple way to adapt instance selection algorithms for classification to regression, (ii) the use of this approach to adapt a popular noise filter called ENN (edited nearest neighbor), and (iii) the comparison of this noise filter against two others specifically designed for regression, in which it proves very competitive despite its simplicity.
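The idea described above (discretize the continuous output, then run a classification-style ENN filter on the resulting labels) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not say how the output is discretized or which parameters are used, so the equal-width binning, the `n_bins=5` and `k=3` defaults, and all function names below are assumptions made purely for illustration.

```python
import numpy as np


def discretize_equal_width(y, n_bins=5):
    """Map continuous targets to integer labels 0..n_bins-1.

    Equal-width binning is an assumption; the abstract does not
    specify the discretization scheme.
    """
    y = np.asarray(y, dtype=float)
    edges = np.linspace(y.min(), y.max(), n_bins + 1)
    # Use interior edges only, so the maximum value lands in the last bin.
    return np.digitize(y, edges[1:-1])


def enn_keep_mask(X, labels, k=3):
    """Edited nearest neighbor (ENN) on discrete labels.

    An instance is kept when its label equals the majority label of
    its k nearest neighbors (ties go to the smallest label).
    """
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    # Pairwise squared Euclidean distances; an instance is not its own neighbor.
    diff = X[:, None, :] - X[None, :, :]
    dist = np.einsum("ijk,ijk->ij", diff, diff)
    np.fill_diagonal(dist, np.inf)

    keep = np.zeros(len(X), dtype=bool)
    for i in range(len(X)):
        neighbors = np.argsort(dist[i], kind="stable")[:k]
        votes = np.bincount(labels[neighbors])
        keep[i] = votes.argmax() == labels[i]
    return keep


def select_instances_for_regression(X, y, n_bins=5, k=3):
    """Discretize y, filter with ENN, and return the retained (X, y) pairs."""
    labels = discretize_equal_width(y, n_bins)
    keep = enn_keep_mask(X, labels, k)
    return np.asarray(X)[keep], np.asarray(y)[keep]
```

The discrete labels are used only to decide which instances to keep; the returned targets are the original continuous values, so any regressor can then be trained on the filtered set. In practice the number of bins and `k` would need tuning, since too few bins hide noisy outputs and too many make almost every instance disagree with its neighbors.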
Keywords
Instance selection, Regression, Mutual information, Noise filtering, Class noise