Classifying imbalanced data in distance-based feature space

Knowledge and Information Systems(2015)

Abstract
Class imbalance is a significant issue in practical classification problems. Important countermeasures, such as re-sampling, instance-weighting, and cost-sensitive learning, have been developed, but each approach has limitations as well as advantages. Synthetic re-sampling methods have wide applicability but require a vector representation to generate additional instances. Instance-based methods can be applied to distance space data but are not tractable with regard to a global objective. Cost-sensitive learning can minimize the expected cost given the costs of error, but generally does not extend to nonlinear measures such as the F-measure and the area under the curve. To address these shortcomings, this paper proposes a nearest neighbor classification model that employs a class-wise weighting scheme to counteract class imbalance and a convex optimization technique to learn its weight parameters. As a result, the proposed model maintains the simple instance-based rule for prediction while retaining mathematical support for learning to maximize a nonlinear performance measure over the training set. An empirical study evaluates the performance of the proposed algorithm on imbalanced distance space data and compares it with existing methods.
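
To make the idea of class-wise weighting in a nearest neighbor rule concrete, the following is a minimal sketch and not the paper's algorithm. It assumes only a precomputed distance matrix (no vector representation is needed at prediction time) and binary labels with 1 as the minority class. The function names `predict`, `f_measure`, and `fit_scale` are hypothetical, and a plain grid search over a single minority-class scale stands in for the convex optimization the abstract mentions.

```python
import numpy as np

def predict(dist, train_labels, class_scale):
    """1-NN on class-scaled distances.

    dist: (n_query, n_train) distances; train_labels: integer class ids;
    class_scale: one positive multiplier per class (assumed form of weighting).
    A scale below 1 makes instances of that class look closer.
    """
    scaled = dist * class_scale[train_labels]
    return train_labels[np.argmin(scaled, axis=1)]

def f_measure(y_true, y_pred, positive=1):
    """F1 score for the positive class; 0.0 when it is undefined."""
    tp = np.sum((y_pred == positive) & (y_true == positive))
    fp = np.sum((y_pred == positive) & (y_true != positive))
    fn = np.sum((y_pred != positive) & (y_true == positive))
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

def fit_scale(train_dist, labels, grid):
    """Pick the minority-class scale maximizing leave-one-out F-measure.

    Grid search is a stand-in here, not the paper's convex formulation.
    """
    loo = train_dist.astype(float).copy()
    np.fill_diagonal(loo, np.inf)  # an instance may not be its own neighbor
    best_scale, best_f = 1.0, -1.0
    for s in grid:
        f = f_measure(labels, predict(loo, labels, np.array([1.0, s])))
        if f > best_f:
            best_scale, best_f = float(s), float(f)
    return best_scale, best_f

# Synthetic imbalanced data (illustrative only): 40 majority, 6 minority points.
rng = np.random.default_rng(0)
X = np.concatenate([rng.normal(0.0, 1.0, (40, 2)), rng.normal(1.5, 1.0, (6, 2))])
y = np.array([0] * 40 + [1] * 6)
D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)  # pairwise distances
best_scale, best_f = fit_scale(D, y, np.linspace(0.2, 1.0, 9))
print(best_scale, best_f)
```

Because the grid includes a scale of 1.0, the selected scale can never score below the unweighted nearest neighbor rule on the leave-one-out F-measure. Whether it generalizes to held-out data is a separate question that this sketch does not address.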
Keywords
Class imbalance, Weighted nearest neighbor classifier, Structural classifier