Label-noise reduction with support vector machines

Pattern Recognition (2012)

Abstract
This paper investigates the detection of label noise in large datasets. We consider applications in which the data are susceptible to labeling errors and a human expert is available to verify a limited number of labels in order to cleanse the data. We show that the support vectors of a Support Vector Machine (SVM) contain almost all of the noisy labels, so verifying only the support vectors allows the data to be cleansed efficiently. Empirical results are presented for two experiments. In the first, artificial random noise is applied to the labels of two datasets from the character recognition domain. In the second, a large dataset of plankton images that contains inadvertent human labeling errors is considered. The results show that up to 99% of all label noise in such datasets can be detected by verifying just the support vectors of the SVM classifier.
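The idea of reviewing only the support vectors can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a linear soft-margin SVM trained by dual coordinate descent on a synthetic two-class toy dataset, and the function names (`make_noisy_data`, `train_linear_svm_dual`), the noise rate, and the value of `C` are all hypothetical choices made for illustration. Points whose dual variable is positive are treated as support vectors and form the set a human expert would verify.

```python
import random

def make_noisy_data(n=200, noise_rate=0.05, seed=0):
    """Toy data (an assumption, not from the paper): two separated 2-D
    classes, with a random subset of labels flipped to simulate noise."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n):
        label = 1 if i % 2 == 0 else -1
        # Class +1 lies in [1, 3]^2 and class -1 in [-3, -1]^2.
        X.append([label * rng.uniform(1, 3), label * rng.uniform(1, 3)])
        y.append(label)
    flipped = set(rng.sample(range(n), round(noise_rate * n)))
    for i in flipped:
        y[i] = -y[i]
    return X, y, flipped

def train_linear_svm_dual(X, y, C=1.0, epochs=200, seed=0):
    """Dual coordinate descent for a hinge-loss linear SVM.
    Returns the dual variables; alpha[i] > 0 marks a support vector."""
    rng = random.Random(seed)
    Xb = [x + [1.0] for x in X]  # constant feature stands in for the bias
    n, d = len(Xb), len(Xb[0])
    alpha = [0.0] * n
    w = [0.0] * d
    q = [sum(v * v for v in x) for x in Xb]  # diagonal of the Gram matrix
    order = list(range(n))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            margin = y[i] * sum(wj * xj for wj, xj in zip(w, Xb[i]))
            grad = margin - 1.0  # gradient of the dual objective in alpha[i]
            new_alpha = min(max(alpha[i] - grad / q[i], 0.0), C)
            delta = new_alpha - alpha[i]
            if delta != 0.0:
                for j in range(d):
                    w[j] += delta * y[i] * Xb[i][j]
                alpha[i] = new_alpha
    return alpha

X, y, flipped = make_noisy_data()
alpha = train_linear_svm_dual(X, y)

# Only the support vectors are handed to the expert for verification.
support = {i for i, a in enumerate(alpha) if a > 1e-9}
recall = len(flipped & support) / len(flipped)
print(f"review {len(support)} of {len(X)} labels; "
      f"{recall:.0%} of the flipped labels are among them")
```

In this sketch a mislabeled point lies on the wrong side of the decision boundary, so it violates the margin and keeps a positive dual variable, which is why it ends up in the review set. Real datasets such as those in the abstract would likely need a kernel SVM and a tuned `C`; those choices are not specified here.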
Keywords
character recognition, data handling, image classification, support vector machines, SVM classifier, artificial random noise, character recognition domain, data cleansing, human expert, human label error, label-noise reduction, large datasets, noisy labels, plankton images, support vectors verification