Gaussian distribution resampling via Chebyshev distance for food computing

Applied Soft Computing (2024)

Abstract
Data imbalance is common in real-world food-domain datasets. Traditional classification algorithms are prone to overfitting on imbalanced datasets, and their decision surface is biased toward majority-class samples, which makes minority-class samples difficult to identify. Previous resampling techniques address the imbalance by balancing the dataset, but they may produce class overlap because the anchor samples are not appropriately selected and the generated dataset does not conform to the original distribution. To address these problems, this paper proposes GDRS, an adaptive resampling technique that combines Gaussian distribution oversampling with random undersampling. The technique selects a suitable anchor sample using the Chebyshev distance together with the weight information of the minority-class samples. New samples conforming to the original distribution are then generated from a Gaussian distribution around the anchor sample. Random undersampling is then applied to reduce the possibility of overfitting. The technique is applied to five UCI datasets and compared with seven imbalanced learning methods. The experimental results show that GDRS achieves the best performance in this comparison. We also validate the effectiveness of the method on real dairy datasets with different imbalance ratios, which is promising for applications in the food field.
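The three steps named in the abstract (weighted anchor selection with the Chebyshev distance, Gaussian generation around the anchor, random undersampling) can be sketched roughly as follows. This is an illustrative sketch only, not the authors' implementation: the abstract does not give the weighting rule, the Gaussian variance, or the target class sizes, so the function name `gdrs_sketch`, the neighbour count `k`, the weight (share of majority-class neighbours), the spread `sigma_scale`, and the midpoint target size are all assumptions made here for illustration.

```python
import numpy as np


def gdrs_sketch(X, y, minority=1, k=5, sigma_scale=0.5, seed=0):
    """Hypothetical sketch of Gaussian oversampling plus random undersampling.

    Assumes the minority class is no larger than the majority class.
    """
    rng = np.random.default_rng(seed)
    is_min = y == minority
    X_min, X_maj, y_maj = X[is_min], X[~is_min], y[~is_min]
    n_min, n_maj = len(X_min), len(X_maj)

    # Assumption: both classes are brought to the midpoint of their sizes.
    target = (n_min + n_maj) // 2
    n_new = target - n_min

    # Assumed weight: share of majority-class points among the k nearest
    # neighbours, measured with the Chebyshev (L-infinity) distance.
    weights = np.empty(n_min)
    for i, x in enumerate(X_min):
        dist = np.max(np.abs(X - x), axis=1)   # Chebyshev distance to all points
        nearest = np.argsort(dist)[: k + 1]    # normally includes the point itself
        weights[i] = min(np.sum(~is_min[nearest]) / k, 1.0)
    weights += 1e-6                            # keep every sample selectable
    probs = weights / weights.sum()

    # Draw anchors by weight, then sample from a Gaussian centred on each anchor.
    # Assumed spread: a fraction of the per-feature minority standard deviation.
    anchors = rng.choice(n_min, size=n_new, p=probs)
    sigma = sigma_scale * X_min.std(axis=0)
    synthetic = X_min[anchors] + rng.normal(0.0, sigma, size=(n_new, X.shape[1]))

    # Random undersampling of the majority class, without replacement.
    keep = rng.choice(n_maj, size=target, replace=False)

    X_res = np.vstack([X_maj[keep], X_min, synthetic])
    y_res = np.concatenate([y_maj[keep], np.full(n_min + n_new, minority)])
    return X_res, y_res
```

Weighting anchors toward minority samples near the class boundary is only one plausible reading of "weight information"; the paper itself should be consulted for the actual weighting and variance choices.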
Keywords
Food computing, Imbalanced learning, Gaussian distribution oversampling, Random undersampling, Chebyshev distance