TFSFB: Two-stage feature selection via fusing fuzzy multi-neighborhood rough set with binary whale optimization for imbalanced data

Inf. Fusion (2023)

Abstract
Obtaining informative features is crucial in imbalanced classification. However, existing neighborhood rough set-based feature selection approaches easily overlook the diversity and complexity of data distributions, and it is difficult to obtain a globally optimal feature subset from imbalanced and high-dimensional datasets. To tackle these limitations, we construct a new two-stage feature subset selection scheme by fusing the fuzzy multi-neighborhood rough set (FMRS) with the binary whale optimization algorithm (BWOA) for imbalanced data. First, to evaluate the distributions of different features, the standard deviation coefficient is introduced to construct a fuzzy multi-neighborhood radius set. Then, the fuzzy multi-neighborhood granule and fuzzy membership degree are presented to establish the novel FMRS, and a feature significance measure from the algebra view is developed to balance the approximate properties and influences of different features in the negative and positive classes. Second, fuzzy multi-neighborhood conditional entropy is defined to maximize the information quantity of class-imbalanced data from the information view, and then, by fusing the two evaluation perspectives above, a mixed metric is provided to fully assess the uncertainty of class-imbalanced datasets. Internal and external significance metrics are designed to obtain a preselected candidate set of features based on the filter FMRS model at the first stage. Third, a control factor is developed to dominate the whale position update, and a novel fitness function is constructed by fusing the dependency degree and entropy measure with the reduction ratio to evaluate the optimal subset of features. Adopting population partitioning and local interference schemes prevents the BWOA from becoming trapped in a local optimum. To reduce the search space of evolution, a dynamic bitmask is used to improve the BWOA, and then an optimal subset of features is acquired through continuous iterations of the wrapper BWOA at the second stage. Finally, a new two-stage algorithm for feature subset selection by fusing FMRS and BWOA is provided to process class-imbalanced data, where the particle swarm optimization algorithm confirms the optimized parameters. Experiments on 31 datasets show that our algorithm is efficient and can achieve excellent classification efficiency for binary and multiclass imbalanced data.
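The abstract gives no formulas, so the following is only a rough sketch of the general shape of such a pipeline, not the paper's method. It assumes a per-feature radius of the form standard deviation divided by a scale, a fitness that is a weighted sum of a subset-quality score and the reduction ratio, and the textbook whale optimization update with a sigmoid transfer function. The names `per_feature_radii`, `fitness`, `binary_woa`, `scale`, and `alpha` are all hypothetical, and the paper's control factor, population partitioning, local interference, and dynamic bitmask are not reproduced here.

```python
import math
import random


def per_feature_radii(rows, scale=2.0):
    """One neighborhood radius per feature: population std / scale (assumed form)."""
    n = len(rows)
    radii = []
    for j in range(len(rows[0])):
        col = [r[j] for r in rows]
        mean = sum(col) / n
        std = math.sqrt(sum((v - mean) ** 2 for v in col) / n)
        radii.append(std / scale)
    return radii


def fitness(mask, quality, alpha=0.9):
    """Weighted sum of subset quality (in [0, 1]) and reduction ratio (assumed form)."""
    selected = sum(mask)
    if selected == 0:
        return 0.0  # an empty subset is treated as worthless
    reduction = 1.0 - selected / len(mask)
    return alpha * quality(mask) + (1 - alpha) * reduction


def to_binary(position, rng):
    """Sigmoid transfer: each continuous coordinate becomes a 0/1 feature flag."""
    return [1 if rng.random() < 1.0 / (1.0 + math.exp(-x)) else 0 for x in position]


def binary_woa(n_features, quality, n_whales=10, n_iter=30, seed=0):
    """Textbook WOA update with binarization; returns (best_mask, best_fitness)."""
    rng = random.Random(seed)
    pos = [[rng.uniform(-1, 1) for _ in range(n_features)] for _ in range(n_whales)]
    best_mask, best_fit, best_pos = None, -1.0, None
    for t in range(n_iter):
        # Evaluate every whale and remember the best subset seen so far.
        for p in pos:
            mask = to_binary(p, rng)
            f = fitness(mask, quality)
            if f > best_fit:
                best_mask, best_fit, best_pos = mask, f, list(p)
        a = 2.0 * (1 - t / n_iter)  # decreases linearly from 2 towards 0
        for i, p in enumerate(pos):
            A = 2 * a * rng.random() - a
            C = 2 * rng.random()
            if rng.random() < 0.5:
                # Encircle the best whale (|A| < 1) or explore around a random one.
                target = best_pos if abs(A) < 1 else pos[rng.randrange(n_whales)]
                new = [tg - A * abs(C * tg - x) for tg, x in zip(target, p)]
            else:
                # Spiral move towards the best whale.
                l = rng.uniform(-1, 1)
                new = [abs(b - x) * math.exp(l) * math.cos(2 * math.pi * l) + b
                       for b, x in zip(best_pos, p)]
            # Clamp positions so the sigmoid stays numerically well behaved.
            pos[i] = [max(-6.0, min(6.0, v)) for v in new]
    return best_mask, best_fit
```

In a two-stage setting like the one described, `quality` would be evaluated only over the candidate features kept by the filter stage, for example a classifier score on the selected columns; here it is left as a caller-supplied function returning a value in [0, 1].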
Keywords
Feature selection, Fuzzy multi-neighborhood, Feature significance, Binary whale optimization, Imbalanced classification