MapReduce accelerated attribute reduction based on neighborhood entropy with Apache Spark

Expert Systems with Applications (2023)

Abstract
Attribute reduction is an extremely important data preprocessing technique in the field of data mining, which has gained much attention due to its ability to improve the generalization performance and learning speed of analysis models. Rough set theory offers a systematic and powerful framework for attribute reduction in terms of classificatory and decision abilities under uncertainty. In this paper, we present a parallel neighborhood entropy-based attribute reduction method with neighborhood rough sets that uses the Apache Spark cluster computing model to parallelize the algorithm in a distributed computing environment. By leveraging a horizontal partitioning strategy to support data parallelism, three quantitative measures of attribute sets, i.e., neighborhood approximation accuracy, neighborhood credibility degree, and neighborhood coverage degree, are parallelized to accelerate the computation of decision neighborhood entropy during the heuristic search iterations. A novel parallel heuristic attribute reduction algorithm is then developed by employing several operations from the Spark API to ease code parallelization. Extensive experimental results indicate the superiority and notable strengths of the proposed algorithm with respect to the criteria for evaluating parallel performance, i.e., scalability and extensibility.
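As a rough illustration of the horizontal partitioning idea, the sketch below simulates the map-reduce computation of one of the three measures, neighborhood approximation accuracy, in plain Python (without Spark). The toy dataset, the radius DELTA, and all helper names are assumptions for illustration, not the paper's implementation: each partition computes partial lower/upper approximation counts of its own rows against the full universe (map step), and the partial counts are summed across partitions (reduce step).

```python
from functools import reduce
import math

# Minimal, Spark-free sketch of the map-reduce pattern described above.
# The toy dataset and the radius DELTA are illustrative assumptions,
# not values taken from the paper.
data = [
    ((0.10, 0.20), 0), ((0.15, 0.22), 0), ((0.90, 0.80), 1),
    ((0.88, 0.85), 1), ((0.50, 0.50), 0), ((0.52, 0.48), 1),
]
DELTA = 0.2  # neighborhood radius
classes = {label for _, label in data}

def dist(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def neighborhood(sample, universe):
    """All samples within DELTA of `sample` (including itself)."""
    return [s for s in universe if dist(sample[0], s[0]) <= DELTA]

def partial_counts(partition, universe):
    """Map step: per-partition lower/upper approximation counts per class."""
    counts = {d: [0, 0] for d in classes}
    for x in partition:
        nb = neighborhood(x, universe)
        for d in classes:
            if all(s[1] == d for s in nb):
                counts[d][0] += 1  # x lies in the lower approximation of d
            if any(s[1] == d for s in nb):
                counts[d][1] += 1  # x lies in the upper approximation of d
    return counts

def merge(c1, c2):
    """Reduce step: sum partial counts coming from different partitions."""
    return {d: [c1[d][0] + c2[d][0], c1[d][1] + c2[d][1]] for d in classes}

# Horizontal partitioning: the universe is split row-wise across "workers".
partitions = [data[:3], data[3:]]
merged = reduce(merge, (partial_counts(p, data) for p in partitions))

lower = sum(lo for lo, _ in merged.values())
upper = sum(up for _, up in merged.values())
accuracy = lower / upper  # neighborhood approximation accuracy
print(accuracy)
```

In the paper's Spark setting, `partial_counts` would correspond to a `mapPartitions` transformation over a partitioned RDD (with the universe available to workers, e.g. via a broadcast variable) and `merge` to a `reduce` action; the same map/merge shape applies to the other two measures.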
Keywords
Attribute reduction, Neighborhood rough sets, Uncertainty measure, Parallel computing, Apache Spark