Voting-based instance selection from large data sets with MapReduce and random weight networks.

Inf. Sci. (2016)

Abstract
Instance selection is an important preprocessing step in machine learning. By choosing a representative subset of a data set, it aims to preserve the performance a learning algorithm would achieve on the whole data set, while making the algorithm feasible and effective on large data sets. This paper proposes a voting-based instance selection algorithm for large data sets, built on MapReduce and random weight networks (RWNs). First, the Map phase of MapReduce partitions the large data set into small subsets and distributes them to different cloud computing nodes. Second, informative instances are selected from each subset in parallel with an instance selection algorithm. Third, the Reduce phase collects the selected instances from the nodes to form a selected instance subset. These three steps are repeated p times (p is a user-defined parameter), yielding p instance subsets. Finally, voting is used to select the most informative instances from the p subsets. A random weight network classifier is trained on the resulting instance subset, and its accuracy is evaluated on a testing set. The proposed algorithm is compared experimentally with three state-of-the-art approaches: CNN, ENN and RNN (here the condensed, edited and reduced nearest neighbor rules, not neural networks). The experimental results show that the proposed algorithm is effective and efficient. © 2016 Elsevier Inc. All rights reserved.
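
The partition, select, collect, repeat and vote steps described in the abstract can be sketched roughly as follows. This is a minimal single-process illustration, not the authors' implementation: the abstract does not name the per-partition selector, the partitioning scheme, or the voting threshold, so the condensed nearest neighbor stand-in, the random shuffle, and the `min_votes` parameter are all assumptions, and the function names are hypothetical. The Map and Reduce phases are simulated with plain loops, and RWN training is omitted.

```python
import random
from collections import Counter


def sq_dist(a, b):
    """Squared Euclidean distance between two feature tuples."""
    return sum((u - v) ** 2 for u, v in zip(a, b))


def condensed_nn(indices, X, y):
    """Stand-in selector (assumption): Hart's condensed nearest neighbor.

    Keeps an instance whenever the instances kept so far would
    misclassify it under the 1-NN rule. Returns the kept indices.
    """
    if not indices:
        return []
    kept = [indices[0]]
    changed = True
    while changed:
        changed = False
        for i in indices:
            if i in kept:
                continue
            nearest = min(kept, key=lambda j: sq_dist(X[i], X[j]))
            if y[nearest] != y[i]:
                kept.append(i)
                changed = True
    return kept


def one_round(X, y, n_parts, rng):
    """One partition/select/collect pass over the whole data set."""
    idx = list(range(len(X)))
    rng.shuffle(idx)  # assumption: random partitioning, so rounds differ
    # "Map": split the shuffled indices into n_parts disjoint subsets.
    parts = [idx[k::n_parts] for k in range(n_parts)]
    selected = set()
    for part in parts:  # would run in parallel on separate nodes
        # "Reduce": collect the instances selected on each subset.
        selected.update(condensed_nn(part, X, y))
    return selected


def vote_select(X, y, p=5, n_parts=3, min_votes=3, seed=0):
    """Repeat the pass p times; keep instances with >= min_votes votes.

    min_votes is an assumed threshold; the abstract only says that
    voting picks the most informative instances from the p subsets.
    """
    rng = random.Random(seed)
    votes = Counter()
    for _ in range(p):
        votes.update(one_round(X, y, n_parts, rng))
    return sorted(i for i, v in votes.items() if v >= min_votes)
```

Calling `vote_select(X, y)` with `X` as a list of feature tuples and `y` as a list of labels returns the indices of the retained instances. Raising `min_votes` toward `p` keeps only instances that are selected regardless of how the data happen to be partitioned, which is the intuition behind combining several rounds by voting.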
Keywords
EXTREME LEARNING-MACHINE,FUNCTIONAL-LINK NET,MAP REDUCE SOLUTION,NEURAL-NETWORKS,BIG DATA,ALGORITHM,CLASSIFICATION,CLASSIFIERS,INFORMATION,SYSTEMS