RMHC-MR: Instance selection by random mutation hill climbing algorithm with MapReduce in big data

Procedia Computer Science (2017)

Abstract
Instance selection reduces the size of the training set by removing redundant, erroneous and noisy instances, and is an important pre-processing step in KDD (knowledge discovery in databases). Recently, to process very large data sets, several methods divide the training set into disjoint subsets and apply an instance selection algorithm to each subset independently. In this paper, we analyze the limitations of these methods and give our view on how to apply “divide and conquer” in the instance selection procedure. Furthermore, we propose an instance selection method based on the random mutation hill climbing (RMHC) algorithm within the MapReduce framework, called RMHC-MR. Experimental results show that RMHC-MR performs well in terms of classification accuracy and reduction rate.
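The abstract does not spell out the algorithm, so the following is only a minimal sketch under stated assumptions, not the authors' RMHC-MR. It shows a generic RMHC instance selection: a binary inclusion mask over the training set, one random bit flipped per iteration, and the flip kept only if fitness does not drop. Fitness here is assumed to be a weighted mix (hypothetical weight `alpha`) of leave-one-out 1-NN accuracy and reduction rate. Because that accuracy is a sum over instances, the sketch evaluates it in a "map" step over chunks of instances and a "reduce" step that sums the partial counts; this decomposition, and all function names (`nn_predict`, `map_count_correct`, `fitness`, `rmhc_select`), are illustrative assumptions.

```python
import math
import operator
import random
from functools import reduce


def nn_predict(X, y, selected, q):
    """1-NN label for training instance q using only the selected instances (q itself is skipped)."""
    best_label, best_dist = None, math.inf
    for i in selected:
        if i == q:
            continue
        d = math.dist(X[i], X[q])
        if d < best_dist:
            best_dist, best_label = d, y[i]
    return best_label


def map_count_correct(X, y, selected, chunk):
    """'Map' step: how many instances in chunk are classified correctly by 1-NN over selected."""
    return sum(1 for q in chunk if nn_predict(X, y, selected, q) == y[q])


def fitness(X, y, mask, n_chunks=4, alpha=0.5):
    """Assumed fitness: alpha * accuracy + (1 - alpha) * reduction rate."""
    selected = [i for i, keep in enumerate(mask) if keep]
    n = len(X)
    # Split instance indices into n_chunks disjoint, interleaved chunks.
    chunks = [range(k, n, n_chunks) for k in range(n_chunks)]
    partial_counts = map(lambda c: map_count_correct(X, y, selected, c), chunks)
    correct = reduce(operator.add, partial_counts, 0)  # 'reduce' step: sum the counts
    accuracy = correct / n
    reduction = 1 - len(selected) / n
    return alpha * accuracy + (1 - alpha) * reduction


def rmhc_select(X, y, iterations=200, alpha=0.5, seed=0):
    """Random mutation hill climbing over a binary inclusion mask."""
    rng = random.Random(seed)
    n = len(X)
    mask = [rng.random() < 0.5 for _ in range(n)]  # random initial subset
    best = fitness(X, y, mask, alpha=alpha)
    for _ in range(iterations):
        j = rng.randrange(n)
        mask[j] = not mask[j]          # mutate: flip one inclusion bit
        f = fitness(X, y, mask, alpha=alpha)
        if f >= best:
            best = f                   # keep the mutation
        else:
            mask[j] = not mask[j]      # revert the mutation
    return [i for i, keep in enumerate(mask) if keep], best


if __name__ == "__main__":
    rng = random.Random(1)
    X = [(rng.gauss(0, 0.5), rng.gauss(0, 0.5)) for _ in range(30)] + \
        [(rng.gauss(3, 0.5), rng.gauss(3, 0.5)) for _ in range(30)]
    y = [0] * 30 + [1] * 30
    kept, score = rmhc_select(X, y, iterations=300)
    print(f"kept {len(kept)} of {len(X)} instances, fitness {score:.3f}")
```

Splitting the work by instance chunks only parallelizes the fitness evaluation of one shared candidate subset; it avoids selecting instances independently per partition, which is the kind of per-subset approach the abstract says has limitations. How RMHC-MR itself divides the work is not stated in this abstract.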
Keywords
Instance selection, MapReduce, big data, nearest neighbor, classification