A Time Efficient Approach for Distributed Feature Selection Partitioning by Features

Laura Moran-Fernandez, Verónica Bolón-Canedo, Amparo Alonso-Betanzos

CAEPIA (2015)

Abstract

With the advent of high dimensionality, feature selection has become indispensable in real-world scenarios. However, most traditional methods work only in a centralized manner, which, ironically, increases their running time when they are applied to this type of data. For this reason, we propose a distributed filter approach for vertically partitioned data. The idea is to split the data by features, apply a filter to each partition, and perform several rounds to obtain a final subset of features. Unlike existing procedures, which combine the partial outputs of the different data partitions according to classification error, we propose a merging process based on the theoretical complexity of these feature subsets. Experimental results on five datasets show that the running time decreases considerably. Moreover, in terms of classification accuracy, our approach was able to match, and in some cases even improve on, the standard algorithms applied to the non-partitioned datasets.
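The procedure summarized above (vertical partitioning, a filter per partition, several rounds, then a merge driven by subset complexity rather than classification error) can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the per-feature Fisher ratio used as the filter, the complexity measure 1 / (1 + max Fisher ratio), the vote-threshold merge, the weight `alpha` between complexity and subset size, and all function names are hypothetical choices. Labels are assumed to be binary (0/1) with both classes present.

```python
import random
from statistics import mean, pvariance


def column(X, j):
    """Return feature j as a list (X is a list of rows)."""
    return [row[j] for row in X]


def fisher_score(col, y):
    """Per-feature Fisher ratio for binary labels 0/1 (assumed filter criterion)."""
    a = [v for v, t in zip(col, y) if t == 0]
    b = [v for v, t in zip(col, y) if t == 1]
    return (mean(a) - mean(b)) ** 2 / (pvariance(a) + pvariance(b) + 1e-12)


def filter_partition(X, y, feats, keep):
    """Hypothetical filter: keep the `keep` best-scoring features of one partition."""
    ranked = sorted(feats, key=lambda j: fisher_score(column(X, j), y), reverse=True)
    return ranked[:keep]


def complexity(X, y, subset):
    """Assumed complexity measure in (0, 1]: lower means classes are easier to separate."""
    f1 = max(fisher_score(column(X, j), y) for j in subset)
    return 1.0 / (1.0 + f1)


def distributed_fs(X, y, n_parts=3, rounds=5, keep=1, alpha=0.75, seed=0):
    """Sketch of distributed filtering by features with a complexity-based merge."""
    rng = random.Random(seed)
    d = len(X[0])
    votes = [0] * d
    for _ in range(rounds):
        feats = list(range(d))
        rng.shuffle(feats)
        # Vertical partition: split feature indices into n_parts disjoint groups.
        parts = [feats[i::n_parts] for i in range(n_parts)]
        for part in parts:
            # Each partition is filtered independently (could run in parallel).
            for j in filter_partition(X, y, part, keep):
                votes[j] += 1
    # Merge: try each vote threshold and score the resulting subset by
    # (assumed) complexity plus the fraction of features retained.
    best, best_score = [], float("inf")
    for threshold in range(1, rounds + 1):
        subset = [j for j in range(d) if votes[j] >= threshold]
        if not subset:
            continue
        score = alpha * complexity(X, y, subset) + (1 - alpha) * len(subset) / d
        if score < best_score:
            best, best_score = subset, score
    return best


# Usage: feature 0 separates the classes, features 1-5 are random noise.
y = [i % 2 for i in range(20)]
noise = random.Random(1)
X = [[y[i] * 10 + (i % 3) * 0.1] + [noise.random() for _ in range(5)] for i in range(20)]
selected = distributed_fs(X, y)
```

Reshuffling the features each round lets every feature compete against different neighbours; the vote count then acts as a stability signal, and the merge picks the vote threshold whose subset gives the best trade-off between the assumed complexity measure and subset size.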

Keywords

Feature Selection, Classification Accuracy, Feature Subset, Feature Selection Method, High Dimensional Dataset