Designing a Feature Selection Technique for Analyzing Mixed Data

2020 10th Annual Computing and Communication Workshop and Conference (CCWC) (2020)

Abstract
Large-scale data analysis often requires a vast amount of computational time, even with high-end computational resources, so researchers have proposed various data analysis approaches. Although these approaches are well designed, most of them still struggle to analyze large-scale data efficiently. To overcome this limitation, identifying the optimal number of features is critical. In this paper, a new technique is introduced to boost model performance by determining optimal features in noisy mixed data. It performs a continuous evaluation to determine the best possible features for a chosen data analysis algorithm. The proposed technique is compared with three feature selection techniques: Principal Component Analysis (PCA), the analysis of variance (ANOVA) test, and Mutual Information (MI). To show the effectiveness of the proposed technique, a performance evaluation was conducted with three machine learning algorithms: Decision Tree, Random Forest, and k-Nearest Neighbor (KNN). In an evaluation on three financial datasets, we observed a performance improvement of about 5% to 10% when using the proposed technique.
Keywords
Feature Selection, Mixed Data Analysis, Machine Learning, Performance Metric, Financial Datasets
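
The abstract does not spell out how the proposed technique works, so the following is not the authors' method. It is a minimal sketch of a generic wrapper-style forward selection, which illustrates the general idea of repeatedly evaluating candidate features against a chosen learning algorithm. Everything in it is an assumption made for illustration: the function names (`mixed_distance`, `loo_accuracy_1nn`, `forward_select`), the use of 1-nearest-neighbour as the chosen algorithm, the Gower-style distance for mixed numeric and categorical columns, and the toy dataset.

```python
def mixed_distance(a, b, features):
    """Gower-style distance over the chosen feature indices (an assumption).

    Numeric features contribute their absolute difference (assumed to be
    pre-scaled to [0, 1]); categorical features contribute 0 on a match
    and 1 on a mismatch.
    """
    total = 0.0
    for f in features:
        if isinstance(a[f], str):
            total += 0.0 if a[f] == b[f] else 1.0
        else:
            total += abs(a[f] - b[f])
    return total


def loo_accuracy_1nn(rows, labels, features):
    """Leave-one-out accuracy of a 1-nearest-neighbour classifier."""
    correct = 0
    for i, row in enumerate(rows):
        others = [j for j in range(len(rows)) if j != i]
        # Ties are resolved in favour of the lowest row index.
        nearest = min(others, key=lambda j: mixed_distance(row, rows[j], features))
        correct += labels[nearest] == labels[i]
    return correct / len(rows)


def forward_select(rows, labels, evaluate):
    """Greedy forward selection: add the feature that gives the best score,
    and stop as soon as no remaining feature gives a strict improvement."""
    remaining = list(range(len(rows[0])))
    selected, best_score = [], 0.0
    while remaining:
        scored = [(evaluate(rows, labels, selected + [f]), f) for f in remaining]
        score, feature = max(scored, key=lambda t: t[0])
        if score <= best_score:
            break
        selected.append(feature)
        remaining.remove(feature)
        best_score = score
    return selected, best_score


# Hypothetical toy mixed dataset: feature 0 is informative, feature 1 is
# numeric noise, feature 2 is categorical noise.
rows = [[i / 32, ((7 * i) % 8) / 8, "abc"[i % 3]] for i in range(32)]
labels = [int(i >= 16) for i in range(32)]

selected, score = forward_select(rows, labels, loo_accuracy_1nn)
print("selected features:", selected, "LOO accuracy:", score)
```

On this toy data the loop keeps only feature 0, because adding either noise column does not raise the leave-one-out accuracy. Passing the evaluator as a parameter keeps the selection loop independent of the learner, so a different classifier or metric could be swapped in without changing `forward_select`.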