An exploration of the impact of Feature quality versus Feature quantity on the performance of a machine learning model

Krupa Bhayani, Devansh Tanna, Vinod Maan, Dhiraj, Sandeep Kumar

2023 IEEE International Conference on Contemporary Computing and Communications (InC4) (2023)

Abstract
About 0.62 trillion bytes of data are generated globally every hour, and this figure keeps growing as a result of digitalization and social networks. Data ecosystems capture, store, and manage this big data so that the information can be analyzed and its value extracted. This makes the data a gold mine for the companies that research and use it, and it shows how essential and valuable data has become. For any machine learning model, data selection matters. In this paper, several experiments are performed to assess the importance of data quality versus data quantity for model performance. The experiments compare the richness of the data in terms of feature quality (e.g., features in images) against the amount of data available to a machine learning model. Images are divided into two sets based on their features, redundant features are removed from them, and a machine learning model is then trained. The model trained on non-redundant data achieves the highest accuracy (>80%) in all cases compared with the model trained on all features, demonstrating the importance of feature variability and not just feature count.
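The abstract does not state how redundancy is measured, so the following is only a minimal sketch of the general idea of dropping redundant features before training. It assumes that "redundant" means "highly correlated with a feature already kept"; the function names `pearson` and `drop_redundant_features`, the 0.95 threshold, and the toy data are all hypothetical and are not taken from the paper.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def drop_redundant_features(rows, threshold=0.95):
    """Return indices of the feature columns to keep.

    Assumption (not from the paper): a column is redundant when its absolute
    correlation with an already-kept column reaches the threshold.
    """
    columns = list(zip(*rows))  # transpose: one tuple per feature column
    kept = []
    for j, col in enumerate(columns):
        if all(abs(pearson(col, columns[k])) < threshold for k in kept):
            kept.append(j)
    return kept


# Toy data: column 1 is exactly twice column 0, so it adds no new information.
rows = [
    [1.0, 2.0, 5.0],
    [2.0, 4.0, 3.0],
    [3.0, 6.0, 4.0],
    [4.0, 8.0, 1.0],
]
kept = drop_redundant_features(rows)
reduced = [[row[j] for j in kept] for row in rows]
print(kept)     # indices of retained feature columns
print(reduced)  # feature matrix with redundant columns removed
```

The reduced matrix would then be passed to whichever classifier is being evaluated, and its accuracy compared against a model trained on the full feature set.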
Keywords
Machine Learning, Deep Learning, Feature Engineering