Influence of statistical feature normalisation methods on K-Nearest Neighbours and K-Means in the context of industry 4.0

Engineering Applications of Artificial Intelligence (2022)

Abstract
Normalisation is a preprocessing technique widely employed in Machine Learning (ML)-based solutions for industry to equalise the contribution of the features. However, few researchers have analysed the effect of normalisation and its implications for ML algorithm performance, especially for Euclidean distance-based algorithms such as the well-known K-Nearest Neighbours and K-means. This paper formally analyses the effect of normalisation and obtains results that differ significantly from the traditional claims in the state of the art. In particular, it shows that normalisation does not equalise the contribution of the features, which in turn affects the performance of the learning process for a particular problem. The demonstration is carried out on the Euclidean distance-based ML algorithms K-Nearest Neighbours and K-means. The paper concludes that normalisation can be viewed as an unsupervised Feature Weighting method. In this context, a new metric (Normalisation weight) for measuring the impact of normalisation on the features is presented. Likewise, the effect of normalisation on the Euclidean distance is analysed, and a new metric, referred to as Proportional influence, is proposed to measure the influence of the features on the Euclidean distance. Both metrics enable the automatic selection of the most appropriate normalisation method for a particular engineering problem, which can significantly improve both the computational cost and the classification performance of the K-Nearest Neighbours and K-means algorithms. The analytical conclusions are validated on well-known datasets from the UCI repository and on a real-life application from the refinery industry.
Keywords
Feature normalisation, Feature weighting, Machine learning, Euclidean distance, K-nearest neighbours, K-means
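
The abstract's view of normalisation as an unsupervised Feature Weighting method can be illustrated with a minimal sketch. It assumes two common methods (min-max and z-score) and synthetic two-feature data; the function names, the data, and the "share of squared pairwise distance" quantity are hypothetical illustrations and are not the Normalisation weight or Proportional influence metrics defined in the paper. The sketch checks that the Euclidean distance on normalised data equals a weighted Euclidean distance on the raw data, with each weight equal to the reciprocal of that feature's scale factor.

```python
import numpy as np

def scale_factors(X, method):
    """Per-feature divisor used by the normalisation method (illustrative)."""
    if method == "min-max":
        return X.max(axis=0) - X.min(axis=0)
    if method == "z-score":
        return X.std(axis=0)
    raise ValueError(f"unknown method: {method}")

def normalise(X, method):
    """Shift and scale each feature; the shift cancels out in distances."""
    shift = X.min(axis=0) if method == "min-max" else X.mean(axis=0)
    return (X - shift) / scale_factors(X, method)

def weighted_distance(a, b, w):
    """Euclidean distance after multiplying each feature difference by w."""
    return np.sqrt(np.sum((w * (a - b)) ** 2))

def squared_distance_shares(X):
    """Fraction of the summed squared pairwise distance due to each feature."""
    diffs = X[:, None, :] - X[None, :, :]
    per_feature = (diffs ** 2).sum(axis=(0, 1))
    return per_feature / per_feature.sum()

# Hypothetical data: one roughly uniform feature, one heavy-tailed feature.
rng = np.random.default_rng(0)
X = np.column_stack([rng.uniform(0.0, 1.0, 200),
                     rng.lognormal(0.0, 1.0, 200)])

for method in ("min-max", "z-score"):
    w = 1.0 / scale_factors(X, method)  # implied feature weights
    Z = normalise(X, method)
    # Distance on normalised data == weighted distance on raw data.
    assert np.isclose(np.linalg.norm(Z[0] - Z[1]),
                      weighted_distance(X[0], X[1], w))
    print(method, "implied weights:", w)

# Under min-max, both features lie in [0, 1], yet their shares of the
# summed squared pairwise distance are generally not equal.
print("min-max shares:", squared_distance_shares(normalise(X, "min-max")))
```

Because each method implies a different weight vector, choosing a normalisation method amounts to choosing a feature weighting without using any label information, which is the sense in which the weighting is unsupervised.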