Interpreting the Curse of Dimensionality from Distance Concentration and Manifold Effect
CoRR(2023)
Abstract
The characteristics and interpretability of data become more abstract and
complex as the dimensionality increases. Common patterns and relationships that
hold in low-dimensional space may fail to hold in higher-dimensional space.
This phenomenon, known as the curse of dimensionality, degrades the
performance of regression, classification, and clustering models and
algorithms. The curse of dimensionality can be attributed to many causes. In
this paper, we first summarize five challenges associated with manipulating
high-dimensional data, and explain the potential causes for the failure of
regression, classification, or clustering tasks. Subsequently, we delve into two
major causes of the curse of dimensionality, distance concentration and
manifold effect, by performing theoretical and empirical analyses. The results
demonstrate that nearest neighbor search (NNS) using three typical distance
measurements, Minkowski distance, Chebyshev distance, and cosine distance,
becomes meaningless as the dimensionality increases. Meanwhile, the data
incorporates more redundant features, and the variance contribution of
principal component analysis (PCA) is skewed towards a few dimensions. By
interpreting the causes of the curse of dimensionality, we can better
understand the limitations of current models and algorithms, and work toward
improving the performance of data analysis and machine learning tasks in
high-dimensional space.
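
The distance concentration effect described above can be illustrated with a small simulation. The following is a minimal sketch, not the paper's experiment: it assumes points drawn uniformly from the unit hypercube, uses Euclidean distance (the Minkowski distance with p = 2), and measures the relative contrast (d_max - d_min) / d_min between a random query and its farthest and nearest points. The function name `relative_contrast`, the sample size, and the chosen dimensions are illustrative assumptions.

```python
import math
import random


def relative_contrast(dim, n_points=200, seed=0):
    """Return (d_max - d_min) / d_min for Euclidean distances from one
    random query to n_points random points in the unit hypercube.

    Assumption: uniform i.i.d. coordinates; this is an illustrative
    setup, not the data used in the paper.
    """
    rng = random.Random(seed)
    query = [rng.random() for _ in range(dim)]
    dists = []
    for _ in range(n_points):
        point = [rng.random() for _ in range(dim)]
        dists.append(math.dist(query, point))
    d_min, d_max = min(dists), max(dists)
    return (d_max - d_min) / d_min


# Relative contrast for a few illustrative dimensionalities.
contrasts = {dim: relative_contrast(dim) for dim in (2, 10, 100, 1000)}
for dim, value in contrasts.items():
    print(f"dim={dim:5d}  relative contrast={value:.3f}")
```

Under these assumptions, the relative contrast should shrink toward zero as the dimensionality grows, which means the nearest and farthest neighbors become nearly equidistant from the query and nearest neighbor search loses its discriminating power.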