NN-Descent on High-Dimensional Data.

WIMS(2018)

引用 29|浏览111
暂无评分
摘要
K-nearest neighbor graphs (K-NNGs) are used in many data-mining and machine-learning algorithms. Naive construction of K-NNGs has a complexity of O(n2), which could be a problem for large-scale data sets. In order to achieve higher efficiency, many exact and approximate algorithms have been developed, including the NN-Descent algorithm of Dong, Charikar and Li. Empirical evidence suggests that the practical complexity of this algorithm is in O(n1.14), which is a significant improvement over brute force construction. However, NN-Descent has a major drawback --- it produces good results only on data of low intrinsic dimensionality. This paper presents an experimental analysis of this behavior, and investigates possible solutions. We link the quality of performance of NN-Descent with the phenomenon of hubness, defined as the tendency of intrinsically high-dimensional data to contain hubs --- points with high in-degrees in the K-NNG. We propose two approaches to alleviate the observed negative influence of hubs on NN-Descent performance.
更多
查看译文
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要