DBSCAN Revisited: Mis-Claim, Un-Fixability, and Approximation

MOD(2015)

Abstract
DBSCAN is a popular method for clustering multi-dimensional objects. Just as notable as the method's vast success is the research community's quest for its efficient computation. The original KDD'96 paper claimed an algorithm with O(n log n) running time, where n is the number of objects. Unfortunately, this is a mis-claim, and that algorithm actually requires O(n^2) time. There has been a fix in 2D space, where a genuine O(n log n)-time algorithm has been found. Looking for a fix for dimensionality d >= 3 is currently an important open problem.

In this paper, we prove that for d >= 3, the DBSCAN problem requires Omega(n^(4/3)) time to solve, unless very significant breakthroughs (ones widely believed to be impossible) could be made in theoretical computer science. This (i) explains why the community's search for fixing the aforementioned mis-claim has been futile for d >= 3, and (ii) indicates (sadly) that all DBSCAN algorithms must be intolerably slow even on moderately large n in practice. Surprisingly, we show that the running time can be dramatically brought down to O(n) in expectation regardless of the dimensionality d, as soon as slight inaccuracy in the clustering results is permitted. We formalize our findings into the new notion of rho-approximate DBSCAN, which we believe should replace DBSCAN on big data due to the latter's computational intractability.
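To make the quadratic cost mentioned in the abstract concrete, the following is a minimal brute-force sketch of exact DBSCAN. It is not the KDD'96 algorithm and not the method of this paper; it is only a generic illustration, under assumptions. The parameter names `eps` (neighborhood radius) and `min_pts` (density threshold) and the function name `brute_force_dbscan` are chosen here for illustration and do not appear in the abstract. A point is assumed to count as its own neighbor.

```python
from math import dist  # Euclidean distance, Python 3.8+

NOISE = -1  # label for points that belong to no cluster


def brute_force_dbscan(points, eps, min_pts):
    """Return one label per point: a cluster id (0, 1, ...) or NOISE.

    Illustrative sketch only; eps and min_pts are assumed parameter names.
    """
    n = len(points)
    # Quadratic step: every point is compared against every point,
    # so this alone costs O(n^2) distance computations.
    neighbors = [
        [j for j in range(n) if dist(points[i], points[j]) <= eps]
        for i in range(n)
    ]
    labels = [None] * n  # None means "not visited yet"
    cluster = 0
    for i in range(n):
        if labels[i] is not None:
            continue
        if len(neighbors[i]) < min_pts:
            # Not a core point; may be relabeled later as a border point.
            labels[i] = NOISE
            continue
        # Point i is a core point: grow a new cluster from it.
        labels[i] = cluster
        frontier = list(neighbors[i])
        while frontier:
            j = frontier.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point reached from a core point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            if len(neighbors[j]) >= min_pts:
                # Only core points propagate the cluster further.
                frontier.extend(neighbors[j])
        cluster += 1
    return labels


if __name__ == "__main__":
    pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (50, 50)]
    print(brute_force_dbscan(pts, eps=1.5, min_pts=3))
```

The all-pairs neighbor computation dominates the running time regardless of dimensionality, which is the behavior the abstract describes as impractical for large n. The sketch also stores all neighbor lists, so its memory use can be quadratic as well.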
Keywords
DBSCAN,Density-Based Clustering,Algorithm