Performance Evaluation of DBSCAN With Similarity Join Algorithms

VISION 2025: EDUCATION EXCELLENCE AND MANAGEMENT OF INNOVATIONS THROUGH SUSTAINABLE ECONOMIC COMPETITIVE ADVANTAGE (2019)

Abstract
Clustering is an important data mining operation that groups objects into clusters based on their similarity. The similarity join is a primitive operation used in clustering that retrieves the most similar pairs from two input datasets based on a dissimilarity function (also called a metric). In this article, we transform the algorithmic schema of DBSCAN (Density-Based Spatial Clustering of Applications with Noise) by replacing its multiple range queries with a single similarity join, which reduces the number of hyperparameters. Instead of the two hyperparameters required by classic DBSCAN, our approach requires only the neighborhood radius epsilon. We propose two implementations of DBSCAN with similarity join: i) QuickDBSCAN, which uses an adapted QuickJoin algorithm, and ii) KDTreeDBSCAN, which uses a k-d-tree indexing structure. The experimental results show that DBSCAN with similarity join outperforms classic DBSCAN.
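The idea of clustering from a single similarity join can be illustrated with a minimal sketch. This is not the paper's QuickDBSCAN or KDTreeDBSCAN: a brute-force self-join stands in for QuickJoin and the k-d-tree, and the function names `similarity_self_join` and `cluster_from_join` are hypothetical. The abstract does not say how the second hyperparameter is eliminated, so the sketch assumes that any point with at least one neighbor within epsilon belongs to a cluster and that isolated points are noise.

```python
from itertools import combinations
from math import dist


def similarity_self_join(points, eps):
    """Return all index pairs (i, j), i < j, whose Euclidean distance is <= eps.

    Brute force, O(n^2); a stand-in for an indexed or partition-based join.
    """
    return [
        (i, j)
        for (i, p), (j, q) in combinations(enumerate(points), 2)
        if dist(p, q) <= eps
    ]


def cluster_from_join(points, eps):
    """Label points by merging every pair returned by one similarity join.

    Assumption (not from the paper): a point with no neighbor within eps
    is noise and gets the label -1.
    """
    parent = list(range(len(points)))

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    has_neighbor = [False] * len(points)
    for i, j in similarity_self_join(points, eps):
        has_neighbor[i] = has_neighbor[j] = True
        parent[find(i)] = find(j)

    labels, cluster_ids = [], {}
    for i in range(len(points)):
        if not has_neighbor[i]:
            labels.append(-1)
        else:
            # Number clusters in order of first appearance.
            labels.append(cluster_ids.setdefault(find(i), len(cluster_ids)))
    return labels


points = [(0, 0), (0, 1), (1, 1), (5, 5), (5, 6), (10, 10)]
labels = cluster_from_join(points, eps=1.0)
```

Here the join is computed once and the clusters are the connected components of the resulting pair list, in contrast to classic DBSCAN, which issues one range query per point as it expands a cluster.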
Keywords
DBSCAN, similarity join, k-d-tree indexing, QuickJoin