Smart-Sample: An Ecient Algorithm for Clustering Large High-Dimensional Datasets

msra(2014)

引用 24|浏览2
暂无评分
摘要
Finding useful related patterns in a dataset is an important task in many interesting applications. In particular, one common need in many algorithms, is the ability to separate a given dataset into a small number of clusters. Each cluster represents a subset of data-points from the dataset, which are considered similar. In some cases, it is also necessary to distinguish data points that are not part of a pattern from the other data-points. This paper introduces a new data clustering method named smart-sample and com- pares its performance to several clustering methodologies. We show that smart-sample clusters successfully large high-dimensional datasets. In addition, smart-sample out- performs other methodologies in terms of running-time. A variation of the smart-sample algorithm, which guarantees eciency in terms of I/O, is also presented. We describe how to achieve an approximation of the in-memory smart-sample algorithm using a constant number of scans with a single sort operation on the disk.
更多
查看译文
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要