
Scalable Kernel $k$-Means With Randomized Sketching: From Theory to Algorithm

IEEE Transactions on Knowledge and Data Engineering (2023)

Abstract
Kernel $k$-means is a fundamental unsupervised learning method in data mining. Its computational requirements are typically at least quadratic in the number of data points, which is prohibitive in large-scale scenarios. To address this issue, we propose SKK, a novel randomized sketching approach based on circulant matrices. SKK multiplies the kernel matrix on the left and right by the proposed sketch matrices to obtain a smaller matrix, and accelerates the matrix-matrix products with the fast Fourier transform by exploiting the circulant structure. This greatly reduces the computational requirements of the approximate kernel $k$-means estimator while retaining the same generalization bound as exact kernel $k$-means in the statistical setting. In particular, theoretical analysis shows that a sketch dimension of $\sqrt{n}$ is sufficient for SKK to achieve the optimal excess risk bound with only a fraction of the computation, where $n$ is the number of data points. Extensive experiments verify our theoretical analysis, and SKK achieves state-of-the-art performance on 12 real-world datasets. To the best of our knowledge, this is the first such significant breakthrough for randomized sketching in unsupervised learning.
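The abstract's two-sided sketch with FFT acceleration can be illustrated with a minimal NumPy sketch. This is not the paper's SKK construction, which the abstract does not specify. It assumes a sketch matrix $S$ made of $m$ randomly chosen rows of a circulant matrix with a Gaussian first column, and the names `circulant_apply`, `sketch_kernel`, and `rbf_kernel` are hypothetical. It shows only how $S K S^\top$ can be formed without ever building the circulant matrix.

```python
import numpy as np

def rbf_kernel(X, gamma=0.5):
    # Dense Gaussian kernel matrix; used here only to have something to sketch.
    sq = np.sum(X**2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2 * X @ X.T
    return np.exp(-gamma * np.maximum(d2, 0.0))

def circulant_apply(c, X):
    # Computes C @ X, where C[i, j] = c[(i - j) % n], as a circular
    # convolution along axis 0: O(n log n) per column instead of O(n^2).
    return np.real(np.fft.ifft(np.fft.fft(c)[:, None] * np.fft.fft(X, axis=0), axis=0))

def sketch_kernel(K, m, rng):
    # Assumption: S = m randomly selected rows of a Gaussian circulant matrix.
    n = K.shape[0]
    c = rng.standard_normal(n) / np.sqrt(m)
    rows = rng.choice(n, size=m, replace=False)
    SK = circulant_apply(c, K)[rows]             # S @ K, shape (m, n)
    K_small = circulant_apply(c, SK.T)[rows].T   # (S @ (S K)^T)^T = S K S^T
    return K_small, c, rows

rng = np.random.default_rng(0)
n = 1024
m = int(np.sqrt(n))                  # sketch dimension sqrt(n), as in the abstract
X = rng.standard_normal((n, 5))      # synthetic data, for illustration only
K_small, c, rows = sketch_kernel(rbf_kernel(X), m, rng)
print(K_small.shape)
```

The resulting $m \times m$ matrix is what a downstream approximate kernel $k$-means solver would operate on. How SKK uses it to recover cluster assignments is described in the paper, not here.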
Keywords
randomized sketching