Asymptotic Optimality of Self-Representative Low-Rank Approximation and Its Applications

(2021)

Abstract
We propose a novel technique for sampling representatives from a large, unsupervised dataset. The approach is based on the concept of \emph{self-rank}, defined as the minimum number of samples needed to reconstruct all samples with an accuracy proportional to that of the rank-$K$ approximation. As the exact computation of self-rank requires a computationally expensive combinatorial search, we propose an efficient algorithm that jointly estimates self-rank and selects the optimal samples with high accuracy. A theoretical upper bound is derived that reaches the tightest bound for two asymptotic cases. The best approximation ratio for self-representative low-rank approximation was presented in ICML 2017~\cite{Chierichetti-icml-2017}, which was further improved by the bound $1+K$ reported in NeurIPS 2019~\cite{dan2019optimal}. Both of these bounds depend solely on the number of selected samples. In this paper, for the first time, we present an adaptive approximation ratio depending on spectral properties of the original dataset, $A \in \mathbb{R}^{N \times M}$. In particular, our performance bound is proportional to the condition number $\kappa(A)$. Our derived approximation ratio is expressed as $1 + (\kappa(A)^2 - 1)/(N - K)$, which approaches 1 in two asymptotic cases. In addition to evaluating the proposed algorithm on a synthetic dataset, we show that the proposed sampling scheme can be utilized in real-world applications such as graph node sampling for optimizing the shortest path criterion, and learning a classifier with sampled data.
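To make the stated ratio concrete, the following is a minimal sketch that only evaluates the expression $1 + (\kappa(A)^2 - 1)/(N - K)$ for a given matrix; it is not the authors' sampling algorithm. The function name `adaptive_ratio` is hypothetical, and two assumptions are made that the abstract does not confirm: $\kappa(A)$ is taken to be the 2-norm condition number, and $N$ is taken to be the first dimension of $A$.

```python
import numpy as np


def adaptive_ratio(A: np.ndarray, K: int) -> float:
    """Evaluate 1 + (kappa(A)^2 - 1) / (N - K) as written in the abstract.

    Assumptions (not confirmed by the source): kappa is the 2-norm
    condition number, and N is the number of rows of A.
    """
    N = A.shape[0]
    if not 0 <= K < N:
        raise ValueError("K must satisfy 0 <= K < N")
    kappa = np.linalg.cond(A)  # ratio of largest to smallest singular value
    return 1.0 + (kappa**2 - 1.0) / (N - K)


# Perfectly conditioned data (kappa = 1): the expression equals 1.
ratio_identity = adaptive_ratio(np.eye(5), K=2)

# kappa = 2, N = 4, K = 2: 1 + (4 - 1) / 2, i.e. close to 2.5.
ratio_diag = adaptive_ratio(np.diag([2.0, 1.0, 1.0, 1.0]), K=2)
```

Read directly from the expression, the value tends to 1 when $\kappa(A)$ approaches 1 or when $N - K$ grows large for fixed $\kappa(A)$; whether these are the two asymptotic cases the authors have in mind is an assumption here.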
Keywords
Upper and lower bounds, Low-rank approximation, Shortest path problem, Sampling (statistics), Combinatorial search, Condition number, Computation, Algorithm, Classifier (linguistics), Mathematics