Asymptotic Optimality of Self-Representative Low-Rank Approximation and Its Applications


We propose a novel technique for sampling representatives from a large, unsupervised dataset. The approach is based on the concept of {\em self-rank}, defined as the minimum number of samples needed to reconstruct all samples with an accuracy proportional to that of the rank-$K$ approximation. Because computing self-rank exactly requires an expensive combinatorial search, we propose an efficient algorithm that jointly estimates self-rank and selects the optimal samples with high accuracy. We derive a theoretical upper bound that attains the tightest possible bound in two asymptotic cases. The best approximation ratio for self-representative low-rank approximation was presented at ICML 2017~\cite{Chierichetti-icml-2017} and was further improved to the bound $1+K$ reported at NeurIPS 2019~\cite{dan2019optimal}. Both of these bounds depend solely on the number of selected samples. In this paper, for the first time, we present an adaptive approximation ratio that depends on spectral properties of the original dataset, $A \in \mathbb{R}^{N \times M}$. In particular, our performance bound is governed by the condition number $\kappa(A)$. Our derived approximation ratio is $1 + (\kappa(A)^2 - 1)/(N - K)$, which approaches 1 in two asymptotic cases. In addition to evaluating the proposed algorithm on a synthetic dataset, we show that the proposed sampling scheme can be used in real-world applications such as graph node sampling for optimizing the shortest path criterion, and learning a classifier from sampled data.
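
As a rough illustration of how the stated ratio behaves, the following minimal sketch evaluates $1 + (\kappa(A)^2 - 1)/(N - K)$ numerically. It is not the selection algorithm from the paper. The function name `approximation_ratio_bound` is hypothetical, and the sketch assumes that $\kappa(A)$ is the 2-norm condition number (largest over smallest singular value) and that $N$ is the first dimension of $A$.

```python
import numpy as np


def approximation_ratio_bound(A, K):
    """Evaluate 1 + (kappa(A)^2 - 1) / (N - K) for an N x M matrix A.

    Assumption: kappa(A) is the 2-norm condition number and N = A.shape[0].
    """
    N = A.shape[0]
    if not 0 <= K < N:
        raise ValueError("K must satisfy 0 <= K < N")
    kappa = np.linalg.cond(A)  # sigma_max / sigma_min, computed via SVD
    return 1.0 + (kappa**2 - 1.0) / (N - K)


# A matrix with orthonormal columns has kappa = 1, so the expression equals 1.
rng = np.random.default_rng(0)
Q, _ = np.linalg.qr(rng.standard_normal((50, 10)))
ratio_orthonormal = approximation_ratio_bound(Q, K=5)

# A 4 x 2 matrix with singular values 2 and 1 has kappa = 2.
A = np.vstack([np.diag([2.0, 1.0]), np.zeros((2, 2))])
ratio_small = approximation_ratio_bound(A, K=1)  # 1 + (4 - 1) / (4 - 1)
```

For a fixed condition number, the expression also shrinks toward 1 as $N - K$ grows, which follows directly from the formula.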
Upper and lower bounds,Low-rank approximation,Shortest path problem,Sampling (statistics),Combinatorial search,Condition number,Computation,Algorithm,Classifier (linguistics),Mathematics