Breaking the Limits of Subspace Inference.

Claudia R. Solís-Lemus, Daniel L. Pimentel-Alarcón

Allerton (2018)

Abstract
Inferring low-dimensional subspaces that describe high-dimensional, highly incomplete datasets has become a routine procedure in modern data science. This paper is about a curious phenomenon related to the amount of information required to estimate a subspace. On one hand, it has been shown that, information-theoretically, data in $\mathbb{R}^{\mathrm{d}}$ must be observed on at least $\ell = \mathrm{r}+1$ coordinates to uniquely identify an $\mathrm{r}$-dimensional subspace that approximates it. On the other hand, it is well known that the subspace containing a dataset can be estimated through its sample covariance matrix, which only requires observing 2 coordinates per datapoint (regardless of $\mathrm{r}$!). At first glance, this may seem to contradict the information-theoretic bound. The key lies in the subtle difference between identifiability (uniqueness) and estimation (most probable). It is true that if we only observe $\ell \leq \mathrm{r}$ coordinates per datapoint, there are infinitely many $\mathrm{r}$-dimensional subspaces that perfectly agree with the observations. However, some subspaces may be more likely than others, and the sample covariance reveals which ones. This raises several fundamental questions: what are the algebraic relationships hidden in 2 coordinates that allow estimating an $\mathrm{r}$-dimensional subspace? Moreover, are $\ell = 2$ coordinates per datapoint necessary for estimation, or is it possible with only $\ell = 1$? In this paper we show that, under certain assumptions, it is possible to estimate some subspaces up to finite choice with as few as $\ell = 1$ entry per column. This paper raises the question of whether there exist other subspace estimation methods that allow $\ell \leq \mathrm{r}$ coordinates per datapoint and that are more efficient than the sample covariance, which converges slowly in the number of data points $\mathrm{n}$.
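
The sample-covariance route mentioned in the abstract can be illustrated with a small numerical sketch. This is not the paper's method for the $\ell = 1$ case; it only shows the well-known $\ell = 2$ idea under assumptions that are ours: zero-mean, noise-free Gaussian data, two coordinates per datapoint chosen uniformly at random, and hypothetical helper names (`sample_pairs`, `estimate_subspace`, `projector_distance`). Each covariance entry is averaged over the datapoints in which both of its coordinates were observed, and the subspace estimate is spanned by the top $\mathrm{r}$ eigenvectors.

```python
import numpy as np


def sample_pairs(X, rng):
    """Keep exactly 2 uniformly random coordinates per column; zero the rest."""
    d, n = X.shape
    # Two distinct row indices per column, via a random permutation of rows.
    idx = rng.random((d, n)).argsort(axis=0)[:2]
    mask = np.zeros((d, n), dtype=bool)
    mask[idx, np.arange(n)] = True
    return np.where(mask, X, 0.0), mask


def estimate_subspace(X_obs, mask, r):
    """Top-r eigenvectors of a covariance estimate built from observed pairs.

    Assumes zero-mean data (an assumption of this sketch).
    """
    M = mask.astype(float)
    sums = X_obs @ X_obs.T            # sum of x_i * x_j over co-observed columns
    counts = M @ M.T                  # number of columns observing both i and j
    cov = sums / np.maximum(counts, 1.0)
    eigvals, eigvecs = np.linalg.eigh(cov)   # eigenvalues in ascending order
    return eigvecs[:, -r:]


def projector_distance(U, V):
    """Frobenius distance between the orthogonal projectors onto span(U), span(V)."""
    return np.linalg.norm(U @ U.T - V @ V.T)


rng = np.random.default_rng(0)
d, r, n = 6, 3, 200_000               # here ell = 2 <= r, below the r + 1 bound
U_true, _ = np.linalg.qr(rng.standard_normal((d, r)))   # orthonormal basis
X = U_true @ rng.standard_normal((r, n))                # data in the subspace
X_obs, mask = sample_pairs(X, rng)
U_hat = estimate_subspace(X_obs, mask, r)
print("projector distance:", projector_distance(U_true, U_hat))
```

Rerunning the sketch with smaller `n` gives a noticeably larger distance, which is in line with the abstract's remark that the sample covariance converges slowly in the number of data points.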
Keywords
Estimation, Covariance matrices, Signal processing algorithms, Data science, Standards organizations, Organizations