Matrix Inversion free variational inference in Conditional Student’s T Processes

Semantic Scholar (2022)

Abstract
Gaussian Processes (GP) are data-efficient Bayesian non-parametric models that offer calibrated uncertainty quantification and are robust to overfitting, recently finding applicability in data-scarce domains (Timonen et al., 2019; Wang et al., 2020) or where uncertainty is of utmost importance (Chen et al., 2014). Their drawback resides in the computational complexity of inverting the covariance matrix, which is cubic in time and quadratic in memory. This has motivated research on sparse GP (SGP) methods (Seeger et al., 2003; Quiñonero-Candela and Rasmussen, 2005). Titsias (2009) addressed this problem by leaving the prior distribution of the GP unchanged, with sparsity being enforced in the posterior through inducing points learnt by variational inference. Hensman et al. (2013) proposed an inducing point framework scalable to large datasets, obtaining posterior formulas conditioned on these artificial points. However, this scales supralinearly with the number of inducing points, resulting in O(M²N + M³) computation and O(MN + M²) storage complexity, where N is the number of training points and M the number of inducing points. A major obstacle to the wider adoption of GP on large-scale datasets is therefore the computational cost of matrix inversions and log determinants. With this in mind, van der Wilk et al. (2020) propose a lower bound that can be computed without expensive matrix operations such as inversion. Similar in scope, we propose to learn variational approximations of the covariance matrices of inducing and training points, implicitly also of their inverses, thereby obtaining a computationally efficient approximate posterior over covariance matrices within the probabilistic framework of Student's T Processes (STP) (Shah et al., 2014). Compared to van der Wilk et al. (2020), where inverse-free predictive equations are obtained through a highly structured posterior over U, we obtain similar properties by virtue of our Bayesian hierarchical process, with an additional KL divergence term over our approximation of K_uu⁻¹ that favours recovery of the exact solution. We further show that our method works on large-scale datasets, a task not tackled in that work due to training instability.
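To make the computational point concrete, the sketch below is a minimal illustration, not the paper's actual algorithm: it approximates the inverse of a jittered inducing covariance matrix K_uu using only matrix multiplications, via the classical Newton–Schulz iteration, which here stands in for the learned variational approximation of K_uu⁻¹ described above. The kernel choice, inducing inputs, jitter level, and iteration count are all hypothetical.

```python
import numpy as np

def rbf_kernel(X1, X2, lengthscale=1.0, variance=1.0):
    # Squared-exponential kernel: k(x, x') = variance * exp(-||x - x'||^2 / (2 * lengthscale^2))
    sq = np.sum(X1**2, axis=1)[:, None] + np.sum(X2**2, axis=1)[None, :] - 2.0 * X1 @ X2.T
    return variance * np.exp(-0.5 * sq / lengthscale**2)

M = 10                                            # number of inducing points (hypothetical)
Z = np.linspace(-3.0, 3.0, M)[:, None]            # hypothetical inducing inputs
Kuu = rbf_kernel(Z, Z) + 1e-4 * np.eye(M)         # jitter keeps Kuu positive definite

# Newton-Schulz iteration: T converges to Kuu^{-1} using only matrix products,
# so no Cholesky factorisation or explicit inversion appears anywhere.
# This initialisation guarantees convergence for any nonsingular matrix.
T = Kuu.T / (np.linalg.norm(Kuu, 1) * np.linalg.norm(Kuu, np.inf))
for _ in range(60):
    T = T @ (2.0 * np.eye(M) - Kuu @ T)

# Residual ||T Kuu - I||_F should be near machine precision after convergence.
print("||T Kuu - I||_F =", np.linalg.norm(T @ Kuu - np.eye(M)))
```

In the paper's setting the approximate inverse is instead a variational parameter trained jointly with the rest of the model, with a KL divergence term pulling it towards the exact K_uu⁻¹; the iteration above merely illustrates that an accurate inverse can be recovered without ever factorising the matrix.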