Semi-Supervised U-statistics
arXiv (2024)
Abstract
Semi-supervised datasets are ubiquitous across diverse domains where
obtaining fully labeled data is costly or time-consuming. The prevalence of
such datasets has consistently driven the demand for new tools and methods that
exploit the potential of unlabeled data. Responding to this demand, we
introduce semi-supervised U-statistics enhanced by the abundance of unlabeled
data, and investigate their statistical properties. We show that the proposed
estimator is asymptotically normal and achieves notable efficiency gains over
classical U-statistics by integrating a range of powerful prediction
tools into the framework. To understand the fundamental difficulty of the
problem, we derive minimax lower bounds in semi-supervised settings and
show that our procedure is semi-parametrically efficient under regularity
conditions. Moreover, tailored to bivariate kernels, we propose a refined
approach that outperforms the classical U-statistic across all degeneracy
regimes, and demonstrate its optimality properties. Simulation studies are
conducted to corroborate our findings and to further demonstrate our framework.
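
For orientation, the sketch below shows a classical order-2 U-statistic and one simple way unlabeled covariates could be used to adjust it. It is a minimal illustration under stated assumptions, not the estimator proposed in the paper: the function names (`u_statistic`, `variance_kernel`, `adjusted_u_statistic`), the restriction of the kernel to the labels only, and the control-variate form of the correction are all hypothetical choices made for this example. The `predictor` argument stands in for a fixed prediction tool that guesses the first-order projection of the kernel from a covariate.

```python
import itertools


def u_statistic(sample, kernel):
    """Classical order-2 U-statistic: the kernel averaged over all unordered pairs."""
    pairs = list(itertools.combinations(sample, 2))
    if not pairs:
        raise ValueError("need at least two observations")
    return sum(kernel(a, b) for a, b in pairs) / len(pairs)


def variance_kernel(a, b):
    """Symmetric kernel whose U-statistic is the unbiased sample variance."""
    return 0.5 * (a - b) ** 2


def adjusted_u_statistic(labeled, unlabeled_x, kernel, predictor):
    """Hypothetical control-variate adjustment (an assumption, not the paper's method).

    labeled     : list of (x, y) pairs; the kernel is applied to the y values only.
    unlabeled_x : list of covariates x observed without a label.
    predictor   : fixed function x -> guess of the kernel's first-order projection.

    The correction compares the predictor's mean on the labeled covariates with
    its mean on all covariates. For a fixed predictor this difference has
    expectation zero, so the correction leaves the target unchanged in expectation.
    """
    ys = [y for _, y in labeled]
    classical = u_statistic(ys, kernel)

    labeled_x = [x for x, _ in labeled]
    all_x = labeled_x + list(unlabeled_x)
    mean_labeled = sum(predictor(x) for x in labeled_x) / len(labeled_x)
    mean_all = sum(predictor(x) for x in all_x) / len(all_x)

    # Factor 2 is the kernel order; it matches the scaling of the
    # first-order (Hajek) projection of an order-2 U-statistic.
    return classical - 2.0 * (mean_labeled - mean_all)
```

With no unlabeled data the correction vanishes and the adjusted value coincides with the classical U-statistic; any variance reduction in this sketch depends on how well `predictor` tracks the projection, which is the role the abstract assigns to prediction tools.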