Self-Regularity of Non-Negative Output Weights for Overparameterized Two-Layer Neural Networks

IEEE Transactions on Signal Processing (2022)

Abstract
We consider the problem of finding a two-layer neural network with sigmoid, rectified linear unit (ReLU), or binary step activation functions that “fits” a training data set as accurately as possible, as quantified by the training error, and study the following question: does a low training error guarantee that the norm of the output layer (the outer norm) is itself small? We answer this question affirmatively for the case of non-negative output weights. Using a simple covering number argument, we establish that, under quite mild distributional assumptions on the input/label pairs, any such network achieving a small training error on polynomially many data points necessarily has a well-controlled outer norm. Notably, our results (a) have a sample complexity that is polynomial in $d$, (b) are independent of the number of hidden units (which can be very large), (c) are oblivious to the training algorithm, and (d) require quite mild assumptions on the data (in particular, the input vector $X\in \mathbb{R}^{d}$ need not have independent coordinates). We then leverage our bounds to establish generalization guarantees for such networks through the fat-shattering dimension, a scale-sensitive measure of the complexity of the class to which the network architectures we investigate belong. Notably, our generalization bounds also have good sample complexity (low-degree polynomials in $d$), and are in fact near-linear for some important cases of interest.
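To make the setting concrete (the notation $m$, $a_j$, $w_j$, and $n$ below is introduced here for illustration and is not fixed by the abstract), such a network with $m$ hidden units can be written as
$$ f(X) = \sum_{j=1}^{m} a_j\,\sigma\bigl(\langle w_j, X\rangle\bigr), \qquad a_j \ge 0, $$
where $\sigma$ is the sigmoid, ReLU, or binary step activation, the outer norm is a norm of the output weight vector $(a_1,\dots,a_m)$, and the training error on data $(X_i, Y_i)_{i=1}^{n}$ is a quantity such as $\tfrac{1}{n}\sum_{i=1}^{n}\bigl(f(X_i)-Y_i\bigr)^{2}$. The question studied is whether making this training error small on polynomially many samples forces the outer norm to be small, independently of $m$ and of the algorithm used for training.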
Keywords
Deep learning, neural networks, gradient descent, self-regularity, sample complexity, covering number