How Many Samples Are Needed To Estimate A Convolutional Neural Network?

ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 31 (NIPS 2018)

Abstract
A widespread folklore for explaining the success of Convolutional Neural Networks (CNNs) is that CNNs use a more compact representation than the Fully-connected Neural Network (FNN) and thus require fewer training samples to accurately estimate their parameters. We initiate the study of rigorously characterizing the sample complexity of estimating CNNs. We show that for an $m$-dimensional convolutional filter with linear activation acting on a $d$-dimensional input, the sample complexity of achieving population prediction error $\epsilon$ is $\tilde{O}(m/\epsilon^2)$, whereas the sample complexity for its FNN counterpart is lower bounded by $\Omega(d/\epsilon^2)$ samples. Since in typical settings $m \ll d$, this result demonstrates the advantage of using a CNN. We further consider the sample complexity of estimating a one-hidden-layer CNN with linear activation, where both the $m$-dimensional convolutional filter and the $r$-dimensional output weights are unknown. For this model, we show that the sample complexity is $\tilde{O}((m + r)/\epsilon^2)$ when the ratio between the stride size and the filter size is a constant. For both models, we also present lower bounds showing our sample complexities are tight up to logarithmic factors. Our main tools for deriving these results are a localized empirical process analysis and a new lemma characterizing the convolutional structure. We believe these tools may inspire further developments in understanding CNNs.
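The two models in the abstract are simple enough to write down. Below is a minimal NumPy sketch, not the authors' code: the function names, the patch-extraction convention, and the choice to sum (rather than average) the patch responses in the single-filter model are all assumptions made for illustration.

```python
# Sketch of the two linear-activation models analyzed in the paper.
# Assumed setup: a d-dimensional input x is split into patches of size m
# taken at a fixed stride s; names and conventions here are hypothetical.

import numpy as np

def conv_predict(x, w, stride):
    """First model: a single m-dimensional filter w with linear activation.
    The prediction is taken to be the sum of the filter responses over
    all patches (an assumed convention)."""
    m = w.shape[0]
    patches = [x[i:i + m] for i in range(0, x.shape[0] - m + 1, stride)]
    return sum(w @ p for p in patches)

def conv_predict_two_layer(x, w, a, stride):
    """Second model: one-hidden-layer linear CNN. The filter w yields an
    r-dimensional vector of patch responses, combined by output weights a."""
    m = w.shape[0]
    patches = [x[i:i + m] for i in range(0, x.shape[0] - m + 1, stride)]
    responses = np.array([w @ p for p in patches])  # length r
    return a @ responses

# Toy usage: d = 16, m = 4, stride = 4, giving r = 4 patch responses.
rng = np.random.default_rng(0)
x = rng.normal(size=16)
w = rng.normal(size=4)
a = rng.normal(size=4)
print(conv_predict(x, w, 4), conv_predict_two_layer(x, w, a, 4))
```

Note that in the toy usage the filter has only $m = 4$ free parameters regardless of the input dimension $d = 16$, whereas an FNN counterpart would estimate all $d$ weights; this parameter gap is what the $\tilde{O}(m/\epsilon^2)$ versus $\Omega(d/\epsilon^2)$ bounds quantify.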
Keywords
convolutional neural network