Non-approximability of constructive global $\mathcal{L}^2$ minimizers by gradient descent in Deep Learning

Thomas Chen, Patricia Muñoz Ewald

arXiv (2023)

Abstract
We analyze geometric aspects of the gradient descent algorithm in Deep Learning (DL) networks. In particular, we prove that the globally minimizing weights and biases for the $\mathcal{L}^2$ cost, obtained constructively in [Chen-Munoz Ewald 2023] for underparametrized ReLU DL networks, generically cannot be approximated via the gradient descent flow. We therefore conclude that the method introduced in [Chen-Munoz Ewald 2023] is disjoint from the gradient descent method.
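For orientation, the baseline dynamics the abstract contrasts against is standard: the gradient descent flow $\dot\theta(t) = -\nabla_\theta \mathcal{C}[\theta(t)]$ applied to the $\mathcal{L}^2$ (least-squares) cost $\mathcal{C}$ of a ReLU network. Below is a minimal, hypothetical NumPy sketch of that generic dynamics, discretized by Euler steps, on an underparametrized one-hidden-layer ReLU network; the network shapes, step size, and data are illustrative assumptions, and the code shows only the ordinary gradient descent flow, not the constructive global minimizers of [Chen-Munoz Ewald 2023].

```python
import numpy as np

# Illustrative sketch (not the paper's construction): plain gradient descent
# on the L2 (least-squares) cost of a one-hidden-layer ReLU network.
# All shapes, the step size, and the data below are arbitrary assumptions.

rng = np.random.default_rng(0)

# Underparametrized setting: fewer hidden units (m) than training samples (N).
N, d, m = 50, 3, 4
X = rng.normal(size=(N, d))          # training inputs x_j
Y = rng.normal(size=(N, 1))          # training outputs y_j

W1 = rng.normal(size=(d, m)) * 0.5   # hidden-layer weights
b1 = np.zeros(m)                     # hidden-layer biases
W2 = rng.normal(size=(m, 1)) * 0.5   # output-layer weights
b2 = np.zeros(1)                     # output-layer bias

eta = 1e-2  # step size (Euler discretization of the gradient descent flow)
for step in range(2000):
    H = np.maximum(X @ W1 + b1, 0.0)  # ReLU activations
    pred = H @ W2 + b2
    err = pred - Y
    cost = 0.5 * np.mean(np.sum(err**2, axis=1))  # L2 cost

    # Backpropagation of the L2 cost gradient.
    g_pred = err / N
    gW2 = H.T @ g_pred
    gb2 = g_pred.sum(axis=0)
    g_H = (g_pred @ W2.T) * (H > 0)   # ReLU derivative (defined a.e.)
    gW1 = X.T @ g_H
    gb1 = g_H.sum(axis=0)

    # Gradient descent update of all weights and biases.
    W1 -= eta * gW1; b1 -= eta * gb1
    W2 -= eta * gW2; b2 -= eta * gb2

print(f"final L2 cost after gradient descent: {cost:.6f}")
```

The point of the paper is precisely that trajectories of this kind of flow generically do not reach the globally minimizing weights and biases constructed in [Chen-Munoz Ewald 2023]; the sketch only fixes the objective and dynamics being discussed.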