Interplay between depth and width for interpolation in neural ODEs

CoRR (2024)

Abstract
Neural ordinary differential equations (neural ODEs) have emerged as a natural tool for supervised learning from a control perspective, yet a complete understanding of their optimal architecture remains elusive. In this work, we examine the interplay between their width p and their number of layer transitions L (effectively the depth L+1). Specifically, we assess the model's expressivity in terms of its capacity to interpolate either a finite dataset D comprising N pairs of points or two probability measures in ℝ^d within a Wasserstein error margin ε > 0. Our findings reveal a balancing trade-off between p and L, with L scaling as O(1 + N/p) for dataset interpolation and L = O(1 + (p ε^d)^(-1)) for measure interpolation. The autonomous case, where L = 0, requires a separate study, which we undertake with a focus on dataset interpolation. We address the relaxed problem of ε-approximate controllability and establish an error decay of ε ∼ O(log(p) p^(-1/d)). This decay rate is a consequence of applying a universal approximation theorem to a custom-built Lipschitz vector field that interpolates D. In the high-dimensional setting, we further demonstrate that p = O(N) neurons are likely sufficient to achieve exact control.
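For context, a brief sketch of the width-p, piecewise-constant neural ODE model commonly used in this control-theoretic line of work, together with the scaling relations stated above. The specific parametrization (the matrices W, A, the bias b, and the piecewise-constant switching structure) is an illustrative assumption and is not quoted from the paper:

\[ \dot{x}(t) = W(t)\,\sigma\bigl(A(t)\,x(t) + b(t)\bigr), \qquad t \in (0, T), \]

with W(t) ∈ ℝ^{d×p}, A(t) ∈ ℝ^{p×d}, b(t) ∈ ℝ^{p} held piecewise constant on L+1 subintervals (L layer transitions) and σ a componentwise activation. Under this reading, the results above balance the two hyperparameters as

\[ L = O\!\left(1 + \tfrac{N}{p}\right) \ \text{(dataset interpolation)}, \qquad L = O\!\left(1 + \tfrac{1}{p\,\varepsilon^{d}}\right) \ \text{(measure interpolation)}, \]

while in the autonomous case L = 0,

\[ \varepsilon \sim O\!\left(\log(p)\, p^{-1/d}\right), \qquad p = O(N) \ \text{(likely sufficient for exact control in high dimension)}. \]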