Peeking Behind the Curtains of Residual Learning
CoRR (2024)
Abstract
Residual learning has become widespread in deep, scalable neural nets. However,
the fundamental principles behind its success remain elusive, which hinders the
effective training of plain nets that scale with depth. In this paper, we peek behind
the curtains of residual learning by uncovering the "dissipating inputs"
phenomenon that leads to convergence failure in plain neural nets: the input is
gradually degraded by the non-linearities of plain layers, making it difficult to
learn feature representations. We theoretically demonstrate how plain neural nets
degenerate the input into random noise, and show that a residual connection, which
maintains a better lower bound on the number of surviving neurons, offers a remedy.
Building on these theoretical findings, we propose "The Plain Neural Net Hypothesis"
(PNNH), which identifies the internal path across non-linear layers as the most
critical part of residual learning and establishes a paradigm for training deep
plain neural nets devoid
of residual connections. We thoroughly evaluate PNNH-enabled CNN architectures
and Transformers on popular vision benchmarks, showing on-par accuracy and up to
0.3% gains compared to ResNets and vision Transformers.
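
To make the "dissipating inputs" intuition concrete, below is a minimal NumPy sketch, not the paper's code or exact analysis, of one symptom the abstract describes: at random initialization, a stack of plain ReLU layers maps two distinct inputs to increasingly similar representations, while an identity skip connection keeps them distinguishable. The width, depth, and He-style Gaussian initialization are assumptions chosen only for illustration.

# Illustrative sketch only (not the paper's code): track how distinguishable two
# distinct random inputs remain after many randomly initialized layers.
import numpy as np

rng = np.random.default_rng(0)
width, depth = 512, 48  # arbitrary illustrative choices

def cosine(a, b):
    """Cosine similarity between two representations."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

x0 = rng.standard_normal(width)
y0 = rng.standard_normal(width)
x_plain, y_plain = x0.copy(), y0.copy()
x_res, y_res = x0.copy(), y0.copy()

for layer in range(1, depth + 1):
    # He-style Gaussian weights, shared by both inputs at this layer.
    w = rng.standard_normal((width, width)) * np.sqrt(2.0 / width)
    # Plain layer: the representation is only the non-linear output.
    x_plain, y_plain = np.maximum(w @ x_plain, 0), np.maximum(w @ y_plain, 0)
    # Residual layer: the identity path carries the incoming signal forward.
    x_res, y_res = x_res + np.maximum(w @ x_res, 0), y_res + np.maximum(w @ y_res, 0)
    if layer % 12 == 0:
        print(f"depth {layer:2d}: plain cos(x, y) = {cosine(x_plain, y_plain):.3f}  "
              f"residual cos(x, y) = {cosine(x_res, y_res):.3f}")

In runs of this sketch, the cosine similarity between the two plain-stack representations typically climbs toward 1 noticeably faster than for the residual stack, i.e., the plain layers gradually erase what distinguishes the two inputs, mirroring the dissipating-inputs effect described above.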