State Space Models as Foundation Models: A Control Theoretic Overview
arXiv (2024)
Abstract
In recent years, there has been growing interest in integrating linear
state-space models (SSMs) into the deep neural network architectures of
foundation models. This is exemplified by the recent success of Mamba, which
outperforms state-of-the-art Transformer architectures on language tasks.
Foundation models, such as GPT-4, aim to encode sequential data into a
latent space in order to learn a compressed representation of the data. The
same goal has been pursued by control theorists using SSMs to efficiently model
dynamical systems. Therefore, SSMs can be naturally connected to deep sequence
modeling, offering the opportunity to create synergies between the
corresponding research areas. This paper is intended as a gentle introduction
to SSM-based architectures for control theorists and summarizes the latest
research developments. It provides a systematic review of the most successful
SSM proposals and highlights their main features from a control theoretic
perspective. Additionally, we present a comparative analysis of these models,
evaluating their performance on a standardized benchmark designed for assessing
a model's efficiency at learning long sequences.
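
As context for readers less familiar with the control-theoretic side, the following is a minimal sketch (not from the paper) of the discrete-time linear SSM recurrence that underlies both classical system modeling and SSM-based deep architectures such as Mamba; the function name `ssm_scan` and all matrix dimensions are illustrative assumptions.

```python
import numpy as np

# Discrete-time linear state-space model (SSM):
#   x_{k+1} = A x_k + B u_k   (latent state update)
#   y_k     = C x_k + D u_k   (output readout)
# In SSM-based deep architectures, learned (A, B, C, D) play the role
# of layer parameters; here they are fixed matrices for illustration.

def ssm_scan(A, B, C, D, u, x0=None):
    """Run the linear recurrence over an input sequence u of shape (T, m)."""
    n = A.shape[0]
    x = np.zeros(n) if x0 is None else x0
    ys = []
    for u_k in u:
        y_k = C @ x + D @ u_k   # output from the current latent state
        x = A @ x + B @ u_k     # advance the latent state
        ys.append(y_k)
    return np.stack(ys)

# Example: a stable 2-state system processing a length-10 scalar input.
rng = np.random.default_rng(0)
A = np.array([[0.9, 0.1], [0.0, 0.8]])  # eigenvalues inside the unit circle
B = np.array([[1.0], [0.5]])
C = np.array([[1.0, 0.0]])
D = np.array([[0.0]])
u = rng.normal(size=(10, 1))
print(ssm_scan(A, B, C, D, u).shape)    # -> (10, 1)
```

Because the recurrence is linear, the output can equivalently be written as a convolution of the input with the system's impulse response, which is what lets SSM layers process long sequences efficiently.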