Mamba-ND: Selective State Space Modeling for Multi-Dimensional Data
CoRR (2024)
Abstract
In recent years, Transformers have become the de-facto architecture for
sequence modeling on text and a variety of multi-dimensional data, such as
images and video. However, the use of self-attention layers in a Transformer
incurs prohibitive compute and memory complexity that scales quadratically
w.r.t. the sequence length. A recent architecture, Mamba, based on state space
models has been shown to achieve comparable performance for modeling text
sequences, while scaling linearly with the sequence length. In this work, we
present Mamba-ND, a generalized design extending the Mamba architecture to
arbitrary multi-dimensional data. Our design alternately unravels the input
data across different dimensions following row-major orderings. We provide a
systematic comparison of Mamba-ND with several other alternatives, based on
prior multi-dimensional extensions such as Bi-directional LSTMs and S4ND.
Empirically, we show that Mamba-ND demonstrates performance competitive with
the state-of-the-art on a variety of multi-dimensional benchmarks, including
ImageNet-1K classification, HMDB-51 action recognition, and ERA5 weather
forecasting.