VMRNN: Integrating Vision Mamba and LSTM for Efficient and Accurate Spatiotemporal Forecasting
arXiv (2024)
Abstract
Combining CNNs or ViTs with RNNs for spatiotemporal forecasting has yielded
unparalleled results in predicting temporal and spatial dynamics. However,
modeling extensive global information remains a formidable challenge; CNNs are
limited by their narrow receptive fields, and ViTs struggle with the intensive
computational demands of their attention mechanisms. The emergence of recent
Mamba-based architectures has been met with enthusiasm for their exceptional
long-sequence modeling capabilities, surpassing established vision models in
efficiency and accuracy, which motivates us to develop an innovative
architecture tailored for spatiotemporal forecasting. In this paper, we propose
the VMRNN cell, a new recurrent unit that integrates the strengths of Vision
Mamba blocks with LSTM. We construct a network centered on VMRNN cells to
tackle spatiotemporal prediction tasks effectively. Our extensive evaluations
show that our proposed approach secures competitive results on a variety of
tasks while maintaining a smaller model size. Our code is available at
https://github.com/yyyujintang/VMRNN-PyTorch.
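To make the core idea concrete, here is a minimal, illustrative sketch (not the authors' implementation; see the repository above for that) of an LSTM-style recurrent cell in which the joint transform of the input and hidden state would, in VMRNN, be computed by a Vision Mamba block. A single linear map stands in for that block here, and all names and dimensions are assumptions for illustration.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class VMRNNCellSketch:
    """Illustrative LSTM-style cell. In the actual VMRNN, the gate
    pre-activations would come from a Vision Mamba (state-space) block
    over spatial tokens; a random linear map stands in for it here."""

    def __init__(self, dim, seed=0):
        rng = np.random.default_rng(seed)
        # Stand-in for the Vision Mamba block: one linear map from the
        # concatenated [input, hidden] features to the four LSTM gates.
        self.W = rng.standard_normal((4 * dim, 2 * dim)) * 0.1
        self.b = np.zeros(4 * dim)

    def step(self, x, h, c):
        # Gate pre-activations from the (stand-in) sequence model.
        z = self.W @ np.concatenate([x, h]) + self.b
        i, f, g, o = np.split(z, 4)
        # Standard LSTM state update.
        c_new = sigmoid(f) * c + sigmoid(i) * np.tanh(g)
        h_new = sigmoid(o) * np.tanh(c_new)
        return h_new, c_new

# Roll the cell over a short dummy sequence.
dim = 8
cell = VMRNNCellSketch(dim)
h, c = np.zeros(dim), np.zeros(dim)
for t in range(5):
    x = np.full(dim, 0.1)
    h, c = cell.step(x, h, c)
print(h.shape)
```

The recurrence mirrors a plain LSTM; the paper's contribution is replacing the convolutional or attention-based gate computation with Mamba's linear-time sequence modeling over spatial tokens.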